Enhancing Digital Accessibility: How SceneXplain Transforms Multimedia Content for Public Sector Organizations

Explore SceneXplain's impact on digital accessibility, providing exceptional image descriptions and ensuring compliance with European standards for public sector organizations.

Digital painting of a woman's expressive face with multi-colored geometric patterns and contrasting eye colors

On a crisp autumn morning in Paris, Marie, a visually impaired woman in her early thirties, settles down in her favorite café with a warm cup of coffee. She eagerly opens her smartphone and navigates to a popular news website using her screen reader. As Marie listens to the headlines, she stumbles upon an article about a groundbreaking art exhibition.

"I wonder what the artwork looks like," she thinks aloud, as the screen reader describes the images as "Photo 1" and "Photo 2."

Frustrated, she sighs, "If only I could get a sense of the art, too."

Marie's experience is not unique; millions of visually impaired people face similar challenges in accessing digital content. According to the World Health Organization, approximately 285 million people worldwide are visually impaired, 39 million of whom are blind. Digital accessibility is a critical issue, especially for the public sector. Governments and organizations are responsible for ensuring that their digital platforms are inclusive and accessible to all citizens, regardless of their abilities. A 2020 report by the European Commission found that only 34% of public sector websites in the EU met basic accessibility standards, highlighting the urgent need for improvement.

Vision impairment and blindness
WHO fact sheet on blindness and visual impairment providing key facts, definitions, causes, who is at risk, global and WHO response.
Monitoring e-accessibility in Europe: 2010-2011 : annual report.
Developments in European Union (EU) policies over the past decade have increasingly acknowledged that inclusive Information and Communication Technologies (ICT) and Assistive Technologies (AT) are essential to ensuring full participation in society for many of the population segments at risk of digi…

One of the key components of web accessibility is providing descriptive information for images, commonly known as "alt text" or "image descriptions." A well-written image description can convey the essence of an image, allowing visually impaired users to comprehend the content and context of the image. However, even for seasoned content creators, crafting accurate and informative image descriptions is not always easy. This is where SceneXplain, a groundbreaking AI-driven solution, steps in to revolutionize how we create accessible multimedia content.

This article will explore the significance of digital accessibility, the role of image descriptions, and the challenges with existing solutions. We will then introduce SceneXplain, its advantages, and its potential applications in enhancing digital accessibility for the public sector. We will also discuss how SceneXplain can help organizations comply with European accessibility regulations and provide a roadmap for implementing SceneXplain in public sector websites and applications.

The Role of Image Descriptions in Web Accessibility

Image descriptions, also known as "alt text" or "alternative text," play a vital role in web accessibility by providing a textual representation of the visual content for users who cannot perceive it. The concept of image alt text dates back to the early days of the World Wide Web. As an example, an HTML code snippet for including an image with alt text would look like this:

A tranquil winter scene featuring a sunset over a snow-capped mountain range, with the vibrant orange hues of the dusk sky reflecting in the still waters of a lake
<img src="image.jpg" alt="A beautiful sunset with the sun setting behind a mountain">

In this example, the alt text "A beautiful sunset with the sun setting behind a mountain" briefly describes the image, which can be read by assistive technologies or displayed when the image fails to load.

According to the Web Content Accessibility Guidelines (WCAG) 2.1, providing text alternatives for non-text content (Guideline 1.1, Text Alternatives) is one of the fundamental requirements of web accessibility. This ensures that users who rely on assistive technologies, such as screen readers, can access and understand the visual information on websites and applications. Image descriptions are crucial in providing an inclusive and equitable online experience for people with visual impairments, those with cognitive and learning disabilities, and those with limited bandwidth or slow internet connections.

Web Content Accessibility Guidelines (WCAG) 2.1
Web Content Accessibility Guidelines (WCAG) 2.1 covers a wide range of recommendations for making Web content more accessible. Following these guidelines will make content more accessible to a wider range of people with disabilities, including accommodations for blindness and low vision, deafness an…

Despite the importance of image descriptions, a study by WebAIM found that only 66.3% of images on the top one million websites had alt text attributes in 2020. This demonstrates that there is still considerable room for improvement in ensuring that websites and applications are accessible to all users. Manual creation of image descriptions is often time-consuming and can be inconsistent in quality.
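To make the gap concrete, a page can be checked for images that lack alt text before any descriptions are generated. The following is a minimal sketch using only Python's standard library; the class and function names (AltTextAuditor, audit_alt_text) are illustrative assumptions and not part of WebAIM's methodology or of SceneXplain.

```python
from html.parser import HTMLParser


class AltTextAuditor(HTMLParser):
    """Collects <img> tags that have no alt attribute or an empty one."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing = []  # src of images with no alt attribute at all
        self.empty = []    # src of images whose alt attribute is blank

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        attr_map = dict(attrs)
        src = attr_map.get("src", "(no src)")
        if "alt" not in attr_map:
            self.missing.append(src)
        elif not (attr_map["alt"] or "").strip():
            # A blank alt is valid for purely decorative images,
            # so these are listed separately for human review.
            self.empty.append(src)


def audit_alt_text(html_source):
    """Parse an HTML string and return the populated auditor."""
    auditor = AltTextAuditor()
    auditor.feed(html_source)
    auditor.close()
    return auditor
```

Calling audit_alt_text(page_html) returns an object whose missing list names the images that need a description and whose empty list names those that should be reviewed to confirm they are decorative.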

WebAIM: The WebAIM Million - The 2023 report on the accessibility of the top 1,000,000 home pages

As we move into the era of artificial intelligence, there have been efforts to leverage AI-driven solutions to create image descriptions more efficiently. However, these solutions have their own set of limitations and shortcomings. In the following section, we will explore the challenges with existing image captioning solutions and discuss the need for more advanced and reliable tools to support digital accessibility efforts.


Challenges with Existing Image Captioning Solutions

To understand the challenges with existing image captioning solutions, we will examine the common issues faced by both industry tools and academic approaches to image description generation.

Overly Simplified or Generic Captions

One significant issue with existing image captioning solutions is the generation of overly simplified or generic captions. While these captions may be accurate in a broad sense, they often lack the necessary detail or context to provide a comprehensive understanding of the image. This shortcoming can be especially problematic for users relying on alt text for image comprehension.

A tranquil outdoor scene with a young man wearing a dark jacket and beanie, sitting cross-legged on a blanket beside his sandy-colored dog, which is also sitting and facing the camera. Next to them is a green tent with a backpack at its foot, set against a backdrop of trees and shrubs indicative of a camping area
Most image captioning algorithms would return "A man and a dog" here, even though the scene contains more objects and a story behind them. Why are they camping? What are they doing outside, and does the backpack on the right suggest anything? Source

Missing Nuances and Relationships

Another challenge current image captioning algorithms face is their inability to capture the nuanced relationships between objects in a scene. In many cases, these solutions may accurately identify individual objects but fail to describe the interactions or spatial relationships between them, resulting in a less informative image description.

An instructional illustration depicting a series of three-dimensional geometric shapes, primarily cubes, in varying sizes and colors (blue and red). These shapes are labeled with capital letters like A and B and demonstrate different spatial relationships as follows: disjoint, meet, equal, overlap, inside, contains, cover, and covered. Each relationship type is visually represented by configurations of the colored shapes and accompanied by a corresponding label written in close proximity
A captioning algorithm that simply emits "Object A and Object B" is far from enough. The spatial relation between them conveys very different meanings. Source

Handling Noise and Bad Image Quality

Existing image captioning solutions often struggle with handling noisy or poor-quality images. In cases where the image has artifacts, low resolution, or poor lighting, these algorithms may fail to identify the correct objects or actions in the scene, leading to inaccurate or irrelevant captions. Users who rely on image descriptions for accessibility may face difficulties in understanding the true context of the image when the generated captions fail to accurately describe the content due to noise or bad quality.

An analytical image featuring a panda with adjacent numerical data and classification labels. In the center, results are marked with mathematical symbols and confidence levels showing "panda" at 57.7% confidence, "nematode" at 8.2% confidence, and "gibbon" at 99.3% confidence. This suggests the image is related to a machine learning or statistical analysis process, depicting the probability of the panda being classified into each category
The original image is "attacked" using the adversarial perturbation shown in the middle; as a result, the image captioning algorithm misclassifies the example, although the change is imperceptible to the human eye. Source: Daniel Jakubovitz and Raja Giryes. Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization. In: arXiv.org abs/1803.08680 (2018).

Difficulty Handling Complex Images

Many existing image captioning algorithms struggle to generate accurate and meaningful captions for complex images. When faced with intricate scenes that contain multiple objects, actions, or interactions, these solutions may produce descriptions that fail to capture the depth and richness of the images they describe.

A painting by Sandro Botticelli titled "The Birth of Venus," depicting Venus, a slender nude woman, centered on a seashell against a green, lush backdrop. Surrounding figures, including angels, contribute to the sense of a magical, mythological scene
The caption "The Birth of Venus by Botticelli" is a common output provided by many image captioning algorithms. While it is not technically incorrect, such a description holds little meaning for a visually impaired person. It fails to convey enough information for them to truly comprehend the image. As image alt text, this description falls short in its ability to promote digital accessibility and provide meaningful insight into the visual content.
👓
The above image described by SceneXplain

A captivating rendition of the iconic "Birth of Venus" painting unfolds before the eyes, as the goddess Venus emerges from a seashell. The ethereal beauty is surrounded by mythical beings and celestial figures, including a mermaid, an angel, and a woman bearing a bouquet of flowers. The delicate interplay of these characters creates a sense of enchantment and wonder, as they appear to celebrate the arrival of Venus into the world. This masterful composition invites viewers to delve into the realm of mythology and marvel at the splendor and grace that define this timeless scene.

In summary, existing image captioning solutions from the industry and academia have made significant progress in generating relevant and accurate captions. However, they often fail to provide the detail, context, and nuance necessary for truly accessible image descriptions. This is where SceneXplain can make a real difference, offering a more advanced solution for generating comprehensive and context-rich image descriptions that enhance digital accessibility.

How SceneXplain addresses the challenges in creating accessible multimedia content

SceneXplain is a cutting-edge AI-driven service that generates sophisticated textual descriptions and reveals the rich narratives hidden within images. It transcends the limitations of conventional captioning algorithms, delivering engaging, concise, and professional image storytelling experiences. With a user-friendly interface, seamless API integration, and robust multilingual support, SceneXplain empowers developers to incorporate its advanced capabilities into their multimodal applications effortlessly.

A promotional webpage for the SceneXplain platform, featuring a vibrant layout with a light pink header that includes the platform name in black bold font and a tagline encouraging the exploration of image storytelling beyond pixels. The page showcases an assortment of images and content, creating an inviting and visually engaging experience

Outperforming the Competition: SceneXplain's Consistent Quality and Performance

SceneXplain's AI algorithms set the benchmark for consistent quality and performance in image captioning. In a comprehensive comparison with other popular image captioning algorithms and tools, SceneXplain consistently outperformed its competitors, capturing the most detail and context in its descriptions.

Benchmark of SceneX, Midjourney, BLIP2, CLIPInterogator2
SceneXplain: Unleash the Advanced Image Captioning & Storytelling
Uncover the game-changing potential of SceneXplain, an advanced image captioning solution powered by LLMs. Check out the benchmark against Midjourney, CLIP, BLIP2, and other alternatives. Dive into our blog post and experience the revolution firsthand!

SceneXplain accurately depicts spatial relationships between objects and characters within a scene, creating rich and engaging storytelling that brings images to life. Although there is a minor tendency to hallucinate certain aspects of the scene, this can be attributed to the model's training on a vast amount of data, enabling it to make educated guesses based on similar scenarios it has encountered. Despite this, SceneXplain's performance is unrivaled, providing clear and meaningful image descriptions suitable for use as image alt tags or accessibility text for visually impaired individuals.

In contrast, solutions like Midjourney /describe and CLIP Interrogator 2.1 focus on generating creative prompts for images, rather than easily readable natural language descriptions. Meanwhile, BLIP2 generates short and concise captions, which may be suitable for simple scenes but often fail to capture the nuances and intricate details of more complex visuals. Some examples are shown below; the full 30-image evaluation can be found here.

An illustration of "Liberty Leading the People" by Delacroix, featuring the personification of Liberty centrally, as she leads a group in the July Revolution. She is holding aloft the French Tricolor flag in her left hand and a bayonetted musket in her right, symbolizing the fight for freedom. Around her are diverse figures that represent the populace of the revolution
SceneX: In a scene replete with historical and emotional significance, a woman proudly hoists the French flag, symbolizing the Liberation of Paris. A group of men brandishing weapons stands nearby, prepared to defend their cause. Among them, a young boy clutches his gun with determination. The tableau is interspersed with poignant scenes, such as the Crucifixion of Jesus and groups of people lying on the ground, evoking a sense of sacrifice and resilience. This powerful image serves as a testament to the human spirit and its capacity for triumph in the face of adversity.
Midjourney /describe: the liberation of paris in 1832 by gustave lutin, in the style of feminist art and contemporary feminist, light navy and dark bronze, heavy use of palette knives, irony, trompe-l'œil illusionistic detail, rangercore, joyful chaos --ar 58:45
BLIP2: liberty leading the people
CLIP Interrogator 2.1: a painting of a woman holding a flag, an album cover, by Eugene Delacroix, flickr, arms held high in triumph, iconic shot, menacing!!!, trending photo
A playful and bold OkCupid billboard advertisement set against a blue sky with white clouds. The ad features a pink background with the tagline "DTF All Head Over Heels" in bright yellow lettering, and depicts two women – one blond and one brunette. The brunette is pictured with a tilted head while the blond woman is holding a red rose. Below them, the phrase "Dating Deserves Better" complements the OkCupid logo
SceneX: An eye-catching billboard displays the phrase "DTF Head Over Heels," capturing the attention of passersby with its vibrant imagery and bold statement. The advertisement features various romantic scenarios, including two women sharing a tender kiss against a backdrop of a pink sky, a man and woman embracing in a heartfelt hug, and another pair of women hugging affectionately. The billboard also showcases more unconventional elements, such as a close-up of a woman's feet and the silhouette of an expectant mother. An OkCupid dating design adds context to these diverse depictions of love and relationships, inviting viewers to contemplate the vast spectrum of human connection and emotion embodied in this striking visual display.
Midjourney /describe: the billboard dtf head-over-heels , in the style of feminine imagery, humor meets heart, humor meets heart, colorized, playful imagery, hikecore, sandalpunk --ar 2:1
BLIP2: a billboard with the words dtf all over heels
CLIP Interrogator 2.1: a billboard that says dtf all head over heels, by Adam Dario Keel, cupid, ocult, product placement, attractive female
An active urban street in Berlin with tall buildings lining both sides, bustling with pedestrians and street trams. The atmosphere suggests a vibrant city life with people going about their daily routines, reflecting the dynamic energy of a major metropolis
SceneX: A bustling city street teems with life as pedestrians traverse the urban landscape, surrounded by an array of buildings that impart a sense of metropolitan charm. The heart of Berlin, Germany, reveals itself in this dynamic scene, where people walk along the sidewalks and a tram awaits passengers at a nearby train station. The vibrant atmosphere is palpable as individuals go about their daily routines, navigating the streets and engaging with their surroundings. This lively tableau encapsulates the essence of city living, showcasing the energy and excitement inherent in urban environments.
Midjourney /describe: a busy city street with a bus driving, in the style of deutscher werkbund, consumer culture critique, impressive panoramas, award-winning, poignant, passage, hypnotic symmetry --ar 50:33
BLIP2: a busy city street with people walking and cars driving
CLIP Interrogator 2.1: a street filled with lots of traffic next to tall buildings, by Micha Klein, shutterstock, berlin secession, tram, stock photo, people at work, stock image
An etching of an Elizabethan play showing a lively theatrical performance with two central actors engaging in a dramatic sword fight on stage. Audience members are depicted in various states of engagement, some seated on benches and others standing, within a semi-circular indoor setting adorned with marble columns. The scene exudes an air of historical entertainment and communal enjoyment
SceneX: In a bustling medieval theater, two valiant knights clad in armor engage in a thrilling joust, their lances poised for impact. The audience, dressed in period attire, watches the spectacle with bated breath. Among the onlookers are groups of men and women adorned with hats, their keen eyes following the action. Behind the scenes, a pigeon rests in its cage, seemingly indifferent to the excitement unfolding before it. This riveting scene transports viewers to a bygone era of chivalry and pageantry, where honor and bravery were celebrated through thrilling contests of skill and strength.
Midjourney /describe: a drawing shows two men fighting in an arena, in the style of quirky caricatures, lively interiors, swordpunk, watercolor technique, school of london, ritualistic masks, grid-based --ar 32:21
BLIP2: a drawing of men fighting in the middle of a crowd
CLIP Interrogator 2.1: a couple of men standing next to each other on a stage, by George Cruikshank, shutterstock, renaissance, facing off in a duel, in medieval armoury, school class, still image from the movie

SceneXplain's advanced capabilities and versatility make it a powerful tool for a wide range of applications, from providing image alt tags for accessibility to enhancing user experiences with engaging visual storytelling. With its consistent quality and performance, SceneXplain is the best choice for organizations seeking to improve digital accessibility through accurate, detailed, and contextually rich image descriptions.

Resilience to noise and varying image quality

SceneXplain's robust AI algorithms have been designed to handle varying image quality and noise levels, ensuring that the generated descriptions remain accurate and relevant even when the input image is less than ideal.

Multilingual Support for Enhanced Accessibility

One of the key features of SceneXplain that sets it apart from other image captioning solutions is its robust multilingual support. Catering to the diverse linguistic landscape of the European Union and beyond, SceneXplain can generate context-rich image descriptions in multiple languages, making it an invaluable tool for enhancing accessibility on a global scale.

This multilingual capability allows organizations to provide accurate and engaging image descriptions to users of different languages, ensuring their content is accessible to a broader audience. By offering this linguistic versatility, SceneXplain demonstrates its commitment to fostering an inclusive digital environment that embraces the rich cultural and linguistic diversity of the European Union and the world.

By addressing these key challenges, SceneXplain is poised to revolutionize the way accessible multimedia content is created and consumed. Its advanced capabilities hold great potential for enhancing the digital accessibility landscape, making the web a more inclusive space for all users.

Ensuring Compliance with European Accessibility Standards

Digital accessibility has become a priority for governments and organizations around the world. In Europe, several regulations and guidelines have been established to ensure digital content is accessible to all users, including those with disabilities. The Web Content Accessibility Guidelines (WCAG) are globally recognized recommendations for making web content more accessible. Additionally, the European Union has implemented the Web Accessibility Directive, which requires public sector organizations to adhere to WCAG 2.1 Level AA standards.

Digitale Barrierefreiheit München | Rechtliche Grundlagen
Im Zusammenhang mit digitaler Barrierefreiheit werden viele Richtlinien und Verordnungen genannt. Nachfolgend eine kurze Erläuterung der wichtigsten Begriffe: BITV 2.0, WCAG 2.1, EU-Richtlinie 2016/2102
Web Accessibility
What is web accessibility? Web accessibility allows everyone, including people with disabilities, to perceive, understand, navigate and interact with the Internet.

According to the European Commission, around 80 million Europeans have a disability. With the growing need for accessible content, non-compliance with accessibility regulations can lead to lawsuits and fines. In 2018, a study conducted by the European Disability Forum found that 89% of European public sector websites failed to meet accessibility standards, highlighting the urgent need for improvement.

Study supporting the review of the Web Accessibility Directive
The aim of the study is to support the review of the Web Accessibility Directive and its implementing acts, covering the first three years of their application (2018-2021).
Web Accessibility Directive - European Disability Forum
Web Accessibility Directive

How SceneXplain Helps Organizations Meet Accessibility Requirements

SceneXplain is designed to help organizations comply with European accessibility standards by providing accurate, detailed, and contextually rich image descriptions. By integrating SceneXplain's advanced image captioning capabilities, public sector websites and applications can generate image alt tags and other accessible multimedia content that adheres to the WCAG 2.1 Level AA guidelines. This ensures visually impaired users can better understand and engage with the content, resulting in a more inclusive digital experience.

SceneXplain's multilingual support further enhances its potential to meet accessibility requirements in the European context. By providing context-rich translations for image descriptions, SceneXplain enables organizations to cater to the diverse linguistic needs of users across the EU, ensuring that content remains accessible in multiple languages.
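On the website side, serving multilingual descriptions comes down to choosing the right one for the page language. The following is a minimal sketch, assuming the descriptions for one image have already been generated and stored in a dictionary keyed by language code; the function name pick_description and the storage layout are hypothetical and not part of SceneXplain.

```python
def pick_description(descriptions, page_lang, default_lang="en"):
    """Choose the description that best matches the page language.

    `descriptions` maps language codes (e.g. "de", "fr") to alt text.
    Returns None when neither the page language nor the default exists.
    """
    normalized = {lang.lower(): text for lang, text in descriptions.items()}
    # Try the full tag first ("de-at"), then its primary subtag ("de"),
    # then fall back to the default language.
    candidates = [
        page_lang.lower(),
        page_lang.split("-")[0].lower(),
        default_lang.lower(),
    ]
    for lang in candidates:
        if lang in normalized:
            return normalized[lang]
    return None
```

For a page marked lang="de-AT", this returns the German description if one exists and the English one otherwise, so a screen reader user is never left with an empty alt attribute just because a regional variant is missing.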

Benefits of Using SceneXplain in Public Sector Websites and Applications

There are several benefits to incorporating SceneXplain into public sector websites and applications:

  1. Improved Accessibility: SceneXplain's advanced image captioning algorithms generate detailed and accurate descriptions, ensuring that visually impaired users can better understand and engage with multimedia content.
  2. Compliance with Regulations: Integrating SceneXplain helps public sector organizations comply with the Web Accessibility Directive and WCAG 2.1 Level AA guidelines, minimizing the risk of non-compliance penalties. In recent years, several high-profile lawsuits have involved website accessibility, emphasizing the importance of adhering to these guidelines to avoid legal repercussions.
  3. Enhanced User Experience: By providing engaging and informative image descriptions, SceneXplain enriches the overall user experience for all users, not just those with disabilities.
  4. Multilingual Support: SceneXplain's ability to generate context-rich translations for image descriptions enables organizations to cater to the diverse linguistic needs of users across the EU, further enhancing accessibility.

By leveraging SceneXplain's advanced image captioning capabilities, European public sector organizations can ensure compliance with accessibility standards and create a more inclusive digital environment for all users. The urgency of adhering to these regulations is highlighted by the growing number of people with disabilities, the risk of lawsuits, and the need to provide everyone with equal access to digital content.

Integration Process of SceneXplain with Public Sector Websites and Applications

Integrating SceneXplain with public sector websites and applications can significantly enhance digital accessibility. SceneXplain offers multiple integration options to cater to the diverse needs of organizations. Here, we will explore three ways to integrate SceneXplain:

Create Accessible Images via the SceneXplain UI

SceneXplain's user-friendly web interface allows organizations to quickly and easily generate image descriptions without the need for extensive technical knowledge. By uploading images directly to the web platform, users can obtain detailed textual descriptions that can then be incorporated into their website's image alt tags or other accessibility features. This approach is ideal for organizations with limited technical resources or those who require occasional image captioning services.
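When a generated description is pasted into a page, it has to be escaped so that quotation marks or angle brackets in the text do not break the alt attribute. The following is a minimal sketch using Python's standard library; the helper name img_tag is an illustrative assumption, and the description is treated as plain text copied from the web interface.

```python
from html import escape


def img_tag(src, description):
    """Return an <img> element whose alt attribute carries the description."""
    # Collapse line breaks and repeated spaces so the attribute stays on one line.
    alt = " ".join(description.split())
    # escape(..., quote=True) converts &, <, > and both quote characters,
    # so quotes inside the description cannot end the attribute early.
    return '<img src="{}" alt="{}">'.format(
        escape(src, quote=True), escape(alt, quote=True)
    )
```

For example, img_tag("venus.jpg", 'The "Birth of Venus" painting') keeps the quoted title intact inside the attribute by emitting it as &quot; entities.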

A screenshot showing side-by-side displays of a computer screen and a smartphone, both featuring an interactive translation platform interface. Various snippets of text in English, French, German, and Chinese are presented among images depicting social and urban scenes. Interface options such as "Scenes," "Settings," and "Credits" are visible, indicating functionality to control the translation experience
You can use SceneXplain on either a desktop or a mobile device.

Batch Process Images via SceneXplain API

For organizations seeking automated, seamless integration, SceneXplain offers a robust, scalable, and secure API. The batch API can describe up to 128 images in a single batch within 40 seconds, which suits organizations that need high volumes of image captioning.

A line chart comparing the time performance of processing batch sizes from 1 to 128 images with "high-quality mode on" and "high-quality mode off." The y-axis measures time in seconds, with lower times indicating better performance. While the "high-quality mode off" yields higher times across all batch sizes, both modes converge around the same performance beyond a batch size of 8. Additional context from the chart notes the benchmarking is an average over 10 runs, conducted end-to-end from a German client to a server, and highlights that high-quality mode aims to produce more accurate text descriptions

Organizations can automatically generate image descriptions by connecting SceneXplain's API to websites or applications, ensuring accessibility compliance with European standards. The API integration suits organizations with dedicated technical teams or those needing large-scale image captioning. Secure data handling ensures privacy and data protection during integration, providing a seamless experience while enhancing digital accessibility for all users.
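A site with more than 128 images has to split its requests to respect the batch limit mentioned above. The following is a minimal sketch of that batching logic only; it does not reproduce the SceneXplain API. The send_batch callable is a hypothetical stand-in that a technical team would implement against the actual API documentation, and it is assumed to return one description per image URL, in order.

```python
MAX_BATCH_SIZE = 128  # batch limit stated in the article


def chunk(items, size=MAX_BATCH_SIZE):
    """Split a list into consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def describe_all(image_urls, send_batch):
    """Describe every image, one batch at a time.

    `send_batch` is a caller-supplied function (hypothetical here) that
    takes a list of image URLs and returns a list of descriptions in the
    same order.
    """
    descriptions = {}
    for batch in chunk(image_urls):
        results = send_batch(batch)
        if len(results) != len(batch):
            raise RuntimeError("expected one description per image")
        descriptions.update(zip(batch, results))
    return descriptions
```

Keeping the network call behind send_batch means the batching can be tested offline with a stub, and the resulting URL-to-description mapping can be written straight into the content management system's alt text fields.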

A screenshot of a web page detailing an API access interface with highlighted sections on the best practices for security, such as not sharing secret API keys and how these keys might be rotated if leaked. Instructions are included for generating new API secrets and making programmatic calls to an API endpoint for batch processing images, with specified limits based on user subscription levels. Technical details include code snippets and examples of how to set up the API call, required data structures, and additional available features

Privacy-Preserving On-Premises Solution

SceneXplain also provides an on-premises solution for organizations that prefer to maintain full control over their data and systems. By deploying SceneXplain on their servers, organizations can ensure that their sensitive data remains within their network while benefiting from the advanced image captioning capabilities of SceneXplain. This option is particularly relevant for public sector organizations with strict data security and privacy requirements.

Each integration option offers unique benefits and caters to different organizational needs. By choosing the most suitable integration method, public sector organizations can effectively harness the power of SceneXplain to enhance digital accessibility and comply with European accessibility regulations.

Conclusion

In conclusion, SceneXplain is a groundbreaking solution advancing digital accessibility in the public sector. By leveraging its advanced AI capabilities, SceneXplain generates precise, context-rich, and engaging image descriptions, ensuring that multimedia content is accessible to all users, including those with visual impairments.

An illustration of a diverse group of 16 people exhibiting inclusivity and camaraderie, gathered around a table with laptops, in a warm and positive atmosphere

As digital accessibility becomes increasingly essential, public sector organizations must adopt innovative tools like SceneXplain to meet the demands of an inclusive online experience. Integrating SceneXplain not only ensures compliance with European accessibility standards but also enhances the overall user experience for all citizens.

Now is the time for public sector organizations to take the initiative and harness the power of SceneXplain to create a more inclusive digital landscape. Embrace the change and join the movement towards a more accessible and equal online world for everyone.