Read My Pics: SceneXplain Puts OCR in Your Visual Question Answering!

We're turbocharging SceneXplain's visual question answering with an OCR upgrade, making it easier than ever to get answers out of your images

Hand holding a smartphone with a chat app inquiring about a neon 'Berlin' sign, with shelves of bottles in the background

Back in June, we introduced visual question answering in SceneXplain, letting you upload an image and ask questions about it:

View on SceneXplain

Now we're bringing a new model to the table, Glide. Glide reads and understands the text in images too, integrating OCR into the question-answering experience:

View on SceneXplain

How do I use OCR in visual question answering?

The process is exactly the same as for visual question answering in general. It's baked right in, meaning you don't have to toggle any additional options. Just enable "Visual question answer" and you're off to the races!

First, upload your image:

We're using this image from Pexels.com

Then click on the speech bubble to ask your question:

Then click the button to send your data and, voila, your question will be answered!

View on SceneXplain

What's this good for?

  • Accessibility for visually impaired users: SceneXplain can be used to provide descriptive text for images online or in digital documents, helping visually impaired users understand the content more effectively.
  • Search Engine Optimization (SEO): By providing accurate and detailed descriptions for images, SceneXplain can help improve the SEO of websites, as search engines rely heavily on text data. This is especially useful for products like apparel with brand names, slogans, or other text; or infographics that incorporate text into images. This becomes especially convenient when you perform batching via SceneXplain’s API, which we cover in this post.
  • Education: Teachers can use SceneXplain to create descriptive text for diagrams, illustrations, and other visual aids, making it easier for students who prefer or require reading over visual learning.
  • Brand sentiment analysis: You can use SceneXplain to analyze images and associated text on social media platforms to better understand user sentiment and trends or to identify inappropriate content. This goes a step beyond regular brand sentiment analysis since it can detect brand logo text which would otherwise have been outside the model’s training set.
  • Translation: When combined with a translation tool, SceneXplain can be used to read and translate text in images from one language to another, aiding in international communication and travel in foreign countries.

OCR in action

The most obvious question you can ask is "What is the text in this image?"

View on SceneXplain

But we're working with a large language model (LLM) here. Why limit ourselves to something so basic?

We can do things like translating graffiti:

View on SceneXplain

...or correcting the translation of one language into another:

View on SceneXplain

...or read signs:

View on SceneXplain

...or just get specific information from a sign:

View on SceneXplain

Maybe you're completely out of touch with geek culture and don't understand why something is funny. SceneXplain has got your back:

View on SceneXplain

...and we can even ask it to describe what we're pointing at. Forget point and click - now you've got point and explain!

View on SceneXplain

Get started

OC-are you ready to get started with SceneXplain? Head on over to scenex.jina.ai to create your account, and start uploading your images and asking your questions. Need support or want to share your results? You can do that on our Discord channel.

SceneXplain - Explore image storytelling beyond pixels
Leverage GPT-4 & LLMs for the most advanced image storytelling. Explain visuals for content creators, media, & e-commerce with rich captions, multilingual support, and seamless API integration. Experience the future of image description today.

Read more

Want to dig deeper? We've written a bunch of other posts about what you can do with SceneXplain:

Brand Engagement Reimagined: AI-Powered Sentiment Analysis with SceneXplain
Digging deep into images to uncover sentiment and brand can be a wild ride. Tackle it with SceneXplain, and you’ve got yourself a business power-up, ready to take the market by storm
Making Visuals Vocal: SceneXplain’s Impact on Product Image Accessibility
SceneXplain transforms product images into audio descriptions, ensuring visual content isn’t just seen, but also heard and understood. It’s a step forward in creating an inclusive digital world for everyone
SceneXplain vs. MiniGPT4: A Comprehensive Benchmark of Top 5 Image Captioning Algorithms for Understanding Complex Scenes
Uncover the future of image captioning as SceneXplain and its rivals face off in an epic showdown. Explore their impact on accessibility, SEO, and storytelling, and dive into our intriguing results to witness the cutting-edge capabilities of these algorithms.
Enhancing Digital Accessibility: How SceneXplain Transforms Multimedia Content for Public Sector Organizations
Explore SceneXplain’s impact on digital accessibility, providing exceptional image descriptions and ensuring compliance with European standards for public sector organizations.
SceneXplain: Unleash the Advanced Image Captioning & Storytelling
Uncover the game-changing potential of SceneXplain, an advanced image captioning solution powered by LLMs. Check out the benchmark against Midjourney, CLIP, BLIP2, and other alternatives. Dive into our blog post and experience the revolution firsthand!