Brand Engagement Reimagined: AI-Powered Sentiment Analysis with SceneXplain

Digging deep into images to uncover sentiment and brand presence can be a wild ride. Tackle it with SceneXplain, and you've got yourself a business power-up, ready to take the market by storm.

Split image with a jovial group toasting with beers on one side and an upset bearded man with a pride flag on the other

We've been putting a lot of work into SceneXplain recently, especially the visual question answering. If you're not familiar with it, SceneXplain uses AI to generate textual descriptions from images. It has a user-friendly interface and an API, so you can also integrate it into your own applications. Instead of simply providing captions, it aims to produce more descriptive narratives about image content.

SceneXplain - Explore image storytelling beyond pixels
Leverage GPT-4 & LLMs for the most advanced image storytelling. Explain visuals for content creators, media, & e-commerce with rich captions, multilingual support, and seamless API integration. Experience the future of image description today.

In this post, we'll explore how to use visual question answering to perform image sentiment analysis for brand social listening.

🔤
In a future post, we’ll dive into performing sentiment analysis on text too. In the meantime, it works essentially the same, so feel free to try it out on PromptPerfect!

Due to pesky copyrights and trademarks, we'll avoid any real brands, and use a fictional brand, Duff Beer (from The Simpsons), as our first example.

🖼️
Halfway through writing this post, I found some brand photos I can use for free. I'll include some of those after the Duff Beer example, and we'll use SceneXplain's API to do our sentiment analysis on them.

What is sentiment analysis?

I'm more of a geek than a marketer, so I'll let Wikipedia do the talking:

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, also more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly.

One of the key challenges in sentiment analysis is removing noise from the image we're looking at. If you're looking at a picture of happy people drinking beer at an outdoor festival, you generally don't care about the shape of the clouds or whether the bartender is wearing a T-shirt or a polo shirt. You care about the people (are they happy? Are they miserable?) and the brand of the beer (is it your brand or a competitor's?)

Visual question answering for understanding images

Let's start by asking a simple question about an image: Is there beer in this picture?

And the output from SceneXplain is:

Yes, there are bottles and cans of Duff beer visible in the image.
Homer Simpson is holding a bottle of Duff beer, and a can of 
Duff beer is resting on the grass in front of an open field.

That's nice but not super useful if we want to do batch analysis on images. In a case like that, we'd want a standardized output so we can perform deeper analysis later on. We'd want something like True or False. So let's change the prompt to: Return "True" if there is beer in the picture. Otherwise, return "False". With that prompt, SceneXplain simply returns True for the image above.

🧠
In cases where we want very simple output, the default model may give too much information. I changed to the "Flash" model for the above example.
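For batch work, you'd probably want to turn that string into a real boolean before doing anything else with it. Here's a minimal sketch of how that could look. The helper name `parse_flag` is my own invention (it isn't part of SceneXplain), and I'm assuming the model may occasionally add stray whitespace, quotes, or a trailing period:

```python
def parse_flag(answer: str) -> bool:
    """Convert a "True"/"False" answer string into a Python bool."""
    # Drop surrounding whitespace, quotes and periods, then ignore case
    cleaned = answer.strip().strip("\"'.").lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    # Anything else means the model ignored the requested format
    raise ValueError(f"Unexpected answer: {answer!r}")


# Hypothetical answers, as they might come back for three images
answers = ["True", " false ", '"True".']
flags = [parse_flag(a) for a in answers]
print(flags)
```

Raising an error on anything unexpected (rather than silently treating it as False) makes it easier to spot images where the model didn't follow the prompt.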

We can also do things like getting a detailed description of the image:

Please identify and describe the beer brand, location and atmosphere 
from the photo. Your answer should provide enough detail and 
description to allow us to understand the scenario accurately. 
You can provide any relevant information such as backstory, 
scene analysis, participant interactions, etc.

That gives us:

The beer brand in the image is the iconic Duff beer, which is a
fictional beer brand featured in the animated television series,
The Simpsons. The location is an outdoor setting with a whimsical
blue sky, fluffy clouds, and the Simpsons family gathered in a
grassy field within the woods. The atmosphere is lively and
entertaining, with Homer Simpson enthusiastically holding a bottle
of Duff beer in front of an entertained crowd. The lush greenery
of the scene contrasts with the eccentrically colorful animation
scheme, capturing the spirit of The Simpsons. In the background,
the iconic Simpsons logo assumes the word "Duff," emphasizing
the refreshing energy of the moment. The family, including
Marge, Bart, Lisa, and Maggie, stands together, collectively
enjoying the outdoors.

As you can see, there's a little bit of hallucination going on (as is normal with these kinds of AI models). I can't see any members of the Simpson family other than Homer, or the logo assuming anything. But the general gist is there, with the details that are important for sentiment analysis.

But enough with the cartoons - let’s test this on some real photos and in a real-world use case via the API and working with JSON.

Analyzing brand sentiment via API

Alas, we do not live in Springfield like the Simpsons. No dodgy nuclear reactors or Duff Beer for us. But we can make do with some real-world brands to quench our thirst.

We'll use SceneXplain's API to easily analyze some photos in bulk. In our case we'll use Python, though JavaScript and cURL access are also available. Here are the images we'll analyze:

To follow along with this code, you'll need a secret key, which you can generate from SceneXplain's API page:

Now for some code. First, let's set our question:

question = """
Identify the brands, scene attributes and sentiment categories (positive, negative, neutral), 
and provide a response in the following JSON format:

---format begin---
{"brands": [brand], "scene": scene, "sentiment": sentiment}
---format end---
"""
💡
We're strictly specifying the answer format that we want using a single example for in-context learning. This ensures consistent, parsable output.

Next up, we pass along our image URLs, setting the question_answer flag and question:

data = {
    "data": [
        {"image": "https://images.pexels.com/photos/1267305/pexels-photo-1267305.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/2076755/pexels-photo-2076755.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/3922817/pexels-photo-3922817.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/3922818/pexels-photo-3922818.jpeg", "features": ["question_answer"], "question": question},
        {"image": "https://images.pexels.com/photos/5935229/pexels-photo-5935229.jpeg", "features": ["question_answer"], "question": question}
    ]
}
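Since every entry repeats the same feature flag and question, you could also build the payload from a plain list of URLs. This is just a sketch: `build_payload` is a hypothetical helper name, and the question string below is a stand-in for the one we defined earlier:

```python
def build_payload(image_urls, question):
    """Build the request body, with one entry per image URL."""
    return {
        "data": [
            {"image": url, "features": ["question_answer"], "question": question}
            for url in image_urls
        ]
    }


# Stand-in for the full question defined earlier in the post
question = "Identify the brands, scene attributes and sentiment categories."

image_urls = [
    "https://images.pexels.com/photos/1267305/pexels-photo-1267305.jpeg",
    "https://images.pexels.com/photos/2076755/pexels-photo-2076755.jpeg",
]

data = build_payload(image_urls, question)
print(len(data["data"]))
```

That way, adding more images is a matter of appending URLs to the list.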

And now we pass all of that to SceneXplain:

import http.client
import json

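# Placeholder: replace with the secret key you generated on SceneXplain's API page
YOUR_GENERATED_SECRET = "<your secret key>"

headers = {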
    "x-api-key": f"token {YOUR_GENERATED_SECRET}",
    "content-type": "application/json",
}

connection = http.client.HTTPSConnection('us-central1-causal-diffusion.cloudfunctions.net')
connection.request("POST", "/describe", json.dumps(data), headers)
response = connection.getresponse()

response_data = response.read().decode("utf-8")
scenes = json.loads(response_data)['result']

connection.close()
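The snippet above assumes the request always succeeds. In a bulk job, you might want to check the HTTP status before reaching for the `result` key. Here's a rough sketch: `extract_results` is a hypothetical helper, and I'm assuming that anything other than a 200 status means the request failed (I haven't checked what SceneXplain's error responses actually look like):

```python
import json


def extract_results(status: int, body: str) -> list:
    """Return the 'result' list from a response body, or raise if something is off."""
    # Assumption: any non-200 status means the request failed
    if status != 200:
        raise RuntimeError(f"Request failed with HTTP {status}: {body[:200]}")
    payload = json.loads(body)
    if "result" not in payload:
        raise KeyError("Response JSON has no 'result' key")
    return payload["result"]


# Usage with a made-up response body:
example_body = '{"result": [{"answer": "True"}]}'
print(extract_results(200, example_body))
```

With the code above, you would call it as `extract_results(response.status, response_data)`.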

While our response object is JSON, within that response is SceneXplain's answer to our question. This always comes back as a plain string, so let's parse that string into a Python dictionary to make it easier to work with:

for scene in scenes:
    scene['answer'] = json.loads(scene['answer'])
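One caveat: `json.loads` raises an exception if the model returns anything that isn't valid JSON, and language models don't always stick to the requested format. Here's a more forgiving sketch. The `parse_answer` helper is hypothetical, and it assumes that if the model adds extra text, the JSON object is whatever sits between the first `{` and the last `}`:

```python
import json


def parse_answer(raw):
    """Parse a model answer into a dict, or return None if that isn't possible."""
    if not isinstance(raw, str):
        return None
    # Assumption: the JSON object spans the first "{" to the last "}"
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        parsed = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Made-up answers: one clean, one wrapped in chatter, one with no JSON at all
print(parse_answer('{"brands": ["Heineken"], "scene": "street", "sentiment": "neutral"}'))
print(parse_answer('Sure! {"brands": [], "scene": "rooftop", "sentiment": "positive"}'))
print(parse_answer("I could not analyze this image."))
```

Returning `None` instead of raising lets you skip (or retry) the odd bad answer without stopping the whole batch.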

Last but not least, let's see what we've got:

from pprint import pprint

pprint([scene['answer'] for scene in scenes])

Here's the output:

[{'brands': ['Hot Lizard IPA'],
  'scene': 'upscale pub with patrons enjoying drinks and ambiance, featuring '
           'neon beer signs and a bartender pouring beer into a glass with '
           'precision',
  'sentiment': 'positive'},
 {'brands': ['Heineken'],
  'scene': 'urban street scene with a young woman holding a beer bottle and a '
           'skateboarder moving behind her, drinking Heineken beer',
  'sentiment': 'neutral'},
 {'brands': ['Heineken'],
  'scene': 'friends celebrating with Heineken beer and wine glasses, creating '
           'a warm and joyful atmosphere',
  'sentiment': 'positive'},
 {'brands': ['Budweiser'],
  'scene': 'social gathering in a cozy, dimly lit lounge with friends raising '
           'glasses of frothy golden beer in joyful celebration',
  'sentiment': 'positive'},
 {'brands': [],
  'scene': "rooftop party with friends enjoying each other's company and city "
           'view',
  'sentiment': 'positive'}]

As you can see, it's pretty good:

  • It usually picks up the brand and sentiment, though there is the occasional hallucination when it comes to the scene descriptions.
  • In cases where there is no visible beer brand (like in the crowd image of people raising their bottles), it returns an empty list for brands.
  • It did screw up the 'Hopadillo' brand, consistently calling it 'Hot Lizard' during my testing. This may be down to it being a niche brand (with limited training data) and the image model making the best of a bad job by guessing "Hot Lizard", which are actually words in its training set.
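If a misreading is as consistent as the 'Hot Lizard' one, a small alias table could patch it up after the fact. This is only a sketch of the idea: `BRAND_ALIASES` and `normalize_brands` are names I made up, and you'd have to maintain the table by hand for whatever misreadings you spot in your own data:

```python
# Hypothetical alias table: lowercase misreading -> the real brand name
BRAND_ALIASES = {
    "hot lizard": "Hopadillo",
    "hot lizard ipa": "Hopadillo",
}


def normalize_brands(brands):
    """Replace known misreadings with the real brand name; leave others untouched."""
    return [BRAND_ALIASES.get(brand.strip().lower(), brand) for brand in brands]


print(normalize_brands(["Hot Lizard IPA", "Heineken"]))
```

This doesn't fix the underlying recognition problem, of course. It just keeps the downstream numbers honest for brands you already know get misread.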

Detecting brand variants and localization

Out of interest, I looked into how well this works with branding edge cases - for example a Chinese Coca-Cola can, or a customized Coca-Cola can:

For those, we got the results:

[{'brands': ['Coca-Cola'],
  'scene': 'a display of Coca-Cola cans adorned with Chinese writing featuring '
           'the iconic Coca-Cola logo on a sleek black surface',
  'sentiment': 'positive'},
 {'brands': ['Coca-Cola'],
  'scene': 'A coffee cup and a can of Coca-Cola on a blue surface with a red '
           "background and 'My Hero' sign in the back.",
  'sentiment': 'neutral'}]

As we can see, brand detection is on point. We also got good results when testing images with negative sentiment.

With code like the above and SceneXplain's API, you can easily do bulk analysis of images to get deeper insights into how consumers see your brand. This allows you to identify prevalent patterns, trends, or themes that may be influencing consumer perception and engagement. By understanding this, you can tailor your marketing strategy to enhance positive aspects and address any identified issues, ultimately optimizing your brand's reach and impact.
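As a starting point for that kind of pattern-spotting, you could tally sentiment per brand across all the parsed answers. Here's a minimal sketch using the beer results from earlier in this post. The `tally_sentiment` helper and the "(no brand)" label are my own choices, not anything SceneXplain provides:

```python
from collections import Counter, defaultdict


def tally_sentiment(answers):
    """Count how often each sentiment appears for each brand."""
    tally = defaultdict(Counter)
    for answer in answers:
        # Group images without a detected brand under a placeholder label
        brands = answer.get("brands") or ["(no brand)"]
        for brand in brands:
            tally[brand][answer.get("sentiment", "unknown")] += 1
    return tally


# Brands and sentiments taken from the beer photo output above
answers = [
    {"brands": ["Hot Lizard IPA"], "sentiment": "positive"},
    {"brands": ["Heineken"], "sentiment": "neutral"},
    {"brands": ["Heineken"], "sentiment": "positive"},
    {"brands": ["Budweiser"], "sentiment": "positive"},
    {"brands": [], "sentiment": "positive"},
]

for brand, counts in tally_sentiment(answers).items():
    print(brand, dict(counts))
```

With only five images this doesn't tell you much, but the same loop works unchanged on thousands of results.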

What else can SceneXplain do?

Dive into some of our prior blog posts to find out!

Making Visuals Vocal: SceneXplain’s Impact on Product Image Accessibility
SceneXplain transforms product images into audio descriptions, ensuring visual content isn’t just seen, but also heard and understood. It’s a step forward in creating an inclusive digital world for everyone
SceneXplain vs. MiniGPT4: A Comprehensive Benchmark of Top 5 Image Captioning Algorithms for Understanding Complex Scenes
Uncover the future of image captioning as SceneXplain and its rivals face off in an epic showdown. Explore their impact on accessibility, SEO, and storytelling, and dive into our intriguing results to witness the cutting-edge capabilities of these algorithms.
Enhancing Digital Accessibility: How SceneXplain Transforms Multimedia Content for Public Sector Organizations
Explore SceneXplain’s impact on digital accessibility, providing exceptional image descriptions and ensuring compliance with European standards for public sector organizations.
SceneXplain: Unleash the Advanced Image Captioning & Storytelling
Uncover the game-changing potential of SceneXplain, an advanced image captioning solution powered by LLMs. Check out the benchmark against Midjourney, CLIP, BLIP2, and other alternatives. Dive into our blog post and experience the revolution firsthand!

Get started with SceneXplain and sentiment analysis

To analyze those sentiments and optimize your brand strategy, sign up on SceneXplain now, and join the discussion on our Discord server.