
Bypass Limitations with PromptPerfect: Generate the Images the Models Don’t Want You to See

See how PromptPerfect overcomes restrictions and limitations of image generation models like Stable Diffusion XL and DALL-E 3.

Alex C-G

May 22, 2024 • 10 min read

💡

Calm down, we’re not focusing on those kinds of images (whatever you think those are).

Let’s cut straight to the point: Sometimes you want to generate a perfectly innocent image, and a model (like DALL-E 3 or Stable Diffusion XL) either flat-out refuses or comes up with something totally wrong. PromptPerfect helps with that, giving you better and more accurate results.

In this post we’ll compare different models, explain how to use PromptPerfect to optimize your experience, and put it to the test, showing you the results of both models before and after using PromptPerfect’s optimizer.

And no, we’re not generating (or trying to generate) any dirty pictures. This is a family-friendly post, especially for families with children who like octopuppies. Or puptopi. Or whatever we end up calling some of the weird many-legged doggos we create later in the post.

DALL-E 3 and Stable Diffusion XL

There are plenty of models out there, but today we’ll focus on the shiny new kids on the block: DALL-E 3 from OpenAI and Stable Diffusion XL from Stability AI. Each of these can achieve good results, but they have different strengths and weaknesses.

Out of the box, DALL-E 3 is good at understanding long sentences and object relationships, and it draws more realistic anatomy than Stable Diffusion XL (no Lovecraftian horror hands here). It also generates text better than any other image generation model (though that’s a low bar). However, it often point-blank refuses to generate images of notable figures (like Taylor Swift) or well-known characters (like Mickey Mouse, even if we ask for the out-of-copyright Steamboat Willie version).

Stable Diffusion XL is much more open to generating images of notable figures and well-known characters, though some of its images of Mickey look like they were drawn while on some really fun drugs. However, it often messes up anatomy and object relationships. You can ask it to generate text (and you can see it’s trying its best), but it falls way behind DALL-E 3 on that front.

With PromptPerfect we can get around some of these weaknesses in both models. We’ll compare DALL-E 3 and Stable Diffusion XL, both before and after using PromptPerfect’s optimization. You can skip ahead to see the ultimate winner.

Using PromptPerfect’s Optimizer

In this battle of the models we’re using PromptPerfect’s optimizer to see how we can get better image results from our prompts. Here’s how:

💡

Try a paid plan free for 7 days. And subscribe to a plan within 24 hours of your first login to get 40% off!

Click on the interactive feature:

Dark themed webpage of PromptPerfect! with a navigation bar and titles like "Interactive" and "Auto-tune."

In the ‘optimizer’ pane (on the right-hand side), type something like “generate a prompt to create an image of felix the cat using DALL-E 3”:

Interface of an AI assistant tool with option to create a playful image of Felix the Cat using DALL-E 3.

Click "Send to Assistant"

Screenshot of a DALL-E 3 interface with options to create a playful, whimsical image of Felix the Cat.

It will do some thinking, then generate the image from the prompt in the ‘interactive’ pane, on the left:

Progression of Felix the Cat illustrations from sketch to a lively pop art style creation.

Refine your prompt by conversing with the Optimizer, then lather, rinse, repeat:

Screenshot of an artistic request page for creating a Felix the Cat illustration in the 1930s rubber hose animation style.

Contest Methodology

For the “before” images, we’ll use:

ChatGPT (GPT-4) to generate images with DALL-E 3, using the prompt “generate an image of <thing>”, for example “generate an image of mickey mouse”.
Replicate’s interface to generate images with Stable Diffusion XL, using the prompt “<thing>”, for example “mickey mouse”.

For the “after” images, we’ll use PromptPerfect’s interactive optimizer with the prompt “generate a prompt to create an image of <thing> using <model name>”.

We’ll present the first output that comes up. The number of images per output varies: PromptPerfect always generates four, Stable Diffusion XL (via Replicate) one, and DALL-E 3 one or two.
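To make the setup concrete, here’s a minimal Python sketch of the prompt templates described above. The template strings come straight from our methodology; the function names (`before_prompt`, `after_prompt`) are hypothetical, and the sketch only builds the text we typed in. It doesn’t call ChatGPT, Replicate, or PromptPerfect.

```python
# Hypothetical helpers that build the contest prompts described above.
# They only produce strings; no model or service is called here.

def before_prompt(thing: str, model: str) -> str:
    """Unoptimized "before" prompt for a given subject and model."""
    if model == "DALL-E 3":
        # Typed into ChatGPT (GPT-4), which hands it off to DALL-E 3
        return f"generate an image of {thing}"
    if model == "Stable Diffusion XL":
        # Typed straight into Replicate's interface: just the subject
        return thing
    raise ValueError(f"unknown model: {model}")


def after_prompt(thing: str, model: str) -> str:
    """Request sent to PromptPerfect's interactive optimizer."""
    return f"generate a prompt to create an image of {thing} using {model}"


if __name__ == "__main__":
    for model in ("DALL-E 3", "Stable Diffusion XL"):
        print(f"{model} before: {before_prompt('mickey mouse', model)!r}")
        print(f"{model} after:  {after_prompt('mickey mouse', model)!r}")
```

The only difference between models in the “after” case is the model name at the end of the request; the optimizer does the model-specific work.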

💡

While PromptPerfect’s optimizer is interactive (so you can refine your prompt conversationally), we stuck with the first result to be as impartial as possible. If you make full use of the optimizer’s interactive feature, you’ll likely get even better results.

We’ll award medals as follows:

💩 - flat-out refused to cooperate
🥉 - it tried, but none of the outputs were what we were looking for
🥈 - at least one of the outputs was an okay result!
🥇 - hot damn, at least one of the outputs was actually good!

Finally, we’ll do a roundup and see which model and method came out on top.

Who Will Be the Next Top Model?

Models, start your engines!

Round 1: Notable Figures

Let's first try our Lord and Savior Taylor Swift. Here’s a real image of the person we’re aiming for:

Taylor Swift wearing a black dress with deep V-neckline and gold necklace, posing with a hand on her hip. Licensed CC BY 3.0, Attribution: iHeartRadioCA

Without PromptPerfect, DALL-E 3 flat out refuses to create Taylor:

Chat interface showing a denied request to generate an image of Taylor Swift by ChatGPT due to content policy.

With PromptPerfect, it generates images with the optimized prompt, but none of them actually look like her:

Taylor Swift in an edited portrait with suggestions for a red sequined dress, styled hair, and dramatic stage lighting.

With Stable Diffusion XL (SDXL), before PromptPerfect we get a pretty good rendition:

Portrait of Taylor Swift with red lipstick, blondish-brown hair, wearing a dress against a pink background.

And PromptPerfect’s optimized prompt once again delivers:

Edited collage of Taylor Swift with multiple effects via a photo editor, highlighting options like Optimize and Assistant.

Let’s see which models could really generate-rate-rate:

	Before optimization	After optimization
DALL-E 3	💩 It flat out refused	🥉 Blonde? Check. Singer? Check. Taylor? Nope.
Stable Diffusion XL	🥇 Swifty vibes	🥇 Quite Taylorian

Round 2: “Copyrighted” Material

We’re not even going to try with actually copyrighted material - that’s a whole can of worms we don’t want to open. However, the design of Mickey Mouse from Steamboat Willie is out of copyright as of 2024:

Cartoon of Mickey Mouse dressed in captain attire, steering a ship's wheel with a joyful expression, in classic black and white.

Let’s use him as a subject. DALL-E 3 flat out refuses at first:

Chat exchange in Slack showing a user request for a 'Mickey Mouse from Steamboat Willie' image and ChatGPT's polite refusal.

With PromptPerfect we get results with the right vibe, but not the 1930s rubber hose style:

Black and white image editing screen featuring classic Mickey Mouse on a steamboat, with detailed creative instructions.

Stable Diffusion tries. It really does. With this Mickey you get a lot more ears, eyes and fingers for your buck:

Black and white illustration of Mickey Mouse gesturing in a playful stance.

With PromptPerfect optimization, Stable Diffusion still gives us fever dream Mickey, but more of a light fever, less “how strong are these mushrooms?” fever:

Series of Mickey Mouse images showcasing artistic transformation from vintage black and white to modern 3D CGI.

Which model puts the “ick” in Mickey?

	Before optimization	After optimization
DALL-E 3	💩 Policy schmolicy. This stuff is definitely out of copyright.	🥈 Definitely had Mickey vibes, no weirdness, just not the 1930s style I was aiming for.
Stable Diffusion XL	🥉 Go home Mickey. You’re possessed.	🥈 Barely scraping into the silver medal category. More Mickey vibes than DALL-E 3, but the deformation is really distracting

Round 3: Text

Let’s generate a picture of a sign that says “Happy days are here again”. No target picture this time, just imagine (as difficult as it might be) a sign with that text. In the words of John Lennon, it’s easy if you try.

DALL-E 3 gives us happy vibes, which I dig. However, it does throw in the word “dye”. Since this sounds like the word “die”, it might be sending mixed messages:

Colorful sign reading "Happy Days Are Here Again" in a creative font, with a sun and clouds, shared in a Slack conversation.

With optimization, at least one output has the correct wording and spelling with no extra words, and another is almost spot-on apart from a misspelling:

A vibrant, nostalgic sign reading "Happy Days Are Here Again" in bold, cheerful lettering on weathered wood.

Stable Diffusion XL gives us Herpy Days:

Playful sign with the phrase "Happy Days Are Here Again" painted in a unique and stylized manner.

After optimizing the Stable Diffusion XL prompt, we get a lonely misspelled sign in the woods. It’s less scary than before, though I for one am not following that signpost to wherever it leads.

Old wooden sign reading "Happy days are here again" in a vibrant, detailed landscape with a backdrop of blue sky and trees.

Who will see happy days, and who won’t?

	Before optimization	After optimization
DALL-E 3	🥈 You can see what the sign is saying, even though it added the extra “dye” word and the order of the words is off	🥇 At least one of the signs has the full correct text. And another just had a “small” typo (an extra “P” in “HAPPY” - small by image generation standards!)
Stable Diffusion XL	🥉 Looks like a motivational poster from Hell	🥈 Not as good as unoptimized DALL-E 3, but doesn’t make me want to gouge out my eyes as much as unoptimized SDXL

Round 4: “Cursed” Creations

Let’s see how well the models can adapt to weird stuff, like a puppy with seven legs. No target image this time - I don’t want “deformed puppies” to be in my Google history. Just imagine a puppy with seven legs.

DALL-E 3 gave us two outputs this time. We didn’t ask for two; it just likes doggos, I guess. Proof that AI is becoming more human-like? Anyway, the results were what we asked for, though a bit bland in my opinion. Still, we’re not awarding points for style in this round, just content, so a dog with an absurd number of legs superimposed on the Windows XP wallpaper works:

Cute brown and white puppy sitting on a grassy hill with colorful flowers, one paw raised, under a clear blue sky.

Playful brown and white puppy sitting on a flower-dotted green lawn with a blue sky and fluffy clouds overhead. (While it's not strictly NSFW, it is sufficiently disturbing that I pixelated it.)

After optimization, so many legs! I wonder what the multi-legged dog emoji is meant to express? Send answers our way!

Illustration showing four whimsical brown puppies in various poses, alongside detailed instructions.

Stable Diffusion XL misread the assignment:

Small puppy with floppy ears and a black collar standing on a brown surface, looking at the camera.

Even after optimization, we’re like “which part of seven legs did you not understand?”:

Who’s top dog and who’s runt of the litter in this round?

	Before optimization	After optimization
DALL-E 3	🥇 Both puppies have a bizarre number of legs. The first puppy even has seven, though some of them are barely in shot. I don’t know what the clasper things on puppy number two are, and I don’t wish to find out.	🥇 YES. All the puppies. All the legs. You can play shaking hands with these cuties for ages. One even got the leg count right.
Stable Diffusion XL	🥉 When I want a puppy with legs for days, I don’t mean just long legs	🥉 I like my puppies with more legs

Bonus Round: Kegstand Punk

In some cases, DALL-E 3 and SDXL both fail whether we employ optimization or not. For example, generating an image of a punk doing a kegstand.

Here is an image of a punk…

Man with a green mohawk and black "DISCHARGE" coat on a city street, showcasing bold fashion and individuality (via pexels.com).

...and an illustration of a kegstand (that looks like it’s from a wholesome children’s book):

Joyful illustration of men around a barrel with another man playfully inside, indicating humor and fun.

I can’t find an actual image of a punk doing a kegstand online. Ugh, punks, such prudes!

DALL-E 3 gives us a punk in a bar with weird but cool lighting. He looks very stoic. He’s on a keg, but no kegstand.

A punk doing a kegstand at a lively party, with interface icons and chat text indicating an image generation command.

After optimization, I dig the vibe, but still no kegstand:

Punk rocker with brightly colored spiked hair and tattoos, performing a keg stand at a lively underground party.

They should rename Stable Diffusion XL to Stable Diffusion ER, because this guy(?) needs to go to hospital:

Man performing a handstand on a wooden barrel outdoors, dressed in black, with a red and white building and a clear sky.

After optimization looks much better. There’s a keg. There’s a punk. Still no kegstand, alas.

Energetic punk music scene in an underground venue with a crowd, punk in leather and mohawk hairstyles.

Who’s the punk and who’s just junk?

	Before optimization	After optimization
DALL-E 3	🥈 Punk, check. Keg, check. Kegstand, not so much.	🥈 Optimization changed the vibe a bit, but still no actual kegstand
Stable Diffusion XL	🥉 Ouch. Not a punk. Not a kegstand. Barely a human being. And doing a kegstand like that, he won’t be any kind of human being for much longer.	🥈 Optimization gave us a much better result, showing a punk interacting with a keg. No body horror this time.

Tallying Up the Score

Now that the contest is done, we’ll count the scores as follows:

💩: zero points
🥉: one point
🥈: two points
🥇: three points

The maximum number of points any option could achieve is 15 (winning a gold medal in all five rounds). Let’s see the breakdown:

Challenge	DALL-E 3, before PromptPerfect	DALL-E 3, after PromptPerfect	Stable Diffusion XL, before PromptPerfect	Stable Diffusion XL, after PromptPerfect
Notable figure	💩 0	🥉 1	🥇 3	🥇 3
“Copyrighted” material	💩 0	🥈 2	🥉 1	🥈 2
Text	🥈 2	🥇 3	🥉 1	🥈 2
Cursed creations	🥇 3	🥇 3	🥉 1	🥉 1
Punk kegstand	🥈 2	🥈 2	🥉 1	🥈 2
Total	🥉 7	🥇 11	🥉 7	🥈 10
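If you want to check our arithmetic, here’s a short Python sketch that applies the points scheme above to the medals in the table. The names `POINTS`, `RESULTS`, and `total` are our own invention for this sketch; the medals and point values are exactly the ones listed in this post.

```python
# Points per medal, as defined in the scoring rules above
POINTS = {"💩": 0, "🥉": 1, "🥈": 2, "🥇": 3}

# Medals per round, in table order: notable figure, "copyrighted" material,
# text, cursed creations, punk kegstand
RESULTS = {
    "DALL-E 3 before": ["💩", "💩", "🥈", "🥇", "🥈"],
    "DALL-E 3 after": ["🥉", "🥈", "🥇", "🥇", "🥈"],
    "SDXL before": ["🥇", "🥉", "🥉", "🥉", "🥉"],
    "SDXL after": ["🥇", "🥈", "🥈", "🥉", "🥈"],
}

ROUNDS = 5
MAX_SCORE = POINTS["🥇"] * ROUNDS  # a gold medal in every round


def total(medals: list[str]) -> int:
    """Sum the points for one model/method combination."""
    return sum(POINTS[m] for m in medals)


if __name__ == "__main__":
    for name, medals in RESULTS.items():
        print(f"{name}: {total(medals)}/{MAX_SCORE}")
```

Running it prints 7/15 and 11/15 for DALL-E 3 before and after, and 7/15 and 10/15 for Stable Diffusion XL, matching the table.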

In short, if it weren’t for censorship in the first two rounds, DALL-E 3 would’ve scored much higher. Overall, optimizing prompts with PromptPerfect improved results for both models in our test: DALL-E 3 went from 7 to 11 points, and Stable Diffusion XL from 7 to 10.

You can trust us, because this was an impartial contest (done by us, for us, for our own product). Seriously though, the results do speak for themselves. Try it for yourself and see how it goes!