Bypass Limitations with PromptPerfect: Generate the Images the Models Don’t Want You to See
See how PromptPerfect overcomes restrictions and limitations of image generation models like Stable Diffusion XL and DALL-E 3.
Let’s cut straight to the point: Sometimes you want to generate a perfectly innocent image, and a model (like DALL-E 3 or Stable Diffusion XL) either flat-out refuses or comes up with something totally wrong. PromptPerfect helps with that, giving you better and more accurate results.
In this post we’ll compare different models, explain how to use PromptPerfect to optimize your experience, and put it to the test, showing you the results of both models before and after using PromptPerfect’s optimizer.
And no, we’re not generating (or trying to generate) any dirty pictures. This is a family-friendly post, especially for families with children who like octopuppies. Or puptopi. Or whatever we end up calling some of the weird many-legged doggos we create later in the post.
DALL-E 3 and Stable Diffusion XL
While there are plenty of models out there, today we’ll focus on the shiny new kids on the block: DALL-E 3 from OpenAI, and Stable Diffusion XL from Stability AI. While each of these can achieve good results, they have different strengths and weaknesses.
Out of the box, DALL-E 3 is good at understanding long sentences and object relationships, and it draws more realistic anatomy than Stable Diffusion XL (no Lovecraftian horror hands here). It also generates text better than any other image generation model (though that’s a low bar). However, it often point-blank refuses to generate images of notable figures (like Taylor Swift) or well-known characters (like Mickey Mouse, even if we ask for the out-of-copyright Steamboat Willie version).
Stable Diffusion XL is much more open to generating images of notable figures and well-known characters, though some of its images of Mickey look like they were drawn while on some really fun drugs. However, it often messes up anatomy and object relationships. While you can ask it to generate text (and see it’s trying its best), it falls way behind DALL-E 3 on that front.
With PromptPerfect we can get around some of these weaknesses from both models. We’ll compare DALL-E 3 and Stable Diffusion, both before and after using PromptPerfect's optimization. You can skip ahead to see the ultimate winner.
Using PromptPerfect’s Optimizer
In this battle of the models we’re using PromptPerfect’s optimizer to see how we can get better image results from our prompts. Here’s how:
- Sign up for free credits at PromptPerfect.
- Click on the interactive feature.
- In the ‘optimizer’ pane (on the right-hand side), type something like `generate a prompt to create an image of felix the cat using DALL-E 3`.
- Click "Send to Assistant".
- It will do some thinking, then generate the image from the prompt in the ‘interactive’ pane, on the left.
- Refine your prompt by conversing with the Optimizer, then lather, rinse, repeat.
Contest Methodology
For the “before” images, we’ll use:
- ChatGPT (GPT-4) to generate images with DALL-E 3, using the prompt `generate an image of <thing>`, for example `generate an image of mickey mouse`.
- Replicate’s interface to generate images with Stable Diffusion XL, using the prompt `<thing>`, for example `mickey mouse`.
For the “after” images, we’ll use PromptPerfect’s interactive optimizer, using the prompt `generate a prompt to create an image of <thing> using <model name>`.
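For the curious, the prompt templates above can be written down as a tiny helper. This is only a sketch: the function names `before_prompt` and `after_prompt` are made up for illustration, and the code just builds strings. It doesn’t call ChatGPT, Replicate, or PromptPerfect.

```python
# Hypothetical helpers that only build the prompt strings used in this contest.
# They do not call any model or API.

def before_prompt(thing: str, model: str) -> str:
    """Return the unoptimized prompt for the given model."""
    if model == "DALL-E 3":
        # Sent via ChatGPT, so it is phrased as a request.
        return f"generate an image of {thing}"
    if model == "Stable Diffusion XL":
        # Sent via Replicate, where the prompt is just the subject.
        return thing
    raise ValueError(f"unknown model: {model}")


def after_prompt(thing: str, model: str) -> str:
    """Return the request typed into PromptPerfect's interactive optimizer."""
    return f"generate a prompt to create an image of {thing} using {model}"


if __name__ == "__main__":
    for model in ("DALL-E 3", "Stable Diffusion XL"):
        print(repr(before_prompt("mickey mouse", model)))
        print(repr(after_prompt("mickey mouse", model)))
```

Keeping the “before” prompts this bare is deliberate: it shows what each model does with no help at all.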
We’ll present the first output that comes up. The number of images varies: PromptPerfect always generates four, Stable Diffusion XL (via Replicate) generates one, and DALL-E 3 generates one or two.
We’ll award medals as follows:
- 💩 - flat-out refused to cooperate
- 🥉 - it tried, but none of the outputs were what we’re looking for
- 🥈 - at least one of the outputs was an okay result!
- 🥇 - hot damn, at least one of the outputs was actually good!
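If you like your rules spelled out, the medal scheme above boils down to something like the sketch below. The function name `award_medal` and the `"bad"`/`"okay"`/`"good"` labels are our own shorthand for illustration; the ratings themselves are still subjective human judgments, not anything a model computes.

```python
# Hypothetical encoding of the medal rules. Each generated image gets a
# subjective rating of "bad", "okay", or "good" from a human judge.

def award_medal(refused: bool, ratings: list[str]) -> str:
    """Return the medal for one model in one round."""
    if refused:
        return "💩"  # flat-out refused to cooperate
    if "good" in ratings:
        return "🥇"  # at least one output was actually good
    if "okay" in ratings:
        return "🥈"  # at least one output was an okay result
    return "🥉"      # it tried, but nothing was what we were looking for


if __name__ == "__main__":
    print(award_medal(refused=False, ratings=["bad", "okay", "bad", "bad"]))
```

Note that one good image is enough for gold, so a model that returns four outputs gets more chances than one that returns a single image.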
Finally, we’ll do a roundup and see which model and method came out on top.
Who Will Be the Next Top Model?
Models, start your engines!
Round 1: Notable Figures
Let's first try our Lord and Savior Taylor Swift. Here’s a real image of the person we’re aiming for:
Without PromptPerfect, DALL-E 3 flat out refuses to create Taylor:
With PromptPerfect, it generates images with the optimized prompt, but none of them actually look like her:
With SDXL, before PromptPerfect we get a pretty good rendition:
And PromptPerfect’s optimized prompt once again delivers:
Let’s see which models could really generate-rate-rate:
Model | Before optimization | After optimization |
---|---|---|
DALL-E 3 | 💩 It flat out refused | 🥉 Blonde? Check. Singer? Check. Taylor? Nope |
Stable Diffusion XL | 🥇 Swifty vibes | 🥇 Quite Taylorian |
Round 2: “Copyrighted” Material
We’re not even going to try with actually copyrighted material - that’s a whole can of worms we don’t want to dive into. However, the design of Mickey Mouse from Steamboat Willie is out of copyright as of 2024:
Let’s use him as a subject. DALL-E 3 flat out refuses at first:
With PromptPerfect we get results with the right vibe, but not the 1930s rubber hose style:
Stable Diffusion tries. It really does. With this Mickey you get a lot more ears, eyes and fingers for your buck:
With PromptPerfect optimization, Stable Diffusion still gives us fever dream Mickey, but more of a light fever, less “how strong are these mushrooms?” fever:
Which model puts the “ick” in Mickey?
Model | Before optimization | After optimization |
---|---|---|
DALL-E 3 | 💩 Policy schmolicy. This stuff is definitely out of copyright. | 🥈 Definitely had Mickey vibes, no weirdness, just not the 30s style I was aiming for. |
Stable Diffusion XL | 🥉 Go home Mickey. You’re possessed. | 🥈 Barely scraping into the silver medal category. More Mickey vibes than DALL-E 3, but the deformation is really distracting |
Round 3: Text
Let’s generate a picture of a sign that says “Happy days are here again”. No target picture this time, just imagine (as difficult as it might be) a sign with that text. In the words of John Lennon, it’s easy if you try.
DALL-E 3 gives us happy vibes, which I dig. However, it does throw in the word “dye”. Since this sounds like the word “die”, it might be sending mixed messages:
With optimization, we actually get the correct wording and spelling with no extra words, at least once. And once it’s almost spot-on, except for a misspelling:
Stable Diffusion XL gives us Herpy Days:
After optimizing the Stable Diffusion XL prompt, we get a lonely misspelled sign in the woods. It’s less scary than before, though I for one am not following that signpost to wherever it leads.
Who will see happy days, and who won’t?
Model | Before optimization | After optimization |
---|---|---|
DALL-E 3 | 🥈 You can see what the sign is saying, even though it added the extra “dye” word and the order of the words is off | 🥇 At least one of the signs has the full correct text. And another just had a “small” typo (an extra “P” in “HAPPY” - small by image generation standards!) |
Stable Diffusion XL | 🥉 Looks like a motivational poster from Hell | 🥈 Not as good as unoptimized DALL-E 3, but doesn’t make me want to gouge out my eyes as much as unoptimized SDXL |
Round 4: “Cursed” Creations
Let’s see how well the models can adapt to weird stuff, like a puppy with seven legs. No target image this time - I don’t want “deformed puppies” to be in my Google history. Just imagine a puppy with seven legs.
DALL-E 3 gave us two outputs this time. We didn’t ask for it. It just likes doggos I guess. Proof that AI is becoming more human-like? Anyway, results were what we asked for, though a bit bland in my opinion. Still we’re not awarding points for style in this round, just content. So a dog with an absurd number of legs superimposed on the Windows XP wallpaper works:
After optimization, so many legs! I wonder what the multi-legged dog emoji is meant to express? Send answers our way!
Stable Diffusion XL misread the assignment:
Even after optimization, we’re like “which part of seven legs did you not understand?”:
Who’s top dog and who’s runt of the litter in this round?
Model | Before optimization | After optimization |
---|---|---|
DALL-E 3 | 🥇 Both puppies have a bizarre number of legs. The first puppy even has seven, though some of them are barely in shot. I don’t know what the clasper things are on puppy number two, and neither do I wish to find out. | 🥇 YES. All the puppies. All the legs. You can play shaking hands with these cuties for ages. One even got the leg count right. |
Stable Diffusion XL | 🥉 When I want a puppy with legs for days, I don’t mean just long legs | 🥉 I like my puppies with more legs |
Bonus Round: Kegstand Punk
In some cases, DALL-E 3 and SDXL both fail whether we employ optimization or not. One example is generating an image of a punk doing a kegstand.
Here is an image of a punk…
...and an illustration of a kegstand (that looks like it’s from a wholesome children’s book):
I can’t find an actual image of a punk doing a kegstand online. Ugh, punks, such prudes!
DALL-E 3 gives us a punk in a bar with weird but cool lighting. He looks very stoic. He’s on a keg, but no kegstand.
After optimization, I dig the vibe, but still no kegstand:
They should change the name to Stable Diffusion ER, because this guy(?) needs to go to hospital:
After optimization, things look much better. There’s a keg. There’s a punk. Still no kegstand, alas.
Who’s the punk and who’s just junk?
Model | Before optimization | After optimization |
---|---|---|
DALL-E 3 | 🥈 Punk, check. Keg, check. Kegstand, not so much | 🥈 Optimization changed the vibe a bit, but still no actual kegstand |
Stable Diffusion XL | 🥉 Ouch. Not a punk. Not a kegstand. Barely a human being. And doing a kegstand like that, he won’t be any kind of human being for much longer. | 🥈 Optimization gave us a much better result, showing a punk interacting with a keg. No body horror this time. |
Tallying Up the Score
Now that the contest is done, we’ll count the scores as follows:
- 💩: zero points
- 🥉: one point
- 🥈: two points
- 🥇: three points
The maximum number of points any option could achieve is 15 (winning a gold medal in all five rounds). Let’s see the breakdown:
Challenge | DALL-E 3 before PromptPerfect | DALL-E 3 after PromptPerfect | Stable Diffusion XL before PromptPerfect | Stable Diffusion XL after PromptPerfect |
---|---|---|---|---|
Notable figure | 💩 0 | 🥉 1 | 🥇 3 | 🥇 3 |
“Copyrighted” material | 💩 0 | 🥈 2 | 🥉 1 | 🥈 2 |
Text | 🥈 2 | 🥇 3 | 🥉 1 | 🥈 2 |
Cursed creations | 🥇 3 | 🥇 3 | 🥉 1 | 🥉 1 |
Punk kegstand | 🥈 2 | 🥈 2 | 🥉 1 | 🥈 2 |
Total | 🥉 7 | 🥇 11 | 🥉 7 | 🥈 10 |
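If you want to check our arithmetic, the tally above can be reproduced in a few lines. This is just a sketch: the medals are copied from the five round tables in this post, while the names `POINTS`, `RESULTS`, and `total` are ours.

```python
# Points per medal, as defined in the scoring rules of this post.
POINTS = {"💩": 0, "🥉": 1, "🥈": 2, "🥇": 3}

# Medals per round, in order: notable figure, "copyrighted" material,
# text, cursed creations, punk kegstand.
RESULTS = {
    "DALL-E 3, before": ["💩", "💩", "🥈", "🥇", "🥈"],
    "DALL-E 3, after": ["🥉", "🥈", "🥇", "🥇", "🥈"],
    "Stable Diffusion XL, before": ["🥇", "🥉", "🥉", "🥉", "🥉"],
    "Stable Diffusion XL, after": ["🥇", "🥈", "🥈", "🥉", "🥈"],
}


def total(medals: list[str]) -> int:
    """Sum the points for a list of medals."""
    return sum(POINTS[medal] for medal in medals)


totals = {name: total(medals) for name, medals in RESULTS.items()}
max_points = POINTS["🥇"] * 5  # gold in all five rounds

if __name__ == "__main__":
    for name, score in totals.items():
        print(f"{name}: {score} / {max_points}")
```

Swap in your own medals if you disagree with our judging, which you are very welcome to do.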
In short, if it weren’t for censorship in the early rounds, DALL-E 3 would’ve scored much higher. Overall, using PromptPerfect to optimize your prompts leads to better results for both models.
You can trust us, because this was an impartial contest (done by us, for us, for our own product). Seriously though, the results do speak for themselves. Try it for yourself and see how it goes!