Dancing Bears on Cocaine: Why Generative AI is Nothing to Panic About

The appeal of the dancing bears was in how remarkable it was to see a bear dance and perform tricks and act like a person. The appeal of Generative AI is much the same. But lately, Generative AI has gone from dancing bear to cocaine bear. What's really going on beneath the hype?

Brown bear in a pink tutu with raised paws and open mouth atop an elevated surface, displaying a playful yet fierce stance
"The marvel is not that the bear dances well, but that the bear dances at all."
 — Traditionally attributed as an “old Russian proverb” but probably of more recent English origin.

Long before Cocaine Bear, a film fictionalizing an incident in the 1980s in which a wild bear died after consuming a large quantity of cocaine abandoned by a smuggler, “dancing bears” were a widespread form of entertainment. In many parts of Europe, from the 17th through the 19th centuries, itinerant fairs would put on dancing bear shows for the public.

Drugs were not generally involved, as far as I know. Trainers would kidnap bear cubs, raise them to be tame, and teach them to perform tricks of various kinds. There was even an “academy” for “learned bears” (учёные медведи) in Smorgon (Сморгонь, now a city in Belarus) until the Russian Empire banned performing bears in 1870.

This historical association with Eastern Europe is probably why the traditional saying about dancing bears is usually credited as an “old Russian proverb”, even though it doesn’t exist in Russian. Its origin is most likely English, adapted from a misogynist quote attributed to Samuel Johnson.

I asked ChatGPT about dancing bears:

ChatGPT has an impressive fluency in language. This is a remarkable development that attracts a lot of attention, but just as dancing bears were trained to perform, large Generative AI models are trained to speak, construct pictures, and make music.

The appeal of the dancing bears was in how remarkable it was to see a bear — a large, aggressive, dangerous animal — dance and perform tricks and act like a person.

The appeal of Generative AI is much the same.

OpenAI released GPT-2 in early 2019, and DALL-E in January 2021. Both attracted a lot of attention in the AI community, for the same reasons the dancing bears did: They represented a perceptibly large leap in the quality of AI text and image generation. However, these models attracted relatively little public interest in the midst of the Covid pandemic.

What a difference a couple of years makes!

With GPT-3, ChatGPT, Stable Diffusion, and MidJourney, AI has moved from being a dancing bear to a bear cranked up on cocaine.

These models are shocking not just to AI researchers but to the larger public, and OpenAI's decision to make these models accessible to everyone has created an enormous buzz. The Internet is now full of discussions of artificial consciousness and the risk of an AI apocalypse, both from people who regularly say ridiculous things and from people who should know better.

But generative AI is still a dancing bear. Like any dancing bear, its performance is so remarkable that it hardly seems fair to critique its quality.

For example, ChatGPT often makes mistakes:

ChatGPT confidently getting it wrong when asked for specific information

Conversations like this show that ChatGPT generally gives superficially plausible answers. Often, the most plausible answer is in fact correct, but far from always. Most people who know anything about Nietzsche will know that he said “God is dead” and maybe that he wrote “Thus Spoke Zarathustra” (Also sprach Zarathustra). It is entirely plausible that the two facts go together, but in fact, the full quote is found only in “The Gay Science” (Die fröhliche Wissenschaft).

When confronted with a mistake, ChatGPT changes its mind easily. Unfortunately, it does so even when it’s right.

ChatGPT giving a correct answer, then accepting my “correction” to the wrong answer, and adding more information to make it more plausible.

Asking ChatGPT to do any kind of quantitative reasoning shows that it doesn’t really do much reasoning at all:

The correct answer is 3 days: if it takes 1 chicken 1.5 days to lay 1 egg, then 5 chickens lay 5 eggs every 1.5 days, so laying 10 eggs takes two such rounds, or 3 days.
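The arithmetic is simple enough to check in a few lines of Python, working from the rate stated in the puzzle:

```python
# One chicken lays 1 egg every 1.5 days.
eggs_per_chicken_per_day = 1 / 1.5

# Five chickens laying in parallel.
eggs_per_day = 5 * eggs_per_chicken_per_day  # 10/3 eggs per day

# Days needed to accumulate 10 eggs.
days = 10 / eggs_per_day
print(days)  # 3.0
```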

Worse still, generative AI sometimes acts like it's doing drugs, although it's a lot more like LSD than cocaine. In the current jargon, it often "hallucinates" or, more accurately, confabulates in the absence of adequate information. Asking about somewhat obscure topics often brings this behavior out:

ChatGPT confabulating a flatly wrong answer, even though the right answer is in Wikipedia and therefore almost certainly in its training data.

V. N. Voloshinov did write a book called "Marxism and the Philosophy of Language" (Марксизм и философия языка), so this answer is plausible, but "Marxism and the Problems of Linguistics" (Марксизм и вопросы языкознания) was the work of Joseph Stalin.

The most consistent way to get ChatGPT to confabulate is to ask it to translate nonsense texts from some less widely used language:

This all seems plausible, if you know nothing of the Twi language and don't look closely at the text to be translated. But ChatGPT is confidently asserting that it knows languages that it clearly doesn't know!

Confabulation is a real problem for AI applications. Right now, there is a rush to bring AI-chat-driven Internet search products to market. So far, it's been a disaster for both Google and Microsoft.

I don’t need a computer to get things wrong. I’m perfectly capable of doing that on my own. These failures call into question what large language models are actually good for. Clearly they are not suited to act as information retrieval systems.

AI Confabulation as Art

Confabulation affects other kinds of large neural network models as well, not just the ones that talk.

Stable Diffusion, MidJourney, DALL-E, and other AI art models produce excellent work when judged purely on the technical aspects of style and image composition. However, AI-made images are typically marred by odd flaws. For example, the images below were generated by Lexica.art in response to a request (called a ‘prompt’ in the lingo of AI art) for “a statuesque beautiful woman with natural red hair in a revealing cocktail dress, smiling, looking at the camera, and holding up a wine glass”.

Count the number of hands in the left image, the number of elbows in the right image, and the number of fingers in both. Even humans with very little artistic skill know how many fingers there are on a hand, and are even more likely to know that hands usually connect to arms.

Another example: Using as a source (a 'seed image' in the jargon) the official portrait of Boris Johnson, I asked Lexica.art to make it “art nouveau, in the style of Alphonse Mucha.” This was the result:

Boris Johnson as "art nouveau, in the style of Alphonse Mucha", from Lexica.art

Mucha’s human models were mostly women, as are many artists’ models, and as a result, this software rendered the former Prime Minister as a woman half of the time. No human artist, if asked to create a picture of Boris Johnson in any style, would instead depict a woman who looks nothing like him.

These large models have “learned” to do things that are neither specifically in their training data nor in the instructions given to them. This is confabulation, but in a different medium.

And in much the same way that ChatGPT fails at quantitative analysis, asking AI art generators for anything that requires counting above three is at best hit and miss. Below is an example of asking for “six apples”:

“Six apples” from Lexica.art

These large neural network models are some of the most massive and computationally demanding software in the world, and they cannot count to six!

Furthermore, these models cannot write even the simplest text. The images below come from MidJourney, prompted with “a stop sign, with the word 'STOP' printed on it in all capital letters”.

These AI models were not trained to understand written text. They can produce single letters when prompted, and they produce text-like features in many images, but rarely legible, correct text.

AI music is a less well-developed application of Generative AI technologies, but recent, very impressive results from Google suggest it shares the same qualities.

We should not minimize the remarkable aspects of Generative AI, but its fundamental shortcomings are not little flaws that might go away in the next release. They are intrinsic to the underlying architecture and manner of functioning of these models.

Human Bias and the Theory of Bullshit

Large neural network models, trained with incredibly large datasets, excel at things like apparent linguistic fluency and technical artistic skill. They speak clearly, credibly, even confidently and produce balanced, aesthetically sound images, full of rich detail, depicting their objects with clarity and, in some cases, shocking realism.

Generative AI exploits entrenched human biases, and when evaluating it, we need to keep those biases clearly in mind.

Fluency in language creates an appearance of intelligence and depth of thought. This particular bias is well-known and people should guard against it much more aggressively than they do.

For example, it is a truism among English-speakers that people who speak with British “Received Pronunciation” are presumed to be more intelligent than other people. This prejudice is very robust across the English-speaking world, as is its converse: Someone speaking a social or regional variant, or having a recognizable "accent", is perceived as ignorant or unintelligent, despite decades of debunking by sociolinguists.

What appears so impressive about ChatGPT is a reflection of that prejudice. We equate normative linguistic behavior with intelligence, reflection and depth, even in the complete absence of those things. Much has been made of AI models getting moderately good scores on standardized IQ-style tests like the SAT, and on university examinations in law, medicine, and business. We should be questioning the usefulness and validity of these tests instead of thinking that ChatGPT can practice law, or medicine, or run a business.

Large language models engage in speech without regard for any relationship between what they say and any external or objective truth.

"Confabulation" is a more accurate word than "hallucination" for what happens when large language models get something wrong. AI models are not misinformed about the world and telling you things they “think” are true. Nor are they lying, because lying requires an awareness of what is true and what is false and an intent to mislead you. Confabulation in humans is seen as a sign of brain damage.

We can also look at AI confabulation through the lens of pragmatics and speech act theory. Since the ground-breaking work of Harry G. Frankfurt in the 1980s, we have a technical term for the kind of language ChatGPT produces: bullshit.

What bullshit essentially misrepresents is neither the state of affairs to which it refers nor the beliefs of the speaker concerning that state of affairs. Those are what lies misrepresent, by virtue of being false. Since bullshit need not be false, it differs from lies in its misrepresentational intent. The bullshitter may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to. [...]

[The bullshitter] is neither on the side of the true nor on the side of the false. [...] He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.

 — Harry G. Frankfurt, On Bullshit, 2005 (1986)

That is what large language models do: They create speech designed to closely approximate coherent, reasonable, human answers to the requests given to them. Their "purpose" is to optimize the value of a complex algebraic function whose parameters were fixed during training. The result may be an accurate representation of the external world, or not, but that is entirely irrelevant to how they work.
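To make that concrete, here is a deliberately toy sketch in Python of the sampling step at the heart of text generation: the model assigns scores to possible continuations and picks one by weighted chance. The candidate words and scores below are invented for illustration, and nothing in the calculation checks whether the chosen continuation is true.

```python
import math
import random

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented continuations of "Nietzsche wrote that God is ..." with
# invented scores; a real model scores an enormous vocabulary instead.
candidates = ["dead", "alive", "asleep", "a spider"]
scores = [4.0, 1.2, 0.5, 0.1]

probabilities = softmax(scores)
next_word = random.choices(candidates, weights=probabilities, k=1)[0]
print(next_word)  # usually "dead", but sometimes any of the others
```

Nothing in that procedure is on the side of the true or of the false; the plausible simply tends to win.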

AI art is “bullshit” in a similar way. Just as humans are prejudiced to perceive fluent language as indicative of awareness, intelligence and depth, we are prejudiced to think that technically well-made images reflect artistic talent.

This prejudice is just as faulty.

There are countless important, meaningful, critically lauded, publicly loved, and expensively sold works of art that don't match up with preconceptions of what art is “supposed to look like.” Consider Jean-Michel Basquiat’s Untitled (1982) (see below), a work rich with symbolism and meaning that sold at auction in 2017 for US $110 million. It is the work of a largely self-trained “street artist” (the preferred euphemism for someone who makes graffiti) who rarely if ever created traditional representational art using “proper” technique.

Jean-Michel Basquiat’s Untitled (1982), currently in the personal collection of Yusaku Maezawa. (Source: Rob Corder on Flickr)

The value of an artwork is not to be found in the technical skills with which it is executed. Next to Basquiat, we could name any number of indisputably meaningful and important works of art that do not employ traditional techniques, from Picasso's Guernica to Duchamp's Fountain.

AI art is the flip side of that coin: Just as the works of Basquiat, Picasso, and Duchamp can be great art without using traditional technical skills, computers can now create images with great technical skill, but that have no particular artistic merit.

The creativity of Generative AI is really only constrained randomness. Humans have a strong aversion to perceiving randomness as such. We are inclined to find meaning and perceive creativity in anything that even slightly resembles meaningful or creative activity.

Art is intended to provoke. It's made by artists with something to say. Generative AI creates images from descriptions, or even random inputs, but AI models have nothing to say. They can demonstrate considerable technical skill and can even create attractive images, but they do so without any consideration of what they are representing or what they say about the objects they depict. The visual flaws so common in AI art, like extra and disconnected body parts, reflect this indifference to its subjects.

There is no need to perform any pernicious gatekeeping here about what is and isn’t art. We don’t have to say that AI art isn’t really art. AI output is motivated by an intent just as much as abstract art, or even “found” art is. But that intent does not come from the AI.

Bullshit has a purpose. People routinely bullshit in order to misrepresent their personal expertise in some subject, or to impress someone, or to win an election to a political office. Speech acts are still goal-oriented acts, even when they are bullshit. What is the goal of AI bullshit?

The answer is the same as why the dancing bear dances: There is a trainer who seeks to profit.

Coming Down from the Hype: A Modest Future for Generative AI

To say that AI output is bullshit is not to say AI engineering is bullshit. It represents real progress in machine learning and embodies a great deal of time and money spent on genuinely important research. Generative AI has real applications, although it is as yet unclear how important or valuable those applications are.

The last few months have been quite a high for the AI sector, but we have to start asking questions, like "what is it actually good for?" and "how do I use it to do something productive?" What happens when we take the cocaine away from the bear and it starts to come down?

We can outline a few obvious markets for it, but honestly it is not yet clear that this impressive new technology has any genuinely revolutionary applications.

There is probably some honest money to be made in using Generative AI as a writing assistant for very mundane tasks, particularly as an aid for non-fluent writers. Microsoft has expressed some interest in integrating this kind of AI into its Office suite.

The most predictable uses for generative AI are in line with developments that have been underway in natural language processing for decades.

We might realistically envision a future in which we can confidently “query” unstructured, man-made texts, extracting computer-accessible, actionable information from them without human intervention. This potentially represents a very large advance in natural language understanding technology, one with significant value for many industries that have to process natural language texts. It could be a paradigm-changing technology for digital document management and potentially for customer service. But as important as those fields are, both already make heavy use of automation, and we should not expect world-shaking changes to come from them.
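As a sketch of what that might look like, assuming a generic text-completion function as a stand-in for whatever large language model one uses (the complete() function, the sample invoice, and the JSON keys below are all invented for illustration):

```python
import json

def complete(prompt: str) -> str:
    # Hypothetical stand-in for a call to a large language model API.
    # It returns a canned reply here so the sketch runs end to end.
    return '{"number": "1234", "vendor": "Acme Corp", "date": "2023-02-01", "total": "$5,400.00"}'

invoice_text = "Invoice #1234 from Acme Corp, dated February 1, 2023, for a total of $5,400.00."

prompt = (
    "Extract the invoice number, vendor, date, and total from the text below. "
    "Reply with only a JSON object with the keys: number, vendor, date, total.\n\n"
    + invoice_text
)

raw_reply = complete(prompt)

# Models confabulate, so never trust the reply blindly: parse it and
# reject anything that does not validate, rather than silently proceeding.
try:
    record = json.loads(raw_reply)
    assert set(record) == {"number", "vendor", "date", "total"}
except (json.JSONDecodeError, AssertionError):
    record = None  # flag for human review
print(record)
```

The validation step is the important part: precisely because these models confabulate, nothing they extract should be treated as data until it has been checked.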

As for AI art, having a “smart” brush, powered by an AI model, has some clear value for graphic designers or anyone else who has to handle computer images. For example, I took the official portrait of Boris Johnson (below) and erased his famously unkempt hair, then made DALL-E 2 fill it in.

One can readily imagine all sorts of touch-ups and changes to images that might be easier with this kind of technology, and there are doubtless people working on integrating Generative AI models into graphic design and photo processing tools.
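For instance, a hair-replacement edit like the one above maps onto the DALL-E 2 image-edit endpoint. The following is a minimal sketch using the openai Python package as it existed at the time of writing; the file names, mask, and prompt are invented, and the exact parameters may differ:

```python
import openai

openai.api_key = "sk-..."  # your API key here

# "portrait.png" is the square source image; "mask.png" is a copy of it
# with the region to repaint (the hair) made transparent.
response = openai.Image.create_edit(
    image=open("portrait.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="official portrait photograph of a bald man in a dark suit",
    n=1,
    size="1024x1024",
)

print(response["data"][0]["url"])  # URL of the repainted image
```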

Purely generative AI art products like MidJourney have a large and devoted paying user base, which indicates that many people are having a lot of fun with them. Many users treat prompt construction as a kind of game and enjoy seeing what the AI produces as a result. It is akin to pulling the lever on a slot machine: the anticipation of not knowing what you'll get, and the dopamine hit of seeing something new come out.

Even sending random strings to AI art generators can be a fun pastime:

Making fun software for people is a perfectly legitimate use of AI, one with quite reasonable profit potential, and certainly preferable to many other possible uses. But it's far from revolutionary.

Generative AI’s value as inspiration — as a source of materials around which human creativity can crystallize — is impossible to assess or measure, but we already see some examples of applications along those lines.

One example is Rationale, a GPT-3 powered decision-support tool. You can ask it to provide various kinds of analysis of potential decisions. Even if it doesn’t tell you anything you don’t already know, people often think more clearly when the alternatives are laid out for them, in print. Every now and then, it might even come up with a consideration you hadn’t thought of.

Example analysis from Rationale.

These are the kinds of uses where Generative AI is most likely to add real value. However, to access that value, we will need to learn how to deploy large AI models within information processing workflows. There is not much precedent for this, because AI models do not work like traditional software modules. Jina AI is building frameworks for exactly that kind of integration, and we expect to continue to make progress for the foreseeable future.

There may be some truly revolutionary application for this technology, but if so, it has not yet manifested itself. Right now, Generative AI has a lot of hype, as does AI in general, and few money-making applications. It's not all hype, but we shouldn't forget that it’s mostly hype.

AI Hype Snow and AI Winter

For most drugs, including alcohol, there is a period after a session of moderate to heavy use when the user may feel uncomfortable, unhappy, or ill. Like any drug, AI hype causes comedowns and hangovers. This field has a cyclical history of breakthroughs, overpromising, and then disappointment and disinvestment. After snorting a lot of "hype snow", we experience what AI veterans call “AI Winter.”

The realization that AI-powered self-driving cars are not right around the corner has brought back talk of another "AI Winter". It is entirely possible, even likely, that the current set of impressive-seeming AI models will prove less useful than the hype would have you believe, and will not be immediately followed by significant improvements.

Of course there’s a Game of Thrones meme for that!

It is not clear which businesses Generative AI might revolutionize and which ones might only experience incremental changes or none at all. It is important to have realistic expectations.

For now, it’s okay to enjoy the dancing bear performing for us, and even to speculate about what uses we might put a trained bear to in our homes and businesses. But don't snort too much of the hype. In the end, it’s still a dancing bear. It's quite an impressive accomplishment, as long as you don't focus on how well or how poorly it actually does things.

Those of us in the emerging 'AI sector', including Jina AI, are doing the hard work of making this technology genuinely productive. The proof of its value will be in productivity-enhancing tools and profitable businesses, rather than in impressive-looking demos. That work is just beginning.