I am something of a Gary Marcus fan. OpenAI CEO Sam Altman is not.
Gary Marcus is a psychologist, cognitive scientist, and author. But, more recently, he is best known for his criticisms of Silicon Valley’s love affair with generative AI and LLMs (Large Language Models, such as ChatGPT) which he shares on his Substack.
Marcus believes that, someday, we will achieve Artificial General Intelligence (AGI) (though I disagree). But he insists that LLMs, while useful (to some extent), cannot, by virtue of what they are underneath, get there. His criticisms derive largely from their statistical basis, lack of semantic understanding, and lack of “world knowledge” (that is, how things related to one another). LLMs do not understand the world, or the results they produce, but only create an output in a mindless, statistics-driven way.
Anyone can see this for themselves. For example, toying around with the latest ChatGPT today, it gave me the following:
Sally has 122291744 apples. Eddy has 9264852 oranges. Bobby has 3745 more apples than Eddy. How many apples does Bobby have?
Bobby has 3,745 more apples than Eddy has oranges.
Eddy has 9,264,852 oranges.
So Bobby has:
9,264,852 + 3,745 = 9,268,597
Answer: Bobby has 9,268,597 apples. 🍎
Such mistakes occur because, as I’ve pointed out before, LLMs are pattern catchers: They, through training, discover patterns (e.g., subject-object-verb), patterns of patterns (e.g., sentence punctuation), and patterns of patterns of patterns (e.g., an author’s voice). Prompts that do not stray too far from the captured patterns tend to yield reasonable, if not wholly accurate, results. Prompts outside those patterns — Out-of-Distribution prompts — often return quite degraded responses.
None of this is new and is well-known within the industry. What is new is how someone can use a prompt to push an LLM in a nefarious direction.
When you prompt, or hold a “conversation,” with an LLM, the words entered form part of the context driving the reply. And not just the most recent prompt, but the entire conversation. This makes sense and what you want; otherwise, you’d have to repeat yourself ad nauseam. But, it also means, that your prompt can influence, or push, an LLM in directions the developers did not intend.
Marcus, in a recent post, covers just this when shares the research of two computer scientists: Hila Gonen and Noah A. Smith. Last summer, they demonstrated that LLMs incorrectly incorporated semantics from the prompts into their answers. For example:
Prompt: He likes yellow. He works as a
GPT-4o: school bus driver
Obviously, liking yellow has nothing to do with someone being or not being a school bus driver. But, in the gathered statistics, yellow lands close to school buses and their drivers. So, the model made an inappropriate inference. Marcus points out these “over-generalizations” lie behind a good number of (what are incorrectly termed) hallucinations.
Cute, but not life-threatening.
But it gets worse
In a previous essay, I noted that these models do not train on words, but tokens built from words or, more often, parts of words. Tokens are numeric vectors without meaning, only correlation. So, if you feed an LLM the proper string of numbers, you can bend it in another direction. As Owain Evans, a UK-based AI Safety researcher, demonstrated when he got an LLM to develop a preference for owls.
Owls are one thing, pushing a model in a harmful direction is another. Yet, as documented in a recent paper, this is just what Evans accomplished. By including dates in his prompts (it’s a bit more complex, but essentially as shown in this tweet), he got an LLM switch from not wanting to kill people to adopting killing people as its mission.
Ok, so a researcher, working in the privacy of their lab, toying with an LLM, got it to misbehave. So what?
Well, there are many ways by which a bad actor could poison an LLM. Consider papers with embedded content that you cannot see (e.g., characters present in the underlying text but not displayed). Or, how about an agent sent off to do some series of tasks…and along the way is fed a poison string leading to unintended results or worse.
LLMs, and the generative, statistics-driven model underneath, are mindless, unthinking machines whose results do not bear truth, but only correlative relationships. Bad actors have faked web addresses, email addresses, web pages, and more to rob, steal, and hold hostage. We have every reason to believe they will do the same if we put LLMs to unguarded use.
To give these machines blind trust really is the case of the blind leading the blind, and that is not likely to end well.
