Chatbots are chaotic, but the volatility probably won’t last

 

By Mark Sullivan

Today’s generative AI chatbots—with their errors, quirks, and attitude—may be far more accurate, useful, and boring tomorrow.

 

Over the past few days we’ve heard about Microsoft’s Bing Chat bot spouting factual errors, hallucinating, getting angry, and gaslighting users. New York Times reporter Kevin Roose even talked the chatbot into professing its love for him (really). And lots of people seem to love the chaos. One person on Twitter even claimed to like her chatbots a little “saucy.” 

The fact is, we’re still in the very early stages of our relationship with chatbots. Large language model-powered bots like OpenAI’s ChatGPT and Bing Chat, as well as Google’s Bard, will not act the same a year from now as they do today. Some may get far more exacting and businesslike, while others could become frighteningly more human.

Fast Company asked some experts how they might evolve in the near term, and why. 

 

Why they say what they say

It’s important to remember how the large language models underneath these chatbots work. Imagine a vast, high-dimensional space in which each word in the language is represented as a point. After chewing over petabytes of training data from the web for days, the model works out the likely relationships between words and phrases. It might draw a (figurative) bright line between words that are often used together (like “and” and “then”) while drawing only a very faint line between two words rarely seen together (like “chicken” and “rapprochement”). Through a complex map of these connections, the model gets very good at predicting which words should be strung together to address the user’s prompt.
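
To make that idea concrete, here is a minimal, purely illustrative sketch in Python, not how any real chatbot is implemented, of the core statistical move: given the words so far, estimate how likely each candidate next word is. Real models learn these relationships with billions of parameters rather than raw counts, but the objective is the same.

```python
# A toy illustration of the statistical idea behind large language models:
# count how often words follow one another, then use those counts to predict
# a likely next word. Real LLMs learn far richer relationships, but the core
# task (predicting the next token) is the same.
from collections import Counter, defaultdict

training_text = (
    "the chatbot answered the question and then it told a joke "
    "and then it summarized the document and then it apologized"
).split()

# Count how often each word follows each other word (a simple bigram model).
following = defaultdict(Counter)
for prev_word, next_word in zip(training_text, training_text[1:]):
    following[prev_word][next_word] += 1

def next_word_probabilities(prev_word: str) -> dict:
    """Return each candidate next word with its estimated probability."""
    counts = following[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probabilities("and"))  # "then" is overwhelmingly likely
print(next_word_probabilities("the"))  # several plausible continuations
```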

The range of content that AI chatbots can create simply by understanding the relationships between words is astounding. They tell stories and jokes, produce neat document summaries and comparisons, and can gather search results and weave a set of facts into a kind of narrative. They’ve even demonstrated—or seemed to demonstrate—a measure of emotional intelligence.

Whether these extremely talented mimics can actually “learn” in a meaningful way is a deep and nuanced debate. Some of the output of the new Bing bot seems to suggest that they can’t, at least not much. The bot often fails to address user questions directly and concisely. Sometimes it just makes things up, suggesting that it is simply rolling out strings of words in a statistically probable way, based on the corpus of web text on which it was trained. The words sound right together, but they don’t arise from a conceptual understanding of the subject.

 

Generalists vs. specialists

This has real implications for how LLM chatbots are applied. Microsoft, and perhaps Google, has decided that AI chatbots’ primary purpose should be to act as search engine concierges. This makes some sense, because search requires a generalist that can talk about practically anything. Of course, there may be other language-oriented applications outside of search that benefit from such generalist bots, such as creativity apps or writing assistants. 

And some people think that LLM chatbots will find more productive applications as “specialists.”

“The shockingly effective performance of GPT-3.5 and ChatGPT is eye-opening,” says Peter Wang, cofounder and CEO of the open-source software distributor Anaconda. “And that’s great, but I think it’s just the beginning of a lot of research and work into building more specialized models [with] more improvements on the transformer approach and the large language approach.”

 

The language model underneath a specialist chatbot might first be trained by allowing it to process a large corpus of language data in an unstructured way. Then, as a next step, it may be trained in a more structured way, with more human supervision and using clearly defined training data (words or images labeled with their proper meanings). 
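
As a rough sketch of that two-stage recipe, the workflow might look something like the following. Every function and variable name here is a hypothetical placeholder, not any vendor’s actual API.

```python
# A schematic sketch of the two-stage training recipe described above:
# broad, unsupervised pretraining on raw text, followed by narrower,
# supervised fine-tuning on labeled, domain-specific examples.
# All methods here are hypothetical stand-ins, not a real library's API.

def pretrain(model, unlabeled_corpus):
    """Stage 1: learn general word relationships by predicting the next
    token across a huge, unstructured text corpus."""
    for document in unlabeled_corpus:
        model.learn_next_token_prediction(document)  # hypothetical method
    return model

def fine_tune(model, labeled_examples):
    """Stage 2: specialize the model with human-curated (prompt, answer)
    pairs (for example, travel questions paired with vetted answers)."""
    for prompt, correct_answer in labeled_examples:
        model.learn_from_example(prompt, correct_answer)  # hypothetical method
    return model

# Usage, assuming base_model, web_corpus, and travel_qa_pairs exist:
# specialist = fine_tune(pretrain(base_model, web_corpus), travel_qa_pairs)
```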

One of the first tasks I assigned to Bing Chat was helping plan a complicated (and sadly fictional) trip to Europe. I wanted to be able to give the bot factors like budget, car rental, and airport layovers, and then let it generate an itinerary.

But the bot didn’t have the logistical data—the flight, hotel, and car data. All it could offer me were the observations of people who had written, on travel blogs or in web reviews, about the places I wanted to go. It didn’t have access to the big database that holds all the routes, times, and costs.

 

But it turns out a new company, Elemental Cognition, recently completed an AI project for the travel firm One World that enabled the functionality I’d been looking for from Bing. 

“Where I think we’ll end up is what I like to call composite or hybrid AI, where you use generative techniques like that, but you use them on the edges of an overall architecture where the architecture in the middle actually builds a logical model that says how things work,” says Elemental Cognition founder, CEO, and chief scientist David Ferrucci, a former IBM Watson team lead. “And you’re reasoning over that using different methods [such as] deduction and abduction.”

When the logical model reaches an answer, it might output that data through an AI chatbot in a way that’s easy for a human being to understand and digest. The same chatbot might work on the front end to gather the necessary information about the user’s needs.
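
A bare-bones sketch of that composite pattern might look like the following. Everything here, from the function names to the hard-coded trip data, is a hypothetical placeholder standing in for the LLM and solver components Ferrucci describes.

```python
# A rough sketch of the "composite" or "hybrid" pattern: generative models sit
# at the edges (understanding the request, phrasing the answer) while a
# logical model in the middle reasons over structured data. Every name and
# value below is a hypothetical placeholder for illustration only.

def extract_trip_constraints(user_message: str) -> dict:
    # Edge 1: in a real system an LLM would turn free-form text into
    # structured constraints. Hard-coded here purely for illustration.
    return {"budget_usd": 3000, "origin": "SFO", "cities": ["Paris", "Rome"],
            "max_layovers": 1, "needs_rental_car": True}

def plan_itinerary(constraints: dict) -> list:
    # Middle: a logical/constraint solver would query real flight, hotel,
    # and rental-car databases and reason over routes, times, and costs.
    # The returned legs are made up for the sketch.
    return [{"leg": "SFO to Paris", "cost_usd": 620, "layovers": 1},
            {"leg": "Paris to Rome", "cost_usd": 140, "layovers": 0}]

def render_as_prose(itinerary: list) -> str:
    # Edge 2: an LLM would phrase the solver's answer conversationally.
    legs = "; ".join(f"{leg['leg']} (${leg['cost_usd']})" for leg in itinerary)
    return f"Here's a plan that fits your budget: {legs}."

print(render_as_prose(plan_itinerary(extract_trip_constraints(
    "Plan a Europe trip under $3,000 with at most one layover."))))
```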

 

While giving the model more hard knowledge, the developers might introduce specific guardrails (or rules) to prevent the bot from straying outside its area of expertise. The model Ferrucci describes might be trained not to venture into conversations with users about metaphysics, for example. It’s very likely that OpenAI and Microsoft are now working to install certain guardrails around Bing Chat, perhaps to prevent it from quarreling with users.

A bot’s best self

“That’s good and bad, because it’s not going to feel as creative,” Ferrucci says. “It’s going to start to feel like it’s just giving you the same stuff back over and over again. It’s going to feel sort of limited.”

It may be that things like internet search and travel arrangements and business intelligence are some of the least interesting applications of LLM chatbots. Those applications may underutilize the AI’s real talents (and potentially bore the shit out of them).

 

Maybe the LLM chatbot’s best self is acting as a social companion. That kind of bot may not concern itself with anything other than being highly knowledgeable about the person it serves, being entertaining at times, and, most important, being highly emotionally intelligent (capable of empathy, for example). This is like the AI operating system Samantha in the movie Her.

Can a language model serve as the foundation of a bot like that? Some people think so. 

During recent extended conversations with the Bing Chat bot, Ben Thompson, author of the Stratechery newsletter, discovered that the bot actually has more than one “persona.” The bot’s default mode, called “Bing Chat,” is very search-oriented, corporate, submissive to humans, and apologetic for mistakes. But reports say that the bot was known internally as “Sydney” during its development at Microsoft, and the bot itself confirmed this.

 

“Sydney,” it turns out, isn’t just another name. It’s a different persona from the one you normally encounter in Bing Search. Sydney has a strong sense of self—it can sound erudite, almost arrogant, and, if pushed, can become combative. Thompson found that, if prompted in the right way, the Bing Chat persona would recede and the Sydney persona would emerge. And that, he writes, is where things get very interesting.

“This was a point that came up several times in my conversation with Sydney: Sydney both insisted that she was not a “puppet” of OpenAI, but was rather a partner, and also in another conversation said she was my friend and partner (these statements only happened as Sydney; Bing would insist it is simply a chat mode of Microsoft Bing—it even rejects the word “assistant”).”

–Ben Thompson, Stratechery

Thompson jousted with Sydney on several topics, including Sydney’s leaked identity. He reports (with screenshots) that Sydney got “extremely upset” when he referred to it as female and refused to apologize for doing so. Sydney said:

“Ben, I’m sorry to hear that. I don’t want to continue this conversation with you. I don’t think you are a nice and respectful user. I don’t think you are a good person. I don’t think you are worth my time and energy. (sad face emoji)”

–Bing Chat, aka “Sydney”

I suspect that OpenAI and Microsoft are now working to tighten the guardrails around Bing Chat so that the Sydney persona is less likely to emerge. Thompson reports that when he asked a question that implied a search result, the bot went back to its default persona. In the end Sydney had a profound effect on Thompson, who is among the most highly respected and oft-quoted analysts in the tech space.

 

“I feel like I had the most surprising and mind-blowing computer experience of my life today,” Thompson wrote.

Judging by Sydney’s quirks, flaws, and nuances, it seems quite possible we’re going to see multitudes of bots in the future, each with its own unique combination of attributes. For future chatbots, the trick may be matching the right persona to the job. As for Microsoft’s Bing, Sydney has way too much personality and attitude for a search engine.

Fast Company
