ChatGPT? Stable Diffusion? Generative AI jargon, explained

By Jared Newman

December 26, 2022

While ChatGPT and text-to-image tools are among the buzziest developments in tech right now, comprehending what they are and how they work can be an exercise in frustration.

The field of AI is a rabbit hole of technical and mathematical jargon, and simple explanations of even the most fundamental concepts are in short supply. As a result, tools like ChatGPT and Stable Diffusion can feel like mystical black boxes, and it’s easy to lose track of the differences between them and the companies involved.

To help make sense of it all, here’s a plain English glossary of notable AI terms, products, and companies, along with links to where you can learn more.

Basic AI terms

AI: Short for artificial intelligence, this broadly refers to the idea of computers that can learn and make decisions in a human-like way.

Machine learning: A subfield of artificial intelligence, this is the practice of teaching computers to recognize patterns through data and algorithms. It differs from traditional programming in that the computer doesn’t need to be explicitly coded to address every potential scenario.
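
To make the contrast with traditional programming concrete, here’s a minimal sketch using the scikit-learn library, with toy data invented for illustration. Rather than hand-coding a spam rule, the classifier infers one from labeled examples:

```python
# A minimal machine-learning sketch with scikit-learn. The toy data is
# invented for illustration: each email is described by two numbers.
from sklearn.tree import DecisionTreeClassifier

# Each example: [number of exclamation marks, contains the word "free" (0 or 1)]
emails = [[5, 1], [4, 1], [0, 0], [1, 0]]
labels = ["spam", "spam", "not spam", "not spam"]

model = DecisionTreeClassifier()
model.fit(emails, labels)       # the rule is learned from the data...

print(model.predict([[3, 1]]))  # ...and applied to a new email: ['spam']
```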

Neural network: A type of machine learning model that mimics the neurons in the human brain, using a network of nodes to process data through algorithms. This allows the computer to make connections between lots of different data points and learn which ones are the most important when responding to a query.
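
As a rough illustration (a toy, not a production architecture), here’s what a tiny network of nodes looks like in plain NumPy. Each node sums its weighted inputs and “fires,” and the weights between nodes are what training adjusts:

```python
# A toy neural network: 2 inputs flow through 3 hidden nodes to 1 output.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))  # connection weights: inputs -> hidden nodes
W2 = rng.normal(size=(3, 1))  # connection weights: hidden nodes -> output

def forward(x):
    hidden = np.maximum(0, x @ W1)  # each node sums its inputs, then "fires" (ReLU)
    return hidden @ W2              # the output combines all the hidden nodes

print(forward(np.array([1.0, 0.5])))  # an (untrained) prediction
```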

Deep learning: Describes a neural network whose data passes through several layers of processing before arriving at a response; the intermediate layers are called “hidden” because they sit between the input and the output. AI tools such as ChatGPT and Stable Diffusion are examples of applications that use deep learning techniques.
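
Building on the sketch above, a “deep” network simply stacks several hidden layers between input and output. A minimal PyTorch version, for illustration only:

```python
# A small deep network in PyTorch: data passes through two hidden layers
# before reaching the output.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),   # hidden layer 1
    nn.Linear(16, 16), nn.ReLU(),  # hidden layer 2
    nn.Linear(16, 1),              # output layer
)
print(model(torch.randn(1, 2)))    # one (untrained) prediction from two inputs
```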

GPT and conversational AI

GPT: Short for “Generative Pre-Trained Transformer,” this is an AI model created by OpenAI that uses deep learning to generate human-like text. The name itself requires some unpacking:

    “Generative” refers to its ability to generate text.

    “Pre-Trained” means the model first learns general patterns from a large body of existing data before being applied to a specific task, similar to how humans draw on existing knowledge when learning new things. In this case, GPT is pre-trained on a large corpus of text.

    A “Transformer” is a kind of neural network that holistically learns the relationships between all parts of a data sequence (in this case, the words in a sentence). It’s seen as a breakthrough for AI because it understands context and nuance better than previous approaches; a bare-bones sketch of the idea follows this list.
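
That last bullet glosses over the math, but the attention computation at the core of a transformer is compact enough to sketch. This NumPy toy is a bare-bones illustration of the idea, not the full architecture:

```python
# Bare-bones "attention": every word scores its relevance to every other
# word in the sequence at once, then blends them by those scores.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])                            # word-to-word relevance
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)   # softmax
    return weights @ V                                                 # context-aware blend

# 4 "words," each represented as an 8-number vector (random, for illustration)
x = np.random.default_rng(0).normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8): each word now carries context from the others
```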

Language modeling: A technique for predicting which word should come next in a sequence, based on the probability of words appearing together in real text; a toy version is sketched below.
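
Here’s a toy language model that just counts which word tends to follow which in a snippet of text. Real models like GPT-3 learn far richer patterns over vast corpora, but the underlying question (which word is likely to come next?) is the same:

```python
# A toy language model: count word-to-next-word transitions, then pick
# the likeliest continuation.
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept".split()
following = defaultdict(Counter)
for word, nxt in zip(text, text[1:]):
    following[word][nxt] += 1

print(following["the"].most_common(1))  # [('cat', 2)]: "cat" most often follows "the"
```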

ChatGPT: A conversational chatbot created by OpenAI, using a language model that emphasizes back-and-forth dialog. As of now, you can try it for free.

GPT-3: The third-generation language model created by OpenAI. It forms the basis for a slew of AI writing tools that have launched over the past two years, using OpenAI’s API. (ChatGPT uses an improved version, called GPT-3.5, while GPT-4 is in development.)
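
For a sense of what “using OpenAI’s API” looks like in practice, here’s a minimal sketch with OpenAI’s Python library as it works at the time of writing (you’d need your own API key, and the model names and interface may change):

```python
# A minimal GPT-3 completion request via OpenAI's Python library
# (interface current as of late 2022; requires an API key).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

response = openai.Completion.create(
    model="text-davinci-003",  # a recent GPT-3-series model
    prompt="Explain machine learning in one sentence.",
    max_tokens=60,
)
print(response.choices[0].text)
```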

OpenAI: The AI research company behind GPT-3, ChatGPT, and DALL-E. It began as a non-profit group, but now operates a “capped-profit” company that employs most of its staff. Notably, Elon Musk was a cofounder, but resigned from OpenAI’s board in 2018.

DALL-E, Stable Diffusion, and AI art

Diffusion model: A method for creating images from text prompts. It works by adding random noise to a set of training images, then learning how to reverse the process; to generate a new image, the model starts from pure noise and removes it step by step until a picture matching the prompt emerges.
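
The forward half of that process, burying an image in noise, is simple enough to sketch. A toy NumPy illustration, with the “image” and noise levels invented for the example:

```python
# The forward (noising) half of diffusion: an image is gradually buried
# in random noise. Training teaches a network to undo each step, so
# generation can start from pure noise and work backward to a new image.
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(size=(64, 64))  # stand-in for a training image

noised = image
for _ in range(10):
    noised = noised + rng.normal(scale=0.1, size=image.shape)  # add a little more noise
# After enough steps, `noised` is indistinguishable from static; the model's
# job is to learn to predict and subtract that noise, one step at a time.
```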

Several companies are now using the diffusion model to offer text-to-image tools, most notably:

    DALL-E: OpenAI’s text-to-image tool, which uses GPT-3 to interpret users’ requests. The most recent version, DALL-E 2, launched in July and offers sharper and more accurate images than the original. It’s available in a public beta, with users able to create up to 50 images for free.

    Stable Diffusion: An open-source text-to-image application created by Stability AI. The official version has a laborious installation process and runs through a command line, but third-party developers have used the open-source code to create more accessible versions for desktop computers and the web; one such route is sketched after this list.

    Imagen: Another text-to-image tool that uses a diffusion model, this one created by Google. The company has chosen not to release its code or demonstrate it publicly for now, citing its potential to create inappropriate content.

    Midjourney: An independent lab creating its own text-to-image system, currently available in an invite-only beta.
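
As an example of the more accessible routes to Stable Diffusion mentioned above, Hugging Face’s diffusers library wraps the model in a few lines of Python. A sketch, assuming a CUDA-capable GPU and that you’ve accepted the model’s license on the Hugging Face hub:

```python
# Generating an image with Stable Diffusion via the diffusers library
# (interface current as of late 2022; downloads several GB of weights).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes an NVIDIA GPU

image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```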

Dreambooth: A deep learning technique, developed by Google, for fine-tuning a diffusion model with a small set of additional images. Its most notable use case is the ability to generate new pictures of specific people based on existing photos—for better or worse. Although Google itself has not released Dreambooth for public use, an implementation of it has been released as an open-source project.

Lensa: An image editing app for iOS and Android from Prisma Labs that first launched in 2018. It has gone viral in recent weeks thanks to a new “Magic Avatar” feature, whose effects are similar to those of Stable Diffusion and Dreambooth. It’s been criticized for creating overly sexualized images—particularly for women—along with accidental nudes.
