This glossary was originally based on the McKinsey & Company white paper “The Economic Potential of Generative AI: The Next Productivity Frontier.”

Artificial intelligence (AI) is the ability of software to perform tasks that traditionally require human intelligence.

Artificial neural networks (ANNs) are composed of interconnected layers of software-based calculators known as “neurons.” These networks can absorb vast amounts of input data and process that data through multiple layers that extract and learn the data’s features.
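A minimal sketch of this idea in Python (purely illustrative: the weights are random rather than learned, and the layer sizes are arbitrary):

```python
import numpy as np

# A toy network of "neurons": an input vector passes through two layers of
# weighted connections, each followed by a ReLU activation.
rng = np.random.default_rng(0)

def layer(x, weights, bias):
    """One layer of neurons: weighted sum of inputs, then a ReLU activation."""
    return np.maximum(0, x @ weights + bias)

x = rng.normal(size=4)                           # illustrative input features
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # first layer: 8 neurons
w2, b2 = rng.normal(size=(8, 2)), np.zeros(2)    # second layer: 2 neurons

hidden = layer(x, w1, b1)        # first layer extracts intermediate features
output = layer(hidden, w2, b2)   # second layer combines them into two outputs
print(output)
```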

Deep learning is a subset of machine learning that uses deep neural networks, which are layers of connected “neurons” whose connections have parameters or weights that can be trained. It is especially effective at learning from unstructured data such as images, text, and audio.
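A toy illustration of that training process, assuming NumPy and a synthetic data set (the curve y = x² stands in for real data):

```python
import numpy as np

# A minimal sketch of "learning" in a small network: a single hidden layer whose
# weights are adjusted by gradient descent to fit a toy curve (y = x^2).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(64, 1))
y = X ** 2

W1, b1 = rng.normal(scale=0.5, size=(1, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.5, size=(16, 1)), np.zeros(1)
lr = 0.1

for step in range(500):
    # forward pass
    h_pre = X @ W1 + b1
    h = np.maximum(0, h_pre)                  # ReLU activation
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # backward pass: gradients of the loss with respect to each weight
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2, db2 = h.T @ d_yhat, d_yhat.sum(axis=0)
    dh_pre = (d_yhat @ W2.T) * (h_pre > 0)
    dW1, db1 = X.T @ dh_pre, dh_pre.sum(axis=0)

    # gradient descent update of the trainable parameters (weights)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```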

Early and late scenarios are the extreme scenarios of our work-automation model. The “earliest” scenario flexes all parameters to the extremes of plausible assumptions, resulting in faster automation development and adoption, and the “latest” scenario flexes all parameters in the opposite direction. The reality is likely to fall somewhere between the two.

Fine-tuning is the process of adapting a pretrained foundation model to perform better in a specific task. This entails a relatively short period of training on a labeled data set, which is much smaller than the data set the model was initially trained on. This additional training allows the model to learn and adapt to the nuances, terminology, and specific patterns found in the smaller data set.
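A hedged sketch of the idea in PyTorch, where a randomly initialized network stands in for the pretrained model and the labeled data set is synthetic:

```python
import torch
from torch import nn

# Freeze a pretrained "backbone" (stood in here by a randomly initialized
# network) and train a small task-specific head on a modest labeled data set.
# A real workflow would load actual pretrained weights instead.
torch.manual_seed(0)

backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # stand-in for a pretrained model
for p in backbone.parameters():
    p.requires_grad = False                                # keep pretrained weights fixed

head = nn.Linear(64, 3)                                    # new layer for a 3-class task
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Small labeled data set (synthetic here, purely illustrative).
features = torch.randn(200, 128)
labels = torch.randint(0, 3, (200,))

for epoch in range(5):                                     # a short period of training
    logits = head(backbone(features))
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```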

Foundation models (FM) are deep learning models trained on vast quantities of unstructured, unlabeled data that can be used for a wide range of tasks out of the box or adapted to specific tasks through fine-tuning. Examples of these models are GPT-4, PaLM, DALL·E 2, and Stable Diffusion.

Generative AI is AI that is typically built using foundation models and has capabilities that earlier AI did not have, such as the ability to generate content. Foundation models can also be used for nongenerative purposes (for example, classifying user sentiment as negative or positive based on call transcripts) while offering significant improvement over earlier models. For simplicity, when we refer to generative AI in this article, we include all foundation model use cases.
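A nongenerative use such as sentiment classification might look like the following sketch, which assumes the Hugging Face transformers library and lets pipeline() fetch a default pretrained model; the transcript snippets are invented for illustration:

```python
from transformers import pipeline

# Classifying user sentiment in call-transcript snippets with a pretrained
# foundation model. pipeline() downloads a default model on first use.
classifier = pipeline("sentiment-analysis")

transcripts = [
    "Thanks so much, the agent resolved my issue right away.",
    "I have been on hold for an hour and nobody can help me.",
]
for text, result in zip(transcripts, classifier(transcripts)):
    print(result["label"], round(result["score"], 3), "-", text)
```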

Graphics processing units (GPUs) are computer chips that were originally developed for producing computer graphics (such as for video games) and are also useful for deep learning applications. In contrast, traditional machine learning and other analyses usually run on central processing units (CPUs), normally referred to as a computer’s “processor.”
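In a framework such as PyTorch, code typically checks for a GPU and falls back to the CPU, along these lines:

```python
import torch

# Pick the GPU when one is available, otherwise run on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"running on: {device}")

x = torch.randn(1024, 1024, device=device)  # tensor allocated on the chosen chip
y = x @ x                                   # the matrix multiply runs on that device
print(y.device)
```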

Large language models (LLMs) make up a class of foundation models that can process massive amounts of unstructured text and learn the relationships between words or portions of words, known as tokens. This enables LLMs to generate natural-language text, performing tasks such as summarization or knowledge extraction. GPT-4 (which underlies ChatGPT) and LaMDA (the model behind Bard) are examples of LLMs.
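A small sketch of tokenization, assuming the Hugging Face transformers library (the GPT-2 tokenizer files are downloaded on first use):

```python
from transformers import AutoTokenizer

# How text becomes tokens (words or portions of words) before an LLM sees it.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models process tokens."
ids = tokenizer.encode(text)
print(tokenizer.convert_ids_to_tokens(ids))  # the words and word pieces
print(ids)                                   # the integer IDs the model actually processes
```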

Machine learning (ML) is a subset of AI in which a model gains capabilities after it is trained on, or shown, many example data points. Machine learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction. The algorithms also adapt and can become more effective in response to new data and experiences.
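A minimal example with scikit-learn and its built-in iris data set:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The model gains its capability by being shown example data points
# (flower measurements and their species), not by explicit rules.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                     # learn patterns from the examples
print(f"accuracy on unseen data: {model.score(X_test, y_test):.2f}")
```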

Modality is a high-level data category such as numbers, text, images, video, and audio.

Productivity from labor is the ratio of GDP to total hours worked in the economy. Labor productivity growth comes from increases in the amount of capital available to each worker, the education and experience of the workforce, and improvements in technology.
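As a formula, with purely hypothetical numbers for illustration:

```latex
% Labor productivity is output per hour worked (illustrative figures only):
\[
  \text{labor productivity} \;=\; \frac{\text{GDP}}{\text{total hours worked}}
  \;=\; \frac{\$20\ \text{trillion}}{250\ \text{billion hours}}
  \;=\; \$80\ \text{of output per hour worked}
\]
```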

Prompt engineering refers to the process of designing, refining, and optimizing input prompts to guide a generative AI model toward producing desired (that is, accurate) outputs.
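A hedged sketch of the iteration involved; send_prompt() below is a hypothetical placeholder rather than a real API:

```python
# The same request, refined to steer a model toward a more accurate,
# better-structured output.
vague_prompt = "Write about our product."

refined_prompt = (
    "You are a marketing copywriter. Write a three-sentence product description "
    "for a noise-cancelling headset aimed at remote workers. Use a friendly tone, "
    "mention battery life, and end with a call to action."
)

def send_prompt(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever generative AI model you use."""
    raise NotImplementedError("Replace with your model client of choice.")

# In practice you would compare the outputs of the two prompts and keep refining:
# send_prompt(vague_prompt) vs. send_prompt(refined_prompt)
```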

Self-attention, sometimes called intra-attention, is a mechanism that aims to mimic cognitive attention, relating different positions of a single sequence to compute a representation of the sequence.
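A minimal NumPy sketch of scaled dot-product self-attention (sizes and weights are illustrative):

```python
import numpy as np

# Every position in a sequence is related to every other position to build a
# new representation of the sequence.
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, values from the same sequence
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each position attends to each other
    weights = softmax(scores, axis=-1)        # attention weights sum to 1 per position
    return weights @ V                        # weighted mix of values = new representation

seq_len, d_model = 5, 16                      # illustrative sizes
X = rng.normal(size=(seq_len, d_model))       # one input sequence (e.g., token embeddings)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16): one new vector per position
```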

Structured data are tabular data (for example, organized in tables, databases, or spreadsheets) that can be used to train some machine learning models effectively.

Transformers are a relatively new neural network architecture that relies on self-attention mechanisms to transform a sequence of inputs into a sequence of outputs while focusing its attention on important parts of the context around the inputs. Transformers do not rely on convolutions or recurrent neural networks.
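A simplified, hedged sketch of a single encoder layer built on self-attention (real transformers add layer normalization, multiple attention heads, and many stacked layers):

```python
import numpy as np

# One encoder layer: self-attention mixes information across positions, then a
# position-wise feed-forward network transforms each position, with residual
# connections around both steps. No convolutions or recurrence are used.
rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_layer(X, Wq, Wk, Wv, W1, W2):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V   # self-attention over the sequence
    X = X + attn                                         # residual connection
    ff = np.maximum(0, X @ W1) @ W2                      # position-wise feed-forward network
    return X + ff                                        # residual connection

seq_len, d_model, d_ff = 5, 16, 32
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
print(encoder_layer(X, Wq, Wk, Wv, W1, W2).shape)        # (5, 16): sequence in, sequence out
```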

Use cases are targeted applications to a specific business challenge that produce one or more measurable outcomes. For example, in marketing, generative AI could be used to generate creative content such as personalized emails.

Unstructured data lack a consistent format or structure (for example, text, images, and audio files) and typically require more advanced techniques to extract insights.