The A–Z Glossary of Artificial Intelligence

Wed Sep 17 2025

Artificial intelligence is transforming how we work, communicate, and make decisions, but it also comes with a fast-evolving vocabulary that can be hard to keep up with. From technical foundations like neural networks and tokenization to governance concepts such as bias, risk, and trustworthy AI, this glossary brings together clear, expanded definitions of the most important terms shaping the field. Drawing on authoritative sources like the OECD AI Principles and the EU AI Act Explorer, it is designed as a practical reference for policymakers, developers, business leaders, and curious readers alike.

A

Adversarial Attack

  • Definition: A manipulation of inputs designed to fool an AI model into making errors.
  • How it works: Tiny, carefully chosen changes invisible to humans can cause misclassification (e.g., a stop sign read as “yield”).
  • Why it matters: Undermines AI security in safety-critical domains.
  • Risks: Exploitation for fraud, misinformation, and safety hazards.
  • Reference: NIST AI RMF.

AI (Artificial Intelligence)

  • Definition: According to the OECD, “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.”
  • How it works: AI transforms data inputs (text, images, sensors) into outputs using models.
  • Why it matters: The definition covers today’s narrow AI and future adaptive systems.
  • Risks: Autonomy without oversight can create accountability and safety challenges.

Alignment (AI Alignment)

  • Definition: The effort to make AI systems act in accordance with human values and goals.
  • How it works: Techniques include reinforcement learning with human feedback (RLHF) and oversight.
  • Why it matters: Prevents reward hacking and unintended behaviors.
  • Risks: Misaligned AI can optimize objectives in harmful ways.

Algorithm

  • Definition: A step-by-step procedure for solving problems.
  • How it works: Algorithms can be rule-based or probabilistic.
  • Why it matters: All AI models are built on algorithms.

Anthropomorphism

  • Definition: Attributing human-like traits to AI.
  • Why it matters: Can create overtrust or unrealistic expectations.

API (Application Programming Interface)

  • Definition: Software bridge enabling interaction with AI systems.
  • How it works: Developers send requests and receive outputs via APIs.
  • Why it matters: Makes AI accessible without training models from scratch.
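
As a concrete illustration, here is a minimal sketch of calling a generative AI service over HTTP with Python’s requests library. The endpoint URL, API key, payload fields, and the "output" response field are placeholders, since every provider defines its own schema.

```python
import requests

# Hypothetical endpoint and key, for illustration only; real providers
# differ in URL, authentication, and payload schema.
API_URL = "https://api.example.com/v1/generate"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "Explain tokenization in one sentence."},
    timeout=30,
)
response.raise_for_status()
print(response.json()["output"])  # response field name assumed for this sketch
```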

Artificial General Intelligence (AGI)

  • Definition: Hypothetical AI with human-level general reasoning.
  • Why it matters: Could transform productivity but raises existential safety risks.

Autonomous System

  • Definition: A system operating with little or no human input.
  • Examples: Drones, self-driving cars, industrial robots.
  • Risks: Safety failures in uncontrolled environments.

B

Bias (in AI)

  • Definition: Systematic unfairness in AI outcomes.
  • How it works: Arises from unbalanced datasets or flawed assumptions.
  • Why it matters: Can cause discrimination in hiring, finance, healthcare, migration, and other domains.
  • Risks: Legal liability and erosion of trust.
  • Reference: EU AI Act Explorer.

Biometric Data

  • Definition: “Personal data resulting from technical processing of physical, physiological, or behavioural traits, such as facial images or fingerprint data.” (EU AI Act)
  • Applications: Authentication (face unlock, iris scans).
  • Risks: Privacy invasion, surveillance abuse.

Biometric Categorisation System

  • Definition: An AI system that assigns people to categories based on biometric data (EU AI Act, Recital 18).
  • Why it matters: Raises ethical issues in law enforcement, advertising, border control.

Black Box

  • Definition: AI systems whose decision-making is opaque.
  • Risks: Hard to audit, reduces accountability.
  • Solutions: Explainable AI (XAI).

Bots

  • Definition: Automated software agents.
  • How it works: Can follow rules or use AI.
  • Applications: Customer service, spam, misinformation.

C

Casual Communication

  • Definition: Conversational interactions with AI.
  • Why it matters: Improves usability but increases anthropomorphism.

Chatbot

  • Definition: AI system simulating human dialogue.
  • Applications: Customer service, healthcare, education.
  • Risks: Hallucinations and misuse.

Computer Vision

  • Definition: AI field for interpreting images and video.
  • How it works: Uses convolutional neural networks (CNNs).
  • Applications: Medical imaging, surveillance, autonomous vehicles.
  • Risks: Privacy and accuracy bias.

Concept Drift (Data Drift)

  • Definition: Changes in data distributions that reduce model accuracy.
  • How it works: Input or target distributions shift over time; for example, a model trained on past fraud patterns may fail on new ones.
  • Why it matters: Continuous retraining is needed.
  • Risks: Silent performance degradation.
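
One common way to catch drift is to compare a feature’s distribution at training time against live data. A rough sketch, assuming NumPy and SciPy are available; the synthetic data and the significance threshold are invented for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # feature at training time
live_feature = rng.normal(loc=0.4, scale=1.2, size=5000)   # same feature in production

# Two-sample Kolmogorov–Smirnov test: a small p-value suggests the
# distributions differ, i.e., the model may be seeing drifted inputs.
stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"Drift suspected (KS={stat:.3f}, p={p_value:.2e}); consider retraining.")
```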

Concrete Action (Prompt + AI Tool)

  • Definition: Combining a clear prompt with the right tool to produce useful outcomes.
  • Why it matters: Prompts alone aren’t always enough.

Context Engineering

  • Definition: Shaping AI inputs and roles to improve reliability.
  • Example: Telling an LLM, “You are a financial analyst,” with supporting data.

Curation (Data Curation)

  • Definition: Collecting, cleaning, and labeling training data.
  • Why it matters: High-quality data ensures fairness and accuracy.

D

Data

  • Definition: Raw information in the form of numbers, text, images, audio, or other formats that is collected and used to train, validate, and test AI models.
  • How it works: Data can be structured (tables, databases), unstructured (text, video, sensor streams), or semi-structured (JSON, XML), and must often be cleaned, labeled, and preprocessed before use in AI pipelines.
  • Why it matters: The quality, quantity, and diversity of data directly determine an AI system’s performance, fairness, and reliability.

Data Augmentation

  • Definition: Expanding a dataset by creating modified versions of existing data.
  • How it works: Images may be flipped, rotated, or blurred; text may be paraphrased; noise can be added. Labels stay consistent.
  • Why it matters: Helps reduce overfitting and makes AI more robust.
  • Risks: Poorly designed augmentation can distort data distributions.
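
A minimal sketch of label-preserving image augmentation, assuming the Pillow library is installed; the file name is a placeholder, and real pipelines usually randomize these transformations.

```python
from PIL import Image, ImageFilter

def augment(path):
    """Return simple label-preserving variants of one image (a sketch)."""
    img = Image.open(path)
    return [
        img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),  # horizontal flip
        img.rotate(15, expand=True),                     # small rotation
        img.filter(ImageFilter.GaussianBlur(radius=2)),  # mild blur
    ]

# Each variant keeps the original label, e.g. ("cat.jpg", "cat").
variants = augment("cat.jpg")
```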

Data Governance

  • Definition: Policies and processes to ensure data quality, security, and ethical use.
  • How it works: Involves access controls, audits, lineage tracking, and retention rules.
  • Why it matters: Trustworthy AI depends on trustworthy data.

Dataset

  • Definition: A collection of data points used to train, validate, or test AI.
  • How it works: Typically split into training, validation, and test sets.
  • Why it matters: A dataset’s quality and representativeness directly determine model performance.

Decision Tree

  • Definition: A supervised learning model shaped like a tree, where branches represent rules and leaves represent outcomes.
  • How it works: Splits data by feature thresholds (e.g., Age > 30?), eventually leading to a prediction.
  • Why it matters: Simple and interpretable, widely used in finance, healthcare, and risk scoring.
  • Risks: Easily overfits unless pruned or combined in ensembles (Random Forests, Gradient Boosted Trees).
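
A short sketch, assuming scikit-learn is installed: it fits a depth-limited tree on the bundled Iris dataset and prints the learned rules, which is exactly the interpretability benefit noted above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits tree size, a simple guard against overfitting.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
print(export_text(clf))  # human-readable if/else rules
```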

Deep Fake

  • Definition: AI-generated or manipulated media that imitates real people or events.
  • How it works: Uses Generative Adversarial Networks (GANs) or diffusion models to synthesize realistic audio, images, or video.
  • Why it matters: Useful in film, accessibility, and education, but also misused for disinformation and non-consensual content.
  • References: EU AI Act requires transparency labels for synthetic media.

Deep Learning

  • Definition: A subset of machine learning using multi-layer neural networks to learn hierarchical representations of data.
  • How it works: Each layer extracts increasingly abstract features (edges → shapes → objects in images; characters → words → meaning in text).
  • Why it matters: Powers breakthroughs in speech recognition, image classification, natural language processing, and generative AI.
  • Risks: Requires large datasets and compute; prone to bias and adversarial attacks.

Differential Privacy

  • Definition: A privacy framework that mathematically limits how much individual data points affect an AI’s output.
  • How it works: Adds carefully calibrated noise to training or queries.
  • Why it matters: Allows training on sensitive data while protecting individuals.
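
A toy sketch of one standard approach, the Laplace mechanism: a mean query over clipped values gets noise scaled to the query’s sensitivity divided by the privacy budget ε. The data and bounds here are invented for illustration.

```python
import numpy as np

def private_mean(values, epsilon, lower, upper):
    """Laplace mechanism for a mean query (illustrative sketch only)."""
    values = np.clip(values, lower, upper)        # bound each person's influence
    sensitivity = (upper - lower) / len(values)   # max change from one record
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

ages = np.array([23, 35, 41, 29, 52, 38, 47, 31])
print(private_mean(ages, epsilon=0.5, lower=0, upper=100))
```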

Drift (Concept/Data/Label/Model Drift)

  • Definition: When the statistical properties of input data or outputs change, reducing model performance.
  • Types:

    a. Data Drift: Input data distribution shifts.

    b. Label Drift: The underlying distribution of outputs changes.

    c. Concept Drift: The relationship between inputs and outputs evolves.

    d. Model Drift: The model’s predictive performance degrades over time as a result of any of the above.

  • Why it matters: Models trained on static data degrade in dynamic environments.
  • Mitigation: Monitoring, retraining, adaptive learning.

E

Edge AI

  • Definition: Running AI directly on devices rather than in the cloud.
  • How it works: Models are optimized for small hardware (phones, IoT sensors).
  • Why it matters: Enables low latency, privacy-preserving AI.
  • Applications: Smart cameras, voice assistants, autonomous drones.

Explainability (XAI)

  • Definition: Making AI outputs understandable to humans.
  • How it works: Provides feature importance, simplified surrogate models, or natural language explanations.
  • Why it matters: Critical for trust in regulated domains (healthcare, finance).
  • Risks: Explanations may oversimplify complex models.

Ethics in AI

  • Definition: The study and practice of fairness, accountability, transparency, and social impact in AI.
  • Why it matters: Ensures technology benefits society without exacerbating harms.
  • References: OECD AI Principles.

F

Feature

  • Definition: A measurable property used as input for an AI model.
  • How it works: In tabular data, features are columns; in images, features may be pixel patterns.
  • Why it matters: The choice of features influences model accuracy and fairness.

Fine-Tuning

  • Definition: Adapting a pre-trained AI model to a specific task with additional training.
  • How it works: A base model is retrained on domain-specific data with adjusted learning rates.
  • Why it matters: Efficiently creates specialized models without training from scratch.

Foundation Model

  • Definition: A large, general-purpose model trained on broad data that can be adapted to many tasks.
  • Examples: GPT, BERT, Stable Diffusion.
  • Why it matters: Serves as the backbone for diverse applications.
  • Risks: Centralization, bias, and misuse at scale.

Fairness (in AI)

  • Definition: Ensuring AI systems do not systematically disadvantage individuals or groups.
  • How it works: Involves designing balanced datasets, auditing outputs, and applying fairness metrics.
  • Why it matters: Prevents discrimination and fosters public trust.

G

Generative AI

  • Definition: AI systems that create new content such as text, images, audio, or code.
  • How it works: Uses neural architectures like Transformers, GANs, or diffusion models to generate outputs that resemble training data.
  • Why it matters: Powers chatbots, creative tools, design aids, and research assistants.
  • Risks: Can produce misinformation, bias, or copyright issues.

Ground Truth

  • Definition: The “gold standard” of accurate, real-world data against which AI models are trained and evaluated.
  • How it works: Labels provided by experts or validated processes.
  • Why it matters: Model accuracy depends on high-quality ground truth.
  • Risks: If labels are wrong, the AI learns errors (“garbage in, garbage out”).

GPU (Graphics Processing Unit)

  • Definition: A processor originally designed for graphics, now widely used to accelerate AI training and inference.
  • How it works: GPUs process many operations in parallel, making them ideal for matrix-heavy AI tasks.
  • Why it matters: Enabled breakthroughs in deep learning by speeding up computation.

Generalization (in machine learning)

  • Definition: The ability of an AI model to perform well on new, unseen data.
  • How it works: Avoids “memorizing” training data and instead learns patterns that apply broadly.
  • Why it matters: Core measure of whether a model is actually useful.
  • Risks: Poor generalization leads to overfitting.

H

Hallucination (in AI)

  • Definition: When AI confidently outputs false or fabricated information.
  • How it works: Generative models predict plausible sequences without fact-checking.
  • Why it matters: A critical risk in applications requiring accuracy (healthcare, law, education).
  • Mitigation: Grounding responses in reliable data sources, human-in-the-loop oversight.

Human-in-the-Loop (HITL)

  • Definition: A design pattern where humans oversee, guide, or correct AI systems at key stages.
  • How it works: Humans review training labels, validate outputs, or override decisions.
  • Why it matters: Balances automation with accountability.
  • Applications: Hiring systems, medical AI, financial risk scoring.

Hybrid AI

  • Definition: Hybrid AI integrates diverse AI approaches, such as rule-based reasoning, machine learning, and neural networks, to create systems that are more intelligent, adaptable, transparent, and robust than any single method alone.
  • How it works: Symbolic logic provides structure; ML handles pattern recognition.
  • Why it matters: Improves interpretability and robustness by leveraging strengths of both.

Heuristic

  • Definition: A rule-of-thumb or shortcut used in AI to guide problem-solving.
  • How it works: Provides approximate solutions when exact computation is costly.
  • Why it matters: Useful in search algorithms, robotics, and decision-making.

I

Inference

  • Definition: The process of applying a trained AI model to new data to generate predictions or decisions.
  • How it works: Inputs are fed through a trained network, producing an output (e.g., classifying an image).
  • Why it matters: Inference is the real-world use of AI, where it adds value beyond training.

Input Data

  • Definition: “Data provided to or directly acquired by an AI system on the basis of which the system produces an output.” (EU AI Act)
  • Examples: Text prompts, sensor readings, or medical scans.
  • Why it matters: Inputs define outputs; poor inputs lead to poor results.

Intelligent Agent

  • Definition: An entity (software or robotic) that perceives its environment and acts toward goals.
  • How it works: Uses sensors, models, and actuators (for physical agents) to take actions.
  • Why it matters: Forms the conceptual basis of agent-based AI systems.

IoT + AI (AIoT)

  • Definition: The combination of Internet of Things devices with AI capabilities.
  • How it works: Connected devices generate sensor data; AI analyzes it for decisions.
  • Applications: Smart homes, predictive maintenance, healthcare wearables.
  • Risks: Security vulnerabilities and privacy issues.

Interoperability

  • Definition: The ability of different AI models, systems, or platforms to communicate, share data, and work together effectively.
  • How it works: Achieved through common standards, APIs, data schemas, and integration frameworks that allow AI components to interoperate across tools and environments.
  • Why it matters: Crucial for building scalable AI ecosystems, combining specialized models, avoiding vendor lock-in, and accelerating innovation.

Interpretability

  • Definition: The degree to which a human can understand how an AI system makes its decisions.
  • How it works: Tools like SHAP, LIME, and saliency maps show feature importance.
  • Why it matters: Essential for trust, debugging, and regulatory compliance.

J

Job Automation

  • Definition: The use of AI and robotics to perform tasks traditionally done by humans.
  • How it works: Automates repetitive, structured, or predictable activities (e.g., invoice processing, data entry).
  • Why it matters: Increases efficiency but raises concerns about workforce displacement.
  • Risks: Economic inequality, loss of livelihoods without reskilling programs.

Joint Embedding

  • Definition: A technique that maps different types of data (e.g., text and images) into a shared space.
  • How it works: Aligns semantic meaning across modalities, so “a dog” in text aligns with a picture of a dog.
  • Why it matters: Enables multimodal AI like CLIP (which links text to images).
  • Applications: Search engines, cross-modal translation, generative AI.

K

Knowledge Graph

  • Definition: A structured representation of knowledge in nodes (entities) and edges (relationships).
  • How it works: Stores facts like Paris —is the capital of→ France. AI can query these for reasoning and inference.
  • Why it matters: Improves explainability and retrieval for AI systems.
  • Applications: Google Search, medical research, enterprise data management.
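
A toy illustration of the idea: facts stored as (subject, relation, object) triples in plain Python, with a simple query over them. Production systems use dedicated graph databases and query languages such as SPARQL.

```python
# A tiny in-memory knowledge graph of (subject, relation, object) triples.
triples = {
    ("Paris", "is_capital_of", "France"),
    ("France", "is_in", "Europe"),
    ("Berlin", "is_capital_of", "Germany"),
}

def query(relation, obj):
    """Find all subjects linked to `obj` by `relation`."""
    return [s for (s, r, o) in triples if r == relation and o == obj]

print(query("is_capital_of", "France"))  # ['Paris']
```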

K-Nearest Neighbors (KNN)

  • Definition: A simple machine learning algorithm that classifies data points based on the classes of their nearest neighbors.
  • How it works: For an input, finds k closest points in the training dataset and assigns the most common label.
  • Why it matters: Easy to implement and effective for smaller datasets.
  • Risks: Computationally expensive at scale, sensitive to noisy data.
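
A self-contained sketch of the algorithm in plain Python, using Euclidean distance and majority voting; the four training points are invented.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train` is a list of (point, label) pairs.
    """
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((5, 5), "blue"), ((6, 5), "blue")]
print(knn_predict(train, (2, 1)))  # 'red'
```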

Knowledge Distillation

  • Definition: A model compression technique where a high-capacity model’s output distributions are used as training targets for a lower-capacity model, enabling the smaller model to approximate the performance of the larger one with reduced computational cost.
  • How it works: The student learns to mimic the teacher’s output distributions.
  • Why it matters: Makes AI more efficient for mobile devices and edge computing.

L

Labeling (Data Labeling)

  • Definition: The process of tagging raw data with labels so an AI system can learn from it.
  • How it works: Humans (or semi-automated tools) annotate images, text, or audio with correct answers.
  • Why it matters: Supervised learning relies on accurate labels.
  • Risks: Inconsistent or biased labeling leads to poor model performance.

Large Language Model (LLM)

  • Definition: A type of AI model trained on vast amounts of data (traditionally text, but increasingly multimodal) to generate, understand, and manipulate language-like outputs.
  • How it works:

    a. Built on the Transformer architecture, which uses self-attention to capture relationships between tokens (a toy sketch of self-attention follows this entry).

    b. Learns by predicting the next token in a sequence.

    c. Modern LLMs can also integrate image, audio, or video inputs by converting them into token-like embeddings aligned with text.

  • Why it matters:

    a. Powers applications like chatbots, summarization, translation, coding assistants, and search.

    b. With multimodal extensions, LLMs can describe images, interpret charts, and handle speech.

  • Risks:

    a. Can hallucinate plausible but false information.

    b. Reinforces biases present in training data.

    c. High energy and compute costs during training.

  • Examples:

    a. Text-only: GPT-3, BERT, LLaMA.

    b. Multimodal: GPT-5, CLIP, LLaVA.
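
As a toy sketch of the self-attention mechanism referenced in this entry, the NumPy snippet below computes scaled dot-product attention; real Transformers add learned query/key/value projections, multiple heads, and positional information.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over token embeddings X (toy version).

    Here Q = K = V = X, to show only the core mechanism.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # token-to-token affinities
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax rows
    return weights @ X                              # mix each token with its context

tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim embeddings
print(self_attention(tokens).shape)  # (4, 8)
```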

Latent Space

  • Definition: The abstract, compressed representation of data learned by AI models.
  • How it works: Neural networks map inputs into a lower-dimensional space where semantically similar items are close together.
  • Why it matters: Enables generative AI (e.g., turning a point in latent space into an image).

Linear Regression

  • Definition: A foundational statistical and ML method that models relationships between inputs and outputs.
  • How it works: Fits a straight line through data to predict continuous outcomes.
  • Why it matters: Simple, interpretable, and widely used for trend analysis and forecasting.
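
A minimal example using NumPy’s least-squares polynomial fit on invented data points:

```python
import numpy as np

# Fit y ≈ w*x + b by ordinary least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

w, b = np.polyfit(x, y, deg=1)       # slope and intercept
print(f"y ≈ {w:.2f}x + {b:.2f}")     # y ≈ 1.96x + 0.14 on this toy data
forecast = w * 6.0 + b               # prediction for an unseen input
```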

Learning Rate

  • Definition: A key hyperparameter controlling how much weights are adjusted during training.
  • How it works: Too high = unstable training; too low = slow convergence.
  • Why it matters: Balancing it is crucial for efficient, effective training.
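
A tiny demonstration of the trade-off, minimizing the convex loss (w − 3)² by gradient descent with three invented learning rates:

```python
# Gradient descent on f(w) = (w - 3)^2; the minimum is at w = 3.
def step(w, lr):
    grad = 2 * (w - 3)               # derivative of the loss
    return w - lr * grad

for lr in (0.01, 0.1, 1.1):          # too low, reasonable, too high
    w = 0.0
    for _ in range(50):
        w = step(w, lr)
    print(f"lr={lr}: w={w:.3f}")     # crawls toward 3, reaches 3, diverges
```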

M

Machine Learning (ML)

  • Definition: A subset of AI where systems learn from data rather than being explicitly programmed.
  • How it works: Algorithms detect patterns in training data and generalize them to make predictions on new inputs. Common types include supervised, unsupervised, and reinforcement learning.
  • Why it matters: ML powers applications from spam filters to self-driving cars.
  • Risks: Can amplify bias, fail under drift, or be vulnerable to adversarial attacks.

Model

  • Definition: A mathematical structure that represents patterns in data to produce outputs from inputs.
  • How it works: Models are trained by adjusting parameters until predictions minimize error.
  • Why it matters: The “engine” of AI — determines accuracy, interpretability, and fairness.
  • Examples: Linear regression, neural networks, decision trees.

Multi-Agent AI

  • Definition: Systems composed of multiple AI agents that collaborate, negotiate, or compete.
  • How it works: Each agent perceives, decides, and acts in an environment; collective behavior emerges through interaction.
  • Applications: Traffic control, supply chain optimization, simulations of economies.

Multi-Agent Systems (Agentic AI)

  • Definition: An emerging paradigm where multiple autonomous AI agents pursue long-term goals, coordinate, and self-direct.
  • Why it matters: Enables more complex workflows and reasoning chains beyond single-model capabilities.
  • Risks: Harder to predict, align, and control collective behavior.

Multimodal AI

  • Definition: AI systems that process more than one type of input (e.g., text + image + audio).
  • How it works: Maps different modalities into a shared embedding space or coordinates them with cross-attention.
  • Applications: Captioning images, answering questions about charts, generating videos.
  • Examples: GPT-5, CLIP, Flamingo.

N

Natural Language Processing (NLP)

  • Definition: The AI field enabling machines to understand, interpret, and generate human language.
  • How it works: Combines linguistics, machine learning, and large-scale text data. Tasks include tokenization, parsing, translation, sentiment analysis.
  • Why it matters: Forms the backbone of chatbots, translation apps, and virtual assistants.
  • Risks: Bias, hallucinations, misinterpretation of context.

Neural Network

  • Definition: A model inspired by the human brain, consisting of layers of interconnected nodes (i.e., neurons).
  • How it works: Each neuron applies a weighted sum of inputs, passes it through an activation function, and forwards it. Layers learn features at increasing levels of abstraction.
  • Why it matters: The core of deep learning, enabling breakthroughs in vision, language, and generative AI.
  • Risks: Black-box nature makes them hard to interpret.

Normalization

  • Definition: A feature scaling technique that transforms input data into a consistent range or distribution.
  • How it works: Methods like Min-Max scaling, Z-score standardization, or unit vector scaling adjust feature values based on dataset characteristics.
  • Why it matters: Ensures balanced feature influence, improves model convergence, and stabilizes training performance.
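
A short NumPy sketch of the two most common methods, applied to an invented feature with an outlier:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 100.0])           # a feature with one outlier

min_max = (x - x.min()) / (x.max() - x.min())  # rescales values into [0, 1]
z_score = (x - x.mean()) / x.std()             # zero mean, unit variance

print(min_max)   # [0.     0.0204 0.0408 1.    ]
print(z_score)
```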

Neuro-Symbolic AI

  • Definition: An approach that combines neural networks with symbolic reasoning to enable both data-driven learning and structured, logic-based reasoning.
  • How it works: Neural models extract patterns from raw data, while symbolic systems apply rules, logic, and knowledge graphs for reasoning and explainability.
  • Why it matters: Bridges the gap between perception and reasoning, improves transparency, and tackles complex tasks that neither neural nor symbolic methods handle well alone.

O

Object Detection

  • Definition: An AI task of locating and classifying objects within an image or video.
  • How it works: Uses convolutional neural networks or transformer-based detectors (e.g., YOLO, DETR).
  • Why it matters: Core to autonomous driving, medical imaging, security surveillance.
  • Risks: Errors in detection can cause safety failures.

Ontology

  • Definition: A structured framework that defines entities and their relationships in a domain.
  • How it works: Provides AI with explicit knowledge representations (e.g., “cat —is a→ mammal”).
  • Why it matters: Improves reasoning, knowledge retrieval, and semantic search.

Overfitting

  • Definition: A modeling error that occurs when a model fits the training data too closely, capturing noise or spurious patterns, which leads to poor generalization on unseen data.
  • How it works: Model fits noise or irrelevant patterns too closely.
  • Why it matters: Leads to poor real-world performance.
  • Mitigation: Regularization, cross-validation, dropout, and more data.

Optimization (in AI)

  • Definition: The process of adjusting model parameters to minimize loss and improve performance.
  • How it works: Uses algorithms like stochastic gradient descent (SGD) to update weights.
  • Why it matters: Core to training — determines efficiency and accuracy.

P

Parameter

  • Definition: A numerical value inside a model that is adjusted during training.
  • How it works: In neural networks, parameters are the weights and biases that determine how inputs are transformed into outputs.
  • Why it matters: The number of parameters often reflects model complexity — GPT-5 has billions of parameters.
  • Risks: Too many parameters can cause overfitting and high computational costs.

Personal Data

  • Definition: Data that identifies or relates to an individual person.
  • Reference: Defined under the EU AI Act and GDPR.
  • Why it matters: Protecting personal data is central to privacy and ethical AI.
  • Risks: Misuse can lead to surveillance, identity theft, or discrimination.

Prompt Engineering

  • Definition: The art of crafting inputs (prompts) to guide large language models (LLMs) effectively.
  • How it works: Techniques include:

    a. Meta Prompts — define system roles and levels

    b. Engineered Prompts — structured, sequenced inputs

    c. Prompt Iteration — refining prompts step by step

    d. Prompt Chaining — using outputs as new inputs

    e. Prompt Contexting — feeding extra reference material

    f. Negative Prompting — excluding unwanted outputs

    g. Promptless Prompts — letting AI generate its own prompts

    h. Automatic Prompting — testing multiple prompts in parallel

    i. Prompt Finetuning — adjusting wording/parameters

    j. Role Prompting — assigning the AI a specific role (e.g., “act as a lawyer” or “you are a data analyst”) to shape its responses in line with that persona.

  • Why it matters: Maximizes performance without retraining models.
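
To make role prompting and prompt chaining concrete, here is a minimal sketch; the ask function is a hypothetical placeholder for whatever LLM call your provider exposes.

```python
def ask(prompt: str) -> str:
    """Placeholder for a real LLM call (wire this to your provider's API)."""
    print(f"--- prompt ---\n{prompt}\n")
    return "stub response"

document = "..."  # source text to analyze

# Role prompting: assign a persona, then give the task.
summary = ask(f"You are a financial analyst. Summarize this report:\n{document}")

# Prompt chaining: feed the first output into a second prompt.
risks = ask(f"List the three biggest risks implied by this summary:\n{summary}")
```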

Preprocessing

  • Definition: Preparing raw data for AI use by cleaning, normalizing, and formatting.
  • How it works: May involve removing duplicates, scaling features, tokenizing text, or handling missing values.
  • Why it matters: Quality preprocessing ensures reliable training and fair outcomes.

Privacy-Preserving AI

  • Definition: Approaches that protect personal data while enabling AI training.
  • How it works: Techniques include differential privacy, homomorphic encryption, and federated learning.
  • Why it matters: Balances innovation with user privacy rights.

Q

Quantum AI

  • Definition: The use of quantum computing to accelerate or enhance AI algorithms.
  • How it works: Quantum properties (superposition, entanglement) allow faster solutions to certain optimization or simulation problems.
  • Why it matters: Could revolutionize fields like drug discovery or logistics.
  • Limitations: Still experimental; practical applications are limited today.

Query

  • Definition: A request for information made to a database, search engine, or AI system.
  • How it works: In LLMs, queries are prompts; in databases, they use query languages like SQL.
  • Why it matters: Queries define how humans interact with data-driven systems.

Q-Learning

  • Definition: A reinforcement learning algorithm that learns the value (Q-value) of actions in states.
  • How it works: Updates action-value estimates iteratively to maximize long-term rewards.
  • Applications: Robotics, game AI, resource allocation.
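
A self-contained toy example: tabular Q-learning on a five-state corridor where the agent earns a reward for reaching the right end. The hyperparameters are invented for illustration.

```python
import random

# States 0..4; reaching state 4 yields a reward of 1.
actions = (-1, +1)                      # step left or right
Q = {(s, a): 0.0 for s in range(5) for a in actions}
alpha, gamma = 0.5, 0.9                 # learning rate and discount factor

for _ in range(500):                    # episodes
    s = 0
    while s != 4:
        a = random.choice(actions)      # random exploration; Q-learning is off-policy
        s_next = min(max(s + a, 0), 4)  # walls at both ends
        reward = 1.0 if s_next == 4 else 0.0
        best_next = max(Q[(s_next, b)] for b in actions)
        # Core update: nudge Q(s, a) toward reward + discounted best future value.
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

print({s: max(actions, key=lambda x: Q[(s, x)]) for s in range(4)})  # all +1: go right
```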

R

Reinforcement Learning (RL)

  • Definition: A learning paradigm where an agent interacts with an environment and learns through rewards and penalties.
  • How it works: The agent tries actions, observes outcomes, and updates its policy to maximize rewards.
  • Why it matters: Underpins advanced AI in robotics, gaming (e.g., AlphaGo), and autonomous control.
  • Risks: Agents may exploit poorly designed rewards (“specification gaming”).

Responsible AI

  • Definition: The practice of designing, developing, and deploying AI in ways that are ethical, fair, and accountable.
  • How it works: Combines governance frameworks, fairness audits, and transparency.
  • Why it matters: Builds trust and prevents harm.

Risk (AI System)

  • Definition: “The combination of the probability of an occurrence of harm and the severity of that harm.” (EU AI Act)
  • How it works: High-risk AI systems include those in healthcare, policing, or finance.
  • Why it matters: Central to regulatory oversight and trust.

Robotic Process Automation (RPA)

  • Definition: Software robots that automate structured, repetitive business processes.
  • How it works: Mimics human interactions with digital systems, such as filling forms or processing invoices.
  • Why it matters: Increases efficiency in back-office operations.
  • Limitations: RPA breaks when processes change unexpectedly; less adaptive than AI.

Robotics

  • Definition: The integration of AI into machines that can perceive, move, and act in the physical world.
  • How it works: Combines sensors, actuators, control systems, and AI models.
  • Applications: Manufacturing, logistics, healthcare (surgical robots).

Robustness

  • Definition: The ability of an AI system to perform reliably under varied or adverse conditions.
  • How it works: Tested through stress-testing, adversarial robustness checks, and monitoring.
  • Why it matters: Prevents unexpected failures in real-world deployment.

Recommendation System

  • Definition: AI systems that suggest items (products, media, content) to users.
  • How it works: Uses collaborative filtering, content-based filtering, or hybrid approaches.
  • Applications: Netflix, Amazon, Spotify.
  • Risks: Can reinforce filter bubbles and bias.

Retrieval-Augmented Generation (RAG)

  • Definition: An architecture where LLMs use an external database or search system to ground their answers.
  • How it works: A retriever fetches relevant documents; the generator (LLM) incorporates them into its response.
  • Why it matters: Reduces hallucinations and improves accuracy.
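
A minimal end-to-end sketch of the pattern; the hash-based embed function is a deliberately crude stand-in for a real embedding model, and in a real system the final grounded prompt would be sent to an LLM rather than returned.

```python
import numpy as np

docs = ["The EU AI Act defines risk tiers.",
        "RAG grounds LLM answers in retrieved text."]

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hash words into a fixed-size vector."""
    v = np.zeros(64)
    for w in text.lower().split():
        v[hash(w) % 64] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = [float(embed(query) @ embed(d)) for d in docs]
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    # The grounded prompt an LLM would receive:
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(answer("What does RAG do?"))
```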

S

Safety (AI Safety)

  • Definition: The field of research and practice aimed at preventing AI systems from causing harm.
  • How it works: Includes short-term focus (bias, robustness, adversarial safety) and long-term focus (alignment, controllability, existential risks).
  • Why it matters: AI is increasingly embedded in critical systems where failures can have major consequences.
  • Risks:

    a. Technical Risks: Bias, adversarial attacks, hallucinations, drift, and lack of robustness.

    b. Societal Risks: Deepfakes, large-scale misinformation, surveillance misuse, job displacement, and accountability gaps.

    c. Systemic Risks: Cascading failures across industries (finance, healthcare, energy grids) due to interconnected AI systems.

    d. Catastrophic Risks: Low-probability but high-impact failures that cause severe disruption and mass harm without ending humanity. Examples include misuse of autonomous weapons, collapse of global financial systems, or AI-enabled bioterrorism.

    e. Existential Risks: Scenarios where AI leads to humanity’s extinction or permanent loss of our future potential. Examples include misaligned AGI, uncontrollable superintelligence, or runaway singularity dynamics.

  • References: OECD AI Principles, EU AI Act Explorer, Center for AI Safety.

Sandboxing

  • Definition: Testing AI systems in a controlled environment before release.
  • How it works: Developers deploy AI in “regulatory sandboxes” or technical testbeds to assess risks.
  • Why it matters: Reduces potential harms before real-world deployment.
  • Reference: Defined in the EU AI Act.

Singularity

  • Definition: A hypothetical future point when AI surpasses human intelligence and accelerates self-improvement beyond human control.
  • Why it matters: Sparks debate about societal preparedness, governance, and existential safety.
  • Risks: Unpredictable impacts on economies, power structures, and survival.

Supervised Learning

  • Definition: A machine learning paradigm where models are trained on labeled input-output pairs.
  • How it works: The algorithm maps inputs (e.g., medical images) to outputs (e.g., “cancer” vs. “healthy”) based on labeled examples.
  • Why it matters: The most widely used form of ML; underpins spam detection, fraud detection, and speech recognition.

Synthetic Data

  • Definition: Artificially generated data used in place of real-world data.
  • How it works: Created using simulation, statistical models, or generative AI.
  • Why it matters: Useful for privacy, rare event modeling, and augmenting datasets.
  • Risks: Synthetic data may not capture real-world edge cases.

Synthetic Media

  • Definition: Any content (text, video, audio, image) generated or altered by AI.
  • Examples: Deepfakes, AI-written articles, voice clones.
  • Why it matters: Enables creativity and accessibility, but also misinformation and fraud.

T

Testing Data

  • Definition: “Data used for providing an independent evaluation of an AI system to confirm expected performance before placing it on the market.” (EU AI Act)
  • Why it matters: Ensures that systems generalize and work safely beyond training.

Tokenization

  • Definition: Splitting text or other data into smaller units (tokens) that models can process.
  • How it works: Words may be split into subwords, characters, or byte-pair encodings (BPE).
  • Why it matters: Fundamental to how LLMs process and generate language.
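
A toy tokenizer for illustration; production LLMs use learned subword schemes such as BPE rather than this simple regex split.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    """Split on words and punctuation; real tokenizers emit subword units."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(toy_tokenize("Tokenization splits text!"))
# ['tokenization', 'splits', 'text', '!']
```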

Training Data

  • Definition: “Data used for training an AI system through fitting its learnable parameters.” (EU AI Act)
  • Why it matters: The foundation of model learning; its quality defines the model’s reliability.

Transparency

  • Definition: The degree to which AI systems disclose information about their functioning, limitations, and decision-making.
  • How it works: Can include transparency reports, dataset cards, and explainability tools.
  • Why it matters: Builds trust and is often mandated in regulation.

Transfer Learning

  • Definition: Reusing a pre-trained model on one task and adapting it to another.
  • How it works: A model trained on a broad dataset (e.g., ImageNet) can be fine-tuned for a niche task (e.g., medical imaging).
  • Why it matters: Saves time, data, and resources while improving performance.

Trustworthy AI

  • Definition: AI that is lawful, ethical, and technically robust, designed to align with human rights, values, and societal expectations.
  • How it works: Encompasses fairness, accountability, transparency, and safety principles.
  • Risks:

    a. Bias & Unfairness — AI may discriminate or exclude groups if data is biased.

    b. Opacity & Lack of Transparency — Black-box decisions reduce trust and accountability.

    c. Privacy & Security — Sensitive data misuse or AI systems being attacked.

    d. Safety & Reliability — Failures in real-world conditions can cause harm.

    e. Societal & Ethical Impact — Misuse for surveillance, disinformation, or unsustainable practices.

  • References: EU AI Act

U

Unsupervised Learning

  • Definition: A machine learning paradigm where models find patterns in unlabeled data.
  • How it works: Clustering and dimensionality reduction techniques group similar data points or discover hidden structures.
  • Why it matters: Enables AI to learn when labeled data is scarce.
  • Applications: Customer segmentation, anomaly detection, topic modeling.

Unstructured Data

  • Definition: Data without a predefined format, such as text, video, or audio.
  • How it works: Requires specialized methods (NLP, computer vision) to process.
  • Why it matters: Makes up the majority of real-world data.

Utility Function

  • Definition: The mathematical representation of goals or objectives an AI agent tries to maximize.
  • How it works: Guides reinforcement learning agents by assigning rewards to outcomes.
  • Why it matters: Poorly designed utility functions can lead to unintended behaviors (reward hacking).

Uplift Modeling

  • Definition: A predictive technique to estimate how actions (like marketing campaigns) affect different groups differently.
  • How it works: Models the incremental effect (uplift) of interventions.
  • Why it matters: Useful in marketing, healthcare treatments, and policy design.

V

Validation Data

  • Definition: “Data used for providing an evaluation of the trained AI system and for tuning its non-learnable parameters and learning process to prevent underfitting or overfitting.” (EU AI Act)
  • How it works: Split from training data or collected separately; used during development, not deployment.
  • Why it matters: Ensures models are fine-tuned to generalize effectively.

Validation Data Set

  • Definition: A specific dataset (or part of the training set) reserved for validation. (EU AI Act)
  • Why it matters: Helps developers monitor overfitting vs. underfitting during training.

Vector Embedding

  • Definition: A numeric representation of data (words, images, audio) in a high-dimensional space.
  • How it works: Similar items are placed closer together; e.g., “king – man + woman ≈ queen.”
  • Why it matters: The foundation of search, recommendation, and semantic AI.
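
A toy demonstration of the analogy arithmetic quoted above, with hand-picked three-dimensional vectors; real embeddings are learned and have hundreds of dimensions.

```python
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = emb["king"] - emb["man"] + emb["woman"]      # the classic analogy
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # 'queen' in this toy setup
```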

Vibe Coding

  • Definition: An exploratory programming style using conversational AI to brainstorm and co-create code.
  • How it works: Developers prompt AI iteratively, refining code and design in a fluid workflow.
  • Why it matters: Democratizes programming, allowing non-experts to create software.

Voice Recognition

  • Definition: AI that interprets spoken input.
  • How it works: Converts audio waveforms into text using models trained on speech data.
  • Why it matters: Powers voice assistants, transcription, accessibility tools.

W

Weak AI

  • Definition: AI designed for a narrow, specific task.
  • How it works: Optimized for one domain (e.g., chess engines, spam filters).
  • Why it matters: Most AI today is weak AI — powerful but not general.

Weights

  • Definition: Parameters in neural networks that determine the strength of connections between nodes.
  • How it works: Adjusted during training via gradient descent to minimize errors.
  • Why it matters: Define how a model processes information.

Word Embedding

  • Definition: Mapping words into vectors that capture semantic meaning.
  • How it works: Models like Word2Vec or GloVe place semantically similar words close together in vector space.
  • Why it matters: Enabled major progress in NLP before LLMs.

Workflows, AI-Intelligent

  • Definition: Business or creative workflows enhanced by AI decision-making or automation.
  • How it works: AI integrates into steps of the process, e.g., drafting emails, analyzing data, recommending actions.
  • Why it matters: Increases productivity and enables “intelligent automation.”

X

XAI (Explainable AI)

  • Definition: AI designed with interpretability tools to explain outputs.
  • How it works: Techniques like SHAP, LIME, or attention visualizations show how inputs influenced predictions.
  • Why it matters: Required in regulated industries for accountability and safety.
  • Risks: Simplified explanations can obscure complexity.

Y

Yield Optimization

  • Definition: The use of AI to maximize efficiency or returns, often in production or marketing.
  • How it works: Algorithms dynamically adjust processes or offers to increase outcomes (e.g., ad clicks, crop yield).
  • Why it matters: A core business use case for AI.
  • Risks: Over-optimization may ignore fairness, ethics, or sustainability.

Z

Zero-Shot Learning

  • Definition: An AI’s ability to perform a task it hasn’t been explicitly trained on.
  • How it works: Relies on generalization and transfer of knowledge from related tasks.
  • Why it matters: Demonstrates flexibility and broad applicability of modern LLMs.
  • Example: An LLM translating a new language without specific training data.

Zettabyte

  • Definition: A data storage unit equal to one sextillion bytes (10²¹).
  • Why it matters: Illustrates the immense scale of data generated and processed in the AI era.

As AI continues to evolve, so too will the language we use to describe it. Staying fluent in this vocabulary is not just about keeping up with the latest buzzwords; it is about understanding the technologies, risks, and safeguards that shape how AI affects our lives. This glossary is a starting point: a reference to help you navigate the continuously changing world of artificial intelligence with clarity and confidence.

Key sources & further reading: