Jahanzaib
RAG & Retrieval

Embedding

A vector representation of text that captures semantic meaning. Similar text gets similar vectors.

Last updated: April 26, 2026

Definition

An embedding is a fixed-length vector (typically 768 to 3072 dimensions) produced by an embedding model such as OpenAI's text-embedding-3 family or Voyage AI's models (the provider Anthropic recommends). The key property: semantically similar text has high cosine similarity. "Cancel my order" and "I want to return this" have close vectors even though they share few words. Embeddings power semantic search, recommendation, clustering, and the retrieval step of RAG. Cost is cheap (~$0.02 to $0.13 per million tokens, depending on model), but you pay it for every document and every query.
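The similarity measure itself is simple. A minimal pure-Python sketch, using toy 3-dimensional vectors as stand-ins for real model output:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||), in [-1, 1]; higher means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings:
cancel = [0.9, 0.1, 0.2]   # "Cancel my order"
refund = [0.8, 0.2, 0.3]   # "I want to return this"
weather = [0.1, 0.9, 0.1]  # unrelated topic

print(cosine_similarity(cancel, refund))   # high: related intents
print(cosine_similarity(cancel, weather))  # low: unrelated topics
```

In production you would not hand-roll this over thousands of vectors; a vector database or a vectorized NumPy dot product does the same math at scale.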

Code Example

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Cancel my order please",
).data[0].embedding
# embedding is a list of 1536 floats. Store and search by cosine similarity.

Embed each document once at ingest time, embed each query at search time, then find nearest neighbors.
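That ingest-then-query loop can be sketched as a brute-force nearest-neighbor search. The vectors and documents below are toy stand-ins (real systems store embedding-model output in a vector index):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Ingest: embed each document once and store (text, vector) pairs.
# Toy 3-d vectors stand in for real embedding-model output.
index = [
    ("How do I cancel an order?", [0.9, 0.1, 0.2]),
    ("Return and refund policy",  [0.8, 0.2, 0.3]),
    ("Today's weather forecast",  [0.1, 0.9, 0.1]),
]

def search(query_vector, k=2):
    # Query time: rank every stored vector by similarity to the query
    # vector and return the k best-matching documents.
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([0.85, 0.15, 0.25]))  # the order/refund docs rank above the weather doc
```

Brute force is O(corpus size) per query; past a few hundred thousand vectors you would swap the linear scan for an approximate nearest-neighbor index.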

When To Use

Required for RAG. Also useful for semantic search, deduplication, and recommendation.
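Deduplication follows the same pattern: embed every record, then flag pairs whose similarity clears a threshold. A sketch with toy vectors (the 0.99 cutoff is an assumption to tune per corpus):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors; real deduplication would use embedding-model output.
records = {
    "a": [0.9, 0.1, 0.2],     # "Cancel my order"
    "b": [0.89, 0.11, 0.21],  # "Please cancel my order" (near duplicate)
    "c": [0.1, 0.9, 0.1],     # unrelated record
}

THRESHOLD = 0.99  # assumed cutoff; tune per corpus

# Compare each unordered pair once and keep those above the threshold.
duplicates = [
    (i, j)
    for i in records for j in records
    if i < j and cosine_similarity(records[i], records[j]) >= THRESHOLD
]
print(duplicates)  # → [('a', 'b')]
```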
