Jahanzaib
RAG & Retrieval

Embedding

A vector representation of text that captures semantic meaning. Similar text gets similar vectors.

Last updated: April 26, 2026

Definition

An embedding is a fixed-length vector (typically 768 to 3072 dimensions) produced by an embedding model such as OpenAI's text-embedding-3 family or Voyage AI's models (the provider Anthropic recommends). The key property: semantically similar text has high cosine similarity. "Cancel my order" and "I want to return this" have close vectors even though they share few words. Embeddings power semantic search, recommendation, clustering, and the retrieval step of RAG. Cost is cheap (~$0.02 to $0.13 per million tokens, depending on model), but you pay it for every document and every query.
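The similarity measure itself is simple. A minimal pure-Python sketch, using toy 3-dimensional vectors as stand-ins for real model output:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||), in [-1, 1]; higher means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings:
cancel = [0.9, 0.1, 0.2]   # "Cancel my order"
refund = [0.8, 0.2, 0.3]   # "I want to return this"
weather = [0.1, 0.9, 0.1]  # unrelated topic

print(cosine_similarity(cancel, refund))   # high: related intents
print(cosine_similarity(cancel, weather))  # low: unrelated topics
```

In production you would not hand-roll this over thousands of vectors; a vector database or a vectorized NumPy dot product does the same math at scale.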

Code Example

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="Cancel my order please",
).data[0].embedding
# embedding is a list of 1536 floats. Store and search by cosine similarity.

Embed each document once at ingest time, embed each query at search time, then find nearest neighbors.
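That ingest-then-query loop can be sketched as a brute-force nearest-neighbor search. The vectors and documents below are toy stand-ins (real systems store embedding-model output in a vector index):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Ingest: embed each document once and store (text, vector) pairs.
# Toy 3-d vectors stand in for real embedding-model output.
index = [
    ("How do I cancel an order?", [0.9, 0.1, 0.2]),
    ("Return and refund policy",  [0.8, 0.2, 0.3]),
    ("Today's weather forecast",  [0.1, 0.9, 0.1]),
]

def search(query_vector, k=2):
    # Query time: rank every stored vector by similarity to the query
    # vector and return the k best-matching documents.
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([0.85, 0.15, 0.25]))  # the order/refund docs rank above the weather doc
```

Brute force is O(corpus size) per query; past a few hundred thousand vectors you would swap the linear scan for an approximate nearest-neighbor index.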

When To Use

Required for RAG. Also useful for semantic search, deduplication, and recommendation.
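Deduplication follows the same pattern: embed every record, then flag pairs whose similarity clears a threshold. A sketch with toy vectors (the 0.99 cutoff is an assumption to tune per corpus):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors; real deduplication would use embedding-model output.
records = {
    "a": [0.9, 0.1, 0.2],     # "Cancel my order"
    "b": [0.89, 0.11, 0.21],  # "Please cancel my order" (near duplicate)
    "c": [0.1, 0.9, 0.1],     # unrelated record
}

THRESHOLD = 0.99  # assumed cutoff; tune per corpus

# Compare each unordered pair once and keep those above the threshold.
duplicates = [
    (i, j)
    for i in records for j in records
    if i < j and cosine_similarity(records[i], records[j]) >= THRESHOLD
]
print(duplicates)  # → [('a', 'b')]
```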
