Jahanzaib
RAG & Retrieval

Chunking

Splitting documents into smaller pieces before embedding so retrieval returns relevant fragments.

Last updated: April 26, 2026

Definition

Chunking decides what unit gets retrieved. Chunks that are too big (whole documents) fill the context window with mostly irrelevant text; chunks that are too small (single sentences) lack the surrounding context retrieval needs, and quality drops. Common strategies: fixed-size (500-1000 tokens with overlap), semantic (split at paragraph or heading boundaries), and structure-aware (treat code, tables, and prose differently). The right strategy depends on your document type. Expect to iterate on chunking several times before retrieval quality is right.
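The semantic strategy above can be sketched in a few lines. This is an illustrative toy, not a production splitter: `semantic_chunks` is a hypothetical helper that greedily packs blank-line-separated paragraphs into chunks up to a character budget, so a chunk never breaks mid-paragraph.

```python
def semantic_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Splits at blank lines so chunks never break mid-paragraph. A single
    paragraph longer than max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        # +2 accounts for the "\n\n" joiner between paragraphs.
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Structure-aware splitters follow the same packing logic but use different boundaries per content type (function boundaries for code, rows for tables).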

Code Example

```python
# Recursive character text splitter: a sensible default starting point.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # measured in characters by default
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " "],
)
chunks = splitter.split_text(document)
```

Start with roughly 1000-character chunks and 200-character overlap. Note that the splitter counts characters by default; to size chunks in tokens, pass a token-counting `length_function`. Iterate based on retrieval quality.
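The fixed-size-with-overlap windowing itself is simple. A minimal sketch, using whitespace splitting as a rough stand-in for a real tokenizer (actual BPE token counts will differ, but the sliding-window logic is identical); `token_chunks` is a hypothetical helper:

```python
def token_chunks(text: str, max_tokens: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size token windows with overlap between consecutive chunks.

    Whitespace split is a crude tokenizer stand-in; swap in a real
    tokenizer (e.g. tiktoken) for accurate token budgets.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), step)]
```

The overlap means the end of each chunk is repeated at the start of the next, so facts that straddle a boundary survive in at least one chunk.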

When To Use

Required for any RAG system. Spend the time to get this right. Chunking quality dominates retrieval quality.
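One way to judge chunking quality concretely is recall@k over a small labeled query set: for each query, does the chunk you know is relevant land in the top-k retrieved? A toy sketch, using word-overlap scoring as a stand-in for embedding similarity (`recall_at_k` is a hypothetical helper):

```python
def recall_at_k(queries: list[tuple[str, int]], chunks: list[str], k: int = 3) -> float:
    """Fraction of queries whose known-relevant chunk appears in the top-k.

    `queries` pairs each query string with the index of its relevant chunk.
    Word-overlap scoring stands in for embedding similarity here; swap in
    your real retriever to evaluate an actual pipeline.
    """
    hits = 0
    for query, relevant in queries:
        q = set(query.lower().split())
        ranked = sorted(range(len(chunks)),
                        key=lambda i: -len(q & set(chunks[i].lower().split())))
        hits += relevant in ranked[:k]
    return hits / len(queries)
```

Re-run this after each chunking change; if recall@k drops when you shrink chunks, they have lost the context retrieval needs.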

Building with Chunking?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.