Recursive Character Text Splitter

The Recursive Character Text Splitter intelligently breaks long documents into smaller, manageable chunks while keeping related content together. It’s like having a smart librarian who knows exactly where to split a book so each section makes sense on its own.

This is the go-to choice for preparing documents for AI processing, knowledge bases, and search systems.

[Illustration: documents being split into meaningful chunks]

The splitter tries a series of separators in order, starting with natural boundaries such as paragraphs, then sentences, and finally individual words if needed. It moves to a finer separator only when a piece is still larger than the chunk size, so each chunk stays within the limit while related text stays together.

graph LR
  Document[Long Document] --> Analyze[Find Break Points]
  Analyze --> Split[Smart Splitting]
  Split --> Chunks[Perfect Chunks]
  style Split fill:#6d28d9,stroke:#fff,color:#fff

  1. Set Chunk Size: Choose the maximum number of characters per chunk (1000 is a reasonable starting point for most AI models).

  2. Add Overlap: Set how much chunks should overlap to maintain context between sections.

  3. Choose Separators: Pick where to split (paragraphs, sentences, or let it decide automatically).

  4. Process Your Text: The splitter will create perfectly sized chunks ready for AI processing.
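
The recursive idea behind these steps can be sketched in a few lines of Python. This is a minimal illustration, not the splitter's actual implementation: the function name `recursive_split`, the default separator list, and the choice to leave overlap out are all assumptions made for the example.

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", ". ", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Hypothetical sketch: tries the coarsest separator first and falls back
    to finer ones only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        # Try to merge this piece into the chunk being built.
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # Piece is still too long: retry it with the finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = (
        "Para one sentence one. Para one sentence two.\n\n"
        "Para two is short.\n\n" + "word " * 50
    )
    for chunk in recursive_split(sample, chunk_size=60):
        print(len(chunk), repr(chunk))
```

The two short paragraphs stay whole because each fits within the limit, while the long run of words is only broken at spaces. A hard cut at a fixed character position happens only when no separator is left to try.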

Practical example: Preparing documents for an AI knowledge base

Let’s split a technical document into chunks perfect for AI search and retrieval.

Option 1: General Documents

  • Chunk Size: 1000 characters.
  • Overlap: 200 characters.
  • Best For: Business documents, articles, stories.

Option 2: Technical Docs

  • Chunk Size: 800 characters (smaller for precision).
  • Separators: Paragraphs, then sentences, then punctuation.
  • Best For: Manuals, instructions, specs.

Option 3: Code Documentation

  • Chunk Size: 1200 characters (larger to keep code blocks intact).
  • Separators: Code blocks (```), then paragraphs.
  • Best For: API docs, tutorials with code.
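
The three options can also be written down as plain configuration. The sketch below is hypothetical: the key names, the `resolve_preset` helper, the fallback overlap of 200, and the mapping of "paragraphs, sentences, punctuation" to concrete separator strings are assumptions for illustration. Values the options leave unstated are kept as `None` rather than guessed.

```python
# Assumed default order: paragraphs, lines, sentences, words, characters.
DEFAULT_SEPARATORS = ["\n\n", "\n", ". ", " ", ""]

PRESETS = {
    # Option 1: general documents
    "general": {"chunk_size": 1000, "chunk_overlap": 200, "separators": None},
    # Option 2: technical docs (overlap not stated in the option)
    "technical": {
        "chunk_size": 800,
        "chunk_overlap": None,
        "separators": ["\n\n", ". ", ", "],  # assumed strings for the listed boundaries
    },
    # Option 3: code documentation (overlap not stated in the option)
    "code_docs": {
        "chunk_size": 1200,
        "chunk_overlap": None,
        "separators": ["```", "\n\n"],
    },
}


def resolve_preset(name, default_overlap=200):
    """Return a copy of a preset with unstated values filled from defaults."""
    preset = dict(PRESETS[name])
    if preset["chunk_overlap"] is None:
        preset["chunk_overlap"] = default_overlap  # assumption, not from the options
    if preset["separators"] is None:
        preset["separators"] = list(DEFAULT_SEPARATORS)
    return preset


if __name__ == "__main__":
    for name in PRESETS:
        print(name, resolve_preset(name))
```

Keeping presets as data makes it easy to try a different chunk size on the same documents and compare the results.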

| Smart Splitting (Recursive) | Simple Splitting |
| --- | --- |
| Keeps related content together | May break sentences mid-way |
| Respects document structure | Ignores natural boundaries |
| Maintains context between chunks | Can lose important connections |
| Works with different content types | One-size-fits-all approach |

| Setting | Purpose | Recommended Values |
| --- | --- | --- |
| Chunk Size | Maximum characters per chunk | 800-1200 for most AI models |
| Chunk Overlap | Characters shared between chunks | 150-300 to maintain context |
| Separators | Where to split text | Start with paragraphs, then sentences |
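
To see how chunk size and overlap interact, it helps to work through the arithmetic. The sketch below uses plain fixed-size windows instead of recursive splitting, so the numbers are an approximation of what the splitter produces. The function names `estimate_chunk_count` and `sliding_chunks` are made up for this example.

```python
import math


def _check(chunk_size, overlap):
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")


def estimate_chunk_count(text_length, chunk_size=1000, overlap=200):
    """Rough number of fixed-size windows needed to cover text_length characters."""
    _check(chunk_size, overlap)
    if text_length <= 0:
        return 0
    step = chunk_size - overlap  # new characters contributed by each chunk
    return max(1, math.ceil((text_length - overlap) / step))


def sliding_chunks(text, chunk_size=1000, overlap=200):
    """Cut text into windows whose neighbouring edges share `overlap` characters."""
    _check(chunk_size, overlap)
    if not text:
        return []
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


if __name__ == "__main__":
    text = "".join(chr(97 + i % 26) for i in range(10_000))
    chunks = sliding_chunks(text, chunk_size=1000, overlap=200)
    print(len(chunks), estimate_chunk_count(len(text)))
    print(chunks[0][-200:] == chunks[1][:200])  # neighbouring chunks share 200 characters
```

For a 10,000-character document with a chunk size of 1000 and an overlap of 200, this gives 13 chunks instead of 10, and about 24% more characters to embed or index. That is the trade-off behind the recommended overlap range: more shared context costs more processing.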

Split company documentation for AI search:

Input: Employee handbook, policies, procedures
Chunk Size: 1000 (good for general business content)
Overlap: 200 (maintains policy context)
Result: Searchable knowledge base chunks

Prepare academic papers for analysis:

Input: Research papers and articles
Chunk Size: 1200 (longer for academic context)
Overlap: 300 (important for research continuity)
Result: AI-ready research database

Process API docs and technical guides:

Input: Technical documentation with code examples
Chunk Size: 800 (shorter for specific instructions)
Separators: Respect code blocks and sections
Result: Precise technical search system

  • Chunks too large or small: Adjust chunk size based on your AI model’s requirements and content type.
  • Lost context between chunks: Increase overlap to preserve more connections between sections.
  • Poor splitting quality: Customize separators to better match your document structure.
  • Slow processing: Reduce chunk overlap or process documents in smaller batches.
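
When tuning these settings, a quick look at the chunk lengths often shows which of the problems above applies. The helper below is a hypothetical diagnostic, not part of the splitter. The name `chunk_report` and the 25% threshold for flagging undersized chunks are assumptions.

```python
from statistics import mean


def chunk_report(chunks, chunk_size, min_fraction=0.25):
    """Summarise chunk lengths to spot oversized or very small chunks."""
    lengths = [len(c) for c in chunks]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0, "oversized": 0, "undersized": 0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths), 1),
        # Chunks longer than the configured limit.
        "oversized": sum(1 for n in lengths if n > chunk_size),
        # Chunks shorter than min_fraction of the limit (assumed threshold).
        "undersized": sum(1 for n in lengths if n < chunk_size * min_fraction),
    }


if __name__ == "__main__":
    example = ["a" * 1000, "b" * 1200, "c" * 100]
    print(chunk_report(example, chunk_size=1000))
```

Many undersized chunks suggest the separators are too fine for the document, while oversized chunks usually mean overlap is being added on top of a full-size chunk or the size setting is not being applied.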