Indexer Node

The Indexer Node is like a smart librarian that takes long documents and organizes them into searchable sections. It breaks your content into manageable chunks and prepares them for AI to search through and understand.

This is the essential first step for building any AI knowledge base or document search system.

*Illustration: documents being organized and indexed for AI search*

The node takes your documents, intelligently splits them at natural break points (like paragraphs), and converts each chunk into a searchable format so AI can understand it and surface the relevant information.

```mermaid
graph LR
  Document[Long Document] --> Split[Smart Splitting]
  Split --> Index[Create Search Index]
  Index --> Ready[Searchable Chunks]
  style Index fill:#6d28d9,stroke:#fff,color:#fff
```

Setting it up takes four steps:
  1. Provide Your Content: Connect documents, web pages, or any text content you want to make searchable.

  2. Choose Chunk Size: Decide how big each searchable piece should be (1000 characters works well for most content).

  3. Set Overlap: Choose how much chunks should overlap to maintain context between sections.

  4. Select AI Model: Choose an embedding model to convert text into a searchable format (see the sketch below).
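
The node's internals aren't exposed here, but conceptually these four steps amount to a standard split-and-embed pipeline. Below is a minimal sketch of that pipeline, assuming LangChain's RecursiveCharacterTextSplitter and the OpenAI embeddings API; the file name, libraries, and exact calls are illustrative assumptions, not a description of how the node is implemented.

```python
# Minimal split-and-embed sketch. Assumes the langchain-text-splitters and
# openai packages are installed and OPENAI_API_KEY is set.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from openai import OpenAI

document = open("employee_handbook.txt").read()        # 1. Provide your content

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,                  # 2. How big each searchable piece is
    chunk_overlap=200,                # 3. Overlap keeps context between chunks
    separators=["\n\n", "\n", " "],   # Prefer paragraph breaks, then lines, then words
)
chunks = splitter.split_text(document)

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-ada-002",   # 4. Embedding model for search capability
    input=chunks,
)
vectors = [item.embedding for item in response.data]
print(f"Indexed {len(chunks)} chunks as {len(vectors)} vectors")
```

In a real pipeline the vectors would be stored in a vector database alongside the original chunk text, so search results can be traced back to the source passage.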

Let’s create a searchable knowledge base. The right settings depend on what kind of documents you have:

For General Business Docs:

  • Chunk Size: 1000 characters (about 2-3 paragraphs).
  • Overlap: 200 characters (helps keep context between chunks).
  • Best For: Policies, procedures, and FAQs.

For Technical Manuals:

  • Chunk Size: 800 characters (smaller chunks for specific instructions).
  • Overlap: 150 characters.
  • Separators: Split by paragraphs or new lines.

For Research Papers:

  • Chunk Size: 1200 characters (larger chunks to keep complex ideas together).
  • Overlap: 300 characters.
  • Goal: Detailed analysis and understanding.

Here’s a quick reference by content type:

| Content Type | Chunk Size (characters) | Overlap (characters) | Best For |
| --- | --- | --- | --- |
| General Business Docs | 1000 | 200 | Policies, procedures, FAQs |
| Technical Documentation | 800 | 150 | User manuals, API docs |
| Research Papers | 1200 | 300 | Academic content, detailed analysis |
| Customer Support | 600 | 100 | Quick answers, troubleshooting |
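
If you want to apply these recommendations programmatically, they boil down to a small lookup table of presets. The sketch below is illustrative; the preset names and helper function are mine, not settings exposed by the node.

```python
# Hypothetical preset table mirroring the recommendations above
# (chunk sizes and overlaps are in characters).
PRESETS = {
    "general_business": {"chunk_size": 1000, "chunk_overlap": 200},
    "technical_docs":   {"chunk_size": 800,  "chunk_overlap": 150},
    "research_papers":  {"chunk_size": 1200, "chunk_overlap": 300},
    "customer_support": {"chunk_size": 600,  "chunk_overlap": 100},
}

def splitter_settings(content_type: str) -> dict:
    """Return chunking settings for a content type, defaulting to business docs."""
    return PRESETS.get(content_type, PRESETS["general_business"])

print(splitter_settings("technical_docs"))  # {'chunk_size': 800, 'chunk_overlap': 150}
```

The next table summarizes the individual settings these presets control.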

| Setting | Purpose | Recommended Values |
| --- | --- | --- |
| Chunk Size | How big each searchable piece is | 800-1200 characters |
| Chunk Overlap | How much pieces overlap | 150-300 characters |
| Separators | Where to split content | Paragraphs, sentences, sections |
| Embedding Model | AI model for search capability | OpenAI text-embedding-ada-002 |
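
To see what chunk overlap actually buys you, compare the tail of one chunk with the head of the next: the overlapping text appears in both, so an answer that straddles a chunk boundary can still be retrieved. A small sketch, again assuming LangChain's splitter (the numbers are scaled down so the effect is easy to print):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text = "Our refund policy allows returns within 30 days of purchase. " * 40

splitter = RecursiveCharacterTextSplitter(
    chunk_size=300, chunk_overlap=60, separators=[" "]
)
chunks = splitter.split_text(text)

# Because of the overlap, the end of one chunk is repeated at the start of the next.
print(chunks[0][-60:])
print(chunks[1][:60])
```

With zero overlap, a sentence split across two chunks would be incomplete in both, and a question about it might match neither.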

Make company policies instantly searchable:

  • Input: Employee handbook PDF
  • Chunk Size: 1000 characters (good for policy sections)
  • Overlap: 200 characters (maintains context)
  • Result: Searchable HR knowledge base
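
A rough code equivalent, assuming the handbook text is extracted from the PDF with pypdf (the file name and library choice are assumptions for illustration):

```python
from pypdf import PdfReader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Pull raw text out of the handbook PDF (hypothetical file name).
reader = PdfReader("employee_handbook.pdf")
text = "\n\n".join(page.extract_text() or "" for page in reader.pages)

# Policy sections fit comfortably into 1000-character chunks with 200 overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)
print(f"{len(chunks)} policy chunks ready for embedding")
```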

Create searchable API documentation:

  • Input: Technical documentation
  • Chunk Size: 800 characters (shorter chunks for specific instructions)
  • Separators: ["##", "###", "\n\n"] (respects heading structure)
  • Result: Instant technical support system
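
The custom separators tell the splitter to break at Markdown headings first and only fall back to paragraph breaks, so each chunk tends to cover one endpoint or subsection. A sketch of the same idea, with the source file name as a placeholder:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

api_docs = open("api_reference.md").read()  # hypothetical Markdown source

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150,
    separators=["##", "###", "\n\n"],  # prefer heading boundaries over paragraphs
)
chunks = splitter.split_text(api_docs)
print(f"{len(chunks)} documentation chunks")
```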

Build a searchable academic paper collection:

  • Input: Research papers and articles
  • Chunk Size: 1200 characters (longer chunks for academic context)
  • Overlap: 300 characters (important for research continuity)
  • Result: AI-powered research assistant

Common issues and how to fix them:

  • Poor chunk quality: Adjust chunk size and overlap settings, or customize separators for your document type.
  • Slow processing: Reduce document size, process in smaller batches (see the sketch after this list), or use local embedding models.
  • Missing context: Increase chunk overlap to maintain better connections between sections.
  • Memory issues: Process large documents in smaller segments or reduce the maximum number of chunks.
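
For the slow-processing and memory issues, batching is usually enough: embed the chunks a few hundred at a time instead of all at once. A minimal sketch, assuming the OpenAI embeddings API (the batch size and model name are adjustable assumptions):

```python
from openai import OpenAI

def embed_in_batches(chunks, batch_size=100, model="text-embedding-ada-002"):
    """Embed chunks in small batches to limit memory use and request size."""
    client = OpenAI()
    vectors = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        vectors.extend(item.embedding for item in response.data)
    return vectors
```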