
Indexer Node

The Indexer Node takes long documents and breaks them into smart, searchable chunks. Think of it as a librarian that organizes books into sections and creates a detailed catalog, making your documents ready for AI to search through and understand.

**Inputs**

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| inputText | Text | Document content to process | Yes | - |
| embeddingModel | Text | AI model for creating searchable vectors | Yes | - |
| chunkSize | Number | Maximum characters per chunk | Yes | - |
| chunkOverlap | Number | Characters to overlap between chunks | No | 200 |
| separators | Array | How to split text (paragraphs, sentences) | No | ["\n\n"] |

**Outputs**

| Name | Type | Description |
| --- | --- | --- |
| chunks | Array | Smart text chunks with embeddings |
| summary | Object | Processing statistics and info |
| metadata | Object | Document details and timestamps |

📚 Knowledge Base Builder: Turn company docs into searchable AI database

  • Input: Employee handbook, policies, procedures
  • Output: Searchable chunks ready for Q&A system

🔍 Research Assistant: Make academic papers searchable by concept

  • Input: Research papers and articles
  • Output: Organized chunks that AI can search through

💬 Smart Customer Support: Index help docs for instant answers

  • Input: FAQ pages, user manuals, troubleshooting guides
  • Output: Searchable knowledge base for support chatbot
```mermaid
flowchart LR
    A[📄 Long Document] --> B[✂️ Smart Chunking]
    B --> C[🧠 Create Embeddings]
    C --> D[📊 Searchable Chunks]

    style A fill:#e3f2fd
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e8
```

Simple Process:

  1. Smart Splitting: Breaks documents at natural points (paragraphs, sections)
  2. Create Embeddings: Converts text chunks into searchable vectors
  3. Add Metadata: Keeps track of source, position, and relationships
  4. Ready for Search: Chunks are ready for AI knowledge systems
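The four steps above can be sketched in plain Python. This is a minimal illustration under simple assumptions, not the node's actual implementation; `fake_embed` stands in for a real embedding-model call:

```python
# Minimal sketch of the indexing pipeline: split -> embed -> attach metadata.

def split_text(text, chunk_size=1000, overlap=200, separator="\n\n"):
    """Split on the separator, then pack pieces into chunks up to chunk_size,
    carrying `overlap` trailing characters into the next chunk for context."""
    pieces = text.split(separator)
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(separator) + len(piece) > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # keep overlap from the previous chunk
        current = (current + separator + piece) if current else piece
    if current:
        chunks.append(current)
    return chunks

def fake_embed(text):
    # Placeholder: a real embedding model returns a dense vector.
    return [len(text), text.count(" ")]

def index_document(text, source="doc"):
    """Produce chunk records with embeddings and metadata (step 3 and 4)."""
    return [
        {"text": c, "embedding": fake_embed(c),
         "metadata": {"source": source, "position": i}}
        for i, c in enumerate(split_text(text))
    ]
```

With a small chunk size you can see the overlap carried across chunk boundaries, which is what preserves context between neighboring chunks.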

What to Index 📄

  • Input Text: The content you want to make searchable (from documents, web pages, etc.)
  • Embedding Model: The AI model that creates the searchable format (OpenAI recommended)

How to Split Content ✂️

  • Chunk Size: How big each piece should be (1000 characters works well for most content)
  • Chunk Overlap: How much pieces should overlap (200 characters prevents losing context)

Content Organization 🏷️

  • Metadata: Extra information like document title, date, category
  • Separators: Where to split (paragraphs work best for most documents)
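How a separator list is typically applied can be sketched as a recursive fallback: try the first separator, and only when a piece is still larger than the chunk size, fall back to the next one. This illustrates the general technique, not this node's exact algorithm:

```python
# Recursive fallback splitting: paragraphs first, then lines, then words.

def recursive_split(text, separators=("\n\n", "\n", " "), chunk_size=1000):
    """Split text with the first separator; recurse with the remaining
    separators on any piece that is still larger than chunk_size."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    out = []
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            out.extend(recursive_split(piece, rest, chunk_size))
        else:
            out.append(piece)
    return out
```

Because paragraph breaks are tried first, chunks follow the document's natural structure whenever possible, which is why paragraph separators work best for most documents.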

Works in all major browsers:

  • Chrome: Full support with fast processing
  • Firefox: Full support
  • ⚠️ Safari: Limited storage for large documents
  • Edge: Full support

Security and performance:

  • 🔒 API Keys Protected: Embedding keys stored securely
  • 🌐 Secure Processing: Uses encrypted connections for AI services
  • 💾 Local Caching: Processed chunks cached locally for performance
  • Content Validation: Input text validated for security

Example 1: Knowledge Base Builder

What you’ll build: Turn company documents into a searchable AI database

Workflow:

Get All Text From Link → Indexer Node → Local Knowledge → RAG Node

Setup:

  • Chunk Size: 1000 (good for general business docs)
  • Chunk Overlap: 200 (maintains context)
  • Embedding Model: "text-embedding-ada-002"

Result: Smart, searchable knowledge base ready for employee Q&A system.
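Expressed as a configuration object, the setup above might look like the following. The key names mirror the parameter table; the exact schema the tool expects is an assumption:

```python
# Illustrative Indexer Node configuration for general business documents.
indexer_config = {
    "chunkSize": 1000,        # good default for general business docs
    "chunkOverlap": 200,      # maintains context across chunk boundaries
    "embeddingModel": "text-embedding-ada-002",
    "separators": ["\n\n"],   # split on paragraph boundaries
}
```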

Example 2: Research Assistant

What you’ll build: Searchable academic paper database

Workflow:

Upload Documents → Indexer Node → Local Knowledge → Q&A Node

Setup:

  • Chunk Size: 1200 (longer for academic content)
  • Chunk Overlap: 300 (important for research context)
  • Preserve Format: true (keeps citations and formatting)

Result: AI-powered research assistant that can find relevant papers and concepts.

Example 3: Customer Support Knowledge Base


What you’ll build: Smart help documentation system

Workflow:

Get HTML From Link → Indexer Node → Local Knowledge → RAG Node

Setup:

  • Chunk Size: 800 (shorter for quick answers)
  • Separators: ["\n\n", "\n", "Q:", "A:"] (respects Q&A format)

Result: Instant, accurate customer support powered by your documentation.
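To see why the `Q:` separator matters, here is a plain-Python sketch of splitting FAQ text so each question stays with its answer. This is illustrative only, not the node's actual splitter, and the FAQ text is made up:

```python
# Sample FAQ text in the Q:/A: format the separators above are designed for.
faq = (
    "Q: How do I reset my password?\n"
    "A: Open the account page and choose Reset.\n"
    "Q: How do I contact support?\n"
    "A: Use the contact form in the help center.\n"
)

# Splitting on "Q:" keeps each question/answer pair together in one chunk,
# so a retrieval hit returns the answer along with its question.
pairs = ["Q:" + part.strip() for part in faq.split("Q:") if part.strip()]
```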

🔍 Advanced Example: Multi-Language Documents

What you’ll build: Knowledge base supporting multiple languages

Setup:

  • Use multilingual embedding models
  • Separate chunk processing by language
  • Maintain language metadata for each chunk

Use case: International company with documentation in multiple languages.
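Separating chunk processing by language can be sketched as grouping chunks by a detected language code before embedding, so each group can go to an appropriate model. Here `detect` is a placeholder for a real language detector, which you would supply:

```python
from collections import defaultdict

def group_by_language(chunks, detect):
    """Group chunks by detected language and attach the language code as
    metadata. `detect` is a placeholder for a real language detector."""
    groups = defaultdict(list)
    for chunk in chunks:
        lang = detect(chunk)
        groups[lang].append({"text": chunk, "metadata": {"language": lang}})
    return dict(groups)
```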

Best practices:

  • Test chunk sizes: Start with 1000 characters, adjust based on your content
  • Use appropriate overlap: 150-200 characters maintains good context
  • Choose smart separators: Respect natural document structure (paragraphs, sections)
  • Add meaningful metadata: Include source, category, date for better organization
  • Process in batches: Break very large documents into manageable pieces

Common mistakes to avoid:

  • Making chunks too small (loses context) or too large (reduces precision)
  • Using zero overlap (can break important connections between chunks)
  • Ignoring document structure (splits sentences or concepts awkwardly)
  • Processing without metadata (makes it hard to track sources later)

Problem: Too many embedding API requests
Solution: Reduce batch size, add delays between requests, or upgrade your API plan

Problem: Chunks cut off mid-sentence or lose context
Solution: Adjust chunk size and overlap, and customize separators for your document type

Problem: Browser crashes with very large documents
Solution: Process documents in smaller segments, or reduce the max chunks limit

Problem: Indexing takes too long
Solution: Use smaller documents, reduce chunk overlap, or try local embedding models
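For the rate-limit problem, a common mitigation is embedding in small batches with a pause between requests. A minimal sketch; `embed_batch` is a placeholder for your embedding API call:

```python
import time

def embed_in_batches(chunks, embed_batch, batch_size=20, delay=1.0):
    """Embed chunks in small batches, pausing between requests to stay
    under API rate limits. `embed_batch` is a placeholder callable that
    takes a list of chunks and returns a list of vectors."""
    vectors = []
    for i in range(0, len(chunks), batch_size):
        vectors.extend(embed_batch(chunks[i:i + batch_size]))
        if i + batch_size < len(chunks):
            time.sleep(delay)  # back off before the next request
    return vectors
```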

Limits:

  • Document Size: Maximum 100MB per document for browser processing
  • Processing Time: Large documents may take several minutes to process
  • API Costs: Cloud embedding services charge per chunk processed
  • Memory Usage: Large documents require significant browser memory

Works with:

  • Local Knowledge: Stores the processed chunks for searching
  • RAG Node: Uses indexed chunks for intelligent question-answering
  • Recursive Character Text Splitter: Alternative for simpler text splitting
  • Ollama Embeddings: Creates the searchable vectors from text

Indexer Node is the first step in:

  • Building smart knowledge bases
  • Creating searchable document collections
  • Enabling AI-powered Q&A systems

Related nodes: Local Knowledge • RAG Node • Q&A Node

Common workflows: AI Knowledge Base • Document Search • Smart Research

Learn more: AI Workflow Builder • [Understanding RAG](/advanced-ai/basics/rag-in-Agentic WorkFlow/) • Vector Databases


💡 Pro Tip: Start with standard settings (1000 character chunks, 200 overlap) and adjust based on your specific content type. Technical docs might need smaller chunks, while narrative content can handle larger ones.