
# RAG in Agentic WorkFlow

Retrieval-Augmented Generation (RAG) is a technique that improves AI responses by combining language models with external data sources. Instead of relying solely on knowledge learned during training, RAG systems retrieve relevant documents to ground responses in up-to-date, domain-specific, or proprietary knowledge. RAG workflows typically rely on vector stores to manage and search this external data efficiently.

```mermaid
sequenceDiagram
    participant User
    participant RAG_System as RAG System
    participant Vector_Store as Vector Store
    participant LLM as Language Model
    participant Knowledge_Base as Knowledge Base

    Note over User, Knowledge_Base: RAG Workflow Process

    User->>RAG_System: Submit Query
    RAG_System->>Vector_Store: Search for Relevant Documents
    Vector_Store->>Knowledge_Base: Retrieve Document Chunks
    Knowledge_Base-->>Vector_Store: Return Matching Content
    Vector_Store-->>RAG_System: Provide Relevant Context
    RAG_System->>LLM: Send Query + Retrieved Context
    LLM-->>RAG_System: Generate Contextual Response
    RAG_System-->>User: Return Enhanced Answer
```

A vector store is a specialized database designed to store and search high-dimensional vectors: numerical representations of text, images, or other data. When you upload a document, the vector store splits it into chunks and converts each chunk into a vector using an embedding model.

You can query these vectors using similarity searches, which rank results by semantic meaning rather than by keyword matches. This makes vector stores a powerful foundation for RAG and other AI systems that need to retrieve and reason over large sets of knowledge.
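
To make the insert-and-search cycle concrete, here is a deliberately tiny, self-contained sketch. The `embed` function is a toy stand-in for a real embedding model, and the linear scan stands in for the approximate nearest-neighbour indexes real vector stores use; none of this reflects any particular product's API.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: hashes character
    # trigrams into a small fixed-size vector. Real models produce
    # dense vectors with hundreds or thousands of dimensions.
    vec = [0.0] * 64
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % 64] += 1.0
    return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    def __init__(self) -> None:
        self.records: list[tuple[list[float], str]] = []

    def insert(self, chunk: str) -> None:
        # Embed each chunk once at insert time and store the pair.
        self.records.append((embed(chunk), chunk))

    def search(self, query: str, limit: int = 3) -> list[str]:
        # Linear scan ranked by cosine similarity; real stores use
        # approximate nearest-neighbour indexes to stay fast at scale.
        query_vec = embed(query)
        ranked = sorted(self.records,
                        key=lambda rec: cosine_similarity(query_vec, rec[0]),
                        reverse=True)
        return [chunk for _, chunk in ranked[:limit]]
```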

/// note | Start with a RAG template
👉 Try out RAG in Agentic WorkFlow with the [RAG Starter Template](https://Agentic WorkFlow/workflows/5010-rag-starter-template-using-simple-vector-stores-form-trigger-and-openai). The template includes two ready-made workflows: one for uploading files and one for querying them.
///

```mermaid
flowchart TD
    A[Source Data] --> B[Data Loader Node]
    B --> C[Text Splitter]
    C --> D[Embedding Model]
    D --> E[Vector Store - Insert]

    F[User Query] --> G[Vector Store - Search]
    G --> H[Retrieved Chunks]
    H --> I{Use Agent or Direct Query?}

    I -->|Agent| J[Agent Node with Vector Store Tool]
    I -->|Direct| K[Vector Store - Get Many]

    J --> L[LLM with Context]
    K --> L
    L --> M[Enhanced Response]

    style A fill:#e1f5fe
    style F fill:#e8f5e8
    style M fill:#fff3e0
```

Before your agent can access custom knowledge, you need to upload that data to a vector store:

  1. Add the nodes needed to fetch your source data.
  2. Insert a Vector Store node (for example, the Simple Vector Store) and choose the Insert Documents operation.
  3. Select an embedding model, which converts your text into vector embeddings. Consult the FAQ for more information on choosing the right embedding model.
  4. Add a Default Data Loader node, which splits your content into chunks (see the chunking sketch after this list). You can use the default settings or define your own chunking strategy:
    • Character Text Splitter: splits by character length.
    • Recursive Character Text Splitter: recursively splits by Markdown, HTML, code blocks or simple characters (recommended for most use cases).
    • Token Text Splitter: splits by token count.
  5. (Optional) Add metadata to each chunk to enrich the context and allow better filtering later.
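
The sketch below shows roughly what a character-based splitter with overlap does. It is illustrative only: the real splitter nodes also handle separators, Markdown structure, and token counting.

```python
def split_by_characters(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window over the text, stepping forward by
    # (chunk_size - overlap) so neighbouring chunks share `overlap`
    # characters and context survives the cut. Assumes overlap < chunk_size.
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```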

You can query the data in two main ways: using an agent or directly through a node.

To query the data with an agent:

  1. Add an agent to your workflow.
  2. Add the vector store as a tool and give it a description to help the agent understand when to use it (see the example after this list):
    • Set the limit to define how many chunks to return.
    • Enable Include Metadata to provide extra context for each chunk.
  3. Add the same embedding model you used when inserting the data.
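
For illustration, here is what such a tool configuration might look like. The field names below are hypothetical and do not follow any specific SDK; in Agentic WorkFlow you set these options in the node's UI.

```python
# Hypothetical tool configuration; field names are illustrative only.
vector_store_tool = {
    "name": "company_docs",
    # The description is what the agent reads to decide *when* to call
    # the tool, so make it specific to the knowledge it contains.
    "description": (
        "Searches internal company documentation. Use this for questions "
        "about company policies, products, or internal procedures."
    ),
    "limit": 4,                # how many chunks each search returns
    "include_metadata": True,  # attach source details to every chunk
}
```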

/// tip | Pro tip
To save tokens on an expensive model, you can first use the Vector Store Question Answer tool to retrieve relevant data, and only then pass the result to the agent. To see this in action, check out [this template](https://Agentic WorkFlow/workflows/5011-save-costs-in-rag-workflows-using-the-qanda-tool-with-multiple-models).
///

To query the data directly:

  1. Add your vector store node to the canvas and choose the Get Many operation.
  2. Enter a query or prompt:
    • Set a limit for how many chunks to return.
    • Enable Include Metadata if needed.
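
In terms of the `ToyVectorStore` sketched earlier on this page, a Get Many style query boils down to a plain top-k similarity search. The snippet below is a usage example under that assumption, not the node's actual implementation.

```python
store = ToyVectorStore()  # defined in the earlier sketch
store.insert("Refunds are processed within 14 days of the return.")
store.insert("EU shipping takes 3 to 5 business days.")
store.insert("Support is available Monday to Friday, 9am to 5pm.")

# "Get Many": return the two chunks most similar to the query.
for chunk in store.search("How long do refunds take?", limit=2):
    print(chunk)
```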

## How do I choose the right embedding model?


There is no single best embedding model; the right choice depends on your use case.

In general, smaller models (for example, text-embedding-ada-002) are faster and cheaper, which makes them ideal for short, general-purpose documents or lightweight RAG workflows. Larger models (for example, text-embedding-3-large) offer better semantic understanding and are best for long documents, complex topics, or cases where accuracy is critical.

## What is the best text splitting strategy for my use case?


This again depends a lot on your data:

  • Small chunks (for example, 200 to 500 tokens) are good for fine-grained retrieval.
  • Large chunks may carry more context but can become diluted or noisy.

Choosing the right overlap size is important so the model can understand the context of each chunk. For the same reason, splitting along Markdown or code-block boundaries often produces better chunks.

Another good approach is to add more context to each chunk (for example, about the document it came from). If you want to read more about this, check out this great article from Anthropic.
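
As a minimal sketch of that idea, you can prepend a short provenance line to each chunk before embedding it. The helper name and format below are made up for illustration.

```python
def contextualize_chunk(chunk: str, doc_title: str, section: str) -> str:
    # Prepending where the chunk came from gives the embedding model
    # extra signal, so isolated chunks stay interpretable at query time.
    return f"From '{doc_title}', section '{section}':\n{chunk}"
```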