How to build RAG AI workflows on n8n
Build RAG AI workflows on n8n by combining document processing nodes, vector database operations, and LLM nodes to create intelligent question-answering systems. Connect data ingestion, embedding generation, vector search, and response generation in a seamless automated workflow.
Prerequisites
- Basic n8n workflow experience
- OpenAI API key or similar LLM access
- Vector database setup (Pinecone, Weaviate, or Qdrant)
- Understanding of embeddings and vector search concepts
Step-by-Step Instructions
Set up your n8n workflow foundation
Add a Webhook node as the trigger for incoming questions. Set the path to /rag-query and enable the Return Response option to send results back to the caller.
Add document processing and chunking
Use a Code node to split incoming documents into fixed-size chunks:
```javascript
// Split the incoming document into fixed-size chunks
const text = $input.first().json.document;
const chunkSize = 1000;
const chunks = [];

for (let i = 0; i < text.length; i += chunkSize) {
  chunks.push({
    chunk: text.slice(i, i + chunkSize),
    index: Math.floor(i / chunkSize)
  });
}

return chunks.map(chunk => ({ json: chunk }));
```
Generate embeddings for your content
Add an OpenAI node with the Embeddings operation and select the text-embedding-3-small model for cost efficiency. Map the chunk text from the previous node to the Input Text field. Enable Execute Once for All Items to process multiple chunks efficiently.
Store embeddings in vector database
Configure a Pinecone node (or your chosen vector store) with the Upsert operation and map the following fields:
- ID: a unique identifier, for example {{$json.index}}-{{Date.now()}}
- Values: the embedding array from the OpenAI node
- Metadata: the original chunk text plus any other relevant fields
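If you assemble the upsert payload yourself in a Code node, the record shape looks like this. The field names (id, values, metadata) match Pinecone's upsert API; the input shapes below are assumptions about how the earlier chunking and embedding nodes emit their data.

```javascript
// Pair each chunk with its embedding and build Pinecone upsert records.
// Input shapes are illustrative: chunks come from the chunking Code node,
// embeddings from the OpenAI Embeddings node, in the same order.
function buildUpsertRecords(chunks, embeddings) {
  return chunks.map((chunk, i) => ({
    id: `${chunk.index}-${Date.now()}`,   // unique per chunk per run
    values: embeddings[i],                // the embedding array from OpenAI
    metadata: { text: chunk.chunk, index: chunk.index }
  }));
}

// Example input shaped like the chunking node's output
const chunks = [{ chunk: 'n8n is a workflow tool', index: 0 }];
const embeddings = [[0.12, -0.03, 0.44]]; // stand-in for a real embedding
const records = buildUpsertRecords(chunks, embeddings);
```

Storing the original text in metadata is what lets the later merge step rebuild context without a second lookup.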
Implement query processing and vector search
Embed the incoming user query with the same text-embedding-3-small model used for indexing, then pass the resulting vector to a Pinecone Query node. Keep topK modest (for example 3-5) so only the most relevant chunks are retrieved.
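Pinecone computes similarity server-side, so you never write this yourself, but it helps to know what the query node is ranking by. A minimal illustration of cosine similarity, the score most vector indexes use:

```javascript
// Illustration only: cosine similarity between a query vector and
// document vectors. Pinecone performs this ranking for you.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const queryVec = [1, 0];
const docVecs = [[1, 0], [0, 1], [0.6, 0.8]];
const ranked = docVecs
  .map((v, i) => ({ index: i, score: cosineSimilarity(queryVec, v) }))
  .sort((x, y) => y.score - x.score);
// The identical vector ranks first; the orthogonal one last
```

This is also why the same embedding model must be used at index and query time: vectors from different models live in different spaces, and the scores become meaningless.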
Merge retrieved context with user query
```javascript
// Combine the retrieved chunks into one context block for the prompt
const searchResults = $input.all();
const context = searchResults
  .map(item => item.json.metadata.text)
  .join('\n\n');

const userQuery = $('Webhook').first().json.query;

return [{
  json: {
    context: context,
    query: userQuery,
    prompt: `Context: ${context}\n\nQuestion: ${userQuery}\n\nAnswer:`
  }
}];
```
Generate AI response using retrieved context
Add an OpenAI Chat node using gpt-4 or gpt-3.5-turbo. Create a system message such as: You are a helpful assistant. Answer questions based only on the provided context. Use the constructed prompt from the previous step as the user message, and set temperature to 0.1 for more consistent responses.
Format and return the final response
```javascript
// Shape the final webhook response: the answer plus its source metadata
const aiResponse = $input.first().json.choices[0].message.content;
const sources = $('Pinecone Query').all().map(item => item.json.metadata);

return [{
  json: {
    answer: aiResponse,
    sources: sources,
    timestamp: new Date().toISOString()
  }
}];
```
Common Issues & Troubleshooting
Vector search returns irrelevant results
Check embedding model consistency: you must use the same model for both indexing and querying. Also verify your chunk size isn't too large, and consider adding more specific metadata filtering in your Pinecone Query node.
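Fixed-size chunking can also split a relevant sentence across two chunks so neither retrieves well. A common mitigation is overlapping chunks; a sketch of the earlier chunking logic with overlap (the 200-character overlap is an assumption to tune for your documents):

```javascript
// Chunking with overlap: consecutive chunks share `overlap` characters,
// so text split at one boundary still appears whole in a neighbor.
function chunkWithOverlap(text, chunkSize = 1000, overlap = 200) {
  const step = chunkSize - overlap;
  const chunks = [];
  for (let i = 0; i < text.length; i += step) {
    chunks.push({ chunk: text.slice(i, i + chunkSize), index: chunks.length });
    if (i + chunkSize >= text.length) break; // last slice reached the end
  }
  return chunks;
}

const chunks = chunkWithOverlap('a'.repeat(2500), 1000, 200);
// Produces slices starting at 0, 800, and 1600
```

The trade-off is more chunks to embed and store, so balance overlap size against the embedding cost concerns below.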
Workflow timeouts with large documents
Enable Execute Once for All Items in your OpenAI embedding nodes and consider implementing batch processing using the Split Into Batches node to handle large document sets in smaller chunks.
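If you prefer to batch inside a Code node rather than with Split Into Batches, the grouping itself is simple. A sketch, with the batch size of 100 as an assumed starting point:

```javascript
// Group items into fixed-size batches so each downstream call
// (e.g. one embedding request) handles several inputs at once.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const items = Array.from({ length: 250 }, (_, i) => ({ index: i }));
const batches = toBatches(items, 100);
// 250 items become batches of 100, 100, and 50
```

Smaller batches keep individual executions under the timeout; larger ones reduce per-request overhead.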
AI responses are inconsistent or hallucinated
Lower the temperature setting to 0.1 in your OpenAI Chat node and strengthen your system prompt to emphasize answering only from provided context. Add explicit instructions to respond with 'Information not available' when context is insufficient.
High API costs from OpenAI calls
Use text-embedding-3-small instead of larger models for embeddings, implement caching for repeated queries using n8n's Redis node, and set reasonable limits on context length and search result counts.
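For caching, in n8n you would typically do a Redis GET before running the pipeline and a SET after it. This in-memory sketch just shows the key scheme and the hit/miss flow; the function names are illustrative, not part of any n8n API:

```javascript
// Query-level cache sketch. Normalizing the key means trivially different
// phrasings of the same query hit the same entry.
const cache = new Map();

function normalizeKey(query) {
  return query.trim().toLowerCase();
}

function withCache(query, computeAnswer) {
  const key = normalizeKey(query);
  if (cache.has(key)) {
    return { answer: cache.get(key), cached: true }; // cache hit: skip the pipeline
  }
  const answer = computeAnswer(query); // cache miss: run the full RAG pipeline
  cache.set(key, answer);
  return { answer, cached: false };
}

const first = withCache('What is n8n?', q => 'an automation tool');
const second = withCache('  what is n8n? ', q => 'recomputed');
// second is served from the cache despite the different whitespace and case
```

With Redis, add a TTL on each SET so cached answers expire as the underlying documents change.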