How to build RAG AI workflows on n8n
Build RAG AI workflows on n8n by combining document processing nodes, vector database operations, and LLM nodes to create intelligent question-answering systems. Connect data ingestion, embedding generation, vector search, and response generation in a seamless automated workflow.
Prerequisites
- Basic n8n workflow experience
- OpenAI API key or similar LLM access
- Vector database setup (Pinecone, Weaviate, or Qdrant)
- Understanding of embeddings and vector search concepts
Step-by-Step Instructions
Set up your n8n workflow foundation
Add a Webhook node as the trigger for incoming questions. Set the path to /rag-query and enable the Return Response option to send results back to the caller.
Add document processing and chunking
Use a Code node to split incoming documents into fixed-size chunks:
```javascript
// Split the incoming document into fixed-size chunks
const text = $input.first().json.document;
const chunkSize = 1000;
const chunks = [];

for (let i = 0; i < text.length; i += chunkSize) {
  chunks.push({
    chunk: text.slice(i, i + chunkSize),
    index: Math.floor(i / chunkSize)
  });
}

return chunks.map(chunk => ({ json: chunk }));
```
Generate embeddings for your content
Add an OpenAI node with the Embeddings operation and select the text-embedding-3-small model for cost efficiency. Map the chunk text from the previous node to the Input Text field. Enable Execute Once for All Items to process multiple chunks efficiently.
Store embeddings in vector database
Configure a Pinecone node (or your chosen vector store) with the Upsert operation and map the following fields:
- ID: a unique identifier, for example {{$json.index}}-{{Date.now()}}
- Values: the embedding array from the OpenAI node
- Metadata: the original chunk text plus any other relevant fields
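If you assemble the upsert payload yourself in a Code node, the record shape looks like this. The field names (id, values, metadata) match Pinecone's upsert API; the input shapes below are assumptions about how the earlier chunking and embedding nodes emit their data.

```javascript
// Pair each chunk with its embedding and build Pinecone upsert records.
// Input shapes are illustrative: chunks come from the chunking Code node,
// embeddings from the OpenAI Embeddings node, in the same order.
function buildUpsertRecords(chunks, embeddings) {
  return chunks.map((chunk, i) => ({
    id: `${chunk.index}-${Date.now()}`,   // unique per chunk per run
    values: embeddings[i],                // the embedding array from OpenAI
    metadata: { text: chunk.chunk, index: chunk.index }
  }));
}

// Example input shaped like the chunking node's output
const chunks = [{ chunk: 'n8n is a workflow tool', index: 0 }];
const embeddings = [[0.12, -0.03, 0.44]]; // stand-in for a real embedding
const records = buildUpsertRecords(chunks, embeddings);
```

Storing the original text in metadata is what lets the later merge step rebuild context without a second lookup.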
Implement query processing and vector search
Embed the incoming user query with the same text-embedding-3-small model used for indexing, then pass the resulting vector to a Pinecone Query node. Keep topK modest (for example 3-5) so only the most relevant chunks are retrieved.
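Pinecone computes similarity server-side, so you never write this yourself, but it helps to know what the query node is ranking by. A minimal illustration of cosine similarity, the score most vector indexes use:

```javascript
// Illustration only: cosine similarity between a query vector and
// document vectors. Pinecone performs this ranking for you.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const queryVec = [1, 0];
const docVecs = [[1, 0], [0, 1], [0.6, 0.8]];
const ranked = docVecs
  .map((v, i) => ({ index: i, score: cosineSimilarity(queryVec, v) }))
  .sort((x, y) => y.score - x.score);
// The identical vector ranks first; the orthogonal one last
```

This is also why the same embedding model must be used at index and query time: vectors from different models live in different spaces, and the scores become meaningless.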
Merge retrieved context with user query
```javascript
// Combine the retrieved chunks into one context block for the prompt
const searchResults = $input.all();
const context = searchResults
  .map(item => item.json.metadata.text)
  .join('\n\n');

const userQuery = $('Webhook').first().json.query;

return [{
  json: {
    context: context,
    query: userQuery,
    prompt: `Context: ${context}\n\nQuestion: ${userQuery}\n\nAnswer:`
  }
}];
```
Generate AI response using retrieved context
Add an OpenAI Chat node using gpt-4 or gpt-3.5-turbo. Create a system message such as: You are a helpful assistant. Answer questions based only on the provided context. Use the constructed prompt from the previous step as the user message, and set temperature to 0.1 for more consistent responses.
Format and return the final response
```javascript
// Shape the final webhook response: the answer plus its source metadata
const aiResponse = $input.first().json.choices[0].message.content;
const sources = $('Pinecone Query').all().map(item => item.json.metadata);

return [{
  json: {
    answer: aiResponse,
    sources: sources,
    timestamp: new Date().toISOString()
  }
}];
```
Common Issues & Troubleshooting
Vector search returns irrelevant results
Check embedding model consistency: you must use the same model for both indexing and querying. Also verify your chunk size isn't too large, and consider adding more specific metadata filtering in your Pinecone Query node.
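Fixed-size chunking can also split a relevant sentence across two chunks so neither retrieves well. A common mitigation is overlapping chunks; a sketch of the earlier chunking logic with overlap (the 200-character overlap is an assumption to tune for your documents):

```javascript
// Chunking with overlap: consecutive chunks share `overlap` characters,
// so text split at one boundary still appears whole in a neighbor.
function chunkWithOverlap(text, chunkSize = 1000, overlap = 200) {
  const step = chunkSize - overlap;
  const chunks = [];
  for (let i = 0; i < text.length; i += step) {
    chunks.push({ chunk: text.slice(i, i + chunkSize), index: chunks.length });
    if (i + chunkSize >= text.length) break; // last slice reached the end
  }
  return chunks;
}

const chunks = chunkWithOverlap('a'.repeat(2500), 1000, 200);
// Produces slices starting at 0, 800, and 1600
```

The trade-off is more chunks to embed and store, so balance overlap size against the embedding cost concerns below.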
Workflow timeouts with large documents
Enable Execute Once for All Items in your OpenAI embedding nodes and consider implementing batch processing using the Split Into Batches node to handle large document sets in smaller chunks.
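If you prefer to batch inside a Code node rather than with Split Into Batches, the grouping itself is simple. A sketch, with the batch size of 100 as an assumed starting point:

```javascript
// Group items into fixed-size batches so each downstream call
// (e.g. one embedding request) handles several inputs at once.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const items = Array.from({ length: 250 }, (_, i) => ({ index: i }));
const batches = toBatches(items, 100);
// 250 items become batches of 100, 100, and 50
```

Smaller batches keep individual executions under the timeout; larger ones reduce per-request overhead.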
AI responses are inconsistent or hallucinated
Lower the temperature setting to 0.1 in your OpenAI Chat node and strengthen your system prompt to emphasize answering only from provided context. Add explicit instructions to respond with 'Information not available' when context is insufficient.
High API costs from OpenAI calls
Use text-embedding-3-small instead of larger models for embeddings, implement caching for repeated queries using n8n's Redis node, and set reasonable limits on context length and search result counts.
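For caching, in n8n you would typically do a Redis GET before running the pipeline and a SET after it. This in-memory sketch just shows the key scheme and the hit/miss flow; the function names are illustrative, not part of any n8n API:

```javascript
// Query-level cache sketch. Normalizing the key means trivially different
// phrasings of the same query hit the same entry.
const cache = new Map();

function normalizeKey(query) {
  return query.trim().toLowerCase();
}

function withCache(query, computeAnswer) {
  const key = normalizeKey(query);
  if (cache.has(key)) {
    return { answer: cache.get(key), cached: true }; // cache hit: skip the pipeline
  }
  const answer = computeAnswer(query); // cache miss: run the full RAG pipeline
  cache.set(key, answer);
  return { answer, cached: false };
}

const first = withCache('What is n8n?', q => 'an automation tool');
const second = withCache('  what is n8n? ', q => 'recomputed');
// second is served from the cache despite the different whitespace and case
```

With Redis, add a TTL on each SET so cached answers expire as the underlying documents change.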