Document RAG (Retrieval-Augmented Generation)
This example demonstrates how to build a classic RAG system using Synalinks. RAG combines document retrieval with language model generation to answer questions based on your own documents.
How RAG Works
graph LR
    subgraph Indexing
        A[Documents] --> B[Embeddings]
        B --> C[(KnowledgeBase)]
    end
    subgraph Query Time
        D[Question] --> E[RetrieveKnowledge]
        C --> E
        E --> F[Relevant Docs]
        F --> G[Generator]
        G --> H[Answer]
    end
- Index: Store documents in a knowledge base with embeddings
- Retrieve: When a question is asked, find relevant documents
- Generate: Use the retrieved context to generate an accurate answer
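The three steps above can be sketched in plain Python without any library. This is only an illustration of the flow, not how Synalinks implements it: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `documents`, `retrieve`, and `build_prompt` are hypothetical names introduced for this sketch.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Index: store each document alongside its embedding.
documents = {
    "doc1": "duckdb is an embedded analytical database",
    "doc2": "retrieval augmented generation grounds answers in documents",
    "doc3": "bananas are rich in potassium",
}
index = {doc_id: embed(text) for doc_id, text in documents.items()}


def retrieve(question: str, k: int = 2) -> list[str]:
    """Retrieve: return the ids of the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda doc_id: cosine(q, index[doc_id]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, doc_ids: list[str]) -> str:
    """Generate (input side): assemble the context a language model would receive."""
    context = "\n".join(f"- {documents[d]}" for d in doc_ids)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "what is retrieval augmented generation"
top = retrieve(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the prompt would be sent to a language model, which produces the final answer grounded in the retrieved text.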
Creating a Document Store
import synalinks


class Document(synalinks.DataModel):
    """A document stored in the knowledge base."""

    id: str = synalinks.Field(description="Document ID")
    title: str = synalinks.Field(description="Document title")
    content: str = synalinks.Field(description="Document content")


# `embedding_model` is assumed to have been created earlier in the example.
knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://./documents.db",
    data_models=[Document],
    embedding_model=embedding_model,  # For semantic search
)
Building the RAG Pipeline
# `Query` and `Answer` are synalinks.DataModel subclasses.
# The `await` calls below must run inside an async function.
inputs = synalinks.Input(data_model=Query)

# Retrieve relevant documents
retrieved = await synalinks.RetrieveKnowledge(
    knowledge_base=knowledge_base,
    language_model=language_model,
    search_type="hybrid",
    k=3,
)(inputs)

# Generate answer from retrieved context
answer = await synalinks.Generator(
    data_model=Answer,
    language_model=language_model,
    instructions="Answer based on the retrieved documents.",
)(retrieved)
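The `search_type="hybrid"` setting asks for keyword and semantic results to be combined. This page does not say how Synalinks merges the two result lists, so the sketch below shows one widely used technique, reciprocal rank fusion, purely as an illustration. The function name, the `rrf_k` constant, and the document ids are all assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], rrf_k: int = 60) -> list[str]:
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(1 / (rrf_k + rank)) over the lists it appears in,
    with rank starting at 1. rrf_k = 60 is a commonly used default.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Sort by score descending; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 order
vector_ranking = ["doc_b", "doc_d", "doc_a"]   # hypothetical embedding order

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused[:3])  # the top 3 fused ids, analogous to k=3 above
```

Documents that rank well in both lists rise to the top, while a document found by only one method can still appear lower in the fused list.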
Key Takeaways
- Hybrid Search: Combines keyword (BM25) and semantic (vector) search for better retrieval accuracy.
- Chunking: Split large documents into smaller chunks so that retrieval can return the specific passage that answers a question rather than an entire document.
- Context Window: Retrieved documents are passed as context to the LM for grounded generation.
- Trainable: The retrieval and generation modules can be optimized using Synalinks training.
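As an illustration of the chunking takeaway, here is a minimal sketch of word-based chunking with overlap. It is independent of Synalinks; the `chunk_words` helper and its default sizes are assumptions chosen for the example, and real pipelines often split on sentences or tokens instead.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk could then be stored as its own record, for example one `Document` per chunk with an id derived from the parent document, though that layout is a design choice rather than something this example prescribes.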
Program Visualization
API References
Answer
Bases: DataModel
An answer generated from retrieved documents.
Source code in examples/13_document_rag.py
Document
Bases: DataModel
A document stored in the knowledge base.
