Vector RAG Agent
Guide 6 showed how to store documents in a
KnowledgeBase and query them with similarity / fulltext / hybrid
search. Guide 5 introduced the agent loop. This guide combines
the two: a Vector RAG Agent is an agent whose tool set is
pre-wired for retrieval, so the agent itself decides whether to
search, what table to search, and how to phrase the query.
Why Make Retrieval Agentic?
A textbook RAG pipeline is a fixed three-step program:
- Embed the user's question.
- Run a single similarity search.
- Stuff the top-k results into a final prompt.
That works when retrieval is always needed, the user's question is already the right search phrasing, and a single round of results is enough. It struggles when:
- The user asks something the model already knows ("what day is today?") — the retrieval step is pure overhead.
- The user's phrasing doesn't match the corpus phrasing ("PTO policy" vs "vacation days") — one search misses; a reformulation might land.
- One search is enough for some questions; others need several, possibly against different tables.
An agent moves those decisions inside the loop. The model sees the question, decides whether to retrieve, picks the table, writes a query in its own words, reads the results, and may search again before answering.
```mermaid
graph TD
    A["User question"] --> B["VectorRAGAgent"]
    B --> C{"Retrieval needed?"}
    C -->|"no"| H["Final answer"]
    C -->|"yes"| D["search_knowledge_base"]
    D --> E["Top-k hits"]
    E --> F{"Enough info?"}
    F -->|"yes"| H
    F -->|"no, refine"| D
    F -->|"need a record's full body"| G["get_record_by_id"]
    G --> H
```
The Three Tools
synalinks.VectorRAGAgent wraps a FunctionCallingAgent and
pre-wires three retrieval tools bound to a KnowledgeBase:
| Tool | Purpose |
|---|---|
| `get_knowledge_base_schema()` | List every table with its columns and descriptions. Used once to learn what's available. |
| `search_knowledge_base(table_name, query)` | Run the configured search (similarity / fulltext / hybrid_fts) against one table and return up to k hits. k is fixed per-agent at construction; the LM doesn't choose it per-call. |
| `get_record_by_id(table_name, record_id)` | After a search returns ids, fetch a full record. Useful when search snippets truncate long fields. |
The three tools cover the natural shape of retrieval: what tables exist → search one of them → optionally read a record in full.
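That contract is easy to sketch with plain-Python stand-ins. The function names below match the built-ins, but the dict-backed "knowledge base", the word-overlap ranking, and the return shapes are assumptions made for illustration, not synalinks internals:

```python
# Illustrative stand-ins for the three built-in retrieval tools.
# The dict-backed store and return shapes are assumptions for this sketch.
TABLES = {
    "Document": {
        "columns": {"id": "Document id", "title": "Document title", "content": "Body text"},
        "rows": [
            {"id": "policy-pto", "title": "PTO Policy",
             "content": "Full-time employees receive 20 days of PTO per year."},
        ],
    },
}

def get_knowledge_base_schema():
    """List every table with its columns and descriptions."""
    return {name: table["columns"] for name, table in TABLES.items()}

def search_knowledge_base(table_name, query, k=5):
    """Toy keyword 'search': rank rows by word overlap with the query."""
    words = set(query.lower().split())
    rows = TABLES[table_name]["rows"]
    scored = sorted(
        rows,
        key=lambda r: len(words & set(r["content"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def get_record_by_id(table_name, record_id):
    """Fetch one full record after a search surfaced its id."""
    return next(r for r in TABLES[table_name]["rows"] if r["id"] == record_id)
```

An agent's trajectory then reads naturally: call `get_knowledge_base_schema()` once, `search_knowledge_base("Document", "PTO days per year")` to find candidates, and `get_record_by_id("Document", "policy-pto")` only if the snippet wasn't enough.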
Picking a Search Mode
search_type is a per-agent setting (the LM doesn't choose). It
shapes both the dispatch behaviour and the default instructions the
LM receives (so it phrases queries the right way).
| `search_type` | What it does | Use when |
|---|---|---|
| `"similarity"` | Pure vector search over embeddings. Requires an embedding model. | The corpus and the question use different words for the same concept (paraphrase-heavy). |
| `"fulltext"` | BM25 keyword search. No embedding model needed. | The words in the question are likely to appear verbatim in the documents (technical terms, named entities). |
| `"hybrid_fts"` (default) | Vector + BM25 fused with Reciprocal Rank Fusion (RRF). Requires an embedding model. | You don't want to choose — RRF combines both signals and is the safest default. |
For each mode, the agent's default instructions tell the LM how to phrase its queries: natural-language paraphrases for similarity, keyword-rich strings for fulltext, both for hybrid.
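Reciprocal Rank Fusion itself is simple enough to sketch: each document's fused score is the sum of 1 / (c + rank) over the ranked lists it appears in, where c = 60 is the constant from the original RRF paper (whether synalinks uses exactly this constant is an assumption):

```python
def rrf_fuse(ranked_lists, c=60):
    """Fuse several ranked id lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (c + rank) for every id it contains
    (rank is 1-based), so an id near the top of any list scores well.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked #1 by vectors and #2 by BM25 beats one ranked #3 and #1.
vector_hits = ["policy-pto", "policy-remote", "pricing-enterprise"]
bm25_hits = ["pricing-enterprise", "policy-pto"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Because ranks, not raw scores, drive the fusion, RRF needs no calibration between the vector and BM25 score scales, which is why it makes a safe default.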
Output Format
Search results are returned as CSV by default (output_format="csv"),
which is far more token-efficient than JSON for tabular hits — key
names appear once in the header instead of being repeated per row,
and modern LMs read CSV reliably. Switch to "json" if you need
list-of-dicts results for downstream code, but for purely LM-facing
flows CSV is the better default.
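The savings are easy to see by serializing the same hits both ways (character counts as a rough proxy for tokens; the hit rows are made up for the comparison):

```python
import csv
import io
import json

hits = [
    {"id": "policy-pto", "title": "PTO Policy", "score": 0.91},
    {"id": "policy-remote", "title": "Remote Work Policy", "score": 0.47},
    {"id": "pricing-enterprise", "title": "Enterprise Plan Pricing", "score": 0.22},
]

# JSON repeats every key name on every row.
as_json = json.dumps(hits)

# CSV states the keys once, in the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "title", "score"])
writer.writeheader()
writer.writerows(hits)
as_csv = buf.getvalue()

print(len(as_json), len(as_csv))  # CSV is markedly shorter
```

The gap widens with more rows, since JSON's per-row key overhead is constant while CSV pays for the header only once.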
Building the Agent
The constructor signature mirrors FunctionCallingAgent exactly —
every parameter on that class is accepted with identical semantics.
The additions are retrieval-specific:
| Param | Required | Default | Notes |
|---|---|---|---|
| `knowledge_base` | yes | — | The KnowledgeBase to retrieve from. |
| `search_type` | no | `"hybrid_fts"` | `"similarity"`, `"fulltext"`, or `"hybrid_fts"`. |
| `k` | no | 5 | Top-k for searches. Fixed per-agent — not exposed to the LM. |
| `similarity_threshold` | no | None | Max vector distance for similarity / hybrid modes. |
| `fulltext_threshold` | no | None | Min BM25 score for fulltext / hybrid modes. |
| `output_format` | no | `"csv"` | `"csv"` (compact) or `"json"` (list of dicts). |
| `tools` | no | None | Extra Tool instances or async functions to append to the three built-ins. Same name-collision and no-leading-underscore rules as FunctionCallingAgent. |
```python
import synalinks

embedding_model = synalinks.EmbeddingModel(
    model="gemini/text-embedding-004",
)

kb = synalinks.KnowledgeBase(
    uri="duckdb://docs.db",
    data_models=[Document],
    embedding_model=embedding_model,
)
# ... populate kb ...

lm = synalinks.LanguageModel(model="ollama/mistral")

inputs = synalinks.Input(data_model=synalinks.ChatMessages)
outputs = await synalinks.VectorRAGAgent(
    knowledge_base=kb,
    language_model=lm,
)(inputs)
agent = synalinks.Program(inputs=inputs, outputs=outputs)
```
Note: "similarity" and "hybrid_fts" modes need the knowledge
base to have an embedding_model set. "fulltext" works without
one.
Layering Extra Tools
The tools= slot lets you append non-retrieval tools to the built-
in three. Useful when the answer mixes retrieved text with another
capability:
```python
@synalinks.saving.register_synalinks_serializable()
async def calculate(expression: str):
    """Evaluate an arithmetic expression.

    Args:
        expression (str): A safe expression like '100 * 0.85'.
    """
    return {"result": eval(expression, {"__builtins__": {}}, {})}

agent_module = synalinks.VectorRAGAgent(
    knowledge_base=kb,
    language_model=lm,
    tools=[synalinks.Tool(calculate)],
)
```
Now the LM can retrieve a pricing policy from the kb and then compute a discount in the same turn.
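A caveat on the tool above: `eval` with stripped builtins is not bulletproof (attribute chains can still reach object internals). A stricter alternative, offered here as a sketch rather than a synalinks recommendation, is to walk the expression's AST and allow only arithmetic nodes:

```python
import ast
import operator

# Whitelisted arithmetic operators; anything outside this set is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate pure arithmetic; any other construct raises ValueError."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval"))
```

Dropping this in for the `eval` call keeps the tool's contract identical while rejecting function calls, names, and attribute access outright.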
A Worked Example
A small end-to-end task: a kb of HR / pricing policies, and the agent answers a question that needs both retrieval and arithmetic.
```python
result = await agent(synalinks.ChatMessages(messages=[synalinks.ChatMessage(
    role="user",
    content=(
        "If I have 100 users on the Enterprise plan, what would the "
        "per-user monthly cost be after the volume discount?"
    ),
)]))
```
What the agent typically does:
- `search_knowledge_base("Document", "enterprise plan pricing volume discount")` — pulls the pricing-policy document.
- The LM reads the discount tiers from the snippet.
- `calculate("99 * (1 - 0.15)")` — applies the 15% discount.
- Stops, produces the final natural-language answer.
If the first search doesn't return what the LM expected, it reformulates and searches again. Multiple search rounds are normal when the question's phrasing diverges from the corpus's.
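The arithmetic the agent is expected to reproduce can be checked by hand. Given the tiers from the sample pricing document, a hypothetical tier lookup (the function and tier table are illustration, not part of the agent) looks like:

```python
LIST_PRICE = 99.0  # $/user/month, from the sample Enterprise pricing document

# (minimum users, discount) tiers from that document,
# checked from the largest threshold down.
TIERS = [(250, 0.20), (100, 0.15), (50, 0.10)]

def per_user_price(users: int) -> float:
    """Apply the first tier whose threshold the user count meets."""
    discount = next((d for threshold, d in TIERS if users >= threshold), 0.0)
    return LIST_PRICE * (1 - discount)

print(per_user_price(100))  # 100 users hit the 15% tier
```

So the expected final answer for 100 users is $84.15/user/month; if the agent reports something else, the retrieval or the arithmetic step went wrong.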
Multi-Turn Conversations
VectorRAGAgent accepts ChatMessages as input and threads context
across turns. Each turn the agent sees the full conversation, so it
can ground a follow-up against retrieval from a previous turn
without re-asking:
```python
messages = []
for user_msg in [
    "What's the daily meal allowance?",
    "What about international?",
    "I'm going for 3 days — what's my total budget?",
]:
    messages.append(synalinks.ChatMessage(role="user", content=user_msg))
    chat = synalinks.ChatMessages(messages=messages)
    result = await agent(chat)
    last = next(
        m for m in reversed(result.get("messages", []))
        if m.get("role") == "assistant" and m.get("content")
    )
    print("Agent:", last["content"])
    messages.append(synalinks.ChatMessage(role="assistant", content=last["content"]))
```
Compared to Other Agents
VectorRAGAgent is one of several specialized agents that wrap a
FunctionCallingAgent with a workload-specific tool set:
| Agent | Bound to | Tools |
|---|---|---|
| `FunctionCallingAgent` | nothing | whatever you pass in |
| `SQLAgent` | a KnowledgeBase | schema discovery, table sample, read-only SQL |
| `VectorRAGAgent` | a KnowledgeBase | schema discovery, similarity / fulltext / hybrid search, get-by-id |
| `DeepAgent` | a workdir | list, search, read, write, edit, bash |
When to pick VectorRAGAgent over SQLAgent: the data is text
(unstructured documents), and the question is "find me what's
relevant to X" rather than "compute Y from rows of Z". When the
data is structured (typed columns) and the answer needs joins or
aggregations, reach for SQLAgent.
You can also use both — give an agent a KnowledgeBase with both
document tables and structured tables, layer SQL tools on top of
retrieval, and let the LM mix them. But for a single workload, the
specialized agent is the simpler call.
API References
Document
Bases: DataModel
A document in the knowledge base.
Source code in guides/23_vector_rag_agent.py
```python
import asyncio
import os

from dotenv import load_dotenv

import synalinks


class Document(synalinks.DataModel):
    """A document in the knowledge base."""

    id: str = synalinks.Field(description="Document id")
    title: str = synalinks.Field(description="Document title")
    content: str = synalinks.Field(description="Body text")


SAMPLE_DOCUMENTS = [
    Document(
        id="pricing-enterprise",
        title="Enterprise Plan Pricing",
        content=(
            "Enterprise Plan: $99/user/month billed annually. "
            "Volume discounts: 10% off at 50 users, 15% off at 100, "
            "20% off at 250."
        ),
    ),
    Document(
        id="policy-pto",
        title="PTO Policy",
        content=(
            "Full-time employees receive 20 days of PTO per year. "
            "PTO accrues monthly. Up to 5 unused days carry over."
        ),
    ),
    Document(
        id="policy-remote",
        title="Remote Work Policy",
        content=(
            "Employees may work remotely up to 3 days per week with "
            "manager approval. Core hours are 10 AM to 4 PM."
        ),
    ),
]


async def main():
    load_dotenv()
    synalinks.clear_session()

    db_path = "./guides/vector_rag_agent_guide.db"
    if os.path.exists(db_path):
        os.remove(db_path)

    # Build a kb with an embedding model so hybrid_fts can use both
    # vector and BM25 signals.
    embedding_model = synalinks.EmbeddingModel(model="ollama/all-minilm")
    kb = synalinks.KnowledgeBase(
        uri=f"duckdb://{db_path}",
        data_models=[Document],
        embedding_model=embedding_model,
        wipe_on_start=True,
        name="vector_rag_agent_guide_kb",
    )
    for doc in SAMPLE_DOCUMENTS:
        await kb.update(doc.to_json_data_model())

    language_model = synalinks.LanguageModel(model="ollama/mistral")

    # Build the agent with the default hybrid_fts mode.
    inputs = synalinks.Input(data_model=synalinks.ChatMessages)
    outputs = await synalinks.VectorRAGAgent(
        knowledge_base=kb,
        language_model=language_model,
        max_iterations=5,
    )(inputs)
    agent = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="vector_rag_agent",
        description="A RAG agent that retrieves and answers from documents.",
    )

    # Ask a question that needs retrieval to answer correctly.
    question = synalinks.ChatMessages(
        messages=[
            synalinks.ChatMessage(
                role="user",
                content="How much PTO do employees get per year?",
            )
        ]
    )
    result = await agent(question)

    # Print the final assistant message.
    for msg in reversed(result.get("messages", [])):
        if msg.get("role") == "assistant" and msg.get("content"):
            print("Agent:", msg["content"])
            break

    if os.path.exists(db_path):
        os.remove(db_path)


if __name__ == "__main__":
    asyncio.run(main())
```