VectorRAGAgent module

VectorRAGAgent

Bases: Module

A ready-to-use retrieval-augmented agent backed by a knowledge base.

VectorRAGAgent is a thin specialization of FunctionCallingAgent that pre-wires three retrieval tools bound to a KnowledgeBase:

  • get_knowledge_base_schema: lists available tables and columns.
  • search_knowledge_base: dispatches to similarity / fulltext / hybrid_fts depending on the configured search_type.
  • get_record_by_id: full-record lookup after a search returns an id.

The constructor mirrors FunctionCallingAgent: every parameter on that class is accepted here with identical semantics. The only additions are knowledge_base (required), the retrieval knobs (search_type, k, similarity_threshold, fulltext_threshold), and output_format. User-supplied tools are appended to the three built-in retrieval tools.

Compared to a hardcoded RAG pipeline (always retrieve, then answer), the agent decides if retrieval is needed, which table to search, and how to phrase the query. Multiple searches per turn are allowed.

Example:

import synalinks
import asyncio

class Document(synalinks.DataModel):
    id: str = synalinks.Field(description="Document id")
    title: str = synalinks.Field(description="Title")
    content: str = synalinks.Field(description="Body text")

async def main():
    embedding_model = synalinks.EmbeddingModel(
        model="gemini/text-embedding-004",
    )
    kb = synalinks.KnowledgeBase(
        uri="duckdb://docs.db",
        data_models=[Document],
        embedding_model=embedding_model,
    )
    # ... populate kb ...

    lm = synalinks.LanguageModel(model="ollama/mistral")

    inputs = synalinks.Input(data_model=synalinks.ChatMessages)
    outputs = await synalinks.VectorRAGAgent(
        knowledge_base=kb,
        language_model=lm,
    )(inputs)
    agent = synalinks.Program(inputs=inputs, outputs=outputs)

    messages = synalinks.ChatMessages(messages=[
        synalinks.ChatMessage(role="user", content="What is the PTO policy?")
    ])
    result = await agent(messages)
    print(result.get("messages")[-1].get("content"))

asyncio.run(main())

Parameters:

Name Type Description Default
knowledge_base KnowledgeBase

The knowledge base to retrieve from. Required.

None
search_type str

Retrieval mode for the search_knowledge_base tool. One of:

  • "similarity": vector-similarity over embeddings.
  • "fulltext": BM25 keyword search.
  • "hybrid_fts" (default): vector + BM25 fused with RRF.

Requires the knowledge base to have an embedding model configured for "similarity" and "hybrid_fts".

'hybrid_fts'
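
The fusion step behind "hybrid_fts" can be sketched in plain Python. This is an illustrative Reciprocal Rank Fusion, not the knowledge base's actual implementation; the smoothing constant 60 is the conventional RRF default and an assumption here:

```python
def rrf_fuse(vector_ranking, fulltext_ranking, k=60):
    """Fuse two ranked lists of ids with Reciprocal Rank Fusion.

    Each id contributes 1 / (k + rank) per list it appears in, so ids
    ranked highly by either signal float to the top of the fused list.
    """
    scores = {}
    for ranking in (vector_ranking, fulltext_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "a" leads both rankings, so it stays first after fusion.
print(rrf_fuse(["a", "b", "d"], ["a", "c", "b"]))
```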
k int

Top-k for searches. Fixed per-agent at construction time — the LM doesn't pass it. Defaults to 5.

5
similarity_threshold float

Maximum vector distance for the similarity and hybrid modes. Optional.

None
fulltext_threshold float

Minimum BM25 score for the fulltext and hybrid modes. Optional.

None
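
Note the two thresholds cut in opposite directions: vector distance is "lower is more similar", so similarity_threshold is a maximum, while BM25 score is "higher is more relevant", so fulltext_threshold is a minimum. A hypothetical filter (the tuple layout is made up for illustration) makes the asymmetry concrete:

```python
def filter_hits(hits, similarity_threshold=None, fulltext_threshold=None):
    """Apply both cutoffs to (doc_id, distance, bm25_score) tuples.

    distance: vector distance, lower is better, so the threshold is a cap.
    bm25_score: keyword relevance, higher is better, so it is a floor.
    """
    kept = []
    for doc_id, distance, bm25_score in hits:
        if similarity_threshold is not None and distance > similarity_threshold:
            continue  # too far from the query embedding
        if fulltext_threshold is not None and bm25_score < fulltext_threshold:
            continue  # too weak a keyword match
        kept.append(doc_id)
    return kept

hits = [("a", 0.2, 3.0), ("b", 0.9, 5.0), ("c", 0.1, 0.5)]
# "b" fails the distance cap, "c" fails the BM25 floor.
print(filter_hits(hits, similarity_threshold=0.5, fulltext_threshold=1.0))
```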
output_format str

How search results are rendered to the LM. "csv" (default) is compact; "json" returns a list of dicts.

'csv'
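
The difference between the two renderings can be sketched with the standard library; the records and the render helper below are illustrative, not the module's internal code:

```python
import csv
import io
import json

records = [
    {"id": "doc-1", "title": "PTO policy", "content": "Employees accrue..."},
    {"id": "doc-2", "title": "Remote work", "content": "Teams may..."},
]

def render(records, output_format="csv"):
    """Render search results for the LM: CSV is one header line plus one
    line per record (compact); JSON is a list of dicts (verbose but
    unambiguous for fields containing commas or newlines)."""
    if output_format == "json":
        return json.dumps(records)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

print(render(records, "csv"))
```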
tools list

Additional Tool instances (or plain async functions) to expose alongside the three built-in retrieval tools. Tool names must not collide with the built-ins (get_knowledge_base_schema, search_knowledge_base, get_record_by_id) or a ValueError is raised.

None
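
The collision rule behaves like the following standalone sketch, which mirrors the check in the constructor (the merge helper itself is hypothetical):

```python
BUILTIN_NAMES = {
    "get_knowledge_base_schema",
    "search_knowledge_base",
    "get_record_by_id",
}

def merge_tool_names(extra_names):
    """Reject user tool names that shadow a built-in retrieval tool,
    mirroring the check in VectorRAGAgent.__init__."""
    for name in extra_names:
        if name in BUILTIN_NAMES:
            raise ValueError(
                f"Tool name {name!r} collides with a built-in retrieval "
                "tool. Rename the additional tool."
            )
    return sorted(BUILTIN_NAMES) + list(extra_names)

print(merge_tool_names(["fetch_weather"]))  # accepted, appended after built-ins
try:
    merge_tool_names(["search_knowledge_base"])
except ValueError as err:
    print(err)  # collision with a built-in is rejected
```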
schema dict

JSON schema for the final answer.

None
data_model DataModel

DataModel for the final answer. Mutually exclusive with schema.

None
language_model LanguageModel

The language model that drives the agent loop.

None
prompt_template str

Forwarded to the tool-call generator.

None
examples list

Few-shot examples for the tool-call generator.

None
instructions str

Override the default system instructions. When omitted, defaults are built from the knowledge base's tables and the configured search_type.

None
final_instructions str

Instructions for the final-answer generator. Defaults to instructions.

None
temperature float

LM sampling temperature. Defaults to 0.0.

0.0
use_inputs_schema bool

Include the input schema in the prompt.

False
use_outputs_schema bool

Include the output schema in the prompt.

False
reasoning_effort str

Forwarded to the generators (for reasoning-capable LMs).

None
use_chain_of_thought bool

When True, the tool-call generator emits a thinking field per round.

False
autonomous bool

When True (default), the agent runs the tool loop end-to-end. When False, returns one step at a time for human-in-the-loop workflows.

True
return_inputs_with_trajectory bool

When True (default), the full message trajectory is included alongside the final answer.

True
max_iterations int

Maximum number of tool-call rounds. Defaults to 5.

5
streaming bool

Stream the final answer when no schema is set. Defaults to False.

False
name str

Module name.

None
description str

Module description.

None
Source code in synalinks/src/modules/agents/vector_rag_agent.py
@synalinks_export(
    [
        "synalinks.modules.VectorRAGAgent",
        "synalinks.VectorRAGAgent",
    ]
)
class VectorRAGAgent(Module):
    """A ready-to-use retrieval-augmented agent backed by a knowledge base.

    VectorRAGAgent is a thin specialization of
    :class:`FunctionCallingAgent` that pre-wires three retrieval tools
    bound to a :class:`KnowledgeBase`:

    - ``get_knowledge_base_schema``: lists available tables and columns.
    - ``search_knowledge_base``: dispatches to similarity / fulltext /
      hybrid_fts depending on the configured ``search_type``.
    - ``get_record_by_id``: full-record lookup after a search returns
      an id.

    The constructor mirrors :class:`FunctionCallingAgent` — every
    parameter on that class is accepted here with identical semantics.
    The only additions are ``knowledge_base`` (required), the
    retrieval knobs (``search_type``, ``k``, ``similarity_threshold``,
    ``fulltext_threshold``), and ``output_format``. User-supplied
    ``tools`` are appended to the three built-in retrieval tools.

    Compared to a hardcoded RAG pipeline (always retrieve, then
    answer), the agent decides *if* retrieval is needed, *which* table
    to search, and *how* to phrase the query. Multiple searches per
    turn are allowed.

    Example:

    ```python
    import synalinks
    import asyncio

    class Document(synalinks.DataModel):
        id: str = synalinks.Field(description="Document id")
        title: str = synalinks.Field(description="Title")
        content: str = synalinks.Field(description="Body text")

    async def main():
        embedding_model = synalinks.EmbeddingModel(
            model="gemini/text-embedding-004",
        )
        kb = synalinks.KnowledgeBase(
            uri="duckdb://docs.db",
            data_models=[Document],
            embedding_model=embedding_model,
        )
        # ... populate kb ...

        lm = synalinks.LanguageModel(model="ollama/mistral")

        inputs = synalinks.Input(data_model=synalinks.ChatMessages)
        outputs = await synalinks.VectorRAGAgent(
            knowledge_base=kb,
            language_model=lm,
        )(inputs)
        agent = synalinks.Program(inputs=inputs, outputs=outputs)

        messages = synalinks.ChatMessages(messages=[
            synalinks.ChatMessage(role="user", content="What is the PTO policy?")
        ])
        result = await agent(messages)
        print(result.get("messages")[-1].get("content"))

    asyncio.run(main())
    ```

    Args:
        knowledge_base (KnowledgeBase): The knowledge base to retrieve
            from. Required.
        search_type (str): Retrieval mode for the
            ``search_knowledge_base`` tool. One of:

            - ``"similarity"``: vector-similarity over embeddings.
            - ``"fulltext"``: BM25 keyword search.
            - ``"hybrid_fts"`` (default): vector + BM25 fused with RRF.

            Requires the knowledge base to have an embedding model
            configured for ``"similarity"`` and ``"hybrid_fts"``.
        k (int): Top-k for searches. Fixed per-agent at construction
            time — the LM doesn't pass it. Defaults to 5.
        similarity_threshold (float): Maximum vector distance for the
            similarity and hybrid modes. Optional.
        fulltext_threshold (float): Minimum BM25 score for the
            fulltext and hybrid modes. Optional.
        output_format (str): How search results are rendered to the
            LM. ``"csv"`` (default) is compact; ``"json"`` returns a
            list of dicts.
        tools (list): Additional :class:`Tool` instances (or plain
            async functions) to expose alongside the three built-in
            retrieval tools. Tool names must not collide with the
            built-ins (``get_knowledge_base_schema``,
            ``search_knowledge_base``, ``get_record_by_id``) or a
            ``ValueError`` is raised.
        schema (dict): JSON schema for the final answer.
        data_model (DataModel): DataModel for the final answer.
            Mutually exclusive with ``schema``.
        language_model (LanguageModel): The language model that drives
            the agent loop.
        prompt_template (str): Forwarded to the tool-call generator.
        examples (list): Few-shot examples for the tool-call generator.
        instructions (str): Override the default system instructions.
            When omitted, defaults are built from the knowledge base's
            tables and the configured ``search_type``.
        final_instructions (str): Instructions for the final-answer
            generator. Defaults to ``instructions``.
        temperature (float): LM sampling temperature. Defaults to 0.0.
        use_inputs_schema (bool): Include the input schema in the
            prompt.
        use_outputs_schema (bool): Include the output schema in the
            prompt.
        reasoning_effort (str): Forwarded to the generators (for
            reasoning-capable LMs).
        use_chain_of_thought (bool): When ``True``, the tool-call
            generator emits a ``thinking`` field per round.
        autonomous (bool): When ``True`` (default), the agent runs the
            tool loop end-to-end. When ``False``, returns one step at
            a time for human-in-the-loop workflows.
        return_inputs_with_trajectory (bool): When ``True`` (default),
            the full message trajectory is included alongside the
            final answer.
        max_iterations (int): Maximum number of tool-call rounds.
            Defaults to 5.
        streaming (bool): Stream the final answer when no ``schema``
            is set. Defaults to ``False``.
        name (str): Module name.
        description (str): Module description.
    """

    def __init__(
        self,
        *,
        knowledge_base=None,
        search_type: str = "hybrid_fts",
        k: int = 5,
        similarity_threshold: Optional[float] = None,
        fulltext_threshold: Optional[float] = None,
        output_format: str = "csv",
        tools: Optional[List] = None,
        schema=None,
        data_model=None,
        language_model=None,
        prompt_template=None,
        examples=None,
        instructions: Optional[str] = None,
        final_instructions: Optional[str] = None,
        temperature: float = 0.0,
        use_inputs_schema: bool = False,
        use_outputs_schema: bool = False,
        reasoning_effort: Optional[str] = None,
        use_chain_of_thought: bool = False,
        autonomous: bool = True,
        return_inputs_with_trajectory: bool = True,
        max_iterations: int = 5,
        streaming: bool = False,
        name: Optional[str] = None,
        description: Optional[str] = None,
    ):
        super().__init__(name=name, description=description)

        if knowledge_base is None:
            raise ValueError("`knowledge_base` is required")
        self.knowledge_base = knowledge_base
        self.language_model = _get_lm(language_model)

        if not schema and data_model:
            schema = data_model.get_schema()
        self.schema = schema

        if search_type not in SEARCH_TYPES:
            raise ValueError(
                f"`search_type` must be one of {SEARCH_TYPES}, got {search_type!r}"
            )
        self.search_type = search_type

        if output_format not in ("csv", "json"):
            raise ValueError(
                f"`output_format` must be 'csv' or 'json', got {output_format!r}"
            )
        self.output_format = output_format

        self.k = k
        self.similarity_threshold = similarity_threshold
        self.fulltext_threshold = fulltext_threshold

        if instructions is None:
            tables = [
                m.get_schema().get("title", "Unknown")
                for m in self.knowledge_base.get_symbolic_data_models()
            ]
            instructions = get_default_instructions(tables, self.search_type)
        self.instructions = instructions
        self.final_instructions = final_instructions

        self.prompt_template = prompt_template
        self.examples = examples
        self.temperature = temperature
        self.use_inputs_schema = use_inputs_schema
        self.use_outputs_schema = use_outputs_schema
        self.reasoning_effort = reasoning_effort
        self.use_chain_of_thought = use_chain_of_thought
        self.autonomous = autonomous
        self.return_inputs_with_trajectory = return_inputs_with_trajectory
        self.max_iterations = max_iterations
        self.streaming = streaming

        builtin_tools = [
            Tool(fn)
            for fn in _build_tools(
                self.knowledge_base,
                search_type=self.search_type,
                k=self.k,
                similarity_threshold=self.similarity_threshold,
                fulltext_threshold=self.fulltext_threshold,
                output_format=self.output_format,
            )
        ]
        builtin_names = {t.name for t in builtin_tools}

        self.extra_tools = list(tools) if tools else []
        merged_tools = list(builtin_tools)
        for extra in self.extra_tools:
            extra_tool = extra if isinstance(extra, Tool) else Tool(extra)
            if extra_tool.name in builtin_names:
                raise ValueError(
                    f"Tool name {extra_tool.name!r} collides with a built-in "
                    f"retrieval tool. Rename the additional tool."
                )
            merged_tools.append(extra_tool)
        # Leading-underscore check is centralized in FunctionCallingAgent.

        self.agent = FunctionCallingAgent(
            schema=self.schema,
            language_model=self.language_model,
            prompt_template=self.prompt_template,
            examples=self.examples,
            instructions=self.instructions,
            final_instructions=self.final_instructions,
            temperature=self.temperature,
            use_inputs_schema=self.use_inputs_schema,
            use_outputs_schema=self.use_outputs_schema,
            reasoning_effort=self.reasoning_effort,
            use_chain_of_thought=self.use_chain_of_thought,
            tools=merged_tools,
            autonomous=self.autonomous,
            return_inputs_with_trajectory=self.return_inputs_with_trajectory,
            max_iterations=self.max_iterations,
            streaming=self.streaming,
            name="agent_" + self.name,
        )

    async def call(self, inputs, training=False):
        return await self.agent(inputs, training=training)

    async def compute_output_spec(self, inputs, training=False):
        return await self.agent.compute_output_spec(inputs, training=training)

    def get_config(self):
        config = {
            "schema": self.schema,
            "search_type": self.search_type,
            "k": self.k,
            "similarity_threshold": self.similarity_threshold,
            "fulltext_threshold": self.fulltext_threshold,
            "output_format": self.output_format,
            "prompt_template": self.prompt_template,
            "examples": self.examples,
            "instructions": self.instructions,
            "final_instructions": self.final_instructions,
            "temperature": self.temperature,
            "use_inputs_schema": self.use_inputs_schema,
            "use_outputs_schema": self.use_outputs_schema,
            "reasoning_effort": self.reasoning_effort,
            "use_chain_of_thought": self.use_chain_of_thought,
            "autonomous": self.autonomous,
            "return_inputs_with_trajectory": self.return_inputs_with_trajectory,
            "max_iterations": self.max_iterations,
            "streaming": self.streaming,
            "name": self.name,
            "description": self.description,
        }
        knowledge_base_config = {
            "knowledge_base": serialization_lib.serialize_synalinks_object(
                self.knowledge_base,
            )
        }
        language_model_config = {
            "language_model": serialization_lib.serialize_synalinks_object(
                self.language_model,
            )
        }
        tools_config = {
            "tools": [
                serialization_lib.serialize_synalinks_object(
                    t if isinstance(t, Tool) else Tool(t)
                )
                for t in self.extra_tools
            ]
        }
        return {
            **config,
            **knowledge_base_config,
            **language_model_config,
            **tools_config,
        }

    @classmethod
    def from_config(cls, config):
        knowledge_base = serialization_lib.deserialize_synalinks_object(
            config.pop("knowledge_base")
        )
        language_model = serialization_lib.deserialize_synalinks_object(
            config.pop("language_model")
        )
        tools = [
            serialization_lib.deserialize_synalinks_object(t)
            for t in config.pop("tools", [])
        ]
        return cls(
            knowledge_base=knowledge_base,
            language_model=language_model,
            tools=tools,
            **config,
        )
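
The get_config/from_config pair follows the usual Keras-style round-trip: plain values pass through the config dict, nested objects (knowledge base, language model, tools) are serialized separately, and from_config pops those off before splatting the remainder into the constructor. A minimal stand-in class (not the real synalinks machinery) shows the shape of the pattern:

```python
class FakeModule:
    """Minimal stand-in for the config round-trip pattern used by
    VectorRAGAgent; not the real synalinks classes."""

    def __init__(self, *, knowledge_base=None, k=5, name=None):
        self.knowledge_base = knowledge_base
        self.k = k
        self.name = name

    def get_config(self):
        # Plain values pass through; nested objects would be run
        # through the serialization library here instead.
        return {"knowledge_base": self.knowledge_base, "k": self.k, "name": self.name}

    @classmethod
    def from_config(cls, config):
        # Pop nested objects first, then splat the rest into __init__.
        knowledge_base = config.pop("knowledge_base")
        return cls(knowledge_base=knowledge_base, **config)

agent = FakeModule(knowledge_base="kb", k=3, name="rag")
clone = FakeModule.from_config(agent.get_config())
print(clone.k, clone.name)
```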

get_default_instructions(tables, search_type)

Default system instructions for the RAG agent.

Parameters:

Name Type Description Default
tables List[str]

PascalCase names of tables available for retrieval. Embedded in the prompt so the LM can pick a target table without a separate schema call.

required
search_type str

Which retrieval mode the agent is configured for. Shapes the guidance on how to phrase query arguments.

required

Returns:

Type Description
str

A prompt string giving the LM the retrieval loop and the query-writing guidance for the configured search mode.

Source code in synalinks/src/modules/agents/vector_rag_agent.py
def get_default_instructions(tables: List[str], search_type: str) -> str:
    """Default system instructions for the RAG agent.

    Args:
        tables: PascalCase names of tables available for retrieval.
            Embedded in the prompt so the LM can pick a target table
            without a separate schema call.
        search_type: Which retrieval mode the agent is configured for.
            Shapes the guidance on how to phrase ``query`` arguments.

    Returns:
        A prompt string giving the LM the retrieval loop and the
        query-writing guidance for the configured search mode.
    """
    if search_type == "similarity":
        retrieval_hint = (
            "Use natural-language descriptions of what you need — the "
            "search is vector-similarity over embeddings, so paraphrase "
            "the user's intent rather than guessing keywords."
        )
    elif search_type == "fulltext":
        retrieval_hint = (
            "Use keyword-rich queries — the search is BM25 full-text, "
            "so the words you pick must appear in the documents."
        )
    else:  # hybrid_fts
        retrieval_hint = (
            "Use natural-language queries that contain the keywords you "
            "expect to appear in matching documents — the search fuses "
            "vector similarity and BM25 with Reciprocal Rank Fusion, so "
            "both signals contribute."
        )
    return f"""
You are a retrieval-augmented assistant with access to a knowledge base.

Available tables: {tables}
Search mode: {search_type}

Plan:
1. If you don't already know what's available, call `get_knowledge_base_schema`.
2. Call `search_knowledge_base` with the table you want and a query.
   {retrieval_hint}
3. If a search result references an id you want to inspect in full,
   call `get_record_by_id`.
4. Once you have enough context, stop calling tools and answer.

Constraints:
- Only retrieve when the user's question actually needs grounded
  information. Trivial questions don't need a search.
- Reformulate the user's question into focused queries; don't just pass
  the raw user text.
- If a search returns nothing useful, retry with a different phrasing
  before giving up.
""".strip()