Knowledge Extraction and Storage
Synalinks provides a knowledge base system for extracting, storing, and retrieving structured knowledge. This example extracts structured information from invoices and other documents, stores it in a knowledge base, and queries it later.
graph LR
    subgraph Extraction
        A[Document] --> B[Generator]
        B --> C[Structured Data]
    end
    subgraph Storage
        C --> D[UpdateKnowledge]
        D --> E[(KnowledgeBase)]
    end
    subgraph Retrieval
        F[Query] --> G[RetrieveKnowledge]
        E --> G
        G --> H[Results]
    end
Creating a Knowledge Base
The KnowledgeBase uses DuckDB as the underlying storage engine, providing
full-text search and optional vector similarity search:
import synalinks

# Define your data model
# The first field (invoice_number) is used as the primary key when storing records
class Invoice(synalinks.DataModel):
    invoice_number: str = synalinks.Field(description="Invoice number")
    vendor: str = synalinks.Field(description="Vendor name")
    total: float = synalinks.Field(description="Total amount")
    description: str = synalinks.Field(description="Description of items")

# Create a knowledge base
# embedding_model is assumed to be defined earlier
knowledge_base = synalinks.KnowledgeBase(
    uri="duckdb://./invoices.db",
    data_models=[Invoice],
    embedding_model=embedding_model,  # Optional, for similarity search
)
Extracting Information with Generator
Use a Generator to extract structured information from unstructured text:
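# DocumentText is the input data model holding the raw document text
inputs = synalinks.Input(data_model=DocumentText)
# Run inside an async function; language_model is assumed to be defined earlier
extracted = await synalinks.Generator(
    data_model=Invoice,
    language_model=language_model,
)(inputs)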
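To make the idea of typed extraction concrete, here is a minimal sketch, independent of Synalinks, of turning a JSON payload (such as a language model might return) into a typed invoice record. It uses only the standard library. `InvoiceRecord`, `FIELD_TYPES`, and `parse_invoice` are hypothetical names chosen for illustration and are not part of the Synalinks API; the Generator handles this step for you.

```python
import json
from dataclasses import dataclass


@dataclass
class InvoiceRecord:
    """Hypothetical stand-in mirroring the fields of the Invoice data model."""
    invoice_number: str
    vendor: str
    total: float
    description: str


# Expected type for each field, used to coerce raw JSON values.
FIELD_TYPES = {
    "invoice_number": str,
    "vendor": str,
    "total": float,
    "description": str,
}


def parse_invoice(payload: str) -> InvoiceRecord:
    """Parse a JSON string and coerce each value to its declared type."""
    raw = json.loads(payload)
    missing = [name for name in FIELD_TYPES if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return InvoiceRecord(**{name: cast(raw[name]) for name, cast in FIELD_TYPES.items()})


# Sample payload (invented for illustration); note that total arrives as a string.
invoice = parse_invoice(
    '{"invoice_number": "INV-001", "vendor": "Acme", '
    '"total": "1250.50", "description": "Consulting"}'
)
```

The point of the sketch is that downstream code receives a record with known field names and types instead of free text, which is what makes storage and querying straightforward.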
Storing Data with UpdateKnowledge
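The UpdateKnowledge module stores data models in the knowledge base, inserting new records or updating existing ones, with the first field of the data model acting as the primary key.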
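The upsert behavior noted in this guide, where the first field of a data model serves as the primary key, can be illustrated with a small in-memory sketch. This is not Synalinks code: `InMemoryStore` and `InvoiceRecord` are hypothetical names, and a plain dictionary stands in for the DuckDB table.

```python
from dataclasses import dataclass, fields


@dataclass
class InvoiceRecord:
    invoice_number: str  # First field: treated as the primary key
    vendor: str
    total: float


class InMemoryStore:
    """Hypothetical store that upserts records keyed on their first field."""

    def __init__(self):
        self._rows = {}

    def upsert(self, record):
        # Look up the name of the first declared field and use its value as the key.
        key_name = fields(record)[0].name
        key = getattr(record, key_name)
        is_update = key in self._rows
        self._rows[key] = record
        return "updated" if is_update else "inserted"

    def get(self, key):
        return self._rows.get(key)

    def __len__(self):
        return len(self._rows)


store = InMemoryStore()
first = store.upsert(InvoiceRecord("INV-001", "Acme", 100.0))
# Same invoice number: the existing row is replaced rather than duplicated.
second = store.upsert(InvoiceRecord("INV-001", "Acme", 120.0))
```

A practical consequence of keying on the first field is that field order in the data model matters: placing a unique identifier such as `invoice_number` first keeps re-processed documents from creating duplicate rows.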
Retrieving Data with RetrieveKnowledge
The RetrieveKnowledge module uses hybrid search to find relevant records:
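# Run inside an async function; `query` is the input carrying the user's search query
results = await synalinks.RetrieveKnowledge(
    knowledge_base=knowledge_base,
    language_model=language_model,  # Generates the search query
    search_type="hybrid",  # Full-text + vector search
    k=5,  # Number of records to retrieve
)(query)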
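Hybrid search combines a full-text ranking with a vector-similarity ranking. This guide does not say how Synalinks merges the two, so the sketch below uses reciprocal rank fusion (RRF), a common fusion technique, purely as an illustrative assumption. The function name, the constant `60`, and the sample invoice IDs are all hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=5, rrf_constant=60):
    """Fuse several ranked lists of IDs and return the top-k IDs.

    Each ID earns 1 / (rrf_constant + rank) for every list it appears in
    (ranks start at 1); the contributions are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_constant + rank)
    # Sort by descending score, breaking ties by ID for deterministic output.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:k]


# Invented result lists standing in for the two search back ends.
full_text_hits = ["INV-002", "INV-001", "INV-005"]
vector_hits = ["INV-001", "INV-003", "INV-002"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits], k=3)
```

Records that appear in both lists rise to the top, which is the main benefit of a hybrid approach: exact keyword matches and semantically similar records reinforce each other.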
Key Takeaways
- KnowledgeBase: Unified interface for storing and searching structured data using DuckDB with full-text and vector search capabilities.
- UpdateKnowledge: Module for inserting/upserting data models into the knowledge base using the first field as primary key.
- RetrieveKnowledge: Module for intelligent retrieval using LM-generated search queries with hybrid search (full-text + vector).
- Structured Extraction: Use Generators to extract typed data from unstructured text like invoices, receipts, or documents.
Program Visualizations
Invoice Extraction Pipeline
Business Q&A System
API References
Answer
Bases: DataModel
An answer based on retrieved information.
Source code in examples/12_knowledge_extraction_and_storage.py
Customer
Bases: DataModel
Extracted customer information.
DocumentText
Bases: DataModel
Raw document text to extract information from.
Invoice
Bases: DataModel
Extracted invoice information.
Query
Bases: DataModel
A user query for searching the knowledge base.

