SQL Agent
SQL Agent
An SQL Agent combines the reasoning capabilities of LLMs with structured database access. Instead of writing SQL manually, users ask questions in natural language and the agent autonomously discovers the schema, constructs queries, and returns answers.
Why SQL Agents Matter
Traditional database access requires SQL knowledge. An SQL Agent bridges this gap:
graph LR
subgraph Traditional
A[User] --> B[Write SQL]
B --> C[Execute Query]
C --> D[Interpret Results]
end
subgraph SQL Agent
E[User Question] --> F[Agent Reasons]
F --> G[Auto-generates SQL]
G --> H[Natural Language Answer]
end
SQL Agents provide:
- Natural Language Interface: Ask questions without knowing SQL
- Schema Discovery: Agent explores database structure automatically
- Query Generation: Constructs correct SQL based on user intent
- Safe Execution: Read-only queries prevent accidental modifications
Architecture
The SQL Agent uses three specialized tools to interact with the Knowledge Base:
flowchart TD
A[User Question] --> B[FunctionCallingAgent]
B --> C{Select Tool}
C -->|Discover structure| D[get_database_schema]
C -->|View sample data| E[get_table_sample]
C -->|Execute query| F[run_sql_query]
D --> G[Schema Info]
E --> H[Sample Rows]
F --> I[Query Results]
G --> B
H --> B
I --> B
B --> J[Natural Language Answer + SQL]
The agent follows an autonomous loop: it first discovers the schema, optionally samples data to understand the format, then constructs and executes SQL queries. This process repeats until the agent has enough information to answer.
Available Tools
| Tool | Description |
|---|---|
get_database_schema |
Returns all tables and their columns with types |
get_table_sample |
Fetches sample rows from a table with pagination |
run_sql_query |
Executes read-only SELECT queries safely |
Defining Input and Output Models
First, define DataModels for the agent's input (user query) and output (answer with SQL):
import synalinks
class Query(synalinks.DataModel):
"""A natural language query about the database."""
query: str = synalinks.Field(
description="The user's question about the data in natural language"
)
class SQLResult(synalinks.DataModel):
"""The result of the SQL agent's analysis."""
answer: str = synalinks.Field(
description="A clear, natural language answer to the user's question"
)
sql_query: str = synalinks.Field(
description="The SQL query that was executed to get the answer"
)
The Query model captures the user's natural language question. The SQLResult
model structures the output to include both a human-readable answer and the
SQL query used, providing transparency into the agent's reasoning.
Tool Design for Database Access
Each tool uses kb.get_symbolic_data_models() to dynamically discover
available tables. This makes the agent adaptable to any database schema
without hardcoding table names:
from synalinks.src.saving.object_registration import register_synalinks_serializable
@register_synalinks_serializable()
async def get_database_schema():
"""Get the complete database schema including all tables and columns."""
kb = get_knowledge_base()
symbolic_models = kb.get_symbolic_data_models()
schema_info = []
for model in symbolic_models:
schema = model.get_schema()
table_name = schema.get("title", "Unknown")
properties = schema.get("properties", {})
columns = []
for col_name, col_info in properties.items():
col_type = col_info.get("type", "unknown")
columns.append(f" - {col_name} ({col_type})")
schema_info.append(f"Table: {table_name}\n" + "\n".join(columns))
return {"schema": "\n\n".join(schema_info), "table_count": len(symbolic_models)}
The @register_synalinks_serializable() decorator enables the tool to be
saved and loaded with the program. The tool extracts table names and column
information from the JSON schema of each symbolic data model.
Building the SQL Agent
Wrap the tool functions with synalinks.Tool() and create the agent using
FunctionCallingAgent:
# Create Knowledge Base with your data models
kb = synalinks.KnowledgeBase(
uri="duckdb://my_database.db",
data_models=[Customer, Product, SalesOrder],
)
# Configure language model
lm = synalinks.LanguageModel(model="openai/gpt-4.1-mini")
# Wrap async functions as Tool objects
schema_tool = synalinks.Tool(get_database_schema)
sample_tool = synalinks.Tool(get_table_sample)
query_tool = synalinks.Tool(run_sql_query)
# Build the agent using Functional API
inputs = synalinks.Input(data_model=Query)
outputs = await synalinks.FunctionCallingAgent(
data_model=SQLResult,
language_model=lm,
tools=[schema_tool, sample_tool, query_tool],
autonomous=True, # Run until agent decides it's done
max_iterations=10, # Safety limit to prevent infinite loops
)(inputs)
sql_agent = synalinks.Program(
inputs=inputs,
outputs=outputs,
name="sql_agent",
)
The autonomous=True setting allows the agent to make multiple tool calls
until it has gathered enough information. The max_iterations parameter
prevents runaway execution.
Safety Considerations
The run_sql_query tool enforces read-only access through multiple layers:
- SELECT Only: Rejects queries not starting with SELECT
- Keyword Filtering: Blocks DROP, DELETE, INSERT, UPDATE, ALTER, etc.
- Read-Only Mode: Uses
kb.query(sql, read_only=True)
@register_synalinks_serializable()
async def run_sql_query(sql_query: str):
"""Execute a read-only SQL query."""
kb = get_knowledge_base()
# Validate query is read-only
query_upper = sql_query.strip().upper()
if not query_upper.startswith("SELECT"):
return {"error": "Only SELECT queries are allowed.", "success": False}
# Check for dangerous keywords
dangerous = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE"]
for keyword in dangerous:
if keyword in query_upper:
return {"error": f"Forbidden keyword: {keyword}", "success": False}
results = await kb.query(sql_query, read_only=True)
return {"success": True, "results": results, "row_count": len(results)}
Example Usage
# Ask a natural language question
result = await sql_agent(Query(query="Who are the top 3 customers by orders?"))
print(result["answer"])
# Output: "The top 3 customers are: Carlos Garcia ($1539.96),
# Alice Johnson ($1489.96), and Diana Chen ($379.94)"
print(result["sql_query"])
# Output: "SELECT c.name, SUM(o.total_amount) as total
# FROM Customer c JOIN SalesOrder o ON c.id = o.customer_id
# GROUP BY c.name ORDER BY total DESC LIMIT 3"
The agent automatically: 1. Discovered the Customer and SalesOrder tables 2. Understood the relationship via customer_id 3. Wrote a proper JOIN with aggregation 4. Returned both the answer and the SQL for transparency
Key Takeaways
-
Dynamic Schema Discovery: Use
kb.get_symbolic_data_models()to make tools adaptable to any database structure without hardcoding. -
Autonomous Reasoning: The
FunctionCallingAgentwithautonomous=Trueiteratively calls tools until it can answer the question. -
Safety First: Always validate SQL queries and use read-only mode to prevent accidental data modifications.
-
Transparent Outputs: Include the generated SQL in the output so users can verify and learn from the agent's reasoning.
-
Tool Serialization: Use
@register_synalinks_serializable()on tool functions to enable program saving and loading.
API References
Customer
Bases: DataModel
A customer in the database.
Source code in examples/16_sql_agent.py
Product
Bases: DataModel
A product in the database.
Source code in examples/16_sql_agent.py
Query
Bases: DataModel
A natural language query about the database.
Source code in examples/16_sql_agent.py
SQLResult
Bases: DataModel
The result of the SQL agent's analysis.
Source code in examples/16_sql_agent.py
SalesOrder
Bases: DataModel
An order in the database.
Source code in examples/16_sql_agent.py
get_database_schema()
async
Get the complete database schema including all tables and their columns.
Returns a list of all tables with their column names and types. Use this tool first to understand what data is available before writing queries.
Source code in examples/16_sql_agent.py
get_knowledge_base()
get_table_sample(table_name, limit, offset)
async
Get a sample of rows from a specific table to understand the data format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table_name
|
str
|
The name of the table to sample. |
required |
limit
|
int
|
Number of sample rows to return (recommended: 3-5). |
required |
offset
|
int
|
Number of rows to skip before returning results (use 0 for start). |
required |
Source code in examples/16_sql_agent.py
main()
async
Demonstrate the SQL agent with natural language queries.
Source code in examples/16_sql_agent.py
637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 | |
populate_knowledge_base(kb)
async
Populate the knowledge base with sample data.
Source code in examples/16_sql_agent.py
438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 | |
run_sql_query(sql_query)
async
Execute a read-only SQL query and return the results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sql_query
|
str
|
A SELECT SQL query to execute. Only SELECT queries are allowed. |
required |
Important
- Only SELECT queries are permitted for safety
- Use get_database_schema first to discover available tables
- Use proper JOIN syntax for multi-table queries
- Include LIMIT clause for large result sets