Getting started¶

Installation¶

pip install synalog

Or with uv:

uv add synalog          # add it to a uv-managed project
uv pip install synalog  # or install into the current virtualenv

Requires Python 3.10+. Wheels are published for Linux (x86_64, aarch64, armv7, s390x, ppc64le; glibc and musl), Windows (x64, x86, aarch64) and macOS (x86_64, aarch64).

duckdb (the default engine) and sqlite work out of the box. To execute on PostgreSQL, add the run extra — pip install 'synalog[run]' or uv add 'synalog[run]' — which pulls in the psycopg driver.

Your first program¶

A Synalog program declares tables (external data), defines concepts (entities and relationships extracted from tables), and writes rules (derived data). You then compile a predicate to SQL and run it with any database driver.

import synalog

source = """
@Engine("duckdb");

Employee(name: "Alice", department: "Engineering", salary: 75000);
Employee(name: "Bob", department: "Marketing", salary: 65000);
Employee(name: "Charlie", department: "Engineering", salary: 80000);

@OrderBy(EngineeringTeam, "name");
EngineeringTeam(name:, salary:) :- Employee(name:, department: "Engineering", salary:);
"""

# 1. Validate the program — returns a list of error messages
errors = synalog.check(source)
assert errors == []

# 2. Compile a predicate to SQL
sql = synalog.compile(source, "EngineeringTeam")
print(sql)

Output (the DuckDB initialization preamble is omitted here — the complete example below shows the full, unedited output):

WITH t_0_Employee AS (SELECT * FROM (
    SELECT
      E'Alice' AS name,
      E'Engineering' AS department,
      75000 AS salary
   UNION ALL
    SELECT
      E'Bob' AS name,
      E'Marketing' AS department,
      65000 AS salary
   UNION ALL
    SELECT
      E'Charlie' AS name,
      E'Engineering' AS department,
      80000 AS salary
) AS UNUSED_TABLE_NAME  )
SELECT
  Employee.name AS name,
  Employee.salary AS salary
FROM
  t_0_Employee AS Employee
WHERE
  (Employee.department = E'Engineering') ORDER BY name;

Inline facts compile to a WITH clause; a real database table compiles to a plain FROM over that table, as in Querying a CSV file below.

Executing the SQL¶

Synalog produces plain SQL strings, so you can execute them with any driver for your engine — sqlite3, duckdb, psycopg, google-cloud-bigquery, and so on:

import duckdb

result = duckdb.sql(sql).fetchall()
print(result)
# [('Alice', 75000), ('Charlie', 80000)]

The command line¶

Everything above also works without writing Python. Installing the package installs a synalog command that validates, compiles and runs predicates in one step:

synalog program.l check
synalog program.l run EngineeringTeam

Running synalog with no arguments starts an interactive session where you build a program rule by rule and query it as you go. With uv, uvx runs the CLI without installing anything: uvx synalog program.l run EngineeringTeam (duckdb is bundled; add --from 'synalog[run]' for PostgreSQL). See CLI interface.

Start a project¶

The fastest way to start: synalog init scaffolds a project — with uv you don't even need to install anything first, uvx synalog init runs it straight away. It asks for a name and a description, then creates a runnable starter program, sample data, and agent instructions (AGENTS.md, CLAUDE.md and a Synalog skill in .agents/skills/synalog/) so coding agents know how to use Synalog:

$ synalog init
Project name [synalog-project]: demo-kb
Description [A Synalog knowledge base]: Sales analytics for the demo team
Created demo-kb/
  .agents/skills/synalog/SKILL.md
  .gitignore
  AGENTS.md
  CLAUDE.md
  data/sales.csv
  example.l
  lib/metrics.l

Next steps:
  cd demo-kb
  synalog example.l run TopRegion MonthlySales --load sales=data/sales.csv

Put your data files in data/, keep reusable predicates in lib/ modules, and write one top-level program per analysis. See Starting a project for what each generated file does.

Querying a CSV file¶

Real data usually lives in files or database tables, not inline facts. DuckDB loads a CSV straight into a table, and a Synalog program references that table by its database name (lowercase). By convention the # Tables section maps the raw table to a PascalCase table predicate, and everything else builds on the predicate.

Take a small smoke-test dataset:

test_id,device,status,duration_ms,run_at
1,sensor-a,passed,532,2026-06-10 08:00:12
2,sensor-b,passed,488,2026-06-10 08:00:15
3,gateway,failed,2150,2026-06-10 08:00:21
4,sensor-a,passed,540,2026-06-10 20:00:09
5,sensor-b,flaky,1102,2026-06-10 20:00:14
6,gateway,failed,2310,2026-06-10 20:00:20
7,sensor-a,passed,529,2026-06-11 08:00:11
8,sensor-b,passed,495,2026-06-11 08:00:13
9,gateway,passed,1874,2026-06-11 08:00:19
10,sensor-a,failed,1980,2026-06-11 20:00:10
11,sensor-b,passed,501,2026-06-11 20:00:16
12,gateway,passed,1822,2026-06-11 20:00:22

Load it into DuckDB and run a compiled predicate against the connection:

import duckdb
import synalog

conn = duckdb.connect()
conn.execute(
    "CREATE TABLE smoke_tests AS SELECT * FROM read_csv('smoke_tests.csv')"
)

source = open("loading_csv.l").read()
assert synalog.check(source) == []

sql = synalog.compile(source, "FailuresByDevice")
print(conn.execute(sql).fetchall())
# [('gateway', 2), ('sensor-a', 1)]

The program maps the smoke_tests table once, extracts the device and status concepts, and derives failure counts and daily run totals — note the temporal pipeline (ToString → Substr) on the run_at timestamp:

# run: TestStatus, FailuresByDevice, RunsPerDay
# load: smoke_tests.csv as smoke_tests
@Engine("duckdb");

# Tables

## Smoke test runs loaded from smoke_tests.csv.
SmokeTests(test_id:, device:, status:, duration_ms:, run_at:) :-
  smoke_tests(test_id:, device:, status:, duration_ms:, run_at:);

# Concepts

@OrderBy(Device, "device");
Device(device:) distinct :- SmokeTests(device:);

@OrderBy(TestStatus, "status");
TestStatus(status:) distinct :- SmokeTests(status:);

# Rules

## Number of failed smoke tests per device.
@OrderBy(FailuresByDevice, "failures DESC");
FailuresByDevice(device:, failures? += 1) distinct :-
  Device(device:), SmokeTests(device:, status: "failed");

## Smoke test runs per calendar day.
@OrderBy(RunsPerDay, "day");
RunsPerDay(day:, runs? += 1) distinct :-
  SmokeTests(run_at:), day == Substr(ToString(run_at), 1, 10);

Generated SQL and execution results

$ synalog.check('loading_csv.l')
No errors found.

-- Loaded smoke_tests.csv into DuckDB table smoke_tests (12 rows)

$ synalog.compile('loading_csv.l', 'TestStatus')
-- Initializing DuckDB environment.
create schema if not exists logica_home;
-- Empty record, has to have a field by DuckDB syntax.
drop type if exists logicarecord893574736 cascade; create type logicarecord893574736 as struct(nirvana numeric);
create sequence if not exists eternal_logical_sequence;

SELECT
  smoke_tests.status AS status
FROM
  smoke_tests
GROUP BY smoke_tests.status ORDER BY status;

-- Executed on DuckDB:
| status |
|--------|
| failed |
| flaky  |
| passed |
(3 rows)

$ synalog.compile('loading_csv.l', 'FailuresByDevice')
-- Initializing DuckDB environment.
create schema if not exists logica_home;
-- Empty record, has to have a field by DuckDB syntax.
drop type if exists logicarecord893574736 cascade; create type logicarecord893574736 as struct(nirvana numeric);
create sequence if not exists eternal_logical_sequence;

WITH t_0_Device AS (SELECT
  t_2_smoke_tests.device AS device
FROM
  smoke_tests AS t_2_smoke_tests
GROUP BY t_2_smoke_tests.device ORDER BY device)
SELECT
  Device.device AS device,
  SUM(1) AS failures
FROM
  t_0_Device AS Device, smoke_tests
WHERE
  (E'failed' = smoke_tests.status) AND
  (Device.device = smoke_tests.device)
GROUP BY Device.device ORDER BY failures DESC;

-- Executed on DuckDB:
| device   | failures |
|----------|----------|
| gateway  | 2        |
| sensor-a | 1        |
(2 rows)

$ synalog.compile('loading_csv.l', 'RunsPerDay')
-- Initializing DuckDB environment.
create schema if not exists logica_home;
-- Empty record, has to have a field by DuckDB syntax.
drop type if exists logicarecord893574736 cascade; create type logicarecord893574736 as struct(nirvana numeric);
create sequence if not exists eternal_logical_sequence;

SELECT
  SUBSTR(CAST(smoke_tests.run_at AS TEXT), 1, 10) AS day,
  SUM(1) AS runs
FROM
  smoke_tests
GROUP BY SUBSTR(CAST(smoke_tests.run_at AS TEXT), 1, 10) ORDER BY day;

-- Executed on DuckDB:
| day        | runs |
|------------|------|
| 2026-06-10 | 6    |
| 2026-06-11 | 6    |
(2 rows)

Pagination¶

AI agents have limited context windows, so Synalog supports pagination at compile time via the limit and offset arguments:

sql = synalog.compile(source, "EngineeringTeam", limit=20, offset=40)

The limit combines with the @Limit directive: the effective limit is min(limit, @Limit).

Always set @OrderBy

Pagination is only deterministic when results have a stable sort order. Put an @OrderBy directive on every concept and rule.

Choosing an engine¶

The target SQL dialect is set with the @Engine annotation in the program, or overridden with the engine keyword of the Python API functions:

sql = synalog.compile(source, "EngineeringTeam", engine="psql")

Valid engines: sqlite, duckdb (default), bigquery, psql, presto, trino, databricks. See Supported engines.

Complete example¶

The same program as a standalone .l file, with the SQL it compiles to and the rows it returns on DuckDB:

# run: EngineeringTeam
@Engine("duckdb");

# Tables
Employee(name: "Alice", department: "Engineering", salary: 75000);
Employee(name: "Bob", department: "Marketing", salary: 65000);
Employee(name: "Charlie", department: "Engineering", salary: 80000);

# Rules
@OrderBy(EngineeringTeam, "name");
EngineeringTeam(name:, salary:) :- Employee(name:, department: "Engineering", salary:);

Generated SQL and execution results

$ synalog.check('getting_started.l')
No errors found.

$ synalog.compile('getting_started.l', 'EngineeringTeam')
-- Initializing DuckDB environment.
create schema if not exists logica_home;
-- Empty record, has to have a field by DuckDB syntax.
drop type if exists logicarecord893574736 cascade; create type logicarecord893574736 as struct(nirvana numeric);
create sequence if not exists eternal_logical_sequence;

WITH t_0_Employee AS (SELECT * FROM (

    SELECT
      E'Alice' AS name,
      E'Engineering' AS department,
      75000 AS salary
   UNION ALL

    SELECT
      E'Bob' AS name,
      E'Marketing' AS department,
      65000 AS salary
   UNION ALL

    SELECT
      E'Charlie' AS name,
      E'Engineering' AS department,
      80000 AS salary

) AS UNUSED_TABLE_NAME  )
SELECT
  Employee.name AS name,
  Employee.salary AS salary
FROM
  t_0_Employee AS Employee
WHERE
  (Employee.department = E'Engineering') ORDER BY name;

-- Executed on DuckDB:
| name    | salary |
|---------|--------|
| Alice   | 75000  |
| Charlie | 80000  |
(2 rows)

Next steps¶

Learn the program structure and the syntax.
Model your data as a knowledge graph.
Explore the complete Python API.
Run and explore programs with the CLI interface.