# Observability
Observability is the ability to understand the internal state of a system by examining its outputs. In LM applications, this means tracking every prompt, response, token usage, and decision, enabling you to debug issues, optimize performance, and monitor production systems.
## Why Observability Matters
LM applications are inherently non-deterministic and complex. Without observability, you're flying blind:
```mermaid
graph LR
    subgraph Without Observability
        A[Input] --> B[Black Box]
        B --> C[Output]
        C --> D["Why did it fail?"]
    end
    subgraph With Observability
        E[Input] --> F[Traced Pipeline]
        F --> G[Output]
        H[Traces] --> I[Debug & Optimize]
    end
```
Observability enables:
- Debugging: See exactly what prompts were sent and responses received
- Performance Monitoring: Track latency, token usage, and costs
- Quality Assurance: Identify and fix problematic outputs
- Optimization: Find bottlenecks and improve efficiency
## Enabling Observability
Synalinks uses MLflow for tracing and metrics:
```python
import synalinks

synalinks.enable_observability(
    tracking_uri="http://localhost:5000",  # MLflow server
    experiment_name="my_experiment",       # Group related runs
)
```
Start the MLflow UI:
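```bash
mlflow ui --port 5000
```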
Then open http://localhost:5000 in your browser.
## What Gets Traced
Every operation in your program is automatically traced:
```mermaid
graph TD
    A[Program Call] --> B[Trace]
    B --> C[Module: Input]
    B --> D[Module: Generator]
    D --> E[LLM Call]
    E --> F[Prompt]
    E --> G[Response]
    E --> H[Token Count]
    B --> I[Module: Branch]
    I --> J[Selected Path]
```
### Trace Contents
| Component | What's Captured |
|---|---|
| Program | Input/output DataModels, execution time |
| Module | Each module's inputs, outputs, parameters |
| LLM Call | Full prompt, response, model name, tokens |
| Tool Call | Tool name, arguments, result |
| Training | Metrics, hyperparameters, artifacts |
## Logging Levels
Control log verbosity for debugging:
```python
import synalinks

# Detailed logging - every LLM call logged
synalinks.enable_logging(log_level="debug")

# Standard logging - key events only
synalinks.enable_logging(log_level="info")

# Quiet logging - warnings and errors only
synalinks.enable_logging(log_level="warning")
```
## Debugging Tools

### Program Summary
Inspect your program's structure:
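```python
program.summary()
```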
Output:
```
Program: qa_program
description: 'A `Functional` program is a `Program` defined as a directed graph
of modules.'
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Module (type)               ┃ Output Schema         ┃ Variable #    ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_module (InputModule)  │ Query                 │ 0             │
├─────────────────────────────┼───────────────────────┼───────────────┤
│ generator (Generator)       │ Answer                │ 2             │
└─────────────────────────────┴───────────────────────┴───────────────┘
Total variables: 2
Trainable variables: 2
```
### Program Visualization
Generate a visual graph of your program:
```python
synalinks.utils.plot_program(
    program,
    to_folder="output",
    show_module_names=True,
    show_trainable=True,
)
```
This creates a PNG file showing the computation graph.
### Trainable Variables
Inspect what can be optimized:
```python
for var in program.trainable_variables:
    print(f"Variable: {var.name}")
    print(f"  Value: {var.value}")
```
## MLflow Integration

### Viewing Traces
Navigate to the MLflow UI at http://localhost:5000:
1. Select your experiment from the sidebar
2. Click on a run to see its traces
3. Expand traces to see individual module calls
4. Click on LLM calls to see full prompts/responses
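You can also query traces programmatically. Below is a minimal sketch, assuming MLflow 2.14+ (which introduced the tracing query API) and the `my_experiment` experiment from earlier; the exact DataFrame columns vary by MLflow version:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Look up the experiment that enable_observability() created
experiment = mlflow.get_experiment_by_name("my_experiment")

# Fetch recent traces as a pandas DataFrame (MLflow 2.14+)
traces = mlflow.search_traces(
    experiment_ids=[experiment.experiment_id],
    max_results=20,
)
print(traces[["request", "response", "execution_time_ms"]])
```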
### Trace Structure
Each trace shows the hierarchical execution:
```
Program Call: qa_program
├── Module: Input
│   ├── Input: {"query": "What is Python?"}
│   └── Duration: 0.001s
├── Module: Generator
│   ├── LLM Call: openai/gpt-4.1-mini
│   │   ├── Prompt: [full text]
│   │   ├── Response: [full text]
│   │   ├── Input tokens: 150
│   │   └── Output tokens: 50
│   └── Duration: 1.2s
└── Output: {"answer": "Python is..."}
```
### Training Metrics
During training, MLflow captures:
- Per-epoch metrics (reward, accuracy)
- Validation metrics
- Hyperparameters (optimizer settings, epochs)
- Artifacts (saved program checkpoints)
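A minimal training sketch, assuming Synalinks' Keras-style `compile()`/`fit()` workflow; the reward and optimizer shown here, and the `x_train`/`y_train` datasets, are placeholders to swap for your own:

```python
# Assumed reward/optimizer - substitute whatever your program uses
program.compile(
    reward=synalinks.rewards.ExactMatch(in_mask=["answer"]),
    optimizer=synalinks.optimizers.RandomFewShot(),
)

# Runs inside an async function; per-epoch rewards, hyperparameters,
# and checkpoints are logged to the active MLflow experiment
history = await program.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_val, y_val),
    epochs=10,
)
```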
## Complete Example
```python
import asyncio

from dotenv import load_dotenv

import synalinks


class Query(synalinks.DataModel):
    """User question."""

    query: str = synalinks.Field(description="User question")


class Answer(synalinks.DataModel):
    """Answer with reasoning."""

    thinking: str = synalinks.Field(description="Step by step thinking")
    answer: str = synalinks.Field(description="The final answer")


async def main():
    load_dotenv()
    synalinks.clear_session()

    # Enable observability
    synalinks.enable_observability(
        tracking_uri="http://localhost:5000",
        experiment_name="observability_demo",
    )

    # Enable logging
    synalinks.enable_logging(log_level="info")

    lm = synalinks.LanguageModel(model="openai/gpt-4.1-mini")

    # Create program
    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.Generator(
        data_model=Answer,
        language_model=lm,
    )(inputs)
    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="traced_qa",
    )

    # Print summary
    program.summary()

    # Run program - traces are automatically captured
    result = await program(Query(query="What is Python?"))
    print(f"Answer: {result['answer']}")

    # Visualize program
    synalinks.utils.plot_program(
        program,
        show_module_names=True,
    )

    # Inspect trainable variables
    print("\nTrainable variables:")
    for var in program.trainable_variables:
        print(f"  - {var.name}")


if __name__ == "__main__":
    asyncio.run(main())
```
## Production Monitoring
For production deployments, set up persistent MLflow tracking:
```python
import synalinks

# Use a remote MLflow server
synalinks.enable_observability(
    tracking_uri="http://mlflow.your-domain.com:5000",
    experiment_name="production",
)
```
Monitor key metrics:
- Latency: Time per request
- Token Usage: Cost per request
- Error Rate: Failed requests
- Quality Scores: If using LMAsJudge or similar
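These metrics can be pulled out of MLflow for dashboards or alerting. A minimal sketch using the standard MLflow client API; the experiment name matches the setup above, and which `metrics.*` columns appear depends on what your runs log:

```python
import mlflow

mlflow.set_tracking_uri("http://mlflow.your-domain.com:5000")

# Pull recent runs from the production experiment as a DataFrame
runs = mlflow.search_runs(experiment_names=["production"], max_results=100)

# Summarize whatever metrics the runs logged (column names vary)
metric_cols = [col for col in runs.columns if col.startswith("metrics.")]
print(runs[metric_cols].describe())
```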
## Key Takeaways
- **MLflow Integration**: Synalinks uses MLflow for comprehensive tracing and metrics. All traces are captured automatically.
- **`enable_observability()`**: Call this at startup with your MLflow server URI and experiment name.
- **Trace Hierarchy**: Traces show the full execution path from the program down to individual LLM calls.
- **Debugging Tools**: Use `program.summary()`, `plot_program()`, and `trainable_variables` for inspection.
- **Logging Levels**: Control verbosity with `enable_logging()`; use `"debug"` for development, `"warning"` for production.
- **Production Ready**: Point to a remote MLflow server for production monitoring and alerting.