# Training
Training in Synalinks is fundamentally different from traditional machine learning. Instead of updating model weights through backpropagation, Synalinks uses in-context learning optimization - improving your programs by optimizing the prompts, instructions, and examples that guide the language model.
## The Philosophy of In-Context Learning
Traditional ML updates weights; Synalinks updates context:
```mermaid
graph LR
    subgraph "Traditional ML"
        A[Data] --> B[Backprop]
        B --> C[Update Weights]
        C --> D[Better Model]
    end
    subgraph "Synalinks Training"
        E[Data] --> F[Evaluate]
        F --> G[Update Context]
        G --> H[Better Prompts/Examples]
    end
```
This approach has key advantages:
- No Gradient Computation: Works with any LLM API
- Interpretable: You can read and understand what was learned
- Modular: Each module learns independently
- Fast: No heavy computation - just prompt optimization
## What Gets Optimized
Each Generator module has two trainable variables:
```mermaid
graph TD
    A[Generator] --> B[instruction_variable]
    A --> C[examples_variable]
    B --> D["System prompt optimization"]
    C --> E["Few-shot example selection"]
```
- `instruction_variable`: The system prompt or instruction prefix
- `examples_variable`: Few-shot examples injected into the prompt
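Because these variables are plain text rather than weight tensors, you can read them directly after training. The sketch below assumes a Keras-style `trainable_variables` collection on the program; the exact attribute name is an assumption, so check the API reference.

```python
# Hedged sketch: assumes a Keras-style `trainable_variables` collection
# (the attribute name is an assumption, not confirmed by this guide).
for variable in program.trainable_variables:
    # Each variable holds readable context: the instruction text or the
    # current set of few-shot examples for one Generator.
    print(variable)
```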
## The Training Loop
Training follows a familiar pattern:
```python
import synalinks

# 1. Create your program
program = synalinks.Program(inputs=inputs, outputs=outputs)

# 2. Compile with optimizer and reward
program.compile(
    optimizer=synalinks.optimizers.RandomFewShot(nb_max_examples=3),
    reward=synalinks.ExactMatch(key="answer"),
)

# 3. Train
history = await program.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_test, y_test),
    epochs=5,
    verbose=1,
)

# 4. Save the trained program
program.save("trained_program.json")
```
## Training Data Format

Training data consists of separate NumPy arrays for inputs (`x`) and expected outputs (`y`):
```python
import numpy as np

x_train = np.array(
    [
        InputModel(field="value"),
        InputModel(field="value2"),
        # ... more examples
    ],
    dtype="object",
)

y_train = np.array(
    [
        OutputModel(result="expected"),
        OutputModel(result="expected2"),
        # ... more examples
    ],
    dtype="object",
)
```
Both arrays must contain DataModel instances matching your program's input and output schemas.
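For reference, here is a minimal sketch of what the `InputModel` and `OutputModel` classes used above could look like. The field names `field` and `result` simply mirror the arrays in the snippet and are otherwise arbitrary.

```python
import synalinks


class InputModel(synalinks.DataModel):
    """Input schema matching the program's input."""

    field: str = synalinks.Field(description="An example input field")


class OutputModel(synalinks.DataModel):
    """Output schema matching the program's expected output."""

    result: str = synalinks.Field(description="The expected result")
```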
## Optimizers
Optimizers determine how trainable variables are updated:
### RandomFewShot

Randomly samples up to `nb_max_examples` examples from the training data and injects them into the prompt as few-shot demonstrations:
- Simple and effective
- Good baseline for most tasks
- Low computational overhead
### OMEGA (Optimizing Memory with Evolution and Gradient Alignment)

An advanced evolutionary optimizer (sketched after the list below) that:
- Maintains a population of prompt variants
- Uses fitness-based selection
- Applies crossover and mutation
- Converges to high-performing prompts
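A hedged sketch of compiling with OMEGA follows; the constructor arguments shown (a `language_model` to drive the evolutionary edits) are assumptions rather than confirmed parameters, so consult the optimizer's API reference.

```python
# Hedged sketch: OMEGA's constructor arguments are assumptions here;
# check synalinks.optimizers for the actual signature.
program.compile(
    optimizer=synalinks.optimizers.OMEGA(
        language_model=lm,  # assumption: an LM proposes prompt mutations/crossovers
    ),
    reward=synalinks.ExactMatch(key="answer"),
)
```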
## Rewards
Rewards measure how well outputs match expected values:
### ExactMatch

Returns 1.0 if the selected field values match exactly, 0.0 otherwise.
Best for:
- Classification tasks
- Factual QA with known answers
- Tasks where partial credit doesn't make sense
### CosineSimilarity

Uses embedding similarity between outputs and expected values (see the sketch after the list below).
Best for:
- Open-ended generation
- Tasks where semantic similarity matters more than exact wording
- Answers with multiple valid phrasings
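A hedged compile sketch is shown below; the `EmbeddingModel` wrapper and the `key`/`embedding_model` arguments are assumptions modeled on the APIs shown elsewhere in this guide, so verify them against the rewards reference.

```python
# Hedged sketch: the CosineSimilarity arguments below are assumptions
# modeled on ExactMatch(key=...) and LanguageModel(model=...).
embedding_model = synalinks.EmbeddingModel(model="openai/text-embedding-3-small")

program.compile(
    optimizer=synalinks.optimizers.RandomFewShot(nb_max_examples=3),
    reward=synalinks.CosineSimilarity(
        key="answer",                     # assumption: which field to compare
        embedding_model=embedding_model,  # assumption: embeddings used for scoring
    ),
)
```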
### LMAsJudge

Uses another LLM to evaluate output quality:
```python
reward = synalinks.LMAsJudge(
    language_model=judge_model,
    instructions="accuracy, helpfulness, clarity",
)
```
Best for:
- Complex evaluation criteria
- Subjective quality assessment
- When exact matching is too strict
## Metrics
Track performance during training:
```python
program.compile(
    optimizer=optimizer,
    reward=reward,
    metrics=[
        synalinks.metrics.MeanMetricWrapper(fn=reward, name="mean_reward"),
    ],
)
```
The training history contains all tracked metrics:
```python
history = await program.fit(
    x=x_train, y=y_train, validation_data=(x_test, y_test), epochs=5
)
print(list(history.history.keys()))
# ['mean_reward', 'val_mean_reward']
```
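Since `history.history` maps each metric name to a list of per-epoch values, you can pull out summary numbers directly:

```python
final_reward = history.history["mean_reward"][-1]           # last training epoch
best_val_reward = max(history.history["val_mean_reward"])   # best validation epoch
print(f"final reward: {final_reward:.3f}, best validation: {best_val_reward:.3f}")
```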
## Complete Example
```python
import asyncio
import os

import numpy as np
from dotenv import load_dotenv

import synalinks


# =============================================================================
# Data Models
# =============================================================================
class MathProblem(synalinks.DataModel):
    """A math problem."""

    problem: str = synalinks.Field(description="The math problem to solve")


class MathAnswer(synalinks.DataModel):
    """A math answer."""

    thinking: str = synalinks.Field(description="Step by step calculation")
    answer: str = synalinks.Field(description="The numerical answer only")


# =============================================================================
# Main
# =============================================================================
async def main():
    load_dotenv()
    synalinks.clear_session()

    lm = synalinks.LanguageModel(model="openai/gpt-4.1-mini")

    # -------------------------------------------------------------------------
    # Prepare Training Data
    # -------------------------------------------------------------------------
    train_data = [
        (MathProblem(problem="2 + 3"), MathAnswer(thinking="2 + 3 = 5", answer="5")),
        (MathProblem(problem="5 * 4"), MathAnswer(thinking="5 * 4 = 20", answer="20")),
        (MathProblem(problem="10 - 3"), MathAnswer(thinking="10 - 3 = 7", answer="7")),
        (MathProblem(problem="8 / 2"), MathAnswer(thinking="8 / 2 = 4", answer="4")),
        (MathProblem(problem="3 + 3 + 3"), MathAnswer(thinking="3 + 3 + 3 = 9", answer="9")),
        (MathProblem(problem="7 * 2"), MathAnswer(thinking="7 * 2 = 14", answer="14")),
    ]
    test_data = [
        (MathProblem(problem="4 + 5"), MathAnswer(thinking="4 + 5 = 9", answer="9")),
        (MathProblem(problem="6 * 3"), MathAnswer(thinking="6 * 3 = 18", answer="18")),
    ]

    # Split the (input, expected output) pairs into the separate x/y arrays
    # described in "Training Data Format" above
    x_train = np.array([x for x, _ in train_data], dtype="object")
    y_train = np.array([y for _, y in train_data], dtype="object")
    x_test = np.array([x for x, _ in test_data], dtype="object")
    y_test = np.array([y for _, y in test_data], dtype="object")

    # -------------------------------------------------------------------------
    # Create Program
    # -------------------------------------------------------------------------
    inputs = synalinks.Input(data_model=MathProblem)
    outputs = await synalinks.Generator(
        data_model=MathAnswer,
        language_model=lm,
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="math_solver",
    )

    # -------------------------------------------------------------------------
    # Compile with Optimizer and Reward
    # -------------------------------------------------------------------------
    reward = synalinks.ExactMatch(key="answer")

    program.compile(
        optimizer=synalinks.optimizers.RandomFewShot(nb_max_examples=3),
        reward=reward,
        metrics=[
            synalinks.metrics.MeanMetricWrapper(fn=reward, name="mean_reward"),
        ],
    )

    # -------------------------------------------------------------------------
    # Train
    # -------------------------------------------------------------------------
    history = await program.fit(
        x=x_train,
        y=y_train,
        epochs=2,
        validation_data=(x_test, y_test),
        verbose=1,
    )
    print(f"Training history: {list(history.history.keys())}")

    # -------------------------------------------------------------------------
    # Test Trained Program
    # -------------------------------------------------------------------------
    result = await program(MathProblem(problem="9 + 1"))
    print(f"9 + 1 = {result['answer']}")

    # -------------------------------------------------------------------------
    # Save and Load
    # -------------------------------------------------------------------------
    program.save("trained_math.json")

    loaded = synalinks.Program.load("trained_math.json")
    result = await loaded(MathProblem(problem="100 / 10"))
    print(f"100 / 10 = {result['answer']}")

    # Clean up the checkpoint file created by this example
    if os.path.exists("trained_math.json"):
        os.remove("trained_math.json")


if __name__ == "__main__":
    asyncio.run(main())
```
## Best Practices

### Start Simple

Begin with `RandomFewShot` and `ExactMatch`:
```python
program.compile(
    optimizer=synalinks.optimizers.RandomFewShot(nb_max_examples=3),
    reward=synalinks.ExactMatch(key="answer"),
)
```
Only move to more complex optimizers/rewards if needed.
### Use Quality Training Data
- Ensure examples are correct and representative
- Include edge cases and variations
- Balance difficulty levels
### Monitor Validation Metrics
Always use validation data to detect overfitting:
```python
history = await program.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_val, y_val),  # Always include this
    epochs=5,
)
```
### Save Checkpoints

Save your program after training to preserve its learned state:
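```python
# Save the optimized prompts and few-shot examples with the program definition
program.save("trained_program.json")

# Later (e.g. at deployment time), restore the trained state
loaded = synalinks.Program.load("trained_program.json")
```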
## Key Takeaways

1. **In-Context Learning**: Synalinks optimizes prompts and examples, not model weights. This works with any LLM API.
2. **Trainable Variables**: Each `Generator` has instruction and example variables that get optimized during training.
3. **`compile()` + `fit()`**: Familiar Keras-like API for configuring and running the training loop.
4. **Optimizers**: Start with `RandomFewShot`, move to `OMEGA` for more sophisticated optimization.
5. **Rewards**: Choose based on your task - `ExactMatch` for exact answers, `CosineSimilarity` for semantic similarity, `LMAsJudge` for complex criteria.
6. **Save Trained State**: Use `program.save()` to preserve learned prompts and examples for deployment.