FAQ
General questions
- What is the difference between DSPy, AdalFlow and Synalinks?
- Why do you focus on in-context techniques?
- I already use structured output, why would I use Synalinks?
- Can Synalinks be used for non-LMs applications?
- What are my options for saving programs?
- How to do hyperparameter tuning with Synalinks?
- Where is the Synalinks configuration file stored?
- How should I cite Synalinks?
- Do you provide help or support?
Training related questions
- What do "sample", "batch", and "epoch" mean?
- What's the difference between the `training` argument in `call()` and the `trainable` attribute?
- In `fit()`, how is the validation split computed?
- In `fit()`, is the data shuffled during training?
- What's the difference between `Program` methods `predict()` and `__call__()`?
What is the difference between DSPy, AdalFlow and Synalinks?
Unlike DSPy and AdalFlow, Synalinks is inspired by Keras rather than PyTorch, because ease of use is an important factor for us. The Functional API inspired by Keras makes it easy and natural to describe any workflow. We additionally provide more LLMOps features than DSPy and AdalFlow, and have a strong focus on neuro-symbolic systems.
Why do you focus on in-context techniques?
In the real world, most problems that companies face are not labelled in public datasets for ML engineers to train their networks against. This means there is a big discrepancy between the results announced on public benchmarks and real-world problems, which makes adoption difficult when companies face the reality and complexity of the real world.
Training a whole Language Model (LM) from scratch is out of the reach of most companies, so adapting them to real-world tasks is essential.
LMs have the capability to leverage their prompt to mimic the examples given, but this means that one has to update the examples each time the pipeline changes as you experiment. This is cumbersome, and even then, there is no guarantee that the examples you provide yield the best results.
Selecting the best examples and hints to give to the LMs requires a system like Synalinks that automates their generation and selection.
I already use structured output, why would I use Synalinks?
While structured output ensures a correct, ready-to-parse format, it doesn't guarantee the content of the LM's answers. Synalinks uses constrained structured output in conjunction with in-context techniques to ensure both format and content correctness.
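As a minimal sketch (reusing the `DataModel`, `LanguageModel`, and `Generator` classes from the training example later in this FAQ; the field names and model choice are illustrative), the output schema constrains the format while the module's trainable variables (examples and hints) are optimized for content:

```python
import synalinks

# The JSON schema that the LM output is constrained to (illustrative fields)
class ThinkingWithAnswer(synalinks.DataModel):
    thinking: str
    answer: str

language_model = synalinks.LanguageModel("ollama_chat/deepseek-r1")

# The Generator enforces the schema (format correctness), while its
# trainable variables are optimized during training (content correctness)
generator = synalinks.Generator(
    data_model=ThinkingWithAnswer,
    language_model=language_model,
)
```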
Can Synalinks be used for non-LMs applications?
While Synalinks provides everything needed to work with LMs, we emphasize that you can create modules (or entire pipelines) that don't use them. In fact, many neuro-symbolic systems combine LM modules with non-LM modules. The Synalinks backend can suit any algorithm that works with any kind of data structure, as long as it is formalized in JSON.
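As a purely hypothetical sketch (assuming a Keras-style `synalinks.Module` base class with a `call()` method, as referenced in the training questions below; the class name and exact subclassing API are assumptions, not taken from the API reference), a non-LM module could look like this:

```python
import synalinks

# Hypothetical: a symbolic, non-LM module that post-processes a
# JSON-formalized data model (base class and call() signature are
# assumptions for illustration only)
class UppercaseAnswer(synalinks.Module):
    def call(self, inputs):
        # Pure symbolic transformation, no LM involved
        inputs.answer = inputs.answer.upper()
        return inputs
```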
What are my options for saving programs?
1. Whole-program saving (configuration + variables)
Whole program saving means creating a file that will contain:
- The architecture of the program, allowing you to re-create the program.
- The variables of the program
- The training configuration (reward, optimizer, metrics)
- The state of the optimizer, allowing you to resume the training exactly where you left off.
The default and recommended way to save the whole program is to do:
program.save("my_program.json")
After saving a program you can re-instantiate it via:
program = synalinks.Program.load("my_program.json")
2. Variables-only saving
If you need to save only the variables of a program, you can do so using:
program.save_variables("my_program.variables.json")
You can then load the variables into a program with the same architecture:
program.load_variables("my_program.variables.json")
Note: All programs and variables are saved in JSON format; we removed the pickle format due to obvious security concerns.
How to do hyperparameter tuning with Synalinks?
As of today, there is no way to perform automatic hyperparameter tuning with Synalinks; we might consider it in the future.
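In the meantime, you can tune hyperparameters manually. A hedged sketch, assuming a Keras-style `compile()`/`fit()`/`evaluate()` workflow (the `build_program()` factory and the `evaluate()` return value are illustrative assumptions, not taken from the API reference):

```python
# Hypothetical manual sweep: train one program per candidate setting
# and keep the one with the best held-out reward
best_reward, best_batch_size = None, None
for batch_size in (16, 32, 64):
    program = build_program()  # hypothetical factory recreating the pipeline
    program.compile(...)       # same reward/optimizer/metrics for every run
    program.fit(x_train, y_train, batch_size=batch_size, epochs=3)
    reward = program.evaluate(x_val, y_val)  # assumed to return a scalar reward
    if best_reward is None or reward > best_reward:
        best_reward, best_batch_size = reward, batch_size
```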
Where is the Synalinks configuration file stored?
The default directory where all Synalinks data is stored is `$HOME/.synalinks/`.
Note that Windows users should replace `$HOME` with `%USERPROFILE%`.
In case Synalinks cannot create the above directory (e.g. due to permission issues), `/tmp/.synalinks/` is used as a backup.
The Synalinks configuration file is a JSON file stored at `$HOME/.synalinks/synalinks.json`.
Likewise, cached dataset files, such as those downloaded with `get_file()`, are stored by default in `$HOME/.synalinks/datasets/`.
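If you want to inspect your current configuration programmatically, here is a minimal sketch using only the Python standard library (the path simply follows the defaults described above):

```python
import json
import os

# Resolve the default configuration path: $HOME/.synalinks/synalinks.json
config_path = os.path.join(os.path.expanduser("~"), ".synalinks", "synalinks.json")

if os.path.exists(config_path):
    with open(config_path) as f:
        print(json.dumps(json.load(f), indent=2))
```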
How should I cite Synalinks?
Please cite Synalinks if it is useful in your research. Here is the BibTeX entry to use:
@misc{sallami2025synalinks,
title={Synalinks},
author={Sallami, Yoan and Chollet, Fran\c{c}ois},
year={2025},
howpublished={\url{https://github.com/SynaLinks/Synalinks}},
}
Do you provide help or support?
We provide consulting, development, and technical support for companies that want to implement neuro-symbolic systems. Using a framework is one thing; having a complete view of possible neuro-symbolic applications and the knowledge to create such complex systems is another. If you can't afford our services, you can find help in our public Discord channel.
What do "sample", "batch", and "epoch" mean?
- Sample: A sample is one element of a dataset. For example, one DataModel is one sample.
- Batch: A batch is a set of N samples. The samples in a batch are processed independently, in parallel. During training, a batch results in only one program update. A batch approximates the input distribution better than a single input; the larger the batch, the better the approximation. However, a larger batch will take longer to process and still results in only one update.
- Epoch: An epoch is an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation. When using `validation_split` or `validation_data` with the `fit()` method of Synalinks programs, evaluation will be run at the end of every epoch (see the sketch below).
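For instance, a hedged sketch assuming a Keras-style `fit()` signature (with `x_train`/`y_train` as placeholder data; the argument names follow the options mentioned above):

```python
# 5 passes over the dataset, one program update per batch of 32 samples,
# with evaluation on the held-out last 10% at the end of every epoch
program.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.1,
)
```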
What's the difference between the `training` argument in `call()` and the `trainable` attribute?
`training` is a boolean argument in `call()` that determines whether the call should be run in inference mode or training mode. For example, in training mode, a `Generator` module saves each LM prediction for later backpropagation. In inference mode, the `Generator` doesn't save the predictions.
`trainable` is a boolean module attribute that determines whether the trainable variables of the module should be updated to maximize the reward during training. If `module.trainable` is set to `False`, then `module.trainable_variables` will always be an empty list.
Example:
import synalinks
class ThinkingWithAnswer(synalinks.DataModel):
thinking: str
answer: str
language_model = synalinks.LanguageModel("ollama_chat/deepseek-r1")
program = synalinks.Sequential(
[
synalinks.Generator(
data_model=ThinkingWithAnswer,
language_model=language_model,
),
synalinks.Generator(
data_model=ThinkingWithAnswer,
language_model=language_model,
),
]
)
program.modules[0].trainable = False # Freeze the first generator
assert program.modules[0].trainable_variables == []
program.compile(...)
program.fit(...) # Train only the second generator
In essence, "inference mode vs training mode" and "module variable trainability" are two very different concepts.
In `fit()`, how is the validation split computed?
If you set the `validation_split` argument in `program.fit()` to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn't shuffled before extracting the validation split, so the validation set is literally just the last x% of samples in the input you passed.
The same validation set is used for all epochs (within the same call to `fit()`).
Note that the `validation_split` option is only available if your data is passed as NumPy arrays.
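Concretely, the split boundary is a simple index computation. An illustrative sketch (not the library's internal code):

```python
import numpy as np

x = np.arange(100)  # 100 samples
validation_split = 0.25

# The last 25% of samples become the validation set, with no shuffling
split_at = int(len(x) * (1 - validation_split))
x_train, x_val = x[:split_at], x[split_at:]
assert len(x_val) == 25
```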
In `fit()`, is the data shuffled during training?
If you pass your data as NumPy arrays and if the `shuffle` argument in `program.fit()` is set to `True` (which is the default), the training data will be globally randomly shuffled at each epoch.
Validation data is never shuffled.
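If you need in-order training, for example when debugging, you can turn shuffling off. A minimal sketch, assuming the Keras-style `fit()` signature shown above:

```python
# Process samples in dataset order at every epoch instead of shuffling
program.fit(x_train, y_train, shuffle=False)
```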
What's the difference between `Program` methods `predict()` and `__call__()`?
`predict()` loops over the data in batches (you can specify the batch size via `predict(x, batch_size=64)`) and returns a NumPy array of predictions. `program(x)`, on the other hand, performs a single prediction, and is used to create APIs that process a single user request at a time.
This means that `predict()` calls can perform prediction on an entire dataset, while `program(x)` performs a single prediction.
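Side by side (a hedged sketch; `x_batch` and `x_single` are placeholder inputs):

```python
# Batched inference over a whole dataset, e.g. for offline evaluation
predictions = program.predict(x_batch, batch_size=64)

# Single prediction, e.g. inside an API endpoint handling one request
result = program(x_single)
```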