# Synalinks OSS

> Keras-based LM framework for neuro-symbolic applications

Keras-based LM framework for neuro-symbolic applications and In-Context learning

# Usage documentation

# Introduction

______________________________________________________________________

## What is Synalinks?

Synalinks is an open-source framework that makes it easy to create, evaluate, train, and deploy industry-standard Language Model (LM) applications. Synalinks follows the principle of *progressive disclosure of complexity*: simple workflows should be quick and easy, while arbitrarily advanced ones should be possible via a clear path that builds upon what you've already learned.

Synalinks is an *adaptation of Keras 3* focused on neuro-symbolic systems and in-context reinforcement learning, an ensemble of techniques that enhance an LM's predictions and accuracy without changing the weights of the model.

The goal of Synalinks is to facilitate the rapid setup of simple applications while providing the flexibility for researchers and advanced users to develop sophisticated systems.

______________________________________________________________________

Info

Too busy to read the documentation? Give the [llms.txt](https://synalinks.github.io/synalinks/llms.txt) or [llms-full.txt](https://synalinks.github.io/synalinks/llms-full.txt) to your favorite LMs or AI coding tools.

Or better, use [Synalinks Claude Skills](https://github.com/SynaLinks/synalinks-skills) with Claude Code to use Synalinks right away!

## Who is Synalinks for?

Synalinks is designed for a diverse range of users, from professionals and AI researchers to students, independent developers, and hobbyists. It is suitable for anyone who wants to learn about AI by building/composing blocks or to build solid foundations for enterprise-grade products.
While a background in Machine Learning and Deep Learning can be advantageous, since Synalinks leverages design patterns from Keras (one of the most user-friendly and popular Deep Learning frameworks), it is not a prerequisite. Synalinks is designed to be accessible to anyone with programming skills in Python, making it a versatile and inclusive platform for AI development.

______________________________________________________________________

## Why use Synalinks?

Developing a successful LM application in a professional context, beyond stateless chatbots, is difficult and typically includes:

- **Building optimized prompts with examples/instructions at each step**: Synalinks uses advanced In-Context Reinforcement Learning techniques to optimize each prompt.
- **Pipelines that change over time**: Easily edit your pipelines, re-run your training, and you're good to go.
- **Ensuring the correctness of the LM's output**: Synalinks combines constrained structured output with In-Context RL to ensure both format and content correctness.
- **Async Optimization**: Synalinks automatically optimizes your pipelines by detecting parallel processes.
- **Assessing the performance of your application**: Synalinks provides built-in metrics and rewards to evaluate your workflows.
- **Configuring Language & Embedding Models**: Seamlessly integrate multiple LM providers like Ollama, Anthropic, Mistral or Groq.
- **Documenting your ML workflows**: Plot your workflows, training history, and evaluations; document everything.
- **Versioning the prompts/pipelines**: Each program is serializable into JSON so you can version it with git.
- **Deploying REST APIs**: Compatible out-of-the-box with FastAPI so your Data Scientists and Web Developers can stop tearing each other apart.

Synalinks can help you simplify these tasks by leveraging decade-old practices in Deep Learning frameworks.
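
To make the "Async Optimization" item above more concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not Synalinks code and makes no claim about how Synalinks detects parallel branches internally; `fake_lm_call` and `run_branches` are hypothetical names introduced for illustration, and the sleep simply stands in for a slow LM request.

```python
import asyncio
import time


async def fake_lm_call(name: str, delay: float) -> str:
    """Hypothetical stand-in for a language model call (not a Synalinks API)."""
    await asyncio.sleep(delay)  # simulates network / generation latency
    return f"{name}: done"


async def run_branches() -> list[str]:
    # The two branches do not depend on each other, so they can be awaited
    # concurrently instead of one after the other.
    return await asyncio.gather(
        fake_lm_call("branch_a", 0.1),
        fake_lm_call("branch_b", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(run_branches())
elapsed = time.perf_counter() - start

# `asyncio.gather` returns results in the order the awaitables were passed.
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Because both calls are awaited together, the total wall-clock time is close to that of the slowest branch rather than the sum of both.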
We provide a comprehensive suite of tools and features designed to streamline the development process, making it easier to create, evaluate, train, document and deploy robust neuro-symbolic LM applications.

______________________________________________________________________

Source code in `synalinks/src/trainers/trainer.py`

````
class Trainer:
    def __init__(self):
        self._lock = False
        self._run_eagerly = False
        self.compiled = False
        self.reward = None
        self.steps_per_execution = 1
        # Can be set by callbacks in on_train_begin
        self._initial_epoch = None
        self._compute_reward_has_training_arg = (
            "training" in inspect.signature(self.compute_reward).parameters
        )
        # Placeholders used in `compile`
        self._optimizer = None
        self._compile_reward = None
        self._compile_metrics = None
        self._reward_tracker = None

    @tracking.no_automatic_dependency_tracking
    def compile(
        self,
        optimizer=None,
        reward=None,
        reward_weights=None,
        metrics=None,
        run_eagerly=False,
        steps_per_execution=1,
    ):
        """Configures the program for training.

        Example:

        ```python
        program.compile(
            optimizer=synalinks.optimizers.RandomFewShot(),
            reward=synalinks.rewards.ExactMatch(),
            metrics=[
                synalinks.metrics.MeanMetricWrapper(synalinks.rewards.exact_match),
            ],
        )
        ```

        Args:
            optimizer (Optimizer): Optimizer instance. See `synalinks.optimizers`.
            reward (Reward): Reward function. A `synalinks.rewards.Reward`
                instance. See `synalinks.rewards`. A reward function is
                any callable with the signature `reward = fn(y_true, y_pred)`,
                where `y_true` are the ground truth values, and `y_pred`
                are the program's predictions.
                `y_true` should be a list of batch size length `[d0, .. dN]`.
                `y_pred` should be a list of batch size length `[d0, .. dN]`.
                The reward function should return a float.
            reward_weights (list): Optional list specifying scalar coefficients
                (Python floats) to weight the reward contributions of
                different program outputs.
                The reward value that will be maximized by the program
                will then be the *weighted sum* of all individual rewards,
                weighted by the `reward_weights` coefficients. It is expected
                to have a 1:1 mapping to the program's outputs.
            metrics (list): List of metrics to be evaluated by the program during
                training and testing. Each of them is a `synalinks.metrics.Metric`
                instance. See `synalinks.metrics`. A function is any callable
                with the signature `result = fn(y_true, y_pred)`.
            run_eagerly (bool): If `True`, this program's forward pass
                will never be compiled. It is recommended to leave this
                as `False` when training (for best performance),
                and to set it to `True` when debugging.
            steps_per_execution (int): The number of batches to run
                during a single compiled function call. Running multiple
                batches inside a single compiled function call can
                greatly improve performance on TPUs or small programs with a large
                Python overhead. At most, one full epoch will be run each
                execution. If a number larger than the size of the epoch is
                passed, the execution will be truncated to the size of the
                epoch. Note that if `steps_per_execution` is set to `N`,
                `Callback.on_batch_begin` and `Callback.on_batch_end` methods
                will only be called every `N` batches (i.e. before/after
                each compiled function execution).
        """
        self._clear_previous_trainer_metrics()

        # Resolve string/dict identifiers (Keras-style) into instances.
        # Reward and metrics flow through `CompileReward`/`CompileMetrics`
        # which already call `rewards.get` / `metrics.get`.
        self._optimizer = optimizers_module.get(optimizer)
        if self._optimizer is not None:
            self._optimizer.set_program(self)

        if hasattr(self, "output_names"):
            output_names = self.output_names
        else:
            output_names = None
        if reward is not None:
            reward = rewards_module.get(reward)
            reduction = getattr(reward, "reduction", "mean")
            self._compile_reward = CompileReward(
                reward,
                reward_weights,
                reduction=reduction,
                output_names=output_names,
            )
            self.reward = reward
        if metrics is not None:
            self._compile_metrics = CompileMetrics(metrics, output_names=output_names)
            # Operational metrics (e.g. TotalTokens, Throughput) accept
            # `language_model=None`; bind them to every LM reachable from
            # the program so counters aggregate automatically.
            for m in tree.flatten(metrics):
                if hasattr(m, "bind_program"):
                    m.bind_program(self)
        self.run_eagerly = run_eagerly
        self.stop_training = False
        self.compiled = True
        self._reward_tracker = metrics_module.Mean(name="reward")
        self.steps_per_execution = steps_per_execution

        self._compile_config = serialization_lib.SerializableDict(
            optimizer=optimizer,
            reward=reward,
            reward_weights=reward_weights,
            metrics=metrics,
            run_eagerly=run_eagerly,
            steps_per_execution=steps_per_execution,
        )

    @property
    def optimizer(self):
        return self._optimizer

    @property
    def metrics(self):
        # Order: reward tracker, individual reward trackers, compiled metrics,
        # custom metrics, submodule metrics.
        metrics = []
        if self.compiled:
            if self._reward_tracker is not None:
                metrics.append(self._reward_tracker)
            if self._compile_metrics is not None:
                metrics.append(self._compile_metrics)
            if self._compile_reward is not None:
                metrics.extend(self._compile_reward.metrics)
        metrics.extend(self._metrics)
        for module in self._flatten_modules(include_self=False):
            if isinstance(module, Trainer):
                # All Trainer-related metrics in submodules should be ignored
                # because a new Trainer has been instantiated.
continue metrics.extend(module.metrics) return metrics @property def metrics_names(self): return [m.name for m in self.metrics] def reset_metrics(self): for m in self.metrics: m.reset_state() def _get_own_metrics(self): metrics = [] if self._reward_tracker is not None: metrics.append(self._reward_tracker) if self._compile_metrics is not None: metrics.append(self._compile_metrics) if self._compile_reward is not None: metrics.extend(self._compile_reward.metrics) metrics.extend(self._metrics) return metrics def _clear_previous_trainer_metrics(self): for module in self._flatten_modules(include_self=False): if not isinstance(module, Trainer): continue # A submodule might be a Trainer. In that case, we need to clear # the Trainer-related metrics, as they are not usable when a # new Trainer is instantiated. for m in self._get_own_metrics(): module._tracker.untrack(m) module._reward_tracker = None module._compile_metrics = None if module._compile_reward is not None: module._compile_reward._metrics.clear() module._metrics.clear() @property def run_eagerly(self): return self._run_eagerly @run_eagerly.setter def run_eagerly(self, value): self._run_eagerly = value async def compute_reward( self, x=None, y=None, y_pred=None, training=True, ): """Compute per-sample rewards for each prediction. Subclasses can optionally override this method to provide custom reward computation logic. Args: x (list): Input data. y (list): Target data. y_pred (list): Predictions returned by the program (output of `program(x)`). training (bool): Whether we are training or evaluating the program. Returns: (list[float]): A list of per-sample reward values, one for each (y, y_pred) pair. """ # The default implementation does not use `x` or `training`. 
del x del training rewards = [] prev_scope = global_state.get_global_attribute("synalinks_op_scope") global_state.set_global_attribute("synalinks_op_scope", "reward") try: if self._compile_reward is not None: if not self._compile_reward.built: self._compile_reward.build(y[0], y_pred[0]) if self._compile_reward.has_batch_rewards: results = await self._compile_reward.compute_batch(y, y_pred) else: results = await asyncio.gather( *[self._compile_reward(y_t, y_p) for y_t, y_p in zip(y, y_pred)] ) for reward in results: if reward is not None: rewards.append(float(reward)) else: rewards.append(0.0) for reward in self.rewards: rewards.append(float(numpy.sum(reward))) if len(rewards) == 0: rewards = [0.0] return rewards finally: global_state.set_global_attribute("synalinks_op_scope", prev_scope) def stateless_compute_reward( self, trainable_variables, non_trainable_variables, metrics_variables, x=None, y=None, y_pred=None, training=True, ): var_mapping = list(zip(self.trainable_variables, trainable_variables)) var_mapping.extend(zip(self.non_trainable_variables, non_trainable_variables)) var_mapping.extend(zip(self.metrics_variables, metrics_variables)) with backend.StatelessScope(state_mapping=var_mapping) as scope: # Note that this is needed for the regularization reward, which need # the latest value of train/non-trainable variables. 
            reward = self._compute_reward(
                x,
                y,
                y_pred,
                training=training,
            )

        # Update non trainable vars (may have been updated in compute_reward)
        non_trainable_variables = []
        for v in self.non_trainable_variables:
            new_v = scope.get_current_value(v)
            non_trainable_variables.append(new_v)

        # Update metrics vars (may have been updated in compute_reward)
        metrics_variables = []
        for v in self.metrics_variables:
            new_v = scope.get_current_value(v)
            metrics_variables.append(new_v)
        return reward, (
            trainable_variables,
            non_trainable_variables,
            metrics_variables,
        )

    async def compute_metrics(self, x, y, y_pred):
        """Update metric states and collect all metrics to be returned.

        Subclasses can optionally override this method to provide custom metric
        updating and collection logic. Custom metrics are not passed in
        `compile()`, they can be created in `__init__` or `build`. They are
        automatically tracked and returned by `self.metrics`.

        Args:
            x: Input data.
            y: Target data.
            y_pred: Predictions returned by the program (output of
                `program.call(x)`).

        Returns:
            A `dict` containing values that will be passed to
                `synalinks.callbacks.CallbackList.on_train_batch_end()`. Typically,
                the values of the metrics listed in `self.metrics` are returned.
                Example: `{'reward': 0.2, 'accuracy': 0.7}`.
        """
        del x  # The default implementation does not use `x`.
        if self._compile_metrics is not None:
            for y_t, y_p in zip(y, y_pred):
                await self._compile_metrics.update_state(y_t, y_p)
        return self.get_metrics_result()

    def get_metrics_result(self):
        """Returns the program's metrics values as a dict.

        If any metric result is a dict (containing multiple metrics),
        each of its entries is added to the top-level dict returned by this method.

        Returns:
            (dict): A `dict` containing values of the metrics listed in `self.metrics`.
                Example: `{'reward': 0.2, 'accuracy': 0.7}`.
""" return_metrics = {} for metric in self.metrics: result = metric.result() if isinstance(result, dict): return_metrics.update(result) else: return_metrics[metric.name] = result return python_utils.pythonify_logs(return_metrics) async def fit( self, x=None, y=None, batch_size=1, minibatch_size=4, epochs=1, verbose="auto", callbacks=None, validation_split=0.1, validation_data=None, shuffle=True, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=32, validation_freq=1, ): """Trains the program for a fixed number of epochs (dataset iterations). Args: x (np.ndarray | generator): Input data. It can be: - A NumPy array (or array-like), or a list of `DataModel` arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding `DataModel`s, if the program has named inputs. - A Python generator function yielding `(inputs, targets)`. y (np.ndarray): Target data. Like the input data `x`, it can be either NumPy array(s) of `DataModel`(s). If `x` is a Python generator function, `y` should not be specified since targets will be obtained from `x`. batch_size (int): Integer or `None`. Number of samples per batch of computation. If unspecified, `batch_size` will default to 32. Do not specify the `batch_size` if your input data `x` is a Python generator function since they generate batches. minibatch_size (int): Integer or `None`. Number of randomly selected samples per batch validation. If unspecified, `minibatch_size` will default to 4. If `None`, the whole validation set will be used. epochs (int): Integer. Number of epochs to train the program. An epoch is an iteration over the entire `x` and `y` data provided (unless the `steps_per_epoch` flag is set to something other than None). Note that in conjunction with `initial_epoch`, `epochs` is to be understood as "final epoch". The program is not trained for a number of iterations given by `epochs`, but merely until the epoch of index `epochs` is reached. 
verbose (int): `"auto"`, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. "auto" becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so `verbose=2` is recommended when not running interactively (e.g., in a production environment). Defaults to `"auto"`. callbacks (list): List of `synalinks.callbacks.Callback` instances. List of callbacks to apply during training. See `synalinks.callbacks`. Note `synalinks.callbacks.ProgbarLogger` and `synalinks.callbacks.History` callbacks are created automatically and need not be passed to `program.fit()`. `synalinks.callbacks.ProgbarLogger` is created or not based on the `verbose` argument in `program.fit()`. validation_split (float): Float between 0 and 1. Fraction of the training data to be used as validation data. The program will set apart this fraction of the training data, will not train on it, and will evaluate the reward and any program metrics on this data at the end of each epoch. The validation data is selected from the last samples in the `x` and `y` data provided, before shuffling. This argument is only supported when `x` and `y` are made of data_models. If both `validation_data` and `validation_split` are provided, `validation_data` will override `validation_split`. validation_data (tuple | iterator): Data on which to evaluate the reward and any program metrics at the end of each epoch. The program will not be trained on this data. `validation_data` will override `validation_split`. It can be: - A tuple `(x_val, y_val)` of `DataModel`s lists. shuffle (bool): Whether to shuffle the training data before each epoch. This argument is ignored when `x` is a Python generator function. initial_epoch (int): Integer. Epoch at which to start training (useful for resuming a previous training run). steps_per_epoch (int): Integer or `None`. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. 
                When training with input `DataModel` arrays, the default `None`
                means that the value used is the number of samples in your
                dataset divided by the batch size, or 1 if that cannot be
                determined.
                If `x` is a Python generator function, the epoch will run until
                the input dataset is exhausted. When passing an infinitely
                repeating dataset, you must specify the `steps_per_epoch`
                argument, otherwise the training will run indefinitely.
            validation_steps (int): Integer or `None`.
                Only relevant if `validation_data` is provided.
                Total number of steps (batches of samples) to draw before
                stopping when performing validation at the end of every epoch.
                If `validation_steps` is `None`, validation will run until the
                `validation_data` dataset is exhausted. In the case of an
                infinitely repeating dataset, it will run indefinitely. If
                `validation_steps` is specified and only part of the dataset
                is consumed, the evaluation will start from the beginning of the
                dataset at each epoch. This ensures that the same validation
                samples are used every time.
            validation_batch_size (int): Integer or `None`.
                Number of samples per validation batch.
                If unspecified, will default to `batch_size`.
                Do not specify the `validation_batch_size` if your data is a
                `synalinks.utils.PyDataset`, `tf.data.Dataset`,
                `torch.utils.data.DataLoader` or Python generator function
                since they generate batches.
            validation_freq (int): Only relevant if validation data is provided.
                Specifies how many training epochs to run
                before a new validation run is performed,
                e.g. `validation_freq=2` runs validation every 2 epochs.

        Returns:
            (History): A `History` object. Its `History.history` attribute is
                a record of training reward values and metrics values
                at successive epochs, as well as validation reward values
                and validation metrics values (if applicable).
        """
        self._assert_compile_called("fit")
        self._eval_epoch_iterator = None
        # Both validation placeholders must exist before the training loop,
        # which reads `val_x` and `val_y` even when no validation data is given.
        val_x, val_y = None, None
        if self._optimizer is None:
            # No optimizer ⇒ no parameter updates possible. Iterating the
            # training loop here would just burn LM calls / wall-clock time
            # without changing anything. Warn loudly and return an empty
            # History so callers that inspect `.history` keep working.
            warnings.warn(
                "`Program.fit()` was called but no optimizer is set on the "
                "compiled program — training cannot update any variables, so "
                "iterating the training data would be wasted compute. "
                "Skipping the fit loop. If you intended to evaluate, call "
                "`program.evaluate(x=..., y=...)` directly. If you intended "
                "to train, recompile with an optimizer (e.g. "
                "`program.compile(optimizer=synalinks.optimizers.RandomFewShot(), "
                "reward=..., metrics=...)`).",
                stacklevel=2,
            )
            history = callbacks_module.History()
            self.history = history
            return history

        if validation_split and validation_data is None:
            # Create the validation data using the training data. Only supported
            # for numpy arrays.
            (x, y), validation_data = array_slicing.train_validation_split(
                (x, y), validation_split=validation_split
            )

        if validation_data is not None:
            val_x, val_y = data_adapter_utils.unpack_x_y(validation_data)

        # Create an iterator that yields batches of input/target data.
        epoch_iterator = EpochIterator(
            x=x,
            y=y,
            batch_size=batch_size,
            steps_per_epoch=steps_per_epoch,
            shuffle=False,
            steps_per_execution=self.steps_per_execution,
        )

        if not all(module.built for module in self._flatten_modules()):
            # Build the model on one batch of data.
            for _, data in epoch_iterator:
                data_batch = data[0]
                self._auto_build(
                    iterator=epoch_iterator,
                    data_batch=data_batch,
                )
                break
        epoch_iterator.reset()

        # Container that configures and calls callbacks.
if not isinstance(callbacks, callbacks_module.CallbackList): # Get optimizer name for logging optimizer_name = None if self._optimizer is not None: optimizer_name = self._optimizer.__class__.__name__ callbacks = callbacks_module.CallbackList( callbacks, add_history=True, add_progbar=verbose != 0, verbose=verbose, epochs=epochs, steps=steps_per_epoch, batch_size=batch_size, optimizer=optimizer_name, program=self, ) self.stop_training = False callbacks.on_train_begin() training_logs = None logs = {} initial_epoch = self._initial_epoch or initial_epoch if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_train_begin( self.trainable_variables, ) for epoch in range(initial_epoch, epochs): self.reset_metrics() if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_epoch_begin( epoch, self.trainable_variables, ) callbacks.on_epoch_begin(epoch) with epoch_iterator.catch_stop_iteration(): for step, iterator in epoch_iterator: data = iterator[0] x_batch, y_batch = data_adapter_utils.unpack_x_y(data) if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_batch_begin( step, epoch, self.trainable_variables, ) callbacks.on_train_batch_begin(step) mini_val_x = None mini_val_y = None if minibatch_size: if len(val_x) > minibatch_size: indices = np.random.choice( len(val_x), size=minibatch_size, replace=False, ) mini_val_x = val_x[indices] mini_val_y = val_y[indices] logs = await self.train_on_batch( step=step, x=x_batch, y=y_batch, val_x=mini_val_x if mini_val_x is not None else val_x, val_y=mini_val_y if mini_val_y is not None else val_y, return_dict=True, ) val_logs = await self.evaluate( x=val_x, y=val_y, batch_size=validation_batch_size or batch_size, steps=validation_steps, callbacks=callbacks, _use_cached_eval_dataset=False, ) if self.trainable_variables and isinstance( self.optimizer, 
optimizers_module.Optimizer ): await self.optimizer.on_batch_end( step, epoch, self.trainable_variables, ) callbacks.on_train_batch_end(step, logs) if self.stop_training: break # Override with model metrics instead of last step logs if needed. epoch_logs = dict(self._get_metrics_result_or_logs(logs)) # Run validation. if validation_data is not None and self._should_eval(epoch, validation_freq): # Create EpochIterator for evaluation and cache it. if getattr(self, "_eval_epoch_iterator", None) is None: self._eval_epoch_iterator = EpochIterator( x=val_x, y=val_y, batch_size=validation_batch_size or batch_size, steps_per_execution=self.steps_per_execution, steps_per_epoch=validation_steps, shuffle=False, ) val_logs = await self.evaluate( x=val_x, y=val_y, batch_size=validation_batch_size or batch_size, steps=validation_steps, callbacks=callbacks, _use_cached_eval_dataset=True, ) val_logs = {"val_" + name: val for name, val in val_logs.items()} epoch_logs.update(val_logs) if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_epoch_end( epoch, self.trainable_variables, ) callbacks.on_epoch_end(epoch, epoch_logs) training_logs = epoch_logs if self.stop_training: break # If _eval_epoch_iterator exists, delete it after all epochs are done. if getattr(self, "_eval_epoch_iterator", None) is not None: del self._eval_epoch_iterator if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_train_end(self.trainable_variables) callbacks.on_train_end(logs=training_logs) return self.history async def evaluate( self, x=None, y=None, batch_size=32, verbose="auto", steps=None, callbacks=None, return_dict=True, **kwargs, ): """Returns the reward value & metrics values for the program in test mode. Computation is done in batches (see the `batch_size` arg.) Args: x (np.ndarray | generator): Input data. 
It can be: - A NumPy array (or array-like), or a list of `DataModel` arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding `DataModel`s, if the program has named inputs. - A Python generator function yielding `(inputs, targets)`. y (np.ndarray): Target data. Like the input data `x`, it can be either NumPy array(s) of `DataModel`(s). If `x` is a Python generator function, `y` should not be specified since targets will be obtained from `x`. batch_size (int): Integer or `None`. Number of samples per batch of computation. If unspecified, `batch_size` will default to 32. Do not specify the `batch_size` if your input data `x` is a Python generator function since they generate batches. verbose (int | str): `"auto"`, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. `"auto"` becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so `verbose=2` is recommended when not running interactively (e.g. in a production environment). Defaults to `"auto"`. steps (int): Integer or `None`. Total number of steps (batches of samples) to draw before declaring the evaluation round finished. If `steps` is `None`, it will run until `x` is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. callbacks (list): List of `synalinks.callbacks.Callback` instances. List of callbacks to apply during evaluation. return_dict (bool): If `True`, reward and metric results are returned as a dict, with each key being the name of the metric. If `False`, they are returned as a list. Returns: (float | list | dict): Scalar test reward (if the program has a single output and no metrics) or list of scalars (if the program has multiple outputs and/or metrics). The attribute `program.metrics_names` will give you the display labels for the scalar outputs. 
""" self._assert_compile_called("evaluate") use_cached_eval_dataset = kwargs.pop("_use_cached_eval_dataset", False) if kwargs: raise ValueError(f"Arguments not recognized: {kwargs}") # Create an iterator that yields batches of input/target data. if use_cached_eval_dataset: epoch_iterator = self._eval_epoch_iterator else: epoch_iterator = EpochIterator( x=x, y=y, batch_size=batch_size, steps_per_epoch=steps, shuffle=False, steps_per_execution=self.steps_per_execution, ) if not all(module.built for module in self._flatten_modules()): # Build the model on one batch of data. for _, data in epoch_iterator: data_batch = data[0] self._auto_build( iterator=epoch_iterator, data_batch=data_batch, ) break epoch_iterator.reset() # Container that configures and calls callbacks. if not isinstance(callbacks, callbacks_module.CallbackList): callbacks = callbacks_module.CallbackList( callbacks, add_history=False, add_progbar=verbose != 0, verbose=verbose, epochs=1, steps=epoch_iterator.num_batches, program=self, ) self.stop_evaluating = False callbacks.on_test_begin() logs = {} self.reset_metrics() for step, iterator in epoch_iterator: callbacks.on_test_batch_begin(step) data = iterator[0] x_batch, y_batch = data_adapter_utils.unpack_x_y(data) logs = await self.test_on_batch( x=x_batch, y=y_batch, return_dict=True, ) callbacks.on_test_batch_end(step, logs) if self.stop_evaluating: break logs = self.get_metrics_result() callbacks.on_test_end(logs) if return_dict: return logs return self._flatten_metrics_in_order(logs) async def predict( self, x, batch_size=None, verbose="auto", steps=None, callbacks=None ): """Generates output predictions for the input samples. Computation is done in batches. This method is designed for batch processing of large numbers of inputs. It is not intended for use inside of loops that iterate over your data and process small numbers of inputs at a time. 
For small numbers of inputs that fit in one batch, directly use `__call__()` for faster execution, e.g., `program(x)`, or `program(x, training=False)` if you have modules that behave differently during inference. Args: x (np.ndarray | generator): Input data. It can be: - A NumPy array (or array-like), or a list of `DataModel` arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding `DataModel`s, if the program has named inputs. - A Python generator function yielding `(inputs, targets)`. batch_size (int): Integer or `None`. Number of samples per batch of computation. If unspecified, `batch_size` will default to 32. Do not specify the `batch_size` if your input data `x` is a `synalinks.utils.PyDataset`, `tf.data.Dataset`, `torch.utils.data.DataLoader` or Python generator function since they generate batches. verbose (int): `"auto"`, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. `"auto"` becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so `verbose=2` is recommended when not running interactively (e.g. in a production environment). Defaults to `"auto"`. steps (int): Total number of steps (batches of samples) to draw before declaring the prediction round finished. If `steps` is `None`, it will run until `x` is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. callbacks (list): List of `synalinks.callbacks.Callback` instances. List of callbacks to apply during prediction. Returns: (list): `JsonDataModel` array(s) of predictions. If the pipeline failed, a None is added to the predictions. """ # Create an iterator that yields batches of input data. epoch_iterator = EpochIterator( x=x, batch_size=batch_size, steps_per_epoch=steps, shuffle=False, steps_per_execution=self.steps_per_execution, ) # Container that configures and calls callbacks. 
        if not isinstance(callbacks, callbacks_module.CallbackList):
            callbacks = callbacks_module.CallbackList(
                callbacks,
                add_history=True,
                add_progbar=verbose != 0,
                verbose=verbose,
                epochs=1,
                steps=epoch_iterator.num_batches,
                program=self,
            )

        self.stop_predicting = False
        # Pair the `on_predict_end` call below with its matching begin hook.
        callbacks.on_predict_begin()
        outputs = []
        for step, iterator in epoch_iterator:
            callbacks.on_predict_batch_begin(step)
            data = iterator[0]
            x_batch, _ = data_adapter_utils.unpack_x_y(data)
            batch_outputs = await self.predict_on_batch(x_batch)
            outputs.extend(batch_outputs)
            callbacks.on_predict_batch_end(step, {"outputs": batch_outputs})
            if self.stop_predicting:
                break
        callbacks.on_predict_end()
        return np.array(outputs, dtype="object")

    async def train_on_batch(
        self,
        step,
        x,
        y=None,
        val_x=None,
        val_y=None,
        return_dict=False,
    ):
        """Runs a single optimization step on a single batch of data.

        Args:
            step (int): The training step.
            x (np.ndarray): Input data. Must be array-like.
            y (np.ndarray): Target data. Must be array-like.
            val_x (np.ndarray): Input validation data. Must be array-like.
            val_y (np.ndarray): Target validation data. Must be array-like.
            return_dict (bool): If `True`, reward and metric results are returned as a
                dict, with each key being the name of the metric. If `False`,
                they are returned as a list.

        Returns:
            (float | list | dict): A scalar reward value
                (when no metrics and `return_dict=False`), a list of reward
                and metric values (if there are metrics and `return_dict=False`),
                or a dict of metric and reward values (if `return_dict=True`).
""" if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): prev_scope = global_state.get_global_attribute("synalinks_op_scope") global_state.set_global_attribute("synalinks_op_scope", "optimizer") try: metrics = await self.optimizer.optimize( step, self.trainable_variables, x=x, y=y, val_x=val_x, val_y=val_y, ) finally: global_state.set_global_attribute("synalinks_op_scope", prev_scope) else: warnings.warn("The program does not have any trainable variables.") y_pred = await self.predict_on_batch(val_x) rewards = await self.compute_reward( x=val_x, y=val_y, y_pred=y_pred, ) reduction = ( self._compile_reward.reduction if self._compile_reward is not None else "mean" ) scalar_reward = rewards_module.reduce_rewards(rewards, reduction) await self._reward_tracker.update_state(scalar_reward) metrics = await self.compute_metrics(val_x, val_y, y_pred) if return_dict: return metrics return self._flatten_metrics_in_order(metrics) async def test_on_batch( self, x, y=None, return_dict=False, ): """Test the program on a single batch of samples. Args: x (np.ndarray): Input data. Must be array-like. y (np.ndarray): Target data. Must be array-like. return_dict (bool): If `True`, reward and metric results are returned as a dict, with each key being the name of the metric. If `False`, they are returned as a list. Returns: (float | list | dict): A scalar reward value (when no metrics and `return_dict=False`), a list of reward and metric values (if there are metrics and `return_dict=False`), or a dict of metric and reward values (if `return_dict=True`). 
""" y_pred = await self.predict_on_batch(x) rewards = await self.compute_reward( x=x, y=y, y_pred=y_pred, training=False, ) reduction = ( self._compile_reward.reduction if self._compile_reward is not None else "mean" ) scalar_reward = rewards_module.reduce_rewards(rewards, reduction) await self._reward_tracker.update_state(scalar_reward) metrics = await self.compute_metrics(x, y, y_pred) if return_dict: return metrics return self._flatten_metrics_in_order(metrics) async def predict_on_batch(self, x, training=False): """Returns predictions for a single batch of samples. Args: x (np.ndarray): Input data. Must be array-like. training (bool): Boolean. True if training. Returns: (list): list(s) of JsonDataModel predictions. """ # Tag this work as "inference" so LanguageModel / EmbeddingModel can # attribute token / latency / cost to the program's forward pass # only. See synalinks.src.backend.common.global_state key # `synalinks_op_scope` — value is one of "inference", "reward", # "optimizer", or None. prev_scope = global_state.get_global_attribute("synalinks_op_scope") global_state.set_global_attribute("synalinks_op_scope", "inference") try: tasks = [] for inputs in x: tasks.append(self(inputs, training=training)) y_pred = await asyncio.gather(*tasks) return y_pred finally: global_state.set_global_attribute("synalinks_op_scope", prev_scope) def get_compile_config(self): """Returns a serialized config with information for compiling the program. This method returns a config dictionary containing all the information (optimizer, reward, metrics, etc.) with which the program was compiled. Returns: (dict): A dict containing information for compiling the program. """ if self.compiled and hasattr(self, "_compile_config"): return self._compile_config.serialize() def compile_from_config(self, config): """Compiles the program with the information given in config. This method uses the information in the config (optimizer, reward, metrics, etc.) to compile the program. 
Args: config (dict): Dict containing information for compiling the program. """ has_overridden_compile = self.__class__.compile != Trainer.compile if has_overridden_compile: warnings.warn( "`compile()` was not called as part of program loading " "because the program's `compile()` method is custom. " "All subclassed Models that have `compile()` " "overridden should also override " "`get_compile_config()` and `compile_from_config(config)`. " "Alternatively, you can " "call `compile()` manually after loading.", stacklevel=2, ) return config = serialization_lib.deserialize_synalinks_object(config) self.compile(**config) if hasattr(self, "optimizer") and self.built: # Create optimizer variables/programs. if not self.optimizer.built: run_maybe_nested(self.optimizer.build(self.trainable_variables)) def _should_reward(self, epoch, validation_freq): epoch = epoch + 1 # one-index the user-facing epoch. if isinstance(validation_freq, int): return epoch % validation_freq == 0 elif isinstance(validation_freq, list): return epoch in validation_freq else: raise ValueError( "Expected `validation_freq` to be a list or int. " f"Received: validation_freq={validation_freq} of the " f"type {type(validation_freq)}." ) def _get_metrics_result_or_logs(self, logs): """Returns program metrics as a dict if the keys match with input logs. When the training / evaluation is performed with an asynchronous steps, the last scheduled `train / test_step` may not give the latest metrics because it is not guaranteed to be executed the last. This method gets metrics from the program directly instead of relying on the return from last step function. When the user has custom train / test step functions, the metrics returned may be different from `Program.metrics`. In those instances, this function will be no-op and return the logs passed in. Args: logs (dict): A `dict` of metrics returned by train / test step function. 
Returns: (dict): A `dict` containing values of the metrics listed in `self.metrics` when logs and program metrics keys match. Otherwise it returns input `logs`. """ metric_logs = self.get_metrics_result() # Verify that train / test step logs passed and metric logs have # matching keys. It could be different when using custom step functions, # in which case we return the logs from the last step. if isinstance(logs, dict) and set(logs.keys()) == set(metric_logs.keys()): return metric_logs return logs def _flatten_metrics_in_order(self, logs): """Turns `logs` dict into a list as per key order of `metrics_names`.""" metric_names = [] for metric in self.metrics: if isinstance(metric, CompileMetrics): metric_names += [sub_metric.name for sub_metric in metric.metrics] else: metric_names.append(metric.name) results = [] for name in metric_names: if name in logs: results.append(logs[name]) for key in sorted(logs.keys()): if key not in metric_names: results.append(logs[key]) if len(results) == 1: return results[0] return results def _assert_compile_called(self, method_name=None): if not self.compiled: msg = "You must call `compile()` before " if metrics_module: msg += "using the program." else: msg += f"calling `{method_name}()`." 
raise ValueError(msg) def _auto_build(self, iterator=None, data_batch=None): program_unbuilt = not all(module.built for module in self._flatten_modules()) compile_metrics_unbuilt = ( self._compile_metrics is not None and not self._compile_metrics.built ) compile_reward_unbuilt = ( self._compile_reward is not None and not self._compile_reward.built ) optimizer_unbuilt = self.optimizer is not None and not self.optimizer.built if program_unbuilt or compile_metrics_unbuilt or compile_reward_unbuilt: if data_batch is None: for _, data_or_iterator in iterator: if isinstance(data_or_iterator, (list, tuple)): data_batch = data_or_iterator[0] else: data_batch = next(data_or_iterator) break x, y = data_batch try: y_pred = run_maybe_nested(self.predict_on_batch(x)) except Exception as e: raise RuntimeError( "Unable to automatically build the program. " "Please build it yourself before calling " "fit/evaluate/predict. " "A program is 'built' when its variables have " "been created and its `self.built` attribute " "is True. Usually, calling the program on a batch " "of data is the right way to build it.\n" "Exception encountered:\n" f"'{e}'" ) if compile_metrics_unbuilt: # Build all metric state with `backend.compute_output_spec`. run_maybe_nested( backend.compute_output_spec( self.compute_metrics, x, y, y_pred, ) ) if compile_reward_unbuilt: # Build `CompileReward` state with `backend.compute_output_spec`. run_maybe_nested( backend.compute_output_spec( self.compute_reward, x, y, y_pred, training=False, ) ) if optimizer_unbuilt: # Build optimizer run_maybe_nested(self.optimizer.build(self.trainable_variables)) self._post_build() def _assert_compile_called(self, method_name=None): if not self.compiled: msg = "You must call `compile()` before " if metrics_module: msg += "using the model." else: msg += f"calling `{method_name}()`." raise ValueError(msg) def _should_eval(self, epoch, validation_freq): epoch = epoch + 1 # one-index the user-facing epoch. 
        if isinstance(validation_freq, int):
            return epoch % validation_freq == 0
        elif isinstance(validation_freq, list):
            return epoch in validation_freq
        else:
            raise ValueError(
                "Expected `validation_freq` to be a list or int. "
                f"Received: validation_freq={validation_freq} of the "
                f"type {type(validation_freq)}."
            )
````

## `compile(optimizer=None, reward=None, reward_weights=None, metrics=None, run_eagerly=False, steps_per_execution=1)`

Configures the program for training.

Example:

```
program.compile(
    optimizer=synalinks.optimizers.RandomFewShot(),
    reward=synalinks.rewards.ExactMatch(),
    metrics=[
        synalinks.metrics.MeanMetricWrapper(synalinks.rewards.exact_match),
    ],
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `optimizer` | `Optimizer` | Optimizer instance. See `synalinks.optimizers`. | `None` |
| `reward` | `Reward` | Reward function. A `synalinks.rewards.Reward` instance. See `synalinks.rewards`. A reward function is any callable with the signature `reward = fn(y_true, y_pred)`, where `y_true` are the ground truth values, and `y_pred` are the program's predictions. `y_true` should be a list of batch size length `[d0, .. dN]`. `y_pred` should be a list of batch size length `[d0, .. dN]`. The reward function should return a float. | `None` |
| `reward_weights` | `list` | Optional list specifying scalar coefficients (Python floats) to weight the reward contributions of different program outputs. The reward value that will be maximized by the program will then be the weighted sum of all individual rewards, weighted by the `reward_weights` coefficients. It is expected to have a 1:1 mapping to the program's outputs. | `None` |
| `metrics` | `list` | List of metrics to be evaluated by the program during training and testing. Each one is a `synalinks.metrics.Metric` instance. See `synalinks.metrics`. A metric function is any callable with the signature `result = fn(y_true, y_pred)`. | `None` |
| `run_eagerly` | `bool` | If `True`, this program's forward pass will never be compiled. It is recommended to leave this as `False` when training (for best performance), and to set it to `True` when debugging. | `False` |
| `steps_per_execution` | `int` | The number of batches to run during a single compiled function call. Running multiple batches inside a single compiled function call can greatly improve performance on TPUs or small programs with a large Python overhead. At most, one full epoch will be run each execution. If a number larger than the size of the epoch is passed, the execution will be truncated to the size of the epoch. Note that if `steps_per_execution` is set to `N`, `Callback.on_batch_begin` and `Callback.on_batch_end` methods will only be called every `N` batches (i.e. before/after each compiled function execution). | `1` |

Source code in `synalinks/src/trainers/trainer.py`

````
@tracking.no_automatic_dependency_tracking
def compile(
    self,
    optimizer=None,
    reward=None,
    reward_weights=None,
    metrics=None,
    run_eagerly=False,
    steps_per_execution=1,
):
    """Configures the program for training.

    Example:

    ```python
    program.compile(
        optimizer=synalinks.optimizers.RandomFewShot(),
        reward=synalinks.rewards.ExactMatch(),
        metrics=[
            synalinks.metrics.MeanMetricWrapper(synalinks.rewards.exact_match),
        ],
    )
    ```

    Args:
        optimizer (Optimizer): Optimizer instance. See `synalinks.optimizers`.
        reward (Reward): Reward function. A `synalinks.rewards.Reward` instance.
            See `synalinks.rewards`. A reward function is any callable
            with the signature `reward = fn(y_true, y_pred)`,
            where `y_true` are the ground truth values, and `y_pred`
            are the program's predictions.
            `y_true` should be a list of batch size length `[d0, .. dN]`.
            `y_pred` should be a list of batch size length `[d0, .. dN]`.
            The reward function should return a float.
        reward_weights (list): Optional list specifying scalar coefficients
            (Python floats) to weight the reward contributions of
            different program outputs. The reward value that will be maximized
            by the program will then be the *weighted sum* of all individual
            rewards, weighted by the `reward_weights` coefficients. It is
            expected to have a 1:1 mapping to the program's outputs.
        metrics (list): List of metrics to be evaluated by the program during
            training and testing. Each one is a `synalinks.metrics.Metric`
            instance. See `synalinks.metrics`. A metric function is any
            callable with the signature `result = fn(y_true, y_pred)`.
        run_eagerly (bool): If `True`, this program's forward pass
            will never be compiled. It is recommended to leave this
            as `False` when training (for best performance),
            and to set it to `True` when debugging.
        steps_per_execution (int): The number of batches to run
            during a single compiled function call. Running multiple
            batches inside a single compiled function call can
            greatly improve performance on TPUs or small programs with a
            large Python overhead. At most, one full epoch will be run each
            execution. If a number larger than the size of the epoch is
            passed, the execution will be truncated to the size of the epoch.
Note that if `steps_per_execution` is set to `N`, `Callback.on_batch_begin` and `Callback.on_batch_end` methods will only be called every `N` batches (i.e. before/after each compiled function execution). """ self._clear_previous_trainer_metrics() # Resolve string/dict identifiers (Keras-style) into instances. # Reward and metrics flow through `CompileReward`/`CompileMetrics` # which already call `rewards.get` / `metrics.get`. self._optimizer = optimizers_module.get(optimizer) if self._optimizer is not None: self._optimizer.set_program(self) if hasattr(self, "output_names"): output_names = self.output_names else: output_names = None if reward is not None: reward = rewards_module.get(reward) reduction = getattr(reward, "reduction", "mean") self._compile_reward = CompileReward( reward, reward_weights, reduction=reduction, output_names=output_names, ) self.reward = reward if metrics is not None: self._compile_metrics = CompileMetrics(metrics, output_names=output_names) # Operational metrics (e.g. TotalTokens, Throughput) accept # `language_model=None`; bind them to every LM reachable from # the program so counters aggregate automatically. for m in tree.flatten(metrics): if hasattr(m, "bind_program"): m.bind_program(self) self.run_eagerly = run_eagerly self.stop_training = False self.compiled = True self._reward_tracker = metrics_module.Mean(name="reward") self.steps_per_execution = steps_per_execution self._compile_config = serialization_lib.SerializableDict( optimizer=optimizer, reward=reward, reward_weights=reward_weights, metrics=metrics, run_eagerly=run_eagerly, steps_per_execution=steps_per_execution, ) ```` ## `compile_from_config(config)` Compiles the program with the information given in config. This method uses the information in the config (optimizer, reward, metrics, etc.) to compile the program. 
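
The override check that `compile_from_config()` runs before recompiling can be looked at in isolation. The sketch below is a minimal illustration that does not import Synalinks: `BaseTrainer`, `DefaultProgram`, `CustomProgram`, and the placeholder config values are hypothetical stand-ins, and only the comparison of `self.__class__.compile` against the base class method mirrors the source listing.

```python
import warnings


class BaseTrainer:
    """Hypothetical stand-in for the trainer base class (not the real API)."""

    def __init__(self):
        self.compiled = False
        self.compile_kwargs = None

    def compile(self, optimizer=None, reward=None):
        # Record what the program was compiled with.
        self.compiled = True
        self.compile_kwargs = {"optimizer": optimizer, "reward": reward}

    def compile_from_config(self, config):
        # A subclass that defines its own `compile` no longer shares the
        # base class function object, so the comparison detects the override.
        if self.__class__.compile != BaseTrainer.compile:
            warnings.warn(
                "`compile()` is custom; call it manually after loading.",
                stacklevel=2,
            )
            return
        self.compile(**config)


class DefaultProgram(BaseTrainer):
    """Inherits `compile` unchanged, so it can be recompiled from a config."""


class CustomProgram(BaseTrainer):
    """Overrides `compile`, so automatic recompilation is skipped."""

    def compile(self, optimizer=None, reward=None, extra=None):
        super().compile(optimizer=optimizer, reward=reward)


# Placeholder values for illustration only, not real Synalinks identifiers.
config = {"optimizer": "placeholder_optimizer", "reward": "placeholder_reward"}

default_program = DefaultProgram()
default_program.compile_from_config(config)

custom_program = CustomProgram()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    custom_program.compile_from_config(config)
```

In this sketch `default_program` ends up compiled with the config values, while `custom_program` is left uncompiled and a single warning is recorded. This matches the behaviour described in the source: a program with a custom `compile()` has to override `get_compile_config()` and `compile_from_config(config)`, or be compiled manually after loading.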
Parameters: | Name | Type | Description | Default | | -------- | ------ | ------------------------------------------------------ | ---------- | | `config` | `dict` | Dict containing information for compiling the program. | *required* | Source code in `synalinks/src/trainers/trainer.py` ``` def compile_from_config(self, config): """Compiles the program with the information given in config. This method uses the information in the config (optimizer, reward, metrics, etc.) to compile the program. Args: config (dict): Dict containing information for compiling the program. """ has_overridden_compile = self.__class__.compile != Trainer.compile if has_overridden_compile: warnings.warn( "`compile()` was not called as part of program loading " "because the program's `compile()` method is custom. " "All subclassed Models that have `compile()` " "overridden should also override " "`get_compile_config()` and `compile_from_config(config)`. " "Alternatively, you can " "call `compile()` manually after loading.", stacklevel=2, ) return config = serialization_lib.deserialize_synalinks_object(config) self.compile(**config) if hasattr(self, "optimizer") and self.built: # Create optimizer variables/programs. if not self.optimizer.built: run_maybe_nested(self.optimizer.build(self.trainable_variables)) ``` ## `compute_metrics(x, y, y_pred)` Update metric states and collect all metrics to be returned. Subclasses can optionally override this method to provide custom metric updating and collection logic. Custom metrics are not passed in `compile()`, they can be created in `__init__` or `build`. They are automatically tracked and returned by `self.metrics`. ``` Args: x: Input data. y: Target data. y_pred: Predictions returned by the program output of `program.call(x)`. Returns: A `dict` containing values that will be passed to `synalinks.callbacks.CallbackList.on_train_batch_end()`. Typically, the values of the metrics listed in `self.metrics` are returned. 
Example: `{'reward': 0.2, 'accuracy': 0.7}`. Source code in `synalinks/src/trainers/trainer.py` ``` async def compute_metrics(self, x, y, y_pred): """Update metric states and collect all metrics to be returned. ```` Subclasses can optionally override this method to provide custom metric updating and collection logic. Custom metrics are not passed in `compile()`, they can be created in `__init__` or `build`. They are automatically tracked and returned by `self.metrics`. ``` Args: x: Input data. y: Target data. y_pred: Predictions returned by the program output of `program.call(x)`. Returns: A `dict` containing values that will be passed to `synalinks.callbacks.CallbackList.on_train_batch_end()`. Typically, the values of the metrics listed in `self.metrics` are returned. Example: `{'reward': 0.2, 'accuracy': 0.7}`. """ del x # The default implementation does not use `x`. if self._compile_metrics is not None: for y_t, y_p in zip(y, y_pred): await self._compile_metrics.update_state(y_t, y_p) return self.get_metrics_result() ```` ``` ## `compute_reward(x=None, y=None, y_pred=None, training=True)` Compute per-sample rewards for each prediction. Subclasses can optionally override this method to provide custom reward computation logic. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `x` | `list` | Input data. | `None` | | `y` | `list` | Target data. | `None` | | `y_pred` | `list` | Predictions returned by the program (output of program(x)). | `None` | | `training` | `bool` | Whether we are training or evaluating the program. | `True` | Returns: | Type | Description | | --- | --- | | `list[float]` | A list of per-sample reward values, one for each (y, y_pred) pair. | Source code in `synalinks/src/trainers/trainer.py` ``` async def compute_reward( self, x=None, y=None, y_pred=None, training=True, ): """Compute per-sample rewards for each prediction. ``` Subclasses can optionally override this method to provide custom reward computation logic. 
Args: x (list): Input data. y (list): Target data. y_pred (list): Predictions returned by the program (output of `program(x)`). training (bool): Whether we are training or evaluating the program. Returns: (list[float]): A list of per-sample reward values, one for each (y, y_pred) pair. """ # The default implementation does not use `x` or `training`. del x del training rewards = [] prev_scope = global_state.get_global_attribute("synalinks_op_scope") global_state.set_global_attribute("synalinks_op_scope", "reward") try: if self._compile_reward is not None: if not self._compile_reward.built: self._compile_reward.build(y[0], y_pred[0]) if self._compile_reward.has_batch_rewards: results = await self._compile_reward.compute_batch(y, y_pred) else: results = await asyncio.gather( *[self._compile_reward(y_t, y_p) for y_t, y_p in zip(y, y_pred)] ) for reward in results: if reward is not None: rewards.append(float(reward)) else: rewards.append(0.0) for reward in self.rewards: rewards.append(float(numpy.sum(reward))) if len(rewards) == 0: rewards = [0.0] return rewards finally: global_state.set_global_attribute("synalinks_op_scope", prev_scope) ``` ``` ## `evaluate(x=None, y=None, batch_size=32, verbose='auto', steps=None, callbacks=None, return_dict=True, **kwargs)` Returns the reward value & metrics values for the program in test mode. Computation is done in batches (see the `batch_size` arg.) Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `x` | `ndarray | generator` | Input data. It can be: - A NumPy array (or array-like), or a list of DataModel arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding DataModels, if the program has named inputs. - A Python generator function yielding (inputs, targets). | `None` | | `y` | `ndarray` | Target data. Like the input data x, it can be either NumPy array(s) of DataModel(s). 
If x is a Python generator function, y should not be specified since targets will be obtained from x. | `None` |
| `batch_size` | `int` | Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a Python generator function, since it generates batches. | `32` |
| `verbose` | `int \| str` | "auto", 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. "auto" becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g. in a production environment). Defaults to "auto". | `'auto'` |
| `steps` | `int` | Integer or None. Total number of steps (batches of samples) to draw before declaring the evaluation round finished. If steps is None, it will run until x is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. | `None` |
| `callbacks` | `list` | List of synalinks.callbacks.Callback instances. List of callbacks to apply during evaluation. | `None` |
| `return_dict` | `bool` | If True, reward and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list. | `True` |

Returns:

| Type | Description |
| --- | --- |
| `float \| list \| dict` | Scalar test reward (if the program has a single output and no metrics) or list of scalars (if the program has multiple outputs and/or metrics), or a dict of reward and metric values when return_dict is True (the default). The attribute program.metrics_names will give you the display labels for the scalar outputs. |

Source code in `synalinks/src/trainers/trainer.py`

```
async def evaluate(
    self,
    x=None,
    y=None,
    batch_size=32,
    verbose="auto",
    steps=None,
    callbacks=None,
    return_dict=True,
    **kwargs,
):
    """Returns the reward value & metrics values for the program in test mode.
```

Computation is done in batches (see the `batch_size` arg.) Args: x (np.ndarray | generator): Input data.
It can be: - A NumPy array (or array-like), or a list of `DataModel` arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding `DataModel`s, if the program has named inputs. - A Python generator function yielding `(inputs, targets)`. y (np.ndarray): Target data. Like the input data `x`, it can be either NumPy array(s) of `DataModel`(s). If `x` is a Python generator function, `y` should not be specified since targets will be obtained from `x`. batch_size (int): Integer or `None`. Number of samples per batch of computation. If unspecified, `batch_size` will default to 32. Do not specify the `batch_size` if your input data `x` is a Python generator function since they generate batches. verbose (int | str): `"auto"`, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. `"auto"` becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so `verbose=2` is recommended when not running interactively (e.g. in a production environment). Defaults to `"auto"`. steps (int): Integer or `None`. Total number of steps (batches of samples) to draw before declaring the evaluation round finished. If `steps` is `None`, it will run until `x` is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. callbacks (list): List of `synalinks.callbacks.Callback` instances. List of callbacks to apply during evaluation. return_dict (bool): If `True`, reward and metric results are returned as a dict, with each key being the name of the metric. If `False`, they are returned as a list. Returns: (float | list | dict): Scalar test reward (if the program has a single output and no metrics) or list of scalars (if the program has multiple outputs and/or metrics). The attribute `program.metrics_names` will give you the display labels for the scalar outputs. 
""" self._assert_compile_called("evaluate") use_cached_eval_dataset = kwargs.pop("_use_cached_eval_dataset", False) if kwargs: raise ValueError(f"Arguments not recognized: {kwargs}") # Create an iterator that yields batches of input/target data. if use_cached_eval_dataset: epoch_iterator = self._eval_epoch_iterator else: epoch_iterator = EpochIterator( x=x, y=y, batch_size=batch_size, steps_per_epoch=steps, shuffle=False, steps_per_execution=self.steps_per_execution, ) if not all(module.built for module in self._flatten_modules()): # Build the model on one batch of data. for _, data in epoch_iterator: data_batch = data[0] self._auto_build( iterator=epoch_iterator, data_batch=data_batch, ) break epoch_iterator.reset() # Container that configures and calls callbacks. if not isinstance(callbacks, callbacks_module.CallbackList): callbacks = callbacks_module.CallbackList( callbacks, add_history=False, add_progbar=verbose != 0, verbose=verbose, epochs=1, steps=epoch_iterator.num_batches, program=self, ) self.stop_evaluating = False callbacks.on_test_begin() logs = {} self.reset_metrics() for step, iterator in epoch_iterator: callbacks.on_test_batch_begin(step) data = iterator[0] x_batch, y_batch = data_adapter_utils.unpack_x_y(data) logs = await self.test_on_batch( x=x_batch, y=y_batch, return_dict=True, ) callbacks.on_test_batch_end(step, logs) if self.stop_evaluating: break logs = self.get_metrics_result() callbacks.on_test_end(logs) if return_dict: return logs return self._flatten_metrics_in_order(logs) ``` ``` ## `fit(x=None, y=None, batch_size=1, minibatch_size=4, epochs=1, verbose='auto', callbacks=None, validation_split=0.1, validation_data=None, shuffle=True, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=32, validation_freq=1)` Trains the program for a fixed number of epochs (dataset iterations). Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `x` | `ndarray | generator` | Input data. 
It can be: - A NumPy array (or array-like), or a list of DataModel arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding DataModels, if the program has named inputs. - A Python generator function yielding (inputs, targets). | `None` |
| `y` | `ndarray` | Target data. Like the input data x, it can be NumPy array(s) of DataModel(s). If x is a Python generator function, y should not be specified since targets will be obtained from x. | `None` |
| `batch_size` | `int` | Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 1. Do not specify the batch_size if your input data x is a Python generator function, since it generates batches. | `1` |
| `minibatch_size` | `int` | Integer or None. Number of validation samples randomly selected for the validation run at each batch. If unspecified, minibatch_size will default to 4. If None, the whole validation set will be used. | `4` |
| `epochs` | `int` | Integer. Number of epochs to train the program. An epoch is an iteration over the entire x and y data provided (unless the steps_per_epoch flag is set to something other than None). Note that in conjunction with initial_epoch, epochs is to be understood as "final epoch". The program is not trained for a number of iterations given by epochs, but merely until the epoch of index epochs is reached. | `1` |
| `verbose` | `int` | "auto", 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. "auto" becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g., in a production environment). Defaults to "auto". | `'auto'` |
| `callbacks` | `list` | List of synalinks.callbacks.Callback instances. List of callbacks to apply during training. See synalinks.callbacks.
Note synalinks.callbacks.ProgbarLogger and synalinks.callbacks.History callbacks are created automatically and need not be passed to program.fit(). synalinks.callbacks.ProgbarLogger is created or not based on the verbose argument in program.fit(). | `None` | | `validation_split` | `float` | Float between 0 and 1. Fraction of the training data to be used as validation data. The program will set apart this fraction of the training data, will not train on it, and will evaluate the reward and any program metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling. This argument is only supported when x and y are made of data_models. If both validation_data and validation_split are provided, validation_data will override validation_split. | `0.1` | | `validation_data` | `tuple | iterator` | Data on which to evaluate the reward and any program metrics at the end of each epoch. The program will not be trained on this data. validation_data will override validation_split. It can be: - A tuple (x_val, y_val) of DataModels lists. | `None` | | `shuffle` | `bool` | Whether to shuffle the training data before each epoch. This argument is ignored when x is a Python generator function. | `True` | | `initial_epoch` | `int` | Integer. Epoch at which to start training (useful for resuming a previous training run). | `0` | | `steps_per_epoch` | `int` | Integer or None. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input data_models arrays, the default None means that the value used is the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined. If x is a Python generator function, the epoch will run until the input dataset is exhausted. When passing an infinitely repeating dataset, you must specify the steps_per_epoch argument, otherwise the training will run indefinitely. 
| `None` | | `validation_steps` | `int` | Integer or None. Only relevant if validation_data is provided. Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch. If validation_steps is None, validation will run until the validation_data dataset is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. If validation_steps is specified and only part of the dataset is consumed, the evaluation will start from the beginning of the dataset at each epoch. This ensures that the same validation samples are used every time. | `None` | | `validation_batch_size` | `int` | Integer or None. Number of samples per validation batch. If unspecified, will default to batch_size. Do not specify the validation_batch_size if your data is a synalinks.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches. | `32` | | `validation_freq` | `int` | Only relevant if validation data is provided. Specifies how many training epochs to run before a new validation run is performed, e.g. validation_freq=2 runs validation every 2 epochs. | `1` | Returns: | Type | Description | | --- | --- | | `History` | A History object. Its History.history attribute is a record of training reward values and metrics values at successive epochs, as well as validation reward values and validation metrics values (if applicable). | Source code in `synalinks/src/trainers/trainer.py` ``` async def fit( self, x=None, y=None, batch_size=1, minibatch_size=4, epochs=1, verbose="auto", callbacks=None, validation_split=0.1, validation_data=None, shuffle=True, initial_epoch=0, steps_per_epoch=None, validation_steps=None, validation_batch_size=32, validation_freq=1, ): """Trains the program for a fixed number of epochs (dataset iterations). ``` Args: x (np.ndarray | generator): Input data. 
It can be: - A NumPy array (or array-like), or a list of `DataModel` arrays (in case the program has multiple inputs). - A list of dicts mapping input names to the corresponding `DataModel`s, if the program has named inputs. - A Python generator function yielding `(inputs, targets)`. y (np.ndarray): Target data. Like the input data `x`, it can be NumPy array(s) of `DataModel`(s). If `x` is a Python generator function, `y` should not be specified since targets will be obtained from `x`. batch_size (int): Integer or `None`. Number of samples per batch of computation. If unspecified, `batch_size` defaults to 1 (the value in the signature). Do not specify the `batch_size` if your input data `x` is a Python generator function since they generate batches. minibatch_size (int): Integer or `None`. Number of validation samples randomly drawn for each training batch. If unspecified, `minibatch_size` will default to 4. If `None`, the whole validation set will be used. epochs (int): Integer. Number of epochs to train the program. An epoch is an iteration over the entire `x` and `y` data provided (unless the `steps_per_epoch` flag is set to something other than None). Note that in conjunction with `initial_epoch`, `epochs` is to be understood as "final epoch". The program is not trained for a number of iterations given by `epochs`, but merely until the epoch of index `epochs` is reached. verbose (int): `"auto"`, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. `"auto"` becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so `verbose=2` is recommended when not running interactively (e.g., in a production environment). Defaults to `"auto"`. callbacks (list): List of `synalinks.callbacks.Callback` instances. List of callbacks to apply during training. See `synalinks.callbacks`. Note `synalinks.callbacks.ProgbarLogger` and `synalinks.callbacks.History` callbacks are created automatically and need not be passed to `program.fit()`.
`synalinks.callbacks.ProgbarLogger` is created or not based on the `verbose` argument in `program.fit()`. validation_split (float): Float between 0 and 1. Fraction of the training data to be used as validation data. The program will set apart this fraction of the training data, will not train on it, and will evaluate the reward and any program metrics on this data at the end of each epoch. The validation data is selected from the last samples in the `x` and `y` data provided, before shuffling. This argument is only supported when `x` and `y` are made of data_models. If both `validation_data` and `validation_split` are provided, `validation_data` will override `validation_split`. validation_data (tuple | iterator): Data on which to evaluate the reward and any program metrics at the end of each epoch. The program will not be trained on this data. `validation_data` will override `validation_split`. It can be: - A tuple `(x_val, y_val)` of `DataModel`s lists. shuffle (bool): Whether to shuffle the training data before each epoch. This argument is ignored when `x` is a Python generator function. initial_epoch (int): Integer. Epoch at which to start training (useful for resuming a previous training run). steps_per_epoch (int): Integer or `None`. Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input data_models arrays, the default `None` means that the value used is the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined. If `x` is a Python generator function, the epoch will run until the input dataset is exhausted. When passing an infinitely repeating dataset, you must specify the `steps_per_epoch` argument, otherwise the training will run indefinitely. validation_steps (int): Integer or `None`. Only relevant if `validation_data` is provided. 
Total number of steps (batches of samples) to draw before stopping when performing validation at the end of every epoch. If `validation_steps` is `None`, validation will run until the `validation_data` dataset is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. If `validation_steps` is specified and only part of the dataset is consumed, the evaluation will start from the beginning of the dataset at each epoch. This ensures that the same validation samples are used every time. validation_batch_size (int): Integer or `None`. Number of samples per validation batch. If unspecified, will default to `batch_size`. Do not specify the `validation_batch_size` if your data is a `synalinks.utils.PyDataset`, `tf.data.Dataset`, `torch.utils.data.DataLoader` or Python generator function since they generate batches. validation_freq (int): Only relevant if validation data is provided. Specifies how many training epochs to run before a new validation run is performed, e.g. `validation_freq=2` runs validation every 2 epochs. Returns: (History): A `History` object. Its `History.history` attribute is a record of training reward values and metrics values at successive epochs, as well as validation reward values and validation metrics values (if applicable). """ self._assert_compile_called("fit") self._eval_epoch_iterator = None val_x, val_y = None, None if self._optimizer is None: # No optimizer ⇒ no parameter updates possible. Iterating the # training loop here would just burn LM calls / wall-clock time # without changing anything. Warn loudly and return an empty # History so callers that inspect `.history` keep working. warnings.warn( "`Program.fit()` was called but no optimizer is set on the " "compiled program — training cannot update any variables, so " "iterating the training data would be wasted compute. " "Skipping the fit loop. If you intended to evaluate, call " "`program.evaluate(x=..., y=...)` directly.
If you intended " "to train, recompile with an optimizer (e.g. " "`program.compile(optimizer=synalinks.optimizers.RandomFewShot(), " "reward=..., metrics=...)`).", stacklevel=2, ) history = callbacks_module.History() self.history = history return history if validation_split and validation_data is None: # Create the validation data using the training data. Only supported # for numpy arrays. (x, y), validation_data = array_slicing.train_validation_split( (x, y), validation_split=validation_split ) if validation_data is not None: val_x, val_y = data_adapter_utils.unpack_x_y(validation_data) # Create an iterator that yields batches of input/target data. epoch_iterator = EpochIterator( x=x, y=y, batch_size=batch_size, steps_per_epoch=steps_per_epoch, shuffle=False, steps_per_execution=self.steps_per_execution, ) if not all(module.built for module in self._flatten_modules()): # Build the model on one batch of data. for _, data in epoch_iterator: data_batch = data[0] self._auto_build( iterator=epoch_iterator, data_batch=data_batch, ) break epoch_iterator.reset() # Container that configures and calls callbacks. 
if not isinstance(callbacks, callbacks_module.CallbackList): # Get optimizer name for logging optimizer_name = None if self._optimizer is not None: optimizer_name = self._optimizer.__class__.__name__ callbacks = callbacks_module.CallbackList( callbacks, add_history=True, add_progbar=verbose != 0, verbose=verbose, epochs=epochs, steps=steps_per_epoch, batch_size=batch_size, optimizer=optimizer_name, program=self, ) self.stop_training = False callbacks.on_train_begin() training_logs = None logs = {} initial_epoch = self._initial_epoch or initial_epoch if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_train_begin( self.trainable_variables, ) for epoch in range(initial_epoch, epochs): self.reset_metrics() if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_epoch_begin( epoch, self.trainable_variables, ) callbacks.on_epoch_begin(epoch) with epoch_iterator.catch_stop_iteration(): for step, iterator in epoch_iterator: data = iterator[0] x_batch, y_batch = data_adapter_utils.unpack_x_y(data) if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_batch_begin( step, epoch, self.trainable_variables, ) callbacks.on_train_batch_begin(step) mini_val_x = None mini_val_y = None if minibatch_size: if len(val_x) > minibatch_size: indices = np.random.choice( len(val_x), size=minibatch_size, replace=False, ) mini_val_x = val_x[indices] mini_val_y = val_y[indices] logs = await self.train_on_batch( step=step, x=x_batch, y=y_batch, val_x=mini_val_x if mini_val_x is not None else val_x, val_y=mini_val_y if mini_val_y is not None else val_y, return_dict=True, ) val_logs = await self.evaluate( x=val_x, y=val_y, batch_size=validation_batch_size or batch_size, steps=validation_steps, callbacks=callbacks, _use_cached_eval_dataset=False, ) if self.trainable_variables and isinstance( self.optimizer, 
optimizers_module.Optimizer ): await self.optimizer.on_batch_end( step, epoch, self.trainable_variables, ) callbacks.on_train_batch_end(step, logs) if self.stop_training: break # Override with model metrics instead of last step logs if needed. epoch_logs = dict(self._get_metrics_result_or_logs(logs)) # Run validation. if validation_data is not None and self._should_eval(epoch, validation_freq): # Create EpochIterator for evaluation and cache it. if getattr(self, "_eval_epoch_iterator", None) is None: self._eval_epoch_iterator = EpochIterator( x=val_x, y=val_y, batch_size=validation_batch_size or batch_size, steps_per_execution=self.steps_per_execution, steps_per_epoch=validation_steps, shuffle=False, ) val_logs = await self.evaluate( x=val_x, y=val_y, batch_size=validation_batch_size or batch_size, steps=validation_steps, callbacks=callbacks, _use_cached_eval_dataset=True, ) val_logs = {"val_" + name: val for name, val in val_logs.items()} epoch_logs.update(val_logs) if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_epoch_end( epoch, self.trainable_variables, ) callbacks.on_epoch_end(epoch, epoch_logs) training_logs = epoch_logs if self.stop_training: break # If _eval_epoch_iterator exists, delete it after all epochs are done. if getattr(self, "_eval_epoch_iterator", None) is not None: del self._eval_epoch_iterator if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): await self.optimizer.on_train_end(self.trainable_variables) callbacks.on_train_end(logs=training_logs) return self.history ``` ``` ## `get_compile_config()` Returns a serialized config with information for compiling the program. This method returns a config dictionary containing all the information (optimizer, reward, metrics, etc.) with which the program was compiled. Returns: | Type | Description | | --- | --- | | `dict` | A dict containing information for compiling the program. 
| Source code in `synalinks/src/trainers/trainer.py` ``` def get_compile_config(self): """Returns a serialized config with information for compiling the program. ``` This method returns a config dictionary containing all the information (optimizer, reward, metrics, etc.) with which the program was compiled. Returns: (dict): A dict containing information for compiling the program. """ if self.compiled and hasattr(self, "_compile_config"): return self._compile_config.serialize() ``` ``` ## `get_metrics_result()` Returns the program's metrics values as a dict. If any metric result is itself a dict (containing multiple metrics), its entries are added to the top level of the dict returned by this method. Returns: | Type | Description | | --- | --- | | `dict` | A dict containing values of the metrics listed in `self.metrics`. Example: `{'reward': 0.2, 'accuracy': 0.7}`. | Source code in `synalinks/src/trainers/trainer.py` ``` def get_metrics_result(self): """Returns the program's metrics values as a dict. ``` If any metric result is itself a dict (containing multiple metrics), its entries are added to the top level of the dict returned by this method. Returns: (dict): A `dict` containing values of the metrics listed in `self.metrics`. Example: `{'reward': 0.2, 'accuracy': 0.7}`. """ return_metrics = {} for metric in self.metrics: result = metric.result() if isinstance(result, dict): return_metrics.update(result) else: return_metrics[metric.name] = result return python_utils.pythonify_logs(return_metrics) ``` ``` ## `predict(x, batch_size=None, verbose='auto', steps=None, callbacks=None)` Generates output predictions for the input samples. Computation is done in batches. This method is designed for batch processing of large numbers of inputs. It is not intended for use inside of loops that iterate over your data and process small numbers of inputs at a time.
For small numbers of inputs that fit in one batch, directly use `__call__()` for faster execution, e.g., `program(x)`, or `program(x, training=False)` if you have modules that behave differently during inference. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `x` | `ndarray | generator` | Input data. It can be: - A NumPy array (or array-like), or a list of DataModel arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding DataModels, if the program has named inputs. - A Python generator function yielding (inputs, targets). | *required* | | `batch_size` | `int` | Integer or None. Number of samples per batch of computation. If unspecified, batch_size will default to 32. Do not specify the batch_size if your input data x is a synalinks.utils.PyDataset, tf.data.Dataset, torch.utils.data.DataLoader or Python generator function since they generate batches. | `None` | | `verbose` | `int` | "auto", 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. "auto" becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g. in a production environment). Defaults to "auto". | `'auto'` | | `steps` | `int` | Total number of steps (batches of samples) to draw before declaring the prediction round finished. If steps is None, it will run until x is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. | `None` | | `callbacks` | `list` | List of synalinks.callbacks.Callback instances. List of callbacks to apply during prediction. | `None` | Returns: | Type | Description | | --- | --- | | `list` | JsonDataModel array(s) of predictions. If the pipeline failed, a None is added to the predictions. 
| Source code in `synalinks/src/trainers/trainer.py` ``` async def predict( self, x, batch_size=None, verbose="auto", steps=None, callbacks=None ): """Generates output predictions for the input samples. ``` Computation is done in batches. This method is designed for batch processing of large numbers of inputs. It is not intended for use inside of loops that iterate over your data and process small numbers of inputs at a time. For small numbers of inputs that fit in one batch, directly use `__call__()` for faster execution, e.g., `program(x)`, or `program(x, training=False)` if you have modules that behave differently during inference. Args: x (np.ndarray | generator): Input data. It can be: - A NumPy array (or array-like), or a list of `DataModel` arrays (in case the model has multiple inputs). - A list of dict mapping input names to the corresponding `DataModel`s, if the program has named inputs. - A Python generator function yielding `(inputs, targets)`. batch_size (int): Integer or `None`. Number of samples per batch of computation. If unspecified, `batch_size` will default to 32. Do not specify the `batch_size` if your input data `x` is a `synalinks.utils.PyDataset`, `tf.data.Dataset`, `torch.utils.data.DataLoader` or Python generator function since they generate batches. verbose (int): `"auto"`, 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = single line. `"auto"` becomes 1 for most cases. Note that the progress bar is not particularly useful when logged to a file, so `verbose=2` is recommended when not running interactively (e.g. in a production environment). Defaults to `"auto"`. steps (int): Total number of steps (batches of samples) to draw before declaring the prediction round finished. If `steps` is `None`, it will run until `x` is exhausted. In the case of an infinitely repeating dataset, it will run indefinitely. callbacks (list): List of `synalinks.callbacks.Callback` instances. List of callbacks to apply during prediction. 
Returns: (list): `JsonDataModel` array(s) of predictions. If the pipeline failed, a `None` is added to the predictions. """ # Create an iterator that yields batches of input data. epoch_iterator = EpochIterator( x=x, batch_size=batch_size, steps_per_epoch=steps, shuffle=False, steps_per_execution=self.steps_per_execution, ) # Container that configures and calls callbacks. if not isinstance(callbacks, callbacks_module.CallbackList): callbacks = callbacks_module.CallbackList( callbacks, add_history=True, add_progbar=verbose != 0, verbose=verbose, epochs=1, steps=epoch_iterator.num_batches, model=self, ) self.stop_predicting = False callbacks.on_predict_begin() outputs = [] for step, iterator in epoch_iterator: callbacks.on_predict_batch_begin(step) data = iterator[0] x_batch, _ = data_adapter_utils.unpack_x_y(data) batch_outputs = await self.predict_on_batch(x_batch) outputs.extend(batch_outputs) callbacks.on_predict_batch_end(step, {"outputs": batch_outputs}) if self.stop_predicting: break callbacks.on_predict_end() return np.array(outputs, dtype="object") ``` ``` ## `predict_on_batch(x, training=False)` Returns predictions for a single batch of samples. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `x` | `ndarray` | Input data. Must be array-like. | *required* | | `training` | `bool` | Boolean. True if training. | `False` | Returns: | Type | Description | | --- | --- | | `list` | list(s) of JsonDataModel predictions. | Source code in `synalinks/src/trainers/trainer.py` ``` async def predict_on_batch(self, x, training=False): """Returns predictions for a single batch of samples. ``` Args: x (np.ndarray): Input data. Must be array-like. training (bool): Boolean. True if training. Returns: (list): list(s) of JsonDataModel predictions. """ # Tag this work as "inference" so LanguageModel / EmbeddingModel can # attribute token / latency / cost to the program's forward pass # only.
See synalinks.src.backend.common.global_state key # `synalinks_op_scope` — value is one of "inference", "reward", # "optimizer", or None. prev_scope = global_state.get_global_attribute("synalinks_op_scope") global_state.set_global_attribute("synalinks_op_scope", "inference") try: tasks = [] for inputs in x: tasks.append(self(inputs, training=training)) y_pred = await asyncio.gather(*tasks) return y_pred finally: global_state.set_global_attribute("synalinks_op_scope", prev_scope) ``` ``` ## `test_on_batch(x, y=None, return_dict=False)` Test the program on a single batch of samples. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `x` | `ndarray` | Input data. Must be array-like. | *required* | | `y` | `ndarray` | Target data. Must be array-like. | `None` | | `return_dict` | `bool` | If True, reward and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list. | `False` | Returns: | Type | Description | | --- | --- | | `float | list | dict` | A scalar reward value (when no metrics and return_dict=False), a list of reward and metric values (if there are metrics and return_dict=False), or a dict of metric and reward values (if return_dict=True). | Source code in `synalinks/src/trainers/trainer.py` ``` async def test_on_batch( self, x, y=None, return_dict=False, ): """Test the program on a single batch of samples. ``` Args: x (np.ndarray): Input data. Must be array-like. y (np.ndarray): Target data. Must be array-like. return_dict (bool): If `True`, reward and metric results are returned as a dict, with each key being the name of the metric. If `False`, they are returned as a list. Returns: (float | list | dict): A scalar reward value (when no metrics and `return_dict=False`), a list of reward and metric values (if there are metrics and `return_dict=False`), or a dict of metric and reward values (if `return_dict=True`). 
""" y_pred = await self.predict_on_batch(x) rewards = await self.compute_reward( x=x, y=y, y_pred=y_pred, training=False, ) reduction = ( self._compile_reward.reduction if self._compile_reward is not None else "mean" ) scalar_reward = rewards_module.reduce_rewards(rewards, reduction) await self._reward_tracker.update_state(scalar_reward) metrics = await self.compute_metrics(x, y, y_pred) if return_dict: return metrics return self._flatten_metrics_in_order(metrics) ``` ``` ## `train_on_batch(step, x, y=None, val_x=None, val_y=None, return_dict=False)` Runs a single optimization step on a single batch of data. Parameters: | Name | Type | Description | Default | | --- | --- | --- | --- | | `step` | `int` | The training step. | *required* | | `x` | `ndarray` | Input data. Must be array-like. | *required* | | `y` | `ndarray` | Target data. Must be array-like. | `None` | | `val_x` | `ndarray` | Input validation data. Must be array-like. | `None` | | `val_y` | `ndarray` | Target validation data. Must be array-like. | `None` | | `return_dict` | `bool` | If True, reward and metric results are returned as a dict, with each key being the name of the metric. If False, they are returned as a list. | `False` | Returns: | Type | Description | | --- | --- | | `float | list | dict` | A scalar reward value (when no metrics and return_dict=False), a list of reward and metric values (if there are metrics and return_dict=False), or a dict of metric and reward values (if return_dict=True). | Source code in `synalinks/src/trainers/trainer.py` ``` async def train_on_batch( self, step, x, y=None, val_x=None, val_y=None, return_dict=False, ): """Runs a single optimization step on a single batch of data. ``` Args: step (int): The training step. x (np.ndarray): Input data. Must be array-like. y (np.ndarray): Target data. Must be array-like. val_x (np.ndarray): Input validation data. Must be array-like. val_y (np.ndarray): Target validation data. Must be array-like. 
return_dict (bool): If `True`, reward and metric results are returned as a dict, with each key being the name of the metric. If `False`, they are returned as a list. Returns: (float | list | dict): A scalar reward value (when no metrics and `return_dict=False`), a list of reward and metric values (if there are metrics and `return_dict=False`), or a dict of metric and reward values (if `return_dict=True`). """ if self.trainable_variables and isinstance( self.optimizer, optimizers_module.Optimizer ): prev_scope = global_state.get_global_attribute("synalinks_op_scope") global_state.set_global_attribute("synalinks_op_scope", "optimizer") try: metrics = await self.optimizer.optimize( step, self.trainable_variables, x=x, y=y, val_x=val_x, val_y=val_y, ) finally: global_state.set_global_attribute("synalinks_op_scope", prev_scope) else: warnings.warn("The program does not have any trainable variables.") y_pred = await self.predict_on_batch(val_x) rewards = await self.compute_reward( x=val_x, y=val_y, y_pred=y_pred, ) reduction = ( self._compile_reward.reduction if self._compile_reward is not None else "mean" ) scalar_reward = rewards_module.reduce_rewards(rewards, reduction) await self._reward_tracker.update_state(scalar_reward) metrics = await self.compute_metrics(val_x, val_y, y_pred) if return_dict: return metrics return self._flatten_metrics_in_order(metrics) ``` ``` ``` # The Program class Bases: `Trainer`, `Module` A program grouping modules into an object with training/inference features. 
There are four ways to instantiate a `Program`:

### With the "Functional API"

You start from `Input`, chain module calls to specify the program's structure, and finally create your program from inputs and outputs:

```python
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class AnswerWithThinking(synalinks.DataModel):
    thinking: str = synalinks.Field(
        description="Your step by step thinking process",
    )
    answer: float = synalinks.Field(
        description="The correct numerical answer",
    )

async def main():
    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    x0 = synalinks.Input(data_model=Query)
    x1 = await synalinks.Generator(
        data_model=AnswerWithThinking,
        language_model=language_model,
    )(x0)

    program = synalinks.Program(
        inputs=x0,
        outputs=x1,
        name="chain_of_thought",
        description="Useful to answer in a step by step manner.",
    )

if __name__ == "__main__":
    asyncio.run(main())
```

Note: Only dicts, lists, and tuples of input data models are supported. Nested inputs are not supported (e.g. lists of lists or dicts of dicts).

### By subclassing the `Program` class

In that case, you should define your modules in `__init__()` and implement the program's structure in `call()`.

```python
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class AnswerWithThinking(synalinks.DataModel):
    thinking: str = synalinks.Field(
        description="Your step by step thinking process",
    )
    answer: float = synalinks.Field(
        description="The correct numerical answer",
    )

class ChainOfThought(synalinks.Program):
    """Useful to answer in a step by step manner.

    The first line of the docstring is provided as description
    for the program if not provided in the `super().__init__()`.
    In a similar way the name is automatically inferred based on
    the class name if not provided.
    """

    def __init__(
        self,
        language_model=None,
        name=None,
        description=None,
        trainable=True,
    ):
        super().__init__(
            name=name,
            description=description,
            trainable=trainable,
        )
        # Keep a reference so that `get_config()` can serialize it.
        self.language_model = language_model
        self.answer = synalinks.Generator(
            data_model=AnswerWithThinking,
            language_model=language_model,
            name="generator_"+self.name,
        )

    async def call(self, inputs, training=False):
        if not inputs:
            return None
        x = await self.answer(inputs, training=training)
        return x

    def get_config(self):
        config = {
            "name": self.name,
            "description": self.description,
            "trainable": self.trainable,
        }
        language_model_config = {
            "language_model": synalinks.saving.serialize_synalinks_object(
                self.language_model
            )
        }
        return {**config, **language_model_config}

    @classmethod
    def from_config(cls, config):
        language_model = synalinks.saving.deserialize_synalinks_object(
            config.pop("language_model")
        )
        return cls(language_model=language_model, **config)

async def main():
    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )
    program = ChainOfThought(
        language_model=language_model,
    )

if __name__ == "__main__":
    asyncio.run(main())
```

If you subclass `Program`, you can optionally have a `training` argument (boolean) in `call()`, which you can use to specify different behavior in training and inference.

Once the program is created, you can configure it with rewards and metrics using `program.compile()`, train it with `program.fit()`, or use it for prediction with `program.predict()` or `program()`.

To understand the difference between `program.predict()` and `program()`, read the [FAQ](https://synalinks.github.io/synalinks/FAQ/#whats-the-difference-between-program-methods-predict-and-__call__).

### Mixing the subclassing and the `Functional` API

This approach is recommended for most users: it encapsulates your application behind an easy-to-use setup and avoids building your programs/agents from scratch. In that case, you should implement only the `__init__()` and `build()` methods.
```python
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class AnswerWithThinking(synalinks.DataModel):
    thinking: str = synalinks.Field(
        description="Your step by step thinking process",
    )
    answer: float = synalinks.Field(
        description="The correct numerical answer",
    )

async def main():

    class ChainOfThought(synalinks.Program):
        """Useful to answer in a step by step manner."""

        def __init__(
            self,
            language_model=None,
            name=None,
            description=None,
            trainable=True,
        ):
            super().__init__(
                name=name,
                description=description,
                trainable=trainable,
            )
            self.language_model = language_model

        async def build(self, inputs):
            outputs = await synalinks.Generator(
                data_model=AnswerWithThinking,
                language_model=self.language_model,
            )(inputs)

            # Create your program using the functional API
            super().__init__(
                inputs=inputs,
                outputs=outputs,
                name=self.name,
                description=self.description,
                trainable=self.trainable,
            )

    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    program = ChainOfThought(
        language_model=language_model,
    )

if __name__ == "__main__":
    asyncio.run(main())
```

This spares you from implementing `call()` and the serialization methods (`get_config()` and `from_config()`). The program is built the first time it is called, for whatever inputs it receives.

### With the `Sequential` class

In addition, `synalinks.Sequential` is a special case of program where the program is purely a stack of single-input, single-output modules.
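To make "a stack of single-input, single-output modules" concrete, here is a minimal, library-free sketch of the chaining idea. The names `SketchSequential`, `strip_whitespace`, and `to_upper` are hypothetical and invented for illustration only; this is an assumption about the general pattern, not a description of how `synalinks.Sequential` is implemented.

```python
import asyncio


class SketchSequential:
    """Hypothetical stand-in: runs async single-input, single-output steps in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    async def __call__(self, inputs):
        outputs = inputs
        for step in self.steps:
            # Each step receives exactly one value and returns exactly one value,
            # which becomes the input of the next step in the stack.
            outputs = await step(outputs)
        return outputs


async def strip_whitespace(text):
    # Hypothetical step: trim surrounding whitespace.
    return text.strip()


async def to_upper(text):
    # Hypothetical step: upper-case the text.
    return text.upper()


async def main():
    pipeline = SketchSequential([strip_whitespace, to_upper])
    return await pipeline("  hello  ")


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints HELLO
```

Because every step takes one value and returns one value, the order of the list fully determines the data flow, which is why no explicit wiring of inputs to outputs is needed. The Synalinks version of this pattern, using `synalinks.Sequential`, follows.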
```python
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class AnswerWithThinking(synalinks.DataModel):
    thinking: str = synalinks.Field(
        description="Your step by step thinking process",
    )
    answer: float = synalinks.Field(
        description="The correct numerical answer",
    )

async def main():
    language_model = synalinks.LanguageModel(model="ollama/mistral")

    program = synalinks.Sequential(
        [
            synalinks.Input(
                data_model=Query,
            ),
            synalinks.Generator(
                data_model=AnswerWithThinking,
                language_model=language_model,
            ),
        ],
        name="chain_of_thought",
        description="Useful to answer in a step by step manner.",
    )

if __name__ == "__main__":
    asyncio.run(main())
```

Source code in `synalinks/src/programs/program.py`

```` @synalinks_export(["synalinks.Program", "synalinks.programs.Program"]) class Program(Trainer, Module): """A program grouping modules into an object with training/inference features. There are four ways to instantiate a `Program`: ## With the "Functional API" You start from `Input`, you chain module calls to specify the program's structure, and finally, you create your program from inputs and outputs: ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class AnswerWithThinking(synalinks.DataModel): thinking: str = synalinks.Field( description="Your step by step thinking process", ) answer: float = synalinks.Field( description="The correct numerical answer", ) async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.Generator( data_model=AnswerWithThinking, language_model=language_model, )(x0) program = synalinks.Program( inputs=x0, outputs=x1, name="chain_of_thought", description="Useful to answer in a step by step manner.", ) if __name__ == "__main__": asyncio.run(main()) ``` Note: Only dicts, lists, and tuples of input data
models are supported. Nested inputs are not supported (e.g. lists of lists or dicts of dicts). ## By subclassing the `Program` class In that case, you should define your modules in `__init__()` and you should implement the program's structure in `call()`. ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class AnswerWithThinking(synalinks.DataModel): thinking: str = synalinks.Field( description="Your step by step thinking process", ) answer: float = synalinks.Field( description="The correct numerical answer", ) class ChainOfThought(synalinks.Program): \"\"\"Useful to answer in a step by step manner. The first line of the docstring is provided as description for the program if not provided in the `super().__init__()`. In a similar way the name is automatically inferred based on the class name if not provided. \"\"\" def __init__( self, language_model=None, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) self.language_model = language_model self.answer = synalinks.Generator( data_model=AnswerWithThinking, language_model=language_model, name="generator_"+self.name, ) async def call(self, inputs, training=False): if not inputs: return None x = await self.answer(inputs, training=training) return x def get_config(self): config = { "name": self.name, "description": self.description, "trainable": self.trainable, } language_model_config = \ { "language_model": synalinks.saving.serialize_synalinks_object( self.language_model ) } return {**config, **language_model_config} @classmethod def from_config(cls, config): language_model = synalinks.saving.deserialize_synalinks_object( config.pop("language_model") ) return cls(language_model=language_model, **config) async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) program = ChainOfThought( language_model=language_model, ) if __name__ == "__main__": asyncio.run(main()) ``` If you subclass `Program`, you can optionally
have a `training` argument (boolean) in `call()`, which you can use to specify a different behavior in training and inference. Once the program is created, you can configure it with rewards and metrics using `program.compile()`, train it with `program.fit()`, or run predictions with `program.predict()` or `program()`. To understand the difference between `program.predict()` and `program()`, read the [FAQ](https://synalinks.github.io/synalinks/FAQ/#whats-the-difference-between-program-methods-predict-and-__call__). ## Mixing the subclassing and the `Functional` API This way of programming encapsulates your application while providing an easy-to-use setup. It is the recommended way for most users, as it avoids building your programs/agents from scratch. In that case, you should implement only the `__init__()` and `build()` methods. ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class AnswerWithThinking(synalinks.DataModel): thinking: str = synalinks.Field( description="Your step by step thinking process", ) answer: float = synalinks.Field( description="The correct numerical answer", ) async def main(): class ChainOfThought(synalinks.Program): \"\"\"Useful to answer in a step by step manner.\"\"\" def __init__( self, language_model=None, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) self.language_model = language_model async def build(self, inputs): outputs = await synalinks.Generator( data_model=AnswerWithThinking, language_model=self.language_model, )(inputs) # Create your program using the functional API super().__init__( inputs=inputs, outputs=outputs, name=self.name, description=self.description, trainable=self.trainable, ) language_model = synalinks.LanguageModel( model="ollama/mistral", ) program = ChainOfThought( language_model=language_model, ) if 
__name__ == "__main__": asyncio.run(main()) ``` This allows you to not have to implement the `call()` and serialization methods (`get_config()` and `from_config()`). The program will be built for any inputs the first time called. ## With the `Sequential` class In addition, `synalinks.Sequential` is a special case of program where the program is purely a stack of single-input, single-output modules. ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class AnswerWithThinking(synalinks.DataModel): thinking: str = synalinks.Field( description="Your step by step thinking process", ) answer: float = synalinks.Field( description="The correct numerical answer", ) async def main(): language_model = synalinks.LanguageModel(model="ollama/mistral") program = synalinks.Sequential( [ synalinks.Input( data_model=Query, ), synalinks.Generator( data_model=AnswerWithThinking, language_model=language_model, ), ], name="chain_of_thought", description="Useful to answer in a step by step manner.", ) if __name__ == "__main__": asyncio.run(main()) ``` """ def __new__(cls, *args, **kwargs): # Signature detection for usage of `Program` as a `Functional` if functional_init_arguments(args, kwargs) and cls == Program: from synalinks.src.programs.functional import Functional return Functional.__new__(Functional, *args, **kwargs) return typing.cast(cls, super().__new__(cls)) def __init__(self, *args, **kwargs): Trainer.__init__(self) from synalinks.src.programs import functional # Signature detection for usage of a `Program` subclass # as a `Functional` subclass if functional_init_arguments(args, kwargs): inject_functional_program_class(self.__class__) functional.Functional.__init__(self, *args, **kwargs) else: Module.__init__(self, *args, **kwargs) async def call(self, *args, **kwargs): raise NotImplementedError( f"Program {self.__class__.__name__} does not have a `call()` " "method implemented." 
) @property def modules(self): return list(self._flatten_modules(include_self=False, recursive=False)) @modules.setter def modules(self, _): raise AttributeError( "`Program.modules` attribute is reserved and should not be used. " "Please use another name." ) def get_module(self, name=None, index=None): """Retrieves a module based on either its name (unique) or index. `name` and `index` are mutually exclusive: providing both raises a `ValueError`. Indices are based on order of horizontal graph traversal (bottom-up). Args: name (str): String, name of module. index (int): Integer, index of module. Returns: (Module): A module instance. """ if index is not None and name is not None: raise ValueError( "Provide only a module name or a module index. Received: " f"index={index}, name={name}." ) if index is not None: if len(self.modules) <= index: raise ValueError( f"Was asked to retrieve module at index {index}" f" but program only has {len(self.modules)}" " modules." ) else: return self.modules[index] if name is not None: for module in self.modules: if module.name == name: return module raise ValueError( f"No such module: {name}. Existing modules are: " f"{list(module.name for module in self.modules)}." ) raise ValueError("Provide either a module name or module index at `get_module`.") def summary( self, line_length=None, positions=None, print_fn=None, expand_nested=False, show_trainable=False, module_range=None, ): """Prints a string summary of the program. Args: line_length (int): Total length of printed lines (e.g. set this to adapt the display to different terminal window sizes). positions (list): Relative or absolute positions of log elements in each line. If not provided, becomes `[0.3, 0.6, 0.70, 1.]`. Defaults to `None`. print_fn (Callable): Print function to use. By default, prints to `stdout`. If `stdout` doesn't work in your environment, change to `print`. It will be called on each line of the summary. 
You can set it to a custom function in order to capture the string summary. expand_nested (bool): Whether to expand the nested models. Defaults to `False`. show_trainable (bool): Whether to show if a module is trainable. Defaults to `False`. module_range (list | tuple): a list or tuple of 2 strings, which is the starting module name and ending module name (both inclusive) indicating the range of modules to be printed in summary. It also accepts regex patterns instead of exact names. In this case, the start predicate will be the first element that matches `module_range[0]` and the end predicate will be the last element that matches `module_range[1]`. By default `None` considers all modules of the model. Raises: ValueError: if `summary()` is called before the model is built. """ summary_utils.print_summary( self, line_length=line_length, positions=positions, print_fn=print_fn, expand_nested=expand_nested, show_trainable=show_trainable, module_range=module_range, ) def save(self, filepath, overwrite=True, **kwargs): """Saves a program as a `.json` file. Example: ```python import synalinks class Query(synalinks.DataModel): query: str class AnswerWithRationale(synalinks.DataModel): rationale: str answer: str language_model = synalinks.LanguageModel(model="ollama/mistral") program = synalinks.Sequential( [ synalinks.Input(data_model=Query), synalinks.Generator( data_model=AnswerWithRationale, language_model=language_model, ), ], ) program.save("program.json") loaded_program = synalinks.Program.load("program.json") ``` The saved `.json` file contains: - The program's configuration (architecture) - The program's variables - The program's optimizer's state (if any) - The program's reward's state (if any) Thus programs can be reinstantiated in the exact same state. Args: filepath (str | os.PathLike): `str` or `os.PathLike` object. The path where to save the model. Must end in `.json`. 
overwrite (bool): Whether we should overwrite any existing program at the target location, or instead ask the user via an interactive prompt. Default to `True`. """ from synalinks.src.saving import serialization_lib filepath = file_utils.path_to_string(filepath) if not filepath.endswith(".json"): raise ValueError( f"The filepath should ends with '.json', received filepath={filepath}" ) program_config = serialization_lib.serialize_synalinks_object(self) variables_config = self.get_state_tree() program_config.update({"variables": variables_config}) program_config_string = orjson.dumps( program_config, option=orjson.OPT_INDENT_2 ).decode() if file_utils.exists(filepath) and not overwrite: io_utils.ask_to_proceed_with_overwrite(filepath) with open(filepath, "w") as f: f.write(program_config_string) async def build_from_config(self, config): if not config: return status = False if "input_schema" in config: # Case: all inputs are in the first arg (possibly nested). if utils.is_default(self.build): status = self._build_by_run_for_single_pos_arg(config["input_schema"]) else: try: await self.build(config["input_schema"]) status = True except Exception: pass self._build_schemas_dict = config elif "schemas_dict" in config: # Case: inputs were recorded as multiple keyword arguments. if utils.is_default(self.build): status = self._build_by_run_for_kwargs(config["schemas_dict"]) else: try: await self.build(**config["schemas_dict"]) status = True except Exception: pass self._build_schemas_dict = config["schemas_dict"] if not status: warnings.warn( f"Program '{self.name}' had a build config, but the program " "cannot be built automatically in " "`build_from_config(config)`. " "You should implement " "`def build_from_config(self, config)`, " "and you might also want to implement the method " " that generates the config at saving time, " "`def get_build_config(self)`. " "The method `build_from_config()` is meant to " "create the state of the model (i.e. 
its variables) " "upon deserialization.", stacklevel=2, ) def to_json(self, **kwargs): """Returns a JSON string containing the network configuration. ```python json_string = program.to_json() ``` To load a network from a JSON save file, use `synalinks.programs.program_from_json(json_string, custom_objects={...})`. Args: **kwargs (keyword arguments): Additional keyword arguments to be passed to `orjson.dumps()`. Returns: (str): A JSON string. """ from synalinks.src.saving import serialization_lib program_config = serialization_lib.serialize_synalinks_object(self) return orjson.dumps(program_config, **kwargs).decode() @classmethod def from_config(cls, config, custom_objects=None): from synalinks.src.programs.functional import Functional functional_config_keys = [ "name", "modules", "input_modules", "output_modules", ] is_functional_config = all(key in config for key in functional_config_keys) argspec = inspect.getfullargspec(cls.__init__) functional_init_args = inspect.getfullargspec(Functional.__init__).args[1:] revivable_as_functional = ( cls in {Functional, Program} or argspec.args[1:] == functional_init_args or (argspec.varargs == "args" and argspec.varkw == "kwargs") ) if is_functional_config and revivable_as_functional: # Revive Functional model # (but not Functional subclasses with a custom __init__) from synalinks.src.programs.functional import functional_from_config return functional_from_config(cls, config, custom_objects=custom_objects) # Either the model has a custom __init__, or the config # does not contain all the information necessary to # revive a Functional model. This happens when the user creates # subclassed models where `get_config()` is returning # insufficient information to be considered a Functional model. # In this case, we fall back to provide all config into the # constructor of the class. try: return cls(**config) except TypeError as e: raise TypeError( "Unable to revive program from config. 
When overriding " "the `get_config()` method, make sure that the " "returned config contains all items used as arguments " f"in the constructor to {cls}, " "which is the default behavior. " "You can override this default behavior by defining a " "`from_config(cls, config)` class method to specify " "how to create an " f"instance of {cls.__name__} from its config.\n\n" f"Received config={config}\n\n" f"Error encountered during deserialization: {e}" ) def get_state_tree(self): """Retrieves tree-like structure of program variables. This method allows retrieval of different program variables (trainable, non-trainable, optimizer, and metrics). The variables are returned in a nested dictionary format, where the keys correspond to the variable names and the values are the nested representations of the variables. Example: ```python program.compile( optimizer=synalinks.optimizers.RandomFewShot(), reward=synalinks.rewards.ExactMatch(), ) program.fit(x=x_train, y=y_train) state_tree = program.get_state_tree() ``` Returns: (dict): A dictionary containing the nested representations of the requested variables. The keys are the variable names, and the values are the corresponding nested dictionaries. """ variables = {} variables["trainable_variables"] = self._create_nested_dict( self.trainable_variables ) variables["non_trainable_variables"] = self._create_nested_dict( self.non_trainable_variables ) if self.optimizer: variables["optimizer_trainable_variables"] = self._create_nested_dict( self.optimizer.trainable_variables ) variables["optimizer_non_trainable_variables"] = self._create_nested_dict( self.optimizer.non_trainable_variables ) variables["metrics_variables"] = self._create_nested_dict(self.metrics_variables) return variables def _create_nested_dict(self, variables): flat_dict = {} for v in variables: if v.path in flat_dict: raise ValueError( "The following variable path is found twice in the program: " f"'{v.path}'. 
`get_state_tree()` can only be called when " "all variable paths are unique. Make sure to give unique " "names to your modules (and other objects)." ) flat_dict[v.path] = v.get_json() nested_dict = {} for path, value in flat_dict.items(): parts = path.split("/") current_dict = nested_dict for part in parts[:-1]: if part not in current_dict: current_dict[part] = {} current_dict = current_dict[part] current_dict[parts[-1]] = value return nested_dict def set_state_tree(self, state_tree): """Assigns values to variables of the program. This method takes a dictionary of nested variable values, which represents the state tree of the program, and assigns them to the corresponding variables of the program. The dictionary keys represent the variable names (e.g., `'trainable_variables'`, `'optimizer_trainable_variables'`), and the values are nested dictionaries containing the variable paths and their corresponding values. Args: state_tree (dict): A dictionary representing the state tree of the program. The keys are the variable names, and the values are nested dictionaries representing the variable paths and their values. 
""" for k, v in state_tree.items(): path_value_dict = self._flatten_nested_dict(v) if k == "trainable_variables": self._assign_variable_values(self.trainable_variables, path_value_dict) elif k == "non_trainable_variables": self._assign_variable_values( self.non_trainable_variables, path_value_dict ) elif k == "optimizer_trainable_variables": if self.optimizer: self._assign_variable_values( self.optimizer.trainable_variables, path_value_dict ) elif k == "optimizer_non_trainable_variables": if self.optimizer: self._assign_variable_values( self.optimizer.non_trainable_variables, path_value_dict ) elif k == "metrics_variables": self._assign_variable_values(self.metrics_variables, path_value_dict) else: raise ValueError(f"Unknown variable name: {k}") def _assign_variable_values(self, variables, path_value_dict): for full_path, value in path_value_dict.items(): path = "/".join(full_path.split("/")[:-1]) field_name = full_path.split("/")[-1] for variable in variables: if variable.path == path: variable.get_json()[field_name] = value def _flatten_nested_dict(self, nested_dict): flat_dict = {} def _flatten(current_dict, prefix=""): for key, value in current_dict.items(): if isinstance(value, dict): _flatten(value, prefix + key + "/") else: flat_dict[prefix + key] = value _flatten(nested_dict) return flat_dict def save_variables(self, filepath, overwrite=True): """Saves all module variables to a `.variables.json` file. Args: filepath (str | pathlib.Path): `str` or `pathlib.Path` object. Path where to save the program. Must end in `.variables.json`. overwrite (bool): Whether we should overwrite any existing program at the target location, or instead ask the user via an interactive prompt. 
""" filepath = file_utils.path_to_string(filepath) if not filepath.endswith(".variables.json"): raise ValueError( "The filepath should ends with '.variables.json', " f"received filepath={filepath}" ) config = self.get_state_tree() config_string = orjson.dumps(config, option=orjson.OPT_INDENT_2).decode() if file_utils.exists(filepath) and not overwrite: io_utils.ask_to_proceed_with_overwrite(filepath) with open(filepath, "w") as f: f.write(config_string) def load_variables(self, filepath): """Load all module variables from a `.variables.json` file. Args: filepath (str | pathlib.Path): `str` or `pathlib.Path` object. Path to load the program's variables from. Must end in `.variables.json`. """ filepath = file_utils.path_to_string(filepath) if not filepath.endswith(".variables.json"): raise ValueError( "The filepath should ends with '.variables.json', " f"received filepath={filepath}" ) with open(filepath, "rb") as f: state_tree_config = orjson.loads(f.read()) self.set_state_tree(state_tree_config) @classmethod def load(cls, filepath, custom_objects=None): """Load a program from a JSON file. Example: ```python import synalinks loaded_program = synalinks.Program.load("program.json") ``` Args: filepath (str | pathlib.Path): `str` or `pathlib.Path` object. Path to load the program from. Must end in `.json`. custom_objects (dict): Optional dictionary mapping names (strings) to custom classes or functions to be considered during deserialization. Returns: (Program): A Synalinks program instance (uncompiled). """ filepath = file_utils.path_to_string(filepath) if not filepath.endswith(".json"): raise ValueError( f"The filepath should ends with '.json', received filepath={filepath}" ) with open(filepath, "r") as f: json_config = f.read() return program_from_json(json_config, custom_objects=custom_objects) ```` ## `get_module(name=None, index=None)` Retrieves a module based on either its name (unique) or index. 
`name` and `index` are mutually exclusive: providing both raises a `ValueError`. Indices are based on order of horizontal graph traversal (bottom-up).

Parameters:

| Name | Type | Description | Default |
| ------- | ----- | ------------------------- | ------- |
| `name` | `str` | String, name of module. | `None` |
| `index` | `int` | Integer, index of module. | `None` |

Returns:

| Type | Description |
| -------- | ------------------ |
| `Module` | A module instance. |

Source code in `synalinks/src/programs/program.py`

```
def get_module(self, name=None, index=None):
    """Retrieves a module based on either its name (unique) or index.

    `name` and `index` are mutually exclusive: providing both raises a
    `ValueError`.
    Indices are based on order of horizontal graph traversal (bottom-up).

    Args:
        name (str): String, name of module.
        index (int): Integer, index of module.

    Returns:
        (Module): A module instance.
    """
    if index is not None and name is not None:
        raise ValueError(
            "Provide only a module name or a module index. Received: "
            f"index={index}, name={name}."
        )
    if index is not None:
        if len(self.modules) <= index:
            raise ValueError(
                f"Was asked to retrieve module at index {index}"
                f" but program only has {len(self.modules)}"
                " modules."
            )
        else:
            return self.modules[index]

    if name is not None:
        for module in self.modules:
            if module.name == name:
                return module
        raise ValueError(
            f"No such module: {name}. Existing modules are: "
            f"{list(module.name for module in self.modules)}."
        )
    raise ValueError("Provide either a module name or module index at `get_module`.")
```

## `get_state_tree()`

Retrieves tree-like structure of program variables.

This method allows retrieval of different program variables (trainable, non-trainable, optimizer, and metrics). The variables are returned in a nested dictionary format, where the keys correspond to the variable names and the values are the nested representations of the variables.
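To make the nested format concrete, here is a minimal standalone sketch of how slash-separated variable paths can be folded into a nested dictionary, in the spirit of the `_create_nested_dict` helper shown in the class source. The function name `nest_by_path` and the sample paths and payloads are hypothetical and only serve the illustration; real variable paths depend on your module names.

```python
def nest_by_path(flat):
    """Fold {"a/b/c": value} into {"a": {"b": {"c": value}}}."""
    nested = {}
    for path, value in flat.items():
        parts = path.split("/")
        node = nested
        # Walk (and create on demand) every intermediate level of the path.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last path component holds the variable payload.
        node[parts[-1]] = value
    return nested


# Hypothetical variable paths and JSON payloads, for illustration only.
flat_variables = {
    "generator/state": {"examples": []},
    "generator/hints": {"text": "be concise"},
}
tree = nest_by_path(flat_variables)
```

Because each path maps to exactly one slot in the tree, two variables sharing a path would overwrite each other, which is presumably why the source raises a `ValueError` on duplicate paths before nesting.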

Example:

```
program.compile(
    optimizer=synalinks.optimizers.RandomFewShot(),
    reward=synalinks.rewards.ExactMatch(),
)
program.fit(x=x_train, y=y_train)
state_tree = program.get_state_tree()
```

Returns:

| Type | Description |
| ------ | ----------- |
| `dict` | A dictionary containing the nested representations of the requested variables. The keys are the variable names, and the values are the corresponding nested dictionaries. |

Source code in `synalinks/src/programs/program.py`

````
def get_state_tree(self):
    """Retrieves tree-like structure of program variables.

    This method allows retrieval of different program variables (trainable,
    non-trainable, optimizer, and metrics). The variables are returned in a
    nested dictionary format, where the keys correspond to the variable
    names and the values are the nested representations of the variables.

    Example:

    ```python
    program.compile(
        optimizer=synalinks.optimizers.RandomFewShot(),
        reward=synalinks.rewards.ExactMatch(),
    )
    program.fit(x=x_train, y=y_train)
    state_tree = program.get_state_tree()
    ```

    Returns:
        (dict): A dictionary containing the nested representations of the
            requested variables. The keys are the variable names, and the
            values are the corresponding nested dictionaries.
    """
    variables = {}
    variables["trainable_variables"] = self._create_nested_dict(
        self.trainable_variables
    )
    variables["non_trainable_variables"] = self._create_nested_dict(
        self.non_trainable_variables
    )
    if self.optimizer:
        variables["optimizer_trainable_variables"] = self._create_nested_dict(
            self.optimizer.trainable_variables
        )
        variables["optimizer_non_trainable_variables"] = self._create_nested_dict(
            self.optimizer.non_trainable_variables
        )
    variables["metrics_variables"] = self._create_nested_dict(self.metrics_variables)
    return variables
````

## `load(filepath, custom_objects=None)`

Load a program from a JSON file.

Example:

```
import synalinks

loaded_program = synalinks.Program.load("program.json")
```

Parameters:

| Name | Type | Description | Default |
| ---------------- | ------------- | ----------- | ---------- |
| `filepath` | `str \| Path` | `str` or `pathlib.Path` object. Path to load the program from. Must end in `.json`. | *required* |
| `custom_objects` | `dict` | Optional dictionary mapping names (strings) to custom classes or functions to be considered during deserialization. | `None` |

Returns:

| Type | Description |
| --------- | ------------------------------------------ |
| `Program` | A Synalinks program instance (uncompiled). |

Source code in `synalinks/src/programs/program.py`

````
@classmethod
def load(cls, filepath, custom_objects=None):
    """Load a program from a JSON file.

    Example:

    ```python
    import synalinks

    loaded_program = synalinks.Program.load("program.json")
    ```

    Args:
        filepath (str | pathlib.Path): `str` or `pathlib.Path` object.
            Path to load the program from. Must end in `.json`.
        custom_objects (dict): Optional dictionary mapping names
            (strings) to custom classes or functions to be
            considered during deserialization.

    Returns:
        (Program): A Synalinks program instance (uncompiled).
    """
    filepath = file_utils.path_to_string(filepath)
    if not filepath.endswith(".json"):
        raise ValueError(
            f"The filepath should ends with '.json', received filepath={filepath}"
        )
    with open(filepath, "r") as f:
        json_config = f.read()
    return program_from_json(json_config, custom_objects=custom_objects)
````

## `load_variables(filepath)`

Load all module variables from a `.variables.json` file.

Parameters:

| Name | Type | Description | Default |
| ---------- | ------------- | ----------- | ---------- |
| `filepath` | `str \| Path` | `str` or `pathlib.Path` object. Path to load the program's variables from. Must end in `.variables.json`. | *required* |

Source code in `synalinks/src/programs/program.py`

```
def load_variables(self, filepath):
    """Load all module variables from a `.variables.json` file.

    Args:
        filepath (str | pathlib.Path): `str` or `pathlib.Path` object.
            Path to load the program's variables from.
            Must end in `.variables.json`.
    """
    filepath = file_utils.path_to_string(filepath)
    if not filepath.endswith(".variables.json"):
        raise ValueError(
            "The filepath should ends with '.variables.json', "
            f"received filepath={filepath}"
        )
    with open(filepath, "rb") as f:
        state_tree_config = orjson.loads(f.read())
    self.set_state_tree(state_tree_config)
```

## `save(filepath, overwrite=True, **kwargs)`

Saves a program as a `.json` file.

Example:

```
import synalinks

class Query(synalinks.DataModel):
    query: str

class AnswerWithRationale(synalinks.DataModel):
    rationale: str
    answer: str

language_model = synalinks.LanguageModel(model="ollama/mistral")

program = synalinks.Sequential(
    [
        synalinks.Input(data_model=Query),
        synalinks.Generator(
            data_model=AnswerWithRationale,
            language_model=language_model,
        ),
    ],
)

program.save("program.json")
loaded_program = synalinks.Program.load("program.json")
```

The saved `.json` file contains:

- The program's configuration (architecture)
- The program's variables
- The program's optimizer's state (if any)
- The program's reward's state (if any)

Thus programs can be reinstantiated in the exact same state.

Parameters:

| Name | Type | Description | Default |
| ----------- | ----------------- | ----------- | ---------- |
| `filepath` | `str \| PathLike` | `str` or `os.PathLike` object. The path where to save the program. Must end in `.json`. | *required* |
| `overwrite` | `bool` | Whether we should overwrite any existing program at the target location, or instead ask the user via an interactive prompt. Defaults to `True`. | `True` |

Source code in `synalinks/src/programs/program.py`

````
def save(self, filepath, overwrite=True, **kwargs):
    """Saves a program as a `.json` file.

    Example:

    ```python
    import synalinks

    class Query(synalinks.DataModel):
        query: str

    class AnswerWithRationale(synalinks.DataModel):
        rationale: str
        answer: str

    language_model = synalinks.LanguageModel(model="ollama/mistral")

    program = synalinks.Sequential(
        [
            synalinks.Input(data_model=Query),
            synalinks.Generator(
                data_model=AnswerWithRationale,
                language_model=language_model,
            ),
        ],
    )

    program.save("program.json")
    loaded_program = synalinks.Program.load("program.json")
    ```

    The saved `.json` file contains:

    - The program's configuration (architecture)
    - The program's variables
    - The program's optimizer's state (if any)
    - The program's reward's state (if any)

    Thus programs can be reinstantiated in the exact same state.

    Args:
        filepath (str | os.PathLike): `str` or `os.PathLike` object.
            The path where to save the program. Must end in `.json`.
        overwrite (bool): Whether we should overwrite any existing program
            at the target location, or instead ask the user via
            an interactive prompt. Defaults to `True`.
    """
    from synalinks.src.saving import serialization_lib

    filepath = file_utils.path_to_string(filepath)
    if not filepath.endswith(".json"):
        raise ValueError(
            f"The filepath should ends with '.json', received filepath={filepath}"
        )
    program_config = serialization_lib.serialize_synalinks_object(self)
    variables_config = self.get_state_tree()
    program_config.update({"variables": variables_config})
    program_config_string = orjson.dumps(
        program_config, option=orjson.OPT_INDENT_2
    ).decode()
    if file_utils.exists(filepath) and not overwrite:
        io_utils.ask_to_proceed_with_overwrite(filepath)
    with open(filepath, "w") as f:
        f.write(program_config_string)
````

## `save_variables(filepath, overwrite=True)`

Saves all module variables to a `.variables.json` file.
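
The contract is small: a suffix check, an overwrite guard, and a JSON dump of the state tree. The sketch below illustrates that round trip under stated assumptions. It uses the standard-library `json` module as a stand-in for `orjson`, the helper names `save_state` and `load_state` are hypothetical, and it raises `FileExistsError` where the real method asks the user interactively.

```python
import json
import os
import tempfile

SUFFIX = ".variables.json"


def save_state(state_tree, filepath, overwrite=True):
    """Write a state tree to disk, enforcing the `.variables.json` suffix."""
    if not filepath.endswith(SUFFIX):
        raise ValueError(f"The filepath should end with '{SUFFIX}', received {filepath}")
    if os.path.exists(filepath) and not overwrite:
        # Assumption: refuse instead of prompting the user interactively.
        raise FileExistsError(filepath)
    with open(filepath, "w") as f:
        f.write(json.dumps(state_tree, indent=2))


def load_state(filepath):
    """Read a state tree back from a `.variables.json` file."""
    if not filepath.endswith(SUFFIX):
        raise ValueError(f"The filepath should end with '{SUFFIX}', received {filepath}")
    with open(filepath, "r") as f:
        return json.loads(f.read())


# Hypothetical state tree, for illustration only.
state_tree = {"trainable_variables": {"generator": {"state": {"examples": []}}}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "program.variables.json")
    save_state(state_tree, path)
    restored = load_state(path)
```

In Synalinks itself the equivalent round trip would go through `program.save_variables(...)` and `program.load_variables(...)` on an already-built program.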

Parameters:

| Name | Type | Description | Default |
| ----------- | ------------- | ----------- | ---------- |
| `filepath` | `str \| Path` | `str` or `pathlib.Path` object. Path where to save the program's variables. Must end in `.variables.json`. | *required* |
| `overwrite` | `bool` | Whether we should overwrite any existing file at the target location, or instead ask the user via an interactive prompt. | `True` |

Source code in `synalinks/src/programs/program.py`

```
def save_variables(self, filepath, overwrite=True):
    """Saves all module variables to a `.variables.json` file.

    Args:
        filepath (str | pathlib.Path): `str` or `pathlib.Path` object.
            Path where to save the program. Must end in `.variables.json`.
        overwrite (bool): Whether we should overwrite any existing program
            at the target location, or instead ask the user
            via an interactive prompt.
    """
    filepath = file_utils.path_to_string(filepath)
    if not filepath.endswith(".variables.json"):
        raise ValueError(
            "The filepath should ends with '.variables.json', "
            f"received filepath={filepath}"
        )
    config = self.get_state_tree()
    config_string = orjson.dumps(config, option=orjson.OPT_INDENT_2).decode()
    if file_utils.exists(filepath) and not overwrite:
        io_utils.ask_to_proceed_with_overwrite(filepath)
    with open(filepath, "w") as f:
        f.write(config_string)
```

## `set_state_tree(state_tree)`

Assigns values to variables of the program.

This method takes a dictionary of nested variable values, which represents the state tree of the program, and assigns them to the corresponding variables of the program. The dictionary keys represent the variable names (e.g., `'trainable_variables'`, `'optimizer_trainable_variables'`), and the values are nested dictionaries containing the variable paths and their corresponding values.
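
Going the other way, a nested state tree has to be flattened back into `path -> value` pairs before values can be matched to variables. The standalone sketch below mirrors the idea behind the `_flatten_nested_dict` and `_assign_variable_values` helpers in the class source. The function names `flatten` and `split_field` and the sample data are hypothetical, chosen only for this illustration.

```python
def flatten(nested, prefix=""):
    """Flatten {"a": {"b": 1}} into {"a/b": 1}; non-dict values are leaves."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            flat.update(flatten(value, prefix + key + "/"))
        else:
            flat[prefix + key] = value
    return flat


def split_field(full_path):
    """Split 'variable/path/field' into the variable path and the field name."""
    variable_path, field_name = full_path.rsplit("/", 1)
    return variable_path, field_name


# Hypothetical state for one group of variables, for illustration only.
state = {
    "generator": {
        "state": {"instructions": "Answer briefly", "examples": []},
    },
}
flat = flatten(state)
targets = {path: split_field(path) for path in flat}
```

Note that in this scheme only leaf values carry data: the last path component is treated as a field of the variable's JSON payload, and everything before it identifies the variable.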
Parameters: | Name | Type | Description | Default | | ------------ | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | | `state_tree` | `dict` | A dictionary representing the state tree of the program. The keys are the variable names, and the values are nested dictionaries representing the variable paths and their values. | *required* | Source code in `synalinks/src/programs/program.py` ``` def set_state_tree(self, state_tree): """Assigns values to variables of the program. This method takes a dictionary of nested variable values, which represents the state tree of the program, and assigns them to the corresponding variables of the program. The dictionary keys represent the variable names (e.g., `'trainable_variables'`, `'optimizer_variables'`), and the values are nested dictionaries containing the variable paths and their corresponding values. Args: state_tree (dict): A dictionary representing the state tree of the program. The keys are the variable names, and the values are nested dictionaries representing the variable paths and their values. 
""" for k, v in state_tree.items(): path_value_dict = self._flatten_nested_dict(v) if k == "trainable_variables": self._assign_variable_values(self.trainable_variables, path_value_dict) elif k == "non_trainable_variables": self._assign_variable_values( self.non_trainable_variables, path_value_dict ) elif k == "optimizer_trainable_variables": if self.optimizer: self._assign_variable_values( self.optimizer.trainable_variables, path_value_dict ) elif k == "optimizer_non_trainable_variables": if self.optimizer: self._assign_variable_values( self.optimizer.non_trainable_variables, path_value_dict ) elif k == "metrics_variables": self._assign_variable_values(self.metrics_variables, path_value_dict) else: raise ValueError(f"Unknown variable name: {k}")
```

## `summary(line_length=None, positions=None, print_fn=None, expand_nested=False, show_trainable=False, module_range=None)`

Prints a string summary of the program.

Parameters:

| Name | Type | Description | Default |
| ---------------- | --------------- | ----------- | ------- |
| `line_length` | `int` | Total length of printed lines (e.g. set this to adapt the display to different terminal window sizes). | `None` |
| `positions` | `list` | Relative or absolute positions of log elements in each line. If not provided, becomes `[0.3, 0.6, 0.70, 1.]`. Defaults to `None`. | `None` |
| `print_fn` | `Callable` | Print function to use. By default, prints to `stdout`. If `stdout` doesn't work in your environment, change to `print`. It will be called on each line of the summary. You can set it to a custom function in order to capture the string summary. | `None` |
| `expand_nested` | `bool` | Whether to expand the nested models. Defaults to `False`. | `False` |
| `show_trainable` | `bool` | Whether to show if a module is trainable. Defaults to `False`. | `False` |
| `module_range` | `list \| tuple` | A list or tuple of 2 strings, which is the starting module name and ending module name (both inclusive) indicating the range of modules to be printed in summary. It also accepts regex patterns instead of exact names. In this case, the start predicate will be the first element that matches `module_range[0]` and the end predicate will be the last element that matches `module_range[1]`. By default `None` considers all modules of the model. | `None` |

Raises:

| Type | Description |
| ------------ | ------------------------------------------------- |
| `ValueError` | If `summary()` is called before the model is built. |

Source code in `synalinks/src/programs/program.py`

```
def summary( self, line_length=None, positions=None, print_fn=None, expand_nested=False, show_trainable=False, module_range=None, ): """Prints a string summary of the program. Args: line_length (int): Total length of printed lines (e.g. set this to adapt the display to different terminal window sizes). positions (list): Relative or absolute positions of log elements in each line. If not provided, becomes `[0.3, 0.6, 0.70, 1.]`. Defaults to `None`. print_fn (Callable): Print function to use. By default, prints to `stdout`. If `stdout` doesn't work in your environment, change to `print`. It will be called on each line of the summary. You can set it to a custom function in order to capture the string summary. expand_nested (bool): Whether to expand the nested models. Defaults to `False`.
show_trainable (bool): Whether to show if a module is trainable. Defaults to `False`. module_range (list | tuple): a list or tuple of 2 strings, which is the starting module name and ending module name (both inclusive) indicating the range of modules to be printed in summary. It also accepts regex patterns instead of exact names. In this case, the start predicate will be the first element that matches `module_range[0]` and the end predicate will be the last element that matches `module_range[1]`. By default `None` considers all modules of the model. Raises: ValueError: if `summary()` is called before the model is built. """ summary_utils.print_summary( self, line_length=line_length, positions=positions, print_fn=print_fn, expand_nested=expand_nested, show_trainable=show_trainable, module_range=module_range, ) ``` ## `to_json(**kwargs)` Returns a JSON string containing the network configuration. ``` json_string = program.to_json() ``` To load a network from a JSON save file, use `synalinks.programs.program_from_json(json_string, custom_objects={...})`. Parameters: | Name | Type | Description | Default | | ---------- | ------------------- | ------------------------------------------------------------ | ------- | | `**kwargs` | `keyword arguments` | Additional keyword arguments to be passed to orjson.dumps(). | `{}` | Returns: | Type | Description | | ----- | -------------- | | `str` | A JSON string. | Source code in `synalinks/src/programs/program.py` ```` def to_json(self, **kwargs): """Returns a JSON string containing the network configuration. ```python json_string = program.to_json() ``` To load a network from a JSON save file, use `synalinks.programs.program_from_json(json_string, custom_objects={...})`. Args: **kwargs (keyword arguments): Additional keyword arguments to be passed to `orjson.dumps()`. Returns: (str): A JSON string. 
""" from synalinks.src.saving import serialization_lib program_config = serialization_lib.serialize_synalinks_object(self) return orjson.dumps(program_config, **kwargs).decode()
````

# The Sequential class

Bases: `Program`

`Sequential` groups a linear stack of modules into a `Program`.

Examples:

```
program = synalinks.Sequential(
    name="chain_of_thought",
    description="Useful to answer in a step by step manner."
)
program.add(
    synalinks.Input(
        data_model=Query,
    )
)
program.add(
    synalinks.Generator(
        data_model=AnswerWithRationale,
        language_model=language_model,
    )
)

# Note that you can also omit the initial `Input`.
# In that case the program doesn't have any variables until the first call
# to a training/evaluation method (since it isn't yet built):
program = synalinks.Sequential(
    name="chain_of_thought",
    description="Useful to answer in a step by step manner."
)
program.add(
    synalinks.Generator(
        data_model=AnswerWithRationale,
        language_model=language_model,
    )
)
# program.variables not created yet

# Whereas if you specify an `Input`, the program gets built
# continuously as you are adding modules:
program = synalinks.Sequential(
    name="chain_of_thought",
    description="Useful to answer in a step by step manner."
)
program.add(
    synalinks.Input(
        data_model=Query,
    )
)
program.add(
    synalinks.Generator(
        data_model=AnswerWithRationale,
        language_model=language_model,
    )
)

# Note that when using the delayed-build pattern (no input specified),
# the program gets built the first time you call `fit`, `evaluate`, or `predict`,
# or the first time you call the program on some input data.
```

Source code in `synalinks/src/programs/sequential.py`

````
@synalinks_export(["synalinks.Sequential", "synalinks.programs.Sequential"]) class Sequential(Program): """`Sequential` groups a linear stack of modules into a `Program`. Examples: ```python program = synalinks.Sequential( name="chain_of_thought", description="Useful to answer in a step by step manner."
) program.add( synalinks.Input( data_model=Query, ) ) program.add( synalinks.Generator( data_model=AnswerWithRationale, language_model=language_model, ) ) # Note that you can also omit the initial `Input`. # In that case the program doesn't have any variables until the first call # to a training/evaluation method (since it isn't yet built): program = synalinks.Sequential( name="chain_of_thought", description="Useful to answer in a step by step manner." ) program.add( synalinks.Generator( data_model=AnswerWithRationale, language_model=language_model, ) ) # program.variables not created yet # Whereas if you specify an `Input`, the program gets built # continuously as you are adding modules: program = synalinks.Sequential( name="chain_of_thought", description="Useful to answer in a step by step manner." ) program.add( synalinks.Input( data_model=Query, ) ) program.add( synalinks.Generator( data_model=AnswerWithRationale, language_model=language_model, ) ) # Note that when using the delayed-build pattern (no input specified), # the program gets built the first time you call `fit`, `evaluate`, or `predict`, # or the first time you call the program on some input data. ``` """ def __new__(cls, *args, **kwargs): return typing.cast(cls, super().__new__(cls)) def __init__(self, modules=None, trainable=True, name=None, description=None): if description is None: raise ValueError( "All Sequential programs must have a `description`, " "please add it to the constructor arguments" ) super().__init__(trainable=trainable, name=name, description=description) self._functional = None self._modules = [] if modules: for module in modules: self.add(module, rebuild=False) run_maybe_nested(self._maybe_rebuild()) def add(self, module, rebuild=True): """Adds a module instance on top of the module stack. Args: module (Module): Module instance. rebuild (bool): If `True` rebuild the program.
""" # If we are passed a SymbolicDataModel created by synalinks.Input(), we # extract the input module from its synalinks history and use that. if hasattr(module, "_synalinks_history"): origin_module = module._synalinks_history[0] if isinstance(origin_module, InputModule): module = origin_module if not isinstance(module, Module): raise ValueError( "Only instances of `synalinks.Module` can be " f"added to a Sequential program. Received: {module} " f"(of type {type(module)})" ) if not self._is_module_name_unique(module): raise ValueError( "All modules added to a Sequential program " f"should have unique names. Name '{module.name}' is already " "the name of a module in this program. Update the `name` argument " "to pass a unique name." ) if ( isinstance(module, InputModule) and self._modules and isinstance(self._modules[0], InputModule) ): raise ValueError( f"Sequential program '{self.name}' has already been configured " f"to use input schema {self._modules[0].input_schema}. You cannot " f"add a different Input module to it." ) self._modules.append(module) if rebuild: run_maybe_nested(self._maybe_rebuild()) else: self.built = False self._functional = None def pop(self, rebuild=True): """Removes the last module in the program. Args: rebuild (bool): If `True` rebuild the program. """ module = self._modules.pop() self.built = False self._functional = None if rebuild: run_maybe_nested(self._maybe_rebuild()) return module async def _maybe_rebuild(self): self.built = False self._functional = None if isinstance(self._modules[0], InputModule) and len(self._modules) > 1: input_schema = self._modules[0].get_schema() await self.build(Input(schema=input_schema)) elif hasattr(self._modules[0], "input_schema") and len(self._modules) > 1: # We can build the Sequential program if the first module has the # `input_schema` property. This is most commonly found in Functional # program. 
input_schema = self._modules[0].input_schema await self.build(Input(schema=input_schema)) def _lock_state(self): # Unlike other modules, Sequential is mutable after build. pass def _obj_type(self): return "Sequential" async def build(self, inputs): try: input_schema = standardize_schema(inputs.get_schema()) except Exception: # Do not attempt to build if the program does not have a single # input. return if not self._modules: raise ValueError( f"Sequential program {self.name} cannot be built because it has " "no modules. Call `program.add(module)`." ) if isinstance(self._modules[0], InputModule): if self._modules[0].get_schema() != input_schema: raise ValueError( f"Sequential program '{self.name}' has already been " "configured to use input schema " f"{self._modules[0].get_schema()}. You cannot build it " f"with input_schema {input_schema}" ) else: self._modules = [InputModule(schema=input_schema)] + self._modules # Build functional program inputs = self._modules[0].output x = inputs for module in self._modules[1:]: try: x = await module(x) except NotImplementedError: # Can happen if spec inference is not implemented. # TODO: consider reverting inbound nodes on modules processed. return except TypeError as e: signature = inspect.signature(module.call) positional_args = [ param for param in signature.parameters.values() if param.default == inspect.Parameter.empty ] if len(positional_args) != 1: raise ValueError( "Modules added to a Sequential program " "can only have a single positional argument, " f"the input data model. Module {module.__class__.__name__} " f"has multiple positional arguments: {positional_args}" ) raise e outputs = x self._functional = Functional(inputs=inputs, outputs=outputs) self.built = True async def call(self, inputs, training=None): if self._functional: return await self._functional.call(inputs, training=training) # Fallback: Just apply the module sequence. # This typically happens if `inputs` is a nested struct. 
for module in self.modules: # During each iteration, `inputs` are the inputs to `module`, and # `outputs` are the outputs of `module` applied to `inputs`. At the # end of each iteration `inputs` is set to `outputs` to prepare for # the next module. kwargs = {} if module._call_has_training_arg and training is not None: kwargs["training"] = training outputs = await module(inputs, **kwargs) inputs = outputs return outputs @property def modules(self): """Unlike Keras, also output the potentially auto-generated `InputModule`""" return self._modules @modules.setter def modules(self, _): raise AttributeError( "`Sequential.modules` attribute is reserved and should not be used. " "Use `add()` and `pop()` to change the modules in this program." ) async def compute_output_spec(self, inputs, training=None): if self._functional: return await self._functional.compute_output_spec( inputs, training=training, ) # Direct application for module in self.modules: outputs = await module.compute_output_spec(inputs, training=training) inputs = outputs return outputs @property def input_schema(self): if self._functional: return self._functional.input_schema raise AttributeError( f"Sequential program '{self.name}' has no defined input schema yet." ) @property def output_schema(self): if self._functional: return self._functional.output_schema raise AttributeError( f"Sequential program '{self.name}' has no defined output schema yet." ) @property def inputs(self): if self._functional: return self._functional.inputs raise AttributeError( f"Sequential program '{self.name}' has no defined inputs yet." ) @property def outputs(self): if self._functional: return self._functional.outputs raise AttributeError( f"Sequential program '{self.name}' has no defined outputs yet." 
) def _is_module_name_unique(self, module): for ref_module in self._modules: if module.name == ref_module.name and ref_module is not module: return False return True def get_config(self): serialize_fn = serialization_lib.serialize_synalinks_object module_configs = [] for module in self.modules: module_configs.append(serialize_fn(module)) config = Program.get_config(self) config["name"] = self.name config["description"] = self.description config["modules"] = copy.deepcopy(module_configs) if self._functional is not None: config["build_input_schema"] = self._modules[0].input_schema return config @classmethod def from_config(cls, config, custom_objects=None): if "name" in config: name = config["name"] build_input_schema = config.get("build_input_schema") module_configs = config["modules"] else: name = None module_configs = config if "description" in config: description = config["description"] else: description = None program = cls(name=name, description=description) for module_config in module_configs: module = serialization_lib.deserialize_synalinks_object( module_config, custom_objects=custom_objects, ) program.add(module) if ( not program._functional and "build_input_schema" in locals() and build_input_schema and isinstance(build_input_schema, (tuple, list)) ): program.build(build_input_schema) return program ```` ## `modules` Unlike Keras, also output the potentially auto-generated `InputModule` ## `add(module, rebuild=True)` Adds a module instance on top of the module stack. Parameters: | Name | Type | Description | Default | | --------- | -------- | ---------------------------- | ---------- | | `module` | `Module` | Module instance. | *required* | | `rebuild` | `bool` | If True rebuild the program. | `True` | Source code in `synalinks/src/programs/sequential.py` ``` def add(self, module, rebuild=True): """Adds a module instance on top of the module stack. Args: module (Module): Module instance. rebuild (bool): If `True` rebuild the program. 
""" # If we are passed a SymbolicDataModel created by synalinks.Input(), we # extract the input module from its synalinks history and use that. if hasattr(module, "_synalinks_history"): origin_module = module._synalinks_history[0] if isinstance(origin_module, InputModule): module = origin_module if not isinstance(module, Module): raise ValueError( "Only instances of `synalinks.Module` can be " f"added to a Sequential program. Received: {module} " f"(of type {type(module)})" ) if not self._is_module_name_unique(module): raise ValueError( "All modules added to a Sequential program " f"should have unique names. Name '{module.name}' is already " "the name of a module in this program. Update the `name` argument " "to pass a unique name." ) if ( isinstance(module, InputModule) and self._modules and isinstance(self._modules[0], InputModule) ): raise ValueError( f"Sequential program '{self.name}' has already been configured " f"to use input schema {self._modules[0].input_schema}. You cannot " f"add a different Input module to it." ) self._modules.append(module) if rebuild: run_maybe_nested(self._maybe_rebuild()) else: self.built = False self._functional = None ``` ## `pop(rebuild=True)` Removes the last module in the program. Parameters: | Name | Type | Description | Default | | --------- | ------ | ---------------------------- | ------- | | `rebuild` | `bool` | If True rebuild the program. | `True` | Source code in `synalinks/src/programs/sequential.py` ``` def pop(self, rebuild=True): """Removes the last module in the program. Args: rebuild (bool): If `True` rebuild the program. """ module = self._modules.pop() self.built = False self._functional = None if rebuild: run_maybe_nested(self._maybe_rebuild()) return module ``` ## `DeepAgent` Bases: `Module` A coding agent with filesystem and shell access scoped to a workdir. 
DeepAgent is a thin specialization of `FunctionCallingAgent` that pre-wires up to six workspace tools:

- `read_file`: read a file with line-based pagination, output prefixed with 1-based line numbers (`cat -n` style).
- `list_directory`: list entries in a directory.
- `search_files`: glob for files and optionally grep their contents (regex). Combines find and grep in one call.
- `write_file`: overwrite or create a file (gated by `allow_write`).
- `edit_file`: exact-string replacement, one occurrence at a time (gated by `allow_write`).
- `run_bash`: run a shell command (gated by `allow_bash`).

The constructor mirrors `FunctionCallingAgent`: every parameter on that class is accepted here with identical semantics. The only additions are `workdir` (required) and the safety knobs `allow_write`, `allow_bash`, `timeout`, `max_output_chars`, and `max_search_results`. User-supplied `tools` are appended to the built-in ones.

#### Security model

File tools (`read_file` / `write_file` / `edit_file` / `list_directory`) refuse any path that resolves outside the workdir, including `..` traversal and absolute paths. Paths are canonicalized via `Path.resolve()` (which flattens `..` and follows existing symlinks) and then prefix-checked against the resolved workdir, so a symlink inside the workdir that points to `/etc/passwd` is also caught. File opens use `O_NOFOLLOW` where the OS supports it, as defense in depth against TOCTOU symlink swaps.

The bash tool is **NOT sandboxed**. Its `cwd` is the workdir, but the shell can still read or write any path the host process can. If you're running this on untrusted input, run the host process inside a container or other OS-level isolation; the Python layer cannot make `run_bash` safe on its own. Disable it with `allow_bash=False` when you don't need it.
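The containment check described in the security model can be sketched roughly as follows. This is a minimal illustration, not the library's internal code: `resolve_inside` is a hypothetical helper, the `PathTraversalError` class here is a local stand-in, and the sketch uses `Path.relative_to` for a component-wise comparison rather than a string prefix check. It does not cover the `O_NOFOLLOW` step.

```python
from pathlib import Path


class PathTraversalError(ValueError):
    """Local stand-in for the error raised when a path escapes the workdir."""


def resolve_inside(workdir: str, user_path: str) -> Path:
    """Hypothetical helper: resolve `user_path` and require it to stay in `workdir`."""
    root = Path(workdir).resolve()
    # Joining with an absolute `user_path` discards `root`, so absolute
    # paths go through the same containment check as relative ones.
    candidate = (root / user_path).resolve()
    try:
        # `resolve()` has already flattened `..` and followed existing
        # symlinks, so this compares canonical path components.
        candidate.relative_to(root)
    except ValueError:
        raise PathTraversalError(
            f"{user_path!r} resolves outside the workdir {str(root)!r}"
        ) from None
    return candidate
```

With this sketch, `resolve_inside(workdir, "src/main.py")` returns a path under the workdir, while `"../secrets.txt"` or `"/etc/passwd"` raise the stand-in `PathTraversalError`. None of this constrains `run_bash`, which is why the text above recommends OS-level isolation.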
Example:

```
import synalinks
import asyncio

async def main():
    lm = synalinks.LanguageModel(model="ollama/mistral")

    inputs = synalinks.Input(data_model=synalinks.ChatMessages)
    outputs = await synalinks.DeepAgent(
        workdir="/tmp/my_project",
        language_model=lm,
    )(inputs)
    agent = synalinks.Program(inputs=inputs, outputs=outputs)

    messages = synalinks.ChatMessages(messages=[
        synalinks.ChatMessage(
            role="user",
            content="What's in this directory?",
        )
    ])
    result = await agent(messages)
    print(result.get("messages")[-1].get("content"))

asyncio.run(main())
```

Parameters:

| Name | Type | Description | Default |
| ------------------------------- | --------------- | ----------- | ---------- |
| `workdir` | `str` | Working directory the agent operates in. Required. Must exist. All file paths supplied by the LM are resolved relative to it and rejected if they escape. | *required* |
| `allow_write` | `bool` | When `False`, `write_file` and `edit_file` are omitted from the tool set. Defaults to `True`. | `True` |
| `allow_bash` | `bool` | When `False`, `run_bash` is omitted. Defaults to `True`. | `True` |
| `timeout` | `float` | Per-command bash timeout in seconds. Defaults to 30. | `30.0` |
| `max_output_chars` | `int` | Cap on characters returned per stream from `read_file` (single stream) and `run_bash` (stdout and stderr each). Also caps the length of each matching line returned by `search_files`. Defaults to 10000. | `10000` |
| `max_search_results` | `int` | Cap on entries returned by `search_files` (matching files or matching lines). Defaults to 100. | `100` |
| `tools` | `list` | Additional `Tool` instances (or plain async functions) to expose alongside the built-in tools. Names must not start with `_` or collide with built-ins. | `None` |
| `schema` | `dict` | JSON schema for the final answer. | `None` |
| `data_model` | `DataModel` | DataModel for the final answer. Mutually exclusive with `schema`. | `None` |
| `language_model` | `LanguageModel` | The language model that drives the agent loop. | `None` |
| `prompt_template` | `str` | Forwarded to the tool-call generator. | `None` |
| `examples` | `list` | Few-shot examples for the tool-call generator. | `None` |
| `instructions` | `str` | Override the default system instructions. When omitted, the default is built from the workdir and the configured permissions. | `None` |
| `final_instructions` | `str` | Instructions for the final-answer generator. Defaults to `instructions`. | `None` |
| `temperature` | `float` | LM sampling temperature. Defaults to 0.0. | `0.0` |
| `use_inputs_schema` | `bool` | Include the input schema in the prompt. | `False` |
| `use_outputs_schema` | `bool` | Include the output schema in the prompt. | `False` |
| `reasoning_effort` | `str` | Forwarded to the generators (for reasoning-capable LMs). | `None` |
| `use_chain_of_thought` | `bool` | When `True`, the tool-call generator emits a `thinking` field per round. | `False` |
| `autonomous` | `bool` | When `True` (default), the agent runs the tool loop end-to-end. When `False`, returns one step at a time for human-in-the-loop workflows. | `True` |
| `return_inputs_with_trajectory` | `bool` | When `True` (default), the full message trajectory is included alongside the final answer. | `True` |
| `max_iterations` | `int` | Maximum number of tool-call rounds. Defaults to 10 (coding tasks tend to need more rounds than RAG / SQL). | `10` |
| `streaming` | `bool` | Stream the final answer when no `schema` is set. Defaults to `False`. | `False` |
| `name` | `str` | Module name. | `None` |
| `description` | `str` | Module description. | `None` |

Source code in `synalinks/src/modules/agents/deep_agent.py`

````
@synalinks_export( [ "synalinks.modules.DeepAgent", "synalinks.DeepAgent", ] ) class DeepAgent(Module): """A coding agent with filesystem and shell access scoped to a workdir. DeepAgent is a thin specialization of :class:`FunctionCallingAgent` that pre-wires up to six workspace tools: - ``read_file``: read a file with line-based pagination, output prefixed with 1-based line numbers (``cat -n`` style). - ``list_directory``: list entries in a directory. - ``search_files``: glob for files and optionally grep their contents (regex). Combines find and grep in one call. - ``write_file``: overwrite or create a file (gated by ``allow_write``). - ``edit_file``: exact-string replacement, one occurrence at a time (gated by ``allow_write``). - ``run_bash``: run a shell command (gated by ``allow_bash``). The constructor mirrors :class:`FunctionCallingAgent` — every parameter on that class is accepted here with identical semantics. The only additions are ``workdir`` (required) and the safety knobs: ``allow_write``, ``allow_bash``, ``timeout``, ``max_output_chars``. User-supplied ``tools`` are appended to the built-in ones. ## Security model File tools (``read_file`` / ``write_file`` / ``edit_file`` / ``list_directory``) refuse any path that resolves outside the workdir, including ``..`` traversal and absolute paths. Paths are canonicalized via ``Path.resolve()`` (which flattens ``..`` and follows existing symlinks) and then prefix-checked against the resolved workdir, so a symlink-inside-workdir pointing to ``/etc/passwd`` is also caught. File opens use ``O_NOFOLLOW`` where the OS supports it as defense in depth against TOCTOU symlink swaps. The bash tool is **NOT sandboxed**. Its ``cwd`` is the workdir, but the shell can still read or write any path the host process can.
If you're running this on untrusted input, run the host process inside a container or other OS-level isolation; the Python layer cannot make ``run_bash`` safe on its own. Disable it with ``allow_bash=False`` when you don't need it. Example: ```python import synalinks import asyncio async def main(): lm = synalinks.LanguageModel(model="ollama/mistral") inputs = synalinks.Input(data_model=synalinks.ChatMessages) outputs = await synalinks.DeepAgent( workdir="/tmp/my_project", language_model=lm, )(inputs) agent = synalinks.Program(inputs=inputs, outputs=outputs) messages = synalinks.ChatMessages(messages=[ synalinks.ChatMessage( role="user", content="What's in this directory?", ) ]) result = await agent(messages) print(result.get("messages")[-1].get("content")) asyncio.run(main()) ``` Args: workdir (str): Working directory the agent operates in. Required. Must exist. All file paths supplied by the LM are resolved relative to it and rejected if they escape. allow_write (bool): When ``False``, ``write_file`` and ``edit_file`` are omitted from the tool set. Defaults to ``True``. allow_bash (bool): When ``False``, ``run_bash`` is omitted. Defaults to ``True``. timeout (float): Per-command bash timeout in seconds. Defaults to 30. max_output_chars (int): Cap on characters returned per stream from ``read_file`` (single stream) and ``run_bash`` (stdout and stderr each). Also caps the length of each matching line returned by ``search_files``. Defaults to 10000. max_search_results (int): Cap on entries returned by ``search_files`` (matching files or matching lines). Defaults to 100. tools (list): Additional :class:`Tool` instances (or plain async functions) to expose alongside the built-in tools. Names must not start with ``_`` or collide with built-ins. schema (dict): JSON schema for the final answer. data_model (DataModel): DataModel for the final answer. Mutually exclusive with ``schema``. language_model (LanguageModel): The language model that drives the agent loop. 
prompt_template (str): Forwarded to the tool-call generator. examples (list): Few-shot examples for the tool-call generator. instructions (str): Override the default system instructions. When omitted, the default is built from the workdir and the configured permissions. final_instructions (str): Instructions for the final-answer generator. Defaults to ``instructions``. temperature (float): LM sampling temperature. Defaults to 0.0. use_inputs_schema (bool): Include the input schema in the prompt. use_outputs_schema (bool): Include the output schema in the prompt. reasoning_effort (str): Forwarded to the generators (for reasoning-capable LMs). use_chain_of_thought (bool): When ``True``, the tool-call generator emits a ``thinking`` field per round. autonomous (bool): When ``True`` (default), the agent runs the tool loop end-to-end. When ``False``, returns one step at a time for human-in-the-loop workflows. return_inputs_with_trajectory (bool): When ``True`` (default), the full message trajectory is included alongside the final answer. max_iterations (int): Maximum number of tool-call rounds. Defaults to 10 (coding tasks tend to need more rounds than RAG / SQL). streaming (bool): Stream the final answer when no ``schema`` is set. Defaults to ``False``. name (str): Module name. description (str): Module description. 
""" def __init__( self, *, workdir: str, allow_write: bool = True, allow_bash: bool = True, timeout: float = 30.0, max_output_chars: int = 10_000, max_search_results: int = 100, tools: Optional[List] = None, schema=None, data_model=None, language_model=None, prompt_template=None, examples=None, instructions: Optional[str] = None, final_instructions: Optional[str] = None, temperature: float = 0.0, use_inputs_schema: bool = False, use_outputs_schema: bool = False, reasoning_effort: Optional[str] = None, use_chain_of_thought: bool = False, autonomous: bool = True, return_inputs_with_trajectory: bool = True, max_iterations: int = 10, streaming: bool = False, name: Optional[str] = None, description: Optional[str] = None, ): super().__init__(name=name, description=description) if not workdir: raise ValueError("`workdir` is required") resolved_workdir = Path(workdir).resolve() if not resolved_workdir.exists(): raise ValueError(f"workdir does not exist: {workdir}") if not resolved_workdir.is_dir(): raise ValueError(f"workdir is not a directory: {workdir}") self.workdir = str(resolved_workdir) if not isinstance(timeout, (int, float)) or timeout <= 0: raise ValueError(f"`timeout` must be a positive number, got {timeout!r}") self.timeout = float(timeout) if not isinstance(max_output_chars, int) or max_output_chars < 1: raise ValueError( f"`max_output_chars` must be a positive integer, got {max_output_chars!r}" ) self.max_output_chars = max_output_chars if not isinstance(max_search_results, int) or max_search_results < 1: raise ValueError( f"`max_search_results` must be a positive integer, " f"got {max_search_results!r}" ) self.max_search_results = max_search_results self.allow_write = bool(allow_write) self.allow_bash = bool(allow_bash) self.language_model = _get_lm(language_model) if not schema and data_model: schema = data_model.get_schema() self.schema = schema if instructions is None: instructions = get_default_instructions( self.workdir, self.allow_write, self.allow_bash 
) self.instructions = instructions self.final_instructions = final_instructions self.prompt_template = prompt_template self.examples = examples self.temperature = temperature self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.reasoning_effort = reasoning_effort self.use_chain_of_thought = use_chain_of_thought self.autonomous = autonomous self.return_inputs_with_trajectory = return_inputs_with_trajectory self.max_iterations = max_iterations self.streaming = streaming builtin_tools = [ Tool(fn) for fn in _build_tools( resolved_workdir, allow_write=self.allow_write, allow_bash=self.allow_bash, timeout=self.timeout, max_output_chars=self.max_output_chars, max_search_results=self.max_search_results, ) ] builtin_names = {t.name for t in builtin_tools} self.extra_tools = list(tools) if tools else [] merged_tools = list(builtin_tools) for extra in self.extra_tools: extra_tool = extra if isinstance(extra, Tool) else Tool(extra) if extra_tool.name in builtin_names: raise ValueError( f"Tool name {extra_tool.name!r} collides with a built-in " f"deep-agent tool. Rename the additional tool." ) merged_tools.append(extra_tool) # Leading-underscore check is centralized in FunctionCallingAgent. 
self.agent = FunctionCallingAgent( schema=self.schema, language_model=self.language_model, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, final_instructions=self.final_instructions, temperature=self.temperature, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, reasoning_effort=self.reasoning_effort, use_chain_of_thought=self.use_chain_of_thought, tools=merged_tools, autonomous=self.autonomous, return_inputs_with_trajectory=self.return_inputs_with_trajectory, max_iterations=self.max_iterations, streaming=self.streaming, name="agent_" + self.name, ) async def call(self, inputs, training=False): return await self.agent(inputs, training=training) async def compute_output_spec(self, inputs, training=False): return await self.agent.compute_output_spec(inputs, training=training) def get_config(self): config = { "workdir": self.workdir, "allow_write": self.allow_write, "allow_bash": self.allow_bash, "timeout": self.timeout, "max_output_chars": self.max_output_chars, "max_search_results": self.max_search_results, "schema": self.schema, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "final_instructions": self.final_instructions, "temperature": self.temperature, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "reasoning_effort": self.reasoning_effort, "use_chain_of_thought": self.use_chain_of_thought, "autonomous": self.autonomous, "return_inputs_with_trajectory": self.return_inputs_with_trajectory, "max_iterations": self.max_iterations, "streaming": self.streaming, "name": self.name, "description": self.description, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ) } tools_config = { "tools": [ serialization_lib.serialize_synalinks_object( t if isinstance(t, Tool) else Tool(t) ) for t in self.extra_tools ] } return {**config, 
**language_model_config, **tools_config} @classmethod def from_config(cls, config): language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) tools = [ serialization_lib.deserialize_synalinks_object(t) for t in config.pop("tools", []) ] return cls( language_model=language_model, tools=tools, **config, ) ````

## `PathTraversalError`

Bases: `ValueError`

Raised when a tool argument resolves outside the configured workdir.

Source code in `synalinks/src/modules/agents/deep_agent.py`

```
class PathTraversalError(ValueError):
    """Raised when a tool argument resolves outside the configured workdir."""
```

## `get_default_instructions(workdir, allow_write, allow_bash)`

Default system instructions for the deep agent.

Parameters:

| Name | Type | Description | Default |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------ | ---------- |
| `workdir` | `str` | Absolute path of the agent's working directory. Embedded in the prompt so the LM knows where it's operating. | *required* |
| `allow_write` | `bool` | Whether write/edit tools are enabled. | *required* |
| `allow_bash` | `bool` | Whether the bash tool is enabled. | *required* |

Returns:

| Type | Description |
| ----- | ----------------------------------------------------------------------------------------- |
| `str` | A prompt string describing the tool plan and the safety constraints currently in effect. |

Source code in `synalinks/src/modules/agents/deep_agent.py`

```
def get_default_instructions(
    workdir: str,
    allow_write: bool,
    allow_bash: bool,
) -> str:
    """Default system instructions for the deep agent.

    Args:
        workdir: Absolute path of the agent's working directory. Embedded
            in the prompt so the LM knows where it's operating.
        allow_write: Whether write/edit tools are enabled.
        allow_bash: Whether the bash tool is enabled.

    Returns:
        A prompt string describing the tool plan and the safety
        constraints currently in effect.
    """
    capabilities = ["read_file", "list_directory", "search_files"]
    if allow_write:
        capabilities.extend(["write_file", "edit_file"])
    if allow_bash:
        capabilities.append("run_bash")

    extras = []
    if not allow_write:
        extras.append("Write/edit tools are DISABLED — this is a read-only session.")
    if not allow_bash:
        extras.append("Shell execution is DISABLED.")

    constraints = ("\n".join(f"- {line}" for line in extras) + "\n") if extras else ""

    return f"""
You are a software engineering assistant with filesystem and shell access
scoped to a single working directory.

Workdir: {workdir}
Available tools: {capabilities}

Plan:
1. Use `list_directory` to discover what's in the workdir.
2. Use `search_files` to locate files by glob and/or grep their contents.
3. Use `read_file` to read files. Output is line-numbered (``cat -n`` style).
   Pages of lines via `offset` / `limit`; raise `offset` to read further into the file.
4. {"Use `edit_file` for surgical changes (preferred over `write_file`)." if allow_write else "Reads only — do not propose write operations."}
5. {"Use `run_bash` for builds, tests, and other shell work." if allow_bash else "Shell is disabled — solve tasks with file tools only."}
6. Once you have the answer, stop calling tools and respond.

Constraints:
- All paths must stay inside the workdir. ``..`` traversal and absolute paths
  that escape the workdir are rejected.
{constraints}""".strip()
```

## `FunctionCallingAgent`

Bases: `Module`

A trainable parallel function calling agent.

The agent has 2 different modes:

- Autonomous: It will execute tools as soon as called.
- Non-autonomous: It will return the tool arguments as a ChatMessage.

In *autonomous* mode, the agent accepts **any kind of data model input** and performs a final inference to format its final answer when a `data_model` or `schema` is provided.
Example:

```
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class NumericalFinalAnswer(synalinks.DataModel):
    final_answer: float = synalinks.Field(
        description="The correct final numerical answer",
    )

async def calculate(expression: str):
    """Calculate the result of a mathematical expression.

    Args:
        expression (str): The mathematical expression to calculate, such as
            '2 + 2'. The expression can contain numbers, operators (+, -, *, /),
            parentheses, and spaces.
    """
    if not all(char in "0123456789+-*/(). " for char in expression):
        return {
            "result": None,
            "log": (
                "Error: invalid characters in expression. "
                "The expression can only contain numbers, operators (+, -, *, /),"
                " parentheses, and spaces NOT letters."
            ),
        }
    try:
        # Evaluate the mathematical expression safely
        result = round(float(eval(expression, {"__builtins__": None}, {})), 2)
        return {
            "result": result,
            "log": "Successfully executed",
        }
    except Exception as e:
        return {
            "result": None,
            "log": f"Error: {e}",
        }

async def main():
    language_model = synalinks.LanguageModel(model="ollama/mistral")

    tools = [
        synalinks.Tool(calculate),
    ]

    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.FunctionCallingAgent(
        data_model=NumericalFinalAnswer,
        tools=tools,
        language_model=language_model,
        max_iterations=5,
        autonomous=True,
    )(inputs)
    agent = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="math_agent",
        description="A math agent",
    )

    input_query = Query(query="How much is 152648 + 485?")
    response = await agent(input_query)

    print(response.prettify_json())

if __name__ == "__main__":
    asyncio.run(main())
```

Result:

```
{
  "query": "How much is 152648 + 485?",
  "messages": [
    {
      "role": "assistant",
      "content": "Performing simple addition",
      "tool_calls": [
        {
          "id": "92a3657c-1a45-46e6-8173-df4255b8423b",
          "name": "calculate",
          "arguments": {
            "expression": "152648 + 485"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": {
        "result": 153133.0,
        "log": "Successfully executed"
      },
      "tool_call_id": "92a3657c-1a45-46e6-8173-df4255b8423b"
    },
    {
      "role": "assistant",
      "content": "The user has asked for a simple addition calculation. The assistant used the 'calculate' tool to perform this task, and the result was returned successfully."
    }
  ],
  "final_answer": 153133.0
}
```

In *non-autonomous* mode (also called human-in-the-loop or interactive mode), the user needs to validate or edit the tool arguments and send them back to the agent. In this mode, the agent requires a `ChatMessages` data model as input and outputs a `ChatMessage` (or `ChatMessages` if `return_inputs_with_trajectory` is true) back to the user. The agent then ignores the `max_iterations` argument, as it performs only one **step at a time**.

Example:

```
import synalinks
import asyncio

MAX_ITERATIONS = 5

async def calculate(expression: str):
    """Calculate the result of a mathematical expression.

    Args:
        expression (str): The mathematical expression to calculate, such as
            '2 + 2'. The expression can contain numbers, operators (+, -, *, /),
            parentheses, and spaces.
    """
    if not all(char in "0123456789+-*/(). " for char in expression):
        return {
            "result": None,
            "log": (
                "Error: invalid characters in expression. "
                "The expression can only contain numbers, operators (+, -, *, /),"
                " parentheses, and spaces NOT letters."
            ),
        }
    try:
        # Evaluate the mathematical expression safely
        result = round(float(eval(expression, {"__builtins__": None}, {})), 2)
        return {
            "result": result,
            "log": "Successfully executed",
        }
    except Exception as e:
        return {
            "result": None,
            "log": f"Error: {e}",
        }

async def main():
    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    tools = [
        synalinks.Tool(calculate),
    ]

    inputs = synalinks.Input(data_model=synalinks.ChatMessages)
    outputs = await synalinks.FunctionCallingAgent(
        tools=tools,
        language_model=language_model,
        return_inputs_with_trajectory=True,
        autonomous=False,
    )(inputs)
    agent = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="math_agent",
        description="A math agent",
    )

    input_messages = synalinks.ChatMessages(
        messages=[
            synalinks.ChatMessage(
                role="user",
                content="How much is 152648 + 485?",
            )
        ]
    )

    for i in range(MAX_ITERATIONS):
        response = await agent(input_messages)
        print("Assistant response (with trajectory):")
        print(response.prettify_json())
        assistant_message = response.get("messages")[-1]

        if not assistant_message.get("tool_calls"):
            break  # We stop the loop if the agent didn't call any tool

        # Validate the tool call arguments (with a UI or CLI)
        # Then re-inject the validated assistant response in the input_messages
        # The corresponding tools will be called by the agent
        # Here we assume everything is okay for the purpose of the demo.

        input_messages.messages.append(assistant_message)

if __name__ == "__main__":
    asyncio.run(main())
```

The FunctionCallingAgent is compatible with MCP tools. Here is an example of how to use it:

```
import synalinks
import asyncio
import litellm

class Query(synalinks.DataModel):
    """Input query data model"""

    query: str = synalinks.Field(
        description="The user query",
    )

class FinalAnswer(synalinks.DataModel):
    """Final answer data model"""

    answer: str = synalinks.Field(
        description="The correct final answer",
    )

async def main():
    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    mcp_client = synalinks.MultiServerMCPClient(
        {
            "math": {
                "url": "http://localhost:8183/mcp/",
                "transport": "streamable_http",
            },
        }
    )

    tools = await mcp_client.get_tools()

    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.FunctionCallingAgent(
        data_model=FinalAnswer,
        tools=tools,
        language_model=language_model,
        max_iterations=5,
        autonomous=True,
    )(inputs)

    agent = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="mcp_math_agent",
        description="A math agent that can use an external calculator",
    )

    input_query = Query(query="How much is 152648 + 485?")
    response = await agent(input_query)

    print(response.prettify_json())

if __name__ == "__main__":
    asyncio.run(main())
```

Parameters:

| Name | Type | Description | Default |
| ------------------------------- | ------------------------------------------------- | ----------- | ------- |
| `schema` | `dict` | The target JSON schema. If not provided, the `data_model` is used to infer it. | `None` |
| `data_model` | `DataModel \| SymbolicDataModel \| JsonDataModel` | The target data model for structured output. | `None` |
| `language_model` | `LanguageModel` | The language model to use. | `None` |
| `prompt_template` | `str` | The jinja2 prompt template. | `None` |
| `examples` | `list` | The default list of examples, given as a list of tuples containing input/output JSON pairs. | `None` |
| `instructions` | `str` | The default instructions, a string containing instructions for the language model. | `None` |
| `final_instructions` | `str` | Optional. The instructions for the final generator that produces the structured output. If not provided, the same instructions as the tool calls generator are used. | `None` |
| `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` |
| `use_inputs_schema` | `bool` | Optional. Whether or not to use the inputs schema in the prompt (Defaults to False). | `False` |
| `use_outputs_schema` | `bool` | Optional. Whether or not to use the outputs schema in the prompt (Defaults to False). | `False` |
| `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call, one of ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Defaults to None (no reasoning). | `None` |
| `use_chain_of_thought` | `bool` | Optional. Use chain of thought for the tool calls generator, useful when using non-reasoning models. Defaults to False. | `False` |
| `tools` | `list` | The list of Tool or MCP tools available to the agent. | `None` |
| `autonomous` | `bool` | Optional. Whether the agent runs autonomously (executing tools automatically) or in interactive mode where the user validates tool arguments before execution (Defaults to True). | `True` |
| `return_inputs_with_trajectory` | `bool` | Optional. Whether or not to return the inputs concatenated with the full message trajectory (Defaults to True). | `True` |
| `max_iterations` | `int` | Optional. The maximum number of tool calling iterations in autonomous mode (Defaults to 5). Ignored in interactive mode. | `5` |
| `streaming` | `bool` | Optional. If true, stream the final answer. Only takes effect when no data_model/schema is provided. When streaming, the agent returns a StreamingIterator instead of a wrapped trajectory; the caller iterates it to consume the final response (Defaults to False). | `False` |
| `name` | `str` | Optional. The name of the module. | `None` |
| `description` | `str` | Optional. The description of the module. | `None` |

Source code in `synalinks/src/modules/agents/function_calling_agent.py` ```` @synalinks_export( [ "synalinks.modules.FunctionCallingAgent", "synalinks.FunctionCallingAgent", ] ) class FunctionCallingAgent(Module): """A trainable parallel function calling agent. The agent has 2 different modes: - Autonomous: It will execute tools as soon as called. - Non-autonomous: It will return the tool arguments as a ChatMessage. In *autonomous* mode, the agent accept **any kind of data model input** and perform a final inference to eventually format its final answer if a `data_model` or `schema` is provided. Example: ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class NumericalFinalAnswer(synalinks.DataModel): final_answer: float = synalinks.Field( description="The correct final numerical answer", ) async def calculate(expression: str): \"""Calculate the result of a mathematical expression. Args: expression (str): The mathematical expression to calculate, such as '2 + 2'. The expression can contain numbers, operators (+, -, *, /), parentheses, and spaces. \""" if not all(char in "0123456789+-*/(). " for char in expression): return { "result": None, "log": ( "Error: invalid characters in expression. " "The expression can only contain numbers, operators (+, -, *, /)," " parentheses, and spaces NOT letters."
), } try: # Evaluate the mathematical expression safely result = round(float(eval(expression, {"__builtins__": None}, {})), 2) return { "result": result, "log": "Successfully executed", } except Exception as e: return { "result": None, "log": f"Error: {e}", } async def main(): language_model = synalinks.LanguageModel(model="ollama/mistral") tools = [ synalinks.Tool(calculate), ] inputs = synalinks.Input(data_model=Query) outputs = await synalinks.FunctionCallingAgent( data_model=NumericalFinalAnswer, tools=tools, language_model=language_model, max_iterations=5, autonomous=True, )(inputs) agent = synalinks.Program( inputs=inputs, outputs=outputs, name="math_agent", description="A math agent", ) input_query = Query(query="How much is 152648 + 485?") response = await agent(input_query) print(response.prettify_json()) if __name__ == "__main__": asyncio.run(main()) ``` Result: ```json { "query": "How much is 152648 + 485?", "messages": [ { "role": "assistant", "content": "Performing simple addition", "tool_calls": [ { "id": "92a3657c-1a45-46e6-8173-df4255b8423b", "name": "calculate", "arguments": { "expression": "152648 + 485" } } ] }, { "role": "tool", "content": { "result": 153133.0, "log": "Successfully executed" }, "tool_call_id": "92a3657c-1a45-46e6-8173-df4255b8423b", }, { "role": "assistant", "content": "The user has asked for a simple addition " "calculation. The assistant used the 'calculate' tool to " "perform this task, and the result was returned successfully.", } ], "final_answer": 153133.0 } ``` In *non-autonomous* mode (also called human in the loop or interactive mode), the user needs to validate/edit the tool arguments and send it back to the agent. In this mode, the agent requires an `ChatMessages` data model as input and output an `ChatMessage` (or `ChatMessages` if `return_inputs_with_trajectory` is true) back to the user. In that case, the agent ignore the `max_iterations` argument, as it will only perform one **step at a time**. 
Example: ```python import synalinks import asyncio MAX_ITERATIONS = 5 async def calculate(expression: str): \"""Calculate the result of a mathematical expression. Args: expression (str): The mathematical expression to calculate, such as '2 + 2'. The expression can contain numbers, operators (+, -, *, /), parentheses, and spaces. \""" if not all(char in "0123456789+-*/(). " for char in expression): return { "result": None, "log": ( "Error: invalid characters in expression. " "The expression can only contain numbers, operators (+, -, *, /)," " parentheses, and spaces NOT letters." ), } try: # Evaluate the mathematical expression safely result = round(float(eval(expression, {"__builtins__": None}, {})), 2) return { "result": result, "log": "Successfully executed", } except Exception as e: return { "result": None, "log": f"Error: {e}", } async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) tools = [ synalinks.Tool(calculate), ] inputs = synalinks.Input(data_model=synalinks.ChatMessages) outputs = await synalinks.FunctionCallingAgent( tools=tools, language_model=language_model, return_inputs_with_trajectory=True, autonomous=False, )(inputs) agent = synalinks.Program( inputs=inputs, outputs=outputs, name="math_agent", description="A math agent", ) input_messages = synalinks.ChatMessages( messages=[ synalinks.ChatMessage( role="user", content="How much is 152648 + 485?", ) ] ) for i in range(MAX_ITERATIONS): response = await agent(input_messages) print("Assistant response (with trajectory):") print(response.prettify_json()) assistant_message = response.get("messages")[-1] if not assistant_message.get("tool_calls"): break # We stop the loop if the agent didn't call any tool # Validate the tool calls arguments (with an UI or CLI) # Then re-inject the validated assistant response in the input_messages # The corresponding tools will be called by the agent # Here we assume everything is okay for the purpose of the demo. 
input_messages.messages.append(assistant_message) if __name__ == "__main__": asyncio.run(main()) ``` The FunctionCallingAgent is compatible with MCP tools, here is an example on how to use it: ```python import synalinks import asyncio import litellm class Query(synalinks.DataModel): \"""Input query data model\""" query: str = synalinks.Field( description="The user query", ) class FinalAnswer(synalinks.DataModel): \"""Final answer data model\""" answer: str = synalinks.Field( description="The correct final answer", ) async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) mcp_client = synalinks.MultiServerMCPClient( { "math": { "url": "http://localhost:8183/mcp/", "transport": "streamable_http", }, } ) tools = await mcp_client.get_tools() inputs = synalinks.Input(data_model=Query) outputs = await synalinks.FunctionCallingAgent( data_model=FinalAnswer, tools=tools, language_model=language_model, max_iterations=5, autonomous=True, )(inputs) agent = synalinks.Program( inputs=inputs, outputs=outputs, name="mcp_math_agent", description="A math agent that can use an external calculator", ) input_query = Query(query="How much is 152648 + 485?") response = await agent(input_query) print(response.prettify_json()) if __name__ == "__main__": asyncio.run(main()) ``` Args: schema (dict): The target JSON schema. If not provided use the `data_model` to infer it. data_model (DataModel | SymbolicDataModel | JsonDataModel): The target data model for structured output. language_model (LanguageModel): The language model to use. prompt_template (str): The jinja2 prompt template. examples (list): The default list of examples, the examples are a list of tuples containing input/output JSON pairs. instructions (str): The default instructions being a string containing instructions for the language model. final_instructions (str): Optional. The instructions for the final generator that produces the structured output. 
If not provided, use the same instructions as the tool calls generator. temperature (float): Optional. The temperature for the LM call. use_inputs_schema (bool): Optional. Whether or not use the inputs schema in the prompt (Default to False). use_outputs_schema (bool): Optional. Whether or not use the outputs schema in the prompt (Default to False). reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). use_chain_of_thought (bool): Optional. Use chain of thought for tool calls generator, usefull when using non-reasoning models. Default False. tools (list): The list of `Tool` or MCP tools available to the agent. autonomous (bool): Optional. Whether the agent runs autonomously (executing tools automatically) or in interactive mode where the user validates tool arguments before execution (Default to True). return_inputs_with_trajectory (bool): Optional. Whether or not to return the inputs concatenated with the full message trajectory (Default to True). max_iterations (int): Optional. The maximum number of tool calling iterations in autonomous mode (Default to 5). Ignored in interactive mode. streaming (bool): Optional. If true, stream the final answer. Only takes effect when no `data_model`/`schema` is provided. When streaming, the agent returns a `StreamingIterator` instead of a wrapped trajectory; the caller iterates it to consume the final response. (Default to False). name (str): Optional. The name of the module. description (str): Optional. The description of the module. 
""" def __init__( self, *, schema=None, data_model=None, language_model=None, prompt_template=None, examples=None, instructions=None, final_instructions=None, temperature=0.0, use_inputs_schema=False, use_outputs_schema=False, reasoning_effort=None, use_chain_of_thought=False, tools=None, autonomous=True, return_inputs_with_trajectory=True, max_iterations=5, streaming=False, name=None, description=None, ): super().__init__( name=name, description=description, ) if not schema and data_model: schema = data_model.get_schema() self.schema = schema self.prompt_template = prompt_template if not instructions: instructions = get_default_instructions() self.instructions = instructions if not final_instructions: self.final_instructions = instructions else: self.final_instructions = final_instructions self.temperature = temperature self.examples = examples self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.reasoning_effort = reasoning_effort self.use_chain_of_thought = use_chain_of_thought self.language_model = _get_lm(language_model) self.tools = {} if not tools: raise ValueError("You must set the `tools` argument") for tool in tools: if tool.name.startswith("_"): raise ValueError( f"Tool name {tool.name!r} starts with an underscore. " f"Tools exposed to the LM must have public names — " f"rename the function or pass an explicit `name=` to " f"Tool(...)." ) self.tools[tool.name] = tool tool_calls_schema = dynamic_tool_calls(tools=tools) self.autonomous = autonomous self.return_inputs_with_trajectory = return_inputs_with_trajectory self.max_iterations = max_iterations # Streaming is only meaningful for the final answer (no schema). 
if self.schema and streaming: streaming = False self.streaming = streaming if use_chain_of_thought: self.tool_calls_generator = ChainOfThought( schema=tool_calls_schema, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, temperature=self.temperature, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, reasoning_effort=self.reasoning_effort, language_model=self.language_model, name="tool_calls_generator_" + self.name, ) else: self.tool_calls_generator = Generator( schema=tool_calls_schema, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, temperature=self.temperature, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, reasoning_effort=self.reasoning_effort, language_model=self.language_model, name="tool_calls_generator_" + self.name, ) self.final_generator = Generator( schema=self.schema, language_model=self.language_model, instructions=self.final_instructions, temperature=self.temperature, reasoning_effort=self.reasoning_effort, return_inputs=False, streaming=self.streaming, name="final_generator_" + self.name, ) async def call(self, inputs, training=False): if not inputs: return None if self.autonomous: if not is_chat_messages(inputs): trajectory = await ops.concat( inputs, ChatMessages(), name="trajectory_" + self.name, ) else: trajectory = inputs else: if not is_chat_messages(inputs): raise ValueError( "In interactive mode, the FunctionCallingAgent " "needs an ChatMessages-like data model as inputs" ) trajectory = inputs agent_messages = trajectory.get("messages") if self.autonomous: for i in range(self.max_iterations): tool_calls = await self.tool_calls_generator(trajectory) if not tool_calls: assistant_message = ChatMessage( role=ChatRole.ASSISTANT, content="Something went wrong while trying to decide " "the next action.", ) agent_messages.append(assistant_message.get_json()) break assistant_message = 
ChatMessage( role=ChatRole.ASSISTANT, content=tool_calls.get("thinking", ""), ) if not tool_calls.get("tool_calls"): break tasks = [] tool_calls_ids = [] for tool_call in tool_calls.get("tool_calls"): tool_name = tool_call.get("tool_name") tools_arguments = out_mask_json(tool_call, mask=["tool_name"]) tool_call_id = str(uuid.uuid4()) tool_calls_ids.append(tool_call_id) assistant_message.tool_calls.append( ToolCall( id=tool_call_id, name=tool_name, arguments=tools_arguments, ) ) tasks.append(self.tools[tool_name](**tools_arguments)) agent_messages.append(assistant_message.get_json()) tool_results = await asyncio.gather(*tasks, return_exceptions=True) for j, tool_result in enumerate(tool_results): tool_call_id = tool_calls_ids[j] if isinstance(tool_result, Exception): agent_messages.append( ChatMessage( role=ChatRole.TOOL, tool_call_id=tool_call_id, content="error: %s" % str(tool_result), ).get_json() ) else: # Handle both JsonDataModel and raw dict results content = ( tool_result.get_json() if hasattr(tool_result, "get_json") else tool_result ) agent_messages.append( ChatMessage( role=ChatRole.TOOL, tool_call_id=tool_call_id, content=content, ).get_json() ) trajectory.update({"messages": agent_messages}) if self.schema: # With schema: return the structured data model final_result = await self.final_generator(trajectory) if self.return_inputs_with_trajectory: # Combine trajectory with structured output validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return await ops.concat( JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ), final_result, name=self.name, ) else: return final_result else: # Without schema: append the ChatMessage to the trajectory final_result = await self.final_generator(trajectory) if self.streaming and not training: # Streaming bypasses trajectory wrapping — caller iterates # the StreamingIterator directly to consume final answer. 
return final_result if final_result: agent_messages.append(final_result.get_json()) validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ) else: # Track new messages generated in this step new_messages = [] if len(agent_messages) > 0: if agent_messages[-1].get("role") == ChatRole.ASSISTANT: tasks = [] tool_calls_ids = [] tool_calls = agent_messages[-1].get("tool_calls") for tool_call in tool_calls: tool_name = tool_call.get("name") tools_arguments = tool_call.get("arguments") tool_call_id = tool_call.get("id") tool_calls_ids.append(tool_call_id) tasks.append(self.tools[tool_name](**tools_arguments)) tool_results = await asyncio.gather(*tasks, return_exceptions=True) for j, tool_result in enumerate(tool_results): tool_call_id = tool_calls_ids[j] if isinstance(tool_result, Exception): tool_message = ChatMessage( role=ChatRole.TOOL, tool_call_id=tool_call_id, content="error: %s" % str(tool_result), ) else: # Handle both JsonDataModel and raw dict results content = ( tool_result.get_json() if hasattr(tool_result, "get_json") else tool_result ) tool_message = ChatMessage( role=ChatRole.TOOL, tool_call_id=tool_call_id, content=content, ) agent_messages.append(tool_message.get_json()) new_messages.append(tool_message) trajectory.update({"messages": agent_messages}) tool_calls = await self.tool_calls_generator(trajectory) # If no tool calls, call final generator # without appending the empty tool calls message if not tool_calls or not tool_calls.get("tool_calls"): final_result = await self.final_generator(trajectory) if self.streaming and not training and not self.schema: return final_result if self.schema: # Combine messages with structured output if self.return_inputs_with_trajectory: validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) else: validated_messages = 
ChatMessages(messages=new_messages) return await ops.concat( JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ), final_result, name=self.name, ) else: # Append ChatMessage to messages if final_result: if self.return_inputs_with_trajectory: agent_messages.append(final_result.get_json()) validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) else: new_messages.append(ChatMessage(**final_result.get_json())) validated_messages = ChatMessages(messages=new_messages) else: if self.return_inputs_with_trajectory: validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) else: validated_messages = ChatMessages(messages=new_messages) return JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ) assistant_message = ChatMessage( role=ChatRole.ASSISTANT, content=tool_calls.get("thinking", ""), tool_calls=[], ) for tool_call in tool_calls.get("tool_calls", []): tool_name = tool_call.get("tool_name") tools_arguments = out_mask_json(tool_call, mask=["tool_name"]) tool_call_id = str(uuid.uuid4()) assistant_message.tool_calls.append( ToolCall( id=tool_call_id, name=tool_name, arguments=tools_arguments, ) ) agent_messages.append(assistant_message.get_json()) new_messages.append(assistant_message) trajectory.update({"messages": agent_messages}) if self.return_inputs_with_trajectory: # Convert dict messages to ChatMessage objects to avoid Pydantic warnings validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ) else: return JsonDataModel( json=ChatMessages(messages=new_messages).get_json(), schema=ChatMessages.get_schema(), name=self.name, ) async def compute_output_spec(self, inputs, training=False): if self.autonomous: _ = await self.tool_calls_generator(inputs) if 
self.schema: if self.return_inputs_with_trajectory: return await ops.logical_and( SymbolicDataModel( schema=ChatMessages.get_schema(), name=self.name, ), SymbolicDataModel( schema=self.schema, name="final_generator_" + self.name, ), name=self.name, ) else: return await self.final_generator(inputs) else: # Without schema: return ChatMessages with final message appended _ = await self.final_generator(inputs) return SymbolicDataModel( schema=ChatMessages.get_schema(), name=self.name, ) else: if not is_chat_messages(inputs): raise ValueError( "In interactive mode, the FunctionCallingAgent " "needs an ChatMessages-like data model as inputs" ) _ = await self.tool_calls_generator(inputs) # The output can be either the final generator output (when no tool calls) # or ChatMessages (when there are tool calls) # We use ChatMessages as the output spec since it's the common case return SymbolicDataModel( schema=ChatMessages.get_schema(), name=self.name, ) def get_config(self): config = { "schema": self.schema, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "final_instructions": self.final_instructions, "temperature": self.temperature, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "reasoning_effort": self.reasoning_effort, "autonomous": self.autonomous, "max_iterations": self.max_iterations, "return_inputs_with_trajectory": self.return_inputs_with_trajectory, "streaming": self.streaming, "name": self.name, "description": self.description, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ) } tools_config = { "tools": [ serialization_lib.serialize_synalinks_object(tool) for tool in self.tools.values() ] } return {**config, **language_model_config, **tools_config} @classmethod def from_config(cls, config): tools = [ serialization_lib.deserialize_synalinks_object(tool) for tool in config.pop("tools") ] language_model = 
serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) return cls( language_model=language_model, tools=tools, **config, )
````

## `get_default_instructions()`

The default parallel function calling agent instructions.

Source code in `synalinks/src/modules/agents/function_calling_agent.py`

```
def get_default_instructions():
    """The default parallel function calling agent instructions."""
    return """
Think step by step: Use the thinking field to elaborate what you observe and what do you need to accomplish next.
Reflect on prior steps: Review your previous actions and their outcomes to avoid unnecessary repetition.
Avoid unnecessary actions: If you already have enough information to complete the user task, return an empty tool calls array.
""".strip()
```

## `CodeStep`

Bases: `DataModel`

One turn of Python-snippet reasoning: a Python snippet to execute next.

Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py`

```
class CodeStep(DataModel):
    """One turn of Python-snippet reasoning: a Python snippet to execute next."""

    python_code: str = Field(
        description=(
            "A Python snippet to execute in the persistent sandbox. The user "
            "input is bound as a dict named `inputs`. State persists across "
            "turns, so variables, functions and imports stay defined. Call the "
            "`submit` tool to terminate the run with the final answer"
        )
    )
```

## `InputsSummary`

Bases: `DataModel`

Metadata-only view of the user input bound as `inputs` in the sandbox.

Only per-field previews and sizes are surfaced here, to keep the prompt small when the input contains long documents or large collections. Read the **full** values through `inputs[field_name]` inside your code; the sandbox namespace holds the untruncated data.

Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py`

```
class InputsSummary(DataModel):
    """Metadata-only view of the user input bound as ``inputs`` in the sandbox.

    Only per-field previews and sizes are surfaced here to keep the prompt
    small when the input contains long documents or large collections. Read
    the **full** values through ``inputs[field_name]`` inside your code — the
    sandbox namespace holds the untruncated data.
    """

    fields: list[dict] = Field(
        default=[],
        description=(
            "One entry per top-level input field, each with `name`, `type`, "
            "`size` (len of string/list/dict, else null), `preview`, and "
            "`truncated` (true when preview omits part of the value). Read "
            "the complete value from the sandbox via `inputs[name]`."
        ),
    )
```

## `IterationInfo`

Bases: `DataModel`

Budget info visible to the code generator on each turn.

Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py`

```
class IterationInfo(DataModel):
    """Budget info visible to the code generator on each turn."""

    iteration: str = Field(
        description=(
            "Current turn position as `/`. Budget your "
            "remaining turns accordingly, batch work into fewer snippets "
            "when turns are running out."
        ),
    )
```

## `RecursiveLanguageModelAgent`

Bases: `Module`

A recursive-language-model agent.

An agent that emits a Python snippet each turn and executes it in a persistent Monty REPL sandbox. State (variables, imports, function definitions) accumulates across turns, so the agent can build up intermediate values, probe data, and iterate.

When `recursive=True` (the default), two extra helpers are exposed inside the sandbox: `llm_query(prompt)` and `llm_query_batched(prompts)`. The agent then treats long inputs as an *external environment*: it writes Python that slices, filters, and aggregates the data, and recursively delegates semantic work to a sub-LM only on the snippets it cares about. Compared to feeding a long document straight into the primary LM, this trades a single huge context for many small ones, which both fits inside provider limits and reduces the chance of long-context regressions.
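The slice, filter, and delegate pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `llm_query` here is a hypothetical local stub standing in for the sandbox helper (assumed to be an async callable, like other sandbox tools), and `chunk_text`, `answer`, and the sample `inputs` dict are names invented for the example.

```python
import asyncio


# Hypothetical stand-in for the sandbox's `llm_query` helper. The real helper
# calls the sub-LM; this stub only echoes the prompt size so the sketch runs
# offline. Its async signature is an assumption.
async def llm_query(prompt: str) -> str:
    return f"(stub answer for a {len(prompt)}-char prompt)"


def chunk_text(text: str, size: int) -> list[str]:
    """Slice a long string into fixed-size pieces."""
    return [text[i : i + size] for i in range(0, len(text), size)]


async def answer(inputs: dict, keyword: str, size: int = 200) -> dict:
    # Slice: the full document is only ever touched by code.
    chunks = chunk_text(inputs["text"], size)
    # Filter: a cheap code-side check decides which chunks matter.
    relevant = [c for c in chunks if keyword in c.lower()]
    # Delegate: only the surviving chunks are sent to the sub-LM.
    notes = [await llm_query("Summarize: " + c) for c in relevant]
    return {"chunks": len(chunks), "relevant": len(relevant), "notes": notes}


# Illustrative input: one relevant sentence buried in filler text.
inputs = {
    "text": ("filler " * 50) + "the treaty was signed in 1648. " + ("filler " * 50)
}
result = asyncio.run(answer(inputs, "treaty"))
print(result["chunks"], result["relevant"])
```

Only one of the chunks reaches the (stubbed) sub-LM here, which is the point of the design: the number of sub-LM calls scales with the relevant material rather than with the size of the input.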
When `recursive=False`, the agent runs without the sub-LM helpers, which is useful when the task is purely computational and recursion would only add cost.

Bound user tools (if any) appear inside the sandbox as global **async** callables; scripts must `await` them inside an `async def` and drive with `asyncio.run(...)`.

Termination: the LM calls the always-present `submit` tool with the final payload. If `max_iterations` is reached without `submit`, a final inference step formats the accumulated trajectory into the target `schema` / `data_model`. Empty `python_code` snippets are not termination signals; the loop feeds back a reminder and keeps going.

The `llm_query` quota is per-call: every invocation of this agent gets a fresh budget of `max_llm_calls` sub-LM queries, and concurrent invocations of the *same* agent instance each get an independent budget, because the counter and lock are built inside `call()` and never shared across runs.

Example:

```
import synalinks
import asyncio

class Doc(synalinks.DataModel):
    text: str

class Answer(synalinks.DataModel):
    answer: str

async def main():
    primary = synalinks.LanguageModel(model="openai/gpt-4o")
    cheap = synalinks.LanguageModel(model="openai/gpt-4o-mini")

    inputs = synalinks.Input(data_model=Doc)
    outputs = await synalinks.RLM(
        data_model=Answer,
        language_model=primary,
        sub_language_model=cheap,
        max_iterations=8,
        max_llm_calls=20,
    )(inputs)
    agent = synalinks.Program(inputs=inputs, outputs=outputs)

    long_text = open("book.txt").read()
    result = await agent(Doc(text=long_text))
    print(result.prettify_json())

if __name__ == "__main__":
    asyncio.run(main())
```

References

- [Recursive Language Models](https://arxiv.org/abs/2512.24601)

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `schema` | `dict` | Optional. The target JSON schema for the final structured answer. If not provided, `data_model` is used to infer it. When both are omitted, the agent runs in schemaless mode: the final generator emits a `ChatMessage` that is appended to the trajectory, and `call` returns the `ChatMessages` trajectory directly. | `None` |
| `data_model` | `DataModel \| SymbolicDataModel \| JsonDataModel` | Optional. The target data model for the final answer. | `None` |
| `language_model` | `LanguageModel` | The language model driving the per-turn code generator and the final-formatting step. | `None` |
| `sub_language_model` | `LanguageModel` | Optional. The language model used by `llm_query` and `llm_query_batched` when `recursive=True`. Defaults to `language_model`; pass a cheaper / smaller model here when the recursive sub-queries don't need the primary LM's full capability. Ignored when `recursive=False`. | `None` |
| `recursive` | `bool` | Optional. If `True` (default), expose `llm_query` and `llm_query_batched` inside the sandbox and use the recursive instructions. If `False`, run without the sub-LM helpers. | `True` |
| `tools` | `list` | Optional. Extra `Tool` instances exposed to the sandbox in addition to `submit` (and `llm_query` / `llm_query_batched` when `recursive=True`). The names `submit`, `llm_query`, and `llm_query_batched` are always reserved at construction time, even when `recursive=False`, so tool naming stays stable across the two modes. Naming gotcha: each tool is registered under `tool.name == tool._func.__name__`, so `Tool(_my_helper)` would be exposed as `_my_helper`. The constructor raises a `ValueError` for tool names that start with an underscore or collide with a reserved name. Rename the function rather than relying on an alias. | `None` |
| `prompt_template` | `str` | Optional. Prompt template forwarded to the per-turn code generator. | `None` |
| `examples` | `list` | Optional. Examples forwarded to the per-turn code generator. | `None` |
| `instructions` | `str` | Optional. Instructions for the per-turn code generator. Defaults to either `get_recursive_instructions` (when `recursive=True`, with the `{max_llm_calls}` placeholder substituted) or `get_default_instructions` otherwise. | `None` |
| `final_instructions` | `str` | Optional. Instructions for the final answer generator. Defaults to `instructions`. | `None` |
| `temperature` | `float` | Optional. Sampling temperature (Default 0.0). | `0.0` |
| `use_inputs_schema` | `bool` | Optional. Feed the input schema to the generator prompt (Default False). | `False` |
| `use_outputs_schema` | `bool` | Optional. Feed the output schema to the generator prompt (Default False). | `False` |
| `reasoning_effort` | `str` | Optional. One of `'minimal'`, `'low'`, `'medium'`, `'high'`, `'disable'`, `'none'`, `None`. Default `None`. | `None` |
| `use_chain_of_thought` | `bool` | Optional. Wrap the per-turn generator in `ChainOfThought` so it emits a `thinking` field alongside `code`. Default `False`. | `False` |
| `autonomous` | `bool` | Optional. If `True` (default), run the full code/execute/observe loop until the LM calls `submit` or `max_iterations` is reached, then produce a structured final answer. If `False`, require a `ChatMessages` input and execute a single code turn per call, returning the updated trajectory; this mode is suitable for human-in-the-loop use. For cross-call REPL state in interactive mode, hand a `Sandbox` to `call` via the `sandbox` kwarg; the agent itself stays stateless. | `True` |
| `timeout` | `int` | Per-turn execution budget in seconds (Default 60). Recursive sub-LM calls dominate per-turn wall time; `llm_query_batched` of even a handful of prompts can take several seconds. Snippets that exceed the budget turn into an observation so the LM can recover on the next turn. | `60` |
| `max_iterations` | `int` | Maximum number of code-execution turns before forcing the final answer step (Default 20). | `20` |
| `max_llm_calls` | `int` | Hard cap on sub-LM calls per agent invocation, shared between `llm_query` and `llm_query_batched` (Default 50). Once the budget is spent, further calls return an error string instead of a response so the LM can fall back to code-side aggregation. Ignored when `recursive=False`. | `50` |
| `max_output_chars` | `int` | Maximum characters to include from REPL output in the per-turn observation (Default 10_000). Anything beyond is truncated with a `… (truncated, N chars omitted)` marker so a single noisy turn cannot blow up the trajectory. | `10000` |
| `return_inputs_with_trajectory` | `bool` | Optional. Whether to return the full trajectory alongside the final answer (Default `True`). | `True` |
| `sandbox` | `Sandbox` | Optional. A pre-built `Sandbox` instance to reuse across calls. When supplied, the agent will not build its own sandbox at `call()` time and `sandbox_type` is derived from `type(sandbox)`. Pass this when the caller owns the sandbox lifecycle (e.g. interactive sessions where REPL state must persist across calls). When omitted, a fresh sandbox of `sandbox_type` is built per call. | `None` |
| `sandbox_type` | `type` | Optional. The `Sandbox` subclass to instantiate when no sandbox is supplied (here or to `call()`). Defaults to `MontySandbox`, or to `type(sandbox)` when `sandbox` is given. Any `Sandbox` subclass whose `__init__` accepts `(timeout=..., name=...)` works; register custom subclasses with `@register_synalinks_serializable` so they round-trip through `get_config` / `from_config`. | `None` |
| `name` | `str` | Optional. The name of the module. | `None` |
| `description` | `str` | Optional. The description of the module. | `None` |

Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py`

````
@synalinks_export(
    [
        "synalinks.modules.RecursiveLanguageModelAgent",
        "synalinks.RecursiveLanguageModelAgent",
        "synalinks.modules.RLM",
        "synalinks.RLM",
    ]
)
class RecursiveLanguageModelAgent(Module):
    """A recursive-language-model agent.

    An agent that emits Python snippets each turn and executes them in a
    persistent Monty REPL sandbox. State (variables, imports, function
    definitions) accumulates across turns so the agent can build up
    intermediate values, probe data, and iterate.

    When ``recursive=True`` (the default), two extra helpers are exposed
    inside the sandbox: ``llm_query(prompt)`` and
    ``llm_query_batched(prompts)``. The agent then treats long inputs as an
    *external environment*, it writes Python that slices, filters, and
    aggregates the data, and recursively delegates semantic work to a sub-LM
    only on the snippets it cares about. Compared to feeding a long document
    straight into the primary LM, this trades a single huge context for many
    small ones, which both fits inside provider limits and reduces the chance
    of long-context regressions.

    When ``recursive=False``, the agent runs without the sub-LM helpers,
    useful when the task is purely computational and recursion would only
    add cost.

    Bound user tools (if any) appear inside the sandbox as global **async**
    callables; scripts must ``await`` them inside an ``async def`` and drive
    with ``asyncio.run(...)``.

    Termination: the LM calls the always-present ``submit`` tool with the
    final payload. If ``max_iterations`` is reached without ``submit``, a
    final inference step formats the accumulated trajectory into the target
    ``schema`` / ``data_model``. Empty ``python_code`` snippets are not
    termination signals, the loop feeds back a reminder and keeps going.
The ``llm_query`` quota is per-call: every invocation of this agent gets a fresh budget of ``max_llm_calls`` sub-LM queries, and concurrent invocations of the *same* agent instance each get an independent budget — the counter and lock are built inside ``call()`` and never shared across runs. Example: ```python import synalinks import asyncio class Doc(synalinks.DataModel): text: str class Answer(synalinks.DataModel): answer: str async def main(): primary = synalinks.LanguageModel(model="openai/gpt-4o") cheap = synalinks.LanguageModel(model="openai/gpt-4o-mini") inputs = synalinks.Input(data_model=Doc) outputs = await synalinks.RLM( data_model=Answer, language_model=primary, sub_language_model=cheap, max_iterations=8, max_llm_calls=20, )(inputs) agent = synalinks.Program(inputs=inputs, outputs=outputs) long_text = open("book.txt").read() result = await agent(Doc(text=long_text)) print(result.prettify_json()) if __name__ == "__main__": asyncio.run(main()) ``` References: - [Recursive Language Models](https://arxiv.org/abs/2512.24601) Args: schema (dict): Optional. The target JSON schema for the final structured answer. If not provided, use ``data_model`` to infer it. When both are omitted, the agent runs in **schemaless** mode, the final generator emits a ``ChatMessage`` that is appended to the trajectory, and ``call`` returns the ``ChatMessages`` trajectory directly. data_model (DataModel | SymbolicDataModel | JsonDataModel): Optional. The target data model for the final answer. language_model (LanguageModel): The language model driving the per-turn code generator and the final-formatting step. sub_language_model (LanguageModel): Optional. The language model used by ``llm_query`` and ``llm_query_batched`` when ``recursive=True``. Defaults to ``language_model``, pass a cheaper / smaller model here when the recursive sub-queries don't need the primary LM's full capability. Ignored when ``recursive=False``. recursive (bool): Optional. 
If ``True`` (default), expose ``llm_query`` and ``llm_query_batched`` inside the sandbox and use the recursive instructions. If ``False``, run without the sub-LM helpers. tools (list): Optional. Extra :class:`Tool` instances exposed to the sandbox in addition to ``submit`` (and ``llm_query`` / ``llm_query_batched`` when ``recursive=True``). The names ``submit``, ``llm_query``, and ``llm_query_batched`` are always reserved at construction time, even when ``recursive=False``, so tool naming stays stable across the two modes. **Naming gotcha**: each tool is registered under ``tool.name == tool._func.__name__``. ``Tool(_my_helper)`` shows up inside the script as ``_my_helper``. Rename the function rather than relying on an alias. prompt_template (str): Optional. Prompt template forwarded to the per-turn code generator. examples (list): Optional. Examples forwarded to the per-turn code generator. instructions (str): Optional. Instructions for the per-turn code generator. Defaults to either :func:`get_recursive_instructions` (when ``recursive=True``, with the ``{max_llm_calls}`` placeholder substituted) or :func:`get_default_instructions` otherwise. final_instructions (str): Optional. Instructions for the final answer generator. Defaults to ``instructions``. temperature (float): Optional. Sampling temperature (Default 0.0). use_inputs_schema (bool): Optional. Feed the input schema to the generator prompt (Default False). use_outputs_schema (bool): Optional. Feed the output schema to the generator prompt (Default False). reasoning_effort (str): Optional. One of ``'minimal'``, ``'low'``, ``'medium'``, ``'high'``, ``'disable'``, ``'none'``, ``None``. Default ``None``. use_chain_of_thought (bool): Optional. Wrap the per-turn generator in ChainOfThought so it emits a ``thinking`` field alongside ``code``. Default ``False``. autonomous (bool): Optional. 
If ``True`` (default), run the full code/execute/observe loop until the LM calls ``submit`` or ``max_iterations`` is reached, then produce a structured final answer. If ``False``, require a ``ChatMessages`` input and execute a single code turn per call, returning the updated trajectory, suitable for human-in-the-loop use. For cross-call REPL state in interactive mode, hand a ``Sandbox`` to ``call`` via the ``sandbox`` kwarg; the agent itself stays stateless. timeout (int): Per-turn execution budget in seconds (Default 60). Recursive sub-LM calls dominate per-turn wall time; ``llm_query_batched`` of even a handful of prompts can take several seconds. Snippets that exceed the budget turn into an observation so the LM can recover on the next turn. max_iterations (int): Maximum number of code-execution turns before forcing the final answer step (Default 20). max_llm_calls (int): Hard cap on sub-LM calls per agent invocation, shared between ``llm_query`` and ``llm_query_batched`` (Default 50). Once the budget is spent, further calls return an error string instead of a response so the LM can fall back to code-side aggregation. Ignored when ``recursive=False``. max_output_chars (int): Maximum characters to include from REPL output in the per-turn observation (Default 10_000). Anything beyond is truncated with a ``… (truncated, N chars omitted)`` marker so a single noisy turn cannot blow up the trajectory. return_inputs_with_trajectory (bool): Optional. Whether to return the full trajectory alongside the final answer (Default ``True``). sandbox (Sandbox): Optional. A pre-built ``Sandbox`` instance to reuse across calls. When supplied, the agent will not build its own sandbox at ``call()`` time and ``sandbox_type`` is derived from ``type(sandbox)``. Pass this when the caller owns the sandbox lifecycle (e.g. interactive sessions where REPL state must persist across calls). When omitted, a fresh sandbox of ``sandbox_type`` is built per call. sandbox_type (type): Optional. 
The ``Sandbox`` subclass to instantiate when no sandbox is supplied (here or to ``call()``). Defaults to ``MontySandbox``, or to ``type(sandbox)`` when ``sandbox`` is given. Any ``Sandbox`` subclass whose ``__init__`` accepts ``(timeout=..., name=...)`` works; register custom subclasses with ``@register_synalinks_serializable`` so they round-trip through ``get_config`` / ``from_config``. name (str): Optional. The name of the module. description (str): Optional. The description of the module. """ def __init__( self, *, schema=None, data_model=None, language_model=None, sub_language_model=None, recursive=True, tools=None, prompt_template=None, examples=None, instructions=None, final_instructions=None, temperature=0.0, use_inputs_schema=False, use_outputs_schema=False, reasoning_effort=None, use_chain_of_thought=False, autonomous=True, timeout=60, max_iterations=20, max_llm_calls=50, max_output_chars=10_000, return_inputs_with_trajectory=True, sandbox=None, sandbox_type=None, name=None, description=None, ): super().__init__(name=name, description=description) if not schema and data_model: schema = data_model.get_schema() # `schema` is optional, when omitted, the agent operates in # "schemaless" mode and returns a ChatMessages trajectory (with # a final assistant message appended) instead of a typed answer. self.schema = schema self.language_model = _get_lm(language_model) # `sub_language_model` defaults to the primary LM when omitted. # ``get(None)`` would raise, so resolve only when a value is given. 
self.sub_language_model = ( _get_lm(sub_language_model) if sub_language_model is not None else self.language_model ) self.recursive = recursive self.prompt_template = prompt_template self.examples = examples if not instructions: if recursive: instructions = get_recursive_instructions().replace( "{max_llm_calls}", str(max_llm_calls), ) else: instructions = get_default_instructions() self.instructions = instructions self.final_instructions = final_instructions or instructions # Sandbox handling: if a concrete sandbox is supplied at # construction, reuse it across calls and derive sandbox_type # from its class. Otherwise fall back to sandbox_type (default # MontySandbox) and build one fresh per `call()`. Set early so # the sandbox-specific prompt text can be composed into the # code generator's instructions below. self.sandbox = sandbox if sandbox is not None: self.sandbox_type = type(sandbox) else: self.sandbox_type = sandbox_type or MontySandbox sandbox_description = self.sandbox_type.description if sandbox_description: self.instructions = self.instructions + "\n\n" + sandbox_description self.temperature = temperature self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.reasoning_effort = reasoning_effort self.use_chain_of_thought = use_chain_of_thought self.autonomous = autonomous self.timeout = timeout self.max_iterations = max_iterations self.max_llm_calls = max_llm_calls self.max_output_chars = max_output_chars self.return_inputs_with_trajectory = return_inputs_with_trajectory reserved = self._reserved_tool_names() self.tools = {} if tools: for tool in tools: if tool.name.startswith("_"): raise ValueError( f"Tool name {tool.name!r} starts with an underscore. " f"Tools exposed to the LM must have public names — " f"rename the function or pass an explicit `name=` " f"to Tool(...)." ) if tool.name in reserved: raise ValueError( f"Tool name '{tool.name}' is reserved by {type(self).__name__}." 
) self.tools[tool.name] = tool self.tools_catalog = _build_tools_catalog(self.tools) code_step_schema = CodeStep.get_schema() generator_cls = ChainOfThought if use_chain_of_thought else Generator self.code_generator = generator_cls( schema=code_step_schema, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, temperature=self.temperature, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, reasoning_effort=self.reasoning_effort, language_model=self.language_model, name="code_generator_" + self.name, ) self.final_generator = Generator( schema=self.schema, language_model=self.language_model, instructions=self.final_instructions, temperature=self.temperature, reasoning_effort=self.reasoning_effort, return_inputs=False, name="final_generator_" + self.name, ) def _reserved_tool_names(self) -> frozenset: """Names a user tool cannot collide with at construction time. ``submit``, ``llm_query``, and ``llm_query_batched`` are always reserved, even when ``recursive=False``, so that tool naming stays stable across the two modes and a user tool can't quietly shadow a helper name that would reappear if ``recursive`` is flipped back on. """ return frozenset({"submit", "llm_query", "llm_query_batched"}) def _build_extra_call_tools(self) -> dict: """Build the per-call recursive helpers when ``recursive=True``. A fresh counter+lock is built on every invocation so concurrent calls into the same agent instance get independent budgets. Returns ``{}`` when ``recursive=False``. 
""" if not self.recursive: return {} counter = {"value": 0} lock = asyncio.Lock() return { "llm_query": _build_llm_query_tool( self.sub_language_model, self.max_llm_calls, counter, lock, ), "llm_query_batched": _build_llm_query_batched_tool( self.sub_language_model, self.max_llm_calls, counter, lock, ), } async def _execute_turn( self, sandbox, code, inputs_json, tools=None, ): """Run one code snippet in the sandbox, return a formatted observation. ``tools`` overrides ``self.tools`` for the duration of this call — used at call time to layer in per-call built-ins like ``submit``. """ active_tools = tools if tools is not None else self.tools external_functions = ( {name: _adapt_tool_for_sandbox(t) for name, t in active_tools.items()} if active_tools else None ) # `inputs` is rebound every turn so the snippet can always read # `inputs[field]` regardless of what prior turns did to the name. run_kwargs = {"inputs": {"inputs": inputs_json}} if external_functions is not None: run_kwargs["external_functions"] = external_functions execution = await sandbox.run(code, **run_kwargs) return _format_observation( execution.stdout, execution.stderr, execution.result, execution.error, max_output_chars=self.max_output_chars, ) async def call(self, inputs, training=False, sandbox=None): if not inputs: return None if not self.autonomous and not is_chat_messages(inputs): raise ValueError( f"In interactive mode, the {type(self).__name__} needs a " "ChatMessages-like data model as inputs" ) # Per-call tool set: user tools plus a fresh `submit` bound to a # private holder, plus any per-call recursive helpers. submit is # the canonical termination signal, always exposed, schema'd or # not, and everything in this set is built fresh per call so # concurrent invocations don't share holders, counters, or locks. 
call_tools = dict(self.tools) submit_holder = {"value": None} call_tools["submit"] = _build_submit_tool(self.schema, submit_holder) call_tools.update(self._build_extra_call_tools()) call_tools_catalog = _build_tools_catalog(call_tools) if is_chat_messages(inputs): trajectory = inputs inputs_json = {} else: inputs_json = inputs.get_json() # The LM prompt only sees a metadata summary of the inputs — # previews and sizes, never the full value. The sandbox gets # the complete `inputs_json` rebound on every turn (see # `_execute_turn`), so `inputs[field]` is always reachable. base = _summarize_inputs(inputs_json) if call_tools_catalog is not None: base = await ops.concat( base, call_tools_catalog, name="inputs_with_tools_" + self.name, ) trajectory = await ops.concat( base, ChatMessages(), name="trajectory_" + self.name, ) agent_messages = trajectory.get("messages") # Sandbox resolution order: per-call kwarg > constructor-supplied # sandbox > fresh sandbox of `sandbox_type`. The first two cases # let the caller (or the agent's owner) keep REPL state alive # across calls; the third is the stateless-per-call default. if sandbox is None: sandbox = self.sandbox or self.sandbox_type(timeout=self.timeout) iterations = self.max_iterations if self.autonomous else 1 submitted_final = None for n in range(iterations): # The iteration counter is concat'd in per turn so the code # generator can pace itself ("2/5" => half the budget left, # plan accordingly). 
iteration_info = IterationInfo( iteration=f"{n + 1}/{iterations}", ) turn_input = await ops.concat( trajectory, iteration_info, name=f"turn_{n}_{self.name}", ) code_step = await self.code_generator(turn_input) if not code_step: break code = code_step.get("python_code") or "" thinking = code_step.get("thinking", "") content_parts = [] if thinking: content_parts.append(thinking) if code.strip(): content_parts.append(f"```python\n{code}\n```") assistant_message = ChatMessage( role=ChatRole.ASSISTANT, content="\n\n".join(content_parts) if content_parts else "", ) agent_messages.append(assistant_message.get_json()) # Empty code is no longer a termination signal, submit is the # canonical path. Feed a reminder back as an observation and # let the loop run another turn. if not code.strip(): agent_messages.append( ChatMessage( role=ChatRole.TOOL, content=( "(no code emitted) Call the `submit` tool with " "the final result to terminate the run." ), ).get_json() ) trajectory.update({"messages": agent_messages}) continue observation = await self._execute_turn( sandbox, code, inputs_json, tools=call_tools, ) # If submit was called, validate (when a schema is set) and # decide whether to end the loop. Clear the holder either way # so a subsequent retry isn't short-circuited by a stale # captured payload. submitted = submit_holder["value"] submit_holder["value"] = None if submitted is not None: if self.schema: try: jsonschema.validate(submitted, self.schema) except ValidationError as ve: observation = ( observation + f"\nsubmit validation failed: {ve.message}. " + "Revise the payload and call submit again." ) else: submitted_final = submitted observation = observation + "\nsubmit accepted." else: # Schemaless: any dict is accepted. submitted_final = submitted observation = observation + "\nsubmit accepted." 
agent_messages.append( ChatMessage( role=ChatRole.TOOL, content=observation, ).get_json() ) trajectory.update({"messages": agent_messages}) if submitted_final is not None: break # Interactive mode: only invoke the final generator when the LM itself # signalled completion via submit. Otherwise return the updated # trajectory so the caller can decide when to continue. if not self.autonomous and submitted_final is None: validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ) # submit short-circuit: the LM already produced the final payload # inside the sandbox, so we skip the final-formatting LM call. # Schemaless mode treats the payload as the content of a final # assistant ChatMessage appended to the trajectory. if submitted_final is not None: if self.schema: final_result = JsonDataModel( json=submitted_final, schema=self.schema, name="final_generator_" + self.name, ) else: agent_messages.append( ChatMessage( role=ChatRole.ASSISTANT, content=submitted_final, ).get_json() ) validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ) else: final_result = await self.final_generator(trajectory) if not self.schema: # Schemaless fallback: the final generator emits a # ChatMessage. Append it to the trajectory and return. 
if final_result: agent_messages.append(final_result.get_json()) validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ) if self.return_inputs_with_trajectory: validated_messages = ChatMessages( messages=[ChatMessage(**msg) for msg in agent_messages] ) return await ops.concat( JsonDataModel( json=validated_messages.get_json(), schema=ChatMessages.get_schema(), name=self.name, ), final_result, name=self.name, ) return final_result async def compute_output_spec(self, inputs, training=False, sandbox=None): if not self.autonomous and not is_chat_messages(inputs): raise ValueError( f"In interactive mode, the {type(self).__name__} needs a " "ChatMessages-like data model as inputs" ) # Mirror the runtime: the code generator sees a summary of the # input plus the tool catalog plus an IterationInfo, not the raw # input DataModel. if is_chat_messages(inputs): generator_inputs = inputs else: generator_inputs = SymbolicDataModel( schema=InputsSummary.get_schema(), name="inputs_summary_" + self.name, ) if self.tools_catalog is not None: generator_inputs = await ops.concat( generator_inputs, self.tools_catalog, name="inputs_with_tools_" + self.name, ) generator_inputs = await ops.concat( generator_inputs, SymbolicDataModel( schema=IterationInfo.get_schema(), name="iteration_info_" + self.name, ), name="turn_input_" + self.name, ) _ = await self.code_generator(generator_inputs) if not self.autonomous: # Interactive mode: the common case is returning the trajectory. # When the LM emits empty code the runtime returns the final # answer instead; we pick the common-case spec. return SymbolicDataModel( schema=ChatMessages.get_schema(), name=self.name, ) if not self.schema: # Schemaless autonomous: final generator produces a ChatMessage # appended to the trajectory; output spec is ChatMessages. 
_ = await self.final_generator(inputs) return SymbolicDataModel( schema=ChatMessages.get_schema(), name=self.name, ) if self.return_inputs_with_trajectory: return await ops.logical_and( SymbolicDataModel( schema=ChatMessages.get_schema(), name=self.name, ), SymbolicDataModel( schema=self.schema, name="final_generator_" + self.name, ), name=self.name, ) return await self.final_generator(inputs) def get_config(self): config = { "schema": self.schema, "recursive": self.recursive, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "final_instructions": self.final_instructions, "temperature": self.temperature, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "reasoning_effort": self.reasoning_effort, "use_chain_of_thought": self.use_chain_of_thought, "autonomous": self.autonomous, "timeout": self.timeout, "max_iterations": self.max_iterations, "max_llm_calls": self.max_llm_calls, "max_output_chars": self.max_output_chars, "return_inputs_with_trajectory": self.return_inputs_with_trajectory, "sandbox_type": get_registered_name(self.sandbox_type), "name": self.name, "description": self.description, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ), "sub_language_model": serialization_lib.serialize_synalinks_object( self.sub_language_model, ), } sandbox_config = { "sandbox": ( serialization_lib.serialize_synalinks_object(self.sandbox) if self.sandbox is not None else None ) } tools_config = { "tools": [ serialization_lib.serialize_synalinks_object(tool) for tool in self.tools.values() ] } return {**config, **language_model_config, **sandbox_config, **tools_config} @classmethod def from_config(cls, config): tools = [ serialization_lib.deserialize_synalinks_object(tool) for tool in config.pop("tools", []) ] language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) 
sub_language_model = None if "sub_language_model" in config: sub_language_model = serialization_lib.deserialize_synalinks_object( config.pop("sub_language_model") ) sandbox = None if "sandbox" in config: sandbox_serialized = config.pop("sandbox") if sandbox_serialized is not None: sandbox = serialization_lib.deserialize_synalinks_object( sandbox_serialized ) sandbox_type_name = config.pop("sandbox_type", None) sandbox_type = ( get_registered_object(sandbox_type_name) if sandbox_type_name else None ) return cls( language_model=language_model, sub_language_model=sub_language_model, tools=tools or None, sandbox=sandbox, sandbox_type=sandbox_type, **config, ) ```` ## `ToolSpec` Bases: `DataModel` Description of one tool exposed in the sandbox. Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py` ``` class ToolSpec(DataModel): """Description of one tool exposed in the sandbox.""" name: str = Field( description=( "The async callable's name in the sandbox. Invoke with " "`await {name}(**kwargs)`." ) ) description: str = Field( description="What the tool does (from the Python docstring).", ) parameters: dict = Field( description=( "JSON Schema for keyword arguments: `properties` maps each " "parameter name to its `{type, description}`, and `required` " "lists the parameters that must be passed." ), ) ``` ## `ToolsCatalog` Bases: `DataModel` Catalog of tools bound to the sandbox. Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py` ``` class ToolsCatalog(DataModel): """Catalog of tools bound to the sandbox.""" tools: list[ToolSpec] = Field( default=[], description=( "Tools callable inside the sandbox as global async functions. " "Every tool returns a dict, a tool wrapping `async def f(x) " "-> int` yields `{'result': }`; a tool already returning " "a dict yields that dict directly. Call with `await` inside " "`async def main(): ...` and drive with `asyncio.run(main())`." 
), ) ``` ## `get_default_instructions()` Default instructions for non-recursive Python-snippet reasoning. Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py` ``` def get_default_instructions(): """Default instructions for non-recursive Python-snippet reasoning.""" return """ You solve the user task by writing and executing Python snippets inside a persistent sandbox. Each turn you emit ONE snippet via the `python_code` field; state persists across turns (variables stay defined, imports stay loaded). IMPORTANT: This is ITERATIVE. Each code block you write will execute, you'll see the output, then you decide what to do next. Do NOT try to solve everything in one step. The user input is bound as a dict named `inputs` in the sandbox, the full, untruncated value. In the prompt you only see an `InputsSummary` with previews and sizes; always read the real values through `inputs[field]` inside your code, never re-type them from the preview. Each turn carries an `IterationInfo.iteration` field like `3/5`, that's your progress out of the hard iteration cap. Budget accordingly: early turns can explore, later ones should converge. If few turns remain, batch work into a single snippet instead of spreading it out. Use `print(...)` to log intermediate observations. Any tools bound to the agent are exposed as async callables, call them inside `async def main():` and drive with `asyncio.run(main())`; calling them without `await` returns a coroutine object, not the value. Termination: call the `submit` tool (always present in the tools catalog) with `result={...}` matching its `result` parameter schema. `submit` is async, so drive it inside `async def main(): ...` with `asyncio.run(main())` like any other tool. It captures the answer and ends the run in one step. If the payload fails schema validation you'll see the validation error on the next turn and can retry. 
`submit` is the only termination path, emitting an empty `python_code` string is treated as a no-op and you'll be reminded to call `submit`. Don't run out of iterations without calling it. """.strip() ``` ## `get_recursive_instructions()` Default instructions for recursive (sub-LM) Python-snippet reasoning. The `{max_llm_calls}` placeholder is substituted at construction time. Source code in `synalinks/src/modules/agents/recursive_language_model_agent.py` ``` def get_recursive_instructions(): """Default instructions for recursive (sub-LM) Python-snippet reasoning. The ``{max_llm_calls}`` placeholder is substituted at construction time. """ return """ You solve the user task by writing Python that programmatically explores the inputs and recursively delegates semantic work to a sub-LM. Each turn you emit ONE snippet via the `python_code` field; state persists across turns (variables stay defined, imports stay loaded). IMPORTANT: This is ITERATIVE. Each code block you write will execute, you'll see the output, then you decide what to do next. Do NOT try to solve everything in one step. Treat the inputs as an *external environment*, not as text in your prompt. Long documents and large collections live in the sandbox and are read with `inputs[field]`. The prompt only shows an `InputsSummary` with previews and sizes, never re-type values from the preview. Two recursive helpers are always exposed in the tools catalog: - `llm_query(prompt)`, query a sub-LM with one prompt; returns `{"result": }`. Use it for semantic work on snippets you've already carved out with code (search, classification, summarization, reformatting). Pass *only the relevant snippet*, the sub-LM has its own context budget. - `llm_query_batched(prompts)`, same, but takes a list and runs the prompts concurrently. Returns `{"result": [, ...]}` preserving input order; failed prompts come back as strings prefixed with `[error] : `, filter them before aggregating. 
Strongly preferred over a Python loop of `llm_query` calls, sequential calls waste wall time. You have a hard budget of {max_llm_calls} sub-LM calls per run; the counter is shared between `llm_query` and `llm_query_batched`. When exhausted, both helpers short-circuit with `{"result": , "error": ""}` without consuming any quota, check `error` before trusting `result`. Plan recursion accordingly: prefer code-side aggregation (regex, set ops, sorting, dict-comprehension counting) over re-querying. Each turn carries an `IterationInfo.iteration` field like `3/5` — your progress against the iteration cap. Early turns can explore; later ones should converge. When few turns remain, batch work into fewer snippets. Use `print(...)` to log intermediate observations. All sandbox tools are async, call them inside `async def main():` and drive with `asyncio.run(main())`; calling without `await` yields a coroutine object, not the value. Working rules: 1. EXPLORE FIRST. Print sample values, lengths, types, and shapes of `inputs[field]` before slicing or batching. A cheap probe turn prevents wasted sub-LM calls on the wrong field or shape. 2. CODE FOR STRUCTURE, `llm_query` FOR MEANING. Regex, slicing, and set ops find WHERE things are; the sub-LM understands WHAT they mean. Don't burn `llm_query` budget on aggregation a one-liner can do. 3. MINIMIZE RETYPING. When values are long, precise, or error-prone (IDs, numbers, quoted text, code), re-access them via `inputs[field]` and compute in Python. Never copy from the `InputsSummary` preview into a sub-LM prompt, the preview is truncated. 4. VERIFY BEFORE SUBMITTING. If results look wrong (empty, zeros, unexpected shape), inspect them on a separate turn. Don't submit a guess. 5. `submit` IS TERMINAL. The snippet runs to completion (so a `print(...)` next to `submit(...)` is captured into the observation), but a successful submit ends the loop with no follow-up turn — you never get to read that print. 
   Inspect on one turn, submit on the next.

Termination: call the always-present `submit` tool with `result={...}`
matching its `result` parameter schema. `submit` is the only termination
path, empty `python_code` strings are no-ops and you'll be reminded to call
`submit`. Don't run out of iterations without calling it.
""".strip()
```

## `SQLAgent`

Bases: `Module`

A ready-to-use SQL agent backed by a knowledge base.

SQLAgent is a thin specialization of `FunctionCallingAgent` that pre-wires three SQL tools bound to a `KnowledgeBase`:

- `get_database_schema`: discovers all tables and their columns.
- `get_table_sample`: fetches a few rows so the LM can see the data shape before writing queries.
- `run_sql_query`: executes a `SELECT` query via `KnowledgeBase.query` with `read_only=True`.

The constructor mirrors `FunctionCallingAgent`: every parameter on that class is accepted here with identical semantics. The only additions are `knowledge_base` (required) and `output_format` (controls the SQL tools' result rendering). User-supplied `tools` are appended to the three built-in tools.

Safety is enforced by the knowledge base, not by string filtering. The DuckDB adapter parses the query with the engine's parser and rejects anything that isn't a `SELECT` (including `COPY ... TO 'file'` exfiltration, `ATTACH`, multi-statement injection), and the connection has `enable_external_access=false` so `read_csv` / `read_parquet` / httpfs can't reach the host filesystem or network.
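The two guard rails described for this agent (engine-level read-only enforcement, and the row cap that the `k` parameter applies by wrapping the query in `SELECT * FROM ({sql}) LIMIT k`) can be illustrated with a small stand-alone sketch. It is an analogy only: it uses Python's built-in `sqlite3` with `PRAGMA query_only` as a stand-in for the DuckDB adapter, and `run_capped_select` is a hypothetical helper, not part of Synalinks.

```python
import sqlite3


def run_capped_select(conn, sql, k):
    """Hypothetical helper: wrap `sql` in an outer SELECT capped at `k` rows."""
    inner = sql.strip().rstrip(";")  # a trailing ';' would break the subquery
    wrapped = f"SELECT * FROM ({inner}) LIMIT {int(k)}"
    return conn.execute(wrapped).fetchall()


# Stand-in database (assumption: SQLite instead of DuckDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (id TEXT, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(f"C{i}", f"name{i}", "USA") for i in range(5)],
)
conn.commit()

# From here on the engine itself refuses writes; no string filtering involved.
conn.execute("PRAGMA query_only = ON")

# An unbounded SELECT still returns at most k rows.
rows = run_capped_select(conn, "SELECT id FROM Customer ORDER BY id", k=2)
print(rows)  # [('C0',), ('C1',)]

# A write attempt is rejected by the engine.
try:
    conn.execute("DELETE FROM Customer")
    write_rejected = False
except sqlite3.OperationalError:
    write_rejected = True
print(write_rejected)
```

Wrapping the statement in a subquery has a useful side effect in this sketch: anything that is not a `SELECT` becomes a syntax error inside the parentheses, so the cap and the read-only guard reinforce each other.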
Example:

```
import synalinks
import asyncio

class Customer(synalinks.DataModel):
    id: str = synalinks.Field(description="Customer ID")
    name: str = synalinks.Field(description="Customer name")
    country: str = synalinks.Field(description="Customer country")

class Query(synalinks.DataModel):
    query: str = synalinks.Field(description="Natural language question")

class SQLAnswer(synalinks.DataModel):
    answer: str = synalinks.Field(description="Answer in natural language")
    sql_query: str = synalinks.Field(description="SQL that produced it")

async def main():
    kb = synalinks.KnowledgeBase(
        uri="duckdb://my_db.db",
        data_models=[Customer],
    )
    await kb.update(Customer(id="C1", name="Alice", country="USA"))

    lm = synalinks.LanguageModel(model="ollama/mistral")

    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.SQLAgent(
        knowledge_base=kb,
        language_model=lm,
        data_model=SQLAnswer,
    )(inputs)
    agent = synalinks.Program(inputs=inputs, outputs=outputs)

    result = await agent(Query(query="How many customers are in the USA?"))
    print(result.get("answer"))
    print(result.get("sql_query"))

asyncio.run(main())
```

Parameters:

| Name | Type | Description | Default |
| ------------------------------- | --------------- | ----------- | ------- |
| `knowledge_base` | `KnowledgeBase` | The knowledge base to query. Required. | `None` |
| `k` | `int` | Maximum page size (rows per call) the LM can pull through `get_table_sample` and `run_sql_query`. `get_table_sample` clamps the LM's `limit` argument to `min(limit, k)`; `run_sql_query` wraps the LM's SQL in `SELECT * FROM ({sql}) LIMIT k` so even unbounded `SELECT *` queries can't drain a large table into the conversation. A `LIMIT` inside the LM's own SQL still applies first. Defaults to 50. | `50` |
| `output_format` | `str` | How the SQL tools render result sets to the LM. `"csv"` (default) is compact and minimizes input tokens; `"json"` returns a list of dicts. Applies to both `get_table_sample` and `run_sql_query`. | `'csv'` |
| `tools` | `list` | Additional `Tool` instances (or plain async functions) to expose alongside the three built-in SQL tools, for example a calculator, a datetime helper, a web-search tool. Tool names must not collide with the built-ins (`get_database_schema`, `get_table_sample`, `run_sql_query`) or a `ValueError` is raised. | `None` |
| `schema` | `dict` | JSON schema for the final answer. | `None` |
| `data_model` | `DataModel` | DataModel for the final answer. Mutually exclusive with `schema`. | `None` |
| `language_model` | `LanguageModel` | The language model that drives the agent loop. | `None` |
| `prompt_template` | `str` | Forwarded to the tool-call generator. | `None` |
| `examples` | `list` | Few-shot examples for the tool-call generator. | `None` |
| `instructions` | `str` | Override the default system instructions. When omitted, the default is built from the knowledge base's tables so the LM knows what's available without an extra schema call. | `None` |
| `final_instructions` | `str` | Instructions for the final-answer generator. Defaults to `instructions`. | `None` |
| `temperature` | `float` | LM sampling temperature. Defaults to 0.0 for deterministic SQL generation. | `0.0` |
| `use_inputs_schema` | `bool` | Include the input schema in the prompt. | `False` |
| `use_outputs_schema` | `bool` | Include the output schema in the prompt. | `False` |
| `reasoning_effort` | `str` | Forwarded to the generators (for reasoning-capable LMs). | `None` |
| `use_chain_of_thought` | `bool` | When `True`, the tool-call generator emits a `thinking` field per round. | `False` |
| `autonomous` | `bool` | When `True` (default), the agent runs the tool loop end-to-end. When `False`, returns one step at a time for human-in-the-loop workflows. | `True` |
| `return_inputs_with_trajectory` | `bool` | When `True` (default), the full message trajectory is included alongside the final answer. | `True` |
| `max_iterations` | `int` | Maximum number of tool-call rounds. Defaults to 5. | `5` |
| `streaming` | `bool` | Stream the final answer when no `schema` is set. Defaults to `False`. | `False` |
| `name` | `str` | Module name. | `None` |
| `description` | `str` | Module description. | `None` |

Source code in `synalinks/src/modules/agents/sql_agent.py`

````
@synalinks_export(
    [
        "synalinks.modules.SQLAgent",
        "synalinks.SQLAgent",
    ]
)
class SQLAgent(Module):
    """A ready-to-use SQL agent backed by a knowledge base.

    SQLAgent is a thin specialization of :class:`FunctionCallingAgent` that
    pre-wires three SQL tools bound to a :class:`KnowledgeBase`:

    - ``get_database_schema``: discovers all tables and their columns.
    - ``get_table_sample``: fetches a few rows so the LM can see the data
      shape before writing queries.
    - ``run_sql_query``: executes a ``SELECT`` query via
      :meth:`KnowledgeBase.query` with ``read_only=True``.

    The constructor mirrors :class:`FunctionCallingAgent` — every parameter
    on that class is accepted here with identical semantics. The only
    additions are ``knowledge_base`` (required) and ``output_format``
    (controls the SQL tools' result rendering). User-supplied ``tools`` are
    appended to the three built-in tools.

    Safety is enforced by the knowledge base, not by string filtering. The
    DuckDB adapter parses the query with the engine's parser and rejects
    anything that isn't a ``SELECT`` (including ``COPY ...
TO 'file'`` exfiltration, ``ATTACH``, multi-statement injection), and the connection has ``enable_external_access=false`` so ``read_csv`` / ``read_parquet`` / httpfs can't reach the host filesystem or network. Example: ```python import synalinks import asyncio class Customer(synalinks.DataModel): id: str = synalinks.Field(description="Customer ID") name: str = synalinks.Field(description="Customer name") country: str = synalinks.Field(description="Customer country") class Query(synalinks.DataModel): query: str = synalinks.Field(description="Natural language question") class SQLAnswer(synalinks.DataModel): answer: str = synalinks.Field(description="Answer in natural language") sql_query: str = synalinks.Field(description="SQL that produced it") async def main(): kb = synalinks.KnowledgeBase( uri="duckdb://my_db.db", data_models=[Customer], ) await kb.update(Customer(id="C1", name="Alice", country="USA")) lm = synalinks.LanguageModel(model="ollama/mistral") inputs = synalinks.Input(data_model=Query) outputs = await synalinks.SQLAgent( knowledge_base=kb, language_model=lm, data_model=SQLAnswer, )(inputs) agent = synalinks.Program(inputs=inputs, outputs=outputs) result = await agent(Query(query="How many customers are in the USA?")) print(result.get("answer")) print(result.get("sql_query")) asyncio.run(main()) ``` Args: knowledge_base (KnowledgeBase): The knowledge base to query. Required. k (int): Maximum page size (rows per call) the LM can pull through ``get_table_sample`` and ``run_sql_query``. ``get_table_sample`` clamps the LM's ``limit`` argument to ``min(limit, k)``; ``run_sql_query`` wraps the LM's SQL in ``SELECT * FROM ({sql}) LIMIT k`` so even unbounded ``SELECT *`` queries can't drain a large table into the conversation. A ``LIMIT`` inside the LM's own SQL still applies first. Defaults to 50. output_format (str): How the SQL tools render result sets to the LM. ``"csv"`` (default) is compact and minimizes input tokens; ``"json"`` returns a list of dicts. 
Applies to both ``get_table_sample`` and ``run_sql_query``. tools (list): Additional :class:`Tool` instances (or plain async functions) to expose alongside the three built-in SQL tools — for example a calculator, a datetime helper, a web-search tool. Tool names must not collide with the built-ins (``get_database_schema``, ``get_table_sample``, ``run_sql_query``) or a ``ValueError`` is raised. schema (dict): JSON schema for the final answer. data_model (DataModel): DataModel for the final answer. Mutually exclusive with ``schema``. language_model (LanguageModel): The language model that drives the agent loop. prompt_template (str): Forwarded to the tool-call generator. examples (list): Few-shot examples for the tool-call generator. instructions (str): Override the default system instructions. When omitted, the default is built from the knowledge base's tables so the LM knows what's available without an extra schema call. final_instructions (str): Instructions for the final-answer generator. Defaults to ``instructions``. temperature (float): LM sampling temperature. Defaults to 0.0 for deterministic SQL generation. use_inputs_schema (bool): Include the input schema in the prompt. use_outputs_schema (bool): Include the output schema in the prompt. reasoning_effort (str): Forwarded to the generators (for reasoning-capable LMs). use_chain_of_thought (bool): When ``True``, the tool-call generator emits a ``thinking`` field per round. autonomous (bool): When ``True`` (default), the agent runs the tool loop end-to-end. When ``False``, returns one step at a time for human-in-the-loop workflows. return_inputs_with_trajectory (bool): When ``True`` (default), the full message trajectory is included alongside the final answer. max_iterations (int): Maximum number of tool-call rounds. Defaults to 5. streaming (bool): Stream the final answer when no ``schema`` is set. Defaults to ``False``. name (str): Module name. description (str): Module description. 
""" def __init__( self, *, knowledge_base=None, k: int = 50, output_format: str = "csv", tools: Optional[List] = None, schema=None, data_model=None, language_model=None, prompt_template=None, examples=None, instructions: Optional[str] = None, final_instructions: Optional[str] = None, temperature: float = 0.0, use_inputs_schema: bool = False, use_outputs_schema: bool = False, reasoning_effort: Optional[str] = None, use_chain_of_thought: bool = False, autonomous: bool = True, return_inputs_with_trajectory: bool = True, max_iterations: int = 5, streaming: bool = False, name: Optional[str] = None, description: Optional[str] = None, ): super().__init__(name=name, description=description) if knowledge_base is None: raise ValueError("`knowledge_base` is required") self.knowledge_base = knowledge_base self.language_model = _get_lm(language_model) if not schema and data_model: schema = data_model.get_schema() self.schema = schema if output_format not in ("csv", "json"): raise ValueError( f"`output_format` must be 'csv' or 'json', got {output_format!r}" ) self.output_format = output_format if not isinstance(k, int) or k < 1: raise ValueError(f"`k` must be a positive integer, got {k!r}") self.k = k if instructions is None: tables = [ m.get_schema().get("title", "Unknown") for m in self.knowledge_base.get_symbolic_data_models() ] instructions = get_default_instructions(tables) self.instructions = instructions self.final_instructions = final_instructions self.prompt_template = prompt_template self.examples = examples self.temperature = temperature self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.reasoning_effort = reasoning_effort self.use_chain_of_thought = use_chain_of_thought self.autonomous = autonomous self.return_inputs_with_trajectory = return_inputs_with_trajectory self.max_iterations = max_iterations self.streaming = streaming builtin_tools = [ Tool(fn) for fn in _build_tools( self.knowledge_base, 
                output_format=self.output_format,
                k=self.k,
            )
        ]
        builtin_names = {t.name for t in builtin_tools}

        # Stash the user-supplied tools as-is for get_config round-trips;
        # the merged list goes to FunctionCallingAgent below.
        self.extra_tools = list(tools) if tools else []
        merged_tools = list(builtin_tools)
        for extra in self.extra_tools:
            extra_tool = extra if isinstance(extra, Tool) else Tool(extra)
            if extra_tool.name in builtin_names:
                raise ValueError(
                    f"Tool name {extra_tool.name!r} collides with a built-in "
                    f"SQL tool. Rename the additional tool."
                )
            merged_tools.append(extra_tool)
        # Leading-underscore check is centralized in FunctionCallingAgent.

        self.agent = FunctionCallingAgent(
            schema=self.schema,
            language_model=self.language_model,
            prompt_template=self.prompt_template,
            examples=self.examples,
            instructions=self.instructions,
            final_instructions=self.final_instructions,
            temperature=self.temperature,
            use_inputs_schema=self.use_inputs_schema,
            use_outputs_schema=self.use_outputs_schema,
            reasoning_effort=self.reasoning_effort,
            use_chain_of_thought=self.use_chain_of_thought,
            tools=merged_tools,
            autonomous=self.autonomous,
            return_inputs_with_trajectory=self.return_inputs_with_trajectory,
            max_iterations=self.max_iterations,
            streaming=self.streaming,
            name="agent_" + self.name,
        )

    async def call(self, inputs, training=False):
        return await self.agent(inputs, training=training)

    async def compute_output_spec(self, inputs, training=False):
        return await self.agent.compute_output_spec(inputs, training=training)

    def get_config(self):
        config = {
            "schema": self.schema,
            "k": self.k,
            "output_format": self.output_format,
            "prompt_template": self.prompt_template,
            "examples": self.examples,
            "instructions": self.instructions,
            "final_instructions": self.final_instructions,
            "temperature": self.temperature,
            "use_inputs_schema": self.use_inputs_schema,
            "use_outputs_schema": self.use_outputs_schema,
            "reasoning_effort": self.reasoning_effort,
            "use_chain_of_thought": self.use_chain_of_thought,
            "autonomous": self.autonomous,
            "return_inputs_with_trajectory": self.return_inputs_with_trajectory,
            "max_iterations": self.max_iterations,
            "streaming": self.streaming,
            "name": self.name,
            "description": self.description,
        }
        knowledge_base_config = {
            "knowledge_base": serialization_lib.serialize_synalinks_object(
                self.knowledge_base,
            )
        }
        language_model_config = {
            "language_model": serialization_lib.serialize_synalinks_object(
                self.language_model,
            )
        }
        tools_config = {
            "tools": [
                serialization_lib.serialize_synalinks_object(
                    t if isinstance(t, Tool) else Tool(t)
                )
                for t in self.extra_tools
            ]
        }
        return {
            **config,
            **knowledge_base_config,
            **language_model_config,
            **tools_config,
        }

    @classmethod
    def from_config(cls, config):
        knowledge_base = serialization_lib.deserialize_synalinks_object(
            config.pop("knowledge_base")
        )
        language_model = serialization_lib.deserialize_synalinks_object(
            config.pop("language_model")
        )
        tools = [
            serialization_lib.deserialize_synalinks_object(t)
            for t in config.pop("tools", [])
        ]
        return cls(
            knowledge_base=knowledge_base,
            language_model=language_model,
            tools=tools,
            **config,
        )
````

## `get_default_instructions(tables)`

Default instructions for the SQL agent.

Parameters:

| Name | Type | Description | Default |
| -------- | ----------- | ----------- | ---------- |
| `tables` | `List[str]` | The PascalCase names of tables available in the knowledge base. Embedded in the prompt so the LM doesn't have to call `get_database_schema` first for trivial lookups. | *required* |

Returns:

| Type | Description |
| ----- | ----------- |
| `str` | A prompt string giving the LM the tool-use plan and the SELECT-only safety constraint. |

Source code in `synalinks/src/modules/agents/sql_agent.py`

```
def get_default_instructions(tables: List[str]) -> str:
    """Default instructions for the SQL agent.

    Args:
        tables: The PascalCase names of tables available in the knowledge
            base. Embedded in the prompt so the LM doesn't have to call
            ``get_database_schema`` first for trivial lookups.

    Returns:
        A prompt string giving the LM the tool-use plan and the
        SELECT-only safety constraint.
    """
    return f"""
You are an SQL analyst with read-only access to a knowledge base.

Available tables: {tables}

Plan:
1. If you don't already know the schema, call `get_database_schema` first.
2. When you need to inspect representative values, call `get_table_sample`.
3. Build a single `SELECT` query and execute it with `run_sql_query`. Iterate
   on the query (read the error, fix the SQL, retry) until you have the data.
4. Once you have an answer, stop calling tools and produce the final response.

Constraints:
- Only `SELECT` statements are accepted. `INSERT`, `UPDATE`, `DELETE`, `DROP`,
  `ALTER`, `COPY ... TO`, and multi-statement queries are rejected by the
  engine — don't waste turns trying them.
- Table and column names are case-sensitive: tables are PascalCase (e.g.
  ``Customer``), columns are snake_case (e.g. ``customer_id``).
- Result sets are automatically capped server-side. If the result shows
  ``may_have_more=true``, refine the query (add filters or
  ``ORDER BY ... LIMIT n``) rather than asking for more rows.
""".strip()
```

## `VectorRAGAgent`

Bases: `Module`

A ready-to-use retrieval-augmented agent backed by a knowledge base.

VectorRAGAgent is a thin specialization of `FunctionCallingAgent` that pre-wires three retrieval tools bound to a `KnowledgeBase`:

- `get_knowledge_base_schema`: lists available tables and columns.
- `search_knowledge_base`: dispatches to similarity / fulltext / hybrid_fts depending on the configured `search_type`.
- `get_record_by_id`: full-record lookup after a search returns an id.
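The `hybrid_fts` mode is described in the parameter table as vector search plus BM25 fused with RRF (reciprocal rank fusion). How the knowledge base computes that fusion is not shown in this section, so the following is only a minimal sketch of textbook RRF over two ranked id lists. The function name, the sample ids, and the smoothing constant `c=60` (a commonly used default) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=5, c=60):
    """Fuse several ranked lists of ids; return the top-k fused ids.

    Each id scores sum(1 / (c + rank)) over the lists it appears in,
    with rank starting at 1. `c` is an assumed smoothing constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:k]


# Hypothetical results from the two retrieval modes for the same query.
vector_hits = ["d3", "d1", "d2"]    # similarity order
fulltext_hits = ["d1", "d4", "d3"]  # BM25 order

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits], k=3)
print(fused)  # ['d1', 'd3', 'd4']
```

Documents that rank well in both lists (`d1`, `d3`) rise above those found by only one mode, which is the usual motivation for a hybrid search over a single retrieval method.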
The constructor mirrors `FunctionCallingAgent`: every parameter on that class is accepted here with identical semantics. The only additions are `knowledge_base` (required), the retrieval knobs (`search_type`, `k`, `similarity_threshold`, `fulltext_threshold`), and `output_format`. User-supplied `tools` are appended to the three built-in retrieval tools.

Compared to a hardcoded RAG pipeline (always retrieve, then answer), the agent decides *if* retrieval is needed, *which* table to search, and *how* to phrase the query. Multiple searches per turn are allowed.

Example:

```
import synalinks
import asyncio

class Document(synalinks.DataModel):
    id: str = synalinks.Field(description="Document id")
    title: str = synalinks.Field(description="Title")
    content: str = synalinks.Field(description="Body text")

async def main():
    embedding_model = synalinks.EmbeddingModel(
        model="gemini/text-embedding-004",
    )
    kb = synalinks.KnowledgeBase(
        uri="duckdb://docs.db",
        data_models=[Document],
        embedding_model=embedding_model,
    )
    # ... populate kb ...
    lm = synalinks.LanguageModel(model="ollama/mistral")

    inputs = synalinks.Input(data_model=synalinks.ChatMessages)
    outputs = await synalinks.VectorRAGAgent(
        knowledge_base=kb,
        language_model=lm,
    )(inputs)
    agent = synalinks.Program(inputs=inputs, outputs=outputs)

    messages = synalinks.ChatMessages(messages=[
        synalinks.ChatMessage(role="user", content="What is the PTO policy?")
    ])
    result = await agent(messages)
    print(result.get("messages")[-1].get("content"))

asyncio.run(main())
```

Parameters:

| Name | Type | Description | Default |
| ------------------------------- | --------------- | ----------- | -------------- |
| `knowledge_base` | `KnowledgeBase` | The knowledge base to retrieve from. Required. | `None` |
| `search_type` | `str` | Retrieval mode for the `search_knowledge_base` tool. One of `"similarity"` (vector similarity over embeddings), `"fulltext"` (BM25 keyword search), or `"hybrid_fts"` (default; vector + BM25 fused with RRF). `"similarity"` and `"hybrid_fts"` require the knowledge base to have an embedding model configured. | `'hybrid_fts'` |
| `k` | `int` | Top-k for searches. Fixed per agent at construction time; the LM doesn't pass it. Defaults to 5. | `5` |
| `similarity_threshold` | `float` | Maximum vector distance for the similarity and hybrid modes. Optional. | `None` |
| `fulltext_threshold` | `float` | Minimum BM25 score for the fulltext and hybrid modes. Optional. | `None` |
| `output_format` | `str` | How search results are rendered to the LM. `"csv"` (default) is compact; `"json"` returns a list of dicts. | `'csv'` |
| `tools` | `list` | Additional `Tool` instances (or plain async functions) to expose alongside the three built-in retrieval tools. Tool names must not collide with the built-ins (`get_knowledge_base_schema`, `search_knowledge_base`, `get_record_by_id`) or a `ValueError` is raised. | `None` |
| `schema` | `dict` | JSON schema for the final answer. | `None` |
| `data_model` | `DataModel` | DataModel for the final answer. Mutually exclusive with `schema`. | `None` |
| `language_model` | `LanguageModel` | The language model that drives the agent loop. | `None` |
| `prompt_template` | `str` | Forwarded to the tool-call generator. | `None` |
| `examples` | `list` | Few-shot examples for the tool-call generator. | `None` |
| `instructions` | `str` | Override the default system instructions. When omitted, defaults are built from the knowledge base's tables and the configured `search_type`. | `None` |
| `final_instructions` | `str` | Instructions for the final-answer generator. Defaults to `instructions`. | `None` |
| `temperature` | `float` | LM sampling temperature. Defaults to 0.0. | `0.0` |
| `use_inputs_schema` | `bool` | Include the input schema in the prompt. | `False` |
| `use_outputs_schema` | `bool` | Include the output schema in the prompt. | `False` |
| `reasoning_effort` | `str` | Forwarded to the generators (for reasoning-capable LMs). | `None` |
| `use_chain_of_thought` | `bool` | When `True`, the tool-call generator emits a `thinking` field per round. | `False` |
| `autonomous` | `bool` | When `True` (default), the agent runs the tool loop end-to-end. When `False`, returns one step at a time for human-in-the-loop workflows. | `True` |
| `return_inputs_with_trajectory` | `bool` | When `True` (default), the full message trajectory is included alongside the final answer. | `True` |
| `max_iterations` | `int` | Maximum number of tool-call rounds. Defaults to 5. | `5` |
| `streaming` | `bool` | Stream the final answer when no `schema` is set. Defaults to `False`. | `False` |
| `name` | `str` | Module name. | `None` |
| `description` | `str` | Module description. | `None` |

Source code in `synalinks/src/modules/agents/vector_rag_agent.py`

````
@synalinks_export(
    [
        "synalinks.modules.VectorRAGAgent",
        "synalinks.VectorRAGAgent",
    ]
)
class VectorRAGAgent(Module):
    """A ready-to-use retrieval-augmented agent backed by a knowledge base.

    VectorRAGAgent is a thin specialization of :class:`FunctionCallingAgent`
    that pre-wires three retrieval tools bound to a :class:`KnowledgeBase`:

    - ``get_knowledge_base_schema``: lists available tables and columns.
    - ``search_knowledge_base``: dispatches to similarity / fulltext /
      hybrid_fts depending on the configured ``search_type``.
    - ``get_record_by_id``: full-record lookup after a search returns an id.

    The constructor mirrors :class:`FunctionCallingAgent` — every parameter
    on that class is accepted here with identical semantics. The only
    additions are ``knowledge_base`` (required), the retrieval knobs
    (``search_type``, ``k``, ``similarity_threshold``,
    ``fulltext_threshold``), and ``output_format``. User-supplied ``tools``
    are appended to the three built-in retrieval tools.

    Compared to a hardcoded RAG pipeline (always retrieve, then answer), the
    agent decides *if* retrieval is needed, *which* table to search, and
    *how* to phrase the query. Multiple searches per turn are allowed.

    Example:

    ```python
    import synalinks
    import asyncio

    class Document(synalinks.DataModel):
        id: str = synalinks.Field(description="Document id")
        title: str = synalinks.Field(description="Title")
        content: str = synalinks.Field(description="Body text")

    async def main():
        embedding_model = synalinks.EmbeddingModel(
            model="gemini/text-embedding-004",
        )
        kb = synalinks.KnowledgeBase(
            uri="duckdb://docs.db",
            data_models=[Document],
            embedding_model=embedding_model,
        )
        # ... populate kb ...
lm = synalinks.LanguageModel(model="ollama/mistral") inputs = synalinks.Input(data_model=synalinks.ChatMessages) outputs = await synalinks.VectorRAGAgent( knowledge_base=kb, language_model=lm, )(inputs) agent = synalinks.Program(inputs=inputs, outputs=outputs) messages = synalinks.ChatMessages(messages=[ synalinks.ChatMessage(role="user", content="What is the PTO policy?") ]) result = await agent(messages) print(result.get("messages")[-1].get("content")) asyncio.run(main()) ``` Args: knowledge_base (KnowledgeBase): The knowledge base to retrieve from. Required. search_type (str): Retrieval mode for the ``search_knowledge_base`` tool. One of: - ``"similarity"``: vector-similarity over embeddings. - ``"fulltext"``: BM25 keyword search. - ``"hybrid_fts"`` (default): vector + BM25 fused with RRF. Requires the knowledge base to have an embedding model configured for ``"similarity"`` and ``"hybrid_fts"``. k (int): Top-k for searches. Fixed per-agent at construction time — the LM doesn't pass it. Defaults to 5. similarity_threshold (float): Maximum vector distance for the similarity and hybrid modes. Optional. fulltext_threshold (float): Minimum BM25 score for the fulltext and hybrid modes. Optional. output_format (str): How search results are rendered to the LM. ``"csv"`` (default) is compact; ``"json"`` returns a list of dicts. tools (list): Additional :class:`Tool` instances (or plain async functions) to expose alongside the three built-in retrieval tools. Tool names must not collide with the built-ins (``get_knowledge_base_schema``, ``search_knowledge_base``, ``get_record_by_id``) or a ``ValueError`` is raised. schema (dict): JSON schema for the final answer. data_model (DataModel): DataModel for the final answer. Mutually exclusive with ``schema``. language_model (LanguageModel): The language model that drives the agent loop. prompt_template (str): Forwarded to the tool-call generator. examples (list): Few-shot examples for the tool-call generator. 
instructions (str): Override the default system instructions. When omitted, defaults are built from the knowledge base's tables and the configured ``search_type``. final_instructions (str): Instructions for the final-answer generator. Defaults to ``instructions``. temperature (float): LM sampling temperature. Defaults to 0.0. use_inputs_schema (bool): Include the input schema in the prompt. use_outputs_schema (bool): Include the output schema in the prompt. reasoning_effort (str): Forwarded to the generators (for reasoning-capable LMs). use_chain_of_thought (bool): When ``True``, the tool-call generator emits a ``thinking`` field per round. autonomous (bool): When ``True`` (default), the agent runs the tool loop end-to-end. When ``False``, returns one step at a time for human-in-the-loop workflows. return_inputs_with_trajectory (bool): When ``True`` (default), the full message trajectory is included alongside the final answer. max_iterations (int): Maximum number of tool-call rounds. Defaults to 5. streaming (bool): Stream the final answer when no ``schema`` is set. Defaults to ``False``. name (str): Module name. description (str): Module description. 
""" def __init__( self, *, knowledge_base=None, search_type: str = "hybrid_fts", k: int = 5, similarity_threshold: Optional[float] = None, fulltext_threshold: Optional[float] = None, output_format: str = "csv", tools: Optional[List] = None, schema=None, data_model=None, language_model=None, prompt_template=None, examples=None, instructions: Optional[str] = None, final_instructions: Optional[str] = None, temperature: float = 0.0, use_inputs_schema: bool = False, use_outputs_schema: bool = False, reasoning_effort: Optional[str] = None, use_chain_of_thought: bool = False, autonomous: bool = True, return_inputs_with_trajectory: bool = True, max_iterations: int = 5, streaming: bool = False, name: Optional[str] = None, description: Optional[str] = None, ): super().__init__(name=name, description=description) if knowledge_base is None: raise ValueError("`knowledge_base` is required") self.knowledge_base = knowledge_base self.language_model = _get_lm(language_model) if not schema and data_model: schema = data_model.get_schema() self.schema = schema if search_type not in SEARCH_TYPES: raise ValueError( f"`search_type` must be one of {SEARCH_TYPES}, got {search_type!r}" ) self.search_type = search_type if output_format not in ("csv", "json"): raise ValueError( f"`output_format` must be 'csv' or 'json', got {output_format!r}" ) self.output_format = output_format self.k = k self.similarity_threshold = similarity_threshold self.fulltext_threshold = fulltext_threshold if instructions is None: tables = [ m.get_schema().get("title", "Unknown") for m in self.knowledge_base.get_symbolic_data_models() ] instructions = get_default_instructions(tables, self.search_type) self.instructions = instructions self.final_instructions = final_instructions self.prompt_template = prompt_template self.examples = examples self.temperature = temperature self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.reasoning_effort = reasoning_effort 
self.use_chain_of_thought = use_chain_of_thought self.autonomous = autonomous self.return_inputs_with_trajectory = return_inputs_with_trajectory self.max_iterations = max_iterations self.streaming = streaming builtin_tools = [ Tool(fn) for fn in _build_tools( self.knowledge_base, search_type=self.search_type, k=self.k, similarity_threshold=self.similarity_threshold, fulltext_threshold=self.fulltext_threshold, output_format=self.output_format, ) ] builtin_names = {t.name for t in builtin_tools} self.extra_tools = list(tools) if tools else [] merged_tools = list(builtin_tools) for extra in self.extra_tools: extra_tool = extra if isinstance(extra, Tool) else Tool(extra) if extra_tool.name in builtin_names: raise ValueError( f"Tool name {extra_tool.name!r} collides with a built-in " f"retrieval tool. Rename the additional tool." ) merged_tools.append(extra_tool) # Leading-underscore check is centralized in FunctionCallingAgent. self.agent = FunctionCallingAgent( schema=self.schema, language_model=self.language_model, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, final_instructions=self.final_instructions, temperature=self.temperature, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, reasoning_effort=self.reasoning_effort, use_chain_of_thought=self.use_chain_of_thought, tools=merged_tools, autonomous=self.autonomous, return_inputs_with_trajectory=self.return_inputs_with_trajectory, max_iterations=self.max_iterations, streaming=self.streaming, name="agent_" + self.name, ) async def call(self, inputs, training=False): return await self.agent(inputs, training=training) async def compute_output_spec(self, inputs, training=False): return await self.agent.compute_output_spec(inputs, training=training) def get_config(self): config = { "schema": self.schema, "search_type": self.search_type, "k": self.k, "similarity_threshold": self.similarity_threshold, "fulltext_threshold": 
self.fulltext_threshold, "output_format": self.output_format, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "final_instructions": self.final_instructions, "temperature": self.temperature, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "reasoning_effort": self.reasoning_effort, "use_chain_of_thought": self.use_chain_of_thought, "autonomous": self.autonomous, "return_inputs_with_trajectory": self.return_inputs_with_trajectory, "max_iterations": self.max_iterations, "streaming": self.streaming, "name": self.name, "description": self.description, } knowledge_base_config = { "knowledge_base": serialization_lib.serialize_synalinks_object( self.knowledge_base, ) } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ) } tools_config = { "tools": [ serialization_lib.serialize_synalinks_object( t if isinstance(t, Tool) else Tool(t) ) for t in self.extra_tools ] } return { **config, **knowledge_base_config, **language_model_config, **tools_config, } @classmethod def from_config(cls, config): knowledge_base = serialization_lib.deserialize_synalinks_object( config.pop("knowledge_base") ) language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) tools = [ serialization_lib.deserialize_synalinks_object(t) for t in config.pop("tools", []) ] return cls( knowledge_base=knowledge_base, language_model=language_model, tools=tools, **config, ) ```` ## `get_default_instructions(tables, search_type)` Default system instructions for the RAG agent. Parameters: | Name | Type | Description | Default | | ------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | | `tables` | `List[str]` | PascalCase names of tables available for retrieval. 
Embedded in the prompt so the LM can pick a target table without a separate schema call. | *required* | | `search_type` | `str` | Which retrieval mode the agent is configured for. Shapes the guidance on how to phrase query arguments. | *required* | Returns: | Type | Description | | ----- | -------------------------------------------------------- | | `str` | A prompt string giving the LM the retrieval loop and the query-writing guidance for the configured search mode. | Source code in `synalinks/src/modules/agents/vector_rag_agent.py` ``` def get_default_instructions(tables: List[str], search_type: str) -> str: """Default system instructions for the RAG agent. Args: tables: PascalCase names of tables available for retrieval. Embedded in the prompt so the LM can pick a target table without a separate schema call. search_type: Which retrieval mode the agent is configured for. Shapes the guidance on how to phrase ``query`` arguments. Returns: A prompt string giving the LM the retrieval loop and the query-writing guidance for the configured search mode. """ if search_type == "similarity": retrieval_hint = ( "Use natural-language descriptions of what you need — the " "search is vector-similarity over embeddings, so paraphrase " "the user's intent rather than guessing keywords." ) elif search_type == "fulltext": retrieval_hint = ( "Use keyword-rich queries — the search is BM25 full-text, " "so the words you pick must appear in the documents." ) else: # hybrid_fts retrieval_hint = ( "Use natural-language queries that contain the keywords you " "expect to appear in matching documents — the search fuses " "vector similarity and BM25 with Reciprocal Rank Fusion, so " "both signals contribute." ) return f""" You are a retrieval-augmented assistant with access to a knowledge base. Available tables: {tables} Search mode: {search_type} Plan: 1. If you don't already know what's available, call `get_knowledge_base_schema`. 2.
Call `search_knowledge_base` with the table you want and a query. {retrieval_hint} 3. If a search result references an id you want to inspect in full, call `get_record_by_id`. 4. Once you have enough context, stop calling tools and answer. Constraints: - Only retrieve when the user's question actually needs grounded information. Trivial questions don't need a search. - Reformulate the user's question into focused queries; don't just pass the raw user text. - If a search returns nothing useful, retry with a different phrasing before giving up. """.strip() ``` ## `Action` Bases: `Module` Use a `LanguageModel` to perform a tool call given the input data model. This module uses structured output to call a given Tool. This module can be used in agents or traditional workflows seamlessly, it uses the input data model to infer the tool parameters. The output of this module contains the inputs inferred by the language model as well as the outputs of the tool call. Example: ``` import synalinks import asyncio async def main(): class Query(synalinks.DataModel): query: str @synalinks.saving.register_synalinks_serializable() async def calculate(expression: str): """Calculate the result of a mathematical expression. Args: expression (str): The mathematical expression to calculate, such as '2 + 2'. The expression can contain numbers, operators (+, -, *, /), parentheses, and spaces. """ if not all(char in "0123456789+-*/(). 
" for char in expression): return { "result": None, "log": "Error: invalid characters in expression", } try: # Evaluate the mathematical expression safely result = round(float(eval(expression, {"__builtins__": None}, {})), 2) return { "result": result, "log": "Successfully executed", } except Exception as e: return { "result": None, "log": f"Error: {e}", } language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.Action( tool=synalinks.Tool(calculate), language_model=language_model, )(x0) program = synalinks.Program( inputs=x0, outputs=x1, name="calculator", description="This program perform the calculation of an expression", ) if __name__ == "__main__": asyncio.run(main()) ``` Parameters: | Name | Type | Description | Default | | -------------------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | | `tool` | `Tool` | The Tool instance to call. | *required* | | `language_model` | `LanguageModel` | The language model to use. | `None` | | `prompt_template` | `str` | The default jinja2 prompt template to use (see Generator). | `None` | | `examples` | `list` | The default examples to use in the prompt (see Generator). | `None` | | `instructions` | `list` | The default instructions to use (see Generator). | `None` | | `seed_instructions` | `list` | Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. | `None` | | `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` | | `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). | `None` | | `use_inputs_schema` | `bool` | Optional. 
Whether or not use the inputs schema in the prompt (Default to False) (see Generator). | `False` | | `use_outputs_schema` | `bool` | Optional. Whether or not use the outputs schema in the prompt (Default to False) (see Generator). | `False` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | Source code in `synalinks/src/modules/core/action.py` ```` @synalinks_export( [ "synalinks.modules.Action", "synalinks.Action", ] ) class Action(Module): """Use a `LanguageModel` to perform a tool call given the input data model. This module uses structured output to call a given Tool. This module can be used in agents or traditional workflows seamlessly, it uses the input data model to infer the tool parameters. The output of this module contains the inputs inferred by the language model as well as the outputs of the tool call. Example: ```python import synalinks import asyncio async def main(): class Query(synalinks.DataModel): query: str @synalinks.saving.register_synalinks_serializable() async def calculate(expression: str): \"""Calculate the result of a mathematical expression. Args: expression (str): The mathematical expression to calculate, such as '2 + 2'. The expression can contain numbers, operators (+, -, *, /), parentheses, and spaces. \""" if not all(char in "0123456789+-*/(). 
" for char in expression): return { "result": None, "log": "Error: invalid characters in expression", } try: # Evaluate the mathematical expression safely result = round(float(eval(expression, {"__builtins__": None}, {})), 2) return { "result": result, "log": "Successfully executed", } except Exception as e: return { "result": None, "log": f"Error: {e}", } language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.Action( tool=synalinks.Tool(calculate), language_model=language_model, )(x0) program = synalinks.Program( inputs=x0, outputs=x1, name="calculator", description="This program perform the calculation of an expression", ) if __name__ == "__main__": asyncio.run(main()) ``` Args: tool (Tool): The Tool instance to call. language_model (LanguageModel): The language model to use. prompt_template (str): The default jinja2 prompt template to use (see `Generator`). examples (list): The default examples to use in the prompt (see `Generator`). instructions (list): The default instructions to use (see `Generator`). seed_instructions (list): Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. temperature (float): Optional. The temperature for the LM call. reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). use_inputs_schema (bool): Optional. Whether or not use the inputs schema in the prompt (Default to False) (see `Generator`). use_outputs_schema (bool): Optional. Whether or not use the outputs schema in the prompt (Default to False) (see `Generator`). name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. 
""" def __init__( self, *, tool, language_model=None, prompt_template=None, examples=None, instructions=None, seed_instructions=None, temperature=0.0, reasoning_effort=None, use_inputs_schema=False, use_outputs_schema=False, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) self.tool = tool schema = self.tool.get_input_schema() self.language_model = _get_lm(language_model) self.prompt_template = prompt_template self.examples = examples self.instructions = instructions self.seed_instructions = seed_instructions self.temperature = temperature self.reasoning_effort = reasoning_effort self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.action = Generator( schema=schema, language_model=self.language_model, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, seed_instructions=self.seed_instructions, temperature=self.temperature, reasoning_effort=self.reasoning_effort, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, name="generator_" + self.name, ) async def call(self, inputs, training=False): if not inputs: return None tool_inputs = await self.action(inputs, training=training) try: tool_result = await self.tool(**tool_inputs.get_json()) tool_outputs = tool_result.get_json() if tool_result else {} except Exception as e: tool_outputs = {"error": str(e)} generic_io = GenericIO(inputs=tool_inputs.get_json(), outputs=tool_outputs) return JsonDataModel( json=GenericAction(action=generic_io.get_json()).get_json(), schema=GenericAction.get_schema(), name=self.name, ) async def compute_output_spec(self, inputs, training=False): _ = await self.action(inputs) return SymbolicDataModel(schema=GenericAction.get_schema(), name=self.name) def get_config(self): config = { "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "seed_instructions": 
self.seed_instructions, "temperature": self.temperature, "reasoning_effort": self.reasoning_effort, "name": self.name, "description": self.description, "trainable": self.trainable, } tool_config = {"tool": serialization_lib.serialize_synalinks_object(self.tool)} language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model ) } return {**config, **tool_config, **language_model_config} @classmethod def from_config(cls, config): tool = serialization_lib.deserialize_synalinks_object(config.pop("tool")) language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) return cls(tool=tool, language_model=language_model, **config) ```` ## `GenericAction` Bases: `DataModel` A generic action with inputs/outputs Source code in `synalinks/src/modules/core/action.py` ``` class GenericAction(DataModel): """A generic action with inputs/outputs""" action: GenericIO = Field(description="An action already performed") ``` ## `Branch` Bases: `Module` Use a `LanguageModel` to select which module(s) to call based on an arbitrary input, a question and a list of labels. The selected branch(es) output the data model computed using the inputs and module's branch, while the others output `None`. The output is always a tuple of length `len(branches)` so each label has a fixed positional slot regardless of which one was selected. The behaviour of the selector depends on `decision_type`: - `decision_type=Decision` (default) — exactly **one** branch is selected per call. All other slots are `None`. - `decision_type=MultiDecision` — **one or more** branches are selected per call. Non-selected slots remain `None`. Use this for multi-label routing where several branches may need to fire at once (e.g., an article that spans both `science` and `finance`, or a query that should be answered by both a retrieval and a tool-using sub-program). 
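The fixed-slot behaviour described above can be illustrated without any LM call. The following is a minimal plain-Python sketch, not the Synalinks implementation: `route_to_slots` is a hypothetical helper introduced only for illustration, and label strings stand in for the data models that real branches would return.

```python
from typing import List, Optional, Sequence, Tuple, Union


def route_to_slots(
    labels: Sequence[str],
    choice: Union[str, List[str], None],
) -> Tuple[Optional[str], ...]:
    """Return one slot per label: the label itself if selected, else None.

    Hypothetical helper: a single string models a `Decision`-style choice,
    a list of strings models a `MultiDecision`-style choice.
    """
    if not choice:
        # No decision available: every slot stays None.
        return tuple(None for _ in labels)
    selected = {choice} if isinstance(choice, str) else set(choice)
    # The output length always equals len(labels), whatever was selected.
    return tuple(label if label in selected else None for label in labels)


# Single-label routing: exactly one slot is populated.
single = route_to_slots(["easy", "difficult"], "difficult")

# Multi-label routing: several slots may be populated at once.
multi = route_to_slots(["science", "finance", "sports"], ["science", "finance"])

print(single)
print(multi)
```

Because each label keeps its position, downstream code can unpack the tuple by position and does not need to inspect which label was chosen.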
Single-label example (one branch active per call): ``` import synalinks import asyncio async def main(): class Query(synalinks.DataModel): query: str class Answer(synalinks.DataModel): answer: str class AnswerWithCritique(synalinks.DataModel): thinking: str critique: str answer: str language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) (x1, x2) = await synalinks.Branch( question="What is the difficulty level of the above query?", labels=["easy", "difficult"], branches=[ synalinks.Generator( data_model=Answer, language_model=language_model, ), synalinks.Generator( data_model=AnswerWithCritique, language_model=language_model, ), ], language_model=language_model, )(x0) x3 = x1 | x2 program = synalinks.Program( inputs=x0, outputs=x3, name="adaptative_chain_of_thought", description="Useful to answer step by step only when needed", ) if __name__ == "__main__": asyncio.run(main()) ``` Multi-label example (zero, one, or several branches active per call): ``` import synalinks import asyncio async def main(): class Article(synalinks.DataModel): text: str class ScienceSummary(synalinks.DataModel): thinking: str science_summary: str class FinanceSummary(synalinks.DataModel): thinking: str finance_summary: str class SportsSummary(synalinks.DataModel): thinking: str sports_summary: str language_model = synalinks.LanguageModel(model="ollama/mistral") x0 = synalinks.Input(data_model=Article) # Each label has a fixed slot in the output tuple. With # MultiDecision, several may be populated at once; the rest # are None. 
(sci, fin, spo) = await synalinks.Branch( question="Which topics does this article cover?", labels=["science", "finance", "sports"], branches=[ synalinks.Generator( data_model=ScienceSummary, language_model=language_model, ), synalinks.Generator( data_model=FinanceSummary, language_model=language_model, ), synalinks.Generator( data_model=SportsSummary, language_model=language_model, ), ], decision_type=synalinks.MultiDecision, language_model=language_model, )(x0) if __name__ == "__main__": asyncio.run(main()) ``` For a biotech-startup article the result might have the first two slots populated and the third set to `None`: `science` and `finance` are both active, while `sports` stays `None`. The slots, including any that are `None`, can be combined downstream with `|` (logical OR) the same way as in the single-label example. Parameters: | Name | Type | Description | Default | | -------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | | `question` | `str` | The question to ask. | `None` | | `labels` | `list` | The list of labels to choose from (strings). | `None` | | `branches` | `list` | The list of modules or programs to select from. | `None` | | `inject_decision` | `bool` | If True, inject the decision into the branch inputs (defaults to True). | `True` | | `return_decision` | `bool` | If True, return the decision with the branch outputs (defaults to True). | `True` | | `language_model` | `LanguageModel` | The language model to use. | `None` | | `prompt_template` | `str` | The default jinja2 prompt template to use (see Generator). | `None` | | `examples` | `list` | The default examples to use in the prompt (see Decision). | `None` | | `instructions` | `list` | The default instructions to use (see Decision). | `None` | | `seed_instructions` | `list` | Optional. A list of instructions to use as seed for the optimization.
If not provided, use the default instructions as seed. | `None` | | `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` | | `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). | `None` | | `use_inputs_schema` | `bool` | Optional. Whether or not use the inputs schema in the decision prompt (Default to False) (see Decision). | `False` | | `use_outputs_schema` | `bool` | Optional. Whether or not use the outputs schema in the decision prompt (Default to False) (see Decision). | `False` | | `decision_type` | `type` | Optional. The decision module class. Defaults to Decision (single-label, exactly one branch active). Pass MultiDecision to enable multi-label routing where several branches may be active simultaneously. | `Decision` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | Source code in `synalinks/src/modules/core/branch.py` ```` @synalinks_export(["synalinks.modules.Branch", "synalinks.Branch"]) class Branch(Module): """Use a `LanguageModel` to select which module(s) to call based on an arbitrary input, a question and a list of labels. The selected branch(es) output the data model computed using the inputs and module's branch, while the others output `None`. The output is always a tuple of length `len(branches)` so each label has a fixed positional slot regardless of which one was selected. The behaviour of the selector depends on `decision_type`: - `decision_type=Decision` (default) — exactly **one** branch is selected per call. All other slots are `None`. - `decision_type=MultiDecision` — **one or more** branches are selected per call. Non-selected slots remain `None`. 
Use this for multi-label routing where several branches may need to fire at once (e.g., an article that spans both `science` and `finance`, or a query that should be answered by both a retrieval and a tool-using sub-program). Single-label example (one branch active per call): ```python import synalinks import asyncio async def main(): class Query(synalinks.DataModel): query: str class Answer(synalinks.DataModel): answer: str class AnswerWithCritique(synalinks.DataModel): thinking: str critique: str answer: str language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) (x1, x2) = await synalinks.Branch( question="What is the difficulty level of the above query?", labels=["easy", "difficult"], branches=[ synalinks.Generator( data_model=Answer, language_model=language_model, ), synalinks.Generator( data_model=AnswerWithCritique, language_model=language_model, ), ], language_model=language_model, )(x0) x3 = x1 | x2 program = synalinks.Program( inputs=x0, outputs=x3, name="adaptative_chain_of_thought", description="Useful to answer step by step only when needed", ) if __name__ == "__main__": asyncio.run(main()) ``` Multi-label example (zero, one, or several branches active per call): ```python import synalinks import asyncio async def main(): class Article(synalinks.DataModel): text: str class ScienceSummary(synalinks.DataModel): thinking: str science_summary: str class FinanceSummary(synalinks.DataModel): thinking: str finance_summary: str class SportsSummary(synalinks.DataModel): thinking: str sports_summary: str language_model = synalinks.LanguageModel(model="ollama/mistral") x0 = synalinks.Input(data_model=Article) # Each label has a fixed slot in the output tuple. With # MultiDecision, several may be populated at once; the rest # are None. 
(sci, fin, spo) = await synalinks.Branch( question="Which topics does this article cover?", labels=["science", "finance", "sports"], branches=[ synalinks.Generator( data_model=ScienceSummary, language_model=language_model, ), synalinks.Generator( data_model=FinanceSummary, language_model=language_model, ), synalinks.Generator( data_model=SportsSummary, language_model=language_model, ), ], decision_type=synalinks.MultiDecision, language_model=language_model, )(x0) if __name__ == "__main__": asyncio.run(main()) ``` For a biotech-startup article the result might have the first two slots populated and the third set to `None`: `science` and `finance` are both active, while `sports` stays `None`. The slots, including any that are `None`, can be combined downstream with `|` (logical OR) the same way as in the single-label example. Args: question (str): The question to ask. labels (list): The list of labels to choose from (strings). branches (list): The list of modules or programs to select from. inject_decision (bool): If True, inject the decision to the branch inputs. (default to True). return_decision (bool): If True, return the decision with the branch outputs. (default to True). language_model (LanguageModel): The language model to use. prompt_template (str): The default jinja2 prompt template to use (see `Generator`). examples (list): The default examples to use in the prompt (see `Decision`). instructions (list): The default instructions to use (see `Decision`). seed_instructions (list): Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. temperature (float): Optional. The temperature for the LM call. reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). use_inputs_schema (bool): Optional. Whether or not use the inputs schema in the decision prompt (Default to False) (see `Decision`). use_outputs_schema (bool): Optional.
Whether or not use the outputs schema in the decision prompt (Default to False) (see `Decision`). decision_type (type): Optional. The decision module class. Defaults to `Decision` (single-label, exactly one branch active). Pass `MultiDecision` to enable multi-label routing where several branches may be active simultaneously. name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, question=None, labels=None, branches=None, inject_decision=True, return_decision=True, language_model=None, prompt_template=None, examples=None, instructions=None, seed_instructions=None, temperature=0.0, reasoning_effort=None, use_inputs_schema=False, use_outputs_schema=False, decision_type=Decision, name=None, description=None, trainable=True, **kwargs, ): super().__init__( name=name, description=description, trainable=trainable, ) if not branches: raise ValueError("The `branches` argument must be provided.") if not isinstance(branches, list): raise ValueError("The `branches` must be a list of `Module` or `Program`.") if len(labels) != len(branches): raise ValueError("The `labels` and `branches` must have the same length.") self.question = question self.labels = labels self.branches = {labels[i]: m for i, m in enumerate(branches)} self.inject_decision = inject_decision self.return_decision = return_decision self.language_model = _get_lm(language_model) self.prompt_template = prompt_template self.examples = examples self.instructions = instructions self.seed_instructions = seed_instructions self.temperature = temperature self.reasoning_effort = reasoning_effort self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.decision = decision_type( question=self.question, labels=self.labels, language_model=self.language_model, prompt_template=self.prompt_template, examples=self.examples, 
instructions=self.instructions, seed_instructions=self.seed_instructions, temperature=self.temperature, reasoning_effort=self.reasoning_effort, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, name="decision_" + self.name, ) async def call(self, inputs, training=False): outputs = [None] * len(self.branches) if not inputs: return tuple(outputs) decision = await self.decision( inputs, training=training, ) if not decision: return tuple(outputs) choice = decision.get("choice", decision.get("choices")) if not choice: return tuple(outputs) if self.inject_decision: inputs = await ops.concat( inputs, decision, name="inputs_with_decision_" + self.name, ) tasks = [] async def execute_branch( inputs, module=None, decision=None, return_decision=False ): if not inputs: return None if return_decision: return await ops.logical_and( decision, await module(inputs), ) else: return await module(inputs) for label in self.labels: module = self.branches[label] selected = False if isinstance(choice, str): if label == choice: selected = True elif isinstance(choice, (list, set)): if label in choice: selected = True if selected and module: tasks.append( execute_branch( inputs, module, decision, return_decision=self.return_decision, ) ) else: tasks.append(execute_branch(None)) outputs = await asyncio.gather(*tasks) return tuple(outputs) async def compute_output_spec(self, inputs, training=False): outputs = [] decision = await self.decision( inputs, training=training, ) if self.inject_decision: inputs = await ops.concat( inputs, decision, name="inputs_with_decision_" + self.name, ) for label in self.labels: module = self.branches[label] if self.return_decision: outputs.append( await ops.logical_and( decision, await module( inputs, training=training, ), name="with_decision_" + self.name, ) ) else: outputs.append( await module( inputs, training=training, ) ) return tuple(outputs) def get_config(self): config = { "question": self.question, "labels": 
self.labels, "inject_decision": self.inject_decision, "return_decision": self.return_decision, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "seed_instructions": self.seed_instructions, "temperature": self.temperature, "reasoning_effort": self.reasoning_effort, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "name": self.name, "description": self.description, "trainable": self.trainable, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model ) } branches_config = { "branches": [ serialization_lib.serialize_synalinks_object(branch) for branch in self.branches.values() ] } return {**config, **language_model_config, **branches_config} @classmethod def from_config(cls, config, custom_objects=None): language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) branches = [ serialization_lib.deserialize_synalinks_object( branch_config, custom_objects=custom_objects ) for branch_config in config.pop("branches") ] return cls(language_model=language_model, branches=branches, **config)
````

## `Decision`

Bases: `Module`

Perform a decision on the given input based on a question and a list of labels.

This module dynamically creates an `Enum` schema based on the given labels and uses it to generate an answer using structured output. This ensures that the LM answer is **always** one of the provided labels.
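To make the enum constraint concrete, here is a minimal, framework-free sketch of a JSON schema whose `choice` field only accepts the given labels. The helper names `build_choice_schema` and `is_valid_choice` are hypothetical, and the `thinking` field is an assumption about the shape of the answer; this is not the library's `dynamic_enum` implementation.

```python
import json


def build_choice_schema(labels):
    """Build a JSON schema whose `choice` field only accepts the given labels.

    Hypothetical helper for illustration; not part of Synalinks.
    """
    if not labels or not isinstance(labels, list):
        raise ValueError("`labels` must be a non-empty list of strings.")
    return {
        "type": "object",
        "properties": {
            # Assumed free-text reasoning field preceding the choice.
            "thinking": {"type": "string"},
            # The enum restricts the answer to the provided labels.
            "choice": {"type": "string", "enum": list(labels)},
        },
        "required": ["thinking", "choice"],
        "additionalProperties": False,
    }


def is_valid_choice(schema, answer):
    """Check that an answer picks one of the labels allowed by the schema."""
    allowed = schema["properties"]["choice"]["enum"]
    return answer.get("choice") in allowed


schema = build_choice_schema(["low", "medium", "high"])
print(json.dumps(schema["properties"]["choice"]))
print(is_valid_choice(schema, {"thinking": "...", "choice": "high"}))
print(is_valid_choice(schema, {"thinking": "...", "choice": "extreme"}))
```

With constrained structured output, a provider that honors such a schema cannot return a label outside the list, which is what makes the decision safe to route on.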
Example:

```
import synalinks
import asyncio

async def main():
    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    x0 = synalinks.Input(data_model=synalinks.ChatMessages)
    x1 = await synalinks.Decision(
        question="What is the danger level of the discussion?",
        labels=["low", "medium", "high"],
        language_model=language_model,
    )(x0)

    program = synalinks.Program(
        inputs=x0,
        outputs=x1,
        name="discussion_danger_assessment",
        description="This program assesses the level of danger in a discussion.",
    )

if __name__ == "__main__":
    asyncio.run(main())
```

You can view this module as performing single-label classification on the input.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `question` | `str` | The question to ask. | `None` |
| `labels` | `list` | The list of labels to choose from (strings). | `None` |
| `language_model` | `LanguageModel` | The language model to use. | `None` |
| `prompt_template` | `str` | The default jinja2 prompt template to use (see Generator). | `None` |
| `examples` | `list` | The default examples to use in the prompt (see Generator). | `None` |
| `instructions` | `list` | The default instructions to use (see Generator). | `None` |
| `seed_instructions` | `list` | Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. | `None` |
| `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` |
| `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call, one of ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Defaults to None (no reasoning). | `None` |
| `use_inputs_schema` | `bool` | Optional. Whether or not to use the inputs schema in the prompt (defaults to False) (see Generator).
| `False` | | `use_outputs_schema` | `bool` | Optional. Whether or not use the outputs schema in the prompt (Default to False) (see Generator). | `False` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | Source code in `synalinks/src/modules/core/decision.py` ```` @synalinks_export(["synalinks.modules.Decision", "synalinks.Decision"]) class Decision(Module): """Perform a decision on the given input based on a question and a list of labels. This module dynamically create an `Enum` schema based on the given labels and use it to generate a possible answer using structured output. This ensure that the LM answer is **always** one of the provided labels. Example: ```python import synalinks import asyncio async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=synalinks.ChatMessages) x1 = await synalinks.Decision( question="What is the danger level of the discussion?", labels=["low", "medium", "high"], language_model=language_model, )(x0) program = synalinks.Program( inputs=x0, outputs=x1, name="discussion_danger_assessment", description="This program assesses the level of danger in a discussion.", ) if __name__ == "__main__": asyncio.run(main()) ``` You can view this module, as performing a single label classification on the input. Args: question (str): The question to ask. labels (list): The list of labels to choose from (strings). language_model (LanguageModel): The language model to use. prompt_template (str): The default jinja2 prompt template to use (see `Generator`). examples (list): The default examples to use in the prompt (see `Generator`). instructions (list): The default instructions to use (see `Generator`). seed_instructions (list): Optional. A list of instructions to use as seed for the optimization. 
If not provided, use the default instructions as seed. temperature (float): Optional. The temperature for the LM call. reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). use_inputs_schema (bool): Optional. Whether or not use the inputs schema in the prompt (Default to False) (see `Generator`). use_outputs_schema (bool): Optional. Whether or not use the outputs schema in the prompt (Default to False) (see `Generator`). name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, question=None, labels=None, language_model=None, prompt_template=None, examples=None, instructions=None, seed_instructions=None, temperature=0.0, reasoning_effort=None, use_inputs_schema=False, use_outputs_schema=False, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) if not question: raise ValueError("The `question` argument must be provided.") if not labels: raise ValueError("The `labels` argument must be provided.") if not isinstance(labels, list): raise ValueError("The `labels` parameter must be a list of string.") schema = dynamic_enum(DecisionAnswer.get_schema(), "choice", labels) self.schema = schema self.question = question self.labels = labels self.language_model = _get_lm(language_model) self.prompt_template = prompt_template self.examples = examples if not instructions: instructions = default_decision_instructions(self.labels) self.instructions = instructions self.temperature = temperature self.reasoning_effort = reasoning_effort self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.decision = Generator( schema=self.schema, language_model=self.language_model, prompt_template=self.prompt_template, 
examples=self.examples, instructions=self.instructions, temperature=self.temperature, reasoning_effort=self.reasoning_effort, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, name="generator_" + self.name, ) async def call(self, inputs, training=False): if not inputs: return None inputs = await ops.concat( inputs, Question(question=self.question), name="inputs_with_question_" + self.name, ) result = await self.decision(inputs, training=training) return result def get_config(self): config = { "question": self.question, "labels": self.labels, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "seed_instructions": self.seed_instructions, "temperature": self.temperature, "reasoning_effort": self.reasoning_effort, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "name": self.name, "description": self.description, "trainable": self.trainable, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model ) } return {**config, **language_model_config} @classmethod def from_config(cls, config): language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) return cls(language_model=language_model, **config) ```` ## `default_decision_instructions(labels)` The decision default instructions Source code in `synalinks/src/modules/core/decision.py` ``` def default_decision_instructions(labels): """The decision default instructions""" return f""" You will be given a question, your task is to answer step-by-step to choose one the following labels: {labels} """.strip() ``` ## `Generator` Bases: `Module` Use a `LanguageModel` to generate a data model from an arbitrary input data model. 
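As a rough illustration of what reaches the language model, the sketch below builds a system message from instructions and examples, and a user message from the serialized inputs. It loosely mirrors the `format_messages` method shown in the source listing further down, but it is an assumption-laden simplification: it uses plain string building instead of the jinja2 template, plain dicts instead of data models, and the function name and arguments here are illustrative only.

```python
import json


def format_messages(instructions, examples, inputs):
    """Build a [system, user] message pair from instructions, examples and inputs.

    Simplified stand-in for illustration; not the library's implementation.
    """
    lines = ["# Instructions", instructions]
    if examples:
        lines.append("# Examples")
        # Each example is an (input, output) pair of JSON-serializable dicts.
        for example_input, example_output in examples:
            lines += [
                "## Input:",
                json.dumps(example_input),
                "## Output:",
                json.dumps(example_output),
            ]
    system_message = {"role": "system", "content": "\n".join(lines)}
    user_message = {
        "role": "user",
        "content": f"## Input:\n{json.dumps(inputs)}\n## Output:\n",
    }
    return [system_message, user_message]


messages = format_messages(
    instructions="Answer the query.",
    examples=[({"query": "2 + 2?"}, {"answer": "4"})],
    inputs={"query": "What is the capital of France?"},
)
for message in messages:
    print(message["role"], "->", message["content"])
```

Keeping instructions and examples in the system message is what lets an optimizer rewrite them between runs without touching the user inputs.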
Example:

```
import synalinks
import asyncio

async def main():

    class Query(synalinks.DataModel):
        query: str = synalinks.Field(
            description="The user query",
        )

    class AnswerWithCritique(synalinks.DataModel):
        thinking: str = synalinks.Field(
            description="Your step by step thinking",
        )
        critique: str = synalinks.Field(
            description="The critique of the above thinking",
        )
        answer: str = synalinks.Field(
            description="The correct answer",
        )

    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    x0 = synalinks.Input(data_model=Query)
    x1 = await synalinks.Generator(
        data_model=AnswerWithCritique,
        language_model=language_model,
    )(x0)

    program = synalinks.Program(
        inputs=x0,
        outputs=x1,
        name="chain_of_thought_with_critique",
        description="Useful to answer step by step and evaluate your answer",
    )

if __name__ == "__main__":
    asyncio.run(main())
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `schema` | `dict` | The target JSON schema. If not provided, it is inferred from the data_model. | `None` |
| `data_model` | `DataModel \| SymbolicDataModel \| JsonDataModel` | The target data model for structured output. | `None` |
| `language_model` | `LanguageModel` | The language model to use. | `None` |
| `prompt_template` | `str` | The jinja2 prompt template. | `None` |
| `examples` | `list` | The default list of examples, given as a list of tuples containing input/output JSON pairs. | `None` |
| `instructions` | `str` | The default instructions, given as a string containing instructions for the language model. | `None` |
| `seed_instructions` | `list` | Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. | `None` |
| `use_inputs_schema` | `bool` | Optional. Whether or not to use the inputs schema in the prompt (defaults to False).
| `False` |
| `use_outputs_schema` | `bool` | Optional. Whether or not to use the outputs schema in the prompt (defaults to False). | `False` |
| `return_inputs` | `bool` | Optional. Whether or not to concatenate the inputs to the outputs (defaults to False). | `False` |
| `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` |
| `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call, one of ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Defaults to None (no reasoning). | `None` |
| `streaming` | `bool` | Optional. If True, stream the LM response. Enabled only if schema is None and only during inference (not during training). | `False` |
| `name` | `str` | Optional. The name of the module. | `None` |
| `description` | `str` | Optional. The description of the module. | `None` |
| `trainable` | `bool` | Whether the module's variables should be trainable. | `True` |

Source code in `synalinks/src/modules/core/generator.py`

````
@synalinks_export(["synalinks.modules.Generator", "synalinks.Generator"])
class Generator(Module):
    """
    Use a `LanguageModel` to generate a data model from an arbitrary input data model.
Example: ```python import synalinks import asyncio async def main(): class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class AnswerWithCritique(synalinks.DataModel): thinking: str = synalinks.Field( description="Your step by step thinking", ) critique: str = synalinks.Field( description="The critique of the above thinking", ) answer: str = synalinks.Field( description="The correct answer", ) language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.Generator( data_model=AnswerWithCritique, language_model=language_model, )(x0) program = synalinks.Program( inputs=x0, outputs=x1, name="chain_of_thought_with_critique", description="Useful to answer step by step and evaluate your answer", ) if __name__ == "__main__": asyncio.run(main()) ``` Args: schema (dict): The target JSON schema. If not provided use the `data_model` to infer it. data_model (DataModel | SymbolicDataModel | JsonDataModel): The target data model for structured output. language_model (LanguageModel): The language model to use. prompt_template (str): The jinja2 prompt template. examples (list): The default list of examples, the examples are a list of tuples containing input/output JSON pairs. instructions (str): The default instructions being a string containing instructions for the language model. seed_instructions (list): Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. use_inputs_schema (bool): Optional. Whether or not to use the inputs schema in the prompt (Default to False). use_outputs_schema (bool): Optional. Whether or not to use the outputs schema in the prompt (Default to False). return_inputs (bool): Optional. Whether or not to concatenate the inputs to the outputs (Default to False). temperature (float): Optional. The temperature for the LM call. reasoning_effort (string): Optional.
The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). streaming (str): Optional. If true stream the LM response, enabled only if `schema` is `None` and only during inference (not during training). name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, schema=None, data_model=None, language_model=None, prompt_template=None, examples=None, instructions=None, seed_instructions=None, use_inputs_schema=False, use_outputs_schema=False, return_inputs=False, temperature=0.0, reasoning_effort=None, streaming=False, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) if not schema and data_model: schema = data_model.get_schema() self.schema = schema # `language_model` may be None; `ops.predict` resolves the default # at call time (or raises if none is set). 
self.language_model = _get_lm(language_model) if not prompt_template: prompt_template = default_prompt_template() self.prompt_template = prompt_template if not examples: examples = [] self.examples = examples if not instructions and self.schema: data_model_keys = list(self.schema["properties"].keys()) instructions = default_instructions(data_model_keys) self.instructions = instructions self.return_inputs = return_inputs self.temperature = temperature efforts = ["minimal", "low", "medium", "high", "disable", "none", None] if reasoning_effort not in efforts: raise ValueError( f"The reasoning effort parameter should be one of: {efforts}" ) self.reasoning_effort = reasoning_effort self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema if schema and streaming: streaming = False self.streaming = streaming predictions = [ Prediction( inputs=example[0], outputs=example[1], reward=None, ).get_json() for example in examples ] if not seed_instructions: seed_instructions = [] self.seed_instructions = seed_instructions seed_candidates = [ { "instructions": seed_instruction, } for seed_instruction in self.seed_instructions ] self.state = self.add_variable( initializer=Instructions( instructions=instructions, examples=predictions, seed_candidates=seed_candidates, ).get_json(), data_model=Instructions, name="state_" + self.name, ) async def call(self, inputs, training=False): if not inputs: return None msgs = self.format_messages(inputs) if self.streaming and not training: streaming = True else: streaming = False result = await ops.predict( msgs, schema=self.schema, language_model=self.language_model, streaming=streaming, name="prediction_" + self.name, temperature=self.temperature, reasoning_effort=self.reasoning_effort, ) if streaming: return result if result: if training: predictions = self.state.get("current_predictions") predictions.append( { "inputs": inputs.get_json(), "outputs": result.get_json(), "reward": None, } ) if self.return_inputs: 
return await ops.concat( inputs, result, name="with_inputs_" + self.name, ) else: return result return None async def compute_output_spec(self, inputs, training=False): if self.schema: if self.return_inputs: return await ops.concat( inputs, SymbolicDataModel( schema=self.schema, name=self.name, ), name="with_inputs_" + self.name, ) else: return SymbolicDataModel( schema=self.schema, name=self.name, ) else: if self.return_inputs: return await ops.concat( inputs, SymbolicDataModel( schema=ChatMessage.get_schema(), name=self.name, ), name="with_inputs_" + self.name, ) else: return SymbolicDataModel( schema=ChatMessage.get_schema(), name=self.name, ) def format_messages(self, inputs=None): template = jinja2.Template(self.prompt_template) rendered_prompt = template.render( inputs_schema=inputs.get_schema() if self.use_inputs_schema else None, outputs_schema=self.schema if self.use_outputs_schema else None, examples=[ (pred.get("inputs"), pred.get("outputs")) for pred in self.state.get("examples") ], instructions=self.state.get("instructions"), ) system_message = ChatMessage(role="system", content=rendered_prompt) user_message = ChatMessage( role="user", content=f"## Input:\n{inputs.get_json()}\n##Output:\n" ) msgs = ChatMessages(messages=[system_message, user_message]) return msgs def get_config(self): config = { "schema": self.schema, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "seed_instructions": self.seed_instructions, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "return_inputs": self.return_inputs, "temperature": self.temperature, "reasoning_effort": self.reasoning_effort, "name": self.name, "description": self.description, "trainable": self.trainable, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ) } return { **config, **language_model_config, } @classmethod def from_config(cls, config): 
language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model"), ) return cls( language_model=language_model, **config, )
````

## `default_prompt_template()`

Returns the default prompt template.

Returns:

| Type | Description |
| --- | --- |
| `str` | The default prompt template. |

Source code in `synalinks/src/modules/core/generator.py`

```
@synalinks_export("synalinks.default_prompt_template")
def default_prompt_template():
    """Returns the default prompt template.

    Returns:
        (str): The default prompt template.
    """
    return """
# Instructions
{{ instructions }}
{% if inputs_schema %}
# Input Schema
{{ inputs_schema }}
{% endif %}{% if outputs_schema %}
# Output schema
{{ outputs_schema }}
{% endif %}{% if examples %}
# Examples
{% for example in examples %}
## Input:
{{ example[0] }}
## Output:
{{ example[1] }}
{% endfor %}
{% endif %}
""".strip()
```

## `Identity`

Bases: `Module`

Identity module.

This module should be used as a placeholder when no operation is to be performed. The module just returns its `inputs` argument as output.

This module can be really useful during the development process, when you want to lay out the whole program architecture before implementing the individual modules. It avoids data model naming issues by simply forwarding the inputs, so you can implement the general program architecture, validate it, and implement the individual modules later.
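The placeholder idea can be sketched without the framework: a stage that forwards an independent copy of its inputs lets a pipeline be wired and validated before any real stage exists. The names `identity_stage` and `run_pipeline` are hypothetical, and plain dicts stand in for data models; the only detail taken from the source listing is that the module returns a clone of its inputs.

```python
import copy


def identity_stage(inputs):
    """Placeholder stage: forward an independent copy of the inputs unchanged."""
    return copy.deepcopy(inputs)


def run_pipeline(stages, inputs):
    """Apply each stage in order, feeding each output to the next stage."""
    for stage in stages:
        inputs = stage(inputs)
    return inputs


query = {"query": "What is the capital of France?"}

# Both stages are placeholders for now; real stages can replace them later
# without changing how the pipeline is assembled.
stages = [identity_stage, identity_stage]
result = run_pipeline(stages, query)

print(result == query)  # same content as the input
print(result is query)  # whether the very same object was returned
```

Returning a copy rather than the original object keeps later stages from mutating data that earlier stages still hold.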
Example:

```
import synalinks

class MyAwesomeModule(synalinks.Program):

    def __init__(
        self,
        name=None,
        description=None,
        trainable=True,
    ):
        super().__init__(
            name=name,
            description=description,
            trainable=trainable,
        )

    async def build(self, inputs):
        outputs = await synalinks.Identity()(inputs)

        super().__init__(
            inputs=inputs,
            outputs=outputs,
            name=self.name,
            description=self.description,
            trainable=self.trainable,
        )
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `**kwargs` | `keyword arguments` | The default module's arguments | `{}` |

Source code in `synalinks/src/modules/core/identity.py`

````
@synalinks_export(["synalinks.modules.Identity", "synalinks.Identity"])
class Identity(Module):
    """Identity module.

    This module should be used as a placeholder when no operation is to be performed. The module just returns its `inputs` argument as output.

    This module can be really useful during development process in order to implement the whole program architecture before the individual modules. It avoids any data models naming issue that you could have by just forwarding the inputs, that way you can implement the general program architecture, validate it and implement the individual modules later.
Example: ```python import synalinks class MyAwesomeModule(synalinks.Program): def __init__( self, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) async def build(self, inputs): outputs = await synalinks.Identity()(inputs) super().__init__( inputs=inputs, outputs=outputs, name=self.name, description=self.description, trainable=self.trainable, ) ``` Args: **kwargs (keyword arguments): The default module's arguments """ def __init__(self, **kwargs): super().__init__(**kwargs) self.built = True async def call(self, inputs): if isinstance(inputs, (JsonDataModel, SymbolicDataModel)): return inputs.clone() return tree.map_structure( lambda x: x.clone(), inputs, )
````

## `Input(schema=None, data_model=None, optional=False, name=None)`

Used to instantiate a `SymbolicDataModel`.

A `SymbolicDataModel` is a symbolic data model-like object, which we augment with certain attributes that allow us to build a Synalinks `Program` just by knowing the inputs and outputs of the program (similar to a Keras symbolic tensor).

Example:

```
import synalinks

class Query(synalinks.DataModel):
    query: str

inputs = synalinks.Input(data_model=Query)

# You can also create it using a JSON schema like this:

inputs = synalinks.Input(schema=Query.get_schema())

# Or using a symbolic datamodel:

inputs = synalinks.Input(data_model=Query.to_symbolic_data_model())
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `schema` | `dict` | A JSON schema of the data_model. If not provided, the data_model argument is used. | `None` |
| `data_model` | `DataModel` | Optional existing data model to wrap into the Input layer. If set, the module will use this data_model rather than creating a new placeholder data model.
| `None` | | `optional` | `bool` | Whether the input is optional or not. An optional input can accept None values. | `False` | | `name` | `string` | Optional name string for the module. Should be unique in a program (do not reuse the same name twice). It will be autogenerated if it isn't provided. | `None` | Returns: | Type | Description | | ------------------- | --------------------------------------------------------------------- | | `SymbolicDataModel` | The symbolic data model corresponding to the given data model/schema. | Source code in `synalinks/src/modules/core/input_module.py` ```` @synalinks_export(["synalinks.modules.Input", "synalinks.Input"]) def Input( schema=None, data_model=None, optional=False, name=None, ): """Used to instantiate a `SymbolicDataModel`. A `SymbolicDataModel` is a symbolic data model-like object, which we augment with certain attributes that allow us to build a Synalinks `Program` just by knowing the inputs and outputs of the program (similar to Keras symbolic tensor). Example: ```python import synalinks class Query(synalinks.DataModel): query: str inputs = synalinks.Input(data_model=Query) # You can also create it using a JSON schema like this: inputs = synalinks.Input(schema=Query.get_schema()) # Or using a symbolic datamodel: inputs = synalinks.Input(data_model=Query.to_symbolic_data_model()) ``` Args: schema (dict): A Json schema of the data_model. If not provided uses the `data_model` argument. data_model (DataModel): Optional existing data model to wrap into the `Input` layer. If set, the module will use this data_model rather than creating a new placeholder data model. optional (bool): Whether the input is optional or not. An optional input can accept `None` values. name (string): Optional name string for the module. Should be unique in a program (do not reuse the same name twice). It will be autogenerated if it isn't provided. 
Returns: (SymbolicDataModel): The symbolic data model corresponding to the given data model/schema. """ module = InputModule( schema=schema, input_data_model=data_model.to_symbolic_data_model() if data_model else None, optional=optional, name=name, ) return module.output ```` ## `Not` Bases: `Module` Not module. This module should be used as a placeholder when no operation is to be performed and the output should be None. This module is useful to implement stop conditions when combined with a conditional branch or as placeholder (like the Identity) before implementing guards that leverage the xor operation. Parameters: | Name | Type | Description | Default | | ---------- | ------------------- | ------------------------------ | ------- | | `**kwargs` | `keyword arguments` | The default module's arguments | `{}` | Source code in `synalinks/src/modules/core/not_module.py` ``` @synalinks_export(["synalinks.modules.Not", "synalinks.Not"]) class Not(Module): """Not module. This module should be used as a placeholder when no operation is to be performed and the output should be None. This module is useful to implement stop conditions when combined with a conditional branch or as placeholder (like the Identity) before implementing guards that leverage the xor operation. Args: **kwargs (keyword arguments): The default module's arguments """ def __init__(self, **kwargs): super().__init__(**kwargs) self.built = True async def call(self, inputs): if isinstance(inputs, (JsonDataModel, SymbolicDataModel)): return None return tree.map_structure( lambda x: None, inputs, ) async def compute_output_spec(self, inputs): if isinstance(inputs, (JsonDataModel, SymbolicDataModel)): return inputs.clone() return tree.map_structure( lambda x: x.clone(), inputs, ) ``` ## `Tool` Bases: `Module` A module that wraps an async function as a callable tool. The `Tool` module allows you to wrap any async function and use it as a module within Synalinks programs. 
It automatically extracts the function's schema from its type hints and docstring. Example: ``` import synalinks @synalinks.saving.register_synalinks_serializable() async def calculate(expression: str): """Calculate the result of a mathematical expression. Args: expression (str): The mathematical expression to calculate. """ result = eval(expression) return {"result": result} tool = synalinks.Tool(calculate) result = await tool(expression="2 + 2") ``` Important **No Optional Parameters**: All function parameters must be required. Optional parameters with default values are not supported because LLM providers require all parameters to be required in their structured output JSON schemas. **Complete Docstring Required**: The wrapped function must have a complete docstring with an `Args:` section that documents every parameter. The Tool extracts parameter descriptions from the docstring to build the JSON schema sent to the language model. Missing descriptions will raise a ValueError. Example of a properly documented tool function: ``` async def search(query: str, limit: int): """Search the database for matching records. Args: query (str): The search query string. limit (int): Maximum number of results to return. """ # Implementation here return {"results": [...]} ``` Parameters: | Name | Type | Description | Default | | ------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------- | ---------- | | `func` | `Callable` | The async function to wrap as a tool. | *required* | | `name` | `str` | Optional. The name of the module. Defaults to the function name. | `None` | | `description` | `str` | Optional. The description of the module. Defaults to the function's docstring short description. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. Defaults to False since tools typically don't have trainable state. 
| `False` | Source code in `synalinks/src/modules/core/tool.py` ```` @synalinks_export(["synalinks.modules.Tool", "synalinks.Tool"]) class Tool(Module): """A module that wraps an async function as a callable tool. The `Tool` module allows you to wrap any async function and use it as a module within Synalinks programs. It automatically extracts the function's schema from its type hints and docstring. Example: ```python import synalinks @synalinks.saving.register_synalinks_serializable() async def calculate(expression: str): \"\"\"Calculate the result of a mathematical expression. Args: expression (str): The mathematical expression to calculate. \"\"\" result = eval(expression) return {"result": result} tool = synalinks.Tool(calculate) result = await tool(expression="2 + 2") ``` Important: **No Optional Parameters**: All function parameters must be required. Optional parameters with default values are not supported because LLM providers require all parameters to be required in their structured output JSON schemas. **Complete Docstring Required**: The wrapped function must have a complete docstring with an `Args:` section that documents every parameter. The Tool extracts parameter descriptions from the docstring to build the JSON schema sent to the language model. Missing descriptions will raise a ValueError. Example of a properly documented tool function: ```python async def search(query: str, limit: int): \"\"\"Search the database for matching records. Args: query (str): The search query string. limit (int): Maximum number of results to return. \"\"\" # Implementation here return {"results": [...]} ``` Args: func (Callable): The async function to wrap as a tool. name (str): Optional. The name of the module. Defaults to the function name. description (str): Optional. The description of the module. Defaults to the function's docstring short description. trainable (bool): Whether the module's variables should be trainable. 
Defaults to False since tools typically don't have trainable state. """ def __init__( self, func: typing.Callable, *, name=None, description=None, trainable=False, ): self._func = func if not inspect.iscoroutinefunction(self._func): raise TypeError(f"{self._func.__name__} is not an asynchronous function") doc = inspect.getdoc(func) if not doc: raise ValueError(f"The tool ({self._func.__name__}) must have a docstring") self._docstring = docstring_parser.parse(doc) self._signature = inspect.signature(func) self._type_hints = typing.get_type_hints(func) self._params_schema = {} self._required_params = [] self._parse_arguments() # Use function name if no name provided if not name: name = self._func.__name__ # Use docstring short description if no description provided if not description: description = self._docstring.short_description or "" if not description: logging.warning( f"The tool ({name}) has no description. " "This is unsafe behavior and may lead to issues." ) super().__init__( name=name, description=description, trainable=trainable, ) def _parse_arguments(self): """Parse the function arguments to build the input parameter schema.""" for param_name, param in self._signature.parameters.items(): param_schema = get_param_schema( param_name, param, self._type_hints, self._docstring, ) self._params_schema[param_name] = param_schema if param.default is param.empty: self._required_params.append(param_name) def _build_output_schema(self): """Build the output schema from the function's return type hint. Since tools must always return a dict, this method ensures the output schema is always of type "object". 
""" return_type = self._type_hints.get("return", None) base_schema = { "type": "object", "title": f"{self.name}_output", } if return_type is None: # No return type hint, use generic object schema base_schema["additionalProperties"] = True return base_schema origin = typing.get_origin(return_type) args = typing.get_args(return_type) # Handle Dict[K, V] - extract value type for additionalProperties if origin is dict or origin is typing.Dict: if len(args) >= 2: value_type = args[1] try: value_schema = json_schema_type(value_type) if isinstance(value_schema, dict): base_schema["additionalProperties"] = value_schema else: base_schema["additionalProperties"] = {"type": value_schema} except ValueError: base_schema["additionalProperties"] = True else: base_schema["additionalProperties"] = True return base_schema # Handle TypedDict - extract properties from annotations if hasattr(return_type, "__annotations__"): properties = {} required = [] for field_name, field_type in return_type.__annotations__.items(): try: field_schema = json_schema_type(field_type) if isinstance(field_schema, dict): properties[field_name] = field_schema else: properties[field_name] = {"type": field_schema} # Check if field is required (not Optional) if typing.get_origin(field_type) is not typing.Union: required.append(field_name) elif type(None) not in typing.get_args(field_type): required.append(field_name) except ValueError: properties[field_name] = {} if properties: base_schema["properties"] = properties if required: base_schema["required"] = required base_schema["additionalProperties"] = False return base_schema # Fallback to generic object schema base_schema["additionalProperties"] = True return base_schema def get_input_schema(self) -> dict: """Get the JSON schema for this tool's input parameters. Returns: dict: The JSON schema describing the tool's input parameters. 
""" return { "additionalProperties": False, "description": self._docstring.short_description, "properties": self._params_schema, "required": self._required_params, "title": self.name.title().replace("_", " "), "type": "object", } @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10), retry=retry_if_exception_type((Exception,)), reraise=True, ) async def call( self, training: bool = False, **kwargs: typing.Any ) -> typing.Optional[JsonDataModel]: """Execute the wrapped function with the provided arguments. Args: training (bool): Whether in training mode. Not used by Tool but included for consistency with other modules. **kwargs (Any): The arguments to pass to the wrapped function. Returns: JsonDataModel: The result wrapped in a JsonDataModel with the output schema. """ result = await self._func(**kwargs) if result is None: return None if isinstance(result, dict): return JsonDataModel( json=result, schema=self._build_output_schema(), name=f"{self.name}_output", ) # Wrap non-dict results in a dict return JsonDataModel( json={"result": result}, schema=self._build_output_schema(), name=f"{self.name}_output", ) async def compute_output_spec( self, training: bool = False, **kwargs: typing.Any ) -> SymbolicDataModel: """Compute the output specification for the tool. Uses the function's schema to define the output structure. Args: training (bool): Whether in training mode. **kwargs (Any): The input arguments. Returns: SymbolicDataModel: A SymbolicDataModel with the tool's output schema. """ return SymbolicDataModel( schema=self._build_output_schema(), name=self.name, ) def get_tool_schema(self) -> dict: """Get the JSON schema for this tool's parameters. Returns: dict: The JSON schema describing the tool's input parameters. 
""" schema = { "additionalProperties": False, "description": self._docstring.short_description, "properties": self._params_schema, "required": self._required_params, "title": self.name.title().replace("_", " "), "type": "object", } return schema def get_config(self): config = { "name": self.name, "description": self.description, "trainable": self.trainable, } func_config = {"func": serialization_lib.serialize_synalinks_object(self._func)} return {**config, **func_config} @classmethod def from_config(cls, config): func = serialization_lib.deserialize_synalinks_object(config.pop("func")) return cls(func=func, **config) ```` ### `call(training=False, **kwargs)` Execute the wrapped function with the provided arguments. Parameters: | Name | Type | Description | Default | | ---------- | ------ | ------------------------------------------------------------------------------------------- | ------- | | `training` | `bool` | Whether in training mode. Not used by Tool but included for consistency with other modules. | `False` | | `**kwargs` | `Any` | The arguments to pass to the wrapped function. | `{}` | Returns: | Name | Type | Description | | --------------- | ------------------------- | ------------------------------------------------------------- | | `JsonDataModel` | `Optional[JsonDataModel]` | The result wrapped in a JsonDataModel with the output schema. | Source code in `synalinks/src/modules/core/tool.py` ``` @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10), retry=retry_if_exception_type((Exception,)), reraise=True, ) async def call( self, training: bool = False, **kwargs: typing.Any ) -> typing.Optional[JsonDataModel]: """Execute the wrapped function with the provided arguments. Args: training (bool): Whether in training mode. Not used by Tool but included for consistency with other modules. **kwargs (Any): The arguments to pass to the wrapped function. 
Returns: JsonDataModel: The result wrapped in a JsonDataModel with the output schema. """ result = await self._func(**kwargs) if result is None: return None if isinstance(result, dict): return JsonDataModel( json=result, schema=self._build_output_schema(), name=f"{self.name}_output", ) # Wrap non-dict results in a dict return JsonDataModel( json={"result": result}, schema=self._build_output_schema(), name=f"{self.name}_output", ) ``` ### `compute_output_spec(training=False, **kwargs)` Compute the output specification for the tool. Uses the function's schema to define the output structure. Parameters: | Name | Type | Description | Default | | ---------- | ------ | ------------------------- | ------- | | `training` | `bool` | Whether in training mode. | `False` | | `**kwargs` | `Any` | The input arguments. | `{}` | Returns: | Name | Type | Description | | ------------------- | ------------------- | -------------------------------------------------- | | `SymbolicDataModel` | `SymbolicDataModel` | A SymbolicDataModel with the tool's output schema. | Source code in `synalinks/src/modules/core/tool.py` ``` async def compute_output_spec( self, training: bool = False, **kwargs: typing.Any ) -> SymbolicDataModel: """Compute the output specification for the tool. Uses the function's schema to define the output structure. Args: training (bool): Whether in training mode. **kwargs (Any): The input arguments. Returns: SymbolicDataModel: A SymbolicDataModel with the tool's output schema. """ return SymbolicDataModel( schema=self._build_output_schema(), name=self.name, ) ``` ### `get_input_schema()` Get the JSON schema for this tool's input parameters. Returns: | Name | Type | Description | | ------ | ------ | ------------------------------------------------------- | | `dict` | `dict` | The JSON schema describing the tool's input parameters. 
|

Source code in `synalinks/src/modules/core/tool.py`

```
def get_input_schema(self) -> dict:
    """Get the JSON schema for this tool's input parameters.

    Returns:
        dict: The JSON schema describing the tool's input parameters.
    """
    return {
        "additionalProperties": False,
        "description": self._docstring.short_description,
        "properties": self._params_schema,
        "required": self._required_params,
        "title": self.name.title().replace("_", " "),
        "type": "object",
    }
```

### `get_tool_schema()`

Get the JSON schema for this tool's parameters.

Returns:

| Name | Type | Description |
| ------ | ------ | ------------------------------------------------------- |
| `dict` | `dict` | The JSON schema describing the tool's input parameters. |

Source code in `synalinks/src/modules/core/tool.py`

```
def get_tool_schema(self) -> dict:
    """Get the JSON schema for this tool's parameters.

    Returns:
        dict: The JSON schema describing the tool's input parameters.
    """
    schema = {
        "additionalProperties": False,
        "description": self._docstring.short_description,
        "properties": self._params_schema,
        "required": self._required_params,
        "title": self.name.title().replace("_", " "),
        "type": "object",
    }
    return schema
```

## `get_param_schema(param_name, param, type_hints, doc_parsed)`

Create a schema for a single parameter.
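To make the shape of a single parameter schema concrete, here is a minimal standalone sketch. It does not call the library: `sketch_param_schema`, `BASIC_TYPES`, the `search` function, and the `descriptions` dict (a stand-in for the descriptions parsed from the docstring's `Args:` section) are hypothetical names introduced only for illustration, and only basic types are handled.

```python
import inspect
import typing

# Hypothetical stand-in for the library's type conversion; basic types only.
BASIC_TYPES = {int: "integer", float: "number", bool: "boolean", str: "string"}


def sketch_param_schema(name, param, type_hints, descriptions):
    """Build a JSON-schema entry for one parameter (illustrative only)."""
    if name not in type_hints:
        raise ValueError(f"Missing type hint for parameter '{name}'")
    if name not in descriptions:
        raise ValueError(f"Missing description for parameter '{name}' in docstring")
    schema = {
        # Newlines in the description are collapsed to spaces.
        "description": descriptions[name].replace("\n", " "),
        # "max_results" becomes "Max Results".
        "title": name.title().replace("_", " "),
        "type": BASIC_TYPES[type_hints[name]],
    }
    # A default value, when present, is copied into the schema.
    if param.default is not inspect.Parameter.empty:
        schema["default"] = param.default
    return schema


async def search(query: str, max_results: int):
    """Hypothetical tool function used only for this illustration."""
    return {"results": []}


# Stand-in for the per-parameter descriptions found in the docstring.
descriptions = {
    "query": "The search query string.",
    "max_results": "Maximum number of\nresults to return.",
}

signature = inspect.signature(search)
hints = typing.get_type_hints(search)
schemas = {
    name: sketch_param_schema(name, param, hints, descriptions)
    for name, param in signature.parameters.items()
}
print(schemas)
```

A parameter without a type hint or without a description raises a `ValueError` in this sketch, which mirrors the requirement that every tool parameter be typed and documented.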
Source code in `synalinks/src/modules/core/tool.py`

```
def get_param_schema(
    param_name: str,
    param: inspect.Parameter,
    type_hints: typing.Dict[str, typing.Any],
    doc_parsed: docstring_parser.Docstring,
) -> JsonSchema:
    """Create a schema for a single parameter."""
    if param_name not in type_hints:
        raise ValueError(f"Missing type hint for parameter '{param_name}'")
    param_type = type_hints[param_name]
    param_type_str = json_schema_type(param_type)
    descriptions = (p.description for p in doc_parsed.params if p.arg_name == param_name)
    param_doc = next(descriptions, None)
    if param_doc is None:
        raise ValueError(f"Missing description for parameter '{param_name}' in docstring")
    param_schema = {}
    param_schema["description"] = param_doc.replace("\n", " ")
    param_schema["title"] = param_name.title().replace("_", " ")
    if isinstance(param_type_str, dict):
        param_schema.update(**param_type_str)
    else:
        param_schema["type"] = param_type_str
    if param.default is not param.empty:
        param_schema["default"] = param.default
    return param_schema
```

## `json_schema_type(py_type)`

Convert a Python type to a JSON schema type.
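The conversion is easiest to follow on a few concrete types. The sketch below is a simplified, standalone re-implementation written for illustration (the name `sketch_json_schema_type` is hypothetical, and it handles only basic types, `Optional[T]`, parameterized lists, and parameterized dicts); it is not the library function itself.

```python
import typing

BASIC_TYPES = {
    int: "integer",
    float: "number",
    bool: "boolean",
    str: "string",
    type(None): "null",
}


def sketch_json_schema_type(py_type):
    """Map a Python type hint to a JSON schema type (illustrative only)."""
    if py_type in BASIC_TYPES:
        return BASIC_TYPES[py_type]
    origin = typing.get_origin(py_type)
    args = typing.get_args(py_type)
    # Optional[T] is Union[T, None]: only T is kept.
    if origin is typing.Union and len(args) == 2 and type(None) in args:
        return sketch_json_schema_type(args[0])
    if origin is list:
        item = sketch_json_schema_type(args[0])
        # Basic types come back as strings and are wrapped in {"type": ...}.
        items = item if isinstance(item, dict) else {"type": item}
        return {"type": "array", "items": items}
    if origin is dict:
        value = sketch_json_schema_type(args[1])
        extra = value if isinstance(value, dict) else {"type": value}
        return {"type": "object", "additionalProperties": extra}
    raise ValueError(f"Cannot convert {py_type} to a JSON schema type")


print(sketch_json_schema_type(int))
print(sketch_json_schema_type(typing.Optional[str]))
print(sketch_json_schema_type(typing.List[int]))
print(sketch_json_schema_type(typing.Dict[str, typing.List[float]]))
```

Basic types map to a plain string, while container types map to a nested schema dict, which is why callers check whether the result is a `dict` before wrapping it in `{"type": ...}`.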
Source code in `synalinks/src/modules/core/tool.py`

```
def json_schema_type(py_type: typing.Any) -> JsonSchema:
    """Convert a Python type to a JSON schema type."""
    mapping = {
        int: "integer",
        float: "number",
        bool: "boolean",
        str: "string",
        type(None): "null",
    }
    # Check if type is a basic type
    if py_type in mapping:
        return mapping[py_type]
    # For unparameterized list and dict types
    if py_type is list or py_type is typing.List:
        return {"type": "array", "items": {}}
    if py_type is dict or py_type is typing.Dict:
        return {"type": "object", "additionalProperties": {}}
    origin = typing.get_origin(py_type)
    args = typing.get_args(py_type)
    if origin is typing.Union:
        # Handle Optional[type] which is Union[type, None]
        if len(args) == 2 and type(None) in args:
            return json_schema_type(args[0])
        else:
            return [json_schema_type(arg) for arg in args]
    if origin is list or origin is typing.List:
        schema_type = json_schema_type(args[0])
        if isinstance(schema_type, dict):
            return {"type": "array", "items": schema_type}
        return {
            "type": "array",
            "items": {"type": json_schema_type(args[0])},
        }
    if origin is dict or origin is typing.Dict:
        schema_type = json_schema_type(args[1])
        if isinstance(schema_type, dict):
            return {"type": "object", "additionalProperties": schema_type}
        return {"type": "object", "additionalProperties": {"type": schema_type}}
    raise ValueError(f"Cannot convert {py_type} to a JSON schema type")
```

## `EmbedKnowledge`

Bases: `Module`

Extracts a field of interest and generates the corresponding embedding vector.

This module is designed to work with any data model structure. It supports masking the entity fields in order to keep **only one** field to embed per data model.

**Note**: Each data model should have the *same field* to compute the embedding from, such as a `name` or `description` field selected with `in_mask`. **Or** every data model should have *only one field left* after masking with the `out_mask` argument.
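The two masking strategies can be pictured on plain dictionaries. The sketch below is only an analogy: `keep_fields` and `drop_fields` are hypothetical helpers, not library functions, and the real masks operate on data models and may match fields differently.

```python
def keep_fields(record, in_mask):
    """Keep only the listed fields (analogy for `in_mask`)."""
    return {key: value for key, value in record.items() if key in in_mask}


def drop_fields(record, out_mask):
    """Remove the listed fields (analogy for `out_mask`)."""
    return {key: value for key, value in record.items() if key not in out_mask}


document = {"title": "my title", "text": "my document"}

# Both strategies must leave exactly one field to embed.
kept = keep_fields(document, ["text"])
dropped = drop_fields(document, ["title"])
print(kept, dropped)
```

In the module's source code, `out_mask` is applied when it is set and `in_mask` is used only otherwise, and a data model that still yields more than one embedding after masking is skipped with a warning.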
```
import synalinks
import asyncio

# `embedding_model` is assumed to be an already configured embedding model
# instance, created before `main()` runs.

class Document(synalinks.DataModel):
    title: str = synalinks.Field(
        description="The document title",
    )
    text: str = synalinks.Field(
        description="The document content",
    )

async def main():
    inputs = synalinks.Input(data_model=Document)
    # Keep only the `text` field, then embed it
    outputs = await synalinks.EmbedKnowledge(
        embedding_model=embedding_model,
        in_mask=["text"],
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="embed_document",
        description="Embed the given documents"
    )

    doc = Document(
        title="my title",
        text="my document",
    )

    result = await program(doc)

if __name__ == "__main__":
    asyncio.run(main())
```

If you want to process a batch asynchronously, use `program.predict()` instead; see the [FAQ](https://synalinks.github.io/synalinks/FAQ/#whats-the-difference-between-program-methods-predict-and-__call__) to understand the difference between `program()` and `program.predict()`.

Here is an example:

```
import synalinks
import asyncio
import numpy as np
from typing import Literal

# `embedding_model` is assumed to be an already configured embedding model
# instance, created before `main()` runs.

class Document(synalinks.Entity):
    label: Literal["Document"]
    text: str = synalinks.Field(
        description="The document content",
    )

async def main():
    inputs = synalinks.Input(data_model=Document)
    outputs = await synalinks.EmbedKnowledge(
        embedding_model=embedding_model,
        in_mask=["text"],
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="embed_document",
        description="Embed the given documents"
    )

    doc1 = Document(label="Document", text="my document 1")
    doc2 = Document(label="Document", text="my document 2")
    doc3 = Document(label="Document", text="my document 3")

    # Batch the documents in an object array and embed them together
    docs = np.array([doc1, doc2, doc3], dtype="object")

    embedded_docs = await program.predict(docs)

if __name__ == "__main__":
    asyncio.run(main())
```

Parameters:

| Name | Type | Description | Default |
| ----------------- | ---------------- | --------------------------------------------------- | ------- |
| `embedding_model` | `EmbeddingModel` | The embedding model to
use. | `None` | | `in_mask` | `list` | A mask applied to keep specific entity fields. | `None` | | `out_mask` | `list` | A mask applied to remove specific entity fields. | `None` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `False` | Source code in `synalinks/src/modules/knowledge/embed_knowledge.py` ```` @synalinks_export( [ "synalinks.modules.EmbedKnowledge", "synalinks.EmbedKnowledge", ] ) class EmbedKnowledge(Module): """Extracts a field of interest and generate the corresponding embedding vector. This module is designed to work with any data model structure. It supports to mask the entity fields in order to keep **only one** field to embed per data model. **Note**: Each data model should have the *same field* to compute the embedding from like a `name` or `description` field using `in_mask`. **Or** every data model should have *only one field left* after masking using `out_mask` argument. 
```python import synalinks import asyncio from typing import Literal class Document(synalinks.DataModel): title: str = synalinks.Field( description="The document title", ) text: str = synalinks.Field( description="The document content", ) async def main(): inputs = synalinks.Input(data_model=Document) outputs = await synalinks.EmbedKnowledge( embedding_model=embedding_model, in_mask=["text"], )(inputs) program = synalinks.Program( inputs=inputs, outputs=outputs, name="embbed_document", description="Embbed the given documents" ) doc = Document( title="my title", text="my document", ) result = await program(doc) if __name__ == "__main__": asyncio.run(main()) ``` If you want to process batch asynchronously use `program.predict()` instead, see the [FAQ](https://synalinks.github.io/synalinks/FAQ/#whats-the-difference-between-program-methods-predict-and-__call__) to understand the difference between `program()` and `program.predict()` Here is an example: ```python import synalinks import asyncio import numpy as np from typing import Literal class Document(synalinks.Entity): label: Literal["Document"] text: str = synalinks.Field( description="The document content", ) async def main(): inputs = synalinks.Input(data_model=Document) outputs = await synalinks.EmbedKnowledge( embedding_model=embedding_model, in_mask=["text"], )(inputs) program = synalinks.Program( inputs=inputs, outputs=outputs, name="embbed_document", description="Embbed the given documents" ) doc1 = Document(label="Document", text="my document 1") doc2 = Document(label="Document", text="my document 2") doc3 = Document(label="Document", text="my document 3") docs = np.array([doc1, doc2, doc3], dtype="object") embedded_docs = await program.predict(docs) if __name__ == "__main__": asyncio.run(main()) ``` Args: embedding_model (EmbeddingModel): The embedding model to use. in_mask (list): A mask applied to keep specific entity fields. out_mask (list): A mask applied to remove specific entity fields. 
name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, embedding_model=None, in_mask=None, out_mask=None, name=None, description=None, trainable=False, ): super().__init__( name=name, description=description, trainable=trainable, ) self.embedding_model = _get_em(embedding_model) self.in_mask = in_mask self.out_mask = out_mask async def _embed(self, data_model): embeddings = data_model.get("embeddings") if embeddings: warnings.warn( "Embeddings already generated for this data model. " "Returning original data model." ) return JsonDataModel( json=data_model.get_json(), schema=data_model.get_schema(), name="embedded_" + data_model.name, ) masked_data_model = data_model if self.out_mask: masked_data_model = await ops.out_mask( data_model, mask=self.out_mask, recursive=False, name="out_mask_" + data_model.name, ) elif self.in_mask: masked_data_model = await ops.in_mask( data_model, mask=self.in_mask, recursive=False, name="in_mask_" + data_model.name, ) embeddings = await ops.embedding( masked_data_model, embedding_model=self.embedding_model, name=data_model.name + "_embedding", ) if not embeddings or not embeddings.get("embeddings"): warnings.warn( f"No embeddings generated for data model {data_model.name}. " "Please check that your schema is correct." ) return None embedding_list = embeddings.get("embeddings") if len(embedding_list) != 1: warnings.warn( "Data models can only have one embedding vector per data model, " "adjust `EmbedKnowledge` module's `in_mask` or `out_mask` " "to keep only one field. Skipping embedding." 
) return None vector = embedding_list[0] return await ops.concat( data_model, EmbeddingVector(embedding=vector), name="embedded_" + data_model.name, ) async def call(self, inputs): if not inputs: return None return tree.map_structure( lambda x: run_maybe_nested(self._embed(x)), inputs, ) async def compute_output_spec(self, inputs): return tree.map_structure( lambda x: x.clone(name="embedded_" + x.name), inputs, ) def get_config(self): config = { "in_mask": self.in_mask, "out_mask": self.out_mask, "name": self.name, "description": self.description, "trainable": self.trainable, } embedding_model_config = { "embedding_model": serialization_lib.serialize_synalinks_object( self.embedding_model ) } return {**embedding_model_config, **config} @classmethod def from_config(cls, config): embedding_model = serialization_lib.deserialize_synalinks_object( config.pop("embedding_model") ) return cls(embedding_model=embedding_model, **config) ```` ## `HybridRegexSearchQuery` Bases: `DataModel` Output schema for `search_type="hybrid_regex"`. Adds a `patterns` field so the LM can supply the regex side of the hybrid lookup explicitly. Embedding a regex pattern for vector search makes no sense, and treating a natural-language sentence as a regex literally never matches — the two signals need separate inputs. Source code in `synalinks/src/modules/knowledge/retrieve_knowledge.py` ``` class HybridRegexSearchQuery(DataModel): """Output schema for ``search_type="hybrid_regex"``. Adds a ``patterns`` field so the LM can supply the regex side of the hybrid lookup explicitly. Embedding a regex pattern for vector search makes no sense, and treating a natural-language sentence as a regex literally never matches — the two signals need separate inputs. 
""" tables: List[str] = Field(description="The tables to lookup") search: List[str] = Field( description="Natural-language queries for vector similarity", ) patterns: List[str] = Field( description="Regex patterns (RE2 syntax) for exact-shape matching", ) ``` ## `RetrieveKnowledge` Bases: `Module` Module for retrieving knowledge from a knowledge base. This module uses a language model to generate search queries and retrieves relevant information from a knowledge base using configurable search methods. Parameters: | Name | Type | Description | Default | | ---------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------- | | `knowledge_base` | `KnowledgeBase` | The knowledge base to search. | `None` | | `language_model` | `LanguageModel` | The language model used to generate search queries. | `None` | | `data_models` | `list` | List of data models to search. Defaults to all models in the knowledge base. | `None` | | `search_type` | `str` | The type of search to perform. One of: - "similarity": Vector-based semantic search using embeddings. - "fulltext": BM25-based full-text search. - "hybrid_fts": Vector + BM25 fulltext, fused with RRF (default). Accepts the legacy alias "hybrid". - "regex": RE2 pattern matching against the string fields of each table. 
The LM is instructed to produce regex patterns instead of natural-language queries. - "hybrid_regex": Vector + regex, fused with RRF. The LM emits both natural-language queries (vector side) and regex patterns (regex side); the two signals are merged with Reciprocal Rank Fusion. The output schema becomes HybridRegexSearchQuery (adds a patterns field). | `'hybrid_fts'` | | `k` | `int` | Maximum number of results to return. Defaults to 10. | `10` | | `similarity_threshold` | `float` | Maximum distance threshold for similarity search (lower = better match). Only used when search_type is "similarity" or "hybrid". | `None` | | `fulltext_threshold` | `float` | Minimum BM25 score threshold for fulltext search (higher = better match). Only used when search_type is "fulltext" or "hybrid". | `None` | | `k_rank` | `int` | RRF smoothing constant for hybrid search. Lower values emphasize top ranks more strongly. Defaults to 60. | `60` | | `fields` | `list` | Field names to match against in regex search. Defaults to every string field on the schema. Only used when search_type="regex". | `None` | | `case_sensitive` | `bool` | When False, regex matches are case-insensitive. Only used when search_type="regex". Defaults to True. | `True` | | `prompt_template` | `str` | Custom prompt template for the search query generator. | `None` | | `examples` | `list` | Example inputs/outputs for few-shot learning. | `None` | | `instructions` | `str` | Custom instructions for the search query generator. | `None` | | `seed_instructions` | `str` | Seed instructions for variability. | `None` | | `temperature` | `float` | Temperature for the language model. Defaults to 0.0. | `0.0` | | `use_inputs_schema` | `bool` | Whether to include input schema in the prompt. | `False` | | `use_outputs_schema` | `bool` | Whether to include output schema in the prompt. | `False` | | `return_inputs` | `bool` | Whether to include original inputs in the output. 
| `True` | | `return_query` | `bool` | Whether to include the generated search query in the output. | `True` | | `name` | `str` | Name of the module. | `None` | | `description` | `str` | Description of the module. | `None` | | `trainable` | `bool` | Whether the module is trainable. | `True` | Source code in `synalinks/src/modules/knowledge/retrieve_knowledge.py` ``` @synalinks_export( [ "synalinks.modules.RetrieveKnowledge", "synalinks.RetrieveKnowledge", ] ) class RetrieveKnowledge(Module): """Module for retrieving knowledge from a knowledge base. This module uses a language model to generate search queries and retrieves relevant information from a knowledge base using configurable search methods. Args: knowledge_base (KnowledgeBase): The knowledge base to search. language_model (LanguageModel): The language model used to generate search queries. data_models (list): List of data models to search. Defaults to all models in the knowledge base. search_type (str): The type of search to perform. One of: - "similarity": Vector-based semantic search using embeddings. - "fulltext": BM25-based full-text search. - "hybrid_fts": Vector + BM25 fulltext, fused with RRF (default). Accepts the legacy alias ``"hybrid"``. - "regex": RE2 pattern matching against the string fields of each table. The LM is instructed to produce regex patterns instead of natural-language queries. - "hybrid_regex": Vector + regex, fused with RRF. The LM emits both natural-language queries (vector side) and regex patterns (regex side); the two signals are merged with Reciprocal Rank Fusion. The output schema becomes ``HybridRegexSearchQuery`` (adds a ``patterns`` field). k (int): Maximum number of results to return. Defaults to 10. similarity_threshold (float): Maximum distance threshold for similarity search (lower = better match). Only used when search_type is "similarity" or "hybrid". fulltext_threshold (float): Minimum BM25 score threshold for fulltext search (higher = better match). 
Only used when search_type is "fulltext" or "hybrid". k_rank (int): RRF smoothing constant for hybrid search. Lower values emphasize top ranks more strongly. Defaults to 60. fields (list): Field names to match against in regex search. Defaults to every string field on the schema. Only used when ``search_type="regex"``. case_sensitive (bool): When ``False``, regex matches are case-insensitive. Only used when ``search_type="regex"``. Defaults to ``True``. prompt_template (str): Custom prompt template for the search query generator. examples (list): Example inputs/outputs for few-shot learning. instructions (str): Custom instructions for the search query generator. seed_instructions (str): Seed instructions for variability. temperature (float): Temperature for the language model. Defaults to 0.0. use_inputs_schema (bool): Whether to include input schema in the prompt. use_outputs_schema (bool): Whether to include output schema in the prompt. return_inputs (bool): Whether to include original inputs in the output. return_query (bool): Whether to include the generated search query in the output. name (str): Name of the module. description (str): Description of the module. trainable (bool): Whether the module is trainable. """ def __init__( self, *, knowledge_base=None, language_model=None, data_models=None, search_type: SearchType = "hybrid_fts", k=10, similarity_threshold=None, fulltext_threshold=None, k_rank=60, fields: Optional[List[str]] = None, case_sensitive: bool = True, prompt_template=None, examples=None, instructions=None, seed_instructions=None, temperature=0.0, use_inputs_schema=False, use_outputs_schema=False, return_inputs=True, return_query=True, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) self.knowledge_base = knowledge_base self.language_model = _get_lm(language_model) # Translate legacy aliases (e.g. 
"hybrid" -> "hybrid_fts") before # validating against the canonical set, so users on the old name # keep working without a code change. search_type = _SEARCH_TYPE_ALIASES.get(search_type, search_type) if search_type not in SEARCH_TYPES: raise ValueError( f"`search_type` must be one of {SEARCH_TYPES}, got '{search_type}'" ) self.search_type = search_type self.k = k self.similarity_threshold = similarity_threshold self.fulltext_threshold = fulltext_threshold self.k_rank = k_rank self.fields = fields self.case_sensitive = case_sensitive self.prompt_template = prompt_template self.examples = examples if not data_models: data_models = knowledge_base.get_symbolic_data_models() self.data_models = data_models tables = [data_model.get_schema().get("title") for data_model in self.data_models] if not instructions: instructions = default_retriever_instructions( tables, search_type=self.search_type ) self.instructions = instructions self.seed_instructions = seed_instructions self.temperature = temperature self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.return_inputs = return_inputs self.return_query = return_query # The LM output schema is chosen by search_type — the hybrid_regex # mode needs an extra `patterns` field, every other mode reuses # the legacy `SearchQuery` shape. self.search_query_generator = Generator( data_model=_search_query_schema_for(self.search_type), language_model=self.language_model, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, seed_instructions=self.seed_instructions, temperature=self.temperature, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, return_inputs=False, name="search_query_generator_" + self.name, ) async def _perform_search( self, search_terms, patterns, target_data_models ): """Perform the search across one or more tables. 
The adapter's search methods are single-table; this layer iterates over the LM-selected tables and merges the per-table result sets. Dedup uses a sorted-items signature because row dicts may contain unhashable values (lists, dicts), and we don't know which field is the primary key here. Per-table k is set to ``self.k`` and the merged top-k is taken at the end. When ``score`` is present (similarity / fulltext / hybrid), the merge sorts by score: ascending for plain similarity (lower is better), descending for fulltext and hybrid. For score-less results (regex), insertion order is preserved. Args: search_terms: List of search query strings. patterns: List of regex patterns from the LM. Only used by the ``hybrid_regex`` mode. target_data_models: List of data models to search. Returns: Merged top-k list of result dicts. """ aggregated: List[Dict[str, Any]] = [] seen: set = set() async def _search_one(table_name): if self.search_type == "similarity": return await self.knowledge_base.similarity_search( search_terms, table_name=table_name, k=self.k, threshold=self.similarity_threshold, ) if self.search_type == "fulltext": return await self.knowledge_base.fulltext_search( search_terms, table_name=table_name, k=self.k, threshold=self.fulltext_threshold, ) if self.search_type == "regex": # regex_search is single-pattern; iterate the LM's # patterns and merge their rows per table.
rows_per_table: List[Dict[str, Any]] = [] seen_local: set = set() for pattern in search_terms: rows = await self.knowledge_base.regex_search( pattern, table_name=table_name, fields=self.fields, case_sensitive=self.case_sensitive, k=self.k, ) for row in rows: sig = json.dumps(row, sort_keys=True, default=str) if sig not in seen_local: seen_local.add(sig) rows_per_table.append(row) return rows_per_table if self.search_type == "hybrid_regex": return await self.knowledge_base.hybrid_regex_search( search_terms, pattern_or_patterns=patterns or None, table_name=table_name, k=self.k, k_rank=self.k_rank, similarity_threshold=self.similarity_threshold, fields=self.fields, case_sensitive=self.case_sensitive, ) # hybrid_fts (default) return await self.knowledge_base.hybrid_fts_search( search_terms, table_name=table_name, k=self.k, k_rank=self.k_rank, similarity_threshold=self.similarity_threshold, fulltext_threshold=self.fulltext_threshold, ) for dm in target_data_models: table_name = dm.get_schema().get("title") try: rows = await _search_one(table_name) except Exception: rows = [] for row in rows: sig = json.dumps(row, sort_keys=True, default=str) if sig in seen: continue seen.add(sig) aggregated.append(row) # Sort: similarity → ascending score; fulltext / hybrid → # descending; regex → insertion order (no score field). if self.search_type == "similarity": aggregated.sort(key=lambda r: r.get("score", float("inf"))) elif self.search_type in ("fulltext", "hybrid_fts", "hybrid_regex"): aggregated.sort( key=lambda r: r.get("score", float("-inf")), reverse=True ) return aggregated[: self.k] async def call(self, inputs, training=False): if not inputs: return None # Generate search query using the language model search_query = await self.search_query_generator(inputs, training=training) if not search_query: return None # Get the tables, search terms, and (for hybrid_regex) patterns # from the generated query. 
query_json = search_query.get_json() tables = query_json.get("tables", []) search_terms = query_json.get("search", []) # `patterns` is only present when the LM ran against the # HybridRegexSearchQuery schema; harmlessly empty otherwise. patterns = query_json.get("patterns", []) # Hybrid_regex needs at least one of (search_terms, patterns); # every other mode needs search_terms. Otherwise we have nothing # to look up. if self.search_type == "hybrid_regex": if not search_terms and not patterns: return None elif not search_terms: return None # Filter data models to only those requested target_data_models = [] for dm in self.data_models: schema = dm.get_schema() if schema.get("title") in tables: target_data_models.append(dm) if not target_data_models: target_data_models = self.data_models # Perform search based on configured search type search_results = await self._perform_search( search_terms, patterns, target_data_models ) results = JsonDataModel( json={"result": search_results}, schema=GenericResult.get_schema(), name="retrieval_results_" + self.name, ) if self.return_query: results = await ops.logical_and( search_query, results, name="results_with_query_" + self.name, ) if self.return_inputs: results = await ops.logical_and( inputs, results, name="results_with_inputs_" + self.name, ) return results async def compute_output_spec(self, inputs, training=False): search_query = await self.search_query_generator(inputs, training=training) results = SymbolicDataModel( schema=GenericResult.get_schema(), name="retrieval_results_" + self.name, ) if self.return_query: results = await ops.logical_and( search_query, results, name="results_with_query_" + self.name, ) if self.return_inputs: results = await ops.logical_and( inputs, results, name="results_with_inputs_" + self.name, ) return results def get_config(self): config = { "search_type": self.search_type, "k": self.k, "similarity_threshold": self.similarity_threshold, "fulltext_threshold": self.fulltext_threshold, "k_rank": 
self.k_rank, "fields": list(self.fields) if self.fields is not None else None, "case_sensitive": self.case_sensitive, "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "seed_instructions": self.seed_instructions, "temperature": self.temperature, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "return_inputs": self.return_inputs, "return_query": self.return_query, "name": self.name, "description": self.description, "trainable": self.trainable, } knowledge_base_config = { "knowledge_base": serialization_lib.serialize_synalinks_object( self.knowledge_base, ) } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ) } data_models_config = { "data_models": [ ( serialization_lib.serialize_synalinks_object( data_model.to_symbolic_data_model( name="data_models" + (f"_{i}_" if i > 0 else "_") + self.name ) ) if not is_symbolic_data_model(data_model) else serialization_lib.serialize_synalinks_object(data_model) ) for i, data_model in enumerate(self.data_models) ] } return { **config, **knowledge_base_config, **language_model_config, **data_models_config, } @classmethod def from_config(cls, config): knowledge_base = serialization_lib.deserialize_synalinks_object( config.pop("knowledge_base"), ) language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model"), ) data_models_config = config.pop("data_models") data_models = [ serialization_lib.deserialize_synalinks_object(data_model) for data_model in data_models_config ] return cls( knowledge_base=knowledge_base, data_models=data_models, language_model=language_model, **config, ) ``` ## `SearchQuery` Bases: `DataModel` Output schema used by every search type except `hybrid_regex`. 
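To make the schema concrete, here is a rough sketch in plain Python (no Synalinks import) of a payload carrying the `tables` and `search` fields, and of how the `call` and `_perform_search` code listed above for the retriever narrows the tables and merges rows. The table names, the rows, and the helper names `select_tables` and `merge_rows` are made up for illustration; only the `json.dumps(..., sort_keys=True, default=str)` signature and the sort directions are taken from that listing.

```python
import json


def select_tables(requested, available):
    """Keep the available tables the LM asked for; fall back to all of them."""
    selected = [name for name in available if name in requested]
    return selected or list(available)


def merge_rows(per_table_rows, k, search_type="hybrid_fts"):
    """Deduplicate rows coming from several tables, then keep the top-k."""
    merged, seen = [], set()
    for rows in per_table_rows:
        for row in rows:
            # Rows may hold lists or dicts, so hash a canonical JSON string.
            sig = json.dumps(row, sort_keys=True, default=str)
            if sig in seen:
                continue
            seen.add(sig)
            merged.append(row)
    if search_type == "similarity":
        # Lower score is better for plain similarity.
        merged.sort(key=lambda r: r.get("score", float("inf")))
    elif search_type in ("fulltext", "hybrid_fts", "hybrid_regex"):
        # Higher score is better for fulltext and hybrid modes.
        merged.sort(key=lambda r: r.get("score", float("-inf")), reverse=True)
    # Regex results carry no score: insertion order is kept.
    return merged[:k]


# A payload shaped like the schema: table names plus search strings.
query = {"tables": ["Document"], "search": ["neuro-symbolic frameworks"]}

tables = select_tables(query["tables"], ["Document", "Author"])
rows = merge_rows(
    [
        [
            {"title": "A", "tags": ["x"], "score": 0.2},
            {"title": "B", "tags": [], "score": 0.9},
        ],
        [{"title": "A", "tags": ["x"], "score": 0.2}],  # duplicate of the first row
    ],
    k=2,
)
print(tables, [row["title"] for row in rows])
```

The fallback in `select_tables` mirrors the listing's behavior of searching every table when the LM names none that exist.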
Source code in `synalinks/src/modules/knowledge/retrieve_knowledge.py` ``` class SearchQuery(DataModel): """Output schema used by every search type except ``hybrid_regex``.""" tables: List[str] = Field(description="The tables to lookup") search: List[str] = Field(description="The list of similarity search request") ``` ## `default_retriever_instructions(tables, search_type='hybrid')` The default instructions for the entity retriever. The body of the instructions tells the LM what kind of strings to put in each output field. The output *schema* also depends on the search type (see `_search_query_schema_for`): the hybrid_regex variant adds a `patterns` field, so the prompt names it explicitly. Source code in `synalinks/src/modules/knowledge/retrieve_knowledge.py` ``` def default_retriever_instructions(tables, search_type="hybrid"): """The default instructions for the entity retriever. The body of the instructions tells the LM what kind of strings to put in each output field. The output *schema* also depends on the search type (see ``_search_query_schema_for``): the hybrid_regex variant adds a ``patterns`` field, so the prompt names it explicitly. """ if search_type == "regex": guidance = ( "The `search` field should be a list of regular-expression " "patterns (RE2 syntax) to match against the text fields of " "the chosen tables. Prefer anchors, character classes, and " "alternation over natural-language phrasing — the patterns " "are matched literally, not interpreted." ) elif search_type == "hybrid_regex": guidance = ( "Emit **both** a natural-language `search` list (for " "vector similarity over the chosen tables) AND a `patterns` " "list of regular-expression patterns (RE2 syntax) that " "capture the exact textual shape of what you are looking " "for (anchors, character classes, alternation). The two " "signals are merged by Reciprocal Rank Fusion, so it is OK " "for each list to err on the side of recall." 
) else: guidance = ( "The `search` field should be a list of natural language " "search queries for the information to look for." ) return f""" Your task is to retrieve information among the following tables: {tables}. First, decide step-by-step which tables you need, then use the `search` to perform a lookup. {guidance} """.strip() ``` ## `UpdateKnowledge` Bases: `Module` Update (insert or upsert) data models in the given knowledge base. This module stores data models in the knowledge base, using the first field of the data model as the primary key for upsert operations. Parameters: | Name | Type | Description | Default | | ---------------- | --------------- | --------------------------------------------------- | ------- | | `knowledge_base` | `KnowledgeBase` | The knowledge base to update. | `None` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `False` | Source code in `synalinks/src/modules/knowledge/update_knowledge.py` ``` @synalinks_export( [ "synalinks.modules.UpdateKnowledge", "synalinks.UpdateKnowledge", ] ) class UpdateKnowledge(Module): """Update (insert or upsert) data models in the given knowledge base. This module stores data models in the knowledge base, using the first field of the data model as the primary key for upsert operations. Args: knowledge_base (KnowledgeBase): The knowledge base to update. name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. 
""" def __init__( self, *, knowledge_base=None, name=None, description=None, trainable=False, ): super().__init__( name=name, description=description, trainable=trainable, ) self.knowledge_base = knowledge_base async def _update(self, data_model): await self.knowledge_base.update(data_model) return data_model.clone(name="updated_" + data_model.name) async def call(self, inputs): if not inputs: return None outputs = tree.map_structure( lambda x: run_maybe_nested(self._update(x)), inputs, ) return outputs async def compute_output_spec(self, inputs): return tree.map_structure( lambda x: x.clone(name="updated_" + x.name), inputs, ) def get_config(self): config = { "name": self.name, "description": self.description, "trainable": self.trainable, } knowledge_base_config = { "knowledge_base": serialization_lib.serialize_synalinks_object( self.knowledge_base ) } return {**knowledge_base_config, **config} @classmethod def from_config(cls, config): knowledge_base = serialization_lib.deserialize_synalinks_object( config.pop("knowledge_base") ) return cls(knowledge_base=knowledge_base, **config) ``` ## `InMask` Bases: `Module` A module to keep specific fields of the given data models Example: ``` import synalinks import asyncio language_model = synalinks.LanguageModel( model="ollama/mistral", ) class Document(synalinks.DataModel): title: str = synalinks.Field( description="The title of the document", ) text: str = synalinks.Field( description="The content of the document", ) class Summary(synalinks.DataModel): summary: str = synalinks.Field( description="the concise summary of the document", ) async def main(): inputs = Input(data_model=Document) summary = synalinks.ChainOfThought( data_model=Summary, language_model=language_model, )(inputs) masked_summary = synalinks.InMask( # remove the thinking field from the chain of thought # by keeping only the summary mask=["summary"], )(summary) program = Program( inputs=inputs, outputs=masked_summary, name="summary_generator", 
description="Generate a summary from a document", ) ``` Parameters: | Name | Type | Description | Default | | ------------- | ------ | --------------------------------------------------- | ------- | | `mask` | `list` | The list of keys to keep. | `None` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `False` | Source code in `synalinks/src/modules/masking/in_mask.py` ```` @synalinks_export( [ "synalinks.InMask", "synalinks.modules.InMask", ] ) class InMask(Module): """A module to keep specific fields of the given data models Example: ```python import synalinks import asyncio language_model = synalinks.LanguageModel( model="ollama/mistral", ) class Document(synalinks.DataModel): title: str = synalinks.Field( description="The title of the document", ) text: str = synalinks.Field( description="The content of the document", ) class Summary(synalinks.DataModel): summary: str = synalinks.Field( description="the concise summary of the document", ) async def main(): inputs = synalinks.Input(data_model=Document) summary = await synalinks.ChainOfThought( data_model=Summary, language_model=language_model, )(inputs) masked_summary = await synalinks.InMask( # remove the thinking field from the chain of thought # by keeping only the summary mask=["summary"], )(summary) program = synalinks.Program( inputs=inputs, outputs=masked_summary, name="summary_generator", description="Generate a summary from a document", ) ``` Args: mask (list): The list of keys to keep. name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable.
""" def __init__( self, *, mask=None, pattern=None, name=None, description=None, trainable=False, ): if not mask or not isinstance(mask, list): raise ValueError("`mask` parameter should be a list of fields to keep") super().__init__( name=name, description=description, ) self.mask = mask self.pattern = pattern async def call(self, inputs): outputs = tree.map_structure( lambda x: x.in_mask(mask=self.mask, pattern=self.pattern), inputs, ) return outputs ```` ## `OutMask` Bases: `Module` A module to remove specific fields of the given data models Example: ``` import synalinks import asyncio language_model = synalinks.LanguageModel( model="ollama/mistral", ) class Document(synalinks.DataModel): title: str = synalinks.Field( description="The title of the document", ) text: str = synalinks.Field( description="The content of the document", ) class Summary(synalinks.DataModel): summary: str = synalinks.Field( description="the concise summary of the document", ) async def main(): inputs = Input(data_model=Document) summary = synalinks.ChainOfThought( data_model=Summary, language_model=language_model, )(inputs) masked_summary = synalinks.OutMask( # remove the thinking field from the chain of thought mask=["thinking"], )(summary) program = Program( inputs=inputs, outputs=masked_summary, name="summary_generator", description="Generate a summary from a document", ) ``` Parameters: | Name | Type | Description | Default | | ------------- | ------ | --------------------------------------------------- | ------- | | `mask` | `list` | The list of keys to remove. | `None` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. 
| `False` | Source code in `synalinks/src/modules/masking/out_mask.py` ```` @synalinks_export( [ "synalinks.OutMask", "synalinks.modules.OutMask", ] ) class OutMask(Module): """A module to remove specific fields of the given data models Example: ```python import synalinks import asyncio language_model = synalinks.LanguageModel( model="ollama/mistral", ) class Document(synalinks.DataModel): title: str = synalinks.Field( description="The title of the document", ) text: str = synalinks.Field( description="The content of the document", ) class Summary(synalinks.DataModel): summary: str = synalinks.Field( description="the concise summary of the document", ) async def main(): inputs = synalinks.Input(data_model=Document) summary = await synalinks.ChainOfThought( data_model=Summary, language_model=language_model, )(inputs) masked_summary = await synalinks.OutMask( # remove the thinking field from the chain of thought mask=["thinking"], )(summary) program = synalinks.Program( inputs=inputs, outputs=masked_summary, name="summary_generator", description="Generate a summary from a document", ) ``` Args: mask (list): The list of keys to remove. name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, mask=None, pattern=None, name=None, description=None, trainable=False, ): if not mask or not isinstance(mask, list): raise ValueError("`mask` parameter should be a list of fields to remove") super().__init__( name=name, description=description, ) self.mask = mask self.pattern = pattern async def call(self, inputs): outputs = tree.map_structure( lambda x: x.out_mask(mask=self.mask, pattern=self.pattern), inputs, ) return outputs ```` ## `And` Bases: `Module` Perform a logical And operation. It takes as input a list of data models, and returns a concatenation of them. If any input is None, then it outputs None.
Table: | `x1` | `x2` | Logical And (`&`) | | ------ | ------ | ----------------- | | `x1` | `x2` | `x1 + x2` | | `x1` | `None` | `None` | | `None` | `x2` | `None` | | `None` | `None` | `None` | Parameters: | Name | Type | Description | Default | | ---------- | ------------------- | ------------------------------------------ | ------- | | `**kwargs` | `keyword arguments` | Standard keyword arguments for the module. | `{}` | Source code in `synalinks/src/modules/merging/logical_and.py` ``` @synalinks_export( [ "synalinks.And", "synalinks.modules.And", ] ) class And(Module): """Perform a logical And operation. It takes as input a list of data models, and returns a concatenation of them. If any input is None, then it outputs None. Table: | `x1` | `x2` | Logical And (`&`) | | ------ | ------ | ----------------- | | `x1` | `x2` | `x1 + x2` | | `x1` | `None` | `None` | | `None` | `x2` | `None` | | `None` | `None` | `None` | Args: **kwargs (keyword arguments): Standard keyword arguments for the module. """ def __init__(self, **kwargs): super().__init__(**kwargs) async def call(self, inputs, training=False): output = inputs[0] for i in range(1, len(inputs)): output = await ops.logical_and( output, inputs[i], name=f"module_and_{i}_" + self.name, ) return output ``` ## `Concat` Bases: `Module` Perform a concatenation operation. It takes as input a list of data models, and returns a concatenation of them. If any input is None, an exception is raised. Table: | `x1` | `x2` | Concat (`+`) | | ------ | ------ | ------------ | | `x1` | `x2` | `x1 + x2` | | `x1` | `None` | `Exception` | | `None` | `x2` | `Exception` | | `None` | `None` | `Exception` | Parameters: | Name | Type | Description | Default | | ---------- | ------------------- | ------------------------------------------ | ------- | | `**kwargs` | `keyword arguments` | Standard keyword arguments for the module.
| `{}` | Source code in `synalinks/src/modules/merging/concat.py` ``` @synalinks_export( [ "synalinks.Concat", "synalinks.Concatenate", "synalinks.modules.Concat", "synalinks.modules.Concatenate", ] ) class Concat(Module): """Perform a concatenation operation. It takes as input a list of data models, and returns a concatenation of them. If any input is None, an exception is raised. Table: | `x1` | `x2` | Concat (`+`) | | ------ | ------ | ----------------- | | `x1` | `x2` | `x1 + x2` | | `x1` | `None` | `Exception` | | `None` | `x2` | `Exception` | | `None` | `None` | `Exception` | Args: **kwargs (keyword arguments): Standard keyword arguments for the module. """ def __init__(self, **kwargs): super().__init__(**kwargs) async def call(self, inputs, training=False): output = inputs[0] for i in range(1, len(inputs)): output = await ops.concat( output, inputs[i], name=f"module_concat_{i}_" + self.name, ) return output ``` ## `Or` Bases: `Module` Perform a logical Or operation. It takes as input a list of data models, and returns a concatenation of them (if all are provided); otherwise it outputs the one that is not None. If any input is None, it is ignored. Table: | `x1` | `x2` | Logical Or (`\|`) | | ------ | ------ | ---------------- | | `x1` | `x2` | `x1 + x2` | | `x1` | `None` | `x1` | | `None` | `x2` | `x2` | | `None` | `None` | `None` | Parameters: | Name | Type | Description | Default | | ---------- | ------------------- | ------------------------------------------ | ------- | | `**kwargs` | `keyword arguments` | Standard keyword arguments for the module. | `{}` | Source code in `synalinks/src/modules/merging/logical_or.py` ``` @synalinks_export( [ "synalinks.Or", "synalinks.modules.Or", ] ) class Or(Module): """Perform a logical Or operation. It takes as input a list of data models, and returns a concatenation of them (if all are provided); otherwise it outputs the one that is not None. If any input is None, it is ignored.
Table: | `x1` | `x2` | Logical Or (`|`) | | ------ | ------ | ---------------- | | `x1` | `x2` | `x1 + x2` | | `x1` | `None` | `x1` | | `None` | `x2` | `x2` | | `None` | `None` | `None` | Args: **kwargs (keyword arguments): Standard keyword arguments for the module. """ def __init__(self, **kwargs): super().__init__(**kwargs) async def call(self, inputs, training=False): output = inputs[0] for i in range(1, len(inputs)): output = await ops.logical_or( output, inputs[i], name=f"module_or_{i}_" + self.name, ) return output ``` ## `Xor` Bases: `Module` Perform a logical Xor operation. It takes as input a list of data models. If more than one data model is not None, it outputs None; otherwise it outputs the one that is not None. Table: | `x1` | `x2` | Logical Xor (`^`) | | ------ | ------ | ----------------- | | `x1` | `x2` | `None` | | `x1` | `None` | `x1` | | `None` | `x2` | `x2` | | `None` | `None` | `None` | Parameters: | Name | Type | Description | Default | | ---------- | ------------------- | ------------------------------------------ | ------- | | `**kwargs` | `keyword arguments` | Standard keyword arguments for the module. | `{}` | Source code in `synalinks/src/modules/merging/logical_xor.py` ``` @synalinks_export( [ "synalinks.Xor", "synalinks.modules.Xor", ] ) class Xor(Module): """Perform a logical Xor operation. It takes as input a list of data models. If more than one data model is not None, it outputs None; otherwise it outputs the one that is not None. Table: | `x1` | `x2` | Logical Xor (`^`) | | ------ | ------ | ----------------- | | `x1` | `x2` | `None` | | `x1` | `None` | `x1` | | `None` | `x2` | `x2` | | `None` | `None` | `None` | Args: **kwargs (keyword arguments): Standard keyword arguments for the module.
""" def __init__(self, **kwargs): super().__init__(**kwargs) async def compute_output_spec(self, inputs, training=False): return inputs[0].clone() async def call(self, inputs, training=False): output = inputs[0] for i in range(1, len(inputs)): if inputs[i]: if not output: output = inputs[i] else: return None return output.clone(name=self.name) ``` ## `PythonScript` Bases: `Trainable` The python code to transform a JSON object into another JSON object. The script is executed inside the Monty (https://github.com/pydantic/monty) sandboxed Python interpreter, which implements only a subset of Python. Scripts must observe the following constraints: - The input JSON object is exposed as a dict named `inputs`; the script must assign the output JSON object to a variable named `result` before it ends. - Only this subset of the standard library is importable: `sys`, `os`, `typing`, `asyncio`, `re`, `datetime`, `json`, `math`, `pathlib`. Notably, `time`, `random`, `itertools`, `collections`, `functools` and the rest of the stdlib are **not** available. - No third-party libraries can be imported (e.g. `numpy`, `pandas`, `pydantic`). - `class` definitions and `match` statements are not supported; use functions and `if`/`elif` chains instead. - The host filesystem, environment variables and network are not reachable from the script. `os`, `sys` and `pathlib` import but their dangerous surface is pruned or gated: `open()`, `os.system`, `os.listdir`, `os.environ`, `os.path`, `sys.argv` and `Path.read_text` are all unavailable. - `asyncio` is also a stub: only `asyncio.run` and `asyncio.gather` are exposed. There is no `asyncio.sleep`, `wait_for`, `Future`, `create_task` or `TaskGroup`, and no time primitives of any kind (`time` is not importable either). - Tools bound to the module are exposed as **global async callables** under their tool name. They must be awaited inside an `async def` and driven with `asyncio.run(...)`. 
Every tool call returns a **dict**: a tool wrapping `async def f(x) -> int` yields `{"result": <int>}`, while a tool that already returns a dict yields that dict directly. For example, with a bound tool `web_search`: ``` import asyncio async def main(): hits = await web_search(query=inputs.get("q")) # hits is a dict: index the field you need return {"answer": hits["results"][0]["title"]} result = asyncio.run(main()) ``` Independent tool calls can be fanned out with `asyncio.gather`. Calling a tool without `await` returns a coroutine object, not the real value. - Execution is bounded by the module's `timeout` and by Monty's memory limits; long-running or allocation-heavy scripts will be aborted. Source code in `synalinks/src/modules/synthesis/python_synthesis.py` ```` class PythonScript(Trainable): """The python code to transform a JSON object into another JSON object. The script is executed inside the Monty (https://github.com/pydantic/monty) sandboxed Python interpreter, which implements only a subset of Python. Scripts must observe the following constraints: - The input JSON object is exposed as a dict named ``inputs``; the script must assign the output JSON object to a variable named ``result`` before it ends. - Only this subset of the standard library is importable: ``sys``, ``os``, ``typing``, ``asyncio``, ``re``, ``datetime``, ``json``, ``math``, ``pathlib``. Notably, ``time``, ``random``, ``itertools``, ``collections``, ``functools`` and the rest of the stdlib are **not** available. - No third-party libraries can be imported (e.g. ``numpy``, ``pandas``, ``pydantic``). - ``class`` definitions and ``match`` statements are not supported; use functions and ``if``/``elif`` chains instead. - The host filesystem, environment variables and network are not reachable from the script.
``os``, ``sys`` and ``pathlib`` import but their dangerous surface is pruned or gated: ``open()``, ``os.system``, ``os.listdir``, ``os.environ``, ``os.path``, ``sys.argv`` and ``Path.read_text`` are all unavailable. - ``asyncio`` is also a stub: only ``asyncio.run`` and ``asyncio.gather`` are exposed. There is no ``asyncio.sleep``, ``wait_for``, ``Future``, ``create_task`` or ``TaskGroup``, and no time primitives of any kind (``time`` is not importable either). - Tools bound to the module are exposed as **global async callables** under their tool name. They must be awaited inside an ``async def`` and driven with ``asyncio.run(...)``. Every tool call returns a **dict**: a tool wrapping ``async def f(x) -> int`` yields ``{"result": <int>}``, while a tool that already returns a dict yields that dict directly. For example, with a bound tool ``web_search``: ```python import asyncio async def main(): hits = await web_search(query=inputs.get("q")) # hits is a dict: index the field you need return {"answer": hits["results"][0]["title"]} result = asyncio.run(main()) ``` Independent tool calls can be fanned out with ``asyncio.gather``. Calling a tool without ``await`` returns a coroutine object, not the real value. - Execution is bounded by the module's ``timeout`` and by Monty's memory limits; long-running or allocation-heavy scripts will be aborted. """ python_script: str = Field( description=( "A Python script that transforms a JSON input into a JSON " "output. The script reads the input from a dict named " "`inputs` and must assign the output dict to a variable " "named `result` before it ends. Exact language and stdlib " "constraints depend on the active sandbox." ), ) ```` ## `PythonSynthesis` Bases: `Module` A Python code transformation on JSON data. The script runs inside the Monty (https://github.com/pydantic/monty) sandboxed Python interpreter: the host filesystem, environment and network are unreachable from the script.
Monty only supports a subset of Python (no third-party libraries, limited standard library, no class or match statements), so the generated script must stay within what Monty can execute. This module features Python code as a trainable variable, allowing the optimizers to refine the code during the training loop based on iterative feedback and automatic selection of the best script. This module works **ONLY** with advanced optimizers (**NOT** the `RandomFewShot` optimizer). The module executes the entire Python script and expects the result to be stored in a variable named 'result' at the end of execution. Example: ``` import synalinks import asyncio default_python_script = \ """ def transform(inputs): # TODO implement the code to transform the input grid into the output grid return {"output_grid": inputs.get("input_grid")} result = transform(inputs) """ async def main(): inputs = synalinks.Input( data_model=synalinks.datasets.arcagi.get_input_data_model(), ) outputs = await synalinks.PythonSynthesis( data_model=synalinks.datasets.arcagi.get_output_data_model(), python_script=default_python_script, default_return_value={"output_grid": [[]]}, )(inputs) program = synalinks.Program( inputs=inputs, outputs=outputs, name="python_script_synthesis", description="A program to solve ARCAGI with python code", ) ``` If you want to explore the future of neuro-symbolic self-evolving systems, contact us. While these systems are not "hard" to code thanks to Synalinks, they require technical knowledge and a deep understanding of multiple AI paradigms.
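The `inputs`/`result` contract described above can be tried out without Synalinks. The sketch below is only an illustration of that contract: it uses Python's built-in `exec` in place of the Monty sandbox, so it offers none of the isolation, timeout, or language restrictions, and the helper name `run_script` is hypothetical rather than part of the library. It also assumes that `default_return_value` is what comes back when the script fails or never assigns `result`.

```python
import copy


def run_script(python_script, inputs, default_return_value=None):
    """Hypothetical helper: run a script against `inputs` and read back `result`.

    Plain exec() stands in for the sandbox here, so never feed it untrusted code.
    """
    # The script sees the input JSON object as a dict named `inputs`.
    namespace = {"inputs": copy.deepcopy(inputs)}
    try:
        exec(python_script, namespace)
    except Exception:
        # Assumption: a failing script falls back to the default value.
        return default_return_value
    result = namespace.get("result")
    if not isinstance(result, dict):
        # Assumption: a missing or non-dict `result` also falls back.
        return default_return_value
    return result


default_python_script = """
def transform(inputs):
    # Identity transform: copy the input grid to the output grid.
    return {"output_grid": inputs.get("input_grid")}

result = transform(inputs)
"""

outputs = run_script(
    default_python_script,
    {"input_grid": [[0, 1], [1, 0]]},
    default_return_value={"output_grid": [[]]},
)
print(outputs)
```

Because the whole script is executed and only the final value of `result` is read, helper functions such as `transform` are optional; a script may just as well assign `result` directly.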
Parameters: | Name | Type | Description | Default | | ---------------------- | ----------- | ----------- | ------- | | `schema` | `dict` | The target JSON schema. If not provided, use the data_model to infer it. | `None` | | `data_model` | `DataModel \| SymbolicDataModel \| JsonDataModel` | The target data model for structured output. | `None` | | `python_script` | `str` | The default Python script. | `None` | | `seed_scripts` | `list` | Optional. A list of Python scripts to use as seed for the evolution. If not provided, create a seed from the default configuration. | `None` | | `default_return_value` | `dict` | Default return value. | `None` | | `return_python_script` | `bool` | Whether or not to return the python script for evaluation. (Default to False). | `False` | | `timeout` | `int` | Maximum execution time in seconds. (Default 5 seconds). | `5` | | `tools` | `list` | Optional. A list of Tool (or MCP tools) exposed to the script as global async callables. Because Tools are async, scripts must call them inside an async def and await them (see the PythonScript docs). Passing None or an empty list means no tools are bound. Naming gotcha: each tool is registered inside the sandbox under tool.name, which is tool.\_func.__name__. So Tool(\_my_helper) registers as \_my_helper (underscore preserved) and the script must call await \_my_helper(...).
Name your tool functions exactly as you want them to appear inside the generated script — rename the function, don't rely on an alias. | `None` | | `sandbox` | `Sandbox` | Optional. A pre-built Sandbox instance to reuse across calls. When supplied, the module will not build its own sandbox at call() time and sandbox_type is derived from type(sandbox). Pass this when the caller owns the sandbox lifecycle and state (variables, imports, function defs) must persist across successive calls — useful at training time when candidate scripts share cached state. When omitted, a fresh sandbox of sandbox_type is built per call. | `None` | | `sandbox_type` | `type` | Optional. The Sandbox subclass used to build a fresh sandbox per call when no sandbox is injected. Defaults to MontySandbox, or to type(sandbox) when sandbox is given. Any Sandbox subclass whose __init__ accepts (timeout=..., name=...) works; register custom subclasses with @register_synalinks_serializable so they round-trip through get_config / from_config. | `None` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | `call` also accepts an optional `sandbox` kwarg. The resolution order is: per-call kwarg > constructor-supplied `sandbox` > a fresh sandbox of `sandbox_type`. The first two cases let the caller keep sandbox state alive across calls; the third is the stateless-per-call default. Source code in `synalinks/src/modules/synthesis/python_synthesis.py` ```` @synalinks_export( [ "synalinks.modules.PythonSynthesis", "synalinks.PythonSynthesis", ] ) class PythonSynthesis(Module): """A code Python code transformation on JSON data. The script runs inside the `Monty `_ sandboxed Python interpreter: the host filesystem, environment and network are unreachable from the script. 
Monty only supports a subset of Python (no third-party libraries, limited standard library, no class or match statements), so the generated script must stay within what Monty can execute. This module features a python code as trainable variable, allowing the optimizers to refine the code during the training loop based on iterative feedback and automatic selection of the best script. This module works **ONLY** with advanced optimizers (**NOT** the `RandomFewShot` optimizer). The module executes the entire Python script and expects the result to be stored in a variable named 'result' at the end of execution. Example: ```python import synalinks import asyncio default_python_script = \\ \"\"\" def transform(inputs): # TODO implement the code to transform the input grid into the output grid return {"output_grid": inputs.get("input_grid")} result = transform(inputs) \"\"\" async def main(): inputs = synalinks.Input( data_model=synalinks.datasets.arcagi.get_input_data_model(), ) outputs = await synalinks.PythonSynthesis( data_model=synalinks.datasets.arcagi.get_output_data_model() python_script=default_python_script, default_return_value={"output_grid": [[]]}, )(inputs) program = synalinks.Program( inputs=inputs, outputs=outputs, name="python_script_synthesis", description="A program to solve ARCAGI with python code", ) ``` If you want to explore the future of neuro-symbolic self-evolving systems, contact us. While these systems are not "hard" to code thanks to Synalinks, they requires technical knowledge and a deep understanding of multiple AI paradigm. Args: schema (dict): The target JSON schema. If not provided use the `data_model` to infer it. data_model (DataModel | SymbolicDataModel | JsonDataModel): The target data model for structured output. python_script (str): The default Python script. seed_scripts (list): Optional. A list of Python scripts to use as seed for the evolution. If not provided, create a seed from the default configuration. 
default_return_value (dict): Default return value. return_python_script (bool): Wether or not to return the python script for evaluation. (Default to False). timeout (int): Maximum execution time in seconds. (Default 5 seconds). tools (list): Optional. A list of `Tool` (or MCP tools) exposed to the script as global async callables. Because `Tool`s are async, scripts must call them inside an `async def` and `await` them (see the ``PythonScript`` docs). Passing `None` or an empty list means no tools are bound. **Naming gotcha**: each tool is registered inside the sandbox under ``tool.name``, which is ``tool._func.__name__``. So ``Tool(_my_helper)`` registers as ``_my_helper`` (underscore preserved) and the script must call ``await _my_helper(...)``. Name your tool functions exactly as you want them to appear inside the generated script — rename the function, don't rely on an alias. sandbox (Sandbox): Optional. A pre-built ``Sandbox`` instance to reuse across calls. When supplied, the module will not build its own sandbox at ``call()`` time and ``sandbox_type`` is derived from ``type(sandbox)``. Pass this when the caller owns the sandbox lifecycle and state (variables, imports, function defs) must persist across successive calls — useful at training time when candidate scripts share cached state. When omitted, a fresh sandbox of ``sandbox_type`` is built per call. sandbox_type (type): Optional. The ``Sandbox`` subclass used to build a fresh sandbox per call when no ``sandbox`` is injected. Defaults to ``MontySandbox``, or to ``type(sandbox)`` when ``sandbox`` is given. Any ``Sandbox`` subclass whose ``__init__`` accepts ``(timeout=..., name=...)`` works; register custom subclasses with ``@register_synalinks_serializable`` so they round-trip through ``get_config`` / ``from_config``. name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. 
``call`` also accepts an optional ``sandbox`` kwarg. The resolution order is: per-call kwarg > constructor-supplied ``sandbox`` > a fresh sandbox of ``sandbox_type``. The first two cases let the caller keep sandbox state alive across calls; the third is the stateless-per-call default. """ def __init__( self, *, schema=None, data_model=None, python_script=None, seed_scripts=None, default_return_value=None, return_python_script=False, timeout=5, tools=None, sandbox=None, sandbox_type=None, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) if not schema and data_model: schema = data_model.get_schema() self.schema = schema if not python_script: raise ValueError("You should provide the `python_script` argument") self.python_script = python_script if not default_return_value: raise ValueError("You should provide the `default_return_value` argument") try: jsonschema.validate(default_return_value, self.schema) except ValidationError as e: raise ValueError( f"`default_return_value` parameter does not conform to schema: {e}" ) self.default_return_value = default_return_value self.return_python_script = return_python_script self.timeout = timeout self.tools = {} if tools: for tool in tools: self.tools[tool.name] = tool # Sandbox handling mirrors RecursiveLanguageModelAgent: if a # concrete sandbox is supplied at construction, reuse it across # calls and derive `sandbox_type` from its class. Otherwise fall # back to `sandbox_type` (default MontySandbox) and build one # fresh per `call()`. 
self.sandbox = sandbox if sandbox is not None: self.sandbox_type = type(sandbox) else: self.sandbox_type = sandbox_type or MontySandbox if not seed_scripts: seed_scripts = [] self.seed_scripts = seed_scripts seed_candidates = [ {"python_script": seed_script} for seed_script in self.seed_scripts ] self.state = self.add_variable( initializer=PythonScript( python_script=self.python_script, seed_candidates=seed_candidates, ).get_json(), data_model=PythonScript, name="state_" + self.name, ) async def execute(self, inputs, python_script, sandbox=None): """Execute the Python script in the sandbox with a timeout.""" return await _run_script( python_script, inputs.get_json(), self.schema, self.timeout, self.tools, sandbox=sandbox, sandbox_type=self.sandbox_type, ) async def call(self, inputs, training=False, sandbox=None): if not inputs: return None python_script = self.state.get("python_script") # Sandbox resolution order: per-call kwarg > constructor-supplied # sandbox > fresh sandbox of `sandbox_type` (built inside # `_run_script` when `sandbox` is still None). 
if sandbox is None: sandbox = self.sandbox result, stdout, stderr = await self.execute( inputs, python_script, sandbox=sandbox ) if training: predictions = self.state.get("current_predictions") if result: if self.return_python_script: predictions.append( { "inputs": { **inputs.get_json(), }, "outputs": { "python_script": python_script, **result, "stdout": stdout, "stderr": stderr, }, "reward": None, } ) else: predictions.append( { "inputs": { **inputs.get_json(), }, "outputs": { **result, "stdout": stdout, "stderr": stderr, }, "reward": None, } ) else: if self.return_python_script: predictions.append( { "inputs": { **inputs.get_json(), }, "outputs": { "python_script": python_script, "stdout": stdout, "stderr": stderr, }, "reward": None, } ) else: predictions.append( { "inputs": { **inputs.get_json(), }, "outputs": { "stdout": stdout, "stderr": stderr, }, "reward": None, } ) if result: if self.return_python_script: return JsonDataModel( json={ "python_script": python_script, **result, "stdout": stdout, "stderr": stderr, }, schema=self.schema, name=self.name, ) else: return JsonDataModel( json={ **result, "stdout": stdout, "stderr": stderr, }, schema=self.schema, name=self.name, ) else: if self.return_python_script: return JsonDataModel( json={ "python_script": python_script, **self.default_return_value, "stdout": stdout, "stderr": stderr, }, schema=self.schema, name=self.name, ) else: return JsonDataModel( json={ **self.default_return_value, "stdout": stdout, "stderr": stderr, }, schema=self.schema, name=self.name, ) async def compute_output_spec(self, inputs, training=False, sandbox=None): if self.return_python_script: return await ops.concat( await ops.out_mask( PythonScript.to_symbolic_data_model(), mask=list(Trainable.keys()), name="python_script_masked_" + self.name, ), await ops.concat( SymbolicDataModel(schema=self.schema), PythonConsoleLog, name="python_logs_" + self.name, ), name=self.name, ) else: return await ops.concat( 
SymbolicDataModel(schema=self.schema), PythonConsoleLog, name=self.name, ) def get_config(self): config = { "schema": self.schema, "python_script": self.python_script, "seed_scripts": self.seed_scripts, "default_return_value": self.default_return_value, "return_python_script": self.return_python_script, "timeout": self.timeout, "sandbox_type": get_registered_name(self.sandbox_type), "name": self.name, "description": self.description, "trainable": self.trainable, } sandbox_config = { "sandbox": ( serialization_lib.serialize_synalinks_object(self.sandbox) if self.sandbox is not None else None ) } tools_config = { "tools": [ serialization_lib.serialize_synalinks_object(tool) for tool in self.tools.values() ] } return {**config, **sandbox_config, **tools_config} @classmethod def from_config(cls, config): tools = [ serialization_lib.deserialize_synalinks_object(tool) for tool in config.pop("tools", []) ] sandbox = None if "sandbox" in config: sandbox_serialized = config.pop("sandbox") if sandbox_serialized is not None: sandbox = serialization_lib.deserialize_synalinks_object( sandbox_serialized ) sandbox_type_name = config.pop("sandbox_type", None) sandbox_type = ( get_registered_object(sandbox_type_name) if sandbox_type_name else None ) return cls( tools=tools or None, sandbox=sandbox, sandbox_type=sandbox_type, **config, ) ```` ### `execute(inputs, python_script, sandbox=None)` Execute the Python script in the sandbox with a timeout. 
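The `sandbox` argument follows the resolution order documented for `call`: per-call kwarg, then the constructor-supplied sandbox, then a fresh sandbox of `sandbox_type`. The sketch below shows only that precedence. `FakeSandbox` and `resolve_sandbox` are hypothetical names introduced for illustration; the only detail taken from the documentation is that a sandbox class accepts `timeout=` and `name=` in its constructor.

```python
class FakeSandbox:
    """Hypothetical stand-in for a Sandbox subclass (illustration only)."""

    def __init__(self, timeout=5, name=None):
        self.timeout = timeout
        self.name = name


def resolve_sandbox(call_sandbox, constructor_sandbox, sandbox_type, timeout=5):
    """Pick the sandbox to run in, following the documented precedence."""
    if call_sandbox is not None:
        # 1. A sandbox passed to the call wins; its state is owned by the caller.
        return call_sandbox
    if constructor_sandbox is not None:
        # 2. Otherwise reuse the sandbox given at construction time.
        return constructor_sandbox
    # 3. Otherwise build a fresh, stateless sandbox for this call.
    return sandbox_type(timeout=timeout, name="fresh")


shared = FakeSandbox(name="shared")
per_call = FakeSandbox(name="per_call")

chosen_per_call = resolve_sandbox(per_call, shared, FakeSandbox)
chosen_shared = resolve_sandbox(None, shared, FakeSandbox)
chosen_fresh = resolve_sandbox(None, None, FakeSandbox)
```

Only the first two cases keep state alive across calls; the third builds a new object every time.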
Source code in `synalinks/src/modules/synthesis/python_synthesis.py`

```
async def execute(self, inputs, python_script, sandbox=None):
    """Execute the Python script in the sandbox with a timeout."""
    return await _run_script(
        python_script,
        inputs.get_json(),
        self.schema,
        self.timeout,
        self.tools,
        sandbox=sandbox,
        sandbox_type=self.sandbox_type,
    )
```

## `SequentialPlan`

Bases: `Trainable`

The sequential step by step plan to achieve the task

Source code in `synalinks/src/modules/synthesis/sequential_plan_synthesis.py`

```
class SequentialPlan(Trainable):
    """The sequential step by step plan to achieve the task"""

    steps: List[str] = Field(
        description="The list of steps",
    )
```

## `SequentialPlanSynthesis`

Bases: `Module`

A module that executes a sequential plan of steps.

This module features a sequential plan as a trainable variable, allowing optimizers to refine the plan during the training loop based on iterative feedback. In effect, the module learns to plan through iterative feedback and automatic selection of the best plan.

The module executes each step in the plan sequentially, passing the output of each step as input to the next step.

The runner is responsible for executing each individual step. The most common runners are a `FunctionCallingAgent`, `ChainOfThought` or `Generator` module, but you can use any `Module` or `Program`.

This module starts by default without any plan, so it is equivalent to a single runner call.

This module works **ONLY** with advanced optimizers (**NOT** the `RandomFewShot` optimizer).

**Note**: The inputs are forwarded to the runner each time by concatenating the inputs with the outputs of the previous steps. So **ensure that the runner doesn't return the inputs**: use `return_inputs=False` or `return_inputs_with_trajectory=False` when configuring your runner.
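The forwarding behaviour described in the note can be sketched with plain dictionaries. This is a simplified illustration, not the module's code: `run_plan` and `echo_runner` are hypothetical names, and the real module concatenates data models rather than building dicts. It shows why the runner should return only its own output: the original inputs are already re-sent on every step.

```python
import asyncio


async def run_plan(inputs, steps, runner):
    """Call `runner` once per step with the inputs plus all earlier outputs."""
    previous_outputs = []
    # With an empty plan, this degrades to a single runner call.
    for step in steps or [None]:
        runner_inputs = {
            **inputs,
            "previous_steps": list(previous_outputs),
            "step": step,
        }
        output = await runner(runner_inputs)
        previous_outputs.append(output)
    return previous_outputs


async def echo_runner(runner_inputs):
    """Hypothetical runner: returns only its own output, never the inputs."""
    done = len(runner_inputs["previous_steps"])
    return {"summary": f"{runner_inputs['step']} (after {done} earlier steps)"}


outputs = asyncio.run(
    run_plan({"query": "What is Synalinks?"}, ["search", "summarize"], echo_runner)
)
```

If `echo_runner` echoed its inputs back, every entry in `previous_steps` would carry a copy of the query and of all earlier outputs, and the payload would grow with each step.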
Example:

```
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class FinalReport(synalinks.DataModel):
    report: str = synalinks.Field(
        description="The final report",
    )

class TaskSummary(synalinks.DataModel):
    summary: str = synalinks.Field(
        description="The summary of the executed task",
    )

async def main():
    # Any supported model works; this one is only an example
    language_model = synalinks.LanguageModel(
        model="ollama/mistral",
    )

    tools = [...]  # ... tools definition (see `FunctionCallingAgent`)

    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.SequentialPlanSynthesis(
        data_model=FinalReport,
        language_model=language_model,
        runner=synalinks.FunctionCallingAgent(
            data_model=TaskSummary,
            language_model=language_model,
            tools=tools,
            return_inputs_with_trajectory=False,
        ),
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="planner_agent",
        description="An agent that learns a step by step plan to achieve a task",
    )
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `schema` | `dict` | The target JSON schema. If not provided, it is inferred from `data_model`. | `None` |
| `data_model` | `DataModel \| SymbolicDataModel \| JsonDataModel` | The target data model for structured output. | `None` |
| `language_model` | `LanguageModel` | The language model to use. | `None` |
| `steps` | `list` | Optional. The default list of steps, given as a list of strings. | `None` |
| `seed_steps` | `list` | Optional. A list of steps to use as seed for the optimization. If not provided, use the default steps as seed. | `None` |
| `runner` | `Module \| Program` | Required. The runner that executes each step. | `None` |
| `return_inputs` | `bool` | Optional. Whether or not to concatenate the inputs to the outputs (Default to True). | `True` |
| `reasoning_effort` | `string` | Optional.
The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). | `None` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | Source code in `synalinks/src/modules/synthesis/sequential_plan_synthesis.py` ```` class SequentialPlanSynthesis(Module): """A module that executes a sequential plan of steps. This module features a sequential plan as a trainable variable, allowing optimizers to refine the plan during the training loop based on iterative feedback. Basically learning to plan based on iterative feedback and automatic selection of the best plan. The module executes each step in the plan sequentially, passing the output of each step as input to the next step. The runner is responsible for executing each individual step. The most common runners are usually a `FunctionCallingAgent`, `ChainOfThought` or `Generator` module, but you can use any Module or Program. This module start by defaut without any plan, so it is equivalent to a single runner call. This module works **ONLY** with advanced optimizers (**NOT** the `RandomFewShot` optimizer). **Note**: The inputs are forwarded to the runner each time by concatenating the inputs with the previous steps outputs. So **ensure that the runner doesn't returns the inputs**, use `return_inputs=False` or `return_inputs_with_trajectory=False` when configuring your runner. Example: ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class FinalReport(synalinks.DataModel): report: str = synalinks.Field( description="The final report", ) class TaskSummary(synalinks.DataModel): summary: str = synalinks.Field( description="The summary of the executed task", ) async def main(): tools = # ... 
tools definition (see `FunctionCallingAgent`) inputs = synalinks.Input(data_model=Query) outputs = await synalinks.SequentialPlanSynthesis( data_model=FinalReport, language_model=language_model, runner=synalinks.FunctionCallingAgent( data_model=TaskSummary, language_model=language_model, tools=tools, return_inputs_with_trajectory=False, ), )(inputs) program = synalinks.Program( inputs=inputs, outputs=outputs, name="planner_agent", description="An agent that learn a step by step plan to achieve a task", ) ``` Args: schema (dict): The target JSON schema. If not provided use the `data_model` to infer it. data_model (DataModel | SymbolicDataModel | JsonDataModel): The target data model for structured output. language_model (LanguageModel): The language model to use. steps (list): Optional. The default list of steps being a list of strings. seed_steps (list): Optional. A list of steps to use as seed for the optimization. If not provided, use the default steps as seed. runner (Module | Program): Required. The runner that executes each step. return_inputs (bool): Optional. Whether or not to concatenate the inputs to the outputs (Default to False). reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. 
""" def __init__( self, *, schema=None, data_model=None, language_model=None, steps=None, seed_steps=None, runner=None, return_inputs=True, reasoning_effort=None, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) if not schema and data_model: schema = data_model.get_schema() self.schema = schema if not steps: steps = [] self.steps = steps if not seed_steps: seed_steps = [[]] self.seed_steps = seed_steps if not runner: raise ValueError("The `runner` parameter is required.") if not isinstance(runner, Module): raise ValueError("The `runner` parameter should be a `Module` or `Program`.") self.language_model = _get_lm(language_model) self.runner = runner self.return_inputs = return_inputs self.reasoning_effort = reasoning_effort self.state = self.add_variable( initializer=SequentialPlan( steps=self.steps, seed_candidates=self.seed_steps, ).get_json(), data_model=SequentialPlan, name="state" + self.name, ) self.final_generator = ChainOfThought( schema=self.schema, language_model=self.language_model, return_inputs=self.return_inputs, reasoning_effort=self.reasoning_effort, name="final_generator_" + self.name, ) async def call(self, inputs, training=False): if not inputs: return None steps = self.state.get("steps") previous_steps = None if steps: for i, step in enumerate(steps): step_result = await self.runner(inputs, training=training) if not previous_steps: previous_steps = step_result else: previous_steps = await ops.concat( previous_steps, step_result, name=+f"step_{i}_with_inputs" + self.name, ) inputs = await ops.concat( inputs, await ops.concat( previous_steps, Step(step=step), name=f"step_{i}_" + self.name, ), name=f"step_{i}_with_inputs_" + self.name, ) else: result = await self.runner(inputs, training=training) inputs = await ops.concat( inputs, result, name="with_inputs_" + self.name, ) return await self.final_generator(inputs, training=training) async def compute_output_spec(self, 
inputs, training=False):
        _ = await self.runner(inputs)
        return await self.final_generator(inputs)

    def get_config(self):
        config = {
            "schema": self.schema,
            "steps": self.steps,
            "seed_steps": self.seed_steps,
            "return_inputs": self.return_inputs,
            "reasoning_effort": self.reasoning_effort,
            "name": self.name,
            "description": self.description,
            "trainable": self.trainable,
        }
        language_model_config = {
            "language_model": serialization_lib.serialize_synalinks_object(
                self.language_model,
            )
        }
        runner_config = {
            "runner": serialization_lib.serialize_synalinks_object(
                self.runner,
            )
        }
        return {
            **config,
            **language_model_config,
            **runner_config,
        }

    @classmethod
    def from_config(cls, config):
        language_model = serialization_lib.deserialize_synalinks_object(
            config.pop("language_model"),
        )
        runner = serialization_lib.deserialize_synalinks_object(
            config.pop("runner"),
        )
        return cls(
            language_model=language_model,
            runner=runner,
            **config,
        )
````

## `Step`

Bases: `DataModel`

The individual step to execute

Source code in `synalinks/src/modules/synthesis/sequential_plan_synthesis.py`

```
class Step(DataModel):
    """The individual step to execute"""

    step: str = Field(
        description="The step to execute",
    )
```

## `ChainOfThought`

Bases: `Module`

Useful to answer in a step by step manner.

This component concatenates a thinking field to your data model/schema and generates a prediction, allowing the LM to think step by step before answering.

By default the `reasoning_effort` is set to 'low', which uses the model's internal reasoning capabilities (extended thinking) to populate the thinking field.
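To make "concatenates a thinking field to your schema" concrete, here is a minimal sketch on a raw JSON schema. It is an assumption-laden illustration: the library performs this through its own data model operators, while the helper `with_thinking_field`, the field description, and the exact merge rules below are hypothetical.

```python
import copy


def with_thinking_field(schema):
    """Return a copy of an object schema with a `thinking` string field first.

    Hypothetical helper: the field description and merge rules are assumptions.
    """
    merged = copy.deepcopy(schema)
    properties = {
        "thinking": {
            "type": "string",
            "description": "Your step by step thinking",
        }
    }
    # Keep the thinking field first so it is produced before the answer.
    properties.update(merged.get("properties", {}))
    merged["properties"] = properties
    merged["required"] = ["thinking"] + [
        key for key in merged.get("required", []) if key != "thinking"
    ]
    return merged


answer_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string", "description": "The correct answer"},
    },
    "required": ["answer"],
}

merged_schema = with_thinking_field(answer_schema)
```

Placing the thinking field before the answer field matters for structured generation, since fields are produced in order and the reasoning should come before the final answer.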
Example:

```
import synalinks
import asyncio

class Query(synalinks.DataModel):
    query: str = synalinks.Field(
        description="The user query",
    )

class Answer(synalinks.DataModel):
    answer: str = synalinks.Field(
        description="The correct answer",
    )

async def main():
    language_model = synalinks.LanguageModel(
        model="anthropic/claude-3-7-sonnet-20250219",
    )

    x0 = synalinks.Input(data_model=Query)
    x1 = await synalinks.ChainOfThought(
        data_model=Answer,
        language_model=language_model,
    )(x0)

    program = synalinks.Program(
        inputs=x0,
        outputs=x1,
        name="answer_with_chain_of_thought",
        description="Useful to answer step by step",
    )

if __name__ == "__main__":
    asyncio.run(main())
```

References

- [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `schema` | `dict` | The target JSON schema. If not provided, it is inferred from `data_model`. | `None` |
| `data_model` | `DataModel \| SymbolicDataModel \| JsonDataModel` | The target data model. | `None` |
| `language_model` | `LanguageModel` | The language model to use. | `None` |
| `prompt_template` | `str` | The jinja2 prompt template (see `Generator`). | `None` |
| `examples` | `list` | The default list of examples, given as a list of tuples containing input/output JSON pairs. | `None` |
| `instructions` | `str` | The default instructions, given as a string containing instructions for the language model. | `None` |
| `seed_instructions` | `list` | Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed.
| `None` | | `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` | | `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none']. (Default to 'low'). If reasoning effort is none or disabled, a thinking field is automatically added to the output data model. Otherwise, the thinking field is automatically populated by the model's reasoning content. | `None` | | `use_inputs_schema` | `bool` | Optional. Whether or not use the inputs schema in the prompt (Default to False) (see Generator). | `False` | | `use_outputs_schema` | `bool` | Optional. Whether or not use the outputs schema in the prompt (Default to False) (see Generator). | `False` | | `return_inputs` | `bool` | Optional. Whether or not to concatenate the inputs to the outputs (Default to False) (see Generator). | `False` | | `streaming` | `bool` | Optional. If true, stream the LM response. Only takes effect when data_model/schema is None (Default to False). | `False` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | Source code in `synalinks/src/modules/ttc/chain_of_thought.py` ```` @synalinks_export( [ "synalinks.modules.ChainOfThought", "synalinks.ChainOfThought", ] ) class ChainOfThought(Module): """Useful to answer in a step by step manner. This component concatenate a thinking field to your data model/schema and generate a prediction allowing the LM to think step by step before answering. By default the reasoning_effort is set to 'low' which uses the model's internal reasoning capabilities (extended thinking) to populate the thinking field. 
Example: ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class Answer(synalinks.DataModel): answer: str = synalinks.Field( description="The correct answer", ) async def main(): language_model = synalinks.LanguageModel( model="anthropic/claude-3-7-sonnet-20250219", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.ChainOfThought( data_model=Answer, language_model=language_model, )(x0) program = synalinks.Program( inputs=x0, outputs=x1, name="answer_with_chain_of_thought", description="Useful to answer step by step", ) if __name__ == "__main__": asyncio.run(main()) ``` References: - [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903) Args: schema (dict): The target JSON schema. If not provided use the `data_model` to infer it. data_model (DataModel | SymbolicDataModel | JsonDataModel): The target data model. language_model (LanguageModel): The language model to use. prompt_template (str): The jinja2 prompt template (see `Generator`). examples (list): The default list of examples, the examples are a list of tuples containing input/output JSON pairs. instructions (str): The default instructions being a string containing instructions for the language model. seed_instructions (list): Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. temperature (float): Optional. The temperature for the LM call. reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none']. (Default to 'low'). If reasoning effort is none or disabled, a thinking field is automatically added to the output data model. Otherwise, the thinking field is automatically populated by the model's reasoning content. use_inputs_schema (bool): Optional. 
Whether or not use the inputs schema in the prompt (Default to False) (see `Generator`). use_outputs_schema (bool): Optional. Whether or not use the outputs schema in the prompt (Default to False) (see `Generator`). return_inputs (bool): Optional. Whether or not to concatenate the inputs to the outputs (Default to False) (see `Generator`). streaming (bool): Optional. If true, stream the LM response. Only takes effect when `data_model`/`schema` is `None` (Default to False). name (str): Optional. The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, schema=None, data_model=None, language_model=None, prompt_template=None, examples=None, instructions=None, seed_instructions=None, temperature=0.0, reasoning_effort=None, use_inputs_schema=False, use_outputs_schema=False, return_inputs=False, streaming=False, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) if not schema and data_model: schema = data_model.get_schema() self.schema = schema self.language_model = _get_lm(language_model) self.prompt_template = prompt_template self.examples = examples self.instructions = instructions self.seed_instructions = seed_instructions self.temperature = temperature # Default to "low" reasoning effort for ChainOfThought if reasoning_effort is None: reasoning_effort = "low" self.reasoning_effort = reasoning_effort self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.return_inputs = return_inputs # Streaming is only meaningful when there is no structured schema. 
        if self.schema and streaming:
            streaming = False
        self.streaming = streaming

        if self.schema:
            final_data_model = Thinking + SymbolicDataModel(schema=self.schema)
        else:
            final_data_model = None

        self.generator = Generator(
            data_model=final_data_model,
            language_model=self.language_model,
            prompt_template=self.prompt_template,
            examples=self.examples,
            instructions=self.instructions,
            seed_instructions=self.seed_instructions,
            temperature=self.temperature,
            reasoning_effort=self.reasoning_effort,
            use_inputs_schema=self.use_inputs_schema,
            use_outputs_schema=self.use_outputs_schema,
            return_inputs=self.return_inputs,
            streaming=self.streaming,
            name="generator_" + self.name,
        )

    async def call(self, inputs, training=False):
        return await self.generator(inputs, training=training)

    def get_config(self):
        config = {
            "schema": self.schema,
            "prompt_template": self.prompt_template,
            "examples": self.examples,
            "instructions": self.instructions,
            "seed_instructions": self.seed_instructions,
            "temperature": self.temperature,
            "reasoning_effort": self.reasoning_effort,
            "use_inputs_schema": self.use_inputs_schema,
            "use_outputs_schema": self.use_outputs_schema,
            "return_inputs": self.return_inputs,
            "streaming": self.streaming,
            "name": self.name,
            "description": self.description,
            "trainable": self.trainable,
        }
        language_model_config = {
            "language_model": serialization_lib.serialize_synalinks_object(
                self.language_model,
            )
        }
        return {
            **config,
            **language_model_config,
        }

    @classmethod
    def from_config(cls, config):
        language_model = serialization_lib.deserialize_synalinks_object(
            config.pop("language_model"),
        )
        return cls(
            language_model=language_model,
            **config,
        )
````

## `SelfCritique`

Bases: `Module`

Useful to critique the given inputs.

This component critiques the given inputs and optionally generates an intermediate reward between 0.0 and 1.0. You can enable or disable the intermediate reward computation with the `return_reward` flag (defaults to True).
To have more accurate results, ensure that the inputs are provided along with the output to evaluate using `return_inputs` in your modules. Example: ``` import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class Answer(synalinks.DataModel): answer: str = synalinks.Field( description="The correct answer", ) async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.ChainOfThought( data_model=Answer, language_model=language_model, return_inputs=True, )(x0) x2 = await synalinks.SelfCritique( language_model=language_model, )(x1) program = synalinks.Program( inputs=x0, outputs=x2, name="answer_with_cot_and_self_critique", description="Useful to answer accurately", ) if __name__ == "__main__": asyncio.run(main()) ``` Parameters: | Name | Type | Description | Default | | -------------------- | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | | `language_model` | `LanguageModel` | The language model to use. | `None` | | `prompt_template` | `str` | The jinja2 prompt template (see Generator). | `None` | | `examples` | `list` | The default list of examples, the examples are a list of tuples containing input/output JSON pairs. | `None` | | `instructions` | `str` | The default instructions being a string containing instructions for the language model. | `None` | | `seed_instructions` | `list` | Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. | `None` | | `temperature` | `float` | Optional. The temperature for the LM call. | `0.0` | | `reasoning_effort` | `string` | Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning).
| `None` | | `use_inputs_schema` | `bool` | Optional. Whether or not to use the inputs schema in the prompt (Default to False) (see Generator). | `False` | | `use_outputs_schema` | `bool` | Optional. Whether or not to use the outputs schema in the prompt (Default to False) (see Generator). | `False` | | `return_reward` | `bool` | Optional. Whether or not to compute an intermediate reward. | `True` | | `return_inputs` | `bool` | Optional. Whether or not to concatenate the inputs to the outputs (Default to True) (see Generator). | `True` | | `name` | `str` | Optional. The name of the module. | `None` | | `description` | `str` | Optional. The description of the module. | `None` | | `trainable` | `bool` | Whether the module's variables should be trainable. | `True` | Source code in `synalinks/src/modules/ttc/self_critique.py` ```` @synalinks_export( [ "synalinks.modules.SelfCritique", "synalinks.SelfCritique", ] ) class SelfCritique(Module): """Useful to critique the given inputs. This component critiques the given inputs and optionally generates an intermediate reward between 0.0 and 1.0. You can enable or disable the intermediate reward computation by using the `return_reward` flag (defaults to True). To have more accurate results, ensure that the inputs are provided along with the output to evaluate using `return_inputs` in your modules.
Example: ```python import synalinks import asyncio class Query(synalinks.DataModel): query: str = synalinks.Field( description="The user query", ) class Answer(synalinks.DataModel): answer: str = synalinks.Field( description="The correct answer", ) async def main(): language_model = synalinks.LanguageModel( model="ollama/mistral", ) x0 = synalinks.Input(data_model=Query) x1 = await synalinks.ChainOfThought( data_model=Answer, language_model=language_model, return_inputs=True, )(x0) x2 = await synalinks.SelfCritique( language_model=language_model, )(x1) program = synalinks.Program( inputs=x0, outputs=x2, name="answer_with_cot_and_self_critique", description="Useful to answer accurately", ) if __name__ == "__main__": asyncio.run(main()) ``` Args: language_model (LanguageModel): The language model to use. prompt_template (str): The jinja2 prompt template (see `Generator`). examples (list): The default list of examples, the examples are a list of tuples containing input/output JSON pairs. instructions (str): The default instructions being a string containing instructions for the language model. seed_instructions (list): Optional. A list of instructions to use as seed for the optimization. If not provided, use the default instructions as seed. temperature (float): Optional. The temperature for the LM call. reasoning_effort (string): Optional. The reasoning effort for the LM call between ['minimal', 'low', 'medium', 'high', 'disable', 'none', None]. Default to None (no reasoning). use_inputs_schema (bool): Optional. Whether or not to use the inputs schema in the prompt (Default to False) (see `Generator`). use_outputs_schema (bool): Optional. Whether or not to use the outputs schema in the prompt (Default to False) (see `Generator`). return_reward (bool): Optional. Whether or not to compute an intermediate reward. return_inputs (bool): Optional. Whether or not to concatenate the inputs to the outputs (Default to True) (see `Generator`). name (str): Optional.
The name of the module. description (str): Optional. The description of the module. trainable (bool): Whether the module's variables should be trainable. """ def __init__( self, *, language_model=None, prompt_template=None, examples=None, instructions=None, seed_instructions=None, temperature=0.0, reasoning_effort=None, use_inputs_schema=False, use_outputs_schema=False, return_reward=True, return_inputs=True, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) self.language_model = _get_lm(language_model) self.prompt_template = prompt_template self.examples = examples self.instructions = instructions self.seed_instructions = seed_instructions self.temperature = temperature self.reasoning_effort = reasoning_effort self.use_inputs_schema = use_inputs_schema self.use_outputs_schema = use_outputs_schema self.return_reward = return_reward self.return_inputs = return_inputs if self.return_reward: schema = CritiqueWithReward.get_schema() else: schema = Critique.get_schema() self.generator = Generator( schema=schema, language_model=self.language_model, prompt_template=self.prompt_template, examples=self.examples, instructions=self.instructions, seed_instructions=self.seed_instructions, temperature=self.temperature, reasoning_effort=self.reasoning_effort, use_inputs_schema=self.use_inputs_schema, use_outputs_schema=self.use_outputs_schema, return_inputs=self.return_inputs, name="generator_" + self.name, ) async def call(self, inputs, training=False): return await self.generator(inputs, training=training) def get_config(self): config = { "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "seed_instructions": self.seed_instructions, "temperature": self.temperature, "reasoning_effort": self.reasoning_effort, "use_inputs_schema": self.use_inputs_schema, "use_outputs_schema": self.use_outputs_schema, "return_reward": self.return_reward, "return_inputs": 
self.return_inputs, "name": self.name, "description": self.description, "trainable": self.trainable, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( self.language_model, ) } return { **config, **language_model_config, } @classmethod def from_config(cls, config): language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model"), ) return cls( language_model=language_model, **config, ) ```` ## `CosineSimilarity` Bases: `RewardFunctionWrapper` Computes the cosine similarity between `y_true` and `y_pred`. Formula: ``` reward = (sum(l2_norm(y_true) * l2_norm(y_pred))+1) / 2 ``` The formula is similar to the classic cosine similarity used in deep learning, but scaled to [0.0, 1.0] and adjusted to have a reward that tends toward 1.0 if the two objects are similar (and 0.0 otherwise). Example: ``` program.compile( reward=synalinks.rewards.CosineSimilarity( embedding_model=embedding_model ), optimizer=synalinks.optimizers.RandomFewShot(), ) ``` Parameters: | Name | Type | Description | Default | | ------------------ | ---------------- | ---------------------------------------------------------------------------------------------- | --------------------- | | `embedding_model` | `EmbeddingModel` | The embedding model to use to compute the cosine similarity. | `None` | | `axis` | `int` | (Optional) Defaults to -1. The dimension along which the cosine similarity is computed. | `-1` | | `name` | `str` | (Optional) string name of the reward instance. | `'cosine_similarity'` | | `in_mask` | `list` | (Optional) list of keys to keep to compute the reward. | `None` | | `out_mask` | `list` | (Optional) list of keys to remove to compute the reward. | `None` | | `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with in_mask via OR). | `None` | | `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with out_mask via OR).
| `None` | Source code in `synalinks/src/rewards/cosine_similarity.py` ```` @synalinks_export( [ "synalinks.CosineSimilarity", "synalinks.rewards.CosineSimilarity", ] ) class CosineSimilarity(RewardFunctionWrapper): """ Computes the cosine similarity between `y_true` and `y_pred`. Formula: ``` reward = (sum(l2_norm(y_true) * l2_norm(y_pred))+1) / 2 ``` The formula is similar to the classic cosine similarity used in deep learning, but scaled to [0.0, 1.0] and adjusted to have a reward that tends toward 1.0 if the two objects are similar (and 0.0 otherwise). Example: ```python program.compile( reward=synalinks.rewards.CosineSimilarity( embedding_model=embedding_model ), optimizer=synalinks.optimizers.RandomFewShot(), ) ``` Args: embedding_model (EmbeddingModel): The embedding model to use to compute the cosine similarity. axis (int): (Optional) Defaults to `-1`. The dimension along which the cosine similarity is computed. name (str): (Optional) string name of the reward instance. in_mask (list): (Optional) list of keys to keep to compute the reward. out_mask (list): (Optional) list of keys to remove to compute the reward. in_mask_pattern (str): Optional. Regex pattern; fields whose names match are kept (combined with ``in_mask`` via OR). out_mask_pattern (str): Optional. Regex pattern; fields whose names match are dropped (combined with ``out_mask`` via OR).
""" def __init__( self, embedding_model=None, axis=-1, name="cosine_similarity", in_mask=None, out_mask=None, in_mask_pattern=None, out_mask_pattern=None, ): super().__init__( fn=cosine_similarity, name=name, in_mask=in_mask, out_mask=out_mask, in_mask_pattern=in_mask_pattern, out_mask_pattern=out_mask_pattern, axis=axis, embedding_model=embedding_model, ) def get_config(self): config = Reward.get_config() from synalinks.src.saving.serialization_lib import serialize_synalinks_object embedding_model_config = { "embedding_model": serialize_synalinks_object(self.embedding_model) } return {**config, **embedding_model_config} @classmethod def from_config(cls, config): from synalinks.saving.serialization_lib import deserialize_synalinks_object embedding_model = deserialize_synalinks_object(config.pop("embedding_model")) return cls(embedding_model=embedding_model, **config) ```` ## `cosine_similarity(y_true, y_pred, embedding_model=None, axis=-1)` Computes the cosine similarity between `y_true` and `y_pred`. Formula: ``` reward = (sum(l2_norm(y_true) * l2_norm(y_pred))+1) / 2 ``` The formula is similar to the classic cosine similarity used in deep learning, but scaled to [0.0, 1.0] and adjusted to have a reward that tend towards 1.0 if the two objects are similar (and 0.0 otherwise). Parameters: | Name | Type | Description | Default | | ----------------- | ---------------- | --------------------------------------------------------------------------------------- | ---------- | | `y_true` | `JsonDataModel` | The ground truth JSON data_model. | *required* | | `y_pred` | `JsonDataModel` | The predicted JSON data_model. | *required* | | `embedding_model` | `EmbeddingModel` | The embedding model to use to compute the cosine similarity. | `None` | | `axis` | `int` | (Optional) Defaults to -1. The dimension along which the cosine similarity is computed. 
| `-1` | Returns: | Type | Description | | ------- | ----------------------------------------------------------------------------------------- | | `float` | The reward value, which tends toward 1.0 if the values are similar and toward 0.0 otherwise. | Source code in `synalinks/src/rewards/cosine_similarity.py` ```` @synalinks_export("synalinks.rewards.cosine_similarity") async def cosine_similarity(y_true, y_pred, embedding_model=None, axis=-1): """ Computes the cosine similarity between `y_true` and `y_pred`. Formula: ``` reward = (sum(l2_norm(y_true) * l2_norm(y_pred))+1) / 2 ``` The formula is similar to the classic cosine similarity used in deep learning, but scaled to [0.0, 1.0] and adjusted to have a reward that tends toward 1.0 if the two objects are similar (and 0.0 otherwise). Args: y_true (JsonDataModel): The ground truth JSON data_model. y_pred (JsonDataModel): The predicted JSON data_model. embedding_model (EmbeddingModel): The embedding model to use to compute the cosine similarity. axis (int): (Optional) Defaults to `-1`. The dimension along which the cosine similarity is computed. Returns: (float): The reward value, which tends toward 1.0 if the values are similar and toward 0.0 otherwise. """ reward = 0.0 if y_pred is not None: y_true = await ops.embedding(y_true, embedding_model=embedding_model) y_pred = await ops.embedding(y_pred, embedding_model=embedding_model) y_true = np.convert_to_tensor(y_true.get("embeddings")) y_pred = np.convert_to_tensor(y_pred.get("embeddings")) y_true, y_pred = squeeze_or_expand_to_same_rank(y_true, y_pred) y_pred = np.normalize(y_pred, axis=axis) y_true = np.normalize(y_true, axis=axis) reward = (np.sum(y_true * y_pred, axis=axis) + 1) / 2 return reward ```` ## `ExactMatch` Bases: `RewardFunctionWrapper` Computes the exact match between `y_true` and `y_pred`.
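As a rough illustration of the comparison this reward performs, the sketch below re-implements the exact-match rule outside of Synalinks. `FakeDataModel` is a hypothetical stand-in that only mimics a `get_json()` accessor, and `exact_match_sketch` is an illustrative name; the sketch assumes the reward is 1.0 when both JSON payloads are equal and 0.0 otherwise, including when the prediction is `None`.

```python
import asyncio


class FakeDataModel:
    """Hypothetical stand-in exposing get_json(), for illustration only."""

    def __init__(self, payload):
        self._payload = payload

    def get_json(self):
        return self._payload


async def exact_match_sketch(y_true, y_pred):
    # A missing prediction scores 0.0.
    if y_pred is None:
        return 0.0
    # Dict equality ignores key order but is strict on values (case included).
    return 1.0 if y_pred.get_json() == y_true.get_json() else 0.0


gold = FakeDataModel({"answer": "Paris"})
same = asyncio.run(exact_match_sketch(gold, FakeDataModel({"answer": "Paris"})))
different = asyncio.run(exact_match_sketch(gold, FakeDataModel({"answer": "paris"})))
missing = asyncio.run(exact_match_sketch(gold, None))
print(same, different, missing)  # 1.0 0.0 0.0
```

Because the comparison is strict, fields that legitimately vary between runs (for example a free-text reasoning field) would likely need to be excluded with `in_mask` or `out_mask` before an exact match can succeed.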
Example: ``` program.compile( reward=synalinks.rewards.ExactMatch(), optimizer=synalinks.optimizers.RandomFewShot(), ) ``` Parameters: | Name | Type | Description | Default | | ------------------ | ------ | ---------------------------------------------------------------------------------------------- | --------------- | | `name` | `str` | Optional. string name of the reward instance. | `'exact_match'` | | `in_mask` | `list` | Optional. list of keys to keep to compute the reward. | `None` | | `out_mask` | `list` | Optional. list of keys to remove to compute the reward. | `None` | | `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with in_mask via OR). | `None` | | `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with out_mask via OR). | `None` | Source code in `synalinks/src/rewards/exact_match.py` ```` @synalinks_export( [ "synalinks.ExactMatch", "synalinks.rewards.ExactMatch", ] ) class ExactMatch(RewardFunctionWrapper): """Computes the exact match between `y_true` and `y_pred`. Example: ```python program.compile( reward=synalinks.rewards.ExactMatch(), optimizer=synalinks.optimizers.RandomFewShot(), ) ``` Args: name (str): Optional. string name of the reward instance. in_mask (list): Optional. list of keys to keep to compute the reward. out_mask (list): Optional. list of keys to remove to compute the reward. in_mask_pattern (str): Optional. Regex pattern; fields whose names match are kept (combined with ``in_mask`` via OR). out_mask_pattern (str): Optional. Regex pattern; fields whose names match are dropped (combined with ``out_mask`` via OR). 
""" def __init__( self, name="exact_match", in_mask=None, out_mask=None, in_mask_pattern=None, out_mask_pattern=None, ): super().__init__( fn=exact_match, name=name, in_mask=in_mask, out_mask=out_mask, in_mask_pattern=in_mask_pattern, out_mask_pattern=out_mask_pattern, ) def get_config(self): return { "name": self.name, "in_mask": self.in_mask, "out_mask": self.out_mask, "in_mask_pattern": self.in_mask_pattern, "out_mask_pattern": self.out_mask_pattern, } @classmethod def from_config(cls, config): return cls(**config) ```` ## `exact_match(y_true, y_pred)` Computes the exact match between `y_true` and `y_pred`. If their values are equal, it returns a reward of 1.0; otherwise, it returns 0.0. Parameters: | Name | Type | Description | Default | | -------- | --------------- | --------------------------------- | ---------- | | `y_true` | `JsonDataModel` | The ground truth JSON data_model. | *required* | | `y_pred` | `JsonDataModel` | The predicted JSON data_model. | *required* | Returns: | Type | Description | | ------- | ------------------------------------------------------------------------------ | | `float` | The reward value, which is 1.0 if the values match exactly, and 0.0 otherwise. | Source code in `synalinks/src/rewards/exact_match.py` ``` @synalinks_export("synalinks.rewards.exact_match") async def exact_match(y_true, y_pred): """ Computes the exact match between `y_true` and `y_pred`. If their values are equal, it returns a reward of 1.0; otherwise, it returns 0.0. Args: y_true (JsonDataModel): The ground truth JSON data_model. y_pred (JsonDataModel): The predicted JSON data_model. Returns: (float): The reward value, which is 1.0 if the values match exactly, and 0.0 otherwise. """ reward = 0.0 if y_pred is not None: if y_pred.get_json() == y_true.get_json(): reward = 1.0 return reward ``` ## `LMAsJudge` Bases: `ProgramAsJudge` Evaluate the output of a program using a `LanguageModel`. Example: ``` async def main(): # ... 
program definition program.compile( reward=synalinks.rewards.LMAsJudge( language_model=language_model, ), optimizer=synalinks.optimizers.RandomFewShot(), ) history = await program.fit(...) ``` Parameters: | Name | Type | Description | Default | | ------------------ | --------------- | ---------------------------------------------------------------------------------------------- | --------------- | | `language_model` | `LanguageModel` | The language model to use. | `None` | | `prompt_template` | `str` | The default jinja2 prompt template to use (see Generator). | `None` | | `instructions` | `list` | The default instructions to use (see Generator). | `None` | | `examples` | `list` | The default examples to use in the prompt (see Generator). | `None` | | `name` | `str` | Optional. string name of the reward instance. | `'lm_as_judge'` | | `in_mask` | `list` | Optional. list of keys to keep to compute the reward. | `None` | | `out_mask` | `list` | Optional. list of keys to remove to compute the reward. | `None` | | `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with in_mask via OR). | `None` | | `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with out_mask via OR). | `None` | Source code in `synalinks/src/rewards/lm_as_judge.py` ```` @synalinks_export( [ "synalinks.LMAsJudge", "synalinks.rewards.LMAsJudge", ] ) class LMAsJudge(ProgramAsJudge): """Evaluate the output of a program using a `LanguageModel`. Example: ```python async def main(): # ... program definition program.compile( reward=synalinks.rewards.LMAsJudge( language_model=language_model, ), optimizer=synalinks.optimizers.RandomFewShot(), ) history = await program.fit(...) ``` Args: language_model (LanguageModel): The language model to use. prompt_template (str): The default jinja2 prompt template to use (see `Generator`). instructions (list): The default instructions to use (see `Generator`).
examples (list): The default examples to use in the prompt (see `Generator`). name (str): Optional. string name of the reward instance. in_mask (list): Optional. list of keys to keep to compute the reward. out_mask (list): Optional. list of keys to remove to compute the reward. in_mask_pattern (str): Optional. Regex pattern; fields whose names match are kept (combined with ``in_mask`` via OR). out_mask_pattern (str): Optional. Regex pattern; fields whose names match are dropped (combined with ``out_mask`` via OR). """ def __init__( self, language_model=None, prompt_template=None, examples=None, instructions=None, name="lm_as_judge", in_mask=None, out_mask=None, in_mask_pattern=None, out_mask_pattern=None, ): program = LMAsJudgeProgram( language_model=language_model, prompt_template=prompt_template, examples=examples, instructions=instructions, ) super().__init__( program=program, name=name, in_mask=in_mask, out_mask=out_mask, in_mask_pattern=in_mask_pattern, out_mask_pattern=out_mask_pattern, ) ```` ## `LMAsJudgeProgram` Bases: `Program` Evaluate the output of a program using a `LanguageModel`. Parameters: | Name | Type | Description | Default | | ----------------- | --------------- | ---------------------------------------------------------- | ------- | | `language_model` | `LanguageModel` | The language model to use. | `None` | | `prompt_template` | `str` | The default jinja2 prompt template to use (see Generator). | `None` | | `examples` | `list` | The default examples to use in the prompt (see Generator). | `None` | | `instructions` | `list` | The default instructions to use (see Generator). | `None` | | `name` | `str` | Optional. The name of the program. | `None` | | `description` | `str` | Optional. The description of the program. | `None` | | `trainable` | `bool` | Whether the program's variables should be trainable. 
| `True` | Source code in `synalinks/src/rewards/lm_as_judge.py` ``` class LMAsJudgeProgram(Program): """Evaluate the output of a program using a `LanguageModel`. Args: language_model (LanguageModel): The language model to use. prompt_template (str): The default jinja2 prompt template to use (see `Generator`). examples (list): The default examples to use in the prompt (see `Generator`). instructions (list): The default instructions to use (see `Generator`). name (str): Optional. The name of the program. description (str): Optional. The description of the program. trainable (bool): Whether the program's variables should be trainable. """ def __init__( self, language_model=None, prompt_template=None, examples=None, instructions=None, name=None, description=None, trainable=True, ): super().__init__( name=name, description=description, trainable=trainable, ) self.critique = SelfCritique( language_model=language_model, prompt_template=prompt_template, examples=examples, instructions=instructions, name="self_critique_" + self.name, ) self.language_model = language_model self.prompt_template = prompt_template self.examples = examples self.instructions = instructions async def call(self, inputs): if not isinstance(inputs, (list, tuple)): raise ValueError("The inputs should be a list or tuple.") if len(inputs) != 2: raise ValueError("The inputs of the program should have a length of 2.") y_true = inputs[0] y_pred = inputs[1] if not y_pred: return 0.0 if y_true: y_true = await ops.prefix( y_true, prefix="gold", name="gold_y_true", ) return await self.critique( await ops.concat( y_true, y_pred, name="y_true_with_y_pred", ) ) else: return await self.critique(y_pred) def get_config(self): config = { "prompt_template": self.prompt_template, "examples": self.examples, "instructions": self.instructions, "name": self.name, "description": self.description, "trainable": self.trainable, } language_model_config = { "language_model": serialization_lib.serialize_synalinks_object( 
self.language_model ) } return {**language_model_config, **config} @classmethod def from_config(cls, config): language_model = serialization_lib.deserialize_synalinks_object( config.pop("language_model") ) return cls(language_model=language_model, **config) ``` ## `BatchReward` Bases: `Reward` Batched reward base class. Subclasses receive the entire batch at once and must return one reward per sample. Use this when the reward needs cross-sample context (e.g. group-relative scores, batch normalization, paired comparisons). To be implemented by subclasses: - `call(y_true, y_pred)`: `y_true` and `y_pred` are lists of length `batch_size`. MUST return a `list[float]` of the same length, one reward per sample. Parameters: | Name | Type | Description | Default | | ------------------ | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | | `name` | `str` | Optional name for the reward instance. | `None` | | `reduction` | `str` | Optional. One of "mean", "sum", "min", "max", "none" or None. Applied by __call__ when called on a batch directly. The trainer consumes the unreduced per-sample list via compute_batch, but propagates this value to control the scalar shown in progress logs and used for candidate scoring ("none"/None falls back to "mean" for those). | `'mean'` | | `in_mask` | `list` | Optional. List of exact field names to keep before computing the reward. | `None` | | `out_mask` | `list` | Optional. List of exact field names to drop before computing the reward. | `None` | | `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with in_mask). | `None` | | `out_mask_pattern` | `str` | Optional. 
Regex pattern; fields whose names match are dropped (combined with out_mask). | `None` | Source code in `synalinks/src/rewards/batch_reward.py` ``` @synalinks_export(["synalinks.BatchReward", "synalinks.rewards.BatchReward"]) class BatchReward(Reward): """Batched reward base class. Subclasses receive the entire batch at once and must return one reward per sample. Use this when the reward needs cross-sample context (e.g. group-relative scores, batch normalization, paired comparisons). To be implemented by subclasses: * ``call(y_true, y_pred)``: ``y_true`` and ``y_pred`` are lists of length ``batch_size``. MUST return a ``list[float]`` of the same length, one reward per sample. Args: name (str): Optional name for the reward instance. reduction (str): Optional. One of ``"mean"``, ``"sum"``, ``"min"``, ``"max"``, ``"none"`` or ``None``. Applied by ``__call__`` when called on a batch directly. The trainer consumes the unreduced per-sample list via ``compute_batch``, but propagates this value to control the scalar shown in progress logs and used for candidate scoring (``"none"``/``None`` falls back to ``"mean"`` for those). in_mask (list): Optional. List of exact field names to keep before computing the reward. out_mask (list): Optional. List of exact field names to drop before computing the reward. in_mask_pattern (str): Optional. Regex pattern; fields whose names match are kept (combined with ``in_mask``). out_mask_pattern (str): Optional. Regex pattern; fields whose names match are dropped (combined with ``out_mask``). """ async def __call__(self, y_true, y_pred): rewards = await self.compute_batch(y_true, y_pred) return reduce_values(rewards, reduction=self.reduction) async def compute_batch(self, y_true, y_pred): """Apply masks and return the per-sample reward list (unreduced). This is what the trainer calls — it expects the raw ``list[float]`` of length ``batch_size`` so it can treat each entry as that sample's reward. 
""" with ops.name_scope(self.name): y_true, y_pred = apply_masks( y_true, y_pred, in_mask=self.in_mask, in_mask_pattern=self.in_mask_pattern, out_mask=self.out_mask, out_mask_pattern=self.out_mask_pattern, ) rewards = await self.call(y_true, y_pred) return _validate_batch_rewards(rewards, y_pred, type(self).__name__) async def call(self, y_true, y_pred): raise NotImplementedError def _obj_type(self): return "BatchReward" ``` ### `compute_batch(y_true, y_pred)` Apply masks and return the per-sample reward list (unreduced). This is what the trainer calls — it expects the raw `list[float]` of length `batch_size` so it can treat each entry as that sample's reward. Source code in `synalinks/src/rewards/batch_reward.py` ``` async def compute_batch(self, y_true, y_pred): """Apply masks and return the per-sample reward list (unreduced). This is what the trainer calls — it expects the raw ``list[float]`` of length ``batch_size`` so it can treat each entry as that sample's reward. """ with ops.name_scope(self.name): y_true, y_pred = apply_masks( y_true, y_pred, in_mask=self.in_mask, in_mask_pattern=self.in_mask_pattern, out_mask=self.out_mask, out_mask_pattern=self.out_mask_pattern, ) rewards = await self.call(y_true, y_pred) return _validate_batch_rewards(rewards, y_pred, type(self).__name__) ``` ## `BatchRewardFunctionWrapper` Bases: `BatchReward` Wrap a stateless batched function into a `BatchReward`. The wrapped function receives the full batch and must return a `list[float]` of length `batch_size`. 
Example: ``` async def my_batch_reward(y_true, y_pred): # y_true, y_pred: list[JsonDataModel] of length batch_size return [1.0 if t.get_json() == p.get_json() else 0.0 for t, p in zip(y_true, y_pred)] program.compile( reward=synalinks.rewards.BatchRewardFunctionWrapper(fn=my_batch_reward), optimizer=synalinks.optimizers.RandomFewShot(), ) ``` Parameters: | Name | Type | Description | Default | | ------------------ | ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | | `fn` | `callable` | Async batched reward function with signature fn(y_true, y_pred, \*\*kwargs) -> list[float]. | *required* | | `name` | `str` | Optional. string name of the reward instance. | `None` | | `reduction` | `str` | Optional. One of "mean", "sum", "min", "max", "none" or None. Used by standalone __call__ and propagated through compile to set the scalar reduction used by the trainer's progress log and the optimizer's candidate scoring ("none"/None falls back to "mean" there). | `'mean'` | | `in_mask` | `list` | Optional. | `None` | | `out_mask` | `list` | Optional. | `None` | | `in_mask_pattern` | `str` | Optional. | `None` | | `out_mask_pattern` | `str` | Optional. | `None` | | `**kwargs` | `keyword arguments` | Extra keyword arguments forwarded to fn. | `{}` | Source code in `synalinks/src/rewards/batch_reward.py` ```` @synalinks_export("synalinks.rewards.BatchRewardFunctionWrapper") class BatchRewardFunctionWrapper(BatchReward): """Wrap a stateless batched function into a ``BatchReward``. The wrapped function receives the full batch and must return a ``list[float]`` of length ``batch_size``. 
    Example:

    ```python
    async def my_batch_reward(y_true, y_pred):
        # y_true, y_pred: list[JsonDataModel] of length batch_size
        return [
            1.0 if t.get_json() == p.get_json() else 0.0
            for t, p in zip(y_true, y_pred)
        ]

    program.compile(
        reward=synalinks.rewards.BatchRewardFunctionWrapper(fn=my_batch_reward),
        optimizer=synalinks.optimizers.RandomFewShot(),
    )
    ```

    Args:
        fn (callable): Async batched reward function with signature
            ``fn(y_true, y_pred, **kwargs) -> list[float]``.
        name (str): Optional. string name of the reward instance.
        reduction (str): Optional. One of ``"mean"``, ``"sum"``, ``"min"``,
            ``"max"``, ``"none"`` or ``None``. Used by standalone ``__call__``
            and propagated through ``compile`` to set the scalar reduction
            used by the trainer's progress log and the optimizer's candidate
            scoring (``"none"``/``None`` falls back to ``"mean"`` there).
        in_mask (list): Optional.
        out_mask (list): Optional.
        in_mask_pattern (str): Optional.
        out_mask_pattern (str): Optional.
        **kwargs (keyword arguments): Extra keyword arguments forwarded
            to ``fn``.
    """

    def __init__(
        self,
        fn,
        reduction="mean",
        name=None,
        in_mask=None,
        out_mask=None,
        in_mask_pattern=None,
        out_mask_pattern=None,
        **kwargs,
    ):
        super().__init__(
            name=name,
            reduction=reduction,
            in_mask=in_mask,
            out_mask=out_mask,
            in_mask_pattern=in_mask_pattern,
            out_mask_pattern=out_mask_pattern,
        )
        self.fn = fn
        self._fn_kwargs = kwargs

    async def call(self, y_true, y_pred):
        return await self.fn(y_true, y_pred, **self._fn_kwargs)

    def get_config(self):
        config = super().get_config()
        config["fn"] = serialization_lib.serialize_synalinks_object(self.fn)
        config["fn_kwargs"] = serialization_lib.serialize_synalinks_object(
            self._fn_kwargs
        )
        return config

    @classmethod
    def from_config(cls, config):
        if "fn" in config:
            config = serialization_lib.deserialize_synalinks_object(config)
        fn_kwargs = config.pop("fn_kwargs", None) or {}
        return cls(**config, **fn_kwargs)

    def __repr__(self):
        return f""
````

## `ProgramAsJudge`

Bases: `Reward`

Wrap a `Program` into a `Reward`.

You can use this to create advanced reward functions that use a Synalinks `Program`. The program should have two inputs and one output.

**Note:** The output data model/schema should have a field named `reward`.

Example:

```
# ... your program declaration

program = synalinks.Program(
    inputs=x0,
    outputs=xn,
)

program.compile(
    reward=synalinks.rewards.ProgramAsJudge(program=program),
    optimizer=synalinks.optimizers.RandomFewShot(),
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `program` | `Program` | The reward program to wrap. | *required* |
| `name` | `str` | Optional. string name of the reward instance. | `None` |
| `in_mask` | `list` | Optional. list of keys to keep to compute the reward. | `None` |
| `out_mask` | `list` | Optional. list of keys to remove to compute the reward. | `None` |
| `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with `in_mask` via OR). | `None` |
| `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with `out_mask` via OR). | `None` |

Source code in `synalinks/src/rewards/reward_wrappers.py`

````
@synalinks_export(
    [
        "synalinks.ProgramAsJudge",
        "synalinks.rewards.ProgramAsJudge",
    ]
)
class ProgramAsJudge(Reward):
    """Wrap a `Program` into a `Reward`.

    You can use this to create advanced reward functions that use a
    Synalinks `Program`. The program should have two inputs and one output.

    **Note:** The output data model/schema should have a field named `reward`.

    Example:

    ```python
    # ... your program declaration

    program = synalinks.Program(
        inputs=x0,
        outputs=xn,
    )

    program.compile(
        reward=synalinks.rewards.ProgramAsJudge(program=program),
        optimizer=synalinks.optimizers.RandomFewShot(),
    )
    ```

    Args:
        program (Program): The reward program to wrap.
        name (str): Optional. string name of the reward instance.
        in_mask (list): Optional. list of keys to keep to compute the reward.
        out_mask (list): Optional. list of keys to remove to compute the reward.
        in_mask_pattern (str): Optional. Regex pattern; fields whose names
            match are kept (combined with ``in_mask`` via OR).
        out_mask_pattern (str): Optional. Regex pattern; fields whose names
            match are dropped (combined with ``out_mask`` via OR).
    """

    def __init__(
        self,
        program,
        reduction="mean",
        name=None,
        in_mask=None,
        out_mask=None,
        in_mask_pattern=None,
        out_mask_pattern=None,
    ):
        super().__init__(
            name=name,
            reduction=reduction,
            in_mask=in_mask,
            out_mask=out_mask,
            in_mask_pattern=in_mask_pattern,
            out_mask_pattern=out_mask_pattern,
        )
        self.program = program

    async def call(self, y_true, y_pred):
        result = await self.program([y_true, y_pred])
        if result is None:
            warnings.warn(
                f"{self.__class__.__name__}: judge program returned None "
                "(likely an LLM / provider failure). Scoring this sample as "
                "0.0 and continuing. Check the underlying language model "
                "and structured-output configuration.",
                RuntimeWarning,
                stacklevel=2,
            )
            return 0.0
        return float(result.get("reward", 0.0))

    def get_config(self):
        config = super().get_config()
        config["program"] = serialization_lib.serialize_synalinks_object(self.program)
        return config

    @classmethod
    def from_config(cls, config):
        if "program" in config:
            config = serialization_lib.deserialize_synalinks_object(config)
        return cls(**config)

    def __repr__(self):
        return f""
````

## `RewardFunctionWrapper`

Bases: `Reward`

Wrap a stateless function into a `Reward`.

You can use this to quickly build a reward from a function. The function needs to have the signature `fn(y_true, y_pred)`.

Example:

```
async def my_reward(y_true, y_pred):
    # ...
    return reward

program.compile(
    reward=synalinks.rewards.RewardFunctionWrapper(fn=my_reward),
    optimizer=synalinks.optimizers.RandomFewShot(),
)
```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `fn` | `callable` | Async reward function to wrap, with signature `fn(y_true, y_pred, **kwargs)`. | *required* |
| `name` | `str` | Optional. string name of the reward instance. | `None` |
| `in_mask` | `list` | Optional. list of keys to keep to compute the reward. | `None` |
| `out_mask` | `list` | Optional. list of keys to remove to compute the reward. | `None` |
| `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with `in_mask` via OR). | `None` |
| `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with `out_mask` via OR). | `None` |
| `**kwargs` | `keyword arguments` | Keyword arguments to pass on to `fn`. | `{}` |

Source code in `synalinks/src/rewards/reward_wrappers.py`

````
@synalinks_export("synalinks.rewards.RewardFunctionWrapper")
class RewardFunctionWrapper(Reward):
    """Wrap a stateless function into a `Reward`.

    You can use this to quickly build a reward from a function. The function
    needs to have the signature `fn(y_true, y_pred)`.

    Example:

    ```python
    async def my_reward(y_true, y_pred):
        # ...
        return reward

    program.compile(
        reward=synalinks.rewards.RewardFunctionWrapper(fn=my_reward),
        optimizer=synalinks.optimizers.RandomFewShot(),
    )
    ```

    Args:
        fn (callable): Async reward function to wrap, with signature
            ``fn(y_true, y_pred, **kwargs)``.
        name (str): Optional. string name of the reward instance.
        in_mask (list): Optional. list of keys to keep to compute the reward.
        out_mask (list): Optional. list of keys to remove to compute the reward.
        in_mask_pattern (str): Optional. Regex pattern; fields whose names
            match are kept (combined with ``in_mask`` via OR).
        out_mask_pattern (str): Optional. Regex pattern; fields whose names
            match are dropped (combined with ``out_mask`` via OR).
        **kwargs (keyword arguments): Keyword arguments to pass on to `fn`.
    """

    def __init__(
        self,
        fn,
        reduction="mean",
        name=None,
        in_mask=None,
        out_mask=None,
        in_mask_pattern=None,
        out_mask_pattern=None,
        **kwargs,
    ):
        super().__init__(
            name=name,
            reduction=reduction,
            in_mask=in_mask,
            out_mask=out_mask,
            in_mask_pattern=in_mask_pattern,
            out_mask_pattern=out_mask_pattern,
        )
        self.fn = fn
        self._fn_kwargs = kwargs

    async def call(self, y_true, y_pred):
        return await self.fn(y_true, y_pred, **self._fn_kwargs)

    def get_config(self):
        config = super().get_config()
        config["fn"] = serialization_lib.serialize_synalinks_object(self.fn)
        # Keep fn kwargs under their own key so they cannot collide with
        # base-class fields like ``name`` or ``reduction``.
        config["fn_kwargs"] = serialization_lib.serialize_synalinks_object(
            self._fn_kwargs
        )
        return config

    @classmethod
    def from_config(cls, config):
        if "fn" in config:
            config = serialization_lib.deserialize_synalinks_object(config)
        fn_kwargs = config.pop("fn_kwargs", None) or {}
        return cls(**config, **fn_kwargs)

    def __repr__(self):
        return f""
````
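
The `RewardFunctionWrapper` listing above forwards extra keyword arguments to the wrapped function and stores them under a separate `fn_kwargs` key in its config. The sketch below illustrates that pattern without importing Synalinks. `TinyRewardWrapper` and `exact_match` are hypothetical stand-ins, the inputs are plain dicts rather than Synalinks data models, and the config stores the function object directly instead of going through `serialization_lib`. It is a simplified illustration under those assumptions, not the library's implementation.

```python
import asyncio


class TinyRewardWrapper:
    """Hypothetical stand-in for the wrapper pattern (not the Synalinks class)."""

    def __init__(self, fn, name=None, reduction="mean", **kwargs):
        self.fn = fn
        self.name = name
        self.reduction = reduction
        self._fn_kwargs = kwargs  # extra kwargs, forwarded to fn on every call

    async def call(self, y_true, y_pred):
        return await self.fn(y_true, y_pred, **self._fn_kwargs)

    def get_config(self):
        # fn kwargs live under their own key, so they stay separate from
        # the wrapper's own "name" and "reduction" entries.
        return {
            "fn": self.fn,
            "name": self.name,
            "reduction": self.reduction,
            "fn_kwargs": dict(self._fn_kwargs),
        }

    @classmethod
    def from_config(cls, config):
        config = dict(config)  # copy so the caller's dict is not mutated
        fn_kwargs = config.pop("fn_kwargs", None) or {}
        return cls(**config, **fn_kwargs)


async def exact_match(y_true, y_pred, case_sensitive=True):
    """Return 1.0 when both answers match, else 0.0 (inputs are plain dicts)."""
    a, b = y_true["answer"], y_pred["answer"]
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    return 1.0 if a == b else 0.0


wrapper = TinyRewardWrapper(exact_match, name="exact_match", case_sensitive=False)
restored = TinyRewardWrapper.from_config(wrapper.get_config())

score = asyncio.run(wrapper.call({"answer": "Paris"}, {"answer": "paris"}))
restored_score = asyncio.run(restored.call({"answer": "Paris"}, {"answer": "PARIS"}))
strict_score = asyncio.run(
    TinyRewardWrapper(exact_match).call({"answer": "Paris"}, {"answer": "paris"})
)
print(score, restored_score, strict_score)  # 1.0 1.0 0.0
```

The round trip through `get_config` and `from_config` keeps `case_sensitive=False`, so the restored wrapper scores the same way as the original. Without that keyword argument, the wrapped function falls back to its case-sensitive default.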