PythonSynthesis module

PythonScript

Bases: Trainable

The Python code that transforms a JSON object into another JSON object.

The script is executed inside the Monty (https://github.com/pydantic/monty) sandboxed Python interpreter, which implements only a subset of Python. Scripts must observe the following constraints:

  • The input JSON object is exposed as a dict named inputs; the script must assign the output JSON object to a variable named result before it ends.
  • Only this subset of the standard library is importable: sys, os, typing, asyncio, re, datetime, json, math, pathlib. Notably, time, random, itertools, collections, functools and the rest of the stdlib are not available.
  • No third-party libraries can be imported (e.g. numpy, pandas, pydantic).
  • class definitions and match statements are not supported; use functions and if/elif chains instead.
  • The host filesystem, environment variables and network are not reachable from the script. os, sys and pathlib import but their dangerous surface is pruned or gated: open(), os.system, os.listdir, os.environ, os.path, sys.argv and Path.read_text are all unavailable.
  • asyncio is also a stub: only asyncio.run and asyncio.gather are exposed. There is no asyncio.sleep, wait_for, Future, create_task or TaskGroup, and no time primitives of any kind (time is not importable either).
  • Tools bound to the module are exposed as global async callables under their tool name. They must be awaited inside an async def and driven with asyncio.run(...). Every tool call returns a dict: a tool wrapping async def f(x) -> int yields {"result": <value>}, while a tool that already returns a dict yields that dict directly. For example, with a bound tool web_search:
import asyncio

async def main():
    hits = await web_search(query=inputs.get("q"))
    # hits is a dict — index the field you need
    return {"answer": hits["results"][0]["title"]}

result = asyncio.run(main())

Independent tool calls can be fanned out with asyncio.gather. Calling a tool without await returns a coroutine object, not the real value.
  • Execution is bounded by the module's timeout and by Monty's memory limits; long-running or allocation-heavy scripts will be aborted.
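As a minimal sketch of the fan-out pattern described in the tool constraint above, the script below issues several independent tool calls with asyncio.gather. The web_search stub, its return shape and the "queries" input key are assumptions made so the snippet also runs under a regular interpreter; inside the sandbox, inputs and the bound tools are provided as globals, and the two stand-in definitions would be dropped.

```python
import asyncio

# Stand-in for the sandbox-provided `inputs` dict (assumption, illustration only).
inputs = {"queries": ["monty", "synalinks"]}


async def web_search(query):
    # Hypothetical tool stub: returns a dict, as a bound tool would.
    return {"results": [{"title": f"About {query}"}]}


async def main():
    queries = inputs.get("queries", [])
    # Fan out the independent calls and wait for all of them at once.
    hits = await asyncio.gather(*[web_search(query=q) for q in queries])
    # Each hit is a dict; index the field that is needed.
    return {"titles": [hit["results"][0]["title"] for hit in hits]}


# The script must leave its output in a variable named `result`.
result = asyncio.run(main())
```

Only asyncio.run and asyncio.gather are used here, which are the two entry points the constraints list as available.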

Source code in synalinks/src/modules/synthesis/python_synthesis.py
class PythonScript(Trainable):
    """The Python code that transforms a JSON object into another JSON object.

    The script is executed inside the Monty
    (https://github.com/pydantic/monty) sandboxed Python interpreter, which
    implements only a subset of Python. Scripts must observe the following
    constraints:

    - The input JSON object is exposed as a dict named ``inputs``; the script
      must assign the output JSON object to a variable named ``result`` before
      it ends.
    - Only this subset of the standard library is importable: ``sys``,
      ``os``, ``typing``, ``asyncio``, ``re``, ``datetime``, ``json``,
      ``math``, ``pathlib``. Notably, ``time``, ``random``, ``itertools``,
      ``collections``, ``functools`` and the rest of the stdlib are **not**
      available.
    - No third-party libraries can be imported (e.g. ``numpy``, ``pandas``,
      ``pydantic``).
    - ``class`` definitions and ``match`` statements are not supported; use
      functions and ``if``/``elif`` chains instead.
    - The host filesystem, environment variables and network are not
      reachable from the script. ``os``, ``sys`` and ``pathlib`` import but
      their dangerous surface is pruned or gated: ``open()``, ``os.system``,
      ``os.listdir``, ``os.environ``, ``os.path``, ``sys.argv`` and
      ``Path.read_text`` are all unavailable.
    - ``asyncio`` is also a stub: only ``asyncio.run`` and ``asyncio.gather``
      are exposed. There is no ``asyncio.sleep``, ``wait_for``, ``Future``,
      ``create_task`` or ``TaskGroup``, and no time primitives of any kind
      (``time`` is not importable either).
    - Tools bound to the module are exposed as **global async callables**
      under their tool name. They must be awaited inside an ``async def``
      and driven with ``asyncio.run(...)``. Every tool call returns a
      **dict**: a tool wrapping ``async def f(x) -> int`` yields
      ``{"result": <value>}``, a tool that already returns a dict yields
      that dict directly. For example, with a bound tool ``web_search``:

      ```python
      import asyncio

      async def main():
          hits = await web_search(query=inputs.get("q"))
          # hits is a dict — index the field you need
          return {"answer": hits["results"][0]["title"]}

      result = asyncio.run(main())
      ```

      Independent tool calls can be fanned out with ``asyncio.gather``.
      Calling a tool without ``await`` returns a coroutine object, not the
      real value.
    - Execution is bounded by the module's ``timeout`` and by Monty's memory
      limits; long-running or allocation-heavy scripts will be aborted.
    """

    python_script: str = Field(
        description=(
            "A Python script that transforms a JSON input into a JSON "
            "output. The script reads the input from a dict named "
            "`inputs` and must assign the output dict to a variable "
            "named `result` before it ends. Exact language and stdlib "
            "constraints depend on the active sandbox."
        ),
    )

PythonSynthesis

Bases: Module

A Python code transformation on JSON data.

The script runs inside the Monty (https://github.com/pydantic/monty) sandboxed Python interpreter: the host filesystem, environment and network are unreachable from the script. Monty only supports a subset of Python (no third-party libraries, limited standard library, no class or match statements), so the generated script must stay within what Monty can execute.

This module holds a Python script as a trainable variable, allowing the optimizers to refine the code during the training loop based on iterative feedback and automatic selection of the best script.

This module works ONLY with advanced optimizers (NOT the RandomFewShot optimizer).

The module executes the entire Python script and expects the result to be stored in a variable named 'result' at the end of execution.
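The contract described above (run the whole script, read `result`, fall back to the default value otherwise) can be modelled with a few lines of plain Python. This is a toy sketch only: `run_script` is a hypothetical helper, it uses the built-in `exec` rather than a sandbox, and it skips the schema validation and timeout handling that the real module may apply.

```python
def run_script(python_script, inputs, default_return_value):
    """Toy model: execute the script, then read the `result` variable."""
    namespace = {"inputs": inputs}
    try:
        # NOT sandboxed; plain exec is used here for illustration only.
        exec(python_script, namespace)
    except Exception:
        return default_return_value
    result = namespace.get("result")
    # An empty or missing result falls back to the default value.
    return result if result else default_return_value


script = """
def transform(inputs):
    return {"total": sum(inputs.get("values", []))}

result = transform(inputs)
"""

ok = run_script(script, {"values": [1, 2, 3]}, {"total": 0})
missing = run_script("x = 1", {}, {"total": 0})
failed = run_script("raise ValueError('boom')", {}, {"total": 0})
```

The second and third calls show why `default_return_value` is required: a script that never assigns `result`, or that raises, still produces a usable output.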

Example:

import synalinks
import asyncio

default_python_script = \
"""
def transform(inputs):
    # TODO implement the code to transform the input grid into the output grid
    return {"output_grid": inputs.get("input_grid")}

result = transform(inputs)
"""

async def main():
    inputs = synalinks.Input(
        data_model=synalinks.datasets.arcagi.get_input_data_model(),
    )
    outputs = await synalinks.PythonSynthesis(
        data_model=synalinks.datasets.arcagi.get_output_data_model(),
        python_script=default_python_script,
        default_return_value={"output_grid": [[]]},
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="python_script_synthesis",
        description="A program to solve ARCAGI with python code",
    )

If you want to explore the future of neuro-symbolic self-evolving systems, contact us. While these systems are not "hard" to code thanks to Synalinks, they require technical knowledge and a deep understanding of multiple AI paradigms.

Parameters:

Name Type Description Default
schema dict

The target JSON schema. If not provided, it is inferred from data_model.

None
data_model DataModel | SymbolicDataModel | JsonDataModel

The target data model for structured output.

None
python_script str

The default Python script.

None
seed_scripts list

Optional. A list of Python scripts to use as seed for the evolution. If not provided, create a seed from the default configuration.

None
default_return_value dict

Default return value.

None
return_python_script bool

Whether or not to return the Python script for evaluation. (Defaults to False).

False
timeout int

Maximum execution time in seconds. (Default 5 seconds).

5
tools list

Optional. A list of Tool (or MCP tools) exposed to the script as global async callables. Because Tools are async, scripts must call them inside an async def and await them (see the PythonScript docs). Passing None or an empty list means no tools are bound.

Naming gotcha: each tool is registered inside the sandbox under tool.name, which is tool._func.__name__. So Tool(_my_helper) registers as _my_helper (underscore preserved) and the script must call await _my_helper(...). Name your tool functions exactly as you want them to appear inside the generated script — rename the function, don't rely on an alias.

None
sandbox Sandbox

Optional. A pre-built Sandbox instance to reuse across calls. When supplied, the module will not build its own sandbox at call() time and sandbox_type is derived from type(sandbox). Pass this when the caller owns the sandbox lifecycle and state (variables, imports, function defs) must persist across successive calls — useful at training time when candidate scripts share cached state. When omitted, a fresh sandbox of sandbox_type is built per call.

None
sandbox_type type

Optional. The Sandbox subclass used to build a fresh sandbox per call when no sandbox is injected. Defaults to MontySandbox, or to type(sandbox) when sandbox is given. Any Sandbox subclass whose __init__ accepts (timeout=..., name=...) works; register custom subclasses with @register_synalinks_serializable so they round-trip through get_config / from_config.

None
name str

Optional. The name of the module.

None
description str

Optional. The description of the module.

None
trainable bool

Whether the module's variables should be trainable.

True
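The naming gotcha described for the tools parameter follows from how Python assigns `__name__`. The sketch below uses a toy registry keyed by `__name__` (an assumption standing in for the module's own tool registration) and a hypothetical `_my_helper` function to show that an alias does not change the registered name.

```python
async def _my_helper(x: int) -> int:
    # Hypothetical tool function; the leading underscore is part of its name.
    return x + 1


# An alias is only a second reference: the function's __name__ is unchanged.
helper = _my_helper

# Toy registry keyed by the function's __name__, as the docs describe.
registry = {fn.__name__: fn for fn in [helper]}
registered_names = sorted(registry)
```

Under this model a script would have to call `await _my_helper(...)`; the alias `helper` never reaches the registry, so renaming the function itself is the reliable fix.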

call also accepts an optional sandbox kwarg. The resolution order is: per-call kwarg > constructor-supplied sandbox > a fresh sandbox of sandbox_type. The first two cases let the caller keep sandbox state alive across calls; the third is the stateless-per-call default.
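The resolution order above can be sketched as a small function. `ToySandbox` and `resolve_sandbox` are hypothetical names used only for illustration; the real module resolves the sandbox inside `call` and its internal script runner.

```python
class ToySandbox:
    """Hypothetical stand-in for a Sandbox subclass accepting (timeout, name)."""

    def __init__(self, timeout=5, name=None):
        self.timeout = timeout
        self.name = name


def resolve_sandbox(call_sandbox, constructor_sandbox, sandbox_type, timeout=5):
    # 1. A sandbox passed to the call wins.
    if call_sandbox is not None:
        return call_sandbox
    # 2. Otherwise reuse the sandbox supplied at construction time.
    if constructor_sandbox is not None:
        return constructor_sandbox
    # 3. Otherwise build a fresh, stateless sandbox for this call.
    return sandbox_type(timeout=timeout, name="fresh")


shared = ToySandbox(name="shared")
per_call = ToySandbox(name="per_call")

first = resolve_sandbox(per_call, shared, ToySandbox)
second = resolve_sandbox(None, shared, ToySandbox)
third = resolve_sandbox(None, None, ToySandbox)
```

Only the third case creates a new object, which is why state persists across calls in the first two cases and not in the last.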

Source code in synalinks/src/modules/synthesis/python_synthesis.py
@synalinks_export(
    [
        "synalinks.modules.PythonSynthesis",
        "synalinks.PythonSynthesis",
    ]
)
class PythonSynthesis(Module):
    """A Python code transformation on JSON data.

    The script runs inside the `Monty <https://github.com/pydantic/monty>`_
    sandboxed Python interpreter: the host filesystem, environment and network
    are unreachable from the script. Monty only supports a subset of Python
    (no third-party libraries, limited standard library, no class or match
    statements), so the generated script must stay within what Monty can
    execute.

    This module holds a Python script as a trainable variable, allowing the
    optimizers to refine the code during the training loop based on iterative
    feedback and automatic selection of the best script.

    This module works **ONLY** with advanced optimizers (**NOT** the
    `RandomFewShot` optimizer).

    The module executes the entire Python script and expects the result to be stored
    in a variable named 'result' at the end of execution.

    Example:

    ```python
    import synalinks
    import asyncio

    default_python_script = \\
    \"\"\"
    def transform(inputs):
        # TODO implement the code to transform the input grid into the output grid
        return {"output_grid": inputs.get("input_grid")}

    result = transform(inputs)
    \"\"\"

    async def main():
        inputs = synalinks.Input(
            data_model=synalinks.datasets.arcagi.get_input_data_model(),
        )
        outputs = await synalinks.PythonSynthesis(
            data_model=synalinks.datasets.arcagi.get_output_data_model(),
            python_script=default_python_script,
            default_return_value={"output_grid": [[]]},
        )(inputs)

        program = synalinks.Program(
            inputs=inputs,
            outputs=outputs,
            name="python_script_synthesis",
            description="A program to solve ARCAGI with python code",
        )
    ```

    If you want to explore the future of neuro-symbolic self-evolving systems, contact us.
    While these systems are not "hard" to code thanks to Synalinks, they require
    technical knowledge and a deep understanding of multiple AI paradigms.

    Args:
        schema (dict): The target JSON schema.
            If not provided use the `data_model` to infer it.
        data_model (DataModel | SymbolicDataModel | JsonDataModel): The target data
            model for structured output.
        python_script (str): The default Python script.
        seed_scripts (list): Optional. A list of Python scripts to use as seed
            for the evolution. If not provided, create a seed from the default
            configuration.
        default_return_value (dict): Default return value.
        return_python_script (bool): Whether or not to return the Python script for
            evaluation. (Defaults to False).
        timeout (int): Maximum execution time in seconds. (Default 5 seconds).
        tools (list): Optional. A list of `Tool` (or MCP tools) exposed to the
            script as global async callables. Because `Tool`s are async,
            scripts must call them inside an `async def` and `await` them
            (see the ``PythonScript`` docs). Passing `None` or an empty list
            means no tools are bound.

            **Naming gotcha**: each tool is registered inside the sandbox
            under ``tool.name``, which is ``tool._func.__name__``. So
            ``Tool(_my_helper)`` registers as ``_my_helper`` (underscore
            preserved) and the script must call ``await _my_helper(...)``.
            Name your tool functions exactly as you want them to appear
            inside the generated script — rename the function, don't rely
            on an alias.
        sandbox (Sandbox): Optional. A pre-built ``Sandbox`` instance to
            reuse across calls. When supplied, the module will not build
            its own sandbox at ``call()`` time and ``sandbox_type`` is
            derived from ``type(sandbox)``. Pass this when the caller
            owns the sandbox lifecycle and state (variables, imports,
            function defs) must persist across successive calls — useful
            at training time when candidate scripts share cached state.
            When omitted, a fresh sandbox of ``sandbox_type`` is built
            per call.
        sandbox_type (type): Optional. The ``Sandbox`` subclass used to
            build a fresh sandbox per call when no ``sandbox`` is
            injected. Defaults to ``MontySandbox``, or to
            ``type(sandbox)`` when ``sandbox`` is given. Any ``Sandbox``
            subclass whose ``__init__`` accepts ``(timeout=..., name=...)``
            works; register custom subclasses with
            ``@register_synalinks_serializable`` so they round-trip
            through ``get_config`` / ``from_config``.
        name (str): Optional. The name of the module.
        description (str): Optional. The description of the module.
        trainable (bool): Whether the module's variables should be trainable.

    ``call`` also accepts an optional ``sandbox`` kwarg. The resolution
    order is: per-call kwarg > constructor-supplied ``sandbox`` > a
    fresh sandbox of ``sandbox_type``. The first two cases let the
    caller keep sandbox state alive across calls; the third is the
    stateless-per-call default.
    """

    def __init__(
        self,
        *,
        schema=None,
        data_model=None,
        python_script=None,
        seed_scripts=None,
        default_return_value=None,
        return_python_script=False,
        timeout=5,
        tools=None,
        sandbox=None,
        sandbox_type=None,
        name=None,
        description=None,
        trainable=True,
    ):
        super().__init__(
            name=name,
            description=description,
            trainable=trainable,
        )
        if not schema and data_model:
            schema = data_model.get_schema()
        self.schema = schema

        if not python_script:
            raise ValueError("You should provide the `python_script` argument")
        self.python_script = python_script

        if not default_return_value:
            raise ValueError("You should provide the `default_return_value` argument")

        try:
            jsonschema.validate(default_return_value, self.schema)
        except ValidationError as e:
            raise ValueError(
                f"`default_return_value` parameter does not conform to schema: {e}"
            )

        self.default_return_value = default_return_value
        self.return_python_script = return_python_script
        self.timeout = timeout

        self.tools = {}
        if tools:
            for tool in tools:
                self.tools[tool.name] = tool

        # Sandbox handling mirrors RecursiveLanguageModelAgent: if a
        # concrete sandbox is supplied at construction, reuse it across
        # calls and derive `sandbox_type` from its class. Otherwise fall
        # back to `sandbox_type` (default MontySandbox) and build one
        # fresh per `call()`.
        self.sandbox = sandbox
        if sandbox is not None:
            self.sandbox_type = type(sandbox)
        else:
            self.sandbox_type = sandbox_type or MontySandbox

        if not seed_scripts:
            seed_scripts = []
        self.seed_scripts = seed_scripts

        seed_candidates = [
            {"python_script": seed_script} for seed_script in self.seed_scripts
        ]

        self.state = self.add_variable(
            initializer=PythonScript(
                python_script=self.python_script,
                seed_candidates=seed_candidates,
            ).get_json(),
            data_model=PythonScript,
            name="state_" + self.name,
        )

    async def execute(self, inputs, python_script, sandbox=None):
        """Execute the Python script in the sandbox with a timeout."""
        return await _run_script(
            python_script,
            inputs.get_json(),
            self.schema,
            self.timeout,
            self.tools,
            sandbox=sandbox,
            sandbox_type=self.sandbox_type,
        )

    async def call(self, inputs, training=False, sandbox=None):
        if not inputs:
            return None
        python_script = self.state.get("python_script")
        # Sandbox resolution order: per-call kwarg > constructor-supplied
        # sandbox > fresh sandbox of `sandbox_type` (built inside
        # `_run_script` when `sandbox` is still None).
        if sandbox is None:
            sandbox = self.sandbox
        result, stdout, stderr = await self.execute(
            inputs, python_script, sandbox=sandbox
        )
        if training:
            predictions = self.state.get("current_predictions")
            if result:
                if self.return_python_script:
                    predictions.append(
                        {
                            "inputs": {
                                **inputs.get_json(),
                            },
                            "outputs": {
                                "python_script": python_script,
                                **result,
                                "stdout": stdout,
                                "stderr": stderr,
                            },
                            "reward": None,
                        }
                    )
                else:
                    predictions.append(
                        {
                            "inputs": {
                                **inputs.get_json(),
                            },
                            "outputs": {
                                **result,
                                "stdout": stdout,
                                "stderr": stderr,
                            },
                            "reward": None,
                        }
                    )
            else:
                if self.return_python_script:
                    predictions.append(
                        {
                            "inputs": {
                                **inputs.get_json(),
                            },
                            "outputs": {
                                "python_script": python_script,
                                "stdout": stdout,
                                "stderr": stderr,
                            },
                            "reward": None,
                        }
                    )
                else:
                    predictions.append(
                        {
                            "inputs": {
                                **inputs.get_json(),
                            },
                            "outputs": {
                                "stdout": stdout,
                                "stderr": stderr,
                            },
                            "reward": None,
                        }
                    )
        if result:
            if self.return_python_script:
                return JsonDataModel(
                    json={
                        "python_script": python_script,
                        **result,
                        "stdout": stdout,
                        "stderr": stderr,
                    },
                    schema=self.schema,
                    name=self.name,
                )
            else:
                return JsonDataModel(
                    json={
                        **result,
                        "stdout": stdout,
                        "stderr": stderr,
                    },
                    schema=self.schema,
                    name=self.name,
                )
        else:
            if self.return_python_script:
                return JsonDataModel(
                    json={
                        "python_script": python_script,
                        **self.default_return_value,
                        "stdout": stdout,
                        "stderr": stderr,
                    },
                    schema=self.schema,
                    name=self.name,
                )
            else:
                return JsonDataModel(
                    json={
                        **self.default_return_value,
                        "stdout": stdout,
                        "stderr": stderr,
                    },
                    schema=self.schema,
                    name=self.name,
                )

    async def compute_output_spec(self, inputs, training=False, sandbox=None):
        if self.return_python_script:
            return await ops.concat(
                await ops.out_mask(
                    PythonScript.to_symbolic_data_model(),
                    mask=list(Trainable.keys()),
                    name="python_script_masked_" + self.name,
                ),
                await ops.concat(
                    SymbolicDataModel(schema=self.schema),
                    PythonConsoleLog,
                    name="python_logs_" + self.name,
                ),
                name=self.name,
            )
        else:
            return await ops.concat(
                SymbolicDataModel(schema=self.schema),
                PythonConsoleLog,
                name=self.name,
            )

    def get_config(self):
        config = {
            "schema": self.schema,
            "python_script": self.python_script,
            "seed_scripts": self.seed_scripts,
            "default_return_value": self.default_return_value,
            "return_python_script": self.return_python_script,
            "timeout": self.timeout,
            "sandbox_type": get_registered_name(self.sandbox_type),
            "name": self.name,
            "description": self.description,
            "trainable": self.trainable,
        }
        sandbox_config = {
            "sandbox": (
                serialization_lib.serialize_synalinks_object(self.sandbox)
                if self.sandbox is not None
                else None
            )
        }
        tools_config = {
            "tools": [
                serialization_lib.serialize_synalinks_object(tool)
                for tool in self.tools.values()
            ]
        }
        return {**config, **sandbox_config, **tools_config}

    @classmethod
    def from_config(cls, config):
        tools = [
            serialization_lib.deserialize_synalinks_object(tool)
            for tool in config.pop("tools", [])
        ]
        sandbox = None
        if "sandbox" in config:
            sandbox_serialized = config.pop("sandbox")
            if sandbox_serialized is not None:
                sandbox = serialization_lib.deserialize_synalinks_object(
                    sandbox_serialized
                )
        sandbox_type_name = config.pop("sandbox_type", None)
        sandbox_type = (
            get_registered_object(sandbox_type_name) if sandbox_type_name else None
        )
        return cls(
            tools=tools or None,
            sandbox=sandbox,
            sandbox_type=sandbox_type,
            **config,
        )
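The four return branches of `call` in the listing above all assemble the same kind of payload. A condensed sketch of that assembly, using a hypothetical `build_outputs` helper that is not part of the library and treating a non-None `python_script` as a stand-in for `return_python_script=True`, might look like this:

```python
def build_outputs(result, default_return_value, stdout, stderr, python_script=None):
    # Fall back to the default value when the script produced nothing usable.
    payload = result if result else default_return_value
    outputs = {}
    # The script itself is included only when it was requested.
    if python_script is not None:
        outputs["python_script"] = python_script
    outputs.update(payload)
    # Console logs are always appended last.
    outputs["stdout"] = stdout
    outputs["stderr"] = stderr
    return outputs


success = build_outputs({"output_grid": [[1]]}, {"output_grid": [[]]}, "", "")
fallback = build_outputs(
    None, {"output_grid": [[]]}, "", "NameError", python_script="result = x"
)
```

The key order (script, result fields, then stdout and stderr) mirrors the dict literals in the listing.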

execute(inputs, python_script, sandbox=None) async

Execute the Python script in the sandbox with a timeout.

Source code in synalinks/src/modules/synthesis/python_synthesis.py
async def execute(self, inputs, python_script, sandbox=None):
    """Execute the Python script in the sandbox with a timeout."""
    return await _run_script(
        python_script,
        inputs.get_json(),
        self.schema,
        self.timeout,
        self.tools,
        sandbox=sandbox,
        sandbox_type=self.sandbox_type,
    )