Recursive Language Model Agent
The RecursiveLanguageModelAgent (exported as synalinks.RLM) is built
for tasks where the input itself is too large or too noisy to feed
straight into the language model. Instead of packing a whole book, log
dump, or scraped corpus into the primary LM's context, the agent treats
those inputs as an external environment: the LM writes Python that
programmatically slices, filters, and aggregates the data inside a
persistent sandbox, and recursively delegates semantic work to a
sub-LM on the snippets it actually cares about.
The pattern follows Recursive Language Models (Zhang, Kraska, and Khattab, 2025).
Why Recursive?
A long context is expensive on three compounding axes: token cost
scales linearly with prompt size, latency grows with it, and
accuracy regresses past a model-specific knee (the "lost in the middle"
effect). RLM sidesteps all three by keeping the primary LM in a small,
structured context: a metadata summary of the input, the tool
catalog, and the accumulated trajectory. The full value lives in the
sandbox under `inputs[field]`, and the LM decides, per query and per
turn, which slice to look at.
```mermaid
flowchart TD
A[Long input + Query] --> S[InputsSummary<br/>previews + sizes only]
S --> P[Primary LM]
P --> C[Python snippet]
C --> X[Monty Sandbox<br/>inputs is dict, full value]
X --> Q{semantic work?}
Q -->|Yes| L[llm_query / llm_query_batched<br/>sub-LM on a snippet]
Q -->|No| R[Pure code: regex, slicing, set ops]
L --> O[Observation]
R --> O
O --> P
P -->|done| SU[submit result]
```
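The loop in the diagram can be sketched in a few lines. This is an illustrative reduction, not the synalinks internals: a stub stands in for the primary LM, and `exec` against one shared namespace stands in for the Monty sandbox.

```python
# Minimal sketch of the RLM turn loop (names are illustrative):
# the primary LM emits a Python snippet, a persistent sandbox executes it,
# and the resulting observation is appended to the trajectory.
def run_rlm(primary_lm, sandbox_env, query, max_iterations=10):
    trajectory = [f"query: {query}"]
    for _ in range(max_iterations):
        snippet = primary_lm(trajectory)      # next snippet, given history
        if snippet is None:                   # stub's way of signaling submit
            break
        try:
            # one shared namespace, so variables persist across turns
            exec(snippet, sandbox_env)
            observation = sandbox_env.get("_out", "")
        except Exception as e:
            observation = f"error: {e}"
        trajectory.append(observation)
    return trajectory

# Stub primary LM: one turn of pure code, then "done".
turns = iter(["_out = str(len(inputs['text']))", None])
env = {"inputs": {"text": "x" * 500}}
traj = run_rlm(lambda t: next(turns), env, "how long is the doc?")
# the final observation is the document length, computed in the sandbox
```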
Needle in a Haystack
This example builds a long, repetitive document (~200 paragraphs of
filler text) and hides a single fact — "The magic number is 4242" —
near the middle. The primary LM never sees the full text; it only
sees an InputsSummary with a preview and a length. Finding the
needle requires writing code that scans the full text in the sandbox
and either uses a regex or batches sub-LM calls over candidate spans.
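The pure-code path is a one-line regex scan. A sketch of the snippet the primary LM might emit inside the sandbox (the `inputs` dict is normally pre-populated by the agent; here it is stubbed inline):

```python
import re

# Assumed sandbox state: inputs["text"] holds the full repetitive document
# with the needle hidden near the middle.
inputs = {"text": "filler. " * 300 + "The magic number is 4242. " + "filler. " * 300}

# Scan the full text in the sandbox, never sending it to the primary LM.
match = re.search(r"magic number is (\d+)", inputs["text"])
needle = match.group(1) if match else None
# needle == "4242"
```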
Define a Doc input and Answer output data model:
```python
class Doc(synalinks.DataModel):
    text: str = synalinks.Field(description="The document to analyze")


class Answer(synalinks.DataModel):
    answer: str = synalinks.Field(description="The final answer to the user")
```
Wire up the RLM agent. The primary LM drives orchestration and final
formatting; the sub-LM (configurable via `sub_language_model=`) handles
per-snippet semantic work. Both default to the same model when only
`language_model=` is passed:
```python
inputs = synalinks.Input(data_model=Doc)
outputs = await synalinks.RLM(
    data_model=Answer,
    language_model=language_model,
    max_iterations=10,
    max_llm_calls=20,
)(inputs)
agent = synalinks.Program(inputs=inputs, outputs=outputs, name="rlm_needle")
```
When the agent runs, the primary LM emits one Python snippet per turn. State persists across turns inside a Monty REPL sandbox — variables, imports, and function definitions accumulate. Two extra async helpers are exposed in the sandbox alongside any tools you bind:
- `llm_query(prompt)` — single sub-LM call, returns `{"result": <text>}`.
- `llm_query_batched(prompts)` — concurrent sub-LM calls, returns `{"result": [<text>, ...]}`, preserving input order.
A shared counter caps the two helpers at `max_llm_calls` per
`agent(...)` invocation; when exhausted they short-circuit with
`{"result": <empty>, "error": "..."}` and do not consume quota. The
counter resets on every invocation, so concurrent calls get independent
budgets.
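A minimal sketch of how a shared budget plus order-preserving batching could fit together. The helper names mirror the sandbox API, but the class, the stub sub-LM, and the error string are assumptions for illustration:

```python
import asyncio

class SubLMBudget:
    """Shared counter: every sub-LM call consumes one unit of the budget."""

    def __init__(self, max_llm_calls):
        self.remaining = max_llm_calls

    async def llm_query(self, prompt):
        if self.remaining <= 0:
            # Exhausted: short-circuit without touching the sub-LM.
            return {"result": "", "error": "llm call budget exhausted"}
        self.remaining -= 1
        return {"result": f"echo:{prompt}"}   # stub sub-LM response

    async def llm_query_batched(self, prompts):
        # asyncio.gather returns results in input order, regardless of
        # which call finishes first -- this is the ordering guarantee.
        results = await asyncio.gather(*(self.llm_query(p) for p in prompts))
        return {"result": [r.get("result", "") for r in results]}

budget = SubLMBudget(max_llm_calls=2)
out = asyncio.run(budget.llm_query_batched(["a", "b", "c"]))
# the first two calls fit the budget; the third short-circuits empty
```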
Termination is via the always-present `submit` tool: `submit(result={...})`
captures the final payload, validates it against the configured output
schema, and ends the run. Empty `python_code` strings are no-ops — the
loop reminds the LM to call `submit`. If `max_iterations` is reached
without a successful submit, a final LM inference step formats the
accumulated trajectory into the target schema.
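A sketch of the submit path, with schema validation reduced to a required-key check (the real agent validates against the configured DataModel; this function and its return shape are hypothetical):

```python
def submit(result, required_keys=frozenset({"answer"})):
    """Validate the payload; on failure, return a retry observation
    instead of terminating (illustrative stand-in for schema validation)."""
    missing = required_keys - result.keys()
    if missing:
        return {"done": False,
                "observation": f"validation error: missing {sorted(missing)}; retry submit"}
    return {"done": True, "result": result}

ok = submit({"answer": "4242"})        # valid payload -> run ends
retry = submit({"wrong_key": 1})       # invalid -> observation on next turn
```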
Key Takeaways
- Long inputs as external environment: the primary LM sees a metadata summary; the full value lives in `inputs[field]` inside the sandbox.
- Two recursive helpers: `llm_query` and `llm_query_batched` send work to a sub-LM and share one budget capped at `max_llm_calls`.
- Pick a cheap `sub_language_model` when you have one available: a typical RLM run is dominated by sub-LM calls, so splitting primary vs. sub-LM is the largest cost lever.
- `submit` is the termination path: schema validation errors come back as a retry observation on the next turn.
Program Visualization
API References
`build_haystack(needle, paragraphs=200)`
Return a long document with needle placed near the middle.
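A plausible implementation matching that signature, assuming simple numbered filler paragraphs (the exact filler wording is an assumption):

```python
def build_haystack(needle, paragraphs=200):
    """Return a long document with `needle` placed near the middle."""
    filler = ("This is filler paragraph {i}. It discusses nothing in "
              "particular and exists only to pad the context.")
    parts = [filler.format(i=i) for i in range(paragraphs)]
    parts.insert(paragraphs // 2, needle)   # hide the fact mid-document
    return "\n\n".join(parts)

doc = build_haystack("The magic number is 4242.")
# the needle lands roughly halfway through ~200 paragraphs
```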
