Deployment: FastMCP

The previous guide, FastAPI Deployment, exposed a Synalinks program to HTTP callers — browsers, mobile apps, other backend services. This guide exposes the same kind of program to a very different kind of caller: a language model client.

If you have used Claude Desktop or Cursor and noticed that they can call external tools — search the web, edit a file, query a database — that's the world we are stepping into.

What Is MCP?

MCP stands for Model Context Protocol. It is a small, open protocol — published by Anthropic in 2024 — that lets a language model client discover and call tools that you define. From the LM's point of view, your program looks like a function it can choose to invoke. From your point of view, the LM and the wire are somebody else's problem; you only have to write the function.

FastMCP is the standard Python framework for writing MCP servers. It plays the same role that FastAPI plays for REST: you write typed Python functions, the framework handles the protocol.

If you've read the FastAPI guide, the next sentence will sound familiar — and that's the point: most of what you know transfers.

The Same Trick, Repeated

A Synalinks Program is an async function. An MCP tool is an async function. So once again the adapter is small (we'll show where program comes from in the Lifespan section just below; ignore the missing import for now):

@mcp.tool
async def solve(query: str) -> NumericalAnswer:
    """Solve a word problem."""
    result = await program(Query(query=query))    # `program`: see below
    return NumericalAnswer.model_validate(result.get_json())

(As in the sibling FastAPI guide, snippets here are illustrative; the complete file lives in the Source section at the bottom.)
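
For reference, the snippet assumes the usual FastMCP module-level setup, shown in full in the Source section (the real file also passes lifespan=lifespan, introduced below):

from fastmcp import FastMCP

mcp = FastMCP("math-agent")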

FastMCP cares about three things in that function:

  • Its name (solve) becomes the tool name shown to the LM client.
  • Its argument and return types become the tool's JSON schema. Scalar types (str, int, float, bool) appear as flat, top-level tool parameters — that is what LM clients invoke best. Synalinks DataModels are Pydantic models, so FastMCP accepts them directly too; use them when an argument is genuinely structured. For the single-string input here we take a str and wrap it into our internal Query model on the next line — that way the LM sees a clean query: string parameter, not a nested object with a query.query field.
  • Its docstring becomes the description the LM sees when it is deciding whether to call this tool. Write it as a prompt to the model, not as a developer-facing API note. ("Use this tool when the user asks…" reads well to an LM; "REST endpoint for solving" does not.)

A note on shape, not type: the program's input schema (Query) and the tool's argument schema are allowed to differ. FastAPI happens to make them line up — REST conventionally wraps a body in a JSON object, and Query(query="...") matches that exactly. MCP is a function-call protocol, so flat keyword arguments are the natural shape. The tool body bridges the two.
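
To make the shape difference concrete, here is roughly what an LM client sends as tool arguments in each case (an illustrative question, not the exact MCP wire format):

# Flat scalar argument, as in the `solve` signature above:
flat_call = {"name": "solve", "arguments": {"query": "What is 12 * 7?"}}

# If the tool signature took the Query DataModel directly instead:
nested_call = {"name": "solve", "arguments": {"query": {"query": "What is 12 * 7?"}}}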

Two Steps, Not One: Build, Then Serve

The same warning that opened the FastAPI guide applies here: do not let your MCP server build the program on first start. Building means talking to an LM provider, which is exactly the kind of work a server should not be doing during boot. If the build half-fails, you have a half-initialised server confidently exposing a broken tool.

So we keep the same two-step shape:

  1. Build the artifact once. This file exposes a regular build_and_save_program(path) function and a build CLI verb that calls it. Run it from a shell, a CI job, or a Python REPL — wherever you prepare other ML artifacts:
# From a shell:
python -m guides.20_fastmcp_deployment build
# Or, equivalently, from a notebook / REPL. The module name starts with a
# digit, so import it via importlib rather than an `import` statement:
import asyncio
import importlib

mod = importlib.import_module("guides.20_fastmcp_deployment")
asyncio.run(mod.build_and_save_program(mod.PROGRAM_PATH))
  2. Serve. The server's only startup job is to load math_agent.json. If the file is missing, it fails loudly — not silently.

Lifespan: Loading The Program Once

Like FastAPI, FastMCP has a lifespan: an async context manager that runs once at server startup and once at shutdown. Whatever you yield from the lifespan becomes available on Context.lifespan_context inside every tool call. (Don't worry too much about the word "context" here — it's just FastMCP's name for "the bundle of information available during one tool call.")

If you read the FastAPI guide, you've seen this exact pattern. One small surface difference: the FastAPI lifespan receives app: FastAPI, the FastMCP lifespan receives server (the FastMCP instance). Same idea, different framework-supplied argument.

@asynccontextmanager
async def lifespan(server):
    if not PROGRAM_PATH.exists():
        raise FileNotFoundError(
            f"{PROGRAM_PATH} not found. Run "
            f"`python -m guides.20_fastmcp_deployment build` first."
        )
    program = synalinks.Program.load(str(PROGRAM_PATH))
    yield {"program": program}

To reach the yielded dict from inside a tool, take a Context argument — FastMCP will inject it automatically as long as the type is Context:

from fastmcp import Context

@mcp.tool
async def solve(query: str, ctx: Context) -> NumericalAnswer:
    """Solve a word problem expressed in natural language."""
    program = ctx.lifespan_context["program"]
    result = await program(Query(query=query))
    answer_json = result.out_mask(mask=["messages"]).get_json()
    return NumericalAnswer.model_validate(answer_json)

The out_mask + get_json() + model_validate(...) step is the same as in the FastAPI guide. The reason is the same: await program(...) returns a generic JsonDataModel (on purpose, so internal modules can reshape data freely), and an agent in particular also returns its trajectory (messages) for training and observability. At the boundary of your system you want the typed view back, so we drop the trajectory with out_mask and validate the rest against the response model.
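
As a concrete sketch of that hand-off (field values invented for illustration; only the answer field is defined by NumericalAnswer):

raw = result.get_json()                                        # e.g. {"messages": [...], "answer": 42.0}
answer_json = result.out_mask(mask=["messages"]).get_json()    # e.g. {"answer": 42.0}
answer = NumericalAnswer.model_validate(answer_json)           # NumericalAnswer(answer=42.0)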

Transports

mcp.run() decides how the tool talks to its client. There are two transports worth using today, plus a third kept for backwards compatibility:

  • stdio (default). The tool reads requests from standard input and writes responses to standard output, the same way command-line programs read and write text. The client launches your process as a subprocess. This is how Claude Desktop and Cursor use local MCP tools. If in doubt, this is the right choice.
  • http — a regular HTTP server (streamable). Use this when the client is on a different machine. Replace mcp.run() at the bottom of this file with mcp.run(transport="http", host="127.0.0.1", port=8001), or use fastmcp run --transport http ... from the FastMCP CLI. (A runner sketch follows this list.)
  • (legacy) sse (Server-Sent Events) — an older HTTP-based transport. Still supported, but the MCP spec is moving to streamable HTTP. New deployments should prefer http.
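
Concretely, switching transports only touches the runner at the bottom of the file; the HTTP variant repeats the call quoted above, and the host and port are yours to choose:

if __name__ == "__main__":
    mcp.run()  # stdio: the client starts this process and talks over stdin/stdout
    # For remote clients, use streamable HTTP instead:
    # mcp.run(transport="http", host="127.0.0.1", port=8001)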

Running The Server

Prerequisites: an LM you have access to. The build function defaults to gemini/gemini-3.1-flash-lite-preview, which expects a GEMINI_API_KEY env var. If you don't have one, edit build_and_save_program and change the model string — a free local option is "ollama/llama3.2:latest" (see Getting Started for setup). fastmcp itself is already a Synalinks dependency, so nothing extra to install.

This module follows the standard FastMCP conventions from the official quickstart: a module-level mcp object, @mcp.tool decorators, and mcp.run() inside if __name__ == "__main__":. So:

# stdio (default) -- the way LM clients usually launch you:
python -m guides.20_fastmcp_deployment

# or run with the FastMCP CLI:
fastmcp run guides/20_fastmcp_deployment.py

Connecting a local MCP client (Claude Desktop, Cursor, …) is a matter of adding a config entry that points at this script with stdio as the transport. The exact format depends on the client; their docs cover it.

If you're new to FastMCP, read the official quickstart alongside this guide. The shape of the module — FastMCP("name"), @mcp.tool, mcp.run() — is copied verbatim from there; only the body of the tool is Synalinks-specific.
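
For a quick check that the tool behaves before wiring up a real LM client, FastMCP's in-process client is convenient. A minimal sketch, assuming your FastMCP version exposes fastmcp.Client and that the artifact from the build step exists:

import asyncio
from fastmcp import Client

async def smoke_test():
    # Connecting to the server object runs it in-process: no subprocess,
    # no network, and the lifespan (which loads math_agent.json) still runs.
    async with Client(mcp) as client:
        result = await client.call_tool("solve", {"query": "What is 12 * 7?"})
        print(result)

asyncio.run(smoke_test())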

Error Handling

There are two layers of "tool" in this file and the rules differ slightly between them. Keeping them straight saves a lot of confusion:

  • The inner tool, calculate, is called by the agent inside the program. The agent loop is forgiving: if calculate returns {"result": None, "log": "Error: ..."}, the agent reads the log, tries a different approach, and keeps going. That's why calculate returns a dict with a log field instead of raising.
  • The MCP tool, solve, is the outer boundary that talks to a language-model client. Here, errors are part of the protocol: if solve raises, FastMCP captures the exception, sends a structured error back to the client, and the LM on the other side gets a chance to react — by retrying with corrected input or by telling its user something went wrong.

Two practical consequences for the outer MCP tool:

  • Do not swallow exceptions inside a tool. Let them propagate. An LM that sees a clear error can fix its own mistake; an LM that sees an empty response can only guess.
  • Crash early on bad input. A clear ValueError("expression must contain at least one digit") is more useful to the model than a generic 500 deep inside the call.
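
As a sketch of what crashing early can look like here (a hypothetical guard, not present in the source below):

@mcp.tool
async def solve(query: str, ctx: Context) -> NumericalAnswer:
    """Solve a word problem."""
    if not query.strip():
        # A precise, early error the LM can act on beats a vague failure
        # from deep inside the program.
        raise ValueError("query must not be empty")
    ...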

By default FastMCP forwards the exception message verbatim to the client. This is the right default for development — the LM (and you, reading the logs) get useful information. In production you will eventually want to pass mask_error_details=True to FastMCP(...): the client gets a generic message ("internal error") while the server logs the real one. The trade-off is real, in both directions:

  • Unmasked leaks debugging detail to anything that talks to your server.
  • Masked starves the LM of the retry signal that would have let it fix its own mistake.

Pick by deployment, not by reflex.
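
In code, the choice is one constructor argument on the server object:

# Development: exception messages are forwarded verbatim to the client.
mcp = FastMCP("math-agent", lifespan=lifespan)

# Production: the client sees a generic error; the real message stays in the server logs.
mcp = FastMCP("math-agent", lifespan=lifespan, mask_error_details=True)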

Bonus: Re-using A FastAPI App As An MCP Server

If you already have the FastAPI server from the previous guide, FastMCP can lift its endpoints into MCP tools for you in one line:

from fastmcp import FastMCP
from guides import fastapi_deployment_19  # your FastAPI module

mcp = FastMCP.from_fastapi(
    fastapi_deployment_19.app,
    name="math-agent",
)

if __name__ == "__main__":
    mcp.run()

Each FastAPI route becomes one MCP tool, with the route's Pydantic request/response models reused as the tool's schema. This is the cheapest way to make a single program reachable from both audiences.

One caveat worth knowing about: the auto-lifted tool inherits the FastAPI route's request body shape. In our case the FastAPI handler takes query: Query, so the auto-generated MCP tool ends up with a nested query.query parameter — the exact shape we hand-flattened to query: str above. If you go this route and want a clean LM-facing schema, either flatten on the FastAPI side (at the cost of less idiomatic REST), or accept the nested form and trust the LM to fill it in.

Concurrency Note

(Safe to skip on a first read — only matters once you train.)

Same caveat as the FastAPI guide: the program is shared across tool invocations. Serving them only reads the program's trainable variables (the configurable knobs an optimiser tunes during training — see the Training guide). The optimiser is the only thing that ever writes to them, and it doesn't run at tool-call time. If you intend to expose training itself as a tool, give each training run its own program copy so they don't fight over the same knobs.
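
If you do expose training, the cheapest isolation is to load a separate copy from the same artifact for each run. A sketch:

# The shared copy keeps serving tool calls (reads only) ...
serving_program = synalinks.Program.load(str(PROGRAM_PATH))

# ... while each training run gets its own copy, so the optimiser's writes to
# trainable variables never touch the serving instance.
training_program = synalinks.Program.load(str(PROGRAM_PATH))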

Take-Home Summary

  • MCP lets a language model client call your code as a tool. FastMCP is the Python framework for it.
  • An MCP tool is an async def whose name, signature, and docstring become the schema and prompt the LM sees. Prefer scalar argument types (str, int, float, bool) so the LM sees flat top-level parameters; reach for DataModels only when an argument is genuinely structured. The tool body bridges the wire shape (what the LM sends) and the program shape (what your Program expects).
  • Build once, serve forever. The server loads a prepared artifact; building inside the server is the wrong place.
  • Load the program in the lifespan; reach it via Context.lifespan_context. Same separation as FastAPI: setup out of the tool body, work in the tool body.
  • Tool errors are part of the protocol. Raise honestly; mask details in production only when the trade-off is right for that deployment.
  • FastMCP.from_fastapi(app) is the one-line trick if you already have a FastAPI server and want to expose it to LM callers without writing a second adapter.

What To Learn Next

  • FastAPI Deployment — the sibling guide.
  • Observability — production tracing applies here too; the spans nest cleanly under MCP tool calls if you enable it inside the lifespan.
  • FastMCP's getting started — for transports beyond stdio, authentication, resources, prompts, and middleware.

API References

NumericalAnswer

Bases: DataModel

The user-facing tool output.

Source code in guides/20_fastmcp_deployment.py
class NumericalAnswer(synalinks.DataModel):
    """The user-facing tool output."""

    answer: float = synalinks.Field(
        description="The correct final numerical answer"
    )

Query

Bases: DataModel

The user-facing tool input.

Source code in guides/20_fastmcp_deployment.py
class Query(synalinks.DataModel):
    """The user-facing tool input."""

    query: str = synalinks.Field(description="The user query")

build_and_save_program(path) async

Build the math agent and persist it to path.

Source code in guides/20_fastmcp_deployment.py
async def build_and_save_program(path: Path) -> "synalinks.Program":
    """Build the math agent and persist it to ``path``."""
    # Reset Synalinks's global registry — useful when re-running the
    # build in a long-lived REPL or notebook so state from a previous
    # build doesn't leak in.
    synalinks.clear_session()
    synalinks.set_default_language_model("gemini/gemini-3.1-flash-lite-preview")

    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.FunctionCallingAgent(
        data_model=NumericalAnswer,
        tools=[synalinks.Tool(calculate)],
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="math_agent",
        description="A math agent",
    )
    program.save(str(path))
    return program

calculate(expression) async

Calculate the result of a mathematical expression.

Parameters:

  • expression (str, required): The mathematical expression to calculate, such as '2 + 2'. The expression can contain numbers, operators (+, -, *, /), parentheses, and spaces.
Source code in guides/20_fastmcp_deployment.py
@synalinks.saving.register_synalinks_serializable()
async def calculate(expression: str):
    """Calculate the result of a mathematical expression.

    Args:
        expression (str): The mathematical expression to calculate, such as
            '2 + 2'. The expression can contain numbers, operators
            (+, -, *, /), parentheses, and spaces.
    """
    # Whitelist the allowed characters first; this is what makes the
    # `eval` on the next-to-last line safe. NEVER `eval` arbitrary user
    # input — `eval(some_string)` is a remote-code-execution hazard in
    # the general case.
    if not all(char in "0123456789+-*/(). " for char in expression):
        return {"result": None, "log": "Error: invalid characters in expression"}
    try:
        result = round(float(eval(expression, {"__builtins__": None}, {})), 2)
        return {"result": result, "log": "Successfully executed"}
    except Exception as e:
        return {"result": None, "log": f"Error: {e}"}
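
A feel for the contract, with outputs worked out from the code above:

import asyncio

async def demo():
    print(await calculate("2 + 2 * 3"))  # {'result': 8.0, 'log': 'Successfully executed'}
    print(await calculate("import os"))  # {'result': None, 'log': 'Error: invalid characters in expression'}

asyncio.run(demo())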

lifespan(server) async

Load the prepared program; refuse to start if it is missing.

Source code in guides/20_fastmcp_deployment.py
@asynccontextmanager
async def lifespan(server):
    """Load the prepared program; refuse to start if it is missing."""
    if not PROGRAM_PATH.exists():
        raise FileNotFoundError(
            f"{PROGRAM_PATH} not found. Run "
            f"`python -m guides.20_fastmcp_deployment build` first."
        )
    program = synalinks.Program.load(str(PROGRAM_PATH))
    yield {"program": program}

solve(query, ctx) async

Solve an arithmetic word problem expressed in natural language.

Use this tool whenever the user asks a question that needs careful arithmetic — counting, totals, averages, or any multi-step calculation. Pass the user's question verbatim as query; the tool will return a single numerical answer.

Source code in guides/20_fastmcp_deployment.py
@mcp.tool
async def solve(query: str, ctx: Context) -> NumericalAnswer:
    """Solve an arithmetic word problem expressed in natural language.

    Use this tool whenever the user asks a question that needs
    careful arithmetic — counting, totals, averages, or any
    multi-step calculation. Pass the user's question verbatim as
    ``query``; the tool will return a single numerical answer.
    """
    program = ctx.lifespan_context["program"]
    result = await program(Query(query=query))
    if result is None:
        # Surfaced to the LM as a structured tool error; the model
        # can decide to apologise, retry, or rephrase.
        raise ValueError("Guard rejected input or output")
    # An agent returns its trajectory (`messages`) alongside the answer;
    # mask it out so the strict response model only sees its own fields.
    answer_json = result.out_mask(mask=["messages"]).get_json()
    try:
        return NumericalAnswer.model_validate(answer_json)
    except ValidationError as e:
        raise ValueError(f"Program produced off-schema output: {e}") from e

Source

import os
from contextlib import asynccontextmanager
from pathlib import Path

from fastmcp import Context, FastMCP
from pydantic import ValidationError

import synalinks

# =============================================================================
# Data Models
# =============================================================================


class Query(synalinks.DataModel):
    """The user-facing tool input."""

    query: str = synalinks.Field(description="The user query")


class NumericalAnswer(synalinks.DataModel):
    """The user-facing tool output."""

    answer: float = synalinks.Field(
        description="The correct final numerical answer"
    )


# =============================================================================
# Tool used by the agent. Distinct from the MCP tool below, which is the
# wire-facing endpoint we expose to language-model clients.
# =============================================================================


# The decorator registers `calculate` under its qualified name so that
# `Program.load(...)` can find it again after a server restart. Without
# it, loading would fail with "Could not locate function 'calculate'".
@synalinks.saving.register_synalinks_serializable()
async def calculate(expression: str):
    """Calculate the result of a mathematical expression.

    Args:
        expression (str): The mathematical expression to calculate, such as
            '2 + 2'. The expression can contain numbers, operators
            (+, -, *, /), parentheses, and spaces.
    """
    # Whitelist the allowed characters first; this is what makes the
    # `eval` on the next-to-last line safe. NEVER `eval` arbitrary user
    # input — `eval(some_string)` is a remote-code-execution hazard in
    # the general case.
    if not all(char in "0123456789+-*/(). " for char in expression):
        return {"result": None, "log": "Error: invalid characters in expression"}
    try:
        result = round(float(eval(expression, {"__builtins__": None}, {})), 2)
        return {"result": result, "log": "Successfully executed"}
    except Exception as e:
        return {"result": None, "log": f"Error: {e}"}


# =============================================================================
# Build step -- explicit, run via `python -m ... build`. NOT called from
# the server lifespan: production deployments should load a known artifact,
# not re-build on the fly.
# =============================================================================


# Where the prepared program lives on disk. Override with the
# `MATH_AGENT_PATH` env var; defaults to `math_agent.json` in the cwd.
PROGRAM_PATH = Path(os.environ.get("MATH_AGENT_PATH", "math_agent.json"))


async def build_and_save_program(path: Path) -> "synalinks.Program":
    """Build the math agent and persist it to ``path``."""
    # Reset Synalinks's global registry — useful when re-running the
    # build in a long-lived REPL or notebook so state from a previous
    # build doesn't leak in.
    synalinks.clear_session()
    synalinks.set_default_language_model("gemini/gemini-3.1-flash-lite-preview")

    inputs = synalinks.Input(data_model=Query)
    outputs = await synalinks.FunctionCallingAgent(
        data_model=NumericalAnswer,
        tools=[synalinks.Tool(calculate)],
    )(inputs)

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="math_agent",
        description="A math agent",
    )
    program.save(str(path))
    return program


# =============================================================================
# FastMCP server -- loads a prepared artifact, fails fast if it is missing
# =============================================================================


@asynccontextmanager
async def lifespan(server):
    """Load the prepared program; refuse to start if it is missing."""
    if not PROGRAM_PATH.exists():
        raise FileNotFoundError(
            f"{PROGRAM_PATH} not found. Run "
            f"`python -m guides.20_fastmcp_deployment build` first."
        )
    program = synalinks.Program.load(str(PROGRAM_PATH))
    yield {"program": program}


mcp = FastMCP("math-agent", lifespan=lifespan)


@mcp.tool
async def solve(query: str, ctx: Context) -> NumericalAnswer:
    """Solve an arithmetic word problem expressed in natural language.

    Use this tool whenever the user asks a question that needs
    careful arithmetic — counting, totals, averages, or any
    multi-step calculation. Pass the user's question verbatim as
    ``query``; the tool will return a single numerical answer.
    """
    program = ctx.lifespan_context["program"]
    result = await program(Query(query=query))
    if result is None:
        # Surfaced to the LM as a structured tool error; the model
        # can decide to apologise, retry, or rephrase.
        raise ValueError("Guard rejected input or output")
    # An agent returns its trajectory (`messages`) alongside the answer;
    # mask it out so the strict response model only sees its own fields.
    answer_json = result.out_mask(mask=["messages"]).get_json()
    try:
        return NumericalAnswer.model_validate(answer_json)
    except ValidationError as e:
        raise ValueError(f"Program produced off-schema output: {e}") from e


# =============================================================================
# Standard FastMCP runner. Equivalent to `fastmcp run` on this file.
# See https://gofastmcp.com/getting-started/quickstart.
# =============================================================================


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1 and sys.argv[1] == "build":
        import asyncio

        asyncio.run(build_and_save_program(PROGRAM_PATH))
    else:
        mcp.run()