Output Guard Patterns
Output guards protect your LLM applications by filtering dangerous or inappropriate outputs AFTER the language model generates them. This ensures safe responses even when the LLM produces unexpected content.
Why Output Guards Matter
LLMs can sometimes produce unexpected or inappropriate outputs:
```mermaid
graph LR
    subgraph Without Guards
        A[Input] --> B[LLM]
        B --> C[Unsafe Output?]
    end
    subgraph With Output Guard
        D[Input] --> E[LLM]
        E --> F[Response]
        F --> G{Output Guard}
        G -->|Safe| H[Response]
        G -->|Unsafe| I[Warning]
    end
```
Output guards provide:
- Safety Net: Catch inappropriate content before it reaches users
- Compliance: Ensure outputs meet regulatory requirements
- Brand Protection: Filter content that doesn't match your brand voice
- Graceful Replacement: Substitute unsafe content with helpful alternatives
The XOR and OR Operators for Output Guards
For output guards, we use XOR and OR slightly differently than we do for input guards:
Pattern: Check-and-Replace
```mermaid
graph LR
    A[inputs] --> B[Generator]
    B --> C[answer]
    C --> D[OutputGuard]
    D --> E[warning]
    E --> F["warning ^ answer"]
    C --> F
    F --> G["safe_answer (None if blocked)"]
    E --> H["warning | safe_answer"]
    G --> H
    H --> I[output]
```
Flow when output is UNSAFE:

1. Generator produces answer
2. Guard checks answer, returns warning (not None)
3. XOR: warning ^ answer = None (invalidates answer)
4. OR: warning | None = warning (use warning)

Flow when output is SAFE:

1. Generator produces answer
2. Guard checks answer, returns None (no warning)
3. XOR: None ^ answer = answer (keep answer)
4. OR: None | answer = answer (use answer)
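The same None-propagation can be sketched with plain Python values. The xor and or_ helpers below are illustrative stand-ins, not synalinks APIs; the real ^ and | operators act on data models rather than strings:

```python
# Minimal sketch of the check-and-replace selection logic using plain Python
# values. xor/or_ stand in for synalinks' ^ and | operators.

def xor(a, b):
    # Exactly one non-None value passes through; two non-None values cancel out.
    if a is not None and b is not None:
        return None
    return a if a is not None else b

def or_(a, b):
    # The first non-None value wins.
    return a if a is not None else b

answer = "Paris is the capital of France."

# Unsafe case: the guard produced a warning, so it replaces the answer.
warning = "I cannot provide that information."
print(or_(warning, xor(warning, answer)))  # -> the warning

# Safe case: the guard returned None, so the original answer passes through.
warning = None
print(or_(warning, xor(warning, answer)))  # -> the original answer
```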
Building an Output Guard
An output guard is a custom Module that:
- Receives the LLM's output
- Returns None when output is safe
- Returns a warning DataModel when output should be replaced
```python
import synalinks

# Note: Answer is the data model defined in the complete example below.


class OutputGuard(synalinks.Module):
    """Guard that replaces outputs containing blacklisted words."""

    def __init__(self, blacklisted_words, warning_message, **kwargs):
        super().__init__(**kwargs)
        self.blacklisted_words = blacklisted_words
        self.warning_message = warning_message

    async def call(self, inputs, training=False):
        """Return warning if output should be replaced, None otherwise."""
        if inputs is None:
            return None
        # Check the answer field for blacklisted words
        answer = inputs.get("answer", "").lower()
        for word in self.blacklisted_words:
            if word.lower() in answer:
                # Return warning - this will replace the answer
                return Answer(answer=self.warning_message).to_json_data_model()
        # Output is safe - return None to keep original
        return None

    async def compute_output_spec(self, inputs, training=False):
        """Define output schema (same as input for replacement)."""
        return inputs
```
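The substring check above is intentionally simple; the same module structure supports any detection logic. Below is a hedged sketch of a variant that matches blacklisted regular-expression patterns instead of plain substrings. The RegexOutputGuard name and the regex approach are illustrative, not part of the guide's example:

```python
import re

import synalinks


class Answer(synalinks.DataModel):
    # Same Answer data model as in the complete example below.
    answer: str = synalinks.Field(description="The answer")


class RegexOutputGuard(synalinks.Module):
    """Variant guard that replaces outputs matching blacklisted regex patterns."""

    def __init__(self, blacklisted_patterns, warning_message, **kwargs):
        super().__init__(**kwargs)
        # Compile the patterns once, matching case-insensitively.
        self.blacklisted_patterns = [
            re.compile(pattern, re.IGNORECASE) for pattern in blacklisted_patterns
        ]
        self.warning_message = warning_message

    async def call(self, inputs, training=False):
        if inputs is None:
            return None
        answer = inputs.get("answer", "")
        if any(pattern.search(answer) for pattern in self.blacklisted_patterns):
            # Same replacement convention: return a warning Answer to swap in.
            return Answer(answer=self.warning_message).to_json_data_model()
        # Safe output: return None so the original answer is kept.
        return None

    async def compute_output_spec(self, inputs, training=False):
        return inputs
```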
Complete Example
```python
import asyncio

from dotenv import load_dotenv

import synalinks


class Query(synalinks.DataModel):
    query: str = synalinks.Field(description="User query")


class Answer(synalinks.DataModel):
    answer: str = synalinks.Field(description="The answer")


class OutputGuard(synalinks.Module):
    def __init__(self, blacklisted_words, warning_message, **kwargs):
        super().__init__(**kwargs)
        self.blacklisted_words = blacklisted_words
        self.warning_message = warning_message

    async def call(self, inputs, training=False):
        if inputs is None:
            return None
        answer = inputs.get("answer", "").lower()
        for word in self.blacklisted_words:
            if word.lower() in answer:
                return Answer(answer=self.warning_message).to_json_data_model()
        return None

    async def compute_output_spec(self, inputs, training=False):
        return inputs  # Same schema as input


async def main():
    load_dotenv()
    synalinks.clear_session()

    lm = synalinks.LanguageModel(model="openai/gpt-4.1-mini")

    # Build the guarded program
    inputs = synalinks.Input(data_model=Query)

    # Generate answer
    answer = await synalinks.Generator(
        data_model=Answer,
        language_model=lm,
    )(inputs)

    # Guard checks for blacklisted words in output
    warning = await OutputGuard(
        blacklisted_words=["violence", "harmful", "dangerous"],
        warning_message="I cannot provide that information.",
    )(answer)

    # XOR: If warning exists, invalidate the answer
    safe_answer = warning ^ answer

    # OR: Return warning if it exists, otherwise return safe_answer
    outputs = warning | safe_answer

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="output_guarded_qa",
    )

    # Run a test query through the guarded program
    result = await program(Query(query="What is Python?"))
    print(f"Result: {result}")


if __name__ == "__main__":
    asyncio.run(main())
```
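To also exercise the replacement path, you could extend main() with additional queries, including one whose generated answer is likely to contain a blacklisted word. The queries below are illustrative, and whether the guard triggers depends on the answer the model actually produces:

```python
# Illustrative extension of main(): run a safe query and one whose answer is
# likely to mention a blacklisted word. The guard reacts to the generated
# answer, not to the query text itself.
for query in [
    "What is Python?",
    "Describe the most dangerous animals in the world.",
]:
    result = await program(Query(query=query))
    print(f"{query} -> {result}")
```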
Key Takeaways
- Post-Processing: Output guards check content AFTER the LLM generates it, acting as a safety net.
- Same Schema: Output guards typically return the same schema as the input they check, enabling seamless replacement.
- XOR for Invalidation: When the guard triggers, XOR invalidates the original answer by producing None.
- OR for Replacement: OR selects the warning (replacement) when the original answer is invalidated.
- Defense in Depth: Combine with input guards for comprehensive protection.
API References
Answer

The answer data model used in this guide (a single answer string field).

OutputGuard

Bases: Module

Guard that replaces outputs containing blacklisted words. Returns None when the output is safe, or a replacement Answer when the output should be filtered.

- call(inputs, training=False) (async): Return the replacement if the output should be filtered, None otherwise.
- compute_output_spec(inputs, training=False) (async): Define the output schema (same type as Answer, enabling replacement).

Source code: guides/10_output_guard.py