Output Guard Patterns
Output guards protect your LLM applications by filtering dangerous or inappropriate outputs AFTER the language model generates them. This ensures safe responses even when the LLM produces unexpected content.
Why Output Guards Matter
LLMs can sometimes produce unexpected or inappropriate outputs:
```mermaid
graph LR
    subgraph Without Guards
        A[Input] --> B[LLM]
        B --> C[Unsafe Output?]
    end
    subgraph With Output Guard
        D[Input] --> E[LLM]
        E --> F[Response]
        F --> G{Output Guard}
        G -->|Safe| H[Response]
        G -->|Unsafe| I[Warning]
    end
```
Output guards provide:
- Safety Net: Catch inappropriate content before it reaches users
- Compliance: Ensure outputs meet regulatory requirements
- Brand Protection: Filter content that doesn't match your brand voice
- Graceful Replacement: Substitute unsafe content with helpful alternatives
The XOR and OR Operators for Output Guards
For output guards, we use XOR and OR slightly differently than we do for input guards:
Pattern: Check-and-Replace
```mermaid
graph LR
    A[inputs] --> B[Generator]
    B --> C[answer]
    C --> D[OutputGuard]
    D --> E[warning]
    E --> F["warning ^ answer"]
    C --> F
    F --> G["safe_answer (None if blocked)"]
    E --> H["warning | safe_answer"]
    G --> H
    H --> I[output]
```
Flow when output is UNSAFE:

1. Generator produces answer
2. Guard checks answer, returns warning (not None)
3. XOR: warning ^ answer = None (invalidates answer)
4. OR: warning | None = warning (use warning)

Flow when output is SAFE:

1. Generator produces answer
2. Guard checks answer, returns None (no warning)
3. XOR: None ^ answer = answer (keep answer)
4. OR: None | answer = answer (use answer)
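The same None-propagation can be sketched with plain Python values. The xor and or_ helpers below are illustrative stand-ins, not synalinks APIs; the real ^ and | operators act on data models rather than strings:

```python
# Minimal sketch of the check-and-replace selection logic using plain Python
# values. xor/or_ stand in for synalinks' ^ and | operators.

def xor(a, b):
    # Exactly one non-None value passes through; two non-None values cancel out.
    if a is not None and b is not None:
        return None
    return a if a is not None else b

def or_(a, b):
    # The first non-None value wins.
    return a if a is not None else b

answer = "Paris is the capital of France."

# Unsafe case: the guard produced a warning, so it replaces the answer.
warning = "I cannot provide that information."
print(or_(warning, xor(warning, answer)))  # -> the warning

# Safe case: the guard returned None, so the original answer passes through.
warning = None
print(or_(warning, xor(warning, answer)))  # -> the original answer
```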
Building an Output Guard
An output guard is a custom Module that:
- Receives the LLM's output
- Returns None when output is safe
- Returns a warning DataModel when output should be replaced
```python
import synalinks

# Note: Answer is the data model defined in the complete example below.


class OutputGuard(synalinks.Module):
    """Guard that replaces outputs containing blacklisted words."""

    def __init__(self, blacklisted_words, warning_message, **kwargs):
        super().__init__(**kwargs)
        self.blacklisted_words = blacklisted_words
        self.warning_message = warning_message

    async def call(self, inputs, training=False):
        """Return warning if output should be replaced, None otherwise."""
        if inputs is None:
            return None
        # Check the answer field for blacklisted words
        answer = inputs.get("answer", "").lower()
        for word in self.blacklisted_words:
            if word.lower() in answer:
                # Return warning - this will replace the answer
                return Answer(answer=self.warning_message).to_json_data_model()
        # Output is safe - return None to keep original
        return None

    async def compute_output_spec(self, inputs, training=False):
        """Define output schema (same as input for replacement)."""
        return inputs
```
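The substring check above is intentionally simple; the same module structure supports any detection logic. Below is a hedged sketch of a variant that matches blacklisted regular-expression patterns instead of plain substrings. The RegexOutputGuard name and the regex approach are illustrative, not part of the guide's example:

```python
import re

import synalinks


class Answer(synalinks.DataModel):
    # Same Answer data model as in the complete example below.
    answer: str = synalinks.Field(description="The answer")


class RegexOutputGuard(synalinks.Module):
    """Variant guard that replaces outputs matching blacklisted regex patterns."""

    def __init__(self, blacklisted_patterns, warning_message, **kwargs):
        super().__init__(**kwargs)
        # Compile the patterns once, matching case-insensitively.
        self.blacklisted_patterns = [
            re.compile(pattern, re.IGNORECASE) for pattern in blacklisted_patterns
        ]
        self.warning_message = warning_message

    async def call(self, inputs, training=False):
        if inputs is None:
            return None
        answer = inputs.get("answer", "")
        if any(pattern.search(answer) for pattern in self.blacklisted_patterns):
            # Same replacement convention: return a warning Answer to swap in.
            return Answer(answer=self.warning_message).to_json_data_model()
        # Safe output: return None so the original answer is kept.
        return None

    async def compute_output_spec(self, inputs, training=False):
        return inputs
```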
Complete Example
```python
import asyncio

from dotenv import load_dotenv

import synalinks


class Query(synalinks.DataModel):
    query: str = synalinks.Field(description="User query")


class Answer(synalinks.DataModel):
    answer: str = synalinks.Field(description="The answer")


class OutputGuard(synalinks.Module):
    def __init__(self, blacklisted_words, warning_message, **kwargs):
        super().__init__(**kwargs)
        self.blacklisted_words = blacklisted_words
        self.warning_message = warning_message

    async def call(self, inputs, training=False):
        if inputs is None:
            return None
        answer = inputs.get("answer", "").lower()
        for word in self.blacklisted_words:
            if word.lower() in answer:
                return Answer(answer=self.warning_message).to_json_data_model()
        return None

    async def compute_output_spec(self, inputs, training=False):
        return inputs  # Same schema as input


async def main():
    load_dotenv()
    synalinks.clear_session()

    lm = synalinks.LanguageModel(model="openai/gpt-4.1-mini")

    # Build the guarded program
    inputs = synalinks.Input(data_model=Query)

    # Generate answer
    answer = await synalinks.Generator(
        data_model=Answer,
        language_model=lm,
    )(inputs)

    # Guard checks for blacklisted words in output
    warning = await OutputGuard(
        blacklisted_words=["violence", "harmful", "dangerous"],
        warning_message="I cannot provide that information.",
    )(answer)

    # XOR: If warning exists, invalidate the answer
    safe_answer = warning ^ answer

    # OR: Return warning if it exists, otherwise return safe_answer
    outputs = warning | safe_answer

    program = synalinks.Program(
        inputs=inputs,
        outputs=outputs,
        name="output_guarded_qa",
    )

    # Run a test query through the guarded program
    result = await program(Query(query="What is Python?"))
    print(f"Result: {result}")


if __name__ == "__main__":
    asyncio.run(main())
```
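To also exercise the replacement path, you could extend main() with additional queries, including one whose generated answer is likely to contain a blacklisted word. The queries below are illustrative, and whether the guard triggers depends on the answer the model actually produces:

```python
# Illustrative extension of main(): run a safe query and one whose answer is
# likely to mention a blacklisted word. The guard reacts to the generated
# answer, not to the query text itself.
for query in [
    "What is Python?",
    "Describe the most dangerous animals in the world.",
]:
    result = await program(Query(query=query))
    print(f"{query} -> {result}")
```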
Key Takeaways
- Post-Processing: Output guards check content AFTER the LLM generates it, acting as a safety net.
- Same Schema: Output guards typically return the same schema as the input they check, enabling seamless replacement.
- XOR for Invalidation: When the guard triggers, XOR invalidates the original answer by producing None.
- OR for Replacement: OR selects the warning (replacement) when the original answer is invalidated.
- Defense in Depth: Combine with input guards for comprehensive protection.
API References
Answer

The answer data model used in this guide (a single answer string field).

OutputGuard

Bases: Module

Guard that replaces outputs containing blacklisted words. Returns None when the output is safe, or a replacement Answer when the output should be filtered.

- call(inputs, training=False) (async): Return the replacement if the output should be filtered, None otherwise.
- compute_output_spec(inputs, training=False) (async): Define the output schema (same type as Answer, enabling replacement).

Source code: guides/10_output_guard.py