# Batch reward wrappers

## BatchReward

Bases: `Reward`

Batched reward base class.

Subclasses receive the entire batch at once and must return one reward per sample. Use this when the reward needs cross-sample context (e.g. group-relative scores, batch normalization, paired comparisons).

To be implemented by subclasses:

`call(y_true, y_pred)`: `y_true` and `y_pred` are lists of length `batch_size`. It MUST return a `list[float]` of the same length, one reward per sample.
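As a sketch of that contract, the body of a subclass's `call` might compute something like the function below: a z-score-normalized exact-match reward, the kind of cross-sample computation a per-sample reward cannot express. The function name, equality check, and normalization choice are illustrative, not part of synalinks:

```python
import statistics

# Standalone sketch of a cross-sample `call` body: batch-normalized
# exact-match rewards. A real subclass would put this logic inside
# `call(self, y_true, y_pred)`.
def batch_normalized_rewards(y_true: list, y_pred: list) -> list[float]:
    raw = [1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred)]
    mean = statistics.fmean(raw)
    std = statistics.pstdev(raw)
    if std == 0.0:
        # an all-equal batch carries no relative signal
        return [0.0] * len(raw)
    return [(r - mean) / std for r in raw]
```

Note that the returned list has exactly one float per sample, which is what the trainer expects.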
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Optional name for the reward instance. | `None` |
| `reduction` | `str` | Optional. One of the supported reduction modes, applied to the per-sample rewards. | `'mean'` |
| `in_mask` | `list` | Optional. List of exact field names to keep before computing the reward. | `None` |
| `out_mask` | `list` | Optional. List of exact field names to drop before computing the reward. | `None` |
| `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with `in_mask`). | `None` |
| `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with `out_mask`). | `None` |
Source code in synalinks/src/rewards/batch_reward.py
### compute_batch(y_true, y_pred) `async`

Apply masks and return the per-sample reward list (unreduced).

This is the method the trainer calls; it expects the raw `list[float]` of length `batch_size` so it can treat each entry as that sample's reward.
Source code in synalinks/src/rewards/batch_reward.py
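To make the unreduced contract concrete, here is a minimal stand-in class that mimics the behavior described above. It is a hypothetical illustration, not the real synalinks implementation:

```python
import asyncio

class TinyBatchReward:
    """Minimal stand-in mimicking the described contract; not the synalinks class."""

    async def call(self, y_true, y_pred):
        # one reward per sample
        return [1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred)]

    async def compute_batch(self, y_true, y_pred):
        # the real method also applies the in/out masks before delegating
        return await self.call(y_true, y_pred)

rewards = asyncio.run(TinyBatchReward().compute_batch([1, 2, 3], [1, 0, 3]))
# rewards holds one float per sample, left unreduced for the trainer
```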
## BatchRewardFunctionWrapper

Bases: `BatchReward`

Wrap a stateless batched function into a `BatchReward`.

The wrapped function receives the full batch and must return a `list[float]` of length `batch_size`.
Example:

```python
async def my_batch_reward(y_true, y_pred):
    # y_true, y_pred: list[JsonDataModel] of length batch_size
    return [
        1.0 if t.get_json() == p.get_json() else 0.0
        for t, p in zip(y_true, y_pred)
    ]

program.compile(
    reward=synalinks.rewards.BatchRewardFunctionWrapper(fn=my_batch_reward),
    optimizer=synalinks.optimizers.RandomFewShot(),
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `fn` | `callable` | Async batched reward function with signature `fn(y_true, y_pred)`, returning a `list[float]` of length `batch_size`. | required |
| `name` | `str` | Optional. String name of the reward instance. | `None` |
| `reduction` | `str` | Optional. One of the supported reduction modes, applied to the per-sample rewards. | `'mean'` |
| `in_mask` | `list` | Optional. List of exact field names to keep before computing the reward. | `None` |
| `out_mask` | `list` | Optional. List of exact field names to drop before computing the reward. | `None` |
| `in_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are kept (combined with `in_mask`). | `None` |
| `out_mask_pattern` | `str` | Optional. Regex pattern; fields whose names match are dropped (combined with `out_mask`). | `None` |
| `**kwargs` | keyword arguments | Extra keyword arguments forwarded to the base `BatchReward` constructor. | `{}` |