Rewards, Metrics & Optimizers
Understanding Rewards
Rewards are an essential part of reinforcement learning frameworks.
They are scalar values (between 0.0 and 1.0 in synalinks) that guide
the training process toward more accurate decisions or predictions.
During training, the goal is to maximize the reward function.
The reward gives the system an indication of how well it performed on that task.
Every reward consists of a function or program that takes two inputs:

- `y_pred`: The prediction of the program.
- `y_true`: The ground truth/target value provided by the training data.
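To make this contract concrete, here is a minimal sketch in plain Python (deliberately not tied to the synalinks reward API) of two reward functions that map a (`y_pred`, `y_true`) pair to a scalar between 0.0 and 1.0:

```python
# Conceptual sketch only: a reward compares a prediction with the ground
# truth and returns a scalar between 0.0 and 1.0.

def exact_match_reward(y_pred: dict, y_true: dict) -> float:
    """Return 1.0 when the prediction matches the target exactly, else 0.0."""
    return 1.0 if y_pred == y_true else 0.0

def field_accuracy_reward(y_pred: dict, y_true: dict) -> float:
    """Return the fraction of target fields the prediction got right."""
    if not y_true:
        return 0.0
    correct = sum(1 for key, value in y_true.items() if y_pred.get(key) == value)
    return correct / len(y_true)
```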
Understanding Metrics
Metrics are scalar values that are monitored during training and evaluation.
They are used to determine which program performs best, so it can be saved,
and to provide additional information when comparing different architectures with each other.
Unlike rewards, a metric is not used during training, meaning its value
is not backpropagated. Additionally, every reward function can also be used as a metric.
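As a sketch of that last point, a reward instance can simply be listed under `metrics` at compile time so that it is monitored without being backpropagated. `ExactMatch` is used below as an assumed example; check the synalinks reference for the rewards actually available in your version.

```python
# Hedged sketch: a reward reused as a read-only metric. ExactMatch is an
# assumed reward name; substitute one available in your synalinks version.
program.compile(
    reward=synalinks.rewards.ExactMatch(),
    optimizer=synalinks.optimizers.RandomFewShot(),
    metrics=[
        synalinks.rewards.ExactMatch(),  # monitored, not backpropagated
    ],
)
```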
Predictions Filtering
Sometimes your program has to output a complex JSON object, but you only want to
evaluate part of it. This can happen because your training data only includes a subset
of the JSON fields, or because additional fields were added only to help the LMs.
In that case, filter the predictions and ground truth with the `out_mask` or
`in_mask` list parameter: `in_mask` keeps only the listed fields, while `out_mask` excludes them.
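The sketch below illustrates the idea in plain Python (it is a conceptual model, not the synalinks internals): the mask is applied to both the prediction and the ground truth before the reward or metric compares them.

```python
# Conceptual illustration of masking, not the synalinks implementation.
def apply_masks(data: dict, in_mask=None, out_mask=None) -> dict:
    """Keep only in_mask fields, or drop out_mask fields, before scoring."""
    if in_mask is not None:
        return {key: value for key, value in data.items() if key in in_mask}
    if out_mask is not None:
        return {key: value for key, value in data.items() if key not in out_mask}
    return data

y_pred = {"thinking": "Step-by-step reasoning...", "answer": "Paris"}

print(apply_masks(y_pred, in_mask=["answer"]))     # {'answer': 'Paris'}
print(apply_masks(y_pred, out_mask=["thinking"]))  # {'answer': 'Paris'}
```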
Understanding Optimizers
```mermaid
graph LR
    subgraph Training Loop
        A[Input] --> B[Program]
        B --> C[y_pred]
        D[y_true] --> E[Reward]
        C --> E
        E --> F[Optimizer]
        F --> |update| B
    end
    E --> G[Metrics]
```
Optimizers are systems that update the modules' state in order to make the program more performant. They are in charge of backpropagating the rewards from the training process and of selecting or generating examples and instructions for the LMs.
```python
program.compile(
    reward=synalinks.rewards.CosineSimilarity(
        embedding_model=embedding_model,
        in_mask=["answer"],  # Only evaluate the "answer" field
    ),
    optimizer=synalinks.optimizers.RandomFewShot(),
    metrics=[
        synalinks.metrics.F1Score(in_mask=["answer"]),
    ],
)
```
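Once compiled, training typically proceeds with a Keras-style `fit` call. The snippet below is a hedged sketch of that workflow rather than a verbatim API reference: `x_train`, `y_train`, `x_val`, and `y_val` stand in for your own dataset, and the exact signature (and whether `fit` must be awaited) may differ in your synalinks version.

```python
async def train():
    # Assumed Keras-style training call; x_train, y_train, x_val, y_val
    # are placeholders for your own dataset.
    history = await program.fit(
        x=x_train,
        y=y_train,
        validation_data=(x_val, y_val),
        epochs=4,
        batch_size=32,
    )
    return history
```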
Key Takeaways
- Rewards: Guide the reinforcement learning process by providing feedback on the system's performance.
- Metrics: Scalar values monitored during training and evaluation to determine the best-performing program.
- Optimizers: Update the module's state to improve performance.
- Filtering Outputs: Use `out_mask` or `in_mask` to evaluate only the relevant fields of complex JSON outputs.