
FScore metrics

BinaryF1Score

Bases: BinaryFBetaScore

Computes F-1 Score on binary structures.

Formula:

f1_score = 2 * (precision * recall) / (precision + recall)

This is the harmonic mean of precision and recall. Its output range is [0, 1]. It operates at a field level and can be used for multi-class and multi-label classification.

Each field of y_true and y_pred should be booleans or floats between [0, 1]. If the fields are floats, the threshold parameter is used to decide whether the values are 0 or 1.
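A minimal usage sketch (assumptions: data models can be instantiated with keyword arguments and synalinks.metrics is available after import synalinks; the Labels data model and its values are hypothetical):

```python
import asyncio

import synalinks


# Hypothetical data model with boolean label fields (see the
# MultiClassClassification example under BinaryFBetaScore below).
class Labels(synalinks.DataModel):
    spam: bool = synalinks.Field(description="Whether the message is spam")
    urgent: bool = synalinks.Field(description="Whether the message is urgent")


async def main():
    # "macro" averaging: unweighted mean of the per-label scores
    metric = synalinks.metrics.BinaryF1Score(average="macro")
    y_true = Labels(spam=True, urgent=False)
    y_pred = Labels(spam=True, urgent=True)
    await metric.update_state(y_true, y_pred)
    print(metric.result())


asyncio.run(main())
```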

Parameters:

- average (str, default: None): Type of averaging to be performed across per-class results in the multi-class case. Acceptable values are None, "micro", "macro" and "weighted". If None, no averaging is performed and result() will return the score for each class. If "micro", compute metrics globally by counting the total true positives, false negatives and false positives. If "macro", compute metrics for each label, and return their unweighted mean. This does not take label imbalance into account. If "weighted", compute metrics for each label, and return their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance. It can result in a score that is not between precision and recall.
- threshold (float, default: 0.5): (Optional) Float representing the threshold for deciding whether prediction values are 1 or 0. Elements of y_pred and y_true greater than threshold are converted to 1, and the rest to 0.
- name (str, default: 'binary_f1_score'): (Optional) String name of the metric instance.
- in_mask (list, default: None): (Optional) List of keys to keep when computing the metric.
- out_mask (list, default: None): (Optional) List of keys to remove when computing the metric.
Source code in synalinks/src/metrics/f_score_metrics.py
@synalinks_export("synalinks.metrics.BinaryF1Score")
class BinaryF1Score(BinaryFBetaScore):
    """Computes F-1 Score on binary structures.

    Formula:

    ```python
    f1_score = 2 * (precision * recall) / (precision + recall)
    ```

    This is the harmonic mean of precision and recall.
    Its output range is `[0, 1]`. It operates at a field level
    and can be used for **multi-class and multi-label classification**.

    Each field of `y_true` and `y_pred` should be booleans or floats between [0, 1].
    If the fields are floats, it uses the threshold for deciding
    if the values are 0 or 1.

    Args:
        average (str): Type of averaging to be performed across per-class results
            in the multi-class case.
            Acceptable values are `None`, `"micro"`, `"macro"` and
            `"weighted"`. Defaults to `None`.
            If `None`, no averaging is performed and `result()` will return
            the score for each class.
            If `"micro"`, compute metrics globally by counting the total
            true positives, false negatives and false positives.
            If `"macro"`, compute metrics for each label,
            and return their unweighted mean.
            This does not take label imbalance into account.
            If `"weighted"`, compute metrics for each label,
            and return their average weighted by support
            (the number of true instances for each label).
            This alters `"macro"` to account for label imbalance.
            It can result in a score that is not between precision and recall.
        threshold (float): (Optional) Float representing the threshold for deciding
            whether prediction values are 1 or 0. Elements of `y_pred` and `y_true`
            greater than `threshold` are converted to be 1, and the rest 0.
        name (str): (Optional) string name of the metric instance.
        in_mask (list): (Optional) list of keys to keep to compute the metric.
        out_mask (list): (Optional) list of keys to remove to compute the metric.
    """

    def __init__(
        self,
        average=None,
        threshold=0.5,
        name="binary_f1_score",
        in_mask=None,
        out_mask=None,
    ):
        super().__init__(
            average=average,
            beta=1.0,
            threshold=threshold,
            name=name,
            in_mask=in_mask,
            out_mask=out_mask,
        )

    def get_config(self):
        """Return the serializable config of the metric.

        Returns:
            (dict): The config dict.
        """
        base_config = super().get_config()
        del base_config["beta"]
        return base_config

get_config()

Return the serializable config of the metric.

Returns:

- dict: The config dict.

Source code in synalinks/src/metrics/f_score_metrics.py
def get_config(self):
    """Return the serializable config of the metric.

    Returns:
        (dict): The config dict.
    """
    base_config = super().get_config()
    del base_config["beta"]
    return base_config

BinaryFBetaScore

Bases: FBetaScore

Computes F-Beta score on binary structures.

Formula:

b2 = beta ** 2
f_beta_score = (1 + b2) * (precision * recall) / (precision * b2 + recall)

This is the weighted harmonic mean of precision and recall. Its output range is [0, 1]. It operates at a field level and can be used for multi-class and multi-label classification.

Each field of y_true and y_pred should be booleans or floats between [0, 1]. If the fields are floats, it uses the threshold parameter for deciding if the values are 0 or 1.

Example:

class MultiClassClassification(synalinks.DataModel):
    label_1: bool = synalinks.Field(
        description="The first label",
    )
    label_2: bool = synalinks.Field(
        description="The second label",
    )
    label_3: bool = synalinks.Field(
        description="The third label",
    )

# OR you can also use floats between 0 and 1
# The `Score` type enforces a float between 0.0 and 1.0 using constrained decoding

class MultiClassClassification(synalinks.DataModel):
    label_1: synalinks.Score = synalinks.Field(
        description="The first label",
    )
    label_2: synalinks.Score = synalinks.Field(
        description="The second label",
    )
    label_3: synalinks.Score = synalinks.Field(
        description="The third label",
    )
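A minimal construction sketch using the documented parameters (values are illustrative, not a recommendation):

```python
import synalinks

# beta > 1 weights recall more heavily than precision;
# threshold binarizes float (Score) fields before counting.
metric = synalinks.metrics.BinaryFBetaScore(
    average="weighted",  # weight per-label scores by support
    beta=2.0,
    threshold=0.5,       # floats greater than 0.5 are treated as 1, the rest as 0
)
```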

Parameters:

- average (str, default: None): Type of averaging to be performed across per-class results in the multi-class case. Acceptable values are None, "micro", "macro" and "weighted". If None, no averaging is performed and result() will return the score for each class. If "micro", compute metrics globally by counting the total true positives, false negatives and false positives. If "macro", compute metrics for each label, and return their unweighted mean. This does not take label imbalance into account. If "weighted", compute metrics for each label, and return their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance. It can result in a score that is not between precision and recall.
- beta (float, default: 1.0): Determines the weight given to recall in the harmonic mean between precision and recall (see pseudocode equation above).
- threshold (float, default: 0.5): (Optional) Float representing the threshold for deciding whether prediction values are 1 or 0. Elements of y_pred and y_true greater than threshold are converted to 1, and the rest to 0.
- name (str, default: 'binary_fbeta_score'): (Optional) String name of the metric instance.
- in_mask (list, default: None): (Optional) List of keys to keep when computing the metric.
- out_mask (list, default: None): (Optional) List of keys to remove when computing the metric.
Source code in synalinks/src/metrics/f_score_metrics.py
@synalinks_export("synalinks.metrics.BinaryFBetaScore")
class BinaryFBetaScore(FBetaScore):
    """Computes F-Beta score on binary structures.

    Formula:

    ```python
    b2 = beta ** 2
    f_beta_score = (1 + b2) * (precision * recall) / (precision * b2 + recall)
    ```

    This is the weighted harmonic mean of precision and recall.
    Its output range is `[0, 1]`. It operates at a field level
    and can be used for **multi-class and multi-label classification**.

    Each field of `y_true` and `y_pred` should be booleans or floats between [0, 1].
    If the fields are floats, it uses the threshold parameter for deciding
    if the values are 0 or 1.

    Example:

    ```

    class MultiClassClassification(synalinks.DataModel):
        label_1: bool = synalinks.Field(
            description="The first label",
        )
        label_2: bool = synalinks.Field(
            description="The second label",
        )
        label_3: bool = synalinks.Field(
            description="The third label",
        )

    # OR you can also use floats between 0 and 1
    # The `Score` type enforces a float between 0.0 and 1.0 using constrained decoding

    class MultiClassClassification(synalinks.DataModel):
        label_1: synalinks.Score = synalinks.Field(
            description="The first label",
        )
        label_2: synalinks.Score = synalinks.Field(
            description="The second label",
        )
        label_3: synalinks.Score = synalinks.Field(
            description="The third label",
        )

    ```

    Args:
        average (str): Type of averaging to be performed across per-class results
            in the multi-class case.
            Acceptable values are `None`, `"micro"`, `"macro"` and
            `"weighted"`. Defaults to `None`.
            If `None`, no averaging is performed and `result()` will return
            the score for each class.
            If `"micro"`, compute metrics globally by counting the total
            true positives, false negatives and false positives.
            If `"macro"`, compute metrics for each label,
            and return their unweighted mean.
            This does not take label imbalance into account.
            If `"weighted"`, compute metrics for each label,
            and return their average weighted by support
            (the number of true instances for each label).
            This alters `"macro"` to account for label imbalance.
            It can result in a score that is not between precision and recall.
        beta (float): Determines the weight given to recall
            in the harmonic mean between precision and recall (see pseudocode
            equation above). Defaults to `1`.
        threshold (float): (Optional) Float representing the threshold for deciding
            whether prediction values are 1 or 0. Elements of `y_pred` and `y_true`
            greater than `threshold` are converted to be 1, and the rest 0.
        name (str): (Optional) string name of the metric instance.
        in_mask (list): (Optional) list of keys to keep to compute the metric.
        out_mask (list): (Optional) list of keys to remove to compute the metric.
    """

    def __init__(
        self,
        average=None,
        beta=1.0,
        threshold=0.5,
        name="binary_fbeta_score",
        in_mask=None,
        out_mask=None,
    ):
        super().__init__(
            average=average,
            beta=beta,
            name=name,
            in_mask=in_mask,
            out_mask=out_mask,
        )
        if not isinstance(threshold, float):
            raise ValueError(
                "Invalid `threshold` argument value. "
                "It should be a Python float. "
                f"Received: threshold={threshold} "
                f"of type '{type(threshold)}'"
            )
        if threshold > 1.0 or threshold <= 0.0:
            raise ValueError(
                "Invalid `threshold` argument value. "
                "It should verify 0 < threshold <= 1. "
                f"Received: threshold={threshold}"
            )
        self.threshold = threshold

    async def update_state(self, y_true, y_pred):
        y_pred = tree.map_structure(lambda x: ops.convert_to_json_data_model(x), y_pred)
        y_true = tree.map_structure(lambda x: ops.convert_to_json_data_model(x), y_true)

        if self.in_mask:
            y_pred = tree.map_structure(lambda x: x.in_mask(mask=self.in_mask), y_pred)
            y_true = tree.map_structure(lambda x: x.in_mask(mask=self.in_mask), y_true)
        if self.out_mask:
            y_pred = tree.map_structure(lambda x: x.out_mask(mask=self.out_mask), y_pred)
            y_true = tree.map_structure(lambda x: x.out_mask(mask=self.out_mask), y_true)

        def convert_to_binary(x):
            if isinstance(x, bool):
                return 1.0 if x is True else 0.0
            elif isinstance(x, float):
                return 1.0 if x > self.threshold else 0.0
            else:
                raise ValueError(
                    "All `y_true` and `y_pred` fields should be booleans or floats. "
                    "Use `in_mask` or `out_mask` to remove the other fields."
                )

        y_true = tree.flatten(
            tree.map_structure(lambda x: convert_to_binary(x), y_true.get_json())
        )
        y_pred = tree.flatten(
            tree.map_structure(lambda x: convert_to_binary(x), y_pred.get_json())
        )
        y_true = np.convert_to_tensor(y_true)
        y_pred = np.convert_to_tensor(y_pred)

        true_positives = y_pred * y_true
        false_positives = y_pred * (1 - y_true)
        false_negatives = (1 - y_pred) * y_true
        intermediate_weights = y_true

        current_true_positives = self.state.get("true_positives")
        if current_true_positives:
            true_positives = np.add(current_true_positives, true_positives)

        current_false_positives = self.state.get("false_positives")
        if current_false_positives:
            false_positives = np.add(current_false_positives, false_positives)

        current_false_negatives = self.state.get("false_negatives")
        if current_false_negatives:
            false_negatives = np.add(current_false_negatives, false_negatives)

        current_intermediate_weights = self.state.get("intermediate_weights")
        if current_intermediate_weights:
            intermediate_weights = np.add(
                current_intermediate_weights, intermediate_weights
            )

        self.state.update(
            {
                "true_positives": true_positives.tolist(),
                "false_positives": false_positives.tolist(),
                "false_negatives": false_negatives.tolist(),
                "intermediate_weights": intermediate_weights.tolist(),
            }
        )

    def get_config(self):
        """Return the serializable config of the metric.

        Returns:
            (dict): The config dict.
        """
        config = {
            "beta": self.beta,
            "threshold": self.threshold,
            "name": self.name,
        }
        base_config = super().get_config()
        return {**base_config, **config}

get_config()

Return the serializable config of the metric.

Returns:

- dict: The config dict.

Source code in synalinks/src/metrics/f_score_metrics.py
def get_config(self):
    """Return the serializable config of the metric.

    Returns:
        (dict): The config dict.
    """
    config = {
        "beta": self.beta,
        "threshold": self.threshold,
        "name": self.name,
    }
    base_config = super().get_config()
    return {**base_config, **config}

F1Score

Bases: FBetaScore

Computes F-1 Score.

Formula:

f1_score = 2 * (precision * recall) / (precision + recall)

This is the harmonic mean of precision and recall. Its output range is [0, 1]. It operates at a word level and can be used for QA systems.

If y_true and y_pred contain multiple fields, the JSON object's fields are flattened and the score is computed for each one independently before being averaged.
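A minimal sketch for a QA-style output (the Answer data model and the strings are hypothetical; the metric normalizes and tokenizes each field, then scores the token overlap between y_true and y_pred):

```python
import asyncio

import synalinks


# Hypothetical QA output with a single free-text field.
class Answer(synalinks.DataModel):
    answer: str = synalinks.Field(description="The answer to the question")


async def main():
    metric = synalinks.metrics.F1Score()
    y_true = Answer(answer="the Eiffel Tower in Paris")
    y_pred = Answer(answer="Eiffel Tower, Paris, France")
    await metric.update_state(y_true, y_pred)
    print(metric.result())  # word-level F1 per field


asyncio.run(main())
```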

Parameters:

- average (str, default: None): Type of averaging to be performed across per-field results in the multi-field case. Acceptable values are None, "micro", "macro" and "weighted". If None, no averaging is performed and result() will return the score for each class. If "micro", compute metrics globally by counting the total true positives, false negatives and false positives. If "macro", compute metrics for each label, and return their unweighted mean. This does not take label imbalance into account. If "weighted", compute metrics for each label, and return their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance. It can result in a score that is not between precision and recall.
- name (str, default: 'f1_score'): (Optional) String name of the metric instance.
- in_mask (list, default: None): (Optional) List of keys to keep when computing the metric.
- out_mask (list, default: None): (Optional) List of keys to remove when computing the metric.
Source code in synalinks/src/metrics/f_score_metrics.py
@synalinks_export("synalinks.metrics.F1Score")
class F1Score(FBetaScore):
    """Computes F-1 Score.

    Formula:

    ```python
    f1_score = 2 * (precision * recall) / (precision + recall)
    ```

    This is the harmonic mean of precision and recall.
    Its output range is `[0, 1]`. It operates at a word level
    and can be used for **QA systems**.

    If `y_true` and `y_pred` contain multiple fields,
    the JSON object's fields are flattened and the score
    is computed for each one independently before being averaged.

    Args:
        average (str): Type of averaging to be performed across per-field results
            in the multi-field case.
            Acceptable values are `None`, `"micro"`, `"macro"` and
            `"weighted"`. Defaults to `None`.
            If `None`, no averaging is performed and `result()` will return
            the score for each class.
            If `"micro"`, compute metrics globally by counting the total
            true positives, false negatives and false positives.
            If `"macro"`, compute metrics for each label,
            and return their unweighted mean.
            This does not take label imbalance into account.
            If `"weighted"`, compute metrics for each label,
            and return their average weighted by support
            (the number of true instances for each label).
            This alters `"macro"` to account for label imbalance.
            It can result in a score that is not between precision and recall.
        name (str): (Optional) string name of the metric instance.
        in_mask (list): (Optional) list of keys to keep to compute the metric.
        out_mask (list): (Optional) list of keys to remove to compute the metric.
    """

    def __init__(
        self,
        average=None,
        name="f1_score",
        in_mask=None,
        out_mask=None,
    ):
        super().__init__(
            average=average,
            beta=1.0,
            name=name,
            in_mask=in_mask,
            out_mask=out_mask,
        )

    def get_config(self):
        """Return the serializable config of the metric.

        Returns:
            (dict): The config dict.
        """
        base_config = super().get_config()
        del base_config["beta"]
        return base_config

get_config()

Return the serializable config of the metric.

Returns:

- dict: The config dict.

Source code in synalinks/src/metrics/f_score_metrics.py
def get_config(self):
    """Return the serializable config of the metric.

    Returns:
        (dict): The config dict.
    """
    base_config = super().get_config()
    del base_config["beta"]
    return base_config

FBetaScore

Bases: Metric

Computes F-Beta score.

Formula:

b2 = beta ** 2
f_beta_score = (1 + b2) * (precision * recall) / (precision * b2 + recall)

This is the weighted harmonic mean of precision and recall. Its output range is [0, 1]. It operates at a word level and can be used for QA systems.

If y_true and y_pred contain multiple fields, the JSON object's fields are flattened and the score is computed for each one independently.
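As a concrete illustration, a standalone sketch of how precision, recall and the F-Beta score follow from token-overlap counts (plain Python, independent of the synalinks API):

```python
# Token sets for one reference/prediction pair (illustrative values).
y_true_tokens = {"the", "eiffel", "tower", "in", "paris"}
y_pred_tokens = {"eiffel", "tower", "paris", "france"}

tp = len(y_true_tokens & y_pred_tokens)  # 3 tokens in common
fp = len(y_pred_tokens) - tp             # 1 predicted-only token
fn = len(y_true_tokens) - tp             # 2 reference-only tokens

precision = tp / (tp + fp)               # 0.75
recall = tp / (tp + fn)                  # 0.6

beta = 1.0
b2 = beta ** 2
f_beta = (1 + b2) * (precision * recall) / (precision * b2 + recall)
print(f_beta)  # 0.666..., i.e. plain F1 when beta == 1
```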

Parameters:

- average (str, default: None): Type of averaging to be performed across per-field results in the multi-field case. Acceptable values are None, "micro", "macro" and "weighted". If None, no averaging is performed and result() will return the score for each class. If "micro", compute metrics globally by counting the total true positives, false negatives and false positives. If "macro", compute metrics for each label, and return their unweighted mean. This does not take label imbalance into account. If "weighted", compute metrics for each label, and return their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance. It can result in a score that is not between precision and recall.
- beta (float, default: 1.0): Determines the weight given to recall in the harmonic mean between precision and recall (see pseudocode equation above).
- name (str, default: 'fbeta_score'): (Optional) String name of the metric instance.
- in_mask (list, default: None): (Optional) List of keys to keep when computing the metric.
- out_mask (list, default: None): (Optional) List of keys to remove when computing the metric.
Source code in synalinks/src/metrics/f_score_metrics.py
@synalinks_export("synalinks.metrics.FBetaScore")
class FBetaScore(Metric):
    """Computes F-Beta score.

    Formula:

    ```python
    b2 = beta ** 2
    f_beta_score = (1 + b2) * (precision * recall) / (precision * b2 + recall)
    ```

    This is the weighted harmonic mean of precision and recall.
    Its output range is `[0, 1]`. It operates at a word level
    and can be used for **QA systems**.

    If `y_true` and `y_pred` contain multiple fields,
    the JSON object's fields are flattened and the score
    is computed for each one independently.

    Args:
        average (str): Type of averaging to be performed across per-field results
            in the multi-field case.
            Acceptable values are `None`, `"micro"`, `"macro"` and
            `"weighted"`. Defaults to `None`.
            If `None`, no averaging is performed and `result()` will return
            the score for each class.
            If `"micro"`, compute metrics globally by counting the total
            true positives, false negatives and false positives.
            If `"macro"`, compute metrics for each label,
            and return their unweighted mean.
            This does not take label imbalance into account.
            If `"weighted"`, compute metrics for each label,
            and return their average weighted by support
            (the number of true instances for each label).
            This alters `"macro"` to account for label imbalance.
            It can result in a score that is not between precision and recall.
        beta (float): Determines the weight given to recall
            in the harmonic mean between precision and recall (see pseudocode
            equation above). Defaults to `1`.
        name (str): (Optional) string name of the metric instance.
        in_mask (list): (Optional) list of keys to keep to compute the metric.
        out_mask (list): (Optional) list of keys to remove to compute the metric.
    """

    def __init__(
        self,
        average=None,
        beta=1.0,
        name="fbeta_score",
        in_mask=None,
        out_mask=None,
    ):
        super().__init__(
            name=name,
            in_mask=in_mask,
            out_mask=out_mask,
        )
        if average not in (None, "micro", "macro", "weighted"):
            raise ValueError(
                "Invalid `average` argument value. Expected one of: "
                "[None, 'micro', 'macro', 'weighted']. "
                f"Received: average={average}"
            )

        if not isinstance(beta, float):
            raise ValueError(
                "Invalid `beta` argument value. "
                "It should be a Python float. "
                f"Received: beta={beta} of type '{type(beta)}'"
            )
        self.state = self.add_variable(
            data_model=FBetaState,
            name="state_" + self.name,
        )
        self.average = average
        self.beta = beta
        self.axis = None
        if self.average != "micro":
            self.axis = 0

    async def update_state(self, y_true, y_pred):
        y_pred = tree.map_structure(lambda x: ops.convert_to_json_data_model(x), y_pred)
        y_true = tree.map_structure(lambda x: ops.convert_to_json_data_model(x), y_true)

        if self.in_mask:
            y_pred = tree.map_structure(lambda x: x.in_mask(mask=self.in_mask), y_pred)
            y_true = tree.map_structure(lambda x: x.in_mask(mask=self.in_mask), y_true)
        if self.out_mask:
            y_pred = tree.map_structure(lambda x: x.out_mask(mask=self.out_mask), y_pred)
            y_true = tree.map_structure(lambda x: x.out_mask(mask=self.out_mask), y_true)

        y_true = tree.flatten(tree.map_structure(lambda x: str(x), y_true.get_json()))
        y_pred = tree.flatten(tree.map_structure(lambda x: str(x), y_pred.get_json()))

        true_positives = []
        false_positives = []
        false_negatives = []
        intermediate_weights = []
        # For each field of y_true and y_pred
        for yt, yp in zip(y_true, y_pred):
            y_true_tokens = nlp_utils.normalize_and_tokenize(str(yt))
            y_pred_tokens = nlp_utils.normalize_and_tokenize(str(yp))
            common_tokens = set(y_true_tokens) & set(y_pred_tokens)
            true_positives.append(len(common_tokens))
            false_positives.append(len(y_pred_tokens) - len(common_tokens))
            false_negatives.append(len(y_true_tokens) - len(common_tokens))
            intermediate_weights.append(len(y_true_tokens))

        true_positives = np.convert_to_numpy(true_positives)
        false_positives = np.convert_to_numpy(false_positives)
        false_negatives = np.convert_to_numpy(false_negatives)
        intermediate_weights = np.convert_to_numpy(intermediate_weights)

        current_true_positives = self.state.get("true_positives")
        if current_true_positives:
            true_positives = np.add(current_true_positives, true_positives)

        current_false_positives = self.state.get("false_positives")
        if current_false_positives:
            false_positives = np.add(current_false_positives, false_positives)

        current_false_negatives = self.state.get("false_negatives")
        if current_false_negatives:
            false_negatives = np.add(current_false_negatives, false_negatives)

        current_intermediate_weights = self.state.get("intermediate_weights")
        if current_intermediate_weights:
            intermediate_weights = np.add(
                current_intermediate_weights, intermediate_weights
            )

        self.state.update(
            {
                "true_positives": true_positives.tolist(),
                "false_positives": false_positives.tolist(),
                "false_negatives": false_negatives.tolist(),
                "intermediate_weights": intermediate_weights.tolist(),
            }
        )

    def result(self):
        if (
            self.state.get("true_positives") is None
            and self.state.get("false_positives") is None
            and self.state.get("false_negatives") is None
        ):
            return 0.0
        precision = np.divide(
            self.state.get("true_positives"),
            np.add(
                self.state.get("true_positives"),
                self.state.get("false_positives"),
            )
            + backend.epsilon(),
        )
        recall = np.divide(
            self.state.get("true_positives"),
            np.add(
                self.state.get("true_positives"),
                self.state.get("false_negatives"),
            )
            + backend.epsilon(),
        )
        precision = np.convert_to_tensor(precision)
        recall = np.convert_to_tensor(recall)

        mul_value = precision * recall
        add_value = ((self.beta**2) * precision) + recall
        mean = np.divide(mul_value, add_value + backend.epsilon())
        f1_score = mean * (1 + (self.beta**2))
        if self.average == "weighted":
            intermediate_weights = self.state.get("intermediate_weights")
            weights = np.divide(
                intermediate_weights,
                np.sum(intermediate_weights) + backend.epsilon(),
            )
            f1_score = np.sum(f1_score * weights)

        elif self.average is not None:  # [micro, macro]
            f1_score = np.mean(f1_score, self.axis)

        try:
            return float(f1_score)
        except Exception:
            return list(f1_score)

    def get_config(self):
        """Return the serializable config of the metric.

        Returns:
            (dict): The config dict.
        """
        config = {
            "name": self.name,
            "beta": self.beta,
        }
        base_config = super().get_config()
        return {**base_config, **config}

get_config()

Return the serializable config of the metric.

Returns:

- dict: The config dict.

Source code in synalinks/src/metrics/f_score_metrics.py
def get_config(self):
    """Return the serializable config of the metric.

    Returns:
        (dict): The config dict.
    """
    config = {
        "name": self.name,
        "beta": self.beta,
    }
    base_config = super().get_config()
    return {**base_config, **config}

ListF1Score

Bases: ListFBetaScore

Computes F-1 Score on list structures.

Formula:

f1_score = 2 * (precision * recall) / (precision + recall)

This is the harmonic mean of precision and recall. Its output range is [0, 1]. It operates at a field level and can be used for classification or retrieval pipelines.

The difference between this metric and the F1Score is that this one considers each element of the list (or string) as one label.

This metric works using list or string structures like in the following example:

Example:

    # for single label classification

    class ListClassification(synalinks.DataModel):
        label: Literal["label", "label_1", "label_2"]

    # for multi label classification

    class ListClassification(synalinks.DataModel):
        labels: List[Literal["label", "label_1", "label_2"]]

    # or use it with retrieval pipelines; in that case, make sure to mask the correct fields.

    class AnswerWithReferences(synalinks.DataModel):
        sources: List[str]
        answer: str
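A minimal sketch of the retrieval case (field names are hypothetical): out_mask drops the free-text answer field so only the retrieved sources are scored as labels:

```python
import synalinks

# Score only the retrieved sources of AnswerWithReferences,
# ignoring the free-text answer field.
metric = synalinks.metrics.ListF1Score(
    average="micro",
    out_mask=["answer"],
)
```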

Parameters:

- average (str, default: None): Type of averaging to be performed across per-field results in the multi-field case. Acceptable values are None, "micro", "macro" and "weighted". If None, no averaging is performed and result() will return the score for each class. If "micro", compute metrics globally by counting the total true positives, false negatives and false positives. If "macro", compute metrics for each label, and return their unweighted mean. This does not take label imbalance into account. If "weighted", compute metrics for each label, and return their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance. It can result in a score that is not between precision and recall.
- name (str, default: 'list_f1_score'): (Optional) String name of the metric instance.
- in_mask (list, default: None): (Optional) List of keys to keep when computing the metric.
- out_mask (list, default: None): (Optional) List of keys to remove when computing the metric.
Source code in synalinks/src/metrics/f_score_metrics.py
@synalinks_export("synalinks.metrics.ListF1Score")
class ListF1Score(ListFBetaScore):
    """Computes F-1 Score on list structures.

    Formula:

    ```python
    f1_score = 2 * (precision * recall) / (precision + recall)
    ```

    This is the harmonic mean of precision and recall.
    Its output range is `[0, 1]`. It operates at a field level
    and can be used for **classification** or **retrieval pipelines**.

    The difference between this metric and the `F1Score` is that this one considers
    each element of the list (or string) as **one label**.

    This metric works using list or string structures like in the following example:

    Example:

    ```python

        # for single label classification

        class ListClassification(synalinks.DataModel):
            label: Literal["label", "label_1", "label_2"]

        # for multi label classification

        class ListClassification(synalinks.DataModel):
            labels: List[Literal["label", "label_1", "label_2"]]

        # or use it with retrieval pipelines; in that case, make sure to mask the correct fields.

        class AnswerWithReferences(synalinks.DataModel):
            sources: List[str]
            answer: str
    ```

    Args:
        average (str): Type of averaging to be performed across per-field results
            in the multi-field case.
            Acceptable values are `None`, `"micro"`, `"macro"` and
            `"weighted"`. Defaults to `None`.
            If `None`, no averaging is performed and `result()` will return
            the score for each class.
            If `"micro"`, compute metrics globally by counting the total
            true positives, false negatives and false positives.
            If `"macro"`, compute metrics for each label,
            and return their unweighted mean.
            This does not take label imbalance into account.
            If `"weighted"`, compute metrics for each label,
            and return their average weighted by support
            (the number of true instances for each label).
            This alters `"macro"` to account for label imbalance.
            It can result in a score that is not between precision and recall.
        name (str): (Optional) string name of the metric instance.
        in_mask (list): (Optional) list of keys to keep to compute the metric.
        out_mask (list): (Optional) list of keys to remove to compute the metric.
    """

    def __init__(
        self,
        average=None,
        name="list_f1_score",
        in_mask=None,
        out_mask=None,
    ):
        super().__init__(
            average=average,
            beta=1.0,
            name=name,
            in_mask=in_mask,
            out_mask=out_mask,
        )

    def get_config(self):
        """Return the serializable config of the metric.

        Returns:
            (dict): The config dict.
        """
        base_config = super().get_config()
        del base_config["beta"]
        return base_config

get_config()

Return the serializable config of the metric.

Returns:

- dict: The config dict.

Source code in synalinks/src/metrics/f_score_metrics.py
def get_config(self):
    """Return the serializable config of the metric.

    Returns:
        (dict): The config dict.
    """
    base_config = super().get_config()
    del base_config["beta"]
    return base_config

ListFBetaScore

Bases: FBetaScore

Computes F-Beta score on list structures.

Formula:

b2 = beta ** 2
f_beta_score = (1 + b2) * (precision * recall) / (precision * b2 + recall)

This is the weighted harmonic mean of precision and recall. Its output range is [0, 1]. It operates at a field level and can be used for classification or retrieval pipelines.

The difference between this metric and the F1Score is that this one considers each element of the list (or string) as one label.

This metric works using list or string structures like in the following example:

Example:

# for single label classification

class ListClassification(synalinks.DataModel):
    label: Literal["label", "label_1", "label_2"]

# for multi label classification

class ListClassification(synalinks.DataModel):
    labels: List[Literal["label", "label_1", "label_2"]]

# or use it with retrieval pipelines; in that case, make sure to mask the correct fields.

class AnswerWithReferences(synalinks.DataModel):
    sources: List[str]
    answer: str
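A minimal sketch (field names are hypothetical): a beta above 1 favors recall of the retrieved sources over precision:

```python
import synalinks

# Emphasize recall of retrieved sources; keep only that field for scoring.
metric = synalinks.metrics.ListFBetaScore(
    beta=2.0,
    in_mask=["sources"],
)
```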
Source code in synalinks/src/metrics/f_score_metrics.py
@synalinks_export("synalinks.metrics.ListFBetaScore")
class ListFBetaScore(FBetaScore):
    """Computes F-Beta score on list structures.

    Formula:

    ```python
    b2 = beta ** 2
    f_beta_score = (1 + b2) * (precision * recall) / (precision * b2 + recall)
    ```

    This is the weighted harmonic mean of precision and recall.
    Its output range is `[0, 1]`. It operates at a field level
    and can be used for **classification** or **retrieval pipelines**.

    The difference between this metric and the `F1Score` is that this one considers
    each element of the list (or string) as **one label**.

    This metric works using list or string structures like in the following example:

    Example:

    ```python

    # for single label classification

    class ListClassification(synalinks.DataModel):
        label: Literal["label", "label_1", "label_2"]

    # for multi label classification

    class ListClassification(synalinks.DataModel):
        labels: List[Literal["label", "label_1", "label_2"]]

    # or use it with retrieval pipelines; in that case, make sure to mask the correct fields.

    class AnswerWithReferences(synalinks.DataModel):
        sources: List[str]
        answer: str

    ```

    """
    def __init__(
        self,
        average=None,
        beta=1.0,
        name="list_fbeta_score",
        in_mask=None,
        out_mask=None,
    ):
        super().__init__(
            average=average,
            beta=beta,
            name=name,
            in_mask=in_mask,
            out_mask=out_mask,
        )

    async def update_state(self, y_true, y_pred):
        y_pred = tree.map_structure(lambda x: ops.convert_to_json_data_model(x), y_pred)
        y_true = tree.map_structure(lambda x: ops.convert_to_json_data_model(x), y_true)

        if self.in_mask:
            y_pred = tree.map_structure(lambda x: x.in_mask(mask=self.in_mask), y_pred)
            y_true = tree.map_structure(lambda x: x.in_mask(mask=self.in_mask), y_true)
        if self.out_mask:
            y_pred = tree.map_structure(lambda x: x.out_mask(mask=self.out_mask), y_pred)
            y_true = tree.map_structure(lambda x: x.out_mask(mask=self.out_mask), y_true)

        y_true = tree.flatten(tree.map_structure(lambda x: x, y_true.get_json()))
        y_pred = tree.flatten(tree.map_structure(lambda x: x, y_pred.get_json()))

        true_positives = []
        false_positives = []
        false_negatives = []
        intermediate_weights = []

        for yt, yp in zip(y_true, y_pred):        
            y_true_tokens = [str(tok) for tok in y_true] if isinstance(y_true, list) else [yt]
            y_pred_tokens = [str(tok) for tok in y_pred] if isinstance(y_pred, list) else [yp]
            common_tokens = set(y_true_tokens) & set(y_pred_tokens)
            true_positives.append(len(common_tokens))
            false_positives.append(len(y_pred_tokens) - len(common_tokens))
            false_negatives.append(len(y_true_tokens) - len(common_tokens))
            intermediate_weights.append(len(y_true_tokens))

        true_positives = np.convert_to_numpy(true_positives)
        false_positives = np.convert_to_numpy(false_positives)
        false_negatives = np.convert_to_numpy(false_negatives)
        intermediate_weights = np.convert_to_numpy(intermediate_weights)

        current_true_positives = self.state.get("true_positives")
        if current_true_positives:
            true_positives = np.add(current_true_positives, true_positives)

        current_false_positives = self.state.get("false_positives")
        if current_false_positives:
            false_positives = np.add(current_false_positives, false_positives)

        current_false_negatives = self.state.get("false_negatives")
        if current_false_negatives:
            false_negatives = np.add(current_false_negatives, false_negatives)

        current_intermediate_weights = self.state.get("intermediate_weights")
        if current_intermediate_weights:
            intermediate_weights = np.add(
                current_intermediate_weights, intermediate_weights
            )

        self.state.update(
            {
                "true_positives": true_positives.tolist(),
                "false_positives": false_positives.tolist(),
                "false_negatives": false_negatives.tolist(),
                "intermediate_weights": intermediate_weights.tolist(),
            }
        )

    def get_config(self):
        """Return the serializable config of the metric.

        Returns:
            (dict): The config dict.
        """
        config = {
            "beta": self.beta,
            "name": self.name,
        }
        base_config = super().get_config()
        return {**base_config, **config}

get_config()

Return the serializable config of the metric.

Returns:

- dict: The config dict.

Source code in synalinks/src/metrics/f_score_metrics.py
def get_config(self):
    """Return the serializable config of the metric.

    Returns:
        (dict): The config dict.
    """
    config = {
        "beta": self.beta,
        "name": self.name,
    }
    base_config = super().get_config()
    return {**base_config, **config}