In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the precision and recall of the test: precision is the number of true positive results divided by the number of all samples predicted to be positive (including those identified incorrectly), and recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification.
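The F1 score combines the two as their harmonic mean, F1 = 2 * precision * recall / (precision + recall). The following minimal sketch computes all three quantities from hypothetical label lists (the data is illustrative, not from this entry); scikit-learn's precision_score, recall_score, and f1_score return the same values.

```python
# Minimal sketch: precision, recall, and F1 from hypothetical binary labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions (illustrative)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # TP over all samples predicted positive
recall = tp / (tp + fn)     # TP over all samples that are actually positive
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# precision=0.750 recall=0.750 F1=0.750
```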
Trustworthy AI Relevance
This metric addresses Robustness by quantifying the model's classification performance, and therefore its consistency and reliability, under typical evaluation conditions. Tracking F1 (and per-class F1) across datasets, time slices, or perturbed/noisy inputs helps detect degradation under distribution shift, class imbalance, or adversarial/noisy conditions, all key aspects of robustness.
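As one hedged illustration of such tracking (the dataset, model, and noise level below are assumptions for the sketch, not part of this entry), per-class and macro F1 can be compared on clean versus perturbed inputs:

```python
# Sketch: compare per-class and macro F1 on clean vs. noise-perturbed inputs.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

rng = np.random.default_rng(0)
X_noisy = X_test + rng.normal(scale=2.0, size=X_test.shape)  # hypothetical perturbation

for name, X_eval in [("clean", X_test), ("noisy", X_noisy)]:
    preds = model.predict(X_eval)
    per_class = f1_score(y_test, preds, average=None)   # one F1 per class
    macro = f1_score(y_test, preds, average="macro")    # unweighted mean over classes
    print(f"{name}: macro-F1={macro:.3f}, per-class F1={np.round(per_class, 2)}")
```

A drop in macro or per-class F1 on the perturbed slice flags exactly the kind of degradation described above.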
Related use cases:
A survey of cross-validation procedures for model selection
Uploaded on Oct 21, 2022. Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many...
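To connect that use case with this metric, here is a minimal sketch (the dataset and candidate models are illustrative assumptions) that uses k-fold cross-validation with F1 as the model-selection criterion:

```python
# Sketch: estimate F1 via 5-fold cross-validation to compare candidate models.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # binary task, so scoring="f1" applies

candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "tree": DecisionTreeClassifier(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")  # F1 per fold
    print(f"{name}: mean F1={scores.mean():.3f} (+/- {scores.std():.3f})")
```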
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
Uploaded on Nov 1, 2023.

About the metric
Objective(s): Robustness
Purpose(s):
Target sector(s):
Lifecycle stage(s):
Target users: