These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Machine learning models, at the core of AI applications, typically achieve high accuracy at the expense of explainability. Moreover, according to the proposed regulations, AI applications based on machine learning must be "trustworthy" and comply with a set of mandatory requirements, such as Sustainability and Fairness. To date, there are no standardised metrics that can ensure an integrated overall assessment of the trustworthiness of an AI application and provide a summary score of its Sustainability, Accuracy, Fairness and Explainability.
To fill this gap, we propose a set of integrated statistical methods, based on the Lorenz curve, that can be used to assess and monitor over time whether an AI application is trustworthy, and what the risks are of it not being so. Specifically, the methods measure Sustainability (in terms of robustness with respect to anomalous and cyber-inflated data), Accuracy (in terms of predictive accuracy), Fairness (in terms of prediction bias across different population groups) and Explainability (in terms of human understanding and oversight).
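To give a concrete sense of the Lorenz-curve machinery underlying these methods, below is a minimal sketch of how a Lorenz curve and its associated Gini concentration statistic can be computed for a set of model scores. The function names are illustrative, not part of the SAFE implementation, and this toy statistic stands in for the paper's integrated scores.

```python
def lorenz_points(values):
    """Cumulative shares of the sorted values (the Lorenz curve), from 0.0 to 1.0."""
    vals = sorted(values)
    total = sum(vals)
    points = [0.0]
    running = 0.0
    for v in vals:
        running += v
        points.append(running / total)
    return points  # n + 1 points

def gini(values):
    """Gini concentration: twice the area between the equality line and the Lorenz curve."""
    pts = lorenz_points(values)
    n = len(pts) - 1
    # trapezoidal area under the Lorenz curve
    area = sum((pts[i] + pts[i + 1]) / (2 * n) for i in range(n))
    return 1.0 - 2.0 * area
```

A perfectly uniform set of scores yields a Gini of 0 (the Lorenz curve coincides with the equality line), while highly concentrated scores push the statistic towards 1.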
Trustworthy AI Relevance
This metric addresses Fairness and Explainability by quantifying the relevant system properties. SAFE explicitly includes 'Fair' and 'Explainable' in its name and design, so it maps directly to fairness and explainability objectives:

- Fairness: SAFE incorporates bias and inequity checks (e.g., group parity, disparate impact, equalized odds), and thus supports detecting and mitigating discriminatory outcomes.
- Explainability: SAFE requires that model outputs be interpretable and explainable (e.g., feature importance, local explanations, human-evaluable explanations), which improves the comprehensibility of decisions.

Practical rationale: SAFE is a multi-dimensional evaluation framework that forces practitioners to measure beyond raw accuracy, helping uncover fairness gaps and explanation shortcomings that single performance metrics miss.
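As a hedged illustration of one of the group-fairness checks mentioned above, the sketch below computes per-group selection rates and a disparate impact ratio (the ratio of the lowest to the highest group selection rate, where 1.0 indicates parity). The function names and the 0.8 "four-fifths" threshold in the usage note are conventions assumed for illustration, not SAFE's own API.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Positive-prediction rate for each protected group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for y_hat, g in zip(predictions, groups):
        totals[g] += 1
        positives[g] += int(y_hat == 1)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(predictions, groups):
    """Min-over-max group selection rate; 1.0 means parity, lower means more disparity."""
    rates = selection_rates(predictions, groups)
    return min(rates.values()) / max(rates.values())
```

In practice, a disparate impact ratio below roughly 0.8 is often flagged for review, following the common four-fifths rule of thumb.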
About the metric
Github stars:
- 7100
Github forks:
- 720