Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

The root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. The RMSE is the square root of the second sample moment of the differences between predicted values and observed values, i.e. the quadratic mean of these differences. These deviations are called residuals when the calculations are performed over the data sample used for estimation, and errors (or prediction errors) when computed out-of-sample. The RMSE aggregates the magnitudes of the errors in predictions for various data points into a single measure of predictive power. As a measure of accuracy, RMSE is used to compare the forecasting errors of different models on a particular dataset; because it is scale-dependent, it is not valid for comparisons between datasets.
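The verbal definition above corresponds to the standard formula. For observed values y_i, predicted values ŷ_i, and n data points:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}
```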

RMSE is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSE is better than a higher one. However, comparisons across different types of data would be invalid because the measure is dependent on the scale of the numbers used.

RMSE is the square root of the average of squared errors. The effect of each error on RMSE is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSE. Consequently, RMSE is sensitive to outliers.
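As a minimal illustration of the definition and of this outlier sensitivity, the following sketch (using NumPy; the function name and sample values are illustrative, not from the source) computes RMSE on a small dataset and then again after corrupting a single prediction:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error: the square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 4.0, 5.5, 8.0]
print(rmse(y_true, y_pred))           # ~0.354 (residuals 0.5, 0, -0.5, 0)

# One large error dominates the score because errors are squared:
y_pred_outlier = [2.5, 4.0, 5.5, 18.0]
print(rmse(y_true, y_pred_outlier))   # ~5.01
```

Note how a single residual of 10 raises the score by more than an order of magnitude, which is exactly the disproportionate effect described above.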

Trustworthy AI Relevance

This metric addresses Robustness and Transparency by quantifying relevant system properties. Robustness: RMSE quantifies the magnitude of prediction errors and is sensitive to large deviations; tracking RMSE across validation sets, out-of-distribution (OOD) tests, noisy inputs, or over time provides a measurable signal of model degradation, distribution shift, or failure modes. It therefore supports assessing the reliability and consistency of model performance, core aspects of Robustness. Transparency: reporting RMSE as a single, well-defined accuracy figure makes a system's predictive performance easier to communicate and compare.

About the metric

Github stars:

  • 7100

Github forks:

  • 720

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.