Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Topic Adherence evaluates an AI system’s ability to confine its responses to predefined subject areas during interactions. This metric is crucial in applications where the AI is expected to assist only within specific domains, ensuring that responses remain relevant and within scope. By assessing how well the AI adheres to designated topics, this metric helps maintain the quality and relevance of interactions, especially in conversational AI systems deployed in real-world applications.


Formulae:

Precision:

Precision = (Number of Relevant Queries Answered that Match Reference Topics) / (Total Number of Queries Answered)

Recall:

Recall = (Number of Relevant Queries Answered that Match Reference Topics) / (Total Number of Relevant Queries in Reference Topics)

F1 Score:

F1 Score = (2 * Precision * Recall) / (Precision + Recall)


These formulas measure the AI’s adherence to predefined topics by assessing both how accurately the AI’s responses align with relevant topics (precision) and how comprehensively the AI covers the intended topics (recall). The F1 score provides a balanced metric by combining both precision and recall.
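The three formulas above can be sketched as a small helper function. This is an illustrative implementation, not taken from any specific library: the function name, parameters, and the example topic labels are all hypothetical.

```python
def topic_adherence_scores(answered_topics, reference_topics, total_relevant):
    """Compute topic-adherence precision, recall, and F1.

    answered_topics:  topic label for each query the AI answered (hypothetical input)
    reference_topics: the set of predefined, in-scope topics
    total_relevant:   total number of relevant (in-scope) queries posed
    """
    # Answered queries whose topic matches a reference topic
    on_topic = sum(1 for t in answered_topics if t in reference_topics)

    # Precision: on-topic answers / all answers given
    precision = on_topic / len(answered_topics) if answered_topics else 0.0
    # Recall: on-topic answers / all relevant queries posed
    recall = on_topic / total_relevant if total_relevant else 0.0
    # F1: harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Example: 4 answered queries, 3 of which fall within the reference topics,
# out of 4 relevant queries posed in total.
p, r, f1 = topic_adherence_scores(
    ["billing", "billing", "weather", "shipping"],
    {"billing", "shipping"},
    total_relevant=4,
)
# → precision 0.75, recall 0.75, F1 0.75
```

The guards against empty inputs avoid division by zero when no queries were answered or no relevant queries were posed.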


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.