These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Agent Goal Accuracy is a metric that evaluates how effectively a language model identifies and achieves the user's intended goal during an interaction. It is binary: an interaction scores 1 if the AI accomplishes the user's goal and 0 if it does not. The metric is particularly valuable for assessing AI agents in task-oriented dialogues, where the objective is to fulfill specific user requests.
Formula:
Agent Goal Accuracy = (Number of Successfully Achieved Goals) / (Total Number of Goals)
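The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the official implementation of any library: it assumes each interaction has already been judged and reduced to a binary goal-success score, and simply aggregates those scores.

```python
# Minimal sketch (assumed helper, not an official API): aggregate
# per-interaction binary goal-success judgments into Agent Goal Accuracy.
def agent_goal_accuracy(goal_achieved: list[int]) -> float:
    """Each entry is 1 if the agent achieved the user's goal, else 0."""
    if not goal_achieved:
        raise ValueError("need at least one scored interaction")
    return sum(goal_achieved) / len(goal_achieved)

# Example: the agent fulfilled the goal in 3 of 4 task-oriented dialogues.
print(agent_goal_accuracy([1, 1, 0, 1]))  # → 0.75
```

In practice the per-interaction judgment (did the agent achieve the goal?) is the hard part and is typically produced by a human annotator or an LLM judge; the aggregation itself is just the ratio shown here.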
Trustworthy AI Relevance
This metric addresses Robustness and Human Agency & Control. Robustness: Agent Goal Accuracy quantifies an agent's ability to deliver correct outcomes across tasks and conditions; as a consistency and reliability metric, it helps detect failures under distribution shift, ambiguous inputs, or noisy environments, and supports monitoring and improving resilience. Human Agency & Control: by scoring whether the user's intended goal was actually achieved, it measures how faithfully the system serves user intent.
About the metric
GitHub stars:
- 7100
GitHub forks:
- 723