Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


Tool Call Accuracy evaluates how effectively a large language model (LLM) identifies and invokes the tools needed to accomplish a specified task. The metric assesses the model's ability to select and use the appropriate tools in a sequence that matches the task requirements: a higher Tool Call Accuracy indicates that the model recognizes and invokes the correct tools in the proper order, improving task performance and reliability.


Formula:

Tool Call Accuracy = (Number of Correct Tool Calls Made by the Model) / (Total Number of Reference Tool Calls)


This formula gives the proportion of reference tool calls that the model's tool calls match in both identity and order.
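The sketch below is a minimal illustration of this computation, assuming each tool call is represented simply by its tool name and that calls are compared position by position; the function and example tool names are hypothetical, not part of any specific library's API.

```python
def tool_call_accuracy(predicted_calls: list[str], reference_calls: list[str]) -> float:
    """Fraction of reference tool calls that the model reproduced at the correct position."""
    if not reference_calls:
        return 0.0
    # A predicted call counts as correct only if it matches the reference call
    # at the same index, i.e. the right tool invoked in the right order.
    correct = sum(
        1 for pred, ref in zip(predicted_calls, reference_calls) if pred == ref
    )
    return correct / len(reference_calls)


# Example: the model picks the right tools but swaps the order of the last two,
# so only the first call matches its reference position.
reference = ["search_flights", "check_visa", "book_hotel"]
predicted = ["search_flights", "book_hotel", "check_visa"]
print(tool_call_accuracy(predicted, reference))  # 0.333...
```

In practice, the comparison could also require matching tool arguments, not just tool names; the sketch keeps the stricter-or-looser choice of match criterion out of scope.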

Trustworthy AI Relevance

This metric addresses Robustness. Tool Call Accuracy directly measures the AI system's consistency and reliability in interacting with external tools under varied inputs. High tool-call accuracy indicates that the system maintains correct behavior across cases, including distribution shifts or noisy prompts, while low accuracy reveals brittle, failure-prone behavior that undermines reliability, in line with the Robustness objective of resilience and reliability.

References

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.