Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Agentic Benchmark for CRM

The Agentic Benchmark for CRM is a benchmarking framework developed by Salesforce to evaluate AI agents and models in enterprise customer relationship management (CRM) use cases. It assesses how ready AI systems are for enterprise workflows by measuring how well agents perform tasks grounded in real CRM environments.

The framework evaluates AI models across use cases drawn from CRM domains including sales, service, and field service operations. These tasks are based on realistic enterprise workflows and datasets that reflect how organisations use CRM systems in practice. The evaluation combines automated metrics with human assessments conducted by Salesforce employees and customers to ensure that results reflect real-world expectations and operational requirements.

The benchmark measures model performance across five core dimensions: accuracy, cost, speed, trust and safety, and sustainability.

Accuracy evaluates factors such as factuality, instruction following, conciseness, and completeness in generated responses. Cost captures the computational and operational expense associated with using a model. Speed measures the time required to generate responses in enterprise workflows. Trust and safety evaluates characteristics such as privacy, safety, and truthfulness. Sustainability assesses the environmental impact associated with the computational resources required by different models.
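As a rough illustration of how the five dimensions might be held together for a single model, the sketch below defines a small record type in Python. The class names, field names, units and the unweighted averaging of the accuracy sub-scores are all assumptions made for this example; the catalogue entry does not describe the benchmark's actual schema or scoring formula.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AccuracyScores:
    """Hypothetical accuracy sub-scores, each on an assumed common scale."""
    factuality: float
    instruction_following: float
    conciseness: float
    completeness: float

    def overall(self) -> float:
        # Assumption: an unweighted mean. The real benchmark may aggregate differently.
        return mean([
            self.factuality,
            self.instruction_following,
            self.conciseness,
            self.completeness,
        ])


@dataclass(frozen=True)
class ModelEvaluation:
    """One model's scores on the five dimensions (hypothetical layout and units)."""
    model_name: str
    accuracy: AccuracyScores
    cost: float              # assumed: expense per unit of usage
    speed_seconds: float     # assumed: time to generate a response
    trust_and_safety: float  # assumed: combined privacy/safety/truthfulness score
    sustainability: float    # assumed: environmental impact estimate


# Placeholder values for illustration only; these are not benchmark results.
example = ModelEvaluation(
    model_name="example-model",
    accuracy=AccuracyScores(
        factuality=4.0,
        instruction_following=3.0,
        conciseness=2.0,
        completeness=3.0,
    ),
    cost=1.0,
    speed_seconds=2.5,
    trust_and_safety=0.9,
    sustainability=0.1,
)

print(example.model_name, example.accuracy.overall())
```

Keeping the accuracy sub-scores separate, rather than storing only an aggregate, leaves room to inspect which factor (for example conciseness) drives a low overall score.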

Benchmark results can be explored through a dashboard that allows users to filter models and results by CRM domain, use case type, model provider, and model size. This enables organisations to compare different AI models and agents for specific enterprise CRM tasks and determine which systems are most suitable for operational deployment. By providing a structured evaluation framework grounded in enterprise CRM workflows, the Agentic Benchmark for CRM supports organisations in assessing the performance and readiness of AI agents for use in business environments.
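The filtering described above can be sketched as a simple query over result rows. This is a minimal Python sketch and not the dashboard's implementation: the `ResultRow` fields, the `filter_results` and `rank_by_accuracy` helpers, and all model, provider and use case names are hypothetical, and the scores are placeholders rather than benchmark results.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ResultRow:
    """One hypothetical benchmark result for a model on a CRM use case."""
    model_name: str
    provider: str
    model_size: str
    crm_domain: str      # e.g. "sales", "service", "field service"
    use_case_type: str   # hypothetical labels; the source does not list them
    accuracy: float


def filter_results(
    rows: Iterable[ResultRow],
    *,
    crm_domain: Optional[str] = None,
    use_case_type: Optional[str] = None,
    provider: Optional[str] = None,
    model_size: Optional[str] = None,
) -> List[ResultRow]:
    """Keep rows matching every filter that was supplied; None means 'any'."""
    criteria = {
        "crm_domain": crm_domain,
        "use_case_type": use_case_type,
        "provider": provider,
        "model_size": model_size,
    }
    return [
        row
        for row in rows
        if all(
            wanted is None or getattr(row, field) == wanted
            for field, wanted in criteria.items()
        )
    ]


def rank_by_accuracy(rows: Iterable[ResultRow]) -> List[ResultRow]:
    """Order rows from highest to lowest accuracy."""
    return sorted(rows, key=lambda row: row.accuracy, reverse=True)


# Placeholder rows for illustration only.
ROWS = [
    ResultRow("model-a", "provider-x", "large", "sales", "summarisation", 0.80),
    ResultRow("model-b", "provider-y", "small", "service", "summarisation", 0.70),
    ResultRow("model-c", "provider-x", "large", "service", "generation", 0.90),
]

shortlist = rank_by_accuracy(filter_results(ROWS, crm_domain="service"))
print([row.model_name for row in shortlist])
```

Ranking on accuracy alone is only one option; an organisation comparing models for deployment would likely weigh cost, speed, trust and safety, and sustainability as well.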

About the tool

Developing organisation(s):

Salesforce

Tool type(s):

Technical validation
Rating framework

Objective(s):

Safety
Environmental Sustainability

Purpose(s):

Risk management

Lifecycle stage(s):

Verify & validate
Build & interpret model

Type of approach:

Technical

Maturity:

Published document

Usage rights:

Restricted access

Target groups:

Private sector
Technical community

Target users:

Business leader
Developer

Stakeholder group:

Business
Technical community

Benefits:

Increased quality results
Responsible implementation

Geographical scope:

International

People involved:

IT employees
Operations employees

Risk management stage(s):

Govern: Communicate about risk management process
Treat: Prevent risks & impacts
Assess risks & impacts

Technology platforms:

Platform specific

Tags:

Accuracy and performance
Readiness
AI evaluation
Customer relationship management

Use Cases

There are no use cases for this tool yet.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.