Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Agentic Benchmark for CRM



The Agentic Benchmark for CRM is a benchmarking framework developed by Salesforce to evaluate the performance of AI agents and models in enterprise customer relationship management (CRM) use cases using metrics such as accuracy, cost, speed, trust and safety, and sustainability. The benchmark is designed to assess the readiness of AI systems for enterprise workflows by measuring how well AI agents perform tasks grounded in real CRM environments.

The framework evaluates AI models across use cases drawn from CRM domains including sales, service, and field service operations. These tasks are based on realistic enterprise workflows and datasets that reflect how organisations use CRM systems in practice. The evaluation combines automated metrics with human assessments conducted by Salesforce employees and customers to ensure that results reflect real-world expectations and operational requirements.

The benchmark measures model performance across five core dimensions:

  • Accuracy: factuality, instruction following, conciseness, and completeness of generated responses.
  • Cost: the computational and operational expense associated with using a model.
  • Speed: the time required to generate responses in enterprise workflows.
  • Trust and safety: characteristics such as privacy, safety, and truthfulness.
  • Sustainability: the environmental impact of the computational resources required by different models.

Benchmark results can be explored through a dashboard that allows users to filter models and results by CRM domain, use case type, model provider, and model size. This enables organisations to compare different AI models and agents for specific enterprise CRM tasks and determine which systems are most suitable for operational deployment. By providing a structured evaluation framework grounded in enterprise CRM workflows, the Agentic Benchmark for CRM supports organisations in assessing the performance and readiness of AI agents for use in business environments.
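The kind of filtering and comparison the dashboard offers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names `filter_results` and `rank_by`, the model, provider, and use case type labels, and every number below are hypothetical placeholders, not the benchmark's actual schema, API, or results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkResult:
    """One hypothetical result row; field names are assumptions, not the real schema."""
    model: str
    provider: str
    size: str            # e.g. "small" or "large" (placeholder labels)
    domain: str          # e.g. "sales", "service", "field service"
    use_case_type: str   # placeholder labels such as "summarisation"
    accuracy: float
    cost: float
    speed: float
    trust_and_safety: float
    sustainability: float


def filter_results(
    results: List[BenchmarkResult],
    *,
    domain: Optional[str] = None,
    use_case_type: Optional[str] = None,
    provider: Optional[str] = None,
    size: Optional[str] = None,
) -> List[BenchmarkResult]:
    """Keep only the rows that match every filter that was supplied."""
    criteria = {
        "domain": domain,
        "use_case_type": use_case_type,
        "provider": provider,
        "size": size,
    }
    active = {name: value for name, value in criteria.items() if value is not None}
    return [
        row for row in results
        if all(getattr(row, name) == value for name, value in active.items())
    ]


def rank_by(
    results: List[BenchmarkResult],
    dimension: str,
    higher_is_better: bool = True,
) -> List[BenchmarkResult]:
    """Sort rows by one of the five dimensions, best first."""
    return sorted(
        results,
        key=lambda row: getattr(row, dimension),
        reverse=higher_is_better,
    )


# Placeholder rows with made-up, unitless numbers (NOT real benchmark results).
EXAMPLE_RESULTS = [
    BenchmarkResult("model-a", "vendor-x", "small", "sales", "summarisation",
                    0.8, 1.0, 2.0, 0.9, 0.5),
    BenchmarkResult("model-b", "vendor-y", "large", "sales", "summarisation",
                    0.9, 3.0, 4.0, 0.8, 0.3),
    BenchmarkResult("model-c", "vendor-x", "large", "service", "generation",
                    0.7, 2.0, 3.0, 0.95, 0.4),
]

sales_rows = filter_results(EXAMPLE_RESULTS, domain="sales")
most_accurate = rank_by(sales_rows, "accuracy")[0]
cheapest = rank_by(sales_rows, "cost", higher_is_better=False)[0]
```

Here a lower cost is treated as better and a higher accuracy as better; whether each real dimension is reported as "higher is better" depends on how the benchmark publishes it, so the `higher_is_better` flag is left to the caller.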

About the tool


Developing organisation(s): Salesforce

Type of approach:



Usage rights:






Geographical scope:




Technology platforms:


Tags:

  • Accuracy and performance
  • readiness
  • AI evaluation
  • customer relationship management



Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.