Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

MLPerf Client

The MLPerf Client is a benchmark for Windows and macOS developed collaboratively by MLCommons. It is designed to assess the performance of large language models (LLMs) and other AI workloads on personal computing devices, including laptops, desktops, and workstations.
MLPerf Client is the result of a collaboration among technology innovators, including AMD, Intel, Microsoft, NVIDIA, Qualcomm Technologies, Inc., and top PC OEMs. These contributors have combined resources and expertise to develop a standardised benchmark that provides valuable insights into performance on key consumer AI workloads.
By simulating real-world AI tasks, the MLPerf Client benchmark provides clear metrics for understanding how well systems handle generative AI workloads. The MLPerf Client working group intends for this benchmark to drive innovation and foster competition, ensuring that PCs can meet the challenges of the AI-powered future.

In December 2024, MLCommons announced the public release of the MLPerf Client v0.5 benchmark:

  • AI model: The benchmark’s tests are based on Meta’s Llama 2 7B large language model, optimized for reduced memory and computational requirements via 4-bit integer quantization (a generic quantization sketch appears after this list).
  • Tests and metrics: Includes four AI tasks (content generation, creative writing, and text summarization of two documents of different lengths), evaluated using familiar metrics such as time-to-first-token (TTFT) and tokens-per-second (TPS); an illustrative measurement sketch also follows this list.
  • Hardware optimization: Supports hardware-accelerated execution on integrated and discrete GPUs via two distinct paths: ONNX Runtime GenAI and Intel OpenVINO.
  • Platform support: This initial release supports Windows 11 on x86-64 systems, with future updates planned for Windows on Arm and macOS.
  • Freely accessible: The benchmark is freely downloadable from MLCommons.org, empowering anyone to measure AI performance on supported systems.
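
To make the 4-bit integer quantization mentioned above concrete, the sketch below shows a generic symmetric weight-quantization scheme in Python. It illustrates the general technique only: the function names are hypothetical, and MLPerf Client’s actual quantization of Llama 2 7B may use a different scheme (for example, per-group scales).

    import numpy as np

    def quantize_int4_symmetric(weights: np.ndarray):
        # Symmetric per-tensor 4-bit quantization (illustrative only).
        # The largest weight magnitude maps to +/-7, so every value fits in the
        # signed 4-bit range [-8, 7], stored alongside a single float scale.
        scale = float(np.max(np.abs(weights))) / 7.0 or 1e-12  # avoid division by zero
        q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        # Recover an approximation of the original float weights.
        return q.astype(np.float32) * scale

    # Example: quantize a small random weight matrix and check the rounding error.
    w = np.random.randn(4, 4).astype(np.float32)
    q, scale = quantize_int4_symmetric(w)
    print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))

Storing each weight as a 4-bit integer plus a shared scale is how this kind of quantization reduces the model’s memory and compute requirements enough for consumer hardware.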

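The two reported metrics can also be stated concretely: time-to-first-token (TTFT) is the delay between submitting a prompt and receiving the first generated token, and tokens-per-second (TPS) is the rate at which the remaining tokens arrive. The sketch below measures both around a hypothetical token-streaming function; generate_stream and fake_stream are placeholders rather than MLPerf Client APIs, and the benchmark performs its own timing internally.

    import time
    from typing import Callable, Iterable, Tuple

    def measure_ttft_and_tps(generate_stream: Callable[[str], Iterable[str]],
                             prompt: str) -> Tuple[float, float]:
        # Returns (time-to-first-token in seconds, tokens-per-second) for one prompt.
        start = time.perf_counter()
        first_token_time = None
        token_count = 0
        for _ in generate_stream(prompt):
            if first_token_time is None:
                first_token_time = time.perf_counter()  # prompt processing ends here
            token_count += 1
        end = time.perf_counter()

        ttft = first_token_time - start
        # The decode rate is conventionally computed over the tokens after the first one.
        decode_time = end - first_token_time
        tps = (token_count - 1) / decode_time if token_count > 1 and decode_time > 0 else 0.0
        return ttft, tps

    # Toy generator standing in for a local LLM: ~0.2 s prompt phase, then ~50 tokens/s.
    def fake_stream(prompt: str):
        time.sleep(0.2)
        for _ in range(20):
            time.sleep(0.02)
            yield "token"

    ttft, tps = measure_ttft_and_tps(fake_stream, "Summarise this document.")
    print(f"TTFT: {ttft:.3f} s, TPS: {tps:.1f}")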

About the tool


Developing organisation(s): MLCommons


Tags:

  • evaluation
  • performance
  • benchmarking


Use Cases

There are no use cases for this tool yet.


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.