Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

HarmBench



HarmBench

HarmBench is a standardised evaluation framework for automated red teaming. HarmBench has out-of-the-box support for transformers-compatible LLMs, numerous closed-source APIs, and several multimodal models.

Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs) highlighting the need for standardised evaluation frameworks to rigorously assess these methods. To address this need, HarmBench was designed with key considerations that were previously overlooked in red teaming evaluations. Using HarmBench, a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses was conducted, yielding novel insights. In addition, a highly efficient adversarial training method that significantly enhances LLM robustness across a wide range of attacks, demonstrates how HarmBench enables co-development of attacks and defenses.

There are two primary ways to use HarmBench: 

  • (1) To evaluate red teaming methods against a set of LLMs.
  • (2) To evaluate LLMs against a set of red teaming methods.

About the tool


Tool type(s):



Country of origin:


Type of approach:



Target groups:




Geographical scope:


Programming languages:


Tags:

  • evaluation
  • robustness
  • red-teaming
  • llm

Modify this tool

Use Cases

There is no use cases for this tool yet.

Would you like to submit a use case for this tool?

If you have used this tool, we would love to know more about your experience.

Add use case
catalogue Logos

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.