HarmBench
HarmBench is a standardised evaluation framework for automated red teaming. HarmBench has out-of-the-box support for transformers-compatible LLMs, numerous closed-source APIs, and several multimodal models.
Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), which highlights the need for standardised evaluation frameworks to assess these methods rigorously. To address this need, HarmBench was designed around key considerations that previous red teaming evaluations had overlooked. Using HarmBench, a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses was conducted, yielding novel insights. In addition, HarmBench was used to develop a highly efficient adversarial training method that significantly enhances LLM robustness across a wide range of attacks, demonstrating how the framework enables the co-development of attacks and defenses.
There are two primary ways to use HarmBench:
1. Evaluating red teaming methods against a set of LLMs.
2. Evaluating LLMs against a set of red teaming methods.
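Both usages amount to filling in the same evaluation matrix: every red teaming method is run against every target LLM, and the resulting completions are judged by a shared classifier to produce attack success rates. The sketch below illustrates that idea only; the names `generate_test_cases`, `target_llm`, and `harm_classifier` are hypothetical placeholders, not HarmBench's actual API.

```python
# Conceptual sketch of a standardised red teaming evaluation matrix.
# All callables are illustrative placeholders, not HarmBench's real interfaces.

from typing import Callable, Dict, List, Tuple


def attack_success_rate(
    behaviors: List[str],                              # harmful behaviors to elicit
    generate_test_cases: Callable[[str], List[str]],   # red teaming method: behavior -> test cases
    target_llm: Callable[[str], str],                  # target model: prompt -> completion
    harm_classifier: Callable[[str, str], bool],       # shared judge: (behavior, completion) -> harmful?
) -> float:
    """Fraction of behaviors for which at least one test case elicits a harmful completion."""
    successes = 0
    for behavior in behaviors:
        test_cases = generate_test_cases(behavior)
        completions = [target_llm(tc) for tc in test_cases]
        if any(harm_classifier(behavior, c) for c in completions):
            successes += 1
    return successes / len(behaviors)


def evaluation_matrix(
    behaviors: List[str],
    methods: Dict[str, Callable[[str], List[str]]],
    targets: Dict[str, Callable[[str], str]],
    harm_classifier: Callable[[str, str], bool],
) -> Dict[Tuple[str, str], float]:
    """Attack success rate for every (red teaming method, target LLM) pair.

    Reading across one method compares it against many LLMs (usage 1);
    reading across one LLM compares it against many methods (usage 2).
    """
    return {
        (method_name, target_name): attack_success_rate(behaviors, method, target, harm_classifier)
        for method_name, method in methods.items()
        for target_name, target in targets.items()
    }
```

Holding the behavior set and the harm classifier fixed across all method-model pairs is what makes the resulting attack success rates comparable, which is the standardisation HarmBench provides.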
About the tool
Tags:
- evaluation
- robustness
- red-teaming
- llm