Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


Inspect: an open-source framework for large language model evaluations


Created by the UK AI Safety Institute, Inspect is a software library that enables testers (from start-ups, academia and AI developers to international governments) to assess specific capabilities of individual models and then produce a score based on the results. Inspect can be used to evaluate models in a range of areas, including their core knowledge, ability to reason, and autonomous capabilities. Released under an open-source licence, Inspect is freely available.

By making Inspect available to the global community, the Institute is helping to accelerate work on AI safety evaluations worldwide, leading to better safety testing and the development of more secure models. A common, openly available tool also supports a consistent approach to AI safety evaluations around the world.
Inspect provides many built-in components, including facilities for prompt engineering, tool usage, multi-turn dialog, and model graded evaluations.
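
The evaluate-then-score workflow described above can be illustrated with a minimal, self-contained Python sketch. This is not Inspect's API: the names `Sample`, `exact_match`, `evaluate` and `toy_model` are hypothetical and were chosen only for illustration, and the "model" is a stand-in lookup table rather than a real language model. The sketch shows the general idea only: run a model over a set of prompts, compare each output with a target, and report an aggregate score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One evaluation item: a prompt and the expected answer (hypothetical type)."""
    prompt: str
    target: str


def exact_match(output: str, target: str) -> bool:
    """Score one output: case-insensitive match, ignoring surrounding whitespace."""
    return output.strip().lower() == target.strip().lower()


def evaluate(
    model: Callable[[str], str],
    samples: List[Sample],
    scorer: Callable[[str, str], bool] = exact_match,
) -> float:
    """Run the model on every sample and return the fraction scored correct."""
    if not samples:
        return 0.0
    correct = sum(scorer(model(s.prompt), s.target) for s in samples)
    return correct / len(samples)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: answers from a fixed lookup table."""
    answers = {
        "What is 2 + 2?": "4",
        "What is the capital of France?": "Paris",
    }
    return answers.get(prompt, "I don't know")


samples = [
    Sample("What is 2 + 2?", "4"),
    Sample("What is the capital of France?", "paris"),
    Sample("Who wrote Hamlet?", "Shakespeare"),
]

accuracy = evaluate(toy_model, samples)
print(f"accuracy = {accuracy:.2f}")
```

In a real evaluation, the stand-in model would be replaced by calls to an actual language model, and the exact-match scorer could be swapped for a model-graded one, which is one of the built-in component types the description mentions. For the real interfaces, consult the Inspect documentation.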

For more information, see the press release "AI Safety Institute releases new AI safety evaluation platform".

About the tool


Developing organisation(s):

AISI - AI Security Institute

Objective(s):

Transparency

Country/Territory of origin:

United Kingdom

Type of approach:

Technical

Maturity:

Published document

Usage rights:

Open source/Permissive
Free of charge

Target users:

Developer
Government
Researcher

Tags:

collaborative governance
evaluation
large language model
open source
AISI


Use Cases

There are no use cases for this tool yet.


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.