Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Type

Origin

Scope

Clear all

SUBMIT A TOOL

If you have a tool that you think should be featured in the Catalogue of AI Tools & Metrics, we would love to hear from you!

SUBMIT
Target user(s) Developer

EducationalUnited KingdomUploaded on Dec 9, 2024
PLIM is designed to make benchmarking and continuous monitoring of LLMs safer and more fit for purpose. This is particularly important in high-risk environments, e.g. healthcare, finance, insurance and defence. Having community-based prompts to validate models as fit for purpose is safer in a world where LLMs are not static. 

Objective(s)


TechnicalProceduralUnited StatesUploaded on Dec 6, 2024
Vectice is a regulatory MLOps platform for AI/ML developers and validators that streamlines documentation, governance, and collaborative reviewing of AI/ML models. Designed to enhance audit readiness and ensure regulatory compliance, Vectice automates model documentation, from development to validation. With features like automated lineage tracking and documentation co-pilot, Vectice empowers AI/ML developers and validators to work in their favorite environment while focusing on impactful work, accelerating productivity, and reducing risk.

TechnicalUnited KingdomUploaded on Dec 6, 2024
Continuous automated red teaming for AI, minimize security threats to AI models and applications.

TechnicalInternationalUploaded on Dec 6, 2024
Evaluating machine learning agents on machine learning engineering.

Objective(s)


TechnicalUnited StatesUploaded on Nov 8, 2024
The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

ProceduralSingaporeUploaded on Oct 2, 2024
Resaro offers independent, third-party assurance of mission-critical AI systems. It promotes responsible, safe and robust AI adoption for enterprises, through technical advisory and evaluation of AI systems against emerging regulatory requirements.

ProceduralUnited KingdomUploaded on Oct 2, 2024
Warden AI provides independent, tech-led AI bias auditing, designed for both HR Tech platforms and enterprises deploying AI solutions in HR. As the adoption of AI in recruitment and HR processes grows, concerns around fairness have intensified. With the advent of regulations such as NYC Local Law 144 and the EU AI Act, organisations are under increasing pressure to demonstrate compliance and fairness.

ProceduralUploaded on Oct 2, 2024
FairNow is an AI governance software tool that simplifies and centralises AI risk management at scale. To build and maintain trust with customers, organisations must conduct thorough risk assessments on their AI models, ensuring compliance, fairness, and security. Risk assessments also ensure organisations know where to prioritise their AI governance efforts, beginning with high-risk models and use cases.

TechnicalUploaded on Nov 5, 2024
garak, Generative AI Red-teaming & Assessment Kit, is an LLM vulnerability scanner. Garak checks if an LLM can be made to fail.

TechnicalInternationalUploaded on Nov 5, 2024
A fast, scalable, and open-source framework for evaluating automated red teaming methods and LLM attacks/defenses. HarmBench has out-of-the-box support for transformers-compatible LLMs, numerous closed-source APIs, and several multimodal models.

TechnicalUnited StatesUploaded on Sep 9, 2024
Harms Modeling is a practice designed to help you anticipate the potential for harm, identify gaps in product that could put people at risk, and ultimately create approaches that proactively address harm.

TechnicalUnited StatesUploaded on Sep 9, 2024
Dioptra is an open source software test platform for assessing the trustworthy characteristics of artificial intelligence (AI). It helps developers on determining which types of attacks may impact negatively their model's performance.

TechnicalUnited StatesUploaded on Aug 2, 2024
AI Security Platform for GenAI and Conversational AI applications. Probe enables security officers and developers identify, mitigate, and monitor AI system security.

TechnicalUnited StatesUploaded on Jun 14, 2024
Based on the occurrence of specific events, the AIIA allows management and development teams to identify actual and potential impacts at the AI system level through a set of defined controls across stages of the system lifecycle.

TechnicalUnited KingdomUploaded on Jun 14, 2024
NayaOne is a Sandbox-as-a-Service provider to tier 1 financial services institutions, world-leading regulators, and governments. This sandbox is designed to address key concerns in AI deployment by providing a single environment where AI can be evaluated and procured while also enabling collaboration and access to world-leading tools.

ProceduralUnited KingdomUploaded on Jun 14, 2024
Panel to develop good practice in the use of new technologies like AI in the planning of the major infrastructure that are critical for the delivery of national goals such as net zero, resilience and nature recovery.

Related lifecycle stage(s)

Plan & design

ProceduralUnited KingdomUploaded on Jun 13, 2024
Provides a method for the robust assessment of whether AI systems meet the stringent requirements of national security bodies. The framework centres on a structured system card template for UK national security.

Objective(s)

Related lifecycle stage(s)

Operate & monitorDeployPlan & design

TechnicalSingaporeUploaded on Jun 25, 2024
Developed by the AI Verify Foundation, Moonshot is one of the first tools to bring Benchmarking and Red-Teaming together to help AI developers, compliance teams and AI system owners evaluate LLMs and LLM applications.

Objective(s)

Related lifecycle stage(s)

Verify & validate

TechnicalUnited KingdomUploaded on Jun 5, 2024
Advai Insight is designed for enterprise-level which require information on key insights and performance indicators. This tool provides monitoring solutions for all models and risks, giving advanced insights into the AI's performance

TechnicalUnited KingdomUploaded on Jun 5, 2024
Advai Versus is a tool for developers to test and evaluate a company's AI systems. Integrated within the MLOps architecture, Advai Versus can be used to test for biases, security, and other critical aspects, ensuring that the AI models are robust and fit for purpose.

catalogue Logos

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.