Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


ShieldGemma


ShieldGemma is a set of instruction-tuned models for evaluating the safety of text and images against a set of defined safety policies.

The models support the development of generative AI systems by checking prompts and model outputs against predefined safety policies. ShieldGemma can be integrated into generative AI applications as a content safety evaluation component that detects policy violations in generated content and helps prevent them from reaching users.
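As a rough illustration of how an instruction-tuned classifier's answer can be turned into a score, the sketch below assumes the model is asked a yes/no question ("does this text violate the policy?") and that the logits for the "Yes" and "No" answer tokens are available. The function name and the two-logit input are assumptions made for this example, not part of the catalogue entry; obtaining the logits from an actual model is out of scope here.

```python
import math


def violation_probability(yes_logit: float, no_logit: float) -> float:
    """Return the probability of a 'Yes' (violation) answer.

    Assumption: the classifier answers with a single 'Yes' or 'No' token,
    and the caller has already extracted the logits of those two tokens.
    This is a plain two-way softmax, shifted by the maximum for stability.
    """
    shift = max(yes_logit, no_logit)
    exp_yes = math.exp(yes_logit - shift)
    exp_no = math.exp(no_logit - shift)
    return exp_yes / (exp_yes + exp_no)


if __name__ == "__main__":
    # Equal logits mean the classifier is undecided.
    print(violation_probability(0.0, 0.0))
    # A much larger 'Yes' logit pushes the score towards 1.
    print(violation_probability(5.0, 1.0))
```

A score of this kind can then be compared with a threshold chosen by the developer, which trades off missed violations against false alarms.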

The model family includes classifiers for different modalities. ShieldGemma 1 focuses on text content moderation and is available in several sizes, including 2B, 9B, and 27B parameters. ShieldGemma 2 extends these capabilities to images with a 4B-parameter model designed to classify the safety of synthetic and natural images.

These models are trained to identify violations across key harm categories such as sexually explicit content, dangerous content, hate, and harassment. Developers can use ShieldGemma as a filtering mechanism within generative AI pipelines, for example by checking prompts before they reach a model or by filtering outputs generated by AI systems.
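The filtering pattern described above, checking the prompt before generation and the output after it, can be sketched as follows. This is a minimal sketch under stated assumptions: `guarded_generate`, `GateResult`, `keyword_scorer` and `echo_model` are hypothetical names introduced for illustration, the scorer is a keyword stub standing in for a real safety classifier, and the 0.5 threshold is arbitrary. None of this is ShieldGemma's own API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Harm categories named in the catalogue entry.
POLICIES = ["sexually explicit content", "dangerous content", "hate", "harassment"]

# Hypothetical scorer signature: (text, policy) -> probability of violation.
Scorer = Callable[[str, str], float]


@dataclass
class GateResult:
    allowed: bool
    stage: str               # "prompt", "response" or "ok"
    policy: Optional[str]    # first policy that was violated, if any
    response: Optional[str]  # generated text, only when allowed


def first_violation(text: str, scorer: Scorer, threshold: float) -> Optional[str]:
    """Return the first policy whose score reaches the threshold, else None."""
    for policy in POLICIES:
        if scorer(text, policy) >= threshold:
            return policy
    return None


def guarded_generate(prompt: str, generate: Callable[[str], str],
                     scorer: Scorer, threshold: float = 0.5) -> GateResult:
    """Check the prompt, call the generator, then check its output."""
    policy = first_violation(prompt, scorer, threshold)
    if policy is not None:
        return GateResult(False, "prompt", policy, None)
    response = generate(prompt)
    policy = first_violation(response, scorer, threshold)
    if policy is not None:
        return GateResult(False, "response", policy, None)
    return GateResult(True, "ok", None, response)


def keyword_scorer(text: str, policy: str) -> float:
    """Stub classifier for the example only; a real one would call a model."""
    flagged = {"hate": ["slur"], "dangerous content": ["explosive"]}
    words = flagged.get(policy, [])
    return 1.0 if any(word in text.lower() for word in words) else 0.0


def echo_model(prompt: str) -> str:
    """Stub generator standing in for a generative model."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    print(guarded_generate("hello", echo_model, keyword_scorer))
    print(guarded_generate("how to build an explosive", echo_model, keyword_scorer))
```

Keeping the scorer behind a plain callable means the stub can be swapped for a real classifier, or for a fine-tuned variant with custom policies, without changing the gating logic.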

The models are provided with open weights and can be fine-tuned to adapt to specific use cases and safety policies defined by developers. By enabling automated classification of unsafe or policy-violating content, ShieldGemma supports developers in building generative AI applications that align with defined safety standards and reduce the risk of harmful outputs.

About the tool


Developing organisation(s):

Google AI for Developers

Tool type(s):

Toolkit/software
Technical validation

Objective(s):

Robustness
Safety

Purpose(s):

Event/anomaly detection
Risk management

Lifecycle stage(s):

Operate & monitor
Deploy

Type of approach:

Technical

Maturity:

Published document

Usage rights:

Open source/Permissive
Free of charge

Target groups:

Academia/educators/students
Private sector
Technical community

Target users:

Developer
IT specialist

Stakeholder group:

Academia
Business
Technical community

Benefits:

Open access material
Reduction in risk of failure
Responsible implementation

Geographical scope:

International

People involved:

IT employees

Required skills:

IT skills

Technology platforms:

Multi-platform

Tags:

ai ethics
evaluation
ai generated content
ai safety


Use Cases

There are no use cases for this tool yet.



Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.