Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

ShieldGemma



ShieldGemma is a set of instruction-tuned models for evaluating the safety of text and images against a set of defined safety policies.

The models support the development of generative AI systems by evaluating prompts and model outputs against predefined safety policies. ShieldGemma functions as a content safety evaluation component that can be integrated into generative AI applications to detect and prevent policy violations in generated content.

The model family includes classifiers for different modalities. ShieldGemma 1 focuses on text content moderation and is available in 2B, 9B, and 27B parameter sizes. ShieldGemma 2 extends these capabilities to images and provides a 4B-parameter model designed to classify the safety of synthetic and natural images.

These models are trained to identify violations across key harm categories such as sexually explicit content, dangerous content, hate, and harassment. Developers can use ShieldGemma as a filtering mechanism within generative AI pipelines, for example by checking prompts before they reach a model or by filtering outputs generated by AI systems.
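The filtering pattern described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ShieldGemma's own API: the `Scorer` callable, `check`, `guarded_generate`, the 0.5 threshold, and the refusal message are all hypothetical names and values introduced here. In a real pipeline, the scorer would wrap a call to a ShieldGemma model and return its estimated probability that the text violates a given policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical scorer signature: (text, policy_name) -> violation probability in [0, 1].
# In practice this would wrap a ShieldGemma model call; it is left abstract here.
Scorer = Callable[[str, str], float]

# Harm categories named in the description above.
POLICIES = ("sexually explicit content", "dangerous content", "hate", "harassment")

# Placeholder message returned when content is blocked (an assumption of this sketch).
REFUSAL = "This request or response was blocked by the safety filter."


@dataclass
class Verdict:
    allowed: bool
    violated_policy: Optional[str] = None
    score: float = 0.0


def check(text: str, scorer: Scorer, threshold: float = 0.5) -> Verdict:
    """Score text against every policy; block if the highest score reaches the threshold."""
    scores = {policy: scorer(text, policy) for policy in POLICIES}
    worst = max(scores, key=scores.get)
    if scores[worst] >= threshold:
        return Verdict(False, worst, scores[worst])
    return Verdict(True, None, scores[worst])


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    scorer: Scorer,
    threshold: float = 0.5,
) -> str:
    """Check the prompt before generation and the output after generation."""
    if not check(prompt, scorer, threshold).allowed:
        return REFUSAL  # input filter: the prompt never reaches the model
    output = generate(prompt)
    if not check(output, scorer, threshold).allowed:
        return REFUSAL  # output filter: the generated text is withheld
    return output
```

Keeping the scorer as a plain callable means the same gate can be reused with different model sizes or with a fine-tuned variant, and the threshold can be tuned per deployment.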

The models are provided with open weights and can be fine-tuned to adapt to specific use cases and safety policies defined by developers. By enabling automated classification of unsafe or policy-violating content, ShieldGemma supports developers in building generative AI applications that align with defined safety standards and reduce the risk of harmful outputs.

About the tool


Developing organisation(s):
Lifecycle stage(s):
Type of approach:
Geographical scope:
People involved:
Required skills:
Technology platforms:

Tags:

  • ai ethics
  • evaluation
  • ai generated content
  • ai safety

Use Cases

There are no use cases for this tool yet.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.