Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Advai: Robustness Assurance Framework for Guardrail Implementation in Large Language Models (LLMs)

Jun 5, 2024

Advai: Robustness Assurance Framework for Guardrail Implementation in Large Language Models (LLMs)

To assure and secure LLMs up to the standard needed for business adoption, Advai provides a robustness assurance framework designed to test, detect, and mitigate potential vulnerabilities. This framework establishes strict guardrails that ensure the LLM's outputs remain within acceptable operational and ethical boundaries, in line with parameters set by the organisation. Our adversarial attacks have been optimised across multiple models with different architectures, therefore relevant to a broad range of LLMs. Not only do these attacks reveal potential causes of natural failure, but we can therefore immunise client LLMs against similar attacks, enhancing the guardrail's longer-term effectiveness.

The robustness assurance framework is a proactive strategy to instil trust and reliability in LLMs before deployment, aiming to pre-emptively address potential risks and vulnerabilities. Our approach embodies the belief that language model assurance must come ‘first not last’, because the consequences of vulnerable language models can be significant. 

Benefits of using the tool in this use case

Offers a systematic method to adjust the operational scope of LLMs to match varying contexts and risk profiles.

  • Empowers stakeholders with a clear understanding of the LLM's operational limits. Organisations with knowledge of failure modes can design uses for this system within those boundaries.
  • Ensures compliance with regulatory requirements, fostering trust and wider acceptance.
  • New adversarial threats seem to emerge weekly. Keeps the LLM's guardrails current with the latest adversarial strategies, maintaining a state of preparedness against emerging threats.

Shortcomings of using the tool in this use case

The necessity for continuous investment and updating of a rewards prompt dataset and surrounding guardrail activities is resource demanding.

  • Understanding the nuances of matching contextual needs to the robustness framework requires specialised knowledge from the business side.
  • Ultimately there is no guarantee that these pre-emptive measures will be 100% effective. They simply skew the risk calculation in favour of deploying the LLM to business users or customer facing applications.
     

Related links: 

This case study was published in collaboration with the UK Department for Science, Innovation and Technology Portfolio of AI Assurance Techniques. You can read more about the Portfolio and how you can upload your own use case here.

Modify this use case

About the use case


Objective(s):