Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Advai: Operational Boundaries Calibration for AI Systems via Adversarial Robustness Techniques

Jun 5, 2024

To deploy AI systems safely and effectively in enterprise environments, organisations need a solid understanding of the systems' fault tolerances, established through adversarial stress-testing methods.

Our stress-testing tools identify vulnerabilities from two broad categories of AI failure:

1. Natural, human-meaningful vulnerabilities encompass failure modes that a human could hypothesise, e.g. a computer vision system struggling with a skewed, foggy, or rotated image.  

2. Adversarial vulnerabilities pinpoint where minor yet unexpected parameter variations can induce failure. These vulnerabilities not only reveal potential attack vectors but also signal broader system fragility. Notably, the methods for detecting adversarial vulnerabilities often reveal natural failure modes too (both categories are illustrated in the sketch below).
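
Advai's tooling itself is proprietary and not described in implementation detail here. Purely as an illustration of the two categories, a minimal PyTorch sketch might look as follows, where `model`, `image` and `label` are hypothetical placeholders (a classifier, a (C, H, W) float image in [0, 1], and a length-1 class-index tensor).

```python
# Illustrative sketch only -- not Advai's actual framework.
import torch
import torchvision.transforms.functional as TF

def natural_perturbations(image):
    """Category 1: human-meaningful corruptions a person could hypothesise."""
    return {
        "rotated": TF.rotate(image, angle=30.0),
        "blurred": TF.gaussian_blur(image, kernel_size=9),  # crude stand-in for fog
        "skewed":  TF.affine(image, angle=0.0, translate=[0, 0],
                             scale=1.0, shear=20.0),
    }

def adversarial_perturbation(model, image, label, eps=0.01):
    """Category 2: a minor, unexpected variation (FGSM-style) that can induce failure."""
    x = image.clone().unsqueeze(0).requires_grad_(True)      # add batch dimension
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()                                          # gradient w.r.t. the input
    return (x + eps * x.grad.sign()).clamp(0, 1).squeeze(0).detach()
```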

The process begins with "jailbreaking" AI models, a metaphor for stress-testing them to uncover hidden flaws. The system is presented with a range of adversarial inputs to identify the points at which the AI fails or responds in unintended ways. These adversarial inputs are crafted using state-of-the-art techniques that simulate potential real-world attacks or unexpected inputs the system may encounter.
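
As a hedged sketch of this sweep, reusing the hypothetical `adversarial_perturbation` helper above and assuming an evaluation set `eval_pairs` of (image, label) pairs, one could record accuracy as the perturbation budget grows and note where it collapses:

```python
def failure_sweep(model, eval_pairs, eps_grid=(0.001, 0.005, 0.01, 0.05, 0.1)):
    """Model accuracy under increasingly strong adversarial perturbations."""
    results = {}
    for eps in eps_grid:
        correct = 0
        for image, label in eval_pairs:
            x_adv = adversarial_perturbation(model, image, label, eps=eps)
            pred = model(x_adv.unsqueeze(0)).argmax(dim=1)
            correct += int((pred == label).item())
        results[eps] = correct / len(eval_pairs)
    return results  # maps perturbation budget -> accuracy at that budget
```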

 Advai's adversarial robustness framework then defines a model’s operational limits – points beyond which a system is likely to fail. This use case captures our approach to calibrating the operational use of AI systems according to their points of failure. 

Adversarial robustness testing is the gold standard for stress-testing AI systems in a controlled and empirical manner. It not only exposes potential weaknesses but also confirms the specific conditions under which the AI system can be expected to perform unreliably, guiding the formulation of precise operational boundaries.
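
One simple way to turn such results into an operational boundary, assuming the hypothetical `failure_sweep` output above and an organisation-chosen accuracy tolerance, is to take the largest perturbation severity at which the model still meets that tolerance. In practice the tolerance, the perturbation grid and the choice of attack would all be set per use case; the sketch only shows the shape of the calibration step.

```python
def operational_boundary(sweep_results, min_accuracy=0.90):
    """Largest perturbation severity at which accuracy still meets the tolerance."""
    passing = [eps for eps, acc in sorted(sweep_results.items())
               if acc >= min_accuracy]
    return max(passing) if passing else None

# Hypothetical usage:
#   boundary = operational_boundary(failure_sweep(model, eval_pairs))
# Inputs perturbed (or degraded) beyond this severity fall outside the
# calibrated operational scope and should be flagged or routed for review.
```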

Benefits of using the tool in this use case

  • Enhanced predictability and reliability of AI systems that are used within their operational scope, leading to increased trust from users and stakeholders.

  • A more objective risk profile that can be communicated across the organisation, helping technical and non-technical stakeholders align on organisational need and model deployment decisions.
  • Empowerment of the organisation to enforce an AI posture that meets industry regulations and ethical standards through informed boundary-setting.

Shortcomings of using the tool in this use case

  • While adversarial testing is thorough, it is not exhaustive and might not account for every conceivable scenario, especially under rapidly evolving conditions.

  • The process requires expert knowledge and continuous re-evaluation to keep pace with technological advancements and emerging threat landscapes.
  • Internal expertise is needed to match the failure induced by adversarial methods with the organisation’s appetite for risk in a given use-case.
  • There is a trade-off between the restrictiveness of operational boundaries and the AI's ability to learn and adapt; overly strict boundaries may inhibit the system's growth and responsiveness to new data.
     

This case study was published in collaboration with the UK Department for Science, Innovation and Technology's Portfolio of AI Assurance Techniques. Further information about the Portfolio, including how to submit your own use case, is available from DSIT.
