Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Resaro’s Performance and Robustness Evaluation: Facial Recognition System on the Edge

Oct 2, 2024

Resaro evaluated the performance of third-party facial recognition (FR) systems that run on the edge, assessing vendors’ claims about the performance and robustness of their systems under highly dynamic operational conditions.

As part of this, Resaro developed an end-to-end test protocol comprising a standardised testing process to ensure a fair test across vendors, context-specific evaluation datasets that simulate operational scenarios, and an analysis of threat vectors such as presentation attacks. The evaluation went beyond traditional accuracy metrics to identify the causes of missed detections and false alarms, and it rigorously tested computational demands, which are crucial for edge devices. The evaluation highlighted several trade-offs a buyer must consider when selecting an AI system, such as detection speed versus accuracy. Performance was also influenced by factors such as image quality and the size of the face in the image.
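To make the kinds of metrics involved concrete, here is a minimal sketch of such an evaluation. The per-trial record, field names, and threshold handling are illustrative assumptions, not Resaro's actual schema or protocol; the point is that missed detections (no face found) are counted separately from false non-matches, since the two point to different failure causes, and that tail latency is reported alongside accuracy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    """One verification trial from the evaluation set (hypothetical schema)."""
    is_mated: bool          # ground truth: probe and gallery are the same person
    score: Optional[float]  # similarity score; None means no face was detected
    latency_ms: float       # end-to-end processing time on the edge device

def evaluate(trials: list[Trial], threshold: float) -> dict[str, float]:
    """Summarise results beyond plain accuracy, separating detection
    failures from matching errors and reporting tail latency."""
    detected = [t for t in trials if t.score is not None]
    missed_rate = (len(trials) - len(detected)) / len(trials)

    mated = [t for t in detected if t.is_mated]
    non_mated = [t for t in detected if not t.is_mated]

    # FNMR: mated pairs wrongly rejected; FMR: non-mated pairs wrongly accepted.
    fnmr = sum(t.score < threshold for t in mated) / max(len(mated), 1)
    fmr = sum(t.score >= threshold for t in non_mated) / max(len(non_mated), 1)

    # Tail latency matters on resource-constrained edge hardware.
    latencies = sorted(t.latency_ms for t in trials)
    p95_latency = latencies[int(0.95 * (len(latencies) - 1))]

    return {"missed_detection_rate": missed_rate, "FNMR": fnmr,
            "FMR": fmr, "latency_p95_ms": p95_latency}
```

Stratifying these figures by factors such as image quality or face size would simply mean grouping the trials by those attributes before calling the same evaluation function.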

Existing benchmarks, such as NIST’s Face Recognition Vendor Test (FRVT), provide a baseline assessment of facial recognition algorithms under specific conditions, which may or may not reflect the real-world scenarios in which FR systems are deployed. For example, standard benchmarks assume high-powered server environments, which differ significantly from the edge processing requirements of this use case. In addition, not all vendors offer their NIST-tested algorithms to customers for commercial use. Resaro filled these gaps by designing a standardised testing protocol, a custom test dataset, and metrics that reflect the business and operational requirements.
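As an illustration of what a standardised cross-vendor protocol can look like, the sketch below hides each vendor SDK behind one common adapter so that every system is run through the identical trial list on the same hardware. The FRSystem interface and its match signature are assumptions made for illustration, not an API any vendor actually ships.

```python
import time
from abc import ABC, abstractmethod
from typing import Optional

class FRSystem(ABC):
    """Adapter each vendor SDK is wrapped in (illustrative interface)."""

    @abstractmethod
    def match(self, probe_path: str, gallery_path: str) -> Optional[float]:
        """Return a similarity score, or None if no face is detected."""

def run_protocol(systems: dict[str, FRSystem],
                 pairs: list[tuple[str, str, bool]]) -> dict[str, list]:
    """Run every vendor through the same trials so results are comparable."""
    results = {}
    for name, system in systems.items():
        start = time.perf_counter()
        # (score, ground_truth) per trial, in a fixed order for all vendors
        results[name] = [(system.match(p, g), mated) for p, g, mated in pairs]
        per_trial_ms = 1000 * (time.perf_counter() - start) / len(pairs)
        print(f"{name}: {per_trial_ms:.0f} ms/trial on this hardware")
    return results
```

The fairness of such a comparison comes from holding the trial list, hardware, and measurement points fixed; only the vendor adapter changes between runs.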

Benefits of using the tool in this use case

Resaro's evaluation method rigorously tests the claims made by FR vendors, providing an objective assessment that goes beyond the vendors' marketing materials. The testing approach uncovers the trade-offs and limitations inherent in different FR technologies. Additionally, Resaro's comprehensive evaluation of resource usage enables an informed decision about which system best matches the buyer's operational demands and constraints. This approach to evaluation empowers buyers of AI systems to judge the suitability of the available systems for their use cases and risk appetites, ultimately minimising security risks and potential reputational damage to the buyer.

Shortcomings of using the tool in this use case

While the evaluation protocol thoroughly assesses the performance and robustness of FR systems across a wide variety of operational conditions, it may not encompass all potential real-world scenarios, especially as the operational context of use is dynamic. Such scenarios may require the generation of additional, more representative evaluation datasets. The evaluation protocol fundamentally offers a snapshot of the current AI system's performance under an agreed set of operational conditions. Hence, continuous or periodic re-evaluation may be necessary when the core assumptions about the vendors' systems no longer hold (e.g., after a major system upgrade).

Link to the full use case.

This case study was published in collaboration with the UK Department for Science, Innovation and Technology Portfolio of AI Assurance Techniques. You can read more about the Portfolio and how to upload your own use case here.

About the use case

Developing organisation(s): Resaro

Country of origin: