Implementing Responsible Data Enrichment Practices at an AI Developer: The Example of DeepMind

As demand for AI services grows, so, too, does the need for the enriched data used to train and validate machine learning (ML) models. While these datasets can only be prepared by humans, the data enrichment workers who do so (performing tasks like data annotation, data cleaning, and human review of algorithmic outputs) are an often-overlooked part of the development lifecycle, frequently working in poor conditions. Following the Partnership on AI’s (PAI) recent white paper on Responsible Sourcing of Data Enrichment Services, we (DeepMind) collaborated with PAI to develop our practices and processes for data enrichment. This included the creation of five steps AI practitioners can follow to improve the working conditions for people involved in data enrichment tasks (for more details, please visit PAI’s Data Enrichment Sourcing Guidelines):
- Select an appropriate payment model and ensure all workers are paid above the local living wage.
- Design and run a pilot before launching a data enrichment project.
- Identify appropriate workers for the desired task.
- Provide verified instructions and/or training materials for workers to follow.
- Establish clear and regular communication mechanisms with workers.
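As an illustration of the first step above, a team might sanity-check a proposed per-task payment model against the local living wage before launch. The figures and helper functions below are hypothetical, for illustration only; they are not DeepMind's or PAI's actual tooling:

```python
def effective_hourly_rate(pay_per_task: float, minutes_per_task: float) -> float:
    """Convert a per-task rate into an implied hourly wage."""
    return pay_per_task * (60.0 / minutes_per_task)


def meets_living_wage(pay_per_task: float, minutes_per_task: float,
                      local_living_wage: float) -> bool:
    """Check that the implied hourly rate is at or above the local living wage.

    minutes_per_task should ideally come from a pilot (step two above),
    since pre-launch estimates of task time are often optimistic.
    """
    rate = effective_hourly_rate(pay_per_task, minutes_per_task)
    return rate >= local_living_wage


# Hypothetical example: $0.12 per annotation, 30 seconds per annotation,
# and a local living wage of $12/hour.
# Implied rate: 0.12 * (60 / 0.5) = $14.40/hour, which clears the threshold.
print(meets_living_wage(0.12, 0.5, 12.0))
```

A real check would also need to account for unpaid time (reading instructions, waiting for tasks) and for rejected work, both of which lower the effective hourly rate.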
Together, we created the necessary policies and resources, gathering multiple rounds of feedback from our internal legal, data, security, ethics, and research teams in the process, before piloting them on a small number of data collection projects and later rolling them out to the wider organisation.
Benefits of using the tool in this use case
These tools can provide researchers and institutions with more clarity about how best to set up data enrichment tasks, which may improve researchers' confidence in study design and execution. At DeepMind, they not only increased the efficiency of our approval and launch processes but, importantly, also enhanced the experience of the people involved in data enrichment tasks.
Shortcomings of using the tool in this use case
While these best practices underpin our work, we shouldn’t rely on them alone to ensure our projects meet the highest standards of participant or worker welfare and safety in research. Each project at DeepMind is different, which is why we have a dedicated human data review process that allows us to continually engage with research teams to identify and mitigate risks on a case-by-case basis. This could lead to extensions or amendments to our best practices in the future, based on feedback.
Learnings or advice for using the tool in a similar context
While some AI organisations may have less infrastructure in place to make similar changes (DeepMind has an existing IRB-style review process for human participant research), the resources shared in this case study are designed to make it easier for organisations of any size to responsibly create enriched datasets. We hope that this leads to cross-sector conversations which could further develop these guidelines and resources for teams and partners. Through this collaboration we also hope to spark broader discussion about how the AI community can continue to develop norms of responsible data collection and collectively build better industry standards.