Three steps for businesses to make AI data and compute more sustainable

Concerns about generative AI are fuelling global discussions about the need for guardrails to protect against bias and discriminatory outcomes, to ensure safety and reliability, and to limit potential harms to children and labour markets. Generative AI is a category of artificial intelligence (AI) techniques and models that generate new and original content, such as images, text, music, or even videos. Unlike traditional AI systems that rely on pre-programmed rules or explicit instructions, generative AI models are trained to learn and mimic patterns from vast amounts of data.

Data is a critical input into training AI models, and few have considered the carbon cost involved simply because many believe data to be carbon neutral. Over the last decade, there has been an explosion of data. While AI has been identified as a mechanism to manage the currently estimated doubling of global data every two years, there has been little consideration given to how AI itself is contributing to this growth and the environmental impacts associated with this.

As workplace behaviours are becoming more technology-oriented, organisations as collections of individuals are increasingly reliant on new decision-making tools to function in the new data environment. AI is often seen as an exciting and cutting-edge solution, and yet there is a lack of understanding in organisations about what AI requires. A critical consideration is data.

To build generative AI models, data flows from acquisition, whether from existing or synthetic data sets, through to its use in training models and on to ongoing storage. At each step of the data flow, CO2 is generated, and this remains a hidden contributor to emissions from global energy use. The use of data and its infrastructure carries environmental implications that governments must assess as part of Good Practice Principles for Data Ethics, especially within the context of generative AI.

Generative AI is powerful and risky

Discussions about AI guardrails have intensified because of Large Language Models’ (LLMs) enhanced capabilities to build generative AI models. An LLM is a sophisticated AI system designed to process and generate human-like textual responses based on input. These models are typically trained on vast amounts of text data to learn the patterns, grammar, and semantics of one or more languages. They utilise deep learning techniques, such as transformers, to understand and generate coherent and contextually appropriate responses.

Models like GPT-3 stand out for their extensive size and capacity, boasting an impressive 175 billion parameters, making them among the largest language models developed thus far. With such a vast number of parameters, LLMs are remarkably versatile and accurate in comprehending and generating text across a wide range of topics and writing styles.

However, the environmental impact of training LLMs is a risk that often goes unnoticed. The data storage requirements of these models, coupled with the compute-intensive nature of AI, add to the overall energy consumption of building and operating AI engines. As AI applications become more prevalent, the demand for computational resources and energy continues to rise, contributing to significant environmental challenges and impacts.

To put this into context, the energy supply sector, as recognised by the UN, is the largest contributor to greenhouse gas emissions, accounting for 35% of all global emissions. The data infrastructures supporting AI significantly contribute to this energy demand, which will continue to grow in magnitude as AI becomes the new normal. For example, despite the huge steps forward in the energy efficiencies of data centres over the last decade, they alone are reported to be responsible for at least 2.5% of all human-generated carbon dioxide emissions, surpassing the carbon footprint of the aviation industry (2.1%) at pre-global pandemic flight levels.

Decisions taken by organisations embarking on the AI journey play a crucial role, because some data centres are far more energy-intensive than others and the environmental impact of data decisions can therefore vary hugely. To illustrate, it has been claimed that if existing data centres across Europe with a Power Usage Effectiveness (PUE, the ratio of a facility's total energy use to the energy used by its IT equipment) above 2.0 reduced it to 1.4, the energy saved would power a city the size of Hamburg for a year.
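
To make the PUE comparison concrete, here is a minimal sketch of the arithmetic. It relies only on the standard definition of PUE as total facility energy divided by IT equipment energy. The function names and the IT load figure are hypothetical, chosen purely for illustration, and are not taken from the claim cited above.

```python
def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy implied by an IT load and a PUE value."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0 by definition")
    if it_energy_kwh < 0:
        raise ValueError("IT energy must be non-negative")
    return it_energy_kwh * pue


def saving_fraction(pue_before: float, pue_after: float) -> float:
    """Share of total facility energy saved for the same IT load."""
    if pue_before < 1.0 or pue_after < 1.0:
        raise ValueError("PUE cannot be below 1.0 by definition")
    return 1.0 - pue_after / pue_before


if __name__ == "__main__":
    # Hypothetical annual IT load of one data centre, in kWh.
    it_load_kwh = 1_000_000
    before = facility_energy_kwh(it_load_kwh, 2.0)
    after = facility_energy_kwh(it_load_kwh, 1.4)
    print(f"Energy saved: {before - after:,.0f} kWh")
    print(f"Share of facility energy saved: {saving_fraction(2.0, 1.4):.0%}")
```

For a fixed IT load, moving from a PUE of 2.0 to 1.4 cuts total facility energy by roughly 30%, which is why the choice of data centre can matter as much as the choice of model.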

Three steps that can reduce the environmental impact of AI computing

Today, efforts to create AI guardrails primarily focus on ensuring fairness, transparency, accountability, and safety. Recognising and addressing the environmental implications of AI is just as urgent, if not more so, and yet appears absent from recent government AI narratives.

Thankfully, there are ways to mitigate environmental damage from AI. Here, we outline three distinct measures that, if used together and methodically, can make a real difference:

  • Sustainable data for AI algorithms – Ensuring ethical data sourcing and respecting privacy rights and environmental considerations form the bedrock of promoting sustainable AI practices. The level of sophistication and accuracy required for the analytics in an AI project is closely linked to the number and size of datasets needed to build the system. A thoughtful evaluation of analytics requirements becomes crucial, and tools like the data carbon ladder can prove invaluable in forecasting the CO2 impact of future data projects from their proposed data flow.

    For example, can data be accessed at the source or must it be copied to a local host, which carries a data CO2 impact? Delving into the intricacies of analytics, such as decisions regarding the necessity of real-time updates versus periodic updates, should be carefully weighed to determine the most environmentally conscious approach. Employing environmentally sustainable choices throughout the AI project’s lifecycle can significantly impact the overall environmental footprint. It is essential to recognise that every decision made in AI projects could have long-term environmental consequences. By prioritising sustainable practices and data flow, we can contribute to minimising the environmental impact of AI over time, from its origin through to end-use, promoting data circularity.
  • Embrace responsible data management practices – Responsible data management can play a pivotal role in minimising unnecessary data storage and reducing the environmental impact of AI. AI models often demand extensive datasets for training, and inefficient data storage practices can result in heightened energy consumption and environmental strain. By adopting data management strategies that prioritise data minimisation, efficient storage, and responsible data disposal, we can significantly decrease the ecological footprint of AI. The alternative is that data is used, stored and then forgotten, becoming dark data that carries a hidden CO2 cost. In our quest to expedite AI model development, it is crucial to consider best practices throughout the journey. Taking ownership of responsible data management is an area where we can make a meaningful impact. As we endeavour to build AI models, we require vast amounts of data for training, often utilising incremental datasets. However, it is essential to consider what happens to these datasets once they have served their purpose.

    There is often a disconnect between our data storage practices and their environmental impact. As an example, a recent publication from Loughborough University, co-authored by Amazon Web Services and the London Data Company, identifies as much as 80% of on-premise primary business data as potentially dark or Redundant, Obsolete, and Trivial (ROT) data. That can mean thousands of terabytes.

    To prevent such cases from occurring in AI, organisations can forecast the environmental impact of storing data. They can use the following formula, presented in a recent publication in the Knowledge Management Research and Practice Journal:

    Carbon emissions from data transfer and storage (in CO2e) = (size of data set in GB × power consumption in kWh per hour) × 24 hours per day × number of days the data set will be stored × 0.23314 (conversion from 1 kWh to carbon output).

    Consistently using tools like this, organisations can develop a better understanding of the carbon emissions produced by data storage and prioritise sustainability over time.
  • Assess the need for cutting-edge AI analytics – Energy efficiency optimisation in infrastructure presents a significant opportunity for reducing AI’s carbon footprint. By incorporating energy-efficient hardware and software solutions, AI systems can minimise energy consumption and operate more sustainably. Nevertheless, there is a crucial step that warrants consideration before embarking on any new data project: determining the level of AI our project truly requires. While this may seem obvious, many project-based teams will not have sufficient understanding and knowledge of AI network, compute, and storage implications.

    While the allure of using the latest AI technology on our datasets may be tempting, it is essential to scrutinise how such advancements genuinely enhance our decision-making processes. Tools like the data carbon scorecard, which are key aids to promoting behavioural change and environmental efficiency, evaluate whether descriptive, predictive, or prescriptive analytics would meet project needs better than AI. If more advanced analytics are needed, environmental gains can come from rationalising the number and size of datasets required to build a system. In our pursuit of effective sustainability, striking a balance between cutting-edge AI technologies and more environmentally friendly alternatives becomes paramount. By considering the potential environmental impact of our planned data projects and thoughtfully selecting the most suitable approach to fulfil project requirements, we can make informed decisions that contribute positively to our ecological responsibilities.
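
The storage formula in the second step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' tool: the power consumption term is read here as kWh per gigabyte per hour so that the units multiply through, the 0.23314 factor is taken to be kilograms of CO2e per kWh, and the function name and example figures are hypothetical.

```python
# Conversion factor quoted in the article (1 kWh to carbon output).
# Assumed here to be kilograms of CO2e per kWh.
CO2E_PER_KWH = 0.23314


def storage_co2e(
    size_gb: float,
    power_kwh_per_gb_hour: float,
    days_stored: float,
    co2e_per_kwh: float = CO2E_PER_KWH,
) -> float:
    """Estimate emissions from storing a data set.

    Follows the article's formula:
    size x power consumption x 24 hours x days stored x conversion factor.
    The per-gigabyte reading of power consumption is an assumption.
    """
    if min(size_gb, power_kwh_per_gb_hour, days_stored, co2e_per_kwh) < 0:
        raise ValueError("all inputs must be non-negative")
    energy_kwh = size_gb * power_kwh_per_gb_hour * 24 * days_stored
    return energy_kwh * co2e_per_kwh


if __name__ == "__main__":
    # Hypothetical example: a 1,000 GB training set kept for one year,
    # with an illustrative (not measured) power figure.
    estimate = storage_co2e(size_gb=1000, power_kwh_per_gb_hour=1e-5, days_stored=365)
    print(f"Estimated storage emissions: {estimate:.2f} (CO2e, assumed kg)")
```

Because every term is multiplied, halving either the size of a data set or the time it is retained halves the estimate, which is the practical argument for data minimisation and timely disposal.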

AI sustainability through awareness and working together

AI sustainability will require all actors of the AI ecosystem to work together to reduce data’s CO2 footprint. This ecosystem encompasses multiple levels and involves many stakeholders, each crucial in ensuring a greener and more responsible future for AI.

These three steps are a means to integrate sustainability into AI guardrails. By aligning AI’s growth with broader sustainability goals and digital decarbonisation, we can promote a more sustainable and responsible integration of AI technologies across society. Through collective efforts and co-action, we can harness the potential of AI while minimising its environmental footprint, ultimately paving the way for a more sustainable and equitable future that preserves the health of our planet.

Tom and Ian developed the Digital Decarbonization Toolkit, a range of free tools to help organisations on their net zero data journey (learn more at digitaldecarb.org).



Disclaimer: The opinions expressed and arguments employed herein are solely those of the authors and do not necessarily reflect the official views of the OECD or its member countries. The Organisation cannot be held responsible for possible violations of copyright resulting from the posting of any written material on this website/blog.