Policy evaluation

AI has the potential to reshape how public policies are assessed, offering faster, broader and more dynamic evaluations. By accelerating data collection and analysis, supporting causal inference and enabling real-time synthesis of evidence, AI tools can help governments better understand what works — and under what conditions. Yet adoption in this domain remains limited, with most applications still in the experimental stage and significant capacity gaps across public administrations.

The current state of play

AI is beginning to support policy evaluation in a variety of ways:

  • Supporting evaluation design and implementation. AI can assist in summarising previous evaluations and synthesising evidence across studies, particularly through text mining and document screening (a brief illustrative sketch follows this list). These tools reduce the time needed for systematic reviews and help evaluators develop stronger ex ante and ex post designs.
  • Supporting analytics. Natural language processing (NLP) and machine learning (ML) techniques are being used to analyse large sets of documents, identify programme logic and evaluate stakeholder inputs. These tools support both qualitative insight and predictive modelling in evaluation contexts.
  • Supporting management and communication. AI can generate evaluation summaries, draft terms of reference and assist with project planning. Large language models (LLMs) also power digital repositories that help users search and navigate thousands of past evaluations — improving access and reuse.
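
To make the document screening mentioned in the first bullet more concrete, the sketch below trains a simple relevance classifier over study abstracts using TF-IDF features and logistic regression. It is a minimal illustration under assumed data: the abstracts, labels and candidate texts are hypothetical placeholders rather than material from any real review, and an actual screening workflow would rely on much larger, validated corpora and human oversight of every decision.

```python
# Minimal sketch: prioritising study abstracts for a systematic review.
# All texts and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A handful of manually screened abstracts (1 = relevant to the evaluation question).
labelled_abstracts = [
    ("Randomised trial of a job-training subsidy on youth employment.", 1),
    ("Quasi-experimental study of an energy tax rebate and household emissions.", 1),
    ("Opinion piece on the history of administrative reform.", 0),
    ("Laboratory study of battery chemistry unrelated to policy outcomes.", 0),
]
texts, labels = zip(*labelled_abstracts)

# TF-IDF features plus logistic regression: a transparent baseline for
# ranking which documents a human evaluator should read first.
screener = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
screener.fit(texts, labels)

# Score new, unscreened abstracts by predicted probability of relevance.
new_abstracts = [
    "Impact evaluation of a conditional cash transfer on school attendance.",
    "Editorial on celebrity culture.",
]
for abstract, prob in zip(new_abstracts, screener.predict_proba(new_abstracts)[:, 1]):
    print(f"{prob:.2f}  {abstract}")
```

The point of such a tool is triage, not judgement: it orders the reading pile so evaluators spend their time on the studies most likely to matter.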

Despite early innovations, the field still faces key barriers: poor data quality, limited evaluator expertise in AI, and the lack of tailored training and infrastructure. Risks such as automation bias (i.e. over-reliance on AI), algorithmic opacity and skewed datasets also pose serious concerns for the legitimacy and rigour of AI-supported evaluations.

Examples from practice

  • Norway: Classifying cybercrime with AI in audit. Norway’s Office of the Auditor General used ML and text mining to analyse over 300,000 police cases. The results helped auditors evaluate national efforts to combat cybercrime, improving reliability in data classification.
  • European Commission: Multilingual summarisation. The eSummary tool, developed by the EU’s Directorate-General for Translation, uses AI to condense policy documents and generate summaries across all official EU languages — improving accessibility and decision support for evaluators.
  • OECD: AI-powered environmental policy evaluation. In collaboration with global partners, the OECD applied AI to assess over 1,500 environmental policies across 41 countries. The study identified key policy combinations that reduced emissions, offering new insights for climate governance.
  • World Bank: Portfolio-wide project analysis. Using unsupervised ML, the World Bank analysed 392 project reports to identify patterns in success and failure across aid-receiving countries — enhancing learning at scale.
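
The World Bank case above rests on unsupervised learning over report text. The sketch below shows the general shape of such an approach — clustering a few hypothetical report snippets with TF-IDF features and k-means — and is not a reconstruction of the Bank's actual methodology or data.

```python
# Minimal sketch: grouping project reports into recurring themes with
# unsupervised learning. The report snippets are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

reports = [
    "Rural road rehabilitation delayed by procurement bottlenecks.",
    "Procurement delays and cost overruns in highway construction.",
    "Teacher training programme improved primary school learning outcomes.",
    "School grants and teacher coaching raised literacy scores.",
]

# Represent each report as a TF-IDF vector, then cluster into two themes.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reports)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Inspect the top terms and member reports of each cluster.
terms = vectorizer.get_feature_names_out()
for cluster_id in range(2):
    top = kmeans.cluster_centers_[cluster_id].argsort()[::-1][:3]
    print(f"Cluster {cluster_id}: top terms {[terms[i] for i in top]}")
    for report, label in zip(reports, kmeans.labels_):
        if label == cluster_id:
            print("  -", report)
```

At portfolio scale, the same pattern lets analysts surface recurring drivers of success or failure that would be impractical to code by hand across hundreds of reports.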

Untapped potential and the way forward

AI can strengthen evaluation by enabling faster, more iterative analysis. From forecasting impacts to simulating alternatives and tracking public sentiment, AI supports both policy design and review. Approaches like dynamic evaluation and AI-enabled rapid reviews can improve agility during crises. Achieving this will require better data, infrastructure, trained evaluators and strong institutional support to embed evaluation into real-time decision making and promote evidence-based governance.
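
As a small illustration of the sentiment-tracking capability mentioned above, the sketch below scores hypothetical citizen comments with an off-the-shelf sentiment model from the Hugging Face transformers library. The comments are invented, the default English-language model is an assumption, and any real deployment would need the validation, multilingual coverage and bias safeguards discussed earlier.

```python
# Minimal sketch: tracking sentiment in citizen feedback about a policy.
# The comments are hypothetical; the default model is an English-language
# example and would need validation before informing any real evaluation.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

comments = [
    "The new permit portal saved me hours of queuing.",
    "The subsidy application process is confusing and slow.",
]

for comment, result in zip(comments, sentiment(comments)):
    print(f"{result['label']:>8} ({result['score']:.2f})  {comment}")
```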

Learn more

Review a detailed section on AI in policy evaluation here.