Preqin data

Methodological note

OECD.AI estimates venture capital (VC) financial investment in AI and data firms worldwide based on private-source data from Preqin, processed by the AI lab of the Jožef Stefan Institute, Slovenia, and analysed by the OECD. Preqin is a private company, founded in 2003, which collects data regarding private equity transactions, funds, and fund managers.

Deal information provided by Preqin covers the firm raising VC investment, the deal itself, and the investors. Information about the firm includes the name of the company, the country where it is located, the year it was established, a description of its activities, a classification of the industries in which it operates, as well as a set of cross-industry classifications, labelled “verticals”. Information about the deal includes the date, the stage (e.g. seed funding, round A, etc.), and the amount of the deal. Information about the investors includes their names and the country where they are located.

An AI start-up is considered to be a private company that researches and delivers all or part of an AI system or researches and delivers products and services that rely significantly on AI systems. The definition of an AI system follows that of the OECD principles: “An AI system is a machine-based system that is capable of influencing the environment by making recommendations, predictions, or decisions for a given set of objectives. It does so by utilising machine and/or human-based inputs/data to i) perceive real and/or virtual environments; ii) abstract such perceptions into models manually or automatically, and iii) use Model Interpretations to formulate options for outcomes.” A data start-up is considered to be a private company that provides solutions for large volumes of data, through data gathering, storing, or analysis.

Start-ups are identified as AI or data start-ups based on Preqin’s cross-industry and vertical categorisation, as well as on the OECD’s automated analysis of the keywords contained in the description of the company’s activities. AI keywords are of three kinds: generic AI keywords, such as “artificial intelligence” and “machine learning”; keywords pertaining to AI techniques, such as “neural network”, “deep learning” and “reinforcement learning”; and keywords referring to fields of AI application, such as “computer vision”, “predictive analytics”, “natural language processing” and “autonomous vehicles”. Data keywords include “data management”, “data collection” and “data tracking”. Firms with keywords related to digital security, cloud computing, or telecommunications were not considered data-related firms.

AI start-ups can also be focused on generative AI. The keywords pertaining to this field include “generative ai”, “generative artificial intelligence”, “generative adversarial network”, “creative adversarial network”, “text generation”, “image generation”, “audio generation”, “generative model”, “stable diffusion”, “chat gpt”, “creative ai”, “creative artificial intelligence”, “style transfer”, “content generation”, “creative coding”, “coding assistant” and “code generation”.

AI start-ups can likewise be focused on compute. The keywords pertaining to this field include “compute”, “data centre”, “semiconductor”, “GPU”, “CPU”, “high-performance compute”, “core software system”, “processor chip”, “infrastructure-as-a-service”, “neuromorphic computing”, “full-stack”, “integrated circuit”, “FPGA” and “computing chips”.
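As an illustration only, keyword-based identification of this kind could be sketched as follows. This is not the OECD's actual pipeline: the function names, the whole-phrase matching rule, and the choice to treat any generative AI match as also being an AI match are assumptions made for the example, and the keyword lists are short subsets of those quoted above. The Preqin categorisation step and the exclusion of digital security, cloud computing, and telecommunications firms are not modelled.

```python
import re

# Subsets of the keyword lists quoted in the note (shortened for brevity).
AI_KEYWORDS = [
    "artificial intelligence", "machine learning", "neural network",
    "deep learning", "reinforcement learning", "computer vision",
    "predictive analytics", "natural language processing",
]
GENERATIVE_AI_KEYWORDS = [
    "generative ai", "generative artificial intelligence",
    "text generation", "image generation", "code generation",
]
DATA_KEYWORDS = ["data management", "data collection", "data tracking"]


def matches_any(description, keywords):
    """Return True if any keyword occurs as a whole phrase (case-insensitive)."""
    text = description.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None
        for keyword in keywords
    )


def classify(description):
    """Return the set of labels matched by a company description.

    Assumption: a generative AI match also counts as an AI match.
    """
    labels = set()
    if matches_any(description, GENERATIVE_AI_KEYWORDS):
        labels.update({"ai", "generative_ai"})
    if matches_any(description, AI_KEYWORDS):
        labels.add("ai")
    if matches_any(description, DATA_KEYWORDS):
        labels.add("data")
    return labels
```

For example, `classify("We build deep learning models for computer vision.")` returns a set containing only `"ai"`, while a description with none of the keywords returns an empty set.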

Deals reported as “Secondary Stock Purchase”, “Mergers” or “Add-ons” were excluded from the analysis because they do not correspond to the financing of start-ups, i.e. the money does not go to the start-ups to fund their development. They are instead secondary-market transactions in which the money goes directly from one investor to another.
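A minimal sketch of this exclusion step is given below. The three deal-type labels are those quoted above; the record layout and the `deal_type` field name are hypothetical and do not reflect Preqin's actual schema.

```python
# Deal types excluded from the analysis, as listed in the note.
EXCLUDED_DEAL_TYPES = {"Secondary Stock Purchase", "Mergers", "Add-ons"}


def keep_financing_deals(deals):
    """Drop deals whose type is a secondary-market transaction.

    `deals` is a list of dicts with a hypothetical "deal_type" key.
    """
    return [deal for deal in deals if deal["deal_type"] not in EXCLUDED_DEAL_TYPES]
```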

Preqin data was processed and categorised by country and industry. The industry categorisation is based on grouping 228 Preqin industry labels into 20 broader categories. The industries considered are:

  • IT infrastructure and hosting
  • Media, social platforms, marketing
  • Business processes and support services
  • Healthcare, drugs, and biotechnology
  • Robots, sensors, IT hardware
  • Financial and insurance services
  • Digital security
  • Mobility and autonomous vehicles
  • Education and training
  • Logistics, wholesale, and retail
  • Consumer products
  • Travel, leisure, and hospitality
  • Agriculture
  • Energy, raw materials, and utilities
  • Consumer services
  • Government, security, and defence
  • Environmental services
  • Construction and air conditioning
  • Real estate
  • Food and beverages

Many of the reported investment transactions (deals) do not include the amount invested, e.g. 18% of the deals for US AI start-ups and 63% for Chinese AI start-ups from 2012 to 2020. Where possible, missing amounts were estimated as the median amount of comparable deals, clustered by country of the start-up, investment year, and investment stage.
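The median-based estimate can be illustrated with a short sketch. It assumes that each deal is a dictionary with hypothetical `country`, `year`, `stage` and `amount` keys, and that a missing amount is stored as `None`. A deal with no comparable cluster keeps its missing amount, which is one reading of “where possible”.

```python
from collections import defaultdict
from statistics import median


def impute_missing_amounts(deals):
    """Fill missing deal amounts with the median of comparable deals.

    Deals are comparable when they share (country, year, stage).
    Returns new dicts with an added "estimated" flag; inputs are not modified.
    """
    # Collect the reported amounts for each cluster.
    reported = defaultdict(list)
    for deal in deals:
        if deal["amount"] is not None:
            reported[(deal["country"], deal["year"], deal["stage"])].append(deal["amount"])

    result = []
    for deal in deals:
        filled = dict(deal)
        key = (deal["country"], deal["year"], deal["stage"])
        if deal["amount"] is None and key in reported:
            filled["amount"] = median(reported[key])
            filled["estimated"] = True
        else:
            # Either the amount was reported, or no comparable cluster exists.
            filled["estimated"] = False
        result.append(filled)
    return result
```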

When considering the origin of the financing, a number of deals have no investor identified (e.g. about 16% of AI deals from 2012 to 2020). The estimates prorate the value of those deals to the different countries in the sample following the distribution of deals with reported investors.
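One possible reading of this prorating step is sketched below. The note does not say whether the distribution is measured by deal count or by deal value; the sketch assumes value shares, and the function name and inputs are hypothetical.

```python
def prorate_unattributed(value_by_country, unattributed_value):
    """Spread the value of deals with no identified investor across countries.

    Assumption: each country receives a share proportional to the value
    already attributed to it from deals with reported investors.
    """
    total = sum(value_by_country.values())
    if total == 0:
        # No reported distribution to follow; leave the figures unchanged.
        return dict(value_by_country)
    return {
        country: value + unattributed_value * value / total
        for country, value in value_by_country.items()
    }
```

With 60 attributed to one country and 40 to another, an unattributed value of 10 would be split 6 and 4, so that the total value is preserved.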

When a single round of financing includes multiple investors, Preqin data does not specify how much each investor has contributed. For such deals, the invested value is split equally between investors.
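The equal split can be sketched as follows, aggregating the resulting shares by investor country. The `amount` and `investor_countries` field names are hypothetical; the list is assumed to hold one country entry per investor in the round.

```python
from collections import defaultdict


def attribute_by_investor_country(deals):
    """Split each deal's value equally between its investors and sum by country."""
    totals = defaultdict(float)
    for deal in deals:
        countries = deal["investor_countries"]  # one entry per investor
        if not countries:
            continue  # deals with no identified investor are handled separately
        share = deal["amount"] / len(countries)
        for country in countries:
            totals[country] += share
    return dict(totals)
```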

For more information and findings about venture capital investments in AI start-ups, please see:

OECD (2021), Venture Capital Investments in Artificial Intelligence: Analysing trends in VC in AI companies from 2012 through 2020. OECD Publishing, Paris.