AIKoD data
Methodological note
Background information
Generative AI (genAI) models can generate a broad range of content, including text, code, images, audio, and other media. They are being applied across various contexts, ranging from cognitive tasks to manufacturing.
The AI Knowledge on Demand (AIKoD) database is the product of OECD research on developments in the AI market; see André et al. (2025) for more details. Data are collected by scraping the publicly available websites of cloud providers offering AI-as-a-Service via an API and recording the catalogue price of each model endpoint. The price information for each model is then matched with its publicly available benchmark performance from several sources. The dataset comprises models trained by AI developers from 14 countries and available from 51 cloud providers based in 11 countries.
The database aims to maximise the representativeness of the sample across countries and regions. However, access restrictions in some countries (e.g. China) and the rapidly changing AI cloud market – where new providers emerge monthly – may lead to underrepresentation of recent, fast-growing providers with unique offerings. Additionally, the database only includes publicly available models and does not account for strictly on-premise or non-disclosed options.
Dimensions of analysis
Each genAI model is uniquely characterised by three dimensions:
- Modality specifies how users interact with the model – whether through text, images, or sound – and the type of output it generates (e.g. text-to-image, audio-to-text, text-to-text, multimodal).
- Quality (performance) measures the model's capability to successfully perform a cognitive task (such as answering physics questions).
- Price of inference is the cost of using the model, reflecting the price of inputs (e.g. an hour of audio for audio-to-text models) and outputs (e.g. a generated image for text-to-image models).
Identifying genAI model origin and modality
The “provider” of the model is the cloud service firm that offers support and hosting services for the model. The “developer” is the firm that developed the model. A model’s country assignment is determined by the location of the headquarters of the provider or developer.
The dataset also includes additional information when available, in particular whether the model has been released as open-weight with a permissive license (allowing commercial use) and details on the model specification (model size, architecture, etc.).
Models are further categorised based on the type, or modality, of the input and output. Modalities in the AIKoD database can be text-to-text, text-to-image, text-to-audio or audio-to-text.
Frequency of data updates
The visualisations on OECD.AI include data from 2023 onwards. Data will be updated regularly.
Identifying unique genAI models
GenAI models are frequently updated and can vary across multiple dimensions. Careful consideration is needed when defining what constitutes a distinct model.
Therefore, models are identified using unique combinations of six technical characteristics: foundation model, variant, version, update date, number of parameters, and context window size.
- Foundation model is the name of the base model (e.g. GPT, LLaMA, etc.).
- Variant refers to a specific adaptation or configuration of a foundation model (e.g. Gemini-pro, Claude-sonnet, etc.). There are sometimes two variants (e.g. LLaMA-groq-preview).
- Version refers to a specific release iteration of a model or variant (e.g. GPT-4o, Deepseek-2.5, etc.).
- Update date is the most recent date on which the model (or its parameters/configuration) was updated or released (e.g. GPT-turbo-20240409, Mistral-large-202407).
- Number of parameters refers to the total number of trainable weights in the model (e.g. Mixtral-8×7, LLaMA-3-70).
- Context window size is the maximum number of tokens (e.g. words or parts of words) the model can process at once during input and output (e.g. QWEN-2.5-32k, Gemma-8k).
These characteristics are combined (with "unknown" inserted for unavailable characteristics) to define individual models (e.g. Gemma-groq-preview-3-unknown-70-8k, commandr-plus-unknown-unknown-20240801-104-12.8k).
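As an illustration, the construction of such identifiers can be sketched as follows. The function and field names are illustrative, not the database's actual schema:

```python
def model_id(foundation, variant=None, version=None,
             update_date=None, n_params=None, context_window=None):
    """Combine the six technical characteristics into a model identifier,
    inserting 'unknown' for unavailable characteristics."""
    parts = [foundation, variant, version, update_date, n_params, context_window]
    return "-".join(str(p) if p is not None else "unknown" for p in parts)

# Example: a LLaMA variant whose version and update date are unavailable
print(model_id("llama", "groq-preview", None, None, "70", "8k"))
# llama-groq-preview-unknown-unknown-70-8k
```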
Counts of models
Models were counted by summing the unique combinations of characteristics as defined in the previous section. The number of models can be summed across developers, providers, or both.
For example, we can count the number of models developed by French companies that are being offered by American cloud service providers. More specifically, this would include Microsoft providing cloud services and support for Mistral models.
Prices of models
The measurement of model price depends on the modality of the model and is a combination of input and output prices reflecting a representative user interaction.
The (blended) price of text-to-text models is expressed in USD per million tokens and measured as a combination of the price per token in the prompt ("price_input") and the price per token in the generated AI response ("price_output"). An additional adjustment for reasoning steps ("coefficient_reasoning") is made for reasoning models that charge for intermediary reasoning tokens, which are billed but not displayed. The reasoning coefficient is computed as the ratio of final output tokens to total output tokens, including reasoning tokens. When not explicitly available, the reasoning tokens are estimated using external information. For non-reasoning models the coefficient is set to 1. The price formula is computed as follows:
Price = α × price_input + (1 − α) × price_output × coefficient_reasoning
where α = input tokens / (input + output tokens) = 0.75
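The blended price formula can be sketched as follows; the function and parameter names are illustrative, and the coefficient follows the definition given above (1 for non-reasoning models):

```python
def blended_price(price_input, price_output,
                  coefficient_reasoning=1.0, alpha=0.75):
    """Blended price in USD per million tokens.

    alpha is the assumed share of input tokens in a representative
    interaction (0.75). coefficient_reasoning is 1 for non-reasoning
    models, and otherwise the ratio of final output tokens to total
    output tokens (including reasoning tokens).
    """
    return alpha * price_input + (1 - alpha) * price_output * coefficient_reasoning

# Illustrative prices: 3 USD per million input tokens,
# 15 USD per million output tokens
print(blended_price(3.0, 15.0))  # 0.75*3 + 0.25*15 = 6.0
```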
For audio-to-text models, the price is measured as price per 1 hour of input audio.
For text-to-image models, the price is measured as price per 100 generated images.
For text-to-sound models, the price is measured as price per 1 million input tokens.
When a model is available from more than one endpoint, its final price is the average of its prices across all providers.
Quality of models
Different variables serve as benchmarks to assess the performance of models. No single value is sufficient, and multiple benchmarks exist. Visualisations for quality on OECD.AI are modality dependent. The scores in the AIKoD database rely on benchmarks acquired from Artificial Analysis, with others from Hugging Face and LiveBench.
For text-to-text models, performance is computed by averaging across several benchmarks and sources (Hugging Face's MMLU scores, Arena ELO scores, the GPQA value, and LiveBench). More specifically, the quality (q) of a model is an index representative of cognitive tasks that comprises four sub-tasks of varying difficulty (qHS = high-school tasks, qgrad = graduate-level tasks, qpref = qualitative performance and qnovel = novel tasks), with a specific benchmark proxying the model's performance on each task.
q = wHS × qHS + wpref × qpref + wgrad × qgrad + wnovel × qnovel
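As an illustration, the weighted aggregation can be sketched as follows. The equal weights shown are purely an assumption for the example; the actual weights are not specified in this note:

```python
def quality_index(q_hs, q_grad, q_pref, q_novel,
                  weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted average of the four sub-task benchmark scores.

    Equal weights are illustrative only; the database's actual
    weights are documented in André et al. (2025).
    """
    scores = (q_hs, q_grad, q_pref, q_novel)
    return sum(w * s for w, s in zip(weights, scores))

# Equal-weight average of four illustrative sub-task scores
print(quality_index(0.8, 0.6, 0.7, 0.5))
```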
For audio-to-text models, the quality is calculated from the word error rate (WER) using the formula: quality_index = 1 – (WER / 100).
For text-to-image models, the quality is based on the “Model Quality ELO” from Artificial Analysis, normalized between 0 and 1.
For text-to-sound models, the quality is based on Artificial Analysis’ Arena ELO score normalized between 0 and 1.
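The audio and ELO-based quality measures above can be sketched as follows. The min-max bounds used for the ELO normalisation are an assumption, as the note does not specify how the rescaling to [0, 1] is performed:

```python
def wer_quality(wer_percent):
    """Quality index for audio-to-text models from the word error
    rate (WER, in percent): quality_index = 1 - (WER / 100)."""
    return 1 - wer_percent / 100

def normalise_elo(elo, elo_min, elo_max):
    """Min-max normalisation of an ELO score to the [0, 1] range.
    The bounds elo_min and elo_max are illustrative assumptions."""
    return (elo - elo_min) / (elo_max - elo_min)

print(wer_quality(12.0))                # ~0.88
print(normalise_elo(1100, 1000, 1200))  # 0.5
```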
Quality is averaged across all model offerings, even when the same model is offered by multiple providers.
References
André, Bétin, Gal and Peltier (2025), "Developments in Artificial Intelligence: new indicators based on model characteristics, prices and providers", OECD Publishing, Paris.
Experimental AI knowledge on Demand (AIKOD) database (internal OECD)