Stack Overflow

Methodological note

Background

Stack Overflow is a question-and-answer platform for programmers. The site receives tens of millions of visits every month from people who ask questions, learn, and share technical knowledge. It functions as a large collection of detailed, high-quality information relating to programming.

The website is a valuable resource for AI practitioners, data scientists and machine learning experts around the world and has emerged as one of the main online platforms where programmers convene to discuss a variety of AI-related topics.

Identifying AI questions and answers

OECD.AI uses the public Stack Overflow API to analyse how AI-related questions and answers evolve over time. To identify questions and answers relating to AI, OECD.AI filters the data using the tags that Stack Overflow makes available. Question askers use tags to reach the appropriate respondents, and respondents use them to identify the subject of the questions they answer. Stack Overflow users also aggregate and categorise specific tags to identify the subjects belonging to a major area, such as AI and Machine Learning. OECD.AI uses this list of AI-related tags to identify concepts related to machine learning (e.g., unsupervised-learning, neural-network, computer-vision) as well as symbolic AI (e.g., fuzzy-logic, expert-system). Over 60 Stack Overflow tags have been identified as being related to AI (Table 1).
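Tag-based filtering amounts to a set intersection: a question is kept if at least one of its tags is on the AI list. The following is a minimal Python sketch, not OECD.AI's actual pipeline. It assumes each question is a dict with a `tags` list (mirroring the `tags` field returned by the Stack Exchange API) and uses only a small, hypothetical subset of the AI tags for brevity; the function names are illustrative.

```python
# Hypothetical subset of the AI tag list, kept short for illustration.
AI_TAGS = frozenset({
    "machine-learning", "neural-network", "computer-vision",
    "unsupervised-learning", "fuzzy-logic", "expert-system",
})


def is_ai_related(tags, ai_tags=AI_TAGS):
    """Return True if any of a question's tags is in the AI tag set."""
    return not ai_tags.isdisjoint(tags)


def filter_ai_questions(questions, ai_tags=AI_TAGS):
    """Keep only questions carrying at least one AI-related tag."""
    return [q for q in questions if is_ai_related(q["tags"], ai_tags)]


# Hypothetical question records.
questions = [
    {"question_id": 1, "tags": ["python", "scikit-learn", "machine-learning"]},
    {"question_id": 2, "tags": ["javascript", "css"]},
    {"question_id": 3, "tags": ["fuzzy-logic"]},
]
print([q["question_id"] for q in filter_ai_questions(questions)])  # [1, 3]
```

On Stack Overflow, tags are attached to questions rather than answers, so a sketch like this would classify answers through their parent question.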

Location analysis

Contributions to AI questions and answers are mapped to a country based on location information at the user level.

Stack Overflow location: users can provide their location in their Stack Overflow profiles. The locations provided by contributors are not standardised and can refer to different levels (e.g., suburban, urban, regional, or national). To allow cross-country comparisons, GeoPy is used to standardise all available locations to the country level. If a location cannot be resolved to a country, the contributor's location field is left blank.

As of November 2025, roughly 36% of users could be mapped to a country using this methodology.
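The standardisation step and the resulting coverage share can be sketched as follows. This is a minimal sketch under stated assumptions: the geocoder is injected as a plain function returning a country code or `None`, so the example runs offline. In practice it might wrap a GeoPy geocoder such as `Nominatim` and read the country from the returned address details, but that wiring is an assumption and is not shown. `FAKE_GAZETTEER` and `fake_geocode` are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional

# A geocoder maps free-text location to a country code, or None if unresolved.
Geocoder = Callable[[str], Optional[str]]


def standardise_location(raw: Optional[str], geocode: Geocoder) -> Optional[str]:
    """Return a country code for a profile location, or None (field left blank)."""
    if not raw or not raw.strip():
        return None
    try:
        return geocode(raw.strip())
    except Exception:
        # Treat geocoding errors the same way as an unresolved location.
        return None


def coverage(locations: Iterable[Optional[str]], geocode: Geocoder) -> float:
    """Share of users whose profile location resolves to a country."""
    results = [standardise_location(loc, geocode) for loc in locations]
    if not results:
        return 0.0
    return sum(code is not None for code in results) / len(results)


# Hypothetical lookup table standing in for a real geocoder.
FAKE_GAZETTEER = {"paris": "FR", "bay area, ca": "US", "bavaria": "DE"}


def fake_geocode(text: str) -> Optional[str]:
    return FAKE_GAZETTEER.get(text.lower())


users = ["Paris", "Bay Area, CA", "", None, "somewhere on earth"]
print(coverage(users, fake_geocode))  # 0.4
```

Users with an empty or unresolvable location still count in the denominator, which is what makes the result a coverage share of all users.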

Questions, answers, and accepted answers

OECD.AI analysis presents the overall number of questions, answers, and accepted answers posted to Stack Overflow to highlight different activities and trends on the platform. An accepted answer is the answer selected as most relevant by the user who asked the question. In other words, the user posing a question can mark a single answer as the response that most adequately answered it.
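Counting these three activities over time reduces to grouping posts by year. This minimal Python sketch assumes post records shaped like Stack Exchange API objects, with a Unix-timestamp `creation_date` and, for answers, an `is_accepted` flag. The sample records and function names are hypothetical, and attributing accepted answers to the year the answer was posted is an assumption of this sketch.

```python
from collections import Counter
from datetime import datetime, timezone


def to_year(epoch_seconds: int) -> int:
    """Convert a Unix timestamp to its calendar year in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).year


def yearly_counts(questions, answers):
    """Count questions, answers, and accepted answers per calendar year."""
    counts = {"questions": Counter(), "answers": Counter(), "accepted": Counter()}
    for q in questions:
        counts["questions"][to_year(q["creation_date"])] += 1
    for a in answers:
        year = to_year(a["creation_date"])
        counts["answers"][year] += 1
        if a.get("is_accepted", False):
            # Accepted answers are attributed to the year the answer was posted.
            counts["accepted"][year] += 1
    return counts


# Hypothetical records.
questions = [{"question_id": 1, "creation_date": 1577836800}]  # 2020-01-01 UTC
answers = [
    {"answer_id": 10, "question_id": 1, "creation_date": 1577923200, "is_accepted": True},
    {"answer_id": 11, "question_id": 1, "creation_date": 1609459200, "is_accepted": False},
]
counts = yearly_counts(questions, answers)
print(dict(counts["answers"]))   # {2020: 1, 2021: 1}
print(dict(counts["accepted"]))  # {2020: 1}
```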

Measuring the quality of questions and answers

OECD.AI ranks the quality of questions and answers by their ‘score’, i.e., the net number of votes they received. Stack Overflow allows users to vote positively or negatively on any question or answer, and the overall ‘score’ of a question or answer is the net of all votes (positive votes minus negative votes). OECD.AI uses this score to assess the ‘quality’ of questions and answers, divided into three categories: low, medium, and high. Based on the distribution of votes, we define ‘low’ quality as a question or answer with a score of zero or less (i.e., the same number of positive and negative votes, or more negative than positive votes). ‘Medium’ quality is attributed to a question or answer with a score of more than zero and less than five (e.g., 10 positive votes and 7 negative votes, a score of 3). ‘High’ quality is attributed to a post with a score greater than five (e.g., 15 positive votes and 2 negative votes, a score of 13).
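The banding rule can be written as a small classifier over the net score. This is a minimal sketch with hypothetical function names. The note defines ‘medium’ as less than five and ‘high’ as greater than five, so a score of exactly five is not placed by the text; this sketch assumes it counts as high, and that choice is flagged in the code.

```python
def net_score(upvotes: int, downvotes: int) -> int:
    """Net score of a post: positive votes minus negative votes."""
    return upvotes - downvotes


def quality(score: int) -> str:
    """Map a post's net score to the low / medium / high quality bands."""
    if score <= 0:
        return "low"
    if score < 5:
        return "medium"
    # ASSUMPTION: a score of exactly five is not placed by the note's
    # definitions; this sketch treats it as 'high'.
    return "high"


print(quality(net_score(10, 7)))  # medium (score 3)
print(quality(net_score(15, 2)))  # high (score 13)
print(quality(net_score(4, 4)))   # low (score 0)
```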

International knowledge flows

Stack Overflow enables AI-related international knowledge flows. Knowledge flows happen when a user from one country answers a question posed by a user in another country. By mapping user location, OECD.AI enables the analysis of international knowledge flows by country over time. OECD.AI visualizes interactions from two perspectives: the country asking questions (incoming knowledge flow by country) and the country answering questions (outgoing knowledge flow by country).
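The two perspectives can be sketched by counting each cross-country answer once under the asking country (incoming) and once under the answering country (outgoing). This minimal Python sketch assumes each answer has already been reduced to a hypothetical `(asker_id, answerer_id)` pair and that users have been mapped to countries as described under Location analysis. Interactions involving an unmapped user, or between users in the same country, are skipped because they are not international flows. Grouping by year as well would give the over-time view.

```python
from collections import Counter


def knowledge_flows(interactions, user_country):
    """Count cross-country question-answer interactions.

    interactions: iterable of (asker_id, answerer_id) pairs, one per answer.
    user_country: dict of user id -> country code; users without a mapped
    location are absent, and their interactions are skipped.
    """
    incoming = Counter()  # keyed by the country asking the question
    outgoing = Counter()  # keyed by the country answering it
    pairs = Counter()     # keyed by (answering country, asking country)
    for asker, answerer in interactions:
        src = user_country.get(answerer)
        dst = user_country.get(asker)
        if src is None or dst is None or src == dst:
            continue  # unmapped user or domestic interaction
        incoming[dst] += 1
        outgoing[src] += 1
        pairs[(src, dst)] += 1
    return incoming, outgoing, pairs


# Hypothetical users and interactions; "u4" has no mapped location.
user_country = {"u1": "FR", "u2": "IN", "u3": "FR"}
interactions = [("u1", "u2"), ("u1", "u3"), ("u2", "u1"), ("u1", "u4")]
incoming, outgoing, pairs = knowledge_flows(interactions, user_country)
print(dict(pairs))  # {('IN', 'FR'): 1, ('FR', 'IN'): 1}
```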

Table 1. AI-related Stack Overflow tags

machine-learning

data-science

neural-network

conv-neural-network

recurrent-neural-network

convolutional-neural-network

karas

pytorch

mxnet

classification

supervised-learning

regression

cluster-analysis

unsupervised-learning

reinforcement-learning

pca

support-vector-machines

svm

nearest-neighbor

knn

k-means

bayesian-networks

mixture-model

decisiontrees

genetic-algorithm

simulated-annealing

hidden-markov-models

gaussian-process

kalman-filter

kalman

particle-filter

ensemble-learning

q-learning

computer-vision

face-recognition

ocr

image-recognition

speech-recognition

voice-recognition

nlp

spam-filtering

anomaly-detection

recommendation-engine

machine-translation

libsvm

weka

orange

shogun

scikit-learn

pybrain

mahout

rapidminer

knime

azure-machine-learning

nltk

caffe

tensorflow

theano

keras

opennmt

xgboost

catboost

stanford-nlp

deep-learning

robotics

artificial-intelligence

automation

expert-system

fuzzy-logic