Stack Overflow

Methodological note

Background

Stack Overflow is a question-and-answer platform for programmers. The site receives tens of millions of visits every month from users who ask questions, learn, and share technical knowledge. It functions as a detailed, high-quality collection of information relating to programming.

The website is a valuable resource for AI practitioners, data scientists and machine learning experts around the world and has emerged as one of the main online platforms where programmers convene to discuss a variety of AI-related topics.

Identifying AI questions and answers 

OECD.AI uses the public Stack Overflow API to analyse the evolution of AI-related questions and answers over time. To identify questions and answers relating to AI, OECD.AI filters the data using tags made available by Stack Overflow. Question askers use tags to target the appropriate respondents, and respondents use them to categorise and identify the subject of answers. Stack Overflow users also aggregate and categorise specific tags to identify the subjects relating to a major area, such as AI and Machine Learning. OECD.AI uses the list of AI-related tags from Stack Overflow to identify concepts related to machine learning (e.g., unsupervised-learning, neural-network, computer-vision) as well as symbolic AI (e.g., fuzzy logic, expert systems). Over 60 Stack Overflow tags have been identified as being related to AI (Table 1).
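As a minimal sketch of the tag filter described above, the following Python snippet keeps only the questions that carry at least one AI-related tag. The `AI_TAGS` subset, the `is_ai_related` helper, and the sample records are illustrative assumptions, not OECD.AI's actual pipeline; the `tags` field mirrors the list of tag names that the Stack Exchange API returns for each question.

```python
from typing import Iterable

# Illustrative subset of the AI tags in Table 1 (assumption: the full list
# would be used in practice).
AI_TAGS = frozenset({
    "machine-learning",
    "neural-network",
    "computer-vision",
    "nlp",
    "fuzzy-logic",
    "expert-system",
})


def is_ai_related(tags: Iterable[str], ai_tags: frozenset = AI_TAGS) -> bool:
    """Return True if at least one of the post's tags is in the AI tag list."""
    return any(tag.lower() in ai_tags for tag in tags)


# Hypothetical question records, shaped like items returned by the API.
questions = [
    {"question_id": 1, "tags": ["python", "neural-network"]},
    {"question_id": 2, "tags": ["javascript", "css"]},
    {"question_id": 3, "tags": ["fuzzy-logic"]},
]

# Keep only the questions tagged with at least one AI-related tag.
ai_questions = [q for q in questions if is_ai_related(q["tags"])]
print([q["question_id"] for q in ai_questions])
```

A set lookup keeps the check fast even with the full tag list, and a question with several AI tags is still counted once.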

Location analysis 

Contributions to AI questions and answers are mapped to a country based on location information at the user level. 

Stack Overflow location: users have the option of providing their location through their Stack Overflow profiles. The location provided by contributors is not standardised and may refer to different geographic levels (e.g., suburban, urban, regional, or national). To allow cross-country comparisons, GeoPy is used to standardise all available locations to the country level. If standardisation fails, a contributor’s location field is left blank.

As of November 2025, roughly 36% of users could be mapped to a location using this methodology.
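The standardisation step can be sketched as follows. This is an illustration under stated assumptions, not OECD.AI's implementation: the note names GeoPy but not the geocoding service behind it, so the use of Nominatim here is an assumption, and `make_nominatim_geocoder`, `standardise_to_country`, and `stub_geocode` are hypothetical helper names. The GeoPy import is deferred so that the rest of the sketch runs without the library or network access.

```python
from typing import Callable, Optional

# A geocoder is any callable mapping free text to a country code, or None.
Geocoder = Callable[[str], Optional[str]]


def make_nominatim_geocoder(user_agent: str) -> Geocoder:
    """Build a geocoder backed by GeoPy's Nominatim client.

    Assumption: Nominatim is used as the backing service. Requires the
    geopy package and network access.
    """
    from geopy.geocoders import Nominatim  # lazy import, only needed here

    geolocator = Nominatim(user_agent=user_agent)

    def geocode(text: str) -> Optional[str]:
        location = geolocator.geocode(text, addressdetails=True, language="en")
        if location is None:
            return None
        # Nominatim reports a lower-case ISO 3166-1 alpha-2 country code.
        return location.raw.get("address", {}).get("country_code")

    return geocode


def standardise_to_country(raw_location: Optional[str],
                           geocode: Geocoder) -> Optional[str]:
    """Return an upper-case country code, or None if it cannot be resolved."""
    if raw_location is None or not raw_location.strip():
        return None
    try:
        code = geocode(raw_location.strip())
    except Exception:
        # Assumption: any geocoding error leaves the location blank.
        return None
    return code.upper() if code else None


def stub_geocode(text: str) -> Optional[str]:
    """Offline stand-in for the network call, used for demonstration only."""
    lookup = {"paris, france": "fr", "bangalore": "in"}
    return lookup.get(text.lower())


print(standardise_to_country("Paris, France", stub_geocode))
print(standardise_to_country("somewhere on Earth", stub_geocode))
```

Passing the geocoder as an argument makes it straightforward to swap the stub for `make_nominatim_geocoder("my-app")` in a real run, where rate limits and caching of repeated location strings would also need attention.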

Questions, answers, and accepted answers 

OECD.AI analysis presents the overall number of questions, answers, and accepted answers posted to Stack Overflow to highlight different activities and trends on the platform. An accepted answer is the answer selected as most relevant by the user who asked the question. In other words, users posing questions usually choose one answer to highlight the response that most adequately answered their question.

Measuring the quality of questions and answers

OECD.AI ranks the quality of questions and answers by their ‘score’, i.e., the net number of votes they received. Stack Overflow allows users to vote positively or negatively on any question or answer. The overall ‘score’ of a question or answer is the aggregation of all votes: positive votes minus negative votes. OECD.AI uses this score to assess the ‘quality’ of questions and answers. The quality indicator is divided into three categories: low, medium, and high. Based on the distribution of votes, ‘low’ quality is defined as a question or answer with a score of zero or less (i.e., as many negative votes as positive votes, or more). ‘Medium’ quality is attributed to a question or answer with a score of more than zero and less than five (e.g., 10 positive votes and 7 negative votes, a score of 3). ‘High’ quality is attributed to a question or answer with a score greater than five (e.g., 15 positive votes and 2 negative votes, a score of 13).
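The three quality categories can be expressed as a small function. The sketch below is illustrative only, and the names `score_from_votes` and `quality_category` are hypothetical. One point is an explicit assumption: the thresholds above do not say where a score of exactly five belongs, and the sketch places it in ‘medium’.

```python
def score_from_votes(positive: int, negative: int) -> int:
    """Net score of a post: positive votes minus negative votes."""
    return positive - negative


def quality_category(score: int) -> str:
    """Map a post score to 'low', 'medium', or 'high'."""
    if score <= 0:
        return "low"
    if score <= 5:
        # Assumption: a score of exactly 5 is treated as 'medium', because
        # the stated thresholds leave that value unassigned.
        return "medium"
    return "high"


# The two worked examples from the text, plus a tie.
print(quality_category(score_from_votes(10, 7)))
print(quality_category(score_from_votes(15, 2)))
print(quality_category(score_from_votes(4, 4)))
```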

International knowledge flows 

Stack Overflow enables AI-related international knowledge flows. A knowledge flow occurs when a user from one country answers a question posed by a user in another country. By mapping user location, OECD.AI enables the analysis of international knowledge flows by country over time. OECD.AI visualises interactions from two perspectives: the country asking questions (incoming knowledge flow by country) and the country answering questions (outgoing knowledge flow by country).
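A minimal sketch of how such flows could be counted is given below, assuming each answer has already been paired with the country of the asker and the country of the answerer. The function names and sample data are hypothetical. Pairs with an unmapped location are skipped, and same-country pairs are excluded because they are not international flows.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

CountryPair = Tuple[Optional[str], Optional[str]]


def count_knowledge_flows(pairs: Iterable[CountryPair]) -> Counter:
    """Count flows keyed by (answering country, asking country).

    Each input pair is (asker_country, answerer_country).
    """
    flows: Counter = Counter()
    for asker, answerer in pairs:
        if asker is None or answerer is None:
            continue  # location could not be mapped to a country
        if asker == answerer:
            continue  # domestic exchange, not an international flow
        flows[(answerer, asker)] += 1
    return flows


def incoming_by_country(flows: Counter) -> Counter:
    """Flows received, grouped by the country asking the question."""
    totals: Counter = Counter()
    for (_, asker), n in flows.items():
        totals[asker] += n
    return totals


def outgoing_by_country(flows: Counter) -> Counter:
    """Flows provided, grouped by the country answering the question."""
    totals: Counter = Counter()
    for (answerer, _), n in flows.items():
        totals[answerer] += n
    return totals


# Hypothetical (asker_country, answerer_country) pairs, one per answer.
pairs = [("FR", "US"), ("FR", "IN"), ("US", "US"), ("IN", None), ("DE", "US")]
flows = count_knowledge_flows(pairs)
print(incoming_by_country(flows))
print(outgoing_by_country(flows))
```

Adding a year to the key, for example `(year, answerer, asker)`, would extend the same counting to trends over time.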

Table 1. AI tags

machine-learning 
data-science 
neural-network 
conv-neural-network 
recurrent-neural-network 
convolutional-neural-network 
karas 
pytorch 
mxnet 
classification 
supervised-learning 
regression 
cluster-analysis 
unsupervised-learning 
reinforcement-learning 
pca 
support-vector-machines 
svm 
nearest-neighbor 
knn 
k-means 
bayesian-networks 
mixture-model 
decisiontrees 
genetic-algorithm 
simulated-annealing 
hidden-markov-models 
gaussian-process 
kalman-filter 
kalman 
particle-filter 
ensemble-learning 
q-learning 
computer-vision 
face-recognition 
ocr 
image-recognition 
speech-recognition 
voice-recognition 
nlp 
spam-filtering 
anomaly-detection 
recommendation-engine 
machine-translation 
libsvm 
weka 
orange 
shogun 
scikit-learn 
pybrain 
mahout 
rapidminer 
knime 
azure-machine-learning 
nltk 
caffe 
tensorflow 
theano 
keras 
opennmt 
xgboost 
catboost 
stanford-nlp 
deep-learning 
reinforcement-learning 
computer-vision 
robotics 
artificial-intelligence 
automation 
expert-system 
fuzzy-logic