Background
Stack Overflow is a question-and-answer platform for programmers. The site receives tens of millions of visits every month from people who ask questions, learn, and share technical knowledge. It functions as a collection of detailed, high-quality information about programming.
The website is a valuable resource for AI practitioners, data scientists and machine learning experts around the world and has emerged as one of the main online platforms where programmers convene to discuss a variety of AI-related topics.
Identifying AI questions and answers
OECD.AI uses the public Stack Overflow API to analyse how AI-related questions and answers evolve over time. To identify questions and answers relating to AI, OECD.AI filters the data using the tags that Stack Overflow makes available. Question askers use tags to attract appropriate answers, and respondents use them to categorise and identify the subject of the questions they answer. Stack Overflow users also aggregate and categorise specific tags to delineate major subject areas, such as AI and machine learning. OECD.AI draws on the list of AI-related tags from Stack Overflow to identify concepts related to machine learning (e.g., unsupervised-learning, neural-network, computer-vision) as well as symbolic AI (e.g., fuzzy-logic, expert-system). Over 60 Stack Overflow tags have been identified as being related to AI (Table 1).
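The tag filter described above can be sketched as a set-intersection test: a post counts as AI-related if at least one of its tags appears in the AI tag list. This is a minimal illustration, not OECD.AI's pipeline; it assumes each question is a dictionary with a `tags` list (as in Stack Exchange API question objects), and `AI_TAGS` holds only an illustrative subset of Table 1.

```python
# Illustrative subset of the AI tags listed in Table 1 (not the full list).
AI_TAGS = frozenset({
    "machine-learning", "neural-network", "deep-learning",
    "computer-vision", "nlp", "tensorflow", "pytorch",
    "fuzzy-logic", "expert-system",
})


def is_ai_related(post: dict) -> bool:
    """Return True if at least one of the post's tags is an AI tag."""
    return not AI_TAGS.isdisjoint(post.get("tags", []))


def filter_ai_posts(posts: list[dict]) -> list[dict]:
    """Keep only posts carrying at least one AI-related tag."""
    return [p for p in posts if is_ai_related(p)]


posts = [
    {"question_id": 1, "tags": ["python", "pytorch"]},
    {"question_id": 2, "tags": ["javascript", "css"]},
    {"question_id": 3, "tags": ["fuzzy-logic"]},
]
print([p["question_id"] for p in filter_ai_posts(posts)])  # [1, 3]
```

Because answers inherit the subject of the question they respond to, a filter like this would typically be applied to questions first, with answers then selected by their parent question.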
Location analysis
Contributions to AI questions and answers are mapped to a country based on location information at the user level.
Stack Overflow location: users have the option of providing their location in their Stack Overflow profiles. The locations provided by contributors are not standardised and can refer to different levels (e.g., suburban, urban, regional, or national). To allow cross-country comparisons, GeoPy is used to standardise all available locations to the country level. If a location cannot be resolved to a country, the contributor's location field is left blank.
As of November 2025, roughly 36% of users could be mapped to a location using this methodology.
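The standardisation step above can be sketched as a function that sends the free-text profile location to a geocoder and keeps only the country, returning a blank (`None`) whenever the lookup fails. This is a hedged sketch: the geocoder is passed in as a function so the example runs offline, `fake_geocode` is a hypothetical stand-in, and the assumption that results expose a `.raw` dict with an `address.country_code` entry reflects how GeoPy's Nominatim results look when address details are requested.

```python
from typing import Callable, Optional


def to_country_code(location: Optional[str],
                    geocode: Callable[[str], object]) -> Optional[str]:
    """Standardise a free-text profile location to an ISO country code.

    `geocode` takes a query string and returns an object exposing a `.raw`
    dict (as GeoPy `Location` objects do) or None if nothing matched.
    Returns None, i.e. a blank location, whenever the lookup fails.
    """
    if not location or not location.strip():
        return None
    try:
        result = geocode(location.strip())
    except Exception:  # e.g. network errors, timeouts, rate limiting
        return None
    if result is None:
        return None
    code = result.raw.get("address", {}).get("country_code")
    return code.upper() if code else None


# Hypothetical offline stand-in for a real geocoder, so the example runs.
class FakeLocation:
    def __init__(self, raw: dict):
        self.raw = raw


def fake_geocode(query: str) -> Optional[FakeLocation]:
    known = {"Paris": {"address": {"country": "France", "country_code": "fr"}}}
    raw = known.get(query)
    return FakeLocation(raw) if raw else None


print(to_country_code("Paris", fake_geocode))            # FR
print(to_country_code("Somewhere vague", fake_geocode))  # None
```

With GeoPy installed, a real geocoder could be plugged in, for example `functools.partial(Nominatim(user_agent="my-oecd-ai-sketch").geocode, addressdetails=True, language="en")` with `Nominatim` imported from `geopy.geocoders` (the user agent string is hypothetical). The text does not say which geocoding service OECD.AI uses through GeoPy, so Nominatim is only an illustrative choice; its public service enforces request-rate limits, which GeoPy's `RateLimiter` can help respect. The share of users mapped to a country is then the fraction of non-blank results.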
Questions, answers, and accepted answers
OECD.AI presents the overall numbers of questions, answers, and accepted answers posted to Stack Overflow to highlight different activities and trends on the platform. An accepted answer is the answer that the user who asked the question selects as the most relevant. In other words, the asker can mark one answer to highlight the response that best resolved their question.
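The three counts above can be tracked over time with a simple per-year tally. This minimal sketch assumes field names modelled on Stack Exchange API objects (`creation_date` as a Unix timestamp in seconds, and an `is_accepted` flag on answers); attributing each accepted answer to the year the answer was posted is an assumption, not a documented OECD.AI rule.

```python
from collections import Counter
from datetime import datetime, timezone


def year_of(unix_ts: int) -> int:
    """Calendar year (UTC) of a Unix timestamp in seconds."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).year


def yearly_activity(questions: list[dict], answers: list[dict]) -> dict[str, Counter]:
    """Count questions, answers and accepted answers per calendar year."""
    counts = {"questions": Counter(), "answers": Counter(),
              "accepted_answers": Counter()}
    for q in questions:
        counts["questions"][year_of(q["creation_date"])] += 1
    for a in answers:
        year = year_of(a["creation_date"])
        counts["answers"][year] += 1
        if a.get("is_accepted", False):
            counts["accepted_answers"][year] += 1
    return counts


questions = [{"creation_date": 1577836800},   # 2020-01-01 00:00 UTC
             {"creation_date": 1609459200}]   # 2021-01-01 00:00 UTC
answers = [{"creation_date": 1577840400, "is_accepted": True},
           {"creation_date": 1577844000, "is_accepted": False},
           {"creation_date": 1609462800, "is_accepted": False}]
activity = yearly_activity(questions, answers)
print({k: dict(v) for k, v in activity.items()})
# {'questions': {2020: 1, 2021: 1}, 'answers': {2020: 2, 2021: 1}, 'accepted_answers': {2020: 1}}
```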
Measuring the quality of questions and answers
OECD.AI ranks the quality of questions and answers by their 'score', i.e., their net number of votes. Stack Overflow allows users to vote positively or negatively on any question or answer, and the overall score of a question or answer is the aggregate of all its votes (positive votes minus negative votes). OECD.AI uses this score to assess the 'quality' of questions and answers. The quality indicator is divided into three categories: low, medium, and high. Based on the distribution of votes, we define 'low' quality as a question or answer with a score of zero or less (i.e., the same number of negative and positive votes, or more negative than positive votes). 'Medium' quality is attributed to a question or answer with a score greater than zero and less than five (e.g., 10 positive votes and 7 negative votes, a score of 3). 'High' quality is attributed to a post with a score of five or more (e.g., 15 positive votes and 2 negative votes, a score of 13).
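The banding rule can be expressed as a small classification function over the net score. This is a sketch under one stated assumption: a score of exactly five is treated as 'high'. In practice the Stack Exchange API already returns a net `score` field on questions and answers, so `net_score` is included only to mirror the vote arithmetic in the examples.

```python
def net_score(up_votes: int, down_votes: int) -> int:
    """Score as the aggregate of votes: positive minus negative."""
    return up_votes - down_votes


def quality_category(score: int) -> str:
    """Map a net vote score to the low/medium/high quality bands."""
    if score <= 0:
        return "low"      # zero, or more negative than positive votes
    if score < 5:
        return "medium"   # scores 1 to 4
    return "high"         # assumption: a score of exactly 5 counts as high


print(quality_category(net_score(10, 7)))  # medium (score 3)
print(quality_category(net_score(15, 2)))  # high (score 13)
print(quality_category(net_score(4, 4)))   # low (score 0)
```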
International knowledge flows
Stack Overflow enables international knowledge flows on AI. A knowledge flow occurs when a user from one country answers a question posed by a user in another country. By mapping user locations, OECD.AI enables the analysis of international knowledge flows by country over time. OECD.AI visualises these interactions from two perspectives: the country asking questions (incoming knowledge flows by country) and the country answering questions (outgoing knowledge flows by country).
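Both perspectives can be derived from the same list of (asker country, answerer country) pairs, one per answer. In this minimal sketch, same-country answers are excluded because they do not cross a border, and answers where either location is unknown are skipped; the latter is an assumption about how blank locations are handled, not a documented rule.

```python
from collections import Counter


def knowledge_flows(pairs):
    """Aggregate international answer flows by country.

    `pairs` is an iterable of (asker_country, answerer_country) tuples.
    Returns (incoming, outgoing): incoming counts answers received by the
    asking country; outgoing counts answers given by the answering country.
    """
    incoming, outgoing = Counter(), Counter()
    for asker, answerer in pairs:
        # Skip unknown locations and answers that stay within one country.
        if asker is None or answerer is None or asker == answerer:
            continue
        incoming[asker] += 1
        outgoing[answerer] += 1
    return incoming, outgoing


pairs = [("FR", "US"), ("FR", "DE"), ("US", "US"), ("DE", None), ("US", "FR")]
incoming, outgoing = knowledge_flows(pairs)
print(dict(incoming))  # {'FR': 2, 'US': 1}
print(dict(outgoing))  # {'US': 1, 'DE': 1, 'FR': 1}
```

Grouping the pairs by year before calling the function would give the time dimension mentioned above.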
Table 1. AI-related Stack Overflow tags
| machine-learning |
| data-science |
| neural-network |
| conv-neural-network |
| recurrent-neural-network |
| convolutional-neural-network |
| karas |
| pytorch |
| mxnet |
| classification |
| supervised-learning |
| regression |
| cluster-analysis |
| unsupervised-learning |
| reinforcement-learning |
| pca |
| support-vector-machines |
| svm |
| nearest-neighbor |
| knn |
| k-means |
| bayesian-networks |
| mixture-model |
| decisiontrees |
| genetic-algorithm |
| simulated-annealing |
| hidden-markov-models |
| gaussian-process |
| kalman-filter |
| kalman |
| particle-filter |
| ensemble-learning |
| q-learning |
| computer-vision |
| face-recognition |
| ocr |
| image-recognition |
| speech-recognition |
| voice-recognition |
| nlp |
| spam-filtering |
| anomaly-detection |
| recommendation-engine |
| machine-translation |
| libsvm |
| weka |
| orange |
| shogun |
| scikit-learn |
| pybrain |
| mahout |
| rapidminer |
| knime |
| azure-machine-learning |
| nltk |
| caffe |
| tensorflow |
| theano |
| keras |
| opennmt |
| xgboost |
| catboost |
| stanford-nlp |
| deep-learning |
| robotics |
| artificial-intelligence |
| automation |
| expert-system |
| fuzzy-logic |