Stack Overflow Survey

Methodological note

Background

Stack Overflow is a question-and-answer platform for programmers. The site receives tens of millions of visits every month from people who ask questions, learn, and share technical knowledge. It functions as a detailed, high-quality collection of information relating to programming.

The website is a valuable resource for AI practitioners, data scientists and machine learning experts around the world and has emerged as one of the main online platforms where programmers convene to discuss a variety of AI-related topics.

Stack Overflow Survey

Stack Overflow has conducted an annual survey of its users since 2011 to better understand its user base. Because respondents have only been able to identify themselves as ‘Data scientist’, ‘Machine learning expert’ or similar since 2015, this analysis examines surveys from 2015 onwards. Roughly 50,000 to 70,000 Stack Overflow users participate each year, and over 100 countries are represented.

Survey respondents are recruited through onsite messaging, blog posts, email lists, banner ads, and social media posts. They self-report their information anonymously and voluntarily, and are likely to be highly engaged in the Stack Overflow community.

OECD.AI leverages aggregate demographic information from survey respondents to build indicators and identify trends related to the profession, country, salary, education, and age of AI developers.

Visualisations

Stack Overflow survey visualisations on OECD.AI focus on three areas from 2015 to the present:

- Breakdown by demographic characteristic, including salary range, education, profession, region, country, and age
- Popular and trending programming languages
- Opinions and AI tools used by developers

The charts focus on the relationship between self-reported demographic information (location, gender, age) and salary/education. Filtering is available for all views by year, profession, region, and country.

Indicator breakdown

It is important to note that many survey fields have changed since 2015. The structure and content of the survey have been more stable since 2018. This analysis uses the following standardised fields to present information over time:

Country

Country data refers to the country of residence of the respondent at the time of the survey and is derived from responses to the question ‘In which country do you currently live?’.

Education

The Education field refers to the formal education of the respondent. OECD.AI buckets responses into: Associates & less than bachelors, Bachelors, and Advanced degree (which includes Master's degrees and above). As this is a multiple-choice field, the analysis captures the highest attained formal education level.
As the possible answers changed from year to year, responses were mapped to place education level into four categories:
- Associates & less than bachelor’s degree: classified if the answer equals the label or contains text such as “boot-camp”, “on-the-job”, “mentorship”, “certification”, “something else”, “without earning a degree”, “Primary”, “Secondary”, “self-taught”, “Coursera”, “Other”, etc.
- Bachelor’s degree (e.g. B.A., B.S.): classified if the answer equals the label or contains text such as “bachelors”, “B.S.”, “BA”, etc.
- Master’s degree (e.g. M.A., M.S.): classified if the answer equals the label or contains text such as “Master” or “masters”.
- Professional degree (e.g. Ph.D, MD): classified if the answer equals the label or contains text such as “PhD”, “Ph.D”, or “Doctoral”, etc.
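The keyword mapping above can be illustrated with a minimal sketch. The function name, the exact patterns, and the rule order are assumptions made for illustration and are not the mapping used by OECD.AI. Phrases that signal an unfinished or informal education are checked first, so that an answer such as "study without earning a bachelor's degree" is not counted as a bachelor's degree.

```python
import re

LOWER = "Associates & less than bachelor's degree"
BACHELOR = "Bachelor's degree"
MASTER = "Master's degree"
PROFESSIONAL = "Professional degree"

# Hypothetical rules, evaluated in order; the first matching pattern wins.
RULES = [
    (LOWER, re.compile(r"without earning|self-taught|boot-?camp", re.I)),
    (PROFESSIONAL, re.compile(r"professional degree|ph\.?d|doctoral", re.I)),
    (MASTER, re.compile(r"master", re.I)),
    (BACHELOR, re.compile(r"bachelor|\bB\.?S\b|\bB\.?A\b", re.I)),
]


def classify_education(answer: str) -> str:
    """Map a free-text survey answer to one of the four categories."""
    for label, pattern in RULES:
        if pattern.search(answer):
            return label
    # Anything unmatched (e.g. "Primary", "Secondary", "Other") falls
    # into the lowest bucket, mirroring the catch-all first category.
    return LOWER
```

For example, `classify_education("Other doctoral degree (Ph.D., Ed.D., etc.)")` falls into the Professional degree category, while `classify_education("Secondary school")` falls into the lowest one.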

Salary

For 2015 and 2016, users directly provided their salary as a range in yearly USD.
For 2017-2022, users were asked to identify their salary, currency, and payment frequency (weekly, monthly, or yearly). Stack Overflow converted these into yearly salaries assuming 12 working months and 50 working weeks.
Exchange rates were calculated by Stack Overflow based on the dates of the survey.
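The annualisation described above amounts to a simple multiplication, sketched below. The function name, the frequency labels, and the `usd_per_unit` exchange-rate parameter are hypothetical; the actual conversion was performed by Stack Overflow, and its exact implementation is not described here.

```python
# Pay periods per year, following the stated assumptions of
# 12 working months and 50 working weeks.
PERIODS_PER_YEAR = {"yearly": 1, "monthly": 12, "weekly": 50}


def annual_salary_usd(amount: float, frequency: str, usd_per_unit: float) -> float:
    """Convert a reported salary to a yearly USD figure.

    usd_per_unit is the (assumed) exchange rate: USD per one unit of
    the respondent's currency at the time of the survey.
    """
    return amount * PERIODS_PER_YEAR[frequency.lower()] * usd_per_unit
```

Under these assumptions, a monthly salary of 4,000 in a currency at parity with the US dollar becomes 48,000 USD per year, and a weekly salary of 1,000 becomes 50,000 USD per year.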

Age

The age field is mapped from the question ‘What is your age?’. Depending on the year, users can either select an age group, or input an integer value. All integer values are converted to the following age groups for consistency:
- 24 and Under
- 25-34
- 35-44
- 45-54
- 55-64
- 65 years or older
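As a minimal sketch, integer ages can be converted to the groups listed above as follows. The function name is an assumption for illustration.

```python
def age_group(age: int) -> str:
    """Map an integer age to one of the standard age groups."""
    if age <= 24:
        return "24 and Under"
    # Ten-year bands with inclusive bounds on both ends.
    for low, high in ((25, 34), (35, 44), (45, 54), (55, 64)):
        if low <= age <= high:
            return f"{low}-{high}"
    return "65 years or older"
```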
In 2015, users selected their age from pre-defined age groups that do not align with the groupings above. This analysis uses the following age buckets for that year:
- 24 and Under
- 25-34
- 35-39
- 40-50
- 51-60
- 60 years or older
In 2017, the survey did not provide a field for age. This analysis does not include age data for this year.

Programming Languages

This analysis leverages two survey questions to show respondents' preferred programming languages. The questions ask which programming languages respondents have done extensive development work in over the past year, and which languages they want to work in over the next year. This is a multiple-choice field with a write-in option.
The languages included in the multiple-choice list vary by year, as survey designers curate the list based on responses from the previous year.
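One way to tally a multiple-choice field of this kind is sketched below. It assumes that each respondent's selections arrive as a single semicolon-separated string, and that popularity is expressed as the share of respondents who answered the question; both are assumptions for illustration and may not reflect how the OECD.AI indicators are computed.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def language_shares(responses: Iterable[Optional[str]]) -> Dict[str, float]:
    """Share of answering respondents who selected each language."""
    counts: Counter = Counter()
    answered = 0
    for raw in responses:
        # A set ensures a language is counted once per respondent.
        langs = {part.strip() for part in (raw or "").split(";") if part.strip()}
        if not langs:
            continue  # skip respondents who left the question blank
        answered += 1
        counts.update(langs)
    if answered == 0:
        return {}
    return {lang: n / answered for lang, n in counts.items()}
```

Because the list of offered languages changes from year to year, shares computed this way are most comparable within a single survey year.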

Number of respondents

The number of respondents depends on the year of the survey. As the survey's popularity and structure change over time, caution is advised when comparing results across years. The structure of the survey changed significantly after 2017, leading to an increase in participants and more standardisation among fields and responses.

The number of overall respondents in the years 2018 to 2024 ranges approximately between 60,000 and 100,000. For 2016 and 2017, and again in 2025, the number of respondents is around 50,000. The 2015 survey has significantly fewer responses, with only about 26,000 overall respondents.

It is important to keep in mind that the survey data represent a subset of employees in the artificial intelligence field who are active in the online data science and programming communities.

As such, caution should be exercised when interpreting the results, and the survey should be considered one of many sources of information about the field of data science and machine learning.