Elsevier (Scopus) data 

Methodological note

Background

Scopus is Elsevier’s expertly curated abstract and citation database, with over 75 million indexed records. The data used to construct OECD.AI visualisations includes scholarly articles, conference proceedings, reviews, book chapters and books from a subset of AI-related resources belonging to Elsevier. Elsevier publishes more than 500 000 articles annually in 2 500 journals.

Identifying AI scientific publications

More than 600 000 AI scholarly publications are extracted from the Scopus archives using core AI keywords such as back-propagation neural network, genetics-based machine learning, cohen-grossberg neural networks, back-propagation algorithm, and neural networks learning. More details on the methodology used to identify AI publications are available in Chapter 1 of Elsevier’s “Artificial Intelligence: How knowledge is created, transferred, and used” report.

Frequency of data updates

The visualisations on OECD.AI include data from 2010 onwards. Quarterly snapshots of the Scopus database are used to update the data. Due to a lag in reporting, figures for the latest quarter may appear slightly lower than they actually are. This is automatically corrected in subsequent updates.

Counting of publications: quantity measure

A “fractional count” indicator – one that assigns equal weights to each of a publication’s co-authors – is provided to avoid double-counting of publications. In other words, a publication with three co-authors from different countries would be counted as 1/3 of a publication for each country.
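The fractional count described above can be sketched in a few lines of Python. This is only an illustration, not the code used by OECD.AI or Elsevier: the function name `fractional_counts` and the input layout (one list of country codes per publication, one entry per co-author) are assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(publications):
    """Sum fractional publication counts per country.

    `publications` is a hypothetical input format: one list per publication,
    holding the country code of each co-author.
    """
    totals = defaultdict(Fraction)
    for author_countries in publications:
        if not author_countries:
            continue  # no co-author information: nothing to attribute
        # Each co-author carries an equal share of the publication.
        weight = Fraction(1, len(author_countries))
        for country in author_countries:
            totals[country] += weight
    return dict(totals)


# One paper with three co-authors in three countries, one paper with two
# co-authors in the same country.
counts = fractional_counts([["FR", "DE", "JP"], ["FR", "FR"]])
```

With this weighting, the per-country totals add up to the number of publications, which is the property that avoids double-counting.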

The “Publications per capita” checkbox allows the user to normalise the number of publications per unit of population for countries with a population of at least one million.
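As a rough sketch of this normalisation, the Python function below divides a publication count by population and skips countries under the one-million threshold. The text does not state the unit of population, so the default of publications per one million inhabitants (`per`) is an assumption, as are the function and parameter names.

```python
def publications_per_capita(publications, population,
                            per=1_000_000, min_population=1_000_000):
    """Return publications per `per` inhabitants, or None below the threshold.

    `per` (the unit of population) is an assumption; the one-million
    population threshold comes from the text.
    """
    if population < min_population:
        return None  # country is too small to be shown per capita
    return publications * per / population


rate = publications_per_capita(500, 10_000_000)
```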

Counting of publications: quality measure

Scientific publications are ranked based on the Field-weighted Citation Impact (FWCI), which is the ratio of the total citations actually received by a scientific publication to the total citations that would be expected based on the average of the subject field or scientific discipline. An FWCI of 1 means that the publication is cited as much as the average publication in that subject field or scientific discipline. An FWCI of less (more) than 1 means that the publication is cited less (more) than the average publication in that subject field or scientific discipline. In this manner, the FWCI takes into account the differences in research behaviour across disciplines.
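Written as a formula, the ratio described above is as follows. The symbols are introduced here for illustration only: C_received stands for the citations a publication has actually received and C_expected for the citations expected from the average of its subject field.

```latex
\[
\mathrm{FWCI} = \frac{C_{\text{received}}}{C_{\text{expected}}}
\]
```

For instance, a publication with 12 citations in a field where 8 would be expected has an FWCI of 12/8 = 1.5, meaning it is cited 50% more than the field average.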

OECD.AI defines three categories of scientific publications based on their FWCI score:

  • Low impact: 0 < FWCI ≤ 0.5
  • Medium impact: 0.5 < FWCI ≤ 1.5
  • High impact: FWCI > 1.5
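The three categories can be expressed as a small Python function. This is a sketch for illustration: the function name is hypothetical, and returning `None` for an FWCI of 0 or less is an assumption, since the thresholds above start strictly above 0.

```python
def impact_category(fwci):
    """Map an FWCI score to the impact categories listed above."""
    if fwci <= 0:
        return None  # assumption: not covered by any of the three categories
    if fwci <= 0.5:
        return "Low impact"
    if fwci <= 1.5:
        return "Medium impact"
    return "High impact"


categories = [impact_category(score) for score in (0.3, 1.0, 2.4)]
```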

Note that the drop in the number of high-quality publications in the most recent years reflects the expected lag in citations: citations accumulate over time, so recent publications have had less time to be cited.

Measuring gender in scientific publications

Elsevier uses NamSor’s API to infer an author’s gender from his or her name. The API provides a “Gender Probability Score”, which is the natural log of the ratio of probabilities – as determined by a Naïve-Bayes model – of the name receiving the classification of either “male” or “female”. The score is based on three data points: country of origin, first name and last name. Each author’s country of origin was estimated based on the country of affiliation listed on his or her first publications in Scopus. Only those authors for whom the algorithm returned a gender probability of 85% or higher were assigned a gender value. In the case of China, gender disambiguation methods were found to be more reliable when applied to author names written in Mandarin than to those same names transliterated using the Roman alphabet. The gender probability threshold was set at 70% to ensure a sufficient number of authors for analysis.
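To make the relationship between a log-ratio score and a probability threshold concrete, the Python sketch below converts a score into a probability and applies a cut-off. It does not call or reproduce NamSor's API. It assumes that the score is ln(P(male) / P(female)), that the two probabilities sum to 1, and that a positive score therefore favours "male"; the function name and the sign convention are illustrative assumptions.

```python
import math


def assign_gender(score, threshold=0.85):
    """Assign a gender only when the implied probability meets the threshold.

    Assumption: `score` = ln(P(male) / P(female)) with P(male) + P(female) = 1,
    so P(male) is the logistic function of the score.
    """
    p_male = 1.0 / (1.0 + math.exp(-score))
    if p_male >= threshold:
        return "male"
    if 1.0 - p_male >= threshold:
        return "female"
    return None  # too uncertain: no gender value is assigned


# A name judged nine times more likely to be male than female (P = 0.9).
label = assign_gender(math.log(9))
```

Under these assumptions, lowering `threshold` to 0.70 (the other threshold mentioned in the text) assigns a gender to more authors at the cost of more misclassifications.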

Measuring collaboration between countries and institutions

OECD.AI displays research collaborations between different entities, either institutions or countries (country names and codes in OECD.AI abide by the “OECD Guidelines regarding the use of the list of names of countries and territories”). Collaborations are identified by assigning each paper to the relevant institutions and countries on the basis of the authors’ institutional affiliations. OECD and CSIC (2016) define collaboration as “co-authorship involving different institutions. International collaboration refers to publications co-authored among institutions in different countries…National collaboration concerns publications co-authored by different institutions within the reference country. No collaboration refers to publications not involving co-authorship across institutions. No collaboration includes single-authored articles, as long as the individual has a single affiliation, as well as multiple-authored documents within a given institution.” Institutional measures of collaboration may overestimate actual collaboration in the case of countries where it is common practice to have a double affiliation (OECD and CSIC, 2016).

To avoid double counting, collaborations are considered to be binary: either an entity collaborates on a paper (value=1) or it does not (value=0). The shared paper counts as one toward the number of collaborations between two entities. The following rules apply: 

  • For between-country collaborations: papers written by authors from more than one institution in the same country only count as one collaboration for that country. 
  • For between-institution collaboration: papers written by more than one author from the same institution only count as one collaboration for that institution.  
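The binary counting rules above can be sketched in Python as follows. This is one reading of the rules, not OECD.AI's implementation: the function name and the input layout (one list of entity labels per paper, one entry per author) are assumptions. Passing country codes gives between-country collaborations, and passing institution names gives between-institution collaborations.

```python
from collections import Counter
from itertools import combinations


def count_collaborations(papers):
    """Count shared papers for each pair of entities.

    `papers` is a hypothetical input: one list of entity labels per paper,
    one label per author. Repeated labels within a paper are collapsed, so
    an entity either takes part in a paper (1) or does not (0).
    """
    counts = Counter()
    for affiliations in papers:
        entities = sorted(set(affiliations))  # binary participation per paper
        for pair in combinations(entities, 2):
            counts[pair] += 1  # one shared paper = one collaboration
    return counts


collaborations = count_collaborations([
    ["FR", "FR", "DE"],  # two French authors still count once for FR
    ["FR", "DE", "JP"],
    ["FR", "FR"],        # single country: no between-country collaboration
])
```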

Types of scientific publications

Scopus classifies scientific publications into the following types depending on publication outlets: journal article, conference paper, book chapter, review paper, letter, note, survey, article in press, erratum, data paper, editorial, book, abstract report, and unspecified. Given their low volumes, the following publication types were aggregated under “Other”: letter, note, survey, article in press, erratum, data paper, editorial, book, abstract report, and unspecified.

Additional metrics 

Additional metrics are used to construct the y-axis of the “AI publications vs GDP per capita by country, region, in time” chart and enable the “Per capita” checkbox in some charts. These indicators include: 

  • GDP: GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without deducting the depreciation of fabricated assets or the depletion and degradation of natural resources. Data are in current US dollars. Dollar figures for GDP are converted from domestic currencies using single year official exchange rates. For a few countries where the official exchange rate does not reflect the rate effectively applied to actual foreign exchange transactions, an alternative conversion factor is used. Sources: World Bank national accounts data and OECD National Accounts data files (data.worldbank.org/). 
  • GDP per capita: GDP per capita is gross domestic product divided by midyear population. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Sources: World Bank national accounts data and OECD National Accounts data files (data.worldbank.org/). 
  • Population: Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. Sources: United Nations Population Division, World Population Prospects: 2019 Revision; Census reports and other statistical publications from national statistical offices; Eurostat: Demographic Statistics; United Nations Statistical Division, Population and Vital Statistics Report; U.S. Census Bureau: International Database; and Secretariat of the Pacific Community: Statistics and Demography Programme (data.worldbank.org/). 
  • R&D expenditure (% of GDP): Gross domestic expenditures on research and development (R&D), expressed as a percent of GDP. They include both capital and current expenditures in the four main sectors: Business enterprise, Government, Higher education and Private non-profit. R&D covers basic research, applied research, and experimental development. Source: UNESCO Institute for Statistics (uis.unesco.org). 

For these metrics, values are interpolated for years in which no data are available. If the value for the most recent year is missing for an indicator, the value of the latest available year is used.
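A minimal Python sketch of this gap-filling is given below. The text does not say which interpolation method is used, so linear interpolation between the nearest available years is an assumption, as are the function name and the dictionary input format. Gaps before the first available year are left unfilled because the text does not cover that case.

```python
def fill_missing(series):
    """Fill missing yearly values in a {year: value or None} mapping.

    Assumption: gaps between two known years are filled by linear
    interpolation; trailing gaps reuse the latest available value.
    """
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    filled = dict(series)
    for year in years:
        if series[year] is not None:
            continue
        earlier = [k for k in known if k < year]
        later = [k for k in known if k > year]
        if earlier and later:
            y0, y1 = earlier[-1], later[0]
            share = (year - y0) / (y1 - y0)
            filled[year] = series[y0] + share * (series[y1] - series[y0])
        elif earlier:
            filled[year] = series[earlier[-1]]  # carry latest value forward
        # Leading gaps stay None: the text does not specify how to fill them.
    return filled


filled = fill_missing({2017: 10.0, 2018: None, 2019: 14.0, 2020: None})
```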

References

OECD and SCImago Research Group (CSIC) (2016), Compendium of Bibliometric Science Indicators, OECD Publishing, Paris, http://oe.cd/scientometrics.