Elsevier (Scopus) data

Methodological note

Background

Scopus is Elsevier’s expertly curated abstract and citation database, with over 75 million indexed records. The data used to construct OECD.AI visualisations includes scholarly articles, conference proceedings, reviews, book chapters and books from a subset of AI-related resources belonging to Elsevier. Elsevier publishes more than 500 000 articles annually in 2 500 journals.

Identifying AI scientific publications

More than 2,000,000 AI scholarly publications are extracted from the Scopus archives using core AI keywords such as back-propagation neural network, genetics-based machine learning, cohen-grossberg neural networks, back-propagation algorithm, and neural networks learning. More details on the methodology used to identify AI publications are available in Chapter 1 of Elsevier’s “Artificial Intelligence: How knowledge is created, transferred, and used” report.

Frequency of data updates

The visualisations on OECD.AI include data from 2010 onwards. Quarterly snapshots of the Scopus database are used to update the data. Due to a lag in reporting, figures for the latest quarter may appear slightly lower than they actually are. This is automatically corrected in subsequent updates.

Counting of publications: quantity measure

A “fractional count” indicator – one that assigns equal weight to each of a publication’s co-authors – is provided to avoid double-counting of publications. In other words, a publication with three co-authors from different countries would be counted as 1/3 of a publication for each country.
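The fractional count can be illustrated with a minimal sketch. The input format (one country code per co-author) and the function name are assumptions made for illustration, as is the handling of several co-authors from the same country, which the note does not spell out.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(publications):
    """Return each country's fractional publication count.

    `publications` is a list of publications, each given as a list holding
    one country code per co-author (hypothetical input format).
    """
    totals = defaultdict(Fraction)
    for author_countries in publications:
        if not author_countries:
            continue  # no authorship information, nothing to assign
        # Equal weight for every co-author of this publication.
        weight = Fraction(1, len(author_countries))
        for country in author_countries:
            totals[country] += weight
    return dict(totals)


papers = [
    ["FR", "DE", "JP"],  # three co-authors from three countries: 1/3 each
    ["FR", "FR", "US"],  # assumed case: FR receives 2/3 and US receives 1/3
]
counts = fractional_counts(papers)
```

Because the weights of each publication sum to one, the country totals always add up to the number of publications, which is what prevents double-counting.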

Counting of publications: quality measure

Scientific publications are ranked based on the Field-weighted Citation Impact (FWCI), which is the ratio of the total citations actually received by a scientific publication to the total citations that would be expected based on the average of the subject field or scientific discipline. An FWCI of 1 means that the publication is cited as much as the average publication in that subject field or scientific discipline. An FWCI of less (more) than 1 means that the publication is cited less (more) than the average publication in that subject field or scientific discipline. In this manner, the FWCI takes into account the differences in research behaviour across disciplines.
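As a rough sketch of the ratio described above, the example below takes the expected citations to be the plain average of citation counts in the same field. This is a simplifying assumption: the function name and sample figures are hypothetical, and the indicator as computed by Elsevier may normalise on further dimensions.

```python
from statistics import mean


def fwci(citations, field_citations):
    """Ratio of citations received to the field's average citations.

    `field_citations` holds the citation counts of publications in the same
    subject field (hypothetical input used to derive the expected value).
    """
    expected = mean(field_citations)
    if expected == 0:
        raise ValueError("expected citations must be positive")
    return citations / expected


field = [0, 2, 4, 10]   # average of 4 citations per publication
score = fwci(6, field)  # 6 / 4 = 1.5, cited 50% more than the field average
```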

OECD.AI defines three categories of scientific publications based on their FWCI score:

Low impact: 0 < FWCI ≤ 0.5
Medium impact: 0.5 < FWCI ≤ 1.5
High impact: FWCI > 1.5
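The three ranges can be expressed as a small lookup, sketched below. The function name is hypothetical, and returning `None` for a score of zero or below is an assumption, since the ranges above start strictly above zero.

```python
def impact_category(fwci):
    """Map an FWCI score to the OECD.AI impact category."""
    if fwci <= 0:
        return None  # assumption: scores of 0 fall outside the three ranges
    if fwci <= 0.5:
        return "Low impact"
    if fwci <= 1.5:
        return "Medium impact"
    return "High impact"


labels = [impact_category(x) for x in (0.2, 0.5, 1.0, 1.5, 2.3)]
```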

Note that the drop in the number of high-impact publications in the last couple of years reflects the expected lag in citations: recent publications have had less time to accumulate citations.

Measuring gender in scientific publications

Elsevier uses NamSor’s API to infer an author’s gender from his or her name. The API provides a “Gender Probability Score”, which is the natural log of the ratio of probabilities – as determined by a Naïve-Bayes model – of the name receiving the classification of either “male” or “female”. The score is based on three data points: country of origin, first name and last name. Each author’s country of origin was estimated based on the country of affiliation listed on his or her first publications in Scopus. Only those authors for whom the algorithm returned a gender probability of 85% or higher were assigned a gender value. In the case of China, gender disambiguation methods were found to be more reliable when applied to author names written in Mandarin than to those same names transliterated using the Roman alphabet. The gender probability threshold was set at 70% to ensure a sufficient number of authors for analysis.
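The thresholding step can be sketched as follows. This is an illustration only, not NamSor's API or Elsevier's code: it assumes the score is ln(P(male) / P(female)) with the two probabilities summing to one, and the sign convention and function name are hypothetical.

```python
import math


def infer_gender(score, threshold=0.85):
    """Assign a gender only when the implied probability meets the threshold.

    `score` is assumed to be ln(P(male) / P(female)); this sign convention
    is an assumption made for the example.
    """
    # Convert the log-ratio back to P(male) in a numerically stable way.
    if score >= 0:
        p_male = 1.0 / (1.0 + math.exp(-score))
    else:
        odds = math.exp(score)
        p_male = odds / (1.0 + odds)
    p_female = 1.0 - p_male

    if p_male >= threshold:
        return "male"
    if p_female >= threshold:
        return "female"
    return None  # below the threshold: no gender value is assigned


confident = infer_gender(math.log(0.9 / 0.1))           # P(male) = 0.9
uncertain = infer_gender(math.log(0.2 / 0.8))           # P(female) = 0.8
relaxed = infer_gender(math.log(0.2 / 0.8), threshold=0.70)
```

With the 85% threshold the second author is left unassigned, while the 70% threshold assigns a value, which shows how lowering the threshold increases the number of authors available for analysis.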

Measuring collaboration between countries and institutions

OECD.AI displays research collaborations between different entities, either institutions or countries (Country names and codes in OECD.AI abide by the “OECD Guidelines regarding the use of the list of names of countries and territories”). This is done by assigning each paper to the relevant institutions and countries on the basis of the authors’ institutional affiliations. OECD and CSIC (2016) define collaboration as “co-authorship involving different institutions. International collaboration refers to publications co-authored among institutions in different countries…National collaboration concerns publications co-authored by different institutions within the reference country. No collaboration refers to publications not involving co-authorship across institutions. No collaboration includes singled-authored articles, as long as the individual has a single affiliation, as well as multiple-authored documents within a given institution.” Institutional measures of collaboration may overestimate actual collaboration in the case of countries where it is common practice to have a double affiliation (OECD and CSIC, 2016).

To avoid double counting, collaborations are considered to be binary: either an entity collaborates on a paper (value=1) or it does not (value=0). The shared paper counts as one toward the number of collaborations between two entities. The following rule applies:

For between-country collaborations: papers written by authors from more than one institution in the same country only count as one collaboration for that country.
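The binary counting of between-country collaborations can be sketched as below. The input format (a list of institution and country pairs per paper) and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def country_collaborations(papers):
    """Count shared papers for every pair of countries.

    `papers` is a list of papers, each given as a list of
    (institution, country) affiliations (hypothetical input format).
    """
    pair_counts = Counter()
    for affiliations in papers:
        # A set makes the count binary: a country appears once per paper,
        # however many of its institutions are involved.
        countries = sorted({country for _institution, country in affiliations})
        for pair in combinations(countries, 2):
            pair_counts[pair] += 1
    return pair_counts


papers = [
    [("Univ A", "FR"), ("Univ B", "FR"), ("Univ C", "DE")],
    [("Univ A", "FR"), ("Univ D", "JP"), ("Univ C", "DE")],
    [("Univ A", "FR"), ("Univ B", "FR")],  # same country only: no pair
]
collaborations = country_collaborations(papers)
```

In the first paper two French institutions take part, yet the paper adds only one collaboration between France and Germany.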

Types of scientific publications

Scopus classifies scientific publications into the following types depending on publication outlets: journal article, conference paper, book chapter, review paper, letter, note, survey, article in press, erratum, data paper, editorial, book, abstract report, and unspecified. Given their low volumes, the following publication types were aggregated under “Other”: letter, note, survey, article in press, erratum, data paper, editorial, book, abstract report, and unspecified.
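The aggregation into “Other” amounts to a simple mapping, sketched here. The function name and the lower-case normalisation of type labels are assumptions made for illustration.

```python
# Publication types aggregated under "Other", as listed above.
OTHER_TYPES = {
    "letter", "note", "survey", "article in press", "erratum",
    "data paper", "editorial", "book", "abstract report", "unspecified",
}


def display_type(scopus_type):
    """Return the label to display for a Scopus publication type."""
    normalised = scopus_type.strip().lower()
    return "Other" if normalised in OTHER_TYPES else normalised


shown = [display_type(t) for t in ("Journal article", "Erratum", "conference paper")]
```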

References

OECD and SCImago Research Group (CSIC) (2016), Compendium of Bibliometric Science Indicators, OECD Publishing, Paris, http://oe.cd/scientometrics.