Google Trends Data
Google Trends provides access to a largely unfiltered sample of search queries made to Google, anonymised, categorised (according to the topic of the search query) and aggregated. As such, Google Trends data measures public interest in a topic and the evolution of that interest across time, languages, and geographies.
To capture trends across various languages, the OECD.AI visualisations are based on topics provided by Google Trends. Topics are predefined thematic headings that group together alternative spellings, names in other languages, and words related to the same concept under a single label.
For any given search topic (e.g. artificial intelligence, machine learning, etc.), the Google API provides two types of related topics:
- Top topics: A list of the most popular topics – or ‘top topics’ – that are most often searched for in association with the search topic, based on correlational analysis.
- Rising topics: A list of related ‘rising topics’ with the biggest increase in search frequency over a specified time period.
Google Trends provides data from 2004 onwards. However, OECD.AI displays only data from 2010 onwards to increase the relevance of the results, since AI-related developments gained significant momentum and popularity only after 2010.
The data displayed on OECD.AI is limited to OECD and G20 countries. Caution is advised when comparing values between countries, as the overall search volume varies by country.
Three visualisations have been constructed on OECD.AI using Google Trends data: “Top trending searches”, “Top ten search topics” and “AI-related search topics”.
Data cleaning involved removing repetitive or irrelevant related topics. This included topics such as ‘Artificial’, ‘Intelligence’, ‘Machine’ and ‘Learning’, as well as topics spuriously related to acronyms (e.g. ‘Adobe Illustrator’ for ‘AI’ and ‘Neuro-Linguistic Programming’ for ‘NLP’).
Semantic similarity is applied to filter out the topics least related to the original key search topic. The filter, based on cosine similarity, excludes topics whose similarity to the key search topic falls below a threshold of 0.15.
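The similarity filter can be sketched as follows. This is a minimal illustration rather than the OECD.AI implementation: the embedding model is not specified in this note, so the three-dimensional vectors below are hypothetical, as are the function names. Treating a similarity exactly equal to 0.15 as "kept" is also an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_related_topics(key_vector, topic_vectors, threshold=0.15):
    """Keep the topics whose similarity to the key search topic reaches the threshold."""
    return [
        topic
        for topic, vector in topic_vectors.items()
        if cosine_similarity(key_vector, vector) >= threshold
    ]


# Hypothetical 3-dimensional embeddings, for illustration only.
key_topic = [1.0, 0.0, 0.0]
related = {
    "Deep learning": [0.9, 0.1, 0.0],      # similarity close to 1: kept
    "Adobe Illustrator": [0.1, 1.0, 0.0],  # similarity of about 0.10: removed
    "Chatbot": [0.5, 0.5, 0.5],            # similarity of about 0.58: kept
}
kept = filter_related_topics(key_topic, related)
print(kept)
```

In practice the vectors would come from a sentence or word embedding model, and the threshold would be applied to every related topic returned for each key search topic.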
Top trending searches by country and region over time
The “Top trending searches” visualisation presents the top rising topics related to each of the six key AI-related search topics: artificial intelligence, machine learning, natural language processing, computer vision, robotics and automation. For each key search topic and country, the interactive map highlights the related topic that experienced the most significant growth in frequency during a given year. The visualisation includes data from 2010 until the month identified in the ‘as of’ date at the bottom of the visualisation.
In sum, these ‘trending searches’ or rising topics fulfil two conditions:
- They have been commonly searched for with the selected key search topic; and
- They have experienced the most significant growth in search frequency over the selected time period. Hovering over a trending topic (represented by a coloured dot on the map) shows this percentage increase.
Google only provides trending or rising topics that have achieved ‘significant growth’, defined as an increase in search frequency of 50% or more, over the time period. For this reason, some countries may not have a rising topic for certain years.
Conversely, some rising topics show very high percentage growth. These may be new topics with few (if any) prior searches.
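The arithmetic behind the growth figures can be illustrated with a short sketch. The function names are hypothetical and the inputs are made-up search frequencies. How Google treats a topic with no prior searches is not described in this note, so the handling of a zero baseline below is an assumption.

```python
def percent_growth(previous, current):
    """Percentage change from previous to current; None if previous is 0 (growth undefined)."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def is_rising(previous, current, min_growth=50.0):
    """True if growth meets the 50% threshold described in the text."""
    growth = percent_growth(previous, current)
    if growth is None:
        # Assumption: a topic with no prior searches counts as rising if it has any now.
        return current > 0
    return growth >= min_growth


print(percent_growth(20, 30))  # established topic growing from 20 to 30
print(percent_growth(2, 90))   # new topic with very few prior searches
print(is_rising(20, 29))       # 45% growth falls short of the threshold
```

The second call shows why new topics can display very large percentages: a small baseline makes even a modest absolute increase look dramatic.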
Classification by search topic category
Zero-shot encoding is used to classify search terms into six categories: Business, Industry and Governance; Education, Training, Academia; Data Processing, Analytics, Big Data; Robotics, Automation, and Hardware; Science, Engineering, and Technology; and Software, Tools and Platforms.
For our analysis, we have chosen six categories that are relevant to data analytics and machine learning. We use a pre-trained language model to convert the search terms into high-dimensional vectors. These vectors are then mapped to category centroids using a nearest-neighbour algorithm, and the category with the closest centroid is assigned to the search term.
We assign a search term to a label if the confidence score is over 0.2. The ‘score’ represents the model’s estimated probability that a given input sequence belongs to a particular label or category.
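The classification step can be sketched as below. This is an illustration under stated assumptions, not the model used for OECD.AI: the pre-trained language model is not named in this note, so the six-dimensional vectors and one-axis-per-category centroids are hypothetical. Converting cosine similarities into scores with a softmax is also an assumption made here to obtain values that behave like probabilities.

```python
import math

CATEGORIES = [
    "Business, Industry and Governance",
    "Education, Training, Academia",
    "Data Processing, Analytics, Big Data",
    "Robotics, Automation, and Hardware",
    "Science, Engineering, and Technology",
    "Software, Tools and Platforms",
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def classify(term_vector, centroids, threshold=0.2):
    """Return (category, score); category is None when the best score is not over the threshold."""
    sims = {label: cosine(term_vector, c) for label, c in centroids.items()}
    # Assumption: a softmax over similarities stands in for the model's probability scores.
    exps = {label: math.exp(s) for label, s in sims.items()}
    total = sum(exps.values())
    scores = {label: e / total for label, e in exps.items()}
    best = max(scores, key=scores.get)  # nearest centroid = highest similarity
    return (best if scores[best] > threshold else None, scores[best])


# Hypothetical centroids: one axis per category, for illustration only.
centroids = {
    label: [1.0 if i == j else 0.0 for j in range(len(CATEGORIES))]
    for i, label in enumerate(CATEGORIES)
}

robot_arm = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # clearly aligned with one category
ambiguous = [1.0] * 6                       # equally close to every category
print(classify(robot_arm, centroids))
print(classify(ambiguous, centroids))
```

The ambiguous term scores one sixth for every category, which is below 0.2, so it is left unassigned. This illustrates the role of the threshold: it withholds a label when no category stands out.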
Top ten search topics by country over time
The “Top ten search topics” visualisation displays the evolution of the top ten related topics for a given country by year. The visualisation presents the top topics related to six key search topics: artificial intelligence, machine learning, natural language processing, computer vision, robotics, and automation.
Topics are ranked from one to ten by their indexed popularity in a given country per year. The visualisation includes data from 2010 until the month identified in the ‘as of’ date at the bottom of the visualisation.
Certain countries may have fewer than ten top related topics in a given year. This is related to the way Google Trends indexes interest in a topic, normalising all searches on all topics in the given time period and country from 0 to 100. Topics with indexed values below 1 are not provided by Google Trends, as they are not considered relevant.
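A simplified sketch shows why a country can end up with fewer than ten topics. It assumes that indexing amounts to scaling values so that the most popular topic equals 100. Google's actual normalisation is more involved and is not documented here. The function name and the input figures are hypothetical.

```python
def top_topics(raw_interest, limit=10):
    """Index topics to 0-100 relative to the most popular one, drop values below 1, rank the rest."""
    if not raw_interest:
        return []
    peak = max(raw_interest.values())
    if peak <= 0:
        return []
    indexed = {topic: 100.0 * value / peak for topic, value in raw_interest.items()}
    # Topics indexed below 1 are not reported, mirroring the behaviour described in the text.
    visible = [(topic, score) for topic, score in indexed.items() if score >= 1.0]
    visible.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (rank, topic, round(score))
        for rank, (topic, score) in enumerate(visible[:limit], start=1)
    ]


# Made-up relative interest for one country and year.
raw = {"Machine learning": 400, "Robot": 100, "Chatbot": 2}
ranking = top_topics(raw)
print(ranking)
```

Here ‘Chatbot’ is indexed at 0.5 and is dropped, so only two ranked topics remain for this country and year.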
Related search topics by country and region over time
The “Related search topics” visualisation shows a list of the top related topics per country, aggregated over the period from 2010 until the most recently updated month identified at the bottom of the visualisation. The visualisation presents the top topics related to six key search topics: artificial intelligence, machine learning, natural language processing, computer vision, robotics, and automation.
The chart on the left presents the overall list of top topics identified as frequently searched for in association with the key search topic during the time period. Topics are ranked by the total number of countries that include a given topic in their list of top topics during this time frame. The list of countries that include a top topic can be seen by hovering over the corresponding bar. Clicking on a bar will in turn filter the chart on the right to display the trends by country for the selected topic.
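The ranking used in the left-hand chart can be sketched as a simple count of countries per topic. The function name, country codes and topic lists below are hypothetical, and breaking ties alphabetically is an assumption.

```python
def rank_topics_by_country_count(top_topics_by_country):
    """Rank topics by the number of countries whose top-topic list includes them."""
    countries_by_topic = {}
    for country, topics in top_topics_by_country.items():
        for topic in set(topics):  # count each country at most once per topic
            countries_by_topic.setdefault(topic, []).append(country)
    ranked = sorted(
        countries_by_topic.items(),
        key=lambda item: (-len(item[1]), item[0]),  # most countries first, then A-Z
    )
    return [(topic, len(countries), sorted(countries)) for topic, countries in ranked]


# Made-up top-topic lists for three countries.
top_topics_by_country = {
    "FRA": ["Robot", "Deep learning"],
    "JPN": ["Robot"],
    "BRA": ["Deep learning", "Robot", "Chatbot"],
}
for topic, count, countries in rank_topics_by_country_count(top_topics_by_country):
    print(topic, count, countries)
```

The list of countries returned with each topic corresponds to what the chart displays when hovering over a bar.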
The chart on the right shows the overall popularity of each topic over the entire time frame and per country. Because each time series is normalised per country and time range, values cannot be compared between countries. Data are indexed to 100, where 100 represents the point in time when search interest in a particular topic peaked for the time range and country selected. Hovering over the trend line displays the indexed popularity value for a given month.
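The indexing can be illustrated with a minimal sketch, assuming that each series is simply rescaled so that its peak equals 100. The function name, country names and monthly figures are made up for illustration.

```python
def index_to_peak(series):
    """Rescale a monthly series so that its highest value becomes 100."""
    peak = max(series.values(), default=0)
    if peak <= 0:
        return {month: 0 for month in series}
    return {month: round(100 * value / peak) for month, value in series.items()}


# Made-up raw interest: Country A has 100 times the search volume of Country B.
raw = {
    "Country A": {"2023-01": 2000, "2023-02": 5000, "2023-03": 4000},
    "Country B": {"2023-01": 20, "2023-02": 50, "2023-03": 40},
}
indexed = {country: index_to_peak(series) for country, series in raw.items()}
for country, series in indexed.items():
    print(country, series)
```

Both countries produce the same indexed series despite very different underlying volumes. The index therefore describes how interest evolved within a country, not how it compares with interest elsewhere.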