Guidelines for Scraping or Collecting Publicly Accessible Data

May 12, 2025

In a data-driven world, access to data is key to developing digital products. The IP Committee of the GPAI Innovation & Commercialization Working Group initiated this work to identify global recommendations that apply regardless of the country. Data scraping is the process of extracting content from a website and importing it onto a computer. That content can then be analyzed or fed into an artificial intelligence algorithm. In certain instances, content scraped from public-facing websites may be protected by copyright, in which case its use will require either a license or an applicable exception.

Because there is no international copyright exception for data scraping, jurisdictions take very different approaches to the matter. For example, data scraping in the United States can be permitted under fair use, provided it meets the relevant criteria. The European Union has introduced new text and data mining exceptions to copyright.[1] Japan allows data scraping only for computerized technical analysis. In other jurisdictions, there is considerable uncertainty as to whether data scraping requires the authorization of the right holder.

Given the diversity of applicable laws, these guidelines were designed to provide general recommendations for data scraping. The first graphic is intended as a set of global guidelines explaining what one should and should not do to avoid intellectual property (IP) issues when scraping the web and training AI. The second set of graphics presents the different exceptions that may apply depending on the jurisdiction.


Disclaimer: The opinions expressed and arguments employed herein are solely those of the authors and do not necessarily reflect the official views of the OECD or its member countries. The Organisation cannot be held responsible for possible violations of copyright resulting from the posting of any written material on this website/blog.