Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Mission KI: Daseen, a Dataset Search Engine



Although large amounts of data are available, only a fraction of it is in an easily usable form. Much of the data is not recorded in a uniform manner and lacks quality descriptions, known as data profiles, which makes it harder to use. MISSION KI aims to make it easier to find suitable data and to improve data quality. As part of a project, the initiative has developed the data set search engine Daseen, which for the first time enables cross-source searches for data sets. Daseen is now available to the public as a beta version at https://www.daseen.de, free of charge and without registration.

Simplified data searches require data quality descriptions. The partners have developed the Extended Dataset Profile Service (EDPS) for data quality descriptions. The EDPS is a uniform method for indexing and cataloging data. With the EDPS, metadata, known as data profiles, can be created automatically for data sets. Specifically, this means that the new service gives data providers the ability to automatically catalog and curate data from different sources and make it findable and usable based on data profiles. Once the data has been described in this way, data users can find it manually or automatically across data spaces and data portals using the data profiles. The team has integrated the EDPS into Daseen, ensuring that the data quality of the available data sets is immediately visible. The combination of Daseen and the EDPS enables data users to obtain high-quality data that is precisely tailored to their needs.
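To make the idea of an automatically created data profile more concrete, the following is a minimal sketch in Python. It is not the EDPS implementation or its API: the function name `build_profile`, the profile fields (`row_count`, `missing_ratio`, `distinct_values`) and the sample data are all assumptions chosen for illustration. It only shows how simple quality metadata could be derived from a tabular data set without manual cataloging.

```python
import csv
import io
from typing import Any


def build_profile(csv_text: str, name: str) -> dict[str, Any]:
    """Build a small, hypothetical data profile for a CSV data set."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    profile: dict[str, Any] = {"name": name, "row_count": len(rows), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        # Count cells that are absent or contain only whitespace.
        missing = sum(1 for v in values if v is None or v.strip() == "")
        profile["columns"][col] = {
            "missing_ratio": missing / len(rows),
            "distinct_values": len({v for v in values if v and v.strip()}),
        }
    return profile


# Illustrative sample data, not taken from any real data set.
sample = "city,temperature\nBerlin,21.5\nMunich,\nBerlin,19.0\n"
profile = build_profile(sample, name="weather-sample")
```

A search engine could index profiles of this kind so that users can filter data sets by, for example, the share of missing values per column.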

The EDPS was designed to be operated locally by the data provider. Common connector solutions such as the Eclipse Data Space Connector are used for this purpose. The EDPS thus follows the compute-to-data principle: the algorithms used to create the data profiles are executed where the data is physically located, i.e., at the data provider's site. This ensures that the data does not have to be transferred in order to generate the desired metadata.
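The compute-to-data principle can be illustrated with a short Python sketch. The class `ProviderSite` and the function `numeric_profile` are hypothetical names invented for this example and do not correspond to the EDPS or to any connector API. The sketch assumes the simplest possible setup: the algorithm is handed to the place where the data lives, and only the resulting metadata is returned.

```python
import json
import statistics
from typing import Callable


class ProviderSite:
    """Hypothetical stand-in for a data provider's local environment."""

    def __init__(self, values: list[float]) -> None:
        self._values = values  # raw data stays inside this object

    def run(self, algorithm: Callable[[list[float]], dict]) -> str:
        # The algorithm is executed next to the data; only its JSON result is returned.
        return json.dumps(algorithm(self._values))


def numeric_profile(values: list[float]) -> dict:
    """Summarise a numeric column as aggregate metadata."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
    }


site = ProviderSite([3.0, 5.0, 10.0])
metadata = json.loads(site.run(numeric_profile))
```

Note that this sketch does not enforce what an algorithm may return; a real deployment would need to restrict which algorithms are allowed to run at the provider's site.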

Data providers, data users, and operators of data spaces and data portals all stand to benefit from Daseen and the EDPS, which are separate but interoperable software components. The team is now making both available as open source on GitHub to enable widespread reuse: https://github.com/Mission-KI/Dataset-Search-Engine.

About the tool

Tags:

  • data governance
  • quality
  • data
  • accessibility


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.