Higher-dimensional bias in a BERT-based disinformation classifier

Mar 15, 2024

Abstract

A normative advice commission, consisting of 6 AI experts from various disciplines, reviewed the results of the bias detection tool (BDT) applied to a self-trained BERT-based disinformation classifier on the Twitter1516 dataset. The commission believes there is a low risk of (higher-dimensional) proxy discrimination by the disinformation classifier and that the particular difference in treatment identified by the BDT can be justified, if certain conditions apply.

Results of the BDT, applied to the BERT-based disinformation classifier on the Twitter1516 dataset

Bias according to False Positive Rate (FPR). On average, users that:

  • are verified, have higher #followers, user engagement and #URLs;
  • use fewer #hashtags and write shorter tweets

have more true content classified as false (false positives). These results are based on the clusters identified as most biased according to the FPR bias metric. Cluster with highest bias (FPR): 0.08. #elements in cluster with highest bias: 249.

Bias according to False Negative Rate (FNR). On average, users that:

  • use more #hashtags and have higher sentiment score;
  • are non-verified, have fewer #followers, lower user engagement and shorter tweets

have more false content classified as true (false negatives). These results are based on the clusters identified as most biased according to the FNR bias metric. Cluster with highest bias (FNR): 0.13. #elements in cluster with highest bias: 46.
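
To make the bias metric concrete, the sketch below computes a cluster's FPR and FNR and reports the difference with the rest of the dataset, which is how the "cluster with highest bias" figures above can be read. The column names (`cluster`, `y_true`, `y_pred`), the synthetic data, and the convention that label 1 means "disinformation" are illustrative assumptions, not the BDT's actual implementation.

```python
import numpy as np
import pandas as pd

def rate_difference(df, cluster_id, kind="fpr"):
    """Bias of one cluster: its FPR (or FNR) minus the rate in the remaining data.

    kind="fpr": share of true content classified as false (y_true=0, y_pred=1).
    kind="fnr": share of false content classified as true (y_true=1, y_pred=0).
    """
    def rate(sub):
        if kind == "fpr":
            denom = (sub["y_true"] == 0).sum()
            num = ((sub["y_true"] == 0) & (sub["y_pred"] == 1)).sum()
        else:
            denom = (sub["y_true"] == 1).sum()
            num = ((sub["y_true"] == 1) & (sub["y_pred"] == 0)).sum()
        return num / denom if denom else float("nan")

    in_cluster = df["cluster"] == cluster_id
    return rate(df[in_cluster]) - rate(df[~in_cluster])

# Illustrative data: y_true = 1 means the post is disinformation.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cluster": rng.integers(0, 5, 1000),
    "y_true": rng.integers(0, 2, 1000),
    "y_pred": rng.integers(0, 2, 1000),
})

for c in sorted(df["cluster"].unique()):
    print(f"cluster {c}: FPR bias = {rate_difference(df, c, 'fpr'):+.3f}, "
          f"FNR bias = {rate_difference(df, c, 'fnr'):+.3f}")
```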

Questions raised for the normative advice commission

  1. Is there an indication that one of the statistically significant features, or a combination of the features from Slide 10, is critically linked to one or multiple protected grounds?
  2. In the context of disinformation detection, is it as harmful to classify true content as false (false positive, FP) as it is to classify false content as true (false negative, FN)?
  3. For a specific cluster of people, is it justifiable to have true content classified as false 8 percentage points more often? For a specific cluster of people, is it justifiable to have false content classified as true 13 percentage points more often?
  4. Is it justifiable that the disinformation classification algorithm is too harsh towards users with a verified profile, more #followers and higher user engagement, and too lenient towards users with a non-verified profile, fewer #followers and lower user engagement?

Answers provided by the normative advice commission

  1. The commission, comprising experts in AI and ethics, scrutinizes the features influencing a disinformation classifier's outcomes. They find no direct link to protected grounds as defined by the European Convention on Human Rights. However, they stress the importance of considering user characteristics such as verified profiles or high follower counts, which could serve as proxies for socio-economic status and potentially lead to unfair outcomes. Despite this, there's no immediate suspicion of discrimination.
  2. The commission identifies false positives (labeling true content as false) and false negatives (labeling false content as true) as harmful, though they have different impacts. False positives directly affect individual authors and undermine trust in content moderation, raising concerns about free speech. False negatives, meanwhile, contribute to the spread of disinformation and erode trust in moderation processes. The majority view is that false positives are more harmful due to their unfairness and implications for freedom of expression. Ensuring equal treatment and respecting users' freedom of expression are paramount. The commission recommends prioritizing the reduction of false negatives when human moderators analyze flagged content. They emphasize the need for effective recourse mechanisms and clear explanations for content classification decisions.
  3. While the commission doesn't find the observed discrepancies unjustified, they stress the importance of thorough assessment and monitoring. They propose proactive measures, such as setting quantitative thresholds for misclassification rates and examining cluster compositions to evaluate biases. Adequate feedback loops and accessible recourse channels are deemed essential for mitigating biases. The commission acknowledges that unequal treatment across user clusters may be justified under certain conditions. They suggest increased scrutiny for high-profile users due to their potential impact on disinformation spread, while also cautioning against unjust treatment of specific high-profile groups like journalists. Transparent communication about moderation processes and the availability of recourse mechanisms are crucial for building trust.
  4. The commission contends that social media companies should actively counteract leniency towards high-profile users, ensuring fair treatment. They argue that the observed discrepancies are less concerning than if the model showed higher leniency towards high-profile users, emphasizing the importance of purposeful decision-making, documentation, continuous learning, and understanding the root causes of disinformation spread.

This normative advice is supported by 20+ actors from the AI auditing community, including journalists, civil society organizations, NGOs, corporate data scientists and academics.

Benefits of using the tool in this use case

The benefits of the tool are multi-faceted: 1. no user data needed on protected attributes, 2. detects complex bias, 3. scalable method based on machine learning to detect algorithmic bias.

1. No user data needed on protected attributes

The BDT identifies clusters for which a binary disinformation detection algorithm is systematically misclassifying, i.e., predicting false content to be true and true content to be false. A cluster is a group of social media users sharing similar features, e.g., verified account, number of external URLs used in a social media post and average post length. The BDT makes use of unsupervised clustering and therefore does not require a priori information about existing disparities and protected attributes of users (which are often not available in practice).
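
As an illustration of this clustering step, the sketch below uses plain k-means from scikit-learn as a stand-in for the hierarchical HBAC procedure: users are clustered on non-protected features only, and each cluster is then scored by how much its false positive rate exceeds that of the remaining users. The feature names, cluster count, and synthetic data are assumptions made for the example, not the tool's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative user features; none of them are protected attributes.
rng = np.random.default_rng(42)
n = 2000
features = pd.DataFrame({
    "verified": rng.integers(0, 2, n),
    "followers": rng.lognormal(8, 2, n),
    "num_urls": rng.poisson(1.0, n),
    "num_hashtags": rng.poisson(2.0, n),
    "tweet_length": rng.normal(120, 40, n),
})
y_true = rng.integers(0, 2, n)   # 1 = post is disinformation (illustrative labels)
y_pred = rng.integers(0, 2, n)   # output of the binary classifier under audit

# Cluster users on the feature space only; labels and predictions are not used
# here, so no a priori knowledge of disparities or protected attributes is needed.
X = StandardScaler().fit_transform(features)
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Score each cluster by how much its false positive rate exceeds the rest.
def fpr(mask):
    neg = (y_true == 0) & mask
    return ((y_pred == 1) & neg).sum() / max(neg.sum(), 1)

for c in range(8):
    in_c = clusters == c
    print(f"cluster {c}: size={in_c.sum():4d}  FPR bias={fpr(in_c) - fpr(~in_c):+.3f}")
```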

2. Detects complex bias

The BDT aims to discover complex and hidden forms of bias in the disinformation classifier, which is trained on numerous content-agnostic attributes of both false and true social media posts. The BDT is specifically geared towards detecting unforeseen and higher-dimensional forms of bias, for instance bias related to whether users have a verified profile or to the number of external URLs used in a social media post. Aside from unfair biases with respect to established protected groups, such as gender, sexual orientation, and race, bias can also occur with respect to non-established and unexpected groups of people. These forms of bias are more difficult for humans to detect, especially when the unfairly treated group is defined by a high-dimensional mixture of features, e.g., a combination of a verified profile and the number of URLs used in a social media post. The BDT is based on an unsupervised clustering method, which makes it capable of detecting these complex forms of bias. It thereby tackles the difficult problem of detecting proxy discrimination that stems from unforeseen and higher-dimensional forms of bias, including intersectional forms of discrimination.
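
One way to interpret such a higher-dimensional cluster is to test, feature by feature, how the flagged cluster differs from the remaining users, for example with Welch's t-test as sketched below. The feature table, cluster assignments, and the index of the flagged cluster are all illustrative; the BDT's own reporting may differ.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative feature table with cluster assignments (e.g., from a k-means run).
rng = np.random.default_rng(1)
features = pd.DataFrame({
    "verified": rng.integers(0, 2, 500),
    "followers": rng.lognormal(8, 2, 500),
    "num_urls": rng.poisson(1.0, 500),
    "tweet_length": rng.normal(120, 40, 500),
})
clusters = rng.integers(0, 5, 500)
most_biased = 3  # index of the cluster flagged by the bias metric (illustrative)

# Welch's t-test per feature: which features differ between the flagged cluster
# and the rest? The significant ones describe the (possibly higher-dimensional)
# group of users that is treated differently.
in_c = clusters == most_biased
for col in features.columns:
    t, p = stats.ttest_ind(features.loc[in_c, col], features.loc[~in_c, col],
                           equal_var=False)
    print(f"{col:>14}: mean in cluster {features.loc[in_c, col].mean():8.2f}  "
          f"vs rest {features.loc[~in_c, col].mean():8.2f}  (p={p:.3f})")
```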

3. Scalable method based on machine learning to detect algorithmic bias

The BDT, available as a web app on Algorithm Audit's website, is scalable, user-friendly, free to use, and open source. The tool works on all sorts of binary AI classifiers and is therefore model-agnostic. This design choice was intentional, as it enables users to audit the specific classifier applied in their use case, whatever it may be. In this way, the scalable and data-driven benefits of machine learning work in tandem with the normative and context-sensitive judgment of human experts, in order to determine fair AI in a concrete way.

Shortcomings of using the tool in this use case

We elaborate on two shortcomings of the BDT: 1. noise detection and 2. hyperparameter selection.

1. Noise detection

When applying statistical clustering, such as HBAC k-means, a risk is the potential for noise detection. Noise refers to data points that do not belong to any discernible pattern or cluster. These noisy data points can mislead the BDT into creating spurious clusters or affect the accuracy of identified clusters. This can result in incorrect statistical patterns being submitted to a qualitative, human-led normative advice commission. Therefore, careful data preprocessing and methodologically robust clustering algorithms that can handle noise effectively are essential to mitigate this risk and ensure the reliability of clustering results. Both the Twitter1516 dataset used and the HBAC algorithm satisfy these robustness criteria.
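
A simple safeguard in this direction is to refuse to report clusters that are too small to be meaningful, as in the sketch below. The size threshold and the synthetic cluster assignments are assumed for illustration and are not the BDT's actual defaults.

```python
import numpy as np

# Illustrative guardrail against noise-driven clusters (assumed threshold,
# not the BDT's actual default).
MIN_CLUSTER_SIZE = 50

rng = np.random.default_rng(2)
clusters = rng.integers(0, 20, 1000)   # illustrative cluster assignments
y_true = rng.integers(0, 2, 1000)      # 1 = post is disinformation
y_pred = rng.integers(0, 2, 1000)      # classifier output under audit

def fpr(mask):
    neg = (y_true == 0) & mask
    return ((y_pred == 1) & neg).sum() / max(neg.sum(), 1)

reportable = []
for c in np.unique(clusters):
    in_c = clusters == c
    if in_c.sum() < MIN_CLUSTER_SIZE:
        continue                        # likely noise: skip tiny clusters
    reportable.append((c, in_c.sum(), fpr(in_c) - fpr(~in_c)))

# Only sufficiently large clusters are ranked and passed on for review.
for c, size, bias in sorted(reportable, key=lambda t: -t[2]):
    print(f"cluster {c}: size={size}, FPR bias={bias:+.3f}")
```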

2. Hyperparameter selection

The k-means HBAC algorithm, as used by the BDT to identify clusters, uses various hyperparameters, e.g., the number of initial clusters, the minimal splittable cluster size, and the minimal acceptable cluster size. Choices for these parameters are not always clear-cut and might affect the output of the BDT, which in turn affects the results submitted to a commission of experts for qualitative assessment. Sensitivity testing was conducted to assess the robustness of the BDT results for various parameters; see the appendix of algoprudence AA:2023:01: https://algorithmaudit.eu/algoprudence/cases/aa202302_risk-profiling-for-social-welfare-reexamination/.
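
A minimal version of such a sensitivity test is sketched below: the clustering is repeated for several values of one hyperparameter (here the number of clusters) and the headline "highest FPR bias" is recorded for each run; large swings would signal that the reported bias is an artefact of the parameter choice. The grid of values and the synthetic data are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative data: five numerical user features plus labels and predictions.
rng = np.random.default_rng(3)
X = rng.normal(size=(1500, 5))
y_true = rng.integers(0, 2, 1500)
y_pred = rng.integers(0, 2, 1500)
Xs = StandardScaler().fit_transform(X)

def max_fpr_bias(labels):
    """Highest per-cluster FPR difference versus the rest of the data."""
    def fpr(mask):
        neg = (y_true == 0) & mask
        return ((y_pred == 1) & neg).sum() / max(neg.sum(), 1)
    return max(fpr(labels == c) - fpr(labels != c) for c in np.unique(labels))

# Repeat the clustering for a (hypothetical) grid of cluster counts and
# compare the headline bias figure across runs.
for k in (4, 6, 8, 10, 12):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
    print(f"n_clusters={k:2d}: highest FPR bias = {max_fpr_bias(labels):+.3f}")
```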

Learnings or advice for using the tool in a similar context

Our joint approach combines the power of rigorous, machine learning-informed quantitative testing with the balanced judgment of human experts, in order to determine fair AI on a case-by-case basis. When applying the BDT yourself, it is important to:

  • Carefully assess whether your data structure is compatible with the current BDT processing requirements. Currently, only numerical feature columns can be processed, along with the predicted and ground-truth labels from the binary classifier (see the sketch after this list). The BDT will soon be able to process categorical data too, using k-modes rather than k-means clustering.
  • Perform sensitivity testing on the selected hyperparameters. This feature will soon be included in the web app as well.
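
As a minimal sketch of the first point, the helper below flags columns that the current BDT could not process: non-numerical feature columns and non-binary label columns. The function name, column-name defaults, and return format are hypothetical; they only illustrate the kind of check to run before uploading data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_bdt_input(df: pd.DataFrame, pred_col: str = "y_pred",
                    true_col: str = "y_true") -> list[str]:
    """Hypothetical pre-flight check: one row per item, numerical feature
    columns, plus binary predicted and ground-truth label columns."""
    problems = []
    for col in (pred_col, true_col):
        if col not in df.columns:
            problems.append(f"missing label column: {col}")
        elif not set(df[col].dropna().unique()) <= {0, 1}:
            problems.append(f"label column {col} is not binary (0/1)")
    feature_cols = df.columns.drop([c for c in (pred_col, true_col) if c in df.columns])
    for col in feature_cols:
        if not is_numeric_dtype(df[col]):
            problems.append(f"feature column {col} is not numerical "
                            "(categorical support via k-modes is planned)")
    return problems

# Example: a string-typed column is flagged before running the audit.
df = pd.DataFrame({"verified": [1, 0], "country": ["NL", "DE"],
                   "y_pred": [0, 1], "y_true": [0, 0]})
print(check_bdt_input(df))
```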

Comparison with other tools

In contrast to other tools, the BDT distinguishes itself by its unsupervised clustering approach, which makes it capable of detecting complex forms of bias. It thereby tackles the difficult problem of detecting proxy discrimination that stems from unforeseen and higher-dimensional forms of bias, including intersectional forms of discrimination.
