NoiseRank: Unsupervised Label Noise Reduction with Dependence Models
Label noise is increasingly prevalent in datasets acquired from noisy
channels. Existing approaches that detect and remove label noise generally rely
on some form of supervision, which does not scale and is itself error-prone. In this
paper, we propose NoiseRank, a method for unsupervised label noise reduction using
Markov Random Fields (MRF). We construct a dependence model to estimate the
posterior probability of an instance being incorrectly labeled given the
dataset, and rank instances based on their estimated probabilities. Our method
(1) does not require supervision from ground-truth labels or priors on the label or
noise distribution, (2) is interpretable by design, enabling transparency in
label noise removal, and (3) is agnostic to classifier architecture, optimization
framework, and content modality. These advantages enable wide applicability in
real noise settings, unlike prior methods, which are subject to one or more of these constraints.
NoiseRank improves the state of the art in classification on Food101-N (~20% noise)
and is effective on the high-noise Clothing-1M dataset (~40% noise).
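To make the "estimate, then rank" idea concrete, the following is a minimal sketch of ranking instances by an estimated probability of being mislabeled. It is not NoiseRank's actual dependence model. As a stand-in, it assumes a k-nearest-neighbor graph over feature vectors and a Potts-style pairwise MRF, scoring each instance by the local conditional probability of its observed label given its neighbors' labels. The names `noise_scores`, `rank_by_noise`, `k`, and `beta` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def knn(features, i, k):
    """Indices of the k nearest neighbors of instance i (Euclidean), excluding i.

    Ties in distance are broken by index, so results are deterministic.
    """
    dists = sorted(
        (math.dist(features[i], features[j]), j)
        for j in range(len(features))
        if j != i
    )
    return [j for _, j in dists[:k]]


def noise_scores(features, labels, k=3, beta=1.0):
    """Estimated probability that each instance's observed label is wrong.

    Assumption (not from the paper): a Potts-style MRF on a kNN graph, where
    P(y_i = c | neighbor labels) is proportional to exp(beta * #neighbors labeled c).
    """
    classes = sorted(set(labels))
    scores = []
    for i in range(len(features)):
        votes = Counter(labels[j] for j in knn(features, i, k))
        weights = {c: math.exp(beta * votes[c]) for c in classes}
        z = sum(weights.values())
        p_observed = weights[labels[i]] / z
        scores.append(1.0 - p_observed)  # probability the observed label is incorrect
    return scores


def rank_by_noise(features, labels, k=3, beta=1.0):
    """Instance indices sorted from most to least likely mislabeled."""
    s = noise_scores(features, labels, k, beta)
    return sorted(range(len(labels)), key=lambda i: s[i], reverse=True)
```

For example, with two well-separated clusters labeled `"a"` and `"b"` and one `"b"` point placed inside the `"a"` cluster, that point is ranked first. Its three nearest neighbors all carry label `"a"`, so its score is 1 - 1/(1 + e^3), roughly 0.95. No ground-truth labels or noise priors are used, only the observed labels and their dependence structure, which mirrors the unsupervised setting described above.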