Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Object detectors often perform poorly on data that differs from their
training set. Domain adaptive object detection (DAOD) methods have recently
demonstrated strong results in addressing this challenge. Unfortunately, we
identify systemic benchmarking pitfalls that call past results into question
and hamper further progress: (a) Overestimation of performance due to
underpowered baselines, (b) Inconsistent implementation practices preventing
transparent comparisons of methods, and (c) Lack of generality stemming from
outdated backbones and insufficiently diverse benchmarks. We address these problems by
introducing: (1) A unified benchmarking and implementation framework, Align and
Distill (ALDI), enabling comparison of DAOD methods and supporting future
development, (2) A fair and modern training and evaluation protocol for DAOD
that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset,
CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method,
ALDI++, that achieves state-of-the-art results by a large margin. ALDI++
outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy
Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to
outperform a fair baseline), and +2.0 AP50 on CFC Kenai to Channel. Our
framework, dataset, and state-of-the-art method offer a critical reset for DAOD
and provide a strong foundation for future research. Code and data are
available: https://github.com/justinkay/aldi and
https://github.com/visipedia/caltech-fish-counting.