VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification
One fundamental challenge of vehicle re-identification (re-id) is to learn
a robust and discriminative visual representation, given the significant
intra-class vehicle variations across different camera views. As the existing
vehicle datasets are limited in terms of training images and viewpoints, we
propose to build a unique large-scale vehicle dataset (called VehicleNet) by
harnessing four public vehicle datasets, and design a simple yet effective
two-stage progressive approach to learning more robust visual representations
from VehicleNet. The first stage of our approach is to learn the generic
representation for all domains (i.e., source vehicle datasets) by training with
the conventional classification loss. This stage relaxes the full alignment
between the training and testing domains, as it is agnostic to the target
vehicle domain. The second stage fine-tunes the trained model on the target
vehicle set alone, minimizing the distribution discrepancy between our
VehicleNet and the target domain. We discuss our proposed multi-source
dataset VehicleNet and evaluate the effectiveness of the two-stage progressive
representation learning through extensive experiments. We achieve the
state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity
Challenge, and competitive results on two other public vehicle re-id datasets,
i.e., VeRi-776 and VehicleID. We hope this new VehicleNet dataset and the
learned robust representations can pave the way for vehicle re-id in
real-world environments.
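
To make the two-stage procedure concrete, below is a minimal PyTorch sketch of the progressive representation learning described above: stage one trains an identity classifier with a conventional cross-entropy loss on the merged multi-source data, and stage two swaps in a classifier head sized for the target identities and fine-tunes on the target set alone. The identity counts, data loaders, epoch counts, and learning rates are hypothetical placeholders for illustration, not values or code from the paper.

```python
# A minimal sketch of the two-stage progressive training, assuming a
# PyTorch/torchvision setup. Identity counts, loaders, and hyperparameters
# below are hypothetical placeholders, not values from the paper.
import torch
import torch.nn as nn
import torchvision

NUM_VEHICLENET_IDS = 31805  # hypothetical: identities in the merged source set
NUM_TARGET_IDS = 333        # hypothetical: identities in the target train set
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def dummy_loader(num_ids, n=64, batch_size=32):
    # Stand-in for a real re-id DataLoader (random tensors, illustration only).
    x = torch.randn(n, 3, 224, 224)
    y = torch.randint(0, num_ids, (n,))
    return torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x, y), batch_size=batch_size)

def run_stage(model, loader, optimizer, epochs):
    # One training stage: plain cross-entropy identity classification.
    criterion = nn.CrossEntropyLoss()
    model.to(DEVICE).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

# Stage 1: learn a generic representation on the merged multi-source
# VehicleNet by classifying every source-domain identity.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_VEHICLENET_IDS)
run_stage(model, dummy_loader(NUM_VEHICLENET_IDS),
          torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9), epochs=1)

# Stage 2: fine-tune on the target set alone, swapping in a classifier head
# sized for the target identities and using a smaller learning rate.
model.fc = nn.Linear(model.fc.in_features, NUM_TARGET_IDS)
run_stage(model, dummy_loader(NUM_TARGET_IDS),
          torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9), epochs=1)
```

Replacing only the classifier head between stages keeps the backbone's generic representation intact while letting the smaller target learning rate adapt it to the target distribution.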