Learn From All: Erasing Attention Consistency for Noisy Label Facial Expression Recognition
Anomaly detection in video is a challenging computer vision problem. Due to
the lack of anomalous events at training time, anomaly detection requires the
design of learning methods without full supervision. In this paper, we approach
anomalous event detection in video through self-supervised and multi-task
learning at the object level. We first utilize a pre-trained detector to detect
objects. Then, we train a 3D convolutional neural network to produce
discriminative anomaly-specific information by jointly learning multiple proxy
tasks: three self-supervised and one based on knowledge distillation. The
self-supervised tasks are: (i) discrimination of forward/backward moving
objects (arrow of time), (ii) discrimination of objects in
consecutive/intermittent frames (motion irregularity), and (iii) reconstruction
of object-specific appearance information. The knowledge distillation task
takes into account both classification and detection information, generating
large prediction discrepancies between teacher and student models when
anomalies occur. To the best of our knowledge, we are the first to approach
anomalous event detection in video as a multi-task learning problem,
integrating multiple self-supervised and knowledge distillation proxy tasks in
a single architecture. Our lightweight architecture outperforms the
state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD
Ped2. Additionally, we perform an ablation study demonstrating the importance
of integrating self-supervised learning and normality-specific distillation in
a multi-task learning setting.
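
As an illustration of how labels for the first two self-supervised proxy tasks can be generated without manual annotation, and of how a teacher/student discrepancy can be measured, consider the minimal sketch below. It is not the implementation used in this work. The function names, the 50/50 sampling probability, the fixed-stride rule for intermittent frames, and the mean absolute difference used as the discrepancy are all assumptions made for illustration.

```python
import random


def arrow_of_time_sample(clip, rng):
    """Return (frames, label): label 0 = forward order, label 1 = reversed order.

    `clip` is any sequence of per-frame object crops (hypothetical input).
    """
    if rng.random() < 0.5:
        return list(clip), 0
    return list(reversed(clip)), 1


def motion_irregularity_sample(num_frames, center, half_len, max_skip, rng):
    """Return (frame_indices, label) for a temporal window around `center`.

    label 0 = consecutive frames (stride 1),
    label 1 = intermittent frames (assumed here: a fixed stride in [2, max_skip]).
    """
    if rng.random() < 0.5:
        stride, label = 1, 0
    else:
        stride, label = rng.randint(2, max_skip), 1
    indices = [center + k * stride for k in range(-half_len, half_len + 1)]
    if indices[0] < 0 or indices[-1] >= num_frames:
        raise ValueError("temporal window falls outside the video")
    return indices, label


def distillation_discrepancy(teacher_probs, student_probs):
    """Mean absolute difference between teacher and student predictions.

    Assumed scoring rule: a larger value means the student fails to mimic
    the teacher, which is the behaviour expected on anomalous objects.
    """
    if len(teacher_probs) != len(student_probs):
        raise ValueError("prediction vectors must have the same length")
    diffs = [abs(t - s) for t, s in zip(teacher_probs, student_probs)]
    return sum(diffs) / len(diffs)
```

Because both labels are derived from frame ordering alone, training samples can be drawn from normal videos only; for example, `motion_irregularity_sample(50, 25, 2, 4, random.Random(0))` returns five frame indices centred on frame 25 together with a binary label.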