AE-Net: Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding
In this paper, we investigate vision-based camera re-localization with neural
networks for robotics and autonomous-vehicle applications. Our solution is a
CNN-based algorithm that predicts camera pose (3D translation and 3D rotation)
directly from a single image. It also provides an uncertainty estimate of the
pose. Pose and uncertainty are learned together with a single loss function and
are fused at test time with an EKF. Furthermore, we propose a new fully
convolutional architecture, named CoordiNet, designed to embed some of the
scene geometry. Our framework outperforms comparable methods on the largest
available benchmark, the Oxford RobotCar dataset, with an average error of 8
meters where the previous best was 19 meters. We have also investigated the
performance of our method on large scenes for real-time (18 fps) vehicle
localization. In this setup, structure-based methods require a large database,
and we show that our proposal is a reliable alternative, achieving 29 cm median
error in a 1.9 km loop in a busy urban area.
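
The joint learning of pose and uncertainty, and their fusion at test time, can be illustrated with a minimal sketch. It assumes a per-axis Gaussian negative log-likelihood as the single loss, and it substitutes a linear Kalman measurement update with an identity observation model for the full EKF. Neither form is stated in the abstract, and the function names `gaussian_nll` and `kalman_update` are hypothetical.

```python
import math


def gaussian_nll(pred, log_var, target):
    """Per-axis Gaussian negative log-likelihood, constant term dropped.

    Assumption: the network outputs a log-variance per pose axis, so one
    loss term trains both the pose estimate and its uncertainty.
    """
    total = 0.0
    for p, s, t in zip(pred, log_var, target):
        # exp(-s) down-weights the residual when predicted variance is high;
        # the + s term penalises claiming high variance everywhere.
        total += 0.5 * (math.exp(-s) * (t - p) ** 2 + s)
    return total / len(pred)


def kalman_update(state, state_var, meas, meas_var):
    """Per-axis Kalman measurement update (identity observation model).

    Assumption: the network's predicted variance is used as the measurement
    noise, so uncertain predictions move the filtered state less.
    """
    new_state, new_var = [], []
    for x, p, z, r in zip(state, state_var, meas, meas_var):
        k = p / (p + r)                  # Kalman gain
        new_state.append(x + k * (z - x))
        new_var.append((1.0 - k) * p)
    return new_state, new_var


# Illustrative values only: a confident and an uncertain measurement.
confident, _ = kalman_update([0.0], [1.0], [2.0], [1.0])
uncertain, _ = kalman_update([0.0], [1.0], [2.0], [99.0])
```

In this sketch the confident measurement pulls the state halfway toward the observation, while the uncertain one barely moves it. A complete EKF would additionally include a motion model and a linearised prediction step.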