The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks
In LiDAR-based 3D object detection for autonomous driving, the ratio of object
size to input scene size is significantly smaller than in 2D detection.
Overlooking this difference, many 3D detectors directly follow
the common practice of 2D detectors, which downsample the feature maps even
after quantizing the point clouds. In this paper, we start by rethinking how
this multi-stride stereotype affects LiDAR-based 3D object detectors. Our
experiments show that the downsampling operations bring few advantages
and inevitably lose information. To remedy this issue, we propose
the Single-stride Sparse Transformer (SST), which maintains the original resolution from
the beginning to the end of the network. Armed with transformers, our method
addresses the problem of insufficient receptive field in single-stride
architectures. It also cooperates well with the sparsity of point clouds and
naturally avoids expensive computation. Finally, SST achieves
state-of-the-art results on the large-scale Waymo Open Dataset. Notably, our
method achieves strong performance (83.8 LEVEL 1 AP on the
validation split) on small-object (pedestrian) detection thanks to its
single-stride design. Code will be released at
https://github.com/TuSimple/SST
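
To make the resolution argument concrete, the following minimal sketch
estimates how many feature-map cells a small object spans along one axis at
different network strides. It is an illustration only, not part of SST: the
function names, the 1.0 m object extent, the 0.25 m voxel size, and the stride
set are all hypothetical values chosen for the example, not settings reported
in this work.

```python
def cells_covered(object_size_m, voxel_size_m, stride):
    """Approximate number of feature-map cells an object spans along one axis.

    A stride of s merges s voxels into one feature-map cell per axis, so the
    effective cell size is voxel_size_m * stride.
    """
    if object_size_m <= 0 or voxel_size_m <= 0 or stride < 1:
        raise ValueError("sizes must be positive and stride must be >= 1")
    return object_size_m / (voxel_size_m * stride)


def footprint_by_stride(object_size_m, voxel_size_m, strides=(1, 2, 4, 8)):
    """Map each stride to the object's extent in feature-map cells."""
    return {s: cells_covered(object_size_m, voxel_size_m, s) for s in strides}


if __name__ == "__main__":
    # Hypothetical numbers: an object about 1.0 m across, 0.25 m voxels.
    for stride, cells in footprint_by_stride(1.0, 0.25).items():
        print(f"stride {stride}: {cells:.2f} cells per axis")
```

Under these assumed numbers, the object spans four cells per axis at stride 1
but only half a cell at stride 8, which illustrates why a single-stride design
may help on small objects such as pedestrians.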