Exphormer: Sparse Transformers for Graphs
Quantification of behavior is critical in applications ranging from
neuroscience and veterinary medicine to animal conservation efforts. A key
first step in behavioral analysis is extracting relevant keypoints on animals,
known as pose estimation. However, reliable inference of poses currently
requires domain knowledge and manual labeling effort to build supervised
models. We present a series of technical innovations that enable a new method,
collectively called SuperAnimal, to develop and deploy deep learning models
that require no additional human labels or model training. SuperAnimal
allows video inference on over 45 species with only two global classes of
animal pose models. If the models need fine-tuning, we show that SuperAnimal
models are 10$\times$ more data-efficient and outperform prior transfer-learning-based
approaches. Moreover, we provide an unsupervised video-adaptation method to
refine keypoints in videos. We illustrate the utility of our model for
behavioral classification in mice and gait analysis in horses. Together,
these contributions offer a data-efficient solution to animal pose estimation
for downstream behavioral analysis.