Open-Set Likelihood Maximization for Few-Shot Learning
We tackle the Few-Shot Open-Set Recognition (FSOSR) problem, i.e. classifying
instances among a set of classes for which we only have a few labeled samples,
while simultaneously detecting instances that do not belong to any known class.
We explore the popular transductive setting, which leverages the unlabeled
query instances at inference. Motivated by the observation that existing
transductive methods perform poorly in open-set scenarios, we propose a
generalization of the maximum likelihood principle, in which latent scores
down-weighting the influence of potential outliers are introduced alongside
the usual parametric model. Our formulation embeds supervision constraints
from the support set and additional penalties discouraging overconfident
predictions on the query set.
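As a rough schematic (our notation, not necessarily the paper's exact
objective), let $\theta$ denote the parametric model, $\xi_q \in [0,1]$ the
latent inlierness score of query sample $x_q$, and $\mathbb{S}$, $\mathbb{Q}$
the support and query sets; the weighted-likelihood objective then takes a
form such as
\[
\max_{\theta,\,\xi}\;\;
\sum_{i \in \mathbb{S}} \log p(x_i, y_i \,;\, \theta)
\;+\; \sum_{q \in \mathbb{Q}} \xi_q \,\log p(x_q \,;\, \theta)
\;-\; \lambda\, \mathcal{R}(\xi, \theta),
\]
where $\mathcal{R}$ gathers the penalties discouraging overconfident query
predictions, and a low $\xi_q$ flags $x_q$ as a potential outlier.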
We proceed with a block-coordinate descent, alternately optimizing the latent
scores and the parametric model so that each benefits from the other. We call
the resulting formulation \textit{Open-Set Likelihood Optimization} (OSLO).
OSLO is interpretable and fully modular; it can be seamlessly applied on top
of any pre-trained model. Through extensive
experiments, we show that our method surpasses existing inductive and
transductive methods on both aspects of open-set recognition, namely inlier
classification and outlier detection.
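To make the alternating scheme concrete, below is a minimal NumPy sketch of a
block-coordinate descent of this kind, assuming a simple centroid-based
Gaussian model; the specific update rules (sigmoid inlierness scores,
score-weighted centroid re-estimation) and all names are illustrative
placeholders rather than the paper's exact closed-form updates.
\begin{verbatim}
import numpy as np

def softmax(z, axis=1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def open_set_bcd(support_x, support_y, query_x, n_classes,
                 n_iters=10, temperature=1.0):
    # Initialize class centroids from the labeled support set.
    centroids = np.stack([support_x[support_y == k].mean(axis=0)
                          for k in range(n_classes)])
    inlier_scores = np.ones(len(query_x))  # initially trust every query

    for _ in range(n_iters):
        # Negative squared distances serve as class log-likelihood proxies.
        logits = -((query_x[:, None, :] - centroids[None]) ** 2).sum(-1)
        probs = softmax(logits / temperature, axis=1)

        # Block (a): update latent scores -- queries that are far from
        # every centroid get down-weighted (mapped to [0, 1] by a sigmoid).
        best = logits.max(axis=1)
        inlier_scores = 1.0 / (1.0 + np.exp(-(best - best.mean())))

        # Block (b): update the model -- re-estimate each centroid from the
        # support samples plus the queries, weighted by inlierness.
        for k in range(n_classes):
            w = inlier_scores * probs[:, k]
            num = support_x[support_y == k].sum(axis=0) + w @ query_x
            den = (support_y == k).sum() + w.sum()
            centroids[k] = num / den

    # Final soft predictions for the queries under the refined model.
    logits = -((query_x[:, None, :] - centroids[None]) ** 2).sum(-1)
    return softmax(logits / temperature, axis=1), inlier_scores
\end{verbatim}
In this sketch, thresholding \texttt{inlier\_scores} yields outlier
detection, while the soft predictions handle inlier classification, mirroring
the two aspects of open-set recognition evaluated above.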