UPANets: Learning from the Universal Pixel Attention Networks
Diffusion frameworks have achieved comparable performance with previous
state-of-the-art image generation models. Researchers are curious about its
variants in discriminative tasks because of its powerful noise-to-image
denoising pipeline. This paper proposes DiffusionInst, a novel framework that
represents instances as instance-aware filters and formulates instance
segmentation as a noise-to-filter denoising process. The model is trained to
reverse the noisy groundtruth without any inductive bias from RPN. During
inference, it takes a randomly generated filter as input and outputs mask in
one-step or multi-step denoising. Extensive experimental results on COCO and
LVIS show that DiffusionInst achieves competitive performance compared to
existing instance segmentation models with various backbones, such as ResNet
and Swin Transformers. We hope our work could serve as a strong baseline, which
could inspire designing more efficient diffusion frameworks for challenging
discriminative tasks. Our code is available in
https://github.com/chenhaoxing/DiffusionInst.