The devil is in the labels: Semantic segmentation from sentences
In image classification, networks built on skip connections and dense connections have
dominated most leaderboards. Recently, following the success of
multi-head attention in natural language processing, the field has turned to
either Transformer-like models or hybrid CNNs with attention.
However, the former need tremendous resources to train, while the latter strike
a better balance between performance and training cost. In this work, to make CNNs handle global
and local information, we propose UPANets, which combine channel-wise attention
with a hybrid skip-dense-connection structure. In addition, the extreme-connection
structure makes UPANets robust, with a smoother loss landscape. In experiments,
UPANets surpassed most well-known and widely used state-of-the-art models, with an accuracy of
96.47% on CIFAR-10, 80.29% on CIFAR-100, and 67.67% on Tiny ImageNet. Most
importantly, these results come with high parameter efficiency and were obtained by
training on only one consumer-grade GPU. We share the implementation code of UPANets at
https://github.com/hanktseng131415go/UPANets.
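
As a rough illustration of how channel-wise attention can be combined with skip and dense connections, the following NumPy sketch may help. It is not the UPANets layer: the attention here is a generic squeeze-and-excitation-style gate swapped in as a stand-in, and the names `channel_attention`, `hybrid_block`, and the weight matrix `w` are hypothetical. The gate summarizes each channel over the whole image (global information), rescales the local feature map with it, adds the result back to the input (skip connection), and then concatenates it with the input along the channel axis (dense connection).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w):
    """Generic channel gate (assumption: not the layer used in UPANets).

    x: feature map of shape (C, H, W)
    w: hypothetical learned weights of shape (C, C)
    """
    pooled = x.mean(axis=(1, 2))      # (C,) global average per channel
    gate = sigmoid(w @ pooled)        # (C,) one weight in (0, 1) per channel
    return x * gate[:, None, None]    # rescale every channel of the local map


def hybrid_block(x, w):
    """Skip connection followed by a dense (concatenating) connection."""
    attended = channel_attention(x, w)
    skipped = x + attended                        # skip: element-wise sum, shape (C, H, W)
    return np.concatenate([x, skipped], axis=0)   # dense: channel concat, shape (2C, H, W)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))   # toy feature map with 4 channels
w = rng.standard_normal((4, 4))      # random stand-in for learned weights
out = hybrid_block(x, w)
print(out.shape)  # (8, 8, 8)
```

The output keeps the input unchanged in its first `C` channels, which is what lets later layers reuse earlier features in a densely connected design, while the remaining `C` channels carry the attention-modulated residual.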