Fully Convolutional Transformer with Local-Global Attention
Sihaeng Lee, Eojindl Yi, Janghyeon Lee, Jinsu Yoo, Honglak Lee, Seung Hwan Kim
In an attempt to transfer the success of transformers in natural language processing to computer vision tasks, vision transformers (ViTs) have recently gained attention. Performance breakthroughs have been achieved in coarse-grained tasks like classification. However, dense prediction tasks, such as detection, segmentation, and depth estimation, require additional modifications and h...