Improving attention mechanisms in transformer architecture in image restoration
N.I. Berezhnov 1, A.A. Sirota 1
1 Voronezh State University,
394018, Russia, Voronezh, Universitetskaya Square 1
DOI: 10.18287/2412-6179-CO-1393
Pages: 726-733.
Full text of article: Russian language.
Abstract:
We address the problem of improving the quality of images degraded by various kinds of noise and distortion. We solve it with transformer neural network models, which have recently shown high efficiency in computer vision tasks. We investigate the attention mechanism of transformer models and identify problems in implementing the existing approaches based on it. We propose a novel modification of the attention mechanism that aims to reduce the number of neural network parameters, and we compare the proposed transformer model with known ones. Several datasets with natural and generated distortions are considered. The networks are trained with the Edge Loss function to preserve image sharpness during noise removal. We investigate how the degree of channel information compression in the proposed attention mechanism affects image restoration quality. PSNR, SSIM, and FID metrics are used to assess the quality of the restored images and to compare the proposed model with existing neural network architectures on each dataset. The results confirm that the proposed architecture is at least not inferior to the known approaches in improving image quality while requiring fewer computing resources. The perceived quality of the restored images is shown to decrease only slightly to the naked eye as the channel information compression ratio increases within reasonable limits.
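As an illustration of one of the quality metrics named above, PSNR can be sketched in a few lines of NumPy. This is a minimal sketch of the standard definition, not the authors' evaluation code: the function name `psnr`, the 8-bit peak value of 255, and the synthetic test image are assumptions made for the example.

```python
import numpy as np


def psnr(reference: np.ndarray, restored: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    max_value is the largest possible pixel value (255 is assumed here
    for 8-bit images; use 1.0 for images scaled to [0, 1]).
    """
    if reference.shape != restored.shape:
        raise ValueError("images must have the same shape")
    # Work in float64 so that uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    # Synthetic data only: a random "clean" image plus Gaussian noise.
    rng = np.random.default_rng(0)
    clean = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
    noise = rng.normal(0.0, 10.0, size=clean.shape)
    noisy = np.clip(clean.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    print(f"PSNR of the noisy image: {psnr(clean, noisy):.2f} dB")
```

Higher PSNR means the restored image is closer to the reference in the mean-squared-error sense. SSIM and FID measure structural and distributional similarity respectively and are not covered by this sketch.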
Keywords:
image quality improvement, neural networks, transformer models, attention mechanism.
Citation:
Berezhnov NI, Sirota AA. Improving attention mechanisms in transformer architecture in image restoration. Computer Optics 2024; 48(5): 726-733. DOI: 10.18287/2412-6179-CO-1393.