DOI: 10.18287/COJ1686
Article ID: 1686
Language: English
Abstract:
A hybrid neural network architecture, SegTwice, is proposed for the semantic segmentation task. It combines the strengths of transformers and convolutional neural networks within a unified encoder-decoder framework. An original encoder network, TWICE-DA, is presented, featuring a hierarchical four-level structure. New architectural solutions that distinguish the transformer blocks from known analogs are introduced and justified: a multi-scale perception unit, a channel attention module, a deformable attention module, and a convolutional feedforward network module. To assess the feature extraction effectiveness of TWICE-DA, image classification experiments were conducted on datasets of varying complexity. TWICE-DA is shown to deliver high quality, outperforming most modern models in both accuracy and computational complexity. TWICE-DA is integrated into the semantic segmentation network by adding a lightweight MLP decoder, yielding the full SegTwice architecture. Experiments on the standard aerospace datasets LoveDA and Potsdam showed that SegTwice is competitive in accuracy with traditional models and modern transformers, outperforming them in some cases. Notably, SegTwice was trained from scratch, without pre-training on large datasets, highlighting its resilience to overfitting in limited-data scenarios.
Keywords:
computer vision, semantic segmentation, deep neural networks, convolutional neural networks, transformers, attention mechanism.
Acknowledgements:
This work was supported by the Ministry of Science and Higher Education within State assignment № 075-00444-25-00 (dated 26.12.2024).
Citation:
Otyrba RR, Sirota AA. Hybrid architecture of transformer and convolutional neural network with a multi-scale deformable attention mechanism for semantic segmentation task. Computer Optics 2026; 50(1): 1686. DOI: 10.18287/COJ1686.