Development of a neural network algorithm for recognizing text in real-scene images
V.A. Lobanova¹, Yu.A. Ivanova¹

¹National Research Tomsk Polytechnic University,
634050, Russia, Tomsk, Lenina Ave. 30

DOI: 10.18287/2412-6179-CO-1047

Pages: 790-800.

Abstract:
This paper addresses the design and implementation of a neural network algorithm for detecting text in real-scene images. Existing neural network and classical models are reviewed, and U-Net is chosen as the base model. On this basis, an algorithm for detecting text regions in images is proposed and implemented. Experiments were used to determine the following network parameters: the input image size and the number and types of the network's layers. Bilateral smoothing filters and smoothing frequency-domain filters were considered for preprocessing. The original KAIST Scene Text Database is augmented by rotating, compressing, and splitting its images. The results surpass those of other methods in terms of F-measure, reaching 0.88.
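
For readers unfamiliar with the metric, the F-measure reported above can be illustrated with a minimal sketch. It assumes a pixel-wise comparison of a predicted binary text mask against a ground-truth mask; the abstract does not state the exact evaluation protocol, so the function name `f_measure`, the mask representation (nested lists of 0/1), and the toy data are all illustrative assumptions rather than the authors' code.

```python
def f_measure(pred_mask, true_mask):
    """Pixel-wise F-measure for two binary masks of equal shape.

    Masks are nested lists of 0/1 values (an assumption made for this sketch).
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # text pixel correctly detected
            elif p and not t:
                fp += 1      # background pixel marked as text
            elif t and not p:
                fn += 1      # text pixel that was missed
    if tp == 0:
        return 0.0           # no correct detections: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]
score = f_measure(pred, true)
```

Here precision and recall are both 2/3, so the F-measure is also 2/3. A value of 0.88 therefore indicates that both precision and recall are high, since the harmonic mean is pulled toward the lower of the two.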

Keywords:
text region detection, U-Net, image segmentation, real-scene images.

Citation (Russian edition):
Lobanova, V.A. Development of a neural network algorithm for recognizing text in real-scene images / V.A. Lobanova, Yu.A. Ivanova // Computer Optics. – 2022. – Vol. 46, No. 5. – P. 790-800. – DOI: 10.18287/2412-6179-CO-1047.

Citation:
Lobanova VA, Ivanova YA. Development of software for the segmentation of text areas in real-scene images. Computer Optics 2022; 46(5): 790-800. DOI: 10.18287/2412-6179-CO-1047.

© 2009, IPSI RAS
Russia, 443001, Samara, Molodogvardeyskaya St. 151; e-mail: journal@computeroptics.ru; tel.: +7 (846) 242-41-24 (executive secretary), +7 (846) 332-56-22 (technical editor), fax: +7 (846) 332-56-20
