
Enhanced dynamic programming-based method for text line recognition in documents
Y.S. Chernyshova 1,2, K.K. Suloev 3, A.V. Sheshkus 1,2, V.V. Arlazarov 1,2

1 FRC CSC of RAS,
119333, Russia, Moscow, Vavilova 44, kor. 2;
2 Smart Engines Service LLC,
117312, Russia, Moscow, pr. 60-letiya Oktyabrya 9;
3 IITP of RAS (Kharkevich Institute),
127051, Russia, Moscow, Bolshoy Karetny per. 19, build 1


DOI: 10.18287/COJ1761

Pages: 1081-1092.

Full text of article: English language.

Abstract:
On-premise text recognition is in demand: customers want to use their smartphones to recognize bank cards for online payment, passports for filling in ticket information, and much more. Because artificial neural networks have been the main approach to text recognition over the last two decades, the resulting solutions tend to be resource-hungry and ill-suited to mobile devices. In this paper, we introduce an enhanced method for text line recognition, based on dynamic programming and a fully convolutional network, that allows this classic model to compete with much heavier architectures. The main idea is to add a special pin to the network alphabet, which makes it possible to apply dynamic programming effectively to the raw neural network output. As our main focus is the recognition of identity documents, we use the public dataset MIDV-500 and its extension MIDV-2019 as test samples. We compare our recognizer with several published models, including TrOCR, Paddle OCR, and Tesseract OCR 5, to demonstrate its superior trade-off between accuracy and performance. Our method is about 200 times faster than TrOCR and, in most cases, about 2 times faster than Paddle OCR. The accuracy of our recognizer is comparable with that of Paddle OCR on MIDV-500 and better on MIDV-2019, where it is about 2 times more accurate on images of machine-readable zones.
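
The abstract does not spell out the decoding procedure, so the following is only a generic illustration of how dynamic programming can be applied to raw per-frame network output when the alphabet contains one extra special symbol. It is a minimal Viterbi-style sketch and not the authors' algorithm. The name `decode_with_pin`, the index `PIN = 0`, and the rule that two different characters may not be adjacent without a special-symbol frame between them are all assumptions made for this example.

```python
import math

PIN = 0  # hypothetical index of the special symbol in the alphabet


def decode_with_pin(log_probs, alphabet):
    """Viterbi-style DP over frame-wise log-probabilities.

    Assumed rule (illustration only): a character frame may follow the
    same character or PIN; PIN may follow anything. Two different
    characters are never directly adjacent.
    """
    if not log_probs:
        return ""
    n_classes = len(alphabet)
    # best[k]: best score of a valid labelling of frames 0..t ending in k
    best = list(log_probs[0])
    back = []  # back[t - 1][k]: label of frame t - 1 on that best path
    for t in range(1, len(log_probs)):
        # Best non-PIN predecessor, used for transitions into PIN.
        best_char = max(range(1, n_classes), key=lambda c: best[c])
        new_best, pointers = [], []
        for k in range(n_classes):
            if k == PIN:
                prev = PIN if best[PIN] >= best[best_char] else best_char
            else:
                prev = k if best[k] >= best[PIN] else PIN
            new_best.append(best[prev] + log_probs[t][k])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Backtrack from the best final label.
    k = max(range(n_classes), key=lambda c: best[c])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    # Collapse runs of equal labels and drop PIN frames.
    chars, prev = [], None
    for k in path:
        if k != prev and k != PIN:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


def to_log(rows):
    """Convert rows of probabilities to rows of log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy alphabet: index 0 is the special symbol, then two characters.
toy_alphabet = ["|", "a", "b"]
frames = to_log([[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
print(decode_with_pin(frames, toy_alphabet))  # prints "aa"
```

In this toy setting the special-symbol frame between the two `a` frames is what lets the decoder report a doubled letter instead of merging the frames into one character. The cost of the dynamic programme is linear in the number of frames and in the alphabet size.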

Keywords:
data synthesis, fully convolutional neural networks, ID documents recognition, OCR, on-device recognition, text line recognition.

Citation:
Chernyshova YS, Suloev KK, Sheshkus AV, Arlazarov VV. Enhanced dynamic programming-based method for text line recognition in documents. Computer Optics 2025; 49(6): 1081-1092. DOI: 10.18287/COJ1761.

