
Document image analysis and recognition: a survey
V.V. Arlazarov¹,², E.I. Andreeva², K.B. Bulatov¹,², D.P. Nikolaev³, O.O. Petrova², B.I. Savelev², O.A. Slavin¹

¹Federal Research Center "Computer Sciences and Control" Russian Academy of Sciences,
117312, Moscow, Russia, prosp. 60-letiya Oktyabrya, 9;
²LLC "Smart Engines Service", 117312, Moscow, Russia, prosp. 60-letiya Oktyabrya, 9;
³Federal Publicly Funded Institution of Science, Institute for Information Transmission Problems
n.a. A.A. Kharkevich of Russian Academy of Sciences, 127051, Moscow, Russia, Bolshoy Karetny per. 19


DOI: 10.18287/2412-6179-CO-1020

Pages: 567-589.

Full text of article: English language.

Abstract:
This paper analyzes the problems of document image recognition and the existing solutions to them. Document recognition algorithms have been studied for a long time, yet the topic remains relevant and research continues, as evidenced by the large number of related publications and reviews. However, most of these works and reviews are devoted to individual recognition tasks. In this review, we consider the entire set of methods, approaches, and algorithms necessary for document recognition. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with printed and handwritten content, with a fixed template and with a flexible structure, and digitized in different ways: by scanning, photographing, or video recording. We consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. We examine the groups of methods necessary for recognizing a single page image: classical computer vision algorithms (keypoints, local feature descriptors, Fast Hough Transforms, image binarization) and modern neural network models for document boundary detection, document classification, document structure analysis (localization of text blocks and tables), extraction and recognition of the details (document fields), and post-processing of recognition results. The review describes publicly available datasets for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition are also described.

Keywords:
document recognition, image normalization, binarization, local features, segmentation, document boundary detection, artificial neural network, information extraction, document sorting, document comparison, video sequence recognition.

Citation:
Arlazarov VV, Andreeva EI, Bulatov KB, Nikolaev DP, Petrova OO, Savelev BI, Slavin OA. Document image analysis and recognition: a survey. Computer Optics 2022; 46(4): 567-589. DOI: 10.18287/2412-6179-CO-1020.

Acknowledgements:
The reported study was funded by RFBR, project number 20-17-50177. The authors thank Sc. D. Vladimir L. Arlazarov (FRC CSC RAS), Pavel Bezmaternykh (FRC CSC RAS), Elena Limonova (FRC CSC RAS), Ph. D. Dmitry Polevoy (FRC CSC RAS), Daniil Tropin (LLC "Smart Engines Service"), Yuliya Chernysheva (LLC "Smart Engines Service"), Yuliya Shemyakina (LLC "Smart Engines Service") for valuable comments and suggestions.

References:

Arlazarov V, Bulatov K, Chernov T, Arlazarov VL. MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream. Computer Optics 2019; 43(5): 818-824. DOI: 10.18287/2412-6179-2019-43-5-818-824.
Jaume G, Ekenel HK, Thiran J. FUNSD: A dataset for form understanding in noisy scanned documents. Int Conf on Document Analysis and Recognition Workshops (ICDARW) 2019; 2: 1-6. DOI: 10.1109/ICDARW.2019.10029.
Liu L, Wang Z, Qiu T, Chen Q, Lu Y, Suen CY. Document image classification: Progress over two decades. Neurocomputing 2021; 453: 223-240. DOI: 10.1016/j.neucom.2021.04.114.
Baviskar D, Ahirrao S, Potdar V, Kotecha K. Efficient automated processing of the unstructured documents using artificial intelligence: A systematic literature review and future directions. IEEE Access 2021; 9: 72894-72936. DOI: 10.1109/ACCESS.2021.3072900.
Hull JJ. Document image skew detection: Survey and annotated bibliography. In Book: Hull JJ, Taylor SL, eds. Document analysis systems II. London: World Scientific Publishing Co; 1998: 40-64. DOI: 10.1142/9789812797704_0003.
Rehman A, Saba T. Document skew estimation and correction: Analysis of techniques, common problems and possible solutions. Appl Artif Intell 2011; 25(9): 769-787. DOI: 10.1080/08839514.2011.607009.
Chen D, Luettin J, Shearer K. A survey of text detection and recognition in images and videos. Institute Dalle Molle d'Intelligence Artificielle Perceptive Research Report 2000: 00-38.
Nagy G. Twenty years of document analysis in PAMI. IEEE Trans Pattern Anal Mach Intell 2000; 22(1): 38-62. DOI: 10.1109/34.824820.
Mao S, Rosenfeld A, Kanungo T. Document structure analysis algorithms: a literature survey. Proc SPIE 2003; 5010: 197-207. DOI: 10.1117/12.476326.
Doermann D, Liang J, Li H. Progress in camera-based document image analysis. Seventh Int Conf on Document Analysis and Recognition 2003; 1: 606-616. DOI: 10.1109/ICDAR.2003.1227735.
Zanibbi R, Blostein D, Cordy J. A survey of table recognition. Int J Doc Anal Recognit 2004; 7: 1-16. DOI: 10.1007/s10032-004-0120-9.
Jung K, Kim K, Jain A. Text information extraction in images and video: A survey. Pattern Recognit 2004; 37: 977-997. DOI: 10.1016/j.patcog.2003.10.012.
Liang J, Doermann D, Li H. Camera-based analysis of text and documents: a survey. Int J Doc Anal Recognit 2005; 7: 84-104. DOI: 10.1007/s10032-004-0138-z.
Marinai S, Gori M, Soda G. Artificial neural networks for document analysis and recognition. IEEE Trans Pattern Anal Mach Intell 2005; 27(1): 23-35. DOI: 10.1109/TPAMI.2005.4.
Chen N, Blostein D. A survey of document image classification: problem statement, classifier architecture and performance evaluation. Int J Doc Anal Recognit 2007; 10: 1-16. DOI: 10.1007/s10032-006-0020-2.
Baharudin B, et al. A review of machine learning algorithms for text-documents classification. J Adv Inf Technol 2010; 1: 4-20.
Dixit U, Shirdhonkar M. A survey on document image analysis and retrieval system. Int J Cybern Inform 2015; 4: 259-270. DOI: 10.5121/ijci.2015.4225.
Eskenazi S, Gomez-Krämer P, Ogier JM. A comprehensive survey of mostly textual document segmentation algorithms since 2008. Pattern Recognit 2017; 64: 1-14.
Binmakhashen GM, Mahmoud SA. Document layout analysis: A comprehensive survey. ACM Comput Surv 2019; 52(6): 109.
Lombardi F, Marinai S. Deep learning for historical document analysis and recognition–A survey. J Imaging 2020; 6: 110. DOI: 10.3390/jimaging6100110.
Bhatt J, Hashmi KA, Afzal MZ, Stricker D. A survey of graphical page object detection with deep neural networks. Appl Sci 2021; 11(12): 5344. DOI: 10.3390/app11125344.
Doermann D, Tombre K. Handbook of document image processing and recognition. Springer Publishing Company Inc; 2014.
Liu CL, Lu Y, eds. Advances in Chinese document and text processing. World Scientific; 2017. ISBN: 978-981-3143-67-8.
Fischer A, Liwicki M, Ingold R. Handwritten historical document analysis, recognition, and retrieval – state of the art and future trends. World Scientific Publishing Co Pte Ltd; 2021.
SJR. Scimago Journal & Country Rank. Proc Int Conf on Document Analysis and Recognition (ICDAR). Source: <https://www.scimagojr.com/journalsearch.php?q=75898&tip=sid>.
Bloomberg DS, Kopec GE, Dasari L. Measuring document image skew and orientation. Proc SPIE 1995; 2422: 302-316. DOI: 10.1117/12.205832.
Steinherz T, Intrator N, Rivlin E. Skew detection via principal components analysis. Proc Fifth Int Conf on Document Analysis and Recognition. ICDAR '99 (Cat. No. PR00318) 1999: 153-156. DOI: 10.1109/ICDAR.1999.791747.
Bezmaternykh P, Nikolaev DP. A document skew detection method using fast Hough transform. Proc SPIE 2020; 11433: 114330J. DOI: 10.1117/12.2559069.
Akhter SSMN, Rege PP. Improving skew detection and correction in different document images using a deep learning approach. 2020 11th Int Conf on Computing, Communication and Networking Technologies (ICCCNT) 2020: 1-6. DOI: 10.1109/ICCCNT49239.2020.9225619.
Papandreou A, Gatos B, Louloudis G, Stamatopoulos N. ICDAR 2013 document image skew estimation contest (DISEC 2013). 2013 12th Int Conf on Document Analysis and Recognition 2013: 1444-1448. DOI: 10.1109/ICDAR.2013.291.
Fabrizio J. A precise skew estimation algorithm for document images using KNN clustering and Fourier transform. 2014 IEEE Int Conf on Image Processing (ICIP) 2014: 2585-2588. DOI: 10.1109/ICIP.2014.7025523.
Uchida S, Taira E, Sakoe H. Nonuniform slant correction using dynamic programming. Proc Sixth Int Conf on Document Analysis and Recognition 2001: 434-438. DOI: 10.1109/ICDAR.2001.953827.
Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 1979; SMC-9(1): 62-66. DOI: 10.1109/TSMC.1979.4310076.
Lu S, Su B, Tan CL. Document image binarization using background estimation and stroke edges. Int J Doc Anal Recognit 2010; 13(4): 303-314. DOI: 10.1007/s10032-010-0130-8.
Gatos B, Pratikakis I, Perantonis SJ. Adaptive degraded document image binarization. Pattern Recognit 2006; 39(3): 317-327. DOI: 10.1016/j.patcog.2005.09.010.
Ershov EI, Korchagin SA, Kokhan VV, Bezmaternykh PV. A generalization of Otsu method for linear separation of two unbalanced classes in document image binarization. Computer Optics 2021; 45(1): 66-76. DOI: 10.18287/2412-6179-CO-752.
Calvo-Zaragoza J, Gallego A-J. A selectional auto-encoder approach for document image binarization. Pattern Recognit 2019; 86: 37-47. DOI: 10.1016/j.patcog.2018.08.011.
Bezmaternykh PV, Ilin DA, Nikolaev DP. U-Net-bin: hacking the document image binarization contest. Computer Optics 2019; 43(5): 825-832. DOI: 10.18287/2412-6179-2019-43-5-825-832.
Document image binarization. Source: <https://dib.cin.ufpe.br>.
Skoryukina N, Arlazarov V, Nikolaev D. Fast method of ID documents location and type identification for mobile and server application. IEEE Int Conf on Document Analysis and Recognition (ICDAR) 2019: 850-857. DOI: 10.1109/ICDAR.2019.00141.
Challenge 1: smartphone document capture competition. Source: <https://sites.google.com/site/icdar15smartdoc/challenge-1>.
Schmid C, Mohr R. Local grayvalue invariants for image retrieval. IEEE Trans Pattern Anal Mach Intell 1997; 19(5): 530-535. DOI: 10.1109/34.589215.
Harris C, Stephens M. A combined corner and edge detector. Alvey Vision Conference 1988: 147-151. DOI: 10.5244/C.2.23.
Rosten E, Drummond T. Machine learning for high-speed corner detection. In Book: Leonardis A, Bischof H, Pinz A, eds. Computer vision – ECCV 2006. Part 1. Berlin, Heidelberg: Springer-Verlag; 2006: 430-443. DOI: 10.1007/11744023_34.
Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004; 60(2): 91-110. DOI: 10.1023/B:VISI.0000029664.99615.94.
Lepetit V, Fua P. Towards recognizing feature points using classification trees. Technical report, Swiss Federal Institute of Technology (EPFL), 2004. Source: <https://infoscience.epfl.ch/record/52666>.
Bay H, Ess A, Tuytelaars T, Van Gool L. Speeded-up robust features (SURF). Comput Vis Image Underst 2008; 110(3): 346-359. DOI: 10.1016/j.cviu.2007.09.014.
Rosin PL. Measuring corner properties. Comput Vis Image Underst 1999; 73(2): 291-307. DOI: 10.1006/cviu.1998.0719.
Leutenegger S, Chli M, Siegwart RY. BRISK: Binary robust invariant scalable keypoints. IEEE Int Conf on Computer Vision (ICCV) 2011: 2548-2555. DOI: 10.1109/ICCV.2011.6126542.
Zhang H, Wohlfeil J, Grießbach D. Extension and evaluation of the AGAST feature detector. ISPRS Ann Photogramm Remote Sens Spat Inf Sci 2016; III(4): 133-137. DOI: 10.5194/isprsannals-III-4-133-2016.
Verma R, Kaur R. Enhanced character recognition using SURF feature and neural network technique. Int J Comput Sci Inf Technol Res 2014; 5(4): 5565-5570.
Dang OB, Coustaty M, Luqman MMM, Ogier J-M. A comparison of local features for camera-based document image retrieval and spotting. Int J Doc Anal Recognit 2019; 22: 247-263. DOI: 10.1007/s10032-019-00329-w.
Lewis D, Agam G, Argamon S, Frieder O, Grossman D. Building a test collection for complex document information processing. Proc 29th Annual Int ACM SIGIR conf on Research and development in information retrieval (SIGIR '06) 2006: 665-666. DOI: 10.1145/1148170.1148307.
Bulatov K, Matalov D, Arlazarov VV. MIDV-2019: Challenges of the modern mobile-based document OCR. Proc SPIE 2019; 11433: 114332N. DOI: 10.1117/12.2558438.
University of California, San Francisco: The Legacy Tobacco Document Library (LTDL) 2007. Source: <http://legacy.library.ucsf.edu>.
Zhang Z, He L-W. Whiteboard scanning and image enhancement. Digit Signal Process 2007; 17(2): 414-432. DOI: 10.1016/j.dsp.2006.05.006.
Liu N, Wang L. Dynamic detection of an object framework in a mobile device captured image. US Patent 10134163 of November 20, 2018.
Hartl A, Reitmayr G. Rectangular target extraction for mobile augmented reality applications. The 21st Int Conf on Pattern Recognition (ICPR 2012) 2012: 81-84.
Skoryukina N, Nikolaev DP, Sheshkus A, Polevoy D. Real time rectangular document detection on mobile devices. Proc SPIE 2014; 9445: 94452A. DOI: 10.1117/12.2181377.
Tropin DV, Ilyuhin SA, Nikolaev DP, Arlazarov VV. Approach for document detection by contours and contrasts. IEEE Int Conf on Pattern Recognition (ICPR) 2020: 9689-9695. DOI: 10.1109/ICPR48806.2021.9413271.
Hua G, Liu Z, Zhang Z, Wu Y. Automatic business card scanning with a camera. IEEE Int Conf on Image Processing (ICIP) 2006: 373-376. DOI: 10.1109/ICIP.2006.312471.
Xu Y, Carlinet E, Géraud T, Najman L. Hierarchical segmentation using tree-based shape spaces. IEEE Trans Pattern Anal Mach Intell 2017; 39(3): 457-469. DOI: 10.1109/TPAMI.2016.2554550.
Attivissimo F, Giaquinto N, Scarpetta M, Spadavecchia M. An automatic reader of identity documents. IEEE Int Conf on Systems, Man and Cybernetics (SMC) 2019: 3525-3530. DOI: 10.1109/SMC.2019.8914438.
Castelblanco A, Solano J, Lopez C, Rivera E, Tengana L, Ochoa M. Machine learning techniques for identity document verification in uncontrolled environments: A case study. Springer Mexican Conference on Pattern Recognition (MCPR) 2020: 271-281. DOI: 10.1007/978-3-030-49076-8_26.
Sheshkus A, Nikolaev D, Arlazarov VL. HoughEncoder: neural network architecture for document image semantic segmentation. IEEE Int Conf on Image Processing (ICIP) 2020: 1946-1950. DOI: 10.1109/ICIP40778.2020.9191182.
Javed K, Shafait F. Real-time document localization in natural images by recursive application of a CNN. IEEE IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017: 105-110. DOI: 10.1109/ICDAR.2017.26.
das Neves RB, Felipe Verçosa L, Macêdo D, Dantas Bezerra BL, Zanchettin C. A fast fully octave convolutional neural network for document image segmentation. IEEE Int Joint Conf on Neural Networks (IJCNN) 2020: 1-6. DOI: 10.1109/IJCNN48605.2020.9206711.
Viola P, Jones M. Robust real-time object detection. Int J Comput Vis 2002; 57: 137-154.
Usilin S, Nikolaev D, Postnikov V, Schaefer G. Visual appearance based document image classification. 2010 IEEE Int Conf on Image Processing 2010: 2133-2136. DOI: 10.1109/ICIP.2010.5652024.
Roy PP, Pal U, Llados J. Seal detection and recognition: an approach for document indexing. 10th Int Conf on Document Analysis and Recognition 2009: 101-105. DOI: 10.1109/ICDAR.2009.128.
Wang Y, Zhou Y, Tang Z. Comic frame extraction via line segments combination. 13th Int Conf on Document Analysis and Recognition (ICDAR) 2015: 856-860. DOI: 10.1109/ICDAR.2015.7333883.
Povolotskiy MA, Tropin DV. Dynamic programming approach to template-based OCR. Proc SPIE 2019; 11041: 110411T. DOI: 10.1117/12.2522974.
Slavin OA. Using special text points in the recognition of documents. In Book: Kravets AG, Bolshakov AA, Shcherbakov MV, eds. Cyber-physical systems: Advances in design & modelling. Cham: Springer Nature Switzerland AG; 2020: 43-53. DOI: 10.1007/978-3-030-32579-4_4.
Shafait F, Breuel TM. The effect of border noise on the performance of projection-based page segmentation methods. IEEE Trans Pattern Anal Mach Intell 2011; 33(4): 846-851. DOI: 10.1109/TPAMI.2010.194.
Melinda L, Ghanapuram R, Bhagvati C. Document layout analysis using multigaussian fitting. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017: 747-752. DOI: 10.1109/ICDAR.2017.127.
Yi X, Gao L, Liao Y, Zhang X, Liu R, Jiang Z. CNN based page object detection in document images. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017: 230-235. DOI: 10.1109/ICDAR.2017.46.
Kosaraju SC, Masum M, Tsaku NZ, Patel P, Bayramoglu T, Modgil G, Kang M. DoT-Net: Document layout classification using texture-based CNN. Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1029-1034. DOI: 10.1109/ICDAR.2019.00168.
He D, Cohen S, Price B, Kifer D, Giles CL. Multi-scale multi-task FCN for semantic page segmentation and table detection. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017: 254-261. DOI: 10.1109/ICDAR.2017.50.
Wu Y, Wang W, Palaiahnakote S, Lu T. A robust symmetry-based method for scene/video text detection through neural network. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017: 1249-1254. DOI: 10.1109/ICDAR.2017.206.
Antonacopoulos A, Bridson D, Papadopoulos C, Pletschacher S. A realistic dataset for performance evaluation of document layout analysis. 10th Int Conf on Document Analysis and Recognition 2009: 296-300. DOI: 10.1109/ICDAR.2009.271.
Veit A, Matera T, Neumann L, Matas J, Belongie S. COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv Preprint 2016. Source: <https://arxiv.org/abs/1601.07140>.
Brunessaux S, Giroux P, Grilheres B, Manta M, Bodin M, Choukri K, Galibert O, Kahn J. The Maurdor Project: Improving automatic processing of digital documents. 11th IAPR Int Workshop on Document Analysis Systems 2014: 349-354. DOI: 10.1109/DAS.2014.58.
Soares AS, Neves RB, Bezerra BLD. BID Dataset: a challenge dataset for document processing tasks. Conf on Graphics, Patterns and Images (SIBGRAPI) 2020. DOI: 10.5753/sibgrapi.est.2020.12997.
Göbel M, Hassan T, Oro E, Orsi G. ICDAR 2013 table competition. 12th Int Conf on Document Analysis and Recognition 2013: 1449-1453. DOI: 10.1109/ICDAR.2013.292.
Gao L, Yi X, Jiang Z, Hao L, Tang Z. ICDAR 2017 competition on page object detection. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017; 1: 1417-1422. DOI: 10.1109/ICDAR.2017.231.
Gao L, et al. ICDAR 2019 competition on table detection and recognition (cTDaR). Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1510-1515. DOI: 10.1109/ICDAR.2019.00243.
Costa e Silva A, Jorge AM, Torgo L. Design of an end-to-end method to extract information from tables. Int J Doc Anal Recognit 2006; 8: 144-171. DOI: 10.1007/s10032-005-0001-x.
Shafait F, Smith R. Table detection in heterogeneous documents. 9th IAPR Int Workshop on Document Analysis Systems 2010: 65-72. DOI: 10.1145/1815330.1815339.
Zhong X, ShafieiBavani E, Yepes AJ. Image-based table recognition: data, model, and evaluation. arXiv Preprint 2019. Source: <https://arxiv.org/abs/1911.10683>.
Lewis D, Agam G, Argamon S, Frieder O, Grossman D, Heard J. Building a test collection for complex document information processing. 29th Annual Int ACM SIGIR conf on Research and development in Information Retrieval 2006: 665-666. DOI: 10.1145/1148170.1148307.
Shahab A, Shafait F, Kieninger T, Dengel A. An open approach towards the benchmarking of table structure recognition systems. 9th IAPR Int Workshop on Document Analysis Systems 2010: 113-120. DOI: 10.1145/1815330.1815345.
Fang J, Tao X, Tang Z, Qiu R, Liu Y. Dataset, ground-truth and performance metrics for table detection evaluation. 10th IAPR Int Workshop on Document Analysis Systems 2012: 445-449. DOI: 10.1109/DAS.2012.29.
Seo W, Koo HI, Cho NI. Junction-based table detection in camera-captured document images. Int J Doc Anal Recognit 2014; 18(1): 47-57. DOI: 10.1007/s10032-014-0226-7.
Siddiqui SA, Fateh IA, Rizvi STR, Dengel A, Ahmed S. DeepTabStR: Deep learning based table structure recognition. Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1403-1409. DOI: 10.1109/ICDAR.2019.00226.
Huang Z, Chen K, He J, Bai X, Karatzas D, Lu S, Jawahar CV. ICDAR 2019 competition on scanned receipt OCR and information extraction. Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1516-1520. DOI: 10.1109/ICDAR.2019.00244.
Mondal A, Lipps P, Jawahar CV. IIIT-AR-13K: A new dataset for graphical object detection in documents. In Book: Bai X, Karatzas D, Lopresti D, eds. Document analysis systems. Cham: Springer International Publishing; 2020: 216-230. DOI: 10.1007/978-3-030-57058-3_16.
Jia F, Shi C, Wang Y, Wang C, Xiao B. Grayscale-projection based optimal character segmentation for camera-captured faint text recognition. 2017 Int Conf on Document Analysis and Recognition 2017: 1301-1306. DOI: 10.1109/ICDAR.2017.214.
Roy PP, Pal U, Lladós J, Delalandre M. Multi-oriented touching text character segmentation in graphical documents using dynamic programming. Pattern Recognit 2012; 45(5): 1972-1983. DOI: 10.1016/j.patcog.2011.09.026.
Saba T, Rehman A. Effects of artificially intelligent tools on pattern recognition. Int J Mach Learn Cybern 2013; 4: 155-162. DOI: 10.1007/s13042-012-0082-z.
Chernyshova YS, Sheshkus AV, Arlazarov VV. Two-step CNN framework for text line recognition in camera-captured images. IEEE Access 2020; 8: 32587-32600. DOI: 10.1109/ACCESS.2020.2974051.
Alvear-Sandoval RF, Sancho-Gómez JL, Figueiras-Vidal AR. On improving CNNs performance: The case of MNIST. Inf Fusion 2019; 52: 106-109. DOI: 10.1016/j.inffus.2018.12.005.
Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning (Still) requires rethinking generalization. Commun ACM 2021; 64(3): 107-115. DOI: 10.1145/3446776.
Bahi E, Zatni A. Text recognition in document images obtained by a smartphone based on deep convolutional and recurrent neural network. Multimed Tools Appl 2019; 78(18): 26453-26481. DOI: 10.1007/s11042-019-07855-z.
Rubner Y, Tomasi C, Guibas LJ. The earth mover's distance as a metric for image retrieval. Int J Comput Vis 2000; 40(2): 99-121.
Elarian Y, Ahmad I, Awaida S, Al-Khatib W, Zidouri A. Arabic ligatures: Analysis and application in text recognition. 2015 13th Int Conf on Document Analysis and Recognition (ICDAR) 2015: 896-900. DOI: 10.1109/ICDAR.2015.7333891.
Ilyuhin SA, Sheshkus AV, Arlazarov VL. Recognition of images of Korean characters using embedded networks. Twelfth Int Conf on Machine Vision (ICMV 2019) 2020; 11433: 1143311. DOI: 10.1117/12.2559453.
Kišš M, Hradiš M, Kodym O. Brno mobile OCR dataset. 2019 Int Conf on Document Analysis and Recognition (ICDAR) 2019: 1352-1357. DOI: 10.1109/ICDAR.2019.00218.
Doush IA, AlKhateeb F, Gharibeh AH. Yarmouk Arabic OCR dataset. 2018 8th Int Conf on Computer Science and Information Technology (CSIT) 2018: 150-154. DOI: 10.1109/CSIT.2018.8486162.
Mathew M, Singh AK, Jawahar CV. Multilingual OCR for Indic Scripts. 2016 12th IAPR Workshop on Document Analysis Systems (DAS) 2016: 186-191. DOI: 10.1109/DAS.2016.68.
Guo C-Y, Tang YY, Liu C-S, Duan J. A Japanese OCR post-processing approach based on dictionary matching. Int Conf on Wavelet Analysis and Pattern Recognition 2013: 22-26. DOI: 10.1109/ICWAPR.2013.6599286.
Kissos I, Dershowitz N. OCR error correction using character correction and feature-based word classification. 12th IAPR Workshop on Document Analysis Systems (DAS) 2016: 198-203. DOI: 10.1109/DAS.2016.44.
Mei J, Islam A, Wu Y, Moh'd A, Milios EE. Statistical learning for OCR text correction. arXiv Preprint 2016. Source: <http://arxiv.org/abs/1611.06950>.
Bassil Y, Alwani M. OCR post-processing error correction algorithm using google online spelling suggestion. arXiv Preprint. Source: <https://arxiv.org/abs/1204.0191>.
Eutamene A, Kholladi MK, Belhadef H. Ontologies and bigram-based approach for isolated non-word errors correction in OCR system. Int J Electr Comput Eng 2015; 5(6): 1458-1467. DOI: 10.11591/ijece.v5i6.pp1458-1467.
Jean-Caurant A, Tamani N, Courboulay V, Burie JC. Lexicographical-based order for post-OCR correction of named entities. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017: 1192-1197. DOI: 10.1109/ICDAR.2017.197.
Bulatov K, Manzhikov T, Slavin O, Faradjev I, Janiszewski I. Trigram-based algorithms for OCR result correction. Proc SPIE 2017; 10341: 103410O. DOI: 10.1117/12.2268559.
Fonseca Cacho JR, Taghva K. OCR post processing using support vector machines. In Book: Arai K, Kapoor S, Bhatia R, eds. Intelligent computing. Proceedings of the 2020 computing conference. Vol 2. Cham: Springer Nature Switzerland AG; 2020: 694-713. DOI: 10.1007/978-3-030-52246-9_51.
Bouchaffra D, Govindaraju V, Srihari SN. Postprocessing of recognized strings using nonstationary markovian models. IEEE Trans Pattern Anal Mach Intell 1999; 21(10): 990-999. DOI: 10.1109/34.799906.
Saluja R, Punjabi M, Carman M, Ramakrishnan G, Chaudhuri P. Sub-word embeddings for OCR corrections in highly fusional indic languages. Int Conf on Document Analysis and Recognition (ICDAR) 2019: 160-165. DOI: 10.1109/ICDAR.2019.00034.
Llobet R, Navarro-Cerdan JR, Perez-Cortes JC, Arlandis J. OCR post-processing using weighted finite-state transducers. Int Conf on Pattern Recognition 2010: 2021-2024. DOI: 10.1109/ICPR.2010.498.
Bulatov KB, Nikolaev DP, Postnikov VV. General-purpose algorithm for text field OCR result post-processing based on validation grammars [In Russian]. Trudy Instituta Sistemnogo Analiza RAN 2015; 65(4): 68-73.
Sheshkus A, Nikolaev DP, Ingacheva A, Skoryukina N. Approach to recognition of flexible form for credit card expiration date recognition as example. Proc SPIE 2015; 9875: 98750R. DOI: 10.1117/12.2229534.
Wang K, Belongie S. Word spotting in the wild. In Book: Daniilidis K, Maragos P, Paragios N, eds. Computer vision – ECCV 2010. Berlin, Heidelberg: Springer-Verlag; 2010: 591-604. DOI: 10.1007/978-3-642-15549-9_43.
Epshtein B, Ofek E, Wexler Y. Detecting text in natural scenes with stroke width transform. 2010 IEEE Computer Society Conf on Computer Vision and Pattern Recognition 2010: 2963-2970. DOI: 10.1109/CVPR.2010.5540041.
Felzenszwalb PF, Zabih R. Dynamic programming and graph algorithms in computer vision. IEEE Trans Pattern Anal Mach Intell 2011; 33(4): 721-740. DOI: 10.1109/TPAMI.2010.135.
Rubin TN, Chambers A, Smyth P, Steyvers M. Statistical topic models for multi-label document classification. Machine Learning 2011; 88(1): 157-208. DOI: 10.1007/s10994-011-5272-5.
Vorontsov KV. Additive regularization for topic models of text collections [In Russian]. Doklady Mathematics 2014; 89(3): 301-304. DOI: 10.1134/S1064562414020185.
Chen Q, Allot A, Lu Z. Keep up with the latest coronavirus research. Nature 2020; 579(7798): 193. DOI: 10.1038/d41586-020-00694-1.
Byun Y, Lee Y. Form classification using DP matching. ACM Symposium on Applied Computing 2000; 1: 1-4. DOI: 10.1145/335603.335611.
Peng HC, Long FH, Chi ZR, Siu W-C. Document image template matching based on component block list. Pattern Recognit Lett 2001; 22: 1033-1042. DOI: 10.1016/S0167-8655(01)00049-6.
Liang J, Doermann D, Ma M, Guo J. Page classification through logical Labeling. 2002 Int Conf on Pattern Recognition 2002; 3: 477-480. DOI: 10.1109/ICPR.2002.1047980.
Afzal MZ, Kölsch A, Ahmed S, Liwicki M. Cutting the error by half: Investigation of very deep CNN and advanced training strategies for document image classification. Int Conf on Document Analysis and Recognition 2017; 1: 883-888. DOI: 10.1109/ICDAR.2017.149.
RVL-CDIP-I Dataset. Source: <https://www.kaggle.com/nbhativp/first-half-training>.
NIST Special Database 2. Source: <https://www.nist.gov/srd/nist-special-database-2>.
Tobacco-3482. Source: <https://www.kaggle.com/patrickaudriaz/tobacco3482jpg>.
Rusiñol M, Frinken V, Karatzas D, Bagdanov AD, Lladós J. Multimodal page classification in administrative document image streams. Int J Doc Anal Recognit 2014; 17: 331-341. DOI: 10.1007/s10032-014-0225-8.
Jain R, Doermann D. Localized document image change detection. 13th Int Conf on Document Analysis and Recognition (ICDAR) 2015: 786-790. DOI: 10.1109/icdar.2015.7333869.
Lopresti DP. A comparison of text-based methods for detecting duplication in scanned document databases. Inf Retr J 2001; 4: 153-173. DOI: 10.1023/A:1011471129047.
Lin Y, Li Y, Song Y, et al. Fast document image comparison in multilingual corpus without OCR. Multimed Syst 2017; 23: 315-324. DOI: 10.1007/s00530-015-0484-3.
Eglin V, Bres S. Document page similarity based on layout visual saliency: application to query by example and document classification. Seventh Int Conf on Document Analysis and Recognition 2003: 1208-1212. DOI: 10.1109/ICDAR.2003.1227849.
Liu L, Lu Y, Suen CY. Near-duplicate document image matching: A graphical perspective. Pattern Recognit 2014; 47(4): 1653-1663. DOI: 10.1016/j.patcog.2013.11.006.
Vitaladevuni S, Choi F, Prasad R, Natarajan P. Detecting near-duplicate document images using interest point matching. 21st Int Conf on Pattern Recognition (ICPR2012) 2012: 347-350.
Caprari RS. Duplicate document detection by template matching. Image Vis Comput 2000; 18(8): 633-643. DOI: 10.1016/s0262-8856(99)00086-4.
Lopresti DP. Models and algorithms for duplicate document detection. Fifth Int Conf on Document Analysis and Recognition, ICDAR '99 (Cat. No. PR00318) 1999: 297-300. DOI: 10.1109/ICDAR.1999.791783.
Ahmed AGH, Shafait F. Forgery detection based on intrinsic document contents. 11th IAPR Int Workshop on Document Analysis Systems 2014: 252-256. DOI: 10.1109/DAS.2014.26.
Beusekom J, Shafait F, Breuel TM. Document signature using intrinsic features for counterfeit detection. In Book: Srihari SN, Franke K, eds. Computational forensics. Berlin, Heidelberg: Springer-Verlag; 2008: 47-57. DOI: 10.1007/978-3-540-85303-9_5.
Sidere N, Cruz F, Coustaty M, Ogier JM. A dataset for forgery detection and spotting in document images. Seventh Int Conf on Emerging Security Technologies (EST) 2017: 26-31. DOI: 10.1109/EST.2017.8090394.
Ôn Vũ Ngoc M, Fabrizio J, Géraud T. Document detection in videos captured by smartphones using a saliency-based method. Int Conf on Document Analysis and Recognition Workshops (ICDARW) 2019: 19-24. DOI: 10.1109/ICDARW.2019.30059.
Cheng Z, Lu J, Niu Y, Pu S, Wu F, Zhou S. You only recognize once: Towards fast video text spotting. 27th ACM Int Conf on Multimedia 2019: 855-863. DOI: 10.1145/3343031.3351093.
Deudon M, Kalaitzis A, Goytom I, Arefin MdR, Lin Z, Sankaran K, Michalski V, Kahou SE, Cornebise J, Bengio Y. HighRes-net: Multi-frame super-resolution by recursive fusion. ICLR 2020 Conf. Source: <https://openreview.net/forum?id=HJxJ2h4tPr>.
Cheng Z, Lu J, Xie J, Niu Y, Pu S, Wu F. Efficient video scene text spotting: Unifying detection, tracking, and recognition. arXiv Preprint 2019. Source: <https://arxiv.org/abs/1903.03299>.
Zhang S, Li P, Meng Y, Li L, Zhou Q, Fu X. A video deblurring algorithm based on motion vector and an encorder-decoder network. IEEE Access 2019; 7: 86778-86788. DOI: 10.1109/ACCESS.2019.2923759.
Fiscus JG. A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings 1997: 347-354. DOI: 10.1109/ASRU.1997.659110.
Bulatov K, Arlazarov V, Chernov T, Slavin O, Nikolaev D. Smart IDReader: Document recognition in video stream. 14th IAPR Int Conf on Document Analysis and Recognition (ICDAR) 2017; 6: 39-44. DOI: 10.1109/ICDAR.2017.347.
Elhoushi M, Chen Z, Shafiq F, Tian YH, Li JY. DeepShift: Towards multiplication-less neural networks. arXiv Preprint 2020. Source: <https://arxiv.org/pdf/1905.13298.pdf>.
Trusov AV, Limonova EE, Slugin DG, Nikolaev DP, Arlazarov VV. Fast implementation of 4-bit convolutional neural networks for mobile devices. 2020 25th Int Conf on Pattern Recognition (ICPR) 2021: 9897-9903. DOI: 10.1109/ICPR48806.2021.9412841.
Li J, Wang Y, Liu B, Han Y, Li X-W. Simulate-the-hardware: training accurate binarized neural networks for low-precision neural accelerators. 24th Asia and South Pacific Design Automation Conf 2019: 323-328. DOI: 10.1145/3287624.3287628.
Sun X, Choi J, Chen C-Y, Wang N, Venkataramani S, Srinivasan VV, Cui X, Zhang W, Gopalakrishnan K. Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. Adv Neural Inf Process Syst 2019; 32: 4901-4909.
Phan AH, et al. Stable low-rank tensor decomposition for compression of convolutional neural network. In Book: Vedaldi A, Bischof H, Brox T, Frahm J-M, eds. Computer Vision – ECCV 2020. Part XXIX. Cham: Springer Nature Switzerland AG; 2020: 522-539. DOI: 10.1007/978-3-030-58526-6_31.

