  
Document image analysis and recognition: a survey
  V.V. Arlazarov 1,2, E.I. Andreeva 2, K.B. Bulatov 1,2, D.P. Nikolaev 3, O.O. Petrova 2, B.I. Savelev 2, O.A. Slavin 1
1 Federal Research Center "Computer Sciences and Control" of the Russian Academy of Sciences, 117312, Moscow, Russia, prosp. 60-letiya Oktyabrya, 9;
2 LLC "Smart Engines Service", 117312, Moscow, Russia, prosp. 60-letiya Oktyabrya, 9;
3 Federal Publicly Funded Institution of Science, Institute for Information Transmission Problems n.a. A.A. Kharkevich of the Russian Academy of Sciences, 127051, Moscow, Russia, Bolshoy Karetny per. 19
  
DOI: 10.18287/2412-6179-CO-1020
Pages: 567-589.
Full text of article: English language.
 
Abstract:
This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for a long time, yet the topic remains relevant and research continues, as evidenced by the large number of related publications and reviews. Most of these works, however, are devoted to individual recognition tasks. This review considers the entire set of methods, approaches, and algorithms necessary for document recognition. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with printed text and handwritten content, with a fixed template or a flexible structure, and digitized in different ways: by scanning, photographing, or video recording. We consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. We examine the groups of methods necessary for recognizing a single page image: classical computer vision algorithms (keypoints, local feature descriptors, Fast Hough Transforms, image binarization) and modern neural network models for document boundary detection, document classification, document structure analysis (localization of text blocks and tables), extraction and recognition of document details, and post-processing of recognition results. The review describes publicly available datasets for training and testing recognition algorithms, as well as methods for optimizing the performance of document image analysis and recognition.
Keywords:
document recognition, image normalization, binarization, local features, segmentation, document boundary detection, artificial neural network, information extraction, document sorting, document comparison, video sequence recognition.
Citation:
  Arlazarov VV, Andreeva EI, Bulatov KB, Nikolaev DP, Petrova OO, Savelev BI, Slavin OA. Document image analysis and recognition: a survey. Computer Optics 2022; 46(4): 567-589. DOI: 10.18287/2412-6179-CO-1020.
Acknowledgements:
The reported study was funded by RFBR, project number 20-17-50177. The authors thank Sc.D. Vladimir L. Arlazarov (FRC CSC RAS), Pavel Bezmaternykh (FRC CSC RAS), Elena Limonova (FRC CSC RAS), Ph.D. Dmitry Polevoy (FRC CSC RAS), Daniil Tropin (LLC "Smart Engines Service"), Yuliya Chernysheva (LLC "Smart Engines Service"), and Yuliya Shemyakina (LLC "Smart Engines Service") for valuable comments and suggestions.