
Fast localization and rectification of documents folded into thirds
A. Ershov 1,2, D. Tropin 1,3, D. Nikolaev 1,3

1 Smart Engines Service LLC,
Prospekt 60-letiia Oktiabria 9, Moscow, 117312, Russia;
2 Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute),
Bolshoy Karetny per. 19, build. 1, Moscow, 127051, Russia;
3 Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences,
Prospekt 60-letiia Oktiabria 9, Moscow, 119333, Russia


DOI: 10.18287/COJ1755

Pages: 1049-1060

Article language: English.

Abstract:
The ubiquity of smartphones has made camera-captured document images as common an input to modern document recognition systems as scanned ones. A document captured by a smartphone camera may appear mechanically distorted in the image, which creates the need for an image rectification step. This paper considers a particular case of document image distortion. Specifically, a business document sent by post may need to be folded to fit the envelope. Once the document is taken out of the envelope and unfolded, its geometric shape is distorted in a very particular pattern. Since the most popular envelope formats in Europe and America require the document to be folded into thirds, this is the case considered here. We propose a novel content-independent, model-based algorithm for the localization and geometric rectification of documents folded into thirds. Our algorithm outperforms current state-of-the-art (SOTA) rectification methods on the recently published FDI dataset by the key rectification accuracy metrics (AD and CER) and is able to rectify documents held in hand. Moreover, it can be executed on a mobile CPU with a reasonable execution time: it takes only about 17 ms to localize a document and about 110 ms to projectively rectify it. This makes it possible to embed the proposed algorithm into document recognition systems designed for on-device acquisition.
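
The abstract names CER as one of the accuracy metrics but does not define it. A minimal sketch follows, assuming CER denotes the character error rate in its common form: the edit distance between the text recognized on the rectified image and the reference text, divided by the reference length. The function names `levenshtein` and `cer` are illustrative and are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(recognized: str, reference: str) -> float:
    """Character error rate (assumed definition): edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(recognized, reference) / len(reference)
```

Under this assumed definition a lower CER means the rectified image is easier for an OCR engine to read, and the value can exceed 1 when the recognized text is much longer than the reference.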

Keywords:
Folded documents, document rectification, document unwarping, on-device acquisition.

Citation:
Ershov A., Tropin D., Nikolaev D. Fast localization and rectification of documents folded into thirds. Computer Optics 2025; 49(6): 1049-1060. DOI: 10.18287/COJ1755.

