
Unfolder: fast localization and image rectification of a document with a crease from folding in half
A.M. Ershov 1,2, D.V. Tropin 1,3, E.E. Limonova 1,3, D.P. Nikolaev 1,2, V.V. Arlazarov 1,3

1 Smart Engines Service LLC, 117312, Moscow, Russia, Prospekt 60-letiia Oktiabria 9;
2 Institute for Information Transmission Problems of RAS (Kharkevich Institute), 127051, Moscow, Russia, Bolshoy Karetny per. 19, build 1;
3 Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 119333, Moscow, Russia, Prospekt 60-letiia Oktiabria 9


DOI: 10.18287/2412-6179-CO-1406

Pages: 542-553.

Article language: English.

Abstract:
Presenting folded documents is not uncommon in modern society. Digitizing such documents with a smartphone camera can be tricky, since a crease can divide the document contents into separate planes. To unfold the document, one could hold it by the edges, but this may obscure part of it in the captured image. While there are many geometric rectification methods, they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of the crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy for folded documents (2, 3, 4, and 8 folds). The dataset includes 1600 images captured with the document either placed on a table or held in hand. The Unfolder algorithm achieved a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime of Unfolder was only 0.25 s/image on an iPhone XR.

Keywords:
folded documents, image rectification, dewarping, on-device acquisition, open dataset.

Citation:
Ershov AM, Tropin DV, Limonova EE, Nikolaev DP, Arlazarov VV. Unfolder: fast localization and image rectification of a document with a crease from folding in half. Computer Optics 2024; 48(4): 542-553. DOI: 10.18287/2412-6179-CO-1406.

