  
Unfolder: fast localization and image rectification of a document with a crease from folding in half
 A.M. Ershov 1,2, D.V. Tropin 1,3, E.E. Limonova 1,3, D.P. Nikolaev 1,2, V.V. Arlazarov 1,3
 1 Smart Engines Service LLC, 117312, Moscow, Russia, Prospekt 60-letiia Oktiabria 9;
     2 Institute for Information Transmission Problems of RAS (Kharkevich Institute),
     127051, Moscow, Russia, Bolshoy Karetny per. 19, build 1;
     3 Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences,
     119333, Moscow, Russia, Prospekt 60-letiia Oktiabria 9
 
 PDF, 4836 kB
DOI: 10.18287/2412-6179-CO-1406
Pages: 542-553.
Article language: English.
 
Abstract:
 
Folded documents are commonly presented in everyday practice. Digitizing such documents with a smartphone camera can be tricky, since a crease can divide the document contents into separate planes. To unfold the document, one could hold its edges, but this may obscure part of the document in the captured image. Many geometric rectification methods exist, but they were usually developed for arbitrary bends and folds. We consider such algorithms and propose a novel approach, Unfolder, developed specifically for images of documents with a crease from folding in half. Unfolder is robust to projective distortions of the document image and does not fragment the image in the vicinity of the crease after rectification. A new Folded Document Images dataset was created to investigate the rectification accuracy of folded (2, 3, 4, and 8 folds) documents. The dataset includes 1600 images captured with the document either placed on a table or held in hand. The Unfolder algorithm achieved a recognition error rate of 0.33, which is better than the advanced neural network methods DocTr (0.44) and DewarpNet (0.57). The average runtime of Unfolder was only 0.25 s/image on an iPhone XR.
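The abstract does not state how the recognition error rate is defined. As an illustration only, the sketch below assumes one common choice for OCR evaluation: the Levenshtein (edit) distance between the recognized text and the ground-truth text, divided by the ground-truth length. The function names `levenshtein` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to bound row length.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(recognized: str, reference: str) -> float:
    """Edit distance normalized by reference length (an assumed metric,
    not necessarily the one used by the authors)."""
    if not reference:
        return 0.0 if not recognized else 1.0
    return levenshtein(recognized, reference) / len(reference)


if __name__ == "__main__":
    # One missing character in a 15-character reference string.
    print(character_error_rate("folded documnt", "folded document"))
```

Under this assumed definition, lower values are better, which matches the ordering reported in the abstract (0.33 for Unfolder against 0.44 and 0.57 for the neural baselines).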
Keywords:
folded documents, image rectification, dewarping, on-device acquisition, open dataset.
Citation:
Ershov AM, Tropin DV, Limonova EE, Nikolaev DP, Arlazarov VV. Unfolder: fast localization and image rectification of a document with a crease from folding in half. Computer Optics 2024; 48(4): 542-553. DOI: 10.18287/2412-6179-CO-1406.
  