
Model-driven approach to creating ID document templates for localization and classification based on a single image
D.P. Matalov 1,2, V.V. Arlazarov 1,2

1 Federal Research Center "Computer Science and Control" of Russian Academy of Sciences,
Prospekt 60-Letiya Oktyabrya 9, Moscow, 119333, Russia;
2 Smart Engines Service LLC,
Prospekt 60-Letiya Oktyabrya 9, Moscow, 117312, Russia


DOI: 10.18287/2412-6179-CO-1762

Pages: 1148-1155.

Article language: English.

Abstract:
ID document recognition systems are already deeply integrated into human activity, and the pace of integration is only increasing. The first and most fundamental tasks of such systems are document image localization and classification. Template matching-based approaches have become widely used in this field: they offer industrial-grade precision, require minimal training data, and provide real-time performance on mobile devices. However, these methods scale poorly: every supported document type contributes a set of local features that must be stored and processed, which increases the required computing resources, and given the number of document types supported by modern industrial recognition systems, such approaches can become impractical. To mitigate this drawback, we propose a method for selecting a subset of the most "stable" keypoints. To estimate keypoint stability, we synthesize a dataset of images containing distortions relevant to photographing hand-held documents with a smartphone camera in uncontrolled lighting conditions. The experiments are performed on the well-known MIDV datasets, which were designed to benchmark modern ID document recognition. They show that the proposed method improves ID document detection performance when thousands of document types must be supported on limited computing resources.
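The stability-based keypoint selection described above can be sketched as follows. This is a minimal, purely illustrative Python outline, not the authors' implementation: the function names (`stability_scores`, `select_stable`), the distance threshold, and the toy detector/distortion stubs in the usage example are all assumptions; a real pipeline would detect keypoints (e.g. with a feature detector) in synthetically distorted renderings of the template and keep those re-detected most often near their projected positions.

```python
import math
import random

def project(pt, H):
    # Apply a 3x3 homography (given row-major as a 9-tuple) to a 2D point.
    x, y = pt
    d = H[6] * x + H[7] * y + H[8]
    return ((H[0] * x + H[1] * y + H[2]) / d,
            (H[3] * x + H[4] * y + H[5]) / d)

def stability_scores(template_kps, detect, distort, n_synth=200, eps=3.0):
    """For each template keypoint, the fraction of synthetic distortions in
    which some keypoint is re-detected within `eps` pixels of its projected
    position. `distort()` yields (homography, distorted_image)."""
    hits = [0] * len(template_kps)
    for _ in range(n_synth):
        H, image = distort()
        detected = detect(image)
        for i, kp in enumerate(template_kps):
            px, py = project(kp, H)
            if any(math.hypot(dx - px, dy - py) <= eps for dx, dy in detected):
                hits[i] += 1
    return [h / n_synth for h in hits]

def select_stable(template_kps, scores, k):
    # Keep the k keypoints with the highest stability scores.
    order = sorted(range(len(template_kps)), key=lambda i: scores[i], reverse=True)
    return [template_kps[i] for i in order[:k]]

# Toy usage: keypoints 0-4 survive 95% of distortions, keypoints 5-9 only 20%.
# The "image" here is simply the list of keypoints that survived a distortion.
random.seed(0)
kps = [(10.0 * i, 5.0 * i) for i in range(10)]
survival = [0.95] * 5 + [0.20] * 5
identity = (1, 0, 0, 0, 1, 0, 0, 0, 1)

def distort():
    survivors = [kp for kp, p in zip(kps, survival) if random.random() < p]
    return identity, survivors

scores = stability_scores(kps, detect=lambda img: img, distort=distort)
stable = select_stable(kps, scores, k=5)
```

With this toy setup the five robust keypoints are selected; in practice `distort()` would render the template under random projective, blur, and lighting perturbations, and `detect()` would be the same detector used at recognition time.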

Keywords:
one-shot learning, document recognition, document processing, image augmentation, template matching, local features.

Citation:
Matalov DP, Arlazarov VV. Model-driven approach to creating ID document templates for localization and classification based on a single image. Computer Optics 2025; 49(6): 1148-1155. DOI: 10.18287/2412-6179-CO-1762.


© 2009, IPSI RAS
Russia, 443001, Samara, Molodogvardeyskaya St. 151; e-mail: journal@computeroptics.ru; tel.: +7 (846) 242-41-24 (executive secretary), +7 (846) 332-56-22 (technical editor); fax: +7 (846) 332-56-20