
Tiny CNN for feature point description for document analysis: approach and dataset
A. Sheshkus 1,2,3, A. Chirvonaya 3,4, V.L. Arlazarov 2,3

Moscow Institute for Physics and Technology, 141701, Russia, Moscow Region, Dolgoprudny, Institutskiy per., 9;
Institute for Systems Analysis, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 117312, Moscow, Russia, pr. 60-letiya Oktyabrya, 9;
Smart Engines Service LLC, 117312, Moscow, Russia, pr. 60-letiya Oktyabrya, 9;
National University of Science and Technology "MISIS", 119049, Moscow, Russia, Leninskiy prospect, 4


DOI: 10.18287/2412-6179-CO-1016

Pages: 429-435.

Full text of article: English language.

In this paper, we study the problem of feature point description in the context of document analysis and template matching. Our study shows that the task requires specific training data, especially when training a lightweight neural network intended for devices with limited computational resources. We construct and provide a dataset of photographed and synthetically generated images, together with a method for generating training patches from it. We demonstrate the effectiveness of this data by training a lightweight neural network and showing how it performs in matching both general and document patches. The network was trained on the provided dataset and, for comparison, on the HPatches training dataset. For testing, we solve the HPatches testing framework tasks and a template matching task on two publicly available datasets of documents photographed against complex backgrounds: MIDV-500 and MIDV-2019.
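
As a rough illustration of what patch matching with learned descriptors involves, the sketch below assigns each query descriptor to its nearest reference descriptor by Euclidean distance. It is not the authors' implementation: the function names, the three-dimensional toy vectors, and the choice of Euclidean distance are all assumptions made for the example, and a real network would produce longer descriptors.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_patches(query_descriptors, reference_descriptors):
    """For each query descriptor, return the index of the closest reference descriptor."""
    matches = []
    for q in query_descriptors:
        distances = [l2_distance(q, r) for r in reference_descriptors]
        matches.append(distances.index(min(distances)))
    return matches


# Hypothetical toy descriptors (assumption: values chosen only for illustration).
reference = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.8]]

matches = match_patches(query, reference)
print(matches)
```

In a template matching setting, the reference descriptors would plausibly come from a document template and the query descriptors from a photographed frame, with the resulting index pairs passed on to a geometric verification step.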

feature points description, metrics learning, training dataset.

Sheshkus A, Chirvonaya A, Arlazarov VL. Tiny CNN for feature point description for document analysis: approach and dataset. Computer Optics 2022; 46(3): 429-435. DOI: 10.18287/2412-6179-CO-1016.

This work was supported by the Russian Foundation for Basic Research (projects 18-29-26033 and 19-29-09064).


