
MIDV-2020: a comprehensive benchmark dataset for identity document analysis
K.B. Bulatov¹,², E.V. Emelianova²,³, D.V. Tropin¹,²,⁴, N.S. Skoryukina¹,², Y.S. Chernyshova¹,², A.V. Sheshkus¹,²,
S.A. Usilin¹,², Z. Ming⁵, J.-C. Burie⁵, M.M. Luqman⁵, V.V. Arlazarov¹,²

¹Federal Research Center «Computer Science and Control» of the Russian Academy of Sciences, Moscow, Russia;
²Smart Engines Service LLC, Moscow, Russia;
³National University of Science and Technology «MISiS», Moscow, Russia;
⁴Moscow Institute of Physics and Technology (State University), Moscow, Russia;
⁵L3i Laboratory, La Rochelle University, La Rochelle, France


DOI: 10.18287/2412-6179-CO-1006

Pages: 252-270.

Article language: English.

Abstract:
Identity document recognition is an important sub-field of document analysis. It covers robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of a captured identity document. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, because the subject matter is protected by security requirements. The few available datasets of identity documents lack diversity in document types, capturing conditions, or variability of document field values. In this paper, we present MIDV-2020, a dataset which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and a unique artificially generated face, with rich annotation. The dataset contains 72409 annotated images in total, making it the largest publicly available identity document dataset as of the date of publication. We describe the structure of the dataset, its content and annotations, and present baseline experimental results to serve as a basis for future research. For the task of document location and identification, content-independent, feature-based, and semantic segmentation-based methods were evaluated. For the task of document text field recognition, the Tesseract system was evaluated at the field and character levels, with results grouped by field alphabet and document type. For the task of face detection, the performance of a method based on Multi-Task Cascaded Convolutional Neural Networks was evaluated separately for the different image input modes. The baseline evaluations show that existing methods of identity document analysis have a lot of room for improvement given modern challenges.
We believe that the proposed dataset will prove invaluable for the advancement of the field of document analysis and recognition.

Keywords:
document analysis, document recognition, identity documents, open data, video recognition, document location, text recognition, face detection.

Citation:
Bulatov KB, Emelianova EV, Tropin DV, Skoryukina NS, Chernyshova YS, Sheshkus AV, Usilin SA, Ming Z, Burie JC, Luqman MM, Arlazarov VV. MIDV-2020: a comprehensive benchmark dataset for identity document analysis. Computer Optics 2022; 46(2): 252-270. DOI: 10.18287/2412-6179-CO-1006.

Acknowledgements
This work is partially supported by the Russian Foundation for Basic Research (projects 19-29-09066 and 19-29-09092). All source images for the MIDV-2020 dataset were obtained from Wikimedia Commons. Author attributions for each source image are listed in the original MIDV-500 description table (ftp://smartengines.com/midv-500/documents.pdf). Face images by Generated Photos (https://generated.photos).

