
MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream

V.V. Arlazarov (1,2,3), K. Bulatov (1,2,3), T. Chernov (3), V.L. Arlazarov (1,2,3)

(1) Moscow Institute of Physics and Technology (State University), Moscow, Russia
(2) Institute for Systems Analysis, FRC CSC RAS, Moscow, Russia
(3) LLC "Smart Engines Service", Moscow, Russia


DOI: 10.18287/2412-6179-2019-43-5-818-824

Pages: 818-824.

Full text of article: English language.

Much research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but a more comprehensive scientific and technical approach to identity document recognition requires more specialized datasets. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), which consists of 500 video clips for 50 different identity document types with ground truth, enabling research on a wide range of document analysis problems. The paper presents the characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Because identity documents are sensitive, as they contain personal data, all source document images used in MIDV-500 are either in the public domain or distributed under public copyright licenses.
The main goal of this paper is to present the dataset. In addition, as a baseline, we report evaluation results on this dataset for existing methods of face detection, text line recognition, and document data extraction.

Keywords: document analysis and recognition, dataset, identity documents, video stream recognition.

Arlazarov VV, Bulatov K, Chernov T, Arlazarov VL. MIDV-500: a dataset for identity document analysis and recognition on mobile devices in video stream. Computer Optics 2019, 43(5): 818-824. DOI: 10.18287/2412-6179-2019-43-5-818-824.

This work was partially supported by the Russian Foundation for Basic Research (projects 17-29-03170 and 17-29-03370). All source images for the MIDV-500 dataset were obtained from Wikimedia Commons. Author attributions for each source image are listed in the description table at ftp://smartengines.com/midv-500/documents.pdf.



