
Algorithm for choosing the best frame in a video stream in the task of identity document recognition
M.A. Aliev¹,⁴, I.A. Kunina¹,²,³, A.V. Kazbekov¹, V.L. Arlazarov⁴

¹Smart Engines Service LLC, Moscow, Russia,

²Institute for Information Transmission Problems (Kharkevich Institute) RAS, Moscow, Russia,

³Moscow Institute of Physics and Technology (State University), Moscow, Russia,

⁴Federal Research Center Computer Science and Control RAS, Moscow, Russia

DOI: 10.18287/2412-6179-CO-811

Pages: 101-109.

Article language: English

Abstract:
When a document is recognized in a video stream captured by a mobile device camera, the image quality of the document varies greatly from frame to frame. Sometimes the recognition system is required not only to recognize all the specified attributes of the document, but also to select a final document image of the best quality. This is necessary, for example, for archiving or for providing various services; in some countries it may be required by law. In this case, the recognition system needs to assess the quality of the frames in the video stream and choose the "best" frame. In this paper we consider a solution to this problem in which the "best" frame is one where all specified attributes are present in a readable form in the document image. The method was configured on a private dataset and then tested on documents from the open MIDV-2019 dataset. A practically applicable result was obtained for use in recognition systems.

Keywords:
human perception, quality assessment, document images, blur, sharpness, flares.

Acknowledgements
This work was partially supported by the Russian Foundation for Basic Research (projects No. 17-29-03161 and 18-07-01387).

Citation:
Aliev MA, Kunina IA, Kazbekov AV, Arlazarov VL. Algorithm for choosing the best frame in a video stream in the task of identity document recognition. Computer Optics 2021; 45(1): 101-109. DOI: 10.18287/2412-6179-CO-811.

