Neural network model for video-based face recognition with frames quality assessment
Nikitin M.Yu., Konushin V.S., Konushin A.S.

M.V. Lomonosov Moscow State University, Moscow, Russia,
Video Analysis Technologies LLC, Moscow, Russia,
National Research University Higher School of Economics, Moscow, Russia

Full text of the article: in Russian.

Abstract:
This paper addresses the problem of video-based face recognition. We propose a new neural network model that takes a set of facial images of a person as input and produces a compact descriptor of fixed dimension. The model consists of two modules: a feature embedding module, which maps each image to a feature vector, and a face quality assessment module, which estimates the utility of each facial image. The feature vectors are weighted by their estimated utility and combined into a single feature representation of the image set. Visual analysis shows that the model learns to draw more information from high-quality face images and less from blurred or occluded ones. Experiments on the YouTube Faces and IARPA Janus Benchmark A (IJB-A) datasets show that the proposed feature aggregation method based on face quality assessment consistently outperforms naïve aggregation methods.
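The quality-weighted aggregation step described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes that the quality module outputs one unnormalized score per frame, that these scores are turned into weights with a softmax, and that the pooled descriptor is L2-normalized. The helpers `aggregate` and `mean_aggregate` are hypothetical names. Plain Python is used so the example needs no external libraries.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-frame quality scores into non-negative weights summing to 1.

    Assumption: softmax normalization; the abstract does not specify it.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length (assumed post-processing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def aggregate(features: Sequence[Sequence[float]], quality: Sequence[float]) -> Vector:
    """Hypothetical helper: quality-weighted sum of per-frame embeddings.

    The output has the dimension of one embedding, whatever the number of frames.
    """
    if not features or len(features) != len(quality):
        raise ValueError("need at least one frame and one quality score per frame")
    weights = softmax(quality)
    dim = len(features[0])
    pooled = [0.0] * dim
    for w, f in zip(weights, features):
        for i in range(dim):
            pooled[i] += w * f[i]  # high-quality frames contribute more
    return l2_normalize(pooled)


def mean_aggregate(features: Sequence[Sequence[float]]) -> Vector:
    """Naive baseline: equal scores give uniform weights, i.e. plain averaging."""
    return aggregate(features, [0.0] * len(features))


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings of a sharp and a blurred frame
    print(aggregate(frames, [5.0, -5.0]))  # dominated by the first frame
    print(mean_aggregate(frames))          # both frames count equally
```

Setting all quality scores equal makes the weights uniform, so `mean_aggregate` reproduces the naïve average-pooling baseline the abstract compares against, while unequal scores let a sharp frame dominate a blurred one.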

Keywords:
face recognition, video analysis, neural networks, deep learning, machine vision algorithms.

Citation:
Nikitin MYu, Konushin VS, Konushin AS. Neural network model for video-based face recognition with frames quality assessment. Computer Optics 2017; 41(5): 732-742. DOI: 10.18287/2412-6179-2017-41-5-732-742.

© 2009, IPSI RAS
Institution of Russian Academy of Sciences, Image Processing Systems Institute of RAS, Russia, 443001, Samara, Molodogvardeyskaya Street 151; E-mail: journal@computeroptics.ru ; Phones: +7 (846 2) 332-56-22, Fax: +7 (846 2) 332-56-20