A framework of reading timestamps for surveillance video
Cheng J., Dai W.
Computer School, Hubei Polytechnic University, Huangshi 435000, Hubei, China
School of Economics and Management, Hubei Polytechnic University, Huangshi 435003, Hubei, China
Abstract:
This paper presents a framework for automatically reading timestamps in surveillance video. Reading timestamps from surveillance video is difficult because of color variety, font diversity, noise, and low resolution. The proposed algorithm addresses these challenges with a deep learning framework. The framework comprises three elements: training of timestamp localization and recognition in a single end-to-end pass, the structure of the recognition CNN, and the geometry of its input layer, which preserves the aspect ratio of the timestamps and adapts its resolution to the data. The proposed method achieves state-of-the-art accuracy in end-to-end timestamp recognition on our datasets, whilst being an order of magnitude faster than competing methods. The framework can improve the market competitiveness of panoramic video surveillance products.
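As an illustration of the input-layer geometry mentioned above, the following minimal sketch computes the size to which a localized timestamp box could be resized before recognition, keeping its aspect ratio. It is not the authors' implementation: the function name `recognition_input_size`, the fixed height of 32 pixels, and the width stride of 4 are assumptions chosen only for the example.

```python
import math


def recognition_input_size(box_w, box_h, target_h=32, width_stride=4):
    """Return (width, height) for a timestamp crop resized to a fixed height.

    The width is scaled by the same factor as the height, so the aspect
    ratio of the timestamp is preserved, then rounded up to a multiple of
    `width_stride`. Both `target_h` and `width_stride` are assumed values
    for this sketch, not parameters taken from the paper.
    """
    if box_w <= 0 or box_h <= 0:
        raise ValueError("box dimensions must be positive")
    # Scale the width by target_h / box_h, rounding up to a whole pixel.
    scaled_w = math.ceil(box_w * target_h / box_h)
    # Round up to the next multiple of the stride (integer arithmetic).
    width = -(-scaled_w // width_stride) * width_stride
    return width, target_h


if __name__ == "__main__":
    # A 200x20 pixel timestamp box and a 101x32 pixel box.
    print(recognition_input_size(200, 20))
    print(recognition_input_size(101, 32))
```

Under these assumptions a wide, short timestamp keeps its proportions instead of being squeezed into a fixed square input, which is the property the abstract attributes to the input layer.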
Keywords:
surveillance video, timestamp localization, timestamp recognition.
Citation:
Cheng J, Dai W. A framework of reading timestamps for surveillance video. Computer Optics 2019; 43(1): 72-77. DOI: 10.18287/2412-6179-2019-43-1-72-77.