(44-1) 14

Person tracking algorithm based on convolutional neural network for indoor video surveillance

R. Bohush ¹, I. Zakharava ¹

¹ Polotsk State University, Polotsk, Belarus


DOI: 10.18287/2412-6179-CO-565

Pages: 109-116.

Full text of article: Russian language.

In this paper, a person tracking algorithm for indoor video surveillance is presented. The algorithm consists of the following steps: person detection, formation of person features, calculation of feature similarity for the detected objects, postprocessing, person indexing, and determination of person visibility in the current frame. The YOLO v3 convolutional neural network (CNN) is used for person detection. Person features are formed from histograms of the H channel in the HSV color space and from a modified ResNet CNN. The proposed architecture includes 29 convolutional layers and one fully connected layer. As output, it forms a 128-dimensional feature vector for every input image. The CNN model was trained to perform feature extraction. Experiments were conducted using the MOT methodology on indoor videos from a stationary camera. The main characteristics of the presented algorithm are calculated and discussed, confirming its effectiveness in comparison with current approaches to person tracking in an indoor environment. The algorithm performs object detection and tracking in real time using CUDA technology and an NVIDIA GTX 1060 graphics card.
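
As an illustration of the feature formation, similarity calculation, and indexing steps named above, the following is a minimal sketch and not the authors' implementation. It covers only the hue-histogram feature; the 128-dimensional CNN vector could be compared in the same way. The bin count, the cosine similarity measure, the threshold of 0.8, the greedy matching rule, and all function names are assumptions made for this example.

```python
import colorsys
import math


def hue_histogram(pixels, bins=16):
    """Normalized H-channel histogram for an iterable of (r, g, b) pixels in 0..255.

    The bin count is an assumed value, not taken from the paper.
    """
    hist = [0.0] * bins
    count = 0
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Map the hue (a fraction of the color circle) to a bin index;
        # min() guards the upper edge.
        hist[min(int(h * bins), bins - 1)] += 1.0
        count += 1
    return [x / count for x in hist] if count else hist


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_ids(tracks, detections, threshold=0.8):
    """Give each detection the index of its most similar known person.

    tracks: dict mapping person index -> stored feature vector.
    detections: list of feature vectors from the current frame.
    Pairs are matched greedily, best similarity first (a simplification);
    detections left unmatched receive new indices.
    """
    pairs = sorted(
        ((cosine_similarity(feat, det), tid, j)
         for tid, feat in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    ids = [None] * len(detections)
    used = set()
    for sim, tid, j in pairs:
        if sim < threshold:
            break  # all remaining pairs are less similar
        if tid in used or ids[j] is not None:
            continue
        ids[j] = tid
        used.add(tid)
    next_id = max(tracks, default=-1) + 1
    for j in range(len(detections)):
        if ids[j] is None:
            ids[j] = next_id
            next_id += 1
    return ids


# Usage with synthetic single-color "person crops".
red = [(255, 0, 0)] * 10
green = [(0, 255, 0)] * 10
blue = [(0, 0, 255)] * 10
tracks = {0: hue_histogram(red), 1: hue_histogram(green)}
detections = [hue_histogram(green), hue_histogram(blue), hue_histogram(red)]
print(assign_ids(tracks, detections))  # [1, 2, 0]
```

In the usage example, the green and red crops are matched to the stored persons 1 and 0, while the blue crop matches nothing above the threshold and is indexed as a new person.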

Keywords: person tracking, indoor video surveillance, convolutional neural networks.

Bohush RP, Zakharava IY. Person tracking algorithm based on convolutional neural network for indoor video surveillance. Computer Optics 2020; 44(1): 109-116. DOI: 10.18287/2412-6179-CO-565.




© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: ko@smr.ru ; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor), Fax: +7 (846) 332-56-20