People tracking accuracy improvement in video by matching relevant trackers and YOLO family detectors
H. Quan 1, G. Ma 2, Y. Weichen 2, R. Bohush 3, F. Zuo 4, S. Ablameyko 1,5

Belarusian State University, 220030, Minsk, Republic of Belarus, Nezavisimosti av. 4;
EarthView Image Inc., 313200, China Deqing County, Zhejiang, Keyuan Road 11;
Polotsk State University, 211440, Novopolotsk, Republic of Belarus, Blokhin str. 29;
Henan International Joint Laboratory of Theories and Key Technologies on Intelligence Networks,
Henan University, 450046, China, Kaifeng;
United Institute of Informatics Problems, National Academy of Sciences of Belarus,
220012, Minsk, Republic of Belarus, Surganov str. 6

DOI: 10.18287/2412-6179-CO-1422

Pages: 734-744.

Full text of article: in English.

Abstract:
The tracking-by-detection paradigm is widely used for multi-object people tracking. Many detectors, trackers and evaluation benchmarks now exist, which calls for relatively uniform evaluation methods and metrics and, in turn, for a principled way to choose the best combined detector-tracker models. To solve this task, we developed a comprehensive performance evaluation methodology for estimating both the accuracy and the real-time capability of people tracking with different detectors and trackers. We conducted experiments pairing the official pre-trained models of YOLOv5, YOLOv6, YOLOv7 and YOLOv8 with the representative BoT-SORT, ByteTrack, DeepOCSORT, OCSORT and StrongSORT trackers on the MOT17 and MOT20 benchmarks. Detailed accuracy and speed metrics, such as higher order tracking accuracy (HOTA) and frames per second (FPS), were analyzed for each combination of detector and tracker. We conclude that the OCSORT+YOLOv6l model has the best comprehensive performance, and that the combination of OCSORT and YOLOv7 has the best average performance across MOT17 and MOT20.
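To make the evaluated pairing concrete, the sketch below wires a YOLOv8 detector to a tracker in the standard tracking-by-detection loop and measures a rough frames-per-second figure. The ultralytics YOLO calls are real; the tracker object and its update(dets, frame) interface are an assumption standing in for any of the compared trackers (BoT-SORT, ByteTrack, DeepOCSORT, OCSORT, StrongSORT), not the authors' exact evaluation code.

```python
# Minimal tracking-by-detection loop (sketch, not the paper's pipeline):
# a YOLOv8 detector feeds per-frame person detections to a tracker and
# a rough FPS figure is measured over the whole video.
import time

import cv2
import numpy as np
from ultralytics import YOLO  # official pre-trained YOLOv8 weights


def run_pairing(video_path: str, tracker) -> float:
    """Run a detector+tracker pairing over a video and return average FPS.

    `tracker` is assumed to expose update(dets, frame) -> tracks, where
    dets is an N x 6 array of (x1, y1, x2, y2, conf, cls).
    """
    detector = YOLO("yolov8l.pt")          # person is COCO class 0
    cap = cv2.VideoCapture(video_path)
    frames, start = 0, time.perf_counter()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Detect people only; boxes come back as (x1, y1, x2, y2).
        result = detector(frame, classes=[0], verbose=False)[0]
        boxes = result.boxes.xyxy.cpu().numpy()
        confs = result.boxes.conf.cpu().numpy()[:, None]
        clss = result.boxes.cls.cpu().numpy()[:, None]
        dets = np.hstack([boxes, confs, clss])
        # The tracker associates detections with existing identities;
        # in a full evaluation the returned tracks would be written in
        # MOTChallenge format and scored with HOTA on MOT17/MOT20.
        tracks = tracker.update(dets, frame)  # assumed interface
        frames += 1
    cap.release()
    return frames / (time.perf_counter() - start)
```

Swapping the detector weights (e.g. YOLOv5/v6/v7 checkpoints) and the tracker object in this loop is what produces the detector-tracker combinations compared in the paper.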

Keywords:
YOLO family detectors, tracking-by-detection, multi-object tracking, scoring function, comprehensive performance, video surveillance.

Citation:
Quan H, Ma G, Weichen Y, Bohush R, Zuo F, Ablameyko S. People tracking accuracy improvement in video by matching relevant trackers and YOLO family detectors. Computer Optics 2024; 48(5): 734-744. DOI: 10.18287/2412-6179-CO-1422.

