  
People tracking accuracy improvement in video by matching relevant trackers and YOLO family detectors
 H. Quan 1, G. Ma 2, Y. Weichen 2, R. Bohush 3, F. Zuo 4, S. Ablameyko 1,5
1 Belarusian State University, 220030, Minsk, Republic of Belarus, Nezavisimosti av. 4;
2 EarthView Image Inc., 313200, China, Deqing County, Zhejiang, Keyuan Road 11;
3 Polotsk State University, 211440, Novopolotsk, Republic of Belarus, Blokhin str. 29;
4 Henan International Joint Laboratory of Theories and Key Technologies on Intelligence Networks, Henan University, 450046, China, Kaifeng;
5 United Institute of Informatics Problems, National Academy of Sciences of Belarus, 220012, Minsk, Republic of Belarus, Surganov str. 6
 
 PDF, 785 kB
DOI: 10.18287/2412-6179-CO-1422
Pages: 734-744.
Article language: English.
 
Abstract:
The tracking-by-detection paradigm is widely used for people multi-object tracking tasks. Many detectors, trackers, and evaluation benchmarks now exist, which necessitates relatively uniform estimation methods and metrics and makes it important to choose the best combinations of detectors and trackers. To solve this task, we developed a comprehensive performance evaluation methodology for estimating people tracking accuracy and real-time performance with different detectors and trackers. We conducted experiments using the official pre-trained models of YOLOv5, YOLOv6, YOLOv7 and YOLOv8 with the representative BoTSORT, ByteTrack, DeepOCSORT, OCSORT and StrongSORT trackers on the MOT17 and MOT20 benchmarks. Detailed accuracy and speed metrics, such as higher order tracking accuracy (HOTA) and frames per second (FPS), were analyzed for each combination of detector and tracker. It is concluded that the OCSORT+YOLOv6l model has the best comprehensive performance, and the combination of OCSORT and YOLOv7 has the best average performance on MOT17 and MOT20.
Keywords:
YOLO family detectors, tracking-by-detection, multi-object tracking, scoring function, comprehensive performance, video surveillance.
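
To make the evaluation methodology concrete, below is a minimal, illustrative Python sketch of how detector+tracker combinations could be ranked once per-combination HOTA and FPS values have been measured on MOT17/MOT20 (e.g., with an evaluation toolkit such as TrackEval). The `comprehensive_score` function, its weights, and all numbers are hypothetical placeholders for illustration only; they are not the paper's actual scoring function or reported results.

```python
# Illustrative sketch (not the paper's exact scoring function): rank
# detector+tracker combinations by a weighted mix of accuracy (HOTA)
# and speed (FPS). All values below are hypothetical placeholders; in
# practice they would come from running each combination on MOT17/MOT20.
from dataclasses import dataclass


@dataclass
class Result:
    detector: str   # e.g. "YOLOv6l"
    tracker: str    # e.g. "OCSORT"
    hota: float     # higher order tracking accuracy, in [0, 1]
    fps: float      # processing speed, frames per second


def comprehensive_score(r: Result, max_fps: float, w_acc: float = 0.7) -> float:
    """Hypothetical combined score: weighted sum of HOTA and FPS normalized
    by the fastest combination in the comparison."""
    return w_acc * r.hota + (1.0 - w_acc) * (r.fps / max_fps)


def rank(results: list[Result]) -> list[tuple[Result, float]]:
    """Score every detector+tracker combination and sort best-first."""
    max_fps = max(r.fps for r in results)
    scored = [(r, comprehensive_score(r, max_fps)) for r in results]
    return sorted(scored, key=lambda rs: rs[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    results = [
        Result("YOLOv6l", "OCSORT", hota=0.55, fps=28.0),
        Result("YOLOv7", "OCSORT", hota=0.54, fps=25.0),
        Result("YOLOv8l", "ByteTrack", hota=0.53, fps=30.0),
    ]
    for r, s in rank(results):
        print(f"{r.tracker}+{r.detector}: score={s:.3f} "
              f"(HOTA={r.hota:.2f}, FPS={r.fps:.0f})")
```
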
Citation:
Quan H, Ma G, Weichen Y, Bohush R, Zuo F, Ablameyko S. People tracking accuracy improvement in video by matching relevant trackers and YOLO family detectors. Computer Optics 2024; 48(5): 734-744. DOI: 10.18287/2412-6179-CO-1422.
References:
- Zhang Y, et al. ByteTrack: Multi-object tracking by associating every detection box. Proc 17th European Conf on Computer Vision (ECCV) 2022: 1-21.
- Aharon N, Orfaig R, Bobrovsky B-Z. BoT-SORT: Robust associations multi-pedestrian tracking. arXiv Preprint. 2022. Source: <https://arxiv.org/abs/2206.14651>. DOI: 10.48550/arXiv.2206.14651.
- Du Y, et al. StrongSORT: Make DeepSORT great again. IEEE Trans Multimed 2023; 25: 8725-8737. DOI: 10.1109/TMM.2023.3240881.
- Zhou K, Xiang T. Torchreid: A library for deep learning person re-identification in PyTorch. arXiv Preprint. 2019. Source: <https://arxiv.org/abs/1910.10093>. DOI: 10.48550/arXiv.1910.10093.
- Redmon J, Divvala SK, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. Proc IEEE Conf on Computer Vision and Pattern Recognition (CVPR) 2016: 779-788. DOI: 10.1109/CVPR.2016.91.
- Fengxia Y, Xing Z, Boqi L. Video object tracking based on YOLOv7 and DeepSORT. arXiv Preprint. 2022. Source: <https://arxiv.org/abs/2207.12202>. DOI: 10.48550/arXiv.2207.12202.
- Wojke N, Bewley A, Paulus D. Simple online and realtime tracking with a deep association metric. Proc IEEE Int Conf on Image Processing (ICIP) 2017: 3645-3649.
- Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv Preprint. 2022. Source: <https://arxiv.org/abs/2207.02696>. DOI: 10.48550/arXiv.2207.02696.
- Quan H, Bohush R, Ma G, Weichen Y, Ablameyko S. People detecting and tracking in video by CNN YOLO and StrongSORT combined algorithm. Nonlinear Phenom Complex Syst 2023; 26(1): 83-97. DOI: 10.33581/1561-4085-2023-26-1-83-97.
- Luiten J, et al. HOTA: A higher order metric for evaluating multi-object tracking. Int J Comput Vis 2021; 129: 548-578. DOI: 10.1007/s11263-020-01375-2.
- Li C, et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv Preprint. 2022. Source: <https://arxiv.org/abs/2209.02976>. DOI: 10.48550/arXiv.2209.02976.
- Maggiolino G, Ahmad A, Cao J, Kitani K. Deep OC-SORT: Multi-pedestrian tracking by adaptive re-identification. arXiv Preprint. 2023. Source: <https://arxiv.org/abs/2302.11813>. DOI: 10.48550/arXiv.2302.11813.
- Cao J, Weng X, Khirodkar R, Pang J, Kitani K. Observation-centric SORT: Rethinking SORT for robust multi-object tracking. arXiv Preprint. 2022. Source: <https://arxiv.org/abs/2203.14360>. DOI: 10.48550/arXiv.2203.14360.
- Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K. MOT16: A benchmark for multi-object tracking. arXiv Preprint. 2016. Source: <https://arxiv.org/abs/1603.00831>. DOI: 10.48550/arXiv.1603.00831.
- Dendorfer P, et al. MOT20: A benchmark for multi object tracking in crowded scenes. arXiv Preprint. 2020. Source: <https://arxiv.org/abs/2003.09003>. DOI: 10.48550/arXiv.2003.09003.
- Li C, et al. YOLOv6 v3.0: A full-scale reloading. arXiv Preprint. 2023. Source: <https://arxiv.org/abs/2301.05586>. DOI: 10.48550/arXiv.2301.05586.
- Sun P, et al. DanceTrack: Multi-object tracking in uniform appearance and diverse motion. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition 2022: 20993-21002. DOI: 10.1109/CVPR52688.2022.02032.
- Cui Y, et al. SportsMOT: A large multi-object tracking dataset in multiple sports scenes. arXiv Preprint. 2023. Source: <https://arxiv.org/abs/2304.05170>. DOI: 10.48550/arXiv.2304.05170.
- Lin W, et al. Human in events: A large-scale benchmark for human-centric video analysis in complex events. arXiv Preprint. 2020. Source: <https://arxiv.org/abs/2005.04490>. DOI: 10.48550/arXiv.2005.04490.
- Voigtlaender P, et al. MOTS: Multi-object tracking and segmentation. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition 2019: 7942-7951. DOI: 10.1109/CVPR.2019.00813.
- Yang L, Fan Y, Xu N. Video instance segmentation. Proc IEEE/CVF Int Conf on Computer Vision 2019: 5188-5197.
- Manohar V, et al. Performance evaluation of object detection and tracking in video. Proc 7th Asian Conf on Computer Vision 2006; Pt II: 151-161.
- Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C. Performance measures and a data set for multi-target, multi-camera tracking. In Book: Hua G, Jégou H, eds. Computer vision – ECCV 2016 workshops. Pt II. Cham: Springer International Publishing Switzerland; 2016: 17-35.
- Bernardin K, Stiefelhagen R. Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP J Image Video Process 2008; 2008: 246309. DOI: 10.1155/2008/246309.
- Vaswani A, et al. Attention is all you need. arXiv Preprint. 2017. Source: <https://arxiv.org/abs/1706.03762>. DOI: 10.48550/arXiv.1706.03762.
  