Human Action Recognition Based on The Skeletal Pairwise Dissimilarity
E.E. Surkov 1, O.S. Seredin 1, A.V. Kopylov 1
1 Tula State University,
Lenin Ave. 92, Tula, 300012, Russia
DOI: 10.18287/2412-6179-CO-1522
Pages: 493-503.
Full text of article: English language.
Abstract:
The main idea of the paper is to apply the principles of featureless pattern recognition to the human activity recognition problem. The article presents an approach to representing the human figure based on a pairwise dissimilarity function of skeletal models and a set of reference objects, also known as a basic assembly. We analyze the basic assembly and propose a method for selecting the least correlated basic objects. A video sequence submitted for analysis of human activity within its frames is represented as an activity map: the result of computing the pairwise dissimilarity function between the skeletal models from the video sequence and the basic assembly of skeletons. We provide frame-by-frame annotation of activities in the TST Fall Detection v2 database, namely standing, sitting, lying, walking, falling, post-fall lying, grasp, and ungrasp. To solve the activity recognition problem, we propose a convolutional neural network based on ResNetV2 with an SE-block; the SE-block detects inter-channel dependencies and selects the most important features. Additionally, we prepare the data for training and determine optimal hyperparameters of the neural network model. Experimental results of human activity recognition on the TST Fall Detection v2 database using the leave-one-person-out procedure are provided. A frame-by-frame assessment of the quality of human activity recognition achieves an accuracy exceeding 83%.
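The activity-map construction described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: skeletons are assumed to be fixed-length arrays of 3D joint coordinates, and mean Euclidean distance over joints stands in for the paper's skeletal dissimilarity function; all names here (`dissimilarity`, `activity_map`) are hypothetical.

```python
import numpy as np

def dissimilarity(skeleton_a, skeleton_b):
    """Pairwise dissimilarity between two skeletons.

    Stand-in measure: mean Euclidean distance over corresponding
    joints. The paper defines its own skeletal dissimilarity function.
    """
    return float(np.mean(np.linalg.norm(skeleton_a - skeleton_b, axis=-1)))

def activity_map(frames, basic_assembly):
    """Build an activity map for a video sequence.

    frames:         (T, J, 3) array -- skeletons for T frames, J joints.
    basic_assembly: (K, J, 3) array -- K reference skeletons.
    Returns a (K, T) map whose entry (k, t) is the dissimilarity
    between reference skeleton k and the skeleton in frame t.
    """
    return np.array([[dissimilarity(ref, frame) for frame in frames]
                     for ref in basic_assembly])

# Toy example: 5 frames, 3 reference skeletons, 20 joints each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 20, 3))
assembly = rng.normal(size=(3, 20, 3))
amap = activity_map(frames, assembly)
print(amap.shape)  # (3, 5)
```

The resulting (K, T) map is what the CNN described in the abstract would consume as its input image.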
Keywords:
basic assembly, pairwise dissimilarity measure, activity map, human action recognition, CNN, inter-channel attention.
Citation:
Surkov EE, Seredin OS, Kopylov AV. Human Action Recognition Based on The Skeletal Pairwise Dissimilarity. Computer Optics 2025; 49(3): 493-503. DOI: 10.18287/2412-6179-CO-1522.
Acknowledgements:
This research is funded by the Ministry of Science and Higher Education of the Russian Federation within the framework of the state task FSFS-2024-0012.
References:
- Seredin OS, Kopylov AV, Surkov EE. The study of skeleton description reduction in the human fall-detection task. Computer Optics 2020; 44(6): 951-958. DOI: 10.18287/2412-6179-CO-753.
- Seredin OS, Kopylov AV, Huang SC, Rodionov DS. A skeleton features-based fall detection using Microsoft Kinect v2 with one class-classifier outlier removal. Int Arch Photogramm Remote Sens Spat Inf Sci 2019; XLII-2/W12: 189-195. DOI: 10.5194/isprs-archives-XLII-2-W12-189-2019.
- Hussein ME, Torki M, Gowayyed MA, El-Saban M. Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. Proc Twenty-Third Int Joint Conf on Artificial Intelligence (IJCAI '13) 2013: 2466-2472.
- Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points in a Lie group. 2014 IEEE Conf on Computer Vision and Pattern Recognition 2014: 588-595. DOI: 10.1109/CVPR.2014.82.
- Wang J, Liu Z, Wu Z, Yuan J. Mining actionlet ensemble for action recognition with depth cameras. 2012 IEEE Conf on Computer Vision and Pattern Recognition 2012: 1290-1297. DOI: 10.1109/CVPR.2012.6247813.
- Smolyaninov VV. Invariants of anthropometric proportions [In Russian]. Biophysics 2012; 57(3): 528-560.
- Ren F, Tang C, Tong A, et al. Skeleton-based human action recognition by fusing attention based three-stream convolutional neural network and SVM. Multimed Tools Appl 2024; 83(2): 6273-6295. DOI: 10.1007/s11042-023-15334-9.
- Xin C, Kim S, Cho Y, Park KS. Enhancing human action recognition with 3d skeleton data: A comprehensive study of deep learning and data augmentation. Electronics 2024; 13(4): 747. DOI: 10.3390/electronics13040747.
- Xie J, Meng Y, Zhao Y, Nguyen A, Yang X, Zheng Y. Dynamic semantic-based spatial graph convolution network for skeleton-based human action recognition. Proc AAAI Conf on Artificial Intelligence 2024; 38(6): 6225-6233. DOI: 10.1609/aaai.v38i6.28440.
- Abduljalil H, Elhayek A, Marish Ali A, Alsolami F. Spatiotemporal graph autoencoder network for skeleton-based human action recognition. Preprints. 2024: 2024011998. Source: <https://www.preprints.org/manuscript/202401.1998/v2>. DOI: 10.20944/preprints202401.1998.v2.
- Lovanshi M, Tiwari V. Human skeleton pose and spatio-temporal feature-based activity recognition using ST-GCN. Multimed Tools Appl 2024; 83(5): 12705-12730. DOI: 10.1007/s11042-023-16001-9.
- Chen K, Yang Z, Yang Z. Graph neural networks for skeleton-based action recognition. Advances in Engineering Technology Research 2024; 9(1): 604-604. DOI: 10.56028/aetr.9.1.604.2024.
- Do J, Kim M. SkateFormer: skeletal-temporal transformer for human action recognition. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2403.09508>. DOI: 10.48550/arXiv.2403.09508.
- Lerch DJ, Zhong Z, Martin M, Voit M, Beyerer J. Unsupervised 3D skeleton-based action recognition using cross-attention with conditioned generation capabilities. 2024 IEEE/CVF Winter Conf on Applications of Computer Vision Workshops (WACVW) 2024: 211-220.
- Qiu H, Biao H. Multi-grained clip focus for skeleton-based action recognition. Pattern Recogn 2024; 148: 110188. DOI: 10.1016/j.patcog.2023.110188.
- Uddin S, Nawaz T, Ferryman J, Rashid N, Asaduzzaman M, Nawaz R. Skeletal keypoint-based transformer model for human action recognition in aerial videos. IEEE Access 2024; 12: 11095-11103. DOI: 10.1109/ACCESS.2024.3354389.
- Han H, Zeng H, Kuang L, Han X, Xue H. A human activity recognition method based on Vision Transformer. Sci Rep 2024; 14: 15310. DOI: 10.1038/s41598-024-65850-3.
- Bevilacqua V, et al. Fall detection in indoor environment with kinect sensor. 2014 IEEE Int Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proc 2014: 319-324. DOI: 10.1109/INISTA.2014.6873638.
- Bian P, Hou J, Chau P, Thalmann NM. Fall detection based on body part tracking using a depth camera. IEEE J Biomed Health Inform 2015; 19(2): 430-439. DOI: 10.1109/JBHI.2014.2319372.
- Mottl V, Seredin O, Dvoenko S, Kulikowski C, Muchnik I. Featureless pattern recognition in an imaginary Hilbert space. Proc 16th Int Conf on Pattern Recognition 2002; 2: 88-91. DOI: 10.1109/ICPR.2002.1048244.
- Duin RPW, Pekalska E, de Ridder D. Relational discriminant analysis. Pattern Recognit Lett 1999; 20(11-13): 1175-1181. DOI: 10.1016/S0167-8655(99)00085-9.
- Seredin OS, Kopylov AV, Surkov EE, Huang SC. The basic assembly of skeletal models in the fall detection problem. Computer Optics 2023; 47(2): 323-334. DOI: 10.18287/2412-6179-CO-1158.
- Pekalska E, Duin RPW. The dissimilarity representation for pattern recognition: Foundations and applications. Singapore: World Scientific Publishing Co Pte Ltd; 2005. ISBN: 981-256-530-2.
- Pekalska E, Duin RPW, Paclik P. Prototype selection for dissimilarity-based classifiers. Pattern Recognit 2006; 39(2): 189-208. DOI: 10.1016/j.patcog.2005.06.012.
- Theodorakopoulos I, Kastaniotis D, Economou G, Fotopoulos S. Pose-based human action recognition via sparse representation in dissimilarity space. J Vis Commun Image Represent 2014; 25(1): 12-23. DOI: 10.1016/j.jvcir.2013.03.008.
- Kim M, Jiang X, Lauter K, et al. Secure human action recognition by encrypted neural network inference. Nat Commun 2022; 13: 4799. DOI: 10.1038/s41467-022-32168-5.
- Rajput AS, Raman B, Imran J. Privacy-preserving human action recognition as a remote cloud service using RGB-D sensors and deep CNN. Expert Syst Appl 2020; 152: 113349. DOI: 10.1016/j.eswa.2020.113349.
- Wang H, et al. Understanding the robustness of skeleton-based action recognition under adversarial attack. Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition 2021: 14656-14665. DOI: 10.1109/CVPR46437.2021.01442.
- Seredin OS, Surkov EE, Kopylov AV, Dvoenko SD. Multidimensional data visualization based on the shortest unclosed path search. In Book: Dang NHT, Zhang Y-D, Tavares JMRS, Chen B-H, eds. Artificial intelligence in data and big data processing. Cham, Switzerland: Springer Nature Switzerland AG; 2022: 279-299. DOI: 10.1007/978-3-030-97610-1_23.
- Surkov EE, Seredin OS, Kopylov AV. Locally optimal solutions in the shortest unclosed path search problem. 2023 IEEE Ural-Siberian Conf on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT) 2023: 221-224. DOI: 10.1109/USBEREIT58508.2023.10158834.
- He K, et al. Identity mappings in deep residual networks. In Book: Leibe B, Matas J, Sebe N, Welling M, eds. Computer Vision – ECCV 2016. 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV. Cham, Switzerland: Springer International Publishing AG; 2016: 630-645. DOI: 10.1007/978-3-319-46493-0_38.
- Gasparrini S, Cippitelli E, Gambi E, Spinsante S, Wahslen J, Orhan I, Lindh T. Proposal and experimental evaluation of fall detection solution based on wearable and depth data fusion. In Book: Loshkovska S, Koceski S, eds. ICT Innovations 2015. Emerging technologies for better living. Cham: Springer International Publishing Switzerland; 2016: 99-108. DOI: 10.1007/978-3-319-25733-4_11.
- Wang X, Talavera E, Karastoyanova D, Azzopardi G. Fall detection with a nonintrusive and first-person vision approach. IEEE Sens J 2023; 23(22): 28304-28317. DOI: 10.1109/JSEN.2023.3314828.
- Mottl V, Seredin O, Krasotkina O. Compactness hypothesis, potential functions, and rectifying linear space in machine learning. In Book: Rozonoer L, Mirkin B, Muchnik I, eds. Braverman readings in machine learning. Key ideas from inception to current state. International Conference Commemorating the 40th Anniversary of Emmanuil Braverman's Decease, Boston, MA, USA, April 28-30, 2017, Invited Talks. Cham, Switzerland: Springer Nature Switzerland AG; 2018: 52-102. DOI: 10.1007/978-3-319-99492-5_3.
- He K, et al. Deep residual learning for image recognition. 2016 IEEE Conf on Computer Vision and Pattern Recognition (CVPR) 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
- Hu J, Shen L, Sun G. Squeeze-and-excitation networks. 2018 IEEE/CVF Conf on Computer Vision and Pattern Recognition 2018: 132-141. DOI: 10.1109/CVPR.2018.00745.
- Xu Z, Yu J, Xiang W, Zhu S, Hussain M, Liu B, Li J. A novel SE-CNN attention architecture for sEMG-based hand gesture recognition. CMES - Computer Modeling in Engineering and Sciences 2023; 134(1): 157-177. DOI: 10.32604/cmes.2022.020035.
- Mikhaylichenko AA, Demyanenko YM. Using squeeze-and-excitation blocks to improve an accuracy of automatically grading knee osteoarthritis severity using convolutional neural networks. Computer Optics 2022; 46(2): 317-325. DOI: 10.18287/2412-6179-CO-897.
- Tharwat A. Classification assessment methods. Appl Comput Inform 2021; 17(1): 168-192. DOI: 10.1016/j.aci.2018.08.003.
- Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv Preprint. 2014. Source: <https://arxiv.org/abs/1412.6980>. DOI: 10.48550/arXiv.1412.6980.
- Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. Proc Mach Learn Res 2010; 9: 249-256.
- Manzi A, Dario P, Cavallo F. A human activity recognition system based on dynamic clustering of skeleton data. Sensors 2017; 17(5): 1100. DOI: 10.3390/s17051100.
- Yin J, et al. MC-LSTM: Real-time 3D human action detection system for intelligent healthcare applications. IEEE Trans Biomed Circuits Syst 2021; 15(2): 259-269. DOI: 10.1109/TBCAS.2021.3064841.
© 2009, IPSI RAS
151, Molodogvardeiskaya str., Samara, 443001, Russia; E-mail: journal@computeroptics.ru ; Tel: +7 (846) 242-41-24 (Executive secretary), +7 (846) 332-56-22 (Issuing editor), Fax: +7 (846) 332-56-20