
Machine learning-based voice assistant: optimizing the efficiency of speech conversion for people with speech disorders
M.H. Antor 1, N.V. Chudinovskikh 1, M.V. Bachurin 1, A.A. Shurpikov 1, N.A. Khlebnikov 1, B.A. Bredikhin 1

Ural Federal University named after the first President of Russia B. N. Yeltsin,
Ekaterinburg, Russia, 620002, Mira street, 19


DOI: 10.18287/2412-6179-CO-1482

Pages: 124-131.

Article language: English.

Abstract:
An automatic speech recognition (ASR) system can improve the quality of life for people with disabilities by addressing issues such as dysarthria, stuttering, and other speech defects. In this paper, we introduce a voice assistant for speech affected by hyperkinetic dysarthria (HD). The work covers the data preprocessing steps and the development of a novel convolutional recurrent network (CRN) model built on convolutional and recurrent neural networks. We applied data preprocessing methods, including filtering, down-sampling, and splitting, to prevent overfitting and to reduce processing power and time. In addition, Mel-frequency cepstral coefficients (MFCC) were used to extract speech features. The proposed model is trained to recognize HD-disordered speech on a dataset of 2000 Russian utterances. The experimental results demonstrate that the proposed method achieves a character error rate (CER) of 14.76 %, indicating that approximately 85 % of characters are recognized correctly on the test dataset. We have also created a Telegram bot that uses the trained model to help people with hyperkinetic dysarthria; the bot provides assistance on its own, without any third-party help.
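The character error rate (CER) reported above is the character-level Levenshtein edit distance between the reference transcript and the recognizer output, normalized by the length of the reference (cf. reference [30]). A minimal self-contained sketch, not the authors' implementation:

```python
# Illustrative CER computation via Levenshtein edit distance (pure Python).

def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of reference characters."""
    return levenshtein(reference, hypothesis) / len(reference)

# One substituted character out of ten reference characters -> CER = 0.10;
# a CER of 14.76 % likewise means ~85 % of characters recognized correctly.
print(f"{cer('привет мир', 'привед мир'):.2f}")  # → 0.10
```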

Keywords:
natural language processing, hyperkinetic dysarthria, speech recognition, feature extraction, optimization.

Citation:
Antor MH, Chudinovskikh NV, Bachurin MV, Shurpikov AA, Khlebnikov NA, Bredikhin BA. Machine learning-based voice assistant: optimizing the efficiency of speech conversion for people with speech disorders. Computer Optics 2025; 49(1): 124-131. DOI: 10.18287/2412-6179-CO-1482.

References:

  1. Darley FL, Aronson AE, Brown JR. Clusters of deviant speech dimensions in the dysarthrias. J Speech Hear Res 1969; 12(3): 462-496. DOI: 10.1044/jshr.1203.462.
  2. Barkmeier-Kraemer JM, Clark HM. Speech-language pathology evaluation and management of hyperkinetic disorders affecting speech and swallowing function. Tremor Other Hyperkinet Mov 2017; 7: 489. DOI: 10.5334/tohm.381.
  3. Sadeghi Milani A, Cecil-Xavier A, Gupta A, Cecil J, Kennison S. A systematic review of Human-Computer Interaction (HCI) research in medical and other engineering fields. Int J Human–Computer Interact 2024; 40(3): 515-536. DOI: 10.1080/10447318.2022.2116530.
  4. Nassif AB, Shahin I, Attili I, Azzeh M, Shaalan K. Speech recognition using deep neural networks: A systematic review. IEEE Access 2019; 7: 19143-19165. DOI: 10.1109/ACCESS.2019.2896880.
  5. Saon G, Sercu T, Rennie S, Kuo H-KJ. The IBM 2016 English conversational telephone speech recognition system. arXiv Preview. 2016. Source: <https://arxiv.org/abs/1604.08242>. DOI: 10.48550/arXiv.1604.08242.
  6. Hashan AM, Al-Saeedi Adnan Adhab K, Islam RMRU, Avinash K, Dey S. Automated human facial emotion recognition system using depthwise separable convolutional neural network. 2023 IEEE Int Conf on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT) 2023: 113-117. DOI: 10.1109/IAICT59002.2023.10205785.
  7. Yu D, Deng L. Gaussian mixture models. In Book: Yu D, Deng L. Automatic speech recognition. A deep learning approach. London: Springer-Verlag; 2015: 13-21. DOI: 10.1007/978-1-4471-5779-3_2.
  8. Palaz D, Magimai-Doss M, Collobert R. End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition. Speech Commun 2019; 108: 15-32. DOI: 10.1016/j.specom.2019.01.004.
  9. Wang W, Gang J. Application of convolutional neural network in natural language processing. 2018 Int Conf on Information Systems and Computer Aided Education (ICISCAE) 2018: 64-70. DOI: 10.1109/ICISCAE.2018.8666928.
  10. Kukharchik P, Martynov D, Kheidorov I, Kotov O. Vocal fold pathology detection using modified wavelet-like features and support vector machines. 2007 15th European Signal Processing Conf 2007: 2214-2218.
  11. Kim H, Hasegawa-Johnson M, Perlman A, et al. Dysarthric speech database for universal access research. 9th Annual Conf of the International Speech Communication Association (INTERSPEECH 2008) 2008: 1741-1744. DOI: 10.21437/Interspeech.2008-480.
  12. Joy NM, Umesh S. Improving acoustic models in TORGO dysarthric speech database. IEEE Trans Neural Syst Rehabil Eng 2018; 26(3): 637-645. DOI: 10.1109/TNSRE.2018.2802914.
  13. Hashan AM, Chaganov RD, Melnikov AV, Dorokh DV, Khlebnikov NA, Bredikhin BA. Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation. Int J Speech Technol 2024; 27(1): 255-265. DOI: 10.1007/s10772-024-10098-5.
  14. Takashima Y, Nakashika T, Takiguchi T, Ariki Y. Feature extraction using pre-trained convolutive bottleneck nets for dysarthric speech recognition. 2015 23rd European Signal Processing Conf (EUSIPCO) 2015: 1411-1415. DOI: 10.1109/EUSIPCO.2015.7362616.
  15. Passricha V, Aggarwal RK. Convolutional support vector machines for speech recognition. Int J Speech Technol 2019; 22(3): 601-609. DOI: 10.1007/s10772-018-09584-4.
  16. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv Preview. 2012. Source: <https://arxiv.org/abs/1207.0580>. DOI: 10.48550/arXiv.1207.0580.
  17. Abdel-Hamid O, Mohamed A, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE/ACM Trans Audio Speech Lang Process 2014; 22(10): 1533-1545. DOI: 10.1109/TASLP.2014.2339736.
  18. Wang Z, Xiao X. Duration-distribution-based HMM for speech recognition. Front Electr Electron Eng 2006; 1(1): 26-30. DOI: 10.1007/s11460-005-0010-z.
  19. Muhammad G, Mesallam TA, Malki KH, Farahat M, Alsulaiman M, Bukhari M. Formant analysis in dysphonic patients and automatic Arabic digit speech recognition. Biomed Eng OnLine 2011; 10: 41. DOI: 10.1186/1475-925X-10-41.
  20. Gurunath Shivakumar P, Narayanan S. End-to-end neural systems for automatic children speech recognition: An empirical study. Comput Speech Lang 2022; 72: 101289. DOI: 10.1016/j.csl.2021.101289.
  21. Hashan AM, Chaganov RD, Melnikov AV, Dorokh DV, Khlebnikov NA, Bredikhin BA. Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation. Int J Speech Technol 2024; 27(1): 255-265. DOI: 10.1007/s10772-024-10098-5.
  22. Turrisi R, Braccia A, Emanuele M, et al. EasyCall corpus: a dysarthric speech dataset. arXiv Preview. 2021. Source: <https://arxiv.org/abs/2104.02542>. DOI: 10.48550/arXiv.2104.02542.
  23. Yadav S, Shukla S. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. 2016 IEEE 6th Int Conf on Advanced Computing (IACC) 2016: 78-83. DOI: 10.1109/IACC.2016.25.
  24. Ittichaichareon C, Suksri S, Yingthawornsuk T. Speech recognition using MFCC. Int Conf on Computer Graphics, Simulation and Modeling (ICGSM'2012) 2012: 135-138.
  25. Wu Y, Feng J. Development and application of artificial neural network. Wirel Pers Commun 2018; 102: 1645-1656. DOI: 10.1007/s11277-017-5224-x.
  26. Salehinejad H, Sankar S, Barfett J, Colak E, Valaee S. Recent advances in recurrent neural networks. arXiv Preview. 2018. Source: <https://arxiv.org/abs/1801.01078>. DOI: 10.48550/arXiv.1801.01078.
  27. Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv Preview. 2017. Source: <https://arxiv.org/abs/1711.05101>. DOI: 10.48550/arXiv.1711.05101.
  28. Ariesta MC, Wiryana F, Suharjito, Zahra A. Sentence level Indonesian sign language recognition using 3D convolutional neural network and bidirectional recurrent neural network. 2018 Indonesian Association for Pattern Recognition Int Conf (INAPR) 2018: 16-22. DOI: 10.1109/INAPR.2018.8627016.
  29. Habeeb IQ, Al-Zaydi ZQ, Abdulkhudhur HN. Selection technique for multiple outputs of optical character recognition. Eurasian J Math Comput Appl 2020; 8(2): 41-51. DOI: 10.32523/2306-6172-2020-8-2-41-51.
  30. Sugiarto, Diyasa IGSM, Diana IN. Levenshtein distance algorithm analysis on enrollment and disposition of letters application. 2020 6th Information Technology International Seminar (ITIS) 2020: 198-202. DOI: 10.1109/ITIS50118.2020.9321030.

© 2009, IPSI RAS