Developing a negative speech emotion recognition model for safety systems using deep learning

Growing threats in public spaces have raised concerns about personal security, making technology, and speech recognition in particular, increasingly relevant. This paper proposes a safety system that combines keyword detection with negative emotion detection: it responds when the wake-up word "ON" is spoken with negative emotion. Our contribution is two-fold: first, detecting the presence of the wake-up keyword "ON" in speech using a Convolutional Neural Network (CNN) model, and second, detecting negative emotion in the speech using a Long Short-Term Memory (LSTM) model. We propose combining these two models to address the same problem statement. With the proposed methodology, the CNN-based keyword detection model achieves 97.23% accuracy for the safety-related "ON" keyword, placing it only slightly above comparable works, while the LSTM-based negative emotion recognition model reaches 88.94% accuracy, trailing more advanced recent architectures. The paper further discusses the dataset curation, the methodologies implemented, and the system pipeline. It also compares feature extraction techniques such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), CHROMA, and MEL. Moreover, as speech recognition applications that use more than one model are becoming increasingly popular, this analysis should help in developing applications that require a similar end-to-end construct. © The Author(s) 2025.
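
The two-model decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Decision`, `decide`, and `Scorer`, the 0.5 thresholds, and the idea that each model returns a single probability are all assumptions made for illustration. The sketch only shows how a keyword score and a negative-emotion score might be combined so that an alert is raised when both conditions hold.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: a model maps an audio feature sequence
# (for example MFCC values) to a probability in [0, 1].
Scorer = Callable[[Sequence[float]], float]


@dataclass
class Decision:
    keyword_detected: bool
    negative_emotion: bool

    @property
    def raise_alert(self) -> bool:
        # Alert only when the keyword is present AND the emotion is negative.
        return self.keyword_detected and self.negative_emotion


def decide(
    features: Sequence[float],
    keyword_model: Scorer,
    emotion_model: Scorer,
    keyword_threshold: float = 0.5,  # assumed value, not from the paper
    emotion_threshold: float = 0.5,  # assumed value, not from the paper
) -> Decision:
    """Combine the two model scores into a single decision."""
    keyword_detected = keyword_model(features) >= keyword_threshold
    negative_emotion = emotion_model(features) >= emotion_threshold
    return Decision(keyword_detected, negative_emotion)
```

In practice the two `Scorer` callables would wrap a trained CNN and a trained LSTM; here they can be any functions, which keeps the gating logic testable on its own.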

Authors
Jena S., Basak S., Agrawal H., Saini B., Gite S., Kotecha K., Alfarhood S.
Journal
Publisher
Springer Nature
Issue number
1
Language
English
Status
Published
Number
54
Volume
12
Year
2025
Organizations
  • 1 Computer Science Engineering Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, 412115, India
  • 2 Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, 412115, India
  • 3 Artificial Intelligence and Machine Learning Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, 412115, India
  • 4 Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O.Box 51178, Riyadh, 11543, Saudi Arabia
  • 5 Peoples’ Friendship University of Russia, RUDN University, Moscow, 117198, Russian Federation
Keywords
Automatic speech recognition (ASR); Convolutional neural network (CNN); Long short-term memory (LSTM) model; MEL-frequency cepstral coefficients (MFCC); Safety systems