pp. 2489-2500
S&M4068 Research Paper of Special Issue
https://doi.org/10.18494/SAM5558
Published: June 25, 2025
Robust Speaker Recognition in Voice Sensing Environments with Specific Background Noises Using Deep Learning of Hybridized Speech Enhancement Generative Adversarial Network and Convolutional Neural Network for Smart Manufacturing
Ing-Jr Ding and Meng-Chuan Hsieh
(Received January 18, 2025; Accepted June 2, 2025)
Keywords: speaker recognition, deep learning, hybridized SEGAN-CNN, SEGAN, VGG-16 CNN
Identity recognition based on a person's biometric characteristics has recently become a popular technique. Alongside face and fingerprint recognition, which rely on image sensor data, speaker recognition uses the acoustic characteristics of a person's uttered voice. In dark environments or when fingers are dirty, acoustics-based speaker recognition offers an alternative way to accomplish identity recognition with satisfactory accuracy. In practical application scenarios, however, speaker recognition inevitably encounters speech mixed with background noise. Utterances contaminated by the background noise of a specific environment cannot be closely matched with the pre-established speaker models, which leads to inaccurate identity recognition. To tackle this issue, we present a deep-learning-based method for speaker recognition in noisy environments, called hybridized SEGAN-CNN, which combines two different types of deep learning model: a speech enhancement generative adversarial network (SEGAN) and a convolutional neural network (CNN). SEGAN first removes the specific background noise from the noisy utterance, and the CNN then classifies the speaker's identity from the enhanced speech. The task is thereby reduced to speaker recognition in a clean environment, so that the robustness of speaker recognition is maintained. Experiments using a voice command phrase mixed with motor operation noise, for robot navigation control in a simulated factory environment, demonstrate the effectiveness of the proposed speaker recognition method.
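The two-stage structure described in the abstract (enhance first, then classify) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `make_pipeline`, `toy_enhance`, and `toy_classify` are hypothetical names, and the toy functions are simple stand-ins for a trained SEGAN generator and a trained CNN such as VGG-16.

```python
from typing import Callable, List, Sequence

Waveform = List[float]


def make_pipeline(
    enhance: Callable[[Sequence[float]], Waveform],
    classify: Callable[[Sequence[float]], str],
) -> Callable[[Sequence[float]], str]:
    """Compose an enhancement stage and a classification stage."""

    def recognize(noisy: Sequence[float]) -> str:
        clean = enhance(noisy)   # stage 1: a trained SEGAN generator would go here
        return classify(clean)   # stage 2: a trained CNN classifier would go here

    return recognize


def toy_enhance(noisy: Sequence[float]) -> Waveform:
    """Hypothetical stand-in for SEGAN: removes a constant offset ("noise")."""
    mean = sum(noisy) / len(noisy)
    return [x - mean for x in noisy]


def toy_classify(clean: Sequence[float]) -> str:
    """Hypothetical stand-in for the CNN: nearest speaker template by signal energy."""
    energy = sum(x * x for x in clean) / len(clean)
    templates = {"speaker_A": 1.0, "speaker_B": 4.0}  # made-up enrollment energies
    return min(templates, key=lambda name: abs(templates[name] - energy))


recognize = make_pipeline(toy_enhance, toy_classify)

# A "speaker_A" utterance with a constant offset of 5.0 added as toy noise.
noisy_utterance = [6.0, 4.0, 6.0, 4.0]
label_with_enhancement = recognize(noisy_utterance)
label_without_enhancement = toy_classify(noisy_utterance)
```

In this toy setting, classifying the noisy signal directly picks the wrong template, while enhancing first recovers the intended speaker, which mirrors the motivation stated in the abstract. A real system would replace both stand-ins with trained networks.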
Corresponding author: Ing-Jr Ding
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article
Ing-Jr Ding and Meng-Chuan Hsieh, Robust Speaker Recognition in Voice Sensing Environments with Specific Background Noises Using Deep Learning of Hybridized Speech Enhancement Generative Adversarial Network and Convolutional Neural Network for Smart Manufacturing, Sens. Mater., Vol. 37, No. 6, 2025, p. 2489-2500.