Digital Repository

Analysis of Emotional Speech Recognition Using Artificial Neural Network

Archana, A.F.C. Thirukumaran, S. 2016-12-23T05:38:02Z 2016-12-23T05:38:02Z 2016
dc.identifier.citation Archana, A.F.C. and Thirukumaran, S. 2016. Analysis of Emotional Speech Recognition Using Artificial Neural Network. Kelaniya International Conference on Advances in Computing and Technology (KICACT - 2016), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. p 01. en_US
dc.identifier.isbn 978-955-704-013-4
dc.description.abstract This paper presents an artificial neural network based approach to classifying emotional human speech. Speech rate and energy are the most basic features of a speech signal, yet they differ significantly between emotions such as anger and sadness. Pitch is also used frequently in this work, and the autocorrelation method is used to detect the pitch in each frame. The speech samples used for the simulations are taken from the Emotional Prosody Speech and Transcripts dataset of the Linguistic Data Consortium (LDC). The LDC database contains a set of acted emotional speech voiced by male and female speakers. Only four emotion categories from the LDC database, each containing both male and female emotional speech, are used for the simulation. In the speech pre-processing phase, samples of four basic types of emotional speech are used: sad, angry, happy, and neutral. Important features related to the different emotional states are extracted from the voice signal and fed into the input of a classifier, which produces the recognized emotion at its output. Analog speech signals are converted to digital signals before pre-processing. The normalized speech signals are segmented into frames so that the signal keeps its characteristics over each short duration. Twenty-three short-term audio signal features of 40 samples are selected and extracted from the speech signals to analyze the human emotions. Statistical values such as the mean and variance are derived from the features. These derived data, along with their emotion targets, are used to train and test an artificial neural network that serves as the classifier. A neural network pattern recognition algorithm is used to train and test the data and to perform the classification. A confusion matrix is generated to analyze the performance results.
The accuracy of the neural network based approach improves when the network is trained multiple times. The overall rate of correct classification is 73.8% for a network trained twice, and 83.8% when the number of training runs is increased to ten. Overall, the system performs reliably, correctly classifying more than 80% of emotions once properly trained. en_US
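The framing, autocorrelation pitch detection, and mean/variance steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 75 to 400 Hz pitch search range, the frame and hop lengths, and the use of population variance are all assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance


def split_frames(signal, frame_len, hop):
    """Cut a signal into fixed-length frames, advancing by `hop` samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def autocorr_pitch(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation.

    The search range f_min..f_max is an assumed range for speech,
    not a value taken from the paper.
    """
    frame_mean = mean(frame)
    x = [s - frame_mean for s in frame]          # remove the DC offset
    lag_min = int(sample_rate / f_max)           # shortest period searched
    lag_max = min(int(sample_rate / f_min), len(x) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # Return 0.0 when no positive peak is found (e.g. a silent frame).
    return sample_rate / best_lag if best_lag else 0.0


def pitch_statistics(signal, sample_rate, frame_len, hop):
    """Return (mean, variance) of the per-frame pitch estimates."""
    pitches = [autocorr_pitch(f, sample_rate)
               for f in split_frames(signal, frame_len, hop)]
    return mean(pitches), pvariance(pitches)     # population variance (assumed)


# Usage: a synthetic 200 Hz tone sampled at 8 kHz, 40 ms frames, 20 ms hop.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]
pitch_mean, pitch_var = pitch_statistics(tone, sample_rate, 320, 160)
```

In a pipeline like the one the abstract describes, statistics of this kind, computed for each short-term feature, would form the input vector passed to the neural network classifier.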
dc.language.iso en en_US
dc.publisher Faculty of Computing and Technology, University of Kelaniya, Sri Lanka en_US
dc.subject Confusion matrix en_US
dc.subject Neural Networks en_US
dc.subject Short Term Features en_US
dc.subject Speech Emotions en_US
dc.title Analysis of Emotional Speech Recognition Using Artificial Neural Network en_US
dc.type Article en_US
