Analysis of Emotional Speech Recognition Using Artificial Neural Network

Archana, A.F.C.; Thirukumaran, S.

dc.contributor.author	Archana, A.F.C.
dc.contributor.author	Thirukumaran, S.
dc.date.accessioned	2016-12-23T05:38:02Z
dc.date.available	2016-12-23T05:38:02Z
dc.date.issued	2016
dc.identifier.citation	Archana, A.F.C. and Thirukumaran, S. 2016. Analysis of Emotional Speech Recognition Using Artificial Neural Network. Kelaniya International Conference on Advances in Computing and Technology (KICACT - 2016), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. p 01.	en_US
dc.identifier.isbn	978-955-704-013-4
dc.identifier.uri	http://repository.kln.ac.lk/handle/123456789/15609
dc.description.abstract	This paper presents an artificial neural network based approach for analyzing the classification of emotional human speech. Speech rate and energy are the most basic features of a speech signal, yet they differ significantly between emotions such as anger and sadness. Pitch is a frequently used feature in this work, and the auto-correlation method is used to detect the pitch in each frame. The speech samples used for the simulations are taken from the Emotional Prosody Speech and Transcripts dataset of the Linguistic Data Consortium (LDC). The LDC database contains a set of acted emotional speech recordings voiced by male and female speakers. Only four emotion categories from the LDC database, each containing both male and female speech, are used for the simulation. In the speech pre-processing phase, samples of four basic types of emotional speech (sad, angry, happy, and neutral) are used. Important features related to different emotional states are extracted from the voice signal; these features are then fed to the input of a classifier, which produces the recognized emotion at its output. Analog speech signals are converted to digital form for pre-processing. The normalized speech signals are segmented into frames, since a speech signal maintains its characteristics over short durations. 23 short-term audio signal features of 40 samples are selected and extracted from the speech signals to analyze the human emotions. Statistical values such as mean and variance are derived from the features. These derived data, along with their associated emotion targets, are used to train and test an artificial neural network that forms the classifier. A neural network pattern recognition algorithm is used to train and test on the data and to perform the classification. A confusion matrix is generated to analyze the performance results.
The accuracy of the neural network based approach in recognizing emotions improves with repeated training. The overall rate of correctly classified samples is 73.8% for a network trained twice, whereas it is 83.8% when the number of training runs is increased to ten. Overall, the system performs reliably, correctly classifying more than 80% of emotions once properly trained.	en_US
dc.language.iso	en	en_US
dc.publisher	Faculty of Computing and Technology, University of Kelaniya, Sri Lanka	en_US
dc.subject	Confusion matrix	en_US
dc.subject	Neural Networks	en_US
dc.subject	Short Term Features	en_US
dc.subject	Speech Emotions	en_US
dc.title	Analysis of Emotional Speech Recognition Using Artificial Neural Network	en_US
dc.type	Article	en_US
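
The abstract describes segmenting the normalized signal into frames, detecting pitch in each frame with the auto-correlation method, and deriving statistics such as mean and variance. The sketch below illustrates that chain in plain Python. It is not the authors' implementation: the function names, the 40 ms frame length, the 20 ms hop, and the 75-400 Hz pitch search range are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math
from statistics import mean, pvariance


def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the pitch of one frame (Hz) from its autocorrelation peak.

    fmin and fmax bound the lag search; both values are assumptions.
    """
    n = len(frame)
    avg = sum(frame) / n
    x = [s - avg for s in frame]            # remove the DC offset
    if sum(s * s for s in x) == 0.0:
        return 0.0                          # silent frame: no pitch
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag (unnormalized).
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def pitch_statistics(signal, sample_rate, frame_ms=40, hop_ms=20):
    """Return (mean, variance) of per-frame pitch for one utterance.

    frame_ms and hop_ms are assumed values, not taken from the paper.
    """
    peak = max(abs(s) for s in signal) or 1.0
    normalized = [s / peak for s in signal]  # peak-normalize amplitude
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    pitches = [autocorr_pitch(f, sample_rate)
               for f in split_frames(normalized, frame_len, hop)]
    voiced = [p for p in pitches if p > 0.0]
    if not voiced:
        return 0.0, 0.0
    return mean(voiced), pvariance(voiced)
```

Under these assumptions, each utterance yields two numbers for the pitch feature. Repeating the same mean and variance step for the other short-term features would give the kind of feature vector that the abstract says is fed, together with its emotion target, to the neural network classifier.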