Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/15609
Full metadata record
DC Field | Value | Language
dc.contributor.author | Archana, A.F.C. | -
dc.contributor.author | Thirukumaran, S. | -
dc.date.accessioned | 2016-12-23T05:38:02Z | -
dc.date.available | 2016-12-23T05:38:02Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Archana, A.F.C. and Thirukumaran, S. 2016. Analysis of Emotional Speech Recognition Using Artificial Neural Network. Kelaniya International Conference on Advances in Computing and Technology (KICACT - 2016), Faculty of Computing and Technology, University of Kelaniya, Sri Lanka. p 01. | en_US
dc.identifier.isbn | 978-955-704-013-4 | -
dc.identifier.uri | http://repository.kln.ac.lk/handle/123456789/15609 | -
dc.description.abstract | This paper presents an artificial neural network based approach to classifying emotional human speech. Speech rate and energy are the most basic features of a speech signal, yet they still differ significantly between emotions such as anger and sadness. Pitch is the feature used most frequently in this work, and an auto-correlation method is used to detect it in each frame. The speech samples used for the simulations are taken from the Emotional Prosody Speech and Transcripts dataset of the Linguistic Data Consortium (LDC), which contains acted emotional speech voiced by male and female speakers. Only the samples belonging to four emotion categories, sad, angry, happy, and neutral, covering both male and female speech, are used for the simulation. In the pre-processing phase, the analog speech samples are converted to digital signals, normalized, and segmented into frames so that the signal retains its characteristics over each short duration. Features relevant to the different emotional states are then extracted from the voice signal and fed into the input of a classifier, and the corresponding emotions are obtained at the output. In total, 23 short-term audio features are extracted from 40 speech samples to analyze the emotions, and statistical values such as the mean and variance are derived from these features. The derived data, together with their emotion targets, are used to train and test an artificial neural network that forms the classifier. A neural network pattern recognition algorithm performs the training, testing, and classification, and a confusion matrix is generated to analyze the performance. The recognition accuracy of the neural network based approach improves when the network is trained multiple times: the overall correct classification rate is 73.8% for a network trained twice and rises to 83.8% when the number of training passes is increased to ten. Once properly trained, the overall system performs reliably and correctly classifies more than 80% of the emotions. | en_US
dc.language.iso | en | en_US
dc.publisher | Faculty of Computing and Technology, University of Kelaniya, Sri Lanka | en_US
dc.subject | Confusion matrix | en_US
dc.subject | Neural Networks | en_US
dc.subject | Short Term Features | en_US
dc.subject | Speech Emotions | en_US
dc.title | Analysis of Emotional Speech Recognition Using Artificial Neural Network | en_US
dc.type | Article | en_US
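
The abstract above describes a frame-based pipeline: per-frame pitch detected by auto-correlation, per-frame energy, mean/variance statistics per utterance, and a neural network classifier evaluated with a confusion matrix. The sketch below illustrates that kind of pipeline, assuming Python with NumPy and scikit-learn (the paper does not specify an implementation); frame_signal, pitch_autocorr, and utterance_features are hypothetical helper names, and the four summary features here only stand in for the 23 short-term features used in the paper.

# Hypothetical sketch of the pipeline described in the abstract: frame the
# signal, extract per-frame pitch (auto-correlation) and energy, summarize
# with mean/variance, and train a neural network classifier whose results
# are inspected with a confusion matrix.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping short-time frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr=16000, fmin=60, fmax=400):
    """Estimate pitch (Hz) of one frame from the auto-correlation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin          # plausible lag range for speech
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def utterance_features(x, sr=16000):
    """Mean and variance of per-frame pitch and log-energy (illustrative)."""
    frames = frame_signal(x)
    pitch = np.array([pitch_autocorr(f, sr) for f in frames])
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    return np.array([pitch.mean(), pitch.var(), energy.mean(), energy.var()])

# X_wave: list of normalized speech signals, y: emotion labels
# ("sad", "angry", "happy", "neutral"); both assumed loaded from the corpus.
# X = np.stack([utterance_features(x) for x in X_wave])
# clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y)
# print(confusion_matrix(y, clf.predict(X)))

As a usage note, the classifier in this sketch is a generic multilayer perceptron; the paper's pattern recognition network and its repeated-training experiments (two versus ten training runs) would be reproduced by retraining and re-evaluating the model on held-out test data.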
Appears in Collections: KICACT 2016


