Digital Repository

Classification and Regression Trees (CART) based Data Driven Approach for Prosody Duration Modeling in Sinhala Language


dc.contributor.author Dolawattha, D.D.M.
dc.contributor.author Dias, N.G.J.
dc.contributor.author Kumara, K.H.
dc.date.accessioned 2014-12-17T08:54:52Z
dc.date.available 2014-12-17T08:54:52Z
dc.date.issued 2010
dc.identifier Statistics & Computer Science en_US
dc.identifier.citation Research Symposium; 2010 en_US
dc.identifier.uri http://repository.kln.ac.lk/handle/123456789/4758
dc.description.abstract A Text-to-Speech (TTS) synthesizer, or Text-to-Speech engine, is a computer-based system capable of reading any text aloud naturally. In TTS, the text may be entered directly into the computer by an operator, or it may be the output file of an Optical Character Recognition (OCR) system applied to a scanned written document. Prosodic features play a major role in developing a TTS system. Deriving the correct intonation, stress and duration from written text is one of the most challenging problems for natural languages. Prosodic duration strongly affects the naturalness and intelligibility of machine-generated synthetic speech. Here, we model duration using features that are derived automatically from the text and that affect the duration pattern of natural speech. To develop generic models of prosody for speech synthesis, we selected a speech corpus of 150 sentences in the Sinhala language and recorded them in three intonation patterns (angry, sad and sarcastic) with a female native speaker who is well trained in drama and theatre. Both the waveform and the spectrogram were used to determine the segment (phoneme) boundaries, and the identified boundaries were confirmed by listening to the speech. Each segment in the corpus was annotated with the features listed below together with the actual segment duration, and the CART was then generated from these annotations. The features considered are: identity of the current phoneme, identity of the preceding phoneme, identity of the following phoneme, position in the parent syllable, parent syllable initial, parent syllable final, parent syllable position type, number of syllables in the parent word, position of the parent syllable in the word, parent syllable break information, phrase length (number of words) and position of the phrase in the utterance.
These features were identified from similar work carried out for other languages, especially Asian languages [1]. Segmental durations were predicted as follows. The decision tree (CART) is traversed starting from the root node, taking the path that satisfies the conditions at the intermediate nodes, until a leaf node is reached. The leaf node contains the predicted segmental duration. en_US
dc.language.iso en en_US
dc.publisher Research Symposium 2010 - Faculty of Graduate Studies, University of Kelaniya en_US
dc.title Classification and Regression Trees (CART) based Data Driven Approach for Prosody Duration Modeling in Sinhala Language en_US
dc.type Article en_US
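
The prediction step described in the abstract (start at the root, follow the branch whose condition holds at each intermediate node, read the duration at the leaf) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Node` class, the feature key names, the questions asked at each node, and the millisecond values in the leaves are all hypothetical. A real tree would be grown from the annotated corpus rather than written by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union

# One annotated segment: feature name -> value (key names are hypothetical).
Features = Dict[str, Union[str, int, bool]]


@dataclass
class Node:
    """Leaf nodes carry a duration; internal nodes carry a yes/no question."""
    duration_ms: Optional[float] = None
    question: Optional[Callable[[Features], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def predict_duration(root: Node, segment: Features) -> float:
    """Walk from the root to a leaf and return the leaf's duration."""
    node = root
    while node.duration_ms is None:
        # Follow the branch that matches the answer at this intermediate node.
        node = node.yes if node.question(segment) else node.no
    return node.duration_ms


# Hand-written toy tree; the questions and leaf values are invented
# placeholders, not results from the paper.
toy_tree = Node(
    question=lambda s: s["current_phoneme"] in {"a", "i", "u"},
    yes=Node(
        question=lambda s: s["syllable_final"],
        yes=Node(duration_ms=120.0),
        no=Node(duration_ms=90.0),
    ),
    no=Node(
        question=lambda s: s["syllables_in_word"] > 2,
        yes=Node(duration_ms=60.0),
        no=Node(duration_ms=75.0),
    ),
)

segment = {
    "current_phoneme": "a",
    "preceding_phoneme": "k",
    "following_phoneme": "m",
    "syllable_final": False,
    "syllables_in_word": 3,
}
print(predict_duration(toy_tree, segment))  # 90.0
```

In this toy tree the segment answers "yes" at the root (the phoneme is in the vowel set) and "no" at the next node (it is not syllable-final), so the traversal ends at the leaf holding 90.0.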

