Browsing by Author "Kumara, K.H."
Item An analysis of sound parameters for prosodic modeling in Sinhala text to speech synthesis (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Dias, N.G.J.; Kumara, K.H.; Dolawattha, D.D.M.
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software and/or hardware. Text-to-Speech (TTS) is one of the speech synthesis technologies. Before a synthesizer can produce an utterance, several steps have to be completed. Among them, after the basic pronunciation has been computed from the orthographic text, prosody annotation must be performed. Finding the correct intonation, stress, and duration from written text is the most challenging problem for most natural languages. These features together are called prosodic or suprasegmental features and may be considered the melody, rhythm, and emphasis of speech at the perceptual level. Unfortunately, written text usually contains very little information about these features, and some of them change dynamically during speech. However, this information must be given to the speech synthesizer, at least to some extent, through specific control characters so that it can produce sufficiently natural speech in the target language. Timing at the sentence level, or grouping words correctly into phrases, is also difficult: in many languages prosodic phrasing is not always marked in text by punctuation, and phrasal accentuation is almost never marked. If there are no breath pauses in speech, or if they are in the wrong places, the speech may sound very unnatural, or the meaning of the sentence may even be misunderstood. As an example, in Sinhala, the input string " wïu wdjo@ ” " can be spoken in three different ways by changing the intonation pattern (angry, sadness or sarcastic), giving three different meanings to the listener.
Here intonation means how the pitch pattern, or fundamental frequency, changes during speech. The prosody of continuous speech depends on many separate aspects, such as the meaning of the sentence and the speaker's characteristics and emotions. The fundamental frequency also depends on the speaker: with a female voice it may be twice as high as with a male voice, and with children it may be even three times as high. Prosody therefore clearly plays a major role in speech synthesis, and any kind of speech synthesis requires a deeper treatment of prosody. In this work, in order to develop generic models for prosody in speech synthesis, we selected 150 sentences in the Sinhala language and recorded each of them in the above three intonation patterns (angry, sadness and sarcastic) with a female native speaker who is well trained in drama and theatre. We then computed various speech parameters for the resulting 150 × 3 utterances using the PRAAT speech processing tool (www.praat.org). For all 150 sentences, we found that the duration increases from Angry to Sarcastic. The Median, Mean, Standard Deviation, Minimum, and Maximum values of the Pitch parameter show no regular pattern. For the Pulses parameter, we computed the Number of pulses, Number of periods, Mean period, and Standard deviation of period for each sound file and observed no regular pattern. For the Voicing parameter, we computed the Fraction of locally unvoiced frames, Number of voice breaks, and Degree of voice breaks; again there was no regular pattern. We then computed the Harmonicity values (Mean autocorrelation, Mean noise-to-harmonics ratio, and Mean harmonics-to-noise ratio) and found no regular pattern. After computing the mean-energy intensity of each sentence, we found that the Intensity increases in the order Angry, Sarcastic, Sadness.
Finally, we computed the formant values (First formant, First bandwidth, Second formant, Second bandwidth, Third formant, Third bandwidth, Fourth formant, and Fourth bandwidth) and found no regular pattern in the formant parameters. Although most of the above speech parameters show no regular pattern, they should still be annotated onto the basic pronunciation computed from the orthographic text in order to develop a more natural-sounding speech synthesizer. In future we therefore hope to develop more generic probabilistic models, based on this analysis, of the above speech parameters for Sinhala speech synthesis.
Item Automatic Segmentation Of Given Set Of Sinhala Text Into Syllables For Speech Synthesis (University of Kelaniya, 2007) Kumara, K.H.; Dias, N.G.J.; Sirisena, H.
A dictionary-based automatic syllabification tool is presented for speech synthesis in the Sinhala language. The tool can also provide frequency distributions of vowels, consonants and syllables for a given set of Sinhala text. A method of determining syllable boundaries is also shown. Syllable boundaries for a given Sinhala sentence are detected in four main phases, which are described with examples. Rules for the automatic segmentation of words into syllables have been derived based on a dictionary. An algorithm has been produced to implement these rules; it uses the dictionary together with an accurate mark-up of the syllable boundaries.
Item Classification and Regression Trees (CART) based Data Driven Approach for Prosody Duration Modeling in Sinhala Language (Research Symposium 2010 - Faculty of Graduate Studies, University of Kelaniya, 2010) Dolawattha, D.D.M.; Dias, N.G.J.; Kumara, K.H.
A Text-to-Speech (TTS) synthesizer, or Text-to-Speech engine, is a computer-based system that can read any text aloud naturally.
In TTS, the text may be entered directly into the computer by an operator, or it may be the output file of an Optical Character Recognition (OCR) system applied to a scanned written document. Prosodic features play a major role in the development of a TTS system. Getting the correct intonation, stress and duration from written text is one of the most challenging problems for natural languages. Prosodic duration strongly affects the naturalness and intelligibility of machine-generated synthetic speech. To model duration, we used features that are derived automatically from the text and that affect the duration pattern of natural speech. In this work, in order to develop generic models for prosody in speech synthesis, we selected a speech corpus of 150 sentences in the Sinhala language and recorded them in three intonation patterns (angry, sadness and sarcastic) with a female native speaker who is well trained in drama and theatre. Both the waveform and the spectrogram were used to determine the segment (phoneme) boundaries, and the boundaries identified were confirmed by listening to the speech. Each segment in the corpus was annotated with the following features together with the actual segment duration, and the CART was then generated. The features considered are: identity of the current phoneme, identity of the preceding phoneme, identity of the following phoneme, position in the parent syllable, parent syllable initial, parent syllable final, parent syllable position type, number of syllables in the parent word, position of the parent syllable in the word, parent syllable break information, phrase length (number of words) and position of the phrase in the utterance. These features were adopted from similar work carried out for other languages, especially Asian languages [1]. Segmental durations were predicted as follows.
The decision tree (CART) is traversed starting from the root node, following the path that satisfies the conditions at the intermediate nodes, until a leaf node is reached. The leaf node contains the predicted segmental duration.
Item Design and Implementation of a Web-Based Faculty Information System (University of Kelaniya, 2006) Kumara, K.H.; Munasinghe, L.; Jayasuriya, K.D.; Dias, N.G.J.; de Silva, C.H.; Kalingamudali, S.R.D.
Although Information Systems (IS) are valuable elements for organizations, the private and public sectors in Sri Lanka are reluctant to use IS for decision making, organizing and classifying data, processing transactions, and many other activities. This is caused by the lack of computer literacy and the conventional attitudes of the majority of the Sri Lankan community. Even in higher education institutions in Sri Lanka, the majority of both staff and students, who are well aware of information technology, rely on conventional ways of handling information. One major reason for this is the lack of application software well suited to their needs. On the one hand, such software is rarely used by institutes because of its high cost; on the other hand, it is highly organization dependent. Hence steps have been taken to build a Faculty Information System (FIS) for the Faculty of Science, University of Kelaniya. The FIS was developed in a network environment, with the active participation of all those involved through continuous dialogue, with the aim of both promoting and demonstrating its benefits and catering to the different needs arising from the faculty community. The FIS consists of three major subsystems: the FIS Web Based Subsystem (FISW), the FIS Intranet Sub System (FISI) and the FIS Examination Sub System (FISE). FISW provides WWW access to FIS users at any time from anywhere. FISI provides access to the FIS via the Faculty office local area network, with security restrictions.
FISE processes the examination data in a highly secure environment that is separated from both FISW and FISI. FISI and FISW connect with FISE under security restrictions as required. Developing this type of tool clearly has social, cultural and technological dimensions. What we planned is one thing; what happened in reality, and how the stakeholders respond to the tool, is another. Evidence of the faculty's need for this type of tool is the number of accesses: 41784 in two years. This figure is not a complete measure of the acceptance of the FIS. To detect its defects and limitations, it is also necessary to take into account the number of pages requested by each registered user of the FIS. These statistics can be used to enhance the features of the FIS.
Item Food web: an interactive software for quantifying Winemiller’s trophic networks in fish communities (Sri Lanka Association for Fisheries and Aquatic Resources, 2004) Weliange, W.S.; Wickramasinghe, R.I.P.; Kumara, K.H.; de Silva, C.; Amarasinghe, U.S.; Vijverberg, J.
Observed properties of aquatic food webs have important management implications as well as important theoretical implications for fisheries science and aquatic ecology. The food web approach is useful for understanding pathways of energy and material transfer and the hierarchical structure of species' trophic interactions in aquatic ecosystems. Winemiller (1990) presented a graphical method to investigate spatial and temporal variation in trophic networks in tropical fish communities. A computer programme was developed to produce graphic illustrations of trophic networks in fish communities and the associated food web parameters, namely number of nodes, compartmentalization, connectance, average number of prey per node, average number of predators per node and ratio of consumer nodes to total nodes.
The input data for this software are the relative importance of food items of the constituent species in the fish community and the trophic levels of prey items. The graphic illustrations and associated food web parameters mentioned above can be used for spatial and temporal comparison of trophic relationships in fish communities.
Item MBROLA Formatted Diphone Database for Sinhala Language (University of Kelaniya, 2007) Kumara, K.H.; Dias, N.G.J.; Wickramasinghe, R.I.P.
Diphone synthesis is one of the most popular methods for creating a synthetic voice from recordings or samples of a particular person. Diphones are speech units that begin in the middle of the stable state of a phone and end in the middle of the following phone. The main interest of diphone synthesis is that it minimizes concatenation problems. The aim of the MBROLA project, recently initiated by the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones, which takes a list of allophones associated with prosodic information as input and outputs 16-bit linear speech samples. Diphone databases tailored to the MBROLA format are necessary to run the synthesizer. We therefore put forward a diphone database, tailored to the MBROLA format, to generate a synthetic voice for the Sinhala language through the MBROLA .pho reader. The first step in building the diphone database was to fix a list of all the phones (acoustic instances of phonemes) of the Sinhala language. The database was then created in three steps: creating a text corpus, recording the corpus and segmenting the speech corpus. For the text corpus, we used a few selected chapters of two Sinhala novels. The corpus was then read by two native Sinhala speakers (one male and one female), digitally recorded and stored.
All diphones were then spotted manually with the help of the Speech Viewer of the CSLU toolkit, which was developed by the Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology, USA. A diphone database was finally created with 1004 diphone segments; it records the names of the diphones, the related waveforms, their durations, and internal sub-splittings. Since we did not consider allophone variations in all instances, the naturalness of the resulting synthetic speech may be reduced. It is also possible that the number of diphone segments may be higher than the above number (1004). However, most of the commonly occurring diphones are included in the database that we have developed.
Item New Interpretation Of Primitive Pythagorean Triples And A Conjecture Related To Fermat’s Last Theorem (University of Kelaniya, 2007) Piyadasa, R.A.D.; Munasinghe, J.; Mallawa Arachchi, D.K.; Kumara, K.H.
In this study, primitive Pythagorean triples were carefully examined, and it was found that all of them satisfy a simple rule related to the mean value theorem. It is pointed out that integral triples satisfying the equation of Fermat’s Last Theorem should satisfy a special rule related to the mean value theorem. A conjecture is proposed which may lead to a simple proof of Fermat’s Last Theorem.
Item Text-to-speech synthesis for Sinhala language (2009) Kumara, K.H.
Item A tool for automatic derivation of phone transitions for the creation of a diphone database for Sinhala text to speech synthesis (Research Symposium 2009 - Faculty of Graduate Studies, University of Kelaniya, 2009) Kumara, K.H.; Dias, N.G.J.
Since conventional user interfaces such as keyboards and monitors restrict the usage of computers, there is a dire need for an interface other than the keyboard and screen interface that is widely in use at present. Speech technologies promise to be the next-generation user interfaces.
In general, two technologies for processing speech are needed: speech recognition and speech synthesis. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software and/or hardware. Text-to-Speech (TTS) is one of the speech synthesis technologies. TTS can be defined as “the production of speech by machines, by way of the automatic phonetization of the sentences to utter”. Before a synthesizer can produce an utterance, several steps have to be completed. First, the right segments/units have to be selected. The units usually used are diphones, half-syllables, triphones, etc. Many synthesizers use diphones as their basic units of concatenation. A diphone is the transition between two speech sounds, obtained from natural speech. Creating a diphone database, which contains all the sound transitions in the target language, is critical in diphone TTS synthesis.
Item A tool for automatic segmentation of a given Sinhala text into Syllables for Speech synthesis and Speech recognition (University of Kelaniya, 2006) Kumara, K.H.; Dias, N.G.J.; Sirisena, H.
In the present era of human-computer interaction, the educationally underprivileged and the rural communities of Sri Lanka are being deprived of the technologies that pervade the growing interconnected web of computers and communications. One good solution to this problem would be computers that talk to the common man in the language he is comfortable communicating in. A significant percentage of the Sri Lankan population is educationally underprivileged. On the one hand, we claim to be building an E-Government or an E-Society in Sri Lanka; on the other hand, the advances we make are totally inaccessible to a large number of people in Sri Lanka.
Under such circumstances, we cannot expect rural or educationally underprivileged people to use computers and IT products unless we remove the need to be literate, which stands as a barrier between them and computers. However, interaction between the computer and the user is largely through keyboard- and screen-oriented systems. In the current Sri Lankan context, this restricts the usage of computers to a minuscule fraction of the population, who are both computer-literate and conversant with written English. To enable a wider proportion of the population to benefit from information technology, there is a dire need for an interface other than the keyboard and screen interface that is widely in use at present. Speech technologies promise to be the next-generation user interface. Software applications with speech and voice recognition abilities have a better chance of communicating with a large percentage of the population, including the educationally underprivileged, the visually challenged and the computer illiterate, if these applications can speak and understand the native language. It is well known that the transcription of orthographic words into syllables is one of the principal steps in syllable-based speech synthesis and speech recognition. Hence we put forward a dictionary-based automatic syllabification tool for speech synthesis and automatic speech recognition in the Sinhala language. It can also provide the frequency distributions of vowels, consonants and syllables of a given Sinhala text. Although there is no universal agreement on the definition of a syllable, in this research a syllable is defined as $C_0^n V_1^n C_0^n$, where $C_0^n$ signifies 0 to n consonants and $V_1^n$ signifies 1 to n vowels. In this tool, syllable boundaries for a given Sinhala sentence are detected in four main phases:
(1) Reformat everything encountered (e.g. digits, abbreviations) into words and punctuation.
(2) Derive a phonemic representation for each word.
(3) Determine the $C_0^n V_1^n$ units for a given word.
(4) Reformat the above $C_0^n V_1^n$ units according to the $C_0^n V_1^n C_0^n$ definition in order to obtain the syllable boundaries.
The following example gives a better explanation of the algorithm.
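Phases (3) and (4) can be illustrated with a minimal Python sketch. It is offered as an illustration only and is not the authors' implementation or their example. It makes several assumptions that do not come from the abstract: the phonemes are given as romanized strings, the `VOWELS` set is a hypothetical inventory, and a consonant cluster between two vowels is split so that the last consonant opens the next syllable while the earlier ones close the previous syllable. The tool described above derives its rules from a dictionary, which this sketch does not attempt to reproduce.

```python
# Hypothetical romanized vowel inventory, for illustration only.
VOWELS = {"a", "aa", "ae", "aae", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}


def cv_units(phonemes):
    """Phase 3: group phonemes into C(0..n) V(1..n) units.

    Returns (units, tail): units is a list of (consonants, vowels) pairs,
    and tail holds any consonants left over after the last vowel.
    """
    units, pending, i = [], [], 0
    while i < len(phonemes):
        if phonemes[i] in VOWELS:
            # Collect the whole run of adjacent vowels as one nucleus.
            nucleus = []
            while i < len(phonemes) and phonemes[i] in VOWELS:
                nucleus.append(phonemes[i])
                i += 1
            units.append((pending, nucleus))
            pending = []
        else:
            pending.append(phonemes[i])
            i += 1
    return units, pending


def syllabify(phonemes):
    """Phase 4: regroup the CV units into C(0..n) V(1..n) C(0..n) syllables."""
    units, tail = cv_units(phonemes)
    if not units:
        # No vowel at all: return the consonants as a single chunk.
        return [list(tail)] if tail else []
    syllables = []
    for index, (consonants, vowels) in enumerate(units):
        if index == 0:
            # Word-initial consonants all stay with the first vowel group.
            syllables.append(consonants + vowels)
        else:
            # Assumed rule: the last consonant opens the new syllable,
            # earlier consonants close the previous one.
            syllables[-1].extend(consonants[:-1])
            syllables.append(consonants[-1:] + vowels)
    # Word-final consonants close the last syllable.
    syllables[-1].extend(tail)
    return syllables


print(syllabify(["a", "m", "m", "aa"]))
print(syllabify(["k", "a", "n", "d", "a"]))
```

Keeping phase (3) as a separate function mirrors the two-step description above: the intermediate units can be inspected before a boundary rule is applied, and a different, dictionary-derived rule could replace the assumed one in `syllabify` without touching `cv_units`.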