Browsing by Author "Weerasinghe, A.R."
A Rule Based Syllabification Algorithm for Sinhala (Lecture Notes in Computer Science / IJCNLP: International Joint Conference on Natural Language Processing, 2005)
Weerasinghe, A.R.; Wasala, R.A.; Gamage, K.N.
This paper presents a study of Sinhala syllable structure and an algorithm for identifying syllables in Sinhala words. After a thorough study of the syllable structure and linguistic rules for syllabification of Sinhala words and a survey of the relevant literature, a set of rules was identified and implemented as a simple, easy-to-implement algorithm. The algorithm was tested using 30,000 distinct words obtained from a corpus and compared with the same words manually syllabified. The algorithm performs with 99.95% accuracy.

Festival-si: A Sinhala Text-to-Speech System (Lecture Notes in Computer Science / IJCNLP: International Joint Conference on Natural Language Processing, 2007)
Weerasinghe, A.R.; Wasala, R.A.; Gamage, K.N.
This paper brings together the development of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and practical applications of it. Construction of a diphone database and implementation of the natural language processing modules are described. The paper also presents the development methodology of direct Sinhala Unicode text input by rewriting Letter-to-Sound rules in Festival's context-sensitive rule format and the implementation of the Sinhala syllabification algorithm. A Modified Rhyme Test (MRT) was conducted to evaluate the intelligibility of the synthesized speech and yielded a score of 71.5% for the TTS system described.

Sinhala Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis (Proceedings of the COLING/ACL Main Conference Poster Sessions, Association for Computational Linguistics, 2006)
Weerasinghe, A.R.; Wasala, R.A.; Gamage, K.N.
This paper describes an architecture to convert Sinhala Unicode text into a phonemic specification of pronunciation. The study was mainly focused on disambiguating schwa /ə/ and /a/ vowel epenthesis for consonants, which is one of the significant problems found in Sinhala. This problem has been addressed by formulating a set of rules. The proposed set of rules was tested using 30,000 distinct words obtained from a corpus and compared with the same words manually transcribed to phonemes by an expert. The Grapheme-to-Phoneme (G2P) conversion model achieves 98% accuracy.

The Sinhala Collation Sequence and its Representation in UNICODE (Localisation Focus: The International Journal for Localisation, 2005)
Weerasinghe, A.R.; Herath, D.L.; Gamage, K.N.
The alphabet of a language is perhaps the first thing we learn as users. The alphabet of our mother tongue would be the first alphabet we ever learn. And yet, a closer look reveals that there is much about such an alphabet that we have not explicitly specified anywhere. The Sinhala alphabet order is a prime example. We use it, recite it and yet would be hard pressed to define it explicitly. Sinhala is spoken in all parts of Sri Lanka except some districts in the north, east and centre by approximately 20 million people. It is spoken by an additional 30,000 (1993) people in Canada, Maldives, Singapore, Thailand and United Arab Emirates. Sinhala is classified as an Indo-European language and is used as an official language. The UNICODE Collation Algorithm (UCA) is an attempt to make explicit the collation sequence of any language expressed in the UNICODE (or any other) coding system. In order to express the Sinhala collation sequence (alphabetical order) using UCA, the authors undertook the task of identifying unresolved issues facing the unambiguous definition of the order. This paper first describes the issues identified through this study, suggesting alternative solutions and recommending one of them. Finally, it sets out the recommended collation sequence for Sinhala in the form of the UNICODE collation specification.
The outcome of this process is a unique and unambiguous expression of the Sinhala collation sequence which could be tested using existing tools and software environments.