Part of Speech (POS) tagger for Sinhala language

Jayaweera, A.J.P.M.P.; Dias, N.G.J.

Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/8062

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jayaweera, A.J.P.M.P.
dc.contributor.author	Dias, N.G.J.
dc.date.accessioned	2015-06-05T08:40:59Z
dc.date.available	2015-06-05T08:40:59Z
dc.date.issued	2011
dc.identifier.citation	Jayaweera, A.J.P.M.P. and Dias, N.G.J., 2011. Part of Speech (POS) tagger for Sinhala language, Proceedings of the Annual Research Symposium 2011, Faculty of Graduate Studies, University of Kelaniya, p. 81.	en_US
dc.identifier.uri	http://repository.kln.ac.lk/handle/123456789/8062
dc.description.abstract	Sinhala is a morphologically complex, agglutinative language. Most features of a word are affixed postpositionally to the root word. This paper presents a Part-of-Speech (POS) tagger for the Sinhala language based on a Hidden Markov Model (HMM). POS tagging, the process of assigning a part-of-speech or other lexical class marker to each word in a sentence, is one of the fundamental steps of any natural language processing (NLP) task. It is important in every area of NLP, from speech recognition to machine translation, and from spelling and grammar checking to language-based information retrieval on the web. The tagger takes a sentence, a tagset and a corpus as input and gives the tagged sentence as output. Tagging is done by estimating the tag sequence probability P(t_i | t_{i-1}) and the word-likelihood probability P(w_i | t_i) from counts in the given corpus, so the linguistic knowledge is extracted automatically from the annotated corpus. In this research, we use the tagset and the corpus developed by UCSC/LRTL (2005) under the PAN Localization project. The current tagset consists of 29 morpho-syntactic tags. The paper presents an algorithm for implementing a POS tagging system for the Sinhala language, which would enable users to reach a success rate of more than 80%.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Kelaniya	en_US
dc.subject	Part-of-speech (POS), Morphology, lexical, lemma, word stream, affixes, algorithm, stochastic model, Hidden Markov Model (HMM), Natural language processing (NLP)	en_US
dc.title	Part of Speech (POS) tagger for Sinhala language	en_US
dc.type	Article	en_US
Appears in Collections:	ARS - 2011
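
The abstract describes estimating P(t_i | t_{i-1}) and P(w_i | t_i) from an annotated corpus. The following is a minimal sketch of that idea, not the authors' implementation. Viterbi decoding, add-one smoothing, the `<s>` start pseudo-tag, the function names and the toy English corpus are all assumptions made for illustration; the record does not say how the paper decodes or smooths, and the real tagger uses the UCSC/LRTL Sinhala corpus with its 29-tag tagset.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag (assumption)


def train(tagged_sentences):
    """Count tag bigrams and (tag, word) pairs in an annotated corpus."""
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag]
    emissions = defaultdict(Counter)    # emissions[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log P(key | context); smoothing is an assumption."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for words as (word, tag) pairs."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words

    # best[i][tag] = (log prob of best path ending in tag at i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        column = {}
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[tag] = (score + emit, prev_tag)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return list(zip(words, path))


# Toy corpus with placeholder English words and tags (not the UCSC/LRTL data).
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

if __name__ == "__main__":
    model = train(TOY_CORPUS)
    print(viterbi(["the", "cat", "barks"], *model))
```

Working in log space avoids numeric underflow on long sentences, and the smoothing keeps unseen words or tag pairs from forcing a probability of zero. Both are common choices rather than details confirmed by this record.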

Files in This Item:

File	Description	Size	Format
N.G.J. Dias,.pdf		387.81 kB	Adobe PDF
