Browsing by Author "Dias, G."
Item: A Comprehensive Part of Speech (POS) Tag Set for Sinhala Language.
(The Third International Conference on Linguistics in Sri Lanka, ICLSL 2017. Department of Linguistics, University of Kelaniya, Sri Lanka, 2017) Dilshani, N.; Fernando, S.; Ranathunga, S.; Jayasena, S.; Dias, G.
Sinhala, which belongs to the Indo-Aryan language family, is a morphologically complex language. Most of the features of a word are postpositionally affixed to the root word. Thus, well-developed Part of Speech (POS) tag sets for languages such as English cannot be easily adapted to create a POS tag set for Sinhala. Moreover, currently available Sinhala POS tag sets have many limitations, such as the unavailability of tags for certain words. The objective of the research is to identify and overcome the ambiguities and limitations of the present POS tag sets for the Sinhala language, and to develop a comprehensive multi-level tag set for it. The new tag set was designed after a thorough evaluation of different types of corpora, such as news articles and official government letters, as well as an analysis of the existing POS tag set for Sinhala. The new tag set consists of 148 tags and is organized into 3 levels. Thus, it covers most of the word classes and inflection-based grammatical variations of the Sinhala language. The ultimate purpose of developing this tag set is to implement an automatic POS tagger, which is an essential tool in implementing Natural Language Processing applications. To train the automatic POS tagger, a corpus of 300,000 words has been manually POS-annotated using this tag set. Using this tag set yielded an overall accuracy of 84.68%, which surpasses the other Sinhala POS taggers. However, this annotation was done only up to level 2 of the tag set. Annotating at level 3 has the potential to introduce many ambiguities into the manual annotation process, due to the large number of POS tags.
Thus, this opens up new research avenues to investigate the use of inflectional morphological features of the Sinhala language in order to determine the POS tag of a word at the third level.

Item: A Corpus-Based Morphological Analysis of Sinhala Verbs.
(The Third International Conference on Linguistics in Sri Lanka, ICLSL 2017. Department of Linguistics, University of Kelaniya, Sri Lanka, 2017) Dilshani, W.S.N.; Dias, G.
Verbs are essential components of a meaningful sentence and are important in understanding sentence structure. This paper presents a morphological analysis of Sinhala verbs that combines traditional Sinhala grammar with an analysis of current usage based on a corpus of official documents. Sinhala verbs may be classified into a number of groups based on their morphology. However, there is currently no well-defined methodology for classifying a particular verb. It is hypothesized that verb morphology patterns may be identified by analysis of a Sinhala text corpus. On the basis of that hypothesis, this research proposes a classification of Sinhala verbs based on their morphology, which allows the morphological analysis of verbs in Sinhala text and the derivation of morphological rules for each class of verbs. The classification and rules are derived from an analysis of the corpus of official documents. Additionally, the rules were tested by applying them to another part of the corpus. This also allows the identification of irregular verbs, which do not fall into the standard classes. The analysis showed that the usage of tenses in contemporary official documents is more complex than that given in grammar texts, and that different combinations of Sinhala grammatical forms are used to denote time periods that fall between the standard tenses.
Moreover, other writing forms of Sinhala were identified, and it is shown that the existing classification of verbs in traditional grammar is insufficient to handle modern usage of the language.

Item: Design and Development of a Dashboard for a Real-Time Anomaly Detection System.
(Faculty of Computing and Technology, University of Kelaniya, Sri Lanka, 2017) Korala, H.C.; Weerasooriya, G.N.R.; Udantha, M.; Dias, G.
Web logs contain a wealth of undiscovered information on user activities, and if analyzed properly they can be utilized for many purposes. Identification of malicious attacks and daily summaries of user activities are examples of the valuable information that can be extracted from these log files. At present, many tools and algorithms have been developed to extract information from these log files, but on most occasions they have failed to present this information to the user in a way that supports real-time decisions. This paper presents a novel approach to designing and developing a dashboard for a real-time anomaly detection system, using open source tools to process complex events in real time, big data tools to batch process stored data, and dashboard development techniques. The system accepts web log files as input. They are first cleaned by a preprocessing unit and then published as events to WSO2's complex event processor (CEP), where special patterns are identified, filtered and summarised using a set of user-specified rules. If an anomaly is detected, an alert or warning is displayed on the widget-based dashboard in real time. Furthermore, every event stream that reaches the CEP is forwarded to WSO2's Data Analytics Server via the 'Thrift' protocol. That data is saved in a Cassandra database for further batch processing, which is used for drill-down purposes.
A widget-based dashboard has been developed using modern dashboard concepts and web technologies to display information such as daily summaries and possible security breaches in an interactive way, allowing system administrators to make operational decisions on the spot based on the information provided. Moreover, users can drill down into and analyze historical security breach information, and can customize the dashboard according to their preferences. The evaluation techniques used fall under two criteria: evaluation against well-established standards and evaluation by external expert review. The security evaluation was done against the security standard set by the PCI Security Standards Council, and the dashboard evaluation was carried out against the dashboard standards defined by Oracle, which describe best practices for developing an effective dashboard. The external expert review was carried out by people with prior experience of working with dashboards in different contexts. Ten expert evaluators from different areas of expertise (system administrators, UX engineers and QA engineers) took part in this evaluation, and a score-based model was used to determine how efficient the dashboard is for viewing and drilling down into information. Based on the results of the evaluation, the dashboard meets international standards for dashboard design and well-established security standards, and provides a good user experience for users in different functional areas.
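
The log-to-alert flow described in this abstract (clean the web log, turn each line into an event, apply a user-specified rule, raise an alert) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use WSO2 CEP; it is a plain-Python stand-in. The Common Log Format input, the function names `parse_line` and `detect_anomalies`, and the example rule (three or more 401/403 responses from one IP address within 60 seconds) are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines, e.g.
# 10.0.0.1 - - [10/Oct/2017:13:55:36 +0000] "GET /login HTTP/1.1" 401 512
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Preprocessing step: turn one raw log line into an event dict.

    Returns None for lines that do not match the assumed format.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": match.group("path"),
        "status": int(match.group("status")),
    }


def detect_anomalies(lines, threshold=3, window=timedelta(seconds=60)):
    """Hypothetical rule: alert when one IP produces `threshold` or more
    401/403 responses within a sliding time window."""
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    alerts = []
    for line in lines:
        event = parse_line(line)
        if event is None or event["status"] not in (401, 403):
            continue
        times = recent[event["ip"]]
        times.append(event["time"])
        # Drop failures that have fallen out of the window.
        while times and event["time"] - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            alerts.append((event["ip"], event["time"], len(times)))
    return alerts
```

In a real deployment the rule would live in the event processor and the alerts would be pushed to the dashboard; here they are simply collected in a list so the logic can be inspected.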