A Method to Sort Official Correspondence through Natural Language Processing

Weerasooriya, T.; Perera, N.

Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/14343

Full metadata record

DC Field	Value	Language
dc.contributor.author	Weerasooriya, T.	-
dc.contributor.author	Perera, N.	-
dc.date.accessioned	2016-09-02T05:45:12Z	-
dc.date.available	2016-09-02T05:45:12Z	-
dc.date.issued	2016	-
dc.identifier.citation	Weerasooriya, T. and Perera, N. 2016. A Method to Sort Official Correspondence through Natural Language Processing. Proceedings of the Second International Conference on Linguistics in Sri Lanka, ICLSL 2016, 25th August 2016, Department of Linguistics, University of Kelaniya, Sri Lanka. p. 119.	en_US
dc.identifier.issn	2513-2954	-
dc.identifier.uri	http://repository.kln.ac.lk/handle/123456789/14343	-
dc.description.abstract	Natural Language Processing (NLP) is a new branch of study in Computational Linguistics, and the field has undergone rapid development over the past few decades. Keyword extraction is a popular application of NLP. The present study uses Stanford CoreNLP, an NLP tool that provides Part-of-Speech (POS) tagging, to extract keywords from official correspondence. POS tagging identifies each word in a sentence and categorises it into the relevant grammatical category. Capitalising on the grammatical uniformity of formal written English, the system identifies all the noun phrases and verb phrases of a sentence, thereby isolating its subject and predicate. To sort official correspondence, the system analyses the 'object' line of an official letter or the 'subject' line of an e-mail and lists its noun phrases and verb phrases. The document is then sorted to the relevant department. To prevent slips in the system, the remaining words of the 'object' / 'subject' line are filtered through a keyword corpus, which increases the accuracy of the keyword extraction process. The present system proved to be more efficient than selecting keywords through a filter, as POS tagging sorts and presents the keywords in an order that lets respondents grasp the main idea of the sentence. The subsidiary list of words extracted through the keyword corpus adds to the accuracy of the system. The present study is limited to official correspondence in English, but the method could be adapted to other languages.	en_US
dc.language.iso	en	en_US
dc.publisher	Department of Linguistics, University of Kelaniya, Sri Lanka	en_US
dc.subject	document sorting	en_US
dc.subject	keyword extraction	en_US
dc.subject	natural language processing	en_US
dc.subject	official correspondence	en_US
dc.subject	part-of-speech tagging	en_US
dc.title	A Method to Sort Official Correspondence through Natural Language Processing	en_US
dc.type	Article	en_US
Appears in Collections:	ICLSL 2016
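
The pipeline described in the abstract (tag the subject line, list its noun phrases and verb phrases, filter the remaining words through a keyword corpus, then route the document) can be illustrated with a minimal Python sketch. This is not the authors' implementation: a small hand-written lexicon stands in for the Stanford CoreNLP tagger, the chunking rules are a simplification, and the names TOY_LEXICON, KEYWORD_CORPUS, DEPARTMENT_BY_KEYWORD and sort_correspondence are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# ASSUMPTION: a toy lexicon with Penn Treebank-style tags replaces a real
# POS tagger (the study itself uses Stanford CoreNLP).
TOY_LEXICON: Dict[str, str] = {
    "please": "UH", "approve": "VB", "the": "DT", "annual": "JJ",
    "leave": "NN", "request": "NN", "urgently": "RB", "pay": "VB",
    "overdue": "JJ", "invoice": "NN", "immediately": "RB",
}

# ASSUMPTION: hypothetical keyword corpus used to screen the leftover words.
KEYWORD_CORPUS = {"urgently", "immediately"}

# ASSUMPTION: hypothetical routing table; the abstract does not say how a
# department is chosen from the extracted phrases.
DEPARTMENT_BY_KEYWORD = {"leave": "Human Resources", "invoice": "Finance"}

NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP"}


def tag(subject: str) -> List[Tuple[str, str]]:
    """Lower-case and tokenise the subject line, then look up each tag."""
    tokens = re.findall(r"[a-z]+", subject.lower())
    return [(t, TOY_LEXICON.get(t, "X")) for t in tokens]  # "X" = unknown


def chunk(tagged: List[Tuple[str, str]]):
    """Group tagged tokens into noun phrases, verb phrases and leftovers."""
    noun_phrases, verb_phrases, remaining = [], [], []
    i = 0
    while i < len(tagged):
        word, pos = tagged[i]
        if pos in NP_TAGS:
            j = i
            while j < len(tagged) and tagged[j][1] in NP_TAGS:
                j += 1
            run = tagged[i:j]
            # A run only counts as a noun phrase if it contains a noun.
            if any(p.startswith("NN") for _, p in run):
                noun_phrases.append(" ".join(w for w, _ in run))
            else:
                remaining.extend(w for w, _ in run)
            i = j
        elif pos.startswith("VB"):
            j = i
            while j < len(tagged) and tagged[j][1].startswith("VB"):
                j += 1
            verb_phrases.append(" ".join(w for w, _ in tagged[i:j]))
            i = j
        else:
            remaining.append(word)
            i += 1
    return noun_phrases, verb_phrases, remaining


def sort_correspondence(subject: str) -> dict:
    """List the phrases, screen leftover words, and pick a department."""
    noun_phrases, verb_phrases, remaining = chunk(tag(subject))
    extra_keywords = [w for w in remaining if w in KEYWORD_CORPUS]
    candidates = [w for phrase in noun_phrases for w in phrase.split()]
    candidates += extra_keywords
    department = next(
        (DEPARTMENT_BY_KEYWORD[w] for w in candidates
         if w in DEPARTMENT_BY_KEYWORD),
        "Unsorted",
    )
    return {
        "noun_phrases": noun_phrases,
        "verb_phrases": verb_phrases,
        "extra_keywords": extra_keywords,
        "department": department,
    }
```

Under these assumptions, sort_correspondence("Please approve the annual leave request urgently") lists the verb phrase "approve" and the noun phrase "the annual leave request", keeps "urgently" as a subsidiary keyword, and routes the item to the hypothetical "Human Resources" department. A real system would replace the lexicon lookup with tagger output and would need a fuller phrase grammar.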

Files in This Item:

File	Description	Size	Format
ICLSL Book.119.pdf		139.55 kB	Adobe PDF
