Plagiarism detection educational tool: A student’s assessments similarity checker

Jayakody, J.R.K.C.

Please use this identifier to cite or link to this item: http://repository.kln.ac.lk/handle/123456789/15724

Title:	Plagiarism detection educational tool: A student’s assessments similarity checker
Authors:	Jayakody, J.R.K.C.
Keywords:	Plagiarism; Natural Language Processing; Term frequency; Inverted Document Frequency
Issue Date:	2016
Publisher:	Faculty of Science, University of Kelaniya, Sri Lanka
Citation:	Jayakody, J.R.K.C. 2016. Plagiarism detection educational tool: A student’s assessments similarity checker. In Proceedings of the International Research Symposium on Pure and Applied Sciences (IRSPAS 2016), Faculty of Science, University of Kelaniya, Sri Lanka. p 70.
Abstract:	Plagiarism is very common among students in higher education institutes for many reasons, such as lack of knowledge about the subject, poor academic writing skills, or difficulty in meeting a given deadline. The most popular method of plagiarism is to use online web pages or e-books, as it is easy to take content from the internet, change it, and submit it as original work. Hence, a number of online and offline software tools exist to detect plagiarism. However, there are fewer software tools to identify work copied among students. Therefore, in this research I developed a plagiarism detection tool to identify plagiarized assignments or tutorials that are submitted. Individual assignments and tutorials given in software engineering courses of the Department of Computing and Information System of Wayamba University were used as the dataset. Natural language processing algorithms were developed to derive statistical features from the assignments, such as bag of words, most frequent words, number of words, named entities, and paragraphs. Moreover, a Term Frequency and Inverse Document Frequency (TF-IDF) module was developed to generate a similarity index value among assignments. In addition, a latent semantic analysis module was developed with the word dictionary and vector corpus. Features generated and extracted from every module were used to identify clusters of similar assignments. K-means clustering algorithms in RapidMiner were used to identify the clusters. Most of the submitted assignments were identified with a number of clusters. Once the clustering results were verified with the students, it was evident that the automatic cluster classification gave fairly good results.
URI:	http://repository.kln.ac.lk/handle/123456789/15724
ISBN:	978-955-704-008-0
Appears in Collections:	IRSPAS 2016
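
Illustrative sketch (not part of the original record): the abstract mentions a TF-IDF module that produces a similarity index among assignments but gives no implementation details. The Python below is a minimal sketch of one common way to do this, assuming a simple lowercase word tokenizer, a smoothed inverse document frequency, and cosine similarity as the index. The function names (tokenize, tfidf_vectors, cosine, similarity_matrix) are hypothetical and are not taken from the author's tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; the original tool's
    # tokenizer is not described in the abstract.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    # Pairwise similarity index for every pair of submissions.
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]
```

Calling similarity_matrix on a list of assignment texts returns a square matrix in which entry [i][j] is the similarity between submissions i and j. Rows of such a matrix, or the TF-IDF vectors themselves, could then serve as input features for a clustering step such as the K-means stage described in the abstract.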

Files in This Item:

File	Description	Size	Format
70.pdf		217.78 kB	Adobe PDF
