Detecting plagiarism in multiple Sinhala documents

dc.contributor.author: Ganepola, G.A.U.E.
dc.contributor.author: Wijayasiriwardhane, T.K.
dc.date.accessioned: 2018-08-17T05:20:46Z
dc.date.available: 2018-08-17T05:20:46Z
dc.date.issued: 2018
dc.description.abstract: The availability of unlimited information resources on the Internet, together with advances in Internet search engines such as Google that make those resources much easier to locate, has contributed to an increase in plagiarism. Although a number of software tools are available for detecting plagiarism in multiple English documents, no such tool is yet available for the Sinhala language. This paper presents a novel language-dependent approach to detecting plagiarism in multiple Sinhala documents. It uses stemming, stop word removal and synonym replacement for text preprocessing, and term frequency-inverse document frequency (tf-idf) with cosine similarity for similarity comparison. A prototype software tool was developed and interlinked with an operational Sinhala WordNet to demonstrate the viability of the proposed approach. The prototype tool was validated against a sample of Sinhala assignments from secondary school students. The assignments were also examined by an expert to determine whether they had actually been plagiarized. When we compared the results of the prototype tool against the expert's judgment, we found that the proposed approach to plagiarism detection in multiple Sinhala documents performs with an accuracy of over 80%. [en_US]
dc.identifier.citation: Ganepola, G.A.U.E. and Wijayasiriwardhane, T.K. (2018). Detecting plagiarism in multiple Sinhala documents. International Research Conference on Smart Computing and Systems Engineering - SCSE 2018, Department of Industrial Management, Faculty of Science, University of Kelaniya, Sri Lanka. p.166. [en_US]
dc.identifier.uri: http://repository.kln.ac.lk/handle/123456789/19026
dc.language.iso: en [en_US]
dc.publisher: International Research Conference on Smart Computing and Systems Engineering - SCSE 2018 [en_US]
dc.subject: Plagiarism detection [en_US]
dc.subject: Sinhala language [en_US]
dc.subject: Sinhala WordNet [en_US]
dc.title: Detecting plagiarism in multiple Sinhala documents [en_US]
dc.type: Article [en_US]
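
The pipeline named in the abstract (stop word removal, stemming, synonym replacement, then tf-idf weighting and cosine similarity) can be illustrated with a minimal sketch. This is not the authors' prototype: the stop word list, the suffix-stripping stemmer, the synonym map and all function names below are hypothetical English placeholders standing in for Sinhala resources such as a Sinhala WordNet, and the smoothed idf formula is one common variant chosen here as an assumption, since the record does not say which weighting the paper uses.

```python
import math
from collections import Counter

# Hypothetical toy resources (assumptions, not from the paper). A real system
# would use a Sinhala stop word list, a Sinhala stemmer and WordNet synsets.
STOP_WORDS = {"the", "a", "is"}
SUFFIXES = ("ing", "s")
SYNONYMS = {"automobile": "car"}  # maps each synonym to one canonical term


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Remove stop words, stem, then replace synonyms with a canonical term."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    stems = [stem(t) for t in tokens]
    return [SYNONYMS.get(s, s) for s in stems]


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) from lists of preprocessed terms."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed idf (assumed variant).
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(texts):
    """Return {(i, j): similarity} for every pair of documents with i < j."""
    vectors = tfidf_vectors([preprocess(t) for t in texts])
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }
```

Calling `pairwise_similarity` on a list of document strings returns one score per document pair. Pairs whose score exceeds some cut-off would be flagged for review, but the record does not state a threshold, so choosing one is left to the reader.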

Files

Original bundle

Name: SCSE Proceedings - (166).pdf
Size: 627.9 KB
Format: Adobe Portable Document Format
