Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP

Mahsa Khajeh Zadeh; Meisam Zaifar

Published in: The Second National Conference on Digital Transformation and Intelligent Systems

Publication year: 1403 (Solar Hijri calendar)

Document type: Conference paper

Language: English

Length: 9 pages, available in PDF format

Permanent link to this paper:

https://civilica.com/doc/2040043

National scientific document ID:

DTIS02_039

Indexing date: 14 Mordad 1403

Abstract:

Manually detecting plagiarism in the huge volume of published documents is not feasible. Existing automatic plagiarism detection tools mostly focus on lexical matching and miss the semantic and syntactic aspects of plagiarism. A challenging area of plagiarism detection is semantic plagiarism, which combines lexical and syntactic transformations. NLP can be exploited to analyze semantic similarity and detect document plagiarism. Hybrid methods, which combine different kinds of algorithms, have proven to be more comprehensive. In this study, an existing hybrid similarity algorithm is improved, and a plagiarism detection method and a plagiarism score are defined to compare document plagiarism levels. The results on the MASRP dataset show an improvement of a few percent in all similarity evaluation criteria, including accuracy, precision, recall, and F-measure. Moreover, the document plagiarism score reflects the amount of plagiarism detected in the documents well. Our tests on the CPSA corpus verify that the defined plagiarism score correlates with the level of plagiarism in the suspicious document.
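The four evaluation criteria named in the abstract have standard definitions for a binary decision (pair flagged as similar or not). The following is a minimal sketch of how they could be computed, not the authors' evaluation code. The function name `binary_metrics`, the label convention (1 = similar, 0 = not similar), and the sample labels are assumptions made for illustration, and F-measure is taken here to mean the balanced F1 score.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper: 1 marks a pair judged similar, 0 marks a pair
    judged not similar. This label convention is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the paper.
    gold = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(binary_metrics(gold, predicted))
```

Each ratio falls back to 0.0 when its denominator is zero, which is one common convention; an evaluation script might instead raise an error or report the value as undefined.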
