Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP

Publication year: 1403 (Solar Hijri calendar)
Document type: Conference paper
Language: English

Length: 9 pages (PDF)

National scientific document ID:

DTIS02_039

Indexing date: 14 Mordad 1403

Abstract:

Manually detecting plagiarism in the huge volume of published documents is not feasible. Existing automatic plagiarism detection tools mostly focus on lexical matching and miss the semantic and syntactic aspects of plagiarism. A challenging area of plagiarism detection is semantic plagiarism, which combines lexical and syntactic transformations. NLP can be exploited to analyze semantic similarity and detect document plagiarism. Hybrid methods, which combine different kinds of algorithms, have proven to be more comprehensive. In this study, an existing hybrid similarity algorithm is improved, and a plagiarism detection method and a plagiarism score are defined to compare document plagiarism levels. The results on the MASRP dataset show an improvement of a few percent in all similarity evaluation criteria, including accuracy, precision, recall and F-measure. Moreover, the document plagiarism score reflects well the amount of plagiarism detected in the documents. Our tests on the CPSA corpus verify that the defined plagiarism score correlates with the level of plagiarism in the suspicious document.
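
The four evaluation criteria named in the abstract have standard definitions for a binary decision (pair is a paraphrase/plagiarism or not). The sketch below shows how they might be computed from similarity scores. It is not the paper's implementation: the function names `predict_from_scores` and `binary_metrics`, the 0.5 decision threshold, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def predict_from_scores(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn similarity scores into 0/1 labels.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    return [1 if s >= threshold else 0 for s in scores]


def binary_metrics(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F-measure (F1) for 0/1 labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Confusion-matrix counts, with label 1 as the positive class.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)

    total = len(gold)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Toy data, invented for this example only.
    gold_labels = [1, 1, 0, 0, 1]
    similarity_scores = [0.9, 0.4, 0.6, 0.1, 0.8]
    predictions = predict_from_scores(similarity_scores)
    print(binary_metrics(gold_labels, predictions))
```

Any similarity algorithm that outputs a score per sentence pair could be plugged in ahead of `predict_from_scores`; the metrics themselves do not depend on how the score was produced.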