CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP

Paper title: Evaluating Semantic and Syntactic Similarity for Plagiarism Detection in English Using NLP
National paper ID: DTIS02_039
Published in the Second National Conference on Digital Transformation and Intelligent Systems, 1403 SH
Authors:

Mahsa Khajeh Zadeh
Meisam Zaifar

Abstract:
Manually detecting plagiarism in the huge volume of published documents is not feasible. Existing automatic plagiarism detection tools mostly focus on lexical matching and miss the semantic and syntactic aspects of plagiarism. A challenging area of plagiarism detection is semantic plagiarism, which combines lexical and syntactic transformations. NLP can be exploited to analyze semantic similarity and detect document plagiarism. Hybrid methods, which combine different kinds of algorithms, have proven to be more comprehensive. In this study, an existing hybrid similarity algorithm is improved, and a plagiarism detection method and a plagiarism score are defined to compare document plagiarism levels. The results on the MASRP dataset show an improvement of a few percent in all similarity evaluation criteria, including accuracy, precision, recall and F-measure. Moreover, the document plagiarism score reflects well the amount of plagiarism detected in the documents. Our tests on the CPSA corpus verify that the defined plagiarism score correlates with the level of plagiarism in the suspicious document.
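
The four evaluation criteria named in the abstract have standard definitions for binary decisions. The sketch below computes them from gold labels and predicted labels. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics`, the label convention (1 = similar/plagiarised pair, 0 = not) and the sample labels are assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for binary labels.

    Assumed convention (not from the paper): 1 marks a similar/plagiarised
    pair, 0 marks a non-similar pair.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure here is the balanced F1: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up labels for six sentence pairs, purely for illustration.
    gold = [1, 1, 1, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 1]
    for name, value in binary_metrics(gold, predicted).items():
        print(f"{name}: {value:.3f}")
```

Empty denominators are mapped to 0.0 here so that a classifier which never predicts the positive class does not raise a division error. Other conventions exist and may be preferable depending on the evaluation setup.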

Keywords:
Semantic Similarity, Syntactic Similarity, Plagiarism, NLP

Paper page and full-text download: https://civilica.com/doc/2040043/