A Modified Language Modeling Method for Authorship Attribution
Publication Year: 1395
Document Type: Conference paper
Language: English
This paper has 6 pages and is available for download in PDF format.
National Document ID: ICIKT08_006
Indexing Date: 5 Bahman 1395
Abstract:
This paper presents an approach to the closed-class authorship attribution (AA) problem. It is based on language modeling for classification and is called modified language modeling. Modified language modeling aims to solve the AA problem by combining bigram word weighting with unigram word weighting. It makes the relation between an unseen text and the training documents clearer by giving an extra reward to training documents that contain the bigram words as well as the unigram words. Moreover, instead of removing stop words using a stop-word list, each word's probability is multiplied by its IDF value. We evaluate experimental results for four approaches, unigram, bigram, trigram, and modified language modeling, using two Persian poem corpora: the WMPR-AA2016-A and WMPR-AA2016-B datasets. Results show that modified language modeling attributes authors better than the other approaches. The results on WMPR-AA2016-B, the larger dataset, are much better than on the other dataset for all approaches. This may indicate that, if adequate data is provided to train the language model, modified language modeling can be a good solution to the AA problem.
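The abstract's core idea, scoring an unseen text with IDF-weighted unigram probabilities plus an extra reward when the text's bigrams also appear in an author's training documents, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the smoothing scheme, the `alpha` reward weight, whitespace tokenization, and all function names here are assumptions.

```python
import math
from collections import Counter

def train_author_model(docs):
    """Build unigram and bigram counts from one author's training documents."""
    uni, bi = Counter(), Counter()
    for doc in docs:
        tokens = doc.split()
        uni.update(tokens)
        bi.update(zip(tokens, tokens[1:]))
    return uni, bi

def idf(term, all_docs):
    """Inverse document frequency of a term over the whole training collection."""
    df = sum(1 for d in all_docs if term in d.split())
    return math.log((1 + len(all_docs)) / (1 + df))

def score(text, uni, bi, all_docs, alpha=0.5):
    """Score an unseen text against one author's model: IDF-weighted unigram
    log-probabilities, plus an extra reward for bigrams seen in training."""
    total = sum(uni.values())
    vocab = len(uni)
    tokens = text.split()
    s = 0.0
    for w in tokens:
        p = (uni[w] + 1) / (total + vocab)      # add-one smoothing (an assumption)
        s += idf(w, all_docs) * math.log(p)     # weight by IDF instead of removing stop words
    for pair in zip(tokens, tokens[1:]):
        if pair in bi:                          # extra reward for shared bigrams
            s += alpha * math.log(1 + bi[pair])
    return s
```

At attribution time, the unseen text would be scored against each candidate author's model and assigned to the author with the highest score.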
Authors
Samane Vazirian
Kharazmi International Campus, Shahrood University of Technology Shahrood, Iran
Morteza Zahedi
Kharazmi International Campus, Shahrood University of Technology Shahrood, Iran