Authorship Clustering using Homogeneous Feature Space and Two-stepped Automatic Fuzzy Cmeans Clustering
Publication year: 1399 (Solar Hijri calendar)
Document type: Journal article
Language: English
Length: 10 pages (PDF)
National scientific document ID:
JR_JAISIS-1-1_006
Indexing date: 17 Farvardin 1400
Abstract:
Identifying the author of an anonymous or disputed document is a cornerstone of automatic forensic applications. It is also a challenging task for both humans and computers, given the complex content of documents and the variety of their backgrounds. Because of its nature, it is always treated as an unsupervised task. Clustering documents according to the linguistic style of the authors who wrote them has received little attention from the research community. To address this problem, the PAN Evaluation Framework became the first effort to promote the development of author clustering. There are different approaches to the task; this article proposes a method based on a set of homogeneous features and two-stepped automatic fuzzy c-means (FCM) clustering. We use word n-grams, part-of-speech tags, and some other context-free features, then estimate the number of clusters using a document similarity graph (DSG), and finally use FCM to cluster the corpus. The method completes the task in a very short amount of time, and its performance is comparable with that of the leaderboard competitors in the PAN CLEF 2017 challenge.
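The pipeline outlined in the abstract (features, cluster-count estimation from a similarity graph, then FCM) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: word bigram counts stand in for the full feature set (no part-of-speech tags), the number of clusters is taken as the number of connected components of a cosine-similarity graph cut at an arbitrary `threshold`, and the function names `vectorize`, `estimate_clusters`, and `fuzzy_cmeans` are hypothetical. The FCM update itself is the standard one with fuzzifier `m`.

```python
import numpy as np
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams in a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def vectorize(docs, n=2):
    """Build L2-normalized n-gram count vectors over a shared vocabulary."""
    counts = [ngram_counts(d, n) for d in docs]
    vocab = sorted(set().union(*counts))
    X = np.array([[c[g] for g in vocab] for c in counts], dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def estimate_clusters(X, threshold=0.3):
    """Assumption: cluster count = connected components of the graph that
    links two documents when their cosine similarity is >= threshold."""
    adj = (X @ X.T) >= threshold
    n, seen, components = len(X), set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            stack.extend(v for v in range(n) if adj[u, v] and v not in seen)
    return components


def fuzzy_cmeans(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns memberships U (n x c) and centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row is a membership distribution
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.maximum(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Toy corpus (made up for this sketch, not PAN data).
docs = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
X = vectorize(docs)
k = estimate_clusters(X)
U, centers = fuzzy_cmeans(X, k)
labels = U.argmax(axis=1)  # hard assignment from the fuzzy memberships
```

On this toy corpus the similarity graph has two components, so `k` is 2. The memberships in `U` are soft; taking the `argmax` per row is one simple way to turn them into a hard clustering for evaluation.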
Keywords:
Authors
Mohammad Aminian
Computer Engineering Department, Bu Ali Sina University, Hamedan, Iran
Mahdi Eskandari
Computer Engineering Department, Bu Ali Sina University, Hamedan, Iran