Text Anomalies Detection Using Histograms of Words
Publication year: 1395 (Solar Hijri)
Document type: Journal article
Language: English
This paper is 6 pages long and is available in PDF format.
National scientific document ID:
JR_ACSIJ-5-1_010
Indexing date: 19 Aban 1397
Abstract:
Authors of written texts can be characterized mainly by a collection of attributes obtained from their texts. Texts by the same author are very similar in style. We can assume that the attributes of a full text are very similar to the attributes of its parts. By the same reasoning, different parts of the same text can be compared with each other. In this paper, we describe an algorithm based on histograms of a text mapped to the interval [0, 1]. The mapping keeps the word order of the text. The histograms are analyzed from a clustering point of view. If the cluster dispersion is not large, the text was probably written by a single author. If the cluster dispersion is large, the text is split into two or more parts and the same analysis is applied to the parts. The experiments were done on English and Arabic texts. For combined English texts, our algorithm discovers that the texts were not written by one author. We obtained similar results for combined Arabic texts. Our algorithm can be used for a basic analysis of whether a text was written by one author.
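The pipeline outlined in the abstract (map words to [0, 1] in text order, build histograms, measure cluster dispersion) can be illustrated with a minimal Python sketch. The abstract does not define the mapping, the number of bins, the way the text is divided, or the dispersion measure, so everything concrete here is an assumption made for illustration: the word-length mapping in `map_words`, the equal-width bins in `histogram`, the mean distance to the centroid in `dispersion`, and the equal-sized parts in `part_dispersion` are hypothetical choices, not the authors' method.

```python
import re
from math import sqrt


def map_words(text, max_len=20):
    """Map each word to a value in [0, 1], keeping the word order of the text.

    Assumption: the value is the word length divided by max_len, capped at 1.
    The abstract does not say which mapping the paper uses.
    """
    words = re.findall(r"\w+", text.lower())
    return [min(len(w), max_len) / max_len for w in words]


def histogram(values, bins=10):
    """Normalized histogram of values in [0, 1] using equal-width bins."""
    counts = [0] * bins
    for v in values:
        # The value 1.0 falls into the last bin.
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def dispersion(histograms):
    """Mean Euclidean distance of the histograms from their centroid.

    Assumption: this stands in for the paper's cluster dispersion.
    """
    n = len(histograms)
    centroid = [sum(column) / n for column in zip(*histograms)]
    distances = [
        sqrt(sum((h - c) ** 2 for h, c in zip(hist, centroid)))
        for hist in histograms
    ]
    return sum(distances) / n


def part_dispersion(text, parts=4, bins=10):
    """Split the mapped text into consecutive parts and return their dispersion."""
    values = map_words(text)
    n = len(values)
    chunks = [values[i * n // parts:(i + 1) * n // parts] for i in range(parts)]
    return dispersion([histogram(chunk, bins) for chunk in chunks])


if __name__ == "__main__":
    uniform = "aa bb cc dd " * 10
    mixed = "a " * 40 + "abcdefghij " * 40
    print(part_dispersion(uniform))  # parts look alike
    print(part_dispersion(mixed))    # parts differ
```

A low value from `part_dispersion` would suggest stylistically similar parts, while a high value would suggest splitting the text and repeating the analysis on each part. Any decision threshold would have to be calibrated on real data.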
Authors
Abdulwahed Almarimi
Institute of Computer Science, Faculty of Science, P. J. Šafárik University in Košice, 04001 Košice, Slovakia
Gabriela Andrejková
Institute of Computer Science, Faculty of Science, P. J. Šafárik University in Košice, 04001 Košice, Slovakia