Persian News Classification Using Bag-of-Concepts
Publication Year: 1397
Document type: Conference paper
Language: English
Length: 6 pages (PDF format)
National scientific document ID:
IDS03_074
Indexing date: 31 Ordibehesht 1398
Abstract:
Text classification is the task of automatically assigning documents to classes or topics from a predefined set. The representation of a document has a strong impact on the performance of classification algorithms. A common document representation is Bag-of-Words (BOW), which represents a document as a vector of its word frequencies. However, this method suffers from the curse of dimensionality: as the number of unique words increases, the classifier fails to maintain acceptable accuracy. In this paper, the Bag-of-Concepts (BOC) method is employed to overcome this weakness of BOW. The method groups semantically similar words in order to reduce the high dimensionality that arises in BOW. Its advantage over other approaches is that it effectively incorporates the impact of semantically similar words on preserving document proximity.
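The contrast between the two representations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how semantically similar words are grouped, so the `WORD_TO_CONCEPT` table below is a hand-written, hypothetical mapping, and the function names `bag_of_words` and `bag_of_concepts` are likewise assumptions made for illustration.

```python
from collections import Counter

# Hypothetical word-to-concept mapping, written by hand for illustration only.
# The abstract does not state how the grouping is actually obtained.
WORD_TO_CONCEPT = {
    "football": "sport",
    "soccer": "sport",
    "goal": "sport",
    "election": "politics",
    "parliament": "politics",
    "vote": "politics",
}


def bag_of_words(tokens, vocabulary):
    """Return a word-frequency vector ordered by `vocabulary`."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def bag_of_concepts(tokens, concepts, word_to_concept):
    """Return a concept-frequency vector ordered by `concepts`.

    Words that have no known concept are skipped.
    """
    counts = Counter(word_to_concept[t] for t in tokens if t in word_to_concept)
    return [counts[c] for c in concepts]


# Two toy documents that share a topic mix but use different words.
doc_a = "football goal vote".split()
doc_b = "soccer goal election".split()

vocabulary = sorted(set(WORD_TO_CONCEPT))         # one dimension per word
concepts = sorted(set(WORD_TO_CONCEPT.values()))  # one dimension per concept

bow_a = bag_of_words(doc_a, vocabulary)
bow_b = bag_of_words(doc_b, vocabulary)
boc_a = bag_of_concepts(doc_a, concepts, WORD_TO_CONCEPT)
boc_b = bag_of_concepts(doc_b, concepts, WORD_TO_CONCEPT)

print(len(bow_a), len(boc_a))          # vector sizes for BOW and BOC
print(bow_a == bow_b, boc_a == boc_b)  # whether the two documents coincide
```

In this toy setting the BOW vectors have one dimension per unique word and differ between the two documents, while the BOC vectors have one dimension per concept and coincide, which is the sense in which grouping similar words can reduce dimensionality while preserving document proximity.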
Keywords:
Authors
Asma Faraji Dizaji
School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran
Hedieh Sajedi
School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran
Arian Hedayati Far
School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran