Persian News Classification Using Bag-of-Concepts

Publication Year: 1397
Document Type: Conference paper
Language: English

This paper is 6 pages long and is available for download in PDF format.

National Scientific Document ID:

IDS03_074

Indexing Date: 31 Ordibehesht 1398

Abstract:

Text classification is the task of automatically assigning the documents in a collection to a predefined set of classes or topics. The representation of a document has a strong impact on the performance of classification algorithms. A common document representation is Bag-of-Words (BOW), which represents a document as a vector of its word frequencies. However, this method suffers from the curse of dimensionality: as the number of unique words increases, the classifier fails to maintain acceptable accuracy. In this paper, the Bag-of-Concepts (BOC) method is employed to overcome this weakness of BOW. BOC groups semantically similar words in order to reduce the high dimensionality that arises in BOW. Its advantage over other approaches is that it effectively accounts for the impact of semantically similar words on preserving document proximity.
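
The contrast between the two representations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English vocabulary, the hand-written `WORD_TO_CONCEPT` mapping, and the helper names `bag_of_words` and `bag_of_concepts` are hypothetical, and the abstract does not state how the paper actually forms its concept groups (clustering word embeddings is one common choice).

```python
from collections import Counter

# Hypothetical word-to-concept assignment, written by hand for illustration.
# The abstract does not say how the paper derives its groups of similar words.
WORD_TO_CONCEPT = {
    "football": "sports", "soccer": "sports", "match": "sports",
    "election": "politics", "parliament": "politics", "vote": "politics",
}


def bag_of_words(tokens, vocabulary):
    """Return one frequency per vocabulary word (one dimension per word)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def bag_of_concepts(tokens, concepts, word_to_concept):
    """Return one frequency per concept (one dimension per word group)."""
    counts = Counter(
        word_to_concept[token] for token in tokens if token in word_to_concept
    )
    return [counts[concept] for concept in concepts]


vocabulary = sorted(WORD_TO_CONCEPT)                 # 6 dimensions
concepts = sorted(set(WORD_TO_CONCEPT.values()))     # 2 dimensions

# Two toy documents that use different but semantically similar words.
doc_a = "football match vote".split()
doc_b = "soccer match election".split()

bow_a = bag_of_words(doc_a, vocabulary)
bow_b = bag_of_words(doc_b, vocabulary)
boc_a = bag_of_concepts(doc_a, concepts, WORD_TO_CONCEPT)
boc_b = bag_of_concepts(doc_b, concepts, WORD_TO_CONCEPT)

print("BOW:", bow_a, bow_b)
print("BOC:", boc_a, boc_b)
```

In this toy setup the BOW vectors have one dimension per word and differ between the two documents, while the BOC vectors have one dimension per concept and coincide, which is the sense in which grouping similar words can both reduce dimensionality and keep related documents close.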

Authors

Asma Faraji Dizaji

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran

Hedieh Sajedi

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran

Arian Hedayati Far

School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran