CIVILICA We Respect the Science
(Specialized publisher of the country's conferences / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

Persian News Classification Using Bag-of-Concepts

Paper title: Persian News Classification Using Bag-of-Concepts
National paper ID: IDS03_074
Published in the Third Conference on Intelligent Decision-Making Systems, 1397 (Solar Hijri calendar)
Authors:

Asma Faraji Dizaji - School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran
Hedieh Sajedi - School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran
Arian Hedayati Far - School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran

Abstract:
Text classification is the task of automatically assigning documents to a predefined set of classes or topics. The representation of a document has a strong impact on the performance of classification algorithms. A common document representation is Bag-of-Words (BOW), which represents a document as a vector of its word frequencies. However, this method suffers from the curse of dimensionality: as the number of unique words increases, the classifier fails to maintain acceptable accuracy. In this paper, the Bag-of-Concepts (BOC) method is employed to overcome this weakness of BOW. The method groups semantically similar words in order to reduce the high dimensionality that occurs in BOW. Its advantage over other approaches is that it effectively incorporates the impact of semantically similar words on preserving document proximity.
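The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 2-D word vectors in `EMBEDDINGS` are made-up values standing in for real word embeddings, the English tokens stand in for Persian news vocabulary, and the helper names (`kmeans`, `bag_of_concepts`) are hypothetical. It assumes words are clustered with K-means (listed in the keywords) and that a document is then represented by counts per cluster ("concept") rather than counts per word.

```python
# Hypothetical 2-D word embeddings (illustrative values, not from the paper).
EMBEDDINGS = {
    "football": (0.90, 0.10),
    "goal": (0.80, 0.20),
    "match": (0.85, 0.15),
    "bank": (0.10, 0.90),
    "inflation": (0.20, 0.80),
    "currency": (0.15, 0.85),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means with deterministic, evenly spaced seeds."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return [nearest(p, centroids) for p in points]


def bag_of_concepts(tokens, word_to_concept, k):
    """Count how many tokens fall into each concept (word cluster)."""
    vector = [0] * k
    for token in tokens:
        concept = word_to_concept.get(token)
        if concept is not None:  # out-of-vocabulary words are skipped
            vector[concept] += 1
    return vector


K = 2  # number of concepts; an assumption chosen for this toy vocabulary
words = list(EMBEDDINGS)
labels = kmeans([EMBEDDINGS[w] for w in words], K)
word_to_concept = dict(zip(words, labels))

sports_doc = "football match goal goal".split()
economy_doc = "bank inflation football unknown".split()
print(bag_of_concepts(sports_doc, word_to_concept, K))   # [4, 0]
print(bag_of_concepts(economy_doc, word_to_concept, K))  # [1, 2]
```

In this toy setup the document vector has K entries instead of one entry per unique word, which is the dimensionality reduction the abstract refers to. Semantically close words such as "goal" and "match" contribute to the same entry.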

Keywords:
Bag of Words; Bag of Concepts; K-means.

Dedicated paper page and full-text download: https://civilica.com/doc/855074/