Multi-class classification of Persian news with machine learning algorithms and neural network

Publication year: 1403
Document type: Conference paper
Language: English
Views: 255

This paper is 21 pages long and is available for download in PDF format.

National scientific document ID:

ICAII01_058

Indexing date: 19 Esfand 1403

Abstract:

A huge amount of textual data is generated on the web every day. The growing need to organize and use the information extracted from this data makes text classification increasingly important. In this research, we evaluate the performance of text classification algorithms on a standard Persian-language dataset. In the first step, we build a dataset of Persian news. In the second step, we apply several preprocessing operations to this dataset; according to our evaluation, the choice of preprocessing methods has a large impact on the final results and on the performance of the classification algorithms. In the third step, after standardizing the dataset, we convert the texts into vectors using the TF-IDF model. We then compare naive Bayes, Gaussian naive Bayes, support vector machine, k-nearest neighbor, logistic regression, and a long short-term memory (LSTM) neural network on 86 thousand Persian news records. The accuracies obtained are 56%, 70%, 86%, 76%, 94%, and 86%, respectively.
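The vectorization and classification steps described in the abstract can be illustrated with a minimal Python sketch using only the standard library. It is not the authors' implementation. The abstract does not state the TF-IDF variant, the tokenizer, or the value of k, so the smoothed IDF formula, whitespace tokenization, cosine similarity, and k = 1 are all assumptions. The English toy sentences and the labels "sports" and "economy" are invented stand-ins for Persian news records.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Real Persian preprocessing
    # (normalization, stop-word removal, stemming) is not reproduced here.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, idf, k=1):
    """Majority vote among the k training documents most similar to the query."""
    q = tfidf_vector(query, idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus (invented for illustration only).
train_docs = [
    "the team won the football match",
    "the striker scored two goals in the match",
    "the central bank raised interest rates",
    "stock markets fell as inflation rose",
]
train_labels = ["sports", "sports", "economy", "economy"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]
prediction = knn_predict("the bank cut interest rates",
                         train_vecs, train_labels, idf)
print(prediction)
```

Terms that occur in few documents (such as "bank") receive a higher IDF weight than frequent ones (such as "the"), so the query is matched mainly on its distinctive vocabulary. In practice a library implementation of TF-IDF and of the classifiers would be the more usual choice for a corpus of this size.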

Keywords:

Text classification, Persian news dataset production, Naive Bayes, Gaussian Naive Bayes, support vector machine, k-nearest neighbor, logistic regression, long short-term memory neural network

Authors

Hossein Hosseini

Master’s student in artificial intelligence and robotics, Faculty of Artificial Intelligence and Cognitive Sciences - Imam Hossein Comprehensive University - Tehran – Iran

Hossein Rayat Parvar

Master’s student in artificial intelligence and robotics, Faculty of Artificial Intelligence and Cognitive Sciences - Imam Hossein Comprehensive University - Tehran – Iran

Hamid Reza Lotfi

Master’s student in artificial intelligence and robotics, Faculty of Artificial Intelligence and Cognitive Sciences - Imam Hossein Comprehensive University - Tehran – Iran

Mohammad Mehdi Mokhtari

Master’s student in artificial intelligence and robotics, Faculty of Artificial Intelligence and Cognitive Sciences - Imam Hossein Comprehensive University - Tehran – Iran

Mohammad Ghalenoei

Master’s student in artificial intelligence and robotics, Faculty of Artificial Intelligence and Cognitive Sciences - Imam Hossein Comprehensive University - Tehran – Iran

Mohammad Ali Ziaee

Master’s student in artificial intelligence and robotics, Faculty of Artificial Intelligence and Cognitive Sciences - Imam Hossein Comprehensive University - Tehran – Iran