Classification of Persian News Articles using Machine Learning Techniques

Publication Year: 1400 (Solar Hijri)
Document type: Journal article
Language: English

This paper is 10 pages long and is available for download in PDF format.

National scientific document ID:

JR_CKE-4-1_001

Indexing date: 25 Khordad 1401

Abstract:

Automatic text classification, the process of automatically assigning texts to predefined categories, has many applications in everyday life and has recently gained much attention due to the increased number of text documents available in electronic form. Classifying news articles is one such application. Automatic classification is a subset of machine learning techniques in which a classifier is built by learning from pre-classified documents. Naïve Bayes and k-Nearest Neighbor are among the most common machine learning algorithms for text classification. In this paper, we suggest a way to improve the performance of a text classifier using the Mutual Information (MI) and Chi-square feature selection algorithms. We observed that the MI feature selection method can improve the accuracy of the Naïve Bayes classifier by up to 10%. Experimental results show that the proposed model achieves an average accuracy of 80% and an average F1-measure of 80%.
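
The idea summarized in the abstract (score each term against each category, keep the highest-scoring terms, then train the classifier on the reduced vocabulary) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English corpus, the function names, and the choice of k = 3 features are all assumptions made for illustration, and the scores use the standard document-level 2x2 contingency formulas for Chi-square and mutual information.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for labelled news articles.
DOCS = [
    ("the team won the match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("the coach praised the team", "sport"),
    ("the bank raised interest rates", "economy"),
    ("the market fell as inflation rose", "economy"),
    ("the bank cut its inflation forecast", "economy"),
]

def term_class_counts(docs, term, label):
    """Document counts: (term & class, term & other, no term & class, no term & other)."""
    n11 = n10 = n01 = n00 = 0
    for text, y in docs:
        has_term = term in text.split()
        if has_term and y == label:
            n11 += 1
        elif has_term:
            n10 += 1
        elif y == label:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class membership."""
    n = n11 + n10 + n01 + n00
    cells = [
        (n11, n11 + n10, n11 + n01), (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01), (n00, n01 + n00, n10 + n00),
    ]
    # Empty cells contribute nothing to the sum (0 * log 0 is taken as 0).
    return sum(c / n * math.log2(n * c / (row * col)) for c, row, col in cells if c)

def select_features(docs, score_fn, k):
    """Keep the k terms with the highest score for any class (ties broken alphabetically)."""
    vocab = sorted({t for text, _ in docs for t in text.split()})
    labels = sorted({y for _, y in docs})
    scores = {t: max(score_fn(*term_class_counts(docs, t, c)) for c in labels)
              for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

def train_nb(docs, vocab):
    """Multinomial Naive Bayes counts restricted to the selected vocabulary."""
    vocab = set(vocab)
    class_docs = Counter(y for _, y in docs)
    term_counts = defaultdict(Counter)
    for text, y in docs:
        term_counts[y].update(t for t in text.split() if t in vocab)
    return class_docs, term_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_lp = None, -math.inf
    for y in sorted(class_docs):
        total = sum(term_counts[y].values())
        lp = math.log(class_docs[y] / n_docs)
        for t in text.split():
            if t in vocab:
                lp += math.log((term_counts[y][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best

if __name__ == "__main__":
    selected = select_features(DOCS, chi_square, 3)
    model = train_nb(DOCS, selected)
    print(selected, predict_nb(model, "the team lost the match"))
```

Passing `mutual_information` instead of `chi_square` to `select_features` switches the scoring method without changing the rest of the pipeline. On a real Persian news corpus, tokenization and normalization would need to replace the whitespace split used here.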

Keywords:

Automatic Persian text classification, k-Nearest Neighbor, Naïve Bayes, News text classification, Text mining

Authors

Sareh Mostafavi

Department of Computational Linguistics, Regional Information Center for Science and Technology (RICeST), Shiraz, Fars, Iran

Bahareh Pahlevanzadeh

Department of Design and System Operations, Regional Information Center for Science and Technology (RICeST), Shiraz, Fars, Iran

Mohammad Reza Falahati Qadimi Fumani

Department of Computational Linguistics, Regional Information Center for Science and Technology (RICeST), Shiraz, Fars, Iran
