Providing an efficient method based on machine learning for classifying imbalanced datasets

Mostafa Boroumandzadeh

Published in: National Congress of Science and Technology in Electrical and Computer Science and Engineering

Publication year: 1397 (Solar Hijri calendar)

Document type: Conference paper

Language: English

Length: 8 pages (PDF)

Subject categories:

Artificial Intelligence > Machine Learning

Permanent link to this paper:

https://civilica.com/doc/868299

National scientific document ID:

KTCONG01_002

Indexing date: 21 Khordad 1398 (Solar Hijri calendar)

Abstract:

Classifying imbalanced datasets is one of the most important problems in data mining. In many supervised learning applications, there is a significant difference between the prior probabilities of the classes, that is, between the probabilities with which an example belongs to each class of the classification problem. This situation is known as the class imbalance problem (Chawla et al., 2004). Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem concerns the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews (Garcia et al., 2009). The term imbalanced dataset generally refers to a dataset in which the number of instances differs greatly between classes (Wang and Yao, 2009). Traditional classification methods perform poorly on imbalanced data because they aim to minimize the overall error and generally assume that the class distribution is balanced, which makes imbalanced classification a challenging problem. In this work, the data are classified with the Bagging algorithm, using the C4.5 cost-sensitive random tree as the individual (base) classifier. The imperialist competitive algorithm is used to determine the misclassification cost of each class in order to construct the cost-sensitive tree.
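The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: bagging over cost-sensitive base learners, scored with the G-Mean criterion listed in the keywords. Several things here are assumptions and not the paper's method: a one-feature threshold stump stands in for the C4.5 cost-sensitive random tree, the misclassification costs `cost_fn` and `cost_fp` are fixed by hand where the paper searches for them with the imperialist competitive algorithm, G-Mean is taken in its usual binary form (the geometric mean of the two per-class recalls), and all function names and the toy data are hypothetical.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of the two per-class recalls, for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return math.sqrt((tp / pos) * (tn / neg))


def fit_stump(xs, ys, cost_fn, cost_fp):
    """Stand-in base learner (NOT C4.5): choose the threshold t, predicting
    class 1 when x >= t, that minimizes the total misclassification cost."""
    candidates = sorted(set(xs)) + [max(xs) + 1.0]  # last one predicts all 0
    best_t, best_cost = None, float("inf")
    for t in candidates:
        cost = 0.0
        for x, y in zip(xs, ys):
            pred = 1 if x >= t else 0
            if y == 1 and pred == 0:
                cost += cost_fn  # missed minority (positive) example
            elif y == 0 and pred == 1:
                cost += cost_fp  # false alarm on a majority example
        if cost < best_cost:  # strict: ties keep the lowest threshold
            best_t, best_cost = t, cost
    return best_t


def fit_bagging(xs, ys, n_estimators, cost_fn, cost_fp, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        sample_y = [ys[i] for i in idx]
        thresholds.append(fit_stump(sample_x, sample_y, cost_fn, cost_fp))
    return thresholds


def predict_bagging(thresholds, xs):
    """Strict majority vote over the stumps; a tie predicts class 0."""
    preds = []
    for x in xs:
        votes = sum(1 for t in thresholds if x >= t)
        preds.append(1 if 2 * votes > len(thresholds) else 0)
    return preds


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority (0) and 2 minority (1) examples.
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    ys = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
    for cost_fn in (1.0, 4.0):  # hand-picked costs, not ICA-optimized
        model = fit_bagging(xs, ys, 11, cost_fn=cost_fn, cost_fp=1.0)
        score = g_mean(ys, predict_bagging(model, xs))
        print(f"cost_fn={cost_fn}: G-Mean={score:.3f}")
```

G-Mean is used for scoring because plain accuracy can look good on skewed data while the minority class is ignored: a model that predicts the majority class for all ten toy examples is 80% accurate but has a G-Mean of 0. Raising `cost_fn` relative to `cost_fp` pushes each stump toward lower thresholds, trading false alarms for fewer missed minority examples.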

Keywords:

Imbalanced dataset, bagging algorithm, C4.5 cost-sensitive random tree, imperialist competitive algorithm, G-Mean criterion

Authors

Mostafa Boroumandzadeh

Department of Computer Engineering and Information Technology, Payame Noor University, I.R. Iran