Providing an efficient method based on machine learning for classifying imbalanced datasets
Published in: National Congress of Science and Technology in Electrical and Computer Science and Engineering
Publication year: 1397 (Solar Hijri calendar)
Document type: Conference paper
Language: English
Length: 8 pages (PDF)
National scientific document ID: KTCONG01_002
Indexing date: 21 Khordad 1398 (Solar Hijri calendar)
Abstract:
Classifying imbalanced datasets is one of the most important problems in data mining. In many supervised learning applications, the prior probabilities of the classes differ significantly; that is, an example is far more likely to belong to some classes than to others. This situation is known as the class imbalance problem (Chawla et al., 2004). Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry. The imbalanced learning problem concerns the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews (Garcia et al., 2009). The term imbalanced dataset generally refers to a dataset in which the number of instances differs greatly between classes (Wang and Yao, 2009). Traditional classification methods perform poorly on imbalanced data because they aim to minimize the overall error and generally assume that the class distribution is balanced, which makes the problem both important and challenging. In this work, the data are classified with the Bagging algorithm, using the C4.5 Cost-Sensitive Random Tree as the base classifier. The imperialist competitive algorithm is used to determine the misclassification cost of each class in order to construct the cost-sensitive tree.
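The idea of bagging cost-sensitive base learners can be illustrated with a minimal Python sketch. It is not the paper's implementation: a one-dimensional decision stump stands in for the C4.5 cost-sensitive random tree, the misclassification costs are fixed by hand rather than tuned by the imperialist competitive algorithm, and the names `fit_stump`, `bagging_fit`, `bagging_predict`, `X_demo`, and `y_demo` are hypothetical.

```python
import random

def fit_stump(X, y, cost_fn, cost_fp):
    """Return the threshold with the lowest total misclassification cost.

    The stump predicts class 1 (minority) when x >= threshold.
    cost_fn: cost of missing a minority example (false negative).
    cost_fp: cost of flagging a majority example (false positive).
    """
    best_t, best_cost = None, float("inf")
    # The extra last candidate lets the stump predict class 0 everywhere.
    candidates = sorted(set(X)) + [max(X) + 1.0]
    for t in candidates:
        cost = 0.0
        for x, label in zip(X, y):
            pred = 1 if x >= t else 0
            if pred == 0 and label == 1:
                cost += cost_fn
            elif pred == 1 and label == 0:
                cost += cost_fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def bagging_fit(X, y, n_estimators, cost_fn, cost_fp, seed=0):
    """Train one cost-sensitive stump per bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([X[i] for i in idx],
                                [y[i] for i in idx],
                                cost_fn, cost_fp))
    return stumps

def bagging_predict(stumps, x):
    """Majority vote over the stumps."""
    votes = sum(1 for t in stumps if x >= t)
    return 1 if 2 * votes > len(stumps) else 0

# Toy imbalanced data (assumed for illustration): 8 majority, 2 minority.
X_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 5.5, 9.0]
y_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

equal_cost_t = fit_stump(X_demo, y_demo, cost_fn=1.0, cost_fp=1.0)  # 9.0
high_fn_cost_t = fit_stump(X_demo, y_demo, cost_fn=5.0, cost_fp=1.0)  # 5.5
ensemble = bagging_fit(X_demo, y_demo, n_estimators=11, cost_fn=5.0, cost_fp=1.0)
```

With equal costs the stump gives up on the minority example at 5.5, because catching it would cost three false positives. Raising the false-negative cost to 5 moves the threshold down to 5.5, so both minority examples are caught. This is the effect a cost-sensitive base learner is meant to have; how the costs are chosen is the part the abstract assigns to the imperialist competitive algorithm.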
Keywords:
Imbalanced dataset, bagging algorithm, C4.5 cost-sensitive random tree, imperialist competitive algorithm, G-Mean criterion
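The G-Mean criterion named in the keywords is commonly defined as the geometric mean of the per-class recalls; the sketch below assumes that standard definition, since this listing does not spell out the paper's exact formula. The function name `g_mean` and the toy labels are hypothetical. It shows why plain accuracy is misleading on imbalanced data.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed standard definition)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))

# 95 majority examples and 5 minority examples (toy data).
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100  # a classifier that ignores the minority class

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
score = g_mean(y_true, always_majority)
# accuracy is 0.95, yet score is 0.0 because minority recall is zero
```

A classifier that never predicts the minority class still reaches 95% accuracy here, while its G-Mean is zero, which is why G-Mean is a common choice for evaluating classifiers on skewed class distributions.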
Authors
Mostafa Boroumandzadeh
Department of Computer Engineering and Information Technology, Payame Noor University, IR. Iran