Outlier Detection in Test Samples using Standard Deviation and Unsupervised Training Set Selection

Publication year: 1402
Document type: Journal article
Language: English

Length: 11 pages (PDF)

National scientific document ID:

JR_IJE-36-1_021

Indexing date: 24 Dey 1401

Abstract:

Outlier detection is a technique for identifying and removing data that differ significantly from the more correct and consistent data in a data set. Outliers can degrade classification and clustering performance, so they should be identified and removed to improve classification efficiency. Regardless of whether a classification technique classifies an outlier correctly, identifying a data point as an outlier is itself of great significance. In this paper, a new approach is proposed for detecting outliers in a test data set, combined with unsupervised training set selection. The selected training set is used for two-step classification. After unsupervised clustering of the training set, the cluster closest to a test sample is selected using the Euclidean distance measure. Outliers among the test samples are then identified using the concepts of standard deviation and mean value. The results, obtained by evaluating the distance of each test sample to the newly selected data set, showed that the accuracy of the classifiers is enhanced after the detection and elimination of outlier data.
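
The pipeline described in the abstract (cluster the training set, pick the cluster nearest to a test sample by Euclidean distance, then apply a mean and standard deviation test) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or the exact decision rule, so the use of k-means, the `n_sigma` threshold of 2, and all function names below are assumptions made for illustration. The two-step classification stage is not shown.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding.

    Assumption: the paper's clustering method is not given in the abstract;
    k-means is used here only as a stand-in.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        clusters = assign(points, centers)
        new_centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def is_outlier(sample, centers, clusters, n_sigma=2.0):
    """Flag a test sample whose distance to its closest cluster is unusually large.

    Assumed rule: outlier if distance > mean + n_sigma * std, where mean and
    std are taken over the member-to-center distances of the closest cluster.
    """
    candidates = [i for i, c in enumerate(clusters) if c]  # skip empty clusters
    idx = min(candidates, key=lambda i: euclidean(sample, centers[i]))
    dists = [euclidean(p, centers[idx]) for p in clusters[idx]]
    threshold = statistics.mean(dists) + n_sigma * statistics.pstdev(dists)
    return euclidean(sample, centers[idx]) > threshold


# Toy data: two tight groups, around (0.5, 0.5) and (10.5, 10.5).
train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(train, k=2)

print(is_outlier((0.5, 0.6), centers, clusters))  # False: well inside a cluster
print(is_outlier((5, 5), centers, clusters))      # True: far from both clusters
```

In this sketch the threshold is computed per cluster, so a dense cluster tolerates less deviation than a diffuse one. A larger `n_sigma` flags fewer samples; the value that suits a given data set would need to be tuned.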

Authors

N. Mohseni

Department of Computer Engineering, Babol Branch, Islamic Azad University, Babol, Iran

H. Nematzadeh

Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

E. Akbari

Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

H. Motameni

Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
