A Big Data Processing System for Spam Analysis in Apache Spark
Publication year: 1403 (Solar Hijri)
Document type: Conference paper
Language: English
Length: 14 pages (PDF)
National scientific document ID:
CONFIT01_0815
Indexing date: 4 Mehr 1403
Abstract:
Spam is defined as unwanted email that threatens users’ security. Because of their high efficiency, machine learning methods are a common and effective way to classify emails in spam detection systems, but these methods cannot handle large volumes of high-dimensional data. To address this problem, this study uses a dimension-reduction method called the “butterfly algorithm”, which reduces the sample space by 43.2%. In addition, the decision tree and random forest methods are run in the Spark cloud environment to increase processing speed and concurrency. The experimental results show that the spam detection error of the proposed method is greater than that of the decision tree, random forest, support vector machine, and Bayesian network methods; the results also show that, when the samples are dimensionally reduced, the decision tree and random forest methods run faster in Spark.
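The dimension-reduction step described in the abstract can be illustrated with a minimal sketch. It assumes the reduction is a feature-selection mask (one keep/drop flag per feature) and that the reported percentage counts dropped features; the abstract does not confirm either point. The function names, the toy data, and the mask are hypothetical, and the mask merely stands in for the output of a search such as the butterfly algorithm, which is not implemented here.

```python
from typing import List, Sequence


def apply_feature_mask(rows: Sequence[Sequence[float]],
                       mask: Sequence[bool]) -> List[List[float]]:
    """Keep only the columns whose mask entry is True."""
    kept = [i for i, keep in enumerate(mask) if keep]
    for row in rows:
        if len(row) != len(mask):
            raise ValueError("every row must have one value per mask entry")
    return [[row[i] for i in kept] for row in rows]


def reduction_percent(mask: Sequence[bool]) -> float:
    """Percentage of features dropped by the mask."""
    if not mask:
        raise ValueError("mask must not be empty")
    dropped = sum(1 for keep in mask if not keep)
    return 100.0 * dropped / len(mask)


# Hypothetical toy data: 3 emails, 5 numeric features each.
emails = [
    [0.1, 3.0, 0.0, 7.0, 1.0],
    [0.4, 0.0, 2.0, 5.0, 0.0],
    [0.9, 1.0, 1.0, 0.0, 1.0],
]

# Hypothetical mask standing in for the result of a feature-selection search.
mask = [True, False, True, True, False]

reduced = apply_feature_mask(emails, mask)
print(reduced)
print(reduction_percent(mask))
```

In a Spark pipeline, the same kind of column selection could plausibly be applied to a feature vector with `pyspark.ml.feature.VectorSlicer` before training a decision tree or random forest classifier; whether the paper does so is not stated.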
Keywords:
Authors
Nasrin Aghaee-Maybodi
Department of Computer Engineering, Maybod Branch, Islamic Azad University, Maybod, Iran