A Big Data Processing System for Spam Analysis in Apache Spark
Publish Year: 1403 (Solar Hijri calendar)
Type: Conference paper
Language: English
Length: 14 pages (PDF)
Document National Code: CONFIT01_0815
Index date: 25 September 2024
Abstract
Spam is unwanted email that threatens users' security. Because of their high efficiency, machine learning methods are a common and effective way to classify emails in spam detection systems, but these methods cannot handle large volumes of high-dimensional data. To address this problem, this study uses a dimension-reduction method based on the "butterfly algorithm", which reduces the sample space by 43.2%. In addition, the decision tree and random forest methods are run in the Spark cloud environment to increase processing speed and concurrency. The experimental results show that the spam detection error of the proposed method is greater than that of the decision tree, random forest, support vector machine, and Bayesian network methods; the results also show that when the samples are dimensionally reduced, the decision tree and random forest methods run faster in Spark.
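The abstract does not describe how the butterfly algorithm is wired into the pipeline. As an illustration only, the sketch below assumes it is the butterfly optimization algorithm (BOA) used as a wrapper feature selector: each butterfly holds a continuous position that a sigmoid transfer function maps to a 0/1 feature mask, and fitness trades classification accuracy against the fraction of features kept. The nearest-centroid classifier, the weight `alpha`, the parameter values (`c`, `a`, `p`), the toy data, and all function names are hypothetical stand-ins; the paper itself evaluates decision trees and random forests in Spark.

```python
import math
import random


def _sq_dist(row, centroid, cols):
    # Squared Euclidean distance restricted to the selected columns.
    return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(cols))


def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected
    features (a cheap, hypothetical stand-in for the paper's classifiers)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for row, label in zip(X, y):
        pred = min(centroids, key=lambda lab: _sq_dist(row, centroids[lab], cols))
        correct += pred == label
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    # Reward accuracy, lightly penalize the fraction of features kept.
    kept = sum(mask) / len(mask)
    return alpha * nearest_centroid_accuracy(X, y, mask) + (1 - alpha) * (1 - kept)


def binarize(position, rng):
    # Sigmoid transfer: each coordinate becomes the probability of keeping a feature.
    return [1 if rng.random() < 1 / (1 + math.exp(-v)) else 0 for v in position]


def butterfly_feature_selection(X, y, n_butterflies=10, iterations=30,
                                c=0.01, a=0.1, p=0.8, seed=0):
    """Binary BOA sketch; returns (best feature mask, best fitness)."""
    rng = random.Random(seed)
    d = len(X[0])
    positions = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_butterflies)]
    masks = [binarize(pos, rng) for pos in positions]
    scores = [fitness(X, y, m) for m in masks]
    b = max(range(n_butterflies), key=lambda i: scores[i])
    best_pos, best_mask, best_score = positions[b][:], masks[b][:], scores[b]
    for _ in range(iterations):
        for i in range(n_butterflies):
            fragrance = c * scores[i] ** a  # f = c * I^a, with intensity I = fitness
            r = rng.random()
            if rng.random() < p:
                # Global search: move toward the best butterfly found so far.
                new = [x + (r * r * g - x) * fragrance
                       for x, g in zip(positions[i], best_pos)]
            else:
                # Local search: random walk using two randomly chosen butterflies.
                j, k = rng.sample(range(n_butterflies), 2)
                new = [x + (r * r * xj - xk) * fragrance
                       for x, xj, xk in zip(positions[i], positions[j], positions[k])]
            mask = binarize(new, rng)
            score = fitness(X, y, mask)
            if score >= scores[i]:  # greedy replacement of the butterfly
                positions[i], masks[i], scores[i] = new, mask, score
                if score > best_score:
                    best_pos, best_mask, best_score = new[:], mask[:], score
    return best_mask, best_score


def make_toy_data(seed=1, n=40):
    # Hypothetical data: feature 0 carries the label, features 1-5 are noise.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label * 3.0 + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(5)])
        y.append(label)
    return X, y


X, y = make_toy_data()
best_mask, best_score = butterfly_feature_selection(X, y)
print(best_mask, best_score)
```

In this toy data only feature 0 separates the classes, so the selected mask is expected to keep it and drop some of the noise columns. In the paper's setting, the reduced feature set would then be passed to the decision tree and random forest classifiers running in Spark.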
Authors
Nasrin Aghaee-Maybodi
Department of Computer Engineering, Maybod Branch, Islamic Azad University, Maybod, Iran