CIVILICA We Respect the Science
(Specialized publisher of the country's conferences / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

Distributed Online Pre-Processing Framework for Big Data Sentiment Analytics

Paper title: Distributed Online Pre-Processing Framework for Big Data Sentiment Analytics
National paper ID: JR_JADM-10-2_005
Published in: 1401 (Solar Hijri calendar)
Authors:

M. Molaei - Department of Computer Engineering, University of Zanjan, Iran.
D. Mohamadpur - Department of Computer Engineering, University of Zanjan, Iran.

Abstract:
Performing sentiment analysis on big data from social networks can help research and business projects draw useful insights from text-oriented content. In this paper, we propose a general pre-processing framework for sentiment analysis, designed for use with FastText and Recurrent Neural Network variants, to prepare textual data efficiently. The framework consists of three stages: data cleansing, tweet padding, and extraction of word embeddings from FastText with conversion of tweets to these vectors. It is implemented using the DataFrame data structure in Apache Spark. Its main objective is to enhance online sentiment analysis by reducing pre-processing time and handling large-scale data volumes. In addition, we propose a distributed intelligent system for online social big data analytics, designed to store, process, and classify a huge amount of information online. The proposed system can adopt any word embedding library, such as FastText, together with different distributed deep learning models, such as LSTM or GRU. The evaluation results show that the proposed framework significantly improves on previous RDD-based methods in terms of processing time and data volume.
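As a rough illustration of the three stages named in the abstract, the sketch below chains cleansing, padding, and embedding lookup in plain Python. It is not the authors' implementation: the paper applies these steps over Spark DataFrames with FastText vectors, whereas here the cleaning rules, the `<pad>` token, the fixed length of 6, and the tiny `TOY_VECTORS` table are all assumptions made for the example.

```python
import re

PAD_TOKEN = "<pad>"  # hypothetical padding token
MAX_LEN = 6          # hypothetical fixed tweet length
DIM = 3              # toy embedding size; real FastText vectors are much larger

# Toy stand-in for a FastText lookup table (assumed values, not real vectors).
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "phone": [0.2, 0.7, 0.1],
    "bad": [-0.8, 0.1, 0.0],
}
ZERO = [0.0] * DIM


def clean(tweet):
    """Stage 1: lowercase, drop URLs and @mentions, keep only letters."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def pad(tokens, max_len=MAX_LEN):
    """Stage 2: truncate or right-pad so every tweet has max_len tokens."""
    return tokens[:max_len] + [PAD_TOKEN] * max(0, max_len - len(tokens))


def embed(tokens):
    """Stage 3: map each token to a vector.

    Unknown and padding tokens get a zero vector here; this is a
    simplification chosen for the sketch, not FastText's behaviour.
    """
    return [TOY_VECTORS.get(token, ZERO) for token in tokens]


def preprocess(tweet):
    """Run the three stages in order on a single tweet."""
    return embed(pad(clean(tweet)))


if __name__ == "__main__":
    for row in preprocess("Great phone!! @shop https://t.co/x"):
        print(row)
```

In a Spark setting, each stage would typically become a column transformation over a DataFrame of tweets rather than a per-string function call, which is what lets the work be distributed across a cluster.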

Keywords:
BigData, pre-processing, Apache-Spark, DataFrame, RNN

Paper page and full-text download: https://civilica.com/doc/1466724/