CIVILICA We Respect the Science
(Specialized publisher of national conferences / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

An Investigation of Term Weighting and Feature Selection Methods for Sentiment Analysis

Article title: An Investigation of Term Weighting and Feature Selection Methods for Sentiment Analysis
National article ID: JR_MJEE-12-2_008
Published in 1397 (Solar Hijri calendar)
Authors:

Tuba Parlar - Department of Mathematics, Mustafa Kemal University, 31060, Hatay, Turkiye
Selma Ayşe Özel - Department of Computer Engineering, Cukurova University, 01330, Adana, Turkiye

Abstract:
Sentiment analysis automatically classifies the opinions expressed in a document, usually as positive or negative. In general, a review document reflects its author’s opinion about the objects mentioned in the text. Sentiment analysis therefore has many useful applications, such as opinionated web search and automatic analysis of reviews. Although sentiment analysis is a kind of text classification problem, the structure of review documents differs from that of texts such as news, articles, or web pages, so techniques applied to text classification need to be re-evaluated for sentiment analysis. Assigning appropriate weights to features is important for the performance of sentiment analysis, because it lets important features receive higher weights in the feature vectors. Feature selection reduces feature vector size by eliminating redundant or irrelevant features to improve classification accuracy. In this study, we examine the effects of term weighting methods on the newly proposed Query Expansion Ranking (QER) feature selection method and compare its classification results with those of a well-known feature selection method, the Chi-square statistic. We use three popular term weighting methods (term presence, term frequency, and term frequency-inverse document frequency, tf*idf) and perform experiments using a multinomial Naïve Bayes classifier. The experimental results show that classification performance, measured by F-score, improves when the QER feature selection method is used with tf*idf term weighting.
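
The three term weighting schemes and the Chi-square score named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (term_weights, chi_square), the toy documents, and the idf variant log(N / df) are assumptions chosen for illustration, and QER is not shown because the abstract does not give its formula.

```python
import math
from collections import Counter


def term_weights(docs, scheme):
    """Return one {term: weight} dict per tokenized document.

    scheme is "presence" (1 if the term occurs), "tf" (raw count),
    or "tfidf" (count * log(N / df)). The idf variant is an assumption;
    the abstract does not specify which one was used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        if scheme == "presence":
            vec = {t: 1.0 for t in counts}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in counts.items()}
        elif scheme == "tfidf":
            vec = {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vec)
    return vectors


def chi_square(docs, labels, term, positive="pos"):
    """Chi-square statistic of a term against a binary label (2x2 table)."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        is_pos = label == positive
        if has_term and is_pos:
            a += 1  # term present, positive class
        elif has_term:
            b += 1  # term present, negative class
        elif is_pos:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, negative class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy reviews, already tokenized.
docs = [
    ["good", "good", "movie"],
    ["bad", "movie"],
    ["good", "plot"],
    ["bad", "bad", "plot"],
]
labels = ["pos", "neg", "pos", "neg"]

presence = term_weights(docs, "presence")
tf = term_weights(docs, "tf")
tfidf = term_weights(docs, "tfidf")
```

On this toy data, "good" appears only in positive reviews and receives a Chi-square score of 4.0, while "movie" is split evenly across classes and scores 0.0, so a Chi-square selector would keep the former and drop the latter.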

Keywords:
Sentiment Analysis, Feature Selection, Term Weighting, Text Classification

Article page and full-text download: https://civilica.com/doc/1603930/