A Joint Semantic Vector Representation Model for Text Clustering and Classification

Publication year: 1398 (Solar Hijri)
Document type: Journal article
Language: English

Length: 8 pages (PDF)

National scientific document ID: JR_JADM-7-3_009

Indexing date: 19 Tir 1398

Abstract:

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use semantic models for document vector representation. Latent Dirichlet allocation (LDA) topic modeling and doc2vec neural document embedding are two well-known techniques for this purpose. In this paper, we first study the conceptual difference between the two models and show that they behave differently and capture the semantic features of texts from different perspectives. We then propose a hybrid approach to document vector representation that benefits from the advantages of both models. Experimental results on the 20 Newsgroups dataset show that the proposed model outperforms each of the baselines on both text clustering and text classification. We achieved a 2.6% improvement in F-measure for text clustering and a 2.1% improvement in F-measure for text classification compared to the best baseline model.
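
The abstract does not state how the LDA and doc2vec representations are combined. The following is a minimal sketch of one plausible combination, assuming simple concatenation of the two L2-normalized vectors; it is not necessarily the authors' method. The function names (`l2_normalize`, `joint_vector`, `cosine`) and the toy vectors are hypothetical and stand in for the outputs of a trained LDA model and a trained doc2vec model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def joint_vector(topic_vec, embedding_vec):
    """Concatenate a topic distribution and a document embedding.

    ASSUMPTION: concatenation is an illustrative choice, not taken from the paper.
    Each part is normalized first so that neither dominates because of its scale.
    """
    return l2_normalize(topic_vec) + l2_normalize(embedding_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy inputs (hypothetical): a 3-topic LDA distribution and a 4-dimensional
# doc2vec embedding for each of three documents.
doc_a = joint_vector([0.7, 0.2, 0.1], [0.5, -1.0, 0.3, 0.8])
doc_b = joint_vector([0.6, 0.3, 0.1], [0.4, -0.9, 0.2, 0.7])
doc_c = joint_vector([0.1, 0.1, 0.8], [-0.6, 0.9, 0.1, -0.5])

sim_ab = cosine(doc_a, doc_b)  # documents that agree in both views
sim_ac = cosine(doc_a, doc_c)  # documents that differ in both views
```

The resulting joint vectors can be passed to any standard clustering or classification algorithm in place of TF-IDF features. Normalizing each part before concatenation gives the topic view and the embedding view equal weight in distance computations; other weightings are equally possible.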

Authors

S. Momtazi

Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran.

A. Rahbar

Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran.

D. Salami

Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran.

I. Khanijazani

Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran.