Topic Word Set-Based Text Clustering

Amir Mehdi Ghazifard; Mohammadreza Shams; Zeinab Shamaee

Paper title: Topic Word Set-Based Text Clustering
National paper ID: ECDC07_047
Published in: 7th International Conference on e-Commerce in Developing Countries with a Focus on Security (ECDC2013), 1392 SH (2013)

Author details:

Amir Mehdi Ghazifard - E-Learning Department, University of Isfahan, Isfahan, Iran
Mohammadreza Shams - ECE Department, University of Tehran, Tehran, Iran
Zeinab Shamaee - ECE Department, Isfahan University of Technology, Isfahan, Iran

Abstract:

Clustering is the task of grouping related and similar data without any prior knowledge of the labels. In some real-world applications, we face huge amounts of unstructured textual data with no organization. In these situations, clustering is a primitive operation that must be performed to support later e-commerce tasks. Clustering can be used to enhance different e-commerce applications such as recommender systems, customer relationship management systems, or personal assistant agents. In this paper we propose a new method for text clustering that constructs a term correlation graph, extracts topic word sets from it, and finally assigns each document to its related topic with the help of a classification algorithm such as SVM. This method provides a natural and understandable description of clusters through their topic word sets. It also lets us decide the cluster of a document only when needed and in a parallel fashion, thus significantly reducing the offline processing time. Our clustering method also outperforms the well-known k-means clustering algorithm according to clustering quality measures.
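
The three stages named in the abstract (term correlation graph, topic word sets, per-document assignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how correlation is measured or how topic word sets are extracted, so the sketch assumes document-level co-occurrence counts with a hypothetical `min_cooccur` threshold, treats connected components of the graph as topic word sets, and replaces the SVM classifier with a simple term-overlap score to stay dependency-free. All function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_term_graph(docs, min_cooccur=2):
    """Link two terms when they co-occur in at least `min_cooccur` documents.

    Assumption: co-occurrence counting stands in for the paper's
    (unspecified) term correlation measure.
    """
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topic_word_sets(graph):
    """Assumption: each connected component is one topic word set."""
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        topics.append(component)
    return topics


def assign(doc, topics):
    """Stand-in for the SVM step: pick the topic sharing the most terms."""
    if not topics:
        return None
    terms = set(doc.lower().split())
    scores = [len(terms & topic) for topic in topics]
    best = max(range(len(topics)), key=scores.__getitem__)
    return best if scores[best] > 0 else None


# Toy corpus, invented for illustration only.
docs = [
    "cheap laptop battery",
    "laptop battery charger",
    "fresh organic apple",
    "organic apple juice",
]
graph = build_term_graph(docs)
topics = topic_word_sets(graph)
labels = [assign(d, topics) for d in docs]
```

Because `assign` looks at one document at a time and depends only on the precomputed topic word sets, it can be called lazily or in parallel, which mirrors the abstract's point about deciding a document's cluster only when needed.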

Keywords:

e-commerce; clustering; classification; term correlation graph; topic word set

Paper page and full-text download: https://civilica.com/doc/203675/