The Application of Clustering Methods in Information Retrieval Systems
Publication venue: 3rd Iran Data Mining Conference (IDMC)
Publication year: 1389 (Solar Hijri calendar)
Document type: Conference paper
Language: English
Length: 12 pages (PDF)
National scientific document ID: IDMC04_012
Indexing date: 15 Dey 1389
Abstract:
The explosion of online information has given rise to many query-based search engines (such as AltaVista) and manually constructed topic hierarchies (such as Yahoo!). With the current growth rate in the amount of information, however, query results become incomprehensibly large, and manual classification into topic hierarchies creates an immense information bottleneck. As a result, these tools are rapidly becoming inadequate for addressing users' information needs. Clustering techniques are generally applied to find non-obvious relations and structures in data sets. In this paper, we implement an integrated approach to information retrieval that combines several clustering techniques in order to evaluate the effect of clustering methods on retrieval performance. The vector space model is used to capture the relationships among index terms. Three clustering methods (Modified K-means, Lloyd's algorithm, and the Splitting algorithm) are adapted to the task of clustering documents with respect to their index terms. Our system combines the clustering approach with the traditional relevance feedback approach to retrieval. The performance of the clustering-based IR system is evaluated and compared with that of a traditional IR system. Precision and recall are defined and applied as quantitative evaluation measures of the results. Experiments with various test document sets show that in most cases the clustering-based IR system performs better than the traditional IR system. A series of experiments was conducted to validate this approach; descriptions of these experiments and their results are presented.
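The abstract says the vector space model is used to represent documents by their index terms. As a minimal sketch, assuming whitespace tokenization and the common tf × log(N/df) weighting (the paper's exact weighting scheme is not stated here), documents and a query can be turned into sparse term-weight vectors and ranked by cosine similarity. The function names `tf_idf_vectors`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse {term: weight} vector per document with tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = [
        {t: c * math.log(n_docs / df[t]) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, df


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    vectors, df = tf_idf_vectors(docs)
    n_docs = len(docs)
    # Query terms that never occur in the collection get no weight.
    q_vec = {
        t: c * math.log(n_docs / df[t])
        for t, c in Counter(query.lower().split()).items()
        if t in df
    }
    scores = [(i, cosine(q_vec, d)) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, ranking the query `"clustering retrieval"` against the three toy documents `"clustering improves retrieval"`, `"search engines index the web"`, and `"document clustering with k-means"` puts document 0 first and document 1 (which shares no query terms) last with a score of 0.0. Note that a term occurring in every document gets an idf of log(1) = 0 and so contributes nothing.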
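Lloyd's algorithm, one of the three clustering methods named in the abstract, alternates between assigning each vector to its nearest centroid and moving each centroid to the mean of its members. The sketch below is the textbook version, not the authors' adaptation; it assumes dense numeric vectors, Euclidean distance, and seeding with the first k points. `lloyd_kmeans` is a hypothetical name.

```python
import math


def lloyd_kmeans(points, k, max_iter=100):
    """Textbook Lloyd iteration on dense vectors with Euclidean distance."""
    # Assumption: seed with the first k points so results are reproducible.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assignment, centroids
```

Seeding with the first k points keeps the example deterministic; random or k-means++ seeding is more usual in practice, since Lloyd's algorithm only converges to a local optimum that depends on the starting centroids.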
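The abstract does not describe the Splitting algorithm in detail. One well-known reading, assumed here and not confirmed by the source, is the Linde-Buzo-Gray (LBG) style of splitting used in vector quantization: start from the global mean, replace each centroid with two slightly perturbed copies, refine them with Lloyd iterations, and repeat until there are k centroids. This sketch assumes k is a power of two; `split_clustering`, `_lloyd`, and the perturbation size `eps` are illustrative choices.

```python
import math


def _lloyd(points, centroids, max_iter=50):
    """Refine centroids by alternating nearest-centroid assignment and mean update."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        updated = [
            [sum(xs) / len(g) for xs in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return centroids


def split_clustering(points, k, eps=1e-3):
    """LBG-style splitting: grow from 1 to k centroids by repeated doubling."""
    if k < 1 or k & (k - 1):
        raise ValueError("this sketch assumes k is a power of two")
    # Start with a single centroid at the global mean.
    centroids = [[sum(xs) / len(points) for xs in zip(*points)]]
    while len(centroids) < k:
        # Split every centroid into two copies nudged by +eps and -eps.
        centroids = [[x + s * eps for x in c] for c in centroids for s in (1, -1)]
        centroids = _lloyd(points, centroids)
    return centroids
```

Unlike plain Lloyd seeding, splitting needs no initial guess for k centroids: each doubling starts from centroids that already fit the data at the coarser level.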
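The abstract says the system combines clustering with traditional relevance feedback but does not name the feedback formula. The Rocchio update, assumed here as a stand-in, moves the query vector toward documents the user marked relevant and away from those marked non-relevant: q' = αq + β·mean(relevant) - γ·mean(non-relevant). The defaults α = 1.0, β = 0.75, γ = 0.15 are common textbook values, not values from the paper, and `rocchio` is a hypothetical name.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update over sparse {term: weight} vectors.

    q' = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant)
    """
    # Every term that appears in the query or any judged document.
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:  # a common convention: negative weights are dropped
            new_query[t] = w
    return new_query
```

For instance, with query `{"clustering": 1.0}`, one relevant document `{"clustering": 0.5, "kmeans": 1.0}`, and one non-relevant document `{"web": 1.0}`, the updated query is `{"clustering": 1.375, "kmeans": 0.75}`; "web" would receive a negative weight and is dropped.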
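Precision and recall, the evaluation measures named in the abstract, are computed from the set of retrieved documents and the set of documents judged relevant: precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. A minimal set-based sketch, with a hypothetical helper `precision_recall`:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    # Convention assumed here: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, `precision_recall([1, 2, 3, 4], [2, 4, 5])` returns `(0.5, 0.6666666666666666)`: two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.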
Keywords:
Authors
Aboozar Kalantari Soltanieh
Master's student, School of Computer Science, Physics and Mathematics, Linnaeus University, Växjö, Sweden
Ali Haydar
Assistant Professor, Computer Engineering Department, Girne American University, Northern Cyprus
Kamil Dimililer
Assistant Professor, Computer Engineering Department, Girne American University, Northern Cyprus