CIVILICA We Respect the Science
(Specialized publisher of the country's conferences / Publication license number from the Ministry of Culture and Islamic Guidance: 8971)

A New Approach to Improve the Accuracy of the TF_IDF Ranking Algorithm in Text Retrieval

Paper title: A New Approach to Improve the Accuracy of the TF_IDF Ranking Algorithm in Text Retrieval
National paper ID: IRANWEB04_011
Published in the 4th International Conference on Web Research, 1397 (Solar Hijri calendar)
Author details:

Azize Nemati - Graduate Student, Department of Computer Engineering, Golestan University, Gorgan
Soheila Karbasi - Assistant Professor, Department of Computer Engineering, Golestan University, Gorgan

Abstract:
Today, the World Wide Web is considered the largest source of data, and Web search engines are among the most useful tools for extracting information from it. As the web grows, it becomes very difficult for search engines to provide information relevant to user queries. The effectiveness of information retrieval systems also depends largely on term weighting. Search engines therefore use different web mining techniques to rank search results, and various ranking algorithms have been proposed for this purpose. In this research, the TF_IDF weighting algorithm is used to rank documents. An entropy parameter related to the number of user query words in the text of the documents is introduced, and its effect on the accuracy of document ranking in information retrieval is evaluated. Notable observations from examining standard queries suggest a new approach to increasing the efficiency of text search systems, and the results of subsequent experiments validate it. The proposed approach is tested on standard Web collections, and the results show that it can significantly increase retrieval accuracy given the volume of the test data collection.
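The abstract does not give the exact formula, so the following Python sketch is only an illustration of the general idea: a plain TF_IDF score per document, plus a Shannon entropy computed over how often each query word occurs in the document. The function names, the whitespace tokenizer, and in particular the way the two quantities are combined in `entropy_weighted_scores` are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document as the sum of tf * idf over the query terms."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    terms = query.lower().split()
    # Document frequency: number of documents that contain each query term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in terms}
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        score = 0.0
        for t in terms:
            if counts[t] == 0:
                continue  # term absent from this document
            tf = counts[t] / len(toks)
            idf = math.log(n_docs / df[t])
            score += tf * idf
        scores.append(score)
    return scores


def query_term_entropy(query, doc):
    """Shannon entropy (in bits) of the query-term occurrence counts in doc."""
    counts = Counter(doc.lower().split())
    occurrences = [counts[t] for t in set(query.lower().split()) if counts[t] > 0]
    total = sum(occurrences)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in occurrences)


def entropy_weighted_scores(query, docs):
    """Hypothetical combination: scale the TF_IDF score by (1 + entropy).

    This rewards documents that cover several query words evenly. It is an
    assumption for illustration, not the formula proposed in the paper.
    """
    base = tf_idf_scores(query, docs)
    return [s * (1.0 + query_term_entropy(query, d)) for s, d in zip(base, docs)]


if __name__ == "__main__":
    docs = ["web search ranking", "web web web", "entropy of text"]
    for doc, score in zip(docs, entropy_weighted_scores("web search", docs)):
        print(f"{score:.4f}  {doc}")
```

In this sketch a document that contains both query words in equal proportion has an entropy of one bit, while a document that repeats a single query word has an entropy of zero, so the multiplier favors documents that match more of the query. Other combinations, such as adding the entropy as a separate term, would be equally plausible readings of the abstract.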

Keywords:
Information retrieval, Web mining, TF_IDF weighting model, Document scoring, Entropy

Paper page and full-text download: https://civilica.com/doc/773309/