CIVILICA We Respect the Science
(Specialized publisher of Iranian conferences / Publishing license no. 8971 from the Ministry of Culture and Islamic Guidance)

IECA: Intelligent Effective Crawling Algorithm for Web Pages

Paper title: IECA: Intelligent Effective Crawling Algorithm for Web Pages
National paper ID: JR_ITRC-4-4_004
Published in: 1391 (Solar Hijri calendar, i.e. 2012–2013)
Authors:

Mohammad Amin Golshani
Ali Mohammad Zareh Bidoki

Abstract:
Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. Several crawling algorithms, such as Partial PageRank, Batch PageRank, OPIC, and FICA, have been proposed, but they suffer from high time complexity or low throughput. To overcome these problems, we propose a new crawling algorithm called IECA, which is easy to implement and has low time complexity, O(E·log V), and low memory complexity, O(V), where V and E are the number of nodes and edges in the Web graph, respectively. Unlike the algorithms mentioned above, IECA traverses the Web graph only once, and the importance of each Web page is determined from the logarithmic distance and the weight of its incoming links. To evaluate IECA, we use three different Web graphs: UK-2005, the Web graph of the University of California, Berkeley-2008, and Iran-2010. Experimental results show that our algorithm outperforms the other crawling algorithms in discovering highly important pages.
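The abstract does not give IECA's formulas, so the following is only a minimal sketch of the general idea it describes: ordering a crawl frontier by a logarithmic link distance with a priority queue, expanding each page once, which is one way to stay within an O(E·log V) time bound. As an assumption (not the paper's definition), following a link out of page u is taken to cost log(out-degree of u), so links from pages with few out-links count as "closer". IECA's incoming-link weights are not modelled, and the function name `log_distance_crawl_order` is hypothetical.

```python
import heapq
import math
from typing import Dict, Hashable, Iterable, List

Graph = Dict[Hashable, List[Hashable]]  # page -> list of pages it links to


def log_distance_crawl_order(graph: Graph, seeds: Iterable[Hashable]) -> List[Hashable]:
    """Return pages in the order a best-first crawler would fetch them.

    Hypothetical cost model (an assumption, not taken from the IECA paper):
    following a link out of page u costs log(out_degree(u)), so the
    distance of a page is the smallest sum of such costs from any seed.
    Each page is expanded at most once, like Dijkstra's algorithm.
    """
    dist: Dict[Hashable, float] = {}
    heap = []
    counter = 0  # insertion counter: breaks distance ties in FIFO order
    for s in seeds:
        if s not in dist:
            dist[s] = 0.0
            heapq.heappush(heap, (0.0, counter, s))
            counter += 1

    visited = set()
    order: List[Hashable] = []
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale entry: u was already crawled at a smaller distance
        visited.add(u)
        order.append(u)  # "fetch" page u

        out_links = graph.get(u, [])
        # Assumed logarithmic link cost; 0 for a single out-link, never negative.
        step = math.log(len(out_links)) if out_links else 0.0
        for v in out_links:
            nd = d + step
            if v not in visited and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return order


if __name__ == "__main__":
    web = {
        "seed": ["hub", "solo"],
        "hub": ["h1", "h2", "h3", "h4"],
        "solo": ["deep"],
        "deep": ["deeper"],
    }
    print(log_distance_crawl_order(web, ["seed"]))
```

With the small graph above, the pages reached through a chain of single out-links ("deep", "deeper") are fetched before the four children of the high-out-degree "hub", whereas a plain breadth-first crawl would fetch the hub's children first. Because this sketch uses lazy deletion, the heap can hold up to O(E) entries; a heap with a decrease-key operation would keep the frontier at O(V), closer to the memory bound the abstract states.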

Keywords:
search engines, Web crawling, Web graph, logarithmic distance, reinforcement learning, World Wide Web

Paper page and full-text download: https://civilica.com/doc/1425811/