Employing a novel content-based similarity measure for a machine learning-driven focused crawler
Publication venue: 6th National Conference on Applied Research in Computer Engineering and Information Technology
Publication year: 1398 (Solar Hijri calendar; March 2019 to March 2020 CE)
Document type: Conference paper
Language: English
Length: 10 pages (PDF)
National scientific document ID: CEPS06_121
Indexing date: 9 Ordibehesht 1399 (28 April 2020)
Abstract:
The World Wide Web is growing rapidly, to the point where managing its data has become challenging. Search engines collect data from across the web on behalf of users, and web crawlers, a major component of search engines, retrieve the pages relevant to user requests. A focused crawler, in particular, considers a predefined subject and retrieves only the pages relevant to it. In this paper, we propose an efficient focused web crawling approach that combines a content-based similarity measure with a Naive Bayes classifier to find pages relevant to a particular subject. Our initial experiments show satisfactory improvements, with accuracy and recall increasing by 4% and 1%, respectively.
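The abstract does not define the paper's novel similarity measure or how it is combined with the classifier. The Python sketch below therefore only illustrates the general idea under stated assumptions. Plain cosine similarity over term frequencies stands in for the paper's measure, and a linear blend with a hypothetical weight `alpha` stands in for the combination rule. A small multinomial Naive Bayes model supplies P(relevant | page), and the blended score orders the crawl frontier best-first. All names (`relevance_score`, `order_frontier`, `alpha`) are illustrative and are not taken from the paper.

```python
import heapq
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character (deliberately simple).
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def cosine_similarity(a, b):
    # Cosine similarity between two term-frequency vectors stored as Counters.
    # Stand-in for the paper's (unspecified) content-based similarity measure.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing for relevant/irrelevant labels.

    Training data must contain at least one example of each label.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
            self.doc_counts[label] += 1
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def prob_relevant(self, doc):
        total_docs = sum(self.doc_counts.values())
        log_post = {}
        for label in (True, False):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            lp = math.log(self.doc_counts[label] / total_docs)  # log prior
            for tok in tokenize(doc):
                lp += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            log_post[label] = lp
        # Convert the two log posteriors into P(relevant | doc) without underflow.
        m = max(log_post.values())
        p_true = math.exp(log_post[True] - m)
        p_false = math.exp(log_post[False] - m)
        return p_true / (p_true + p_false)


def relevance_score(page_text, topic_terms, clf, alpha=0.5):
    # Hypothetical linear blend of topic similarity and classifier probability.
    sim = cosine_similarity(Counter(tokenize(page_text)), Counter(topic_terms))
    return alpha * sim + (1 - alpha) * clf.prob_relevant(page_text)


def order_frontier(candidates, topic_terms, clf, alpha=0.5):
    # candidates: list of (url, text) pairs; returns URLs from most to least relevant.
    heap = [(-relevance_score(text, topic_terms, clf, alpha), url) for url, text in candidates]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


train_docs = [
    "focused crawler web pages topic relevance",
    "naive bayes classifier text",
    "football match score goal",
    "cooking recipe pasta sauce",
]
labels = [True, True, False, False]
clf = NaiveBayes().fit(train_docs, labels)
topic = tokenize("web crawler topic classifier")
ranked = order_frontier(
    [("a", "recipe for pasta"), ("b", "a focused web crawler uses a classifier")], topic, clf
)
print(ranked)  # ['b', 'a']
```

A weighted sum keeps the two signals interpretable. The similarity term rewards pages that share vocabulary with the topic even when training data is sparse, while the classifier term captures patterns learned from labeled pages. The real system may weight, threshold, or combine these signals differently.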
Authors
Atiye Jabalameli
Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran
S. Mehdi Vahidipour
Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran
Mohammad Mahdi Mohammadi
Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran