
Web Data Extraction Using Textual Anchors

Publish Year: 1394
Type: Journal paper
Language: English
View: 505

This paper is 8 pages long and is available for download in PDF format.

Document National Code:

JR_JKBEI-2-4_004

Index date: 7 September 2016

Web Data Extraction Using Textual Anchors abstract

In this paper, we present an approach and a visual tool, called ABDES, for creating web wrappers that extract data records from web pages. Our approach relies mainly on the visible page content, simulating the way a human user scans a web page for specific data. To create a wrapper, we use text features such as textual delimiters, keywords, constants, or text patterns, which we call anchors, to build patterns for the target data regions and data records. We offer a polynomial-time data extraction algorithm in which these patterns are checked against the page elements in a mixed bottom-up and top-down traversal of the DOM tree. The extracted data is mapped directly onto a hierarchical XML structure as the output of the algorithm. The wrappers generated by the system are robust and independent of the HTML structure, so they can be adapted to multiple websites to gather and integrate information.
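
The idea of anchor-based extraction described in the abstract can be illustrated with a minimal sketch. This is not the ABDES implementation, whose details are not given here. The sample page, the `ANCHORS` table, the `seed` keyword, and the helper names `visible_text` and `extract` are all hypothetical, and the page is assumed to be well-formed XHTML so that the standard-library XML parser can read it. The sketch finds an element whose text contains a seed anchor, climbs the tree until every anchor pattern matches the visible text (bottom-up), then reads the field values from that region (top-down) and emits them as XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample page; assumed to be well-formed XHTML.
PAGE = """
<html><body>
  <div id="nav">Home | Books</div>
  <table>
    <tr><td><b>Title:</b> Dune</td><td><span>Price:</span> $9.99</td></tr>
    <tr><td><b>Title:</b> Emma</td><td><span>Price:</span> $4.50</td></tr>
  </table>
</body></html>
"""

# Hypothetical anchors: field name -> pattern with one capture group.
# The literal keywords "Title:" and "Price:" play the role of textual anchors.
ANCHORS = {
    "title": re.compile(r"Title:\s*(.+?)\s*(?=Price:|$)"),
    "price": re.compile(r"Price:\s*(\$\d+\.\d{2})"),
}


def visible_text(node):
    """Return the whitespace-normalized text a user would see under node."""
    return " ".join(" ".join(node.itertext()).split())


def extract(page, anchors, seed):
    """Extract one record per region that contains all anchors."""
    root = ET.fromstring(page.strip())
    parent_of = {child: parent for parent in root.iter() for child in parent}
    out = ET.Element("records")
    seen = set()
    for node in root.iter():
        if seed not in (node.text or ""):
            continue
        # Bottom-up: climb until every anchor pattern matches the region text.
        region = node
        while region is not None and not all(
            p.search(visible_text(region)) for p in anchors.values()
        ):
            region = parent_of.get(region)
        if region is None or id(region) in seen:
            continue
        seen.add(id(region))
        # Top-down: read each field from the region and map it onto XML.
        text = visible_text(region)
        record = ET.SubElement(out, "record")
        for field, pattern in anchors.items():
            ET.SubElement(record, field).text = pattern.search(text).group(1)
    return out


if __name__ == "__main__":
    print(ET.tostring(extract(PAGE, ANCHORS, seed="Title:"), encoding="unicode"))
```

Because the patterns are matched against visible text rather than tag names, the same `ANCHORS` table would also apply to a page that lays out the same fields with `div` and `p` elements instead of a table, which is the kind of structure independence the abstract claims for its wrappers.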

Web Data Extraction Using Textual Anchors authors

Ahmad Pouramini

Department of Computer Engineering, Sirjan University of Technology, Sirjan, Iran