Web Data Extraction Using Textual Anchors
Publish Year: 1394 (Solar Hijri)
Type: Journal paper
Language: English
Length: 8 pages (PDF)
Document National Code: JR_JKBEI-2-4_004
Index date: 7 September 2016
Abstract
In this paper, we present an approach and a visual tool, called ABDES, for creating web wrappers that extract data records from web pages. Our approach relies mainly on the visible page content, simulating the way a human user scans a web page for specific data. To create a wrapper, we use text features such as textual delimiters, keywords, constants, or text patterns, which we call anchors, to build patterns for the target data regions and data records. We propose a polynomial-time data extraction algorithm in which these patterns are checked against the page elements in a mixed bottom-up and top-down traversal of the DOM tree. The extracted data is mapped directly onto a hierarchical XML structure, which is the output of the algorithm. Because the generated wrappers rely on textual anchors rather than on the HTML structure, they are robust to changes in that structure and can be adapted to multiple websites to gather and integrate information.
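The abstract does not specify ABDES's pattern language or its exact traversal, so the following is only a minimal Python sketch of the general idea under stated assumptions: anchors are regular expressions over visible text; a record is taken to be the smallest DOM element that matches every field pattern while containing exactly one occurrence of a record anchor; and the data-region step is omitted. The names `Node`, `TreeBuilder`, and `extract`, as well as the `Title:`/`Price:` anchors, are hypothetical. The search climbs bottom-up from elements whose own text contains the anchor, then extracts the fields top-down inside each record and writes them to a hierarchical XML tree.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM element: a tag plus ordered children (Nodes or text strings)."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []

    def text(self):
        # Visible text of this element and its descendants, in document order.
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(p for p in parts if p)

    def own_text(self):
        # Only the text nodes that are direct children of this element.
        return " ".join(c for c in self.children if isinstance(c, str))

    def elements(self):
        # Pre-order walk over this element and all descendant elements.
        yield self
        for c in self.children:
            if isinstance(c, Node):
                yield from c.elements()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed and stray tags."""

    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(data.strip())


def extract(html, anchor, fields, root_tag="records", record_tag="record"):
    """Hypothetical anchor-based extractor: `anchor` occurs once per record;
    `fields` maps output tag names to regexes with one capture group."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    out, seen = ET.Element(root_tag), set()
    for node in builder.root.elements():
        if not re.search(anchor, node.own_text()):
            continue
        # Bottom-up: climb to the smallest ancestor that holds every field
        # while still containing exactly one anchor occurrence.
        rec = node
        while rec is not None:
            text = rec.text()
            if len(re.findall(anchor, text)) > 1:
                rec = None  # climbed past the record boundary; give up
            elif all(re.search(p, text) for p in fields.values()):
                break
            else:
                rec = rec.parent
        if rec is None or rec in seen:
            continue
        seen.add(rec)
        # Top-down: apply each field pattern within this record's text only.
        text, item = rec.text(), ET.SubElement(out, record_tag)
        for name, pattern in fields.items():
            ET.SubElement(item, name).text = re.search(pattern, text).group(1).strip()
    return out


page = """
<html><body>
  <h2>Results</h2>
  <div class="item"><b>Title:</b> Data Mining <span>Price: $30</span></div>
  <div class="item"><b>Title:</b> Web Wrappers <span>Price: $25.50</span></div>
</body></html>
"""
fields = {"title": r"Title:\s*(.+?)\s*Price:", "price": r"Price:\s*\$([\d.]+)"}
print(ET.tostring(extract(page, r"Price:", fields), encoding="unicode"))
# <records><record><title>Data Mining</title><price>30</price></record><record><title>Web Wrappers</title><price>25.50</price></record></records>
```

Because the record boundary in this sketch is found from text rather than from tag names, the same `fields` dictionary would still apply if the `div` elements became `li` or `tr` elements. That is one way to obtain the HTML independence the abstract describes; whether ABDES achieves it in this way is not stated here.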
Authors
Ahmad Pouramini
Department of Computer Engineering, Sirjan University of Technology, Sirjan, Iran