Creation of web wrappers is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task, because it introduces trade-offs between the expressiveness of a wrapper's language and safety. In addition, little attention has been paid to the execution of a wrapper in a restricted environment. In this paper we present a new wrapping language -- Serrano -- that has three goals: (1) the ability to run in a restricted environment, such as a browser extension, (2) extensibility, to balance the trade-offs between the expressiveness of a command set and safety, and (3) processing capabilities, to eliminate the need for additional programs to clean the extracted data. Serrano has been successfully dep...
In order to let software programs gain full benefit from semi-structured web sources, wrapper progra...
Most modern web scrapers use an embedded browser to render web pages and to simulate user actions. S...
Information available on the Internet is made to be read by humans, not to be processed by machines....
Creation of web wrappers (i.e., programs that extract data from the web) is a subject of study in the ...
145 p. The Web has so far been incredibly successful at delivering information to various groups of p...
In this paper, we present the W4F toolkit for the generation of wrappers for Web sources. W4F consis...
The Web so far has been incredibly successful at delivering information to human users. So successfu...
Many information sources on the Web are semi-structured; hence there is an opportunity for automati...
There is an increase in the number of data sources that can be queried across the WWW. Such sources ...
While much of the data on the web is unstructured in nature, there is also a significant amount of e...
A substantial subset of the web data follows some kind of underlying structure. Nevertheless, HTML d...
Information extraction from Web sites is nowadays a relevant problem, usually performed by software ...
The paper investigates techniques for extracting data from HTML sites through the use of automatic...
Data extraction from HTML Web pages is performed by software programs called wrappers. Writing wrappe...
Nowadays, the huge amount of information distributed through the Web motivates studying techniques t...