User-friendly and Extensible Web Data Extraction

  • Novella, Tomáš
  • Holubová, Irena
Publication date
September 2017
Publisher
AIS Electronic Library (AISeL)
Language
English

Abstract

The creation of web wrappers is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task because it introduces trade-offs between the expressiveness of the wrapper’s language and its safety. In addition, little attention has been paid to the execution of a wrapper in a restricted environment. In this paper we present a new wrapping language, Serrano, that has three goals: (1) the ability to run in a restricted environment, such as a browser extension, (2) extensibility, to balance the trade-offs between the expressiveness of the command set and safety, and (3) processing capabilities that eliminate the need for additional programs to clean the extracted data. Serrano has been successfully dep...

