Wrappers are pieces of software used to extract data from websites and structure them for further application processing. Unfortunately, websites evolve continuously and structural changes happen without warning, which usually causes wrappers to work incorrectly. Wrapper maintenance is therefore necessary to detect whether a wrapper is extracting erroneous data. The solution consists of using verification models to detect whether a wrapper's output is statistically similar to the output it produced when it was successfully invoked in the past. Current proposals present some weaknesses, as the data used to build these models are assumed to be homogeneous, independent or representative enough, or following a singl...
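The general idea of a verification model (checking whether new wrapper output is statistically similar to output from past successful runs) can be illustrated with a minimal Python sketch. This is not the method proposed in the abstract above. It is the simplest form of such a check, and it makes exactly the kind of single-distribution assumption the abstract criticizes. The feature set, the function names (`features`, `build_model`, `looks_valid`), the threshold and the sample price strings are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def features(value: str) -> list[float]:
    """Map one extracted string to a few numeric features (hypothetical choice)."""
    n = len(value) or 1  # avoid division by zero for empty strings
    return [
        float(len(value)),                     # string length
        sum(c.isdigit() for c in value) / n,   # fraction of digits
        sum(c.isalpha() for c in value) / n,   # fraction of letters
    ]


def build_model(past_values: list[str]) -> list[tuple[float, float]]:
    """Per-feature (mean, standard deviation) over values from past successful runs.

    Needs at least two past values, because stdev() is undefined for one.
    """
    columns = zip(*(features(v) for v in past_values))
    return [(mean(col), stdev(col)) for col in columns]


def looks_valid(model, new_values: list[str], threshold: float = 3.0) -> bool:
    """Return True if, for every feature, the average over new_values lies
    within `threshold` historical standard deviations of the historical mean."""
    columns = zip(*(features(v) for v in new_values))
    for (mu, sigma), col in zip(model, columns):
        sigma = sigma or 1e-9  # assumed floor for features that never varied
        if abs(mean(col) - mu) / sigma > threshold:
            return False
    return True


# Values a hypothetical price wrapper extracted while it was known to work.
model = build_model(["$12.99", "$8.50", "$103.00", "$7.25", "$45.10"])

print(looks_valid(model, ["$9.99", "$15.00"]))               # True
print(looks_valid(model, ["Add to cart", "Free shipping"]))  # False
```

A check like this only flags output whose aggregate features drift far from the historical profile. It treats every past value as drawn from one homogeneous distribution, so it would misjudge a field that legitimately mixes several formats.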
Abstract—Modern applications use back-end data stores for persistent data. Automated verification of...
Data cleaning techniques usually rely on some quality rules to identify violating tuples, and then f...
Abstract. We present a framework for verifying that programs correctly preserve important data stru...
The proliferation of online information sources has led to an increased use of wrappers for extracti...
The amount of information available on the Web grows at an incredibly high rate. Systems and procedu...
Previous research has focused on quick and effective production of wrapping units, t...
A web wrapper extracts data from HTML documents. The accuracy and quality of the information extracted ...
Many information sources on the Web are semi-structured; hence there is an opportunity for automati...
Abstract. This paper investigates automatic wrapper generation and maintenance for Forums, Blogs and...
Abstract. In today's internet-centric world, web applications are replacing desktop applications. Cl...
In order to let software programs gain full benefit from semi-structured web sources, wrapper progra...
Web scraping (or wrapping) is a popular means for acquiring data from the web. Recent advancements h...
A substantial subset of the web data follows some kind of underlying structure. Nevertheless, HTML d...
While much of the data on the web is unstructured in nature, there is also a significant amount of e...
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Compute...