The development of solutions to scale the extraction of data from Web sources is still a challenging issue. High accuracy can be achieved by supervised approaches, but the costs of training data, i.e., annotations over a set of sample pages, limit their scalability. Crowdsourcing platforms are making the manual annotation process more affordable. However, the tasks submitted to these platforms should be extremely simple, so that non-expert people can perform them, and their number should be minimized to contain costs. We demonstrate alfred, a wrapper inference system supervised by the workers of a crowdsourcing platform. Training data are labeled values generated by means of membership queries, the simplest form of queries, posed to the...
Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are ...
One of the foremost challenges for information technology over the last few years has been to explor...
By incorporating human workers into the query execution process crowd-enabled databases facilitate i...
The Web is a rich source of data that represents a valuable resource for many organizations. Data in...
We present a crowdsourcing system for large-scale production of accurate wrappers to extract data fr...
We present solutions based on crowdsourcing platforms to support large-scale production of accurate ...
Wrapper inference deals with generating programs to extract data from Web pages. Several supervised an...
Crowdsourcing is a way to solve problems that need human contribution. Crowdso...
The amount of text data has been growing exponentially and with it the demand for improved informati...
Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are ...
We introduce a method to greatly reduce the amount of redundant annotations required when crowdsourc...
Some complex problems, such as image tagging and natural language processing, are very challenging ...
Named entity extraction is an established research area in the field of information extrac...
The web contains a tremendous number of data sets presented visually, which computers cannot current...
Crowd-sourcing has become a popular means of acquiring labeled data for a wide variety of tasks wher...