Wrapper inference deals with generating programs that extract data from Web pages. Several supervised and unsupervised wrapper inference approaches have been proposed in the literature. On the one hand, unsupervised approaches produce erratic wrappers: whenever the sources do not satisfy the underlying assumptions of the inference algorithm, their accuracy is compromised. On the other hand, supervised approaches produce accurate wrappers, but their need for training data limits their scalability. The recent advent of crowdsourcing platforms has opened new opportunities for supervised approaches, as they make it possible to produce large amounts of training data with the support of workers recruited online. Nevertheless, involving human worker...
© 2019 Dr. Yuan Li. This thesis explores aggregation methods for crowdsourced annotations. Crowdsourci...
The internet enables us to collect and store unprecedented amounts of data. We need better models fo...
Data extraction from the Web represents an important issue. Several approaches have been developed t...
We present solutions based on crowdsourcing platforms to support large-scale production of accurate ...
We present a crowdsourcing system for large-scale production of accurate wrappers to extract data fr...
The development of solutions to scale the extraction of data from Web sources is still a challenging...
The Web is a rich source of data that represents a valuable resource for many organizations. Data in...
The web contains a tremendous number of data sets presented visually, which computers cannot current...
Crowdsourcing is a way to solve problems that need human contribution. Crowdso...
We propose novel algorithms for the problem of crowdsourcing binary labels. Such binary labeling t...
With crowdsourcing systems, labels can be obtained with low cost, which facilitates the creation of ...
We introduce a method to greatly reduce the amount of redundant annotations required when crowdsourc...
Crowdsourcing has become an effective and popular tool for human-powered computation to label large ...
Crowd-sourcing has become a popular means of acquiring labeled data for a wide variety of tasks wher...