A hidden database is a dataset that an organization makes accessible on the web by allowing users to issue queries through a search interface. In other words, data are not acquired from such a source by following static hyperlinks. Instead, data are obtained by querying the interface and reading the dynamically generated result page. This, together with other facts, such as that the interface may answer a query only partially, has prevented hidden databases from being crawled effectively by existing search engines. This paper remedies the problem by giving algorithms to extract all the tuples from a hidden database. Our algorithms are provably efficient; namely, they accomplish the task by performing only a small number of queries, even in th...
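To make the crawling problem concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes a hypothetical interface that accepts a range query on a single integer attribute, returns at most `k` matching tuples, and signals when the answer was cut off. The crawler halves any range whose answer overflows until every sub-range is answered completely. The names `TopKInterface`, `crawl`, the `(name, price)` schema, and the sample data are all invented for illustration, and the sketch further assumes that no more than `k` tuples share one attribute value.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, int]  # hypothetical schema: (item name, price)
Query = Callable[[int, int], Tuple[List[Record], bool]]


class TopKInterface:
    """Simulated search form: returns at most k matches plus an overflow flag."""

    def __init__(self, records: List[Record], k: int) -> None:
        self._records = records
        self._k = k
        self.calls = 0  # number of queries issued so far

    def query(self, lo: int, hi: int) -> Tuple[List[Record], bool]:
        self.calls += 1
        matches = [r for r in self._records if lo <= r[1] <= hi]
        # Only the first k matches are shown; the flag reports a partial answer.
        return matches[: self._k], len(matches) > self._k


def crawl(query: Query, lo: int, hi: int) -> List[Record]:
    """Collect every record with price in [lo, hi] by splitting overflowing ranges."""
    results, overflow = query(lo, hi)
    if not overflow:
        return results  # the interface returned the complete answer
    if lo == hi:
        # Assumption violated: more than k records share this price, so a
        # range query alone cannot separate them. Return the partial answer.
        return results
    mid = (lo + hi) // 2
    # The two halves are disjoint, so no record is collected twice.
    return crawl(query, lo, mid) + crawl(query, mid + 1, hi)


# Invented sample data; at most two records share a price, and k = 3.
DATA: List[Record] = [
    ("a", 5), ("b", 12), ("c", 12), ("d", 30), ("e", 31),
    ("f", 47), ("g", 58), ("h", 58), ("i", 75), ("j", 99),
]

db = TopKInterface(DATA, k=3)
found = crawl(db.query, 0, 100)
print(f"recovered {len(found)} of {len(DATA)} records in {db.calls} queries")
```

The number of queries here depends on how the tuples are spread over the attribute domain. The sketch shows only why partial answers force a crawler to refine its queries, and makes no claim about the query bounds proved in the paper.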
Abstract—An ever increasing amount of valuable information is stored in web databases, "hidden ...
Journal Article. Recently, there has been increased interest in the retrieval and integration of hidde...
Many online services like Twitter and GNIP offer streaming programming interfaces that allow real-ti...
Abstract- The web contains a large amount of information which is increasing by magnitude every day....
Abstract—A large number of web databases are only accessible through proprietary form-like interface...
Journal Article. In this paper, we study the problem of automating the retrieval of data hidden behind...
As the amount of information on the web grows daily, lots of relevant data are available in the f...
Many databases on the web are “hidden” behind (i.e., accessible only through) their restrictive, fo...
Many web databases are only accessible through a proprietary search interface which allows users to ...
Abstract—A large number of online databases are hidden behind form-like interfaces which allow users...
A wealth of information is available on the Web. But often, such data are hidden behind form interfa...
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of web pag...
Abstract- A web crawler is a software program that browses the web in a very systematic manner. Craw...
Abstract. The term Deep Web (sometimes also called Hidden Web) refers to the data content that is c...
In this paper we address the problem of estimating the index size needed by web search engines to an...