Many text databases on the web are hidden behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier's rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial explorator...
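The probing strategy described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the rules in `RULES`, the `min_share` threshold, the share-of-matches decision step, and the toy in-memory database are all assumptions made for the example. The abstract states only that databases are classified based on the number of matches each query produces.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier rules: a conjunction of terms implies a category.
# A real system would extract these from a trained rule-based classifier.
RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("ibm", "computer"), "Computers"),
    (("jordan", "bulls"), "Sports"),
    (("cancer",), "Health"),
]


def classify_database(
    match_count: Callable[[str], int],
    rules: Sequence[Tuple[Tuple[str, ...], str]] = RULES,
    min_share: float = 0.4,  # assumed threshold, not taken from the source
) -> List[str]:
    """Probe a search-only database and return the categories it falls under.

    `match_count` stands in for the database's search interface: it takes a
    query string and returns the number of matching documents.
    """
    totals: Counter = Counter()
    for terms, category in rules:
        query = " AND ".join(terms)  # each rule becomes one probing query
        totals[category] += match_count(query)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no probe matched, so no evidence for any category
    # Keep categories that account for a large enough share of all matches.
    return sorted(c for c, n in totals.items() if n / grand_total >= min_share)


def make_toy_database(docs: Sequence[str]) -> Callable[[str], int]:
    """Build a stand-in search interface over an in-memory list of documents."""
    tokenized = [set(doc.lower().split()) for doc in docs]

    def match_count(query: str) -> int:
        terms = query.lower().split(" and ")
        return sum(all(t in words for t in terms) for words in tokenized)

    return match_count


if __name__ == "__main__":
    db = make_toy_database([
        "cancer treatment trial",
        "lung cancer study",
        "ibm computer sales",
        "weather report",
    ])
    print(classify_database(db))
```

Only match counts are read from the database, never the documents themselves, which is what makes the approach usable against a search-only interface.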
The contents of many valuable Web-accessible databases are only available through search interfaces ...
The proliferation of text databases within large organizations and on the Internet makes it difficul...
While automated methods for information organization have been around for several decades now, expon...
Many text databases on the web are 'hidden' behind search interfaces, and their documents are only a...
Many valuable text databases on the web have non-crawlable contents that are “hidden” behind search ...
Many valuable text databases on the web have noncrawlable contents that are “hidden” behind search ...
The contents of many valuable Web-accessible databases are only available through search interfaces...
Text search engines return a set of k documents ranked by similarity to a query. Typically, document...
We study the automatic classification of Web documents into pre-specified categories, with the ob...
The contents of many valuable web-accessible databases are only available through search interfaces ...
We propose a methodology for building a robust query classification system that can identify thousa...
Data acquisition is a major concern in text classification. The excessive human efforts required by ...
The search query is a set of words or phrases a user enters when looking for information on a specif...
With the exponential growth of the World Wide Web, automated subject classification has become a maj...
We illustrate that Web searches can often be utilized to generate background text for use with text...