We report on a study undertaken to better understand which kinds of Web pages are most useful to Web search engine users, by exploiting query-independent features of retrieval target pages. To our knowledge, there has been little research on query-independent Web page cleansing for Web information retrieval. Based on more than 30 million Web pages obtained from both TREC and a widely used Chinese search engine, SOGOU (www.sogou.com), we analyze the differences between retrieval target pages and ordinary ones. We also propose a learning-based data cleansing algorithm for filtering out Web pages that are unlikely to be useful for user requests. The results obtained show that retrieval target pages can be separated...
We propose a methodology for building a practical robust query classification system that can identi...
The growing importance and need of data processing for information extraction is vital for Web datab...
Web Search Engines (WSEs) are probably the most complex information systems today, since they need...
Understanding what kinds of Web pages are the most useful for Web search engine users is a critical ...
This paper examines a new approach to Web information retrieval, and proposes a new two sta...
The rapid development of the Internet has made a variety of Web applications and Web data, which bec...
This paper examines a new approach to Web information retrieval, and proposes a new two stage scheme...
With the growth of web data, how to estimate web page quality effectively and rapidly becomes more a...
With the exponential growth of the Internet, it has become more and more difficult to find ...
This paper presents an algorithm to improve a web search query based on the feedback on the viewed ...
To store the information in a database is one of the major tasks. The efficient storage of data is i...
The efficiency of a retrieval system is crucial for large-scale information retrieval systems. By anal...
The amount of Web information is growing rapidly, improving the efficiency and accuracy of Web infor...
In this paper we briefly explore the challenges to expand information retrieval (IR) on the ...
Web retrieval methods have evolved through three major steps in the last decade or so. They started ...