Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous work has used hyperlink analysis algorithms to address this problem. However, little research has focused on query-independent Web data cleansing for Web IR. In this paper, we first analyze the differences between retrieval target pages and ordinary pages, based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm that filters out Web pages unlikely to be useful for user requests. We found that there exists a large p...
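The abstract does not say which features or learner the learning-based cleansing algorithm uses. As a rough illustration only, the sketch below swaps in logistic regression over three hypothetical query-independent features (in-link count, URL depth, and document length). The `Page` type, the feature set, the 0.5 threshold, and the toy training data are all assumptions for illustration, not the authors' method. Pages labeled 1 stand in for retrieval target pages and 0 for ordinary pages; the trained model then keeps only pages whose predicted probability of being useful reaches the threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical query-independent features; the paper's real feature set is not given.
    url: str
    in_links: int    # number of incoming hyperlinks
    url_depth: int   # number of path segments after the host
    doc_length: int  # number of words in the page body


def features(p: Page) -> list[float]:
    # Leading 1.0 is the bias term; counts are log-scaled so large values do not dominate.
    return [1.0, math.log1p(p.in_links), float(p.url_depth), math.log1p(p.doc_length)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: list[float], p: Page) -> float:
    # Estimated probability that the page is a retrieval target.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(p))))


def train(pages: list[Page], labels: list[int], lr: float = 0.05, epochs: int = 1000) -> list[float]:
    """Fit logistic regression by batch gradient descent (1 = target page, 0 = ordinary)."""
    w = [0.0] * len(features(pages[0]))
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for p, y in zip(pages, labels):
            err = predict(w, p) - y  # gradient of the log-loss w.r.t. the linear score
            for i, xi in enumerate(features(p)):
                grad[i] += err * xi
        w = [wi - lr * g / len(pages) for wi, g in zip(w, grad)]
    return w


def cleanse(pages: list[Page], w: list[float], threshold: float = 0.5) -> list[Page]:
    # Keep only pages the model considers likely to be useful; input order is preserved.
    return [p for p in pages if predict(w, p) >= threshold]


# Toy, made-up training data for illustration only.
train_pages = [
    Page("http://a.example/", in_links=120, url_depth=0, doc_length=800),
    Page("http://b.example/docs/", in_links=40, url_depth=1, doc_length=1500),
    Page("http://c.example/x/y/z/p.html", in_links=0, url_depth=4, doc_length=30),
    Page("http://d.example/q/r/s/t.html", in_links=1, url_depth=4, doc_length=10),
]
labels = [1, 1, 0, 0]

weights = train(train_pages, labels)
kept = cleanse(train_pages, weights)
print([p.url for p in kept])
```

Because every feature is computed without reference to any query, a filter of this kind can run once at indexing time, which is the sense in which such cleansing is query-independent.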
With the rapid advance of Internet technology, users easily get confused in large hypertext structu...
The rapid development of the Internet has given rise to a variety of Web applications and Web data, which bec...
Even experienced users of IR systems experience a high degree of frustration in searching for inform...
We report on a study that was undertaken to better understand what kinds of Web pages are the most u...
...for the Web. Hyperlink analysis algorithms allow search engines to deliver focused results to user qu...
Information retrieval (IR) is the process of finding relevant information, based on user queries, in...
Abstract: This paper examines a new approach to Web information retrieval and proposes a new two sta...
Abstract: In this paper, we briefly explore the challenges of expanding information retrieval (IR) on the ...
With the growth of Web data, estimating Web page quality effectively and rapidly becomes more a...
Abstract: With the exponential growth of the Internet, it has become more and more difficult to find ...
Nowadays, World Wide Web technology is well developed, and the Web is a very large, distributed digital infor...
Storing information in a database is one of the major tasks. The efficient storage of data is i...
Faced with the massive amount of information on the Web, which includes not only texts but nowadays ...
1 Introduction: The large amount of information available on the Web is attracting many users who are tr...
Abstract: Problem statement: Nowadays, many users use Web search engines to find and gather informa...