In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the method of dataset construction is crucial for accurate spam classification; we note that this issue arises in learning problems generally and can be hard to detect. In particular, we find that ensuring that no domains overlap between the test and training sets is necessary to accurately test a web spam classifier. In our case, classification performance can differ by as much as 40% in precision when using non-domain-separated data. Second, we show that rank-time features can improve the performance...
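The domain-separation requirement described above can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `domain_of` and `domain_separated_split`, the `test_fraction` and `seed` parameters, and the use of the URL host as the "domain" are all assumptions made for illustration. The idea is simply to assign whole domains, rather than individual pages, to either the training or the test set.

```python
import random
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL (assumed here to stand in for its domain)."""
    return urlparse(url).netloc.lower()


def domain_separated_split(urls, test_fraction=0.2, seed=0):
    """Split URLs so that no domain appears in both the training and test sets.

    Hypothetical helper: whole domains are assigned to one side of the split,
    so pages from the same site can never leak from training into test.
    """
    # Group pages by domain.
    by_domain = {}
    for url in urls:
        by_domain.setdefault(domain_of(url), []).append(url)

    # Shuffle the domains (not the pages) reproducibly.
    domains = sorted(by_domain)
    random.Random(seed).shuffle(domains)

    # Reserve a fraction of the domains for testing (at least one).
    n_test = max(1, round(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])

    train = [u for d in domains if d not in test_domains for u in by_domain[d]]
    test = [u for d in domains if d in test_domains for u in by_domain[d]]
    return train, test
```

A page-level random split, by contrast, would typically place pages from the same domain on both sides, which is the situation the abstract reports as inflating measured precision.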
The steady growth and popularization of the Web has led spammers to develop techniques to circumvent...
Abstract. Web spam is an escalating problem that wastes valuable resources, misleads people and can ...
Abstract—In this paper, we present recent contributions for the battle against one of the main probl...
High ranking of a Web site in search engines can be directly correlated to high revenues. This ampli...
Web spam refers to techniques that try to manipulate search engine ranking algorithms in orde...
We propose link-based techniques for automatic detection of Web spam, a term referring to pages whic...
To prevent web spam from manipulating search engine results, anti-spam systems use machine learning t...
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We...
Feature selection is an important issue in data mining, and it is used to reduce dimensions of featu...
Feature selection is an important issue in data mining, and it is used to reduce dimensions of featu...
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We...
Abstract. Various Web spam features and machine learning structures were constantly proposed to classi...
To prevent web spam from manipulating search engine results, anti-spam systems use machine learning t...
Web spam denotes the manipulation of web pages with the sole intent to raise their position in searc...
Abstract. The page rank of a commercial web site has an enormous economic impact because it directly...