We present a classification model for semi-structured documents, based on statistical language modelling theory, that outperforms extant approaches to spam filtering on the LingSpam email corpus [1]. We also introduce two variants of a novel discounting technique for higher-order N-gram language models, developed in light of the spam filtering problem.
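To make the general idea concrete, the following is a minimal sketch of language-model classification, not the model or the discounting technique the abstract introduces. It assumes one bigram model per class, standard absolute discounting interpolated with an add-one unigram model, and flat token lists rather than semi-structured documents. The names `BigramLM` and `classify`, the discount value 0.5, and the toy messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramLM:
    """Bigram model with absolute discounting (illustrative assumption),
    interpolated with an add-one smoothed unigram model."""

    def __init__(self, docs, vocab, discount=0.5):
        self.vocab = vocab
        self.discount = discount
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        for tokens in docs:
            padded = ["<s>"] + tokens  # "<s>" marks the start of a message
            self.unigrams.update(tokens)
            for prev, word in zip(padded, padded[1:]):
                self.bigrams[prev][word] += 1
        self.total = sum(self.unigrams.values())

    def unigram_prob(self, word):
        # Add-one smoothing keeps unseen words at non-zero probability.
        return (self.unigrams[word] + 1) / (self.total + len(self.vocab))

    def prob(self, prev, word):
        followers = self.bigrams.get(prev)
        if not followers:
            # Unseen context: fall back to the unigram estimate.
            return self.unigram_prob(word)
        context_count = sum(followers.values())
        # Subtract a fixed discount from every observed bigram count ...
        discounted = max(followers[word] - self.discount, 0.0) / context_count
        # ... and hand the freed mass to the lower-order model.
        backoff_mass = self.discount * len(followers) / context_count
        return discounted + backoff_mass * self.unigram_prob(word)

    def log_likelihood(self, tokens):
        padded = ["<s>"] + tokens
        return sum(math.log(self.prob(p, w)) for p, w in zip(padded, padded[1:]))


def classify(tokens, spam_lm, ham_lm, spam_prior=0.5):
    """Pick the class whose model gives the message the higher posterior score."""
    spam_score = math.log(spam_prior) + spam_lm.log_likelihood(tokens)
    ham_score = math.log(1 - spam_prior) + ham_lm.log_likelihood(tokens)
    return "spam" if spam_score > ham_score else "ham"


# Toy training data (hypothetical, not drawn from LingSpam).
spam_docs = [["buy", "cheap", "pills", "now"], ["cheap", "pills", "online"]]
ham_docs = [["meeting", "agenda", "for", "monday"],
            ["please", "review", "the", "agenda"]]
vocab = {w for doc in spam_docs + ham_docs for w in doc}

spam_lm = BigramLM(spam_docs, vocab)
ham_lm = BigramLM(ham_docs, vocab)

print(classify(["cheap", "pills"], spam_lm, ham_lm))
print(classify(["review", "agenda"], spam_lm, ham_lm))
```

The discount removes a fixed amount from each observed bigram count and redistributes it through the unigram model, so a message containing word pairs never seen in training still receives a finite score under both classes.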
This thesis proposes an innovative adaptive multi-classifier spam filtering model, with a grey-list ...
learning filters) use tokens, which are found during message content analysis, to separate spam from...
This paper applies a language model approach to different sources of information extracted from a We...
A solution to spam emails remains elusive despite over a decade long research efforts on spam filter...
The paper elaborates on how text analysis influences classification—a key part of the spam-filtering...
more than 31 trillion spams have been sent in 2009. These spam or “junk mails” can involve various k...
In this paper, we study the usability of linguistic features in the context of statistical-based mac...
The paper presents a brief survey of the fight between spammers and antispam software developers, an...
Spam identification is crucial in implementing an effective email filtering system, while spam recog...
In recent years, email spam has become an increasingly important problem, with a big economic impact...
The increasing volume of unsolicited mass e-mail (otherwise called spam) has generated a need for re...
Spam filtering poses a special problem in text categorization, of which the defining characteristic ...
In this paper we propose a middleware infrastructure to address the problem of filtering unsolicitat...
In this paper, we propose a novel feature selection method, INTERACT, to select relevant wo...
The development of data-mining applications such as classification and clustering has shown the need...