We consider a data mining problem over a large collection of unstructured texts, based on association rules over subwords of texts. A two-word association pattern is an expression such as $(\mathtt{TATA}, 30, \mathtt{AGGAGGT}) \Rightarrow C$, which expresses the rule that if a text contains the subword TATA followed by the subword AGGAGGT at a distance of no more than 30 letters, then a property $C$ holds with some probability. We present an efficient algorithm for computing frequent patterns $(\alpha, k, \beta)$ that optimize the confidence with respect to a given collection of texts. The algorithm runs in time $O(mn^2)$ and space $O(kn)$, where $m$ and $n$ are the number and the total length of classification examples, respectively, and $k$ is a sm...
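To make the semantics of a two-word association pattern concrete, the following is a minimal brute-force sketch of how a single pattern (alpha, k, beta) could be matched against a text and how its confidence could be estimated from labeled examples. It is not the algorithm with the time and space bounds stated in the abstract. The function names `matches` and `confidence` are hypothetical, and the reading of "distance" as the number of letters between the end of alpha and the start of beta is an assumption.

```python
from typing import Iterable, Tuple


def matches(text: str, alpha: str, k: int, beta: str) -> bool:
    """Return True if `alpha` occurs in `text` followed by `beta`.

    Assumption: "distance" is the number of letters between the end of
    alpha and the start of beta, and it must be at most k.
    """
    start = text.find(alpha)
    while start != -1:
        gap_begin = start + len(alpha)
        # beta may start at any offset 0..k after the end of this alpha.
        window = text[gap_begin:gap_begin + k + len(beta)]
        if beta in window:
            return True
        # Try the next (possibly overlapping) occurrence of alpha.
        start = text.find(alpha, start + 1)
    return False


def confidence(examples: Iterable[Tuple[str, bool]],
               alpha: str, k: int, beta: str) -> float:
    """Fraction of matching texts for which the property C holds.

    `examples` is a hypothetical list of (text, label) pairs, where
    label is True when the property C holds for that text.
    """
    matched = [label for text, label in examples
               if matches(text, alpha, k, beta)]
    if not matched:
        return 0.0  # convention chosen here: no matching text, no evidence
    return sum(matched) / len(matched)


if __name__ == "__main__":
    examples = [
        ("TATAccAGGAGGT", True),
        ("TATAAGGAGGT", False),
        ("GGGG", True),
    ]
    print(confidence(examples, "TATA", 30, "AGGAGGT"))
```

Evaluating every candidate pattern this way would be far too slow for large collections, which is why an algorithm with guaranteed bounds, as claimed in the abstract, is of interest.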
Background: The discovery of surprisingly frequent patterns is of paramount interest in bioinformati...
Algorithms for finding similar, or highly conserved, regions in a group of sequences are at the core...
In recent years, several algorithms for mining frequent and emerging substring patterns from databas...
We consider a data mining problem in a large collection of unstructured texts based on association...
We study a data mining problem in a large collection of unstructured texts based on association rule...
We propose new frequent substring pattern mining which can enumerate all substrings with statistical...
2000 Kyoto International Conference on Digital Libraries : research and practice, 11/13/2000 - 11/16...
We propose a new algorithmic framework that solves frequency-related data mining queries on database...
Abstract. Motivated by the imminent growth of massive, highly redundant genomic databases we study ...
Bio-data analysis addresses the vital discovery problem of similarity search and finding rel...
Inductive database systems typically include algorithms for mining and querying frequent patterns an...
A string is just a sequence of letters. But strings can be massive. Plant and animal genomes are str...
Abstract. Finding similar substrings/substructures is a central task in analyzing huge amounts of st...