To establish lists of words with unexpected frequencies in random sequences, for instance in a molecular biology context, one needs to quantify the exceptionality of families of word frequencies. We study large deviation probabilities of multidimensional word counts in Markov models and hidden Markov models. To prove these results, we establish Edgeworth-like expansions for multidimensional functionals of finite Markov chains. We use these theorems to obtain lists of words with unexpected frequencies in the genomic sequences of Escherichia coli and Bacillus subtilis.

To obtain lists of words with exceptional frequencies with respect to a random model, for instance in a molecular biology context, one must quantify the quality ...
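Judging whether a word frequency is "unexpected" starts by comparing the observed count with the count expected under the fitted Markov model. The sketch below assumes an order-1 Markov chain fitted by maximum likelihood to the sequence itself. It uses the common plug-in estimator built from 2-letter word counts. The function names `word_counts` and `expected_count_m1` are hypothetical. This illustrates the baseline comparison only, not the large deviation or Edgeworth machinery described above.

```python
from collections import Counter


def word_counts(seq: str, k: int) -> Counter:
    """Count every (possibly overlapping) occurrence of each length-k word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count_m1(seq: str, word: str) -> float:
    """Plug-in expected count of `word` under an order-1 Markov chain fitted to `seq`.

    Transitions are estimated by maximum likelihood as pi(a, b) = N(ab) / N(a+),
    where N(a+) counts `a` as the first letter of a 2-letter word, giving
        E(w) = N(w1 w2) * prod_{i=2}^{k-1} N(w_i w_{i+1}) / N(w_i +).
    """
    if len(word) < 2:
        raise ValueError("word must have length >= 2 for an order-1 model")
    pairs = word_counts(seq, 2)
    first = Counter()
    for pair, n in pairs.items():
        first[pair[0]] += n
    estimate = float(pairs[word[:2]])
    for i in range(1, len(word) - 1):
        if first[word[i]] == 0:
            # w_i never starts a 2-letter word, so transitions out of it are undefined: treat as 0
            return 0.0
        estimate *= pairs[word[i:i + 2]] / first[word[i]]
    return estimate


seq = "ACGTACGTTACGA"
for w in ("ACG", "GTA"):
    print(w, word_counts(seq, 3)[w], expected_count_m1(seq, w))
```

A word is then a candidate for exceptionality when its observed count departs strongly from this estimate. Quantifying "strongly" is exactly the probabilistic question the abstract addresses.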
atteson@peaplant.biology.yale.edu We present algorithms for the exact computation of the prob...
Consider a given pattern H and a random text T generated by a Markovian source of any order. We stud...
We study a problem related to the extraction of over-represented words from a given source text x, o...
To establish lists of words with unexpected frequencies in long sequences, for instance in...
To establish lists of words with unexpected frequencies in long sequences, for instance in a molec...
In this paper, we give an overview of the different existing results on the...
In the following, an overview is given of statistical and probabilistic propert...
Some strings (the texts) are assumed to be randomly generated, according to a probability model that...
Evaluation of the expected frequency of occurrences of a given set of patterns in a DNA sequ...
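In the simplest setting, the expected number of occurrences of a set of patterns follows from linearity of expectation. This holds exactly even when occurrences overlap. The sketch assumes letters are drawn i.i.d. with given probabilities (a Bernoulli model). The cited work may use a richer model. Occurrences are counted with multiplicity across the patterns. The function name `expected_occurrences_iid` is hypothetical.

```python
from math import prod


def expected_occurrences_iid(patterns, probs, n):
    """Expected total number of occurrences of a set of patterns in a random
    sequence of length n whose letters are i.i.d. with distribution `probs`.

    By linearity of expectation this is exact even when occurrences overlap:
        E = sum_w (n - |w| + 1) * prod_i probs[w_i]
    """
    total = 0.0
    for w in set(patterns):  # treat the input as a set: duplicates count once
        if len(w) <= n:
            total += (n - len(w) + 1) * prod(probs[c] for c in w)
    return total


uniform = {c: 0.25 for c in "ACGT"}
print(expected_occurrences_iid(["GATC", "GGATCC"], uniform, 10_000))
```

The expectation is the easy part. The variance and the full distribution of the count depend on how the patterns overlap themselves and each other.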
Word match counts have traditionally been proposed as an alignment-free measure of similarity for bi...
A finite-context (Markov) model of order k yields the probability distribution of the next symbol in...
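A finite-context model of order k can be sketched as a table of counts indexed by the preceding k symbols. The table is turned into a next-symbol distribution by an estimator. The additive estimator with parameter `alpha` used here is an assumption, and the cited work may use a different one. The names `fit_fcm` and `next_symbol_prob` are hypothetical.

```python
from collections import defaultdict


def fit_fcm(seq: str, k: int):
    """Order-k finite-context model: for each length-k context, count the symbols that follow it."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1
    return counts


def next_symbol_prob(counts, context: str, symbol: str, alphabet: str, alpha: float = 1.0) -> float:
    """Additive estimate P(symbol | context) = (n_symbol + alpha) / (n_context + alpha * |alphabet|)."""
    ctx = counts.get(context, {})  # .get does not insert unseen contexts into the table
    n = sum(ctx.values())
    return (ctx.get(symbol, 0) + alpha) / (n + alpha * len(alphabet))


model = fit_fcm("ACGTACGTAC", k=2)
print({s: next_symbol_prob(model, "AC", s, "ACGT") for s in "ACGT"})
```

With `alpha > 0`, a context never seen in training yields the uniform distribution over the alphabet rather than a division by zero.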
The distribution of the distance between two (or more) successive occurrences of a specific word in ...
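The empirical counterpart of this distance distribution is easy to extract from a sequence. The sketch assumes distances are measured between 0-based start positions of successive occurrences, with overlapping occurrences allowed. Other conventions exist, for example measuring the gap from the end of one occurrence to the start of the next. The function names are hypothetical.

```python
from typing import List


def occurrence_positions(seq: str, word: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) occurrences of word in seq."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)  # resume one step later so overlapping hits are found
    return positions


def inter_occurrence_distances(seq: str, word: str) -> List[int]:
    """Differences between the start positions of successive occurrences."""
    pos = occurrence_positions(seq, word)
    return [b - a for a, b in zip(pos, pos[1:])]


print(inter_occurrence_distances("AAAGAAAA", "AA"))
```

Distances shorter than the word length correspond to overlapping occurrences. This is why the self-overlap structure of a word shapes its distance distribution.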
The D2 statistic, which counts the number of word matches between two given sequences, has long been...
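For a chosen word length k, the D2 statistic equals the sum over all k-words w of X_w * Y_w. Here X_w and Y_w are the (overlapping) counts of w in the two sequences. Equivalently, it is the number of position pairs (i, j) where the k-words of the two sequences match. The sketch computes it from word counts. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Overlapping counts of every length-k word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """D2 = sum_w X_w * Y_w, i.e. the number of pairs (i, j) with x[i:i+k] == y[j:j+k]."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    if len(cx) > len(cy):
        cx, cy = cy, cx  # iterate over the smaller table; the sum is symmetric
    return sum(n * cy[w] for w, n in cx.items())  # Counter returns 0 for absent words


print(d2("ACGTAC", "TACGT", 2))
```

Working from count tables costs time linear in the sequence lengths. Comparing all position pairs directly would cost time proportional to their product.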
Using recent results on the occurrence times of a string of symbols in a stochastic process with mix...
The enormous growth of biomolecular databases makes it increasingly important to have fast and autom...