Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d(N) of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguo...
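As an illustration of the quantity N mentioned in the abstract above, the following minimal sketch counts spaced-word matches between two sequences. It assumes a binary pattern string in which `1` marks a match position and `0` a don't-care position; the function names and the pattern encoding are hypothetical choices made for this example, not taken from the paper. The estimator d(N) itself is not reproduced here.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield the spaced word starting at each position of seq.

    Only characters at the '1' (match) positions of the pattern are kept;
    characters at '0' (don't-care) positions are ignored.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        yield "".join(seq[start + i] for i in match_positions)


def count_spaced_word_matches(seq1, seq2, pattern):
    """Return N, the number of position pairs (i, j) whose spaced words agree."""
    counts1 = Counter(spaced_words(seq1, pattern))
    counts2 = Counter(spaced_words(seq2, pattern))
    # Each occurrence in seq1 pairs with each occurrence of the same word in seq2.
    return sum(n * counts2[word] for word, n in counts1.items())
```

A pattern consisting only of `1` characters reduces this to counting contiguous word matches of length k, the case the abstract contrasts with spaced words.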
Genomic string comparison via alignment is widely applied for mining and retrieval of information i...
Motivation: A standard approach to classifying sets of genomes is to calculate their pairwise distan...
Alignment-free methods are increasingly used to calculate evolutionary distanc...
Alignment-free methods are increasingly used to estimate distances between DNA...
We study the number Nk of length-k word matches between pairs of evolutionarily related DNA sequence...
Nine sets of DNA sequence pairs were simulated with distances d between 0.1 and 0.9 substi...
Methods for measuring genetic distances in phylogenetics are known to be sensitive to the evolutiona...
Alignment-free distance measures are generally less accurate but more efficient than traditional ali...
Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are base...