Classifying, clustering or building a phylogeny on a set of genomes without the expensive computation of sequence alignment involves calculating pairwise distances by an appropriate metric. One such metric is the normalized compression distance (NCD), an approximation of the true information distance between two objects. Despite NCD's universal applicability, it has seen few applications in bioinformatics, with no existing tools applying NCD to whole-genome datasets to the best of our knowledge. We introduce Sequence Non-Alignment Compression and Comparison (snacc), a pipeline specifically tailored for computing pairwise distances between genomic sequences. snacc employs the NCD with a variety of compression algorithms, alongside an inte...
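To make the metric concrete, the following is a minimal sketch of the standard NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. It is not snacc's implementation: the function names (`compressed_size`, `ncd`, `distance_matrix`) are hypothetical, and the Python standard-library compressors are assumed here only as stand-ins for whichever compression algorithms the pipeline actually uses.

```python
import bz2
import lzma
import zlib

# Stand-in compressors from the Python standard library (an assumption of
# this sketch, not a statement about which compressors snacc supports).
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lzma.compress,
}


def compressed_size(data: bytes, method: str = "lzma") -> int:
    """Return the length in bytes of `data` after compression."""
    return len(COMPRESSORS[method](data))


def ncd(x: bytes, y: bytes, method: str = "lzma") -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x, method)
    cy = compressed_size(y, method)
    cxy = compressed_size(x + y, method)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs, method: str = "lzma"):
    """Symmetric matrix of pairwise NCD values with a zero diagonal."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = ncd(seqs[i], seqs[j], method)
    return dist
```

Calling `distance_matrix([seq_a, seq_b, seq_c])` on sequences read as bytes gives a matrix that could be passed to a clustering or tree-building routine. Because real compressors only approximate the ideal, the value for a sequence compared with itself is small but generally not exactly zero, and results depend on the chosen compressor and its window size.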
ABSTRACT Comprehensive collections approaching millions of sequenced genomes have become central inf...
Most existing methods for phylogenetic analysis involve developing an evolutionary model and then us...
Motivation: A standard approach to classifying sets of genomes is to calculate their pairwise distan...
Background: Enormous volumes of short read data from next-generation sequencing (NGS) technologies h...
Genomic sequences are usually compared using evolutionary distance, a procedure that implies the al...
Inferring evolutionary relationships based on comparative analysis of genomic data remains a fundame...
based on it has shown promising results. alignments. Our main result uses algorithmic (Kolmogorov) ...
We have recently developed a distance metric for efficiently estimating the number of substitutions ...
BACKGROUND: Existing sequence alignment algorithms use heuristic scoring schemes based on biological...
Genomic sequences are usually compared using evolutionary distance, a procedure that implies the ali...
We present a new method for clustering based on compression. The method doesn't use subject-spe...
Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in ...
Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic ...
Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length)...