Background: Fueled by rapid progress in high-throughput sequencing, the size of public sequence databases doubles every two years. Searching the ever larger and more redundant databases is getting increasingly inefficient. Clustering can help to organize sequences into homologous and functionally similar groups and can improve the speed, sensitivity, and readability of homology searches. However, because the clustering time is quadratic in the number of sequences, standard sequence search methods are becoming impracticable. Results: Here we present a method to cluster large protein sequence databases such as UniProt within days down to 20%-30% maximum pairwise sequence identity. kClust owes its speed and sensitivity to an alignment-free p...
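The quadratic cost stated in the kClust abstract can be made concrete with a short sketch. It is not kClust's algorithm, whose description is cut off here: the k-mer Jaccard score, the word length k = 3, the 0.5 threshold and all function names are hypothetical choices made only to show (a) how all-versus-all work grows as n(n-1)/2 and (b) how a greedy, representative-based clustering scores each sequence against cluster representatives rather than against every other sequence.

```python
# Illustrative sketch only: not kClust's algorithm. The k-mer Jaccard score,
# k = 3 and the 0.5 threshold are hypothetical choices for demonstration.


def pairwise_comparisons(n: int) -> int:
    """Distinct pairs an all-versus-all comparison of n sequences must score."""
    return n * (n - 1) // 2


def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 3) -> float:
    """Alignment-free similarity: Jaccard index of the two k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:  # both sequences shorter than k
        return 1.0
    return len(ka & kb) / len(union)


def greedy_cluster(seqs: list[str], threshold: float = 0.5, k: int = 3) -> list[list[int]]:
    """Greedy incremental clustering.

    Sequences are visited longest first. Each one joins the first cluster whose
    representative (the cluster's first member) is similar enough; otherwise it
    founds a new cluster. Each sequence is scored against representatives only,
    not against every other sequence.
    """
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    clusters: list[list[int]] = []
    for i in order:
        for cluster in clusters:
            if kmer_jaccard(seqs[cluster[0]], seqs[i], k) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**8):
        print(f"{n:>11,} sequences -> {pairwise_comparisons(n):,} pairs")
    toy = ["MKVLAAGIVG", "MKVLAAGIVA", "WWPPHHCCDE"]
    print(greedy_cluster(toy))  # [[0, 1], [2]]
```

With a million sequences, all-versus-all already means 499,999,500,000 pairs. Longest-first ordering follows a convention used by greedy tools such as CD-HIT; note that when a database has little redundancy, almost every sequence founds its own cluster and the greedy loop degenerates to quadratic work as well, so the sketch illustrates the idea rather than solving the scaling problem.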
Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of pr...
This paper describes a new technique for parallelizing protein clustering, an important bioinformati...
Background: In the bioinformatics community, many tasks are associated with matching a set of protein...
Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundanc...
Metagenomic datasets contain billions of protein sequences that could greatly enhance large-scale fu...
Krause A, Stoye J, Vingron M. Large scale hierarchical clustering of protein sequences. BMC Bioinfor...
BACKGROUND: HH-suite is a widely used open source software suite for sensitive sequence similarity s...
The problem of finding remote homologues of a given protein sequence via alignment methods is not fu...
Background: An important problem in computational biology is the automatic det...
One of the main reasons for protein clustering is the prediction of structure, function and evolution. M...
We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three ...
Motivation: Nucleotide sequence data are being produced at an ever-increasing rate. Clustering such ...