The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, hierarchical clustering of extremely large sequence datasets is currently computationally expensive because of its quadratic time and space complexity. In this paper, we develop a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity while maintaining a clustering accuracy comparable to that of the standard method. The basic idea is to organize sequences into a pseudo-me...
Background: Searching a biological sequence database with a query sequence looking for homologues has...
To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon seq...
Background: The recent explosion of biological data brings a great challenge for the traditional cluste...
Cluster analysis or clustering is an important data mining technique widely used for pattern recogni...
Genomic sequences can be viewed as special types of documents. These are typically organised and sto...
Metagenomics is the investigation of genetic samples directly obtained from the environment. Driven ...
We present a fast algorithm for sequence clustering and searching which works with large sequence da...
Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational pro...
Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such ...
Background: Modern pyrosequencing techniques make it possible to study complex bacterial pop...
This paper describes a new technique for parallelizing protein clustering, an important bioinformati...
EST clustering is a simple, yet effective method to discover all the genes present in a variety of s...
This paper presents SpCLUST, a new C++ package that takes a list of sequences ...