Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis—intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps. Results: Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficie...
A DNA sequence analysis parallelization in large databases using cluster, multi-cluster, and GRID is...
This paper describes a new technique for parallelizing protein clustering, an important bioinformati...
Background: Fueled by rapid progress in high-throughput sequencing, the size of public sequence data...
Cluster analysis or clustering is an important data mining technique widely used for pattern recogni...
Clustering is a widely used unsupervised data analysis technique in machine learning. However, a com...
Recently, clustering has been recognized as an important and fundamental method that analyzes and cl...
Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational pro...
Background: We propose a sequence clustering algorithm and compare the partition quality and executi...
This paper presents SpCLUST, a new C++ package that takes a list of sequences ...
The rapid development of sequencing technology has led to an explosive accumulation of genom...
Krause A, Stoye J, Vingron M. Large scale hierarchical clustering of protein sequences. BMC Bioinfor...
Unexpected growth of high-throughput sequencing platforms in recent years impacted virtually all ar...
Background: Clustering is a fundamental operation in the analysis of biological sequence data. New D...
Genomic sequences can be viewed as special types of documents. These are typically organised and sto...