Abstract—Detecting similar pairs in large biological sequence collections is one of the most commonly performed tasks in computational biology. With the advent of high-throughput sequencing technologies, the problem has regained significance as data sets with millions of sequences have become ubiquitous. This paper is an initial report on our parallel, distributed-memory, sketching-based approach to constructing large-scale sequence similarity graphs. We develop load balancing techniques, derived from multi-way number partitioning and work stealing, to manage computational imbalance and ensure scalability on thousands of processors. Our experimental results show that the method is efficient and can be used to analyze data sets with millions of DN...
One of the most computationally intensive tasks in computational biology is de novo genome assembly,...
The challenge of comparing two or more genomes that have undergone recombination and substantial amo...
Motivation: Next-generation sequencing (NGS) has revolutionized biomedical research in the past deca...
Thesis (Ph.D.), Department of Electrical Engineering and Computer Science, Washington State Universi...
Distributed Shared Memory systems allow the use of the shared memory programming paradigm in distrib...
The tremendous quantity and quality of data obtained by conformations of DNA and protein sequences m...
Motivation: Next generation sequencing (NGS) has revolutionized biomedical research in the last deca...
With advances in genomic research, the number of sequences involved in comparative methods has ...
Thesis (Ph.D.), School of Electrical Engineering and Computer Science, Washington State University. Th...
LNCS v. 6124 is Proceedings of the 6th International Conference, AAIM 2010. To study the genetic varia...
This paper reports on solving instances with more than 10,000 data items in a few hours. Keane, Page...
The local similarity problem is to determine the similar regions within two given sequences. We rece...
We present a fast algorithm for sequence clustering and searching which works with large sequence da...
Biological sequence comparison is one of the most important tasks in Bioinformatics. Due to the grow...