Abstract. Background: Next-generation sequencing (NGS) is producing enormous corpora of short DNA reads, affecting emerging fields such as metagenomics. Protein similarity search is a key step in annotating the protein-coding genes in these short reads and identifying their biological functions, but it faces daunting challenges because of the sheer size of short-read datasets. Results: We developed RAPSearch, a fast protein similarity search tool that uses a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. On the short reads we tested (translated in six frames), RAPSearch achieved a speedup of ~20-90 times over BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it...
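The seeding idea in the Results paragraph (a reduced alphabet plus a suffix array yielding seeds of flexible length) can be illustrated with a small Python sketch. This is not RAPSearch's implementation: the 10-group reduced alphabet is a hypothetical grouping chosen for illustration, the suffix array is built naively, and the names `reduce_seq`, `build_suffix_array`, `find_seeds` and the `min_len` threshold are assumptions. For each query position, the sketch binary-searches the database suffix array for the longest match in the reduced alphabet and reports it as a seed if it is at least `min_len` long.

```python
from bisect import bisect_left

# Hypothetical 10-group reduced alphabet (an assumption for illustration,
# not necessarily the grouping RAPSearch uses). Each group maps to one letter.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: chr(ord("a") + g) for g, members in enumerate(GROUPS) for aa in members}


def reduce_seq(protein: str) -> str:
    """Map each amino acid to its group letter; unknown residues become 'x'."""
    return "".join(REDUCE.get(aa, "x") for aa in protein.upper())


def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array (sort start positions by suffix); fine for a demo."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp(a: str, i: int, b: str, j: int) -> int:
    """Length of the common prefix of a[i:] and b[j:]."""
    k = 0
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1
    return k


def find_seeds(query: str, db: str, min_len: int = 4) -> list[tuple[int, int, int]]:
    """Return (query_pos, db_pos, length) seeds matched in the reduced alphabet.

    Seed length is not fixed: it is the longest reduced-alphabet match
    starting at each query position, kept if it reaches min_len.
    """
    rq, rdb = reduce_seq(query), reduce_seq(db)
    sa = build_suffix_array(rdb)
    suffixes = [rdb[i:] for i in sa]  # materialized for bisect; demo only
    seeds = []
    for qi in range(len(rq)):
        # In sorted order, the suffix sharing the longest prefix with the
        # query suffix is a neighbor of the query suffix's insertion point.
        pos = bisect_left(suffixes, rq[qi:])
        best_len, best_db = 0, -1
        for k in (pos - 1, pos):
            if 0 <= k < len(sa):
                length = lcp(rq, qi, rdb, sa[k])
                if length > best_len:
                    best_len, best_db = length, sa[k]
        if best_len >= min_len:
            seeds.append((qi, best_db, best_len))
    return seeds
```

For example, `find_seeds("MKVLAAGI", "PPRIVMAAGLWW")` returns `[(2, 4, 6), (3, 5, 5), (4, 6, 4)]`: because V, L, M and I share a group in this hypothetical alphabet, query segment VLAAGI matches database segment VMAAGL in full even though two residues differ. A practical tool would presumably merge overlapping seeds on the same diagonal and then extend and score them with a full-alphabet substitution matrix; that step is omitted here.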
Searching for matches between large collections of short (14-30 nucleotide) words and sequence data...
The efforts of international genome sequencing projects have resulted in huge and exponentially ...
The primary goal of bioinformatics is to increase our understanding of the biology of organisms. Comp...
Metagenome sequencing efforts have provided a large pool of billions of genes for identifying enzyme...
In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods ...
Motivation: The genomic era in molecular biology has brought on a rapidly widening gap between the a...
Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of pr...
With genome sequencing projects producing huge amounts of sequence data, datab...
Metagenomic studies produce large datasets that are estimated to grow at a faster rate than the avai...
Sequence similarity in biological databases is used to characterize a newly discovered protein and c...
Motivation: Comparison of nucleic acid and protein sequences is a fundamental tool of modern bioinfo...
Efficient and accurate search in biological sequence databases remains a matter of priority due to t...
Background: Protein structural data has increased exponentially, such that fast and accu...
Since high-throughput sequencing tools became available, the number of known protein sequences ha...