Similarity search in sequence databases is of paramount importance in bioinformatics research. As genomic databases grow, similarity search of proteins in these databases becomes a bottleneck in large-scale studies, calling for more efficient methods of content-based retrieval. In this study, we present a metric-preserving, landmark-guided embedding approach that represents sequences in the vector domain to allow efficient indexing and similarity search. We analyze various properties of the embedding and show that the approximation achieved by the embedded representation is sufficient to obtain biologically relevant results. The approximate representation is shown to provide several orders of magnitude speed-up i...
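The general idea of a landmark-guided embedding can be illustrated with a minimal sketch. The abstract does not say which sequence distance, which landmark selection scheme, or which index structure the study uses, so everything below is an assumption made for illustration: Levenshtein distance stands in for the sequence metric, the landmarks are hand-picked, and the function names (`edit_distance`, `embed`, `linf`, `range_search`) are hypothetical. Each sequence is mapped to the vector of its distances to the landmarks. For any true metric, the triangle inequality guarantees that the L-infinity distance between two such vectors never exceeds the original distance, so the cheap vector comparison can discard candidates before the expensive comparison runs.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; an assumed stand-in for the sequence metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(seq: str, landmarks: Sequence[str]) -> List[int]:
    """Map a sequence to its vector of distances to each landmark."""
    return [edit_distance(seq, lm) for lm in landmarks]


def linf(u: Sequence[int], v: Sequence[int]) -> int:
    """L-infinity distance; a lower bound on the original metric distance."""
    return max(abs(x - y) for x, y in zip(u, v))


def range_search(query: str, database: Sequence[str],
                 landmarks: Sequence[str], radius: int) -> List[str]:
    """Return database sequences within `radius` of the query."""
    # In practice the database vectors would be computed once and indexed.
    index = [(seq, embed(seq, landmarks)) for seq in database]
    q_vec = embed(query, landmarks)
    hits = []
    for seq, vec in index:
        if linf(q_vec, vec) > radius:
            continue  # safely pruned: true distance must exceed radius
        if edit_distance(query, seq) <= radius:
            hits.append(seq)  # verified with the exact distance
    return hits


# Hypothetical toy data, not taken from the study.
database = ["ACDEFG", "ACDEFH", "WWWWWW", "ACDQFG"]
landmarks = ["ACDEFG", "WWWWWW"]
result = range_search("ACDEFG", database, landmarks, radius=1)
```

Because the filter only removes candidates whose lower bound already exceeds the radius, this particular sketch returns exact range-search results. An approximate scheme such as the one the abstract describes would instead rank or index directly in the vector space.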
Database scanning programs such as BLAST and FASTA are used nowadays by most biologists for the post...
Biological data analysis can uncover hidden properties of existing experimental results, and gu...
Genomics, with the high amount of heterogeneous data that it is generating, is opening many interest...
With genome sequencing projects producing huge amounts of sequence data, datab...
One of the principal operations in the area of bioinformatics is similarity assessment at the levels...
Efficient and accurate search in biological sequence databases remains a matter of priority due to t...
Motivation: Sequence similarity searches are of great importance in bioinformatics. Exhaustive searc...
Sequence similarity in biological databases is used to characterize a newly discovered protein and c...
We describe a new approach for identifying sequence similarity between a query sequence and a data b...
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequence...
Sequence comparison is a fundamental task in computational biology, traditionally dominated by align...
Background: Similarity inference, one of the main bioinformatics tasks, has to face an expon...
Similarity-based search of sequence collections is a core task in bioinformatics, one dominated for ...
The primary goal of bioinformatics is to increase an understanding in the biology of organisms. Comp...
Accepted to Bioinformatics. Analysis of genetic sequences is usually based on finding similar parts of...