Similarity-based search of sequence collections is a core task in bioinformatics, one dominated for most of the genomic era by exact and heuristic alignment-based algorithms. However, even efficient heuristics such as BLAST may not scale to the data sets now emerging, motivating a range of alignment-free alternatives exploiting the underlying lexical structure of each sequence. In this paper, we introduce two supervised approaches, SuperVec and SuperVecX, to learn sequence embeddings. These methods extend earlier Representation Learning (RepL) based methods to include class-related information for each sequence during training. Including class information ensures that related sequence fragments have proximal representations in the target spac...
This thesis explores the utility of representation learning for bioinformatics applications. It prop...
Similarity search over long sequence datasets is becoming increasingly popular in many emerging applicati...
Similarity search in sequence databases is of paramount importance in bioinformatics research. As the...
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequence...
Part 8: First Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012) Interna...
Efficient and accurate search in biological sequence databases remains a matter of priority due to t...
With genome sequencing projects producing huge amounts of sequence data, datab...
We present a fast algorithm for sequence clustering and searching which works with large sequence da...
This paper introduces a novel method, called Reference-Based String Alignment (RBSA), that speeds up...
This thesis presents an application of a generalized suffix tree extended by the use of frequency of...
Sequence alignment is an important bioinformatics tool for identifying homology, but searching again...
This dissertation proposes a novel tree structure, Error Tree (ET), to more efficiently solve the Ap...