Abstract: With the exponential growth of protein sequence databases over time, multiple-sequence alignment (MSA) methods, like PSI-BLAST, perform exhaustive and time-consuming database searches to retrieve evolutionary information. The resulting position-specific scoring matrices (PSSMs) of such search engines represent a crucial input to many machine learning (ML) models in the field of bioinformatics and computational biology. A protein sequence is a collection of contiguous tokens or characters called amino acids (AAs). The analogy to natural language allowed us to exploit recent advances in the field of Natural Language Processing (NLP) and therefore transfer state-of-the-art NLP algorithms to bioinformatics. This research pr...
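The abstract treats a protein sequence as a string of amino-acid tokens. The sketch below shows one way to build character-level tokens and integer IDs, the usual first step before feeding a sequence to an NLP-style model. The vocabulary order and the `X` catch-all token are illustrative assumptions and are not taken from any specific model.

```python
# Minimal sketch: character-level tokenization of a protein sequence,
# treating each amino acid as one token (the NLP analogy in the abstract).
# The vocabulary order and the "X" unknown token are illustrative assumptions.

STANDARD_AAS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
UNK = "X"  # hypothetical catch-all token for non-standard residues

# Map each token to an integer ID; ID 0 is reserved for the unknown token.
VOCAB = {UNK: 0, **{aa: i + 1 for i, aa in enumerate(STANDARD_AAS)}}


def tokenize(sequence: str) -> list[str]:
    """Split a protein sequence into single-residue tokens."""
    return [aa if aa in VOCAB else UNK for aa in sequence.strip().upper()]


def encode(sequence: str) -> list[int]:
    """Convert a protein sequence into a list of integer token IDs."""
    return [VOCAB[tok] for tok in tokenize(sequence)]


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    print(tokenize(seq))
    print(encode(seq))
```

Real protein language models usually add special tokens (for example, start, end, padding and mask tokens) on top of such a residue vocabulary.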
A broad and simple definition of `language' is a set of sequences constructed from a finite set of s...
Proteins perform critical roles in a growing list of human-devised applications, and as demands for ...
Our understanding of the natural world is largely dependent on our ability to extract information fr...
Novel protein sequences arise through mutation. These mutations may be deleterious, beneficial, or n...
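Point mutations are conventionally written as wild-type residue, 1-based position and mutant residue (for example, `K2R`). The sketch below applies one such substitution to a sequence. The helper name `apply_mutation` is hypothetical, and the sketch covers only single substitutions, not insertions or deletions.

```python
import re

# Hypothetical helper: parses substitution notation such as "K2R"
# (wild-type residue, 1-based position, mutant residue).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutation(sequence: str, mutation: str) -> str:
    """Return a copy of `sequence` with one point substitution applied."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, pos, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = pos - 1  # the notation is 1-based; Python strings are 0-based
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {pos} outside sequence of length {len(sequence)}")
    # Check the stated wild-type residue so that mis-indexed mutations fail loudly.
    if sequence[index] != wild_type:
        raise ValueError(f"expected {wild_type} at position {pos}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


if __name__ == "__main__":
    print(apply_mutation("MKTAYIAKQR", "K2R"))
```

The wild-type check is a deliberate design choice. Off-by-one indexing is a common source of silent errors when mutation lists are mapped onto sequences.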
As part of his master's thesis at the Rostlab, which is located at the Technical University of Munich ...
Background: Predicting protein function and structure from sequence is one important ...
Motivation: Evolutionary models of amino acid sequences can be adapted to incorporate structure info...
Many life activities and key functions in organisms are maintained by different types of proteinS...
In the field of artificial intelligence, a combination of scale in data and model capacity enabled b...
Computational models starting from large ensembles of evolutionarily related protein sequences captu...
A number of protein sequences are found and added to the database, but their functional properties are ...
Current protein language models (PLMs) learn protein representations mainly based on their sequences...
Self-supervised neural language models with attention have recently been applied to biological seque...
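The truncated abstract does not name its training objective. One common self-supervised objective for such models is BERT-style masked-token prediction, and the sketch below assumes that objective. In the standard recipe, about 15% of positions are selected. Of those, 80% are replaced by a mask token, 10% by a random residue, and 10% are left unchanged. The model is then trained to recover the original token at every selected position. `MASK` and `mask_tokens` are hypothetical names.

```python
import random

# Minimal sketch of BERT-style input masking (assumed objective; the abstract
# is truncated and does not state which objective it uses).
MASK = "<mask>"  # hypothetical mask token
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels); a label is None where no prediction is required."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                masked.append(MASK)  # 80% of selected positions: mask token
            elif r < 0.9:
                masked.append(rng.choice(AMINO_ACIDS))  # 10%: random residue
            else:
                masked.append(tok)  # 10%: keep the original token
        else:
            labels.append(None)  # unselected position: not scored
            masked.append(tok)
    return masked, labels


if __name__ == "__main__":
    masked, labels = mask_tokens(list("MKTAYIAKQR"), rng=random.Random(0))
    print(masked)
    print(labels)
```

Keeping some selected tokens unchanged, or replacing them with random residues, prevents the model from learning that only the mask token needs attention.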
Recent advances in sequencing and synthesis technologies have sparked extraordinary growth in large-...
A new approach to the process of Directed Evolution is proposed, which utilizes different machine le...