This dataset includes:
- Precomputed representation vectors of human proteins with various protein embedding models.
- Precomputed representation vectors of the SKEMPI dataset with various protein embedding models.
- MSAs of human proteins calculated with HHBlits.
-- The split tar.gz files can be reassembled and extracted with the command: cat human_protein_msa.tar.gz.* | tar xzvf -
- MSAs of protein sequences of the SKEMPI dataset calculated with HHBlits.
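The one-line command above streams the concatenated parts straight into tar, so the rejoined archive is never written to disk. A slightly more defensive variant is sketched below as a POSIX shell function. The helper name extract_split, the dry-run check and the destination argument are illustrative additions, not part of the dataset's instructions. The sketch assumes the parts use fixed-width suffixes (for example .aa, .ab, ... as produced by split), so that the shell's sorted glob expansion returns them in the correct order.

```shell
#!/bin/sh
# Minimal sketch, assuming the parts share the prefix passed in and use
# fixed-width suffixes (.aa, .ab, ...), so glob order equals part order.
set -eu

# extract_split PREFIX [DEST]   (hypothetical helper, not from the dataset)
# Rejoins PREFIX.* in glob order, test-reads the stream, then extracts it.
extract_split() {
  prefix=$1
  dest=${2:-.}               # extraction directory, default: current one
  # Replace the function's positional parameters with the part files.
  set -- "$prefix".*
  # An unmatched glob stays literal in sh, which signals "no parts found".
  if [ ! -e "$1" ]; then
    echo "extract_split: no parts match $prefix.*" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Dry run: list the members without writing files. This should fail if
  # the concatenated gzip stream is truncated or corrupted.
  cat "$@" | tar -tzf - > /dev/null
  # Stream the same concatenation into tar for the real extraction.
  cat "$@" | tar -xzvf - -C "$dest"
}
```

For the human MSA archive, a call would look like extract_split human_protein_msa.tar.gz msa (the directory name msa is only an example). Running the listing pass first costs a second read of the parts, but it surfaces a missing or damaged part before any files are written.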
Deep-learning language models have shown promise in various biotechnological applications, including...
Many life activities and key functions in organisms are maintained by different types of proteins...
Machine learning in systems biology; data mining in systems biology. The amount of macromolecular seq...
Current protein language models (PLMs) learn protein representations mainly based on their sequences...
Motivation: Machine-learning models trained on protein sequences and their measured functions can in...
This dataset includes the data for training the protein function prediction models at github.com/sta...
Novel protein sequences arise through mutation. These mutations may be deleterious, beneficial, or n...
Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of pr...
As part of his master's thesis at the Rostlab, which is located at the Technical University of Munich ...
This repository contains two language model checkpoints: 1) The pretrained ProGen language model, 2)...
Abstract: Proteins are the building blocks of life, carrying out fundamental functions in biology. In...
Abstract: With the exponential increase of the protein sequence databases over time, multiple-sequenc...
Self-supervised language modeling is a rapidly developing approach for the analysis of protein seque...
The development of high-throughput measurement techniques resulted in rapidly increasing amounts of b...
Intrinsically disordered proteins (IDPs) and regions (IDRs) are a class of functionally important pr...