Proteins are key to all cellular processes, and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy¹, and over 214 million predicted structures are available in the AlphaFold database². However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations, representing probable previously undescribed structures. Cluster...
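The abstract does not spell out how the clustering step works. As a hedged illustration only, the sketch below shows greedy set-cover clustering over a precomputed graph of alignment hits. This is one common clustering strategy (MMseqs2, for example, offers a greedy set-cover cluster mode), but it is not presented as Foldseek cluster's implementation. It assumes the pairwise structural alignments and any E-value or coverage filtering have already been done. The function name `greedy_set_cover_clusters` and the example identifiers are hypothetical.

```python
from collections import defaultdict


def greedy_set_cover_clusters(ids, hits):
    """Cluster items by greedy set cover over an undirected hit graph.

    ids:  iterable of structure identifiers (hypothetical example IDs).
    hits: iterable of (a, b) pairs that passed an alignment filter
          (assumed to be precomputed elsewhere).
    Returns a dict mapping each representative to its sorted members.
    """
    # Build an undirected adjacency list from the accepted hits.
    neighbors = defaultdict(set)
    for a, b in hits:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    unassigned = set(ids)
    clusters = {}
    while unassigned:
        # Pick the unassigned item that covers the most unassigned
        # neighbours; iterating in sorted order breaks ties by the
        # smallest identifier, so the result is deterministic.
        rep = max(sorted(unassigned),
                  key=lambda x: len(neighbors[x] & unassigned))
        # The representative absorbs all of its still-unassigned neighbours.
        members = (neighbors[rep] & unassigned) | {rep}
        clusters[rep] = sorted(members)
        unassigned -= members
    return clusters


ids = ["A", "B", "C", "D", "E", "F"]
hits = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]
clusters = greedy_set_cover_clusters(ids, hits)
non_singleton = {rep: m for rep, m in clusters.items() if len(m) > 1}
print(clusters)        # {'A': ['A', 'B', 'C'], 'D': ['D', 'E'], 'F': ['F']}
print(len(non_singleton))  # 2
```

Filtering out clusters with a single member mirrors the abstract's distinction between non-singleton clusters and singletons. This quadratic toy loop is only meant to show the greedy logic. A method that handles hundreds of millions of structures would need a far more efficient candidate search and assignment step.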
A protein's tertiary structure plays an important role in determining its possible functional sites ...
Proteome-scale bioinformatics research is increasingly conducted as the number of completel...
By analysing the structure of a protein, it is possible to draw conclusions about its function. Obtai...
Next-generation sequencing has allowed many new protein sequences to be identified. However, this ex...
As a result of large sequencing projects, databases of protein sequences and structures are growing ...
Proteins are a major interface between the genotype and phenotype of living things. Understanding th...
Protein structural annotation and classification are important and challenging problems in bioinfor...
Background: Protein annotation is a major goal in molecular biology, yet experimentally determined k...
As a result of large sequencing projects, data banks of protein sequences and structures are growing...
Protein sequence annotation is a major challenge in the post-genomic era. The number of uncharacteri...
For the past half-century, structural biologists relied on the notion that similar protein sequences...
One of the main reasons for clustering proteins is to predict their structure, function and evolution. M...
We are now entering a new era in protein sequence and structure annotation, with hundreds of million...
Background: Protein structures are composed of modular elements known as domains. These uni...
Background: Annotating protein function is a major goal in molecular biology, yet experimentally det...