# A World Model of Protein Biology

**Source:** https://biohub.ai/esm/protein/about
**Generated:** May 28, 2026
**Format:** Scientific Research

## Overview

Biohub has released ESM (Evolutionary Scale Models) — a world model of protein biology comprising three breakthrough artifacts: [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmfold2Model) for structure prediction, [ESM Atlas](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmAtlas) mapping 6.8 billion sequences and 1.1 billion predicted structures, and [ESMC](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmCModel) trained on approximately 2.8 billion sequences. This system learns from protein sequences produced by evolution to represent, map, predict, and design proteins.

## Key Statistics

| Metric | Value |
|--------|-------|
| Protein Sequences in Atlas | 6.8 billion |
| Predicted Structures in Atlas | 1.1 billion |
| Sequences ESMC Trained On | ~2.8 billion |
| SAE Features Discovered | 16,000+ |
| ESMFold2-Fast (1024aa protein) | 9.4 seconds |
| Antibody-antigen prediction accuracy | 55% (ESMFold2) |

## Main Models

### [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmfold2Model)
A state-of-the-art protein structure prediction model built on a looped transformer architecture. It achieves 55% accuracy on antibody-antigen complexes and 71% on protein-protein interactions from single sequences, rising to 77% with alignment data.

### [ESMC (ESM Cambrian)](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmCModel)
Protein language model trained on approximately 2.8 billion sequences from across all of life. It provides a foundation for modeling the sequence, structure, and function of proteins.

### [ESM Atlas](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmAtlas)
Map of 6.8 billion sequences and 1.1 billion predicted structures, enabling comprehensive study of proteins across all of life.

## Core Concepts

### [Looped Transformer Architecture](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23loopTransformerArch)
ESMFold2 passes representations through the same parameters multiple times and is optimized through this loop during training. This enables compute scaling at inference: running more loops requires no retraining.
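
As a minimal sketch of the weight-sharing idea described above, the following Python applies one set of weights repeatedly. The residual `tanh` update, the dimensions, and every function name here are illustrative assumptions, not ESMFold2's actual block or API; the point is only that the loop count changes compute without changing the parameter count.

```python
import math
import random


def make_block(dim, seed=0):
    """Create one set of shared weights (a single dim x dim matrix)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, h):
    """One residual update, h + tanh(W h). A stand-in for a transformer block."""
    update = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]
    return [x + u for x, u in zip(h, update)]


def looped_forward(weights, h, n_loops):
    """Pass the representation through the same weights n_loops times."""
    for _ in range(n_loops):
        h = apply_block(weights, h)
    return h


def count_params(weights):
    """Number of scalar parameters in the shared block."""
    return sum(len(row) for row in weights)


weights = make_block(dim=8)
h0 = [0.1] * 8

few = looped_forward(weights, h0, n_loops=2)
many = looped_forward(weights, h0, n_loops=16)  # more compute, same weights

print(count_params(weights))  # parameter count does not depend on n_loops
```

Choosing `n_loops` at call time is what makes the trade-off adjustable: the same `weights` object serves both the cheap and the expensive call.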

### [Sparse Autoencoders (SAE)](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23saeTechnique)
Technique used to decompose ESMC's internal representations into more than 16,000 distinct features, revealing the model's learned organization of protein biology concepts.
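
The general sparse autoencoder technique can be sketched as follows. This is a generic formulation (ReLU encoder, linear decoder, L1 sparsity penalty) with toy sizes and random, untrained weights; the source does not describe the exact SAE variant or dimensions used on ESMC, so all names and values below are assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Return (features, reconstruction) for one activation vector x."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [z + b for z, b in zip(matvec(w_enc, centered), b_enc)]
    features = [max(0.0, z) for z in pre_acts]  # ReLU keeps features non-negative
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return mse + l1_coeff * sum(features)


# Toy sizes: a 4-dimensional activation expanded into 16 candidate features.
rng = random.Random(0)
d_model, n_features = 4, 16
w_enc = [[rng.gauss(0, 0.5) for _ in range(d_model)] for _ in range(n_features)]
w_dec = [[rng.gauss(0, 0.5) for _ in range(n_features)] for _ in range(d_model)]
b_enc = [0.0] * n_features
b_dec = [0.0] * d_model

x = [0.3, -1.2, 0.8, 0.05]  # placeholder activation, not real model output
features, recon = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
active = [i for i, f in enumerate(features) if f > 0]
print(f"{len(active)} of {n_features} features active")
```

Training would minimize `sae_loss` over many activation vectors, after which each feature index can be inspected for the inputs that activate it most strongly.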

### [Minibinders](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23minibinderFormat)
Compact, de novo protein scaffolds with no predetermined structure, used to computationally design protein binders with therapeutically relevant affinities.

### [scFv Antibodies](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23scFvFormat)
Single-chain variable fragment antibodies — antibody-derived molecules using unstructured loops to bind targets. ESMFold2 can design these computationally.

## Therapeutic Targets

Five clinically relevant targets validated with ESMFold2:

| Target | Type | Application |
|--------|------|-------------|
| [EGFR](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23egfrTarget) | Receptor tyrosine kinase | Tumor growth |
| [PDGFRβ](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23pdgfrbTarget) | Receptor tyrosine kinase | Tumor growth |
| [PD-L1](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23pdL1Target) | Immune checkpoint | Cancer immunotherapy; 4.3 nM affinity achieved |
| [CTLA-4](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23ctla4Target) | Immune checkpoint | Cancer immunotherapy |
| [CD45](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23cd45Target) | Immune regulator | Immune cell signaling |

## Design Success Rates

- **Minibinders:** success rate rises from 54% to 70% with higher compute
- **scFvs:** success rate rises from 12% to 21% with higher compute, nearly doubling
- [ESMFold2-designed PD-L1 binder](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23pdL1Target): 4.3 nM affinity, nanomolar potency in cell-based assays
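
To make the size of these gains concrete, the short calculation below computes the fold change for each format and the mean number of designs tested per success. It assumes each percentage is a per-design success fraction and that designs succeed independently; the source does not state either, so treat the second column as a rough reading rather than a reported result.

```python
# Success rates copied from the list above: (baseline, higher compute).
success_rates = {
    "minibinder": (0.54, 0.70),
    "scFv": (0.12, 0.21),
}


def relative_gain(before, after):
    """Fold change in success rate."""
    return after / before


def expected_designs_per_hit(rate):
    """Mean designs tested per success, assuming independent trials."""
    return 1.0 / rate


for name, (before, after) in success_rates.items():
    gain = relative_gain(before, after)
    per_hit_before = expected_designs_per_hit(before)
    per_hit_after = expected_designs_per_hit(after)
    print(f"{name}: {gain:.2f}x, {per_hit_before:.1f} -> {per_hit_after:.1f} designs per hit")
```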

## Partner Organizations

- [NVIDIA](https://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FNVIDIA) — TransformerEngine and cuEquivariance kernels
- [AWS Bio Discovery](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23awsBioDiscovery) — Partner platform
- [Benchling AI](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23benchlingAI) — Partner platform
- [Modal](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23modalPlatform) — Partner platform
- [Phylo](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23phyloplatform) — Partner platform
- [SandboxAQ](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23sandboxaqPlatform) — Partner platform
- [Tamarind Bio](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23tamarindBio) — Partner platform
- [Tool Universe](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23toolUniverse) — Partner platform

## ESM Architecture

### [Technical Details](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23technicalSection)

1. **Transformer Language Model Evolution:** The ESM program developed the first transformer language model of protein sequences in 2019.
2. **Scaling Laws:** ESMC identified a scaling law linking compute to representation accuracy.
3. **Looped Transformer:** ESMFold2 uses recurrence instead of additional parameters, which avoids overfitting.

### [Latent Space Organization](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23latentSpaceSection)

Features discovered by SAEs include:
- Specific amino acids and classes (aromatics, small hydrophobics)
- [Alpha Helix](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23alphaHelixFeature) and [Beta Sheet](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23betaSheetFeature) structures
- Cellular localization signals
- Post-translational modifications
- The [nucleophilic elbow](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23nucleophilicElbow) catalytic motif (independently evolved across 25 protein folds)

## FAQ

1. [What is ESM (Evolutionary Scale Models)?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq1)
2. [How does ESMFold2 differ from other structure prediction models?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq2)
3. [What therapeutic targets has ESMFold2 been validated against?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq3)
4. [What is the scale of ESM Atlas?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq4)
5. [How was ESMC trained and what scaling laws were discovered?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq5)
6. [What are minibinders and scFvs in protein design?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq6)
7. [What did sparse autoencoders reveal about ESMC's latent space?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq7)
8. [How does ESMFold2 achieve compute-time scaling?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq8)
9. [What partner platforms provide access to ESM?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq9)
10. [What is the significance of ESM for medicine?](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23faq10)

## Glossary

- [ESM (Evolutionary Scale Models)](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term1) — World model of protein biology
- [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term2) — Protein structure prediction model
- [ESMC (ESM Cambrian)](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term3) — Protein language model
- [ESM Atlas](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term4) — Sequence atlas
- [Protein Sequence](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term5) — Amino acid chains
- [Protein Structure](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term6) — 3D arrangement of atoms
- [Minibinder](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term7) — Compact de novo scaffolds
- [scFv](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term8) — Single-chain variable fragment
- [Sparse Autoencoders (SAE)](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term9) — Feature decomposition technique
- [Alpha Helix and Beta Sheet](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23term10) — Secondary structure arrangements

## How to Design Protein Binders with ESMFold2

### [Step 1: Select Target Molecule](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step1)
Choose a clinically relevant molecular target such as EGFR, PDGFRβ, PD-L1, CTLA-4, or CD45. Define the binding site and desired affinity characteristics.

### [Step 2: Choose Binder Format](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step2)
Select between minibinders (compact de novo scaffolds) or scFvs (single-chain variable fragment antibodies) depending on therapeutic requirements.

### [Step 3: Define Design Constraints](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step3)
Establish parameters including required affinity (nanomolar potency), specificity requirements, and structural constraints.
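
One way to keep the choices from Steps 1 to 3 explicit is to record them in a small, validated structure before any design run. The `DesignSpec` class, its field names, and the example threshold are hypothetical and are not part of any ESM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignSpec:
    """Hypothetical container for a binder design request."""
    target: str                   # e.g. one of the five targets listed earlier
    binder_format: str            # "minibinder" or "scFv"
    max_kd_nm: float              # required affinity as a dissociation constant, in nM
    hotspot_residues: tuple = ()  # target residue indices defining the binding site
    off_targets: tuple = ()       # proteins the binder should not bind

    def __post_init__(self):
        if self.binder_format not in ("minibinder", "scFv"):
            raise ValueError(f"unknown binder format: {self.binder_format!r}")
        if self.max_kd_nm <= 0:
            raise ValueError("max_kd_nm must be positive")


# Illustrative values only; the threshold is a placeholder, not a reported setting.
spec = DesignSpec(target="PD-L1", binder_format="minibinder", max_kd_nm=10.0)
print(spec)
```

Freezing the dataclass keeps the specification fixed across a design run, so later evaluation steps compare candidates against the same constraints.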

### [Step 4: Run ESMFold2 Design Algorithm](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step4)
Use ESMFold2's design algorithm to search a joint model of sequence and structure for predicted binders.

### [Step 5: Evaluate Predicted Binders](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step5)
Review computational predictions for binding affinity and selectivity. Select top candidates for experimental validation.
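
The selection in this step amounts to filtering on selectivity and ranking on predicted binding. The sketch below shows that pattern; the `Candidate` fields, the score scales, and the example values are placeholders invented for illustration, since the source does not specify which metrics ESMFold2 reports.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    binding_score: float      # hypothetical metric, higher means stronger predicted binding
    off_target_score: float   # hypothetical metric, higher means less selective


def select_top(candidates, k, max_off_target):
    """Drop candidates that fail the selectivity cutoff, then keep the k best binders."""
    passing = [c for c in candidates if c.off_target_score <= max_off_target]
    ranked = sorted(passing, key=lambda c: c.binding_score, reverse=True)
    return ranked[:k]


# Placeholder candidates, not real predictions.
pool = [
    Candidate("design_a", 0.91, 0.05),
    Candidate("design_b", 0.88, 0.40),
    Candidate("design_c", 0.75, 0.10),
    Candidate("design_d", 0.60, 0.02),
]

for c in select_top(pool, k=2, max_off_target=0.2):
    print(c.name, c.binding_score)
```

Filtering before ranking means a strong but unselective candidate such as `design_b` never displaces a selective one from the shortlist.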

### [Step 6: Validate in Laboratory](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step6)
Test using cell-based assays to measure affinity and functional activity. Verify structure using cryo-electron microscopy.

### [Step 7: Iterate and Optimize](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23step7)
Refine design parameters using experimental feedback. ESM enables rapid computational iteration before bench experiments.

## Relationships

- [Biohub](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23biohubOrganization) → [schema:publisher](https://schema.org/publisher) → [ESM Research Program](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmResearchProgram)
- [ESM Research Program](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmResearchProgram) → [schema:hasPart](https://schema.org/hasPart) → [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmfold2Model), [ESMC](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmCModel), [ESM Atlas](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmAtlas)
- [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmfold2Model) → [usesArchitecture](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23hasTrainingScale) → [Looped Transformer](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23loopTransformerArch)
- [NVIDIA](https://linkeddata.uriburner.com/describe/?url=http%3A%2F%2Fdbpedia.org%2Fresource%2FNVIDIA) → [schema:contributor](https://schema.org/contributor) → [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmfold2Model), [ESMC](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmCModel)
- [ESMFold2](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmfold2Model) → [schema:applicationVariant](https://schema.org/applicationVariant) → [EGFR](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23egfrTarget), [PDGFRβ](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23pdgfrbTarget), [PD-L1](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23pdL1Target), [CTLA-4](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23ctla4Target), [CD45](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23cd45Target)
- [ESMC](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23esmCModel) → [schema:isBasedOn](https://schema.org/isBasedOn) → [Sparse Autoencoders](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23saeTechnique)
- [StructurePredictionModel](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23StructurePredictionModel) → [rdfs:subClassOf](https://www.w3.org/2000/01/rdf-schema#subClassOf) → [ProteinModel](https://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Fbiohub.ai%2Fesm%2Fprotein%2Fabout%23ProteinModel)
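
The relationships above are subject, predicate, object triples, so they can be explored with a few lines of Python. The triples below are transcribed from the list using its link labels; the helper names `objects` and `subjects` are illustrative.

```python
# Each entry is (subject, predicate, object), one per object listed above.
TRIPLES = [
    ("Biohub", "schema:publisher", "ESM Research Program"),
    ("ESM Research Program", "schema:hasPart", "ESMFold2"),
    ("ESM Research Program", "schema:hasPart", "ESMC"),
    ("ESM Research Program", "schema:hasPart", "ESM Atlas"),
    ("ESMFold2", "usesArchitecture", "Looped Transformer"),
    ("NVIDIA", "schema:contributor", "ESMFold2"),
    ("NVIDIA", "schema:contributor", "ESMC"),
    ("ESMFold2", "schema:applicationVariant", "EGFR"),
    ("ESMFold2", "schema:applicationVariant", "PDGFRβ"),
    ("ESMFold2", "schema:applicationVariant", "PD-L1"),
    ("ESMFold2", "schema:applicationVariant", "CTLA-4"),
    ("ESMFold2", "schema:applicationVariant", "CD45"),
    ("ESMC", "schema:isBasedOn", "Sparse Autoencoders"),
    ("StructurePredictionModel", "rdfs:subClassOf", "ProteinModel"),
]


def objects(subject, predicate):
    """All objects linked from a subject by a predicate, in list order."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects linked to an object by a predicate, in list order."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


print(objects("ESM Research Program", "schema:hasPart"))
print(subjects("schema:contributor", "ESMC"))
```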

## Related Resources

- [Explore Knowledge Graph using SPARQL](https://linkeddata.uriburner.com/sparql?query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0ASELECT+DISTINCT+%3Fsubject+%3Ftype+%28SAMPLE%28%3Flabel%29+AS+%3Fname%29%0AWHERE+%7B%0A++GRAPH+%3Chttps%3A%2F%2Flinkeddata.uriburner.com%2FDAV%2Fdemos%2Fdaas%2Fesm-protein-world-model-minimax_m2.5free-1.ttl%3E+%7B%0A++++%3Fsubject+rdf%3Atype+%3Ftype+.%0A++++OPTIONAL+%7B+%3Fsubject+rdfs%3Alabel+%3Flabel+%7D%0A++++OPTIONAL+%7B+%3Fsubject+schema%3Aname+%3Flabel+%7D%0A++%7D%0A%7D%0AGROUP+BY+%3Fsubject+%3Ftype%0AORDER+BY+%3Ftype%0ALIMIT+50)
- [Original Page](https://biohub.ai/esm/protein/about)
- [Turtle RDF](../rdf/esm-protein-world-model-minimax_m2.5free-1.ttl)
- [HTML Infographic](../webpages/esm-protein-world-model-minimax_m2.5free-1.html)
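
The SPARQL link in the list above is a URL-encoded query, which is hard to read in place. The sketch below shows the decoded query and rebuilds the same style of URL with Python's standard library. The query as linked does not declare a `schema:` prefix, so it presumably relies on the endpoint predefining one; that is an assumption, and the code only constructs the URL without sending a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://linkeddata.uriburner.com/sparql"
GRAPH = ("https://linkeddata.uriburner.com/DAV/demos/daas/"
         "esm-protein-world-model-minimax_m2.5free-1.ttl")

# Decoded form of the query embedded in the link.
QUERY = f"""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?subject ?type (SAMPLE(?label) AS ?name)
WHERE {{
  GRAPH <{GRAPH}> {{
    ?subject rdf:type ?type .
    OPTIONAL {{ ?subject rdfs:label ?label }}
    OPTIONAL {{ ?subject schema:name ?label }}
  }}
}}
GROUP BY ?subject ?type
ORDER BY ?type
LIMIT 50
""".strip()


def build_query_url(endpoint, query):
    """Attach the query as a URL-encoded `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url[:80] + "...")
```

Changing `LIMIT 50` or the `ORDER BY` clause in `QUERY` and rebuilding the URL is an easy way to explore more of the graph.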
