A World Model of Protein Biology

Biohub releases ESM (Evolutionary Scale Models) - comprising ESMFold2, ESM Atlas, and ESMC - for protein structure prediction, sequence mapping, and language modeling.

Published: 2026-05-28 | Publisher: Biohub | License: MIT

ESM Models & Artifacts

Sequences in ESM Atlas: 6.8B
Predicted Structures: 1.1B
Sequences for ESMC Training: ~2.8B
ESMFold2-Fast per 1024 aa: 9.4 s

ESMFold2: Structure prediction with a looped transformer architecture. Reported accuracy: 55% on antibody-antigen (ab-ag) complexes, 71% on protein-protein interactions (PPI).
ESM Atlas: Map of 6.8B sequences and 1.1B predicted structures across all life.
ESMC (ESM Cambrian): Protein language model trained on ~2.8B sequences from across all life.
ESMFold2-Fast: 9.4 seconds per 1024-residue protein. Outperforms on ab-ag folding.

Therapeutic Targets Validated with ESMFold2

EGFR: Epidermal Growth Factor Receptor - receptor tyrosine kinase implicated in tumor growth.
PDGFRβ: Platelet-Derived Growth Factor Receptor beta - receptor tyrosine kinase implicated in tumor growth.
PD-L1: Programmed Death-Ligand 1 - immune checkpoint exploited by cancer cells. An ESMFold2-designed scFv bound it with 4.3 nM affinity.
CTLA-4: Cytotoxic T-Lymphocyte Antigen 4 - immune checkpoint target.
CD45: Cluster of Differentiation 45 - regulator of immune cell signaling.

Design Success Rates: Minibinders: 54% → 70% with higher compute. scFvs: 12% → 21%. Cryo-EM verified at 1.204 Å RMSD.
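
One rough way to read these success rates is to ask how many designs would need to be tested for a given chance of at least one hit, using 1 - (1 - p)^n. The sketch below assumes each design succeeds independently with the reported rate. That independence is an illustrative assumption, not something the source states, and the function name and the 95% target are hypothetical choices.

```python
import math


def designs_needed(success_rate: float, target_confidence: float = 0.95) -> int:
    """Smallest n with P(at least one success in n designs) >= target_confidence.

    Assumes independent designs that each succeed with probability
    success_rate (an illustrative assumption, not a claim from the source).
    """
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    if not 0.0 < target_confidence < 1.0:
        raise ValueError("target_confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= c for n, then round up to a whole design.
    return math.ceil(math.log(1.0 - target_confidence) / math.log(1.0 - success_rate))


# Success rates as reported in the text above.
REPORTED_RATES = {
    "minibinder, baseline": 0.54,
    "minibinder, higher compute": 0.70,
    "scFv, baseline": 0.12,
    "scFv, higher compute": 0.21,
}

if __name__ == "__main__":
    for label, rate in REPORTED_RATES.items():
        print(f"{label}: {designs_needed(rate)} designs for a 95% chance of a hit")
```

Under that assumption, the reported rates correspond to 4 and then 3 minibinder designs, and 24 and then 13 scFv designs, for a 95% chance of at least one success.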

Partner Platforms

NVIDIA: TransformerEngine & cuEquivariance kernels
AWS Bio Discovery: Cloud platform access
Benchling AI: Platform for ESM access
Modal: Platform for ESM access
Phylo: Platform for ESM access
SandboxAQ: Platform for ESM access
Tamarind Bio: Platform for ESM access
Tool Universe: Platform for ESM access

Frequently Asked Questions

ESM is a world model of protein biology comprising three artifacts: ESMFold2 (structure prediction), ESM Atlas (map of 6.8B sequences/1.1B structures), and ESMC (protein language model trained on ~2.8B sequences). It learns from protein sequences produced by evolution to represent, map, predict, and design proteins.
ESMFold2 predicts structure with a looped transformer architecture instead of searching for evolutionarily related sequences to build multiple sequence alignments (MSAs). It operates directly on ESMC's learned protein representations, which capture evolutionary information encoded during language model pretraining. It achieves 55% on antibody-antigen complexes, and ESMFold2-Fast takes 9.4 seconds per 1024-residue protein.
ESMFold2 binder design was validated against five clinically relevant targets: EGFR and PDGFRβ (receptor tyrosine kinases), PD-L1 and CTLA-4 (immune checkpoints), and CD45 (immune cell signaling regulator). An ESMFold2-designed scFv bound PD-L1 with 4.3 nM affinity.
ESM Atlas contains 6.8 billion sequences and 1.1 billion predicted structures, enabling the sequences and structures of proteins across all of life to be studied as a complete picture.
ESMC was trained on approximately 2.8 billion sequences drawn from across all of life. A scaling law links the compute used in training to how accurately its representations capture biology. This yields linear returns with scale, leading to state-of-the-art protein representations.
Minibinders are compact de novo protein scaffolds, with no predetermined structure, used for binder design. scFvs (single-chain variable fragments) are antibody-derived molecules that bind targets through unstructured loops. ESMFold2 can design both computationally, with therapeutically relevant affinities.
Sparse autoencoders decomposed ESMC's internal representations into more than 16,000 distinct features. The model independently recovered basic organizing principles of biology: amino acid chemistry, local structural interactions, abstract functional concepts, and evolutionary themes connecting all of life.
ESMFold2 uses a looped transformer where representations pass through the same parameters multiple times. Each pass refines the structural representation based on previous computations. This allows compute scaling at inference by running more loops without retraining.
ESM is accessible through these partner platforms: AWS Bio Discovery, Benchling AI, Modal, Phylo, SandboxAQ, Tamarind Bio, and Tool Universe. ESMFold2, ESMC, and ESM Atlas are available on the Biohub Platform.
ESM enables computational protein binder design validated against five clinically relevant targets in oncology and immunology. When digital representations of biology become accurate enough, protein designs can be tested computationally before they reach the bench. This is particularly promising for cancer and rare diseases, where much of the disease is specific to the individual.
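
The looped design described in the answers above, where one set of parameters is applied repeatedly and more loops can be run at inference, can be illustrated with a toy weight-tied update. This is a minimal sketch and not ESMFold2's architecture: a small tanh layer stands in for a transformer block, and every name, size, and input value is hypothetical.

```python
import math
import random


def make_shared_weights(dim: int, seed: int = 0) -> list[list[float]]:
    """Create the single weight matrix reused by every loop (toy stand-in block)."""
    rng = random.Random(seed)
    scale = 0.8 / dim  # each row's absolute sum stays below 1, so updates shrink
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


def shared_block(state, weights, inputs):
    """One pass through the shared block: mix the current state, re-inject the input."""
    return [
        math.tanh(sum(w * s for w, s in zip(row, state)) + x)
        for row, x in zip(weights, inputs)
    ]


def looped_forward(inputs, weights, num_loops):
    """Apply the same block num_loops times; return the final state and per-loop change."""
    state = [0.0] * len(inputs)
    changes = []
    for _ in range(num_loops):
        new_state = shared_block(state, weights, inputs)
        # Largest element-wise change shows how much this pass refined the state.
        changes.append(max(abs(a - b) for a, b in zip(new_state, state)))
        state = new_state
    return state, changes


if __name__ == "__main__":
    weights = make_shared_weights(dim=4)
    inputs = [0.3, -0.1, 0.7, 0.2]
    for loops in (1, 4, 16):
        _, changes = looped_forward(inputs, weights, loops)
        print(f"{loops:>2} loops -> last update size {changes[-1]:.2e}")
```

Running more loops reuses the same `weights`, so the parameter count stays fixed while inference compute grows with `num_loops`.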

Glossary of ESM Protein Biology

ESM (Evolutionary Scale Models)

A world model of protein biology comprising ESMFold2, ESM Atlas, and ESMC that learns from protein sequences produced by evolution.

ESMFold2

State-of-the-art protein structure prediction model using looped transformer architecture without requiring multiple sequence alignments.

ESMC (ESM Cambrian)

Protein language model trained on ~2.8 billion sequences from across all of life, providing state-of-the-art representations.

ESM Atlas

A map of 6.8 billion protein sequences and 1.1 billion predicted structures across all of life.

Protein Sequence

A chain assembled from 20 kinds of chemical building blocks (amino acids), whose order determines folding and function.

Protein Structure

The three-dimensional arrangement of atoms in a protein, determined by amino acid sequence.

Minibinder

Compact, de novo protein scaffolds with no predetermined structure used for designing protein binders computationally.

scFv (Single-chain Variable Fragment)

Antibody-derived molecules using unstructured loops to bind targets; demanding examples of the binder design problem.

Sparse Autoencoders (SAE)

Technique for identifying interpretable structure in large language models by decomposing internal representations.

Alpha Helix and Beta Sheet

The two most common secondary structure elements that form when a protein backbone folds.
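
To make the Sparse Autoencoders entry above concrete, the sketch below shows the forward pass and objective of a generic sparse autoencoder: an overcomplete ReLU encoder, a linear decoder, and a reconstruction loss with an L1 sparsity penalty. It is a toy with hand-picked weights and hypothetical names, not the configuration used on ESMC.

```python
def relu(value: float) -> float:
    return value if value > 0.0 else 0.0


def encode(activation, enc_weights, enc_bias):
    """Map a model activation to non-negative feature strengths."""
    return [
        relu(sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]


def decode(features, dec_weights):
    """Rebuild the activation as a weighted sum of feature directions."""
    return [sum(w * f for w, f in zip(row, features)) for row in dec_weights]


def sae_loss(activation, features, reconstruction, l1_coeff):
    """Mean squared reconstruction error plus an L1 penalty that favors few active features."""
    mse = sum((a - r) ** 2 for a, r in zip(activation, reconstruction)) / len(activation)
    sparsity = sum(abs(f) for f in features)
    return mse + l1_coeff * sparsity


# Toy dictionary: 3 features over a 2-dimensional activation (overcomplete).
# All values are hand-picked for illustration only.
ENC_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ENC_BIAS = [0.0, 0.0, -1.0]
DEC_WEIGHTS = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

if __name__ == "__main__":
    activation = [0.5, 0.2]
    features = encode(activation, ENC_WEIGHTS, ENC_BIAS)
    reconstruction = decode(features, DEC_WEIGHTS)
    active = sum(1 for f in features if f > 0.0)
    print("features:", features)
    print("active features:", active, "of", len(features))
    print("loss:", round(sae_loss(activation, features, reconstruction, l1_coeff=0.1), 6))
```

In a trained sparse autoencoder the weights are learned by minimizing this loss over many activations; the L1 term is what pushes most features to zero for any single input.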

How to Design Protein Binders with ESMFold2

Select Target Molecule

Choose a clinically relevant target such as EGFR, PDGFRβ, PD-L1, CTLA-4, or CD45. Define binding site and desired affinity.

Choose Binder Format

Select between minibinders (compact de novo scaffolds) or scFvs (single-chain variable fragment antibodies) based on therapeutic requirements.

Define Design Constraints

Establish parameters: required affinity (nanomolar potency), specificity requirements, and structural constraints for the binding interface.

Run ESMFold2 Design Algorithm

Use ESMFold2's design algorithm, which searches through a joint model of sequence and structure. With higher compute, minibinder success rises from 54% to 70%.

Evaluate Predicted Binders

Review computational predictions for binding affinity and selectivity. Select top candidates for experimental validation.

Validate in Laboratory

Test designed binders using cell-based assays to measure affinity and functional activity. Verify structure using cryo-EM if needed.

Iterate and Optimize

Use experimental feedback to refine design parameters. ESM enables rapid iteration - designs can be computationally validated before bench experiments.
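
The evaluate-and-select step in the workflow above can be sketched as a simple filter and rank over predicted candidates. The record fields, thresholds, and example values below are hypothetical placeholders; the source does not specify ESMFold2's output format or scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    binder_format: str       # "minibinder" or "scFv"
    predicted_kd_nm: float   # hypothetical predicted affinity; lower is tighter
    off_target_score: float  # hypothetical selectivity proxy; lower is better


def select_for_validation(candidates, max_kd_nm, max_off_target, top_k):
    """Keep candidates that meet both thresholds, then take the top_k tightest binders."""
    passing = [
        c for c in candidates
        if c.predicted_kd_nm <= max_kd_nm and c.off_target_score <= max_off_target
    ]
    return sorted(passing, key=lambda c: c.predicted_kd_nm)[:top_k]


# Placeholder candidates, invented purely to exercise the function.
EXAMPLE_CANDIDATES = [
    Candidate("design_a", "minibinder", 12.0, 0.10),
    Candidate("design_b", "scFv", 3.5, 0.40),
    Candidate("design_c", "minibinder", 8.0, 0.05),
    Candidate("design_d", "scFv", 150.0, 0.02),
]

if __name__ == "__main__":
    shortlist = select_for_validation(
        EXAMPLE_CANDIDATES, max_kd_nm=50.0, max_off_target=0.2, top_k=2
    )
    for candidate in shortlist:
        print(candidate.name, candidate.binder_format, candidate.predicted_kd_nm)
```

Filtering on selectivity before ranking by affinity keeps a tight but non-specific design from crowding out candidates that meet every constraint.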

SPARQL Workbench

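Query 1: All ESM Models (classes and instances)
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?model ?name ?type ?desc
WHERE {
  { ?model a schema:SoftwareApplication ; schema:name ?name ; schema:description ?desc .
    BIND(schema:SoftwareApplication AS ?type) }
  UNION
  { ?model a rdfs:Class ; rdfs:label ?name .
    BIND(rdfs:Class AS ?type) }
  FILTER(CONTAINS(LCASE(?name), "esm") || CONTAINS(LCASE(?desc), "protein"))
}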
Query 2: Therapeutic Targets with Bindings
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/

SELECT ?target ?name ?desc
WHERE {
  ?target a schema:Product ;
          schema:name ?name ;
          schema:description ?desc .
  FILTER(CONTAINS(?desc, "cancer") || CONTAINS(?desc, "immune") || CONTAINS(?name, "PD"))
}
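Query 3: Organizations and Partners
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/
PREFIX owl: <http://www.w3.org/2002/07/owl#>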

SELECT ?org ?name ?url ?sameAs
WHERE {
  ?org a schema:Organization ;
       schema:name ?name .
  OPTIONAL { ?org schema:url ?url }
  OPTIONAL { ?org owl:sameAs ?sameAs }
}
Query 4: Properties and Relationships
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?prop ?label ?range
WHERE {
  ?prop a rdf:Property ;
        rdfs:label ?label .
  OPTIONAL { ?prop rdfs:range ?range }
}
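Query 5: FAQ Questions and Answers
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/

SELECT ?question ?text ?answer
WHERE {
  # In schema.org, a Question carries its text in schema:name and its answer under schema:acceptedAnswer
  ?question a schema:Question ;
            schema:name ?text .
  OPTIONAL { ?question schema:acceptedAnswer/schema:text ?answer }
}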
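Query 6: Glossary Terms
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/

SELECT ?term ?name ?definition
WHERE {
  # schema.org has no termName property; a DefinedTerm is labelled with schema:name
  ?term a schema:DefinedTerm ;
        schema:name ?name ;
        schema:description ?definition .
}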
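Query 7: How-To Steps
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?step ?name ?text ?position
WHERE {
  ?step a schema:HowToStep ;
        schema:name ?name ;
        schema:text ?text .
  # schema:position carries the step order when the graph provides it
  OPTIONAL { ?step schema:position ?position }
}
ORDER BY xsd:integer(?position)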
Query 8: Technical Architecture Details
PREFIX schema: <https://schema.org/>  # namespace assumed; some graphs use http://schema.org/

SELECT ?section ?name ?text
WHERE {
  ?section a schema:ArticleSection ;
           schema:name ?name .
  OPTIONAL { ?section schema:text ?text }
  FILTER(CONTAINS(?name, "Technical") || CONTAINS(?name, "Architecture"))
}
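
Queries like the ones above are typically sent to an endpoint over the SPARQL 1.1 Protocol and come back as SPARQL JSON results. The sketch below builds such a request and flattens a results document using only the Python standard library. The endpoint URL is a placeholder, since the source does not give one, and the sample response is made up to show the format.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://example.org/sparql"  # placeholder; the source names no endpoint


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(document: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one dict of variable -> value per row."""
    parsed = json.loads(document)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in parsed["results"]["bindings"]
    ]


QUERY = """
PREFIX schema: <https://schema.org/>
SELECT ?org ?name WHERE { ?org a schema:Organization ; schema:name ?name . }
"""

# Made-up response in the standard SPARQL JSON results layout.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["org", "name"]},
    "results": {"bindings": [
        {"org": {"type": "uri", "value": "https://example.org/org/1"},
         "name": {"type": "literal", "value": "Biohub"}},
    ]},
})

if __name__ == "__main__":
    request = build_sparql_request(QUERY)
    # The request is only constructed here; pass it to urllib.request.urlopen
    # once a real endpoint is known.
    print(request.get_method(), request.full_url[:60] + "...")
    print(rows_from_results(SAMPLE_RESPONSE))
```

Building the request separately from sending it makes the URL encoding easy to inspect before pointing the code at a live endpoint.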