A World Model of Protein Biology

Biohub releases ESM (Evolutionary Scale Models) - comprising ESMFold2, ESM Atlas, and ESMC - for protein structure prediction, sequence mapping, and language modeling.

📅 Published: 2026-05-28 🏢 Publisher: Biohub 📄 License: MIT

ESM Models & Artifacts

6.8B

Sequences in ESM Atlas

1.1B

Predicted Structures

~2.8B

Sequences for ESMC Training

9.4s

ESMFold2-Fast per 1024 aa

🧬

ESMFold2

55% accuracy on antibody-antigen (ab-ag) complexes and 71% on protein-protein interactions (PPI). Looped transformer architecture.

🗺️

ESM Atlas

Map of 6.8B sequences and 1.1B predicted structures across all life.

🧠

ESMC (ESM Cambrian)

Protein language model trained on ~2.8B sequences from across all life.

⚡

ESMFold2-Fast

9.4 seconds per 1024-residue protein. Outperforms on antibody-antigen (ab-ag) folding.
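
For a rough sense of scale, the 9.4-second figure can be converted into per-device throughput. The sketch below assumes every protein is exactly 1024 residues, that latency is a constant 9.4 s, and that there is no batching or I/O overhead. None of those assumptions comes from the source, and `proteins_per_day` and `days_to_fold` are hypothetical helpers.

```python
SECONDS_PER_PROTEIN = 9.4  # ESMFold2-Fast figure quoted above (per 1024-residue protein)
SECONDS_PER_DAY = 24 * 60 * 60


def proteins_per_day(devices: int = 1,
                     seconds_per_protein: float = SECONDS_PER_PROTEIN) -> float:
    """Idealized throughput: constant latency, no batching, no overhead (assumptions)."""
    if devices < 1 or seconds_per_protein <= 0:
        raise ValueError("devices must be >= 1 and seconds_per_protein must be positive")
    return devices * SECONDS_PER_DAY / seconds_per_protein


def days_to_fold(n_proteins: int, devices: int = 1) -> float:
    """Days needed to fold n_proteins under the same idealized assumptions."""
    return n_proteins / proteins_per_day(devices)


if __name__ == "__main__":
    print(f"{proteins_per_day():,.0f} proteins per device-day")
    print(f"{days_to_fold(1_000_000, devices=100):.2f} days for 1M proteins on 100 devices")
```

Real workloads mix sequence lengths, so this is an order-of-magnitude estimate at best.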

Therapeutic Targets Validated with ESMFold2

🎯

EGFR

Epidermal Growth Factor Receptor - receptor tyrosine kinase in tumor growth.

🎯

PDGFRβ

Platelet-Derived Growth Factor Receptor beta - implicated in tumor growth.

🛡️

PD-L1

Programmed Death-Ligand 1 - immune checkpoint exploited by cancer cells. An ESMFold2-designed scFv bound it with 4.3 nM affinity.

🛡️

CTLA-4

Cytotoxic T-Lymphocyte Antigen 4 - immune checkpoint target.

🔬

CD45

Cluster of Differentiation 45 - regulator of immune cell signaling.

Design Success Rates: Minibinders: 54% → 70% with higher compute. scFvs: 12% → 21%. Cryo-EM verified at 1.204 Å RMSD.
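
The RMSD figure above is a root-mean-square deviation between matched atom positions in two structures. As a minimal sketch of that metric, the function below assumes the two coordinate sets are already superimposed and listed in the same atom order; the source does not describe which atoms were compared or how the structures were aligned, and the coordinates used here are made up.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """Root-mean-square deviation (same units as the inputs, e.g. angstroms).

    Assumes `a` and `b` are already superimposed and atom-for-atom matched.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    designed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # illustrative coordinates only
    observed = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(designed, observed))
```

A lower value means the experimentally observed structure sits closer to the design.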

Partner Platforms

🔵

NVIDIA

TransformerEngine & cuEquivariance kernels

☁️

AWS Bio Discovery

Cloud platform access

🧬

Benchling AI

Platform for ESM access

📦

Modal

Platform for ESM access

🌲

Phylo

Platform for ESM access

🔷

SandboxAQ

Platform for ESM access

🌿

Tamarind Bio

Platform for ESM access

🛠️

Tool Universe

Platform for ESM access

Frequently Asked Questions

1. What is ESM (Evolutionary Scale Models)?

ESM is a world model of protein biology comprising three artifacts: ESMFold2 (structure prediction), ESM Atlas (map of 6.8B sequences/1.1B structures), and ESMC (protein language model trained on ~2.8B sequences). It learns from protein sequences produced by evolution to represent, map, predict, and design proteins.

2. How does ESMFold2 differ from other structure prediction models?

ESMFold2 uses a looped transformer architecture rather than searching for evolutionarily related sequences (multiple sequence alignments, MSAs). It operates directly from ESMC's learned protein representations, which capture evolutionary information encoded during language model pretraining. It achieves 55% accuracy on antibody-antigen complexes, and ESMFold2-Fast folds a 1024-residue protein in 9.4 seconds.

3. What therapeutic targets has ESMFold2 been validated against?

Five clinically relevant targets: EGFR and PDGFRβ (receptor tyrosine kinases), PD-L1 and CTLA-4 (immune checkpoints), and CD45 (immune cell signaling regulator). An ESMFold2-designed scFv bound PD-L1 with 4.3 nM affinity.

4. What is the scale of ESM Atlas?

ESM Atlas contains 6.8 billion sequences and 1.1 billion predicted structures, enabling the sequences and structures of proteins across all of life to be studied as a complete picture.

5. How was ESMC trained and what scaling laws were discovered?

ESMC was trained on approximately 2.8 billion sequences drawn from across all of life. A scaling law links the compute used in training to how accurately the learned representations capture biology. This yields linear returns with scale, leading to state-of-the-art protein representations.
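
To illustrate how a scaling law of this kind is typically estimated, the sketch below fits a power law, error = a * compute^b, by least squares in log-log space. The source does not state the functional form or any data points, so the power-law form is an assumption and the numbers are synthetic; `fit_power_law` is a hypothetical helper.

```python
import math


def fit_power_law(compute: list[float], error: list[float]) -> tuple[float, float]:
    """Fit error = a * compute**b via ordinary least squares on log-transformed data."""
    if len(compute) != len(error) or len(compute) < 2:
        raise ValueError("need at least two matched (compute, error) points")
    xs = [math.log(c) for c in compute]
    ys = [math.log(e) for e in error]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


if __name__ == "__main__":
    # Synthetic points generated from a known exponent of -0.1 (not real ESMC data).
    compute = [1e18, 1e19, 1e20, 1e21]
    error = [0.5 * (c / 1e18) ** -0.1 for c in compute]
    a, b = fit_power_law(compute, error)
    print(f"a = {a:.3g}, b = {b:.3f}")
```

A power law appears as a straight line on log-log axes, which is one common reading of returns that stay steady as scale grows.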

6. What are minibinders and scFvs in protein design?

Minibinders are compact, de novo protein scaffolds with no predetermined structure, used for binder design. scFvs (single-chain variable fragment antibodies) are antibody-derived molecules that use unstructured loops to bind targets. ESMFold2 can design both computationally, with therapeutically relevant affinities.

7. What did sparse autoencoders reveal about ESMC's latent space?

Sparse autoencoders decomposed ESMC's internal representations into more than 16,000 distinct features. The model independently recovered basic organizing principles of biology: amino acid chemistry, local structural interactions, abstract functional concepts, and evolutionary themes connecting all of life.
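
As a minimal sketch of the technique, a sparse autoencoder encodes an activation vector into a larger set of non-negative features, decodes it back, and is trained to reconstruct the input while keeping few features active. The toy below shows only the forward pass and loss with hand-picked weights. The dimensions, weights, and L1 coefficient are illustrative assumptions, not details of the ESMC analysis.

```python
def matvec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode with ReLU into an overcomplete feature vector, then decode linearly."""
    pre = [h + b for h, b in zip(matvec(w_enc, x), b_enc)]
    features = [max(0.0, p) for p in pre]  # ReLU keeps features non-negative
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


def sae_loss(x, features, recon, l1_coeff: float) -> float:
    """Mean squared reconstruction error plus an L1 penalty that encourages sparsity."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return mse + l1_coeff * sum(abs(f) for f in features)


if __name__ == "__main__":
    # 2-dimensional input, 4 features: one feature per signed axis direction.
    w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    w_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
    x = [1.0, 0.0]
    features, recon = sae_forward(x, w_enc, [0.0] * 4, w_dec, [0.0] * 2)
    print(features, recon, sae_loss(x, features, recon, l1_coeff=0.01))
```

In this toy only one of the four features fires for the input, which is the kind of sparse, nameable decomposition the approach aims for.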

8. How does ESMFold2 achieve compute-time scaling?

ESMFold2 uses a looped transformer where representations pass through the same parameters multiple times. Each pass refines the structural representation based on previous computations. This allows compute scaling at inference by running more loops without retraining.
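
The weight-sharing idea can be shown with a toy: one block, with one fixed set of parameters, is applied repeatedly, and the number of passes is chosen at inference time. This is not ESMFold2's architecture. `SharedBlock` is a hypothetical stand-in whose "refinement" is a simple step toward a stored target, used only to show that more loops with the same parameters give a more refined state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedBlock:
    """Toy block: its parameters are fixed once and reused on every pass."""
    gain: float                 # fraction of the remaining gap closed per pass
    target: tuple[float, ...]   # stand-in for what a trained block would steer toward

    def __call__(self, state: list[float]) -> list[float]:
        # Residual-style update: refine the previous state rather than replace it.
        return [s + self.gain * (t - s) for s, t in zip(state, self.target)]


def run_looped(block: SharedBlock, state: list[float], n_loops: int) -> list[float]:
    """Apply the same block n_loops times; n_loops is an inference-time choice."""
    for _ in range(n_loops):
        state = block(state)
    return state


def max_error(state: list[float], target: tuple[float, ...]) -> float:
    return max(abs(s - t) for s, t in zip(state, target))


if __name__ == "__main__":
    block = SharedBlock(gain=0.5, target=(1.0, -2.0))
    for loops in (1, 2, 8):
        out = run_looped(block, [0.0, 0.0], loops)
        print(loops, out, max_error(out, block.target))
```

Because no parameters change between runs, spending more compute means only raising `n_loops`.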

9. What partner platforms provide access to ESM?

AWS Bio Discovery, Benchling AI, Modal, Phylo, SandboxAQ, Tamarind Bio, and Tool Universe. ESMFold2, ESMC, and ESM Atlas are available at the Biohub Platform.

10. What is the significance of ESM for medicine?

ESM enables computational protein binder design, validated against five clinically relevant targets in oncology and immunology. When digital representations of biology become accurate enough, protein designs can be tested computationally before they reach the bench. This is particularly promising for cancer and rare diseases, where much of the disease is specific to the individual.

Glossary of ESM Protein Biology

ESM (Evolutionary Scale Models)

A world model of protein biology comprising ESMFold2, ESM Atlas, and ESMC that learns from protein sequences produced by evolution.

ESMFold2

State-of-the-art protein structure prediction model using looped transformer architecture without requiring multiple sequence alignments.

ESMC (ESM Cambrian)

Protein language model trained on ~2.8 billion sequences from across all of life, providing state-of-the-art representations.

ESM Atlas

A map of 6.8 billion protein sequences and 1.1 billion predicted structures across all of life.

Protein Sequence

A chain built from 20 kinds of chemical building blocks (amino acids), whose order determines folding and function.

Protein Structure

The three-dimensional arrangement of atoms in a protein, determined by amino acid sequence.

Minibinder

Compact, de novo protein scaffolds with no predetermined structure used for designing protein binders computationally.

scFv (Single-chain Variable Fragment)

Antibody-derived molecules using unstructured loops to bind targets; demanding examples of the binder design problem.

Sparse Autoencoders (SAE)

Technique for identifying interpretable structure in large language models by decomposing internal representations.

Alpha Helix and Beta Sheet

The two primary secondary structure arrangements that form when a protein backbone folds.
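
To make the "Protein Sequence" entry concrete, the sketch below checks a sequence against the standard one-letter codes for the 20 canonical amino acids and counts its composition. The example sequences are arbitrary illustrations, and `invalid_residues` and `composition` are hypothetical helper names.

```python
from collections import Counter

# Standard one-letter codes for the 20 canonical amino acids.
CANONICAL_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence: str) -> list[str]:
    """Return the sorted set of characters that are not canonical amino acid codes."""
    return sorted(set(sequence.upper()) - CANONICAL_AMINO_ACIDS)


def composition(sequence: str) -> Counter:
    """Count how often each residue code occurs in the sequence."""
    return Counter(sequence.upper())


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # arbitrary example, not taken from the source
    print(invalid_residues(seq))
    print(composition(seq).most_common(3))
```

Ambiguity codes such as `X` or `B` are reported as invalid here; whether to accept them depends on the downstream model.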

How to Design Protein Binders with ESMFold2

Select Target Molecule

Choose a clinically relevant target such as EGFR, PDGFRβ, PD-L1, CTLA-4, or CD45. Define binding site and desired affinity.

Choose Binder Format

Select between minibinders (compact de novo scaffolds) or scFvs (single-chain variable fragment antibodies) based on therapeutic requirements.

Define Design Constraints

Establish parameters: required affinity (nanomolar potency), specificity requirements, and structural constraints for the binding interface.

Run ESMFold2 Design Algorithm

Use ESMFold2's design algorithm, which searches through a joint model of sequence and structure. Higher compute yields up to 70% minibinder success.

Evaluate Predicted Binders

Review computational predictions for binding affinity and selectivity. Select top candidates for experimental validation.

Validate in Laboratory

Test designed binders using cell-based assays to measure affinity and functional activity. Verify structure using cryo-EM if needed.

Iterate and Optimize

Use experimental feedback to refine design parameters. ESM enables rapid iteration - designs can be computationally validated before bench experiments.
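
The evaluation step in the workflow above amounts to filtering and ranking candidates before anything goes to the bench. A minimal sketch follows, assuming each design carries a predicted dissociation constant in nanomolar; the `Candidate` fields, the `shortlist` helper, and all values are hypothetical and do not reflect actual ESMFold2 outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    binder_format: str      # "minibinder" or "scFv"
    predicted_kd_nm: float  # hypothetical score; lower means tighter predicted binding


def shortlist(candidates: list[Candidate], max_kd_nm: float, top_k: int) -> list[Candidate]:
    """Keep candidates at or below the affinity cutoff, tightest binders first."""
    passing = [c for c in candidates if c.predicted_kd_nm <= max_kd_nm]
    return sorted(passing, key=lambda c: c.predicted_kd_nm)[:top_k]


if __name__ == "__main__":
    designs = [  # made-up values for illustration
        Candidate("design_a", "minibinder", 12.0),
        Candidate("design_b", "scFv", 4.3),
        Candidate("design_c", "minibinder", 85.0),
        Candidate("design_d", "scFv", 9.1),
    ]
    for c in shortlist(designs, max_kd_nm=10.0, top_k=2):
        print(c.name, c.binder_format, c.predicted_kd_nm)
```

Experimental results from the validation step can then be used to adjust `max_kd_nm` or the ranking key in the next iteration.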

SPARQL Workbench

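Query 1: All ESM Models (classes and instances)

PREFIX schema: <http://schema.org/>  # assumed namespace IRI
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?model ?name ?type ?desc
WHERE {
  { ?model a schema:SoftwareApplication ; schema:name ?name ; schema:description ?desc .
    BIND(schema:SoftwareApplication AS ?type) }
  UNION
  { ?model a rdfs:Class ; rdfs:label ?name .
    BIND(rdfs:Class AS ?type) }
  # COALESCE keeps the filter well-defined when ?desc is unbound (class branch)
  FILTER(CONTAINS(LCASE(?name), "esm") || CONTAINS(LCASE(COALESCE(?desc, "")), "protein"))
}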

Query 2: Therapeutic Targets with Bindings

PREFIX schema: <http://schema.org/>  # assumed namespace IRI

SELECT ?target ?name ?desc
WHERE {
  ?target a schema:Product ;
          schema:name ?name ;
          schema:description ?desc .
  FILTER(CONTAINS(?desc, "cancer") || CONTAINS(?desc, "immune") || CONTAINS(?name, "PD"))
}

Query 3: Organizations and Partners

PREFIX schema: <http://schema.org/>  # assumed namespace IRI
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?org ?name ?url ?sameAs
WHERE {
  ?org a schema:Organization ;
       schema:name ?name .
  OPTIONAL { ?org schema:url ?url }
  OPTIONAL { ?org owl:sameAs ?sameAs }
}

Query 4: Properties and Relationships

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?prop ?label ?range
WHERE {
  ?prop a rdf:Property ;
        rdfs:label ?label .
  OPTIONAL { ?prop rdfs:range ?range }
}

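Query 5: FAQ Questions and Answers

PREFIX schema: <http://schema.org/>  # assumed namespace IRI

SELECT ?question ?text ?answer
WHERE {
  # schema.org models the question text as schema:name and the answer via schema:acceptedAnswer
  ?question a schema:Question ;
            schema:name ?text .
  OPTIONAL { ?question schema:acceptedAnswer/schema:text ?answer }
}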

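Query 6: Glossary Terms

PREFIX schema: <http://schema.org/>  # assumed namespace IRI

SELECT ?term ?name ?definition
WHERE {
  # schema:DefinedTerm has no termName property; the term label is schema:name
  ?term a schema:DefinedTerm ;
        schema:name ?name ;
        schema:description ?definition .
}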

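Query 7: How-To Steps

PREFIX schema: <http://schema.org/>  # assumed namespace IRI
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?step ?name ?text ?position
WHERE {
  ?step a schema:HowToStep ;
        schema:name ?name ;
        schema:text ?text .
  # Read the step order from the data; COUNT without GROUP BY is not valid here
  OPTIONAL { ?step schema:position ?position }
}
ORDER BY xsd:integer(?position)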

Query 8: Technical Architecture Details

PREFIX schema: <http://schema.org/>  # assumed namespace IRI

SELECT ?section ?name ?text
WHERE {
  ?section a schema:ArticleSection ;
           schema:name ?name .
  OPTIONAL { ?section schema:text ?text }
  FILTER(CONTAINS(?name, "Technical") || CONTAINS(?name, "Architecture"))
}
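
Queries like these can also be sent to a SPARQL endpoint programmatically. Under the SPARQL 1.1 Protocol, a query can be passed as the `query` parameter of an HTTP GET request. The sketch below only builds the request and does not send it; the endpoint URL is a placeholder, since the source does not name one, and `build_sparql_request` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # placeholder; the real endpoint is not given

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label
WHERE { ?cls a rdfs:Class ; rdfs:label ?label }
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying the query in the `query` parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard JSON results format for SELECT queries.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.full_url)
    # The percent-encoding round-trips to the original query text.
    print(parse_qs(urlsplit(request.full_url).query)["query"][0] == QUERY)
```

Sending the request with `urllib.request.urlopen(request)` would return a JSON document whose `results.bindings` list holds one entry per result row, provided the endpoint supports that format.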