🤗 JEPA-WMs Pretrained Models
This 🤗 HuggingFace repository hosts the pretrained world model checkpoints from the paper "What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?": the JEPA-WM models, the DINO-WM and V-JEPA-2-AC baselines, and pretrained encoder checkpoints.
👉 See the main repository for training code and datasets.
**JEPA-WM world models**

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| jepa_wm_droid | DROID & RoboCasa | 256×256 | DINOv3 ViT-L/16 | 12 |
| jepa_wm_metaworld | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| jepa_wm_pusht | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| jepa_wm_pointmaze | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| jepa_wm_wall | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |
**DINO-WM baselines**

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| dino_wm_droid | DROID & RoboCasa | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_metaworld | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_pusht | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_pointmaze | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_wall | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |
**V-JEPA-2-AC baselines**

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| vjepa2_ac_droid | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
| vjepa2_ac_oss | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
**Pretrained encoders**

| Model | Encoder | Resolution |
|---|---|---|
| dinov2_vits_224 | DINOv2 ViT-S/14 | 224×224 |
| dinov2_vits_224_INet | DINOv2 ViT-S/14 | 224×224 |
| dinov3_vitl_256_INet | DINOv3 ViT-L/16 | 256×256 |
| vjepa2_vitg_256_INet | V-JEPA-2 ViT-G/16 | 256×256 |
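To see exactly which checkpoint files are published under the repository (the file names may differ slightly from the short model names in the tables above), you can list them with `huggingface_hub`. A minimal sketch, using the `facebook/jepa-wms` repo_id from the download example further down:

```python
from huggingface_hub import list_repo_files

# List every file hosted in the model repository; useful for finding the
# exact checkpoint file name to pass to hf_hub_download below.
files = list_repo_files("facebook/jepa-wms")
for f in sorted(files):
    print(f)
```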
import torch
# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')
# Load DINO-WM baselines
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')
# Load V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
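Each hub entry point returns a model together with its preprocessor. As a quick sanity check after loading (a minimal sketch, assuming the returned model behaves like a standard `torch.nn.Module`; the exact forward interface is defined in the jepa-wms repository), you can switch it to eval mode and count its parameters:

```python
import torch

# Illustrative check only: assumes the hub entry point returns a regular
# torch.nn.Module as its first element.
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model.eval()  # inference mode: disables dropout and batch-norm updates

n_params = sum(p.numel() for p in model.parameters())
print(f"jepa_wm_droid parameters: {n_params / 1e6:.1f}M")
```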
from huggingface_hub import hf_hub_download
import torch
# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar"
)
# Load checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys()) # dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
Note: This only downloads the weights. To instantiate the full model with the correct architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the jepa-wms repository and using the training/eval scripts.
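If you only need to inspect the raw weights rather than run the model, the sub-dictionaries listed above can be examined directly. A minimal sketch, assuming `encoder`, `predictor`, and `heads` are plain PyTorch state dicts mapping parameter names to tensors:

```python
import torch
from huggingface_hub import hf_hub_download

# Download a checkpoint and print a few tensor names and shapes from the
# encoder state dict (assumes the sub-dicts are ordinary state_dicts).
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar"
)
checkpoint = torch.load(checkpoint_path, map_location="cpu")

for name, tensor in list(checkpoint["encoder"].items())[:5]:
    print(name, tuple(tensor.shape))
```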
@misc{terver2025drivessuccessphysicalplanning,
title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
year={2025},
eprint={2512.24497},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.24497},
}
These models are licensed under CC-BY-NC 4.0.