Tags: Robotics · PyTorch · world-model · jepa · planning

# 🤖 JEPA-WMs Pretrained Models

[GitHub](https://github.com/facebookresearch/jepa-wms) · [Hugging Face](https://huggingface.co/facebook/jepa-wms) · [arXiv](https://arxiv.org/abs/2512.24497)

Meta AI Research, FAIR

This 🤗 Hugging Face repository hosts pretrained JEPA-WM world models.
👉 See the [main repository](https://github.com/facebookresearch/jepa-wms) for training code and datasets.

This repository contains pretrained world model checkpoints from the paper "What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?"

## Available Models

### JEPA-WM Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| jepa_wm_droid | DROID & RoboCasa | 256×256 | DINOv3 ViT-L/16 | 12 |
| jepa_wm_metaworld | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| jepa_wm_pusht | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| jepa_wm_pointmaze | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| jepa_wm_wall | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |

### DINO-WM Baseline Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| dino_wm_droid | DROID & RoboCasa | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_metaworld | Metaworld | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_pusht | Push-T | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_pointmaze | PointMaze | 224×224 | DINOv2 ViT-S/14 | 6 |
| dino_wm_wall | Wall | 224×224 | DINOv2 ViT-S/14 | 6 |

### V-JEPA-2-AC Baseline Models

| Model | Environment | Resolution | Encoder | Pred. Depth |
|---|---|---|---|---|
| vjepa2_ac_droid | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |
| vjepa2_ac_oss | DROID & RoboCasa | 256×256 | V-JEPA-2 ViT-G/16 | 24 |

### VM2M Decoder Heads

| Model | Encoder | Resolution |
|---|---|---|
| dinov2_vits_224 | DINOv2 ViT-S/14 | 224×224 |
| dinov2_vits_224_INet | DINOv2 ViT-S/14 | 224×224 |
| dinov3_vitl_256_INet | DINOv3 ViT-L/16 | 256×256 |
| vjepa2_vitg_256_INet | V-JEPA-2 ViT-G/16 | 256×256 |
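
For a quick check of which checkpoint files are actually published in this repo, the sketch below simply lists the repo contents; it assumes the checkpoints follow the `<model_name>.pth.tar` naming used in the download example in the Usage section.

```python
from huggingface_hub import list_repo_files

# List all files in the facebook/jepa-wms model repo and keep the
# checkpoint archives (.pth.tar, matching the download example below).
files = list_repo_files("facebook/jepa-wms")
checkpoints = sorted(f for f in files if f.endswith(".pth.tar"))
print("\n".join(checkpoints))
```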

## Usage

### Via PyTorch Hub (Recommended)

```python
import torch

# Load JEPA-WM models
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_droid')
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'jepa_wm_metaworld')

# Load a DINO-WM baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'dino_wm_metaworld')

# Load a V-JEPA-2-AC baseline
model, preprocessor = torch.hub.load('facebookresearch/jepa-wms', 'vjepa2_ac_droid')
```
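
A short, hedged follow-up to the calls above: `torch.hub.list` can enumerate the entry points exposed by the repo's hubconf, and the loaded model can be put in inference mode as usual. Whether `preprocessor` is a callable transform is an assumption based on the `(model, preprocessor)` return pair; see the main repository for the exact interface.

```python
# Enumerate the available entry points (these should match the model tables above).
print(torch.hub.list('facebookresearch/jepa-wms'))

# Standard inference setup for the loaded world model.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
```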

### Via Hugging Face Hub

```python
from huggingface_hub import hf_hub_download
import torch

# Download a specific checkpoint
checkpoint_path = hf_hub_download(
    repo_id="facebook/jepa-wms",
    filename="jepa_wm_droid.pth.tar",
)

# Load the checkpoint (contains 'encoder', 'predictor', and 'heads' state dicts)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
print(checkpoint.keys())  # dict_keys(['encoder', 'predictor', 'heads', 'opt', 'scaler', 'epoch', 'batch_size', 'lr', 'amp'])
```

Note: This only downloads the weights. To instantiate the full model with the correct architecture and load the weights, we recommend using PyTorch Hub (see above) or cloning the jepa-wms repository and using the training/eval scripts.
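
Building on the snippet above, here is a minimal sketch for inspecting how large each component stored in the checkpoint is; the nested-dict handling is deliberately defensive because the exact layout of `heads` is not documented on this card.

```python
# Count parameters per checkpoint component (encoder / predictor / heads).
def count_params(obj):
    if isinstance(obj, torch.Tensor):
        return obj.numel()
    if isinstance(obj, dict):
        return sum(count_params(v) for v in obj.values())
    return 0

for key in ("encoder", "predictor", "heads"):
    print(f"{key}: {count_params(checkpoint[key]) / 1e6:.1f}M parameters")
```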

## Citation

```bibtex
@misc{terver2025drivessuccessphysicalplanning,
      title={What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?},
      author={Basile Terver and Tsung-Yen Yang and Jean Ponce and Adrien Bardes and Yann LeCun},
      year={2025},
      eprint={2512.24497},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.24497},
}
```

## License

These models are licensed under CC BY-NC 4.0.
