Papers
Learning Personalized Agents from Human Feedback
Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation
Collection for Code World Model, an agentic coding model from FAIR.
A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann
-
facebook/vjepa2-vitl-fpc64-256
Video Classification • 0.3B • Updated • 46.4k • 182 -
facebook/vjepa2-vith-fpc64-256
Video Classification • 0.7B • Updated • 3.51k • 15 -
facebook/vjepa2-vitg-fpc64-256
Video Classification • 1B • Updated • 74.7k • 25 -
facebook/vjepa2-vitg-fpc64-384
Video Classification • 1B • Updated • 14.9k • 37
A collection of small (sub-1B) multilingual dense retrievers that generalize well across a number of tasks and languages.
Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 134 -
facebook/MobileLLM-125M
Text Generation • Updated • 760 • 130 -
facebook/MobileLLM-350M
Text Generation • Updated • 46 • 36 -
facebook/MobileLLM-600M
Text Generation • Updated • 7 • 29
Models continually pretrained using LayerSkip - https://arxiv.org/abs/2404.16710
-
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper • 2404.16710 • Published • 80 -
facebook/layerskip-llama2-7B
Text Generation • 7B • Updated • 10 • 15 -
facebook/layerskip-llama2-13B
Text Generation • 13B • Updated • 12 • 5 -
facebook/layerskip-llama2-70B
Text Generation • 69B • Updated • 4 • 5
A significant step towards removing language barriers through expressive, fast and high-quality AI translation.
-
Seamless: Multilingual Expressive and Streaming Speech Translation
Paper • 2312.05187 • Published • 14 -
facebook/seamless-m4t-v2-large
Automatic Speech Recognition • 2B • Updated • 62.4k • 957 -
Seamless M4T v2
📞 516 • Translate speech and text between languages
-
facebook/seamless-expressive
Text-to-Speech • Updated • 187
A collection for the first release of Wav2Vec 2.0, a speech encoder that learns powerful representations from unlabelled audio data.
-
facebook/wav2vec2-large-960h-lv60-self
Automatic Speech Recognition • Updated • 64.7k • 160 -
facebook/wav2vec2-large-960h
Automatic Speech Recognition • Updated • 16.1k • 34 -
facebook/wav2vec2-base-960h
Automatic Speech Recognition • 94.4M • Updated • 2.56M • 387 -
facebook/wav2vec2-base-100h
Automatic Speech Recognition • Updated • 1.69k • 7
A collection of multilingual Wav2Vec 2.0 checkpoints pre-trained on 53 languages and fine-tuned for CTC speech recognition.
-
facebook/wav2vec2-large-xlsr-53
Updated • 324k • 155 -
facebook/wav2vec2-xlsr-53-espeak-cv-ft
Automatic Speech Recognition • Updated • 374k • 46 -
facebook/wav2vec2-large-xlsr-53-dutch
Automatic Speech Recognition • Updated • 81 • 3 -
facebook/wav2vec2-large-xlsr-53-french
Automatic Speech Recognition • Updated • 718 • 13
A collection of "robust" Wav2Vec 2.0 checkpoints pre-trained on datasets from multiple domains.
-
facebook/wav2vec2-large-robust
Updated • 6.93k • 38 -
facebook/wav2vec2-large-robust-ft-libri-960h
Automatic Speech Recognition • 0.3B • Updated • 111k • 15 -
facebook/wav2vec2-large-robust-ft-swbd-300h
Automatic Speech Recognition • Updated • 4.84k • 20 -
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
Paper • 2104.01027 • Published • 1
A collection of checkpoints from the second VoxPopuli release.
-
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Paper • 2101.00390 • Published • 1 -
facebook/wav2vec2-base-bg-voxpopuli-v2
Automatic Speech Recognition • Updated • 5 • 2 -
facebook/wav2vec2-base-cs-voxpopuli-v2
Automatic Speech Recognition • Updated • 5 • 1 -
facebook/wav2vec2-base-da-voxpopuli-v2
Automatic Speech Recognition • Updated • 8
Text-to-speech models from fairseq S^2
A collection of stereo music generation models as part of the v2 MusicGen release.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
OPT (Open Pretrained Transformer) is a series of open-source large causal language models that perform similarly to GPT-3.
-
facebook/metaclip-2-worldwide-huge-quickgelu
Zero-Shot Image Classification • 2B • Updated • 8.33k • 17 -
facebook/metaclip-2-worldwide-huge-378
Zero-Shot Image Classification • 2B • Updated • 276 • 6 -
facebook/metaclip-2-worldwide-giant
Zero-Shot Image Classification • 4B • Updated • 95 • 7 -
facebook/metaclip-2-worldwide-giant-378
Zero-Shot Image Classification • 4B • Updated • 1.32k • 11
MobileLLM-R1, a series of sub-billion parameter reasoning models
DINOv3: foundation models producing excellent dense features, outperforming the state of the art without fine-tuning - https://arxiv.org/abs/2508.10104
-
facebook/dinov3-vit7b16-pretrain-lvd1689m
Image Feature Extraction • Updated • 22.4k • 211 -
facebook/dinov3-vits16-pretrain-lvd1689m
Image Feature Extraction • 21.6M • Updated • 108k • 67 -
facebook/dinov3-convnext-small-pretrain-lvd1689m
Image Feature Extraction • 49.5M • Updated • 30.5k • 22 -
facebook/dinov3-vitb16-pretrain-lvd1689m
Image Feature Extraction • 85.7M • Updated • 217k • 101
Scaling CLIP data with transparent training distribution from an end-to-end pipeline.
-
facebook/metaclip-h14-fullcc2.5b
Zero-Shot Image Classification • 1.0B • Updated • 14.9k • 49 -
facebook/metaclip-l14-fullcc2.5b
Zero-Shot Image Classification • Updated • 2.17k • 7 -
facebook/metaclip-b16-fullcc2.5b
Zero-Shot Image Classification • Updated • 2.13k • 11 -
facebook/metaclip-b32-fullcc2.5b
Zero-Shot Image Classification • Updated • 50 • 9
-
facebook/webssl-dino300m-full2b-224
Image Feature Extraction • 0.3B • Updated • 882 • 11 -
facebook/webssl-dino1b-full2b-224
Image Feature Extraction • 1B • Updated • 201 • 3 -
facebook/webssl-dino2b-full2b-224
Image Feature Extraction • 2B • Updated • 24 -
facebook/webssl-dino3b-full2b-224
Image Feature Extraction • 3B • Updated • 12
A first-of-its-kind behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks.
Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing
MelodyFlow: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching
Masked Audio Generation using a Single Non-Autoregressive Transformer
SeamlessM4T is designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly.
First release checkpoints for XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0.
A collection of open-source artefacts (datasets + checkpoints) from the first VoxPopuli release.
-
facebook/voxpopuli
Viewer • Updated • 1.26M • 12.5k • 143 -
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Paper • 2101.00390 • Published • 1 -
facebook/wav2vec2-base-100k-voxpopuli
Automatic Speech Recognition • Updated • 120 • 4 -
facebook/wav2vec2-base-10k-voxpopuli-ft-cs
Automatic Speech Recognition • Updated • 2
A collection of checkpoints from the HuBERT release, a speech encoder that learns powerful representations from unlabelled audio data.
-
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Paper • 2106.07447 • Published • 4 -
facebook/hubert-base-ls960
Feature Extraction • Updated • 191k • 66 -
facebook/hubert-large-ll60k
Feature Extraction • Updated • 418k • 33 -
facebook/hubert-large-ls960-ft
Automatic Speech Recognition • Updated • 213k • 76
DINOv2: foundation models producing robust visual features suitable for image-level and pixel-level visual tasks - https://arxiv.org/abs/2304.07193
-
facebook/dinov2-small
Image Feature Extraction • 22.1M • Updated • 2.77M • 57 -
facebook/dinov2-base
Image Feature Extraction • 86.6M • Updated • 1.42M • 169 -
facebook/dinov2-large
Image Feature Extraction • 0.3B • Updated • 747k • 102 -
facebook/dinov2-giant
Image Feature Extraction • 1B • Updated • 144k • 56
Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning.
Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens