# UniPic3-Teacher-Model

## Introduction
UniPic3-Teacher-Model is the high-quality teacher diffusion model used in the UniPic 3.0 framework. It is trained with full multi-step diffusion sampling and optimized for maximum perceptual quality, semantic consistency, and realism.
This model serves as the teacher backbone for:
- Distribution Matching Distillation (DMD)
- Consistency / trajectory distillation
- Few-step student model training
Rather than being optimized for fast inference, the teacher model prioritizes generation fidelity and stability, providing a strong and reliable supervision signal for downstream distilled models.
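The role of the teacher in Distribution Matching Distillation can be illustrated with a toy sketch. This is not the UniPic3 implementation: scores of 1-D Gaussians stand in for the frozen teacher score and the "fake" score estimated on student samples, and the means/step size are arbitrary illustrative values.

```python
import numpy as np

def score_gaussian(x, mean, std):
    """Score function d/dx log N(x; mean, std^2)."""
    return -(x - mean) / std**2

def dmd_gradient(x, teacher_mean=0.0, fake_mean=2.0, std=1.0):
    """DMD pushes student samples x along s_fake(x) - s_teacher(x):
    away from the fake (student) distribution, toward the teacher's."""
    return score_gaussian(x, fake_mean, std) - score_gaussian(x, teacher_mean, std)

# Student samples currently centered at the fake mean (2.0).
x = np.array([1.5, 2.0, 2.5])
x_updated = x - 0.5 * dmd_gradient(x)  # one gradient step toward the teacher mean (0.0)
```

In this toy setting the gradient is constant, so every sample shifts the same distance toward the teacher's distribution; in the real method both scores come from diffusion networks evaluated on noised samples.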
## Model Characteristics
- Role: Teacher model (not a distilled student)
- Sampling: Multi-step diffusion (high-fidelity)
- Architecture: Unified UniPic3 Transformer
- Tasks Supported:
  - Single-image editing
  - Multi-image composition (2–6 images)
  - Human–Object Interaction (HOI)
- Resolution: Flexible, within pixel budget constraints
- Training Objective:
  - Flow Matching / Diffusion loss
  - Used as teacher for DMD & consistency training
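The flow-matching objective listed above can be sketched with the standard linear-interpolant (rectified-flow style) formulation. This is a generic toy example, not UniPic3 training code; the batch shape and timestep are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x0, t, rng):
    """Given clean data x0 and a timestep t in [0, 1], return the noisy
    interpolant x_t and the velocity target the model must regress."""
    noise = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * noise   # straight-line path from data to noise
    v_target = noise - x0              # constant velocity along that path
    return x_t, v_target

x0 = rng.standard_normal((4, 8))       # a toy batch standing in for images
x_t, v_target = flow_matching_pair(x0, t=0.3, rng=rng)
# the training loss would be mean((model(x_t, t) - v_target) ** 2)
```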
## Benchmarks
This teacher model achieves state-of-the-art performance on:
- Image editing benchmarks
- Multi-image composition benchmarks
It provides high-quality supervision targets for distilled UniPic3 student models.
## Important Note

This repository hosts the teacher model. It is not optimized for few-step inference.

If you are looking for:

- 4–8 step fast inference
- Deployment-friendly distilled models

please refer to the UniPic3-DMD / distilled checkpoints instead.
## Usage (Teacher Model)

### 1. Clone the Repository

```bash
git clone https://github.com/SkyworkAI/UniPic
cd UniPic-3
```
### 2. Set Up the Environment

```bash
conda create -n unipic3 python=3.10
conda activate unipic3
pip install -r requirements.txt
```
### 3. Batch Inference

```bash
transformer_path="Skywork/Unipic3"

python -m torch.distributed.launch --nproc_per_node=1 --master_port 29501 --use_env \
    qwen_image_edit_fast/batch_inference.py \
    --jsonl_path data/val.jsonl \
    --output_dir work_dirs/output \
    --distributed \
    --num_inference_steps 50 \
    --true_cfg_scale 4.0 \
    --transformer "$transformer_path" \
    --skip_existing
```
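The batch script reads its inputs from a JSONL file. As a sketch of preparing one, the snippet below writes two records; the field names (`image`, `prompt`) are assumptions for illustration only, so check the example data in the UniPic repository for the actual schema `batch_inference.py` expects.

```python
import json
from pathlib import Path

# Hypothetical records; "image" and "prompt" are assumed field names,
# not the documented schema of batch_inference.py.
records = [
    {"image": "inputs/cat.png", "prompt": "Replace the background with a beach"},
    {"image": "inputs/room.png", "prompt": "Add a wooden table in the center"},
]

out = Path("data/val.jsonl")
out.parent.mkdir(parents=True, exist_ok=True)
with out.open("w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")  # one JSON object per line
```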
## License

This model is released under the MIT License.

## Citation
If you use Skywork-UniPic in your research, please cite:
```bibtex
@article{wang2025skywork,
  title={Skywork unipic: Unified autoregressive modeling for visual understanding and generation},
  author={Wang, Peiyu and Peng, Yi and Gan, Yimeng and Hu, Liang and Xie, Tianyidan and Wang, Xiaokun and Wei, Yichen and Tang, Chuanxin and Zhu, Bo and Li, Changshi and others},
  journal={arXiv preprint arXiv:2508.03320},
  year={2025}
}

@article{wei2025skywork,
  title={Skywork unipic 2.0: Building kontext model with online rl for unified multimodal model},
  author={Wei, Hongyang and Xu, Baixin and Liu, Hongbo and Wu, Cyrus and Liu, Jie and Peng, Yi and Wang, Peiyu and Liu, Zexiang and He, Jingwen and Xietian, Yidan and others},
  journal={arXiv preprint arXiv:2509.04548},
  year={2025}
}

@article{wei2026skywork,
  title={Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling},
  author={Wei, Hongyang and Liu, Hongbo and Wang, Zidong and Peng, Yi and Xu, Baixin and Wu, Size and Zhang, Xuying and He, Xianglong and Liu, Zexiang and Wang, Peiyu and others},
  journal={arXiv preprint arXiv:2601.15664},
  year={2026}
}
```