⚡ Blackwell-native Vision Reasoning at the edge ⚡
Released an NVFP4A16 variant of nvidia/Cosmos-Reason2-2B:
embedl/Cosmos-Reason2-2B-NVFP4A16
Optimized for Blackwell with a minimal accuracy drop compared to its FP16 counterpart.
Thorough on-device benchmarks on the AGX Thor are in the model card. 🤗
Try it out:
docker run --rm -it \
--network host \
--shm-size=8g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--runtime=nvidia \
--name=vllm-serve \
-e HF_TOKEN=hf_*** \
-e HF_HOME=/root/.cache/huggingface \
nvcr.io/nvidia/vllm:26.01-py3 \
vllm serve "embedl/Cosmos-Reason2-2B-NVFP4A16" \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--max-model-len 16384 \
--gpu-memory-utilization 0.9
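Once the container is up, vLLM exposes an OpenAI-compatible API on port 8000. Below is a minimal sketch of a vision chat request against that endpoint; the image URL and prompt are placeholders, and the payload shape follows the OpenAI chat-completions format that vLLM's server accepts:

```python
import json

# vLLM's OpenAI-compatible server listens on /v1/chat/completions.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 256,
    }

payload = build_chat_request(
    "embedl/Cosmos-Reason2-2B-NVFP4A16",
    "Describe what is happening in this scene.",
    "https://example.com/frame.jpg",  # placeholder image URL
)
print(json.dumps(payload, indent=2))
# Send it with, e.g., requests.post(ENDPOINT, json=payload).json()
```

The same request can of course be sent with curl; the Python helper just makes the message structure explicit.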