⚡ Blackwell-native Vision Reasoning at the edge ⚡
Released an NVFP4A16 variant of nvidia/Cosmos-Reason2-2B:
embedl/Cosmos-Reason2-2B-NVFP4A16
Optimized for Blackwell with a minimal accuracy drop compared to its FP16 counterpart.
Thorough on-device benchmarks on the AGX Thor are in the model card. 🤗
Try it out:
docker run --rm -it \
--network host \
--shm-size=8g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--runtime=nvidia \
--name=vllm-serve \
-e HF_TOKEN=hf_*** \
-e HF_HOME=/root/.cache/huggingface \
nvcr.io/nvidia/vllm:26.01-py3 \
vllm serve "embedl/Cosmos-Reason2-2B-NVFP4A16" \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 1 \
--max-model-len 16384 \
--gpu-memory-utilization 0.9
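Once the container is up, vLLM exposes an OpenAI-compatible API on port 8000. Below is a minimal sketch of a vision chat request against that endpoint; the image URL and prompt are placeholders, and the payload shape follows the OpenAI chat-completions format that vLLM's server accepts:

```python
import json

# vLLM's OpenAI-compatible server listens on /v1/chat/completions.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 256,
    }

payload = build_chat_request(
    "embedl/Cosmos-Reason2-2B-NVFP4A16",
    "Describe what is happening in this scene.",
    "https://example.com/frame.jpg",  # placeholder image URL
)
print(json.dumps(payload, indent=2))
# Send it with, e.g., requests.post(ENDPOINT, json=payload).json()
```

The same request can of course be sent with curl; the Python helper just makes the message structure explicit.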