Model shape mismatch

#1
by twinsen123 - opened

Hi there,

We are hitting the issue below when running the model on MI300X with the suggested vLLM version.

It reports that param_data.shape and the loaded weight's shape do not match:

assert param_data.shape == loaded_weight.shape

docker run -it \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--shm-size 16G \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--cap-add=SYS_PTRACE \
--env VLLM_ROCM_USE_AITER=1 \
--env VLLM_DISABLE_COMPILE_CACHE=1 \
-p 8000:8000 \
-d \
rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210 \
bash -c "
python3 -m vllm.entrypoints.openai.api_server \
--model amd/MiniMax-M2.1-MXFP4 \
--gpu-memory-utilization 0.95 \
--max-model-len 196608 \
--kv-cache-dtype fp8 \
--enable-chunked-prefill false \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--quantization quark \
--trust_remote_code \
--enable-auto-tool-choice \
--host 0.0.0.0 \
--port 8000"

Any chance you could suggest how to fix it?

Hi, this is a model support issue in vLLM for MiniMax-M2.
Could you please add a patch that applies the packed_modules_mapping to MiniMaxM2Model?
Take this as a reference: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/minimax_vl_01.py#L182
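
For anyone hitting the same assert, this is roughly what that reference shows: vLLM model classes declare a packed_modules_mapping class attribute so the loader knows which per-projection checkpoint tensors are packed into one fused parameter. Below is a minimal sketch of such a patch; the class name (MiniMaxM2ForCausalLM), its base class, and the mapping keys are assumptions based on the usual fused qkv_proj / gate_up_proj layout, so check them against the actual MiniMax-M2 model file and the checkpoint's layer names before applying.

# Sketch only -- mirrors the packed_modules_mapping declared in
# minimax_vl_01.py (linked above). Class name, base classes, and the
# mapping keys are assumptions; keep the existing class definition and
# only add the attribute.
import torch.nn as nn

class MiniMaxM2ForCausalLM(nn.Module):  # existing MiniMax-M2 class in vLLM
    # Maps each fused module to the per-projection checkpoint tensors that
    # are packed into it, so the weight/quant-scale loaders can shard them
    # instead of copying whole tensors into mismatched parameters.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }

The quark quantization path uses this mapping to relate the per-projection tensors in the checkpoint to the fused modules vLLM builds, which is presumably why its absence shows up as the param_data.shape / loaded_weight.shape assert during loading.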
