inference-optimization/Qwen3-4B-Thinking-2507.w4a16 • Text Generation • 1B • Updated • 35
inference-optimization/GLM-4.6-FP8-dynamic • 353B • Updated • 1
inference-optimization/GLM-4.6-NVFP4 • 199B • Updated • 2
inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head • 8B • Updated • 1
inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head • 8B • Updated • 4
inference-optimization/Qwen3-Next-80B-A3B-Instruct-quantized.w8a8 • Updated
inference-optimization/Llama-3.1-8B-Instruct-HIGGS-quantized-paths • Updated
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-gate_up_proj-all
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-down_proj-all
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-qkv_proj-all • 5B • Updated • 2
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_DYNAMIC-out_proj-all • 5B • Updated • 1
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-gate_up_proj-all • 7B • Updated • 2
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-down_proj-all • 6B • Updated • 4
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-qkv_proj-all • 5B • Updated • 2
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-out_proj-all
inference-optimization/Qwen3-32B-QKV-Cache-FP8-Per-Tensor • 33B • Updated • 2
inference-optimization/Qwen3-32B-QKV-Cache-FP8-Per-Head • 33B • Updated • 3
inference-optimization/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • 33B • Updated • 1
inference-optimization/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Head • 33B • Updated • 1
inference-optimization/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor • 71B • Updated
inference-optimization/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Head • 71B • Updated
inference-optimization/Llama-3.3-70B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • 71B • Updated • 2
inference-optimization/Llama-3.3-70B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head • 71B • Updated • 1
inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor
inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor • 8B • Updated • 1
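The repo names above follow common quantization-scheme conventions (w4a16, w8a8, FP8-dynamic, NVFP4, FP8 KV cache per-tensor vs. per-head), which suggests these checkpoints are intended for direct serving with an engine such as vLLM. A minimal sketch, assuming the repos are publicly downloadable, vLLM is installed, and the GPU supports the chosen format (the exact quantization config is read from the checkpoint itself, not chosen here):

```shell
# Sketch only: serve one of the FP8-dynamic checkpoints with vLLM.
# The model path is taken from the listing above; the fp8 KV-cache flag
# matches the "QKV-Cache-FP8" naming but should be verified against the
# checkpoint's own config before use.
vllm serve inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head \
  --kv-cache-dtype fp8 \
  --max-model-len 8192
```

Swapping in any other repo name from the listing follows the same pattern; only the KV-cache flag would change for checkpoints whose names do not include a quantized cache.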