Support for FP8 + Fused MoE layers in vLLM?

#23
by szlevi - opened

When I try to run it as

```
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 --dtype auto
```

it starts up, then crashes with:

```
ValueError: For FP8 Fused MoE layers, only per-tensor scales for weights and activations are supported...
```
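For context, this error usually means the checkpoint stores its FP8 scales at a finer granularity (per-channel or per-block) than what vLLM's fused MoE kernel accepts, which is a single per-tensor scale for both weights and activations. A minimal sketch of the kind of check involved — the key names (`weight_scheme`, `activation_scheme`) are illustrative assumptions, not necessarily the exact fields in this model's `config.json`:

```python
def fp8_moe_compatible(quant_cfg: dict) -> bool:
    """Return True if an FP8 quantization config uses per-tensor scales
    for both weights and activations, which is what the fused MoE path
    expects. Key names below are assumptions for illustration, not the
    exact schema of this checkpoint's quantization_config."""
    weight_scheme = quant_cfg.get("weight_scheme", "")
    act_scheme = quant_cfg.get("activation_scheme", "")
    return weight_scheme == "per_tensor" and act_scheme == "per_tensor"

# Hypothetical configs, for illustration only.
per_tensor_cfg = {"quant_method": "fp8",
                  "weight_scheme": "per_tensor",
                  "activation_scheme": "per_tensor"}
per_channel_cfg = {"quant_method": "fp8",
                   "weight_scheme": "per_channel",
                   "activation_scheme": "per_tensor"}

print(fp8_moe_compatible(per_tensor_cfg))   # per-tensor scales: accepted
print(fp8_moe_compatible(per_channel_cfg))  # per-channel scales: rejected
```

If a check like this fails, the options are typically a build of vLLM whose fused MoE kernels support the checkpoint's scale granularity, or a checkpoint requantized with per-tensor scales.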

Meta Llama org

Can you try `vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -tp 8 --max-model-len=30000 --dtype auto`?

