Support for FP8 + Fused MoE layers in vLLM?

#23
by szlevi - opened

When I try to run it as

```
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 --dtype auto
```

it starts up, then crashes with:

```
ValueError: For FP8 Fused MoE layers, only per-tensor scales for weights and activations are supported...
```
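For context, this error usually means the checkpoint stores its FP8 scales at a finer granularity (per-channel or per-block) than what vLLM's fused MoE kernel accepts, which is a single per-tensor scale for both weights and activations. A minimal sketch of the kind of check involved — the key names (`weight_scheme`, `activation_scheme`) are illustrative assumptions, not necessarily the exact fields in this model's `config.json`:

```python
def fp8_moe_compatible(quant_cfg: dict) -> bool:
    """Return True if an FP8 quantization config uses per-tensor scales
    for both weights and activations, which is what the fused MoE path
    expects. Key names below are assumptions for illustration, not the
    exact schema of this checkpoint's quantization_config."""
    weight_scheme = quant_cfg.get("weight_scheme", "")
    act_scheme = quant_cfg.get("activation_scheme", "")
    return weight_scheme == "per_tensor" and act_scheme == "per_tensor"

# Hypothetical configs, for illustration only.
per_tensor_cfg = {"quant_method": "fp8",
                  "weight_scheme": "per_tensor",
                  "activation_scheme": "per_tensor"}
per_channel_cfg = {"quant_method": "fp8",
                   "weight_scheme": "per_channel",
                   "activation_scheme": "per_tensor"}

print(fp8_moe_compatible(per_tensor_cfg))   # per-tensor scales: accepted
print(fp8_moe_compatible(per_channel_cfg))  # per-channel scales: rejected
```

If a check like this fails, the options are typically a build of vLLM whose fused MoE kernels support the checkpoint's scale granularity, or a checkpoint requantized with per-tensor scales.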

Meta Llama org

Can you try `vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -tp 8 --max-model-len=30000 --dtype auto`?

