Support for FP8 + Fused MoE layers in vLLM?
#23
by szlevi - opened
When I try to run it as
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 --dtype auto
it starts up and then crashes with:
❌ ValueError: For FP8 Fused MoE layers, only per-tensor scales for weights and activations are supported...
Can you try vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 -tp 8 --max-model-len=30000 --dtype auto?
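Written out as a full invocation, the suggestion looks like the sketch below. Note that -tp 8 (tensor parallelism across 8 GPUs) and the 30000-token context cap are assumptions about the available hardware; Maverick's FP8 checkpoint is too large for a single GPU, so the sharding degree should match however many GPUs are actually present:

```shell
# Hypothetical invocation, assuming an 8-GPU node.
# -tp 8 shards the model across 8 GPUs via tensor parallelism.
# --max-model-len=30000 caps the context length, reducing KV-cache memory.
# --dtype auto lets vLLM pick the activation dtype from the checkpoint config.
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 \
    -tp 8 \
    --max-model-len=30000 \
    --dtype auto
```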