# mlx-community/gemma-4-e4b-it-OptiQ-4bit
A 4-bit mixed-precision MLX quant of google/gemma-4-e4b-it, produced by mlx-optiq, the sensitivity-aware quantization toolkit for Apple Silicon. Per-layer bit-widths come from a KL-divergence sensitivity pass on the bundled optiq.jsonl five-domain calibration mix (prose · reasoning · code · agent · tool-call). Sensitive layers go to 8-bit; robust ones stay at 4-bit. The on-disk size is within ~5% of a stock uniform 4-bit MLX quant.
## Quantization details
| Property | Value |
|---|---|
| Predominant precision | 4-bit |
| Layers at 8-bit (sensitive) | 155 |
| Layers at 4-bit (robust) | 224 |
| Total quantized layers | 379 |
| Group size | 64 |
| Calibration mix | optiq.jsonl (32 samples × 5 domains) |
| Reference for sensitivity | bf16 (auto-resolved; falls back to uniform-4-bit if bf16 doesn't fit) |
We follow the same naming convention llama.cpp uses for Q4_K_M and similar mixed-precision quants: the "4-bit" label refers to the predominant precision, not the weighted average. The mixed allocation is what lets this build beat a stock uniform 4-bit quant at the same disk size. Benchmark deltas are below.
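The sensitivity pass can be sketched in a few lines. This is an illustrative reconstruction, not the mlx-optiq implementation: the layer names, the KL values, and the `budget_frac` knob are all hypothetical (the real tool sizes the 8-bit set so the result stays within ~5% of a uniform 4-bit quant).

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two probability distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def allocate_bits(layer_kl, budget_frac=0.4):
    """Send the most sensitive fraction of layers to 8-bit, the rest to 4-bit.

    layer_kl: {layer_name: mean KL vs the bf16 reference measured when that
    layer alone is quantized to 4-bit}. budget_frac is a hypothetical knob.
    """
    ranked = sorted(layer_kl, key=layer_kl.get, reverse=True)
    n_8bit = round(len(ranked) * budget_frac)
    return {name: 8 if i < n_8bit else 4 for i, name in enumerate(ranked)}

# Toy example with made-up sensitivities: the highest-KL layer gets 8-bit.
bits = allocate_bits(
    {"attn.q_proj": 0.9, "mlp.up_proj": 0.1, "attn.k_proj": 0.5},
    budget_frac=1 / 3,
)
```

Note that 155 of 379 layers at 8-bit on this build corresponds to a fraction of roughly 0.41.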
## Usage
Load it with mlx-lm and use it as usual:
```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-e4b-it-OptiQ-4bit")
response = generate(
    model, tokenizer,
    prompt="Explain quantum computing in simple terms.",
    max_tokens=200,
)
```
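For multi-turn chat, the prompt should go through the model's chat template rather than raw text; in practice, prefer `tokenizer.apply_chat_template`, since the template shipped with the checkpoint is authoritative. As a standalone sketch, assuming this family uses the usual Gemma `<start_of_turn>` markers:

```python
def gemma_chat_prompt(messages):
    """Format a list of {"role", "content"} dicts in the Gemma turn style.

    Sketch only -- real code should call tokenizer.apply_chat_template.
    """
    parts = []
    for m in messages:
        # Gemma templates use "model" for the assistant role.
        role = "model" if m["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    # Open the model turn so generation continues from here.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = gemma_chat_prompt([{"role": "user", "content": "Hi!"}])
```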
For more (mixed-precision KV-cache serving, sensitivity-aware LoRA fine-tuning, OpenAI + Anthropic-compatible inference server, hot-swap mounted adapters, sandboxed Python execution for agent workflows), install mlx-optiq:
pip install mlx-optiq
See the Gemma-4 family guide on mlx-optiq.com for sampling defaults, training recipes, and family-specific caveats.
## Benchmarks
Five-metric suite that drives the Capability Score (IFEval is counted once; both strict and loose results are reported):
| Metric | Score |
|---|---|
| MMLU (5-shot, 1000 samples) | 58.8% |
| GSM8K (1000 samples, 3-shot CoT) | 77.8% |
| IFEval (full set, strict) | 70.6% |
| IFEval (full set, loose) | 70.8% |
| BFCL-V3 simple (200 single-turn calls) | 69.0% |
| HumanEval (164 problems, pass@1) | 76.8% |
| Capability Score (unweighted mean of the five benchmarks, IFEval counted once) | 70.6 |
| KL vs bf16 reference (mean / p95) | 0.2755 / 1.3460 |
| On-disk size | 6.1 GB |
The Capability Score is the simple unweighted mean of the five benchmarks. Every metric gets one equal vote. Disk size is reported next to it as an honest second axis instead of being folded into the score. See the eval-framework writeup for the full methodology.
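As a sanity check, the score reproduces from the table above. The card does not say which IFEval variant enters the mean; this sketch assumes strict (loose rounds to the same value).

```python
# The five scored benchmarks; numbers copied from the table on this card.
scores = {
    "MMLU": 58.8,
    "GSM8K": 77.8,
    "IFEval (strict)": 70.6,  # assumption: strict is the scored variant
    "BFCL-V3 simple": 69.0,
    "HumanEval": 76.8,
}
capability = sum(scores.values()) / len(scores)  # unweighted mean
print(round(capability, 1))  # 70.6
```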
## Links
- Project website: mlx-optiq.com
- Gemma-4 family guide: mlx-optiq.com/docs/gemma-4
- PyPI: pypi.org/project/mlx-optiq
- Calibration mix: mlx-optiq.com/blog/calibration-mix
- Eval framework: mlx-optiq.com/blog/eval-framework
- Base model: google/gemma-4-e4b-it
## License
Gemma license (inherits from base model). See https://ai.google.dev/gemma/terms for the terms of use.