inference-optimization's collections (4)

NVIDIA-Nemotron-3-Nano-30B-A3B Quantized Models
FP8-dynamic, FP8-block, NVFP4, and INT4 versions of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
Mixed Precision Models
Collection of Mixed Precision LLaMA and Qwen Models
KV Cache Quantization
Collection on FP8 quantization of weights, activations, and KV cache