
NVFP4 GGUF?

#3
by andrew-stanton - opened

My understanding is that NVIDIA trained it end-to-end in NVFP4, similar to how GPT-OSS-120B/20B were trained in MXFP4. I looked at the MXFP4_MOE quants you provided, and it appears the majority of the tensors are actually in F32 and Q8. Any plans to release the natively trained NVFP4 model in GGUF?
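For anyone who wants to verify the per-tensor types themselves, here's a minimal sketch using the `gguf` Python package that ships with llama.cpp (gguf-py); the model filename is hypothetical:

```python
# Sketch: tally which quantization types a GGUF file actually uses,
# assuming the `gguf` package from llama.cpp's gguf-py.
from collections import Counter

def quant_histogram(tensors):
    """Count tensors per quantization type from (name, type_name) pairs."""
    return Counter(type_name for _name, type_name in tensors)

# Usage with a real .gguf file (path is a placeholder):
#   from gguf import GGUFReader
#   reader = GGUFReader("model-MXFP4_MOE.gguf")
#   hist = quant_histogram((t.name, t.tensor_type.name) for t in reader.tensors)
#   print(hist)  # shows how many tensors are F32, Q8_0, MXFP4, etc.
```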

It looks like llama.cpp support for NVFP4 was merged today?

https://github.com/ggml-org/llama.cpp/pull/19769

Unsloth AI org

> It looks like llama.cpp support for NVFP4 was merged today?
>
> https://github.com/ggml-org/llama.cpp/pull/19769

We'll see what we can do. The llama.cpp team is always cooking

Any update on this?
Even for models that were not trained natively in NVFP4, this format would be of great use for Blackwell users.
