NVFP4 GGUF?
#3
by andrew-stanton - opened
My understanding is that NVIDIA trained it end to end in NVFP4, similar to how GPT-OSS-120B/20B did for MXFP4. I looked at the MXFP4_MOE quants you provided and it appears the majority of the tensors are actually in F32 and Q8. Any plans to release the natively trained NVFP4 model in GGUF?
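For anyone who wants to reproduce the tensor-type check described above, here is a minimal stdlib-only sketch that parses a GGUF header and counts how many tensors use each quant type. It is a simplified reader, not a full implementation: it assumes a file with zero metadata key-value pairs, skips alignment and tensor data entirely, and only maps a few well-known ggml type IDs (F32=0, F16=1, Q8_0=8); the `fake_header` helper is purely illustrative. A real inspection would use the `gguf` Python package that ships with llama.cpp.

```python
import io
import struct

# Subset of ggml tensor type IDs (from ggml.h); unknown IDs are
# reported as "type_<n>". NVFP4's ID is not included here since it
# was only just merged.
GGML_TYPES = {0: "F32", 1: "F16", 8: "Q8_0"}

GGUF_MAGIC = 0x46554747  # little-endian "GGUF"


def _read_str(f):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")


def tensor_type_counts(f):
    """Count tensors per quant type from a GGUF stream.

    Simplification: assumes the metadata KV count is zero, so the
    tensor-info records follow the header directly.
    """
    magic, version = struct.unpack("<II", f.read(8))
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    if n_kv != 0:
        raise NotImplementedError("metadata parsing omitted in this sketch")
    counts = {}
    for _ in range(n_tensors):
        _read_str(f)                                  # tensor name
        (n_dims,) = struct.unpack("<I", f.read(4))
        f.read(8 * n_dims)                            # shape (uint64 each)
        (ttype,) = struct.unpack("<I", f.read(4))
        f.read(8)                                     # data offset
        key = GGML_TYPES.get(ttype, f"type_{ttype}")
        counts[key] = counts.get(key, 0) + 1
    return counts


def fake_header(tensors):
    """Build a header-only GGUF blob for demo purposes (hypothetical data)."""
    out = struct.pack("<IIQQ", GGUF_MAGIC, 3, len(tensors), 0)
    for name, dims, tid in tensors:
        nb = name.encode("utf-8")
        out += struct.pack("<Q", len(nb)) + nb
        out += struct.pack("<I", len(dims))
        for d in dims:
            out += struct.pack("<Q", d)
        out += struct.pack("<IQ", tid, 0)
    return out


demo = fake_header([
    ("blk.0.attn_q.weight", (4096, 4096), 8),  # Q8_0
    ("output_norm.weight", (4096,), 0),        # F32
])
print(tensor_type_counts(io.BytesIO(demo)))  # → {'Q8_0': 1, 'F32': 1}
```

Pointing a reader like this at the MXFP4_MOE quants is how you would confirm the F32/Q8 mix described above, once the metadata-skipping gap is filled in.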
It looks like llama.cpp support for NVFP4 was merged today?
We'll see what we can do. The llama.cpp team is always cooking.
Any update on this?
Even models that were not trained natively in NVFP4 would be of great use in this format for Blackwell users.