vlut.cpp

SOTA ternary-packed versions of 1.58-bit LLMs for efficient on-device inference with vlut.cpp.

XXXXyu/Llama3-8B-1.58-100B-tokens-vlut-gguf • Text Generation • 8B • Updated Jan 1 • 49
XXXXyu/bitnet_b1_58-3B-vlut-gguf • Text Generation • 3B • Updated Jan 1 • 91
XXXXyu/Falcon3-1B-Instruct-1.58bit-vlut-gguf • Text Generation • 2B • Updated Jan 1 • 55