Ed Addario PRO

eaddario

EAddario

AI & ML interests

Finding ways to optimize LLMs' inference performance in resource-constrained environments (e.g. commodity hardware, desktops, laptops, mobiles, edge devices, etc.)

Recent Activity

new activity 1 day ago

eaddario/Qwen3.5-9B-GGUF:Amazing quants

new activity 1 day ago

eaddario/imatrix-calibration:Great collection, I'm using it for my little project.

posted an update 1 day ago

Experimental global target bits-per-weight (BPW) quantization of google/gemma-4-E2B-it, google/gemma-4-E4B-it and google/gemma-4-26B-A4B-it.

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most and produces high-quality models that meet a precise global file size target.

Key Advantages:
- VRAM Maximization: Can generate high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM).
- Data-Driven Precision: The quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs.

Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards:
https://huggingface.co/eaddario/gemma-4-E2B-it-GGUF
https://huggingface.co/eaddario/gemma-4-E4B-it-GGUF
https://huggingface.co/eaddario/gemma-4-26B-A4B-it-GGUF
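To make the idea of a global bits-per-weight budget concrete, here is a minimal Python sketch. It is not the method used for these models or anything from llama.cpp; it is a simple greedy heuristic written for illustration. The `Tensor` class, the `error_at_bpw` estimates, and both function names are hypothetical. It shows two things: converting a file size budget into a target BPW, and spending that budget on the tensors where extra precision reduces estimated error the most.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    """Hypothetical description of one model tensor."""
    name: str
    n_weights: int
    # Assumed input: estimated quantization error at each candidate
    # bits-per-weight level (lower error is better).
    error_at_bpw: dict[float, float]


def target_bpw(size_budget_bytes: int, total_weights: int) -> float:
    """Average bits per weight that fits exactly in the size budget."""
    return size_budget_bytes * 8 / total_weights


def allocate(tensors: list[Tensor], budget_bits: float) -> dict[str, float]:
    """Greedy sketch: start every tensor at its lowest precision, then
    repeatedly apply the upgrade with the best error reduction per extra bit
    that still fits in the global budget."""
    choice = {t.name: min(t.error_at_bpw) for t in tensors}
    used = sum(choice[t.name] * t.n_weights for t in tensors)
    if used > budget_bits:
        raise ValueError("budget is too small even at the lowest precision")

    while True:
        best = None  # (gain per bit, tensor, next bpw, extra bits)
        for t in tensors:
            cur = choice[t.name]
            higher = sorted(b for b in t.error_at_bpw if b > cur)
            if not higher:
                continue  # already at the highest candidate precision
            nxt = higher[0]
            extra = (nxt - cur) * t.n_weights
            if used + extra > budget_bits:
                continue  # this upgrade would exceed the global budget
            gain = (t.error_at_bpw[cur] - t.error_at_bpw[nxt]) / extra
            if best is None or gain > best[0]:
                best = (gain, t, nxt, extra)
        if best is None or best[0] <= 0:
            break  # nothing affordable improves the estimated error
        _, t, nxt, extra = best
        choice[t.name] = nxt
        used += extra
    return choice


# Toy example with made-up numbers: tensor A is far more sensitive than B.
tensors = [
    Tensor("A", 1000, {2: 10.0, 4: 2.0, 8: 1.0}),
    Tensor("B", 1000, {2: 3.0, 4: 2.5, 8: 2.4}),
]
bpw = target_bpw(size_budget_bytes=750, total_weights=2000)
mix = allocate(tensors, budget_bits=bpw * 2000)
print(bpw, mix)
```

In this toy case the budget works out to an average of 3 bits per weight, and the sketch spends the extra bits on the sensitive tensor A while leaving B at the lowest precision. A real implementation would also need to account for file metadata, per-type block overhead, and how the sensitivity estimates are obtained, none of which is modeled here.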

eaddario's datasets (1)

eaddario/imatrix-calibration

Viewer • Updated 10 days ago • 299 • 17.3k • 38