
Google's QAT Q4_0 GGUFs have more precision than Unsloth's Q4_K_XL (at least some of them)

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I wanted to try the new QATs and opened two collections on HF (which HF search found for me):

https://huggingface.co/collections/google/gemma-4-qat-q4-0

https://huggingface.co/collections/unsloth/gemma-4-qat

One strange thing caught my attention. For example, for E4B:

Google: https://huggingface.co/google/gemma-4-E4B-it-qat-q4_0-gguf/resolve/main/gemma-4-E4B_q4_0-it.gguf (5.15 GB)

Unsloth: https://huggingface.co/unsloth/gemma-4-E4B-it-qat-GGUF/resolve/main/gemma-4-E4B-it-qat-UD-Q4_K_XL.gguf (4.22 GB)

How can a _0 file be larger than a _K_XL one, I thought. So I checked them* (see how at the end).

One from Google:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|-------|-----------|-------------|----------------|-------------|
| q6_k | 0.75 | 2 | 3,489,660,928 | 2.44 GiB |
| q4_0 | 0.5 | 342 | 3,945,267,200 | 1.84 GiB |
| f16 | 2.0 | 1 | 27,525,120 | 52.50 MiB |
| f32 | 4.0 | 321 | 560,426 | 2.14 MiB |

From Unsloth:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|-------|-----------|-------------|----------------|-------------|
| q4_0 | 0.5 | 345 | 7,462,453,248 | 3.47 GiB |
| f32 | 4.0 | 321 | 560,426 | 2.14 MiB |
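The size gap can be roughly reproduced from the element counts alone. The sketch below is not the tool used in the post; it takes the element totals from the two E4B tables and applies the ggml block layouts (q4_0 stores 32 weights in 18 bytes, q6_k stores 256 weights in 210 bytes), which are somewhat larger than the nominal 0.5 and 0.75 bytes per element shown in the "Size Used" column. It assumes every tensor is a whole number of blocks, and the names `BLOCK`, `tensor_bytes`, `google_e4b` and `unsloth_e4b` are made up for illustration.

```python
# (weights per block, bytes per block) for the dtypes that appear in the tables.
# q4_0: 32 weights -> 2-byte fp16 scale + 16 bytes of packed 4-bit values.
# q6_k: 256 weights -> 210 bytes per super-block.
BLOCK = {
    "q4_0": (32, 18),
    "q6_k": (256, 210),
    "f16": (1, 2),
    "f32": (1, 4),
}

def tensor_bytes(elements_by_dtype):
    """Estimate tensor data size in bytes from per-dtype element totals."""
    total = 0
    for dtype, n_elements in elements_by_dtype.items():
        weights, nbytes = BLOCK[dtype]
        # Assumption: totals divide evenly into blocks.
        total += n_elements // weights * nbytes
    return total

# Element totals copied from the E4B tables in the post.
google_e4b = {
    "q6_k": 3_489_660_928,
    "q4_0": 3_945_267_200,
    "f16": 27_525_120,
    "f32": 560_426,
}
unsloth_e4b = {
    "q4_0": 7_462_453_248,
    "f32": 560_426,
}

google_bytes = tensor_bytes(google_e4b)
unsloth_bytes = tensor_bytes(unsloth_e4b)

print(f"Google  E4B tensor data: {google_bytes / 1e9:.2f} GB")
print(f"Unsloth E4B tensor data: {unsloth_bytes / 1e9:.2f} GB")
```

Under these assumptions the estimates come out at about 5.14 GB and 4.20 GB, close to the 5.15 GB and 4.22 GB shown on HF; the small remainder is presumably metadata, which this sketch does not count. Both files hold the same number of weight elements, so the difference comes from the tensors Google keeps in q6_k and f16.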

I have also checked other GGUFs from Google. E2B:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|-------|-----------|-------------|----------------|-------------|
| q6_k | 0.75 | 2 | 2,751,463,424 | 1.92 GiB |
| q4_0 | 0.5 | 275 | 1,863,057,408 | 888.38 MiB |
| f16 | 2.0 | 1 | 13,762,560 | 26.25 MiB |
| f32 | 4.0 | 263 | 286,243 | 1.09 MiB |

That looks like a _K_XL-style mix to me. The larger ones are almost entirely Q4_0 though (apart from a single q6_k tensor), e.g. 12B:

| Dtype | Size Used | Tensors Qty | Elements Total | Bytes Total |
|-------|-----------|-------------|----------------|-------------|
| q4_0 | 0.5 | 328 | 10,899,947,520 | 5.08 GiB |
| q6_k | 0.75 | 1 | 1,006,632,960 | 720.00 MiB |
| f32 | 4.0 | 338 | 770,096 | 2.94 MiB |

What I do not know, and would appreciate answers to, is why the E2B and E4B GGUFs contain additional tensors that the larger models do not have:

1 : f16 | per_layer_model_proj.weight | [1536, 8960]

2 : f32 | per_layer_proj_norm.weight | [256]

3 : q6_k | per_layer_token_embd.weight | [8960, 262144]
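This does not answer why these tensors exist, but a quick arithmetic check shows how much of the E2B file they account for. The sketch below multiplies out the shapes listed above and compares them with the E2B totals from the table. That the listing belongs to E2B is an inference from the f16 count matching exactly, and the suggestion that the remaining q6_k tensor is a [1536, 262144] embedding table is a guess from the numbers, not something the post states.

```python
# Element counts from the shapes listed in the post.
per_layer_model_proj = 1536 * 8960      # f16
per_layer_token_embd = 8960 * 262144    # q6_k

# Totals from the E2B table in the post.
e2b_f16_total = 13_762_560
e2b_q6_k_total = 2_751_463_424

# The single f16 tensor in E2B is exactly per_layer_model_proj.
f16_matches = per_layer_model_proj == e2b_f16_total

# What is left of the q6_k elements after the per-layer embedding table.
remaining_q6_k = e2b_q6_k_total - per_layer_token_embd

# Guess: the other q6_k tensor has shape [1536, 262144].
guess_matches = remaining_q6_k == 1536 * 262144

share = per_layer_token_embd / e2b_q6_k_total
print(f"f16 tensor accounted for: {f16_matches}")
print(f"per_layer_token_embd elements: {per_layer_token_embd:,}")
print(f"remaining q6_k elements: {remaining_q6_k:,} (1536 x 262144: {guess_matches})")
print(f"share of q6_k elements in per_layer_token_embd: {share:.1%}")
```

If the guess holds, the two q6_k tensors in the small models are both vocabulary-sized tables, and per_layer_token_embd.weight alone holds most of the q6_k elements.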
* How I checked: koboldcpp --analyze model.GGUF | vibe_coded.py. If you know how to sum up tensor data from GGUFs using the llama bundle, please let me know and I will compare the results with the vibe-coded tool. I have thought about putting the tool on GitHub, but I still do not know how to properly attribute AI usage.
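One possible way to cross-check the numbers is the gguf Python package (gguf-py, which lives in the llama.cpp repository). The sketch below is not the script from the post. It assumes that `GGUFReader` exposes a `tensors` list whose entries have `tensor_type`, `n_elements` and `n_bytes` attributes, which is worth verifying against the installed version; `summarize` and `read_gguf` are hypothetical helper names.

```python
from collections import defaultdict
import sys

def summarize(tensors):
    """Aggregate (dtype, n_elements, n_bytes) triples into per-dtype totals.

    Returns {dtype: (tensor_count, total_elements, total_bytes)}.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for dtype, n_elements, n_bytes in tensors:
        row = totals[dtype]
        row[0] += 1
        row[1] += int(n_elements)
        row[2] += int(n_bytes)
    return {dtype: tuple(row) for dtype, row in totals.items()}

def read_gguf(path):
    """Yield (dtype, n_elements, n_bytes) for every tensor in a GGUF file."""
    # Imported lazily so summarize() can be used without the package.
    # Assumption: installed via `pip install gguf`.
    from gguf import GGUFReader

    reader = GGUFReader(path)
    for tensor in reader.tensors:
        yield tensor.tensor_type.name, tensor.n_elements, tensor.n_bytes

if __name__ == "__main__" and len(sys.argv) > 1:
    summary = summarize(read_gguf(sys.argv[1]))
    # Largest dtype (by bytes) first.
    for dtype, (count, elements, nbytes) in sorted(
        summary.items(), key=lambda item: item[1][2], reverse=True
    ):
        print(f"{dtype:8} {count:6} {elements:16,} {nbytes / 2**30:8.2f} GiB")
```

Saved under a name of your choice and run with the path to a GGUF file as its only argument, it prints one row per dtype. Because `n_bytes` is read from the file rather than estimated from a nominal bytes-per-element figure, the byte totals may differ slightly from the tables in this post.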
submitted by /u/alex20_202020