https://github.com/ggerganov/llama.cpp#quantization
https://github.com/ggerganov/llama.cpp/pull/1684
Regarding your question: 13B at Q2_K seems to be on par with 7B at 16-bit.
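To see why that comparison is interesting, here is a back-of-the-envelope size estimate. The ~2.6 effective bits per weight for Q2_K is an assumption (the exact figure depends on the llama.cpp version and tensor mix), as are the round 7B/13B parameter counts:

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Assumed: ~2.6 effective bits/weight for Q2_K, round parameter counts.
q2k_13b = model_size_gb(13, 2.6)
fp16_7b = model_size_gb(7, 16)

print(f"13B Q2_K : ~{q2k_13b:.1f} GB")
print(f"7B fp16  : ~{fp16_7b:.1f} GB")
```

Under those assumptions the 2-bit 13B model needs roughly a third of the memory of the 16-bit 7B model, so matching its quality would be a clear win.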
That graph is great. Very easy to understand. Thank you!
These are good sources. To add one more, the GPTQ paper discusses perplexity at length across several quantization levels and model sizes:
Anyone else seeing 11 in the post's comment count but only 2 actual comments?
Yes
Well, a few of those extra numbers are my fault. I edited my answer a few times, and Lemmy reportedly counts every edit as an additional comment (when the user and the community are on different instances). I hope they fix that soon.
Ahh, makes sense. I just made a post and deleted the comment I made on it, but it glitched and deleted twice, so now my post has -1 comments lmao