https://github.com/ggerganov/llama.cpp#quantization
https://github.com/ggerganov/llama.cpp/pull/1684
Regarding your question: 13B at Q2_K seems to be on par with 7B at 16-bit.
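To see why that comparison is interesting, here is a back-of-the-envelope size estimate. The ~2.6 effective bits per weight for Q2_K is an assumption (the exact figure depends on the llama.cpp version and tensor mix), as are the round 7B/13B parameter counts:

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Assumed: ~2.6 effective bits/weight for Q2_K, round parameter counts.
q2k_13b = model_size_gb(13, 2.6)
fp16_7b = model_size_gb(7, 16)

print(f"13B Q2_K : ~{q2k_13b:.1f} GB")
print(f"7B fp16  : ~{fp16_7b:.1f} GB")
```

Under those assumptions the 2-bit 13B model needs roughly a third of the memory of the 16-bit 7B model, so matching its quality would be a clear win.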
That graph is great. Very easy to understand. Thank you!
These are good sources. To add one more, the GPTQ paper discusses perplexity at length across several quantization levels and model sizes:
Anyone else seeing 11 in the post's comment count but only 2 actual comments?
Yes
Well, a few of those extra numbers are my fault. I edited my answer a few times, and Lemmy reportedly counts every edit as an additional comment (when the user and the community are on different instances). I hope they fix that soon.
Ahh, makes sense. I just made a post and deleted the comment I made on it, but it glitched and deleted twice, so now my post has -1 comments lmao