is the 4k context length of llama2 for real?

actually-a-cat@sh.itjust.works · 1 year ago

is the 4k context length of llama2 for real?

h3ndrik@feddit.de · 1 year ago

Sorry, I didn’t find it. If I remember correctly, it was either about models whose foundation model was trained on a shorter context (2048 tokens?), or about the measurement/benchmark being too ‘synthetic’ and not meaningful for real-world scenarios, or something like that.

I read this: https://www.reddit.com/r/LocalLLaMA/comments/155vy0k/llama_2_too_repetitive/ (Possibly also related to this topic: https://arize.com/blog/lost-in-the-middle-how-language-models-use-long-contexts-paper-reading/ and https://github.com/THUDM/LongBench)

Also: I’ve played around a bit with Llama, and I haven’t had good results with summarization at all. Maybe it’s not the context length but the wrong model for the task? Aren’t there other language models out there that are specifically suited to summarization? Llama is more of a generalist and may just not be exceptionally good at this particular task.

https://huggingface.co/learn/nlp-course/chapter7/5?fw=tf#models-for-text-summarization and https://www.width.ai/post/bart-text-summarization

Regarding the original question: I’m not sure whether KoboldCPP handles the newer 4k context length correctly. For me it prints "Using automatic RoPE scaling (scale:1.000, base:32000.0)". But is that the correct base value? That’s the same as if I were using a LLaMA 1 model with an artificially increased context length.
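To illustrate why the base value matters, here is a minimal sketch of the standard RoPE angle computation (rotation angle = position × base^(-2i/d) per dimension pair), assuming a head dimension of 128 as in the LLaMA models. The function names are hypothetical, and treating `scale` as a plain position multiplier is an assumption about how the runtime interprets that setting, not something taken from KoboldCPP's code.

```python
import math


def rope_inv_freqs(head_dim: int, base: float) -> list[float]:
    """Inverse frequencies base^(-2i/d) for each rotated pair i = 0 .. d/2 - 1."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles(pos: int, head_dim: int = 128, base: float = 10000.0,
                scale: float = 1.0) -> list[float]:
    """Rotation angle (radians) applied to each pair at token position `pos`.

    `scale` multiplies the position (linear interpolation); 1.0 means no scaling.
    """
    return [pos * scale * f for f in rope_inv_freqs(head_dim, base)]


def longest_wavelength(head_dim: int, base: float) -> float:
    """Tokens needed for the slowest-rotating pair to complete one full turn."""
    return 2 * math.pi / rope_inv_freqs(head_dim, base)[-1]


if __name__ == "__main__":
    trained = rope_angles(3000, base=10000.0)  # base the model was trained with
    auto = rope_angles(3000, base=32000.0)     # base reported by the automatic scaling
    # Pair 0 rotates at 1 rad/token whatever the base is...
    print(trained[0], auto[0])
    # ...but slower pairs rotate less with base 32000, so a given position
    # no longer produces the angles the model saw during training.
    print(trained[32], auto[32])
    print(longest_wavelength(128, 10000.0), longest_wavelength(128, 32000.0))
```

A larger base stretches the slow frequencies, which is useful for a model natively trained on fewer tokens, but for a model already trained at the target length it just shifts positions away from what it learned.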

actually-a-cat@sh.itjust.works · 1 year ago

You are supposed to manually set the scale to 1.0 and the base to 10000 when using Llama 2 with a 4096-token context. The automatic scaling assumes the model was trained with a 2048-token context. Though, as I say in the OP, that still doesn’t work, at least with this particular fine-tune.
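A rough way to see why "scale 1.0, base 10000" is right for Llama 2 at 4096: the base model's native window is already 4096 tokens, so no stretching is needed. The hypothetical helper below picks settings using linear position interpolation (scale = trained / target); it is a sketch under that assumption, not KoboldCPP's automatic logic, which adjusts the base instead. The native lengths (2048 for LLaMA 1, 4096 for Llama 2) apply to the base models; a fine-tune may differ.

```python
# Native context lengths of the base models (fine-tunes may differ).
TRAINED_CTX = {"llama1": 2048, "llama2": 4096}
DEFAULT_BASE = 10000.0  # RoPE base used to train both LLaMA 1 and Llama 2


def pick_linear_rope(model: str, target_ctx: int) -> tuple[float, float]:
    """Return (scale, base) for linear RoPE interpolation (hypothetical helper).

    While the target context fits in the trained window, no scaling is needed;
    beyond it, positions are compressed by a factor of trained / target.
    """
    trained = TRAINED_CTX[model]
    if target_ctx <= trained:
        return 1.0, DEFAULT_BASE
    return trained / target_ctx, DEFAULT_BASE


if __name__ == "__main__":
    print(pick_linear_rope("llama2", 4096))  # Llama 2 at its native length
    print(pick_linear_rope("llama1", 4096))  # LLaMA 1 stretched to 2x
    print(pick_linear_rope("llama2", 8192))  # Llama 2 stretched to 2x
```

Assuming a runtime treats every model as 2048-native, it will apply scaling at 4096 even for Llama 2, which is why overriding the automatic values by hand makes sense.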

h3ndrik@feddit.de · 1 year ago

Aah, thank you very much. I’ve been wondering what KoboldCPP is supposed to do in this case. I don’t think this is mentioned in the documentation (at least not when I first tried it).