I am using a code-completion model for a project (will be open sourced very soon).
Qwen2.5-Coder 1.5B, however, tends to repeat what has already been written, or to change it slightly (see the video).
Is this intentional? I am passing the prefix and suffix to Ollama correctly, so it knows where the cursor currently is. I'm also trimming the number of lines it can see, so the time-to-first-token isn't too long.
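For what it's worth, the line-trimming step can be kept very simple. This is a minimal sketch (the function name and line budget are made up for illustration): keep the prefix lines closest to the cursor and the suffix lines just after it.

```python
def trim_context(prefix: str, suffix: str, max_lines: int = 30) -> tuple[str, str]:
    """Keep only the lines nearest the cursor so time-to-first-token stays low."""
    # Keep the last `max_lines` of the prefix (the code just above the cursor)...
    trimmed_prefix = "\n".join(prefix.splitlines()[-max_lines:])
    # ...and the first `max_lines` of the suffix (the code just below it).
    trimmed_suffix = "\n".join(suffix.splitlines()[:max_lines])
    return trimmed_prefix, trimmed_suffix
```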
Do you have a recommendation for a better code model, better suited for this?
If you want inline completions, you need a model that is trained on "fill in the middle" (FIM) tasks. On the Qwen Hugging Face page they even say that this is not supported and needs fine-tuning:
> We do not recommend using base language models for conversations. Instead, you can apply post-training, e.g., SFT, RLHF, continued pretraining, etc., or fill in the middle tasks on this model.
Models that can do it include:
- starcoder2
- codegemma
- codellama
Another option is to keep using the Qwen model, but instead of adding only a few lines, have it rewrite the entire function each time.
Have a look at the other comments. Sometimes it does fill in the code correctly, even without any prompting! The template specifically includes the fill-in-the-middle part.
The Ollama site shows the template with <fim_prefix> and such.
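For reference, assembling a FIM prompt from those tokens might look like the sketch below. This uses the StarCoder-style markers; the exact tokens vary per model, so check the template on the model's Ollama page before relying on them.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoder-style FIM tokens; other models (e.g. CodeLlama) use
    # different markers, so this layout is model-specific.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
```

The model then generates the "middle" that belongs between the prefix and suffix, which is exactly the inline-completion behavior described above.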