Fish Speech 1.5, an open source voice cloning TTS that's actually good

hok@lemmy.dbzer0.com · 4 days ago

You are right. Their description of “SOTA Open Source TTS” caused me to assume it was open source, but it’s clear that

This codebase and all models are released under CC-BY-NC-SA-4.0 License.

So, it’s “source available” and not released under a permissive licence.

hok@lemmy.dbzer0.com · 5 days ago

I followed their instructions here: https://speech.fish.audio/

I am using the locally-run API server to do inference: https://speech.fish.audio/inference/#http-api-inference
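Below is a minimal sketch of calling such a locally running server from Python, using only the standard library. Every specific detail is an assumption to check against the linked docs: that the server listens on `127.0.0.1:8080`, that it exposes a POST endpoint at `/v1/tts`, that it accepts a JSON body (the project's own client may send msgpack instead), and that `text` and `format` are the right field names. `build_tts_request` and `synthesize` are hypothetical helper names, not part of Fish Speech.

```python
import json
import urllib.request

# ASSUMPTION: host, port and path are guesses based on the docs' example
# server setup; adjust them to match how you launched the API server.
DEFAULT_URL = "http://127.0.0.1:8080/v1/tts"


def build_tts_request(text: str, url: str = DEFAULT_URL,
                      audio_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to speak `text`.

    ASSUMPTION: the server accepts JSON and uses the field names
    "text" and "format"; check the request schema in the docs.
    """
    body = json.dumps({"text": text, "format": audio_format}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, url: str = DEFAULT_URL) -> None:
    """Send the request to a running server and write the raw response bytes to disk."""
    req = build_tts_request(text, url)
    # Generous timeout: inference can be slow, especially on CPU.
    with urllib.request.urlopen(req, timeout=300) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, something like `synthesize("Hello there.", "hello.wav")` should save the returned audio. Voice cloning additionally needs a reference clip and its transcript; the linked docs describe how to supply them.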

I don’t know about other ways. To be clear, this is not (necessarily) an LLM; it’s just for speech synthesis, so you don’t run it on Ollama. That said, I think it does technically use Llama under the hood, since there are two models: one for encoding text and the other for decoding to audio. Honestly, the paper is terrible, but it explains the architecture somewhat: https://arxiv.org/pdf/2411.01156

hok@lemmy.dbzer0.com · 5 days ago

Fish Speech 1.5, an open source voice cloning TTS that's actually good

hok@lemmy.dbzer0.com · 6 days ago

On Lemmy, everything is a bit leftist at the moment.

hok@lemmy.dbzer0.com · 3 months ago

What models can we use for img2img today?