Stephen King: My Books Were Used to Train AI
One prominent author responds to the revelation that his writing is being used to coach artificial intelligence.

  • sab@lemmy.world · 1 year ago

    Humans with imperfect memories being influenced by a work <> AI language models being trained on a work.

    • P03 Locke@lemmy.dbzer0.com · 1 year ago

      You seem to imply that AI has perfect memory. It doesn’t.

      Stable Diffusion is a 4 GB file of weights. ChatGPT’s model is of a similar size. It is mathematically impossible for it to store the entire internet in a few GB of data, just like it is physically impossible for one human brain to store the entire internet in its neural network.
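
      A quick back-of-the-envelope illustration of why (the checkpoint size and the training-set size below are assumed round numbers, not exact figures):

      ```python
      # How many bytes of weights exist per training image? Rough numbers only.
      weights_bytes = 4 * 1024**3        # ~4 GB Stable Diffusion checkpoint (assumed)
      training_images = 2_000_000_000    # order of magnitude of the LAION subset used (assumed)

      bytes_per_image = weights_bytes / training_images
      print(f"~{bytes_per_image:.1f} bytes (~{bytes_per_image * 8:.0f} bits) of weights per training image")
      # => roughly 2 bytes per image, nowhere near enough to store the images themselves
      ```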

      • Turun@feddit.de · 1 year ago

        GPT-3/ChatGPT has 175 billion parameters, and GPT-4 is speculated to have 1.1 trillion. At full (32-bit) precision that’s 4 bytes per parameter, so roughly 700 GB and 4.4 TB respectively.
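
        The conversion is just parameter count times bytes per parameter; a minimal sketch (the GPT-4 figure is speculation, as noted):

        ```python
        # Model size on disk = parameter count * bytes per parameter.
        BYTES_PER_PARAM_FP32 = 4  # full (32-bit) precision

        for name, params in [("GPT-3", 175e9), ("GPT-4, speculated", 1.1e12)]:
            size_gb = params * BYTES_PER_PARAM_FP32 / 1e9
            print(f"{name}: {size_gb:,.0f} GB")
        # GPT-3: 700 GB; GPT-4, speculated: 4,400 GB (4.4 TB)
        ```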

        But the general point remains.

      • sab@lemmy.world · 1 year ago

        But you could easily fit all of King’s work in a 4 GB model. Just because the most popular models don’t do that doesn’t make it ethical to do in the first place.

        In my opinion, you should only be able to use a work to train an AI model if the work is in the public domain or if you have explicit permission to do so from the license holder. Especially if you then use that model for profit or charge others to use it.
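
        For a sense of scale, a rough estimate of King’s complete works as plain text (all counts below are assumed, illustrative-only figures):

        ```python
        # Very rough size of Stephen King's complete works as plain text.
        # All figures are assumed round numbers for illustration.
        novels = 65                   # roughly the number of published novels (assumed)
        words_per_novel = 150_000     # a typical long novel (assumed)
        bytes_per_word = 6            # ~5 characters plus a space, ASCII

        corpus_bytes = novels * words_per_novel * bytes_per_word
        model_bytes = 4 * 1024**3     # a 4 GB model file

        print(f"corpus ≈ {corpus_bytes / 1e6:.0f} MB, "
              f"about {corpus_bytes / model_bytes:.1%} of a 4 GB file")
        # ≈ 58 MB, around 1.4% of 4 GB, so the raw text would indeed fit easily
        ```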

        • P03 Locke@lemmy.dbzer0.com · 1 year ago

          But you could easily fit all of King’s work in a 4 GB model.

          But, uhhhh, they didn’t. They didn’t copy everything, word for word, and put it into a model. That’s not how AI models work.

          • sab@lemmy.world · 1 year ago

            I didn’t claim it was.

            We can discuss technicalities all day long, but that’s beside the point. Thread OP claimed that creating an LLM based on a copyrighted work is okay because humans are influenced by other works as well. But a human can’t crank out hundreds of Stephen King-like chapters per hour, or hundreds of Dalí-like paintings per minute.

            If King or Dalí had given permission for their works to be used in this way, it might have been a different story, but as it is, AI models are being trained on (and profit from) huge amounts of data that their creators did not have permission to use.

            Edit: nevermind, I think trying to discuss AI ethics with you is pointless. Have a nice weekend!

            • Turun@feddit.de · 1 year ago

              Thread OP here.

              Creating an AI that incorporates copyrighted work is indeed OK in my opinion, but I would view fine-tuning much more critically. A commercial AI that can accurately reproduce large parts of King’s work would be problematic. This is something I think we both agree on.

              But my point is that they can’t. His works make up a minuscule fraction of the training set. An AI cannot pump out hundreds of Stephen King-like chapters per hour, the same way Stephen King cannot write in the style of some other author, only crudely approximate it. So in my opinion it’s a non-issue, at least for a few more years until AI achieves superhuman performance.
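
              That fraction is easy to sanity-check with rough numbers (the corpus size is an assumption; the ~300B-token figure is roughly what was reported for GPT-3’s training run):

              ```python
              # What share of a GPT-3-scale training set could King's works be, at most?
              # Assumed round numbers for illustration.
              king_words = 65 * 150_000        # ~65 novels at ~150k words each (assumed)
              king_tokens = king_words * 1.3   # ~1.3 tokens per English word
              training_tokens = 300e9          # GPT-3 reportedly saw ~300B tokens

              print(f"share of training data ≈ {king_tokens / training_tokens:.4%}")
              # ≈ 0.0042%, a minuscule fraction of what the model saw
              ```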

    • Turun@feddit.de · 1 year ago

      Sure, if you want to see it like that. But if you try out Stable Diffusion and the like, you will notice that “imperfect memory” describes the AI as well. You can ask it for famous paintings and it will get the objects and colors generally correct, but only about as well as a human artist would; the details will be severely lacking. And that’s the best-case scenario for the AI, because famous paintings are overrepresented in the training data.
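
      If you want to try that experiment yourself, here is a minimal sketch using the Hugging Face diffusers library (it assumes a CUDA GPU and the Stable Diffusion v1.5 checkpoint; the prompt is just an example):

      ```python
      # Ask Stable Diffusion for a famous painting and compare the output
      # to the original: the gist is there, the details are not.
      # Requires: pip install diffusers transformers accelerate torch
      import torch
      from diffusers import StableDiffusionPipeline

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")

      image = pipe("The Great Wave off Kanagawa by Hokusai").images[0]
      image.save("wave.png")
      ```

      Even for a painting that appears many times in the training data, the output is the “imperfect memory” described above: a loose impression rather than a faithful copy.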