• 9 Posts
  • 25 Comments
Joined 7 months ago
Cake day: December 18th, 2023



  • It’s steady pressure, and it’s only in one direction. Some countries resist more than others. I’m guessing you are not in the EU, because if you were, you’d be aware of the “chat control” push.

    Even so, it’s not the days of Napster anymore. Think about hardware DRM. It stops no one, but you, too, paid to have it developed and built into your devices. Think about Content ID. That’s not going away; it’s only going to be expanded. That frog will be boiled.

    Recently, intellectual property has been reframed as being about “consensual use of data”. I think this is proving very effective. It’s no longer “piracy” or “theft”; it’s a violation of “consent”. The deepfake issue creates a direct link to sexual aggression. One US bill that ostensibly targets deepfakes would apply to any movie with a sex scene, making sharing it a federal felony.

  • I’ve been thinking a bit about how something like this could actually be implemented. Tell me if I’ve missed anything.

    Chat control is meant to detect the exchange of images/videos/etc., as well as “grooming” (making contact with the aim of committing abuse). The talk is always about children, but what is meant is everyone under 18. The definition of “child pornography” is extremely broad. The original version of Die Blechtrommel should really be banned now, too. I’ll have to look that up.

    In Germany there is still something of an exception for “sexting”. But if nudes somehow circulate in a class, or end up in internet forums, that is immediately a criminal offence. That means all flirting and sex messages, and all nude selfies, from or with minors are suspect from the outset.

    In principle, the authorities have to establish the age and the circumstances in every suspected case first. That is logistically challenging.

    It would have to be done like this: all minors (or people who look like minors) who come to attention through sexting go into a biometric database. The authorities know who owns which phone and how old they are. So as long as a relationship lasts, new material sent back and forth within it can be ignored. Once you have such a database of everyone who is sexually active online, you can concentrate on unusual patterns. Of course, each new cohort of adolescents would have to be enrolled as they reach puberty. It would probably be easiest to collect the biometric data from the outset, e.g. from school photos or school medical examinations.

    I don’t think it can be done with less effort. Or does anyone have an idea?

  • Private ownership ≠ capitalism.

    Right. It’s private ownership of capital, a.k.a. the means of production. You’re saying that data should be owned because it can be used productively. That’s exactly capitalism for capitalism’s sake.

    This is a typical economically right-wing approach. There is a problem, so you just create a new kind of property and call it done. The magic of the market takes care of it, or something. I don’t understand why one would expect a different result from trying the same thing.

  • Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?

    Hmm. I’m not really sure why anyone would write such a text. There is no “weighted proportionality” (or pathways). Is this a common conception?

    You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model which favors what it sees often over what it sees rarely. That’s a form of probability embedded, corresponding to a form of average.

    I guess you picked up on the fact that transformers output a probability distribution. I don’t think anyone calls those an average, though you could have an average distribution. Come to think of it, before you use that to pick the next token, you usually mess with it a little to make it more or less “creative”. That’s certainly no longer an average.
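
    A minimal Python sketch of that last step, assuming temperature scaling is the adjustment (it is one common choice, not necessarily what any given service does). The token list and logit values are made up for illustration, and the function names are hypothetical. Dividing the logits by a temperature before the softmax sharpens (T < 1) or flattens (T > 1) the distribution, so the probabilities actually sampled from are not the ones the model produced.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, rescaled by a temperature.

    temperature < 1 sharpens the distribution (favours the top token),
    temperature > 1 flattens it (more "creative" output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens.
tokens = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print(sample_next_token(tokens, logits, temperature=0.8))
```

    The loop prints three different distributions over the same tokens: whatever the model learned, the sampler works from a reshaped version of it.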

    You can see a neural net as a kind of regression analysis, though I don’t think I have ever heard anyone call that a kind of average. I’m also skeptical whether a transformer can be seen as a regression, but I don’t know this stuff well enough. When you train on some data more often than on other data, that is not how you would do a regression. Certainly, once you start RLHF training, you have left regression territory for good.

    The GPTisms might be because they are overrepresented in the finetuning data. It might also be from the RLHF and/or brought out by the system prompt.


  • I accidentally clicked reply, sorry.

    B) you do know there’s a lot of different definitions of average, right?

    I don’t think that any definition applies to this. But I’m no expert on averages. In any case, the training data is not representative of the internet or anything. It’s also not training equally on all data and not only on such text. What you get out is not representative of anything.

  • Who exactly creates the image is not the only issue, and maybe I gave it too much prominence. Another factor is that the use of copyrighted training data is still being negotiated/litigated in the US. It will help if they tread lightly.

    My opinion is that it has to be legal on First Amendment grounds, or more generally on freedom-of-expression grounds. Fair use (a US thing) derives from the First Amendment, though not exclusively. If AI services can’t be used for creating protected speech, like parody, then this severely limits what the average person can express.

    What worries me is that the major lawsuits involve Big Tech companies. They have an interest in far-reaching IP laws, just not quite far-reaching enough to cut off their R&D.