Bluesky may have said it won’t use user data to train generative AI, but someone else just published a dataset of a million Bluesky posts for “machine learning research”. The dataset is already very popular, so your data may have been scraped.
Probably not. An enormous amount of publicly available data on a single instance, like with Bluesky, is an AI scraper’s wet dream.
The fediverse, in contrast, has far fewer people, spread across perhaps HUNDREDS of instances. That’s a much less appealing effort-to-reward ratio for the scrapers…
I see. Probably mastodon.social gets scraped, then 🫣