There is an interesting study from May 2024, also linked in the article: “When Online Content Disappears”.

Historians of the future may struggle to understand fully how we lived our lives in the early 21st Century. That’s because of a potentially history-deleting combination of how we live our lives digitally – and a paucity of official efforts to archive the world’s information as it’s produced these days.

However, an informal group of organisations are pushing back against the forces of digital entropy – many of them operated by volunteers with little institutional support. None is more synonymous with the fight to save the web than the Internet Archive, an American non-profit based in San Francisco, started in 1996 as a passion project by internet pioneer Brewster Kahle. The organisation has embarked on what may be the most ambitious digital archiving project of all time, gathering 866 billion web pages, 44 million books, 10.6 million videos of films and television programmes and more. Housed in a handful of data centres scattered across the world, the collections of the Internet Archive and a few similar groups are the only things standing in the way of digital oblivion.

“The risks are manifold. Not just that technology may fail, but that certainly happens. But more important, that institutions fail, or companies go out of business. News organisations are gobbled up by other news organisations, or more and more frequently, they’re shut down,” says Mark Graham, director of the Internet Archive’s Wayback Machine, a tool that collects and stores snapshots of websites for posterity. There are numerous incentives to put content online, he says, but there’s little pushing companies to maintain it over the long term.

Despite the Internet Archive’s achievements thus far, the organisation and others like it face financial threats, technical challenges, cyberattacks and legal battles from businesses that dislike the idea of freely available copies of their intellectual property. And as recent court losses show, the project of saving the internet could be just as fleeting as the content it’s trying to protect.

“More and more of our intellectual endeavours, more of our entertainment, more of our news, and more of our conversations exist only in a digital environment,” Graham says. “That environment is inherently fragile.”

  • AbbieAbbie@beehaw.org
    13 points · 3 months ago

    The key is going to be how it’s curated.

    I remember reading a while back that some moron dropped his shoe down a well in the 1400s, and when they found it 600 years later they put it in a museum because so few things from that time have survived.

    Historians of the future are going to have the opposite problem. We have massive amounts of digital pornography and pictures of cats, the landfills have millions of Styrofoam cups and plastic spoons, and someone will have to pick through that mess and decide what mattered and what didn’t.

    • DdCno1@beehaw.org
      9 points · 3 months ago

      The thing is, this pornography and cats will tell future historians a ton about what people were like in our times. Not all of it will be accurate, but that’s an issue with any primary source. Hell, watch some grainy smut from the '70s or '80s and pay attention to things other than the “action”, like the choice of music, the way the actors are talking, how they are dressed, what the sets look like, what kind of excuses for plots are being used, all of which are clearly products of their time. Amateur stuff is even more illuminating. Before anyone thinks I’m overthinking this: We learned a lot about Ancient Rome from the smut Romans carved into buildings in Pompeii.

      It’s the same with old cat pictures. You can reasonably date many of them by what the background looks like, e.g. what kind of electronics and furniture are present, how people who are also in the photos are dressed, image quality (provided it hasn’t been compressed to hell and back since), etc. These kinds of seemingly inconsequential artifacts of our time will be highly illuminating to future historians (provided they are being preserved), just like the complaint letters ol’ Ea Nasir received thousands of years ago.

    • t3rmit3@beehaw.org
      2 points · 3 months ago

      “massive amounts of digital pornography and pictures of cats, the landfills have millions of Styrofoam cups and plastic spoons, and someone will have to pick through that mess and decide what mattered and what didn’t.”

      I have bad news for you…

  • Lime Buzz@beehaw.org
    13 points · 3 months ago

    The Internet Archive is a good idea.

    In practice, though, it has a lot of flaws, such as enabling harassment and doxxing of individuals, because it takes very little down even when keeping something up could lead to abuse.

  • intensely_human@lemm.ee
    1 point · 3 months ago

    The Internet Archive can only preserve history if it becomes decentralized. As long as it’s controlled by a single organization, whoever controls the present controls the past.