OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet and it could be really bad for AI models.

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

  • Hamartiogonic@sopuli.xyz
    1 year ago

    Text written before 2023 is going to be exceptionally valuable, because that way we can be reasonably sure it wasn’t contaminated by an LLM.

    This reminds me of some research institutions pulling up sunken ships so that they can harvest the steel and use it to build sensitive instruments. You see, before the nuclear tests there was hardly any man-made radiation in the environment. However, after America and the Soviet Union started nuking stuff like there was no tomorrow, pretty much all newly produced steel on Earth has been a little bit contaminated. Not a big issue for normal people, but scientists building super sensitive equipment certainly notice the difference between pre-nuclear and post-nuclear steel.

  • BackupRainDancer@lemmy.world
    1 year ago

    Predictable issue if you knew the fundamental technology that goes into these models. Hell, it should have been obvious to the layperson that it was headed this way once they saw the videos and heard the audio.

    We’re less sensitive than these machines to patterns in massive data, so the point at which we can’t tell fact from AI fiction from the content alone comes before the point at which the machines can’t. Good luck with the FB aunts.

    A GAN’s final goal is to develop content that is indistinguishable from the real thing… Are we surprised?

    Edit, since the person below me made a great point: GANs may be limited, but there’s nothing that says you can’t set up a generator LLM and a detector LLM for the sole purpose of improving the generator.

    • throwsbooks@lemmy.ca
      1 year ago

      For laymen who might not know how GANs work:

      Two AIs are developed at the same time: one that generates and one that discriminates. The generator creates a dataset, it gets mixed in with some real data, then all of that gets fed into the discriminator, whose job is to say “fake or not”.

      Both AIs get better at what they do, and this arms race produces more convincing generated data over time. You know your generator has reached peak performance when its twin discriminator has a 50/50 success rate. It’s just guessing at that point.

      There literally cannot be a better AI than the twin discriminator at detecting that generator’s work. So anyone trying to make tools to detect ChatGPT’s writing is going to have a very hard time of it.
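
      To make the 50/50 point concrete, here is a rough toy sketch rather than an actual GAN. It assumes (purely for illustration) that real and generated data are both 1-D bell curves with the same spread, and the function names are made up. It works out the best accuracy any detector could reach, then checks it by mixing real and fake samples and letting a simple threshold "discriminator" guess.

```python
import math
import random


def best_detector_accuracy(real_mean, fake_mean, std=1.0):
    """Best possible accuracy for telling N(real_mean, std) from N(fake_mean, std)
    when a sample is equally likely to be real or generated (toy assumption)."""
    gap = abs(real_mean - fake_mean)
    # The ideal detector cuts at the midpoint; its accuracy is Phi(gap / (2 * std)).
    return 0.5 * (1.0 + math.erf(gap / (2.0 * std * math.sqrt(2.0))))


def measured_accuracy(real_mean, fake_mean, n=20000, seed=0):
    """Mix real and generated samples (std fixed at 1.0) and score a midpoint-threshold detector."""
    rng = random.Random(seed)
    threshold = (real_mean + fake_mean) / 2.0
    real_is_higher = real_mean >= fake_mean
    correct = 0
    for _ in range(n):
        is_real = rng.random() < 0.5  # coin flip: real sample or generated one
        x = rng.gauss(real_mean if is_real else fake_mean, 1.0)
        # Guess "real" when x falls on the real side of the threshold.
        guess_real = (x > threshold) == real_is_higher
        correct += guess_real == is_real
    return correct / n


# As the hypothetical generator's mean closes in on the real mean (4.0),
# even the ideal detector slides toward a coin flip.
for fake_mean in [0.0, 2.0, 3.0, 3.5, 4.0]:
    ideal = best_detector_accuracy(4.0, fake_mean)
    seen = measured_accuracy(4.0, fake_mean)
    print(f"fake mean {fake_mean}: ideal {ideal:.3f}, measured {seen:.3f}")
```

      Once the two distributions match, the ideal accuracy is exactly 0.5, which is the "just guessing" state described above. Real text is nowhere near this simple, so treat it as an intuition pump only.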