• simple@lemmy.world · 1 year ago

    Feels like AI creators can only get away with using pre-2022 data for so long. At some point the information will be outdated and they’ll have to train on newer data, and it’ll be interesting to see if this is a problem that can be solved without harming the dataset’s quality.

    My guess is they’d need an AI that tries to find blatantly AI-generated data and remove it from the dataset. It won’t be 100% accurate, but it’ll be better than nothing.
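
    A toy Python sketch of the kind of filter I mean (the detector score and the telltale phrases are placeholders made up for illustration, not a real detection API):

    ```python
    # Hypothetical sketch: drop samples that look blatantly AI-generated
    # before they go back into a training corpus.

    TELLTALE_PHRASES = (
        "as an ai language model",
        "as of my knowledge cutoff",
        "i cannot fulfill that request",
    )

    def looks_ai_generated(text: str, detector_score: float, threshold: float = 0.9) -> bool:
        """Cheap phrase heuristics plus a score from some external detector (0..1)."""
        lowered = text.lower()
        if any(phrase in lowered for phrase in TELLTALE_PHRASES):
            return True
        return detector_score >= threshold

    def filter_corpus(samples):
        """Keep only samples that don't look blatantly AI-generated."""
        return [s for s in samples
                if not looks_ai_generated(s["text"], s.get("detector_score", 0.0))]

    corpus = [
        {"text": "Hand-written forum post from 2019.", "detector_score": 0.12},
        {"text": "As an AI language model, I cannot...", "detector_score": 0.97},
    ]
    print(len(filter_corpus(corpus)))  # 1 -- the second sample is dropped
    ```

    It won’t catch the subtle cases, which is exactly why it’s “better than nothing” rather than a real solution.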

    • AggressivelyPassive@feddit.de · 1 year ago

      I’m surprised these models don’t have something like a “ground truth layer” by now.

      Given that ChatGPT, for example, is completely unspecialized, I would have expected there to be a relatively straightforward way to hand-encode axiomatic knowledge, like specialized domain knowledge or even just basic math. Even tiered data (i.e. more/less trusted sources) seems not to be part of the design.
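
      Something like this is what I mean by tiered data, as a rough Python sketch (the tier names and weights are made up for illustration, not anything these models are documented to use):

      ```python
      # Hypothetical sketch: assign each training sample a trust weight based on
      # the tier of its source, e.g. for loss weighting or sampling probability.

      SOURCE_TIERS = {
          "peer_reviewed": 1.0,      # most trusted
          "reference_site": 0.8,
          "forum_post": 0.4,
          "unverified_scrape": 0.1,  # least trusted
      }

      def sample_weight(source: str, default: float = 0.2) -> float:
          """Map a source label to a trust weight in [0, 1]."""
          return SOURCE_TIERS.get(source, default)

      dataset = [
          {"text": "Axiomatic math fact.", "source": "peer_reviewed"},
          {"text": "Random claim from a comment thread.", "source": "forum_post"},
      ]

      print([sample_weight(d["source"]) for d in dataset])  # [1.0, 0.4]
      ```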

      • Drewelite@sopuli.xyz · 1 year ago

        I think this is easier said than done. Maybe at our current level, but as these AIs get more advanced… what is truth? Sure, mathematics seems like an easy target, until we consider that one of the best use cases for AI could be theory. An AI could have a fresh take on our interpretation of mathematics, where these base-level assumptions would actually be a hindrance.

        • AggressivelyPassive@feddit.de · 1 year ago

          I mean, let’s be honest here: AI will not primarily be used to find out new truths about the universe, but to order butter at the right time, or to write basic essays and code and explain known things.

          That kind of knowledge could easily be categorized.