• Sabata@ani.social
    4 months ago

    Some apps let you offload the model to both GPU and CPU, loading only the active part of the model at a time. I have an old SSD set up as swap that gives me 500 GB of “usable” RAM.

    It is horrendously slow and pointless, but you can do it. I got about 2 tokens in 10 minutes on a 70B model on a 1080 Ti before I gave up.
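
A rough back-of-the-envelope sketch of why swap gets involved at all. It assumes the weights dominate memory use and that the 1080 Ti has 11 GB of VRAM. The comment does not say which quantization was used, so the precisions below are only illustrative, and `weight_footprint_gb` is a hypothetical helper.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB (assumption: no overhead)."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 11  # GTX 1080 Ti

# Illustrative precisions only; the actual quantization is not stated in the thread.
for bits in (16, 8, 4):
    size = weight_footprint_gb(70, bits)
    overflow = max(0.0, size - VRAM_GB)
    print(f"{bits:>2}-bit: ~{size:.0f} GB of weights, ~{overflow:.0f} GB outside VRAM")

# Throughput reported in the comment: about 2 tokens in 10 minutes.
tokens_per_second = 2 / (10 * 60)
print(f"~{tokens_per_second:.4f} tokens/s")
```

Even at the smallest precision shown, most of the weights would not fit in VRAM, which is where system RAM and then swap come in.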

    • AeonFelis@lemmy.world
      4 months ago

      Even if they used more powerful hardware than you, the model they ran is still almost 6 times bigger. So if you got two tokens in 10 minutes, one token in 30 minutes for them sounds plausible.
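
The estimate in this reply can be written out as simple arithmetic. This is only a sketch: it assumes time per token scales linearly with model size, ignores the hardware difference the comment mentions, and rounds "almost 6 times" up to 6. `minutes_per_token` is a hypothetical helper.

```python
def minutes_per_token(tokens: int, minutes: float) -> float:
    """Average time spent generating one token."""
    return minutes / tokens


baseline = minutes_per_token(tokens=2, minutes=10)  # the 70B run on the 1080 Ti
size_ratio = 6                                      # "almost 6 times bigger", rounded up
estimate = baseline * size_ratio                    # assumption: linear scaling with size

print(f"baseline: {baseline:.0f} min/token, estimate: {estimate:.0f} min/token")
```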

      • Sabata@ani.social
        4 months ago

        I would have to use an entire 1 TB drive for swap, but I’m sure I could manage 1 token before the heat death of the universe.

        • AeonFelis@lemmy.world
          4 months ago

          I’d worry less about the heat death of the universe and more about your hardware’s heat from all that load.