• nick@midwest.social · 42 points · 3 months ago

    Just had to restart our main MySQL instance today. Had to do it at 6am since that’s the lowest traffic point, and boy howdy this resonates.

    2 solid minutes of the stack throwing 500 errors until the db was back up.

      • xmunk@sh.itjust.works · 20 points · 3 months ago

      If you have the bandwidth, it's absolutely worth investing in a maintenance mode for your system: check a flag file on disk before loading up a router or anything else, and if the flag is set, just send back a static HTML page with ye olde “under construction” picture.
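A minimal sketch of that flag-file check, assuming a Python WSGI app. The language, the flag path `/var/run/myapp/maintenance.flag`, and the helper name `maintenance_middleware` are all hypothetical, not from the thread. Returning 503 with a `Retry-After` header instead of 200 is my own design choice, so that clients and load balancers can tell the outage is temporary:

```python
import os

# Hypothetical flag path; adjust to your deployment.
MAINTENANCE_FLAG = "/var/run/myapp/maintenance.flag"

# Static page served while the flag file exists.
MAINTENANCE_PAGE = b"""<!doctype html>
<html><body><h1>Under construction</h1>
<p>We're doing some maintenance. Please check back shortly.</p>
</body></html>"""


def maintenance_middleware(app, flag_path=MAINTENANCE_FLAG):
    """Wrap a WSGI app so that a flag file on disk short-circuits every request."""

    def wrapper(environ, start_response):
        # Checked on every request, before any routing or database access.
        if os.path.exists(flag_path):
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/html; charset=utf-8"),
                    ("Content-Length", str(len(MAINTENANCE_PAGE))),
                    ("Retry-After", "120"),  # hint: try again in ~2 minutes
                ],
            )
            return [MAINTENANCE_PAGE]
        # Flag absent: hand the request to the real application.
        return app(environ, start_response)

    return wrapper
```

You'd wrap the app once (`application = maintenance_middleware(application)`), then `touch` the flag file before the DB restart and `rm` it afterwards. Because the check runs per request, no deploy or process restart is needed to flip it.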

        • nick@midwest.social · 4 points · 3 months ago (edited)

        That’s not really… possible at this point. We have thousands of customers (some very large ones, like A——n and G—-e and Wal___t) with tens or hundreds of millions of users, and even at our lowest-traffic periods we do 60k+ queries per second.

        This is the same MySQL instance I wrote about a while ago that hit the 16TiB table size limit (due to ext4 file system limitations) and caused a massive outage, the worst I’ve been involved in during my 26-year career.
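For context on where the 16TiB figure comes from, here is a back-of-the-envelope check. It assumes the table lived in its own InnoDB `.ibd` file (`innodb_file_per_table`) on ext4 with the default 4 KiB block size, so the per-file limit becomes the per-table limit; the function name is just illustrative. ext4 extents address logical blocks with 32-bit numbers, so a single file tops out at 2^32 blocks:

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def ext4_max_file_size(block_size: int = 4096) -> int:
    """Upper bound, in bytes, on a single ext4 file for a given block size.

    Assumes extent-mapped files: the logical block number is 32 bits wide,
    so one file can address at most 2**32 blocks.
    """
    return block_size * 2 ** 32


# Default 4 KiB blocks: 2**12 * 2**32 = 2**44 bytes.
print(ext4_max_file_size() / TIB)  # 16.0
```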

        Every day I’m shocked at our scale, considering my company has only about 90 engineers.