I know there have been a few posts about this before, but it’s been a month since the last one and the problem is still ongoing. It doesn’t seem that any of the LW admins responded to Zag’s post on their help community, and the last response from lodion/Nath that I’m aware of was 4 months ago, when there were outright federation failures as opposed to just lengthy delays.

@Nothing4You@programming.dev posted a comment on the post from last month about the delays stating that it’s an issue on our end as our server isn’t keeping up. I’m not sure whether this is the case or not, and I’m not sure how to interpret the Grafana dashboard they linked to, but as it’s a new reply on an old post, I wanted to note it.

Current federation delays seem to be around 7 days. It doesn’t seem to be affecting posts themselves on Lemmy.world communities, but does affect all replies to them (even from users on other instances), and all upvotes on the posts. [Edit: on further investigation, this isn’t the case. The current delays are at least 13 days, and this does actually affect posts too]

I don’t want to sound too pushy, since the LW admins and Lodion/Nath are all volunteers, but I was hoping we might be able to get an update on what the cause is and, if it’s an issue in Lemmy itself, whether anybody has opened an issue on GitHub so the developers are aware.

(NB: I don’t interact that much with LW, so all of my testing has been on the Boost for Lemmy community.)

  • Baku@aussie.zone (OP)
    6 months ago

    Please ELI5: How does latency alone (319ms from Helsinki to Sydney, apparently) cause week-long response times?

    • Nothing4You@programming.dev
      6 months ago

      lemmy’s current federation implementation works with a sending queue, so it stores a list of activities to be sent in its database. there is a worker running for each linked instance checking if an activity should be sent to that instance, and if it should, then send it. due to how this is currently implemented, this is always only sending a single activity at a time, waiting for this activity to be successfully sent (or rejected), then sending the next one.
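to make the "one activity at a time" behaviour concrete, here is a minimal python sketch. this is not lemmy's actual code (lemmy is written in rust); `drain_queue_sequentially` and the `send` callback are hypothetical names used only for illustration. the point is that the next activity is not even attempted until the previous one has been accepted or rejected.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain_queue_sequentially(
    queue: Deque[str], send: Callable[[str], bool]
) -> List[Tuple[str, bool]]:
    """Send queued activities strictly one at a time, in order.

    `send` is a hypothetical blocking call: it returns True if the remote
    instance accepted the activity and False if it rejected it. Either way
    the worker only moves on once that answer has arrived.
    """
    results = []
    while queue:
        activity = queue[0]
        accepted = send(activity)  # blocks for a full round trip
        queue.popleft()            # only now is the next activity considered
        results.append((activity, accepted))
    return results
```

because every call to `send` costs at least one full round trip, the round-trip time puts a hard ceiling on how many activities such a worker can deliver per second.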

      an activity is any federation message when an instance informs another instance about something happening. this includes posts, comments, votes, reports, private messages, moderation actions, and a few others.

      let’s assume an activity is generated on lemmy.world every second. now every second this worker will send this activity from helsinki to sydney and wait for the response, then wait for the next activity to be available. to simplify things, i’ll skip processing time in this example and just work with raw latency, based on the number you provided. now lemmy.world has to send an activity to sydney. this takes approximately 160ms. aussie.zone immediately responds, which takes another 160ms for the response to get back to helsinki. in sum this means the entire process took 320ms. as long as only one activity is generated per second, this is easy to keep up with. still assuming there is no other time needed for any processing, this means at most about 3.125 activities per second (1000ms / 320ms) can be transmitted from lemmy.world to aussie.zone on average.
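the arithmetic above can be written out as a tiny python sketch. it assumes, as in the example, that the only cost is network round-trip time and that exactly one activity is in flight at a time; `max_sequential_rate` is a name made up for this illustration.

```python
def max_sequential_rate(round_trip_ms: float) -> float:
    """Upper bound on activities per second when each send must finish
    (one full round trip) before the next one starts."""
    if round_trip_ms <= 0:
        raise ValueError("round trip time must be positive")
    return 1000.0 / round_trip_ms


# 160ms there + 160ms back, as in the example above
rate = max_sequential_rate(160 + 160)
print(rate)  # 3.125
```

any real processing time on either side adds to the 320ms and lowers this ceiling further.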

      the real activity generation rate on lemmy.world is quite a bit higher than 3.125 activities per second, and in reality there are also other things that take up some time during this process. over the last 7 days, lemmy.world had an average activity generation rate of about 5.45 activities per second. it is important to note here that not all activities generated on an instance will be sent to all other linked instances, so this isn’t a reliable number of how many activities are actually supposed to be sent to aussie.zone every second, rather an upper limit. for example, for content in a community, lemmy will only send these activities to other instances that have at least one subscriber to that community. although they make up only a fraction of the activities, private messages are another example of an activity that is only sent to a single linked instance.

      to answer the original question: the week of delay is simply built up over time, as the amount of lag just keeps growing.
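as a rough illustration of how the lag builds up, here is a small python sketch. it assumes constant rates and an initially empty queue, and it plugs in 5.45 activities per second, which is only an upper limit for what aussie.zone actually needs to receive, and 3.125 per second, which ignores processing time. the function names are made up for this example and the resulting numbers are illustrative, not measurements.

```python
def lag_growth_per_day(generation_rate: float, send_rate: float) -> float:
    """Days of lag added per day of wall-clock time.

    After t seconds, send_rate * t activities have been sent; the next one
    in line was generated at time send_rate * t / generation_rate, so the
    lag is t * (1 - send_rate / generation_rate).
    """
    if generation_rate <= send_rate:
        return 0.0  # the worker keeps up, no lag accumulates
    return 1.0 - send_rate / generation_rate


def days_until_lag(target_lag_days: float, generation_rate: float,
                   send_rate: float) -> float:
    """Wall-clock days until the lag reaches target_lag_days."""
    growth = lag_growth_per_day(generation_rate, send_rate)
    if growth == 0.0:
        return float("inf")
    return target_lag_days / growth


print(round(lag_growth_per_day(5.45, 3.125), 2))  # 0.43
```

under these assumptions the lag grows by roughly 0.43 days per day, so a week of delay does not need any outage, just a couple of weeks of sending slower than activities arrive.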

      additionally, once a week lemmy also discards queued activities that are older than a week, so if you stay over 7 days of lag for too long you will start completely missing activities that were over the limit. as previously explained, this can be any kind of federated content. it can be posts, comments, votes, which are usually not that important, but it can also affect private messages, which are then just lost without the sender ever knowing.
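the effect of that cleanup can be sketched in python as follows. this is a hypothetical model, not lemmy's implementation: the queue is assumed to be a list of `(created_at, activity)` pairs, and `prune_expired` is a name invented for this example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

MAX_AGE = timedelta(days=7)  # the one-week limit described above

QueueEntry = Tuple[datetime, str]


def prune_expired(
    queue: List[QueueEntry], now: datetime, max_age: timedelta = MAX_AGE
) -> Tuple[List[QueueEntry], List[QueueEntry]]:
    """Split the queue into entries that are kept and entries that are
    dropped because they are older than max_age. Dropped entries are never
    sent, regardless of whether they are votes or private messages."""
    kept = [(created, act) for created, act in queue if now - created <= max_age]
    dropped = [(created, act) for created, act in queue if now - created > max_age]
    return kept, dropped
```

in this model nothing tells the sender that an entry ended up in `dropped`, which matches the silent loss described above.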