• Skull giver@popplesburger.hilciferous.nl · 4 months ago

    The fun thing about most of the Fediverse is that all of it is request/response based. You can set up robots.txt and simply not respond to requests if you don’t want certain content to be accepted into certain platforms.
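
    For illustration, a minimal robots.txt that keeps one crawler out entirely while limiting everyone else could look like this (the bot name and paths are made up, not any platform’s real defaults):

    ```
    # served at https://example.social/robots.txt
    User-agent: SomeScraperBot
    Disallow: /

    User-agent: *
    Disallow: /api/
    ```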

    Nobody does this (yet), but the protocol allows it.

    Maybe someone should write an “ActivityPub firewall” that allows the user to select what servers/software/software versions can query for posts. Banning all unpatched Mastodon servers would’ve prevented a lot of spam, for instance.
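
    A sketch of what one such rule might look like, assuming the peer’s software name and version were already learned from its NodeInfo document (the cutoff version below is invented for illustration, not a real advisory):

    ```python
    # Hypothetical "firewall" rule: reject fetches from peers whose
    # self-reported software is older than a known-patched release.
    MIN_SAFE = {"mastodon": (4, 2, 10)}  # invented cutoff for illustration

    def parse_version(version: str) -> tuple:
        # "4.2.9" -> (4, 2, 9); non-numeric parts are ignored
        return tuple(int(p) for p in version.split(".")[:3] if p.isdigit())

    def allow_fetch(software: str, version: str) -> bool:
        minimum = MIN_SAFE.get(software.lower())
        if minimum is None:
            return True  # no rule configured for this software
        return parse_version(version) >= minimum

    print(allow_fetch("Mastodon", "4.2.9"))   # False: unpatched, rejected
    print(allow_fetch("Mastodon", "4.2.10"))  # True: patched, allowed
    ```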

    I feel like the Fediverse people who complain about “consent” should probably stop posting to services running blacklist-based federation. Whitelist the servers you trust instead of blacklisting the ones you don’t; that’s the only way to get genuinely consent-based social media with ActivityPub.
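
    The core of that model is a single policy inversion: unknown hosts are refused by default instead of accepted by default. A toy sketch, with placeholder hostnames:

    ```python
    from urllib.parse import urlparse

    # Hypothetical allowlist: anything not on it is refused by default.
    TRUSTED = {"lemmy.world", "hachyderm.io"}

    def accept_activity(actor_id: str) -> bool:
        return urlparse(actor_id).hostname in TRUSTED

    print(accept_activity("https://lemmy.world/u/alice"))    # True
    print(accept_activity("https://unknown.example/u/bob"))  # False
    ```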

    • Zak@lemmy.world · 4 months ago

      If I’m reading this comment right, it’s relying on a mistaken understanding of robots.txt. It is not an instruction to the server hosting it not to serve certain robots. It’s actually a request to any robot crawling the site to limit its own behavior. Compliance is 100% voluntary on the part of the robot.
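
      Python’s standard library shows that division of responsibility nicely: the parser runs inside the robot, which fetches the rules and then polices itself (the instance name is a placeholder):

      ```python
      from urllib import robotparser

      rp = robotparser.RobotFileParser("https://example.social/robots.txt")
      rp.read()  # the robot downloads the rules itself...

      # ...and then voluntarily decides whether to proceed; the server
      # enforces nothing here.
      if rp.can_fetch("MyCrawler/1.0", "https://example.social/users/alice"):
          print("crawler chooses to fetch")
      ```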

      The ability to deny certain requests from servers that self-report running a version of their software with known vulnerabilities would be useful.
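
      Servers already self-report this through NodeInfo, so the lookup side is straightforward; a rough sketch (error handling omitted, and treating the last listed schema as the newest is only a heuristic):

      ```python
      import json
      from urllib.request import urlopen

      def peer_software(host: str) -> tuple[str, str]:
          # NodeInfo discovery: a well-known document listing schema links
          with urlopen(f"https://{host}/.well-known/nodeinfo") as resp:
              links = json.load(resp)["links"]
          # Heuristic: assume the last entry points at the newest schema
          with urlopen(links[-1]["href"]) as resp:
              software = json.load(resp)["software"]
          return software["name"], software["version"]

      print(peer_software("mastodon.social"))  # e.g. ('mastodon', '4.2.x')
      ```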

      • Skull giver@popplesburger.hilciferous.nl · 4 months ago

        Of course robots.txt is voluntary, but the scraper that started this round of drama did actually follow robots.txt, so the problem would have been solved in this instance.

        For malicious actors, there is no solution except whitelisted federation (with authorised fetch and a few other settings) or encryption (e.g. Circles, the social network built on Matrix). Anyone can pretend to be a well-meaning Mastodon server and secretly scrape data. There’s little difference between someone’s web browser paging through comments on a profile and a bot collecting information, and for a few dollars those “browsers” will come from residential ISPs as well. Even Cloudflare no longer blocks scrapers if you pay the right service enough money.
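
        The gate that authorised fetch adds is conceptually tiny, for what it’s worth; roughly this, with the actual signature verification elided and all names hypothetical:

        ```python
        def handle_object_fetch(headers: dict, verify_signature) -> int:
            # Authorised fetch in miniature: object requests must carry a
            # valid HTTP Signature from a known actor, or they are refused.
            signature = headers.get("Signature")
            if signature is None:
                return 401  # anonymous fetch refused outright
            if not verify_signature(signature):
                return 403  # signed, but not by anyone we trust
            return 200  # serve the object
        ```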

        I’ve considered writing my own “scraper” to generate statistics about Lemmy/Mastodon servers (most active users, voting rings, etc.), but ActivityPub is annoying enough to work with that I haven’t made the time.
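
        A first pass at those statistics doesn’t even need ActivityPub: Mastodon exposes a public timeline as plain JSON over HTTP (the instance is just an example):

        ```python
        import json
        from collections import Counter
        from urllib.request import urlopen

        # Fetch one page of the public timeline and count posts per account
        url = "https://mastodon.social/api/v1/timelines/public?limit=40"
        with urlopen(url) as resp:
            posts = json.load(resp)

        most_active = Counter(p["account"]["acct"] for p in posts)
        print(most_active.most_common(5))
        ```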

        As for the “firewall”, I’m thinking more broadly here. For example, I’d also include things like DNSBL, authorised fetch for anything claiming to be Mastodon, origin detection to bypass activitypub-proxy, a WAF for detecting and reporting exploitation attempts, possibly SpamAssassin integration to reject certain messages, and maybe even a “Wireshark mode” for debugging Fediverse applications. I think a well-placed, well-optimised middlebox could reduce the load on smaller instances, or even larger ones that see a lot of bot traffic, especially during spam waves like the one those Japanese kids caused last week.
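
        The overall shape might be a chain of cheap checks in front of the instance, something like this (every name is hypothetical, and the real work is stubbed out):

        ```python
        # Real checks (DNSBL lookups, HTTP Signature verification,
        # SpamAssassin scoring) would replace these stubs.

        def dnsbl_check(request) -> int | None:
            # Would resolve request["ip"] against a DNS blocklist.
            return None  # stubbed to "pass"

        def authorized_fetch_check(request) -> int | None:
            # Peers claiming to be Mastodon must sign their fetches.
            if request.get("software") == "mastodon" and "Signature" not in request["headers"]:
                return 401
            return None

        CHECKS = [dnsbl_check, authorized_fetch_check]

        def firewall(request) -> int | None:
            # Returns an HTTP status to reject with, or None to pass through.
            for check in CHECKS:
                status = check(request)
                if status is not None:
                    return status
            return None

        print(firewall({"ip": "203.0.113.5", "software": "mastodon", "headers": {}}))  # 401
        ```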