It turns out that if people in an online community really don't like what you're doing, they can turn to harassment, threats, or worse to try to shut you down. Somebody put up a site titled "It Has Been X Days Since a Techbro Asshole Released a Fedi Scraper/Indexer."
There is an extreme amount of hostility from a certain segment of the (mostly Mastodon-using) Fediverse community toward anything that does anything with Fediverse content “without consent”. Trouble is, there’s no machine-readable mechanism for determining what people have consented to in most cases, and certainly no standard for it.
If your computer sends my computer an image and some text via ActivityPub, without any further communication, may I…
Put it on a website visible to the public?
Send it to other people’s computers so they can do the same with it?
Search for it later?
Display it next to advertisements?
Display it on a service I charge people a fee to use?
Keep it after your computer asks mine to delete it?
Some of those things are what Mastodon does normally, but they could be understood as copyright violations because the protocol doesn’t transmit any licensing information. Others, like search indexing, are almost certainly legal, and the protocol is silent about them, but a few people will get very angry at anyone who visibly handles them differently from Mastodon. Meanwhile, how many people are quietly running servers with search indexes who aren’t even aware of Mastodon’s new opt-in/opt-out search features?
Pixelfed has started attaching licenses to content, but I think we might need more sophisticated, machine-readable licenses.
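For illustration, one way a machine-readable consent layer could work is as extra properties on the ActivityStreams object itself. A minimal sketch in Python; note that the `usagePolicy` vocabulary here is entirely invented for illustration (it is not part of ActivityPub or any real extension), and Pixelfed's actual license support may look different:

```python
import json

# Hypothetical: an ActivityStreams Note carrying an explicit license plus
# machine-readable usage-consent hints. "usagePolicy" is NOT a real
# vocabulary -- it's only a sketch of what a standard could encode.
note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "content": "A photo from my hike.",
    "license": "https://creativecommons.org/licenses/by-nc/4.0/",
    "usagePolicy": {  # invented extension property
        "searchIndexing": "allow",
        "commercialDisplay": "deny",
        "redistribution": "allow",
    },
}

def may_index(obj):
    """Return True only if the object explicitly consents to search indexing."""
    return obj.get("usagePolicy", {}).get("searchIndexing") == "allow"

print(json.dumps(note, indent=2))
print(may_index(note))                  # True
print(may_index({"type": "Note"}))      # False: no explicit consent
```

The deliberate design choice in `may_index` is defaulting to "no": absent an explicit grant, the consuming server assumes it lacks permission, which is the opposite of how most Fediverse software behaves today.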
If your computer sends my computer an image and some text via [email], without any further communication, may I…
Isn’t the answer just the same if you consider it as email? I mean, ActivityPub is basically just email with “social media” features. Surely lawyers already have answers to these questions when it comes to email.
If I send an email to the whole world, what is anyone allowed to do with it?
In some ways, I feel like ActivityPub is just public. It’s not reasonable to be able to enforce any license, so it may as well just be considered public domain. But IANAL.
If you send me an image by email and I display it on a website without permission, I am violating your copyright. If we apply the same thinking to ActivityPub, then most implementations of it are illegal. Fortunately, judges usually have enough common sense to step in and say a reasonable server admin would reasonably believe they have permission to do the things the popular software actually does.
On the other hand, if someone takes photos I’ve shared on Mastodon and sells prints of them or licenses them to a stock photo agency, they’re definitely violating my copyright, and I will sue them. Some of the other options like running ads on a server are a little more ambiguous.
Some of the other expectations people seem to have aren’t based on law but still-evolving concepts of consent. It would be nice to be able to program systems that have some awareness of what people are OK with.
It’s almost like those websites that say “when you upload your content we can do what we want with it” did that for a good reason: to avoid all this complexity and the possibility of lawsuits.
If you send me an image by email and I display it on a website without permission, I am violating your copyright.
Unless the image is already copyrighted, it takes publishing to provide a claim of copyright. Is email publishing? What if it’s a listserv with 300 recipients?
In the 181 countries party to the Berne Convention, the image is copyrighted as soon as it is recorded to a physical medium. Yes, that includes a memory card, hard drive, etc…
I remember some of these discussions around the time of the Twitter and Reddit exodii. The mindset of many of these folks was essentially that they’d used this social media protocol to create a nice, quiet safe space for like-minded tech-savvy queer leftists, and they felt that the explosion in interest threatened to expose their posts to people outside of the community they had come to know and trust. That’s a point of view I can understand, but as a counterargument: you’re on a public social media platform. If you wanted to keep things out of view of the larger Internet, there were other, better solutions for a community platform that you probably should have picked instead.
I’ll refrain from writing the uncharitable version of my reaction to the idea that the Fediverse should be some small, close-knit community forever and instead say that people who want small, close-knit communities based on ActivityPub are free to create them. Mastodon and other major server software supports allowlist-only federation.
People using servers with open federation should expect that their posts will reach an ad-hoc, informally-specified, bug-ridden, slow implementation of half of ActivityPub running on a jailbroken smart light bulb, and that it will behave differently from vanilla Mastodon.
That’s where my frustrations with “consent fedi” come into play. They want to force everyone to comply with their views. They could move to allowlist federation and connect with those who view things similarly, yet they don’t.
The fun thing about most of the Fediverse is that all of it is request/response based. You can set up robots.txt and simply not respond to requests if you don’t want certain content to be accepted into certain platforms.
Nobody does this (yet), but the protocol allows it.
Maybe someone should write an “ActivityPub firewall” that allows the user to select what servers/software/software versions can query for posts. Banning all unpatched Mastodon servers would’ve prevented a lot of spam, for instance.
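A minimal sketch of that "ActivityPub firewall" idea: refuse requests from software self-reporting a version below some floor. In practice the software name and version would come from the peer's User-Agent header or its /nodeinfo endpoint; the Mastodon version cutoff below is an example policy I made up, not a reference to a specific advisory:

```python
# Sketch: deny requests from peers self-reporting vulnerable software
# versions. The version floor is an assumed operator policy, not a list
# of real advisories.

def parse_version(v):
    """'4.2.3' -> (4, 2, 3); tolerate suffixes like '4.2.3rc1'."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

# Minimum acceptable version per software name (illustrative policy).
MIN_VERSION = {"mastodon": (4, 2, 9)}

def allow_request(software, version):
    """Allow unless the peer's software is below its configured floor."""
    floor = MIN_VERSION.get(software.lower())
    if floor is None:
        return True  # no policy for this software: allow by default
    return parse_version(version) >= floor

print(allow_request("Mastodon", "4.2.3"))  # False: below the floor
print(allow_request("Mastodon", "4.3.1"))  # True
print(allow_request("Akkoma", "3.0.0"))    # True: no policy entry
```

The obvious caveat, raised further down the thread, is that all of this trusts self-reported identity: a malicious peer can claim to be anything.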
I feel like the Fediverse people who complain about “consent” should probably stop posting to services running blacklist-based federation. Just whitelist the servers you trust instead of blacklisting the ones you don’t; that’s the only way you’ll ever get consent-based social media with ActivityPub.
If I’m reading this comment right, it’s relying on a mistaken understanding of robots.txt. It is not an instruction to the server hosting it not to serve certain robots. It’s actually a request to any robot crawling the site to limit its own behavior. Compliance is 100% voluntary on the part of the robot.
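The standard library makes the client-side nature of robots.txt easy to see: `urllib.robotparser` is a tool a *polite* robot uses to limit itself. Nothing in this example (hostnames and agent names are made up) touches the server at all:

```python
# robots.txt is advice to the crawler, not access control on the server.
# A polite robot checks it client-side like this; an impolite one just
# fetches anyway.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: FediScraper
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The scraper is asked to stay away entirely...
print(rp.can_fetch("FediScraper", "https://example.social/@alice"))      # False
# ...while everyone else is only asked to avoid /private/.
print(rp.can_fetch("SomeOtherBot", "https://example.social/@alice"))     # True
print(rp.can_fetch("SomeOtherBot", "https://example.social/private/x"))  # False
```

Whether the robot ever calls `can_fetch`, or honors its answer, is entirely up to the robot's author.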
The ability to deny certain requests from servers that self-report running a version of their software with known vulnerabilities would be useful.
Of course robots.txt is voluntary, but the scraper that started this round of drama did actually follow robots.txt, so the problem would have been solved in this instance.
For malicious actors, there is no solution, except for whitelisted federation (with authorised fetch and a few other settings) or encryption (e.g. Circles, the social network built on Matrix). Anyone can pretend to be a well-meaning Mastodon server and secretly scrape data. There’s little difference between someone’s web browser looking through comments on a profile and a bot collecting information. Pay a few dollars and those “browsers” will come from residential ISPs as well. Even Cloudflare doesn’t block scrapers anymore if you pay the right service money.
I’ve considered writing my own “scraper” to generate statistics about Lemmy/Mastodon servers (most active users, voting rings, etc.) but ActivityPub is annoying enough to run that I haven’t made time.
As for the “firewall”, I’m thinking more broadly here; for example, I’d also include things like DNSBL lookups, authorised fetch for anything claiming to be Mastodon, origin detection to bypass activitypub-proxy, a WAF for detecting and reporting exploitation attempts, possibly something like SpamAssassin integration to reject certain messages, maybe even a “Wireshark mode” for debugging Fediverse applications. I think a well-placed, well-optimised middlebox could help reduce the load on smaller instances, or even larger ones that see a lot of bot traffic, especially during spam waves like the one with those Japanese kids we saw last week.
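Architecturally, that middlebox is an ordered chain of request filters, each able to reject an inbound request before it reaches the instance. A toy sketch of how such checks could compose; every individual check here is a placeholder (a real one would do actual DNSBL queries, verify HTTP signatures for authorised fetch, run WAF rules, and so on):

```python
# Toy "Fediverse middlebox": an ordered chain of filters. All checks are
# placeholders standing in for real DNSBL / signature / spam machinery.
from dataclasses import dataclass

@dataclass
class Request:
    source_domain: str
    user_agent: str
    body: str
    signed: bool = False  # did the request carry a valid HTTP signature?

def dnsbl_check(req):
    blocked = {"spam.example"}  # stand-in for a live DNSBL lookup
    return req.source_domain not in blocked

def authorized_fetch_check(req):
    # Anything claiming to be Mastodon must present a signed fetch.
    if "mastodon" in req.user_agent.lower():
        return req.signed
    return True

def spam_check(req):
    # Stand-in for SpamAssassin-style content filtering.
    return "buy followers" not in req.body.lower()

FILTERS = [dnsbl_check, authorized_fetch_check, spam_check]

def accept(req):
    """Run the chain; reject on the first failing filter."""
    return all(f(req) for f in FILTERS)

ok = Request("social.example", "Mastodon/4.3.1", "hello", signed=True)
bad = Request("spam.example", "SomeBot/1.0", "buy followers now")
print(accept(ok), accept(bad))  # True False
```

Because `all()` short-circuits, cheap checks (DNSBL cache hits) naturally run before expensive ones (signature verification, content scanning), which is where the load-reduction argument comes from.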
“Running on a jailbroken smart light bulb” might be the best phrase I’ve read all week. Congratulations!
There have been many attempts to add content licensing, but some of the devs (the Lemmy devs) really don’t like the idea.
It’s already implemented on PeerTube and Pixelfed, but the Lemmy devs have so far refused to add it.
“Refused”? I mean, there is an open issue for it. I don’t think most users want to think about licensing when posting their stuff either.