Disclaimer: I wrote this article and made this website.
There was some talk of this issue in the recent fediverse inefficiencies thread. I’m hopeful that in the future we’ll have a decentralized solution for file hosting but for now I deeply believe that users should pay for their own file hosting.
Am I the only one who would advocate for text-only storage? No. This comment also gets my point. There should be text-only Lemmy instances which only save text and do not allow any kind of image posting or storage.
- text is less offensive to consume and moderate than evil images or video
- text is way more information dense and can be compressed even more! Truly the green, biosphere-friendly data format. I would be willing to save text-only data of strangers on my hard drive, but not images or video. Could even be valuable LLM analysis training data.
Yes, people could post base64-encoded images, but that is a larger technical barrier and can be detected. If image storage is really needed, images should be heavily compressed (WebP, 90% quality loss), provided as links to external sites, and whenever possible SVG / vector graphics should be preferred.
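To make the "can be detected" part concrete, here is a minimal sketch of how a text-only instance might flag base64-encoded images in a post. The function name, the 64-character minimum run length, and the short list of formats are all assumptions for illustration; this is not something Lemmy does today, and a determined user could evade it by splitting or re-encoding the data.

```python
import base64
import binascii
import re

# Magic bytes for a few common image formats (not exhaustive).
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

# Runs of base64 characters long enough to plausibly hold a file.
# The 64-character minimum is an arbitrary assumption.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{64,}={0,2}")


def find_embedded_images(text: str) -> list[str]:
    """Return the image formats detected in base64 runs inside text."""
    found = []
    for match in BASE64_RUN.finditer(text):
        # 16 base64 characters decode to 12 bytes, enough for the signatures above.
        head = match.group(0)[:16]
        try:
            prefix = base64.b64decode(head, validate=True)
        except (binascii.Error, ValueError):
            continue
        for signature, name in IMAGE_SIGNATURES.items():
            if prefix.startswith(signature):
                found.append(name)
                break
    return found
```

A moderation hook could reject or hold any post for which this returns a non-empty list.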
We totally need sustainable file hosting. Freedom!
Wait… the fuck did you just upload? Oh god. Oh god no. Do I have to call the cops on you? Oh no. Wait, does this count as possession? FUCK!!!
We need someone else to handle the totally sustainable file hosting. Freedom!
Seems to me that this is a use-case Freenet/Hyphanet would be good for, both because it distributes the problem of file storage load and because it eliminates responsibility for each host to police his node by making it impossible for anyone to know which file chunks said node is hosting.
Nothing solves the problem of CSAM quite like… making everyone partially culpable in the storage and distribution of CSAM.
You can’t prove I was hosting child porn. Statistically, we all only had a 70% probability of having it on our computers
Stuff that isn’t accessed eventually gets deleted. If the Lemmy instances (which are clearnet, of course) delete the references to it, it would go away.
Which gets back to volunteers going through and moderating it. And the ethical and moral question of whether people who upload it are reported.
And… honestly? if there is even a 20% chance that running a file sharing node (because I just love to give away both bandwidth and storage…) is being used to store CSAM? I ain’t doing that shit and most people will similarly run screaming and call the cops.
Ok, hear me out.
We find the users with the slowest internet and start sending them all the data. They don’t have to keep anything on disk. Then they send it all back and forth between each other. Any time a user makes a request, we just wait for one of the slow nodes to come across the data and send it out.
We use the slowest wires for all the storage. It’s fool proof.
Somebody actually did make this as a joke years ago haha https://github.com/yarrick/pingfs
I was brushing my teeth when reading this comment and inadvertently ended up swallowing all my toothpaste.
don’t forget to spit
Ha! That’s awesome!
Too wet for server racks in the forest.
They grew there
Look, I know they’re called “farms” but like I told the last forest gnome, the dank woods is no place to host data.
Have you considered providing something like this: https://jortage.com/ and maybe contribute to their efforts to develop a specific API for that? Source code is here: https://github.com/jortage
Jortage is a really interesting approach. It definitely helps reduce the impact of the file hosting problem but it doesn’t fully address the underlying cost issue. The cost of storing files grows every month indefinitely while donations typically don’t.
I would like to see a file hosting pool come to lemmy though. So I will look into it. :)
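The cost point above can be put into rough numbers. This is a back-of-the-envelope sketch, assuming nothing is ever deleted, uploads arrive at a constant rate, and donations stay flat; the function name and every figure fed into it are made up for illustration.

```python
def months_until_shortfall(
    monthly_upload_gb: float,
    price_per_gb_month: float,
    monthly_donations: float,
) -> int:
    """Return the first month in which the storage bill exceeds donations.

    Assumes stored data only ever grows and donations never change.
    """
    if monthly_upload_gb <= 0 or price_per_gb_month <= 0:
        raise ValueError("upload volume and price must be positive")
    month = 0
    stored_gb = 0.0
    while True:
        month += 1
        stored_gb += monthly_upload_gb  # nothing is deleted
        bill = stored_gb * price_per_gb_month
        if bill > monthly_donations:
            return month
```

With hypothetical inputs of 30 GB of uploads per month, $0.02 per GB-month, and $20 per month in donations, `months_until_shortfall(30, 0.02, 20)` returns 34: under three years before the bill alone outgrows the donations, and it keeps climbing after that.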
Pict-rs that is used by Lemmy to store images already supports S3 type storage, so in theory it should work with Jortage, but I don’t think anybody has tested that yet. The people behind Feddit.org might have experimented with it as they expressed interest a while back.
Personally I’m in the camp that I want history to be lost. That’s part of the appeal to me. In fact my favorite feature in the fedi is Mastodon’s option to enable auto-deleting posts of a certain age.
Only content that is explicitly pinned or reaches a certain amount of interactions should be saved imo. Since that’s the stuff you’d actually want to preserve rather than the 99% of forgettable content, and it would also drastically cut down on file hosting.
Another thing is that a federation should only act as the exchange between users on ActivityPub. It should only cache relevant information and not be expected to store everything, like I wrote before. The user should be a portable account that is stored on a device. The federation server would sync your account between your devices, but not store it. You send your content to the federation, and then the federation sends it out into the world where they choose to do what they want with it. The federation shouldn’t hoard it indefinitely.
Also this makes sense from a privacy perspective. If you care about privacy, why would you also want all your data indefinitely stored? Unless certain things are relevant and explicitly kept, it should be expected to expire and be lost by default. Where did we get this expectation that data should be stored forever? Also you expect it to be stored forever and not be trained on by AI?
This comment for example, after about a week or two most of the visibility and interaction of it will drop to zero. At that point, this comment should expire and no longer exist. I wrote this comment, it reached some people, and served its purpose and should expire. I’m not going to pretend like this comment is some kind of historic document that should be indefinitely preserved, nor do I expect or want it to be.
This comment for example, after about a week or two most of the visibility and interaction of it will drop to zero. At that point, this comment should expire and no longer exist.
That’s an incredibly naive and egoistic take. Think about all the knowledge that is getting lost by applying this approach. How many times have you searched for some obscure thing and found the answer only on some five-year-old reddit post? That information would be lost forever if you had your way.
I think the massive privacy benefits outweigh things like that, which should be documented properly anyways
Can you judge a work of art by its virality? Should you judge by virality?
A lot of times in history artists got the recognition they deserved only after their death. When they were alive they lived in poverty, struggling to make ends meet.
There is a lot of internet 1.0 preserved by internet archive that I didn’t get to experience. There are flash games that I would love to preserve and show the next generation.
We wouldn’t have known what Scott Cawthon’s games looked like before he made FNAF if not for the preservation efforts.
Usually those artists did get some recognition during their life, but never got into the mainstream. That changed due to the mainstream changing and the people who did like the art showing it again. That is actually rather easy to do with something like the Fediverse. It just requires a download option. Especially when everybody is aware that the content will be deleted, that would be a decent option.
Also a lot of content on social media in general is very short term. Stuff like political discussions is fairly useless after a few months in most cases. So that can be deleted without much care and again, if somebody wants to preserve it, they can easily just download it.
IPFS?
As I stated in this comment, it’s not really feasible due to the ~5s delay that was measured in a test some time ago.
That’s the wrong comment.
What would an IPFS solution look like here? That’s a genuine question. I don’t have much experience with IPFS. It seems like it isn’t really used outside of blockchain applications.
What is stopping some big giant, let’s say Yahoo/Verizon, from buying a shitload of storage and starting their own private instance which is open to the public, but private in the sense that only Verizon employees are admins and mods? Only Verizon controls things. Then they advertise to the point that the average person on the street knows that Verizon.Lemmy exists, and associates Lemmy with being a Verizon thing. What is stopping big tech from pouring in the money required for this concept to take off, and using their control over their instance to turn the decentralized service into a centralized one in the general public’s minds?
Right now Lemmy is 60k people. Ok. What if Lemmy was 200 million people, and only 60k knew it was a decentralized service? Everyone else just thought Verizon owned Lemmy?
Either they federate and all their users are exposed to the rest of the fediverse, or they don’t and they may as well be a separate thing
Yeah. What I’m saying is, they federate, but people have no idea what “federate” means. So they’d come here, and see “@smeg@feddit.uk” and not understand what feddit.uk was.
They would see you, and think you are a user of the verizon owned service. Not question it one bit, and just move on thinking it’s all verizon.
The same way people in Atlanta will say “I want a coke” “What kind of coke?” “Root Beer”.
Or the same way parents in the 90s would say “I bought you a Nintendo Game!” then you open it, and it’s a Sega Saturn disc, when you have Sony Playstation. It’s all just a Nintendo to them.
I’m saying if Verizon grew Lemmy to 200 million users, and all except 60k were on the Verizon instance, then despite being incorrect, Lemmy becomes “The Verizon owned Facebook”.
Doesn’t matter that it’s federated.
I guess that’s what instances are trying to avoid by preemptively blocking Threads. If everyone else blocks it then Lemmy carries on existing as it is. And I can’t imagine big corpo wouldn’t want to create their own name.
Is file hosting really a must? I mean Reddit and feddit are basically forums. And not many forums allow file uploads. Also, we should have retention limits. Low value posts are allowed to fade away. High value posts that have some level of interaction stay alive longer.
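A retention rule like the one described above could look roughly like this. It is only a sketch: the `Post` fields, the 14-day base lifetime, the one extra day per interaction, and the keep-forever threshold of 100 are all invented for illustration and are not existing Lemmy settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Post:
    created: datetime
    interactions: int  # votes + replies, however an instance chooses to count
    pinned: bool = False


# All thresholds are illustrative assumptions.
BASE_LIFETIME = timedelta(days=14)
EXTRA_PER_INTERACTION = timedelta(days=1)
KEEP_FOREVER_AT = 100


def expires_at(post: Post) -> Optional[datetime]:
    """Return when a post should be deleted, or None to keep it indefinitely."""
    if post.pinned or post.interactions >= KEEP_FOREVER_AT:
        return None
    return post.created + BASE_LIFETIME + EXTRA_PER_INTERACTION * post.interactions


def is_expired(post: Post, now: datetime) -> bool:
    """True once the post has outlived its interaction-adjusted lifetime."""
    deadline = expires_at(post)
    return deadline is not None and now >= deadline
```

Low-value posts fade after the base lifetime, each interaction buys a little more time, and pinned or heavily discussed posts are never purged. A periodic cleanup job would delete the post and its attached media once `is_expired` returns true.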
Reddit is basically entirely image or video posts, all hosted by reddit directly.
I think IPFS could help the fediverse with storage.
To actually keep data persistent on IPFS and not have it deleted by the garbage collector, you need to have one or more servers pin that data.
You either host these servers yourself, or pay providers to store it for you.
And at that point you just reinvented a server simply hosting your data but with extra steps.
Thank you for pointing that out. I’m not familiar with IPFS but I tend to agree there’s no free lunch here. People think you can wave the blockchain wand and free computing appears but there’s always costs built in somewhere.
Interesting approach, good luck! Admittedly I’m not sure if many users want to take their media uploading in their own hands and pay for it but maybe I’m wrong. Where are the images stored? Do you have your own hardware? Backups etc?
Also since you’re interested in Fediverse media storage, I recently read about https://jortage.com/ It’s a third party storage for your instance with deduplication, pretty interesting idea. Takes away a bit of the federated part though
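For anyone wondering what deduplication buys a shared pool, here is a toy content-addressed store. It is not Jortage's actual code; the class name and the choice of SHA-256 as the key are assumptions made for this sketch. The idea is simply that identical bytes hash to the same key, so the same meme federated to fifty instances is stored once.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its key (the SHA-256 hex digest)."""
        key = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing blob untouched, so a re-upload adds nothing.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        """Return the bytes stored under key; raises KeyError if absent."""
        return self._blobs[key]

    def stored_bytes(self) -> int:
        """Total size of the unique blobs held by the store."""
        return sum(len(blob) for blob in self._blobs.values())
```

Uploading the same file twice returns the same key and leaves `stored_bytes()` unchanged, which is where the savings come from when many instances cache the same media.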
This feels like something the Fediverse is ultimately going to build for itself. I know jack squat about the details, but it’s gonna have to be a thing eventually, I think.
Is a p2p system for media with the instances just hosting magnet links too slow for fediverse purposes? To me this seems like the most resilient way to handle media in a decentralized system
I wish there was some version of PBS for Lemmy, like public funds for hosting. I’ll admit I haven’t really thought this through, so there’s probably some problems with my idea.
At least as far as US law is concerned, a federally hosted and administrated social media platform gets interesting with America’s unusually strong free speech laws, since there’s content which is legal but unethical which they likely would not be allowed to block or moderate, such as bullying, hate speech, misinformation, etc. but also illegal content would be immediately moderated away, which might include content that falls into legal grey areas or ethical but technically illegal content, like someone copy/pasting the contents of a paywalled article, or discussing any kind of DRM or digital security bypass
Honestly I think there’s good reason for governments to host a Mastodon instance for their representatives to use for communications, but inviting the public to use it might get weird for sure
Oh yeah, I totally agree with you that governments should at least host their own Mastodon instances. I thought it was weird when Twitter became the go to for communication from the US Government.
Thank you for writing this. Small typo: focued (focused).
Thanks for reading and pointing out that typo! (I fixed it)