this post was submitted on 26 Jun 2023
22 points (95.8% liked)

Fediverse

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".


I recently came across a torrent that seems to be an archive of Reddit. It got me wondering whether it would be possible to make it locally browsable. It also occurred to me that someone might already have solved this by creating a public Lemmy instance, which would make the content accessible from any federated instance.

top 10 comments
[–] qprimed@lemmy.ml 13 points 2 years ago

I can actually see some merit to a lemmy API accessible reddit corpus. it would be interesting to reference old reddit info in a lemmy compatible way with zero reference to reddit itself.

doing so for the entire corpus properly (link fixups, etc) would be... challenging, but doable.
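To give a rough idea of what a "link fixup" could look like, here is a minimal Python sketch. It only rewrites absolute links to Reddit threads so that they point at an archive host instead. The base URL `https://archive.example`, the `/c/<community>/post/<id>` path layout, and the `fix_links` helper are all assumptions made up for illustration; they are not Lemmy's actual URL scheme or API, and a real migration would have to handle many more cases (relative links, user mentions, short links, trailing punctuation, and so on).

```python
import re

# Hypothetical base URL of the instance hosting the archive (an assumption).
ARCHIVE_BASE = "https://archive.example"

# Matches absolute links to Reddit threads on the www., old., or bare domain.
# The trailing class consumes the rest of the URL (title slug, query string)
# up to whitespace or a closing bracket, so markdown links stay intact.
REDDIT_LINK = re.compile(
    r"https?://(?:www\.|old\.)?reddit\.com"
    r"/r/(?P<sub>\w+)/comments/(?P<post>\w+)[^\s)\]]*"
)


def fix_links(text: str) -> str:
    """Rewrite Reddit thread links so they point at the archive host."""

    def replace(match: re.Match) -> str:
        # Hypothetical path layout; adjust to whatever the archive uses.
        return f"{ARCHIVE_BASE}/c/{match.group('sub')}/post/{match.group('post')}"

    return REDDIT_LINK.sub(replace, text)


if __name__ == "__main__":
    sample = "see https://www.reddit.com/r/linux/comments/abc123/some_title/ for details"
    print(fix_links(sample))
```

Text without Reddit links passes through unchanged, so the same function could be run over every post and comment body in the corpus.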

[–] nydas@lemmy.world 8 points 2 years ago

Letting the LLMs source from there for free, completely invalidating the proposed licensing model at Reddit.

[–] graphito@beehaw.org 6 points 2 years ago (1 children)
[–] vamp07@lemm.ee 5 points 2 years ago

I’m at a loss as to why anybody would want this in the first place.

[–] RagingNerdoholic@lemmy.ca 1 points 2 years ago* (last edited 2 years ago) (1 children)

Am I really going to buy a 2TB drive to hold all of reddit...

Actually, I'm pretty surprised that it's only 2TB.

Edit: and it looks like it's only captured data up until about six months ago.

[–] InternetPirate@lemmy.fmhy.ml 2 points 2 years ago (2 children)

It would be helpful if there were an instance that migrated all of this to Lemmy so that we could access it from any other instance, instead of having to download it for local browsing.

[–] taladar@sh.itjust.works 1 points 2 years ago* (last edited 2 years ago) (1 children)

I don't really see how this would be useful. Having purely archived data available in software designed to show you new posts feels like a format/content mismatch.

[–] InternetPirate@lemmy.fmhy.ml 1 points 2 years ago

I find Reddit more useful because of all the data it has than because of the new posts. I'm sure I'm not the only one.

[–] RagingNerdoholic@lemmy.ca 1 points 2 years ago

I haven't downloaded it. Looks like a collection of compressed files, but I don't know exactly what's inside of them. Do you know what format they're in?
