this post was submitted on 17 Aug 2025
698 points (99.7% liked)
Technology
Share interesting Technology news and links.
Rules:
- No paywalled sites at all.
- News articles have to be recent, no older than 2 weeks (14 days).
- No videos.
- Post only direct links.
To encourage original sources and keep this space as commercial-free as I can, the following websites are blacklisted:
- Al Jazeera;
- NBC;
- CNBC;
- Substack;
- Tom's Hardware;
- ZDNet;
- TechSpot;
- Ars Technica;
- Vox Media outlets, with exception for Axios;
- Engadget;
- TechCrunch;
- Gizmodo;
- Futurism;
- PCWorld;
- ComputerWorld;
- Mashable;
- Hackaday;
- WCCFTECH;
- Neowin.
More sites will be added to the blacklist as needed.
Encouraged:
- Archive links in the body of the post.
- Linking to the direct source, instead of linking to an article talking about the source.
you are viewing a single comment's thread
view the rest of the comments
If someone just wants to download code from Codeberg for training, it seems like it'd be way more efficient to clone the git repositories, or even just download tarballs of the most recent releases of software hosted on Codeberg, than to touch the Web UI at all.
I mean, maybe you need the Web UI to get a list of git repos, but I'd think that that'd be about it.
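In practice you wouldn't even need the Web UI for that: Codeberg runs Forgejo, and Gitea-style forges expose a repository search endpoint. Here's a rough sketch of what that could look like (the `/api/v1/repos/search` endpoint, its paging parameters, and the `clone_url` field are assumptions based on the standard Gitea API, not anything Codeberg-specific I've verified):

```python
#!/usr/bin/env python3
"""Rough sketch: enumerate public repositories on Codeberg through the
Forgejo/Gitea REST API and clone them with plain git, instead of
scraping the web UI. Endpoint and field names follow the standard
Gitea-style API, so treat them as assumptions about Codeberg's setup."""

import json
import os
import subprocess
import urllib.request

API = "https://codeberg.org/api/v1/repos/search"
PAGE_SIZE = 50  # assumed page size; the server may cap it lower


def list_clone_urls(max_pages=1):
    """Yield clone URLs for public repos, one API page at a time."""
    for page in range(1, max_pages + 1):
        url = f"{API}?page={page}&limit={PAGE_SIZE}"
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
        for repo in payload.get("data", []):
            yield repo["clone_url"]


def shallow_clone(clone_url, dest_dir="repos"):
    """A --depth 1 clone is enough if you only want the latest code."""
    os.makedirs(dest_dir, exist_ok=True)
    subprocess.run(["git", "clone", "--depth", "1", clone_url],
                   cwd=dest_dir, check=True)


if __name__ == "__main__":
    for clone_url in list_clone_urls():
        print(clone_url)  # or shallow_clone(clone_url)
```

That's one HTTP request per page of listings plus one git fetch per repo, versus crawling every rendered HTML page of every repo.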
Then they'd have to bother understanding the content and downloading it as appropriate. And you'd think that if anyone could understand and parse websites in real time to make download decisions, it'd be the giant AI companies. But ironically, they're only interested in hoovering up everything as plain web pages to feed into their raw training data.
The same morons scrape Wikipedia instead of downloading the archive files, which can trivially be rendered as web pages locally.
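For anyone curious, the pages-articles dump really is trivial to work with; here's a rough sketch using only the standard library (the filename assumes the usual dumps.wikimedia.org export, and the tag names come from the MediaWiki XML export schema):

```python
#!/usr/bin/env python3
"""Rough sketch: stream a Wikipedia pages-articles dump straight from the
bz2 file and pull out (title, wikitext) pairs, no scraping involved.
Assumes the standard enwiki-latest-pages-articles.xml.bz2 export from
dumps.wikimedia.org."""

import bz2
import xml.etree.ElementTree as ET

DUMP = "enwiki-latest-pages-articles.xml.bz2"


def iter_pages(path):
    """Yield (title, wikitext) for every page in the dump."""
    with bz2.open(path, "rb") as fh:
        title = None
        for _event, elem in ET.iterparse(fh, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace
            if tag == "title":
                title = elem.text
            elif tag == "text":
                yield title, elem.text or ""
            elif tag == "page":
                elem.clear()  # release the finished page subtree


if __name__ == "__main__":
    for i, (title, text) in enumerate(iter_pages(DUMP)):
        print(f"{title}: {len(text)} chars of wikitext")
        if i >= 9:
            break
```

That's the whole encyclopedia as structured data in one download, no HTML to untangle.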