this post was submitted on 17 Aug 2025
698 points (99.7% liked)
Technology
403 readers
419 users here now
Share interesting Technology news and links.
Rules:
- No paywalled sites at all.
- News articles has to be recent, not older than 2 weeks (14 days).
- No videos.
- Post only direct links.
To encourage more original sources and keep this space commercial free as much as I could, the following websites are Blacklisted:
- Al Jazeera;
- NBC;
- CNBC;
- Substack;
- Tom's Hardware;
- ZDNet;
- TechSpot;
- Ars Technica;
- Vox Media outlets, with exception for Axios;
- Engadget;
- TechCrunch;
- Gizmodo;
- Futurism;
- PCWorld;
- ComputerWorld;
- Mashable;
- Hackaday;
- WCCFTECH;
- Neowin.
More sites will be added to the blacklist as needed.
Encouraged:
- Archive links in the body of the post.
- Linking to the direct source, instead of linking to an article talking about the source.
founded 3 months ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
If the site is getting slowed at times (regardless of whether it is when you scrape), you might want to not scrape at all.
Probably not a good idea to download the whole site, but then that depends upon the site.
As far as I know, the website doesn't have an API but I just download the HTML and format the result with a simple Python script, it makes around 10 to 20 requests, one for each series I'm following at each time.
You can use the cache feature in curl/wget so it does not download the same css, html, twice. Also, can ignore JavaScript, and image files to save on unnecessary requests.
I would reduce the frequency to once every two days to further reduce the impact.
That might/might not be much.
Depends upon the site, I'd say.
e.g. If it's something like Netflix, I wouldn't think much, because they have the means to serve the requests.
But for some PeerTube instance, even a single request seems to be too heavy for them. So if that server does not respond to my request, I usually wait for an hour or so before refreshing the page.