This is very exciting. Here is the APK I downloaded. And the associated discussion.

It even already seems to support stylus input, which is very exciting seeing as there has been talk of porting RNote to Android.

[–] solardirus@slrpnk.net 4 points 1 day ago (2 children)

Tarpits suck. They're not worth the implementation or the overhead. The better strategy is to pretend the server is down with a 503 code, or that the URL is invalid with a 404 code, so the bots stop clinging to your content.
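For illustration, a minimal sketch of that "pretend it's down" approach in Go. The `isSuspectedBot` heuristic is a placeholder I made up, not any particular project's detection logic:

```go
// Sketch: serve a 503 to clients flagged as scrapers so they conclude the
// endpoint is dead and stop retrying. isSuspectedBot is a placeholder for
// whatever heuristic (IP ranges, headers, failed challenges) an operator uses.
package main

import "net/http"

func isSuspectedBot(r *http.Request) bool {
	// placeholder heuristic; a real deployment would check known bot IP
	// ranges, user-agent patterns, or a record of failed challenges
	return false
}

func withBotRejection(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if isSuspectedBot(r) {
			// pretend the service is down; cheap to generate
			http.Error(w, "Service Unavailable", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/", http.FileServer(http.Dir("./public")))
	http.ListenAndServe(":8080", withBotRejection(mux))
}
```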

Also, we already have non-PoW captchas that don't require JavaScript. See go-away for these implementations.

[–] possiblylinux127@lemmy.zip 1 points 18 hours ago (1 children)
[–] solardirus@slrpnk.net 1 points 8 hours ago* (last edited 7 hours ago)

It's actually not that hard. Most of these bots use a predictable scheme of headless browsers with no JS or minimal JS rendering to scrape the web page. Fully deployed browser instances are demonstrably harder to scale and basically impossible to detect without behavioral pattern detection or sophisticated captchas that also cause friction for users.

The problem with bots has never rested solely on detectability. It's about:

A. How much you inconvenience the user to detect them

B. How much you impact good or acceptable bots like archival, curl, custom search tools, and loads of other totally benign use cases.

[–] sxan@midwest.social 1 points 23 hours ago (1 children)

There is negligible server overhead for a tarpit. It can be merely a script that listens on a socket and never replies, or it can reply with Markov-generated HTML at a few characters a second, taking minutes to load a full page. This has almost no overhead. Implementation is adding a link to your page headers and running the script. It's not exactly rocket science.
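For the curious, a minimal sketch of the slow-drip variant in Go, with plain filler text standing in for Markov-generated HTML:

```go
// Sketch of a slow-drip tarpit: hold the connection open and emit a trickle
// of HTML. A real tarpit might generate Markov-chain text; here it's just a
// repeated placeholder string.
package main

import (
	"net/http"
	"time"
)

func tarpit(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		return
	}
	w.Header().Set("Content-Type", "text/html")
	for i := 0; i < 600; i++ { // drip for up to ~10 minutes
		if _, err := w.Write([]byte("<p>lorem ipsum</p>\n")); err != nil {
			return // client gave up
		}
		flusher.Flush()
		time.Sleep(time.Second)
	}
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(tarpit))
}
```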

Which part of that is overhead, or difficult?

[–] solardirus@slrpnk.net 1 points 8 hours ago* (last edited 7 hours ago)

It certainly is not negligible compared to static site delivery, which can be breezily cached, unlike on-the-fly tarpits. Even traditional static sites are sometimes getting their asses kicked by these bots. And you want to make that worse by having the server generate text with Markov chains for each request? The point for most is reducing the sheer bandwidth and CPU cycles being eaten up by these bots hitting every endpoint.

Many of these bots are designed to stop hitting endpoints once those endpoints return codes signalling they've been flattened.

Tarpits only make sense from the perspective of someone trying to cause monetary harm to an otherwise uncaring VC-funded mob with nigh endless amounts of cash to burn. Chances are your middling attempt at causing them friction isn't, on its own, actually going to get them to leave you alone.

Meanwhile you burn significant amounts of resources and traffic is still stalled for normal users. This is not the kind of method a server operator who actually wants a dependable service deploys to get up and running again. You want the bots to hit nothing even slightly expensive (read: preferably something minimal you can cache or mostly cache) and to never come back.

A compromise between these two things is what Anubis is doing. It inflicts maximum pain (on those attempting to bypass it; otherwise it just fails) for minimal cost by creating a small seed (more trivial than even a Markov chain -- it's literally just a SHA-256 hash) that a client then has to solve a challenge based on. It's nice, but certainly not my preference: I like go-away because it leverages browser APIs these headless agents don't use (and subsequently lets JS-less browsers work) for this class of problem. Then, if you have a record of known misbehavers (their IP ranges, etc.), or some other scheme to keep track of failed challenges, you hit them with fake server-down errors.
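Roughly, the proof-of-work idea looks like this (a hedged sketch, not Anubis's actual protocol; the seed handling and difficulty here are made up for illustration):

```go
// Sketch of an SHA-256 proof-of-work challenge: the server hands out a random
// seed and a difficulty, the client brute-forces a nonce, and the server
// verifies the result with a single hash.
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the zero bits at the start of a hash.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// solve is the client's side: grind nonces until the hash is hard enough.
func solve(seed []byte, difficulty int) uint64 {
	buf := make([]byte, len(seed)+8)
	copy(buf, seed)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(buf[len(seed):], nonce)
		if leadingZeroBits(sha256.Sum256(buf)) >= difficulty {
			return nonce
		}
	}
}

// verify is the server's side: one hash, nearly free.
func verify(seed []byte, difficulty int, nonce uint64) bool {
	buf := make([]byte, len(seed)+8)
	copy(buf, seed)
	binary.BigEndian.PutUint64(buf[len(seed):], nonce)
	return leadingZeroBits(sha256.Sum256(buf)) >= difficulty
}

func main() {
	seed := []byte("random-per-request-seed")
	nonce := solve(seed, 16) // roughly 2^16 hashes of work for the client
	fmt.Println("nonce:", nonce, "valid:", verify(seed, 16, nonce))
}
```

The asymmetry is the whole point: the client pays for many hashes, the server pays for one.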

Markov chains and slow-loading sites are costing you material just to cost them more material.