this post was submitted on 26 Jul 2025
243 points (98.4% liked)
Linux
8652 readers
472 users here now
A community for everything relating to the GNU/Linux operating system (except the memes!)
Also, check out:
Original icon base courtesy of lewing@isc.tamu.edu and The GIMP
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
There is negligible server overhead for a tarpit. It can be merely a script that listens on a socket and never replies, or it can reply with markov-generated html with a few characters a second, taking minutes to load a full page. This has almost no overhead. Implementation is adding a link to your page headers and running the script. It's not exactly rocket science.
Which part of that is overhead, or difficult?
It certainly is not negligble compared to static site delivery which can breezily be cached compared to on-the-fly tarpits. Even traditional static sites are getting their asses kicked sometimes by these bots. And yoy want to make that worse by having the server generate text with markov chains for each request? The point for most is reducing the sheer bandwidth and cpu cycles being eating up by these bots hitting every endpoint.
Many of these bots are designed to stop hitting endpoints when they return codes that signal they've flattened it.
Tarpits only make sense from the perspective of someone trying to cause monetary harm to an otherwise uncaring VC funded mob with nigh endless amounts of cache to burn. Chances are your middling attempt at causing them friction isn't going to, alone, actually get them to leave you.
Meanwhile you burn significant amounts of resources and traffic is still stalled for normal users. This is not the kind of method a server operator actually wanting a dependable service is deploying to try to get up and running gain. You want the bots to hit nothing even slightly expensive (read: preferably something minimal you can cache or mostly cache) and to never come back.
A compromise between these two things is what Anubis is doing. It inflicts maximum pain (on those attempting to bypass it - otheriwse it just fails) for minimal cost by creating a small seed (more trivial than even a markov chain -- it's literally just an sha256) that a client then has to solve a challenge based on. It's nice, but certainly not my preference: I like go-away because it leverages browser apis these headless agents dont use (and subsequnetly let's js-less browsers work) in this kind of field of problems. Then, if you have a record of known misbehavers (their ip ranges, etc), or some other scheme to keeo track of failed challeneges, you hit them with fake server down errors.
Markov chains and slow loading sites are costing you material just to cost them more material.