28 January 2014
Number of HTTP hits in one week: 75,233
Number of those that are from bots: 30,406
(using the extremely scientific method of matching a regex of “bot”)
Number of the bot hits that are in the pipermail (mailman) directory: 10,592
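For anyone who wants to run the same tally on their own server, here is a rough sketch of how counts like these can be produced. The log path is the one from the grep command further down; the `count_matches` and `report` helpers are made-up names for illustration, and the pipermail pattern assumes Apache's combined log format, where the request path comes before the user agent. Note that matching on "bot" will also catch ordinary requests for /robots.txt, so it overcounts a little.

```shell
# Path taken from the grep command in the post; override with LOG=/some/other/log.
LOG="${LOG:-/var/log/apache2/access.log.1}"

# Hypothetical helper: print how many lines of file $2 match the
# case-insensitive extended regex $1.
count_matches() {
    # grep exits 1 when nothing matches, but still prints the count (0).
    grep -Eic -- "$1" "$2" || true
}

# Hypothetical helper: print total hits, bot hits, and bot hits
# under /pipermail/ for one log file.
report() {
    total=$(wc -l < "$1" | tr -d ' ')
    bots=$(count_matches 'bot' "$1")
    # Assumes the request path appears before the user agent on each line.
    pipermail_bots=$(count_matches '/pipermail/.*bot' "$1")
    echo "total hits:          $total"
    echo "bot hits:            $bots"
    echo "bot hits, pipermail: $pipermail_bots"
    if [ "$total" -gt 0 ]; then
        # Integer percentage, rounded down.
        echo "pipermail bot share: $((100 * pipermail_bots / total))%"
    fi
}

# Only run against the real log if it is readable (it often needs root).
if [ -r "$LOG" ]; then
    report "$LOG"
fi
```

Run it with sudo, or point `LOG` at a copy of the log you can read.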
In other words, about 1/7 of this one computer’s web traffic (by hit count) is bots crawling mailing list archives each week.
To which I say: really? Seriously?
You have all the freakin’ bandwidth in the world (we do not), and that’s how you use it? To see if a post from 2005 has changed? (one of the requested URLs is: /pipermail/conspire/2005-August/001382.html)
Are you freaking mental?
What the hell does that tell you this week that it did not last week?
Anyone want to guess who the biggest offender is?
sudo grep -c 'pipermail.*Googlebot' /var/log/apache2/access.log.1
6,787 hits (a little less than half the total hits from said bot, fwiw)
For a handful of lists that each have, at most, a handful of posts each week.
How many Ph.D.s do you have working there?
And they haven’t figured this out yet?
And you pay them how much?
Well, okay, less than you would have, had CEOs not conspired to suppress engineering salaries. Thanks, guys. Thanks a fucking lot.
That’s about 9% of the hits on this web server last week (6,787 of 75,233). For nothing of any credible gain.
And yeah, I can (and will) use a robots.txt, but I shouldn’t have to use one just to say “don’t waste bandwidth on a thousand hits a day for ancient history that isn’t mutable.” Spot checking would be far saner.
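For reference, a minimal sketch of what that robots.txt could look like, written out by a small shell helper. It assumes the Mailman archives live under /pipermail/ (as in the URL quoted above) and asks every crawler, not just Googlebot, to stay out. The `write_robots_txt` name and the document root argument are illustrative only; the real document root is not stated here.

```shell
# Hypothetical helper: write a robots.txt into the given document root.
write_robots_txt() {
    docroot="$1"
    cat > "$docroot/robots.txt" <<'EOF'
# Ask all well-behaved crawlers to stay out of the Mailman archives.
User-agent: *
Disallow: /pipermail/
EOF
}
```

Call it as `write_robots_txt /path/to/docroot`. Crawlers that honor robots.txt will then skip the archives entirely, which also drops them from search results; that is the trade-off against the spot checking that would have been saner.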