Referrer and Comment spammers are a PITA.

This shouldn’t be news to anyone, but referrer and comment spammers are a real pain in the a*se: they pollute my web logs and make any meaningful log analysis problematic.

So, I now have an itch to scratch and I’m going to do something about it. I would encourage you, the reader, to do something about it too.

Firstly, get yourself over to Project Honey Pot and read up on the project. If you can, set up a Honey Pot or two yourself. Also be sure to read about the http:BL, which works along similar lines to the DNS blacklists used against email spammers.
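To make the DNS blacklist comparison concrete, here is a minimal Python sketch (not the mod_perl module this post is about) of how an http:BL query name is put together and how a reply can be decoded. It assumes the query format and reply layout published in the http:BL API documentation; the access key `abcdefghijkl` and the function names are hypothetical placeholders.

```python
import ipaddress
from typing import NamedTuple, Tuple

# Zone queried by http:BL clients, per the http:BL API documentation.
HTTPBL_ZONE = "dnsbl.httpbl.org"

# Bit flags carried in the last octet of a reply (0 means "search engine").
VISITOR_TYPES = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


class HttpBLResult(NamedTuple):
    days_since_last_activity: int
    threat_score: int
    visitor_types: Tuple[str, ...]


def httpbl_query_name(access_key: str, ip: str) -> str:
    """Build the DNS name to look up: <key>.<reversed IPv4 octets>.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for a bad address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{access_key}.{reversed_octets}.{HTTPBL_ZONE}"


def decode_httpbl_reply(reply: str) -> HttpBLResult:
    """Split a reply such as '127.22.8.1' into its documented fields."""
    first, days, threat, type_mask = ipaddress.IPv4Address(reply).packed
    if first != 127:
        raise ValueError(f"not an http:BL reply: {reply}")
    if type_mask == 0:
        # For search engines the third octet is an engine id, not a score.
        labels: Tuple[str, ...] = ("search engine",)
    else:
        labels = tuple(
            name for bit, name in VISITOR_TYPES.items() if type_mask & bit
        )
    return HttpBLResult(days, threat, labels)
```

The name returned by `httpbl_query_name` would then be resolved with an ordinary A-record lookup (for example `socket.gethostbyname`), and a failed lookup means the address is not listed. The sketch only splits the reply into its fields; it does not try to reproduce whatever score a particular module chooses to log.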

Next, I’m going to write a general Apache mod_perl module which will provide integration (lookup) with the http:BL and allow the user to “action”* the abusers. At a minimum, it will keep the normal Apache log files from being polluted by diverting the offending log entries to a separate httpbl log file.

* “action” – To provide flexibility, I’m thinking of running an external script with the IP of the abuser.  The script can then perform any action you wish. The one I’m going for is an iptables firewall block.
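As a rough illustration of the "external script" idea, the following Python sketch hands an abuser's IP to a configurable command. It is only a sketch under assumptions: the function name `run_action` and the timeout are hypothetical, and the real module is intended to be mod_perl, not Python. The address is validated first and passed as a separate argument (no shell involved), so a malformed value cannot smuggle extra commands into the script.

```python
import ipaddress
import subprocess
import sys
from typing import Sequence


def run_action(command: Sequence[str], ip: str, timeout: float = 10.0) -> int:
    """Run the configured action command with the abuser's IP as last argument.

    `command` is whatever the administrator configured (hypothetical here).
    Returns the command's exit status.
    """
    # Reject anything that is not a plain IPv4/IPv6 address (raises ValueError).
    addr = ipaddress.ip_address(ip)

    # Argument-list form: no shell is started, so nothing is interpolated.
    result = subprocess.run([*command, str(addr)], timeout=timeout, check=False)

    if result.returncode != 0:
        print(f"action for {addr} exited with {result.returncode}", file=sys.stderr)
    return result.returncode
```

A call might look like `run_action(["/usr/local/sbin/httpbl-block"], "122.36.165.202")`, where the script path is a made-up example. What the script does with the address (a firewall rule, an email, a database row) is entirely up to whoever writes it.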

Comments and suggestions welcome.

Project Honey Pot has implementations for several languages, including PHP and Perl (the languages that mean most to me).  There may be an implementation for your Web application so you might not be interested in what I’m doing at all 🙂


3 Replies to “Referrer and Comment spammers are a PITA.”

  1. Oh the irony…. I approved this “spam” comment to prove the point (after editing “jade’s” link to not work).

    Posting IP: 122.36.165.202

    My module log shows:
    [Tue Nov 30 05:50:13 2010] HTTPBL: 122.36.165.202 (8) “127.22.8.1” “www.pgregg.com” “/mt/mt-comments.cgi” “http://pgregg.com/blog/2010/11/referrer-and-comment-spammers-are-a-pita.html” [HTTPBL:8]

    This would have been caught, and when I get the firewalling going, stopped.

  2. I have victory in phases 1 and 2.

    Phase 1: Implement a http:BL module to identify the “bad guy” and divert the log entry into a httpbl.log instead of access_log

    Phase 2: I wanted to automatically firewall the malicious IP address, but the apache UID is unable to use iptables (it would seem prudent for iptables to allow a specific CHAIN to be created and permit another UID to control it; shame no one has done it yet). My options then became some form of message passing to a root daemon, or suid-root. I was happy with neither, so I implemented an apache level firewall via another mod_perl plugin 🙂

    Logs:
    ==> httpbl.log <==
    [Tue Nov 30 23:36:22 2010] httpBL: 113.22.131.111 (13) "127.48.15.1" "www.pgregg.com" "/projects/php/preg_find/preg_find.phps" "http://www.dslreports.com/forum/r19430990-PHP-link-generator" [HTTPBL:13]
    [Tue Nov 30 23:36:23 2010] httpBL: 113.22.131.111 (13) "127.48.15.1" "www.pgregg.com" "/favicon.ico" "" [HTTPBL:13]
    [Tue Nov 30 23:36:24 2010] httpBL: 113.22.131.111 (13) "127.48.15.1" "-" "-" "-" [HTTPBL:13]
    ==> error_log <==
    IP 113.22.131.111 is blocked
    ==> httpbl.log <==
    [Tue Nov 30 23:36:26 2010] httpBL: 113.22.131.111 (13) "127.48.15.1" "-" "-" "-" [HTTPBL:13]

    Notice that none of these made it to the normal apache access_log. You also tend to get 3-4 simultaneous connections from clients so it is possible that you don't have time to implement the firewall since the other connections are running in parallel (and the firewall plugin happens right at the connection handling stage). However, here we can see the firewall kick in and catch the last one.

    This IP will now be firewalled for (13) days (the score) after which time this firewall will be removed (and can be recreated by the logging plugin if necessary).
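The "block for as many days as the score, then let it lapse" behaviour described in the second reply can be sketched in a few lines of Python. This is an in-memory illustration only, with hypothetical names (`ExpiringBlockList`, `block`, `is_blocked`); it is not the mod_perl plugin itself, and a real Apache setup with several worker processes would need the state kept somewhere shared, which this sketch ignores.

```python
import time
from typing import Dict, Optional

SECONDS_PER_DAY = 86400


class ExpiringBlockList:
    """Remember blocked IPs together with the time their block runs out."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}

    def block(self, ip: str, days: int, now: Optional[float] = None) -> float:
        """Block `ip` for `days` days; return the expiry timestamp."""
        now = time.time() if now is None else now
        expires = now + days * SECONDS_PER_DAY
        # If the IP is already blocked, keep whichever expiry is later.
        self._expiry[ip] = max(expires, self._expiry.get(ip, 0.0))
        return self._expiry[ip]

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        """Check at connection time; expired entries are dropped lazily."""
        now = time.time() if now is None else now
        expires = self._expiry.get(ip)
        if expires is None:
            return False
        if now >= expires:
            del self._expiry[ip]  # block has lapsed; it can be recreated later
            return False
        return True
```

With this shape, the logging side would call `block("113.22.131.111", 13)` when it sees a listed address, and the connection-stage check would call `is_blocked(...)` before doing any other work. Requests already in flight on parallel connections would still get through, which matches what the log excerpt shows.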


All content © Paul Gregg, 1994 - 2024
This site http://pgregg.com has been online since 5th October 2000
Previous websites live at various URLs since 1994