form spam check class

Formspamcheck PHP Class.

Sept 6th, 2011

The phpList site was hit quite heavily by spam signups, so I looked into what is available on the internet to do something about it. I found three services that can help fight this kind of activity: Stop Forum Spam, Honeypot and Akismet. I decided to write a class that uses all of them.

You can find the source here: FormspamCheck

The class is documented with phpDocumentor, and you can read it online.

There's no need to use all three services; one or two will work as well. When a service is not configured, it won't be used.

Quickest use is like this:

<?php

// API keys for the services you want to use
$honeypotApiKey = 'My API key for Honeypot Project';
$akismetApiKey = 'My API key for Akismet';
$akismetBlogURL = 'http://www.example.com';

include 'formspamcheck.class.php';

$fsc = new FormSpamCheck($honeypotApiKey, $akismetApiKey, $akismetBlogURL);

// Pass in the submitted form fields that should be checked
if ($fsc->isSpam(
  array(
    'username' => 'someusername',
    'email' => ''
  )
)) {
  print "This is spam";
} else {
  print "This is ham";
}


 

Notes:

1. If you want to use this class, make sure you only make your calls on a POST request in your application. I initially made the mistake of doing it on every request, which overloaded the Stop Forum Spam API. They have a limit of 20,000 calls per day, and by noon I had reached it. When the call is made only on a POST, the number of API calls is just a few hundred per day.
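A minimal sketch of that POST-only guard follows. The helper name shouldCheckForSpam and the form field names are assumptions made for illustration; they are not part of the class. $fsc is assumed to be the FormSpamCheck instance from the quick example above.

```php
<?php
// Hypothetical helper (not part of FormSpamCheck): decide whether the
// current request should trigger the external spam lookups at all.
function shouldCheckForSpam(array $server): bool
{
    // Only form submissions (POST) are worth an API call; plain page
    // views (GET, HEAD, ...) are skipped to stay within the daily quota.
    return isset($server['REQUEST_METHOD'])
        && strtoupper($server['REQUEST_METHOD']) === 'POST';
}

// Usage sketch: $fsc is assumed to have been created as in the quick
// example; the 'username' and 'email' form fields are assumptions.
if (shouldCheckForSpam($_SERVER) && isset($fsc)) {
    $data = array(
        'username' => isset($_POST['username']) ? $_POST['username'] : '',
        'email'    => isset($_POST['email']) ? $_POST['email'] : '',
    );
    if ($fsc->isSpam($data)) {
        print "This is spam";
    }
}
```

Passing the $_SERVER array in as an argument keeps the check easy to try out with a hand-made array.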

2. I've applied this class in both the phpList forums and the phpList Hosted signups. The class has a lot of logging built in, which is then used to graph the activity with Munin. If you're interested in the Munin plugins as well, let me know.

Here's an example. This graph shows the activity in the forums and the filtering applied by using this class and calling all three services. As you can see, over 50% of the activity in the forums is spam.

graph: forum spam

 

3. When using the class, you can do a comparative analysis of the blocking of spam by any of the three services. In my case, StopForumSpam (SFS) filters most. Occasionally Honeypot (HP) and Akismet (AKI) contribute, but most often a hit in SFS has misses in HP and AKI, and only rarely is a hit in HP or AKI a miss in SFS.

However, that may mean that SFS is filtering out more than necessary and is less lenient towards IPs and usernames. For example, SFS won't allow anyone to sign up with the name "Ron", which you can verify with this API call. I don't think that's a big problem, because they can just register with a different name, but it depends on the context.

But it highlights the need to be specific with the class:

4. In an application you can ask the class whether a request is spam, and then retrieve a few more details about the match.

But whatever you do, it is best not to make the spammers any wiser as to why you are blocking them.

$class->matchedBy will return which service considered it spam.

$class->matchedOn will return which field was used to determine that. This will be "ip", "username", "email" or "unknown". This information is only available from SFS; HP only checks the IP. Akismet does not reveal what it used to determine that something is spam (I asked, but they won't say).

So, if you do this, you may want to do something like this:

if ($fsc->isSpam($data)) {
  switch ($fsc->matchedOn) {
    case 'username': return "that name is already taken, please choose another one";
    case 'email': return "this email has already been registered";
    default: return "error processing your request, please try again later";
  }
}

5. The two main issues with a system like this are false positives and performance. For performance reasons, I've added memcached support and minimised the calls to the APIs, because each call causes a delay. I'm tracking the delay and will add a graph for it. With all three services in use, the average looks to be well below half a second, with the occasional call taking more than a second.

If you want to improve performance, use the "isSpam" call without the "checkAll" option. That will minimise the calls to the services. I have checkAll on so that I can graph a comparison between them.

As for false positives, I've kept an eye out for them and haven't seen any yet. That doesn't mean there aren't any. In many cases, when someone hits a wall, they'll just walk away and won't tell you about it. I'll try to see if there's a way to measure this.
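To illustrate the caching idea from note 5, here is a rough sketch, not the class's actual implementation. A plain array stands in for memcached so the example runs anywhere, and the function name cachedSpamLookup, the $lookup callback and the demo rule are all made up for illustration.

```php
<?php
// Sketch only: an array stands in for memcached, and $lookup stands in
// for a remote API call. Neither name comes from the FormSpamCheck class.
function cachedSpamLookup(array &$cache, string $key, callable $lookup): bool
{
    if (array_key_exists($key, $cache)) {
        return $cache[$key];          // cache hit: no API call, no delay
    }
    $result = (bool) $lookup($key);   // cache miss: one (slow) remote call
    $cache[$key] = $result;           // remember the verdict for next time
    return $result;
}

$cache = array();
$calls = 0;
$lookup = function ($ip) use (&$calls) {
    $calls++;                         // count how often the "API" is hit
    return $ip === '192.0.2.1';       // made-up rule for this demo only
};

cachedSpamLookup($cache, '192.0.2.1', $lookup);
cachedSpamLookup($cache, '192.0.2.1', $lookup);
echo $calls, "\n";                    // prints 1: the second lookup was cached
```

With a real memcached backend the array would be replaced by get and set calls with an expiry time, so that old verdicts are eventually checked again.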

Results

It looks like in general Stop Forum Spam catches most spam, but Akismet and Honeypot contribute as well, and therefore the combination of the three is the best. 
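The kind of comparison behind this conclusion can be sketched as a simple tally. The log format below (one entry per checked request, with a flag per service) is hypothetical and the sample entries are made up; the class's own logging looks different.

```php
<?php
// Hypothetical log format: one entry per checked request, recording which
// services flagged it. This is an illustration, not the class's log.
function tallyServiceHits(array $log): array
{
    $tally = array('SFS' => 0, 'HP' => 0, 'AKI' => 0,
                   'onlySFS' => 0, 'missedBySFS' => 0);
    foreach ($log as $entry) {
        $sfs = !empty($entry['SFS']);
        $hp  = !empty($entry['HP']);
        $aki = !empty($entry['AKI']);
        if ($sfs) { $tally['SFS']++; }
        if ($hp)  { $tally['HP']++; }
        if ($aki) { $tally['AKI']++; }
        // A hit in SFS that both other services missed
        if ($sfs && !$hp && !$aki) { $tally['onlySFS']++; }
        // A hit in HP or AKI that SFS missed
        if (!$sfs && ($hp || $aki)) { $tally['missedBySFS']++; }
    }
    return $tally;
}

// Made-up sample entries, purely to show the shape of the input
$log = array(
    array('SFS' => true,  'HP' => false, 'AKI' => false),
    array('SFS' => true,  'HP' => true,  'AKI' => false),
    array('SFS' => false, 'HP' => false, 'AKI' => true),
);
print_r(tallyServiceHits($log));
```

A non-zero "missedBySFS" count is what justifies keeping the other two services switched on.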

 

Here's the relevant bit of the graph of registered users on the site, showing the spam attack. In a few months' time, I'll post an update, which should show a return to the original straight line of signups.

(You can interpolate the gaps; they were caused by moving the Munin server around.)

Within the time span of a few weeks, over 10% of registered users were spam accounts.


graph: registered users until Sep 2011.

Continued on One Year Later

 
