How to protect your website against bots and spammers (part 1)

In this tutorial I'll demonstrate how you can use very simple techniques, built with HTML and PHP, to help combat bots and spammers. The techniques proposed here are not award-winning methods, but they can definitely help in certain scenarios.

Let us first look at how spammers and bots work, how they are developed, and why they are so effective.
Botting is all about abusing a website by automating some or all of its activities in a completely autonomous way. The best way to illustrate this is with a simple example. For the demo I signed up for a new account on Reddit, because I've heard people are developing bots to upvote their own questions and/or comments.

Next up, I launched Fiddler (developed by a company named Telerik), which is essentially an HTTP(S) proxy logger. It can capture all the HTTP traffic you generate (from your web browser or other software). First things first, I picked a random topic on Reddit and clicked the "upvote" button next to the title (the orange up-arrow):


Before I did this, I made sure Fiddler was running and HTTPS traffic was being captured (since Reddit uses SSL/TLS for web traffic).
If you do not know what the HTTP protocol is, I would advise you to learn the very basics first; otherwise the rest of this tutorial will be a pain to follow and understand.

The "upvote" click resulted in the following HTTP request details (in raw format):

I have blurred out all irrelevant/private information.
The most important part is the URL of the request: as you can see, it makes a call to the endpoint /api/vote?dir=1&id=t3_TOPIC&sr=CATEGORY

Further, looking inside the body of the request, we see the same id=t3_TOPIC parameter (TOPIC corresponds to the Reddit topic ID, obtained from the Reddit page above).

A spammer/botter's mission is to write a function (in his/her favorite programming language) that reproduces these HTTP requests; all he/she then has to do is provide a list of URLs from which the TOPIC IDs can be extracted. As easy as that, you can upvote all of your comments and dominate Reddit.
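To make this concrete, here is a hypothetical sketch of such a function, rebuilt only from the request details visible above. The session cookies and the hidden anti-abuse values (the blurred lines) are omitted, so this exact code would be rejected by Reddit; it merely illustrates how easily the request itself can be reproduced.

```php
<?php
// Hypothetical sketch of a botter's "upvote" function, reconstructed
// from the captured request. Cookies and Reddit's hidden anti-abuse
// headers are deliberately left out.
function upvote(string $topicId): string|false
{
    $ch = curl_init('https://www.reddit.com/api/vote');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_RETURNTRANSFER => true,
        // Same body as seen in Fiddler: dir=1 means "upvote".
        CURLOPT_POSTFIELDS     => http_build_query([
            'dir' => 1,
            'id'  => 't3_' . $topicId,
        ]),
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

Feed this function a list of topic IDs and you have a working upvote bot in a dozen lines, which is exactly why such requests need server-side protection.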

Fortunately, Reddit has some other protection mechanisms; some of these are hidden in the blurred lines in the image above. But if you invest a couple of hours (or less) figuring out what they are and how they are generated, you can easily bypass that protection too.

Protection by means of randomizing parameters

Now that you know how a botter develops automation tools (aka bots), you should also know that those parameters (in the HTTP request body) will most likely be hard coded into the software, because a website is very unlikely to change them.

We can use this fact to combat spammers and bots by developing a system that randomizes the parameters. Randomization can occur after a certain time, or for every request/user. The purpose is to annoy the bot developers and make their lives immensely hard, until they finally decide to quit/abandon botting.

To illustrate how this can be done, I have prepared a very basic demo for you.
I've created a form with two textboxes and a submit button, a common setup for a contact form. It looks like the image below (left), and the HTML code is shown on the right:
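For reference, here is a minimal reconstruction of that form. The field names NAMO and SUBJ are taken from the captured request shown further below; the action URL contact.php is an assumption for the demo.

```html
<!-- Minimal reconstruction of the demo contact form.
     Field names match the parameters captured later in Fiddler;
     the action URL is a placeholder. -->
<form action="contact.php" method="POST">
    <input type="text" name="NAMO" placeholder="Name">
    <input type="text" name="SUBJ" placeholder="Subject">
    <input type="submit" value="Submit">
</form>
```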


Let's now fill in the fields and click the Submit button. As you'll notice, I am using Fiddler here as well:


Notice the body of the HTTP request message. It contains two parameters (NAMO and SUBJ), which correspond to the two fields as defined by the 'name' attribute in the HTML code.

If a spammer wishes to develop a bot/tool to abuse this contact form, he/she will hard code both parameters and assign them whatever values he/she wishes. Submitting a POST request to the correct endpoint URL will then send an email to the webmaster (or do whatever else is programmed).

How can we protect contact forms, buttons and all other web based actions?

Have a look at this HTML+PHP snippet:


You will see that the PHP code at the very top defines a variable "RANDOM_FIRST_NAME". This variable is the concatenation of the current hour of the day and the suffix '__NAMO'. When the time is 13:55 (1:55 pm), the field will have the name '13__NAMO'.

The second piece of PHP code (further down in the image) checks whether a POST request was received, and then whether the correct name for the name field was used. If the correct name is not present, it throws the error "something went wrong".
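A sketch of that snippet, following the description above, could look like this (the variable and field names come from the text; the rest of the page structure is an assumption):

```php
<?php
// Sketch of the randomized-parameter technique described above:
// the field name changes every hour, so hard-coded names go stale.
$RANDOM_FIRST_NAME = date('H') . '__NAMO';   // e.g. "13__NAMO" at 1:55 pm

if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    // Recompute the expected field name; a request built with last
    // hour's name will not contain it.
    if (!isset($_POST[date('H') . '__NAMO'])) {
        die('something went wrong');
    }
    // ... handle the submitted value, e.g. mail() it to the webmaster ...
}
?>
<form method="POST">
    <input type="text" name="<?php echo htmlspecialchars($RANDOM_FIRST_NAME); ?>">
    <input type="submit" value="Submit">
</form>
```

One caveat of this hourly scheme: a legitimate user who loads the form at 13:59 and submits at 14:01 will also hit the error, so in practice you may want to accept the previous hour's name as well.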

The beauty of this method is that it can be quite burdensome for bot developers. Assume you are developing a bot that reproduces the HTTP requests of our contact form, and you hard code the request parameters as captured at a certain point in time, say 13:55 (1:55 pm) your local time; then all parameter names will have the prefix "13__". Five minutes later, the clock hits 14:00, and your bot no longer works because the hard coded parameter names are incorrect.

Circumventing this technique

If you're a little bit creative, you can find a way to bypass/circumvent this anti-bot technique quite easily.
First you make a request to the form page and extract the names of all <input> HTML elements. If you do this every time before sending a POST request, you will always have the correct parameter names and you do not have to hard code them at all! :)
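From the bot's point of view, that scraping step could be sketched as follows (the function name and the example URL are placeholders):

```php
<?php
// Sketch of the bypass: fetch the form page first and scrape the
// current <input> names instead of hard coding them.
function extractInputNames(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // '@' silences warnings on sloppy HTML
    $names = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {   // skip inputs without a name attribute
            $names[] = $name;
        }
    }
    return $names;
}

// Usage sketch (URL is a placeholder):
// $html  = file_get_contents('https://example.com/contact.php');
// $names = extractInputNames($html);
```

Since the names are re-read on every run, the hourly renaming no longer breaks the bot, which is why a stronger scheme is needed.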

There is, however, another smart approach that provides even better protection; it will be explained in part 2.
