
How social media networks can combat spam, bots and digital theft (part 1)


Spam, botting and theft are serious matters, and social networks do their best to combat them.
In this post we look at some basic techniques which social media sites can use to combat spam and bots quite effectively.

Detecting spam and bot-built profiles is not a trivial task. Genuine users sometimes post spam or abusive advertising, but as long as no real damage or harm is done, these accounts will not be flagged or banned. Furthermore, manually reviewing all potentially fake or botted profiles is very labor-intensive, so it is only done in the more serious cases (such as for bigger profiles).

You may have noticed that some social media sites use much more intelligent approaches to combat spam and bots.
Before we delve deeper into this, we must understand that spam and bots exist primarily for economic reasons (profits and monetary rewards). So, if we were a "social media site" (e.g. Tumblr, Pinterest, Instagram, ...), we could introduce a system that attempts to separate spammers from legitimate users.

To give you an example, assume our social media site has a feed of the most popular uploads of the past 24 hours (based on the number of likes). A network of bots (under the supervision of the bot owner) may then attempt to "rank" certain images on that top feed, resulting in a lot of new followers, traffic, leads and/or sales. It is in our best interest to make sure this does not happen. As developers/engineers we could propose various (hypothetical) solutions, such as detecting accounts which are likely botting and/or associated with many other accounts which are botting/spamming.

We can establish a ranking for all profiles on our social media site and then give a weight to all their posts/activities: the higher an account's spam rate, the lower the chance of its uploads appearing on the "most popular feed".
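As a minimal sketch of this weighting idea (the scoring formula, field names, and example numbers are my own assumptions, not from the article), the popular feed could down-weight each post by its author's spam score:

```python
def feed_score(likes: int, author_spam_score: float) -> float:
    """Weight a post's raw popularity by how trustworthy its author is.

    author_spam_score plays the role of F_spam(user), expressed as a
    fraction in [0.0, 1.0]; a higher spam score pushes the post further
    down the 'most popular' feed.
    """
    return likes * (1.0 - author_spam_score)

posts = [
    {"id": 1, "likes": 900, "author_spam_score": 0.8},  # likely botted
    {"id": 2, "likes": 500, "author_spam_score": 0.1},  # legitimate user
]

# Rank the feed by the weighted score instead of raw like counts.
ranked = sorted(
    posts,
    key=lambda p: feed_score(p["likes"], p["author_spam_score"]),
    reverse=True,
)
```

With these example numbers, the botted post (900 raw likes) scores below the legitimate one (500 raw likes), so buying likes stops paying off.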

Have a look at this figure:


Let's introduce a spam-detection function F_spam(user) which calculates how "spammy" a certain user/profile is based on various criteria.
Let A be the collection of all legitimate users' profiles, those with F_spam(user) <= 50%,
and let B be the collection of all spammy profiles, those with F_spam(user) >= 50%.

* All profiles/accounts whose spam ratio is exactly 50% fall in both group A and group B; these users need to be reviewed manually to determine their behavior and intentions (whether they are spammers or not).
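Using the 50% threshold above, the split into group A, group B, and the manual-review boundary could look like this (the function and the example user records are illustrative):

```python
def classify(spam_score: float) -> str:
    """Assign a user to group A (legit), group B (spammy), or manual review.

    spam_score is F_spam(user) expressed as a fraction in [0.0, 1.0].
    """
    if spam_score < 0.5:
        return "A"          # legitimate users
    if spam_score > 0.5:
        return "B"          # spammy profiles
    return "manual_review"  # exactly on the boundary: falls in both groups

# Hypothetical users with precomputed F_spam scores.
users = {"alice": 0.12, "bob": 0.50, "eve": 0.93}
groups = {name: classify(score) for name, score in users.items()}
```

Everything in group B can then simply be excluded when building the popular feed.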

The beauty of this strategy, as the figure shows, is that we end up with two largely separated groups inside one big social media network. This allows us to ignore all accounts from group B when it comes to our "most popular feed", "best uploads of the year", and so on. This leaves the botters/spammers in the dark with very little revenue/profit.

I have searched various research sites, such as ScienceDirect and IEEE Xplore, but haven't found many articles that explicitly discuss this method. The following article is probably the closest to my proposed approach:

You may ask: how can we detect spam and establish an F_spam function?
Luckily I have found an article explaining how:
It's a very short paper, but it contains valuable information. It mentions that posts are flagged as spam when:

  • they contain advertisement words such as "buy" and "#buy", together with website links; the post is almost certainly spam if the link is an affiliate URL.
  • they contain many repetitive and/or duplicate words.
  • they contain watermarked website links, which can be detected in the images using OCR (optical character recognition).
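A toy F_spam based on the first two criteria might look like the following (the keyword list, weights, and thresholds are my own assumptions, and the OCR check is omitted):

```python
import re

AD_KEYWORDS = {"buy", "#buy", "sale", "discount"}  # assumed keyword list
URL_PATTERN = re.compile(r"https?://\S+")

def f_spam(caption: str) -> float:
    """Crude spam score in [0.0, 1.0] computed from a post caption."""
    words = caption.lower().split()
    score = 0.0
    if any(w in AD_KEYWORDS for w in words):
        score += 0.4                      # advertisement vocabulary
    if URL_PATTERN.search(caption):
        score += 0.3                      # contains a website link
    if words and len(set(words)) / len(words) < 0.5:
        score += 0.3                      # many repeated/duplicate words
    return min(score, 1.0)
```

For example, `f_spam("buy now http://example.com")` trips both the keyword and the link checks, while an ordinary caption like `"sunset over the lake"` scores zero. A production system would of course combine many more signals across a user's whole history.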

An interesting and highly experimental technique discussed in the paper is detecting the contents of the image itself (using machine learning). For instance, if the image was taken in clear daylight but the caption/description contains "#night", then something is clearly off.
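That image-versus-caption check could be sketched as below; `detect_scene` stands in for a real image classifier (here it is stubbed out), and the contradiction mapping is an assumption of mine, not taken from the paper:

```python
def detect_scene(image_path: str) -> str:
    """Stub for an ML scene classifier; a real system would run a model here."""
    return "daylight"  # pretend the model recognized a daytime photo

def caption_mismatch(image_path: str, caption: str) -> bool:
    """Flag posts whose caption contradicts what the image actually shows."""
    # Hashtags that contradict each predicted scene label (assumed mapping).
    contradictions = {"daylight": {"#night"}, "night": {"#daylight", "#sunny"}}
    scene = detect_scene(image_path)
    tags = {w for w in caption.lower().split() if w.startswith("#")}
    return bool(tags & contradictions.get(scene, set()))
```

A flagged mismatch would not ban an account on its own, but it could raise its F_spam score or queue it for manual review.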

Now that we have looked at how social media sites protect themselves against bots and spam, let's look at this method from a spammer's perspective. One may argue that spam is becoming very smart and intelligent, but when we look at some of the existing tools and software on the market, they are not intelligent at all. These tools simply emulate a user's behavior (such as automatically uploading and liking content). The real power of spam lies in the hands of the spammer, not the tools themselves: creative and smart social engineering can lure and trick users into a trap and make them buy or click wherever the spammer wants.
