To misquote those anti-piracy ads: you wouldn’t steal a car… you wouldn’t steal a handbag… you wouldn’t steal a television… but would you steal someone’s content?
Content scrapers are rife on the Internet. If you’ve had a blog for any length of time and you track your incoming links, you might have noticed that some of them are from sites that publish your articles verbatim. What’s that all about?
Well, on one level, this is a piece of smart thinking. Set up a script to grab articles on your chosen subject from somewhere like Technorati, and a site can become an aggregator of others’ content. Even better, it can become a great source of information from all over the net and, by adding a bit of advertising, the owner can make a bit of money out of it. And, so long as they include a link back to the original article, there’s no harm in it. Right?
Wrong! The ranking algorithms that search engines use are a bit of a mystery, but it’s thought that part of the algorithm is the quality of the incoming links to a site. Not just the quantity, but the quality. So whilst it might be nice to have several sites pointing to yours, if they’re all splogs (spam blogs) then they aren’t going to help you much. In fact, search engines may well penalise you for these links, assuming that your site must be full of spam as well.
Let’s also consider the traffic implications. If someone searches for a term that would normally bring them to your site, but instead they end up on a splog showing your content, they aren’t going to visit your site. Or, if they do, they’ll realise they’ve already read the article and will bounce away again. If you are trying to build a community of engaged readers, or even if you’re trying to make some cash through advertising, this kind of thing can really damage your pursuit of those goals.
So, quite apart from the fact that they are copying your content, splogs have the potential to damage your search engine rankings and leech away your readers and/or advertising revenue. So what should you do?
Well, unless you’re willing to get really heavy with lawyers and such like, there’s not an immense amount you can do. But you can make things difficult for them. Splogs and scrapers are there to make money, so why not try and make it as hard as possible for them to do so?
- If they have advertising – report them to their advertisers: most advertisers don’t want to be associated with spam, so visit the advertiser’s site and see if there’s a way to report one of their publishers as a splog. If they’re using AdSense you can click on the “ads by Google” link and then on a link at the bottom of the page marked “Send Google your thoughts on the site or the ads you just saw”. This will let you report a violation and report the splog for nicking your content.
- Report them to search engines – since search engine ranking is important for getting traffic, you should report splogs to search engines to have them removed from or moved further down the index. Google’s spam reporting page is here. Apparently you can do it on Yahoo by visiting their site explorer and there should be a link on the “inlinks” section allowing you to report spam links… but I can’t find the button! Anyone seen it?
- If they are hosted by a reputable hosting company, try and report them to their hosts as well – there’s no guarantee that the host will do anything about it, but you never know.
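Before you report anyone, it's worth confirming that the page really does reproduce your article rather than just quoting a line or two. Here's a rough sketch of one way to check, assuming you've already saved the text of your article and of the suspect page. The function names (`shingles`, `copied_fraction`) and the five-word window are my own illustrative choices, not part of any reporting tool; it simply measures how many of your article's five-word runs turn up on the other page.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word runs ("shingles") found in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)


# Hypothetical example texts, purely for illustration.
article = "Content scrapers are rife on the Internet and they copy whole articles verbatim"
splog_page = "Sponsored links! " + article + " Click here for more."
unrelated = "A short post about growing tomatoes on a sunny balcony this summer"

print(copied_fraction(article, splog_page))
print(copied_fraction(article, unrelated))
```

A score close to 1 suggests the whole article has been lifted, while a score near 0 suggests at most a short quote. Where you draw the line in between is a judgment call.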
Have you had experience of people scraping your content? If so, how did you deal with it? And do you have any other tips for dealing with splogs and scrapers? Let us know in the comments.
Spam image by David-Trattnig

{ 13 comments }
Very good article/post. For every good thing that moves us forward, it seems like the “bad” people are just as creative and intelligent to start a new attack.
Thank you for the heads up, and ways to deal with it.
Take Care,
Sally
http://www.stopsmokingwithdrsally.com
Thanks for the comment, Sally. I hope you’ll never have to use any of the tips, but if you do I hope they prove useful!
I always use option #1 – it’s surprising how quickly they give up when their revenue source severs their relationship with them!
Very good article! John.
@scam Do you find it works quite quickly, or does it take a while?
I find it takes a variable amount of time but, on average, less than a week (assuming their ads are served by Google).
I rarely see scrapers using any other form of advertising – they hope for a couple of AdSense clicks per day and once that’s gone they have no reason to continue :)
What’s really weird is that I’ve had a few links from sites that look like spam, but have no advertising on them. I don’t see the point, really.
Perhaps they hope that will increase the chances of people linking to them? Once they have several incoming links then they’ll whack the advertising on?
That’s entirely possible, which makes it harder to report them to their advertisers (you’d have to keep checking back to see when they start advertising).
If that’s what they’re up to that’s both very clever and very sneaky.
I’ll use option #3; it will definitely kill their website and business, at least temporarily. Harsh but direct.
I know of someone who uses clipmarks for his blog. Is that considered “scraping” as well?
Hi ShuTian – that’s a tough one, actually. Personally, I’d say that scrapers tend to be automated, whereas your friend is presumably clipping only items that are actually of interest to him?
Good practice, I’d suggest, would be to include a link to the original site if you have clipped anything, and to make some sort of comment besides simply lifting clips verbatim. That way people will know the site is not an automated scraper, and the site’s visitors would perceive extra value at the same time.
Hope that makes sense!
Generally, with automated sites (scrapers, aggregation sites etc.), the main aim (if the person behind them has a brain) is not to make money off the ads on the site.
The ads are there for ‘bonus’ income, so they can pay for the hosting.
The main reason for such ‘splogs’ is their link value. ‘Splogs’ are created in groups, and these can very easily propel an affiliate higher in the SERPs, or just ‘launder’ their PR onwards to a 2nd-tier site and from there to the money-making ones.