Blocking Baidu and Yandex Search Spiders
A slap on the collective wrist for the Baidu and Yandex search engines for ignoring robots.txt rules!
We noticed on a couple of client sites that bandwidth was being sucked and vaporized into a no-return-on-investment void. After digging through the statistics and analytics we found that these two search engines' bots were hammering the server quite hard, while completely ignoring the robots.txt file.
To fix this I got in touch with Paul Arlott from Tolra Systems, who always loves a good challenge, and we added the following to the top of the .htaccess file. So far it seems to have blocked the Baidu and Yandex bots from spidering the website.
-------------------------------------------------------------------------------
SetEnvIfNoCase User-Agent "Baidu" spammer=yes
SetEnvIfNoCase User-Agent "Yandex" spammer=yes
SetEnvIfNoCase User-Agent "Sosospider" spammer=yes
<Limit GET PUT POST>
order deny,allow
deny from env=spammer
</Limit>
-------------------------------------------------------------------------------
So far so good, though of course any determined bot will always try to find a way through. How have you blocked unwanted spiders from your website? Let us know.
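A quick note on Apache versions: the Order/Deny/Allow lines above are the Apache 2.2 style of access control. If your server runs Apache 2.4 or later, the same idea can be expressed with the mod_authz_core Require directives instead. A minimal sketch, keeping the same SetEnvIfNoCase lines, would look something like this:
-------------------------------------------------------------------------------
SetEnvIfNoCase User-Agent "Baidu" spammer=yes
SetEnvIfNoCase User-Agent "Yandex" spammer=yes
SetEnvIfNoCase User-Agent "Sosospider" spammer=yes
# Apache 2.4: allow everyone except requests flagged as spammer
<RequireAll>
    Require all granted
    Require not env spammer
</RequireAll>
-------------------------------------------------------------------------------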
Don’t Block the Spiders
In contrast to the above, please do not block these search spiders if you are doing business in Russia and/or China. Yandex is by far the most popular search engine in Russia, with approximately 61% of the market share according to LiveInternet.ru statistics. If you want your website indexed by Yandex then don't block it.
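If you want Yandex to keep indexing the site but would like it to ease off the server, one gentler option than blocking is a robots.txt Crawl-delay rule, which Yandex has historically honoured (Baidu is less reliable about it). A minimal sketch, using Yandex's documented "Yandex" user-agent token and an illustrative five-second delay:
-------------------------------------------------------------------------------
# Ask all Yandex robots to wait 5 seconds between requests
User-agent: Yandex
Crawl-delay: 5
Disallow:
-------------------------------------------------------------------------------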
UPDATE:
We have updated the file to include the Sosospider, which we found repeatedly hitting a server.
Hey Vincent,
I came here because the Baidu and Yandex bots are the main drain on my bandwidth across all the sites I administer. I actually wrote to Baidu's support department to beg them to stop hitting my website, and for a while they did, but of course they came back.
Have you found the above fix to work without side effects since you implemented it? If so, this will be awesome news!
Phil
Hi Phil,
Thanks for the response. It has been almost six months since we implemented it on a couple of ecommerce sites we manage. So far so good. Your response prompted me to have a look through the stats and I found no incidents of any of the above bots. However, I have updated the post to include the 'Sosospider', another resource-draining spider/bot.
Let us know how you get along.
Vincent
I was hoping this was going to resolve my issue with these bots and their CPU drain, but alas your fix, at least for me, has not been successful in blocking them. 🙁 I have added this code to five of my domains and they are still getting bombarded. Total bummer…
Hi Vincent, sorry for my English.
I have these rules in my htaccess. Is it OK like that, or is your way better? I don't know the difference. Some people use this and some use SetEnvIfNoCase like you do. Which is better?
RewriteCond %{HTTP_USER_AGENT} ^Baidu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Yandex [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sosospider
RewriteRule ^.* - [F,L]
Hi Daniel, there's more than one way to send the bots away, and either method should do the job.
You can even catch the request in your CMS and issue a 403 response to the bot. You'll still see the request hit the site in that case, but you won't be serving up any data, so if it's done right there's minimal load on the server.
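One caveat with the mod_rewrite version: the real bot user-agent strings usually begin with something like "Mozilla/5.0 (compatible; Baiduspider/2.0; ...)", so anchoring the pattern with ^ may never actually match. A minimal sketch of an unanchored, case-insensitive version (same bot names as above, untested on your particular setup):
-------------------------------------------------------------------------------
RewriteEngine On
# Match the bot name anywhere in the User-Agent header, case-insensitively
RewriteCond %{HTTP_USER_AGENT} (Baidu|Yandex|Sosospider) [NC]
# Return 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]
-------------------------------------------------------------------------------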
It's working for us and has reduced bandwidth usage. We do not mind blocking spiders from China or Russia as we're only selling into the UK.
Thanks x