LP-Trel Zen

Joined: 02 Dec 2002 Posts: 5721 Location: Nirvana by Boredom
Posted: Sun Jan 08, 2006 6:22 am Post subject: Bad web robots and what you can do to protect your website
As many of you are likely aware, not all of your visitors are human. Some are robots, such as GoogleBot, Yahoo! Slurp, or MSNbot, and they can be quite hungry for bandwidth when browsing or "crawling" your websites.
Note: Not all spiders are good.
Some intentionally try to spam your contact forms or blog comments, while others try to download your entire website. Still others are just "dumb bots" that keep eating bandwidth for no good reason and ignore robots.txt files completely.
Many of these are already blocked from reaching dynamic websites (PHP, Perl, Ruby, etc.) by our application firewall, but some, such as the bandwidth-hungry bots, can consume enough bandwidth on static content to take your website offline (bandwidth exceeded) in just a few hours.
To protect yourself, the following resources may help:
http://www.javascriptkit.com/h.....ss13.shtml
http://www.google.com/search?q.....+.htaccess
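Note: Adding the following line to your list of blocked robots in .htaccess could help save bandwidth. We have found that this robot is very aggressive and can eat away gigabytes of bandwidth in just a few days. The trailing [OR] flag assumes the line sits above at least one more RewriteCond in an existing block; if it ends up as the last condition before the RewriteRule, drop the flag.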
| Code: |
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer_Bot [OR]
|
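Before adding a pattern like this to a live .htaccess, it can help to check which User-Agent strings it would actually catch. The following is only a rough sketch in Python, not part of the server configuration: the function name is_blocked and the sample User-Agent strings are made up for illustration, and Python's regex engine stands in for Apache's, which is a fair approximation for a simple anchored pattern like this one.

```python
import re

# Same pattern as the RewriteCond above. A backslash before the underscore,
# if present, is optional: it matches the same literal character.
BLOCKED_AGENT = re.compile(r"^OmniExplorer_Bot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header would match the blocking pattern."""
    return BLOCKED_AGENT.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative sample strings (assumptions, not taken from real logs).
    samples = [
        "OmniExplorer_Bot/6.47 (+http://www.omni-explorer.com) WorldIndexer",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 OmniExplorer_Bot",  # name not at the start, so ^ does not match
    ]
    for ua in samples:
        print("BLOCK" if is_blocked(ua) else "allow", "|", ua)
```

Because the pattern is anchored with ^, only agents whose string starts with the bot name are caught. A bot that reports the name somewhere later in its User-Agent would slip through unless the anchor is removed.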
Also, placing a robots.txt file in the top level of your site (the public_html directory) containing at least:
User-agent: *
Crawl-delay: 5
can help your website be spidered without killing the server or causing your website to be suspended.
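To confirm that a robots.txt like the one above parses the way you expect, you can feed it to a parser before uploading it. This is a minimal sketch using Python's standard urllib.robotparser module; the helper name crawl_delay_for and the agent name "ExampleBot" are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two lines suggested above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Parse robots.txt text and return the crawl delay that applies to user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # "ExampleBot" is a made-up agent name; the "*" entry applies to it.
    print(crawl_delay_for(ROBOTS_TXT, "ExampleBot"))
```

Keep in mind that Crawl-delay is only a request. Well-behaved crawlers that support it will slow down, but bots that ignore robots.txt will ignore this as well, so the .htaccess approach is still needed for those.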