This article started as a thread on Cosmic Village's 'Speak Your Mind' forum describing an .htaccess modification that stops known 'Whacker' programs from automatically downloading all of your content. The technique is useful enough that I had to share it with all of you:
Is anyone using .htaccess to block "whacker" programs? You know, those programs that some surfers use to grab all of the contents on your site and download them to their hard drive while they sleep or surf other sites. The programs are also known as "Off-Line Browsers" and can take quite a toll on your bandwidth.
Consider the following application... Let's say you have a pay site with a gig or two of content in the members' area, and I join your site with the intention of defrauding you. Once inside your members' area, I crank up my whacker program, set its parameters, and go off to work. By the time I come home, the program will have downloaded your entire site onto my hard drive (depending on the speed of my connection, of course). Then I can either cancel my brief Trial Membership or call my bank to report CC Fraud on my card when my statement comes. Regardless of what I do, I still have all of your content, exactly replicated as it is on your site, page by page, by page. Best of all, I didn't have to click a single link to get any of it, and at most it might have cost me a couple of bucks for the Trial Membership.
This form of site leeching can be prevented with an .htaccess file that lists the 'User Agent' names of known Whacker programs. By placing the full rewrite rule set in the .htaccess file and sending matching User Agents somewhere else to 'whack,' any Whacker program on the list is stopped from whacking your site and redirected elsewhere instead. Keep in mind that the User Agent string is sent by the program itself, and many of these programs let the user change it, so this only catches Whackers that identify themselves honestly.
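To make the mechanism concrete, here is a minimal sketch covering just two of the programs from the full list further down. It assumes your host has mod_rewrite enabled and allows rewrite directives in .htaccess files, and that /leeches.html is a page you have created yourself; treat it as an illustration, not a drop-in file.

```apache
# Turn on mod_rewrite for this directory and everything below it
RewriteEngine On
# Each RewriteCond tests the User-Agent header against a regular expression.
# [OR] joins a condition to the next one, so either program triggers the rule.
RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP
# On a match, requests for .htm/.html/.shtm/.shtml pages (any case)
# are served /leeches.html instead of the page that was asked for.
RewriteRule \.s?html?$ /leeches.html [NC,L]
```

An unanchored pattern such as HTTrack matches the name anywhere in the User Agent, which is exactly what the longer ^.*HTTrack.*$ form in the full list does.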
As for where to send them, I recommend a site opposite in appeal to what your site offers, or a nasty CJ site... The important part is to WARN the person first. The first thing inside our Members' Area is a notice saying that we do not support Whackers and that anyone who attempts to Whack anyway will end up with a nice surprise.
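If you want to send them to another site rather than to a page of your own, the rule's target becomes a full URL, which mod_rewrite turns into an HTTP redirect; the R flag in this sketch makes that explicit and sets the status code. Here http://www.example.com/ is only a placeholder for whatever site you choose.

```apache
RewriteEngine On
# [NC] makes each match case-insensitive
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC]
# R=302 sends a temporary external redirect; www.example.com is a placeholder
RewriteRule .* http://www.example.com/ [R=302,L]
```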
Keep in mind that many Whacker programs have an option that lets the person who is Whacking OFF (hahaha) keep the program confined to the site it was originally set to Whack. Even though content can be stored on multiple domains, some Whackers leave the option to Whack across multiple domains turned off, so they will never follow a redirect to someone else's site. If you want to mess with them, you have to send them to a "Special Page" on your own domain. That "Special Page" could, for example, cause 100 or more new browser windows to open at once, draining the Whacker's CPU resources and hanging his system. Get creative with it and have some fun!
The Code
Like all .htaccess files, this file must be placed in the directory whose contents and subfolders you want to protect. With this User Agent file in the uppermost folder of your members' area, as soon as a Whacker program is detected, it is routed to the URL specified in the rewrite rule, and it attempts to whack that URL and its links:
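# Stop visitors from downloading the .htaccess file itself
<Files .htaccess>
order allow,deny
deny from all
</Files>
# Turn on mod_rewrite for every request in this directory (not just for .htaccess)
RewriteEngine On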
RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Iria.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Stripper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Copier.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Crawler.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Snagger.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Teleport.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Reaper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Wget.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Grabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Sucker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Downloader.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Siphon.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Collector.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mag-Net.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Widow.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Pockey.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*DA.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Snake.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BackWeb.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*gotit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Vacuum.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SmartDownload.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Pump.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HMView.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Ninja.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JOC.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*likse.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Memo.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*pcBrowser.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperBot.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*leech.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mirror.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Recorder.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GrabNet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Likse.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Navroad.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*attach.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Magnet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Surfbot.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Bandit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Ants.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Buddy.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Whacker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*DISCo\ Pump.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Drip.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EirGrabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EyeNetIE.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*FlashGet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GetRight.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Gets.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go!Zilla.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go-Ahead-Got-It.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Grafula.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*IBrowse.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*InterGET.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Internet\ Ninja.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JetCar.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JustView.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MIDown\ tool.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mister\ PiX.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NearSite.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetSpider.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline\ Explorer.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PageGrabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Papa\ Foto.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ReGet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Slurp.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SpaceBison.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperHTTP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebAuto.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebCopier.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebFetch.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebReaper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebSauger.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebStripper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebWhacker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Image\ Collector.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Sucker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Webster.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*eCatch.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ia_archiver.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*lftp.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*tAkeOut.*$ [OR]
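# No [OR] on the last condition: left there, mod_rewrite would apply the rule to every visitor
RewriteCond %{HTTP_USER_AGENT} ^.*FileHound.*$
# Send requests for .htm, .html, .shtm and .shtml pages (any case) to leeches.html
RewriteRule \.s?html?$ /leeches.html [NC,L]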
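The <Files> section at the top keeps visitors from reading the .htaccess file itself. Its order/deny lines use the older Apache 2.2 (and earlier) syntax, which Apache 2.4 only understands when mod_access_compat is loaded. On a 2.4 server the equivalent section would look like the sketch below; the stock httpd.conf already contains a similar rule for .ht* files, so on many hosts it may be redundant.

```apache
<Files .htaccess>
# Apache 2.4 access control (mod_authz_core): refuse every request for this file
Require all denied
</Files>
```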
Programs with similar names, such as WebWhacker, WebReaper and WebStripper, can all be represented by a single term: ^.*Web.* Written that way, any program with the word "Web" anywhere in its User Agent will be treated as a Whacker. Be careful with short terms like this, though: Safari and Chrome both send "AppleWebKit" in their User Agent, so ^.*Web.* would also redirect members using ordinary browsers. As you can see from the list, some combining could be done to reduce the number of lines of code. To stay on top of new programs as they come out, visit the various Shareware / Free Download sites and look for Off-Line Browsers, or do a search for them. Dropping in on a few WAREZ-type sites will also usually help.
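One way the combining could look, as a sketch: a single condition using regular-expression alternation (the | character) covers several programs, and the [NC] flag makes the match case-insensitive, so entries such as likse and Likse collapse into one. The names are taken from the list above; the selection is only an example, and /leeches.html is the same placeholder page.

```apache
RewriteEngine On
# One condition, many names: the | inside the parentheses means "or".
# [NC] = no case, so "likse", "Likse" and "LIKSE" all match.
RewriteCond %{HTTP_USER_AGENT} (WebZIP|Teleport|HTTrack|Wget|Offline|Stripper|Reaper|Whacker|likse) [NC]
# A single condition needs no [OR], so the rule runs only when it matches
RewriteRule \.s?html?$ /leeches.html [NC,L]
```

With [NC] in place, be wary of adding very short entries such as DA, since a case-insensitive two-letter match catches far more User Agents than intended.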
Make sure that you change 'leeches.html' to the URL you wish to send the 'Whacker' to, and visit the full thread by following the link below; it contains additional USER_AGENT files supplied by dvd871 as well as helpful usage information.