Defensive web development

Whether the currency in question is dollars, Bitcoin, moral principles, or infamy, a compromised site is just the end result of a business transaction. The purpose of this post is to consider the basic options for making this business unfavorable to an attacker, not to eliminate it altogether. There are circumstances in which the business of compromise will still take place, even under extremely unfavorable conditions or under conditions favorable to the attacker that you didn't foresee. Although some of the examples are in PHP as implemented on a typical *nix environment, the ideas here should apply to most other development environments.

Broad premises

Reasons for compromise beyond “because they could” should be considered irrelevant.

You will not think of every conceivable approach to compromise, so plan for contingencies. Always keep current backups, leave customer data segregated and encrypted, and never test on a production machine or connect to a production environment during testing. Always turn off debugging info and error messages where they may be seen by clients. Never store passwords, keys to storage servers, authentication tokens, etc. in your script files. If these must be used in some way by your code, try storing them in php.ini or in a per-user .ini file in a folder outside the web root that only PHP has read access to, but the http server does not.
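
The per-user .ini approach might look like the following sketch; the path and key names are assumptions for illustration, and the file should be locked down so only the PHP process user can read it (e.g. chmod 0400).

```php
<?php
// Read credentials from an ini file kept outside the web root.
// Returns an empty array rather than exposing a parse failure to clients.
function loadSecrets( $path ) {
	$config = parse_ini_file( $path );
	return ( $config === false ) ? array() : $config;
}

// Hypothetical usage; this path and key are illustrative only:
// $secrets = loadSecrets( '/home/user/config/secrets.ini' );
// $dbPass  = isset( $secrets['db_password'] ) ? $secrets['db_password'] : '';
```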

What do you do when they come for you

Enable two factor authentication for any critical services that use the feature (especially your email). If you have login or administrator privileges for your project, never use HTML email. In fact, I’d recommend not using HTML in emails at all and filtering any clickable links into plain URLs that you can copy > paste if you need to visit them.

You won't always see it coming. Even if you do, you may not be able to avoid it.

Try to avoid “I’ve done everything I could” and “that’s probably OK” lines of thought, but do prioritize critical sections and continue to explore responses to undesirable inputs and conditions. E.g. try throwing strings or whole files at fields where you were expecting an integer. The type of input, e.g. <select> or <input type="email">, means nothing to someone who has the “action” URL of your form. Send ridiculously large text, cookies, binaries, or otherwise malformed content and see how the server responds. Always validate and sanitize client data.
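
One way to treat a field defensively is to validate it against what you actually expect rather than trusting the form's input type. A minimal sketch using PHP's filter extension; the bounds are assumptions you would tune per field.

```php
<?php
// Treat anything that isn't an integer in range as hostile and reject it.
// filter_var() returns false for junk like "12abc", whole files, or arrays.
function expectInt( $value, $min = 1, $max = PHP_INT_MAX ) {
	$opts = array( 
		'options' => array( 'min_range' => $min, 'max_range' => $max ) 
	);
	$out = filter_var( $value, FILTER_VALIDATE_INT, $opts );
	
	// Null signals "reject this request", never a usable value
	return ( $out === false ) ? null : $out;
}
```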

In the same vein, blacklists compare unfavorably to whitelists when filtering. Allowing only inputs that match a known set of acceptable criteria is simply a matter of practicality (and, in most cases, feasibility, since you probably lack omniscience). An attacker need not succeed on every attempt at compromise, but a defender only gets to fail once, and that single failure could be catastrophic.
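
A whitelist can be as simple as an exact-match lookup with a safe default. A sketch; the allowed values here are made up for illustration.

```php
<?php
// Whitelist a sort parameter: anything not on the list falls back
// to a safe default instead of reaching the query.
function allowedSort( $input ) {
	$whitelist = array( 'date', 'title', 'author' );
	
	// Strict comparison so "0" or similar coercion tricks don't match
	return in_array( $input, $whitelist, true ) ? $input : 'date';
}
```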

Always make sure your read/write/execute privileges are appropriate to minimize chances of accidental exposure. Never allow uploads to folders that have execute permissions, and never allow write permissions on executable folders. Put script files outside your web root whenever possible and try to avoid applications and web hosts that limit these options. Consider putting file uploads outside the web root as well and let your scripting handle access to them by stripping out invalid path characters and specifying which directory to search. This creates some additional overhead, but it prevents the http server from reading uploads directly, which may lead to directory traversal if the server isn’t configured properly.
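
Letting a script mediate access to uploads could look something like the following sketch; the directory layout and the filename character rules are assumptions.

```php
<?php
// Resolve a client-requested filename against an upload folder that
// lives outside the web root; $name comes straight from the request.
function safePath( $dir, $name ) {
	// Strip directory components and anything but safe filename characters
	$name = basename( $name );
	$name = preg_replace( '/[^a-zA-Z0-9_\-\.]/', '', $name );
	
	$base = realpath( $dir );
	$path = realpath( $dir . '/' . $name );
	
	// realpath() resolves symlinks and ../ tricks; reject anything that
	// doesn't exist or ends up outside the upload directory
	if ( $base === false || $path === false || 
		strpos( $path, $base . DIRECTORY_SEPARATOR ) !== 0 ) {
		return null;
	}
	return $path;
}
```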

Client requests

Stick to what you can actually digest

Read on GET, act on POST, do nothing special on HEAD, use PUT or PATCH with extreme caution, filter all and let the rest die();

The GET method is for retrieval, i.e. reading, and you should concentrate on that. Generally, we want to avoid writing to a database on GET unless it’s for statistics or analytics purposes (*).

* Analytics needs a major overhaul. You don’t need to record everything a visitor does on your page and almost everything you do record will be obsolete fairly quickly. So unless you run an ad company, keep analytics to an absolute minimum. Always remember, more “things” are more moving parts and moving parts tend to fail.

POST should be used for creating new content, e.g. pages, posts, comments, etc. When the database auto-increments IDs or otherwise generates unique identifiers for you, POST is a great way to handle content creation. When using PUT or PATCH, you’re telling the server what the name of the resource is. This is not quite the same as a content post title that doubles as a URL slug; the database still has an auto-generated ID unique to that post. The resource handler needs to account for name conflict resolution and for the fact that PUT is idempotent. That is, the same request can be sent multiple times for the same resource and leave the server in the same state. This may not be desirable in POST, where you often don’t want content to be submitted twice.

PATCH is a special case that gets abused often (almost as much as PUT): it’s simply a set of instructions on how to modify a resource already present on the server. Learn more about these methods before implementing PUT or PATCH.
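
To make the idempotency distinction concrete, here's a sketch of a PUT-style handler where repeating the same request always leaves the same state; the in-memory array stands in for real storage, and the names are illustrative.

```php
<?php
// PUT semantics: the client names the resource, and replaying the same
// request yields the same end state every time.
function putResource( &$resources, $name, $body ) {
	$existed = isset( $resources[$name] );
	
	// Same input, same final state, no matter how often it's sent
	$resources[$name] = $body;
	
	// 201 when the resource was created, 200 when it was replaced
	return $existed ? 200 : 201;
}
```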

Never touch $_GET, $_POST or $_FILES directly throughout your application. Always use filters and sanitization to ensure you’re getting the type of content you expected. For $_GET, regular expressions will usually suffice since we’re not dealing with HTML. Never handle HTML content with regex. The following is a friendly URL router for a possible blog or similar application.

<?php

namespace Blog; //... Or something

class Router {
	
	/**
	 * @var array Methods, routes and callbacks
	 */
	private static $routes	= array();
	
	/**
	 * Router constructor
	 */
	public function __construct() {	}
	
	/**
	 * Add a request method with an accompanying route and callback
	 * 
	 * @param	string		$method Lowercase request method
	 * @param	string		$route Simple regex route path
	 * @param	callable	$callback Function call
	 */
	public function add( $method, $route, $callback ) {
		// Format the regex pattern
		$route = self::cleanRoute( $route );
		
		// First time we're adding a path to this method?
		if ( !isset( self::$routes[$method] ) ) {
			 self::$routes[$method] = array();
		}
		
		// Add a route to this method and set callback as value
		self::$routes[$method][$route] = $callback;
	}
	
	/**
	 * Sort all sent routes for the current request method, iterate 
	 * through them for a match and trigger the callback function
	 */
	public function route() {
		if ( empty( self::$routes ) ) { // No routes?
			$this->fourOhFour();
		}
		
		// Client request path (query string removed)
		$path	= strtok( $_SERVER['REQUEST_URI'], '?' );
		
		// Client request method
		$method = strtolower( $_SERVER['REQUEST_METHOD'] );
		
		// No routes for this method?
		if ( empty( self::$routes[$method] ) ) {
			$this->fourOhFour();
		}
		
		// Found flag
		$found	= false;
		
		// For each path in each method, iterate until match
		foreach( self::$routes[$method] as $route => $callback ) {
			
			// Found a match for this method on this path
			if ( preg_match( $route, $path, $params ) ) {
				
				$found = true; // Set found flag
				if ( count( $params ) > 0) {
					// Clean parameters
					array_shift( $params );
				}
				
				// Trigger callback
				return call_user_func_array( 
					$callback, $params 
				);
			}
		}
		
		// We didn't find a path 
		if ( !$found ) {
			$this->fourOhFour();
		}
	}
	
	/**
	 * Paths are sent in bare. Make them suitable for matching.
	 * 
	 * @param	string		$route URL path regex
	 */
	private static function cleanRoute( $route ) {
		// Escape literal dots so they only match '.'
		$regex	= str_replace( '.', '\.', $route );
		return '@^/' . $regex . '/?$@i';
	}
	
	/**
	 * Possible 404 not found handler. 
	 * Something that looks nicer should be used in production.
	 */
	private function fourOhFour() {
		die( "<em>Couldn't find the page you're looking for.</em>" );
	}
}

You can then utilize it as follows:

// Main index. 
function index( $page = 1 ) {
	// Do something with the given page number
}

function read( $id, $page = 1 ) {
	// Do something with $id and page number
}

// Now, you can create the router
$router		= new Blog\Router();

// Browsing index or homepage
$router->add( 'get', '', 'index' );
$router->add( 'get', '([1-9][0-9]*)', 'index' );

// Note: The regex requires page numbers with no leading zeros, starting from 1

// Specific post
$router->add( 'get', 'post/([1-9][0-9]*)', 'read' );
$router->add( 
	'get', 
	'post/([1-9][0-9]*)/([1-9][0-9]*)', // ID and page numbers start from 1
	'read' 
);

// Now we can route
$router->route();

When handling POST content, we have to be a little more careful. The following is an example of a content post filter which uses typical fields and PHP’s built-in content filtering:

function getPost() {
	$filter	= array(
		'csrf'	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'id' 	=> FILTER_SANITIZE_NUMBER_INT,
		'parent'=> FILTER_SANITIZE_NUMBER_INT,
		'title' => FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'body' 	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS
	);
	
	return filter_input_array( INPUT_POST, $filter );
}

You probably want to do some special formatting for filtering HTML, but this gets rid of the overwhelming majority of undesired inputs a client may send. The filter_input_array function is quite useful for building content with multiple fields at once. When the field has not been sent, the array value will be NULL. You’ll note the ‘csrf’ field. It’s important to ensure that content sent by the user was actually intended, and anti-cross-site request forgery tokens are very helpful in that regard.
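
One possible anti-CSRF token scheme is sketched below, assuming sessions are already set up elsewhere and PHP 7+ for random_bytes(); the field name matches the 'csrf' key used in the filter above.

```php
<?php
// Issue a random token once per session and echo it into each form,
// then compare in constant time when the form comes back.
function csrfToken() {
	if ( empty( $_SESSION['csrf'] ) ) {
		$_SESSION['csrf'] = bin2hex( random_bytes( 32 ) );
	}
	return $_SESSION['csrf'];
}

function csrfVerify( $sent ) {
	// hash_equals avoids leaking match position via timing
	return !empty( $_SESSION['csrf'] ) && 
		is_string( $sent ) && 
		hash_equals( $_SESSION['csrf'], $sent );
}
```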

Authentication

Looks mighty suspicious!

The only way to ensure communication between a user and the server is secure is to use TLS. Even then, you should avoid storing the username or user ID in the cookie of a logged-in user, as that is sent to the server on every request. Instead, use an ‘auth’ field in your database table holding a randomly generated hash as the identifier. When the logged-in user visits the site, the random hash is sent to the server, which can use it to look up the user instead of an ID or username. The ‘auth’ token should be renewed after each successful login.
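
Generating that random identifier might look like this sketch; the cookie parameters and the 'auth' column are assumptions, and random_bytes() needs PHP 7+.

```php
<?php
// Generate a fresh auth token at each successful login.
function makeAuthToken() {
	// 32 random bytes -> 64 hex characters
	return bin2hex( random_bytes( 32 ) );
}

// At login: store the token against the user's 'auth' field, then send
// it to the client. The final two flags (Secure, HttpOnly) keep it off
// plain HTTP and out of reach of client-side scripts.
// setcookie( 'auth', makeAuthToken(), time() + 604800, '/', '', true, true );
```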

As an additional benefit, using an auth hash makes it easy to force a user’s logout simply by deleting the hash stored in the database. If you believe a user’s password has been compromised, or if the user requests a password reset, it’s best to delete the auth token and send a separate reset link (single-use and expiring within the hour) to the user’s email, rather than generating a new password yourself.
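
A sketch of such a single-use, expiring reset token; the array stands in for your users table, and the names and expiry window are illustrative.

```php
<?php
// Issue a reset token that expires in one hour.
function makeResetToken( &$store, $user ) {
	$token = bin2hex( random_bytes( 24 ) );
	$store[$user] = array( 'token' => $token, 'expires' => time() + 3600 );
	return $token;
}

// Check and consume the token: valid or not, it only gets one attempt.
function checkResetToken( &$store, $user, $token ) {
	if ( empty( $store[$user] ) || time() > $store[$user]['expires'] ) {
		return false;
	}
	$ok = hash_equals( $store[$user]['token'], $token );
	unset( $store[$user] ); // Single use
	return $ok;
}
```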

If you want to add an additional bit of verification to the cookie, you can add a hash of the client’s request signature. This is not going to be unique at all, but it will make spoofing a tiny bit harder for someone who simply steals the cookie without noting the victim’s browser characteristics. Keep in mind that if the cookie was sniffed in clear text, this may not help much. Remember that nothing seen in “HTTP_” header variables is reliable.

function signature() {
	$out = '';
	foreach ( $_SERVER as $k => $v ) {
		switch( $k ) {
			case 'HTTP_ACCEPT_CHARSET':
			case 'HTTP_ACCEPT_ENCODING':
			case 'HTTP_ACCEPT_LANGUAGE':
			case 'HTTP_UA_CPU':
			case 'HTTP_USER_AGENT':
			case 'HTTP_VIA':
			case 'HTTP_CONNECTION':
				$out .= $v;
		}
	}
	return hash( 'tiger160,4', $out );
}

Note that I avoided using the client’s IP address, which may change often and is sometimes shared through popular proxies. Storing the output of this hash in the cookie along with the auth token helps avoid identifying the user by name or user ID from the cookie alone.

From the inside

The hardest position to defend against is when the attacker is on the inside. There’s a large swath of information out there about compartmentalization, decentralization and restricting access to information to those who need to know. Instead, I’ll leave you with this excerpt from The Godfather Part II.

Michael Corleone: There’s a lot I can’t tell you, Tom. Yeah, I know that’s upset you in the past. You felt it was because of a lack of trust or confidence, but it’s… it’s because I admire you, and because I love you, that I kept things secret from you. It’s why, at this moment, you’re the only one I can completely trust.

Fredo. Ah, he’s got a good heart. But he’s weak and he’s stupid. And this is life and death. Tom, you’re my brother.

Tom Hagen: I always wanted to be thought of as a brother by you, Mikey. A real brother.

Michael: You’re gonna take over. You’re gonna be the Don. If what I think has happened has happened, I’m gonna leave here tonight. I give you complete power, Tom. Over Fredo and his men. Rocco, Neri, everyone. I’m trusting you with the lives of my wife and my children. The future of this family.

Tom: If we ever catch these guys do you think we’ll find out who’s at the back of all this?

Michael: We’re not gonna catch ’em. Unless I’m very wrong, they’re dead already. Killed by somebody close to us. Inside. Very, very frightened they’ve botched it.

Tom: But your people, Rocco and Neri, you don’t think they had something to do with this.

Michael: You see, all our people are businessmen. Their loyalty is based on that. One thing I learned from pop, was to try to think as people around you think. Now on that basis, anything is possible.

Rendering a CAPTCHA image in PHP

It’s been a while since I posted anything web- or programming-related (I honestly don’t even remember the last time), so I thought I’d post an update with something a friend asked in an email. He was putting together something which I’ve been asked to co-write, and we came across the CAPTCHA issue again. We’re thinking of using these in a somewhat different way.

What they don’t tell you about CAPTCHA

They’re, more often than not, completely ineffective. The whole point of trying to prevent bots is only relevant when talking about simple drive-by spammers flooding a forum or the like; for anything more than that, you’re better off finding something else.

What they ARE useful for is making sure only people who really have something to say end up voicing their opinion, i.e. a think-before-you-speak buffer in many ways. This is especially helpful when you have anonymous posting enabled.

I’ve seen tons of examples of how to generate CAPTCHAs, but many of these (especially for PHP) either depend on an existing image background or are so unreadable that they’re not only bot-proof, they’re human-proof. Worse yet, I’ve seen examples longer than a page of code, and that’s not even counting session handling.

How someone writes something as simple as a CAPTCHA renderer in more than a couple of functions is beyond me. OK, that’s just me being lazy, but to a programmer, laziness is sometimes a virtue.

Here’s another thing they don’t tell you about CAPTCHAs: anything over 3 characters is useless. If they’ve managed to use OCR to break 3 characters, they’ve got the rest; your efforts will only frustrate legitimate users. Using 4 characters is a bit excessive, 5 is getting on my nerves, 6 is ridiculous, and with any more, chances are I’d rather not participate in whatever it is you have behind your unreadable gibberish.

Another thing a lot of these CAPTCHAs seem to overlook is the character pool. In some of these things, I’ve seen an s that looks like a 5 and a u that looks like a v. Don’t even get me started on 0, o, 1, i, and j. The best option in this case is to get rid of these similar-looking characters. In fact, you’re better off getting rid of most characters that even remotely have the ability to be confused with another letter. This is why I’m leaving out e as well, since it’s too easily confused with c sometimes.

Here’s a sample of a CAPTCHA that hopefully doesn’t suck.

You should at least be able to read the bloody thing.

Here’s the code file that generated it (I didn’t include sessions and such, but plenty of examples are available elsewhere):

<?php

ini_set( "display_errors", true );

// Rudimentary random string generator
function random( $length ) {
	
	$out = '';
	$pool = str_split( '2345689abcdfghkmnpqrstwxyzABCDEFGHKMNPQRSTWXYZ' );
	
	// This doesn't need to be any more complicated
	for($i=0; $i < $length; $i++)
		$out .= $pool[ array_rand( $pool ) ];
	
	return $out;
}

// The business end
function captcha( $txt ) {
	
	// Height of 50 is usually good enough
	$sizey = 50;
	
	// Character length
	$cl = strlen( $txt );
	
	// We'll expand the image with the number of characters
	$sizex = ( $cl * 19 ) + 10;
	
	// I used monofont, but you can download another font to use
	// Try http://dafont.com (don't pick crazy fonts, a nice monospace will do)
	$font = 'monofont.ttf';
	
	// Some initial padding
	$w = floor( $sizex / $cl ) - 13;
	
	$img = imagecreatetruecolor( $sizex, $sizey );
	$bg = imagecolorallocate( $img, 255, 255, 255 );
	imagefilledrectangle( $img, 0, 0, $sizex, $sizey, $bg );
	
	// Random lines
	for( $i=0; $i < ( $sizex * $sizey ) / 250; $i++ ) {
		
		// Select colors in a comfortable range
		$t = imagecolorallocate( $img, rand( 150, 200 ), rand( 150, 200 ), rand( 150, 200 ) );
		imageline($img, 
			mt_rand( 0, $sizex ), 
			mt_rand( 0, $sizey ), 
			mt_rand( 0, $sizex ), 
			mt_rand( 0, $sizey ), $t );
	}
	
	// Insert the text (with random colors and placement)
	for ( $i = $cl - 1; $i >= 0; $i-- ) {
		
		$l = substr( $txt, $i, 1 );
		
		// Again, colors in a comfortable range. I was thinking pastels
		$tc = imagecolorallocate( $img, rand( 0, 150 ), rand( 10, 150 ), rand( 10, 150 ) );
		imagettftext( $img, 30, 
			rand( -10, 10 ), 
			$w + ( $i * rand( 18, 19 ) ), 
			rand( 30, 40 ), $tc, $font, $l );
	}
	
	// Move the header up the code page if this is going in as part of a bigger project
	header("Content-type: image/png");
	
	imagepng( $img );
	imagedestroy( $img );
}

// Use the render to store in a session first. Remember to clear it with each attempt (success or failure)
captcha( random( 3 ) );
?>

Stop WebMatrix from opening every folder on double-click

Just when you thought that in 2012 Microsoft, out of the kindness of their heart (or by finally listening to the public they’ve been shafting for years), wouldn’t do something completely asinine with their software…

If you’ve installed Microsoft WebMatrix lately, you may have noticed that it can open a folder as a website by right-clicking and selecting “Open as a Web Site with Microsoft WebMatrix”. The problem is that it now opens every single folder that way on a double-click.

Fire up RegEdit and navigate to :
\HKEY_CLASSES_ROOT\Directory\shell

Find a key called “OpenAsAWebsite”; under it you’ll find a subkey called “command”. Copy the default value of the “command” subkey, then delete it and its parent “OpenAsAWebSite”.

Now navigate to:
\HKEY_CLASSES_ROOT\Folder\shell\ContextMenuHandlers

Add a new key here called “OpenAsAWebSite” (like the one above) and give it a default value of “Open as a Web Site with Microsoft WebMatrix”. Then create a “command” key under it, just like the one you saw before, and set its default value by pasting what you copied earlier. If you forgot to copy it, adjust the install directory of WebMatrix if necessary and paste the following:

C:\Program Files (x86)\Microsoft WebMatrix\WebMatrix.exe #ExecuteCommand# SiteFromFolder %L

Now you should still be able to use the Open as a Web Site feature in WebMatrix, but it won’t fire by default when you open every bloody folder.

Note to Microsoft:
I know you’ll ignore this, but stop being stupid!