Defensive web development

Whether the currency in question is dollars, Bitcoin, moral principles or infamy, a compromised site is just the end result of a business transaction. The purpose of this post is to consider the basic options for making this business unfavorable to an attacker, not to eliminate it altogether. There are circumstances in which the business of compromise will still take place, even in conditions that are extremely unfavorable to the attacker or favorable in ways you didn’t foresee. Although some of the examples are in PHP as implemented on a typical *nix environment, the ideas here should apply to most other development environments.

Broad premises

Reasons for compromise beyond “because they could” should be considered irrelevant.

You will not think of every conceivable approach to compromise, so plan for contingencies. Always keep current backups, keep customer data segregated and encrypted, and never test on a production machine or connect to a production environment during testing. Always turn off debugging info and error messages where they may be seen by clients. Never store passwords, keys to storage servers, authentication tokens etc… in your script files. If these must be used in some way by your code, try storing them in php.ini or in a per-user .ini file in a folder outside the web root that only PHP can read and the HTTP server cannot.
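A rough sketch of the per-user .ini approach follows; the path and key names are placeholders, and the file should be readable by the PHP user only (e.g. chmod 0400), not by the HTTP server.

// Sketch: load credentials from an .ini file kept outside the web root
$config = parse_ini_file( '/home/example/config/secrets.ini', true );

if ( false === $config ) {
	die( 'Missing configuration' ); // Fail early rather than limp along
}

// Use the values without ever embedding them in script files
$db = new PDO(
	$config['database']['dsn'],
	$config['database']['user'],
	$config['database']['pass']
);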

What do you do when they come for you

Enable two-factor authentication for any critical services that offer the feature (especially your email). If you have login or administrator privileges for your project, never use HTML email. In fact, I’d recommend not using HTML in emails at all and filtering any clickable links into plain URLs that you can copy and paste if you need to visit them.

You won't always see it coming. Even if you do, you may not be able to avoid it.

Try to avoid “I’ve done everything I could” and “that’s probably OK” lines of thought; prioritize critical sections and keep exploring responses to undesirable inputs and conditions. E.g. try throwing strings or whole files at fields where you expected an integer. The input type, e.g. <select>, <input type=”email”> etc…, means nothing to someone who has the “action” URL of your form. Send ridiculously large text, cookies, binaries or otherwise malformed content and see how the server responds. Always validate and sanitize client data.
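As a minimal illustration of rejecting a field that should have been an integer (the field name and range here are made up):

// Sketch: reject anything that isn't a sane page number, regardless of
// what the form field claimed to be. FILTER_VALIDATE_INT fails on
// strings, floats and values outside the given range.
$page = filter_input( INPUT_POST, 'page', FILTER_VALIDATE_INT, array(
	'options' => array( 'min_range' => 1, 'max_range' => 5000 )
) );

if ( false === $page || null === $page ) {
	die( 'Invalid request' ); // Or show a friendlier error page
}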

In the same vein, prefer whitelists to blacklists when filtering. Only allowing inputs that match a known set of acceptable criteria is simply a matter of practicality (and in most cases feasibility, since you probably lack omniscience). An attacker need not succeed on every attempt at compromise, but a defender only gets to fail once, and that single failure could be catastrophic.
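A tiny sketch of the whitelist idea, using a hypothetical sort parameter:

// Sketch: only accept values we explicitly listed; anything unknown or
// missing falls back to a safe default instead of reaching the query
$allowed	= array( 'date', 'title', 'author' );
$sort		= filter_input( INPUT_GET, 'sort', FILTER_SANITIZE_FULL_SPECIAL_CHARS );

if ( !in_array( $sort, $allowed, true ) ) {
	$sort = 'date';
}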

Always make sure your read/write/execute privileges are appropriate to minimize chances of accidental exposure. Never allow uploads to folders that have execute permissions and never allow write permissions on executable folders. Put script files outside your web root whenever possible and try to avoid applications and web hosts that limit these options. Consider putting file uploads outside the web root as well and let your scripting handle access to them by stripping out invalid path characters and specifying which directory to search. This creates some additional overhead, but it prevents the HTTP server from reading uploads directly, which may lead to directory traversal if the server isn’t configured properly.
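A rough sketch of handing out a stored upload through a script instead of the HTTP server; the directory and parameter name are placeholders:

// Sketch: serve an upload stored outside the web root via a script.
// basename() strips any directory components, which blocks "../" tricks,
// and realpath() confirms the final path stays inside the upload folder.
$dir	= '/home/example/uploads/';	// Outside the web root
$file	= (string) filter_input( INPUT_GET, 'file' );
$path	= realpath( $dir . basename( $file ) );

if ( false === $path || 0 !== strpos( $path, $dir ) || !is_file( $path ) ) {
	die( 'File not found' );
}

header( 'Content-Type: application/octet-stream' );
header( 'Content-Length: ' . filesize( $path ) );
readfile( $path );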

Client requests

Stick to what you can actually digest

Read on GET, act on POST, do nothing special on HEAD, use PUT or PATCH with extreme caution, filter all and let the rest die();

The GET method is for retrieval, i.e. reading, and you should concentrate on that. Generally, we want to avoid writing to a database on GET unless it’s for statistics or analytics purposes (*).

* Analytics needs a major overhaul. You don’t need to record everything a visitor does on your page, and almost everything you do record will be obsolete fairly quickly. So unless you run an ad company, keep analytics to an absolute minimum. Always remember: more “things” mean more moving parts, and moving parts tend to fail.

POST should be used for creating new content, e.g. pages, posts, comments etc… When the database auto-increments IDs or otherwise generates unique identifiers for you, POST is a great way to handle content creation. When using PUT or PATCH, you’re telling the server what the name of the resource is. This is not quite the same as a content post title which can double as a URL slug; the database still has an auto-generated ID unique to that post. The resource handler needs to account for name conflict resolution and for the fact that PUT is idempotent. That is, sending the same request multiple times should leave the resource in the same state as sending it once, so the request can safely be repeated for the same resource. This is not the case with POST, where you often don’t want content to be submitted twice.

PATCH is a special case that gets abused often (almost as much as PUT) and it’s simply a set of instructions on how to modify a resource already present on the server. Learn more about these methods before implementing PUT or PATCH.

Never touch $_GET, $_POST or $_FILES directly throughout your application. Always use filters and sanitization to ensure you’re getting the type of content you expected. For $_GET, Regular Expressions will usually suffice since we’re not dealing with HTML. Never handle HTML content with regex. The following is a friendly URL router for a possible blog or similar application.

<?php

namespace Blog; //... Or something

class Router {
	
	/**
	 * @var array Methods, routes and callbacks
	 */
	private static $routes	= array();
	
	/**
	 * Router constructor
	 */
	public function __construct() {	}
	
	/**
	 * Add a request method with an accompanying route and callback
	 * 
	 * @param	string		$method Lowercase request method
	 * @param	string		$route Simple regex route path
	 * @param	callable	$callback Function call
	 */
	public function add( $method, $route, $callback ) {
		// Format the regex pattern
		$route = self::cleanRoute( $route );
		
		// First time we're adding a path to this method?
		if ( !isset( self::$routes[$method] ) ) {
			 self::$routes[$method] = array();
		}
		
		// Add a route to this method and set callback as value
		self::$routes[$method][$route] = $callback;
	}
	
	/**
	 * Sort all sent routes for the current request method, iterate 
	 * through them for a match and trigger the callback function
	 */
	public function route() {
		if ( empty( self::$routes ) ) { // No routes?
			$this->fourOhFour();
		}
		
		// Client request path (strip any query string before matching)
		$path	= strtok( $_SERVER['REQUEST_URI'], '?' );
		
		// Client request method
		$method = strtolower( $_SERVER['REQUEST_METHOD'] );
		
		// No routes for this method?
		if ( empty( self::$routes[$method] ) ) {
			$this->fourOhFour();
		}
		
		// Found flag
		$found	= false;
		
		// For each path in each method, iterate until match
		foreach( self::$routes[$method] as $route => $callback ) {
			
			// Found a match for this method on this path
			if ( preg_match( $route, $path, $params ) ) {
				
				$found = true; // Set found flag
				if ( count( $params ) > 0) {
					// Clean parameters
					array_shift( $params );
				}
				
				// Trigger callback
				return call_user_func_array( 
					$callback, $params 
				);
			}
		}
		
		// We didn't find a path 
		if ( !$found ) {
			$this->fourOhFour();
		}
	}
	
	/**
	 * Paths are sent in bare. Make them suitable for matching.
	 * 
	 * @param	string		$route URL path regex
	 */
	private static function cleanRoute( $route ) {
		// Escape literal dots, then wrap the pattern in delimiters with
		// an optional trailing slash and case-insensitive matching
		$regex	= str_replace( '.', '\.', $route );
		return '@^/' . $regex . '/?$@i';
	}
	
	/**
	 * Possible 404 not found handler. 
	 * Something that looks nicer should be used in production.
	 */
	private function fourOhFour() {
		die( "<em>Couldn't find the page you're looking for.</em>" );
	}
}

You can then utilize it as follows:

// Main index. 
function index( $page = 1 ) {
	// Do something with the given page number
}

function read( $id, $page = 1 ) {
	// Do something with $id and page number
}

// Now, you can create the router
$router		= new Blog\Router();

// Browsing index or homepage
$router->add( 'get', '', 'index' );
$router->add( 'get', '([1-9][0-9]*)', 'index' );

// Note: The regex requires the page number to start with a digit from 1-9

// Specific post
$router->add( 'get', 'post/([1-9][0-9]*)', 'read' );
$router->add( 
	'get', 
	'post/([1-9][0-9]*)/([1-9][0-9]*)', // ID and page numbers start with 1-9
	'read' 
);

// Now we can route
$router->route();

When handling POST content, we have to be a little more careful. The following is an example of a content post filter which uses typical fields and PHP’s built-in content filtering:

function getPost() {
	$filter	= array(
		'csrf'	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'id' 	=> FILTER_SANITIZE_NUMBER_INT,
		'parent'=> FILTER_SANITIZE_NUMBER_INT,
		'title' => FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'body' 	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS
	);
	
	return filter_input_array( INPUT_POST, $filter );
}

You’ll probably want some special handling if you intend to accept HTML, but this gets rid of the overwhelming majority of undesired inputs a client may send. The filter_input_array function is quite useful for building content with multiple fields at once; when a field has not been sent, its array value will be NULL. You’ll also note the ‘csrf’ field. It’s important to ensure that content sent by the user was actually intended, and anti-cross-site request forgery tokens are very helpful in that regard.
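As a rough sketch of how such a token might work (the function names and session key are illustrative, and session_start() is assumed to have been called):

// Sketch: issue and verify an anti-CSRF token stored in the session
function csrfToken() {
	if ( empty( $_SESSION['csrf'] ) ) {
		$_SESSION['csrf'] = bin2hex( openssl_random_pseudo_bytes( 32 ) );
	}
	return $_SESSION['csrf'];
}

function csrfVerify( $sent ) {
	if ( empty( $_SESSION['csrf'] ) || !is_string( $sent ) ) {
		return false;
	}
	// Constant-time comparison (hash_equals is available from PHP 5.6)
	return hash_equals( $_SESSION['csrf'], $sent );
}

// Embed csrfToken() in the hidden 'csrf' field when rendering the form,
// then verify it when handling the submission
$data = getPost();
if ( empty( $data['csrf'] ) || !csrfVerify( $data['csrf'] ) ) {
	die( 'Invalid request' );
}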

Authentication

Looks mighty suspicious!

The only way to ensure communication between a user and the server stays secure is to use TLS. Even then, you should avoid storing the username or user ID in the cookie of a logged-in user, as the cookie is sent on every request to the server. Instead, use an ‘auth’ field in your database table holding a randomly generated hash as the identifier. When the logged-in user visits the site, that random hash is sent to the server, and the server can use it to look up the user instead of an ID or username. The ‘auth’ token should be renewed after each successful login.
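A minimal sketch of issuing such a token after a successful login; the table layout, PDO handle and one-week cookie lifetime are assumptions:

// Sketch: generate a fresh auth token, store it as the lookup key and
// hand it to the client in a TLS-only, HttpOnly cookie
function issueAuthToken( PDO $db, $userId ) {
	// Randomly generated hash; renew this on every successful login
	$auth	= hash( 'tiger160,4', openssl_random_pseudo_bytes( 32 ) );
	
	$stmt	= $db->prepare( 'UPDATE users SET auth = :auth WHERE id = :id' );
	$stmt->execute( array( ':auth' => $auth, ':id' => $userId ) );
	
	// Last two flags: secure (TLS only) and HttpOnly (no script access)
	setcookie( 'auth', $auth, time() + 604800, '/', '', true, true );
	return $auth;
}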

As an additional benefit, using an auth hash makes it easy to force-logout a user simply by deleting the hash stored in the database. If you believe a user’s password has been compromised, or if the user requests a password reset, it’s best to delete the auth token and send a single-use link (which expires within the hour) to the user’s email address so they can reset the password themselves, rather than generating a new one for them.
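A sketch of what that might look like, assuming hypothetical reset_token and reset_expires columns and a plain-text mail() call:

// Sketch: wipe the auth token and email a single-use reset link that
// expires within the hour. Column names and the URL are placeholders.
function sendResetLink( PDO $db, $userId, $email ) {
	$token	= hash( 'tiger160,4', openssl_random_pseudo_bytes( 32 ) );
	
	$stmt	= $db->prepare(
		'UPDATE users SET auth = NULL, reset_token = :token, ' .
		'reset_expires = :exp WHERE id = :id'
	);
	$stmt->execute( array(
		':token'	=> $token,
		':exp'		=> time() + 3600,	// Valid for one hour
		':id'		=> $userId
	) );
	
	// Plain URL in a plain-text message, per the email advice above
	mail( $email, 'Password reset',
		"Use this link within the hour to reset your password:\n" .
		'https://example.com/reset/' . $token );
}

Whatever handles the link should clear the token as soon as it’s used so it can’t be replayed.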

If you want to add an additional bit of verification to the cookie, you can add a hash of the client’s request signature. This is not going to be unique at all, but it will make spoofing a tiny bit harder for someone who simply steals the cookie without noting the victim’s browser characteristics. Keep in mind that if the cookie was sniffed in clear text, this may not help much, and remember that nothing seen in “HTTP_” header variables is reliable.

// Build a rough fingerprint from a handful of client request headers
function signature() {
	$out = '';
	foreach ( $_SERVER as $k => $v ) {
		switch( $k ) {
			case 'HTTP_ACCEPT_CHARSET':
			case 'HTTP_ACCEPT_ENCODING':
			case 'HTTP_ACCEPT_LANGUAGE':
			case 'HTTP_UA_CPU':
			case 'HTTP_USER_AGENT':
			case 'HTTP_VIA':
			case 'HTTP_CONNECTION':
				$out .= $v;
		}
	}
	return hash( 'tiger160,4', $out );
}

Note that I avoided using the client’s IP address which may change often and is sometimes shared with popular proxies. Storing the output of this hash with the cookie along with the auth token will help to avoid identifying the user by name or user ID using the cookie alone.

From the inside

The hardest position to defend against is when the attacker is on the inside. There’s a large swath of information out there about compartmentalization, decentralization and restricting access to information to those who need to know. Instead, I’ll leave you with this excerpt from The Godfather Part II.

Michael Corleone: There’s a lot I can’t tell you, Tom. Yeah, I know that’s upset you in the past. You felt it was because of a lack of trust or confidence, but it’s… it’s because I admire you, and I love you, that I kept things secret from you. It’s why, at this moment, you’re the only one I can completely trust.

Fredo. Ah, he’s got a good heart. But he’s weak and he’s stupid. And this is life and death. Tom, you’re my brother.

Tom Hagen: I always wanted to be thought of as a brother by you, Mikey. A real brother.

Michael: You’re gonna take over. You’re gonna be the Don. If what I think has happened has happened, I’m gonna leave here tonight. I give you complete power, Tom. Over Fredo and his men. Rocco, Neri, everyone. I’m trusting you with the lives of my wife and my children. The future of this family.

Tom: If we ever catch these guys do you think we’ll find out who’s at the back of all this?

Michael: We’re not gonna catch ’em. Unless I’m very wrong, they’re dead already. Killed by somebody close to us. Inside. Very, very frightened they’ve botched it.

Tom: But your people, Rocco and Neri, you don’t think they had something to do with this.

Michael: You see, all our people are businessmen. Their loyalty is based on that. One thing I learned from pop, was to try to think as people around you think. Now on that basis, anything is possible.

To register or not register

I’m at an impasse at the moment with regard to the forum. The classic way to run a forum was to create a user account with username, password and email that tied each and every post to a particular user. This made viewing the history of a user and establishing a reputation easy, but it also meant established users asserted their authority quite often. Sometimes objectionably.

Then there’s the ye olde imageboard system, where a user may enter a name and password, but it is only used to tag each post with a pseudo-unique identifier. I’m not sure if this method is better than registration, but it does cut down on the code requirements. It also makes viewing a user’s history more difficult, as the system deliberately caters to anonymous posting first.

4Chan, the most well-known imageboard in the West, uses this system as well, something it inherited from 2Ch, the most famous textboard in the East. Despite 4Chan’s reputation as a wretched hive of scum and villainy a la Mos Eisley, there are sections that are remarkably well kept despite the anonymity. I’ve even seen intelligent and remarkably humane discussions take place on a few salient boards.

Of course, registration doesn’t automatically make for a well-kept community either. Reddit, for example, can easily rival 4Chan’s reputation. A cursory browse of some of the more unsavory subreddits can depress even the most optimistic folks with an unshakable faith in humanity. Likewise, there are other subreddits that offer intelligent content as good or better, and of course many other flavors that don’t quite fit anywhere on the spectrum of discussion.

The difference, then, is moderation.

I’m trying to create a voting system that, while remaining anonymous, still affords users a voice at a balanced volume in determining what should be promoted to the front page, what should remain in the “New/Firehose” section, and which posts should be nuked from orbit. I also want voting power to decrease over time, i.e. when a post is new, all votes for or against it count more than when it’s a few hours old. I think this prevents excessive judgment with the hindsight of over-analyzed social norms, which, for better or worse, tend to be overcorrected. The user interface and online disinhibition make sound judgments more difficult, but we should all know what is obviously wrong upon first read and take appropriate action right away.
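A rough sketch of the decay idea; the six-hour window, the linear falloff and the floor are arbitrary choices rather than a worked-out formula:

// Sketch: weight a vote by the age of the post so early votes count more
function voteWeight( $postCreated, $now = null ) {
	$now	= $now ? $now : time();
	$age	= max( 0, $now - $postCreated );	// Seconds since posting
	$window	= 6 * 3600;				// Weight tapers off over six hours
	
	// Slide from 1.0 for a brand new post down to a floor of 0.1
	return max( 0.1, 1 - ( $age / $window ) * 0.9 );
}

A post’s score would then be the running sum of voteWeight() values computed at the moment each vote was cast.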

This level of self-moderation, with rare moderator intervention early on, can work as long as consistency is maintained. I don’t believe in excessively long codes of conduct, which are seldom followed by those intent on not following them anyway. I mean, the first law in all civil discourse is “Don’t be an ass”. How hard is that? Those obviously being asses are easy to spot and should have their candy taken away.

In that regard, I’m still following the old tried and true approach to community building and moderation. Least amount of friction, least amount of fluff, brutally simple and consistent.

Right then. Onward to building the damn thing.
