Defensive web development

Whether the currency in question is dollars, Bitcoin, moral principles or infamy, a compromised site is just the end result of a business transaction. The purpose of this post is to consider the basic options for making this business unfavorable to an attacker, not to eliminate it altogether. There are circumstances in which the business of compromise will still take place, even under extremely unfavorable conditions or under conditions that are favorable in ways you didn’t foresee. Although some of the examples are in PHP as implemented on a typical *nix environment, the ideas here should apply to most other development conditions.

Broad premises

Reasons for compromise beyond “because they could” should be considered irrelevant.

You will not think of every conceivable approach to compromise so plan for contingencies. Always keep current backups, leave customer data segregated and encrypted, and never test on a production machine or connect to a production environment during testing. Always turn off debugging info and error messages where they may be seen by clients. Never store passwords, keys to storage servers, authentication tokens etc… in your script files. If these must be used in some way by your code, try storing them in php.ini or in a folder outside the web root in a per-user .ini that only PHP has read access to, but the http server does not.
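As a sketch of that last option, a per-user .ini kept outside the web root can be read with parse_ini_file(). The path and key names below are illustrative; the point is that the secrets never live in a script the http server might accidentally serve.

```php
// Minimal sketch: load secrets from an .ini file outside the web root.
// parse_ini_file() returns false on failure.
function loadSecrets( $path ) {
	$config = parse_ini_file( $path );
	if ( false === $config ) {
		die( 'Missing or unreadable configuration' );
	}
	return $config;
}

// E.G. (hypothetical location only PHP can read):
// $secrets	= loadSecrets( '/home/myuser/secrets/app.ini' );
// $dbUser	= $secrets['DB_USER'];
```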

What do you do when they come for you


Enable two factor authentication for any critical services that offer the feature (especially your email). If you have login or administrator privileges for your project, never use HTML email. In fact, I’d recommend not using HTML in emails at all, and filtering any clickable links into plain URLs that you can copy and paste if you need to visit them.

You won't always see it coming. Even if you do, you may not be able to avoid it.


Try to avoid “I’ve done everything I could” and “that’s probably OK” lines of thought, but do prioritize critical sections and keep exploring responses to undesirable inputs and conditions. E.g. try throwing strings or whole files at fields where you were expecting an integer. The type of input, e.g. <select>, <input type="email"> etc…, means nothing to someone who has the “action” URL of your form. Send ridiculously large text, cookies, binaries or otherwise malformed content and see how the server responds. Always validate and sanitize client data.
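For example, a strict integer check with PHP’s filter functions rejects strings, negatives and absurdly large values alike (the field name and fallback default here are illustrative):

```php
// Strict input validation sketch: never trust that a field rendered as
// a number input actually contains a number.
function cleanPageNumber( $value ) {
	// Reject anything that isn't a plain positive integer
	$page = filter_var( $value, FILTER_VALIDATE_INT, 
		array( 'options' => array( 'min_range' => 1 ) ) );
	
	// Fall back to a safe default on garbage input
	return ( false === $page ) ? 1 : $page;
}
```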

In the same vein, blacklists compare unfavorably to whitelists when filtering. Only allowing inputs that follow a known set of acceptable criteria is simply a matter of practicality (and in most cases, feasibility, since you probably lack the omniscience to enumerate every bad input). An attacker need not succeed on every attempt at compromise, but a defender only gets to fail once. And that single failure could be catastrophic.
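As a quick illustration of whitelisting, a username check can accept only a known-safe character set rather than trying to enumerate everything dangerous (the exact rules here are just an example):

```php
// Whitelist sketch: 3-30 letters, numbers, underscores or hyphens,
// and nothing else. Everything not explicitly allowed is rejected.
function validUsername( $name ) {
	return ( bool ) preg_match( '/^[a-zA-Z0-9_\-]{3,30}$/', $name );
}
```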

Always make sure your read/write/execute privileges are appropriate to minimize chances of accidental exposure. Never allow uploads to folders that have execute permissions and never allow write permissions on executable folders. Put script files outside your web root whenever possible and try to avoid applications and web hosts that limit these options. Consider putting file uploads outside the web root as well and let your scripting handle access to them by stripping out invalid path characters and specifying which directory to search. This creates some additional overhead, but it prevents the http server from reading uploads directly, which could otherwise lead to directory traversal if the server isn’t configured properly.
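As a sketch of letting your script mediate access to uploads stored outside the web root (the storage path is whatever you choose):

```php
// Resolve a client-supplied file name safely against a storage folder
// kept outside the web root. Returns the real path, or false if the
// name escapes the folder or the file doesn't exist.
function resolveUpload( $name, $dir ) {
	// Strip any path components the client may have sent
	$name	= basename( $name );
	$path	= realpath( $dir . DIRECTORY_SEPARATOR . $name );
	
	// The resolved path must still be inside the storage folder
	if ( false === $path || 0 !== strpos( $path, realpath( $dir ) ) ) {
		return false;
	}
	return $path;
}

// Your script, not the http server, then reads the file:
// readfile( resolveUpload( $_GET['file'], '/home/myuser/uploads' ) );
```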

Client requests

Stick to what you can actually digest


Read on GET, act on POST, do nothing special on HEAD, use PUT or PATCH with extreme caution, filter all and let the rest die();

The GET method is for retrieval, i.e. reading, and you should concentrate on that. Generally, we want to avoid writing to a database on GET unless it’s for statistics or analytics purposes (*).

* Analytics needs a major overhaul. You don’t need to record everything a visitor does on your page and almost everything you do record will be obsolete fairly quickly. So unless you run an ad company, keep analytics to an absolute minimum. Always remember, more “things” are more moving parts and moving parts tend to fail.

POST should be used for creating new content, e.g. pages, posts, comments etc… When the database auto-increments IDs or otherwise generates unique identifiers for you, POST is a great way to handle content creation. When using PUT or PATCH, you’re telling the server what the name of the resource is. This is not quite the same as a content post title which can double as a URL slug; the database still has an auto-generated ID unique to that post. The resource handler needs to account for name conflict resolution and for the fact that PUT is idempotent. That is, sending the same PUT request multiple times for the same resource has the same effect as sending it once. This may not be desirable in POST, where you often don’t want content to be submitted twice.

PATCH is a special case that gets abused often (almost as much as PUT) and it’s simply a set of instructions on how to modify a resource already present on the server. Learn more about these methods before implementing PUT or PATCH.

Never touch $_GET, $_POST or $_FILES directly throughout your application. Always use filters and sanitization to ensure you’re getting the type of content you expected. For $_GET, regular expressions will usually suffice since we’re not dealing with HTML (never handle HTML content with regex). The following is a friendly URL router for a possible blog or similar application.


namespace Blog; // ... Or something

class Router {
	/**
	 * @var array Methods, routes and callbacks
	 */
	private static $routes	= array();
	
	/**
	 * Router constructor
	 */
	public function __construct() { }
	
	/**
	 * Add a request method with an accompanying route and callback
	 * 
	 * @param	string		$method Lowercase request method
	 * @param	string		$route Simple regex route path
	 * @param	callable	$callback Function call
	 */
	public function add( $method, $route, $callback ) {
		// Format the regex pattern
		$route = self::cleanRoute( $route );
		
		// First time we're adding a path to this method?
		if ( !isset( self::$routes[$method] ) ) {
			self::$routes[$method] = array();
		}
		
		// Add a route to this method and set callback as value
		self::$routes[$method][$route] = $callback;
	}
	
	/**
	 * Sort all sent routes for the current request method, iterate 
	 * through them for a match and trigger the callback function
	 */
	public function route() {
		if ( empty( self::$routes ) ) { // No routes?
			return $this->fourOhFour();
		}
		
		// Client request path
		$path	= $_SERVER['REQUEST_URI'];
		
		// Client request method
		$method = strtolower( $_SERVER['REQUEST_METHOD'] );
		
		// No routes for this method?
		if ( empty( self::$routes[$method] ) ) {
			return $this->fourOhFour();
		}
		
		// Found flag
		$found	= false;
		
		// For each path in each method, iterate until match
		foreach ( self::$routes[$method] as $route => $callback ) {
			// Found a match for this method on this path
			if ( preg_match( $route, $path, $params ) ) {
				$found = true; // Set found flag
				if ( count( $params ) > 0 ) {
					// Clean parameters
					array_shift( $params );
				}
				// Trigger callback
				return call_user_func_array( 
					$callback, $params 
				);
			}
		}
		
		// We didn't find a path 
		if ( !$found ) {
			$this->fourOhFour();
		}
	}
	
	/**
	 * Paths are sent in bare. Make them suitable for matching.
	 * 
	 * @param	string		$route URL path regex
	 */
	private static function cleanRoute( $route ) {
		$regex	= str_replace( '.', '\.', $route );
		return '@^/' . $regex . '/?$@i';
	}
	
	/**
	 * Possible 404 not found handler. 
	 * Something that looks nicer should be used in production.
	 */
	private function fourOhFour() {
		die( "<em>Couldn't find the page you're looking for.</em>" );
	}
}
You can then utilize it as follows:

// Main index. 
function index( $page = 1 ) {
	// Do something with the given page number
}

// Single post
function read( $id, $page = 1 ) {
	// Do something with $id and page number
}

// Now, you can create the router
$router		= new Blog\Router();

// Browsing index or homepage
$router->add( 'get', '', 'index' );
$router->add( 'get', '([1-9][0-9]*)', 'index' );

// Note: The regex requires the page number to start from 1-9

// Specific post
$router->add( 'get', 'post/([1-9][0-9]*)', 'read' );
$router->add( 
	'get', 
	'post/([1-9][0-9]*)/([1-9][0-9]*)', // ID and pages start from 1-9
	'read' 
);

// Now we can route
$router->route();
When handling POST content, we have to be a little more careful. The following is an example of a content post filter which uses typical fields and PHP’s built-in content filtering (the field names are illustrative; match them to your own form):

function getPost() {
	$filter	= array(
		'csrf'	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'title'	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'body'	=> FILTER_UNSAFE_RAW
	);
	return filter_input_array( INPUT_POST, $filter );
}

You probably want to do some special formatting for filtering HTML, but this gets rid of the overwhelming majority of undesired inputs a client may send. The filter_input_array function is quite useful for building content with multiple fields at once. When the field has not been sent, the array value will be NULL. You’ll note the ‘csrf’ field. It’s important to ensure that content sent by the user was actually intended, and anti-cross-site request forgery tokens are very helpful in that regard.
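A minimal anti-CSRF token sketch using sessions might look like the following. Note that random_bytes() assumes PHP 7+, and hash_equals() gives a constant-time comparison:

```php
// Generate a random token, store it in the session and embed it in the
// form; compare on submission before acting on the POST data.
function csrfToken() {
	if ( empty( $_SESSION['csrf'] ) ) {
		// 32 random bytes, hex-encoded to 64 characters
		$_SESSION['csrf'] = bin2hex( random_bytes( 32 ) );
	}
	return $_SESSION['csrf'];
}

function csrfVerify( $sent ) {
	// Constant-time comparison against the stored token
	return !empty( $_SESSION['csrf'] ) && 
		hash_equals( $_SESSION['csrf'], ( string ) $sent );
}
```

In a real application you would call session_start() first and embed csrfToken() in a hidden form field, then reject any POST where csrfVerify() fails.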


Looks mighty suspicious!


Communication between a user and the server is only safe when the connection uses TLS. Even then, you should avoid storing the username or user ID in the cookie of a logged-in user, as the cookie is sent on each request to the server. Instead, use an ‘auth’ field in your database table containing a randomly generated hash as the identifier. When the logged-in user visits the site, the random hash is sent to the server, and the server can use it to look up the user instead of an ID or username. The ‘auth’ token should be renewed after each successful login.
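A minimal sketch of issuing such an ‘auth’ token on login might look like this. The users table and column names are illustrative, and random_bytes() needs PHP 7+:

```php
// Issue a fresh random 'auth' identifier for a user and put only that
// value in the cookie; the username/ID never leaves the server.
function newAuthToken( PDO $db, $userId ) {
	// Renewed on every successful login
	$auth = bin2hex( random_bytes( 32 ) );
	
	$stmt = $db->prepare( 
		'UPDATE users SET auth = :auth WHERE id = :id' 
	);
	$stmt->execute( array( ':auth' => $auth, ':id' => $userId ) );
	
	// Secure, HttpOnly cookie, valid for a week
	setcookie( 'auth', $auth, time() + 604800, '/', '', true, true );
	return $auth;
}
```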

As an additional benefit, using an auth hash makes it easy to force-logout a user simply by deleting the hash stored in the database. If you believe a user’s password has been compromised, or if the user requests a password reset, it’s best to delete the auth token and send a separate link (which expires within the hour and is valid for a single use) to the user’s email so the user can reset the password, instead of generating a new one yourself.
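A sketch of the expiring, single-use reset token described above, again with illustrative column names:

```php
// Issue a reset token valid for one hour and clear the old auth token.
function newResetToken( PDO $db, $userId ) {
	$token	= bin2hex( random_bytes( 32 ) );
	$stmt	= $db->prepare( 
		'UPDATE users SET auth = NULL, reset_token = :t, ' . 
		'reset_expires = :e WHERE id = :id' 
	);
	$stmt->execute( array( 
		':t'	=> $token, 
		':e'	=> time() + 3600, // Valid for one hour
		':id'	=> $userId 
	) );
	return $token;
}

// Check a token sent back via the emailed link
function checkResetToken( PDO $db, $userId, $sent ) {
	$stmt = $db->prepare( 
		'SELECT reset_token, reset_expires FROM users WHERE id = :id' 
	);
	$stmt->execute( array( ':id' => $userId ) );
	$row = $stmt->fetch();
	
	if ( empty( $row['reset_token'] ) || time() > $row['reset_expires'] ) {
		return false;
	}
	
	// Single use: clear the token whether or not it matched
	$db->prepare( 'UPDATE users SET reset_token = NULL WHERE id = :id' )
		->execute( array( ':id' => $userId ) );
	
	return hash_equals( $row['reset_token'], ( string ) $sent );
}
```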

If you want to add an additional bit of verification to the cookie, you can add a hash of the client’s request signature. This is not going to be unique at all, but it will make spoofing a tiny bit harder for someone who simply steals the cookie without noting the browser characteristics of the victim. Keep in mind that if the cookie was sniffed in clear text, this may not help much. Remember that nothing seen in “HTTP_” header variables is reliable.

function signature() {
	$out = '';
	foreach ( $_SERVER as $k => $v ) {
		switch ( $k ) {
			case 'HTTP_UA_CPU':
			case 'HTTP_USER_AGENT':
			case 'HTTP_VIA':
				$out .= $v;
				break;
		}
	}
	return hash( 'tiger160,4', $out );
}

Note that I avoided using the client’s IP address which may change often and is sometimes shared with popular proxies. Storing the output of this hash with the cookie along with the auth token will help to avoid identifying the user by name or user ID using the cookie alone.

From the inside

The hardest position to defend against is when the attacker is on the inside. There’s a large swath of information out there about compartmentalization, decentralization and restricting access to information to those who need to know. Instead, I’ll leave you with this excerpt from The Godfather Part II.

Michael Corleone: There’s a lot I can’t tell you, Tom. Yeah, I know that’s upset you in the past. You felt it was because of a lack of trust or confidence. But it’s… it’s because I admire you, and I love you, that I kept things secret from you. It’s why at this moment you’re the only one I can completely trust.

Fredo. Ah, he’s got a good heart. But he’s weak and he’s stupid. And this is life and death. Tom, you’re my brother.

Tom Hagen: I always wanted to be thought of as a brother by you, Mikey. A real brother.

Michael: You’re gonna take over. You’re gonna be the Don. If what I think has happened has happened, I’m gonna leave here tonight. I give you complete power, Tom. Over Fredo and his men. Rocco, Neri, everyone. I’m trusting you with the lives of my wife and my children. The future of this family.

Tom: If we ever catch these guys do you think we’ll find out who’s at the back of all this?

Michael: We’re not gonna catch ’em. Unless I’m very wrong, they’re dead already. Killed by somebody close to us. Inside. Very, very frightened they’ve botched it.

Tom: But your people, Rocco and Neri, you don’t think they had something to do with this.

Michael: You see, all our people are businessmen. Their loyalty is based on that. One thing I learned from pop, was to try to think as people around you think. Now on that basis, anything is possible.

Storing Database Credentials (and other stuff) in php.ini

If you’re storing your database password + username and other secure information in just any old .php file in your application, you’re doing it so, very, very, very wrong. If you must physically store these keys to the castle, the old method with Apache used to be SetEnv. Of course, not everyone uses Apache these days (I use Nginx on my *nix boxes).

The best place to store these things is in an .ini file. Specifically, for content that rarely, if ever, changes (i.e. database connection strings), it should be php.ini. Every PHP installation should have one, and if you don’t have access to it, it’s time to switch web hosts.

In your php.ini, you can add the following or equivalent settings somewhere in the bottom.

myapp.cfg.DB_HOST = 'mysql:host=;dbname=mydatabase'
myapp.cfg.DB_USER = 'dbusername'
myapp.cfg.DB_PASS = 'dbpassword'

Note: the myapp.cfg prefix is just the configuration label given to that particular group of settings. It’s good practice to give labels to your configuration settings and group them together, especially if you go on to have a lot more of them later.

Here is a very simple bit of code to load the above settings into globally defined variables :

// Very simple loader
function loadConfig( $vars = array() ) {
	foreach ( $vars as $v ) {
		define( $v, get_cfg_var( "myapp.cfg.$v" ) );
	}
}

// Then call :
$cfg = array( 'DB_HOST', 'DB_USER', 'DB_PASS' );
loadConfig( $cfg );

Doing this is a far more secure method of setting up most other applications (including *cough* WordPress) than the old-school way, which I’m sure anyone who’s set up any PHP app in the past has dealt with:

 // Ordinary config.php or some such file 
define( 'DB_HOST', 'mysql:host=;dbname=mydatabase' );
define( 'DB_USER', 'dbusername' );
define( 'DB_PASS', 'dbpassword' );

The best way to prevent information in your hands from falling into the wrong hands is to not have it in your hands. If some misconfiguration results in raw PHP files being served as text files (this happens far more often than you might think), the only thing you’ve exposed is the site code, not your DB credentials, AWS passwords, secret salts etc…


As mentioned above, not everyone will have access to php.ini from their web host (which, as I also said, is a good hint it’s time to switch hosts). You will also need to reload PHP for the new configuration changes to take effect. It’s possible to shut down and restart gracefully these days, but that still means a tiny bit of downtime of a few seconds at least, so this should be reserved for configuration settings that are critical yet change infrequently. Or, if you’re using PHP-FPM with Nginx, you can start another FastCGI instance and have Nginx fail over to that.


The PDO driver for MySQL, for some reason, demands the username and password separately. I find this a little silly since other drivers (e.g. PostgreSQL) can function just fine with a connection string such as:

pgsql:host=localhost;dbname=mydatabase;user=dbusername;password=dbpassword
Well, to keep the MySQL driver and many others happy, I’ve written a small helper class that intercepts the connection string and breaks it down so the username and password can be kept separate. It also works with the above php.ini trick in that you can now store a complete connection string as php.dsn.mydb or the like as shown in the PDO docs.

/**
 * PDO Connector class
 * Modifies the DSN to parse username and password individually.
 * Optionally, gets the DSN directly from php.ini.
 * 
 * @author Eksith Rodrigo <reksith at>
 * @license ISC License
 * @version 0.1
 */
class Cxn {
	protected $db;
	
	public function __construct( $dbh ) {
		$this->connect( $dbh );
	}
	
	public function getDb() {
		if ( is_object( $this->db ) ) {
			return $this->db;
		} else {
			die( 'There was a database problem' );
		}
	}
	
	public function __destruct() {
		$this->db = null;
	}
	
	private function connect( $dbh ) {
		// Already connected? Do nothing
		if ( !empty( $this->db ) && is_object( $this->db ) ) {
			return;
		}
		
		try {
			$settings = array(
				PDO::ATTR_TIMEOUT	=> "5",
				PDO::ATTR_PERSISTENT	=> false
			);
			$this->_dsn( $dbh, $username, $password );
			$this->db = new PDO( $dbh, $username, $password, $settings );
		} catch ( PDOException $e ) {
			exit( $e->getMessage() );
		}
	}
	
	/**
	 * Extract the username and password from the DSN and rebuild
	 */
	private function _dsn( &$dsn, &$username = '', &$password = '' ) {
		/**
		 * No host name with ':' would mean this is a DSN name in php.ini
		 */
		if ( false === strrpos( $dsn, ':' ) ) {
			/**
			 * We need get_cfg_var() here because ini_get doesn't work
			 */
			$dsn = get_cfg_var( "php.dsn.$dsn" );
		}
		
		/**
		 * Some people use spaces to separate parameters in
		 * DSN strings and this is NOT standard
		 */
		$d = explode( ';', $dsn );
		$m = count( $d );
		$s = '';
		
		for ( $i = 0; $i < $m; $i++ ) {
			$n = explode( '=', $d[$i] );
			
			// Empty parameter? Continue
			if ( count( $n ) <= 1 ) {
				$s .= implode( '', $n ) . ';';
				continue;
			}
			
			switch ( trim( $n[0] ) ) {
				case 'uid':
				case 'user':
				case 'username':
					$username = trim( $n[1] );
					break;
					
				case 'pwd':
				case 'pass':
				case 'password':
					$password = trim( $n[1] );
					break;
					
				default: // Some other parameter? Leave as-is
					$s .= implode( '=', $n ) . ';';
			}
		}
		
		$dsn = $s;
	}
}

You can use this class with :

$cxn = new Cxn( DBH );

Where DBH came from (hopefully) php.ini.

Log All The Things!

Recently on Hacker News, a new product called Heap was linked and quickly rose to the front page. Heap is an analytics tool that captures everything that happens on your web site. I mean everything: clicks, submissions, even perhaps mouse movements, and it’s all dumped to a database that you can analyze later for marketing etc…

It’s a UI firehose

I’m not knocking the product since they’ve obviously spent a lot of time and energy and I can respect that. I hope they’re successful. But anyone using things like these has to keep one thing in mind…

Data is like a plucked flower; it starts to die and go stale the moment it’s captured. How soon it goes stale depends on the type of data, but you then need to make sure it’s shifted (another firehose) into analytics ASAP. It needs to be examined for actual usable information and interpreted somehow, either to monetize, if that’s your intention, or to feed further research.

By simply capturing everything, you’re effectively creating a very large feed of data that you may or may not use. You’re facing a similar conundrum as the U.S. intelligence agencies who (we’re told) collect and log virtually every cell transmission and, allegedly, every email passing through computers within and, also allegedly, outside our borders.

Who said or did what, and more importantly, why? Are they really interested? Curious? Can be driven to become interested? Marketed to? Can they get their friends, family and colleagues interested? Can they give me money? These are more important questions to answer first before you go about Logging All The Things. I have a criticism about this further below.

Which brings us to those users who don’t fall under the umbrella of “marketable” for technical reasons.

The problem of backward compatibility

Is it just me or IE8 is just not considered by web startups anymore ? Sorry to deviate a little but every time I try to look at a “Show HN” in IE8 (work computer), it fails for about 85% of the time. I could understand that some startups heavily depend on latest browsers but what about others ? – codegeek

No it’s not your imagination.

The web doesn’t all flock to Firefox or Chrome or WebKit or the soon-to-be-formerly-independent Opera. By limiting themselves to a relatively small subset of web users, Heap and similar products have effectively decided not to bother with the rest. To be fair to them: they’re probably not wrong.

The vast majority of analytics, web apps and shiny new HTML5 things are all geared toward the tech-savvy crowd who are connected to everything all the time and, for better or for worse, treat the browser as an OS substitute. Those interested in these things in the first place will likely be running a browser capable of those very things, but this shuts out everyone else who might otherwise consume the web.

The web is a product delivery network

It used to be just a means of communication, then trade, then sharing, and now it’s all about content delivery. “Content”, as in data, is now a product itself, so along with your shoes, fishing rods/reels, computers etc… you now have data being sold as well.

To sell data, you must first gather it, and then you bring yourself back full circle to the aforementioned stagnation problem. Captured data is useless if it cannot be applied in a meaningful way and it becomes harder to apply to anything useful if you have too much of it.

As DevOps Borat put so eloquently :

I don’t want to be marketed to

This isn’t really a secret, but I don’t think I’ve mentioned it here before. When I browse for leisure (which I sadly don’t get to do as often these days), and it doesn’t involve watching YouTube videos, playing a game or some other form of interaction, I browse with JavaScript and all plugins disabled in Firefox.

I realise this involves killing advertising on sites I enjoy and I don’t completely feel comfortable doing it, but the alternative is more intrusive and objectionable to me and I don’t think I’m alone in this.

As the old adage goes: If you’re using a service for free, you’re the product.

This holds true of YouTube, Facebook, Google+, Gmail and yes, even WordPress. Advertising and value added features are how these services stay afloat and I can appreciate that. I also hope that they can appreciate the sheer volume of crap constantly targeted at me through the use of cookies, JavaScript and of course my IP (I’m sure) no matter where I go and what I do.

I’m old fashioned

I remember a time when web pages were hastily constructed bits of content consisting of tables, poorly contrasting background images, tags and barely functioning CSS that broke in any browser other than IE5 or Netscape. It looks like we may be returning to these bad old days with newer technology.

Governments and universities controlled most of the internet connectivity — for better or for worse — and the few companies that did let you build a site for free on the newly emerging “web” were Tripod, Geocities, AOL Homepages et al… and they too made sure there was ample advertising (or value added, as in the case of AOL).

But you know what?

Aside from the odd virus or two (those were ubiquitous as well, and antivirus was/is snake oil) and the blistering popup storms, which could be managed if you knew how to tweak Netscape or installed the latest popup blocker, it was still manageable.

I could actually consume the web without being consumed

I didn’t mind the ads that sold me dates with college students, mostly because I was still in junior high, but also because they didn’t know which site I had visited before, where I was looking or what I liked to buy (eBay started in 1995 and I thought it was the best idea since sliced bread).

But these days suddenly web sites that have nothing to do with what I was looking at before, know which ads to show me.

I’m visiting a site on chemistry books that’s showing me ads to fishing reels. How did they know I was looking for fishing reels before?

I’m visiting a site on telescopes and they’re showing me ads on test tubes and beakers.

I’m visiting a site on printing and homemade paper and there’s an ad on star tracking scopes and GPS.

What is this madness?

Now, as for the folks at Heap, who may be getting an undeserved flogging from me for contributing to this tracking malarkey: I apologize for coming off as somewhat irascible. It’s not your fault, since you’re only serving the demand.

What worries me is that there is demand.

ID Obfuscation Part II

Last week, I wrote a simple function for obfuscating a string that can be used to shorten URLs. I got a few emails from people who would actually like to obfuscate an ID key (e.g. a numeric primary key) of a large size (e.g. a PostgreSQL ‘bigserial’ type, which can go up to 2^63 − 1). There are many examples out there, but they seem to convert the input to integers first, which can lead to loss of precision, especially in PHP.

I use Postgres too, and I’ve worked around the big-number problem by appending a random digit or two to the front and then encoding the whole thing. So when I need the original, I just decode it and remove the front digit(s). This does two things: it obfuscates the ID (no one needs to know 10001 and 10002 are neighbors) and makes sure each result is unique as long as the key given to it is unique. Of course, if it’s a primary key from a database, you won’t have to worry too much about uniqueness; it already is unique. And since I’m always appending the same number of digits as I’ll remove when decoding, it doesn’t matter how large the number gets.

So here’s a function that will create a shortened ID from a given numeric key in PHP :

function ConvertKey( $k, $create = false ) {
	$range	= str_split( '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' );
	$il	= strlen( $k );
	$cl	= 62; // count( $range ) is redundant
	$out	= '';
	
	// Get number from key
	if ( $create ) {
		$out	= 0;
		// Letter/number to array key swap
		$a	= array_flip( $range );
		for ( $i = 0; $i < $il; $i++ ) {
			$c	= $k[$i];
			$n	= bcpow( $cl, $il - $i - 1 );
			$out	= bcadd( $out, bcmul( $a[$c], $n ) );
		}
		// Strip front two random digits (appended below)
		$out = substr( $out, 2 );
	} else {
		// Append two random digits to the front
		// (NOT added, just attached to the front)
		$k = mt_rand( 10, 99 ) . $k;
		do {
			$c	= bcmod( $k, $cl );
			$out	.= $range[$c];
			$k	= bcdiv( bcsub( $k, $c ), $cl );
		} while ( bccomp( $k, 0 ) > 0 );
		// We worked from back to front
		$out = strrev( $out );
	}
	return $out;
}

You can test this out by sticking it in a loop :

for ( $i = 5000; $i < 6000; $i++ ) {
	$kConverted	= ConvertKey( $i );
	$kOriginal	= ConvertKey( $kConverted, true );
	echo $i . ' - ' . $kConverted . ' - ' . $kOriginal . '<br />';
}

Of course, you’ll need to keep in mind that the generated key will be different each time you run it, however the end result after decoding will be the same.

I also wrote a post on encryption with… *ahem*… colorful comments and, thankfully, most people stuck to the actual code itself when contacting me about it. Yes, I did change the encryption mode from CFB to CBC. CFB doesn’t need padding so I wasn’t lying about the sleep-deprivation. Thanks to those who wrote to me about it.