Defensive web development

Whether the currency in question is dollars, Bitcoin, moral principles or infamy, a compromised site is just the end result of a business transaction. The purpose of this post is to consider the basic options in making this business unfavorable to an attacker; not eliminate it altogether. There are circumstances in which the business of compromise will still take place even in extremely unfavorable or, unforeseen to you, favorable conditions. Although some of the examples are in PHP as implemented on a typical *nix environment, the ideas here should apply to most other development conditions.

Broad premises

Reasons for compromise beyond “because they could” should be considered irrelevant.

You will not think of every conceivable approach to compromise so plan for contingencies. Always keep current backups, leave customer data segregated and encrypted, and never test on a production machine or connect to a production environment during testing. Always turn off debugging info and error messages where they may be seen by clients. Never store passwords, keys to storage servers, authentication tokens etc… in your script files. If these must be used in some way by your code, try storing them in php.ini or in a folder outside the web root in a per-user .ini that only PHP has read access to, but the http server does not.

What do you do when they come for you

What do you do when they come for you

Enable two factor authentication for any critical services that use the feature (especially your email). If you have login or administrator privileges for your project, never use HTML email. In fact, I’d recommend not using HTML in emails at all and filtering any clickable links into plain URLs that you can copy > paste if you need to visit them.

You won't always see it coming. Even if you do, you may not be able to avoid it.

You won’t always see it coming. Even if you do, you may not be able to avoid it.


Try to avoid “I’ve done everything I could” and “that’s probably OK” lines of thought, but do prioritize critical sections and continue to explore responses to undesirable inputs and conditions. E.G. Try to throw strings or whole files at fields where you were expecting an integer. The type of input, E.G. <select>, <input type=”email”> etc… means nothing to someone who has the “action” URL of your form. Send ridiculously large text, cookies, binaries or otherwise malformed content and see how the server responds. Always validate and sanitize client data.

In the same vein, blacklists are not favorable compared to whitelists when filtering. Only allowing inputs that follow a known set of acceptable criteria is simply a matter of practicality (and in most cases, feasibility since you probably lack omnipotence). An attacker need not succeed on every attempt at compromise, but a defender only gets to fail once. And that single failure could be catastrophic.

Always make sure your read/write/execute privileges are appropriate to minimize chances of accidental exposure. Never allow uploads to folders that have execute permissions and never allow write permissions on executable folders. Put script files outside your web root whenever possible and try to avoid applications and web hosts that limit these options. Consider putting file uploads outside the web root as well and let your scripting handle access to them by stripping out invalid path characters and specifying which directory to search. This creates for some additional overhead, but it prevents the http server from reading uploads directly which may lead to directory traversal if the server isn’t configured properly.

Client requests

Stick to what you can actually digest

Stick to what you can actually digest


Read on GET, act on POST, do nothing special on HEAD, use PUT or PATCH with extreme caution, filter all and let the rest die();

The GET method is for retrieval I.E. reading and you should concentrate on that. Generally, we want to avoid writing to a database on GET unless it’s for statistics or analytics purposes (*).

* Analytics needs a major overhaul. You don’t need to record everything a visitor does on your page and almost everything you do record will be obsolete fairly quickly. So unless you run an ad company, keep analytics to an absolute minimum. Always remember, more “things” are more moving parts and moving parts tend to fail.

POST should be used for creating new content E.G. pages, posts, comments etc… When the database auto-increments IDs or otherwise generates unique identifiers for you, POST is a great way to handle content creation. When using PUT or PATCH, you’re telling the server what the name the resource is. This is not quite the same as a content post title which can double as a URL slug; the database still has an auto-generated ID unique to that post. The resource handler needs to account for name conflict resolution, and the fact that PUT is idempotent. That is, the current request doesn’t rely on the success or failure of the previous one and so can be sent multiple times for the same resource. This may not be desirable in POST where you often don’t want content to be submitted twice.

PATCH is a special case that gets abused often (almost as much as PUT) and it’s simply a set of instructions on how to modify a resource already present on the server. Learn more about these methods before implementing PUT or PATCH.

Never touch $_GET, $_POST or $_FILES directly throughout your application. Always use filters and sanitization to ensure you’re getting the type of content you expected. For $_GET, Regular Expressions will usually suffice since we’re not dealing with HTML. Never handle HTML content with regex. The following is a friendly URL router for a possible blog or similar application.

<?php

namespace Blog; //... Or something

class Router {
	
	/**
	 * @var array Methods, routes and callbacks
	 */
	private static $routes	= array();
	
	/**
	 * Router constructor
	 */
	public function __construct() {	}
	
	/**
	 * Add a request method with an accompanying route and callback
	 * 
	 * @param	string		$method Lowercase request method
	 * @param	string		$route Simple regex route path
	 * @param	callable	$callback Function call
	 */
	public function add( $method, $route, $callback ) {
		// Format the regex pattern
		$route = self::cleanRoute( $route );
		
		// First time we're adding a path to this method?
		if ( !isset( self::$routes[$method] ) ) {
			 self::$routes[$method] = array();
		}
		
		// Add a route to this method and set callback as value
		self::$routes[$method][$route] = $callback;
	}
	
	/**
	 * Sort all sent routes for the current request method, iterate 
	 * through them for a match and trigger the callback function
	 */
	public function route() {
		if ( empty( self::$routes ) ) { // No routes?
			$this->fourOhFour();
		}
		
		// Client request path
		$path	= $_SERVER['REQUEST_URI'];
		
		// Client request method
		$method = strtolower( $_SERVER['REQUEST_METHOD'] );
		
		// No routes for this method?
		if ( empty( self::$routes[$method] ) ) {
			$this->fourOhFour();
		}
		
		// Found flag
		$found	= false;
		
		// For each path in each method, iterate until match
		foreach( self::$routes[$method] as $route => $callback ) {
			
			// Found a match for this method on this path
			if ( preg_match( $route, $path, $params ) ) {
				
				$found = true; // Set found flag
				if ( count( $params ) > 0) {
					// Clean parameters
					array_shift( $params );
				}
				
				// Trigger callback
				return call_user_func_array( 
					$callback, $params 
				);
			}
		}
		
		// We didn't find a path 
		if ( !$found ) {
			$this->fourOhFour();
		}
	}
	
	/**
	 * Paths are sent in bare. Make them suitable for matching.
	 * 
	 * @param	string		$route URL path regex
	 */
	private static function cleanRoute( $route ) {
		$regex	= str_replace( '.', '\.', $route );
		return '@^/' . $route . '/?$@i';
	}
	
	/**
	 * Possible 404 not found handler. 
	 * Something that looks nicer should be used in production.
	 */
	private function fourOhFour() {
		die( "<em>Couldn't find the page you're looking for.</em>" );
	}
}

You can then utilize it as follows

// Main index. 
function index( $page = 1 ) {
	// Do something with the given page number
}

function read( $id, $page = 1 ) {
	// Do something with $id and page number
}

// Now, you can create the router
$router		= new Blog\Router();

// Browsing index or homepage
$router->add( 'get', '', 'index' );
$router->add( 'get', '([1-9][0-9]+)', 'index' );

// Note: The regex requires the page number to start from 1-9

// Specific post
$router->add( 'get', '/post/([1-9][0-9]+)', 'read' );
$router->add( 
	'get', 
	'post/([1-9][0-9]+)/([1-9][0-9]+)', // ID and pages start from 1-9
	'read' 
);

// Now we can route
$router->route();

When handling POST content, we have to be a little more careful. The following is an example of a content post filter which uses typical fields and PHP’s built in content filtering

function getPost() {
	$filter	= array(
		'csrf'	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'id' 	=> FILTER_SANITIZE_NUMBER_INT,
		'parent'=> FILTER_SANITIZE_NUMBER_INT,
		'title' => FILTER_SANITIZE_FULL_SPECIAL_CHARS,
		'body' 	=> FILTER_SANITIZE_FULL_SPECIAL_CHARS
	);
	
	return filter_input_array( INPUT_POST, $filter );
}

You probably want to do some special formatting for filtering HTML, but this gets rid of the overwhelming majority of undesired inputs a client may send. The filter_input_array function is quite useful for building content with multiple fields at once. When the field has not been sent, the array value will be NULL. You’ll note the ‘csrf’ field. It’s important to ensure that content sent by the user was actually intended, and anti-cross-site request forgery tokens are very helpful in that regard.

Authentication

Looks mighty suspicious!

Looks mighty suspicious!


The only safe way to ensure communication between a user and the server is secure is when the connection uses TLS. Even then, you should avoid storing the username or user ID in the cookie of a logged in user as that is sent on each request to the server. Instead, use an ‘auth’ field in your database table that is a randomly generated hash as the identifier. When the logged in user visits the site, the random hash is sent to the server and the server can use that to lookup the user instead of an ID or username. The ‘auth’ token should be renewed after each successful login.

As an additional benefit, using an auth hash will make it easy to force logout a user simply by deleting the hash stored in the database. If you believe a user’s password has been compromised or if the user requests a password reset, it’s best to delete the auth token, and send a separate link (which expires within the hour and is valid for single-use) to the user’s email to be reset instead of generating a new one yourself.

If you want to add an additional bit of verification to the cookie, you can add a hash of the client’s request signature. This is not going to be unique at all, but it will make spoofing a tiny bit harder for someone who simply steals the cookie without making note of the browser characteristics of the victim user. Keep in mind that if the cookie was sniffed in clear text, this may not help much. Remember that nothing seen in “HTTP_” header variables are reliable.

function signature() {
	$out = '';
	foreach ( $_SERVER as $k => $v ) {
		switch( $k ) {
			case 'HTTP_ACCEPT_CHARSET':
			case 'HTTP_ACCEPT_ENCODING':
			case 'HTTP_ACCEPT_LANGUAGE':
			case 'HTTP_UA_CPU':
			case 'HTTP_USER_AGENT':
			case 'HTTP_VIA':
			case 'HTTP_CONNECTION':
				$out .= $v;
		}
	}
	return hash( 'tiger160,4', $out );
}

Note that I avoided using the client’s IP address which may change often and is sometimes shared with popular proxies. Storing the output of this hash with the cookie along with the auth token will help to avoid identifying the user by name or user ID using the cookie alone.

From the inside

The hardest position to defend against is when the attacker is on the inside. There’s a large swath of information out there about compartmentalization, decentralization and restricting access to information to those who need to know. Instead, I’ll leave you with this excerpt from The Godfather Part II.

Michael Corleone: There’s a lot I can’t tell you, Tom. Yeah, I know that’s upset you in the past. You felt it was a because of a lack of trust or confidence, but it’s… it’s because I admire you. And I love you that, I kept things secret from you. It’s why at this moment that you’re the only one I can completely trust.

Fredo. Ah, he’s got a good heart. But he’s weak and he’s stupid. And this is life and death. Tom, you’re my brother.

Tom Hagen: I always wanted to be thought of as a brother by you, Mikey. A real brother.

Michael: You’re gonna take over. You’re gonna be the Don. If what I think is happened has happened, I’m gonna leave here tonight. I give you complete power, Tom. Over Fredo and his men. Rocco, Neri, everyone. I’m trusting you with the lives of wife and my children. The future of this family.

Tom: If we ever catch these guys do you think we’ll find out who’s at the back of all this?

Michael: We’re not gonna catch ’em. Unless I’m very wrong, they’re dead already. Killed by somebody close to us. Inside. Very, very frightened they’ve botched it.

Tom: But your people, Rocco and Neri, you don’t think they had something to do with this.

Michael: You see, all our people are businessmen. Their loyalty is based on that. One thing I learned from pop, was to try to think as people around you think. Now on that basis, anything is possible.

Advertisement

To register or not register

I’m at an impasse at the moment with regard to the forum. The classic way to run a forum was to create a user account with username, password and email that tied each and every post to a particular user. This made viewing the history of a user and establishing a reputation easy, but it also meant established users asserted their authority quite often. Sometimes objectionably.

Then there’s the ye olde imageboard system where a user may enter a name and password, but it’s only tagged per post via a pseudo-unique identifier. I’m not sure if this method is better than the registration, but it does cut down on the code requirements. It also makes viewing a user history more difficult as the system deliberately caters to anonymous posting first.

4Chan, the most well known imageboard in the West, uses this system as well. Something it inherited from 2Ch, the most famous textboard in the East. Despite 4Chan’s reputation as a wretched hive of scum and villainy a la Mos Eisley, there are sections that are remarkably well kept despite the anonymity. I’ve even seen intelligent and remarkably humane discussions take place, on a few salient boards.

Of course, registration doesn’t  automatically make for a well kept community either. Reddit, for example, can easily surpass the reputation of 4Chan. A cursory browse of some of the more unsavory subreddits can easily depress the most optimistic folks with an unshakable faith in humanity. Likewise, there are others that offer the same or better intelligent content as well. Of course, it also offers many other flavors that don’t quite fit anywhere on the spectrum of discussion.

The difference, then, is moderation.

I’m trying to create a voting system that, while remaining anonymous, still affords users a voice at a balanced volume in determining what should be promoted to the front page or remain in the “New/Firehose” section or which ones should be nuked from orbit. I also want to ensure voting power decreases over time. I.E. When a post is new, all votes for or against it count more than when it’s a few hours old. I think this prevents excessive judgment with the hindsight of over-analyzed social norms which, for better or worse, tend to be overcorrected. The user interface and online disinhibition make sound judgments more difficult, but we should all know what is obviously wrong upon first read and right away take appropriate action.

This level of self-moderation with rare moderator intervention early can work as long as consistency is maintained. I don’t believe in excessively long codes of conducts, which are seldom followed by those intent on not following them anyway. I mean the first law in all civil discourse is “Don’t be an ass”. How hard is that? Those obviously being asses are easy to spot and should have their candy taken away.

In that regard, I’m still following the old tried and true approach to community building and moderation. Least amount of friction, least amount of fluff, brutally simple and consistent.

Right then. Onward to building the damn thing.

Whitelist HTML sanitizing with PHP

The following is a single class written to perform comprehensive HTML input filtering with minimal dependencies (basically only Tidy) and should work in PHP 5.3+. This will be included in my forum script as the default filter.

This version captures URL encoded XSS attempts with deep attribute inspection (to a decoding depth of 6 by default) as well as scrubbing all non-whitelisted attributes, tags and conversion of surviving attribute data into HTML entities.

In addition, it will attempt to capture directory traversal attempts ( ../ or \\ or /~/ etc… ) which may give access to restricted areas of a site. Your web server should deny access to these URLs by default, however that won’t stop someone from posting links pointing elsewhere. This will reduce your liability should such a link be included in your site content by a user.

You can post sourcecode within <code> tags and it will be encoded by default.

<?php

/**
 * HTML parsing, filtering and sanitization
 * This class depends on Tidy which is included in the core since PHP 5.3
 *
 * @author Eksith Rodrigo <reksith at gmail.com>
 * @license http://opensource.org/licenses/ISC ISC License
 * @version 0.2
 */

class Html {
	
	/**
	 * @var array HTML filtering options
	 */
	public static $options = array( 
		'rx_url'	=> // URLs over 255 chars can cause problems
			'~^(http|ftp)(s)?\:\/\/((([a-z|0-9|\-]{1,25})(\.)?){2,7})($|/.*$){4,255}$~i',
		
		'rx_js'		=> // Questionable attributes
			'/((java)?script|eval|document)/ism',
		
		'rx_xss'	=> // XSS (<style> can also be a vector. Stupid IE 6!)
			'/(<(s(?:cript|tyle)).*?)/ism',
		
		'rx_xss2'	=> // More potential XSS
			'/(document\.|window\.|eval\(|\(\))/ism',
		
		'rx_esc'	=> // Directory traversal/escaping/injection
			'/(\\~\/|\.\.|\\\\|\-\-)/sm'	,
		
		'scrub_depth'	=> 6, // URL Decoding depth (fails on exceeding this)
		
		'nofollow'	=> true // Set rel='nofollow' on all links

	);
	
	/**
	 * @var array List of HTML Tidy output settings
	 * @link http://tidy.sourceforge.net/docs/quickref.html
	 */
	private static $tidy = array(
		// Preserve whitespace inside tags
		'add-xml-space'			=> true,
		
		// Remove proprietary markup (E.G. og:tags)
		'bare'				=> true,
		
		// More proprietary markup
		'drop-proprietary-attributes'	=> true,
		
		// Remove blank (E.G. <p></p>) paragraphs
		'drop-empty-paras'		=> true,
		
		// Wraps bare text in <p> tags
		'enclose-text'			=> true,
		
		// Removes illegal/invalid characters in URIs
		'fix-uri'			=> true,
		
		// Removes <!-- Comments -->
		'hide-comments'			=> true,
		
		// Removing indentation saves storage space
		'indent'			=> false,
		
		// Combine individual formatting styles
		'join-styles'			=> true,
		
		// Converts <i> to <em> & <b> to <strong>
		'logical-emphasis'		=> true,
		
		// Byte Order Mark isn't really needed
		'output-bom'			=> false,
		
		// Ensure UTF-8 characters are preserved
		'output-encoding'		=> 'utf8',
		
		// W3C standards compliant markup
		'output-xhtml'			=> true,
		
		// Had some unexpected behavior with this
		//'markup'			=> true,

		// Merge multiple <span> tags into one		
		'merge-spans'			=> true,
		
		// Only outputs <body> (<head> etc... not needed)
		'show-body-only'		=> true,
		
		// Removing empty lines saves storage
		'vertical-space'		=> false,
		
		// Wrapping tags not needed (saves bandwidth)
		'wrap'				=> 0
	);
	
	
	/**
	 * @var array Whitelist of tags. Trim or expand these as necessary
	 * @example 'tag' => array( of, allowed, attributes )
	 */
	private static $whitelist = array(
		'p'		=> array( 'style', 'class', 'align' ),
		'div'		=> array( 'style', 'class', 'align' ),
		'span'		=> array( 'style', 'class' ),
		'br'		=> array( 'style', 'class' ),
		'hr'		=> array( 'style', 'class' ),
		
		'h1'		=> array( 'style', 'class' ),
		'h2'		=> array( 'style', 'class' ),
		'h3'		=> array( 'style', 'class' ),
		'h4'		=> array( 'style', 'class' ),
		'h5'		=> array( 'style', 'class' ),
		'h6'		=> array( 'style', 'class' ),
		
		'strong'	=> array( 'style', 'class' ),
		'em'		=> array( 'style', 'class' ),
		'u'		=> array( 'style', 'class' ),
		'strike'	=> array( 'style', 'class' ),
		'del'		=> array( 'style', 'class' ),
		'ol'		=> array( 'style', 'class' ),
		'ul'		=> array( 'style', 'class' ),
		'li'		=> array( 'style', 'class' ),
		'code'		=> array( 'style', 'class' ),
		'pre'		=> array( 'style', 'class' ),
		
		'sup'		=> array( 'style', 'class' ),
		'sub'		=> array( 'style', 'class' ),
		
		// Took out 'rel' and 'title', because we're using those below
		'a'		=> array( 'style', 'class', 'href' ),
		
		'img'		=> array( 'style', 'class', 'src', 'height', 
					  'width', 'alt', 'longdesc', 'title', 
					  'hspace', 'vspace' ),
		
		'table'		=> array( 'style', 'class', 'border-collapse', 
					  'cellspacing', 'cellpadding' ),
					
		'thead'		=> array( 'style', 'class' ),
		'tbody'		=> array( 'style', 'class' ),
		'tfoot'		=> array( 'style', 'class' ),
		'tr'		=> array( 'style', 'class' ),
		'td'		=> array( 'style', 'class', 
					'colspan', 'rowspan' ),
		'th'		=> array( 'style', 'class', 'scope', 'colspan', 
					  'rowspan' ),
		
		'q'		=> array( 'style', 'class', 'cite' ),
		'cite'		=> array( 'style', 'class' ),
		'abbr'		=> array( 'style', 'class' ),
		'blockquote'	=> array( 'style', 'class' ),
		
		// Stripped out
		'body'		=> array()
	);
	
	
	
	/**#@+
	 * HTML Filtering
	 */
	
	
	/**
	 * Convert content between code blocks into code tags
	 * 
	 * @param $val string Value to encode to entities
	 */
	protected function escapeCode( $val ) {
		
		if ( is_array( $val ) ) {
			$out = self::entities( $val[1] );
			return '<code>' . $out . '</code>';
		}
		
	}
	
	
	/**
	 * Convert an unformatted text block to paragraphs
	 * 
	 * @link http://stackoverflow.com/a/2959926
	 * @param $val string Filter variable
	 */
	protected function makeParagraphs( $val ) {
		
		/**
		 * Convert newlines to linebreaks first
		 * This is why PHP both sucks and is awesome at the same time
		 */
		$out = nl2br( $val );
		
		/**
		 * Turn consecutive <br>s to paragraph breaks and wrap the 
		 * whole thing in a paragraph
		 */
		$out = '<p>' . preg_replace('#(?:<br\s*/?>\s*?){2,}#', 
			'<p></p><p>', $out ) . '</p>';
		
		/**
		 * Remove <br> abnormalities
		 */
		$out = preg_replace( '#<p>(\s*<br\s*/?>)+#', '</p><p>', $out );
		$out = preg_replace( '#<br\s*/?>(\s*</p>)+#', '<p></p>', $out );
		
		return $out;
	}
	
	
	/**
	 * Filters HTML content through whitelist of tags and attributes
	 * 
	 * @param $val string Value filter
	 */
	public function filter( $val ) {
		
		if ( !isset( $val ) || empty( $val ) ) {
			return '';
		}
		
		/**
		 * Escape the content of any code blocks before we parse HTML or 
		 * they will get stripped
		 */
		$out	= preg_replace_callback( "/\<code\>(.*)\<\/code\>/imu", 
				array( $this, 'escapeCode' ) , $val
			);
		
		/**
		 * Convert to paragraphs and begin
		 */
		$out	= $this->makeParagraphs( $out );
		$dom	= new DOMDocument();
		
		/**
		 * Hide parse warnings since we'll be cleaning the output anyway
		 */
		$err	= libxml_use_internal_errors( true );
		
		$dom->loadHTML( $out );
		$dom->encoding = 'utf-8';
		
		$body	= $dom->getElementsByTagName( 'body' )->item( 0 );
		$this->cleanNodes( $body, $badTags );
		
		/**
		 * Iterate through bad tags found above and convert them to 
		 * harmless text
		 */
		foreach ( $badTags as $node ) {
			if( $node->nodeName != "#text" ) {
				$ctext = $dom->createTextNode( 
						$dom->saveHTML( $node )
					);
				$node->parentNode->replaceChild( 
					$ctext, $node 
				);
			}
		}
		
		
		/**
		 * Filter the junk and return only the contents of the body tag
		 */
		$out = tidy_repair_string( 
				$dom->saveHTML( $body ), 
				self::$tidy
			);
		
		
		/**
		 * Reset errors
		 */
		libxml_clear_errors();
		libxml_use_internal_errors( $err );
		
		return $out;
	}
	
	
	protected function cleanAttributeNode( 
		&$node, 
		&$attr, 
		&$goodAttributes, 
		&$href 
	) {
		/**
		 * Why the devil is an attribute name called "nodeName"?!
		 */
		$name = $attr->nodeName;
		
		/**
		 * And an attribute value is still "nodeValue"?? Damn you PHP!
		 */
		$val = $attr->nodeValue;
		
		/**
		 * Default action is to remove the attribute completely
		 * It's reinstated only if it's allowed and only after 
		 * it's filtered
		 */
		$node->removeAttributeNode( $attr );
		
		if ( in_array( $name, $goodAttributes ) ) {
			
			switch ( $name ) {
				
				/**
				 * Validate URL attribute types
				 */
				case 'url':
				case 'src':
				case 'href':
				case 'longdesc':
					if ( self::urlFilter( $val ) ) {
						$href = $val;
					} else {
						$val = '';
					}
					break;
				
				/**
				 * Everything else gets default scrubbing
				 */
				default:
					if ( self::decodeScrub( $val ) ) {
						$val = self::entities( $val );
					} else {
						$val = '';
					}
			}
			
			if ( '' !== $val ) {
				$node->setAttribute( $name, $val );
			}
		}
	}
	
	
	/**
	 * Modify links to display their domains and add 'nofollow'.
	 * Also puts the linked domain in the title as well as the file name
	 */
	protected static function linkAttributes( &$node, $href ) {
		try {
			if ( !self::$options['nofollow'] ) {
				return;
			}
			
			$parsed	= parse_url( $href );
			$title	= $parsed['host'] . ' ';
			
			$f	= pathinfo( $parsed['path'] );
			$title	.= ' ( /' . $f['basename'] . ' ) ';
				
			$node->setAttribute( 
				'title', $title
			);
			
			if ( self::$options['nofollow'] ) {
				$node->setAttribute(
					'rel', 'nofollow'
				);
			}
			
		} catch ( Exception $e ) { }
	}
	
	
	/**
	 * Iterate through each tag and add non-whitelisted tags to the 
	 * bad list. Also filter the attributes and remove non-whitelisted ones.
	 * 
	 * @param htmlNode $node Current HTML node
	 * @param array $badTags Cumulative list of tags for deletion
	 */
	protected function cleanNodes( $node, &$badTags = array() ) {
		
		if ( array_key_exists( $node->nodeName, self::$whitelist ) ) {
			
			if ( $node->hasAttributes() ) {
				
				/**
				 * Prepare for href attribute which gets special 
				 * treatment
				 */
				$href = '';
				
				/**
				 * Filter through attribute whitelist for this 
				 * tag
				 */
				$goodAttributes = 
					self::$whitelist[$node->nodeName];
				
				
				/**
				 * Check out each attribute in this tag
				 */
				foreach ( 
					iterator_to_array( $node->attributes ) 
					as $attr ) {
					$this->cleanAttributeNode( 
						$node, $attr, $goodAttributes, 
						$href
					);
				}
				
				/**
				 * This is a link. Treat it accordingly
				 */
				if ( 'a' === $node->nodeName && '' !== $href ) {
					self::linkAttributes( $node, $href );
				}
				
			} // End if( $node->hasAttributes() )
			
			/**
			 * If we have childnodes, recursively call cleanNodes 
			 * on those as well
			 */
			if ( $node->childNodes ) {
				foreach ( $node->childNodes as $child ) {
					$this->cleanNodes( $child, $badTags );
				}
			}
			
		} else {
			
			/**
			 * Not in whitelist so no need to check its child nodes. 
			 * Simply add to array of nodes pending deletion.
			 */
			$badTags[] = $node;
			
		} // End if array_key_exists( $node->nodeName, self::$whitelist )
		
	}
	
	/**#@-*/
	
	
	/**
	 * Returns true if the URL passed value is harmless.
	 * This regex takes into account Unicode domain names however, it 
	 * doesn't check for TLD (.com, .net, .mobi, .museum etc...) as that 
	 * list is too long.
	 * The purpose is to ensure your visitors are not harmed by invalid 
	 * markup, not that they get a functional domain name.
	 * 
	 * @param string $v Raw URL to validate
	 * @returns boolean
	 */
	public static function urlFilter( $v ) {
		
		$v = strtolower( $v );
		$out = false;
		
		if ( filter_var( $v, 
			FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED ) ) {
			
			/**
			 * PHP's native filter isn't restrictive enough.
			 */
			if ( preg_match( self::$options['rx_url'], $v ) ) {
				$out = true;
			} else {
				$out = false;
			}
			
			if ( $out ) {
				$out = self::decodeScrub( $v );
			}
		} else {
			$out = false;
		}
		
		return $out;
	}
	
	
	/**
	 * Regular expressions don't work well when used for validating HTML.
	 * It really shines when evaluating text so that's what we're doing here
	 * 
	 * @param string $v string Attribute name
	 * @param int $depth Number of times to URL decode
	 * @returns boolean True if nothing unsavory was found.
	 */
	public static function decodeScrub( $v ) {
		if ( empty( $v ) ) {
			return true;
		}
		
		$depth		= self::$options['scrub_depth'];
		$i		= 1;
		$success	= false;
		$old		= '';
		
		
		while( $i <= $depth && !empty( $v ) ) {
			// Check for any JS and other shenanigans
			if (
				preg_match( self::$options['rx_xss'], $v ) || 
				preg_match( self::$options['rx_xss2'], $v ) || 
				preg_match( self::$options['rx_esc'], $v )
			) {
				$success = false;
				break;
			} else {
				$old	= $v;
				$v	= self::utfdecode( $v );
				
				/**
				 * We found the the lowest decode level.
				 * No need to continue decoding.
				 */
				if ( $old === $v ) {
					$success = true;
					break;
				}
			}
			
			$i++;
		}
		
		
		/**
		 * If after decoding a number times, we still couldn't get to 
		 * the original string, then there's something still wrong
		 */
		if ( $old !== $v && $i === $depth ) {
			return false;
		}
		
		return $success;
	}
	
	
	/**
	 * UTF-8 compatible URL decoding
	 * 
	 * @link http://www.php.net/manual/en/function.urldecode.php#79595
	 * @returns string
	 */
	public static function utfdecode( $v ) {
		$v = urldecode( $v );
		$v = preg_replace( '/%u([0-9a-f]{3,4})/i', '&#x\\1;', $v );
		return html_entity_decode( $v, null, 'UTF-8' );
	}
	
	
	/**
	 * HTML safe character entitites in UTF-8
	 * 
	 * @returns string
	 */
	public static function entities( $v ) {
		return htmlentities( 
			iconv( 'UTF-8', 'UTF-8', $v ), 
			ENT_NOQUOTES | ENT_SUBSTITUTE, 
			'UTF-8'
		);
	}	
}

Usage is pretty simple:

$data = $_POST['body'];
$html = new Html();
$data = $html->filter( $data );

PDO for Fun and Profit (mostly fun)

Connecting and retrieving information from the database is usually the least sexy part of writing an app and so gets left behind a lot. The following classes, which are going to be part of the forum script, (more or less) should take some of the dreariness out of CRUD; that’s Create, Read, Update and Delete. The cornerstones of our universe. Shout out to Kevin Worthington. (Do people still say “shout out”? Man, I feel old.)

Let’s begin with the (semi) universal PDO connector. If you use Postgres, you can include your entire connection string in one line including username and password :

//PostgreSQL :
define( 'DBH', 'pgsql:host=localhost;dbname=hitlist;user=MichaelCorleone;password=IKnowItWasYouFredo' );

However the MySQL PDO driver will complain if you just include a single string like that as it demands the username and password separately. So, you’ll have to do the following :

//MySQL PDO connection (usually)
define( 'DBH', 'mysql:host=localhost;dbname=hitlist' );
define( 'DB_USER', 'MichaelCorleone' );
define( 'DB_PASS', 'IKnowItWasYouFredo' );

$dbh = new PDO( DBH, DB_USER, DB_PASS );

This is annoying.

There’s also the problem of security. If you just leave the connection strings in your script and for some reason the script gets served in plain-text (this happens far more often than you might think) there go your keys to the castle. The safest place to store connection strings is in your php.ini.

You can put the DSN name in a php.ini config as follows :

php.dsn.mydsn ='mysql:host=localhost;dbname=hitlist;username=MichaelCorleone;password=IKnowItWasYouFredo'

Now it’s the same as the (sane) Postgresql driver. But now you need a way to grab this complete DSN and dissect the username and password as necessary.

I wrote a handy class for this.

<?php
/**
 * PDO Connector class
 * Modifies the DSN to parse username and password individually.
 * Optionally, gets the DSN directly from php.ini.
 *
 * @author Eksith Rodrigo <reksith at gmail.com>
 * @license http://opensource.org/licenses/ISC ISC License
 * @version 0.2
 */

namespace Models;
 
class Cxn {
	
	/**
	 * @var object PDO connection.
	 */
	protected $db;
	
	
	public function __construct( $dbh ) {
		$this->connect( $dbh );
	}
	
	
	public function __destruct() {
		$this->db = null;
	}
	
	
	public function getDb() {
		
		if ( is_object( $this->db ) ) {
			return $this->db;
		} else {
			die('There was a database problem');
		}
	}

	private function connect( $dbh ) {
		if ( !empty( $this->db ) && is_object( $this->db ) ) {
			return;
		}
		
		try {
			$settings = array(
				\PDO::ATTR_TIMEOUT		=> "5",
				\PDO::ATTR_ERRMODE		=> 
						\PDO::ERRMODE_EXCEPTION,
				\PDO::ATTR_DEFAULT_FETCH_MODE	=> 
						\PDO::FETCH_ASSOC,
				\PDO::ATTR_PERSISTENT		=> true
			);
			
			$this->_dsn( $dbh, $username, $password );
			if ( !defined( 'DBTYPE' ) ) {
				$this->setDbType( $dbh );
			}
			
			if ( 'mysql' === DBTYPE || 'postgres' === DBTYPE) {
				/**
				 * Might be slightly slower, but more secure to 
				 * disable emulation
				 */
				$settings[\PDO::ATTR_EMULATE_PERPARES] = false;
			}
			
			$this->db = new \PDO( $dbh, $username, $password, 
						$settings );
			
			
		} catch ( \PDOException $e ) {
			exit( $e->getMessage() );
		}
	}
	
	/**
	 * Extract the username and password from the DSN and rebuild the 
	 * connection string
	 * 
	 * @param string $dsn Full connection string or DSN identifyer in php.ini
	 * @param string $username Optional username. If empty, it will extract from DSN
	 * @param string $password Also optional and will extract from DSN as above
	 */
	private function _dsn( &$dsn, &$username = '', &$password = '' ) {
		
		/**
		 * No host name with ':' would mean this is a DSN name in php.ini
		 */
		if ( false === strrpos( $dsn, ':' ) ) {
			
			/**
			 * We need get_cfg_var() here because ini_get doesn't work
			 * https://bugs.php.net/bug.php?id=54276
			 */
			$dsn = get_cfg_var( "php.dsn.$dsn" );
		}
		
		/**
		 * Some people use spaces to separate parameters in DSN strings 
		 * and this is NOT standard
		 */
		$d = explode( ';', $dsn );
		$m = count( $d );
		$s = '';
		
		for( $i = 0; $i < $m; $i++ ) {
			$n = explode( '=', $d[$i] );
			
			/**
			 * Empty parameter? Continue
			 */
			if ( count( $n ) <= 1 ) {
				$s .= implode( '', $n ) . ';';
				continue;
			}
			
			/**
			 * Username or password?
			 */
			switch( trim( $n[0] ) ) {
				case 'uid':
				case 'user':
				case 'username':
					$username = trim( $n[1] );
					break;
				
				case 'pwd':
				case 'pass':
				case 'password':
					$password = trim( $n[1] );
					break;
				
				/**
				 * Some other parameter? Leave as-is
				 */
				default:
					$s .= implode( '=', $n ) . ';';
			}
		}
		
		$dsn = rtrim( $s, ';' );
	}
	
	/**
	 * Sets the DBTYPE defined variable.
	 * Useful for database specific SQL.
	 * Expand as necessary.
	 */
	private function setDbType( $dsn ) {
		
		if ( 0 === strpos( $dsn, 'mysql' ) ) {
			
			define( 'DBTYPE', 'mysql' );
			
		} elseif ( 0 === strpos( $dsn, 'postgres' ) ) {
			
			define( 'DBTYPE', 'postgres' );
			
		} elseif ( 0 === strpos( $dsn, 'sqlite' ) ) {
			
			define( 'DBTYPE', 'sqlite' );
			
		} else {
			define( 'DBTYPE', 'other' );
		}
	}
}

Using this is pretty simple :

$cxn = new Cxn( 'mydsn' ); // Where php.dsn.mydsn is in php.ini
$db = $cxn->getDb();
$st = $db->prepare( 'SELECT * FROM whacks' );

$st->execute();
$whacks = $st->fetchAll();

Of course you want your family to be completely clean in a few years or Kay will try to leave you and take the kids with her. To prevent this and hide any whacks you do commit from Kay, we need some OOP and I’ve used this class on a previous project (not specifically to whack anyone) where I needed object/model separation. Hence the namespace will again be “Models”, same as above.

<?php
/* 
Example MySQL schema: 
CREATE TABLE whacks (
	id int(10) unsigned NOT NULL auto_increment,
	name varchar(100) NOT NULL,
	method varchar(100) NOT NULL,
	reason text NOT NULL,
	created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
	updated_at datetime NOT NULL,
	whacked_at timestamp NOT NULL,
	PRIMARY KEY (id)
);
*/

namespace Models;

class Whack extends base {
	
	/**
	 * @var string Keep your friends close, but your enemies closer 
	 */
	public $name	= '';
	
	
	/**
	 * @var string Stupid thugs. People behaving like that with guns.
	 */
	public $method	= '';
	
	
	
	/**
	 * @var string I don't feel I have to wipe everybody out, Tom. 
	 *		Just my enemies.
	 */
	public $reason = '';
	
	
	/**
	 * @var string He's has been dying from the same heart attack 
	 *		for the last twenty years.
	 */
	public $whacked_at= '';
	
	
	public function __construct( array $data = null ) {
		
		if ( empty( $data ) ) {
			return;
		}
		
		foreach ( $data as $field => $value ) {
			$this->$field = $value;
		}
	}
	
	
	public function save() {
		$row		= 0;
		$params = array(
			'name'		=> $this->name,
			'method'	=> $this->method
		);
		
		
		/**
		 * I don't want anything to happen to him while my mother's alive
		 */
		if ( !empty( $this->whacked_at ) ) {
			$this->whacked_at = parent::_myTime( $this->whacked_at );
			$params['whacked_at'] = $this->whacked_at;
		}
		
		/**
		 * Sal, Tom, the boss says he'll come in a separate car.
		 */
		if ( isset( $this->id ) ) {
			$params['id'] = $this->id;
			parent::edit( 'whacks', $params );
		} else {
			parent::put( 'whacks', $params );
		}
	}
	
	
	/**
	 * I want you to find out what he's got under his fingernails.
	 */
	public static function find( $filter = array() ) {
		$sql = "SELECT id, name, method, reason, created_at, whacked_at 
				FROM whacks";
		
		// Filtering cleanup
		$filter = parent::filterConfig( $filter );
		$params = array();
		
		// Get by id
		if ( $filter['id'] > 0 ) {
			$sql .= " WHERE id = :id";
			$param['id'] = $filter['id'];
		}
		
		// Get by name
		if ( isset( $filter['name'] ) ) {
			if ( false !== strrpos( $sql, 'WHERE' ) ) {
				$sql .= " OR";
			} else {
				$sql .= " WHERE";
			}
			$sql .= " name = :name";
			$param['name'] = $filter['name'];
		}
		
		// Get 10, 20 results etc...
		if ( isset( $filter['limit'] ) ) {
			$sql .= " LIMIT :limit";
			$param['limit'] = $filter['limit'];
		}
		
		// Pagination
		if ( isset( $filter['offset'] ) ) {
			$sql .= " OFFSET :offset";
			$param['offset'] = $filter['offset'];
		}
		
		// Many whacks. Family's been busy.
		if ( isset( $filter['limit'] ) && $filter['limit'] > 1 )
			return parent::find( $sql, $params );
		}
		
		// One whack (note the 'true' for a single object)
		return parent::find( $sql, $params, true );
	}
	
	
	/**
	 * Tessio? I always thought it would be Clemenza.
	 */
	public static function delete( $params = array() ) {
		parent::delete( 'whacks', $params );
	}
}

You’ll notice there’s a base class this inherits from. The base is what’s referred to in all those “parent::” methods for creating (put), editing and, of course, deleting because no self-respecting member of the family will be too stubborn, lest he wishes to get gunned down on the causeway.

Here’s the parent class (You don’t need to manually assign the DSN to this; just have it as a defined variable) :

<?php
/**
 * Base object modal for finding, editing, deleting objects by table
 * This class handles connecting to a database if necessary and so it 
 * depends on the 'Cxn' class and requires DBH to be defined.
 * 
 * @author Eksith Rodrigo <reksith at gmail.com>
 * @license http://opensource.org/licenses/ISC ISC License
 * @uses Cxn
 */

namespace Models;

abstract class base {
	
	/**
	 * @var int Class object unique identifier (every class should have one)
	 */
	public $id;
	
	
	/**
	 * @var int Class object creation date. Should not modified.
	 */
	public $created_at;
	
	
	/**
	 * @var int Class object edited/saved date. Must be modified.
	 */
	public $updated_at;
	
	
	/**
	 * @var int Special status. Relevance will differ per object.
	 * @example A Post with status = -1 may be 'hidden' from view
	 */
	public $status;
	
	
	/**
	 * @var object PDO connection.
	 */
	protected static $db = null;
	
	
	/**
	 * @var int Total number of queries executed.
	 */
	public static $qcount = 0;
	
	
	/**
	 * Checks PDO connection or assigns to self::$db if it hasn't been set 
	 * and a new one has been passed.
	 * 
	 * @param object $db PDO connection
	 * @return bool True of the variable was set. False on failure.
	 */
	protected static function _db( $db = null ) {
		
		if ( is_object( self::$db ) ) {
			return true;
		} elseif ( is_object( $db ) ) {
			self::$db = $db;
			return true;
		}
		
		return false;
	}
	
	
	/**
	 * Clean up
	 */
	public function __destruct() {
		
		if ( is_object( self::$db ) ) {
			self::$db = null;
		}
	}
	
	
	/**
	 * Setup local Cxn instance with PDO 
	 * Note: This depends on the Cxn class and also 
	 * DBH (your connection string) being set
	 */
	public static function init() {
		
		if ( !self::_db() ) {
			$db = new Cxn( DBH );
			self::_db( $db->getDb() );
		}
	}
	
	
	/**
	 * Find objects using the given sql and parameters
	 * 
	 * @param string $sql Database query
	 * @param array $params Selector parameters in 'column' => 'value' format
	 * @return object Of the same type as the class calling this or null on failure
	 * @return mixed Single object or array of the same type as the class calling 
	 * 			this or null on failure
	 */
	protected static function find( $sql, $params, $single = false ) {
		
		self::init();
		
		$class	= get_called_class();
		$stmt	= self::$db->prepare( $sql );
		
		if ( $stmt->execute( $params ) ) {
			self::$qcount++;
			if ( $single ) {
				return $stmt->fetchObject( $class );
			} else {
				return $stmt->fetchAll( 
					\PDO::FETCH_CLASS, $class );
			}
		}
		return null;
	}
	
	
	/**
	 * Insert a single row into a table
	 * 
	 * @param string $table Database table name
	 * @param array $params Insert values in 'column' => 'value' format
	 * @return int The ID of the newly inserted row or 0 on failure
	 */
	protected static function put( $table, $params ) {
		
		self::init();
		
		$sql	= self::_insertStatement( $table, $params );
		$stmt	= self::$db->prepare( $sql );
		
		if ( $stmt->execute( $params ) ) {
			self::$qcount++;
			if ( 'postgres' === DBTYPE ) {
				return self::$db->lastInsertId( 'id' );
			}
			return self::$db->lastInsertId();
		}
		
		return 0;
	}
	
	
	/**
	 * Update records in a single table
	 * 
	 * @param string $table Database table name
	 * @param array $params Column parameters (id required)
	 * @return int Number of rows affected
	 */
	protected static function edit( $table, $params ) {
		
		if ( !isset( $params['id'] ) ) {
			return 0;
		}
		
		$id	= $params['id'];
		unset( $params['id'] );
				
		$sql = self::_updateStatement(
			$table, $params, "$table.id = :id"
		);
		
		self::init();
		$params['id']	= $id;
		$stmt		= self::$db->prepare( $sql );
	
		if ( $stmt->execute( $params ) ) {
			self::$qcount++;
			return $stmt->rowCount();
		}
		return 0;
	}
	
	
	/**
	 * Delete from a single table based on parameters
	 * 
	 * @param string $table Table name (only one) to delete from
	 * @param array $params Delete selectors
	 * 
	 * @example Deleting a post with ID = 223
	 * 		base::delete( 'posts', array( 'id' = 223 ) );
	 * 
	 * @return int Number of rows affected/deleted
	 */
	protected static function delete( $table, $params ) {
		
		self::init();
		
		$sql	= self::_deleteStatement( $table, $params );
		$stmt	= self::$db->prepare( $sql );
		
		if ( $stmt->execute( $params ) ) {
			self::$qcount++;
			return $stmt->rowCount();
		}
		return 0;
	}
	
	
	/**
	 * Add parameters to conditional IN/NOT IN ( x,y,z ) query
	 */
	protected function _addParams( $t, &$values, &$params = array(), &$in = '' ) {
		
		$vc = count( $values );
		for ( $i = 0; $i < $vc; $i++ ) {
			$in			= $in . ":val_{$i},";
			$params["val_{$i}"]	= array( $values[$i], $t );
		}
		$in = rtrim( $in, ',' );
	}
	
	
	/**
	 * Prepares parameters for SELECT, UPDATE or INSERT SQL statements.
	 * 
	 * E.G. For INSERT
	 * :name, :email, :password etc...
	 * 
	 * For UPDATE or DELETE
	 * name = :name, email = :email, password = :password etc...
	 */
	protected static function _setParams( 
		$fields = array(), 
		$mode = 'select', 
		$table = '' 
	) {
		$columns = is_array( $fields ) ? 
				array_keys( $fields ) : 
				array_map( 'trim', explode( ',', $fields ) );
		
		switch( $mode ) {
			case 'select':
				return implode( ', ', $columns );
				
			case 'insert':
				return ':' . implode( ', :', $columns );
			
			case 'update':
			case 'delete':
				$v = array_map( 
					function( $field ) use ( $table ) {
						
						if ( empty( $field ) ) { 
							return '';
						}
						return "$field = :$field";
					}, $columns );
				return implode( ', ', $v );
		}
	}
	
	
	/**
	 * Prepares SQL INSERT query with parameters matching field names
	 * 
	 * @param string $table Table name
	 * @param string|array $fields A comma delimited string of fields or 
	 * 		an array
	 * 
	 * @example field => value pairs (the 'field' key will be extracted as 
	 * 		the parameter)
	 * 		_insertStatement(
	 * 			'posts', 
	 * 			array( 
	 * 				'title' => 'Test title', 
	 * 				'author' => 'Guest'
	 * 			) 
	 * 		);
	 * 
	 * @return string;
	 */
	protected static function _insertStatement( $table, $fields ) {
		
		$cols = self::_setParams( $fields, 'select', $table );
		$vals = self::_setParams( $fields, 'insert', $table );
		return	"INSERT INTO $table ( $cols ) VALUES ( $vals );";
	}
	
	
	/**
	 * Prepares sql UPDATE query with parameters matching field names
	 * 
	 * @param string $table Table name
	 * @param string|array $fields A single field or comma delimited string 
	 * 		of fields or an array
	 * 
	 * @example field => value pairs (the 'field' key will be extracted as 
	 * 		the parameter)
	 * 		_updateStatement(
	 * 			'posts', 
	 * 			array( 
	 * 				'title' => 'Changed title', 
	 * 				'author' => 'Edited Guest' 
	 * 			),
	 * 			array( 'id' => 223 )
	 * 		);
	 * @return string;
	 */
	protected static function _updateStatement( 
		$table, 
		$fields = null, 
		$cond = ''
	) {
		$params = self::_setParams( $fields, 'update', $table );
		$sql	= "UPDATE $table SET $params";
		
		if ( !empty( $cond ) ) {
			$sql .= " WHERE $cond";
		}
		
		return $sql . ';';
	}
	
	
	/**
	 * Prepares sql DELETE query with parameters matching field names
	 * 
	 * @param string $table Table name
	 * @param string|array $fields A comma delimited string of fields or an 
	 * 		array of field => value pairs (the 'field' key will be 
	 * 		extracted as the parameter)
	 * @return string;
	 */
	protected static function _deleteStatement( 
		$table, 
		$fields = null, 
		$limit = null
	) {
		
		$params	= self::_setParams( $fields, 'delete', $table );
		$sql	= "DELETE FROM $table WHERE ( $params )";
		
		
		/**
	 	 * Limit must consist of an integer and not start with a '0'
	 	 */
		if ( null !== $limit && preg_match('/^([1-9][0-9]?+){1,2}$/', 
			$limit ) ) {
			$sql .= " LIMIT $limit";
		}
		
		return $sql . ';';
	}
	
	
	/**
	 * Pagination offset calculator
	 * Hard limit of 300 set for page since we rarely browse that many 
	 * casually. That's what searching is for ( also reduces abuse )
	 * 
	 * @param int $page Currently requested index (starting from 1)
	 * @param int $limit Maximum number of records per page
	 * @return int Offset
	 */
	protected static function _offset( $page, $limit ) {
		
		$page	= ( isset( $page ) )? ( int ) $page - 1 : 0;
		$limit	= ( isset( $limit ) )? ( int ) $limit : 0;
		
		if ( empty( $page ) || $page < 0 || 
			empty( $limit ) || $limit < 0
		) {
			return 0; 
		}
		return ( $page > 300 ) ? 0 : $page * $limit;
	}
	
	
	/**
	 * Convert a unix timestamp a datetime-friendly timestamp
	 * 
	 * @param int $time Unix timestamp
	 * @return string 'Year-month-date Hour:minute:second' format
	 */
	protected static function _myTime( $time ) {
		return gmdate( 'Y-m-d H:i:s', $time );
	}
	
	
	/**
	 * Sets filter configuration ( pagination, limit, id etc... )
	 */
	public static function filterConfig( &$filter = array() ) {
		
		$filter['id']		= isset( $filter['id'] ) ? 
						$filter['id'] : 0;
						
		$filter['limit']	= isset( $filter['limit'] ) ? 
						$filter['limit'] : 1;
						
		$filter['page']		= isset( $filter['page'] ) ? 
						$filter['page'] : 1;
						
		$filter['search']	= isset( $filter['search'] ) ? 
						$filter['search'] : '';
		
		$offset			= self::_offset( 
						$filter['page'] , 
						$filter['limit']
					);
		if ( $offset > 0 ) {
			$filter['offset'] = $offset;
		}
	}
}

This class was built to help me with a bunch of other classes as well, so I added a quick filterConfig there for a lot of the default parameters I’d use, including for offset calculation for pagination.

Putting it all together, you might use something like this to add a new entry :

 // Make sure, your full DSN is set in php.ini
// under pdo.dsn.mywhacks = 'sqlite:data/hitlist.sqlite' or something
define( 'DBH', 'mywhacks' );

$whack = new \Models\Whack();
$whack->name = "Carlo";
$whack->method = "garrote in the car";
$whack->reason = "Fingered Sonny for the Barzini people.";

// Baptism is already over so why wait? 
$whack->save();

To change something :

$fredo = \Models\Whack::find( array( 'name' => 'Fredo' ) );
$fredo->method = "shot while fishing";
$fredo->reason = "Almost got Michael killed. Took sides against the family.";

$fredo->whacked_at = time(); // Mamma's dead now
$fredo->save();

To delete :

\Models\Whack::delete( array( 'name' => 'Clemenza' ) );

And that’s about it.

Forum schema

I’ve been working slowly, but steadily, on the forum lately and thought I’d put up the schema I’ve been testing against for the past few weeks. This is for SQLite since it’s very simple to setup, change things and if necessary, completely scrap and start over without having to mess with a DB server.

I’ve been using the SQLite Manager plugin for Firefox to make quick changes to the schema on the go. I’ll post snippets of the rest of the script in increments when I have time to clean up and maybe I’ll post the whole thing on GitHub for the alpha release.

Maybe this will serve as a good starting point for someone else looking for ideas as well.

DROP TABLE IF EXISTS users;
CREATE TABLE users (
	id INTEGER PRIMARY KEY AUTOINCREMENT, 
	username VARCHAR NOT NULL UNIQUE, 
	password TEXT NOT NULL, 
	email VARCHAR NOT NULL UNIQUE, 
	token VARCHAR NOT NULL,
	bio TEXT, 
	pubkey TEXT, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	updated_at DATETIME NOT NULL , 
	status INTEGER NOT NULL DEFAULT 0
);

CREATE UNIQUE INDEX users_name_idx ON users (username, email);
CREATE INDEX users_activity_idx ON users (created_at, updated_at);
CREATE INDEX users_status_idx ON users (status);



DROP TABLE IF EXISTS posts;
CREATE TABLE posts (
	id INTEGER PRIMARY KEY AUTOINCREMENT, 
	title VARCHAR,
	url VARCHAR,
	author VARCHAR NOT NULL, 
	user_id INTEGER NOT NULL DEFAULT 0, 
	summary TEXT NOT NULL, 
	body TEXT NULL, 
	reply_count INTEGER NOT NULL DEFAULT 0, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	updated_at DATETIME NOT NULL, 
	lastreply_at DATETIME NOT NULL, 
	status INTEGER NOT NULL  DEFAULT 0
);

CREATE INDEX posts_activity_idx ON posts (created_at, updated_at, lastreply_at);
CREATE INDEX posts_author_idx ON posts (author, user_id);
CREATE INDEX posts_url_idx ON posts (url) WHERE url IS NOT NULL;
CREATE INDEX posts_status_idx ON posts (status);



DROP TABLE IF EXISTS posts_family;
CREATE TABLE posts_family(
	parent_id INTEGER NOT NULL, 
	child_id INTEGER NOT NULL, 
	last_author VARCHAR NOT NULL, 
	last_id INTEGER NOT NULL, 
	lastreply_at DATETIME NOT NULL
);

CREATE INDEX posts_family_idx ON posts_family (parent_id, child_id);
CREATE INDEX posts_family_activity_idx ON posts_family 
	(last_author, last_id, lastreply_at);



DROP TABLE IF EXISTS posts_flags;
CREATE TABLE posts_flags(
	post_id INTEGER NOT NULL, 
	user_id INTEGER NOT NULL, 
	reason VARCHAR NOT NULL, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	status INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX posts_flags_idx ON posts_flags (post_id, user_id);
CREATE INDEX posts_flags_created_at_idx ON posts_flags (created_at);
CREATE INDEX posts_flags_status_idx ON posts_flags (status);



DROP TABLE IF EXISTS taxonomy;
CREATE TABLE taxonomy (
	id INTEGER PRIMARY KEY AUTOINCREMENT, 
	label VARCHAR,
	term VARCHAR,
	post_count INTEGER NOT NULL DEFAULT 0, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
	body TEXT NULL
);

CREATE INDEX taxonomy_created_at_idx ON taxonomy (created_at);
CREATE UNIQUE INDEX taxonomy_term_idx ON taxonomy (label, term);
CREATE INDEX taxonomy_count_idx ON taxonomy (post_count);



DROP TABLE IF EXISTS taxonomy_posts;
CREATE TABLE taxonomy_posts(
	taxonomy_id INTEGER NOT NULL, 
	post_id INTEGER NOT NULL
);

CREATE UNIQUE INDEX taxonomy_posts_idx ON taxonomy_posts (taxonomy_id, post_id);



DROP TABLE IF EXISTS pms;
CREATE TABLE pms (
	id INTEGER PRIMARY KEY AUTOINCREMENT, 
	title VARCHAR,
	sender_id INTEGER NOT NULL, 
	receiver_id INTEGER NOT NULL, 
	body TEXT NULL, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	status INTEGER NOT NULL  DEFAULT 0
);

CREATE INDEX pms_contact_idx ON pms (sender_id, receiver_id);
CREATE INDEX pms_created_idx ON pms (created_at);
CREATE INDEX pms_status_idx ON pms (status);



DROP TABLE IF EXISTS sessions;
CREATE TABLE sessions(
	id VARCHAR PRIMARY KEY, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	updated_at DATETIME NOT NULL,
	skey VARCHAR NOT NULL,
	data TEXT NOT NULL
);

CREATE INDEX sessions_activity_idx ON sessions (created_at, updated_at);



DROP TABLE IF EXISTS firewall;
CREATE TABLE firewall(
	id INTEGER PRIMARY KEY AUTOINCREMENT, 
	label VARCHAR,
	response VARCHAR,
	ip VARCHAR,
	ua VARCHAR,
	protocol VARCHAR,
	method VARCHAR,
	uri VARCHAR,
	headers TEXT NOT NULL, 
	request_at DATETIME NOT NULL,
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX firewall_ip_idx ON firewall (ip, label);
CREATE INDEX firewall_response_idx ON firewall (response);
CREATE INDEX firewall_activity_idx ON firewall (request_at, created_at);



DROP TABLE IF EXISTS blocklist;
CREATE TABLE blocklist(
	ip VARCHAR PRIMARY KEY, 
	label VARCHAR,
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
	expires_at DATETIME NOT NULL
);

CREATE INDEX blocklist_label_idx ON blocklist (label);
CREATE INDEX blocklist_dates_idx ON blocklist (created_at, expires_at);



DROP TABLE IF EXISTS settings;
CREATE TABLE settings(
	id VARCHAR PRIMARY KEY, 
	data TEXT NOT NULL,
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX settings_created_idx ON settings (created_at);



DROP TABLE IF EXISTS history;
CREATE TABLE history(
	post_id INTEGER NOT NULL,
	user_id INTEGER NOT NULL  DEFAULT 0, 
	anon_id VARCHAR, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE UNIQUE INDEX history_idx ON history (post_id, user_id, anon_id);
CREATE INDEX history_created_idx ON history (created_at);



DROP TABLE IF EXISTS actions;
CREATE TABLE actions(
	post_id INTEGER NOT NULL,
	user_id INTEGER NOT NULL  DEFAULT 0, 
	anon_id VARCHAR, 
	command VARCHAR, 
	auth_key VARCHAR, 
	created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, 
	expires_at DATETIME NOT NULL
);

CREATE UNIQUE INDEX actions_idx ON actions (post_id, user_id, anon_id);
CREATE INDEX actions_auth_key_idx ON actions (auth_key);
CREATE INDEX actions_activity_idx ON actions (created_at, expires_at);