Whitelist HTML sanitizing with PHP

The following is a single class written to perform comprehensive HTML input filtering with minimal dependencies (basically only Tidy) and should work in PHP 5.3+. This will be included in my forum script as the default filter.

This version captures URL encoded XSS attempts with deep attribute inspection (to a decoding depth of 6 by default) as well as scrubbing all non-whitelisted attributes, tags and conversion of surviving attribute data into HTML entities.

In addition, it will attempt to capture directory traversal attempts ( ../ or \\ or /~/ etc… ) which may give access to restricted areas of a site. Your web server should deny access to these URLs by default, however that won’t stop someone from posting links pointing elsewhere. This will reduce your liability should such a link be included in your site content by a user.

You can post sourcecode within <code> tags and it will be encoded by default.

<?php

/**
 * HTML parsing, filtering and sanitization
 * This class depends on Tidy which is included in the core since PHP 5.3
 *
 * @author Eksith Rodrigo <reksith at gmail.com>
 * @license http://opensource.org/licenses/ISC ISC License
 * @version 0.2
 */

class Html {
	
	/**
	 * @var array HTML filtering options
	 */
	public static $options = array( 
		'rx_url'	=> // URLs over 255 chars can cause problems
			'~^(http|ftp)(s)?\:\/\/((([a-z|0-9|\-]{1,25})(\.)?){2,7})($|/.*$){4,255}$~i',
		
		'rx_js'		=> // Questionable attributes
			'/((java)?script|eval|document)/ism',
		
		'rx_xss'	=> // XSS (<style> can also be a vector. Stupid IE 6!)
			'/(<(s(?:cript|tyle)).*?)/ism',
		
		'rx_xss2'	=> // More potential XSS
			'/(document\.|window\.|eval\(|\(\))/ism',
		
		'rx_esc'	=> // Directory traversal/escaping/injection
			'/(\\~\/|\.\.|\\\\|\-\-)/sm'	,
		
		'scrub_depth'	=> 6, // URL Decoding depth (fails on exceeding this)
		
		'nofollow'	=> true // Set rel='nofollow' on all links

	);
	
	/**
	 * @var array List of HTML Tidy output settings
	 * @link http://tidy.sourceforge.net/docs/quickref.html
	 */
	private static $tidy = array(
		// Preserve whitespace inside tags
		'add-xml-space'			=> true,
		
		// Remove proprietary markup (E.G. og:tags)
		'bare'				=> true,
		
		// More proprietary markup
		'drop-proprietary-attributes'	=> true,
		
		// Remove blank (E.G. <p></p>) paragraphs
		'drop-empty-paras'		=> true,
		
		// Wraps bare text in <p> tags
		'enclose-text'			=> true,
		
		// Removes illegal/invalid characters in URIs
		'fix-uri'			=> true,
		
		// Removes <!-- Comments -->
		'hide-comments'			=> true,
		
		// Removing indentation saves storage space
		'indent'			=> false,
		
		// Combine individual formatting styles
		'join-styles'			=> true,
		
		// Converts <i> to <em> & <b> to <strong>
		'logical-emphasis'		=> true,
		
		// Byte Order Mark isn't really needed
		'output-bom'			=> false,
		
		// Ensure UTF-8 characters are preserved
		'output-encoding'		=> 'utf8',
		
		// W3C standards compliant markup
		'output-xhtml'			=> true,
		
		// Had some unexpected behavior with this
		//'markup'			=> true,

		// Merge multiple <span> tags into one		
		'merge-spans'			=> true,
		
		// Only outputs <body> (<head> etc... not needed)
		'show-body-only'		=> true,
		
		// Removing empty lines saves storage
		'vertical-space'		=> false,
		
		// Wrapping tags not needed (saves bandwidth)
		'wrap'				=> 0
	);
	
	
	/**
	 * @var array Whitelist of tags. Trim or expand these as necessary
	 * @example 'tag' => array( of, allowed, attributes )
	 */
	private static $whitelist = array(
		'p'		=> array( 'style', 'class', 'align' ),
		'div'		=> array( 'style', 'class', 'align' ),
		'span'		=> array( 'style', 'class' ),
		'br'		=> array( 'style', 'class' ),
		'hr'		=> array( 'style', 'class' ),
		
		'h1'		=> array( 'style', 'class' ),
		'h2'		=> array( 'style', 'class' ),
		'h3'		=> array( 'style', 'class' ),
		'h4'		=> array( 'style', 'class' ),
		'h5'		=> array( 'style', 'class' ),
		'h6'		=> array( 'style', 'class' ),
		
		'strong'	=> array( 'style', 'class' ),
		'em'		=> array( 'style', 'class' ),
		'u'		=> array( 'style', 'class' ),
		'strike'	=> array( 'style', 'class' ),
		'del'		=> array( 'style', 'class' ),
		'ol'		=> array( 'style', 'class' ),
		'ul'		=> array( 'style', 'class' ),
		'li'		=> array( 'style', 'class' ),
		'code'		=> array( 'style', 'class' ),
		'pre'		=> array( 'style', 'class' ),
		
		'sup'		=> array( 'style', 'class' ),
		'sub'		=> array( 'style', 'class' ),
		
		// Took out 'rel' and 'title', because we're using those below
		'a'		=> array( 'style', 'class', 'href' ),
		
		'img'		=> array( 'style', 'class', 'src', 'height', 
					  'width', 'alt', 'longdesc', 'title', 
					  'hspace', 'vspace' ),
		
		'table'		=> array( 'style', 'class', 'border-collapse', 
					  'cellspacing', 'cellpadding' ),
					
		'thead'		=> array( 'style', 'class' ),
		'tbody'		=> array( 'style', 'class' ),
		'tfoot'		=> array( 'style', 'class' ),
		'tr'		=> array( 'style', 'class' ),
		'td'		=> array( 'style', 'class', 
					'colspan', 'rowspan' ),
		'th'		=> array( 'style', 'class', 'scope', 'colspan', 
					  'rowspan' ),
		
		'q'		=> array( 'style', 'class', 'cite' ),
		'cite'		=> array( 'style', 'class' ),
		'abbr'		=> array( 'style', 'class' ),
		'blockquote'	=> array( 'style', 'class' ),
		
		// Stripped out
		'body'		=> array()
	);
	
	
	
	/**#@+
	 * HTML Filtering
	 */
	
	
	/**
	 * Convert content between code blocks into code tags
	 * 
	 * @param $val string Value to encode to entities
	 */
	protected function escapeCode( $val ) {
		
		if ( is_array( $val ) ) {
			$out = self::entities( $val[1] );
			return '<code>' . $out . '</code>';
		}
		
	}
	
	
	/**
	 * Convert an unformatted text block to paragraphs
	 * 
	 * @link http://stackoverflow.com/a/2959926
	 * @param $val string Filter variable
	 */
	protected function makeParagraphs( $val ) {
		
		/**
		 * Convert newlines to linebreaks first
		 * This is why PHP both sucks and is awesome at the same time
		 */
		$out = nl2br( $val );
		
		/**
		 * Turn consecutive <br>s to paragraph breaks and wrap the 
		 * whole thing in a paragraph
		 */
		$out = '<p>' . preg_replace('#(?:<br\s*/?>\s*?){2,}#', 
			'<p></p><p>', $out ) . '</p>';
		
		/**
		 * Remove <br> abnormalities
		 */
		$out = preg_replace( '#<p>(\s*<br\s*/?>)+#', '</p><p>', $out );
		$out = preg_replace( '#<br\s*/?>(\s*</p>)+#', '<p></p>', $out );
		
		return $out;
	}
	
	
	/**
	 * Filters HTML content through whitelist of tags and attributes
	 * 
	 * @param $val string Value filter
	 */
	public function filter( $val ) {
		
		if ( !isset( $val ) || empty( $val ) ) {
			return '';
		}
		
		/**
		 * Escape the content of any code blocks before we parse HTML or 
		 * they will get stripped
		 */
		$out	= preg_replace_callback( "/\<code\>(.*)\<\/code\>/imu", 
				array( $this, 'escapeCode' ) , $val
			);
		
		/**
		 * Convert to paragraphs and begin
		 */
		$out	= $this->makeParagraphs( $out );
		$dom	= new DOMDocument();
		
		/**
		 * Hide parse warnings since we'll be cleaning the output anyway
		 */
		$err	= libxml_use_internal_errors( true );
		
		$dom->loadHTML( $out );
		$dom->encoding = 'utf-8';
		
		$body	= $dom->getElementsByTagName( 'body' )->item( 0 );
		$this->cleanNodes( $body, $badTags );
		
		/**
		 * Iterate through bad tags found above and convert them to 
		 * harmless text
		 */
		foreach ( $badTags as $node ) {
			if( $node->nodeName != "#text" ) {
				$ctext = $dom->createTextNode( 
						$dom->saveHTML( $node )
					);
				$node->parentNode->replaceChild( 
					$ctext, $node 
				);
			}
		}
		
		
		/**
		 * Filter the junk and return only the contents of the body tag
		 */
		$out = tidy_repair_string( 
				$dom->saveHTML( $body ), 
				self::$tidy
			);
		
		
		/**
		 * Reset errors
		 */
		libxml_clear_errors();
		libxml_use_internal_errors( $err );
		
		return $out;
	}
	
	
	protected function cleanAttributeNode( 
		&$node, 
		&$attr, 
		&$goodAttributes, 
		&$href 
	) {
		/**
		 * Why the devil is an attribute name called "nodeName"?!
		 */
		$name = $attr->nodeName;
		
		/**
		 * And an attribute value is still "nodeValue"?? Damn you PHP!
		 */
		$val = $attr->nodeValue;
		
		/**
		 * Default action is to remove the attribute completely
		 * It's reinstated only if it's allowed and only after 
		 * it's filtered
		 */
		$node->removeAttributeNode( $attr );
		
		if ( in_array( $name, $goodAttributes ) ) {
			
			switch ( $name ) {
				
				/**
				 * Validate URL attribute types
				 */
				case 'url':
				case 'src':
				case 'href':
				case 'longdesc':
					if ( self::urlFilter( $val ) ) {
						$href = $val;
					} else {
						$val = '';
					}
					break;
				
				/**
				 * Everything else gets default scrubbing
				 */
				default:
					if ( self::decodeScrub( $val ) ) {
						$val = self::entities( $val );
					} else {
						$val = '';
					}
			}
			
			if ( '' !== $val ) {
				$node->setAttribute( $name, $val );
			}
		}
	}
	
	
	/**
	 * Modify links to display their domains and add 'nofollow'.
	 * Also puts the linked domain in the title as well as the file name
	 */
	protected static function linkAttributes( &$node, $href ) {
		try {
			if ( !self::$options['nofollow'] ) {
				return;
			}
			
			$parsed	= parse_url( $href );
			$title	= $parsed['host'] . ' ';
			
			$f	= pathinfo( $parsed['path'] );
			$title	.= ' ( /' . $f['basename'] . ' ) ';
				
			$node->setAttribute( 
				'title', $title
			);
			
			if ( self::$options['nofollow'] ) {
				$node->setAttribute(
					'rel', 'nofollow'
				);
			}
			
		} catch ( Exception $e ) { }
	}
	
	
	/**
	 * Iterate through each tag and add non-whitelisted tags to the 
	 * bad list. Also filter the attributes and remove non-whitelisted ones.
	 * 
	 * @param htmlNode $node Current HTML node
	 * @param array $badTags Cumulative list of tags for deletion
	 */
	protected function cleanNodes( $node, &$badTags = array() ) {
		
		if ( array_key_exists( $node->nodeName, self::$whitelist ) ) {
			
			if ( $node->hasAttributes() ) {
				
				/**
				 * Prepare for href attribute which gets special 
				 * treatment
				 */
				$href = '';
				
				/**
				 * Filter through attribute whitelist for this 
				 * tag
				 */
				$goodAttributes = 
					self::$whitelist[$node->nodeName];
				
				
				/**
				 * Check out each attribute in this tag
				 */
				foreach ( 
					iterator_to_array( $node->attributes ) 
					as $attr ) {
					$this->cleanAttributeNode( 
						$node, $attr, $goodAttributes, 
						$href
					);
				}
				
				/**
				 * This is a link. Treat it accordingly
				 */
				if ( 'a' === $node->nodeName && '' !== $href ) {
					self::linkAttributes( $node, $href );
				}
				
			} // End if( $node->hasAttributes() )
			
			/**
			 * If we have childnodes, recursively call cleanNodes 
			 * on those as well
			 */
			if ( $node->childNodes ) {
				foreach ( $node->childNodes as $child ) {
					$this->cleanNodes( $child, $badTags );
				}
			}
			
		} else {
			
			/**
			 * Not in whitelist so no need to check its child nodes. 
			 * Simply add to array of nodes pending deletion.
			 */
			$badTags[] = $node;
			
		} // End if array_key_exists( $node->nodeName, self::$whitelist )
		
	}
	
	/**#@-*/
	
	
	/**
	 * Returns true if the URL passed value is harmless.
	 * This regex takes into account Unicode domain names however, it 
	 * doesn't check for TLD (.com, .net, .mobi, .museum etc...) as that 
	 * list is too long.
	 * The purpose is to ensure your visitors are not harmed by invalid 
	 * markup, not that they get a functional domain name.
	 * 
	 * @param string $v Raw URL to validate
	 * @returns boolean
	 */
	public static function urlFilter( $v ) {
		
		$v = strtolower( $v );
		$out = false;
		
		if ( filter_var( $v, 
			FILTER_VALIDATE_URL, FILTER_FLAG_SCHEME_REQUIRED ) ) {
			
			/**
			 * PHP's native filter isn't restrictive enough.
			 */
			if ( preg_match( self::$options['rx_url'], $v ) ) {
				$out = true;
			} else {
				$out = false;
			}
			
			if ( $out ) {
				$out = self::decodeScrub( $v );
			}
		} else {
			$out = false;
		}
		
		return $out;
	}
	
	
	/**
	 * Regular expressions don't work well when used for validating HTML.
	 * It really shines when evaluating text so that's what we're doing here
	 * 
	 * @param string $v string Attribute name
	 * @param int $depth Number of times to URL decode
	 * @returns boolean True if nothing unsavory was found.
	 */
	public static function decodeScrub( $v ) {
		if ( empty( $v ) ) {
			return true;
		}
		
		$depth		= self::$options['scrub_depth'];
		$i		= 1;
		$success	= false;
		$old		= '';
		
		
		while( $i <= $depth && !empty( $v ) ) {
			// Check for any JS and other shenanigans
			if (
				preg_match( self::$options['rx_xss'], $v ) || 
				preg_match( self::$options['rx_xss2'], $v ) || 
				preg_match( self::$options['rx_esc'], $v )
			) {
				$success = false;
				break;
			} else {
				$old	= $v;
				$v	= self::utfdecode( $v );
				
				/**
				 * We found the the lowest decode level.
				 * No need to continue decoding.
				 */
				if ( $old === $v ) {
					$success = true;
					break;
				}
			}
			
			$i++;
		}
		
		
		/**
		 * If after decoding a number times, we still couldn't get to 
		 * the original string, then there's something still wrong
		 */
		if ( $old !== $v && $i === $depth ) {
			return false;
		}
		
		return $success;
	}
	
	
	/**
	 * UTF-8 compatible URL decoding
	 * 
	 * @link http://www.php.net/manual/en/function.urldecode.php#79595
	 * @returns string
	 */
	public static function utfdecode( $v ) {
		$v = urldecode( $v );
		$v = preg_replace( '/%u([0-9a-f]{3,4})/i', '&#x\\1;', $v );
		return html_entity_decode( $v, null, 'UTF-8' );
	}
	
	
	/**
	 * HTML safe character entitites in UTF-8
	 * 
	 * @returns string
	 */
	public static function entities( $v ) {
		return htmlentities( 
			iconv( 'UTF-8', 'UTF-8', $v ), 
			ENT_NOQUOTES | ENT_SUBSTITUTE, 
			'UTF-8'
		);
	}	
}

Usage is pretty simple:

$data = $_POST['body'];
$html = new Html();
$data = $html->filter( $data );

What’s the point of HTML5 Boilerplate?

This isn’t a rhetorical question, I’m genuinely trying to understand what the point of this package is. I’m sure a lot of people have invested a lot of time and energy to it, but maybe I’m just missing something.

And I did read the “Why it is good” section on the project site, but still couldn’t figure out why all of this shouldn’t be known to a designer to begin with as it seems to me that this stuff is a collection of Googlings or stuff that can be resolved by browsing the spec page for a bit. And browser quirks aren’t something that will trouble you if you keep to sane uses of CSS and HTML; a lot of problems can be avoided when following the rule of Less is more.

Cross-browser compatible (IE6+, yeah we got that.)

IE6 is dead. Supporting a dead or dying browser makes no sense at all and, realistically speaking, any “advanced” feature you’re trying to push via HTML5 on your site will be thoroughly broken on it anyway.

Browsers that matter (I.E. ones you want to push your flashy new site on) will likely have other issues besides how well it renders every pixel on screen. Besides that, “cross-browser compatible” is usually a byword term for “not accessible” so in essence there’s a lot of mutually exclusive stuff pushed with the package with regard to accessibility.

HTML5 ready. Use the new tags with certainty.

Modernizr allows you to do this already and most browsers (even the old ones) simply treat a lot of the new architecture tags like <article>, <section> or <nav> as just a div by default.

Video and audio are the sticking points for most HTML5 sites with media and while you can do embedding fairly easily, the hard part is the codec standard (which none of the major vendors are agreeing on) so that leaves the embed tag you choose a non-issue compared to the codec headaches.

Optimal caching and compression rules for grade-A performance

jQuery caches dynamically loaded scripts as does Modernizr and most web hosts already have mod_deflate enabled for HTML, CSS and JavaScript.

Best practice site configuration defaults

Less is more. See above.

Often times, if you’re struggling to make a layout look exactly the same as a mockup, you’re writing more markup and not effective markup. CSS and HTML only get complicated when you try to do things that are best left doing with images rather than pure markup.

Mobile browser optimizations

Less is more (2). See above.

Also, modern mobile browsers are fully capable of rendering a page meant for a full screen; Mobile browsers have a zoom feature. The issue isn’t the rendering, it’s the data plan. Worry more about “how much” you’re sending to the mobile device rather than “how” it’s rendered.

Progressive enhancement graceful degradation … yeah yeah we got that

Less is more (3). See above.

Browsers that will render HTML4 will render HTML5 without issue for the most part. CSS 3 with conditional 2.1 or older may help with older browsers, but what will really help a visitor is a helpful reminder (if they’re on IE 6) that they’re using a dangerously insecure browser and should upgrade right away. Let’s be more concerned with our civic duty to protect the web and its users than how pretty our sites look for a change.

IE specific classes for maximum cross-browser control

Less is more (4). See above.

There’s a line in Star Wars Episode IV: A New Hope by Princess Leia to governor Tarkin

The more you tighten your grip, Tarkin, the more star systems will slip through your fingers

Sensible control is a better option than “maximum” control. You want your site to look good (“perfect” is a dream until we all start using the same browser; at which point I’d rather not be developing for the web) so think about this… what is it that you’re trying to accomplish with your site? And does your preoccupation with perfection keeping you from accomplishing it?

Handy .no-js and .js classes to style based on capability

Less is more (5). See above.

Sensible use of CSS 2.1 and 3  will get a great deal of the same functionality as JS. I’d much rather see designers become familiar with 2.1 first as there are a lot of basics that get skipped that lay a good foundation. You would be amazed at how little markup you actually need to get a good result.

Also, JS gets ignored by three kind of visitors :

  • JS Disabled, but capable – Aren’t interested in being bothered with ads or other flashy nonsense and will likely be interested in the content rather than presentation. Be concerned with the quality of the content and just make it available without hurdles.
  • JS Incapable – Usually screen readers, text only or other such specialty browser. CSS is only applicable for accessibility.
  • Bots – Don’t need CSS anyway.

Again, here is a good case for using sensible CSS rather than clever CSS.

Console.log nerfing so you won’t break anyone by mistake.

This is vendor specific stuff that shouldn’t be encouraged too much. Firebug is fine and all, but if something “breaks” something else, it’s usually a result of poor encapsulation.

Also… Less is more (6). See above.

Never go wrong with your doctype or markup!

Never underestimate a novice developer’s ability to break the unbreakable. I remember Adam Savage of the Mythbusters once giving a presentation and mentioned how surprised he was that actors were able to break welded steel on certain stage props (back in his prop making days) by merely handling them. If there’s a way to break it, they will find it. Also of note, Less is more (7) and the less markup you have, the less you will likely break it.

An optimal print stylesheet, performance optimized.

If your HTML5 layout can’t be made printer friendly with the following :

body { color:#000 !important; background:#fff !important;}
aside, footer { display:none !important; }

You’re doing it wrong or doing it too complicated.

Also… Less is more (8). See above.

iOS, Android, Opera Mobile-adaptable markup and CSS skeleton.

Less is more (9). See above.

As mentioned before, mobile devices today aren’t the text only readers of yesteryear. When multi-touch is becoming commonplace, your chief concern should still be bulk, not pixel perfection.

Fun fact: A pixel is not really a pixel on a mobile screen.

.clearfix, .visuallyhidden classes to style things wisely and accessibly.

Never been a fan of hacks. You can bash IE all you want, but there are ways to get around it without resorting to too much CSS witchcraft. And since Less is more (10), you automatically improve accesibility by simply not piling too much on the browser to begin with and taking a browse at the above linked spec page for HTML5 and CSS3 at w3schools.

.htaccess file that allows proper use of HTML5 features and faster page load

Also guarantees to break any CMS. There are some things in the default .htaccess that do make sense like denying access to hidden folders or log and other such special files, but then there are other things that are completely pedantic or just plain asinine. Like forcibly rewriting http://www.example.com into example.com. Honestly what’s the big deal?

And measures like setting session.cookie_httponly makes sense if you don’t use JavaScript to manipulate cookies on your own web application. This is a classic case of more security not necessarily being better than just better security. You should be vetting your application for SQL injection and XSS vulnerabilities and not handicapping yourself. Security is a process, not a destination, and the .htaccess isn’t something to be casually played with. You’re better off not having a .htaccess file there at all and instead a link to the OWASP page so anyone who downloads the package can familiarlize themselves and understand what it is they’re doing rather than copy > pasting.

And then there’s this :

# Force the latest IE version, in various cases when it may fall back to IE7 mode
#  github.com/rails/rails/commit/123eb25#commitcomment-118920
# Use ChromeFrame if it’s installed for a better experience for the poor IE folk

This may be a secret to a lot of people and you may not know this about IE users, but please believe me when I say this, it’s completely true. Most IE users DON’T GIVE A SH!# ABOUT WHICH JS ENGINE THEY HAVE!!

If your site needs a JS engine swap to make it work better on IE7, then it’s more poorly designed than IE7.

Also, did I mention Less is more (11)? See above.

CDN hosted jQuery with local fallback failsafe.

I’m fairly certain that the Google library or the one hosted by Microsoft are unlikely to be unavailable unless you’re hosting your site in Iran or China (do they block ajax.googleapis.com?) Other than that, this is a 3 second fix for anyone using Modernizr or jQuery.

Think there’s too much? The HTML5 Boilerplate is delete-key friendly. :)

Enter-key friendly will always leave less cruft in your code than delete-key friendly.

Here’s all you need to start with HTML 5 (add more as necessary) using just the bare essentials to get going.

A basic index.html file

<!DOCTYPE html>
<html lang="en-us">
<head>
	<meta charset="UTF-8" />
	<meta name="viewport" content="width=device-width, initial-scale=1" />

	<title>Site title</title>
	<script type="text/javascript" src="lib/modernizr.min.js"></script>
	<script type="text/javascript">
		var spath = "lib/";
	</script>
	<script type="text/javascript" src="lib/loader.js"></script>
	<link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
	<div class="page">
		<header>
			<div class="title">
				<h1><a href=".">This is a site</a></h1>
				<p>Some sorta description</p>
			</div>
		</header>
		<article>
			<section class="column two-thirds">
				<h2>Section header</h2>
				<p>This is where your main content goes.</p>
			</section>
			<aside class="column one-third">
				<h3>This can be the sidebar</h3>
				<p>Put your links and stuff here.</p>
			</aside>
			<footer>
				<p>Copyright and stuff here</p>
			</footer>
		</article>
	</div>
</body>
</html>

You can download Modernizr here. Using modernizr, here is the loader.js file (gets the basics downloaded including jQuery, validation and jQuery UI).

if(!window.jQuery)
Modernizr.load([{
	load: "https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"
}, {
	load: "https://ajax.aspnetcdn.com/ajax/jquery.validate/1.9/jquery.validate.min.js"
}, {
	load: "https://ajax.aspnetcdn.com/ajax/jquery.validate/1.9/additional-methods.min.js"
}, {
	load: "https://ajax.googleapis.com/ajax/libs/jqueryui/1.8.18/jquery-ui.min.js"
}, {
	load: spath + "yourlib.js"
}]);

You can add more of your own libraries as necessary, but yourlib.js can be just a jQuery specific code file.

$(function () {
// More of your stuff here
});

And here’s a usable style.css that you can run with.

/* Default */
body
{
	font-family: "Segoe UI" , Tahoma, Sans-Serif;
	font-weight: normal;
	font-size: medium;
}

@media screen
{
	/* Reset */
	body, div, header, article, section, aside, footer,
	p, h1, h2, h3, h4, h5, h6, ul, li, blockquote,
	form, fieldset, legend, input, textarea, select,
	table, td, th, hr
	{
		margin: 0;
		padding: 0;
		border: 0;
		line-height: normal;
	}

	body
	{
		font: normal 86% "Segoe UI" , "Myriad" , Tahoma, Sans-serif;
		color: #333;
		background: #fff;
		margin: 0;
		padding: .7em;
	}

	div.page
	{
		width: 80%;
		max-width: 1200px;
		min-width: 800px;
		text-align: left;
		margin: 0 auto 1em auto;
	}

	/* Headings */
	h1, h2, h3, h4
	{
		font-weight: normal;
		padding: .4em 0 .1em 0;
	}
	h1
	{
		font-size: 230%;
		color: #a33;
	}
	h2
	{
		font-size: 150%;
		padding: .4em 0;
	}
	h3
	{
		font-size: 140%;

	}
	aside h3
	{
		border:1px dotted #aaa;
		border-width:0 0 1px 0;
	}
	h4
	{
		font-size: 130%;
	}
	h5
	{
		font-size: 120%;
	}
	h6
	{
		font-size: 110%;
	}

	/* Page segments */
	article, header, footer, hr
	{
		clear: both;
	}
	header
	{
	}
	article
	{
		width: 100%;
	}

	header:first-child
	{
		margin: 0 0 1em 0;
		border: 1px dotted #aaa;
		border-width: 0 0 1px 0;
	}
	footer
	{
		border: 1px dotted #aaa;
		border-width: 1px 0 0 0;
	}

	aside
	{
		background: #f8f8ff;
		box-shadow:3px 3px 3px #ddd;
		border-radius:3px;
	}

	hr
	{
		background: #a33;
		height: 1px;
		margin: .5em 0;
	}

	/* Paragraphs */
	p
	{
		line-height: 140%;
		padding: 1em;
	}

	.column p
	{
		padding:1em 0;
	}

	section p
	{
		padding: .2em 0 1em 0;
	}
	aside p
	{
		padding: 1em !important;
	}

	form p
	{
		line-height:normal;
		padding:.2em .5em .5em .5em !important;
	}
	header p
	{
		padding: 0 0 1em 0;
	}

	footer p
	{
		font-size: 90%;
		color: #000;
	}
	blockquote p
	{
		padding: .4em 0;
	}

	/* Block content */
	blockquote
	{
		background: #f5f5ff;
		border: 1px dashed #aaa;
		padding: .4em 1em;
		margin: .5em 0;
		border-radius:.5em;
	}

	/* Columns */
	.column
	{
		float: left;
		margin: 0 2% 0 0;
		padding: .2em 0;
	}

	.half
	{
		width: 47.4%;
	}

	.two-thirds
	{
		width: 64.7%;
	}

	.one-third
	{
		width: 31.3%;
	}

	.one-fourth
	{
		width: 23%;
	}

	.one-fifth
	{
		width: 18%;
	}

	/* Images */
	img
	{
		vertical-align: middle;
	}
	a img
	{
		border: 0;
	}

	/* Links */
	a
	{
		color: #a33;
	}
	nav a
	{
		color: #a33 !important;
	}
	h1 a, h2 a, h3 a, h4 a
	{
		text-decoration: none;
	}
	h1 a:hover, h2 a:hover, h3 a:hover, h4 a:hover
	{
		text-decoration: underline;
	}

	/* Pager */
	nav.page
	{
		font-size:120%;
		padding:1em 0;
		clear:both;
	}

	nav.page strong, nav.page a
	{
		font-weight:bold;
	}
	nav.page a
	{
		text-decoration:none;
		font-size:90%;
	}
	nav.page a:hover
	{
		box-shadow:2px 2px 2px #ddd;
	}
	nav.page strong
	{
		font-size:120%;
	}

	/* Tags */
	aside p.tags
	{
		line-height: 200% !important;
	}
	a.tag
	{
		padding: 7px 3px !important;
		text-decoration: none;
	}
	a.tag:hover
	{
		text-decoration: underline;
	}
	a.x1
	{
		font-size: 11px;
	}
	a.x2
	{
		font-size: 12px;
	}
	a.x3
	{
		font-size: 13px;
	}
	a.x4
	{
		font-size: 14px;
	}
	a.x5
	{
		font-size: 15px;
	}
	a.x6
	{
		font-size: 16px;
	}
	a.x7
	{
		font-size: 17px;
	}
	a.x7
	{
		font-size: 18px;
	}
	a.x8
	{
		font-size: 18px;
	}
	a.x9
	{
		font-size: 19px;
	}
	a.x10
	{
		font-size: 20px;
	}
	a.x11
	{
		font-size: 21px;
	}
	a.x12
	{
		font-size: 22px;
	}
	a.x13
	{
		font-size: 23px;
	}
	a.x14
	{
		font-size: 24px;
	}
	a.x15
	{
		font-size: 25px;
	}

	/* Lists */
	ul, ol
	{
		margin: 1em 2em;
	}
	ul li, ol li
	{
		margin: 0 0 .3em 0;
	}

	/* Form elements */
	header form
	{
		float: right;
		margin:3.5em 0 0 0;
	}
	header form legend
	{
		display: none;
	}

	fieldset
	{
		border: 1px solid #a33;
		border-color:#a33 #ddd #ddd #ddd;
		box-shadow:3px 3px 3px #ddd;
		margin: 0 0 1em 0;
		padding:.3em;
	}

	.column fieldset
	{
		min-height:15em;
	}

	legend
	{
		font-size: 130%;
		padding: .2em .3em;
		margin: .2em 1em 0 1em;
	}

	input[type^='button'], input[type^='text'], input[type^='password'], textarea
	{
		border: 1px solid #999;
		border-radius:.3em;
		background: #fff;
		color: #575757;
		padding: .4em;
	}
	.column input[type^='text'], .column input[type^='password'], textarea
	{
		width: 90%;
	}
	input[type^='text']:focus, input[type^='password']:focus, textarea:focus
	{
		border-color:#a33;
	}
	textearea {
		font: normal 100% "Segoe UI" , "Myriad" , Tahoma, Sans-serif;
	}

	input[type^='submit'], input[type^='reset'], input[type^='button']
	{
		cursor: pointer;
		background: transparent;
		font-size: 120%;
		color: #a33;
	}

	label input:not([type^='checkbox']), label textarea
	{
		display: block;
	}
	label span
	{
		font-size: 75%;
		font-weight: bold;
		color: #a33;
	}

	input.error
	{
		color: #000;
		background: #fcc;
		border: 1px solid #a33;
	}

	label span
	{
		font: bold 75% sans-serif;
		color: #a33;
	}

	label.error
	{
		padding: 2px;
		color: #fff;
		background: #f33;
	}
}

@media print
{
	body { color:#000 !important; background:#fff !important; }
	aside, footer { display:none !important; }
}

/*
Chocolate? This is doo doo baby!
- Dave Chappelle
*/

I do know that HTML5 Boilerplate is doing a very good job of promoting HTML5 Boilerplate.

All in all, it seems to be a solution to a problem that doesn’t really exist. Or maybe I’m wrong.

Basic: The community version

A little while ago, I created a theme called “Basic” for corporate-ish websites and I’ve been getting requests for a community version; basically a version that would accomodate a blog/forum. I decided to reuse 90% – 95% of the original backend JavaScript to the discussion forum mockup (which I’ve renamed “Road”) including a lot of similar functionality. Except this time, I’m going for a more “component” oriented layout that I can easily turn into tabs. I created a simple tab plugin in jQuery to turn page elements into tabs with minimal overhead and I’m reusing a slightly modified version of the code formatting plugin in the topic page example .

Community index

 

Community tag browsing

 

Most of the changes really went to the topic view page. I made some changes to the code format snippet and moved the user info and avatar away to the side for clarity. The darker colors were partly inspired by the Sublime text editor (thanks to Shannon for mentioning it) as I found it easier to read large blocks of code on a darker background although I’m still not doing any syntax highlighting.

Forum topic view

 

Haven’t change much of the backend code of the add topic form, but I have changed the styling and streamlined things a bit.

Reworded instructions

Code formatting (snippet)

I had a question sent to me asking about formatting code for viewing on a web page purely using HTML and CSS and without using JavaScript or any syntax highlighter of some sort. My initial answer would be that this isn’t usually a good idea since the <pre> and <code> tags are still probaby the best option for displaying code, however it isn’t really that hard to do.

To show code with line numbers, the best way would be to put it into an <ol> (ordered list) and each line into <li> tags.

The CSS portion is fairly straightforward…

/* Code Formatting */
ol.code
{
	margin: 1em 0;
	padding:0 0 0 3em;
	background:#f5f5ff;
	border:1px solid #eee;
}

ol.code li
{
	margin: 0;
	padding: .1em 0 .1em .3em;
	background: #fff;
	white-space: pre-wrap;
	border: 1px solid #eee;
	border-width: 0 0 0 1px;
}

ol.code li:nth-child(even)
{
	background: #f8f8ff;
}

And the HTML (used to display the same code above)…

<ol class="code">
	<li>/* Code Formatting */</li>
	<li>ol.code</li>
	<li>{</li>
	<li>	margin: 1em 0;</li>
	<li>	padding: 0 0 0 2.2em;</li>
	<li>	background: #f5f5ff;</li>
	<li>	border: 1px solid #eee;</li>
	<li>}</li>
	<li></li>
	<li>ol.code li</li>
	<li>{</li>
	<li>	margin: 0;</li>
	<li>	padding: .1em 0 .1em .3em;</li>
	<li>	background:#fff;</li>
	<li>	white-space:pre-wrap;</li>
	<li>	border:1px solid #eee;</li>
	<li>	border-width:0 0 0 1px;</li>
	<li>}</li>
	<li></li>
	<li>ol.code li:nth-child(even)</li>
	<li>{</li>
	<li>	background: #f8f8ff;</li>
	<li>}</li>
</ol>

You can see a running example added to the Basic theme.

The down side to this is that this means if you copy the code off the formatted view, all your line breaks and even some spacing would be lost. There’s still a need to provide an alternative like a textarea view or side <pre> tag. One idea would be to use jQuery UI tabs to switch between formatted and raw views since this way at least you’re reusing existing code as much as possible.

Update

After a couple of back and forth emails, I wrote a very simple jQuery plugin that will turn any text block into a formatted and raw view via a simple tab interface. This is meant to be as fast and as lightweight as possible so it doesn’t have any syntax highlighting.

/*
jQuery fast code format (non-syntax highlighting) plugin
*/

(function($) {
	$.fn.codeformat = function(options) {
		var settings = $.extend({
			rawText : "Raw",
			formattedText: "Formatted"
		}, options);
		
		return this.each(function(i) {
			var c = $(this);
			
			var oc = c.text().split("\n"); // Get each line of code
			var ol = '<ol class="code">'; // <ol> wrapper
			
			// This is faster than "append()";
			$.each(oc, function(j, v) {
				ol += '<li>'+ v +'</li>';
			});
			ol += '</ol>'; // Finish <ol>
			
			// Container
			var w = $('<div id="codeview'+ i +'" class="codeformat" />');
			
			// Tab controls
			var u = $('<ul><li><a href="#codeview'+ i +'f">'+ settings.formattedText +'</a></li>'+
				'<li><a href="#codeview'+ i +'r">'+ settings.rawText +'</a></li></ul>');

			// Formatted and Raw view containers
			var fv = $('<div id="codeview'+ i +'f" class="codeview" />');
			var rv = $('<div id="codeview'+ i +'r" class="codeview" />');
			
			w.insertBefore(c)	// Put the container before the current code view
			u.appendTo(w)	// Put the tabs into the container
			fv.insertAfter(u);	// Put formatted view right after tabs
			rv.insertAfter(fv);	// Put raw view after formatted view
			
			fv.append(ol);	// Insert code lines into formatted view
			c.appendTo(rv)	// Put the current code block into raw view
			
			// Tab controls
			u.find('a:[href="#codeview'+ i +'f"]').bind('click', function(e) {
				if(rv.is(':visible')) {
					fv.toggle();
					rv.toggle();
					$(this).addClass("active");
					u.find('a:[href="#codeview'+ i +'r"]').removeClass("active");
				}
				e.preventDefault();
			});
			u.find('a:[href="#codeview'+ i +'r"]').bind('click', function(e) {
				if(fv.is(':visible')) {
					fv.toggle();
					rv.toggle();
					$(this).addClass("active");
					u.find('a:[href="#codeview'+ i +'f"]').removeClass("active");
				}
				e.preventDefault();
			});

			// Set initial view (show formatted)
			$('#codeview'+ i +'r').toggle();
			$('a:[href="#codeview'+ i +'f"]').addClass("active");
		});
	};
})(jQuery);

 

And the updated CSS

/* Code Formatting */
div.codeformat
{
	padding: .3em;
}

div.codeview
{
	background:#fff;
	color:#444;
	border:1px solid #ddd;
}

/* Code format tabs */
div.codeformat ul:first-child
{
	margin:.4em .2em .3em .2em;
	padding:0;
	list-style:none;
	min-height:1.8em;
}

div.codeformat ul:first-child li 
{
	margin:.3em .3em 0 .5em;
	padding:0;
	float:left;
}

div.codeformat ul:first-child li a
{
	padding:.4em .6em;
	text-decoration:none;
	font-weight:bold;
	color: #999;
	background: #f5f5ff;
	border-radius: 5px 5px 0 0;
	border:1px solid #ddd;
	border-width: 1px 1px 0 1px;
}

div.codeformat ul:first-child li a.active
{
	color: #444;
	background:#fff;
	padding:.4em .6em .5em .6em;
}

code, pre
{
	margin: 1em 0;
	font-family: Monospace;
	font-size:100%;
	overflow-x: scroll;
}
code
{
	white-space: pre;
}
div.codeformat code, div.codeformat pre
{
	margin:.2em .5em;
}

ol.code
{
	margin: 0 !important;
	padding: 0 0 0 3em;
	background: #f5f5ff;
	font-family: Monospace;
}

ol.code li
{
	margin: 0;
	padding: .2em 0 .2em .3em;
	background: #fff;
	white-space: pre-wrap;
	border: 1px solid #ddd;
	border-width: 0 0 0 1px;
}

ol.code li:nth-child(even)
{
	background: #f8f8ff;
}

 

To use this, just add jQuery and the plugin to your page and call the following on any <pre> or <code> block

$('code').codeformat();

You can checkout the updated running example.

Basic: A simple theme for corporate-ish sites

When I say corporate-ish, I mean totally serious and devoid of any shenanigans that may offend your boss… or you, if you’re that type of person. Deadline: 1 hour.

I was going to make this post before, but I just didn’t have the time. This was another design partly inspired by need and part by reader request. I got an email this evening by somone with a very similar need so I figured I’d post it here for anyone else to pickup.

“Basic” would fall into the simple-but-not-simplistic category of designs in that it’s got the bare essentials only, similar to the Simply theme, but with even fewer bells and whistles and yet is still clear, functional and usable. While it can be used as-is, it’s really a learning tool ment to be taken apart and reassembled with any additional hacks by the user. I also left out a lot of older cruft from pre HTML 5 designs as much as possible and the CSS also reflects more v.3 usage.

Basic: Front page

 

While creating this theme, I realised how similar the header is to the asp.net tutorial pages, especially the article header below the top links. This was purely coincidence since this was meant to be a bare minimum theme and Microsoft has a reputation for being terribly unexciting.

Header uses the same font and sizes

 

But since it looked similar at this point anyway, I decided to add a matching breadcrumb navigation…

Since we're this close. Why not go all the way?

 

The difference of course is that the HTML and CSS in my version are completely different. I.E. Simpler and straightforward. In the MS example, they’re using a paragraph with actual backlashes and s to denote seperators. This is way overkill and pointless so in my example, I’m using the <nav> tag and a <ul> list.

The backslashes are added via CSS “after” and “content” :

nav.crumbs ul li:after
{
	content: "/";
}
nav.crumbs ul li:last-child:after
{
	content: "";
}

Since most modern browsers support this anyway, and the fact that these are not inline links, but list items, it pretty well does exactly what a navigation list is meant to do.

The breadcrumbs are shown on the corresponding “topics” example page.

Basic with breadcrumbs

Borrowing from the Discussion Forum mockup where I had placed the new topic, login and registration forms on the same page, I included them on the bottom here and used an expandable “component” function. Basically takes any container with a “component” class and turns the elements inside into widgets with the first <h2> element inside as the handler instead of using jQuery UI tabs or accordion.

The JavaScript is otherwise the same as the forum mockup. Here’s that “component” function :

function initComps() {
	$('.components').children().each(function(i) {
		var b = $(this);

		// First element is usually the title/handler
		var hd = b.children('h2');

		// All other elements need to be wrapped up
b.children(':not(:first-child)').wrapAll('</pre>
<div class="blockct"></div>
<pre>');
		var ct = b.children('.blockct');
		var lnk = hd.append(' <a href="#" rel="showhide">(show)</a>');

		// Find the toggle link in the header and assign the control
		hd.children('a[rel="showhide"]').click(function (e) {
			ct.slideToggle();	// Toggle contents

			// Change link text
			$(this).text(($(this).text() == "(show)")? "(hide)" : "(show)");
			return false;	// Don't return
		});

		ct.hide(); // Initially hide everything
	});
}

If you think the opener to this post is odd : Not that I’m calling my boss boring…

…but I can’t think of a way to finish that sentence.

Bart Simpson