AntiXss 4.2 Breaks everything

This is one of those situations where none of your available options are good and your least harmful alternative is to shoot yourself in the foot at a slightly odd angle so as to only lose the little toe and not the big one.

All of this happened when Microsoft revealed in January that their AntiXss library, now known as the Microsoft Web Protection Library (never seen a more ironic combination of words), had a vulnerability, and like all obedient drones, we must update immediately to avoid shooting ourselves in our big toe. The problem is that updating will cause you to lose your little toe.

You see, the new library BREAKS EVERYTHING and eats your children.

Update 11/14/2013:
A new HTML sanitizer is now available for PHP.

I WILL EAT ALL YOUR TAGS!!!

I think the problem is best described by someone who left a comment at the project discussion board.

I was using an old version of Anti-XSS with a rich text editor (CkEditor). It was working very great. But when upgrading to latest version, I discovered the new sanitized is way too much aggressive and is removing almost everything “rich” in the rich editor, specially colors, backgrounds, font size, etc… It’s a disaster for my CMS!

Is there any migration path I can use to keep some of the features of the rich text editor and having at least minimal XSS protection ?

Lovely eh?

Here’s the response from the coordinator.

CSS will always be stripped now – it’s too dangerous, but in other cases it is being too greedy, dropping hrefs from a tags for example. That is being looked at.

I know this may be a strange concept for the good folks who developed the library, but you see, in the civilized world, many people use WYSIWYG editors in their projects so as not to burden their users with tags. These days more people are familiar with rudimentary HTML, but when you just want to quickly make a post, comment or otherwise share something, it’s nice to know there’s an editor that can accommodate rich formatting. This is especially true on a mobile device, where switching from text to special characters to type out tags is still annoying.

Those WYSIWYG editors invariably use CSS and inline styles to accomplish this rich formatting, thereby making that assertion ridiculous and this library completely impractical.

A very quick test on the 4.2 Sanitizer shows that it totally removes strong tags, h1 tags, section tags and as mentioned above strips href attributes from anchor tags. At this rate the output will soon be string.Empty. I hope that the next version will allow basic markup tags and restore the href to anchors.
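
For reference, the quick test was nothing fancier than feeding some typical editor output straight into the sanitizer (a sketch; the markup is only illustrative and the exact output will vary by 4.x build):

string input = "<h1 style=\"color:#333\">Title</h1>" +
	"<p><strong>Bold</strong> text and <a href=\"http://example.com\">a link</a></p>";

// AntiXss 4.2 sanitizer call
string output = Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(input);

// With 4.2, expect the inline style, the h1/strong tags and the href to be eaten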

So in other words, AntiXss is now like an antidepressant. You’ll feel a lot better after taking it, but you may end up killing yourself.

And that’s not all…

I would have kept my mouth shut about this, even though I’ve had my doubts about depending on the library over something DIY, but since I work with a bunch of copycat monkeys, I have to use whatever everyone else deems worthy of being included in a project (common sense be damned). I thought surely there would at least be older versions available, but no:

It’s company policy I’m afraid. The source will remain though, so if you desperately wanted you could download and compile your own versions of older releases.

Of course, I lost my temper at that. Since I’m forced to use this library, and one of the devs went ahead and upgraded without backing up the old version or finding out exactly how the vulnerability would affect us, I had to go treasure hunting across three computers to find 4.0 after just getting home.

AntiXss 4.2 is stupid and so is Microsoft.

Here’s my current workaround until MS comes up with a usable alternative. I’m also using the HtmlAgilityPack, which at the moment hasn’t contracted rabies, thankfully, along with the 4.0 library.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace Arcturus.Helpers
{
	/// <summary>
	/// This is an HTML cleanup utility combining the benefits of the
	/// HtmlAgilityPack to parse raw HTML and the AntiXss library
	/// to remove potentially dangerous user input.
	///
	/// Additionally it uses a list created by Robert Beal to limit
	/// the number of allowed tags and attributes to a sensible level
	/// </summary>
	public sealed class HtmlUtility
	{
		private static volatile HtmlUtility _instance;
		private static object _root = new object();

		private HtmlUtility() { }

		public static HtmlUtility Instance
		{
			get
			{
				if (_instance == null)
					lock (_root)
						if (_instance == null)
							_instance = new HtmlUtility();

				return _instance;
			}
		}

		// Original list courtesy of Robert Beal :
		// http://www.robertbeal.com/

		private static readonly Dictionary<string, string[]> ValidHtmlTags =
			new Dictionary<string, string[]>
        {
            {"p", new string[]          {"style", "class", "align"}},
            {"div", new string[]        {"style", "class", "align"}},
            {"span", new string[]       {"style", "class"}},
            {"br", new string[]         {"style", "class"}},
            {"hr", new string[]         {"style", "class"}},
            {"label", new string[]      {"style", "class"}},

            {"h1", new string[]         {"style", "class"}},
            {"h2", new string[]         {"style", "class"}},
            {"h3", new string[]         {"style", "class"}},
            {"h4", new string[]         {"style", "class"}},
            {"h5", new string[]         {"style", "class"}},
            {"h6", new string[]         {"style", "class"}},

            {"font", new string[]       {"style", "class",
				"color", "face", "size"}},
            {"strong", new string[]     {"style", "class"}},
            {"b", new string[]          {"style", "class"}},
            {"em", new string[]         {"style", "class"}},
            {"i", new string[]          {"style", "class"}},
            {"u", new string[]          {"style", "class"}},
            {"strike", new string[]     {"style", "class"}},
            {"ol", new string[]         {"style", "class"}},
            {"ul", new string[]         {"style", "class"}},
            {"li", new string[]         {"style", "class"}},
            {"blockquote", new string[] {"style", "class"}},
            {"code", new string[]       {"style", "class"}},
			{"pre", new string[]       {"style", "class"}},

            {"a", new string[]          {"style", "class", "href", "title"}},
            {"img", new string[]        {"style", "class", "src", "height",
				"width", "alt", "title", "hspace", "vspace", "border"}},

            {"table", new string[]      {"style", "class"}},
            {"thead", new string[]      {"style", "class"}},
            {"tbody", new string[]      {"style", "class"}},
            {"tfoot", new string[]      {"style", "class"}},
            {"th", new string[]         {"style", "class", "scope"}},
            {"tr", new string[]         {"style", "class"}},
            {"td", new string[]         {"style", "class", "colspan"}},

            {"q", new string[]          {"style", "class", "cite"}},
            {"cite", new string[]       {"style", "class"}},
            {"abbr", new string[]       {"style", "class"}},
            {"acronym", new string[]    {"style", "class"}},
            {"del", new string[]        {"style", "class"}},
            {"ins", new string[]        {"style", "class"}}
        };

		/// <summary>
		/// Takes raw HTML input and cleans against a whitelist
		/// </summary>
		/// <param name="source">Html source</param>
		/// <returns>Clean output</returns>
		public string SanitizeHtml(string source)
		{
			HtmlDocument html = GetHtml(source);
			if (html == null) return String.Empty;

			// All the nodes
			HtmlNode allNodes = html.DocumentNode;

			// Select whitelist tag names
			string[] whitelist = (from kv in ValidHtmlTags
								  select kv.Key).ToArray();

			// Scrub tags not in whitelist
			CleanNodes(allNodes, whitelist);

			// Filter the attributes of the remaining
			foreach (KeyValuePair<string, string[]> tag in ValidHtmlTags)
			{
				IEnumerable<HtmlNode> nodes = (from n in allNodes.DescendantsAndSelf()
											   where n.Name == tag.Key
											   select n);

				// No nodes? Skip.
				if (nodes == null) continue;

				foreach (var n in nodes)
				{
					// No attributes? Skip.
					if (!n.HasAttributes) continue;

					// Get all the allowed attributes for this tag
					HtmlAttribute[] attr = n.Attributes.ToArray();
					foreach (HtmlAttribute a in attr)
					{
						if (!tag.Value.Contains(a.Name))
						{
							a.Remove(); // Attribute wasn't in the whitelist
						}
						else
						{
							// *** New workaround. This wasn't necessary with the old library
							if (a.Name == "href" || a.Name == "src") {
								a.Value = (!string.IsNullOrEmpty(a.Value))? a.Value.Replace("\r", "").Replace("\n", "") : "";
								a.Value =
									(!string.IsNullOrEmpty(a.Value) &&
									(a.Value.IndexOf("javascript") < 10 || a.Value.IndexOf("eval") < 10)) ?
									a.Value.Replace("javascript", "").Replace("eval", "") : a.Value;
							}
							else if (a.Name == "class" || a.Name == "style")
							{
								a.Value =
									Microsoft.Security.Application.Encoder.CssEncode(a.Value);
							}
							else
							{
								a.Value =
									Microsoft.Security.Application.Encoder.HtmlAttributeEncode(a.Value);
							}
						}
					}
				}
			}

			// *** New workaround (DO NOTHING HAHAHA! Fingers crossed)
			return allNodes.InnerHtml;

			// *** Original code below

			/*
			// Anything we missed will get stripped out
			return
				Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(allNodes.InnerHtml);
			 */
		}

		/// <summary>
		/// Takes a raw source and removes all HTML tags
		/// </summary>
		/// <param name="source"></param>
		/// <returns></returns>
		public string StripHtml(string source)
		{
			source = SanitizeHtml(source);

			// No need to continue if we have no clean Html
			if (String.IsNullOrEmpty(source))
				return String.Empty;

			HtmlDocument html = GetHtml(source);
			StringBuilder result = new StringBuilder();

			// For each node, extract only the innerText
			foreach (HtmlNode node in html.DocumentNode.ChildNodes)
				result.Append(node.InnerText);

			return result.ToString();
		}

		/// <summary>
		/// Recursively delete nodes not in the whitelist
		/// </summary>
		private static void CleanNodes(HtmlNode node, string[] whitelist)
		{
			if (node.NodeType == HtmlNodeType.Element)
			{
				if (!whitelist.Contains(node.Name))
				{
					node.ParentNode.RemoveChild(node);
					return; // We're done
				}
			}

			if (node.HasChildNodes)
				CleanChildren(node, whitelist);
		}

		/// <summary>
		/// Apply CleanNodes to each of the child nodes
		/// </summary>
		private static void CleanChildren(HtmlNode parent, string[] whitelist)
		{
			for (int i = parent.ChildNodes.Count - 1; i >= 0; i--)
				CleanNodes(parent.ChildNodes[i], whitelist);
		}

		/// <summary>
		/// Helper function that returns an HTML document from text
		/// </summary>
		private static HtmlDocument GetHtml(string source)
		{
			HtmlDocument html = new HtmlDocument();
			html.OptionFixNestedTags = true;
			html.OptionAutoCloseOnEnd = true;
			html.OptionDefaultStreamEncoding = Encoding.UTF8;

			html.LoadHtml(source);

			// Encode any code blocks independently so they won't
			// be stripped out completely when we do a final cleanup
			foreach (var n in html.DocumentNode.DescendantNodesAndSelf())
			{
				if (n.Name == "code") {
					//** Code tag attribute vulnerability fix 28-9-12 (thanks to Natd)
					HtmlAttribute[] attr = n.Attributes.ToArray();
					foreach (HtmlAttribute a in attr) {
						if (a.Name != "style" && a.Name != "class")  { a.Remove(); }
					} //** End fix
					n.InnerHtml =
						Microsoft.Security.Application.Encoder.HtmlEncode(n.InnerHtml);
				}
			}

			return html;
		}
	}
}

This is a singleton class, so you access it through the Instance property.

E.G.

HtmlUtility util = HtmlUtility.Instance;
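
From there, sanitizing is a single method call. A quick hedged example (the markup is just illustrative):

string dirty = "<p style=\"color:red\" onclick=\"alert(1)\">Hello <script>bad()</script></p>";
string clean = util.SanitizeHtml(dirty); // the script tag and the onclick attribute should be gone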

7:40AM… Bedtime!

Update : September 28.

Natd discovered a vulnerability in this code that allowed onclick attributes to be added to the code tag itself. Fixed.
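
A quick way to sanity-check the fix, for what it’s worth (sketch only):

string attack = "<code style=\"color:red\" onclick=\"alert(1)\">var x = 1;</code>";
string safe = HtmlUtility.Instance.SanitizeHtml(attack);
// The onclick should now be stripped, while style and class survive on the code tag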

Discussion Forum update (Utilities and PostRepository)

This is just a follow-up with two classes from the discussion forum. I haven’t tested the PostRepository class well yet, but I’ll update it with fixes later. Util is the general utilities class I’ve used in previous projects; it’s basically for rudimentary formatting, input validation, etc.

6:40 AM… Time for bed!

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Security.Cryptography;
using System.Globalization;
using System.Web;

namespace Road.Helpers
{
	public class Util
	{
		/// <summary>
		/// Gets the checksum of a local file or some text.
		/// </summary>
		/// <param name="source">Path to a file or a string</param>
		/// <param name="mode">Checksum mode in sha1, sha256, sha512 or md5 (default)</param>
		/// <param name="isFile">True if file mode or false for text mode (defaults to false)</param>
		/// <returns>Completed checksum</returns>
		public static string GetChecksum(string source, string mode = "md5", bool isFile = false)
		{
			byte[] bytes = { };
			Stream fs;

			if (isFile)
				fs = new BufferedStream(File.OpenRead(source), 120000);
			else
				fs = new MemoryStream(Encoding.UTF8.GetBytes(source));

			switch (mode.ToLower())
			{
				case "sha1":
					using (SHA1CryptoServiceProvider sha1 =
						new SHA1CryptoServiceProvider())
						bytes = sha1.ComputeHash(fs);
					break;

				case "sha256":
					using (SHA256CryptoServiceProvider sha256 =
						new SHA256CryptoServiceProvider())
						bytes = sha256.ComputeHash(fs);
					break;

				case "sha512":
					using (SHA512CryptoServiceProvider sha512 =
						new SHA512CryptoServiceProvider())
						bytes = sha512.ComputeHash(fs);
					break;

				case "md5":
				default:
					using (MD5CryptoServiceProvider md5 =
						new MD5CryptoServiceProvider())
						bytes = md5.ComputeHash(fs);
					break;
			}

			// Cleanup
			fs.Close();
			fs = null;

			return BitConverter
					.ToString(bytes)
					.Replace("-", "")
					.ToLower();
		}

		/// <summary>
		/// Returns the page slug or converts a page title into a slug
		/// </summary>
		public static string GetSlug(string val, string d, int length = 45, bool lower = false)
		{
			val = Util.DefaultFlatString(val, d, length);

			// Collapse duplicate spaces and dashes
			val = Regex.Replace(val, @"[\s-]+", " ").Trim();
			val = Util.NormalizeString(val); // Remove special chars
			val = Regex.Replace(val, @"\s", "-"); // Spaces to dashes

			// If we still couldn't get a proper string, generate one from the default
			val = (String.IsNullOrEmpty(val) || val.Length < 3) ? d :
				val.Substring(0, val.Length <= length ? val.Length : length).Trim();

			return (lower) ? val.ToLower() : val;
		}

		private static string NormalizeString(string txt)
		{
			if (String.IsNullOrEmpty(txt))
				return String.Empty;

			StringBuilder sb = new StringBuilder();

			sb.Append(
				txt.Normalize(NormalizationForm.FormD).Where(
					c => CharUnicodeInfo.GetUnicodeCategory(c)
					!= UnicodeCategory.NonSpacingMark).ToArray()
				);

			return sb.ToString().Normalize(NormalizationForm.FormD);
		}

		/// <summary>
		/// Gets an array of cleaned tags
		/// </summary>
		/// <param name="txt">A comma delimited string of tags</param>
		/// <returns>Array of cleaned tags</returns>
		public static string[] GetTags(string txt, bool lower = false)
		{
			string[] tags = txt.Split(',');
			ArrayList clean = new ArrayList();

			for (int i = 0; i < tags.Length; i++)
			{
				tags[i] = DefaultFlatString(tags[i], " ").Trim();

				if (!string.IsNullOrEmpty(tags[i]))
					tags[i] = NormalizeString((lower)? 
						tags[i].ToLower() : tags[i]);

				// Don't want to repeat
				if (!clean.Contains(tags[i])) 
						clean.Add(tags[i]);
			}

			return (string[])clean.ToArray(typeof(string));
		}

		/// <summary>
		/// Gets an array of cleaned keywords
		/// </summary>
		/// <param name="txt">A comma delimited string of keywords</param>
		/// <param name="limit">Limits the number of keywords returned</param>
		/// <param name="tolower">Optional parameter to convert the text to lowercase</param>
		/// <returns>Array of cleaned keywords</returns>
		public static List<string> GetKeywords(string txt, int limit, bool tolower = true)
		{
			string[] tags = txt.Split(',');
			List<string> clean = new List<string>();

			for (int i = 0; i < tags.Length; i++)
			{
				tags[i] = Util.DefaultFlatString(tags[i], "");

				if (!String.IsNullOrEmpty(tags[i]))
				{
					if (tolower)
						clean.Add(tags[i].ToLower());
					else
						clean.Add(tags[i]);
				}
			}

			return (limit > 0) ? clean.Take(limit).ToList() : clean;
		}

		/// <summary>
		/// Shortens a given text block, appending an ellipsis
		/// </summary>
		public static string TrimText(string strInput, int intNum)
		{
			strInput = strInput.Replace("\r", string.Empty)
				.Replace("\n", string.Empty);
			if ((strInput.Length > intNum) && (intNum > 0))
			{
				strInput = strInput.Substring(0, intNum) + "...";
			}
			return strInput;
		}

		/// <summary>
		/// Parses a string to an int, or returns the default if parsing fails or the value is at or below the minimum
		/// </summary>
		public static int DefaultInt(string val, int d, int? min)
		{
			int tmp = 0;

			if (!Int32.TryParse(val, out tmp))
				tmp = d;

			if (min.HasValue)
				if (tmp <= min.Value) tmp = d;

			return tmp;
		}

		/// <summary>
		/// Checks whether the nullable int has a value, or returns the default if it doesn't or is at or below the minimum
		/// </summary>
		public static int DefaultInt(int? val, int d, int? min)
		{
			val = val ?? d;
			if (min.HasValue)
				if (val.Value <= min.Value) val = d;

			return val.Value;
		}

		/// <summary>
		/// Checks whether the nullable bool has a value, or returns the default if it doesn't
		/// </summary>
		public static bool DefaultBool(bool? val, bool d)
		{
			val = val ?? d;
			return val.Value;
		}

		/// <summary>
		/// Parses a string to a bool, or returns the default if parsing fails
		/// </summary>
		public static bool DefaultBool(string val, bool d)
		{
			bool tmp = d;

			if (Boolean.TryParse(val, out tmp))
				return tmp;

			return d;
		}

		/// <summary>
		/// Returns a flat (no line breaks) string or a default value if empty
		/// </summary>
		public static string DefaultFlatString(string val, string d, int l = 255)
		{
			return Util.DefaultString(val, d, l).Replace(Environment.NewLine, "");
		}

		/// <summary>
		/// Checks whether the string has a value, or returns the default if it's null or empty; truncates to the given length
		/// </summary>
		public static string DefaultString(string val, string d, int l = 255)
		{
			if (string.IsNullOrEmpty(val)) val = d;

			// Normalize line endings
			val = val.Replace("\r\n", "\n").Replace("\r", "\n")
				.Replace("\n", Environment.NewLine);

			if ((val.Length > 0) && (val.Length > l))
				val = val.Substring(0, l-1);

			return val;
		}

		/// <summary>
		/// Converts a string value to a DateTime object or returns the default value on failure
		/// </summary>
		public static DateTime DefaultDate(string val, DateTime d)
		{
			DateTime dt;
			if (DateTime.TryParse(val, out dt))
				return dt;

			return d;
		}

		/// <summary>
		/// Converts a nullable date value to a DateTime object or returns the default value on failure
		/// </summary>
		public static DateTime DefaultDate(DateTime? val, DateTime d)
		{
			return (val.HasValue) ? val.Value : d;
		}

		/// <summary>
		/// Gets the current user's IP address
		/// </summary>
		public static string GetUserIP()
		{
			// Connecting through a proxy?
			string ip = HttpContext.Current.Request.ServerVariables["HTTP_X_FORWARDED_FOR"];

			// None found
			if (string.IsNullOrEmpty(ip) || ip.ToLower() == "unknown")
				ip = HttpContext.Current.Request.ServerVariables["REMOTE_ADDR"];

			return ip;
		}

		public static string GetEmail(string v)
		{
			string email = @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*" +
							@"@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$";

			if (!string.IsNullOrEmpty(v))
				if (Regex.IsMatch(v, email))
					return v;

			// Didn't match the email format, so send a cleaned string instead
			return Util.DefaultFlatString(v, "");
		}
	}
}
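
For reference, here’s roughly how the more common helpers end up being called (the values are hypothetical, just to show the signatures):

// Slug from a title, falling back to "untitled", max 45 chars, lowercased
string slug = Util.GetSlug("Hello World! This is a Post", "untitled", 45, true);

// SHA-256 checksum of a string (pass isFile = true to hash a file on disk instead)
string sum = Util.GetChecksum("some text", "sha256");

// Comma delimited tags into a cleaned, de-duplicated array
string[] tags = Util.GetTags("C#, ASP.NET, c#", true);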

PostRepository. This one’s a bit long…

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Road.Helpers;

namespace Road.Models
{
	public class PostRepository
	{
		// Common DataContext
		private readonly CMDataContext db;

		/// <summary>
		/// Constructor
		/// </summary>
		/// <param name="context">Global context</param>
		public PostRepository(CMDataContext context)
		{
			this.db = context;
		}

		#region Topic display methods


		/// <summary>
		/// Gets a topic by the given id and finds corresponding replies if any
		/// </summary>
		/// <param name="id">Topic id to search</param>
		/// <param name="index">Current page index</param>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved replies (optional)</param>
		/// <param name="status">Array of reply status types (optional)</param>
		/// <param name="newestFirst">Sort by newest replies first (optional)</param>
		public Topic TopicById(int id, int index, int limit,
			bool unapproved = false, ReplyStatus[] status = null, bool newestFirst = false)
		{
			var query = from p in db.Posts
						where p.PostId == id
						select p;

			Topic topic = TopicQuery(query).FirstOrDefault();

			// We have a topic and replies were also requested
			if (topic != null && limit > 0)
			{
				var rquery = from p in db.Posts
							 join pa in db.PostRelations on p.PostId equals pa.PostId
							 where pa.ParentId == topic.Id && p.PostId != topic.Id
							 select p;

				topic.Replies =
					ReplyQuery(rquery, unapproved, status, newestFirst).ToPagedList(index, limit);
			}

			return topic;
		}

		/// <summary>
		/// Gets a list of topics (most basic request, usually for frontpage)
		/// </summary>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved topics (optional)</param>
		/// <param name="status">Array of topic status types (optional)</param>
		/// <param name="newestFirst">Sort by newest topics first (optional)</param>
		/// <returns>List of topics</returns>
		public List<Topic> TopicList(int limit, bool unapproved = false,
			TopicStatus[] status = null, bool newestFirst = true)
		{
			var query = from p in db.Posts
						select p;

			return TopicQuery(query, unapproved, status, newestFirst).Take(limit).ToList();
		}

		/// <summary>
		/// Gets a paged list of topics
		/// </summary>
		/// <param name="index">Current page index</param>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved topics (optional)</param>
		/// <param name="status">Array of topic status types (optional)</param>
		/// <param name="newestFirst">Sort by newest topics first (optional)</param>
		/// <returns>Paged list of topics</returns>
		public PagedList<Topic> TopicPageList(int index, int limit,
			bool unapproved = false, TopicStatus[] status = null, 
			bool newestFirst = true)
		{
			var query = from p in db.Posts
						select p;

			return TopicQuery(query, unapproved, status, newestFirst).ToPagedList(index, limit);
		}

		/// <summary>
		/// Gets a paged list of topics belonging to a tag(s)
		/// (This uses the lowercase TagName property)
		/// </summary>
		/// <param name="tag">Array of tags to search</param>
		/// <param name="index">Current page index</param>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved topics (optional)</param>
		/// <param name="status">Array of topic status types (optional)</param>
		/// <param name="newestFirst">Sort by newest topics first (optional)</param>
		/// <returns>Paged list of topics</returns>
		public PagedList<Topic> TopicsByTag(string[] tag, int index, int limit,
			bool unapproved = false, TopicStatus[] status = null, bool newestFirst = true)
		{
			var query = from t in db.PostTags
						join pt in db.PostTagRelations on t.TagId equals pt.TagId
						join p in db.Posts on pt.PostId equals p.PostId
						where tag.Contains(t.TagName)
						select p;

			return TopicQuery(query, unapproved, status, newestFirst).ToPagedList(index, limit);
		}

		/// <summary>
		/// Gets an individual topic belonging to a list of tag(s)
		/// </summary>
		/// <param name="tagId">Array of tag Ids to search</param>
		/// <param name="index">Current page index</param>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved topics (optional)</param>
		/// <param name="status">Array of topic status types (optional)</param>
		/// <param name="newestFirst">Sort by newest topics first (optional)</param>
		/// <returns>Paged list of topics</returns>
		public PagedList<Topic> TopicsByTagId(int[] tagId, int index, int limit,
			bool unapproved = false, TopicStatus[] status = null, bool newestFirst = true)
		{
			var query = from t in db.PostTags
						join pt in db.PostTagRelations on t.TagId equals pt.TagId
						join p in db.Posts on pt.PostId equals p.PostId
						where tagId.Contains(t.TagId)
						select p;

			return TopicQuery(query, unapproved, status, newestFirst).ToPagedList(index, limit);
		}

		/// <summary>
		/// Gets a paged list of topics by the search criteria
		/// </summary>
		/// <param name="search">Title and body search terms</param>
		/// <param name="index">Current page index</param>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved topics (optional)</param>
		/// <param name="status">Array of topic status types (optional)</param>
		/// <param name="newestFirst">Sort by newest topics first (optional)</param>
		/// <returns>Paged list of topics</returns>
		public PagedList<Topic> TopicsBySearch(string search, int index, int limit,
			bool unapproved = false, TopicStatus[] status = null, bool newestFirst = true)
		{
			var query = from p in db.Posts
						where p.BodyText.Contains(search) || p.Title.Contains(search)
						select p;

			return TopicQuery(query, unapproved, status, newestFirst).ToPagedList(index, limit);
		}

		/// <summary>
		/// Gets a paged list of replies by the search criteria 
		/// (only searches the bodytext)
		/// </summary>
		/// <param name="search">Search terms</param>
		/// <param name="index">Current page index</param>
		/// <param name="limit">Page size limit</param>
		/// <param name="unapproved">Include unapproved topics (optional)</param>
		/// <param name="status">Array of topic status types (optional)</param>
		/// <param name="newestFirst">Sort by newest topics first (optional)</param>
		/// <returns>Paged list of topics</returns>
		public PagedList<Reply> RepliesBySearch(string search, int index, int limit,
			bool unapproved = false, ReplyStatus[] status = null, bool newestFirst = true)
		{
			var query = from p in db.Posts
						where p.BodyText.Contains(search)
						select p;

			return ReplyQuery(query, unapproved, status, newestFirst).ToPagedList(index, limit);
		}

		#endregion

		#region Save methods

		/// <summary>
		/// Saves or creates a new reply under the given topic
		/// </summary>
		/// <param name="topic">Topic the reply belongs to</param>
		/// <param name="reply">Reply to save</param>
		public Reply SaveReply(Topic topic, Reply reply)
		{
			Post p = null;
			DateTime dt = DateTime.UtcNow;

			if (reply.Id != 0)
			{
				p = (from post in db.Posts
					where post.PostId == reply.Id
					select post).FirstOrDefault();
			}
			else
			{
				p = new Post();
				p.CreatedDate = dt;
				p.ReplyCount = 0;
				p.ViewCount = 0;
				db.Posts.InsertOnSubmit(p);
			}

			p.Approved = reply.Approved;
			p.Status = (byte)reply.Status;
			p.Threshold = reply.Threshold;
			p.LastModified = dt;
			p.BodyHtml = reply.Body;
			p.BodyText = reply.Summary;

			// Save reply
			db.SubmitChanges();

			// If this is a new reply...
			if (p.PostId > 0 && reply.Id == 0)
			{
				// We now have an Id to set
				reply.Id = p.PostId;

				// Create Author, PostRelation and PostAuthor relationships
				Author a = new Author();
				a.MemberId = reply.CreatedBy.Id;
				a.AuthorIP = reply.CreatedBy.IP;
				a.AuthorName = reply.CreatedBy.Name;
				a.AuthorEmail = reply.CreatedBy.Email;
				a.AuthorWeb = reply.CreatedBy.Web;
				db.Authors.InsertOnSubmit(a);


				PostRelation pr = new PostRelation();
				pr.ParentId = topic.Id;
				pr.PostId = p.PostId;
				db.PostRelations.InsertOnSubmit(pr);
				db.SubmitChanges();

				if (a.AuthorId > 0)
				{
					PostAuthor pa = new PostAuthor();
					pa.AuthorId = reply.CreatedBy.Id;
					pa.PostId = reply.Id;
					db.PostAuthors.InsertOnSubmit(pa);
					db.SubmitChanges();
				}
			}

			return reply;
		}

		/// <summary>
		/// Saves or creates a new topic
		/// </summary>
		/// <param name="topic">Topic to save</param>
		/// <returns>Returns the saved topic</returns>
		public Topic SaveTopic(Topic topic)
		{
			Post p = null;
			DateTime dt = DateTime.UtcNow;

			if (topic.Id != 0)
			{
				p = (from post in db.Posts
					 where post.PostId == topic.Id
					 select post).FirstOrDefault();
			}
			else
			{
				p = new Post();
				p.CreatedDate = dt;
				db.Posts.InsertOnSubmit(p);
			}

			p.Title = topic.Name;
			p.Approved = topic.Approved;
			p.Status = (byte)topic.Status;
			p.Threshold = topic.Threshold;

			p.LastModified = dt;
			p.BodyHtml = topic.Body;
			p.BodyText = topic.Summary;

			p.ViewCount = topic.ViewCount;
			p.ReplyCount = topic.ReplyCount;

			// Save
			db.SubmitChanges();

			// If this is a new topic...
			if (p.PostId > 0 && topic.Id == 0)
			{
				// Set the Id, now that we have one
				topic.Id = p.PostId;

				// Create author and set relationship
				Author a = new Author();
				a.MemberId = topic.CreatedBy.MemberId;
				a.AuthorIP = topic.CreatedBy.IP;
				a.AuthorName = topic.CreatedBy.Name;
				a.AuthorEmail = topic.CreatedBy.Email;
				a.AuthorWeb = topic.CreatedBy.Web;
				db.Authors.InsertOnSubmit(a);

				PostRelation pr = new PostRelation();
				pr.ParentId = p.PostId; // Same since it's a topic
				pr.PostId = p.PostId;
				db.PostRelations.InsertOnSubmit(pr);

				db.SubmitChanges();

				if (a.AuthorId > 0)
				{
					PostAuthor pa = new PostAuthor();
					pa.AuthorId = a.AuthorId;
					pa.PostId = p.PostId;
					db.PostAuthors.InsertOnSubmit(pa);

					db.SubmitChanges();
				}
			}


			topic.Slug = Util.GetSlug(p.Title, "topic");


			ApplyTags(topic.Tags.ToList(), topic);

			return topic;
		}

		#endregion

		#region Tag methods

		/// <summary>
		/// Gets a list of tags by a search string 
		/// (usually for tag autocomplete)
		/// </summary>
		/// <param name="tag">Tag search string</param>
		/// <param name="limit">Page size limit</param>
		/// <returns></returns>
		public List<Tag> TagsByName(string tag, int limit)
		{
			var query = from t in db.PostTags
						orderby t.TagName ascending
						where t.TagName.StartsWith(tag)
						select new Tag
						{
							Id = t.TagId,
							Name = t.TagName,
							Slug = t.Slug,
							DisplayName = t.TagName
						};

			if (limit > 0)
				query = query.Take(limit);

			return query.ToList();
		}

		/// <summary>
		/// Associates a list of tags with the given topic
		/// </summary>
		/// <param name="tags">Tags to link to topic</param>
		/// <param name="topic">Target topic</param>
		private void ApplyTags(List<Tag> tags, Topic topic)
		{
			List<PostTagRelation> existing = (from pt in db.PostTagRelations
											  join t in db.PostTags on pt.TagId equals t.TagId
											  where pt.PostId == topic.Id
											  select pt).ToList();

			// Clean existing relationships
			db.PostTagRelations.DeleteAllOnSubmit(existing);
			db.SubmitChanges();

			// Setup the new relationships
			List<PostTagRelation> newrelation = new List<PostTagRelation>();

			// Store the new tags and get the complete list of tags
			tags = StoreTags(tags, topic.CreatedBy);

			foreach (Tag t in tags)
			{
				PostTagRelation tag = new PostTagRelation();
				tag.TagId = t.Id;
				tag.PostId = topic.Id;
				newrelation.Add(tag);
			}
			// Save the new tag relationships
			db.PostTagRelations.InsertAllOnSubmit(newrelation);
			db.SubmitChanges();
		}


		/// <summary>
		/// Finds existing tags and creates new tags with the associated creator if
		/// the tag doesn't exist
		/// </summary>
		/// <param name="tags">List of tags to create/find</param>
		/// <param name="creator">Tag creator</param>
		/// <returns>List of found and newly created tags</returns>
		private List<Tag> StoreTags(List<Tag> tags, Creator creator)
		{
			// Complete list of all tags
			List<Tag> complete = new List<Tag>();

			// Created date
			DateTime dt = DateTime.UtcNow;

			string[] search = tags.Select(tg => tg.Name).ToArray();
			string[] existing = (from t in db.PostTags
								where search.Contains(t.TagName)
								select t.TagName).ToArray();

			// Tags except those already in the database
			string[] newtags = search.Except(existing).ToArray();

			// We have new tags to save
			if (newtags.Length > 0)
			{
				List<PostTag> savetags = (from tg in tags
								   where newtags.Contains(tg.Name)
								   select new PostTag
								   {
									   DisplayName = tg.DisplayName,
									   TagName = tg.DisplayName.ToLower(),
									   Slug = Util.GetSlug(tg.DisplayName, tg.Name),
									   Status = (byte)TagStatus.Open,
									   LastModified = dt,
									   CreatedDate = dt,
									   BodyHtml = "",
									   BodyText = ""
								   }).ToList();

				if (savetags.Count() > 0)
				{
					db.PostTags.InsertAllOnSubmit(savetags);
					db.SubmitChanges();

					// Create author info for each new tag
					Author author = getAuthor(creator);
					List<TagAuthor> authors = (from tg in savetags
											   select new TagAuthor
											   {
												   AuthorId = author.AuthorId,
												   TagId = tg.TagId
											   }).ToList();
					db.TagAuthors.InsertAllOnSubmit(authors);
					db.SubmitChanges();
				}

				// Get all existing and newly inserted tags
				complete = (from tg in db.PostTags
							where search.Contains(tg.TagName)
							select new Tag
							{
								Id = tg.TagId,
								Name = tg.TagName,
								DisplayName = tg.DisplayName,
								Slug = tg.Slug
							}).ToList();
			}

			return complete;
		}

		#endregion


		#region Queries

		/// <summary>
		/// Creates a deferred execution IQueryable to search topics
		/// </summary>
		/// <param name="posts">Initial search query</param>
		/// <returns>Topic IQueryable</returns>
		private IQueryable<Topic> TopicQuery(IQueryable<Post> posts, 
			bool unapproved = false, TopicStatus[] status = null,
			bool newestFirst = true)
		{
			var query = from p in posts
						join au in db.PostAuthors on p.PostId equals au.PostId
						join a in db.Authors on au.AuthorId equals a.AuthorId
						join m in db.Members on au.AuthorId equals m.MemberId into author
						from auth in author.DefaultIfEmpty() // Empty if anonymous post

						let postauthor = getCreator(a, auth)
						let tags = GetTagsForTopic(p.PostId)

						select new { p, postauthor, tags };

			// Include unapproved topics?
			query = (unapproved) ?
				query.Where(r => r.p.Approved == false) :
				query.Where(r => r.p.Approved == true);

			// Any status other than "Open"?
			query = (status != null) ?
				query.Where(r => status.Contains((TopicStatus)r.p.Status)) :
				query.Where(r => r.p.Status == (byte)TopicStatus.Open);

			// Sort by new topics first?
			query = (newestFirst) ?
				query.OrderByDescending(r => r.p.CreatedDate) :
				query.OrderBy(r => r.p.CreatedDate);

			return from r in query
				   select new Topic
				   {
					   Id = r.p.PostId,
					   Name = r.p.Title,
					   CreatedBy = r.postauthor,
					   CreatedDate = r.p.CreatedDate,
					   LastModified = r.p.LastModified,
					   Summary = r.p.BodyText,
					   Slug = Util.GetSlug(r.p.Title, "topic", 50, true),
					   Tags = new LazyList<Tag>(r.tags),
					   ViewCount = r.p.ViewCount,
					   ReplyCount = r.p.ReplyCount,
					   Threshold = (float)r.p.Threshold,
					   Status = (TopicStatus)r.p.Status
				   };
		}

		/// <summary>
		/// Creates a deferred execution IQueryable to search replies
		/// </summary>
		/// <param name="posts">Initial posts query</param>
		/// <param name="status">Status restriction array</param>
		/// <param name="newestFirst">Sort by new replies first</param>
		/// <returns>Reply IQueryable</returns>
		private IQueryable<Reply> ReplyQuery(IQueryable<Post> posts,  bool unapproved, ReplyStatus[] status, bool newestFirst)
		{
			var query = from p in posts
						join au in db.PostAuthors on p.PostId equals au.PostId
						join a in db.Authors on au.AuthorId equals a.AuthorId
						join m in db.Members on au.AuthorId equals m.MemberId into author
						from auth in author.DefaultIfEmpty() // Empty if anonymous post

						let postauthor = getCreator(a, auth)
						select new { p, postauthor };

			// Include unapproved replies?
			query = (unapproved) ?
				query.Where(r => r.p.Approved == false) :
				query.Where(r => r.p.Approved == true);

			// Any status other than "Open"?
			query = (status != null) ?
				query.Where(r => status.Contains((ReplyStatus)r.p.Status)) :
				query.Where(r => r.p.Status == (byte)ReplyStatus.Open);

			// Sort by new replies first?
			query = (newestFirst) ?
				query.OrderByDescending(r => r.p.CreatedDate) :
				query.OrderBy(r => r.p.CreatedDate);

			return from r in query.AsQueryable()
				   select new Reply
				   {
					   Id = r.p.PostId,
					   CreatedBy = r.postauthor,
					   CreatedDate = r.p.CreatedDate,
					   LastModified = r.p.LastModified,
					   Body = r.p.BodyHtml,
					   Threshold = (float)r.p.Threshold
				   };
		}

		/// <summary>
		/// Helper finds the tags for a topic by id
		/// </summary>
		/// <param name="id">Topic id to search</param>
		/// <returns>IQueryable Tag</returns>
		private IQueryable<Tag> GetTagsForTopic(int id)
		{
			return from t in db.PostTags
				   join pt in db.PostTagRelations on t.TagId equals pt.TagId
				   where pt.PostId == id
				   select new Tag
				   {
					   Id = t.TagId,
					   Name = t.TagName,
					   Slug = t.Slug,
					   DisplayName = t.DisplayName,
					   CreatedDate = t.CreatedDate,
					   LastModified = t.LastModified
				   };
		}

		#endregion

		#region Author/Creator Helpers

		/// <summary>
		/// Helper function generates a save friendly Author from a given Creator
		/// </summary>
		/// <param name="c">Creator data</param>
		/// <returns>Author object</returns>
		private static Author getAuthor(Creator c)
		{
			if (c == null)
				return null;

			Author author = new Author();
			if (c.Id > 0)
			{
				author.AuthorId = c.Id;
			}
			else
			{
				author.AuthorEmail = c.Email;
				author.AuthorWeb = c.Web;
			}

			author.AuthorIP = c.IP;
			author.AuthorName = c.Name;

			return author;
		}

		/// <summary>
		/// Finds or creates a Creator object from given author information
		/// </summary>
		/// <param name="a">Saved author information</param>
		/// <param name="m">Optional membership information</param>
		/// <returns>Composite Creator object</returns>
		private static Creator getCreator(Author a, Member m)
		{
			Creator au = new Creator();

			au.IP = a.AuthorIP;
			if (m != null)
			{
				au = getCreator(m);
			}
			else
			{
				au.LastModified = DateTime.MinValue;
				au.Id = a.AuthorId;
				au.Name = a.AuthorName;
				au.DisplayName = a.AuthorName;
				au.Email = a.AuthorEmail;
				au.Web = a.AuthorWeb;
			}
			return au;
		}

		/// <summary>
		/// Helper function generates a Creator object from membership info
		/// </summary>
		/// <param name="m">Member object</param>
		/// <returns>Composite Creator object</returns>
		private static Creator getCreator(Member m)
		{
			return new Creator
			{
				Id = m.MemberId,
				Name = m.Username,
				DisplayName = m.DisplayName,
				Email = m.Email,
				Web = m.Web,
				Slug = Util.GetSlug(m.Username, m.MemberId.ToString(), 70),
				CreatedDate = m.CreatedDate,
				LastModified = m.LastActivity,
				Avatar = m.Avatar
			};
		}

		#endregion
	}
}
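
And a rough sketch of how I intend to wire it up (untested, like the repository itself, and assuming the rest of the domain model such as CMDataContext and PagedList is in place):

using (CMDataContext db = new CMDataContext())
{
	PostRepository repo = new PostRepository(db);

	// First page of topics, 20 per page, newest first
	PagedList<Topic> topics = repo.TopicPageList(1, 20);

	// A single topic with its first 10 replies
	Topic topic = repo.TopicById(42, 1, 10);
}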

Brainfart Saturday

I need to switch coffee brands…

Yesterday, for no apparent reason, I thought it might be a good idea to create a file transfer app that will asynchronously send files in slices (chunks) and still ensure the receiving party’s checksum matches the sender’s. I was planning on adding public key security to the whole thing, but I can’t seem to get past step 1 without issues.

I tried splitting a file into slices and merging them back immediately after, and it seems to work just fine for small files.

Blob blob = FileUtils.GetBlob("C:\\Users\\Portable\\Downloads\\smalldoc.pdf");

FileUtils.SplitSlices(ref blob);

// Change the filename
blob.Path = "C:\\Users\\Portable\\Downloads\\smalldocCopy.pdf";
// Merge the slices back into one under the new filename
FileUtils.MergeSlices(ref blob);
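
Comparing the two with the GetChecksum helper (it’s in the FileUtils class further down) looks something like this:

string original = FileUtils.GetChecksum("C:\\Users\\Portable\\Downloads\\smalldoc.pdf", "sha256", true);
string copy = FileUtils.GetChecksum("C:\\Users\\Portable\\Downloads\\smalldocCopy.pdf", "sha256", true);
Console.WriteLine(original == copy ? "Checksums match" : "Checksums differ");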

The head-scratching starts when splitting and merging a large-ish file (50 MB+). The “Size on disk” is identical to the original, but the “Size” is smaller than the original. Meaning it’s taking up the same disk allocation, but some bits got lost along the way. The funny thing is that if I then split the merged copy and merge it again into another copy, this third copy is identical to the second. So the original is still the odd one out.

I can’t seem to find the reason for this other than I’m missing something really obvious or this is a platform issue. I hope it’s the former because cursing at the latter feels… weird.

Here’s the “Slice” class where data would be stored and sent/received async.

public class Slice
{
	// Slice Id (Checksum / Currently not used)
	public string Id { get; set; }
	
	// File(Blob) Id (Checksum)
	public string SourceId { get; set; }

	// Blob location index
	public int Index { get; set; }

	// Slice byte length
	public int Size { get; set; }

	// Slice data
	public string Data { get; set; }

	public bool Complete { get; set; }

	public Slice()
	{
		Complete = false;
	}
}

And the “Blob” class that uses the above slice(s)

using System.Collections.Generic;

public class Blob
{
	// File Id (Checksum)
	public string Id { get; set; }

	// Slice collection
	public SortedDictionary<int, Slice> Slices { get; set; }

	// Save path
	public string Path { get; set; }

	// File size
	public int Size { get; set; }

	// Assembled file size
	public int CompletedSize { get; set; }

	public Blob()
	{
		Slices = new SortedDictionary<int, Slice>();
		Size = 0;
		CompletedSize = 0;
	}
}

And of course, the uglier-than-sin FileUtils class (those with weak hearts, avert your eyes).

using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class FileUtils
{
	private static int _blockSize = 65536;

	public static void SplitSlices(ref Blob blob)
	{
		FileInfo info = new FileInfo(blob.Path);
		string source = info.FullName;
		string dir = info.DirectoryName;

		using (FileStream fs = new FileStream(source, FileMode.Open, FileAccess.Read))
		{
			foreach (KeyValuePair<int, Slice> kv in blob.Slices)
			{
				Slice slice = kv.Value;
				byte[] data = new byte[slice.Size];
				int read = 0;

				fs.Seek(slice.Index, SeekOrigin.Begin);
				if ((read = fs.Read(data, 0, slice.Size)) > 0)
				{
					WriteSlice(ref slice, data, dir);
				}
			}
		}
	}

	public static void WriteSlice(ref Slice slice, byte[] data, string dir)
	{
		string slicePath = SourceFromSlice(slice, dir);
		using (FileStream ofs =
			new FileStream(slicePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
		{
			ofs.Write(data, 0, slice.Size);
			slice.Complete = true;
		}
	}

	public static void MergeSlices(ref Blob blob)
	{
		FileInfo blobInfo = new FileInfo(blob.Path);
		string dir = blobInfo.DirectoryName;

		using (FileStream outfs =
			new FileStream(blobInfo.FullName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
		{
			foreach (KeyValuePair<int, Slice> kv in blob.Slices)
			{
				Slice slice = kv.Value;
				if (slice.Complete)
				{
					byte[] bytes = ReadSlice(ref slice, dir, true);
					outfs.Seek(slice.Index, SeekOrigin.Begin);
					outfs.Write(bytes, 0, slice.Size);

					// Update the completed count
					blob.CompletedSize += slice.Size;
				}
			}
		}
	}

	public static byte[] ReadSlice(ref Slice slice, string dir, bool delAfterReading)
	{
		int read = 0;
		byte[] data = new byte[slice.Size];
		string slicePath = SourceFromSlice(slice, dir);

		using (FileStream ifs = new FileStream(slicePath, FileMode.Open, FileAccess.Read))
		{
			read = ifs.Read(data, 0, slice.Size);
		}

		if (delAfterReading)
			File.Delete(slicePath);

		return data;
	}

	public static void InitBlob(ref Blob blob)
	{
		int sliceCount = 0;
		int sliceSize;

		// Catch remaining byte length after splitting
		int remainder = (blob.Size > _blockSize)? (blob.Size % _blockSize) : 0;

		// If this is a big file that can be split...
		if (blob.Size > _blockSize)
		{
			sliceCount = blob.Size / _blockSize;
			sliceSize = blob.Size / sliceCount;
		}
		else // Slice size same as blob size and only one slice needed
		{
			sliceCount = 1;
			sliceSize = blob.Size;
		}

		for (int i = 0; i < sliceCount; i++)
		{
			Slice slice = new Slice();
			slice.SourceId = blob.Id;
			slice.Size = (i == 0) ? sliceSize + remainder : sliceSize;
			slice.Index = i * slice.Size;

			blob.Slices.Add(slice.Index, slice);
		}
	}

	public static Blob GetBlob(string source)
	{
		Blob blob = new Blob();
		FileInfo info = new FileInfo(source);

		blob.Id = FileId(source);
		blob.Size = LengthToInt(info.Length);
		blob.Path = info.FullName;
		blob.CompletedSize = LengthToInt(info.Length);

		InitBlob(ref blob);
		return blob;
	}

	public static string GetChecksum(string source, string mode = "md5", bool isFile = false)
	{
		byte[] bytes = { };
		Stream fs;

		if (isFile)
			fs = new BufferedStream(File.OpenRead(source), 120000);
		else
			fs = new MemoryStream(Encoding.UTF8.GetBytes(source));

		switch (mode.ToLower())
		{
			case "sha1":
				using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
					bytes = sha1.ComputeHash(fs);
				break;

			case "sha256":
				using (SHA256CryptoServiceProvider sha256 = new SHA256CryptoServiceProvider())
					bytes = sha256.ComputeHash(fs);
				break;

			case "sha512":
				using (SHA512CryptoServiceProvider sha512 = new SHA512CryptoServiceProvider())
					bytes = sha512.ComputeHash(fs);
				break;

			case "md5":
			default:
				using (MD5CryptoServiceProvider md5 = new MD5CryptoServiceProvider())
					bytes = md5.ComputeHash(fs);
				break;
		}

		// Cleanup
		fs.Close();
		fs = null;

		return BitConverter
			.ToString(bytes)
			.Replace("-", "")
			.ToLower();
	}

	private static int LengthToInt(long length)
	{
		return (int)Math.Ceiling((double)length);
	}

	private static string FileId(string source)
	{
		return GetChecksum(new FileInfo(source).FullName, "sha256", true);
	}

	private static string SourceFromSlice(Slice slice, string dir)
	{
		return dir + "\\" + slice.SourceId + "_" + slice.Index + ".slice";
	}
}

Cryptographically secure One-time Pads

It’s the end of Christmas day… And I’ve got a splitting headache (because I don’t drink and had to watch everything). Luckily I’m not covered in someone else’s puke or urine, which is always nice.  No one drove into any trees (as far as I know), lost their pants or an eye.

Super…

Just finished checking my email and it seems quite a few people were interested in my last post on the one-time pad, and one frequent question was how to make it secure enough for truly private communication. First off, the JavaScript version is out if you want true security. Second, it’s best to implement something in a reusable class that can be used in a web application or a client-side app. I chose an MVC web app in C# for this demonstration because my brain won’t let me stay up much longer.

To make a large enough one-time pad that is both cryptographically secure and still computationally practical, we have to balance how strong the algorithm is against how often it is iterated. In my original example, I opted to shuffle the character pool using the Fisher-Yates algorithm before each character was picked, and the pick itself was also random from the shuffled pool.

Considering JavaScript’s inherent weakness in that it must rely on the browser’s scripting engine for the pseudorandom number, these were necessary improvements even with the added overhead. However, since modern browsers do have optimized JS engines, this wasn’t a huge concern.

The problem is when we move toward much stronger cryptographic functions where computation starts to become non-trivial.

In the case of the RNGCryptoServiceProvider, the class provides a much stronger random number than the simple Random function. The output is spectrally white and is better suited to generating keys. The main point here is that it is cryptographically secure; i.e. it’s random enough to be used for proper encryption while still being a pseudorandom number generator. The downside is that it is more computationally intensive than plain Random.
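
For illustration (my own sketch, separate from the class below), picking a single index into the character pool looks like this with each generator:

string range = "2346789ABCDEFGHKLMNPQRTWXYZ";

// Plain Random: fine for games, not for key material
Random weak = new Random();
int i1 = weak.Next(range.Length);

// RNGCryptoServiceProvider: cryptographically secure, at the cost of more work per call
byte[] buf = new byte[4];
using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
	rng.GetBytes(buf);
int i2 = (int)(BitConverter.ToUInt32(buf, 0) % (uint)range.Length);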

The solution is to not shuffle the pool between character picks, but to randomize between “segments”: the separated 6-8 character “words” in the pad. This strikes a good balance between randomness and speed.

Then the issue comes down to how the pad is generated and presented to the user. Conventionally, this was in the form of plain text or a downloadable file; however, one of the requests I received was to make something that can do both. If the one-time pad can be saved as an image file, it can be sent via encrypted email to the recipient. The risk is that browsers cache images and the like, so the browser cache must be emptied after each generation. If a printer is used, it must also be cleared of its cache, because some printers save a history of printed documents.

The following is a quick class that has both text and image generation. The GetImg function can be used to turn any sort of text into an image byte array, not just pad text.

/**
 * THIS SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

using System;
using System.Security;
using System.Security.Cryptography;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.IO;
using System.Text;

namespace OneTimePad
{
	public class PadModel
	{
		// Render settings
		FontFamily renderFont = FontFamily.GenericMonospace;
		FontStyle renderStyle = FontStyle.Bold;
		GraphicsUnit renderUnit = GraphicsUnit.Pixel;
		int renderSize = 10;

		// Creates formatted pages of keys
		public string RenderPad(int s, int l, string chars)
		{
			// Result
			StringBuilder sb = new StringBuilder();

			// First page
			int p = 1;

			for (int i = 0; i < l; i++)
			{
				// First page number
				if (p == 1 && i == 0)
					sb.Append("1.\n\n");

				// Generate segment
				sb.Append(GenerateRandomString(s, chars));

				// Page, number and segment separation
				if (i % 63 == 62)
				{
					if (i + 1 < l)
					{
						p++;
						sb.Append("\n\n\n");
						sb.Append(p);
						sb.Append(".\n\n");
					}
				}
				else if (i % 7 == 6) // Line separation
				{
					sb.Append("\n");
				}
				else // Segment separation
				{
					sb.Append("   ");
				}
			}

			return sb.ToString();
		}

		// Generates a random string of given length
		public static string GenerateRandomString(int len, string range)
		{
			Byte[] _bytes = new Byte[len];
			char[] _chars = new char[len];

			// Shuffle the range first
			range = Shuffle(range);

			using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
			{
				rng.GetBytes(_bytes);

				for (int i = 0; i < len; i++)
					_chars[i] = range[(int)_bytes[i] % range.Length];
			}

			return new string(_chars);
		}

		// Implements the Fisher-Yates algorithm to shuffle the range
		public static string Shuffle(string range)
		{
			char[] _chars = range.ToCharArray();
			int len = _chars.Length;
			Byte[] _bytes = new Byte[len];

			using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
			{
				for (int i = len - 1; i > 0; i--)
				{
					// New set of random bytes
					rng.GetBytes(_bytes);

					// Swap position i with a random position in [0, i]
					int r = (int)_bytes[i] % (i + 1);
					char c = _chars[i];
					_chars[i] = _chars[r]; // Swap
					_chars[r] = c;
				}
			}

			return new string(_chars);
		}

		// Generates a PNG image of the given text
		public byte[] GetImg(string txt)
		{
			// Blank image
			Bitmap bmp = new Bitmap(1, 1);
			Graphics gfx = Graphics.FromImage(bmp);

			// Font settings
			Font fnt = new Font(renderFont, renderSize, renderStyle, renderUnit);

			// Image dimensions
			int w = (int)gfx.MeasureString(txt, fnt).Width;
			int h = (int)gfx.MeasureString(txt, fnt).Height;

			// New image to text size
			bmp = new Bitmap(bmp, new Size(w, h));

			gfx = Graphics.FromImage(bmp);

			// Defaults
			gfx.Clear(Color.White);
			gfx.SmoothingMode = SmoothingMode.Default;

			gfx.TextRenderingHint =
				System.Drawing.Text.TextRenderingHint.AntiAlias;

			gfx.DrawString(txt, fnt, new SolidBrush(Color.Black), 0, 0);

			// Cleanup
			gfx.Flush();

			MemoryStream ms = new MemoryStream();
			bmp.Save(ms, ImageFormat.Png);
			return ms.ToArray();
		}
	}
}

 

To use this in an MVC application, for example, I would use the following ActionResult.

public ActionResult Index(FormCollection collection)
{
	// Generate
	string chars = "2346789ABCDEFGHKLMNPQRTWXYZ";

	// Default values
	int ds = 8, dl = 882;

	// Store (s = segment size, l = number of segments)
	int s = 0, l = 0;

	// Assign values (single | so both fields get parsed)
	if (Int32.TryParse(collection["s"], out s) | Int32.TryParse(collection["l"], out l))
	{
		// Set defaults to practical limits
		s = (s > 0 && s <= 20) ? s : ds;
		l = (l > 0 && l <= 1000) ? l : dl;
	}
	else
	{
		// Set defaults
		s = ds;
		l = dl;
	}

	PadModel model = new PadModel();
	string txt = model.RenderPad(s, l, chars);
	string img = Convert.ToBase64String(model.GetImg(txt));

	ViewData["s"] = s; // Segment size to be shown
	ViewData["l"] = l; // Total segment count
	ViewData["pad"] = txt; // Plain text version of the pad

	// Instead of sending binary data, I opted to use a base64 encoded image instead
	// since most modern browsers support it anyway
	ViewData["img"] = string.Format("data:image/png;base64,{0}", img);

	return View();
}

 

And in the View page, we can display both the plain text version and the image version of the pad using the following two lines

<img src="<%= ViewData["img"] %>" alt="PadImage" />
<pre><%= ViewData["pad"] %></pre>

 

An added benefit of sending a base64 encoded image instead of a binary file is that the base64 text itself can be sent via encrypted email and reconstituted by the recipient. This added layer of security is also handy if the user intends to hide the pad in its entirety in another data file.
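
On the receiving end, turning that text back into an image file only takes a couple of lines (a sketch with hypothetical names; strip the data URI prefix if it's there, then decode):

static void SavePadImage(string received, string path)
{
	int comma = received.IndexOf(',');
	string b64 = (comma >= 0) ? received.Substring(comma + 1) : received;
	File.WriteAllBytes(path, Convert.FromBase64String(b64));
}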

The image version should produce something like this…

An image based one time pad that can be printed

 

I generated that pad just a little while ago and printed it out on my laser printer. Of course the quality is rubbish because my printer is low-res and only meant for office documents, but with a proper high resolution inkjet, we can get much finer details.

 

One-time pad close-up

 

Because the sheets are numbered, the previous message can contain details on which sheet to use for the next communication. Ideally, these pages should be cut into individual sheets and destroyed after each use.

Hope this was helpful.

Now I’m off to hit the hay!

ASP.Net MVC 3 Script license WTF?!

I had a few breaks between visits so I decided to re-write some of my old work in MVC 3 and Razor. I was going through all the included files in the Scripts folder when I came across Modernizr. I admit that I haven’t really looked into it that much since most of my client work involves CSS2 and XHTML, not CSS3 and HTML5. And the few times I needed HTML5 compatibility for <video> and such it was included as needed.

For the record, Modernizr is offered under a BSD/MIT license.

Imagine my surprise when, in Visual Web Developer Express 2010, I opened up modernizr-1.7.js to find this at the top:

/*!
* Note: While Microsoft is not the author of this file, Microsoft is
* offering you a license subject to the terms of the Microsoft Software
* License Terms for Microsoft ASP.NET Model View Controller 3.
* Microsoft reserves all other rights. The notices below are provided
* for informational purposes only and are not the license terms under
* which Microsoft distributed this file.
*
* Modernizr v1.7
* http://www.modernizr.com
*
* Developed by:
* - Faruk Ates  http://farukat.es/
* - Paul Irish  http://paulirish.com/
*
* Copyright (c) 2009-2011
*/

After that, I looked into the rest of the included script files and found the same in jQuery and jQuery UI as well, and they are originally MIT/GPL. Now I’ve either been living under a rock (actually living “out-of-town” for a while) or this is a sudden inclusion not present in any pre-MVC3 projects, because I don’t recall seeing anything like this before.

So Microsoft is re-licensing the original code because it is included with an MVC 3 project template? Can they even do that just because the files ship in their project template (and what about consent from the original authors)? Is this only because it was included in the template, or does it also apply when I use the original code instead of the provided copies, as long as I’m still using it with other Microsoft provided script files?

I really don’t like seeing surprises like this because it’s not a technical problem; it’s a legalese problem, and I really hate having to choose code based on licenses. I’m now tempted to simply delete the entire contents of the Scripts folder and download everything from scratch.