AntiXss 4.2 Breaks everything

This is one of those situations where none of your available options are good and your least harmful alternative is to shoot yourself in the foot at a slightly odd angle so as to only lose the little toe and not the big one.

All of this happened when Microsoft revealed in January that their AntiXss library, now known as the Microsoft Web Protection Library (never seen a more ironic combination of words), had a vulnerability, and like all obedient drones, we must update immediately to avoid shooting ourselves in the big toe. The problem is that updating will cause you to lose your little toe.

You see, the new library BREAKS EVERYTHING and eats your children.

Update 11/14/2013:
A new HTML sanitizer is now available for PHP.

I WILL EAT ALL YOUR TAGS!!!

I think the problem is best described by someone who left a comment at the project discussion board.

I was using an old version of Anti-XSS with a rich text editor (CkEditor). It was working very great. But when upgrading to latest version, I discovered the new sanitized is way too much aggressive and is removing almost everything “rich” in the rich editor, specially colors, backgrounds, font size, etc… It’s a disaster for my CMS!

Is there any migration path I can use to keep some of the features of the rich text editor and having at least minimal XSS protection ?

Lovely eh?

Here’s the response from the coordinator.

CSS will always be stripped now – it’s too dangerous, but in other cases it is being too greedy, dropping hrefs from a tags for example. That is being looked at.

I know this may be a strange idea to comprehend for the good folks who developed the library, but you see in the civilized world, many people tend to use WYSIWYG in their projects so as to not burden their users with tags. These days more people are familiar with rudimentary HTML, but when you just want to quickly make a post, comment or otherwise share something, it’s nice to know there’s an editor that can accommodate rich formatting. This is especially true on a mobile device, where switching from text to special characters for tags is still annoying.

Those WYSIWYGs invariably use CSS and inline styles to accomplish this rich formatting, thereby making your assertion ridiculous and this library now completely impractical.

A very quick test on the 4.2 Sanitizer shows that it totally removes strong tags, h1 tags, section tags and as mentioned above strips href attributes from anchor tags. At this rate the output will soon be string.Empty. I hope that the next version will allow basic markup tags and restore the href to anchors.

So in other words, AntiXss is now like an antidepressant. You’ll feel a lot better after taking it, but you may end up killing yourself.

And that’s not all…

I would have kept my mouth shut about this, even though I’ve had my doubts about depending on the library over something DIY, but since I work with a bunch of copycat monkeys, I have to use whatever everyone else deems worthy of being included in a project (common sense be damned). I thought surely at least the older versions would still be available, but no:

It’s company policy I’m afraid. The source will remain though, so if you desperately wanted you could download and compile your own versions of older releases.

Of course, I lost my temper at that. Since I’m forced to use this library, and one of the devs went ahead and upgraded without backing up the old version or finding out exactly how the vulnerability would affect us, I had to go treasure hunting across three computers to find 4.0 just after getting home.

AntiXss 4.2 is stupid and so is Microsoft.

Here’s my current workaround until MS comes up with a usable alternative. It uses the HtmlAgilityPack (which, thankfully, hasn’t contracted rabies yet) together with the 4.0 library.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

namespace Arcturus.Helpers
{
	/// <summary>
	/// This is an HTML cleanup utility combining the benefits of the
	/// HtmlAgilityPack to parse raw HTML and the AntiXss library
	/// to remove potentially dangerous user input.
	///
	/// Additionally it uses a list created by Robert Beal to limit
	/// the number of allowed tags and attributes to a sensible level
	/// </summary>
	public sealed class HtmlUtility
	{
		private static volatile HtmlUtility _instance;
		private static readonly object _root = new object();

		private HtmlUtility() { }

		public static HtmlUtility Instance
		{
			get
			{
				if (_instance == null)
					lock (_root)
						if (_instance == null)
							_instance = new HtmlUtility();

				return _instance;
			}
		}

		// Original list courtesy of Robert Beal :
		// http://www.robertbeal.com/

		private static readonly Dictionary<string, string[]> ValidHtmlTags =
			new Dictionary<string, string[]>
        {
            {"p", new string[]          {"style", "class", "align"}},
            {"div", new string[]        {"style", "class", "align"}},
            {"span", new string[]       {"style", "class"}},
            {"br", new string[]         {"style", "class"}},
            {"hr", new string[]         {"style", "class"}},
            {"label", new string[]      {"style", "class"}},

            {"h1", new string[]         {"style", "class"}},
            {"h2", new string[]         {"style", "class"}},
            {"h3", new string[]         {"style", "class"}},
            {"h4", new string[]         {"style", "class"}},
            {"h5", new string[]         {"style", "class"}},
            {"h6", new string[]         {"style", "class"}},

            {"font", new string[]       {"style", "class",
				"color", "face", "size"}},
            {"strong", new string[]     {"style", "class"}},
            {"b", new string[]          {"style", "class"}},
            {"em", new string[]         {"style", "class"}},
            {"i", new string[]          {"style", "class"}},
            {"u", new string[]          {"style", "class"}},
            {"strike", new string[]     {"style", "class"}},
            {"ol", new string[]         {"style", "class"}},
            {"ul", new string[]         {"style", "class"}},
            {"li", new string[]         {"style", "class"}},
            {"blockquote", new string[] {"style", "class"}},
            {"code", new string[]       {"style", "class"}},
            {"pre", new string[]        {"style", "class"}},

            {"a", new string[]          {"style", "class", "href", "title"}},
            {"img", new string[]        {"style", "class", "src", "height",
				"width", "alt", "title", "hspace", "vspace", "border"}},

            {"table", new string[]      {"style", "class"}},
            {"thead", new string[]      {"style", "class"}},
            {"tbody", new string[]      {"style", "class"}},
            {"tfoot", new string[]      {"style", "class"}},
            {"th", new string[]         {"style", "class", "scope"}},
            {"tr", new string[]         {"style", "class"}},
            {"td", new string[]         {"style", "class", "colspan"}},

            {"q", new string[]          {"style", "class", "cite"}},
            {"cite", new string[]       {"style", "class"}},
            {"abbr", new string[]       {"style", "class"}},
            {"acronym", new string[]    {"style", "class"}},
            {"del", new string[]        {"style", "class"}},
            {"ins", new string[]        {"style", "class"}}
        };

		/// <summary>
		/// Takes raw HTML input and cleans against a whitelist
		/// </summary>
		/// <param name="source">Html source</param>
		/// <returns>Clean output</returns>
		public string SanitizeHtml(string source)
		{
			HtmlDocument html = GetHtml(source);
			if (html == null) return String.Empty;

			// All the nodes
			HtmlNode allNodes = html.DocumentNode;

			// Select whitelist tag names
			string[] whitelist = (from kv in ValidHtmlTags
								  select kv.Key).ToArray();

			// Scrub tags not in whitelist
			CleanNodes(allNodes, whitelist);

			// Filter the attributes of the remaining nodes
			foreach (KeyValuePair<string, string[]> tag in ValidHtmlTags)
			{
				IEnumerable<HtmlNode> nodes = (from n in allNodes.DescendantsAndSelf()
											   where n.Name == tag.Key
											   select n);

				// No nodes? Skip.
				if (nodes == null) continue;

				foreach (var n in nodes)
				{
					// No attributes? Skip.
					if (!n.HasAttributes) continue;

					// Check every attribute on this node against the whitelist
					// (copied to an array first, since we remove attributes as we go)
					HtmlAttribute[] attr = n.Attributes.ToArray();
					foreach (HtmlAttribute a in attr)
					{
						if (!tag.Value.Contains(a.Name))
						{
							a.Remove(); // Attribute wasn't in the whitelist
						}
						else
						{
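							// *** New workaround. This wasn't necessary with the old library.
							// Whitelist the URL scheme instead of deleting the word "javascript":
							// Replace() is case-sensitive ("JavaScript:" slips through) and
							// "javajavascriptscript:" collapses back into "javascript:".
							if (a.Name == "href" || a.Name == "src") {
								string url = (a.Value ?? "").Replace("\r", "").Replace("\n", "").Trim();

								// For a relative URL, the part before the first '/', '?' or '#'
								// must not contain ':' (a scheme) or '&' (a possibly encoded one, e.g. "&#58;")
								string head = url.Split('/', '?', '#')[0];

								bool safe =
									url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
									url.StartsWith("https://", StringComparison.OrdinalIgnoreCase) ||
									url.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) ||
									head.IndexOfAny(new[] { ':', '&' }) < 0;

								a.Value = safe ? url : "";
							}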
							else if (a.Name == "class" || a.Name == "style")
							{
								a.Value =
									Microsoft.Security.Application.Encoder.CssEncode(a.Value);
							}
							else
							{
								a.Value =
									Microsoft.Security.Application.Encoder.HtmlAttributeEncode(a.Value);
							}
						}
					}
				}
			}

			// *** New workaround (DO NOTHING HAHAHA! Fingers crossed)
			return allNodes.InnerHtml;

			// *** Original code below

			/*
			// Anything we missed will get stripped out
			return
				Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(allNodes.InnerHtml);
			 */
		}

		/// <summary>
		/// Takes a raw source and removes all HTML tags
		/// </summary>
		/// <param name="source">Raw HTML source</param>
		/// <returns>Plain text with all markup removed</returns>
		public string StripHtml(string source)
		{
			source = SanitizeHtml(source);

			// No need to continue if we have no clean Html
			if (String.IsNullOrEmpty(source))
				return String.Empty;

			HtmlDocument html = GetHtml(source);
			StringBuilder result = new StringBuilder();

			// For each node, extract only the innerText
			foreach (HtmlNode node in html.DocumentNode.ChildNodes)
				result.Append(node.InnerText);

			return result.ToString();
		}

		/// <summary>
		/// Recursively delete nodes not in the whitelist
		/// </summary>
		private static void CleanNodes(HtmlNode node, string[] whitelist)
		{
			if (node.NodeType == HtmlNodeType.Element)
			{
				if (!whitelist.Contains(node.Name))
				{
					node.ParentNode.RemoveChild(node);
					return; // We're done
				}
			}

			if (node.HasChildNodes)
				CleanChildren(node, whitelist);
		}

		/// <summary>
		/// Apply CleanNodes to each of the child nodes
		/// </summary>
		private static void CleanChildren(HtmlNode parent, string[] whitelist)
		{
			for (int i = parent.ChildNodes.Count - 1; i >= 0; i--)
				CleanNodes(parent.ChildNodes[i], whitelist);
		}

		/// <summary>
		/// Helper function that returns an HTML document from text
		/// </summary>
		private static HtmlDocument GetHtml(string source)
		{
			HtmlDocument html = new HtmlDocument();
			html.OptionFixNestedTags = true;
			html.OptionAutoCloseOnEnd = true;
			html.OptionDefaultStreamEncoding = Encoding.UTF8;

			html.LoadHtml(source);

			// Encode any code blocks independently so they won't
			// be stripped out completely when we do a final cleanup
			foreach (var n in html.DocumentNode.DescendantNodesAndSelf())
			{
				if (n.Name == "code") {
					//** Code tag attribute vulnerability fix 28-9-12 (thanks to Natd)
					HtmlAttribute[] attr = n.Attributes.ToArray();
					foreach (HtmlAttribute a in attr) {
						if (a.Name != "style" && a.Name != "class")  { a.Remove(); }
					} //** End fix
					n.InnerHtml =
						Microsoft.Security.Application.Encoder.HtmlEncode(n.InnerHtml);
				}
			}

			return html;
		}
	}
}

This is a singleton class with a private constructor, so you need to go through Instance to get hold of it.

For example:

HtmlUtility util = HtmlUtility.Instance;
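The double-checked lock in Instance works, but if you're on .NET 4 or later (where Lazy<T> was introduced), Lazy<T> can do the same thread-safe, create-on-first-use job with less ceremony. This is a minimal sketch of just the singleton plumbing; `LazyHtmlUtility`, `IsCreated` and the `Arcturus.Helpers.Sketches` namespace are hypothetical names for illustration, and the sanitizing methods are left out.

```csharp
using System;

namespace Arcturus.Helpers.Sketches
{
    // Hypothetical stand-in for HtmlUtility: same singleton shape, but
    // Lazy<T> handles the thread-safe, on-first-use construction that
    // the hand-written double-checked lock takes care of above.
    public sealed class LazyHtmlUtility
    {
        // The Lazy<T>(Func<T>) constructor defaults to
        // LazyThreadSafetyMode.ExecutionAndPublication, so the factory
        // runs at most once even if several threads race on Instance.
        private static readonly Lazy<LazyHtmlUtility> _instance =
            new Lazy<LazyHtmlUtility>(() => new LazyHtmlUtility());

        private LazyHtmlUtility() { }

        public static LazyHtmlUtility Instance
        {
            get { return _instance.Value; }
        }

        // Hypothetical helper: reports whether Instance has been built yet
        public static bool IsCreated
        {
            get { return _instance.IsValueCreated; }
        }
    }
}
```

Callers use it the same way (`LazyHtmlUtility util = LazyHtmlUtility.Instance;`); only the way the single instance gets built changes.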

7:40AM… Bedtime!

Update : September 28.

Natd discovered a vulnerability in this code that allowed onclick attributes to be added to the code tag itself. Fixed.


Poopsicles

I had forgotten how much I used to enjoy TV before work crushed my soul and ate my as-yet-unborn children.

Then I remembered that during the mid-2000s, TV made a brief watchability comeback of sorts, though now I see that not only have we reverted to square one, we’ve already packed away the pieces.

I started watching TV again… I mean really watching, instead of leaving it on in the background while snorting emails and shaking my fist in delirium at the invisible code-monkey-demons hovering over my over-caffeinated head, secretly inserting bugs into my work. I finally thought about what it was that made me so irritated about TV and started paying attention to find the cause. This was only a slightly less traumatic and pointless experience than self-trepanation.

Reality TV (otherwise known as a compendium of caustic, cacophonous kaka) at first didn’t seem to be the boob tube equivalent of herpes that it has now become. Practically every channel short of the shopping channels and public/government access has some variety of faux reality entertainment contracted, I imagine, due to the shuffling of execs from network to network and copycat behavior.

See kids? Always use a condom.

I couldn’t have had this realization about TV had I not been outside the country for a while, thereby completely extricating myself from the loop. The damage we’re doing to ourselves by watching this drivel rarely becomes obvious until you stop the unprotected channel-to-channel voyeuristic promiscuity and take a good hard look at yourself. And then it hits you :

Crap! Warts!!

What really grates on me is not only the sheer breadth and depth of damage done to sane entertainment by this invasive species, but the idea that blithering idiocy, conformity and mediocrity are now the food pyramid of the daily TV diet. We have actually been trained to expect entertainment in the same format over and over and over.

We have shows like Style TV’s (a channel I know painfully well thanks to my ex) How Do I Look?, a show that, if you’re a viewer like me, seems to declare, with all the delicacy of a steel-toed boot to the testicles, that your uniqueness and individuality are verboten in civilized society. I’m all for not looking like a freak in front of people, but there’s a limit to how much of a cookie-cutter Barbie you can turn a woman into.

Speaking of conformity (conspiracy hat on), I think the Barbies are eventually destined to be fed into the commercial machine and turned into money mills so the entertainment can continue. How else would a show like Millionaire Matchmaker on Bravo exist? It’s a show that makes me seriously consider whether I would rather wake up next to some of the featured clients or have a steaming hot bowl of yak dung and vodka for breakfast.

BTW… I was told by a number of people that Bravo, which is now officially reality TV central and caters to a sizable gay demographic, has a reputation for “converting” straight people to homosexuality, and I say that’s a load of BS. I was visiting a friend who’s an avid fan of the network and he had it on the entire time I was there. The only time it would have even remotely turned me gay was when I briefly wondered if hemlock suppositories existed and whether they would be a less painful alternative to the slow suicide I was experiencing at the time. The only watchable show on the network now is Inside the Actors Studio, and even that’s a stretch considering some of the guests of late.

I could go on to the Real Housewives of XXX or Jerseylicious, but I’d rather not risk dying yet from the inevitable aneurysm.

Then there’s the self-help malarkey, e.g. Supernanny. Here’s the gist of the Supernanny guide (this is basically every episode and I’m not even kidding) :

  • Calm assertive authority
  • Be consistent
  • Instill discipline
  • Employ manners
  • Avoid laziness

If not for the last two, this show could have essentially been re-titled the Child Whisperer, but that would have been creepy. Besides, I imagine the term would have already been copyrighted by now for Hollywood to tell the Jerry Sandusky story.

Reality TV should technically only be palatable if you’re suffering from a legitimate condition such as depression or OCD or as comfort food for morons or just schadenfreude. But thanks to the never-ending marathon assault on our sense of taste by constant exposure, it looks like we’re being mutated into target demographics.

I think that should cover my brief examination of what’s killing TV and our sanity for now; also it’s 5:45 AM and it’s time for me to go to bed.

Google and Self-induced depression

Ever stop and think to yourself “I’m just too damn happy today”? Well, rest your happy head and prepare to lose faith in humanity…

Would we...

Will I...

Why would...

Why will...

Why is...

Why... (OK seriously, what's with the green poop questions)

Why can...

Who will...

Who can...

Should we...

Should I...

Could I...

Can we...

Can I...

And my all time favorite…

Is it wrong... is so wrong on so many levels, I don't even know what to think.

Screw thee before thee screws me

I’ve been extremely busy for the past few moons and haven’t had a chance to update here, although I haven’t been completely silent: I did make a few remarks on my other “blog” of sorts. Actually, it’s just a place for me to randomly inject noise into the web.

For my latest soap adventure, I behaved as any programmer would: I decided to use software at every step of the production, sale and shipping process. More commonly referred to as “Business Management” software, these suites are supposed to help you keep track of all your expenses, obligations, debts and customer management. Think of it as retail Production + Point of Sale (in this case a web site) + CRM (Customer Relationship Management).

While researching for a good suite to manage all this, I came across a promising entity with quite a few big name companies using them. Let’s call this business management software company F.U. Corp because the last thing I want to do is give these idiots more business. Something I’m sure they can’t handle despite their designated industry.

F.U. Corp sounds really pretty

When I called up F.U. Corp, it was the best customer service I’ve ever experienced, and I don’t say that lightly. The representative was the friendliest lady I’d spoken to in a while and seemed quite competent in her responses when I explained how far I’d come and how far I still needed to go to make the business startup hurdles a thing of the past.

My only gripe at the time was that after the demo presentation, it would cost me $950+ including license fees to use the software suite should I decide to keep it.

F.U. Corp tech support is run by twelve-year-olds

Well, my pleasant experience with F.U. Corp pretty much ended after that call. What followed was a demo and presentation that all but ruined my faith in almost all business management software. It was the most painfully convoluted, hidden-cost-ridden, security Swiss cheese, broken bear trap I’ve ever had the displeasure of wading through.

I should have had the good sense to take the hint when, during the first installation on a fresh Windows XP Pro 32-bit machine (as per their recommendation), the software stopped halfway through management setup. For a second, I thought maybe I hadn’t configured it properly. I hadn’t even installed any antivirus software (also per their recommendation). I get really nervous when installations require that no AV software be present on the system instead of simply having you disable it temporarily.

Au contraire, not only was F.U. Corp tech support quick to point out that I had incorrectly installed the software (apparently there’s more to inserting a CD and clicking “Install” than I was aware of), I also had the wrong OS, the wrong system settings, and I had installed AV while sleepwalking.

Let’s recap:
XP Pro 32-bit : Check
Fresh installation : Check
No AV protection : Check

Next they asked me to enable Remote Desktop (there’s some more good news) so they could see what “you did wrong”. Exact words.
After an uncomfortable silence, “hmm… well it looks like XP Pro 32 bit”. NO FECES SHERLOCK!
And the real kicker… “do you happen to have a Windows 2000 or Vista machine to test this? Maybe one of them will accept the installation.”

They still had the nerve to try to charge me for the tech support call, even though the demo explicitly states that support calls are free for the first week and that I have no obligation to keep it.

I would go on at length about the intricacies of their own unique flavor of business management, but I don’t think my blood pressure would endure.

Screw thee before thee screws me

(Should be the first commandment when dealing with fishy business.)

And so ended my relationship with F.U. Corp. Their whole presentation and demo package is going back tomorrow. I didn’t sign up to do their debugging and involuntary beta testing (which I’m sure is what it was).

So I’m still where I began.

I may be desperate enough to turn to Microsoft’s Dynamics brand at this point. It might be the most bloated elephant-crossed-with-a-rhino in the world, but the bloody thing will at least be consistent. The only thing that might be keeping me from pursuing this is phpBMS.

Two things phpBMS is that Dynamics isn’t…

1) Open source, which means I have the option to modify it if I want to. And there’s no saying that I’ll keep things in exactly the same models Microsoft provides. One thing I learned from F.U. Corp: when you try to guess every instance and every possibility, you don’t account for any of them well. Better to stay flexible from the start. Also, phpBMS is written in PHP, which would give me a rare opportunity to do some real-world work with the language.

2) It’s free. And I is broke!

Two things that Dynamics is that phpBMS isn’t…

1) As feature rich. Which doesn’t really seem like a con at this point because I still have the option of adding modules, modifying the core or adding in functionality myself if necessary. I’m just concerned about how much time I’ll have to do just that though, since the day job is just as busy as the night job these days.

2) Have professional tech support. Call me old fashioned, but I prefer to talk to a person even if I have to pay extra for that. There’s just too much disconnect with email or message board threads. Plus a lot of times you have an immediate problem that needs to be resolved and even a few hours delay is unacceptable.

I’m also thinking of coming up with an in-house solution to all this, but again that would mean taking time away from my jobs. It may almost be worth it if I can come up with a decent piece of software. It would give me better content management and online integration options.

“Primus Sucks” IS the tagline!

Apparently people who really love Primus have never heard of this and still claim to be “die hard fans”.

The band stopped using it about a decade ago for obvious reasons. No one wants to explain this 800 billion times to each and every dimwit who goes…
“PRIMUS DOESN’T SUCK! YOU SUCK!! GO DIE SOMEWHERE!!!!”

Check out the YouTube comments on their Tommy the Cat video.

WARNING: Reading YouTube comments has been scientifically proven to reduce your I.Q.

I used to be a reasonably intelligent person until I got into the habit of reading YT comments. Now I have the intelligence of a half-eaten moldy slice of bread.