This is one of those situations where none of your available options are good and your least harmful alternative is to shoot yourself in the foot at a slightly odd angle so as to only lose the little toe and not the big one.
All of this happened when Microsoft revealed in January that their AntiXss library, now known as the Microsoft Web Protection Library (never seen a more ironic combination of words), had a vulnerability, and like all obedient drones, we must update immediately to avoid shooting ourselves in the big toe. The problem is that updating will cause you to lose your little toe.
You see, the new library BREAKS EVERYTHING and eats your children.
A new HTML sanitizer is now available for PHP.
I think the problem is best described by someone who left a comment at the project discussion board.
I was using an old version of Anti-XSS with a rich text editor (CkEditor). It was working very great. But when upgrading to latest version, I discovered the new sanitized is way too much aggressive and is removing almost everything “rich” in the rich editor, specially colors, backgrounds, font size, etc… It’s a disaster for my CMS!
Is there any migration path I can use to keep some of the features of the rich text editor and having at least minimal XSS protection ?
Here’s the response from the coordinator.
CSS will always be stripped now – it’s too dangerous, but in other cases it is being too greedy, dropping hrefs from a tags for example. That is being looked at.
I know this may be a strange idea to comprehend for the good folks who developed the library, but you see in the civilized world, many people tend to use WYSIWYG in their projects so as to not burden their users with tags. These days more people are familiar with rudimentary HTML, but when you just want to quickly make a post, comment or otherwise share something, it’s nice to know there’s an editor that can accommodate rich formatting. This is especially true on a mobile device, where switching from text to special characters for tags is still annoying.
Those WYSIWYGs invariably use CSS and inline styles to accomplish this rich formatting, thereby making your assertion ridiculous and this library now completely impractical.
A very quick test on the 4.2 Sanitizer shows that it totally removes strong tags, h1 tags, section tags and as mentioned above strips href attributes from anchor tags. At this rate the output will soon be string.Empty. I hope that the next version will allow basic markup tags and restore the href to anchors.
So in other words, AntiXss is now like an antidepressant. You’ll feel a lot better after taking it, but you may end up killing yourself.
And that’s not all…
I would have kept my mouth shut about this even though I’ve had my doubts about depending on the library over something DIY, but since I work with a bunch of copycat monkeys, I have to use whatever everyone else deems worthy of being included in a project (common sense be damned). I thought, surely there would at least be the older versions available, but no…
It’s company policy I’m afraid. The source will remain though, so if you desperately wanted you could download and compile your own versions of older releases.
Of course, I lost my temper at that. Since I’m forced to use this library, and one of the devs went ahead and upgraded without backing up the old version or finding out exactly how the vulnerability would affect us, I had to go treasure hunting across three computers to find 4.0 right after getting home.
AntiXss 4.2 is stupid and so is Microsoft.
Here’s my current workaround until MS comes up with a usable alternative. I’m also using the HtmlAgilityPack which at the moment hasn’t contracted rabies, thankfully, and the 4.0 library.
This is a singleton class, so you get hold of it through Instance rather than creating it with new.
HtmlUtility util = HtmlUtility.Instance;
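For anyone unfamiliar with the pattern, here is a minimal sketch of how a singleton with an Instance member is commonly written in C#. It only shows the singleton plumbing, not any sanitizing logic, and the class name HtmlUtilitySketch is made up for illustration. It is not the actual HtmlUtility source.

```csharp
using System;

// Hypothetical stand-in for a class like HtmlUtility. Only the singleton
// plumbing is shown; none of the sanitizing logic is included.
public sealed class HtmlUtilitySketch
{
    // Lazy<T> builds the single instance on first access, thread-safely.
    private static readonly Lazy<HtmlUtilitySketch> instance =
        new Lazy<HtmlUtilitySketch>(() => new HtmlUtilitySketch());

    // A private constructor stops callers from using "new" directly.
    private HtmlUtilitySketch() { }

    // Every caller gets the same object back.
    public static HtmlUtilitySketch Instance
    {
        get { return instance.Value; }
    }
}
```

Calling HtmlUtilitySketch.Instance twice returns the same object, which is the whole point of funnelling access through one member.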
Update : September 28.
Natd discovered a vulnerability in this code that allowed onclick attributes to be added to the code tag itself. Fixed.
Although I guess I don’t understand the most technical part of your post, I feel you are illustrating it in a funny way, lol.
Mad humor makes the madness of my job a little less maddening. ;)
what could i say? you are a person full of fun and wisdom! :D
Hey eksith, thanks for posting this! I thought I was going mental, I mean… H1 tags!? I was convinced that I was doing something wrong if simple H1 tag was confusingly being removed. I only just started using WPL tonight, sooo glad I’m not more entangled in it….
Chris, count your lucky stars you haven’t started heavily depending on this yet. Talk about a bait-and-switch; can you imagine what a nightmare it has been for people who only rely on AntiXss for their filtering? It really has been a disaster and they still don’t have a timeline for fixing all the problems.
From another thread on their discussion board:
So not only is quite literally everything broken, they also have no comment on when things will be fixed. Go figure.
I’ve tried the above code with AntiXss version 4.2.1, but all of my inline css styles are full of numbers like this: 0002D(- character) (eg: text-decoration is text 0002Ddecoration
Should I be using Version 4.0? If so, do you have a copy of it? Could you put it up on the internet somewhere?
I’ve replaced the Encoder.CssEncode call with an HtmlAttributeEncode for the time being, as it returns valid HTML, but I am worried that doing this could cause problems.
What should this line be?
a.Value = (!string.IsNullOrEmpty(a.Value)? : a.Value.Replace(“\r”, “”).Replace(“\n”, “”);
Html attributes shouldn’t have any line breaks in them. It may even confuse older browsers and potentially allow vulnerabilities.
There is a compile error on that line… That is why Northrills is asking.
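One plausible reading of the broken line, assuming the intent was simply to strip carriage returns and line feeds from a non-empty attribute value, is sketched below. The helper name StripLineBreaks and the class AttributeCleanup are made up for illustration; this is a guess at the intent, not the line as originally written.

```csharp
using System;

public static class AttributeCleanup
{
    // Hypothetical helper: removes CR and LF characters from an attribute
    // value. Null or empty values are returned untouched.
    public static string StripLineBreaks(string value)
    {
        return string.IsNullOrEmpty(value)
            ? value
            : value.Replace("\r", "").Replace("\n", "");
    }
}
```

With that in place, the line in question would read something like a.Value = AttributeCleanup.StripLineBreaks(a.Value); where a is the attribute being cleaned.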
And still nothing from MS on this, at least nothing I could find.
Oh, I’ve left AntiXSS out of any of the new projects. It’s pretty much a dead library now.
We’re using something else at work, which sadly I can’t release here because it’s in-house and therefore proprietary, but using HtmlAgilityPack gets you 95% of the way to filter out unnecessary tags and attributes. The rest is easily DIY.
Best of all, unlike AntiXSS, HAP actually is open source as in you can view the bloody source!
Great job. Thanks for posting this! I had a quick question. Would it be a bad idea to, instead of removing tags and attributes that are not white listed, simply encode them?
Eg, line 204 becomes: node.ParentNode.ReplaceChild(HtmlNode.CreateNode(Microsoft.Security.Application.Encoder.HtmlEncode(node.OuterHtml)), node);
Sure that would work. I’m using encoding in this case only for the contents of code tags here since that was what I needed at the time.
I do have to caution you that there may be times where you don’t want the content to be shown at all, e.g. when it was a malicious script that included links to sites with even more nasty stuff. To prevent showing stuff that visitors may navigate to manually, sometimes it’s best not to show it at all.
You can’t completely prevent users from harming their own systems if they insist, but you can make it harder for them to do so.
Tnx for the post.
A small remark: you do not clean the HTML attributes on the “code” tag that wraps the source code we don’t want to sanitize. That’s harmful. Imagine:
our encoded html...
So I added one more dictionary of tags whose contents should be skipped but whose attributes should still be validated:
private Dictionary<string, List<string>> SourceTagList = new Dictionary<string, List<string>>();
SourceTagList.Add("code", new List<string>());
SourceTagList.Add("source", new List<string>());
and then check the node against that list:
// clean attributes only on the wrapping tag
// encode everything inside the wrapping tag
node.InnerHtml = Microsoft.Security.Application.Encoder.HtmlEncode(node.InnerHtml);
I mean, someone can add an onclick attribute to the code block.
Ah! Thanks for catching this.
Sorry, I was late to fix this; my attention was caught elsewhere.
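To make the idea in this exchange concrete, here is a rough sketch of per-tag attribute whitelisting. It works on plain attribute names instead of HtmlAgilityPack nodes so that it stands alone. The class and method names are made up, and treating an empty list as "allow no attributes" is an assumption about what the commenter intended.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AttributeWhitelistSketch
{
    // Hypothetical whitelist: wrapping tag -> attribute names allowed on it.
    // "code" and "source" have empty lists, so every attribute is dropped.
    private static readonly Dictionary<string, List<string>> SourceTagList =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase)
        {
            { "code", new List<string>() },
            { "source", new List<string>() }
        };

    // Returns only the attribute names permitted on the given wrapping tag.
    // Tags that are not in the list come back unchanged in this sketch; a
    // real sanitizer would send them down its normal cleaning path instead.
    public static List<string> AllowedAttributes(string tag, IEnumerable<string> attributeNames)
    {
        List<string> allowed;
        if (!SourceTagList.TryGetValue(tag, out allowed))
        {
            return attributeNames.ToList();
        }
        return attributeNames
            .Where(name => allowed.Contains(name, StringComparer.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Under these assumptions, an onclick on a code tag is filtered out before the tag's contents are encoded, which is the hole described above.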
This post is getting a large number of spam comments that I’ve been trying to keep on top of. Until the spam attack dies down, I’m temporarily closing comments.
Edit November 16, 2012 : OK comments are re-enabled.
Thank you for taking the time to write up this solution. I think the solution in AJAX Control Toolkit must have been based on this, if anyone was considering using that. I didn’t need all the extra stuff that comes with ACT so I was very happy to be able to use this solution.
Hi Simon, Thanks! Glad you found it useful.
Still haven’t heard back from MS on any permanent fixes to AntiXSS, but it looks like the library will be part of .Net 4.5 as well. On their CodePlex page, there’s a post from the coordinator that it’s being “worked on”, but I’m not holding my breath.
OK, I’m gonna have to close the comments on this post again, because I’m getting a ton of spam on it (1000+) as of December 3, 2012.
I don’t think I’ll reopen comments, so if you need to contact me about it, drop me an email or comment on another post (doesn’t matter if they’re unrelated).