Wednesday, November 2, 2011

A Confusing Issue with the Ampersand in the XML Sent to Printer.

Everyone knows that some symbols, such as <, >, & and ", "break" XML if they appear unescaped. That's why Server.HtmlEncode is used to replace them with the corresponding entities, such as &amp; for the ampersand. After that replacement the XML is supposed to be "safe". However, that turns out not to hold under every condition.

One of the applications I work on prints barcodes on tickets. The number is encoded in the Interleaved 2 of 5 format by a function provided by IDAutomation, and the output generally looks like this: "Ë'Zj`!/ÉI?!&!Ì". The output is then passed through Server.HtmlEncode and added to the XML, which is fed to a printer.

However, yesterday I received a bug report, and the error essentially boiled down to this:

Type : System.Xml.XmlException, System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=blah
Message : An error occurred while parsing EntityName. Line 1, position 2832.
Source : System.Xml
Help link :
LineNumber : 1
LinePosition : 2832
SourceUri :
Data : System.Collections.ListDictionaryInternal
TargetSite : Void Throw(System.Exception)
Stack Trace : at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.Throw(String res)
at System.Xml.XmlTextReaderImpl.ParseEntityName()
at System.Xml.XmlTextReaderImpl.ParseEntityReference()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at PrintController.AddDoc(String xmlString)

Fortunately, I got two XML samples: one caused the error and the other did not.

This bit in the encoded XML did not cause any problems:

<text x="centre" y="620" font="IDAutomationHI25L" size="20" bold="false" italic="false" underline="false">
&#203;'Zj`!/&#201;I5!'!&#204;</text>

This one, however, did, despite the "safely encoded" ampersand:

<text x="centre" y="620" font="IDAutomationHI25L" size="20" bold="false" italic="false" underline="false">
&#203;'Zj`!/&#201;I?!&amp;!&#204;</text>
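The ParseEntityName frame in the stack trace indicates that the reader met a "&" that did not start a valid reference. This suggests the ampersand was no longer escaped by the time the string reached LoadXml, perhaps because it was decoded once somewhere along the way. That last part is a guess, not something confirmed in the application. A minimal sketch of how the parser treats the two cases (the module name, the TryLoad helper and the shortened <text> element are made up for illustration):

```vbnet
Imports System
Imports System.Xml

Module EntityNameDemo
    ' Hypothetical helper: tries to load an XML string and returns "OK",
    ' or the parser's error message if the string is rejected.
    Function TryLoad(ByVal xml As String) As String
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(xml)
            Return "OK"
        Catch ex As XmlException
            Return ex.Message
        End Try
    End Function

    Sub Main()
        ' Escaped ampersand: well-formed, so the document loads.
        Console.WriteLine(TryLoad("<text>I?!&amp;!</text>"))
        ' Raw ampersand followed by "!": not a valid entity reference,
        ' so the reader throws while parsing the entity name.
        Console.WriteLine(TryLoad("<text>I?!&!</text>"))
    End Sub
End Module
```

If the second call reports the same "parsing EntityName" message as the bug report, it would be worth checking whether anything between Server.HtmlEncode and LoadXml decodes the string.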

Solution? I had to think about it, and this is what I came up with:

xmlData = xmlData.Replace("barcode", Server.HtmlEncode(mybarcode).Replace("&amp;", "&#038;"))

Because "&#038;" is the HTML ASCII value for "&". And it worked like a charm. Now I just need to convert it to a small function instead which takes care of all "strange" characters: <, >, & and "

References

Server.HTMLEncode Method

HTML ASCII Characters
