Everyone knows that some symbols, such as <, >, & and ", "break" XML. That's why Server.HtmlEncode is used to replace them with the corresponding HTML entities, like &amp; for the ampersand and so on. After that replacement the XML is supposed to be "safe". However, that does not hold under every possible condition. One of the applications I work on prints barcodes on tickets. The number is encoded using the Interleaved 2 of 5 format. The encoding is performed by a function provided by the company IDAutomation, and the output generally looks like this: "Ë'Zj`!/ÉI?!&!Ì". The output is then passed through Server.HtmlEncode and added to the XML, which is fed to a printer.
However, yesterday I received a bug report, and the error essentially boiled down to this:
Type : System.Xml.XmlException, System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=blah
Message : An error occurred while parsing EntityName. Line 1, position 2832.
Source : System.Xml
Help link :
LineNumber : 1
LinePosition : 2832
SourceUri :
Data : System.Collections.ListDictionaryInternal
TargetSite : Void Throw(System.Exception)
Stack Trace : at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.Throw(String res)
at System.Xml.XmlTextReaderImpl.ParseEntityName()
at System.Xml.XmlTextReaderImpl.ParseEntityReference()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at PrintController.AddDoc(String xmlString)
Fortunately, I had two XML samples: one caused the error and the other did not.
This bit in the encoded XML did not cause any problems:
<text x="centre" y="620" font="IDAutomationHI25L" size="20" bold="false" italic="false" underline="false">
Ë'Zj`!/ÉI5!'!Ì</text>
This one, however, did, despite the "safely encoded" ampersand:
<text x="centre" y="620" font="IDAutomationHI25L" size="20" bold="false" italic="false" underline="false">
Ë'Zj`!/ÉI?!&amp;!Ì</text>
The solution? I had to think about it, and this is what I came up with:
xmlData = xmlData.Replace("barcode", Server.HtmlEncode(mybarcode).Replace("&amp;", "&#38;"))
This works because "&#38;" is the numeric character reference for "&" (38 is its ASCII code). And it worked like a charm. Now I just need to turn it into a small function that takes care of all the "strange" characters: <, >, & and ".
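A minimal sketch of what such a function might look like, assuming the one-liner above is VB.NET and that the same trick (swapping a named entity for its numeric character reference) is wanted for all four characters. The names XmlSafeEncoding and EncodeForXml are made up for illustration. HttpUtility.HtmlEncode is the static method that Server.HtmlEncode calls internally, so it is used here to keep the sketch independent of a page context.

```vb
Imports System.Web

Module XmlSafeEncoding
    ' Hypothetical helper: HTML-encodes a value, then swaps the four named
    ' entities for their numeric character references.
    Public Function EncodeForXml(ByVal value As String) As String
        ' HtmlEncode returns Nothing for Nothing, so guard before calling Replace.
        If value Is Nothing Then
            Return String.Empty
        End If

        Dim encoded As String = HttpUtility.HtmlEncode(value)

        ' None of these replacements can produce input for a later one,
        ' because every literal "&" is already "&amp;" at this point.
        encoded = encoded.Replace("&amp;", "&#38;")   ' &
        encoded = encoded.Replace("&lt;", "&#60;")    ' <
        encoded = encoded.Replace("&gt;", "&#62;")    ' >
        encoded = encoded.Replace("&quot;", "&#34;")  ' "

        Return encoded
    End Function
End Module
```

With that in place, the original line could become xmlData = xmlData.Replace("barcode", EncodeForXml(mybarcode)). As a quick check, EncodeForXml("a<b & c") returns "a&#60;b &#38; c".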
by Evgeny. Also posted on my website