Archive for November, 2006



Oh say, can you CDATA?

I worked really hard on that title.

An astute reader wrote in regarding my previous post about using Array.join("") to pull HTML tags from XML. He said that all this can be avoided by simply using CDATA in the XML document and pulling out the data with the .nodeValue property.

This is most certainly true, and it’s the simplest solution to the problem. I’ll even admit to being teh st00pidz for not mentioning this right from the start.

However, what if we needed to do both? Suppose, for example, you need to write an XML parser to parse data from a variety of publishers. Publishers, as we all know, are an unpredictable bunch. So publisher #1 sends out an XML file with their content enclosed in CDATA, and publisher #2 does not. Now what?

We’ll need a function to convert both CDATA and non-CDATA data into a universal format that can be displayed in an HTML text field. Because the nodeValue property will only reliably return CDATA, I’m going to use the Array.join("") method.

Again, assuming we’ve loaded and parsed the XML file, let’s first get a string out of it:

var str:String = mx.xpath.XPathAPI.selectSingleNode(xml.firstChild.childNodes[0], "item/description").childNodes.join("");

Now, because the function requires some searching and replacing, let’s add a replace() function to the String prototype:

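String.prototype.replace = function(replaceString, withString) {
   // split on every occurrence of replaceString, then rejoin with withString
   var myArray = this.split(replaceString);
   return myArray.join(withString);
};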
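
The split-and-join trick can also be tried on its own, without touching the String prototype. Here's a minimal sketch, written in the plain ECMAScript subset that ActionScript 2 shares with JavaScript, so it should paste into either. The name replaceAll is made up for illustration and is not a built-in.

```javascript
// Hypothetical helper: replace every occurrence of `find` in `str` with `withString`.
function replaceAll(str, find, withString) {
	// split() cuts the string at each occurrence of `find`...
	var pieces = str.split(find);
	// ...and join() glues the pieces back together with the replacement in between.
	return pieces.join(withString);
}

var sample = replaceAll("a<b<c", "<", "&lt;");
```

Because split() finds every occurrence, a single call replaces them all, which is what the conversion function relies on.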

And finally, the conversion function:

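function convert(str) {
	if (str.indexOf("<", 0) != -1) { // if the string has come in as CDATA, it will not have "<" characters
		str = str.replace("&", "&amp;"); // first, so already-escaped entities get "double-escaped"
		str = str.replace("<", "&lt;");
		str = str.replace(">", "&gt;");
	}

	var mc:MovieClip = _root.createEmptyMovieClip("converterMC", 0);
	var htmlconverter = mc.createTextField("htmlconverter", mc.getNextHighestDepth(), -500, -500, 100, 25);
	htmlconverter.html = true;
	htmlconverter.htmlText = str;
	var converted:String = htmlconverter.text;
	mc.removeMovieClip(); // "delete mc" would only target the local variable and leave the clip on the stage
	return converted;
}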

What in the world is going on here, anyway?

When Flash pulls CDATA out of an XML document, all of the HTML special characters come back escaped. So, “<” becomes “&lt;”, etc. If there’s already an escaped “&lt;” in the data, it comes back “double-escaped,” like this: &amp;lt;.

The first part of the function, the if block, checks for the existence of a “<” character. If one of these exists, then the text has not been escaped, and therefore, we can safely assume it is not coming in as CDATA. Then, the replace calls turn the HTML less-than and greater-than tag markers into escaped entities, and “double-escapes” any already-escaped entities. In other words, it converts the string into the same format as any CDATA. Now, it’s conversion time!
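
To see the check-and-escape step in isolation, here's a rough sketch in the same plain ECMAScript subset (no Flash classes needed). The function name toCdataForm is hypothetical, and the two sample strings are made up for the example.

```javascript
// Hypothetical name: bring a string into the escaped form that CDATA text arrives in.
function toCdataForm(str) {
	// A raw "<" means the markup was never escaped, so this is not CDATA text.
	if (str.indexOf("<") != -1) {
		// Ampersands go first, so an existing "&lt;" turns into "&amp;lt;" and the
		// entities written by the next two lines are left alone.
		str = str.split("&").join("&amp;");
		str = str.split("<").join("&lt;");
		str = str.split(">").join("&gt;");
	}
	return str;
}

// The same content, once as plain markup and once as it would arrive from CDATA.
var fromMarkup = toCdataForm("<b>1 &lt; 2</b>");
var fromCdata = toCdataForm("&lt;b&gt;1 &amp;lt; 2&lt;/b&gt;");
```

Both variables end up holding the same string, which is the whole point: after this step, the rest of the function no longer cares which publisher the data came from.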

First, the function creates a text field on the stage and sets the html property to true. Then, it places the newly converted string into the htmlText property, which will cause the textfield to interpret the HTML. In this case, it takes escaped HTML text and converts it to non-escaped text. The contents of the textfield will now be proper HTML, with <> symbols around the tags, and escaped entities elsewhere, if necessary. This can then be set as the htmlText of another field, and will display properly.
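
The text field itself can't be exercised outside of Flash, so the sketch below uses a hand-written decoder as a stand-in for the htmlText-to-text round trip. That's an assumption on my part: it only knows the three entities used above and doesn't interpret tags the way a real HTML text field does. The name decodeOnce is hypothetical.

```javascript
// Stand-in for the text field pass (an assumption, not Flash's own code):
// undo exactly one level of escaping for the three entities used above.
function decodeOnce(str) {
	str = str.split("&lt;").join("<");
	str = str.split("&gt;").join(">");
	// "&amp;" is decoded last, so "&amp;lt;" ends up as "&lt;" and not "<".
	str = str.split("&amp;").join("&");
	return str;
}

var decoded = decodeOnce("&lt;b&gt;1 &amp;lt; 2&lt;/b&gt;");
```

Here decoded comes out as real markup with the inner entity still escaped once, which is the shape the second text field's htmlText needs.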

If there’s ever a need to convert HTML text to non-HTML text, I highly recommend this method. Basically, it’s a nice quick way to strip any HTML tags, or in this case, interpret escaped entities. You could also put an HTML textfield on the stage, set its htmlText, and then pull out its text property, but I prefer this way because it’s a purely ActionScript solution.

By the way, does anyone know if there’s a proper term for “non-CDATA”?
