Oh say, can you CDATA?

I worked really hard on that title.

An astute reader wrote in regarding my previous post about using Array.join("") to pull HTML tags from XML. He said that all this can be avoided by simply using CDATA in the XML document and pulling out the data with the .nodeValue property.

This is most certainly true, and it’s the simplest solution to the problem. I’ll even admit to being teh st00pidz for not mentioning this right from the start.

However, what if we needed to do both? Suppose, for example, you need to write an XML parser to parse data from a variety of publishers. Publishers, as we all know, are an unpredictable bunch. So publisher #1 sends out an XML file with their content enclosed in CDATA, and publisher #2 does not. Now what?

We’ll need a function to convert both CDATA and non-CDATA data into a universal format that can be displayed in an HTML textfield. Because the nodeValue property will only reliably return CDATA, I’m going to use the Array.join("") method.

Again, assuming we’ve loaded and parsed the XML file, let’s first get a string out of it:

var str:String = mx.xpath.XPathAPI.selectSingleNode(xml.firstChild.childNodes[0], "item/description").childNodes.join("");

Now, because the function requires some searching and replacing, let’s add a replace() function to the String prototype:

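String.prototype.replace = function(replaceString, withString) {
   // split on every occurrence of the search string, then glue the pieces back together with the replacement
   var myArray = this.split(replaceString);
   return myArray.join(withString);
};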

And finally, the conversion function:

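function convert(str) {
	if( str.indexOf("<",0) != -1) { // if the string has come in as CDATA, it will not contain a raw "<"
		str = str.replace("&","&amp;"); // "double-escape" any already-escaped entities first
		str = str.replace("<","&lt;");
		str = str.replace(">","&gt;");
	}

	var mc:MovieClip = _root.createEmptyMovieClip("converterMC",0);
	var htmlconverter = mc.createTextField("htmlconverter",mc.getNextHighestDepth(), -500,-500,100,25);
	htmlconverter.html = true;
	htmlconverter.htmlText = str;
	var converted:String = htmlconverter.text;
	mc.removeMovieClip(); // "delete mc" would only drop the reference, not take the clip off the stage
	return converted;
}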

What in the world is going on here, anyway?

When Flash pulls CDATA out of an XML document, all of the HTML markup characters come back escaped. So, “<” becomes “&lt;”, etc. If there’s already an escaped “&lt;” in the data, it comes back “double-escaped,” like this: “&amp;lt;”.
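Here’s a rough sketch of that escaping, just to make the “double-escaped” part concrete. It’s written as plain JavaScript so it can be tried in any console, but it only uses split() and join(), which ActionScript has too. The function name escapeEntities and the sample string are made up for illustration; this is not what Flash runs internally.

```javascript
// Hypothetical helper (not from the post): escapes the three characters
// that matter here, using nothing but split() and join().
function escapeEntities(str) {
    // "&" has to go first. Otherwise the "&" in the freshly created
    // "&lt;" and "&gt;" would get escaped a second time.
    var out = str.split("&").join("&amp;");
    out = out.split("<").join("&lt;");
    out = out.split(">").join("&gt;");
    return out;
}

// Made-up sample: a tag wrapped around text that already holds "&lt;".
var raw = "<b>5 &lt; 6</b>";
var escaped = escapeEntities(raw);
// escaped is "&lt;b&gt;5 &amp;lt; 6&lt;/b&gt;"
```

The tag markers turn into entities, and the entity that was already there picks up a second layer.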

The first part of the function, the if block, checks for the existence of a “<” character. If one of these exists, then the text has not been escaped, and therefore, we can safely assume it is not coming in as CDATA. Then, the replace calls turn the HTML less-than and greater-than tag markers into escaped entities, and “double-escapes” any already-escaped entities. In other words, it converts the string into the same format as any CDATA. Now, it’s conversion time!
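To see why the check works, here is a small sketch of just that first step, kept apart from the textfield part. Again it’s plain JavaScript that sticks to calls ActionScript also has (indexOf, split, join). The names escapeAll and toCdataStyle and both sample strings are my own assumptions, not anything from a real feed.

```javascript
// Hypothetical helpers; only the "<" check and the three replacements
// mirror what the paragraph above describes.
function escapeAll(str) {
    var out = str.split("&").join("&amp;"); // existing entities get a second layer
    out = out.split("<").join("&lt;");
    out = out.split(">").join("&gt;");
    return out;
}

function toCdataStyle(str) {
    // A raw "<" means the markup was not escaped on the way in,
    // so escape it now. Otherwise leave the string alone.
    if (str.indexOf("<") != -1) {
        return escapeAll(str);
    }
    return str;
}

// Made-up sample data: the same description as it might arrive
// from a non-CDATA node and from a CDATA node.
var plain = "<i>Tom &amp; Jerry</i>";
var cdata = "&lt;i&gt;Tom &amp;amp; Jerry&lt;/i&gt;";

// Both inputs end up in the same escaped form.
var same = (toCdataStyle(plain) == toCdataStyle(cdata));
```

After this step the rest of the function no longer cares which publisher the string came from.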

First, the function creates a text field on the stage and sets the html property to true. Then, it places the newly converted string into the htmlText property, which will cause the textfield to interpret the HTML. In this case, it takes escaped HTML text and converts it to non-escaped text. The contents of the textfield will now be proper HTML, with <> symbols around the tags, and escaped entities elsewhere, if necessary. This can then be set as the htmlText of another field, and will display properly.

If there’s ever a need to convert HTML text to non-HTML text, I highly recommend this method. Basically, it’s a nice quick way to strip any HTML tags, or in this case, interpret escaped entities. You could also put an HTML textfield on the stage, set its htmlText, and then pull out its text property, but I prefer this way because it’s a purely ActionScript solution.

By the way, does anyone know if there’s a proper term for “non-CDATA”?


  1. #1 by Andy Frey on December 11, 2006 - 3:37 pm

    Hey Nerdabilly!

    I can’t remember the original reason I wound up at your website, but I found my way to the entries about XML and Flash and such and thought I might throw you a link to a PHP class I wrote that does a fast and fairly efficient job of building and parsing XML in a way similar to the way ActionScript does it. I tried to model the class and methods like the AS XML class. I don’t know if this will have any benefit for you, but here is the link to it: http://onesandzeros.biz/xml/

    Great blog!

