I guess I’m on an XML rampage lately. I often receive XML documents that are not well formed. They started out well formed, and it’s a long story how they end up mangled, but basically they contain extraneous whitespace. Something like this:
<root some attr="value"> <childnode attr ="123" /> </root>
To help recover documents like this, I’ve built a web application based on Apache XMLBeans that parses a document and then displays a formatted version. I also added features that attempt to remove the extraneous whitespace, wrap long lines, and collapse runs of whitespace in attribute values and text nodes.
The first of these features I called Smart Scan. Looking back, this was kinda arrogant, as the code isn’t that smart at all. In fact, it is surprisingly difficult to programmatically examine an XML document that is not well formed and deduce how it should be corrected. Basically, I attempt to remove whitespace that would cause the parse of the document to fail. This whitespace can occur in two places in an XML element: in the element name or in an attribute. For example, consider:
<myele ment attr="1234"/>
As is, this document will not parse. We could correct it in one of two ways:
- <myelement attr="1234"/>
- <myele mentattr="1234"/>
Both of these result in a well formed XML document, but the first one is probably what was intended. As humans, we deduce this because the words and abbreviations used make more sense. It’s pretty hard to write code that is this smart, so I chose to correct it as shown in #2 above. The reasoning is that an element is likely to have many more characters dedicated to attributes than to the element name. Therefore, it is more likely that extraneous whitespace will occur in an attribute.
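Here is a rough sketch of the heuristic just described. It is not the application's actual Smart Scan code, which isn't shown here. `SmartScanSketch`, `repairTag` and `isWellFormed` are hypothetical names, and the sketch works on one tag at a time using only the JDK, not XMLBeans.

Outside quoted values, `repairTag` keeps whitespace in only two places: the first run after the element name, and any run that follows a closing quote. Every other whitespace run is deleted, which glues a stray fragment onto the next attribute name (correction #2). `isWellFormed` uses the JDK's DOM parser to confirm that the original fails and both corrections parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class SmartScanSketch {

    /** Hypothetical single-tag repair; assumes the input is one tag, e.g. "<a b c=\"1\"/>". */
    static String repairTag(String tag) {
        StringBuilder out = new StringBuilder(tag.length());
        char quote = 0;              // open quote character, or 0 when outside a value
        boolean nameStarted = false; // seen the first character of the element name
        boolean nameEnded = false;   // the element name has been closed by whitespace
        char last = 0;               // last character written to out
        int i = 0;
        while (i < tag.length()) {
            char c = tag.charAt(i);
            if (quote != 0) {                      // inside an attribute value: copy verbatim
                out.append(c);
                if (c == quote) quote = 0;
                last = c;
                i++;
            } else if (Character.isWhitespace(c)) {
                while (i < tag.length() && Character.isWhitespace(tag.charAt(i))) i++; // skip the run
                // Keep one space after the element name or after a closing quote; drop all others.
                boolean keep = (nameStarted && !nameEnded) || last == '"' || last == '\'';
                if (nameStarted) nameEnded = true;
                if (keep) { out.append(' '); last = ' '; }
            } else {
                if (c == '"' || c == '\'') quote = c;
                else if (c != '<' && c != '/' && c != '>') nameStarted = true;
                out.append(c);
                last = c;
            }
        }
        return out.toString();
    }

    /** True if the JDK parser accepts the string as a well formed document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String repaired = repairTag("<myele ment attr=\"1234\"/>");
        System.out.println(repaired + " well formed? " + isWellFormed(repaired));
    }
}
```

Applied tag by tag to the first example, this sketch would turn `<root some attr="value">` into `<root someattr="value">` and `<childnode attr ="123" />` into `<childnode attr="123" />`. It only guarantees a parse, not the intended meaning. That is the trade-off described above.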
The second feature I added is named Wrap Lines, and it attempts to wrap long element lines in a pleasing manner without affecting the validity of the document.
Finally, there is a feature called Compress Whitespace that replaces each run of whitespace in an attribute value or text node with a single space. This can be useful if, for example, I have a document that gets mangled as follows:
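One safe way to wrap a long start tag is to break only between attributes, never inside a quoted value. Whitespace between attributes is insignificant to an XML parser, so a line break there can't change the document.

The sketch below is an assumption about how such wrapping could work, not the application's code. `TagWrapSketch`, `tokens` and `wrapTag` are hypothetical names. It packs attributes greedily onto each line up to `maxWidth` characters and aligns continuation lines under the first attribute.

```java
import java.util.ArrayList;
import java.util.List;

public final class TagWrapSketch {

    /** Splits a tag on whitespace that is outside quoted attribute values. */
    static List<String> tokens(String tag) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0; // open quote character, or 0 when outside a value
        for (char c : tag.toCharArray()) {
            if (quote != 0) {
                cur.append(c);
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
                cur.append(c);
            } else if (Character.isWhitespace(c)) {
                if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    /** Greedy wrap: breaks only between tokens, indenting continuation lines under the first attribute. */
    static String wrapTag(String tag, int maxWidth) {
        List<String> toks = tokens(tag);
        if (toks.isEmpty()) return tag;
        String first = toks.get(0);             // "<elementName"
        String indent = " ".repeat(first.length() + 1);
        StringBuilder out = new StringBuilder(first);
        int lineLen = first.length();
        for (int i = 1; i < toks.size(); i++) {
            String t = toks.get(i);
            if (lineLen + 1 + t.length() <= maxWidth) {
                out.append(' ').append(t);
                lineLen += 1 + t.length();
            } else {
                out.append('\n').append(indent).append(t);
                lineLen = indent.length() + t.length();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrapTag("<el a=\"1\" b=\"2\" c=\"3\"/>", 12));
    }
}
```

With a width of 12, `<el a="1" b="2" c="3"/>` comes out as three lines. `a="1"` stays on the first line, and `b="2"` and `c="3"/>` are each indented four spaces. A value such as `a="x y z"` is never split, because its spaces are inside quotes.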
<myelement attr="This is the attribute value"/>
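Here is one way to express Compress Whitespace. It walks a parsed DOM tree and collapses each run of whitespace in text nodes and attribute values to a single space.

This uses the JDK's `org.w3c.dom` API rather than XMLBeans, which the application is built on. `CompressWhitespaceSketch`, `parse` and `compress` are hypothetical names. Note that whitespace-only text nodes between elements (indentation) also become a single space in this sketch. CDATA sections are left untouched.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public final class CompressWhitespaceSketch {

    /** Parses an XML string into a DOM, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + e.getMessage(), e);
        }
    }

    /** Recursively collapses whitespace runs in text nodes and attribute values. */
    static void compress(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replaceAll("\\s+", " "));
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                attr.setNodeValue(attr.getNodeValue().replaceAll("\\s+", " "));
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            compress(child);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<myelement attr=\"This   is the    attribute value\"/>");
        compress(doc);
        System.out.println(doc.getDocumentElement().getAttribute("attr"));
    }
}
```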
All three of these features may be enabled or disabled from the page that submits the XML document to be formatted, so if they are not working for you, just turn them off. If they behave inappropriately with one of your documents, I would appreciate it if you would share the document with me by sending it to bwit AT pobox.com.
Here’s an example of using the application with our first example XML document:
That’s about it. If you read all this drivel, you certainly deserve some free code, and I hope it serves your needs. If it doesn’t, leave a comment and take me to task for it, but remember, this is a mediocre application developed by an average guy. So be kind.
The code is here. By the way, if you’ve just jumped to the bottom without reading any of the valuable information above then this link just won’t work!
Nah, just kidding.
I was reading the September 18th issue of InfoWorld magazine, which had a series of articles on the current employment market for high-tech professionals (or semi-professionals, in my case). One thing that caught my eye was a list of the top 10 most wanted skills in this market, according to Dice.com. I don’t know much about Dice.com, but here is their list:
1. C++ Programmer
2. Oracle Database Administrator
3. Windows Administrator
4. Project Manager
5. Unix Administrator
6. SQL Database Administrator
7. .NET Programmer
8. J2EE/Java Programmer
9. ASP Programmer
10. XML Programmer
I found this list interesting on a couple of levels. First, I am a bit confused by the skill set SQL Database Administrator. Does this mean a SQL Server DBA or just a DBA of any relational database? Would that also include Oracle DBAs, or is it all DBAs of relational databases except Oracle DBAs, which are, apparently, in much higher demand? Dice needs to be a little more explicit.
I admit that my DBA confusion is a bit nit-picky and I only mention it as a (poor) segue into my main point of this post – XML Programmer. Ok, this is not the only place I’ve seen or heard this term. I’ve had recruiters call me and ask how my XML programming skills were.
So, what exactly is an XML Programmer? I know that a C++ Programmer writes programs in the C++ language and that a Java Programmer writes programs in the Java language, so it would make sense that an XML programmer writes programs in the XML language. But what is the XML programming language? Isn’t XML a format for representing hierarchical data?
I went to Dice.com and tried to find an XML programming position. My search returned a lot of hits but they were all for people developing in conventional languages that somehow used XML as a storage medium. I couldn’t find a position titled XML Programmer.
Let’s come at this another way. XML is a format for storing data. Another very popular format for storing data is CSV, or comma-separated values. So, are there CSV programmers out there? I mean, seriously, would it not be possible to develop a language that parsed and executed instructions stored in a CSV data file? I think so. Still, I haven’t heard of much demand for CSV programmers. None, in fact.
I spend a good part of my workday and, unfortunately, my free time developing JEE programs in Java. This entails a fair amount of work with XML. JEE uses XML for deployment descriptors. In addition, I often make use of the Spring Framework, which supports XML as a configuration option. I frequently find myself using an XML parser and writing snippets in XPath and XQuery. Also, I often define messages using XML Schema. Does this make me an XML programmer? I’ve never thought so. I don’t list that on my resume.
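To make the "I think so" concrete, here is a toy stack language whose programs are CSV rows. The language itself is entirely hypothetical: the `PUSH`/`ADD`/`MUL` instruction set and the `CsvLanguageSketch` name are made up for illustration. Each row is `op[,arg]`, and the interpreter returns whatever is left on top of the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class CsvLanguageSketch {

    /** Interprets a hypothetical CSV program, one "op[,arg]" instruction per line. */
    static long run(String csvProgram) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String line : csvProgram.split("\n")) {
            if (line.isBlank()) continue;
            String[] fields = line.trim().split(",");
            switch (fields[0].trim().toUpperCase()) {
                case "PUSH": stack.push(Long.parseLong(fields[1].trim())); break;
                case "ADD":  stack.push(stack.pop() + stack.pop()); break;
                case "MUL":  stack.push(stack.pop() * stack.pop()); break;
                default: throw new IllegalArgumentException("Unknown op: " + fields[0]);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4, written as CSV rows
        System.out.println(run("PUSH,2\nPUSH,3\nADD\nPUSH,4\nMUL"));
    }
}
```

The point holds either way: the CSV is just the storage format. Whoever writes these rows is programming in the made-up stack language, not "in CSV."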
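Here is the kind of everyday XML work described above: Java code that evaluates an XPath expression against a deployment descriptor using the JDK's `javax.xml.xpath` API. The descriptor is a hypothetical, trimmed-down fragment shaped like `web.xml`. Its XML namespace is deliberately omitted. A real descriptor declares one, so the XPath would then need a `NamespaceContext` and prefixed names. `DescriptorXPathSketch` and `eval` are illustrative names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public final class DescriptorXPathSketch {

    /** Returns the string value of an XPath expression evaluated against an XML string. */
    static String eval(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, namespace-free fragment shaped like a web.xml deployment descriptor.
        String webXml = "<web-app><servlet><servlet-name>hello</servlet-name>"
                + "<servlet-class>com.example.HelloServlet</servlet-class></servlet></web-app>";
        System.out.println(eval(webXml, "/web-app/servlet/servlet-class"));
    }
}
```

Everything that executes here is Java. XML is only the data being queried, which is exactly why the code doesn't feel like "XML programming."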
In general, when a recruiter asks me how much XML programming I’ve done, I immediately form the opinion that they don’t have a clue what they are talking about and end the call as quickly as possible. If you are a recruiter and have been having trouble finding highly qualified technical people, you might want to stop asking about XML programming skills and be specific about the skills you seek.
Or maybe I should avail myself of one of the numerous books on XML programming and stop being such a curmudgeon.
It’s Talk Like a Pirate Day and I just didn’t have time to construct a good post. Me pint laddie is I’m bummed.
Apparently it was a nice day for the annual How Berkeley Can You Be parade.
MSNBC has long restricted its video playback to users of Internet Explorer. If you attempted to view an MSNBC video using another browser, you would get a popup telling you that you needed to upgrade to the latest version of Internet Explorer, or some such rot.
I refuse to use the brain-dead IE and had come to accept that I would never watch a video on MSNBC. Now, I will admit that on the odd occasion when a video sounded too good to pass up, I did fire up IE and watch it. But it always pissed me off.
Actually it had gotten to the point that I mostly ignored anything on MSNBC that had the tag VIDEO.
Well, apparently without any fanfare (at least no one told me), MSNBC has started a beta with new video playback technology that works on non-IE browsers. Here’s a capture from Firefox running on OS X:
You may be wondering how I learned of this development. I accidentally clicked on a video link and it worked! Now I have to go change my pants.
Update: Ok, I discovered it still doesn’t work with Safari, but Firefox is a good start.