Enabling XYplorer for Checking XML
Monday, January 9, 2017 at 1:54PM
Stephen Millard in Technology, XYplorer

Working with data and configuration files is part of my day job, and this presents me with many opportunities to automate the things I need to do with them. Recently I needed to check a large set of XML files for obvious errors, and the approach I settled on was to integrate a command line tool into XYplorer, my Windows file manager of choice.

XML Files

XML stands for Extensible Markup Language and is a way of defining structured data in a plain text file format. It's built from angle-bracket tags just like HTML, but unlike HTML the tags are not predefined. Anyone can define their own XML tags and use them in a file. The key is that something has to process the file and know how to use the data within those tags. This could be a human (XML was intended to allow data structures to be readily understood by anyone reading them) but is more likely to be some sort of software.

XML files can be described as well-formed and valid. A well-formed file follows XML's syntax rules: there is a single root element, every opening tag has a matching closing tag, and elements are properly nested. A valid file is not only well-formed but also conforms to a Document Type Definition (DTD), typically held in a secondary file, which constrains which tags may be used and which attributes they may carry. An XML file includes a reference to a DTD if validation is to be considered. In reality not all XML files have a DTD defined and assigned.
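
As a rough illustration of the well-formedness half of that distinction, here is a minimal Python sketch using only the standard library. The helper name `is_well_formed` and the sample `<order>` documents are made up for this example. Note that Python's `xml.etree.ElementTree` only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, ""
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical sample documents, for illustration only.
good = "<order><item>tea</item></order>"
bad = "<order><item>tea</order>"  # <item> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The second call reports a failure along with the parser's description of where the mismatch was found, which is the kind of feedback that makes a broken file quick to fix.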

My Requirement

In my work I manipulate XML files quite a bit. Sometimes I'm changing one file and other times I'm changing up to a few hundred. When these changes are loaded into a system they don't always load smoothly. Sometimes I've missed closing a tag in the right way, I've carried out a find and replace that didn't account for an exceptional case in a file, or maybe I just got an ordering wrong or made a typo in a tag name. Whilst my text editor helps me get this right the vast majority of the time, human error does creep in.

In these cases the issue becomes how do I find the file that's wrong and correct it? Checking a handful of files is easy enough, but it becomes a real chore when you hit double figures, and quite frankly when you're looking at dozens of large files of data it's pretty easy to gloss over something pertinent.

It just makes sense to have the computer do it for you at that point. It can follow instructions to check the files and as a result it is going to be much faster and give a more reliable result.

Finding a Validator

XML files have been around quite a while, and I was sure there would be a free tool available that I could use rather than building one myself, which is not a trivial task. A quick Google search revealed a number of candidates. What I wanted was a command line utility that could be passed a path to an XML file and alert me to any issues found in it. I could then pass in all the files I wanted checking, one after another as a batch, and get some sort of summary back for all of them.
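
To make that requirement concrete, here is a minimal Python sketch of the batch-and-summary idea. It is not the tool used in this article: it relies on the standard library parser, so it only checks well-formedness, and the function names `check_files` and `summarise` are invented for the example.

```python
import sys
import xml.etree.ElementTree as ET


def check_files(paths):
    """Parse each file; return (path, problem) pairs, problem is "" when OK."""
    results = []
    for path in paths:
        try:
            ET.parse(path)
            results.append((path, ""))
        except (ET.ParseError, OSError) as err:
            # ParseError: not well-formed. OSError: missing or unreadable file.
            results.append((path, str(err)))
    return results


def summarise(results):
    """Build one line per file plus a final count of failures."""
    lines = [f"{path}: {problem or 'OK'}" for path, problem in results]
    failed = sum(1 for _, problem in results if problem)
    lines.append(f"{failed} of {len(results)} file(s) failed")
    return "\n".join(lines)


if __name__ == "__main__":
    # File paths are taken from the command line.
    print(summarise(check_files(sys.argv[1:])))
```

Saved under a name of your choosing and run with a list of file paths as arguments, it prints one line per file followed by a count of how many failed.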

The tool I settled on was XMLStarlet, which is a set of command line utilities that can be used not only to validate XML files but also to transform, edit and query them. Whilst most of the files I work with don't have a DTD defined, being able to carry out a full validation rather than just a well-formedness check when possible can only be beneficial.

To validate a file with this tool, the action 'val' is passed as the first parameter and the path to the XML file as the second. The result is then written to the command line's standard output.
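
The call described above could be wrapped from a general-purpose script roughly as follows. This is a sketch under assumptions: it presumes the XMLStarlet executable is available at whatever path you pass in, the helper names `build_val_command` and `run_val` are hypothetical, and only the 'val' action mentioned in this article is used.

```python
import subprocess


def build_val_command(exe_path, xml_path):
    """Return the argument list: executable, the 'val' action, then the file."""
    return [exe_path, "val", xml_path]


def run_val(exe_path, xml_path):
    """Run the validator and return whatever it printed.

    Both output streams are captured, since error details may not
    arrive on standard output alone.
    """
    completed = subprocess.run(
        build_val_command(exe_path, xml_path),
        capture_output=True,
        text=True,
    )
    return completed.stdout + completed.stderr
```

Passing the arguments as a list means paths containing spaces need no extra quoting, which sidesteps the quote doubling needed when the whole command is a single string.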

Using XYplorer

I'm in XYplorer a lot when working with these files, and given that it is a scriptable file manager it was an obvious place to build in the validation. I could have written a stand-alone script and made it accessible via the send-to menu, or perhaps the XML context menu with a registry hack. However, XYplorer replaces the inbuilt file manager for me, my send-to menu is already full of miscellaneous scripts, and I had a feeling I might have further uses for running command line tools like this directly from XYplorer, where I could then act on the information returned from the command line. It therefore seemed like a good opportunity to give it a try and a reasonable fit for the task at hand.

I created a script inside XYplorer and assigned it to a custom button on the toolbar. If I come up with any other XML automation requirements, I expect I'll add them to the same button as menu options. I don't expect to use the script(s) on a daily or even a weekly basis, so I can get away with the inefficiencies of overloading the button if required.

The script starts by checking how many files are selected in the currently active pane. Note that I've not made any check of the file extension. This is actually on purpose as sometimes I might have backup or new versions with different file extensions (e.g. ".old", ".new", ".txt") that I might want to also validate against. If there's no file selected then it pops up a message to the user.

If there's just one file selected, the script outputs the validation result directly to a message box. If multiple files are selected, a temporary file called xmlcheck.txt is specified (in fact the script first deletes any old copy of it). The script then iterates through the selected files, validating each one and appending the result to xmlcheck.txt, which is finally displayed by opening it in Windows Notepad. The script also sets the status bar message in XYplorer upon completion.

The command line calls return the standard output from the command line and you may note that there appear to be an inordinate number of double quotes in use. This is because the parameters to the runret scripting command need to be double quoted and also the command line file paths need to be double quoted - so a pair of sequential double quotes within the runret parameter is actually resolved to a single (i.e. escaped) double quote. In addition you may also note the use of a number of special variables such as countselected, selecteditemspathnames and curitem that relate to information about file selections in XYplorer.

//Check XML file validity
  if (get("CountSelected") < 1) {
    // Nothing selected: tell the user and flag it in the status bar
    msg "You must select at least one XML-based file to validate";
    status "XML Validation Could Not Be Completed", "000000", "alert";
  }
  elseif (get("CountSelected") > 1) {
    // Several files selected: collect every result in one text file
    status "Processing XML files...", "000000", "progress";
    $strOutputFilePath = "c:\temp\xmlcheck.txt";
    // Remove any output file left over from a previous run
    delete 1, 0, "$strOutputFilePath";

    // Loop over the full paths of the selected items, separated by |
    foreach($token, <get SelectedItemsPathNames |>) {
      status "XML Validating - $token", "000000", "progress";
      $output = runret("""C:\Users\millaste\Dropbox\Programming and Scripting\DOS Batch\Starlet XML\xml.exe"" val ""$token""");
      writefile("$strOutputFilePath", "$output", "a", "t");
    }
    run "notepad ""$strOutputFilePath""";
    status "XML Validation Processing Complete", "000000", "ready";
  }
  else {
    // Exactly one file selected: show the result in a message box
    status "Processing XML files...", "000000", "progress";
    msg runret("""C:\Users\millaste\Dropbox\Programming and Scripting\DOS Batch\Starlet XML\xml.exe"" val ""<curitem>""");
    status "XML Validation Processing Complete", "000000", "ready";
  }
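
The quote doubling in the runret calls above can be hard to read, so here is a small Python model of the rule as described in this article: the outer pair of quotes delimits the parameter, and each doubled quote inside it resolves to a single one. It is only a model of that one rule, not of XYplorer's actual parser (it ignores variable resolution, for instance), and the function name and the shortened paths are made up for the example.

```python
def resolve_quoted_literal(literal):
    """Strip the outer quotes, then collapse each doubled quote to one."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("expected a double-quoted literal")
    return literal[1:-1].replace('""', '"')


# Hypothetical, shortened paths standing in for the real ones in the script.
literal = r'"""C:\tools\xml.exe"" val ""C:\data\orders.xml"""'
print(resolve_quoted_literal(literal))
# prints: "C:\tools\xml.exe" val "C:\data\orders.xml"
```

The printed line is the command as the command line would receive it, with each path wrapped in its own pair of quotes so that any spaces in it survive.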

Result

The script works well for me, and whilst it can take a while to validate large numbers of large files, the time saving is huge. Even if you aren't an XYplorer user, I'd recommend that if you work with XML files you keep a batch validation option on hand. It's well worth it.

Article originally appeared on Thought Asylum (http://www.thoughtasylum.com/).
See website for complete article licensing information.