Monday, January 26, 2009

A Universal XML Software Pattern




XML is essentially a language for defining complex data by decomposing it into meaningful parts and arranging those parts in a hierarchical structure. Computer scientists call this hierarchical structure a “tree.” For example, as shown in the above diagram, a book could be defined as data composed of chapters, with the chapters, in turn, composed of paragraphs. The definition could be completed by further decomposing the paragraphs into sentences, phrases, and finally words.
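To make the book example concrete, here is a minimal sketch in Java that parses a small book document and prints its element hierarchy as an indented outline. The element names (`book`, `chapter`, `paragraph`) and the class name `BookOutline` are assumptions chosen for illustration. The parsing uses the standard DOM API that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class BookOutline {
    // Hypothetical sample document: a book made of chapters made of paragraphs.
    static final String BOOK_XML =
        "<book>"
      + "<chapter><paragraph>First words.</paragraph>"
      + "<paragraph>More words.</paragraph></chapter>"
      + "<chapter><paragraph>Closing words.</paragraph></chapter>"
      + "</book>";

    // Parses the XML text and returns one element name per line,
    // indented two spaces for each level of depth in the tree.
    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk: record this element, then visit its child elements.
    // Text nodes are skipped because only the structure is of interest here.
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        out.append("  ".repeat(depth)).append(node.getNodeName()).append('\n');
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(BOOK_XML));
    }
}
```

Running `main` prints `book` at the left margin, each `chapter` indented one level beneath it, and each `paragraph` indented one level beneath its chapter, which is the tree described above.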

While this example is relatively simple, XML is more usefully applied to complex numerical data that is transmitted between application programs. The XML allows each application program to interpret the meaning of the data without ambiguity. Financial reports, such as balance sheets and income statements, can be decomposed into parts that represent the company’s assets, expenses, and revenues. Each of these parts can be further broken down into parts that represent various categories of those items. The real value of this use of XML is that, while humans can interpret the items in a financial report by their relative positions on paper, a computer program requires the exacting definition that can only be provided by a hierarchical structure like that of XML.

So, the key to using XML is to take the unambiguous interpretation of the data that its hierarchical tree structure provides and convert it into a parallel hierarchical tree structure in the application program’s native language (e.g., Java or C++). The XML data needs to be assigned to variables inside the program. Because the tree structure of the XML document carries the ultimate meaning of the data, the program must at least begin its interpretation of the document with its internal variables arranged in a hierarchy that parallels the document.
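As a rough illustration of what a “parallel tree” inside a program might look like, the sketch below copies an XML document into a tree of plain Java objects, one object per element. This is only a naive baseline under stated assumptions, not the pattern-based solution promised for the later posts: the class name `XmlTree`, its fields, and the sample financial element names (`report`, `assets`, `cash`, and so on) are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XmlTree {
    final String name;                                  // element name, e.g. "assets"
    final String text;                                  // text directly inside this element, trimmed
    final List<XmlTree> children = new ArrayList<>();   // child elements, in document order

    // Builds this node and, recursively, one XmlTree per child element,
    // so the object tree has the same shape as the XML document.
    XmlTree(Element e) {
        name = e.getTagName();
        StringBuilder direct = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                children.add(new XmlTree((Element) c));
            } else if (c.getNodeType() == Node.TEXT_NODE) {
                direct.append(c.getNodeValue());
            }
        }
        text = direct.toString().trim();
    }

    // Parses XML text with the JDK's DOM parser and converts it to an XmlTree.
    static XmlTree parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return new XmlTree(root);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Returns the first child element with the given name, or null if none exists.
    XmlTree child(String childName) {
        for (XmlTree c : children) {
            if (c.name.equals(childName)) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical financial report fragment.
        XmlTree report = parse(
            "<report>"
          + "<assets><cash>1200</cash><inventory>300</inventory></assets>"
          + "<revenues><sales>5000</sales></revenues>"
          + "</report>");
        // The position in the tree, not the position on a page, identifies the value.
        System.out.println(report.child("assets").child("cash").text);
    }
}
```

A generic tree like this preserves the hierarchy but leaves every value as an untyped string, so the application still has to know which paths to look up and how to convert them.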

The question then becomes, “Can we write a single program module that will work for all XML documents and all application programs, regardless of the complexity of the XML?” In other words, can we write software, reusable across many programs and all XML documents, that interprets a document and makes its data available in a hierarchical tree of variables internal to the application program?

The answer to these questions is yes, and in the blog posts that follow I will show how it can be done in any programming environment that supports object-oriented design. The solution applies three common design patterns that are standard across the object-oriented design universe and can be implemented in Java, C++, C#, various scripting languages, or any other object-oriented language. This simple but powerful solution and the patterns it uses will be detailed in the posts that immediately follow this one.
