Friday, January 30, 2009

A Universal XML Software Pattern, Part IV

There already exists a framework for inserting XML in programming languages such as Java and C++. It is known as DOM (Document Object Model). DOM and its variations, however, work at a very low level that assists reading of the document into our program but which proves insufficient to convert the XML data into objects that are meaningful to our applications. Each component of the XML document is parsed by DOM and turned into an object of the type “Element.” Since the Element class has no relationship to the application programs that are being developed, it offers no effective “methods” or activities that would help our programs process information. To fully utilize XML, DOM Element objects must be converted into a Composite pattern made up of objects that are both an integral part of our program and a functional component of a universal tree data structure.

The DOM model is not really a true reflection of the XML document. The XML document is made up of objects, arranged in a hierarchy, that belong to different classes or categories with different meanings based upon the object’s membership in those classes or categories. XML may actually format each of these objects into syntactical units called elements, reflecting the DOM model, but the different elements of XML actually have separate classes with class behaviors and meanings based upon the element’s tag name. Syntactically, the DOM programming model may be a reflection of the XML document, but semantically, representing the true meaning of the data, the Element objects of DOM are empty and need the object-oriented classes of our application programs to give them life. The DOM model works for all XML because it does not involve itself in the meaning of the data, only the syntax. Our Composite framework, on the other hand, must allow for the giving of meaning to the information.

The actual building phase of our Composite pattern involves converting a tree of primitive Element objects into a tree of objects that are meaningful citizens of our application programs.
This, in a nutshell, is our challenge: how do we best design a framework that allows our objects in the Composite tree to have the meaning that our programs need to solve particular problems while still requiring them all to have the common interface that is necessary to organize them into a Composite tree? We need to design a way to allow classes of objects to be organized into a Composite tree without requiring each of these perhaps thousands of classes of objects to have the Composite pattern interface.

To support the Composite tree, each object in the hierarchy needs to have the interface designed in the previous post here, but, to give our Composite tree a universal application, we need to be able to have it used with objects that know nothing about the hierarchy they are being assembled into and which therefore do not support the Composite pattern’s interface. As a review, this interface, which we will hereafter refer to as the TreeNode interface, has the following methods.
  • getParentNode
  • getChildrenNodes
To resolve the conflict between universality and the very specific requirements of the Composite tree we again resort to the standard patterns that exist in the world of object-oriented programmers. Specifically, the Adapter pattern, with some modification, will be used to wrap Composite-naïve objects into “wrapper” objects that support the TreeNode interface. In the following post, this design pattern will be further elaborated upon as we continue to design a universal Composite framework that can be used with any XML document.
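
As a rough preview of the wrapping idea, here is a minimal Java sketch. It is not the modified Adapter design that the following post develops; the `NodeAdapter` and `Account` classes, and the choice of return types for the two TreeNode methods, are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// The TreeNode interface reviewed above; the return types are assumed.
interface TreeNode {
    TreeNode getParentNode();            // null when the node is the root
    List<TreeNode> getChildrenNodes();   // empty when the node is a leaf
}

// A Composite-naive application class: it knows nothing about trees.
// (Hypothetical example class.)
class Account {
    private final String name;
    private final double balance;

    Account(String name, double balance) {
        this.name = name;
        this.balance = balance;
    }

    String getName() { return name; }
    double getBalance() { return balance; }
}

// Hypothetical wrapper: it supplies the TreeNode interface on behalf of
// whatever object it wraps, so the wrapped class stays untouched.
class NodeAdapter<T> implements TreeNode {
    private final T wrapped;
    private NodeAdapter<?> parent;
    private final List<TreeNode> children = new ArrayList<>();

    NodeAdapter(T wrapped) { this.wrapped = wrapped; }

    T getWrapped() { return wrapped; }

    void addChild(NodeAdapter<?> child) {
        child.parent = this;
        children.add(child);
    }

    @Override public TreeNode getParentNode() { return parent; }
    @Override public List<TreeNode> getChildrenNodes() { return children; }
}
```

In this sketch the `Account` class never implements TreeNode; only the wrapper does, which is the property we need if thousands of application classes are to join the tree without being rewritten.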

A Universal XML Software Pattern, Part III

The creation of a universal XML framework must begin with the Composite pattern. The Composite pattern, the representation of a part-whole hierarchy as a tree structure, is such a powerful model of the many objects that make up our world that it needs to be made into a framework of its own – a Composite framework that can be used to model many other things in addition to XML. As a universal framework, we want our Composite pattern to be independent of XML or any other medium of communication. We are not simply solving a particular problem; we are going to convert the Composite pattern into true intellectual capital that can actually increase the value of an IT department.

The Composite pattern is worthy of the effort that goes into a universal framework. As a programming representation of a hierarchical tree structure it is the most fundamental configuration of data in information science. The fact that it is an isomorphic fit to the hierarchical structure that defines XML attests to the universal nature of its utility to the study of information in general.

In later posts to this blog, it will be shown how all of the information found in the financial data now produced by accountants can be expressed simply as a Composite tree that has the ability to easily produce and display the standard financial reports that are used to measure the components of our economy and has the power to vastly expand on the information provided in those reports.

Our framework design begins by defining the classes of objects that interact to support our Composite tree framework. Objects interact by offering each other a set of useful activities. One object calls on one of another object’s activities in order to perform steps in the process that is the application program. The definition of each object is made up of the activities that it can perform for the asking or “calling” object. This definition is known as the object’s interface. A certain class of objects (e.g. a class of persons, accounts, departments, etc.) has a common interface that is shared by all of the objects of that class, allowing us to define a whole population of objects that behave similarly according to this common interface.

How many class interfaces need to be defined for the objects that make up our Composite framework and how complicated do their interfaces have to be? The answer is that there only needs to be one interface defined for this framework and that interface is a very simple one. Every object in the Composite tree is a node that points at its neighboring objects as is shown in the diagram above. Each node points at a single parent node higher than itself in the hierarchy and any number of child nodes that are immediately below in the hierarchy. To facilitate the program’s navigation throughout the Composite tree, each node has to provide access to these neighboring objects for the parts of the program that “call” its interface. Therefore, each object needs to provide the activities (or, in programming parlance, “methods”) that give access to its parent and children for a part of the program that “calls” it. Logically, we could name these activities as follows:
  • getParentNode
  • getChildrenNodes
Because of the inherent nature of tree nodes, we can determine that a Composite tree can be fully navigated and utilized when it is populated by objects that satisfy this simple interface. The “getParentNode” method allows us to navigate up the tree by recursively asking each node to give us its parent node. Similarly, we can navigate down the tree by calling each node object’s “getChildrenNodes” method.
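
The two-method interface and the up and down navigation it permits can be sketched in Java as follows. Only the method names come from the text above; the return types, the `SimpleNode` class, and the `TreeWalker` helper are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The single interface every node in the Composite tree must satisfy.
interface TreeNode {
    TreeNode getParentNode();            // null when the node is the root
    List<TreeNode> getChildrenNodes();   // empty when the node is a leaf
}

// A bare-bones node used only to exercise the interface (hypothetical).
class SimpleNode implements TreeNode {
    private final String name;
    private TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();

    SimpleNode(String name) { this.name = name; }

    String getName() { return name; }

    SimpleNode addChild(SimpleNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    @Override public TreeNode getParentNode() { return parent; }

    @Override public List<TreeNode> getChildrenNodes() {
        return Collections.unmodifiableList(children);
    }
}

class TreeWalker {
    // Navigate up: keep asking for the parent until there is none.
    static TreeNode findRoot(TreeNode node) {
        TreeNode parent = node.getParentNode();
        return parent == null ? node : findRoot(parent);
    }

    // Navigate down: count this node plus every node beneath it.
    static int countNodes(TreeNode node) {
        int count = 1;
        for (TreeNode child : node.getChildrenNodes()) {
            count += countNodes(child);
        }
        return count;
    }
}
```

Both helpers touch nothing but the two interface methods, which suggests that any traversal we need later can be written once against TreeNode and reused for every tree.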

This small simple interface essentially provides us with a Composite framework. The interface can be satisfied by defining classes of objects that all fulfill this interface, and, as we will show in following posts, we can generate a single class of objects that satisfies this interface universally by using the Adapter pattern.

Tuesday, January 27, 2009

A Universal XML Software Pattern, Part II

The Composite pattern in object-oriented design is a natural fit for implementing an XML document into the variables of an application program. According to the “gang of four” canonical definition of design patterns, the Composite pattern is used to “Compose objects into tree structures to represent part-whole hierarchies.” (1) This pattern does for the variables of an application program exactly what XML does for the data in a document – it structures it into a hierarchical tree wherein the children nodes of the tree are components or parts of their parent node.

The above diagram shows two application programs importing the data from an XML document and storing it internally as a Composite pattern tree. The program on the left has a structure of internal variables that is a direct reflection of an XML document. The program on the right is presenting the same XML document in a graphical user interface using a graphical tree (in a manner similar to the way the file system of a computer is displayed graphically to the user in the Windows Explorer).

The fit of the Composite pattern to the elements of an XML document is apparent and intuitive at first glance. Each element of an XML document is a whole that can be decomposed into parts (until we reach the most fundamental elements which we refer to as the “leaves of our tree”). This is exactly how the Composite pattern works. Each node in the Composite tree includes a set of pointers to its children (again excepting the lowest level “leaves” of the Composite tree). In the Composite pattern, each node in the tree can be treated as a whole that is made up of its children parts and this is exactly how the elements of an XML document work. Put simply, there is a one-to-one isomorphic correspondence between the elements of an XML document and the nodes of a tree in the program’s Composite pattern.

Our Universal XML Pattern starts with the storing of the XML elements with a Composite pattern. This is the easy and intuitive part. The greater challenge will be in bridging the gap from an incoming stream of XML data to a fully structured Composite tree within the application program and doing so in a manner that works for all XML documents.

The following posts to this blog will show how a short recursive algorithm can read through the elements of any XML document and generate a tree-like structure internal to an application program. Using simple variations of the Factory Method and Adapter patterns, we can enable any application program to universally convert any XML document into a Composite pattern of program variables.
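
To give a feel for how short such a recursive algorithm can be, here is a minimal Java sketch built on the standard DOM parser that ships with the JDK. It is not the Factory Method and Adapter design promised for later posts: the `XmlNode` and `CompositeBuilder` names are hypothetical, and the sketch copies only tag names, ignoring attributes and text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical composite node: one is created per XML element.
class XmlNode {
    final String tag;
    final XmlNode parent;                          // null for the root
    final List<XmlNode> children = new ArrayList<>();

    XmlNode(String tag, XmlNode parent) {
        this.tag = tag;
        this.parent = parent;
    }
}

class CompositeBuilder {
    // Recursively mirror a DOM element and all of its element children.
    static XmlNode build(Element element, XmlNode parent) {
        XmlNode node = new XmlNode(element.getTagName(), parent);
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            // Text and comment nodes are skipped in this sketch.
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                node.children.add(build((Element) kid, node));
            }
        }
        return node;
    }

    // Parse an XML string and return the root of the mirrored tree.
    static XmlNode parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return build(doc.getDocumentElement(), null);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Calling `CompositeBuilder.parse` on any well-formed document returns a root whose `children` lists mirror the nesting of the elements, whatever the tag names happen to be.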

(1) Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software (Reading, Massachusetts: Addison-Wesley Publishing Company: 1995) p. 163.

Monday, January 26, 2009

A Universal XML Software Pattern

XML is essentially a language used to define complex data by decomposing it into meaningful parts and then arranging the parts in a hierarchical structure. This hierarchical structure is what computer scientists refer to as a “tree.” For example, as shown in the above diagram, a book could be defined as a bunch of data that is composed of chapters, with the chapters, in turn, composed of paragraphs. The complete book could be defined by further decomposing the paragraphs, sentences, and phrases into words.
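
The book example can be made concrete with a small Java sketch. The sample document and its tag names (`book`, `chapter`, `paragraph`) are assumptions chosen to match the description above, and the `BookOutline` class is hypothetical; it parses the document with the JDK's DOM parser and returns the hierarchy as an indented outline.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class BookOutline {
    // Hypothetical sample document; the tag names are illustrative only.
    static final String BOOK_XML =
          "<book>"
        + "<chapter><paragraph>First words.</paragraph><paragraph>More words.</paragraph></chapter>"
        + "<chapter><paragraph>Closing words.</paragraph></chapter>"
        + "</book>";

    // Returns one line per element, indented two spaces per level of nesting.
    static String outline(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            append(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void append(Element element, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(element.getTagName()).append('\n');
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                append((Element) kid, depth + 1, out);
            }
        }
    }
}
```

Printing `BookOutline.outline(BookOutline.BOOK_XML)` shows the book at the top, its chapters one level in, and each chapter's paragraphs one level further, which is the tree that the XML tags describe.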

While this example is relatively simple, XML is more usefully applied to defining complex numerical data that is being transmitted between various application programs. The XML allows each application program to interpret the meaning of the data without ambiguity. Financial reports, such as balance sheets and income statements, can be decomposed into parts that represent the company’s assets, expenses, and revenues. Each of these parts could be further broken down into parts that represent various categories of those items. The real value of this use of XML is that, while humans can interpret the items in a financial report by their relative positions on paper, a computer program requires the exacting definition that can only be provided by a hierarchical structuring like that provided by XML.

So, the key to utilizing XML is to take the unambiguous data interpretation that is provided by the hierarchical tree structure of XML and convert it into a parallel hierarchical tree structure in the application programs’ native languages (e.g. Java or C++). The XML data needs to be applied to variables inside the program, and, because the tree structure represented by the XML document represents the ultimate meaning of the data, the program will need to at least begin its interpretation of the XML document with its internal variables arranged in a hierarchical structure that parallels the XML document.

The question then becomes, “Can we write a single program module that will work for all XML documents and all application programs, regardless of the complexity of the XML?” Can we write software that is reusable across many programs and all XML documents that will interpret the document and make its data available in a hierarchical tree of variables internal to the application programs?

The answer to these questions is a positive one, and in following blog posts here I will show how it can be done in any programming environment that supports object-oriented design. The solution will involve the application of three common design patterns that are standard across the object-oriented design universe and can be implemented in Java, C++, C#, various scripting languages, or any other object-oriented language. This simple but powerful solution and the patterns that it uses will be detailed in the posts that immediately follow this one.

Wednesday, August 13, 2008

2. XML and Web 2.0

In the previous post, XML was introduced as a language that allowed computer programs to determine the meaning of data. While humans can interpret data based upon some general context, such as its position on a page or how it is used in a sentence, a computer program interprets data by using the unambiguous structure of a markup language such as XML.

How significant is the use of XML in peer-to-peer communication between computer programs? It is revolutionizing the internet. When you hear terms such as AJAX, REST, Web 2.0, news feeds, RSS, and ATOM, you are basically talking about using the internet to allow isolated computer programs to access remote data, interpret it, and use the data as their fodder.

A computer program can scan thousands of items from the internet, all formatted in a well-defined XML format, and mash the data up into a single meaningful report that it delivers to its owner. A program working for the Securities and Exchange Commission is able to read thousands of earnings reports, all formatted in XBRL (an XML format for financial reports), and use artificial intelligence techniques to determine which companies are financially stressed. A busy executive could have his own tailor-made search engine that scans and flags internet news articles that are relevant to his marketing strategies. By using the power of the computer program, users are able to magnify the amount of information that they can use from the internet.

HTML-reading browsers made the first generation of the web accessible to the average human, but its power was limited by the data volume that could be consumed by the human eye and brain. Web 2.0, on the other hand, is making the web more accessible to the machine, allowing the machine to consume orders of magnitude more data.

Web 2.0 and XML are providing the ultimate consumer of data, the human, with the ability to utilize huge volumes of data that are first interpreted and summarized by computer programs.

Monday, July 14, 2008

1. XML for Business Intelligence

Much sooner than later, every company will have to do business the XML way, or it won’t do business at all.
Robert H. Herz, E. Mary Keegan, and David M. H. Phillips, The Value Reporting Revolution

XML (an acronym for “eXtensible Markup Language”) is not a programming language, and, despite its name, it is not just a language for “marking up” documents. XML is a way of expressing the meaning of information – XML provides us with an unambiguous way of expressing the semantics of data that we can communicate to another person or machine.

It is the ability of XML to communicate meaning to a computer, however, that makes it a revolution whose time is inevitable. Human readers have the ability to derive meaning for data from the position or format of a report. Columns represent categories, rows allow the data to be linked to a person or object, and positions within sentences provide us with textual context for the data. However, machines have great difficulty recognizing visual patterns and require a “markup” language to give data categories, textual context, and the other dimensions of meaning.

XML simply provides us with the ability to attach meaningful tags to pieces of information, allowing a machine that reads that information to detect the critical information. A machine that is programmed to read income summaries from financial statements can simply scan the document for the “income” tag and read the numbers inside the tag, allowing the machine to find thousands of corporate incomes within minutes.
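
A minimal Java sketch of that scan might look like the following. The `income` tag name comes from the example above, but the surrounding document layout and the `IncomeScanner` class are hypothetical; real financial filings use much richer vocabularies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class IncomeScanner {
    // Returns the numeric content of every <income> element in the document,
    // in document order, no matter how deeply each one is nested.
    static List<Double> findIncomes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList tags = doc.getElementsByTagName("income");
            List<Double> incomes = new ArrayList<>();
            for (int i = 0; i < tags.getLength(); i++) {
                incomes.add(Double.parseDouble(tags.item(i).getTextContent().trim()));
            }
            return incomes;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read income data", e);
        }
    }
}
```

The program never needs to know where on a page the income figure would have been printed; the tag alone identifies it.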

The intent of this blog is to introduce the reader to the powers of XML in the world of financial information and business intelligence. This is the first post for this blog. In the following posts we will continue to show how the hierarchical structure of XML allows a document to report orders of magnitude more information than was possible before its inception. See Banking the Past, page 180.