Blogger: Joe Maguire
Some people consider master data management (MDM) initiatives to be a sign of failure on the part of IT: If the systems had been designed properly in the first place, we wouldn’t need MDM techniques to clean up the mess. As my expressive young nephew would say, this is wrongedy wrong wrong.
Many (perhaps most) data-integration problems originate outside the realm of IT. A casual scan of the daily newspaper can yield stories about data-integration problems in unexpected places, such as the obituaries or the fine-arts pages.
On April 30 of this year, 90-year-old Venetia Burney Phair died at Banstead, in the county of Surrey, England. She studied mathematics at Newnham College, Cambridge, and went on to become a chartered accountant and a teacher of math and economics. But her greatest accomplishment occurred on March 14, 1930, when as 11-year-old Venetia Burney she proposed that the ninth planet in our solar system be called “Pluto.” For this accomplishment, Mrs. Phair received a substantial New York Times obituary, in the neighborhood of 15 column inches. (Remember this day; it might be the last time you ever encounter the unit “column inches” used by someone who is not a historian discussing the quaint era of printed newspapers.)
By all accounts, Mrs. Phair lived admirably. Pluto, on the other hand, has had a rougher time of it. Pluto received one of the most widely publicized demotions in history. It is our most famous ex-planet. The hapless orb is not even unique: another ex-planet is Ceres, demoted to an asteroid in the middle of the 19th century when astronomers realized that there were many other celestial bodies similar to Ceres.
By demoting Pluto on August 24, 2006, the International Astronomical Union created a far-reaching data-integration problem. Textbooks, almanacs, and educational posters would have to be reprinted. Drop-down lists of the planets would have to be adjusted. Board games about the solar system would become outdated.
(On the plus side, Gustav Holst’s seven-movement orchestral suite “The Planets” would once again have one movement for each non-Earth planet in our solar system. Holst wrote the suite in the early 20th century, before Pluto was discovered but well after Ceres had been demoted. Why only seven movements? Holst excluded Earth because his metaphorical motivation was astrology, not astronomy.)
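To make the drop-down-list problem concrete, here is a minimal Python sketch, assuming a hypothetical `Planet` enumeration that used to include Pluto. Removing the value from the type is the easy part; the harder part is deciding what to do with records that already store the retired value. The names `Planet`, `RECLASSIFIED`, and `migrate_body` are illustrative only, and the mapping stands in for whatever decision a data-governance process would actually make.

```python
from enum import Enum


class Planet(Enum):
    """Post-2006 planet list: the enumerated type has lost the value PLUTO."""
    MERCURY = "Mercury"
    VENUS = "Venus"
    EARTH = "Earth"
    MARS = "Mars"
    JUPITER = "Jupiter"
    SATURN = "Saturn"
    URANUS = "Uranus"
    NEPTUNE = "Neptune"


# Hypothetical governance decision: retired values map to a new category.
RECLASSIFIED = {"Pluto": "dwarf planet"}


def migrate_body(stored_value: str) -> tuple[str, str]:
    """Classify a stored name as (category, name), rejecting unknown values."""
    try:
        return ("planet", Planet(stored_value).value)
    except ValueError:
        # The stored value is no longer a member of the enumeration.
        if stored_value in RECLASSIFIED:
            return (RECLASSIFIED[stored_value], stored_value)
        raise ValueError(f"unrecognized body: {stored_value!r}") from None


# The drop-down list is now derived from the type, so it shrinks automatically.
dropdown = [p.value for p in Planet]

legacy_rows = ["Earth", "Pluto", "Neptune"]
print([migrate_body(v) for v in legacy_rows])
# [('planet', 'Earth'), ('dwarf planet', 'Pluto'), ('planet', 'Neptune')]
```

Deriving the drop-down from the type handles new data; the explicit mapping handles old data, and anything the mapping does not cover fails loudly rather than slipping through.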
In demoting Pluto, astronomers were acting reasonably—doing what is right for their community of expertise. Their motives were pure and utterly blameless. IT personnel who are inconvenienced by the resulting data-integration problem have no cause to complain.
In fact, IT personnel should expect such disruptions. Here’s another one: The Antoinette Perry Awards for Excellence in Theatre, more commonly known as the Tony Awards, have created an interesting problem with the nominees for the 2009 award for Best Performance by a Leading Actor in a Musical. There are five nominations:
- David Alvarez, Trent Kowalik, and Kiril Kulish in Billy Elliot the Musical as “Billy Elliot.”
- Gavin Creel in Hair as “Claude.”
- Brian d’Arcy James in Shrek the Musical as “Shrek.”
- Constantine Maroulis in Rock of Ages as “Drew.”
- J. Robert Spencer in Next to Normal as “Dan.”
The problem occurs at the top of the list. Three actors starring as Billy Elliot share the arduous, eight-shows-a-week load by alternating performances. (A fourth actor, Stephen Hanna, plays the role of “Billy’s Older Self” in every show. Hanna is not included in the nomination because his part is considered a distinct role.)
The three-way shared nomination causes all sorts of problems. Naturally, any structured rendering of the data that expects each nomination to consist of a single nominee will have to be redesigned.
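Here is a minimal sketch of that redesign, assuming a hypothetical nomination record modeled with Python dataclasses. The original structure holds exactly one nominee; the revised one holds a tuple of nominees. Class, field, and function names are illustrative, not drawn from any real awards system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NominationV1:
    """Original design: maximum cardinality of nominees is one."""
    category: str
    show: str
    role: str
    nominee: str


@dataclass(frozen=True)
class NominationV2:
    """Revised design: maximum cardinality of nominees is many."""
    category: str
    show: str
    role: str
    nominees: tuple[str, ...]

    def __post_init__(self) -> None:
        # The minimum cardinality is still one: a nomination needs a nominee.
        if not self.nominees:
            raise ValueError("a nomination needs at least one nominee")


def upgrade(old: NominationV1) -> NominationV2:
    """Migrate a single-nominee record by wrapping it in a one-element tuple."""
    return NominationV2(old.category, old.show, old.role, (old.nominee,))


CATEGORY = "Best Performance by a Leading Actor in a Musical"

billy = NominationV2(
    CATEGORY,
    "Billy Elliot the Musical",
    "Billy Elliot",
    ("David Alvarez", "Trent Kowalik", "Kiril Kulish"),
)
shrek = upgrade(NominationV1(CATEGORY, "Shrek the Musical", "Shrek", "Brian d'Arcy James"))
print(len(billy.nominees), shrek.nominees)  # 3 ("Brian d'Arcy James",)
```

Migrating existing rows is mechanical; the real cost is that every consumer that read `.nominee` now has to change.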
The disruption extends far beyond the storage structures. Formatted presentations (e.g., reports and forms) about nominees will also need to be redesigned.
Even business policies and procedures can be affected. In previous years, each Tony-award voter could cast a ballot in a particular category only if he or she had seen all the shows. Applied to the three-way nomination for Billy Elliot, the business rule is ambiguous. Are potential voters required to attend five performances (one for each nomination) or seven (one for each nominee)? The answer, according to the Tony Awards administration committee, is five.
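The ambiguity can be made explicit in code. This sketch, assuming hypothetical data keyed by show name (which works here only because each show has a single nomination in this category), counts the performances a voter must attend under each reading of the rule. The committee's ruling corresponds to the `"nomination"` reading.

```python
# Hypothetical data: each show has exactly one nomination in this category,
# so the show name can serve as the nomination key.
NOMINATIONS: dict[str, list[str]] = {
    "Billy Elliot the Musical": ["David Alvarez", "Trent Kowalik", "Kiril Kulish"],
    "Hair": ["Gavin Creel"],
    "Shrek the Musical": ["Brian d'Arcy James"],
    "Rock of Ages": ["Constantine Maroulis"],
    "Next to Normal": ["J. Robert Spencer"],
}


def required_performances(nominations: dict[str, list[str]], per: str) -> int:
    """Count performances a voter must see under one reading of the rule."""
    if per == "nomination":
        # One performance per nomination, whichever Billy happens to be on.
        return len(nominations)
    if per == "nominee":
        # The Billy Elliot performers alternate, so each needs his own performance.
        return sum(len(names) for names in nominations.values())
    raise ValueError(f"unknown interpretation: {per!r}")


print(required_performances(NOMINATIONS, "nomination"))  # 5
print(required_performances(NOMINATIONS, "nominee"))     # 7
```

Before the shared nomination, both branches returned the same number, so nobody had to decide which one the rule meant.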
That answer might seem strange, but that is immaterial to this discussion. What’s important here is the need to ask the question. Such questions—questions that clarify seemingly well-understood phenomena for which software has already been deployed—are commonplace. Data-integration problems are not exceptional; they are the norm.
In acknowledgement of this state of affairs, software engineers are often encouraged to “plan for change.” That can mean a lot of things.
One way to plan for change is to accurately predict the future. Here, the state of the art is immature. Astrology might suffice as a metaphorical framework for orchestral suites, but it’s not so good for anticipating changing business conditions. Other approaches are needed.
One result of the desire to plan for change is a familiar set of software-engineering principles: modularization, information hiding, separation of concerns, generalization, and abstraction, among others. Software designs can be judged on how well or poorly they honor these principles, and engineers honor them by writing code that is appropriately modular, generalized, and abstract.
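As one illustration of information hiding, suppose report code asks a nomination object for its credit line instead of reading a nominee field directly. Then a change like the Billy Elliot problem stays inside one class. This is a minimal sketch; the `Nomination` class, its `credit_line` method, and the `report` helper are hypothetical.

```python
class Nomination:
    """A nomination whose nominee storage is an internal detail."""

    def __init__(self, show: str, role: str, *nominees: str) -> None:
        if not nominees:
            raise ValueError("at least one nominee is required")
        self._show = show
        self._role = role
        self._nominees = nominees  # a tuple; callers never read it directly

    def credit_line(self) -> str:
        """Format the nominees for a report, hiding how many there are."""
        if len(self._nominees) == 1:
            who = self._nominees[0]
        elif len(self._nominees) == 2:
            who = " and ".join(self._nominees)
        else:
            who = ", ".join(self._nominees[:-1]) + ", and " + self._nominees[-1]
        return f'{who} in {self._show} as "{self._role}"'


def report(nominations: list[Nomination]) -> str:
    """Presentation code that depends only on credit_line()."""
    return "\n".join(f"- {n.credit_line()}" for n in nominations)


print(report([
    Nomination("Billy Elliot the Musical", "Billy Elliot",
               "David Alvarez", "Trent Kowalik", "Kiril Kulish"),
    Nomination("Hair", "Claude", "Gavin Creel"),
]))
```

This does not make the data change free. It confines it: the storage inside `Nomination` still changed, but `report` did not have to.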
Beyond these principles, there is another way to plan for change: to expect change and to have in-place processes and techniques for coping with it. These processes will include data governance; the techniques will include master data management.
In the absence of reliable seers, time machines, and other ways to peer into the future, the best approach is to classify the various potential changes to data requirements into abstract patterns, and to develop techniques that can apply to those patterns. For example, two such patterns follow:
- Pattern name: A maximum cardinality constraint changes from “One” to “Many.”
- Pattern name: An enumerated type loses a value.
These patterns can be a linchpin of data-governance and MDM initiatives. Data-governance processes can include decision points based on the type of problem pattern encountered, and MDM solutions can be planned based on the kind of solution pattern required. The two patterns listed above correspond to “the Billy Elliot problem” and “the Pluto problem,” respectively.
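Here is a rough sketch of how a governance process might key its decision points on the pattern. It assumes a hypothetical checklist per pattern. The pattern names come from the list above. The questions are illustrative, drawn from the disruptions described earlier in this post, and are not a prescribed methodology.

```python
from enum import Enum


class ChangePattern(Enum):
    CARDINALITY_ONE_TO_MANY = "A maximum cardinality constraint changes from 'One' to 'Many'"
    ENUM_LOSES_VALUE = "An enumerated type loses a value"


NICKNAMES = {
    ChangePattern.CARDINALITY_ONE_TO_MANY: "the Billy Elliot problem",
    ChangePattern.ENUM_LOSES_VALUE: "the Pluto problem",
}

# Hypothetical decision-point questions for each pattern.
CHECKLIST = {
    ChangePattern.CARDINALITY_ONE_TO_MANY: (
        "Which storage structures assume a single value?",
        "Which reports and forms display the value?",
        "Which business rules count the value?",
    ),
    ChangePattern.ENUM_LOSES_VALUE: (
        "Which stored records hold the retired value?",
        "Which drop-down lists still offer it?",
        "Which printed or published materials list it?",
    ),
}


def triage(pattern: ChangePattern) -> list[str]:
    """Return the questions a data steward should answer for this pattern."""
    return [f"[{NICKNAMES[pattern]}] {q}" for q in CHECKLIST[pattern]]


for line in triage(ChangePattern.ENUM_LOSES_VALUE):
    print(line)
```

The value of the catalogue is that a new incident only has to be classified; the questions to ask next are already written down.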
Allow me to predict the future: Over the next few months, I’ll be contemplating such patterns as they apply to data-governance and MDM initiatives. I’m especially looking forward to describing some of these patterns at a day-long workshop on Master Data Management to be presented at Catalyst 2009 by me and my colleague Joe Bugajski.