Data management

July 09, 2009

Does IT manage systems, or manage data?

Blogger: Lyn Robison

The traditional view is that enterprise IT manages systems. The data is the business’s concern. The only thing that IT people do with data is define the structure of data in databases, which means that the only work IT people do with data is to design the shape of the buckets that the businesspeople pour business data into.

Smart enterprises are realizing that someone needs to manage the data within these systems: not merely the schema, but the data itself (the rows, not just the columns). But who does that? My Burton Group colleague Noreen Kendle asks a great question in her previous blog post: Is Data Management an IT Function?

Every large enterprise needs a data management organization (DMO) that is populated with people who are techie enough to manage data, yet not so nerdy that they refuse to sully their hands by touching data. The DMO must include a healthy mix of tech-savvy businesspeople along with techie-but-open-minded IT people. Noreen’s question is whether that DMO should report to the business leadership or to IT leadership.

Technology is becoming a commodity. IT is being consumerized and commoditized, and enterprise IT departments are finding themselves competing against providers of IT externalization (outsourcing, SaaS, cloud computing, etc.). Those IT departments whose bread and butter is managing and implementing technology will sooner or later find that they are peddling artisan wares in a highly competitive commodity market. The enterprise IT department’s custom, handcrafted IT systems will be expensive and of poor quality in comparison to the mass-produced information technology that businesspeople will be able to buy for a few shekels from electronics retailers and cloud computing providers.

So, IMHO, the answer to the question of whether the DMO should report to the business leadership or to IT leadership depends on whether or not the IT leadership has a clue. If they have no clue, IOW if the IT leadership believes that IT's primary job is to manage technology and systems, then the DMO should report to the business leadership. If the DMO is part of the business, when IT commoditization and IT externalization eventually banish the IT department’s technophiles from the enterprise, the DMO won’t be banished along with them. The DMO will continue to function within the business, no matter what happens to the nerds in IT.

If the IT leadership does have a clue, if the enterprise has a CIO who realizes that they are a CIO instead of a CTO, then the DMO could report to IT leadership and be effective in its mission of managing the enterprise’s data. The DMO will include people who manage structured data, documents, content, information security, identity data, and the delivery of useful business information to businesspeople, and that DMO will become the enterprise IT department. From what I can see, the techies in the IT department will by and large have to go to work for cloud providers, or will be obliged to find new careers.

The bottom line is that enterprises will no longer be able to afford enterprise IT’s artisan technology and systems when the off-the-shelf, mass-produced technology and systems begin to offer cheaper and more effective alternatives. If the enterprise IT department survives at all, its primary task will be data management, because the technology will be an outsourced commodity. So, as a technophile myself, I can say that it’s been fun to tinker with technology, but now the time is right for enterprise IT people to learn to manage data.

July 06, 2009

Is Data Management an Information Technology (IT) Function?

A few years ago I would not have questioned data management’s fit as an IT function. As data continues to gain recognition through initiatives such as master data management, service-oriented architecture, and business intelligence, IT departments have been challenged with understanding, managing, and governing the data. Many of my fellow data professionals who have been involved with data ownership/governance initiatives have expressed the difficulty of getting the business to take ownership of the data. This is one of several reasons I have begun to question IT’s role in data management. Why is it IT’s job to “get” the business to own what it already owns? There is something backwards about that.

Data is a representation of the organization. The organization uses this representation, the data, to operate, record, manage, report, and plan. Organizations were creating and using data long before computers were ever thought of. Data is clearly a business asset, not an IT asset like hardware or software. Prior to computerization, the business owned, managed, understood, and governed its data assets.

Information technology is about effectively applying technology to the organization’s data/information assets in order to help the organization reach its goals. Just as machine technology can be applied to a manufacturing process, information technology can be applied to a business process. The products that flow through the machine technology are never considered part of that technology or the responsibility of the machine technologist. Products are considered a business-owned asset. So why have data and its responsibilities become an IT function? Data is clearly a business asset, and most likely one of the business’s most important assets. It’s becoming clear to me that the responsibilities for data belong in a business department, not IT. The functions that apply technology to the data asset, such as running a DBMS, should remain the responsibility of an IT department.

June 28, 2009

Data Management and Cloud Computing

Blogger: Lyn Robison

The graphic below is from my esteemed Burton Group colleague Dan Blum’s upcoming Catalyst presentation on Cloud computing.

As he explains in a recent blog post: “As we move from left to right in the diagram and put more and more control in the hands of the service providers, the outlook shifts from fair weather green to ominous red.”

[Graphic: CloudAndData]

The far-left column shows in green that a traditional enterprise IT department controls the entire technology stack with only the network shared with a service provider (because of the Internet). The next column shows that with server hosting providers, the organization shares control of the server, storage, and network functions.

Dan explains in his blog, “As we move from Infrastructure-as-a-Service (IaaS) with its line of demarcation in the server where the silicon stops, to Platform-as-a-Service (PaaS) where you cross the line after your code and applications are integrated with outside components, to Software-as-a-Service (SaaS) where you abandon all control when you hand over your data I paint the functions these services control an alarming red.”

This graphic illustrates that as cloud computing alters the IT landscape, data is the only thing that organizations maintain any control over. Ironically, most enterprises lack any formal data management function. IT people tend to think that their job is to manage technology and systems, yet data (not technology) is something that enterprises must manage as cloud computing becomes prevalent.

As cloud computing gets adopted, those enterprise IT people who think that their job is merely to manage technology and systems will find themselves no longer working in enterprise IT; they will be forced to go to work for, or to compete against, cloud providers.

The Information Management track in the upcoming Catalyst conference will provide guidance for managing enterprise data, which is important because, as this graphic illustrates, data management might become the primary task of enterprise IT in the future.

June 23, 2009

A Data Management Freebie

Blogger: Lyn Robison

It’s not often that you get something for nothing, especially something valuable like innovation in silo bridging for large enterprises.

Guidance on overcoming the problem of data silos is particularly valuable because:

  • Data silos are a permanent fixture in modern enterprises -- silos exist because of organizational boundaries and because of the boundaries of information systems, applications, and databases.

  • Data silos prevent businesspeople from getting the information they need to make informed decisions and do their jobs. You can see examples here and here.

  • Efforts at silo busting, where silos are eliminated using SOA or enterprise-wide applications, are risky and expensive and usually don’t succeed.

  • Silo bridging instead of silo busting is the only sensible strategy.

The best way that I know of to bridge silos is to use MODS, the Methodology for Overcoming Data Silos. I am doing a free webcast on MODS. It is not magic, but it is inexpensive, low-risk, and delivers compelling results. You can find out about it here:

You can get the overview of MODS here. You'll need to register to download it, but it's a simple process. I hope that you find this information valuable. Lemme know what you think!

June 22, 2009

Data Integration that can actually Work

Blogger: Lyn Robison

Recently, I watched an interesting documentary about Worldport, the worldwide hub for UPS in Louisville, Kentucky. It is obvious that shipping companies such as UPS have conquered the data integration problem, and offer a vital key for the rest of us.

UPS has multiple computer systems at Worldport, multiple computer systems at each of their regional hubs, and handheld computer systems for each of their drivers. These computer systems are silo-ed at UPS, just like computer systems are silo-ed in any other large enterprise, and as a result, each package enters and leaves many data silos on its journey from its origin to its destination. Yet UPS is able to provide an integrated, 360-degree view of each parcel as it moves through UPS’s shipping lifecycle. How does UPS do it?

One thing they do -- and this is a key for any enterprise that is looking to integrate operational data from silos -- is this: they identify each parcel.

That’s it. That is the big secret. They identify each parcel beyond the bounds of any data silo. They don’t waste hundreds of thousands of dollars trying to eliminate silos by doing SOA. They don’t replace all of their little silos with one big silo by implementing a risk-laden and hugely expensive ERP or CRM system. They simply identify each parcel. They give each parcel a tracking number by which it is known within all of the IT applications, information systems, and databases throughout UPS. Because each parcel is known in all information systems by its tracking number, UPS can pull together information about each parcel from all of their data silos, on demand.

Assigning each parcel a unique identifier is no doubt cheaper and a lot more effective than implementing SOA or a CRM system. We ought to do that in enterprise IT. We could give a unique identifier to each thing that we want to keep track of: each customer, each product, each supplier, each policy, each asset, each employee, each project, each decision, each work activity, etc.

If you knew and if everyone in the enterprise knew that every system that had any information about any of these individual things would reveal that information based on that thing’s identifier, data integration could almost be easy. Okay, maybe not easy, but certainly easier.
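Here is a minimal Python sketch of that idea. Everything in it is hypothetical: the three silo dictionaries, their field names, and the identifier "PKG-0001" are made up for illustration, and real silos would be separate databases or services rather than in-memory dictionaries. The point is only that a shared identifier lets you assemble a cross-silo view on demand without merging the silos.

```python
# Hypothetical silos: each system keeps its own data in its own shape,
# but every record is keyed by the same parcel identifier.
hub_scans = {"PKG-0001": {"last_hub": "Louisville", "sorted": True}}
billing = {"PKG-0001": {"service": "overnight", "paid": True}}
driver_handhelds = {"PKG-0002": {"status": "out for delivery"}}

SILOS = {
    "hub": hub_scans,
    "billing": billing,
    "driver": driver_handhelds,
}


def parcel_view(tracking_number, silos=SILOS):
    """Collect whatever each silo knows about one parcel, on demand."""
    view = {}
    for silo_name, records in silos.items():
        record = records.get(tracking_number)
        if record is not None:  # a silo may know nothing about this parcel
            view[silo_name] = record
    return view
```

Calling parcel_view("PKG-0001") returns the hub and billing records side by side, and nothing from the driver silo, which has not seen that parcel. No silo had to be redesigned or replaced; each one only had to answer to the shared identifier.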

It turns out that in data integration, “which one?” is almost more important than “what kind?” Any enterprise that identifies its non-fungible assets with unique identifiers can do silo-bridging instead of silo-busting, and will be better prepared to transition to cloud data management when the time comes.

Identifying important instances of data is one of the pillars of Burton Group’s MODS. Stay tuned for more guidance on MODS at Burton Group’s upcoming Catalyst conference.

BTW, we have a secret discount to Catalyst available to readers of this blog. To get the discount, here's what you do:

1. Go to the Catalyst home page (http://catalyst.burtongroup.com/). Either: click and then drag your mouse off the logo and release the button. OR: roll over the San Diego button but do not click, wait about 20 sec.

2. A message will pop up stating "Congratulations! You’ve found an exclusive discount code for Catalyst 2009. Use promo code: Easter Egg and get General Sessions for only $999! Register today – this discount is limited to 50 users and could disappear at any time!"

3. Register.

That's it. Hope to see you at Catalyst!

June 10, 2009

The Seven Top Data Delusions

The world of data is full of delusions: false beliefs or ideas about data. These are fueled by the mountains of data-related white papers, articles, blogs, and marketing material. If I "google" any data topic, like master data or BI, millions of hits are returned. As I skim through these, nearly all are regurgitations of the last; thus the data delusions continue to grow. It is interesting how much is assumed to be true if we read it in print.

Below are the seven most popular delusions that I continue to see:

Data Delusion One: If the data is there then it must have been deemed good data. There are no secret data police monitoring the data in most organizations. A large percentage of incorrect data lives within the data stores.

Data Delusion Two: If it looks right then it must be. Typically, data is considered "poor quality" when it obviously looks incorrect or is known to be incorrect. Often data can "look" right when it is not. How do you know whether the answer a computer system returns to your question is correct? If you already knew the correct answer, you would not need to ask.

Data Delusion Three: A new tool/technology will fix the data problems. There continues to be a belief that the tools/technology will auto-magically figure out if the data is correct or belongs together. Unfortunately, success is always dependent on the quality of what goes in: garbage in, garbage out is still true.

Data Delusion Four: Data is a computer phenomenon like software or hardware. Many of the definitions support this, but data existed long before computers were ever imagined. Data is a representation of the real-world organization, its things, people, locations, and events. Computers help to automate the processing of data.

Data Delusion Five: "Cleaning" the data fixes it. There is always a reason data becomes corrupted. It does not just magically happen. Data errors or poor-quality data are a symptom of a problem, rarely the problem itself. Fixing a symptom does not fix the problem; it’s like taking an aspirin for a brain tumor.

Data Delusion Six: The data meaning can be deduced from its name/definition. Even in the rare case when a data store has been diligently modeled from a business standpoint and implemented accordingly, the data system deteriorates over time. Many of the data stores in our organizations were never designed or modeled in the first place. The data field names and sparse definitions were often the programmer’s best guess at the time.

Data Delusion Seven: Data can be managed/integrated/cleaned at the level of individual attributes/columns. The data attributes/columns are intended for description purposes. They are relative to what they are describing, as well as to the relationships/dependencies of the things they are describing. When data attributes/columns are taken out of this context and treated individually, they can lose much of their meaning, and thus integrity.

June 08, 2009

Increasing the Mileage of Enterprise IT

Blogger: Lyn Robison

The best way for an enterprise to improve their IT mileage is to get more from their existing systems, and the best way to get more from their existing systems is to do a better job of managing data.

IOW, to get more out of IT, enterprises don’t need to implement new IT systems; all they need to do is manage data more effectively in their existing systems. Competent data management will breathe new life into old systems and will add polish and shine and power to mainstream systems.

Economic conditions are making it difficult for enterprise IT groups to undertake expensive and risky software development projects. Smart IT groups are looking instead at low-cost projects that offer a large bang for the buck. Data management projects, particularly MODS projects, give the high IT mileage that these lean times demand.

June 04, 2009

Is IT like a $300K watch that can't tell time?

Blogger: Lyn Robison

I just found a $300,000 watch that can’t tell time. (See here and here.)

An article about it says, “This $300,000 watch does not tell the time. In fact, all it does is tells you whether it is day or night, a handy feature if you spend countless hours in a windowless room. But the watch makes up for in looks, what it lacks in basic function.”

This will make a great metaphor, in one of my talks at our upcoming Catalyst conference, for fancy technology and information systems that don’t deliver any useful information. Information quality metrics are the way to avoid having expensive information systems that don't deliver information.

May 18, 2009

Enterprise Data and “Which one?” vs. “What kind?”

Blogger: Lyn Robison

Every competent IT department does data modeling and therefore knows what kind of information the business requires and what kind of data structures hold the enterprise’s information.

I realize that many IT departments have not performed adequate data modeling and don’t yet know “what kind” of information they have, but I am going to mention the need to go beyond type and to begin identifying instances. When dealing with enterprise information, it is often more important to answer “which one?” than “what kind?”

As I explained in my “Are your assets fungible?” post, when it comes to enterprise information, instances are often just as important as types – and sometimes more so. When you can identify the instances within your enterprise information, you can manage important business information across data silos. You can begin to do silo-bridging instead of silo-busting.

Imagine if you knew whether “Customer 57” in one silo is the same as “Cust XYZ” in another silo. Businesspeople could begin to do cross-silo joins of customer information. Now think about vendors/suppliers. Your enterprise could begin to consolidate purchases across departments to get better pricing and terms. Think about products, assets, employees, policies, decisions, etc.
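As a rough illustration, here is a small Python sketch of a cross-silo join. The crosswalk table, the silo names "sales" and "support", the enterprise IDs, and the sample records are all hypothetical, and building a trustworthy crosswalk is the hard part that this sketch simply assumes has been done.

```python
# Hypothetical crosswalk: maps (silo, local key) to one enterprise-wide customer ID.
CROSSWALK = {
    ("sales", "Customer 57"): "CUST-0001",
    ("support", "Cust XYZ"): "CUST-0001",
    ("support", "Cust ABC"): "CUST-0002",
}

# Hypothetical records, each using its own silo's local customer key.
sales_orders = [{"customer": "Customer 57", "order_total": 1200}]
support_tickets = [
    {"customer": "Cust XYZ", "ticket": "Late delivery"},
    {"customer": "Cust ABC", "ticket": "Password reset"},
]


def same_customer(silo_a, key_a, silo_b, key_b):
    """Answer "which one?": do two local keys identify the same customer?"""
    id_a = CROSSWALK.get((silo_a, key_a))
    id_b = CROSSWALK.get((silo_b, key_b))
    return id_a is not None and id_a == id_b


def cross_silo_join(orders, tickets):
    """Pair each sales order with the support tickets of the same customer."""
    joined = []
    for order in orders:
        customer_id = CROSSWALK.get(("sales", order["customer"]))
        if customer_id is None:
            continue  # unidentified instances cannot be joined
        for ticket in tickets:
            if CROSSWALK.get(("support", ticket["customer"])) == customer_id:
                joined.append((customer_id, order["order_total"], ticket["ticket"]))
    return joined
```

Notice that the join never inspects what kind of thing a customer is in either silo. It only needs to know which one each record refers to.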

In my Hippocratic Oath for Computer Systems post, I mention the fact that if information has to go from one human being, through a computer system, to another human being, the computer system should not get in the way. The computer system must not distort or damage the information. Sometimes this means that the computer system should not try too hard to understand the data, because if the computer data model is inadequate, the information will be distorted going into and coming out of the computer system. When transmitting information between humans, transmitting the instance without mucking with the meaning is often the best thing for a computer system to do.

May 17, 2009

Change Is Good

Blogger: Joe Maguire

Some people consider master data management (MDM) initiatives to be a sign of failure on the part of IT:  If the systems had been designed properly in the first place, we wouldn’t need MDM techniques to clean up the mess.  As my expressive young nephew would say, this is wrongedy wrong wrong. 

Many—perhaps most—data integration problems originate outside the realm of IT.  A casual scan of the daily newspaper can yield stories about data-integration problems in unexpected places, such as the obituaries or the fine-arts pages.

On April 30 of this year, 90-year-old Venetia Burney Phair died at Banstead, in the county of Surrey, England. She studied mathematics at Newnham College, Cambridge, and went on to become a chartered accountant and a teacher of math and economics. But her greatest accomplishment occurred on March 14, 1930, when as 11-year-old Venetia Burney she proposed that the ninth planet in our solar system be called “Pluto.” For this accomplishment, Mrs. Phair received a substantial New York Times obituary, in the neighborhood of 15 column inches. (Remember this day; it might be the last time you ever encounter the unit “column inches” used by someone who is not a historian discussing the quaint era of printed newspapers.)

By all accounts, Mrs. Phair lived admirably.  Pluto, on the other hand, has had a rougher time of it.  Pluto received one of the most widely publicized demotions in history.  It is our most famous ex-planet.  The hapless orb is not even unique: another ex-planet is Ceres, demoted to an asteroid in the middle of the 19th century when astronomers realized that there were hundreds of celestial bodies similar to Ceres.

By demoting Pluto on August 24, 2006, the International Astronomical Union created a far-reaching data-integration problem. Textbooks, almanacs, and educational posters would have to be reprinted. Drop-down lists of the planets would have to be adjusted. Board games about the solar system became outdated.

(On the plus side, Gustav Holst’s seven-movement orchestral suite “The Planets” would once again have one movement for each non-Earth planet in our solar system.  Holst wrote the suite in the early 20th century, before Pluto was discovered but well after Ceres had been demoted.  Why only seven movements? Holst excluded Earth because his metaphorical motivation was astrology, not astronomy.)

In demoting Pluto, astronomers were acting reasonably—doing what is right for their community of expertise.  Their motives were pure and utterly blameless.  IT personnel who are inconvenienced by the resulting data-integration problem have no cause to complain.

In fact, IT personnel should expect such disruptions.  Here’s another one:  The Antoinette Perry Awards for Excellence in Theatre, more commonly known as the Tony Awards, have created an interesting problem with the nominees for the 2009 award for Best Performance by a Leading Actor in a Musical.    There are five nominations:

  1. David Alvarez, Trent Kowalik, and Kiril Kulish in Billy Elliot the Musical as “Billy Elliot.”
  2. Gavin Creel in Hair as “Claude.”
  3. Brian d’Arcy James in Shrek the Musical as “Shrek.”
  4. Constantine Maroulis in Rock of Ages as “Drew.”
  5. J. Robert Spencer in Next to Normal as “Dan.”

The problem occurs at the top of the list.  Three actors starring as Billy Elliot share the arduous, eight-shows-a-week load by alternating performances.  (A fourth actor, Stephen Hanna, plays the role of “Billy’s Older Self” in every show.  Hanna is not included in the Award Nomination because his part is considered to be a distinct role.)

The three-way shared nomination causes all sorts of problems.  Naturally, any structured rendering of the data that expects each nomination to consist of a single nominee will have to be redesigned. 
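To see what that redesign looks like, consider this small Python sketch. The class names, field names, and the migrate helper are my own invention for illustration; they are not taken from any real awards system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NominationV1:
    """Old shape: assumes exactly one nominee per nomination."""
    show: str
    role: str
    nominee: str


@dataclass
class NominationV2:
    """New shape: one or more nominees can share a single nomination."""
    show: str
    role: str
    nominees: List[str]


def migrate(old: NominationV1) -> NominationV2:
    """Carry an existing single-nominee record into the new shape."""
    return NominationV2(old.show, old.role, [old.nominee])


def credit_line(nomination: NominationV2) -> str:
    """Render a nomination; the old one-name layout no longer suffices."""
    names = ", ".join(nomination.nominees)
    return f"{names} in {nomination.show} as {nomination.role}"
```

The storage change is one field, but every piece of code that read the old nominee field has to change along with it.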

The disruption extends far beyond the storage structures.  Formatted presentations (e.g., reports and forms) about nominees will also need to be redesigned.

Even business policies and procedures can be affected. In previous years, each Tony-award voter could cast a ballot in a particular category only if he or she had seen all the shows.  Applied to the three-way nomination for Billy Elliot, the business rule is ambiguous.  Are potential voters required to attend five performances (one for each nomination) or seven (one for each nominee)?  The answer, according to the Tony Awards administration committee, is five.

That answer might seem strange, but that is immaterial to this discussion.  What’s important here is the need to ask the question.  Such questions—questions that clarify seemingly well-understood phenomena for which software has already been deployed—are commonplace.  Data-integration problems are not exceptional; they are the norm.

In acknowledgement of this state of affairs, software engineers are often encouraged to “plan for change.”  That can mean a lot of things.

One way to plan for change is to accurately predict the future.  Here, the state-of-the-art is immature.  Astrology might suffice as a metaphorical framework for orchestral suites, but it’s not so good for anticipating changing business conditions.  Other approaches are needed.

One result of the desire to plan for change is a set of software engineering principles, including modularization, information hiding, separation of concerns, generalization, and abstraction. Software designs can be judged on how well or poorly they honor these principles. That is, engineers can honor these principles by creating code that is appropriately generalized, abstract, modularized, and so on.

Beyond these principles, there is another way to plan for change:  to expect change and to have in-place processes and techniques for coping with it.  These processes will include data governance; the techniques will include master data management.

In the absence of reliable seers, time machines, and other ways to peer into the future, the best approach is to classify the various potential changes to data requirements into abstract patterns, and to develop techniques that can apply to those patterns.  For example, two such patterns follow:

  • Pattern name:  A maximum cardinality constraint changes from “One” to “Many.”

  • Pattern name:  An enumerated type loses a value.

These patterns can be a linchpin of data-governance and MDM initiatives. Data-governance processes can include decision points based on the type of problem pattern encountered, and MDM solutions can be planned based on the kind of solution pattern required. The two patterns listed above correspond to “the Billy Elliot problem” and “the Pluto problem” respectively.
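The “enumerated type loses a value” pattern can be sketched in a few lines of Python. The record layout and the find_orphans helper are hypothetical, and a real governance process would still have to decide what to do with the records the check turns up (reclassify them, archive them, or leave them flagged).

```python
# The enumerated type before and after one value is retired.
PLANETS_BEFORE = (
    "Mercury", "Venus", "Earth", "Mars", "Jupiter",
    "Saturn", "Uranus", "Neptune", "Pluto",
)
RETIRED = {"Pluto"}
PLANETS_AFTER = tuple(p for p in PLANETS_BEFORE if p not in RETIRED)

# Hypothetical records that were valid under the old enumeration.
records = [
    {"card_id": 1, "planet": "Mars"},
    {"card_id": 2, "planet": "Pluto"},
]


def find_orphans(rows, allowed):
    """Return rows whose 'planet' value is no longer in the enumeration."""
    return [row for row in rows if row["planet"] not in allowed]
```

Running find_orphans(records, PLANETS_AFTER) flags the one record that still refers to the retired value. Detecting the orphans is mechanical; deciding their fate is the governance question.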

Allow me to predict the future: Over the next few months, I’ll be contemplating such patterns as they apply to data-governance and MDM initiatives.   I’m especially looking forward to describing some of these patterns at a day-long workshop on Master Data Management to be presented at Catalyst 2009 by me and my colleague Joe Bugajski.

