By Noreen Kendle
The world of data is full of delusions: false beliefs or ideas about data. These are fueled by the mountains of data-related white papers, articles, blogs, and marketing material. If I "google" any data topic, like master data or BI, millions of hits are returned. As I skim through these, nearly all are regurgitations of the last, and thus the data delusions continue to grow. It is interesting how much we assume to be true simply because we read it in print.
Below are the seven most popular delusions I continue to see:
Data Delusion One: If the data is there, then it must have been deemed good data. In most organizations there are no secret data police monitoring the data. A great deal of incorrect data lives within the data stores.
Data Delusion Two: If it looks right, then it must be. Typically, data is considered "poor quality" when it obviously looks incorrect or is known to be incorrect. Often data can "look" right when it is not. How do you know whether the answer a computer system returns to your question is correct? If you already knew the correct answer, you would not need to ask.
Data Delusion Three: A new tool/technology will fix the data problems. There continues to be a belief that the tools/technology will auto-magically figure out whether the data is correct or belongs together. Unfortunately, success is always dependent on the quality of what goes in: garbage in, garbage out is still true.
Data Delusion Four: Data is a computer phenomenon like software or hardware. Many of the definitions support this, but data existed long before computers were ever imagined. Data is a representation of the real-world organization: its things, people, locations and events. Computers help to automate the processing of data.
Data Delusion Five: "Cleaning" the data fixes it. There is always a reason data becomes corrupted. It does not just magically happen. Data errors or poor-quality data are a symptom of a problem, rarely the problem itself. Fixing a symptom does not fix the problem. It's like taking an aspirin for a brain tumor.
Data Delusion Six: The data meaning can be deduced from its name/definition. Even in the rare case when a data store has been diligently modeled from a business standpoint and implemented accordingly, the data system deteriorates over time. Many of the data stores in our organizations were never designed/modeled in the first place. The data field names and sparse definitions were often the programmer's best guess at the time.
Data Delusion Seven: Data can be managed/integrated/cleaned at an individual attribute/column level. The data attributes/columns are intended for description purposes. They are relative to what they are describing, as well as to the relationships/dependencies of the things they are describing. When data attributes/columns are taken out of this context and treated individually, they can lose much of their meaning, and thus integrity.
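To make the last delusion concrete, here is a minimal Python sketch of data that passes every column-by-column check yet is wrong once the columns are read together. The records, the `ZIP_TO_STATE` lookup, and the function names are all hypothetical and chosen only for illustration; a real data store would need real reference data and far richer rules.

```python
# Hypothetical records: each column looks right when checked on its own.
RECORDS = [
    {"city": "Springfield", "state": "IL", "zip": "62701"},
    {"city": "Springfield", "state": "MA", "zip": "62701"},  # zip does not match state
]

# Assumed reference data mapping zip code to state (illustrative only).
ZIP_TO_STATE = {"62701": "IL", "01103": "MA"}
VALID_STATES = {"IL", "MA"}


def column_checks_pass(record):
    """Validate each attribute in isolation, ignoring the other columns."""
    return (
        bool(record["city"].strip())
        and record["state"] in VALID_STATES
        and len(record["zip"]) == 5
        and record["zip"].isdigit()
    )


def context_check_passes(record):
    """Validate the attributes against each other: does the zip belong to the state?"""
    return ZIP_TO_STATE.get(record["zip"]) == record["state"]


def find_hidden_errors(records):
    """Return records that look right column by column but fail in context."""
    return [
        r for r in records
        if column_checks_pass(r) and not context_check_passes(r)
    ]


if __name__ == "__main__":
    for bad in find_hidden_errors(RECORDS):
        print("Looks right, but is not:", bad)
```

Both records would survive a per-column cleanup, which is also the trap in Delusion Two. Only the check that treats state and zip as describing the same thing flags the second record.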
Excellent post.
In medicine, these delusions are playing out as a belief in "computational alchemy", the turning of uncontrolled EMR data ("lead") algorithmically into gold. Our new secretary of HHS, for example, calls for "comparative effectiveness research" using such EMR data.
I wrote about my concerns in "The Syndrome of Inappropriate Overconfidence in Computing: An Invasion of Medicine by the Information Technology Industry?" at http://www.jpands.org/vol14no2/silverstein.pdf (PDF).
Posted by: S Silverstein | June 11, 2009 at 02:58 AM