Blogger: Joe Maguire
We have this from the New York Times:
The United States Supreme Court on Friday overturned a lower court's order requiring state officials in Ohio to supply information that would have made it easier to challenge prospective voters.
As expected, the disputants fall on opposite ends of the political spectrum. With a U.S. presidential election only weeks away, 'tis the season for that sort of thing. What's not expected: the case hinges on data-quality issues.
The Help America Vote Act (HAVA) requires states to compare information on voter registration applications against information in government databases such as those maintained by state departments of motor vehicles. When some of the information does not match, the application is flagged.
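To make the matching step concrete, here is a minimal sketch of a strict field-by-field comparison. It assumes exact equality is the test; the `Record` fields, the sample values, and the function names are all hypothetical, and real state systems almost certainly differ in detail.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Record:
    # Hypothetical fields, chosen for illustration only.
    first_name: str
    last_name: str
    date_of_birth: str  # ISO format, e.g. "1970-01-31"
    license_number: str


def mismatched_fields(application: Record, dmv: Record) -> List[str]:
    """Return the names of fields whose values are not exactly equal."""
    return [
        f.name
        for f in fields(Record)
        if getattr(application, f.name) != getattr(dmv, f.name)
    ]


def is_flagged(application: Record, dmv: Record) -> bool:
    """Flag the application when any field fails the exact comparison."""
    return bool(mismatched_fields(application, dmv))


# Made-up sample data: the same person, registered under a nickname.
application = Record("Bill", "Smith", "1970-01-31", "AB123456")
dmv_record = Record("William", "Smith", "1970-01-31", "AB123456")

print(mismatched_fields(application, dmv_record))
print(is_flagged(application, dmv_record))
```

Under this strict rule, a single nickname is enough to flag an otherwise identical record.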
The Ohio members of one major political party sued to compel the Ohio Secretary of State, a member of – hang on to your hat – the other major political party, to release information about such applications to county officials. These county officials could presumably use this information to force certain recently registered voters to cast provisional rather than regular ballots. (Many provisional ballots are ultimately excluded from final election tallies.)
For seasoned data-management professionals, here's one more part of the story that should come as no surprise. Again from the same article in the Times:
Voting experts and state election officials have raised concerns about treating flagged voters differently because the databases used to check registrations are prone to errors. Most non-matches are the result of typographical errors by government officials, computer errors and the use of nicknames or middle initials, not voter ineligibility, they said.
In one audit of match failure in 2004 by New York City election officials, more than 80 percent of the failures were found to have resulted from errors by government officials; most of the remaining failures were because of immaterial discrepancies between the two records.
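The "immaterial discrepancies" mentioned above (nicknames, middle initials, capitalization) are the kind a matching process could separate from real mismatches before anyone is flagged. The following is a rough sketch under stated assumptions, not how any election system is known to work: the tiny `NICKNAMES` table, the normalization rules, and the category labels are all invented for illustration.

```python
import re

# Hypothetical, deliberately tiny nickname table; a real one would be far larger.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def normalize_name(name: str) -> str:
    """Reduce a full name to a canonical 'first last' form for comparison."""
    # Lowercase, then strip everything except letters and whitespace.
    tokens = re.sub(r"[^a-z\s]", "", name.lower()).split()
    # Drop single-letter tokens, which are usually middle initials.
    tokens = [t for t in tokens if len(t) > 1]
    if not tokens:
        return ""
    first = NICKNAMES.get(tokens[0], tokens[0])
    if len(tokens) == 1:
        return first
    return f"{first} {tokens[-1]}"


def classify(application_name: str, database_name: str) -> str:
    """Label a pair of names as an exact match, a cosmetic difference, or other."""
    if application_name == database_name:
        return "match"
    if normalize_name(application_name) == normalize_name(database_name):
        return "immaterial discrepancy"
    return "needs review"


print(classify("William Smith", "William Smith"))
print(classify("Bill J. Smith", "William Smith"))
print(classify("William Smith", "William Smyth"))
```

Note that this sketch does nothing for typographical errors: "Smyth" versus "Smith" still lands in "needs review", so a clerk's typo would need fuzzy matching or a human check to resolve.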
There are two important lessons here. First, data quality matters.
Second, you cannot always predict what data will be used for. System designers who design data for a single application or a single expected use will ultimately find themselves serving other masters, such as election officials. One set of data can contribute to dozens or hundreds of processes, so data designers should design the data in a process-neutral (or process-agnostic) way.