In October, while traveling to the International Conference on Information Quality (ICIQ), held at the University of Arkansas Little Rock (UALR), I rented a Toyota Yaris from the Dollar rental car company. When I returned the Yaris, the friendly staff member casually asked if I had just driven around town. I said yes, and asked why he wanted to know. "Well," he said, "based on the mileage we recorded when the car was checked out and what it says now, you have driven 20,000 miles." My breath stopped for a second, but he quickly added that, the last time the car was checked in, another staff member had probably written down the trip mileage rather than the odometer mileage. Because the trip meter had not been reset for many miles, it showed such a high number (nearly 60,000) that the staff member likely assumed it was the odometer reading and wrote that value down instead.
As you can see in the screenshot below, the mileage label is shown on the left (green arrow), the mileage value is shown just to the right of that (blue arrow), and the button to toggle between trip A, trip B, and the odometer is on the other side of the dash (red arrow). To see the odometer, one has to push the button (note that it's actually a pole that one can also twist to change the brightness of the dashboard lights).

The following is what one sees when displaying the odometer mileage. Clearly, the label is now missing (green arrow).

In actuality, the odometer was closer to 80,000 miles (not shown here), so the difference between what was recorded the last time the car was returned and what it showed when I returned it was the apparent mileage driven: 20,000 miles. On a side note, to have driven 20,000 miles in 48 hours (the amount of time I was in town for the conference), I would have had to average about 417 miles per hour the whole time (20,000 miles / 48 hours). Not really believable, is it? (*future blog on believability).
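The back-of-the-envelope arithmetic above can be turned into a simple plausibility check. The following is only a minimal sketch: the function names and the 80 mph ceiling are my own assumptions for illustration, not anything Dollar actually uses.

```python
# Hypothetical ceiling: no renter averages more than this over an entire rental,
# since that would mean driving nonstop at highway speed. Assumed for illustration.
MAX_PLAUSIBLE_MPH = 80.0


def implied_average_speed_mph(miles_driven: float, rental_hours: float) -> float:
    """Average speed needed to cover miles_driven in rental_hours of nonstop driving."""
    if rental_hours <= 0:
        raise ValueError("rental_hours must be positive")
    return miles_driven / rental_hours


def is_believable(miles_driven: float, rental_hours: float,
                  max_mph: float = MAX_PLAUSIBLE_MPH) -> bool:
    """Return False when the implied average speed exceeds the assumed ceiling."""
    return implied_average_speed_mph(miles_driven, rental_hours) <= max_mph


# The rental described above: 20,000 apparent miles in 48 hours.
print(f"{implied_average_speed_mph(20_000, 48):.1f} mph")  # 416.7 mph
print(is_believable(20_000, 48))                           # False
```

A check like this would not say which of the two readings was wrong, only that the pair together is not believable and deserves a second look.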
Often, data quality begins with people (because we're the ones creating and entering the data). Computers perform binary operations, with internal checksums and methods to consistently ensure correct transactions, but in the situation described above, we can see how a minor human oversight caused a data quality error. In this scenario, without anyone paying attention to the label, human data entry resulted in an incorrect value.
So which dimensions of data quality could be used to describe this situation and how might we prevent the creation of poor quality data in the future?
- The first is Consistency, which measures whether or not data is equivalent across systems or storage locations.
- The second is Representation, which measures the ease of understanding data, consistency of presentation, appropriate media choice, and availability of documentation (metadata).
- The third is Validity, which measures whether a value conforms to a preset standard.
The following table lists a number of processes or changes that could be made within Dollar to prevent this type of poor data quality. I've tried to include some discussion of the cost of each one, so that if you're a leader you'd be able to choose which to implement first and how much to budget.
 
| # | Possible process or technological remediation | Associated CDDQ Dimension and Underlying Subject |
| --- | --- | --- |
| 1 | They could require two staffers to read the value (a high cost, given that the number doesn't appear to impact operations or finances too much). This would provide two data points, and the DQ check would be to ensure consistency of those two points. | Consistency- Equivalence of Redundant or Distributed Data | 
| 2 | Proper training is key to correct data collection. The use of procedure documents (providing step-by-step instructions and reminders) may be a solution in this case. Typically, in-house training isn't too costly to implement and provides a decent return, but it may not eliminate the cost of rework when errors are caught later on or when managers need to remind staff to perform the task again. | N/A | 
| 3 | Assuming that Dollar has a significant relationship with Toyota, due to the size of its fleet, it could review the functional design of the odometer with Toyota. This discussion might lead to a redesign in which the trip measure and the odometer are displayed in unmistakably different ways. This might mean placing the odometer on the left side of the dash and the trip reading on the opposite side to prevent confusion. Unfortunately, this option will likely take a long time, thereby driving up opportunity costs. | Representation- Easy to Read & Interpret | 
| 4 | Use validity checks based on business rules. At some point the staff accidentally started taking the readings from the trip metric, and once that number was relatively large it likely seemed reasonably correct (e.g. one would not expect a trip reading to be 40,000 miles, so they didn't realize that it might be the trip distance). Given that the odometer never decreases and the trip number will always be lower than the odometer, a business rule could be established: if staff record a number less than the one recorded the last time the car was checked in, it should be flagged for review. | Validity- Values Conform to Business Rule | 
| 5 | Compared to suggestion #3 above, a faster and more sustainable option might be to use a tablet to check in the cars (as is often done already). Staff could take a picture of the mileage at check-in, which would allow an Optical Character Recognition (OCR) program to warn the staff if the picture includes the "TRIP" label instead of the odometer. This solution assumes the company has a decently advanced IT department, able to implement mobile apps with embedded OCR functions. Alternatively, it could purchase an off-the-shelf application to accomplish this. | Validity- Values Conform to Business Rule | 
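
To make rows 1, 4, and 5 of the table a little more concrete, here is a minimal sketch of how the checks might be combined at check-in. Everything in it is an assumption for illustration: the `CheckInReading` fields, the flag names, and the mileage figures are hypothetical, and the OCR step itself is out of scope, so the label text is simply passed in as a string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckInReading:
    """One mileage reading taken at check-in (hypothetical structure)."""
    miles: int                               # value the staff member recorded
    second_read_miles: Optional[int] = None  # independent second reading (row 1)
    display_label: Optional[str] = None      # label text recovered from a photo (row 5)


def review_flags(previous_odometer: int, reading: CheckInReading) -> List[str]:
    """Return the reasons, if any, that a reading should be flagged for review."""
    flags = []
    # Row 4 (Validity): an odometer never decreases, so a value below the
    # previously recorded one is suspect.
    if reading.miles < previous_odometer:
        flags.append("below_previous_reading")
    # Row 1 (Consistency): two independent readings should agree.
    if (reading.second_read_miles is not None
            and reading.second_read_miles != reading.miles):
        flags.append("readings_disagree")
    # Row 5 (Validity): the display should not be showing a trip meter.
    if reading.display_label and "TRIP" in reading.display_label.upper():
        flags.append("trip_label_visible")
    return flags


# Illustrative numbers only: the odometer was last recorded at 79,500 miles,
# and a trip-meter value of 60,000 is written down at the next check-in.
print(review_flags(79_500, CheckInReading(miles=60_000)))
# ['below_previous_reading']
```

Note that the row 4 rule only catches the error at the check-in where the trip value is first written down. Once that bad value is stored as the "previous" reading, later correct readings look valid, which is why the mistake surfaced only when the implied 20,000 miles looked unbelievable.
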
In the next blog we'll be discussing whether commonly cited subjective dimensions of data quality, e.g. Believability, could be used to flag suspect data- and how to do that in a scientific way.
As always, if you have additional insight into this or similar data entry issues involving humans, please reach out to me at dan[at]dqmatters{dot}com, so that we can strengthen these examples of the Conformed Dimensions.
 
         
