Accuracy, Fitness, Velocity, & Proportionality
Data are used to represent real-world events or objects and are collected to serve multiple purposes, e.g., to justify tax deductions, memorialize customer orders, or identify customer trends. Individual data points could be the price paid for an item, its weight, color, and dimensions, or they could be the names of contracting parties, the start date of the contract term, and the venue for resolving disputes.
While data accuracy is an important dimension of data quality, there are other dimensions that are equally important: fitness, velocity, and proportionality. Evaluating these dimensions requires that the organization define the intended or desired use of the data and assess the desirability of improving that data’s quality. The four data quality dimensions are:
- Accuracy. Data accuracy is a measure of how well the data serve as representations of the underlying events or objects. Usually this involves the notion of replicability: if the real-world event or object were described or measured a second time, would the descriptions or measurements be the same, or at least close enough for the purpose? Precision is another important aspect of accuracy. Gravel might be weighed to the nearest 10 pounds while gold might be measured to the nearest thousandth of a gram.
- Fitness. Fitness measures how well a given data representation is suited to achieving a specific result, considering the type of media holding the data and the format of the data on that media. Does the organization just want to be able to find something, or does it want to create ordered lists or perform statistical analysis on the data? For example, handwritten inventory cards may be 100% accurate, but the data may require manual or automated conversion before they can serve some purposes. PDF change order requests might correctly identify the project, date, and submitter but not permit sorting by date or listing all change orders in a project.
Fitness for a particular purpose takes a holistic view of how to achieve a given result and can involve converting the media and formats of multiple sources. For example, correlating patient age and diseases could involve converting and combining two sets of records: one that provides patient name, patient ID, and age, and another that provides patient name, patient ID, and diagnostic code.
- Velocity. Velocity measures how quickly the data can be converted to the representation or format required to achieve a stated purpose. Data in tables with defined rows and cells are usually thought of as having the most velocity. The same is true of data that have been tagged to indicate what property or type of data each value represents. Low velocity can mean the data can’t be made fit for a particular purpose quickly enough to be useful for that purpose, e.g., can a set of auditors’ reports be made useful in the time allotted for pre-merger due diligence?
- Proportionality. Proportionality is a concept used in civil litigation to determine whether the potential evidentiary value of a specific discovery request is greater than the cost and burden of producing it, considering factors such as the value of the underlying controversy. It’s a useful construct in information governance because it invites comparing the business value of improving the quality of given data with the cost and burden of making those improvements.
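To make the fitness example concrete, here is a minimal sketch of correlating patient age with diagnostic code once both record sets are in a structured form. Everything in it is an illustrative assumption rather than something taken from a real system: the field names, the sample patients, the placeholder diagnostic codes, and the helper `ages_by_diagnosis` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: one source holds ages, the other diagnostic codes.
demographics = [
    {"name": "A. Jones", "patient_id": "P001", "age": 34},
    {"name": "B. Smith", "patient_id": "P002", "age": 71},
    {"name": "C. Lee", "patient_id": "P003", "age": 68},
]
diagnoses = [
    {"name": "A. Jones", "patient_id": "P001", "diagnostic_code": "J45"},
    {"name": "B. Smith", "patient_id": "P002", "diagnostic_code": "I10"},
    {"name": "C. Lee", "patient_id": "P003", "diagnostic_code": "I10"},
    {"name": "D. Park", "patient_id": "P004", "diagnostic_code": "E11"},
]


def ages_by_diagnosis(demographics, diagnoses):
    """Join the two sources on patient ID and group ages by diagnostic code."""
    age_by_id = {record["patient_id"]: record["age"] for record in demographics}
    ages = defaultdict(list)
    unmatched = []  # patient IDs with a diagnosis but no age record
    for record in diagnoses:
        patient_id = record["patient_id"]
        if patient_id in age_by_id:
            ages[record["diagnostic_code"]].append(age_by_id[patient_id])
        else:
            unmatched.append(patient_id)
    return dict(ages), unmatched


ages, unmatched = ages_by_diagnosis(demographics, diagnoses)
for code, values in sorted(ages.items()):
    print(code, sum(values) / len(values))  # mean age per diagnostic code
print("unmatched:", unmatched)
```

The join uses patient ID rather than name because names are more likely to be spelled differently across sources. The list of unmatched IDs is kept because records that cannot be joined are themselves a signal about how fit the combined data are for the analysis.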
Technology for converting, tagging, or structuring individual data sources can dramatically improve data fitness and velocity while lowering the cost of the improvement. An example of this is visual classification and attribute extraction technology that can convert paper files or unstructured electronic files into structured, validated data representations.
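As a toy illustration of what structuring buys, the sketch below turns the text of a few change order requests into records that can be sorted by date and listed by project. It is not how visual classification or attribute extraction products work. It assumes the text has already been extracted from the files and that every document uses the same hypothetical labels ("Project:", "Date:", "Submitted by:") and a month/day/year date format; the sample documents and the helper `to_record` are made up for the example.

```python
import re
from datetime import datetime

# Hypothetical text, as it might look after being extracted from PDF files.
raw_documents = [
    "Change Order\nProject: Riverside Annex\nDate: 03/14/2023\nSubmitted by: M. Ortiz",
    "Change Order\nProject: Harbor Tower\nDate: 11/02/2022\nSubmitted by: K. Chen",
    "Change Order\nProject: Riverside Annex\nDate: 01/09/2023\nSubmitted by: K. Chen",
]

# Assumed layout: one "Label: value" pair per line.
FIELD_PATTERN = re.compile(r"^(Project|Date|Submitted by):\s*(.+)$", re.MULTILINE)


def to_record(text):
    """Extract the labeled fields and convert the date to a real date value."""
    fields = dict(FIELD_PATTERN.findall(text))
    return {
        "project": fields["Project"],
        # Parsing validates the date; a malformed value raises ValueError.
        "date": datetime.strptime(fields["Date"], "%m/%d/%Y").date(),
        "submitted_by": fields["Submitted by"],
    }


records = [to_record(text) for text in raw_documents]

# Once structured, the records support the operations the files did not.
by_date = sorted(records, key=lambda record: record["date"])
riverside = [record for record in by_date if record["project"] == "Riverside Annex"]

for record in riverside:
    print(record["date"].isoformat(), record["project"], record["submitted_by"])
```

A document missing one of the expected labels raises a `KeyError` here instead of producing a partial record. Whether to fail, skip, or route such documents to manual review is a design choice that depends on the purpose the data are meant to serve.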