Face of Quality | Jim L. Smith
www.qualitymag.com/articles/98612-data-integrity-use-quality-tools-and-principles-for-greater-data-accuracy

Data Integrity: Use quality tools and principles for greater data accuracy.

March 8, 2025

It’s a safe bet that most everyone reading this column has faced a deadline to complete a project report. If that’s not stressful enough, you’re in the middle of analyzing the experimental data needed for the report when you realize that many of the results manually entered into the computer from handwritten data sheets appear to contain typos! Stress now escalates to panic mode.

There isn’t time to even isolate all the errors, let alone correct them, before your delivery deadline. Now, what are you going to do? The right thing to do is to be honest with your boss and request more time to correct the errors so the report will be based on reliable data.

Errors will occur whenever someone inputs data manually; no one is perfect. The question is how to discover which entries are incorrect, or unusual enough to warrant validation.

Techniques for finding errors have evolved from the tedious exercise of manually verifying each entry, to entering the data twice in separate files and comparing them electronically, to having someone else perform the checks. You name it, it’s likely been tried.
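For illustration, here is a minimal sketch of the double-entry comparison in Python, assuming the same handwritten sheet has been keyed in twice to two hypothetical files, entry_pass1.csv and entry_pass2.csv:

```python
# Double-entry check: the same data sheet is keyed in twice and the two
# files are compared line by line. The file names are hypothetical.
with open("entry_pass1.csv") as f1, open("entry_pass2.csv") as f2:
    for line_no, (a, b) in enumerate(zip(f1, f2), start=1):
        if a.strip() != b.strip():
            print(f"Line {line_no}: {a.strip()!r} vs {b.strip()!r}  <-- re-check against the sheet")
```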

These methods are weak because they amount to 100% inspection, and 100% inspection is less than 100% effective. One of the greatest quality giants of the 20th century, Dr. Joseph M. Juran, concluded from his studies of inspector accuracy that 100% inspection is only about 87% effective. So how can the discovery of potential data entry errors be made more effective?

When dealing with numbers, the objective is to discover potential special causes. And what better way to find special causes than to use control chart methodology to assist in the process?

The type of data to be checked and the way the data are grouped may influence the choice of control chart. The most universal approach for checking many types of data is an individuals chart paired with a moving range chart. Occasionally, average and range charts are used; it depends on how the data are grouped and the magnitude of the differences between groups. With experience you will know which chart to use, but when in doubt, use the individuals chart.
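As a rough illustration of the idea (not a substitute for proper control chart software), the following Python sketch computes individuals chart limits from the average moving range, using the standard I-MR constants 2.66 and 3.267 for moving ranges of two. The data values are made up, with one deliberate decimal-shift typo:

```python
# Minimal individuals and moving-range (I-MR) chart calculation.
# `values` stands in for one column of manually entered measurements.

def imr_limits(values):
    """Return (individuals limits, moving-range upper limit) using standard I-MR factors."""
    mr = [abs(b - a) for a, b in zip(values, values[1:])]  # moving ranges of size 2
    mr_bar = sum(mr) / len(mr)
    x_bar = sum(values) / len(values)

    # 2.66 = 3 / d2 (d2 = 1.128 for n = 2); 3.267 = D4 for n = 2
    lcl_x = x_bar - 2.66 * mr_bar
    ucl_x = x_bar + 2.66 * mr_bar
    ucl_mr = 3.267 * mr_bar
    return (lcl_x, ucl_x), ucl_mr

# Hypothetical column of measurements; 31.4 is a deliberate decimal-shift typo.
values = [3.12, 3.15, 3.11, 3.14, 3.13, 31.4, 3.10, 3.12, 3.14, 3.11, 3.13, 3.15]
(lcl, ucl), ucl_mr = imr_limits(values)
flagged = [(i, v) for i, v in enumerate(values) if not lcl <= v <= ucl]
print(f"Individuals limits: {lcl:.2f} to {ucl:.2f}")
print("Flagged entries (row, value):", flagged)   # the typo at row 5 falls outside the limits
```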

Selective sorting may help when the magnitude of differences between data groups is large. If several columns of data have been recorded, make an individuals chart for each column. Note the line numbers where verifiable errors occur in one column, as the same lines may hold errors in other columns.
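Continuing the same sketch, a per-column pass can record which rows are flagged in each column and then intersect those row numbers; the column names and values below are hypothetical:

```python
# Per-column individuals-chart pass: flag rows outside the limits in each
# column, then look for row numbers flagged in more than one column, which
# often points back to a single mangled line on the data sheet.
data = {   # hypothetical columns keyed from the same handwritten sheet
    "diameter": [3.12, 3.15, 3.11, 3.14, 3.13, 31.4, 3.10, 3.12, 3.14, 3.11, 3.13, 3.15],
    "weight":   [12.4, 12.6, 12.5, 12.3, 12.7, 1.26, 12.4, 12.5, 12.6, 12.3, 12.4, 12.5],
}

def flagged_rows(values):
    mr = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(mr) / len(mr)
    x_bar = sum(values) / len(values)
    lcl, ucl = x_bar - 2.66 * mr_bar, x_bar + 2.66 * mr_bar
    return {i for i, v in enumerate(values) if not lcl <= v <= ucl}

hits = {column: flagged_rows(values) for column, values in data.items()}
print("Flagged rows per column:", hits)
print("Rows flagged in every column:", set.intersection(*hits.values()))  # row 5 in this example
```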

After identifying out-of-control signals, it’s time to validate whether each one is a real value or a likely data entry error. Look for the typical mistakes that surface most often: missing or misplaced decimal points, transposed or reversed digits, missing zeros, suspiciously exact numbers (e.g., 3.000 instead of 3.128), and so on.
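A small, hypothetical helper along these lines can list the plausible slips behind a flagged value so they can be checked against the original handwritten sheet; the patterns covered here (decimal shifts and adjacent-digit transpositions) are only the ones mentioned above:

```python
# Hypothetical helper: given a value flagged by the chart (as typed), list
# plausible data-entry slips worth checking against the handwritten sheet.
def typo_candidates(raw: str):
    candidates = set()
    digits = raw.replace(".", "")
    # Misplaced or missing decimal point: try every possible position.
    for pos in range(1, len(digits)):
        candidates.add(float(digits[:pos] + "." + digits[pos:]))
    # Adjacent-digit transpositions (e.g., 3.128 entered as 3.218).
    chars = list(raw)
    for i in range(len(chars) - 1):
        if chars[i].isdigit() and chars[i + 1].isdigit():
            swapped = chars[:]
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            candidates.add(float("".join(swapped)))
    candidates.discard(float(raw))
    return sorted(candidates)

print(typo_candidates("31.4"))   # [3.14, 13.4] -- possible intended values to verify
```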

Even conscientious efforts result in error rates in the 1-2% range for these kinds of mistakes. When an experimenter is trying to detect an effect significant at the 5% level, such errors can trigger wasteful investigative forays when none are warranted, especially when several hundred data entries have been made.

Of course it’s entirely possible, or even probable, that some errors will go undetected. Indeed, what was originally a special cause value could be entered incorrectly and end up looking like all the other numbers.

What if you are working with non-numeric or categorical data? One option is to use a program like Microsoft Excel and make use of its CODE function, which converts a text character into its corresponding ASCII number. Those numbers can then be used as input for control charts.
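In Python, ord() plays roughly the same role as Excel’s CODE function. The whole-string fingerprint in the sketch below is an extra step beyond what the column describes, but it makes mistyped strings easier to spot on a chart; the entries are hypothetical:

```python
# Rough analogue of Excel's CODE function: ord() returns the numeric code of
# a character. Charting a numeric fingerprint of each text entry (here, the
# sum of its character codes) makes odd strings stand out.
entries = ["Pass", "Pass", "Fail", "Pass", "Pss", "Fail"]   # "Pss" is a typo

first_char_codes = [ord(e[0]) for e in entries]               # what CODE() itself returns
fingerprints = [sum(ord(c) for c in e) for e in entries]      # whole-string variant

for entry, code, fp in zip(entries, first_char_codes, fingerprints):
    print(f"{entry:5s}  first-char code={code}  fingerprint={fp}")
# The fingerprint for "Pss" (310) sits well away from "Pass" (407) and "Fail" (380).
```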

Another technique is to sort the data column and scan it for odd strings. Or use Excel’s built-in AutoFilter on each column and click through each unique occurrence of an alphanumeric string to find fliers (outliers). This approach can even be used for numerical data if the data set is not too large.
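The same sort-and-count idea can be sketched outside Excel with a simple frequency table; entries that occur only once are natural candidates to check. The data below are hypothetical:

```python
from collections import Counter

# Sketch of the sort-and-scan / filter idea: list each unique entry with its
# count; values that appear only once or look out of place are the fliers.
entries = ["Line A", "Line B", "Line A", "Line B", "LineB", "Line A"]

for value, count in sorted(Counter(entries).items()):
    flag = "  <-- check" if count == 1 else ""
    print(f"{value!r}: {count}{flag}")   # the lone "LineB" gets flagged
```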

Analyzing data from any experiment or quality improvement activity requires trust in the data’s accuracy, especially when the data have been manually entered into statistical software. Likewise, recommendations resulting from an analysis of experimental data cannot be trusted if the data entry is unreliable. It is therefore imperative that the analyst devote sufficient time to making sure data inputs are as reliable as possible. Using a few well-chosen quality tools, such as control charts, and applying a few basic principles can be extremely helpful in increasing data accuracy.