Source: Instron


Ensuring that mechanical test results are repeatable and reliable across individual testing systems has recently surfaced as a critical concern across industries, whether those systems are located in the same lab, in different labs or at multiple suppliers. This is not surprising, since product quality is critical and quality assurance depends on the ability of testing systems to provide accurate results.

Further, as many industries expand R&D and manufacturing operations into different parts of the world, data comparisons have become increasingly important and complex.

Gage repeatability and reproducibility (GR&R) is a type of statistical analysis that quality and product engineers often perform to validate and verify testing equipment. However, when GR&R values are higher than expected, it is necessary to investigate and resolve, or at least minimize, the sources of variation.

Overview of GR&R

A GR&R study is a thorough investigation that provides a statistical approximation of the variation and percent of process variation for a test measurement system. Such studies are recommended by the Automotive Industry Action Group (AIAG), Six Sigma and ISO 9000 quality plans. They quantify the repeatability and reproducibility of a testing system in order to determine how much of the observed variability is a product of the testing system versus part-to-part variation or process changes.

The term repeatability defines how well the system can produce a known result over multiple tests. Reproducibility is the ability of another operator to produce the same results from similar parts with the same level of consistency. The output of a GR&R study is a quantitative result by which a testing system can be measured.

Statistical norms for GR&R values fall under three different categories. A GR&R value less than 10% is ideal for most measurement systems. This value suggests that the variability in the testing system is negligible and the results can be used to identify variability between parts or differences in production processes. A GR&R value between 10% and 30% suggests that the variability in the system is not negligible but may be acceptable for evaluating part variability. The performance of the testing system should be evaluated for areas of improvement to decrease inconsistencies. Finally, a GR&R value greater than 30% suggests that the error in the system is too great and will prevent differentiation between system error and part variation.
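As a rough illustration of how such a value can be obtained, the sketch below estimates %GR&R for a balanced, crossed study using the two-way ANOVA approach commonly associated with AIAG-style studies, then sorts the result into the three categories above. The function names, the category labels and the spring readings are all hypothetical, and the article does not state which calculation method was used in the study, so this is one possible approach under those assumptions rather than the authors' procedure.

```python
from math import sqrt
from statistics import mean


def grr_study(data):
    """Crossed GR&R by two-way ANOVA (illustrative sketch).

    data[i][j] is the list of repeat readings for part i taken by operator j.
    Assumes a balanced design with at least 2 parts, 2 operators and 2 repeats.
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for part in data for cell in part for x in cell)
    part_means = [mean(x for cell in part for x in cell) for part in data]
    oper_means = [mean(x for part in data for x in part[j]) for j in range(o)]
    cell_means = [[mean(cell) for cell in part] for part in data]

    # Sums of squares for parts, operators, interaction and repeat error.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_means)
    ss_cell = r * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_inter = ss_cell - ss_part - ss_oper
    ss_error = sum((x - cell_means[i][j]) ** 2
                   for i, part in enumerate(data)
                   for j, cell in enumerate(part)
                   for x in cell)

    # Mean squares.
    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Variance components; negative estimates are clamped to zero.
    repeatability = ms_error
    interaction = max(0.0, (ms_inter - ms_error) / r)
    operator = max(0.0, (ms_oper - ms_inter) / (p * r))
    part_var = max(0.0, (ms_part - ms_inter) / (o * r))
    reproducibility = operator + interaction
    grr = repeatability + reproducibility
    total = grr + part_var

    # %GR&R expressed as a ratio of standard deviations.
    percent = 100.0 * sqrt(grr / total) if total > 0 else 0.0
    return {"repeatability": repeatability,
            "reproducibility": reproducibility,
            "part": part_var,
            "percent_grr": percent}


def classify_grr(percent):
    """Map a %GR&R value onto the three categories described in the text."""
    if percent < 10.0:
        return "ideal"
    if percent <= 30.0:
        return "marginal"
    return "unacceptable"


if __name__ == "__main__":
    # Made-up peak loads (lb): 3 springs x 2 operators x 2 repeats.
    readings = [
        [[24.9, 25.1], [25.0, 25.2]],
        [[30.1, 29.9], [30.2, 30.0]],
        [[35.0, 35.2], [35.1, 34.9]],
    ]
    result = grr_study(readings)
    print(round(result["percent_grr"], 2), classify_grr(result["percent_grr"]))
```

The ratio here is taken against total variation; a study that reports against a tolerance or a historical process spread would use a different denominator.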

In many quality assurance labs, the purpose of mechanical testing is to ensure that products meet or exceed specifications, to identify process changes that affect critical part specifications and to evaluate the consistency of those processes in order to reduce part variation. However, there is a danger in using GR&R exclusively as a methodology for evaluating a mechanical testing system.

GR&R does not address accuracy. It is possible to have very low GR&R values and test results that are wrong and not truly representative of the material or the product tested. The ability of a testing system to provide accurate results depends not only on the quality of the testing system, but also on errors that can be introduced into the system.
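A small numerical sketch may make the distinction concrete. It assumes a reference value is available for the specimen, for example from a calibrated system, and compares the bias and spread of two made-up sets of readings. The helper name and all numbers are hypothetical.

```python
from statistics import mean, stdev


def bias_and_spread(readings, reference):
    """Return (bias, standard deviation) of readings against a reference value."""
    return mean(readings) - reference, stdev(readings)


if __name__ == "__main__":
    reference_lb = 25.0  # assumed known value for the specimen
    system_a = [20.0, 20.1, 19.9, 20.0]  # tightly grouped but offset
    system_b = [24.6, 25.5, 24.8, 25.3]  # centered but more scattered
    for name, readings in (("A", system_a), ("B", system_b)):
        bias, spread = bias_and_spread(readings, reference_lb)
        print(f"system {name}: bias={bias:+.2f} lb, spread={spread:.2f} lb")
```

In this made-up case system A would score well on repeatability while reporting a load about 5 pounds too low, which a GR&R value alone would not reveal.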

A fishbone diagram shows the major categories by which a testing system should be evaluated for error sources that can affect both the accuracy of a testing process and the ability of the gage to produce repeatable and reliable results. The major categories include the following: test method, measurement, operator, material, machine and environment. Within these major categories are subcategory sources of error that must be specifically addressed. This is not an exhaustive list of error sources, but rather common sources that should be considered.

When trying to determine and understand errors in mechanical testing results, one must examine the testing system in both a qualitative, physical manner and a quantitative manner. For the former, all elements of the fishbone diagram must be considered and evaluated. For the latter, results must be calculated and analyzed, typically through a GR&R study.

Fishbone diagrams show the major categories by which a testing system should be evaluated for error sources that can affect the accuracy of a testing system. Source: Instron

The Experiment

In order to demonstrate these theories in practice, a nondestructive study was conducted on four different materials testing systems from four different manufacturers. The specimens used were three different springs of varying stiffness. The study included both a qualitative examination to evaluate sources of potential error in each system and a quantitative analysis, nondestructive GR&R, to compare mean peak load values and determine a nondestructive GR&R value. The ultimate output was to provide recommendations for improving destructive test results.

The first and most interesting point of discussion is that in this relatively basic compression test, which was designed to oversimplify the test parameters of the actual tests on sporting equipment, a test result range of 24 to 26 pounds was seen for each spring between systems. The differences in these mean values were the result of both poor GR&R and sources of error in the testing systems. The following types of errors were uncovered and addressed.

  • Pre-load and standard operating procedures. One of the testing systems offered two methods of pre-loading the specimen: automatic or manual. The standard operating procedures required the manual pre-load setting because the technicians had found, through investigation, that the system always overshot the automatic pre-load value. Despite the added effort, they were more confident that the manual method would achieve the desired pre-load value of 5 pounds. However, when using the manual pre-load method, the operators would balance, or zero, both the extension and load transducers. Balancing the load after the pre-load has been set lowers the reported results by approximately 5 pounds.

  • Speed accuracy and transducer verification. Another important source of error to examine is speed accuracy. When reviewing the system service records and when looking at the raw data generated from one testing system, it was not clear that the accuracy of the speed had been verified according to ASTM standards. The actual time to complete the test did not correspond with calculated values derived from speed and displacement.

    Although speed accuracy may not be vitally important for testing springs, it is a significant factor when testing strain-rate-sensitive materials. It is an important qualitative step to address before completing any comparative study between systems.

  • Data rate. The data rate on one testing system had a maximum setting of 5 points per second. Because the test only runs for approximately 15 seconds, there are only 75 data points that characterize the load-extension data. For comparison, another system had a data rate of 100 points per second and, therefore, 1,500 data points characterized the load-extension curve.

    When looking for the load value at a specific point (for example, 0.25 inches of compression) where no data point corresponds exactly to that point, the software will interpolate the data or pick the next closest point. The more data points that are available, the more repeatably the system will report the correct value.

  • Frame stiffness. System compliance is the mechanical compliance, or system slack, that exists in the testing frame, the load cell and the accessories, and it significantly affects the total stiffness of the system. A testing system with low stiffness used in this type of compression application on springs will consistently yield lower peak load results than a similar system with a higher stiffness. Because of the slack in the less stiff frame, the spring is actually compressed less than the commanded displacement and, therefore, the reported load values are lower. An extensometer can be used to overcome this issue and provide better accuracy.

  • Operator error. Previous GR&R studies have found that operator error tends to be the greatest source of error, compared with error in the testing system and specimen variability. Software that reduces the number of steps the operator must perform to complete a test and report results minimizes the chance of unnecessary procedural errors.

    For example, one of the testing systems evaluated in this spring study did not use software, but rather a control panel that required the operator to go through a series of repetitive manual steps for each test and to record test results by hand. The increased number of steps, and the reliance on the operator to record the data and transcribe it into another system, are likely to increase the sources of error in reported data.

    It is important to have very detailed operating procedures, regularly scheduled training for operators and test methodologies that limit the number of steps required by an operator to minimize the chance for error.
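The arithmetic behind several of the error sources listed above can be sketched in a few lines. The helper names, the spring rate, the test speed and the system stiffness below are hypothetical values chosen for illustration, and the sketch assumes a constant test speed, a linear spring and a linearly elastic frame. Only the 5-pound pre-load, the 5 and 100 points-per-second data rates, the 15-second duration and the 0.25-inch example point come from the text.

```python
def tared_reading(true_load_lb, preload_lb):
    """Load reported if the load channel is zeroed after the pre-load is applied."""
    return true_load_lb - preload_lb


def expected_duration_s(displacement_in, speed_in_per_min):
    """Time a test should take at the commanded speed."""
    return displacement_in / speed_in_per_min * 60.0


def speed_error_percent(displacement_in, speed_in_per_min, measured_duration_s):
    """Percent difference between the speed implied by the measured time and the set speed."""
    implied = displacement_in / measured_duration_s * 60.0
    return 100.0 * (implied - speed_in_per_min) / speed_in_per_min


def points_captured(data_rate_hz, duration_s):
    """Number of data points recorded over the test."""
    return round(data_rate_hz * duration_s)


def worst_case_nearest_point_error_lb(spring_rate_lb_per_in, speed_in_per_min, data_rate_hz):
    """Largest load error if software picks the nearest point instead of interpolating."""
    spacing_in = speed_in_per_min / 60.0 / data_rate_hz  # travel between samples
    return spring_rate_lb_per_in * spacing_in / 2.0


def specimen_compression_in(crosshead_in, load_lb, system_stiffness_lb_per_in):
    """Actual specimen compression after subtracting frame and load-string deflection."""
    return crosshead_in - load_lb / system_stiffness_lb_per_in


if __name__ == "__main__":
    print("tared reading (lb):", tared_reading(25.0, 5.0))
    print("points at 5 Hz and 100 Hz:", points_captured(5, 15), points_captured(100, 15))
    # Hypothetical: 0.5 in of travel at 2 in/min, measured to take 16 s.
    print("expected duration (s):", expected_duration_s(0.5, 2.0))
    print("speed error (%):", speed_error_percent(0.5, 2.0, 16.0))
    # Hypothetical 100 lb/in spring and 5,000 lb/in system stiffness.
    print("nearest-point error at 5 Hz (lb):",
          worst_case_nearest_point_error_lb(100.0, 2.0, 5))
    print("actual compression (in):", specimen_compression_in(0.25, 25.0, 5000.0))
```

None of these helpers replaces a formal verification of speed, force or displacement; they only show how each effect would move a reported peak load under the stated assumptions.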

The sources of error revealed and discussed in this study are just a sample of the types of errors that can be identified with both qualitative and quantitative evaluations of a materials testing system. Every testing system is unique, and test setups and configurations can vary widely.

In the case of a nondestructive test with a spring, the specimen is simplified to limit the errors that can complicate the analysis. When evaluating a destructive test with real test specimens, everything from the environment from which the materials originated, to how they were prepared, to the condition and environment in which they were tested needs to be considered.

A nondestructive GR&R using a single controlled specimen, such as the one described in this study, is a good way to evaluate a testing system’s basic functionality. If the testing system is not able to successfully pass the nondestructive study, with results under 10%, it will not be possible to get acceptable destructive GR&R results.

When conducting a destructive GR&R, the methodology described for identifying and resolving sources of error is critical to ensuring accurate, repeatable and reproducible results.


Tech Tips

  • A GR&R study is a thorough investigation that provides a statistical approximation of the variation and percent of process variation for a test measurement system.

  • A GR&R value less than 10% is ideal for most measurement systems.

  • GR&R does not address accuracy. It is possible to have very low GR&R values and test results that are wrong and not truly representative of the material or the product tested.