2. Explain how to establish a target value and SD for use with a quality control material in the following situations: a new lot of quality control material replaces an existing lot for an analyte; a new measurement procedure replaces an existing procedure for the same analyte; a new measurement procedure for a new analyte is introduced to the laboratory.
3. Explain how to establish rules for evaluating quality control sample results to confirm that results for patient samples from a measurement procedure are acceptable to make medical decisions.
4. Explain how to verify quality control (QC) target values following a reagent lot change.
5. Explain the components of a quality control plan and how the plan is established.
6. Explain the actions taken when a failed quality control result is obtained.
7. Explain how to evaluate results from external quality assessment (EQA) or proficiency testing.
8. Explain why peer group evaluation is used in EQA and what information a laboratory gets about its measurement procedure performance from the data.
9. Explain the limitations in using peer group mean values to assess the relationship of results among different measurement procedures.
10. Explain the value of EQA that uses commutable samples.
11. Explain how quality control information is documented and reviewed by laboratory staff.
Key Words and Definitions
Coefficient of variation Also called relative standard deviation; calculated as the standard deviation (SD) divided by the mean and multiplied by 100 to express in percent.
External quality assessment Also called external quality control or proficiency testing; an assessment process in which samples that simulate patient specimens are received from an external organization and results compared to those from other laboratories or from a reference method to determine that a measurement procedure’s performance meets preestablished criteria to be suitable for use in medical decisions.
Internal quality control Term used to refer to quality control as defined here to confirm that results from a measurement procedure are suitable for use.
Levey-Jennings chart A graphical display with observed control values plotted on the y-axis and time (typically in days) shown on the x-axis. The y-axis shows the target value with plus and minus 1, 2, and 3 SDs indicated. Control limits can be indicated but are not practical to show on the chart when multiple control rules are used for evaluation of quality control results.
Mean The arithmetic average of a series of numbers such as results of repeated measurements of a quality control sample. The mean of a series of replicate results for a quality control material is used as the target value for future measurements of that quality control material.
Proficiency testing Another term for external quality assessment.
Quality assessment Protocols to confirm that the laboratory service meets the needs of medical providers who use laboratory results for medical decisions. Quality assessment includes preanalytical, analytical, and postanalytical components of quality.
Quality control Statistical and/or nonstatistical check protocols, typically using quality control samples that are surrogates for the patient samples being tested, to assess that results from a measurement procedure meet preestablished criteria to be suitable for use in medical decisions.
Quality control plan A document that describes the organization and operation of the internal quality control process.
Quality control rules A set of rules that results for one or more quality control samples must meet to make a decision that the measurement procedure performance is producing results for patient samples that are suitable for use in medical decisions.
Sigma metric A value that expresses the variation in performance of a measurement procedure relative to the allowable variability in results to be suitable for medical decisions expressed in SD units. Six-sigma performance means that six SDs of measurement procedure variation fit within the allowable limits for acceptable performance.
Standard deviation A statistical value that estimates the dispersion of replicate values around a mean value. A SD assumes a Gaussian distribution of values that is typically observed for numeric quality control results.
Standard deviation interval The number of SDs an individual result is from a mean value, calculated as the individual value minus the mean, divided by the SD. The SD interval can be positive or negative relative to the mean value.
The purpose of a clinical laboratory test is to provide information on the pathophysiological condition of an individual patient to assist with diagnosis, therapy, or assessment of risk for a disease. Internal quality control, frequently referred to as quality control (QC), is a statistical sampling approach performed by a laboratory on a regular schedule to assess performance of its measuring systems. External QC, frequently referred to as external quality assessment (EQA) or proficiency testing (PT), is an approach to assess measuring system performance by comparing results for samples measured by a group of different laboratories. Both types of QC are addressed in this chapter (see later section for EQA/PT).
Internal QC evaluates a measurement procedure by periodically measuring a QC sample for which the expected result is known in advance. If the result for a QC sample is within acceptable limits of the known value, the measurement procedure is verified to be performing as expected, and results for patient samples can be reported with high probability that they are suitable for clinical use. If a QC result is not within acceptable limits, the measurement procedure is not performing correctly, there is a high probability that results for patient samples are not suitable for clinical use, and corrective action is necessary. Patient sample measurements could need to be repeated when the measurement procedure has been restored to its stable performance condition. If erroneous results have already been reported before an error condition is identified, a corrected report must be issued.
Measurement procedures fall into one of two general categories from a QC plan perspective. One type is a “batch” measurement process in which the results for patient samples and QC samples are completed before the results are reported. The other type is a “continuous” measurement process in which patient sample results are reported during the interval between QC sample measurements. For continuous measurement procedures, there is a possibility that erroneous results have already been reported by the time an error condition is identified at the next QC sample measurement. In either category, QC procedures only identify error conditions present at the point in time when a QC sample is measured.
The design of a QC plan must consider the analytical performance capability of a measurement procedure and the risk of harm to a patient that might occur if an erroneous laboratory test result is used for a clinical care decision.
Points to Remember
Internal Quality Control
• The primary role of internal QC is to confirm that patient results are correct and to identify possible errors before they affect patient care decisions.
Measurement Procedure Performance as a Prerequisite for a Quality Control Plan
Analytical Bias and Imprecision
Fig. 7.1 illustrates the meaning of bias and imprecision for a measurement procedure. The horizontal axis represents the numeric value for an individual result, and the vertical axis represents the number of repeated measurements with the same value made on samples of a QC material. The blue line shows the dispersion of results for repeated measurements of the same QC material, which is the random imprecision of the measurement. The standard deviation (SD) is a measure of expected imprecision in a measurement procedure when it is performing within specifications. The mean of repeated measurements of a QC sample becomes the expected value for that QC sample.
Fig. 7.1B illustrates that if a systematic bias (error) occurs in the measurements, the mean value shifts to another value. Note that the imprecision is the same as before the bias occurred because it is unlikely, although not impossible, that a change in imprecision would occur at the same time as a bias shift. The primary purpose of measuring QC samples is to statistically evaluate the measurement procedure to verify that it continues to perform within the specifications consistent with its acceptable expected stable condition or to identify that a change in performance occurred that needs to be corrected. QC result acceptance criteria are based on the probability for an individual QC result to be different from the variability in results expected when the measurement procedure is performing in a stable condition within its specifications.
FIG. 7.1 (A) Distribution of results showing the mean value and expected imprecision for repeated measurements of a quality control sample. (B) Bias when a change in calibration has occurred. SD, Standard deviation. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Fig. 7.2 shows a Levey-Jennings chart, the most common presentation for evaluating QC results. This format shows each QC result sequentially over time and allows a quick visual assessment of performance. Assuming the measurement procedure is performing in a stable condition, the mean value represents the target (or expected) value for the QC result, and the SD lines represent the expected imprecision. Assuming a Gaussian (normal) distribution of imprecision, the results should be distributed uniformly around the mean, with results observed more frequently closer to the mean than near the extremes of the distribution. The data show fluctuations in performance at different intervals and illustrate the importance of calculating the SD over a long enough interval to appropriately include all sources of variability in the measurement procedure. Note that a small number of results in Fig. 7.2 are greater than 2 SDs, and three results slightly exceed 3 SDs, which is expected for an approximately Gaussian distribution of imprecision. The number of results expected within the standard deviation intervals (SDIs) is:
• ±1 SD = 68.3% of observations
• ±2 SD = 95.4% of observations
• ±3 SD = 99.7% of observations
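These coverage fractions follow directly from the Gaussian distribution; a short calculation (a sketch using only the Python standard library) reproduces them:

```python
import math

def gaussian_coverage(k):
    """Fraction of a Gaussian distribution lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/-{k} SD: {100 * gaussian_coverage(k):.1f}% of observations")
# prints 68.3%, 95.4%, 99.7%
```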
Interpretation of an individual QC result is based on its probability to be part of the expected distribution of results for the measurement procedure when the procedure is performing correctly. Note that evaluation of individual QC results may be performed by computer algorithms without visual examination of a Levey-Jennings chart.
FIG. 7.2 Levey-Jennings chart of quality control (QC) results (n = 1232) for a single lot of QC material used over a 10-month period. SD, Standard deviation. Reprinted with permission from Miller, W. G., & Nichols, J. H. [2011]. Quality control. In: W. A. Clarke [Ed.], Contemporary practice in clinical chemistry [2nd ed.]. Washington, DC: AACC Press.
Performance of a Measurement Procedure for Its Intended Medical Use
The frequency to measure QC samples and the criteria used to evaluate the QC results depend on how the performance of a measurement procedure relates to the medical requirements for interpreting results. The sigma metric is commonly used to assess how well a measurement procedure performs relative to the medical requirement. For laboratory measurements, the sigma metric is calculated as:
Sigma = (TEa − |bias|) / SD
where TEa is the total error allowed based on medical requirements, and the absolute value of bias and the SD refer to performance characteristics of the measurement procedure. The SD is estimated from the QC data. It is critically important that the estimate of SD be made using QC data that represent all or most components of variability that occur over an extended time period. The bias is difficult for a laboratory to estimate because doing so requires comparison against a reliable estimate of the true value, such as one from a reference measurement procedure. For internal QC, a laboratory is usually interested in determining whether a bias has occurred compared with the condition established by calibration of a measurement procedure, and any bias that has occurred will be corrected. Consequently, the bias is usually assumed to be zero for calculating sigma.
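As a worked illustration of this formula (the TEa, bias, and SD values below are hypothetical, chosen only for the example):

```python
def sigma_metric(tea, bias, sd):
    """Sigma metric: (TEa - |bias|) / SD, with all terms in the same units
    (e.g., all expressed as a percent of the target concentration)."""
    return (tea - abs(bias)) / sd

# Hypothetical example: TEa of 10%, bias assumed zero, long-term CV of 1.7%
print(sigma_metric(10.0, 0.0, 1.7))  # about 5.9: nearly six-sigma performance
```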
TEa represents the measurement procedure performance required for medical decisions based on a test result. TEa can be estimated using three models. The preferred model (model 1) is to set a performance specification based on an outcome study, such as the impact of analytical performance on the clinical outcome. Outcome studies are available for only a small number of analytes and are typically included in clinical laboratory practice guidelines.
Model 2 bases the TEa on a fraction of the within and between individual biological variations of the measurand. (For additional information on biological variation refer to Chapter 4.) This model minimizes the ratio of the “analytical noise” to the “biologic signal,” with an assumption that a small ratio will identify measurement procedure performance that relates to the medical requirements. Tables of optimal, desirable, and minimal TEa based on biological variation are available.
Model 3 bases the performance specifications on the “state of the art,” which is the performance capability of a measurement procedure usually derived from QC data. The laboratory should consult with clinical care providers to agree on an appropriate TEa for the patient population served.
Because sigma assumes a Gaussian or normal distribution for repeated measurements, the probability of a defect (i.e., an erroneous laboratory result) can be predicted. The term six-sigma refers to a condition when the variability in the measurement process is sufficiently smaller than the medical requirement that erroneous results are very uncommon. Fig. 7.3A shows a six-sigma test that has the TEa limits 6 SDs away from the center point of the distribution of variability in measurements. A small amount of bias or increased imprecision will have little influence on the number of erroneous results produced, and the risk of producing an erroneous result even with some loss of performance is very low. Consequently, less stringent QC is suitable.
Fig. 7.3B shows a three-sigma measurement procedure that has the TEa limits 3 SDs away from the center point of the expected distribution of variability in measurements. In the three-sigma situation, a small amount of bias or increased imprecision will cause the number of erroneous results to increase substantially. More frequent QC and more stringent acceptance criteria will allow the laboratory to identify when small changes in performance occur so they can be corrected to minimize the risk of harm to a patient from erroneous results being acted on to make clinical care decisions.
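The difference between the two situations can be quantified. Assuming zero bias and Gaussian imprecision, the probability that a result falls outside the TEa limits is the two-tailed Gaussian tail area beyond the sigma value (a sketch that does not account for bias drift or non-Gaussian behavior):

```python
import math

def defect_probability(sigma):
    """Two-tailed probability that a result falls outside the TEa limits,
    assuming zero bias and Gaussian imprecision."""
    return math.erfc(sigma / math.sqrt(2))

# Roughly 0.27% of results exceed TEa for a three-sigma procedure,
# versus on the order of 2 per billion for a six-sigma procedure
print(defect_probability(3))
print(defect_probability(6))
```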
FIG. 7.3 Test performance relative to the sigma scale to describe how well performance meets medical requirements expressed as the allowable total error (TEa). (A) A six-sigma measurement procedure. (B) A three-sigma measurement procedure. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Points to Remember
• The performance characteristics of a measurement procedure when it is performing in a stable in-control condition must be known.
• The allowable total error for a measurement procedure must be established based on requirements for using a laboratory result in patient care decisions.
• The sigma metric indicates the probability that erroneous results will occur when the measurement procedure is performing to its specifications.
Developing a Quality Control Plan and Implementing Internal Quality Control Procedures
Selection of Quality Control Materials
In general, two different concentrations of QC materials are necessary for adequate statistical QC. For quantitative measurement procedures, QC materials should be selected to provide analyte concentrations that represent clinical decision values over the analytical measuring interval of the measurement procedure. More than two concentrations of QC materials may be needed when there are several important medical decision values. In practice, laboratories are frequently limited by the concentrations available in commercial QC products. For procedures with extraction or other pretreatment steps, the QC materials used must be carried through the same pretreatment steps as patient samples. For qualitative tests, QC samples should assess the stability of the threshold for a classification decision.
The QC materials must be manufactured to provide a stable product that can be used for an extended time period when possible. Use of a single lot for a year or more allows reliable interpretive criteria to be established that will minimize the effort and uncertainty associated with QC lot changes.
Limitations of Quality Control Materials
An important limitation of most QC materials, which also applies to EQA or PT materials, is called noncommutability with patient samples. Fig. 7.4A shows that a commutable QC material gives a result that closely agrees with results for authentic patient samples with the same amount of analyte when measured by different procedures. Fig. 7.4B shows that results for a noncommutable QC material have a different relationship than observed for patient samples when measured by different procedures. QC and EQA/PT materials are typically noncommutable with patient samples because the serum or other biologic fluid matrix is usually altered from that of a patient sample during product manufacturing, for example, by use of partially purified human and nonhuman additives to achieve desired concentrations and various stabilization additives and processes. The impact of the matrix alteration on the recovery of a measurand is not predictable and is frequently different for different lots of QC material, for different lots of reagent within a given measurement procedure, and for different measurement procedures. Because of the noncommutability limitation, special procedures are required when changing lots of reagent or comparing QC results among two or more measurement procedures.
A second limitation of QC materials is gradual deterioration of the analyte during storage and after reconstitution, thawing, or vial opening.
Frequency to Measure Quality Control Samples
The frequency to measure QC samples depends on several considerations. Regulatory requirements may specify a minimum frequency for QC measurements. The instructions for use from measurement procedure manufacturers may specify a minimum or recommended QC frequency. The risk that clinical action will be taken on an erroneous result before a measurement error of clinical importance is detected is an important reason to measure QC samples more frequently than regulatory requirements or manufacturers' recommendations specify.
Analytical Stability of the Measurement Procedure
The more stable the measurement procedure, the less frequently a QC evaluation needs to be performed. Some measurement procedures, particularly point-of-care (POC) devices, have been designed with sophisticated built-in control procedures to mitigate the risk that an erroneous result may be produced. These measurement systems may be sufficiently stable and self-monitored to justify reduced frequency of traditional QC sample testing.
FIG. 7.4 Illustration of commutable and noncommutable materials. (A) Commutable materials (blue squares) have the same relationship between two measurement procedures as observed for patient samples (black diamonds). (B) Noncommutable materials have a different relationship than observed for patient samples. EQA, External quality assessment; PT, proficiency testing; QC, quality control.
Risk of Harm to a Patient and Number of Patients Who May be at Risk
More frequent QC sampling is appropriate to avoid the situation of discovering a measurement procedure defect many hours after a physician has made a clinical treatment or nontreatment decision based on an erroneous result. For example, QC samples on a 24-hour cycle might be measured at 9 a.m. each day. If QC results indicate a measurement procedure problem, the erroneous condition could have started at any time during the previous 24 hours. If the problem had occurred at 3 p.m. the previous day, erroneous results could have been reported for 18 hours, likely putting a large number of patients at risk of an inappropriate care decision.
The Clinical and Laboratory Standards Institute (CLSI) has published guideline EP23 addressing risk-based QC procedures. The document provides guidance on how to develop a QC plan based on evaluation of risk of harm to a patient and assessment of the effectiveness of risk mitigation procedures. In general terms the laboratory director makes a judgment that suitable built-in and laboratory-applied controls are in place and that a result has a high probability to be correct at the time it is reported for clinical use.
Event-Based Quality Control Sample Measurement
It is necessary when using a continuous measurement system to measure QC samples before and after scheduled events such as recalibration or maintenance that may alter the current performance condition. Each of these operations is intended to restore the measurement conditions to optimal specifications and to correct for any calibration drift or component deterioration that may have occurred. If QC samples are not measured before such scheduled events, a laboratory will not know if an error condition may have occurred since the last time QC samples were measured. It is also necessary to measure QC samples after these events to verify that the operations were performed correctly and that measurement procedure performance meets specifications before restarting to measure patient samples.
Establishing the Quality Control Target Value and Standard Deviation That Represent a Stable Measurement Operating Condition
QC target values and acceptable performance limits are established to optimize the probability to detect a measurement defect that is large enough to have an impact on clinical care while minimizing the frequency of “false alerts” caused by statistical limitations of the criteria used to evaluate QC results.
Quality Control Material Target Value
The generally accepted minimum protocol for target value assignment is to use the mean from a minimum of 10 measurements of the QC material on 10 different days when the measurement procedure is correctly calibrated and performing to its specifications. Because all sources of variability cannot be captured in 10 measurements, it is recommended to update the target value after more data have been acquired during use of the QC material. If a 10-day protocol is not possible (e.g., if an emergency replacement of a lot of QC material is necessary), a provisional target value can be established with fewer data but should be updated when additional QC results are available.
Quality Control Material Standard Deviation
The SD must represent the variability expected for a measurement procedure over an extended time interval to include all sources of variability when its performance meets its specifications. Measurement variability has short time interval components (such as pipetting volume), gradual changes (such as pipet seal deterioration or coating of cuvette or electrode surfaces), or variable and sometimes long intervals (such as calibration cycles, reagent replenishment, and maintenance procedures). An SD that represents stable measurement performance can usually be estimated from the cumulative SD over a 6- to 12-month period for a single lot of QC material because most expected sources of variation are likely to be represented. Fig. 7.5 illustrates the fluctuation in SD that occurred when calculated for monthly intervals compared with the relatively stable value observed for the cumulative SD after a period of 6 months. Note that the cumulative SD is not the average of the monthly values but is the SD determined from all individual results obtained over a time interval since the lot of QC material was first used. If the imprecision expected during normal stable operation is underestimated, the acceptable range for QC results will be too small, and the false-alert rate will be unacceptably high.
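The distinction between the cumulative SD and the average of interval SDs can be shown with a small sketch (the QC values below are invented for illustration; a deliberate between-month drift is included):

```python
import statistics

# Hypothetical QC results grouped by month, with slight between-month drift
monthly = [
    [100.2, 99.8, 100.5, 99.9],
    [101.1, 100.7, 101.3, 100.9],
    [99.5, 99.9, 99.4, 99.8],
]

# Averaging the monthly SDs misses the between-month component ...
avg_of_monthly_sds = statistics.mean(statistics.stdev(m) for m in monthly)

# ... whereas the cumulative SD is calculated from all individual results
all_results = [x for month in monthly for x in month]
cumulative_sd = statistics.stdev(all_results)

print(avg_of_monthly_sds, cumulative_sd)  # the cumulative SD is larger here
```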
When a measurement procedure has been established in a laboratory and a new lot of QC material is being introduced, the target value for the new lot of QC material is used along with the well-established SD from the previous lot. This practice is appropriate because in most cases, measurement imprecision is a property of the measurement procedure and equipment used and is unlikely to change with a different lot of QC material.
FIG. 7.5 Cumulative standard deviation (SD) versus single monthly values calculated from the data in Fig. 7.2. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
TABLE 7.1
Abbreviation Nomenclature for Quality Control Evaluation Rules
12S: One observation exceeds 2 SDs from the target value. The 12S rule is not recommended except for low sigma measurement procedures because it has a high false-alert rate. Detects bias or imprecision.
13S: One observation exceeds 3 SDs from the target value. Detects bias or large imprecision.
22S (22.5S): Two sequential observations, or observations for two QC samples measured at approximately the same time, exceed 2 SDs (or 2.5 SDs) from the target value in the same direction. Detects bias.
2 of 32S: Two of three observations for QC samples measured at approximately the same time exceed 2 SDs from the target value in the same direction. This type of rule is used when three QC materials are used for a measurement procedure. Detects bias.
R4S: The range between observations for two QC samples measured at approximately the same time, or between two sequential observations of the same QC sample, exceeds 4 SDs. Detects imprecision.
10x or 10m: Ten sequential observations for the same QC sample are on the same side of the target value (x or mean). The 10x rule is not recommended because it has an excessive false-alert rate.
81S (81.5S): Eight sequential observations for the same QC sample exceed 1 SD (or 1.5 SDs) in the same direction from the target value. Detects a bias trend.
CUSUM: Cumulative sum of the SDIs for the current and previous results. Detects a bias trend.
EWMA: Exponentially weighted moving average of the current and previous results, with newer results having more influence (weight). Detects a bias trend.
From Miller, W. G. (2016). Quality control. In: Henry’s clinical diagnosis and management by laboratory methods (23rd ed.). Philadelphia: Elsevier.
When a new measurement procedure replaces an existing procedure, the SD for the existing procedure can in many cases be used as the initial SD for the new measurement procedure until the SD for the new procedure has been established. The assumption is that the SD for the existing measurement procedure was appropriate to ensure that results were suitable for use in medical decisions; consequently, that SD is likely to be suitable for QC decisions for the new measurement procedure.
If a measurement procedure for a new analyte is introduced, there will not be any historical performance information. The SD is based on QC data obtained during the measurement procedure validation. A minimum of 20 observations on different days is recommended for the initial estimate of the SD. This initial estimate of SD will likely be an underestimate because it does not include all sources of variability and must be updated when sufficient QC results have been accumulated to include most sources of variability.
Quality Control Materials With Preassigned Values
Some QC materials are provided by the measurement procedure manufacturer with preassigned target values and acceptable ranges intended to confirm that the measurement procedure meets the manufacturer’s specifications. Such assigned values may be used to verify the manufacturer’s specifications. However, it is recommended that both the target value and the SD should be reevaluated and assigned by the laboratory after adequate replicate results have been obtained because the QC interpretive rules used in a single laboratory should reflect performance for the measurement procedure in that laboratory.
QC materials with assigned target values and SDs are also available from third-party manufacturers (i.e., manufacturers not affiliated with the measurement procedure manufacturer) and typically have values that are applicable to specifically stated measurement procedures and reagent lots to accommodate the influence of noncommutability. However, a laboratory should determine a target value and SD that reflect the operating conditions in that laboratory to ensure optimal assessment of QC results to identify an erroneous measurement condition.
Establishing Rules to Evaluate Quality Control Results
The acceptable range and rules for interpretation of QC results are based on the probability of detecting an analytical error condition with an acceptably small false-alert rate.
The conventional way to express QC interpretive rules is by using an abbreviation nomenclature popularized among clinical laboratories by Westgard and summarized in Table 7.1. Note that fractional SDIs can be used as in the 22.5S and 81.5S examples and that combinations of numbers of controls and limits can be used as appropriate for QC interpretive rules. Trend detection procedures such as cumulative sum (CUSUM) or exponentially weighted moving average (EWMA) are recommended if supported by an available computer system because they are more powerful for detecting trends than approaches based on counting the number of sequential observations exceeding a specified SDI. A trend rule can be set to give an alert as a warning that may not require discontinuing testing but indicate that a problem is developing that should be investigated.
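As one example of a trend detector, an EWMA of the SDIs can be sketched as follows (the smoothing weight of 0.2 and the interpretation threshold are illustrative choices, not standard values):

```python
def ewma(sdis, weight=0.2):
    """Exponentially weighted moving average of SD intervals;
    newer results carry more influence than older ones."""
    z = 0.0
    for x in sdis:
        z = weight * x + (1 - weight) * z
    return z

# A run of modest positive SDIs, none violating a 2 SD rule individually,
# steadily drives the EWMA upward, signaling a developing bias trend
print(ewma([1.0, 1.0, 1.0, 1.0, 1.0]))  # about 0.67 and still rising
```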
TABLE 7.2
Empirical Multirule for the Quality Control Data Presented in Fig. 7.2
13S: imprecision or bias
22.5S: bias
R4S: imprecision
81.5S: bias trend
From Miller, W. G. (2016). Quality control. In: Henry’s clinical diagnosis and management by laboratory methods (23rd ed.). Philadelphia: Elsevier.
It is recommended to improve the efficiency of QC interpretive rules by combining two or more rules and applying them simultaneously as multirule criteria. For example, the 13S/22S multirule identifies an error condition if one control exceeds ±3 SD from the target value or if two controls exceed ±2 SDs in the same direction from the target value. The 13S/22S multirule has a low false-alert rate but improved probability to detect a bias error of a given magnitude.
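A minimal sketch of the 13S/22S multirule, operating on results expressed as SD intervals (the function names are our own; laboratory middleware typically implements these rules internally):

```python
def sdi(value, target, sd):
    """SD interval: how many SDs a QC result lies from its target value."""
    return (value - target) / sd

def multirule_13s_22s(sdis):
    """Return True if the 1-3s or 2-2s condition is violated; sdis holds
    SD intervals in measurement order, most recent last."""
    if abs(sdis[-1]) > 3:  # 1-3s: one result beyond 3 SDs
        return True
    if len(sdis) >= 2:
        a, b = sdis[-2], sdis[-1]
        if abs(a) > 2 and abs(b) > 2 and a * b > 0:  # 2-2s: same direction
            return True
    return False

print(multirule_13s_22s([0.5, 1.1, -3.2]))  # True: 1-3s violated
print(multirule_13s_22s([2.3, 2.1]))        # True: 2-2s violated
print(multirule_13s_22s([2.3, -2.1]))       # False: opposite directions
```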
In practice, empirical judgment is frequently used to establish acceptance criteria (rules) to evaluate QC results based on data acquired over a long enough period of time to adequately estimate the expected variability when a measurement procedure is working correctly. An empirical approach can be used by obtaining a set of QC data that represents stable measurement procedure performance over a time interval expected to include most sources of variability. Using those data, the false-alert rate for a rule can be determined, and bias errors of different magnitudes can be added to estimate the ability of a rule, or a combination of rules, to identify that error. Table 7.2 gives an example of an empirically developed multirule for a 6-sigma measurement procedure. Such control rules should allow the laboratory to detect errors before they are of a magnitude that will affect clinical decisions. A 10x rule was not used because it would have increased the false-alert rate by 10.6%. A 10x rule or other rule that counts the number of sequential QC results on one side of the target value is not recommended because this condition typically does not indicate a problem with clinical interpretation of patient results when the magnitude of the difference from the target value is small. Counting the number of sequential results that exceed a larger SD from the target value, such as 81.5S in this example, is more likely to represent a measurement condition that might need investigation.
For measurement procedures with small sigma values, small deviations from the expected performance need to be identified. Consequently, more stringent QC practices need to be used, such as selecting rules like 1-2s that give an alert at smaller error conditions, using additional rules in a multirule set, measuring QC more frequently, using more than two QC samples, and not releasing patient results until QC assessment is complete for the time interval during which patient samples were measured. More stringent QC rules will have more false alerts, but this is an unavoidable cost when lower sigma measurement procedures are used.
Specifying the Quality Control Plan
The preceding subsections describe the considerations for each component in a QC plan. The laboratory director is responsible for considering the components, making judgments regarding the considerations, and approving the final plan for each analyte measured in a laboratory. A plan for internal QC specifies the following components:
• The number of QC samples to be measured and the approximate concentrations of analytes in those controls
• The target value for each QC sample
• The SD for each QC sample to be used in the QC rules
• The rules for evaluating the QC results
• The frequency to test the QC samples
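As an illustration only (the field names and example values below are hypothetical, not prescribed by any standard or regulation), the five components listed above could be captured in a simple data structure:

```python
from dataclasses import dataclass

@dataclass
class QCLevel:
    """One QC material at one approximate concentration."""
    material: str
    target: float   # established target value for this QC sample
    sd: float       # SD used in the QC rules for this sample

@dataclass
class QCPlan:
    """Internal QC plan for one analyte, per the components in the text."""
    analyte: str
    levels: list    # QC samples and their approximate concentrations
    rules: list     # interpretive rules for evaluating QC results
    frequency: str  # how often the QC samples are tested

plan = QCPlan(
    analyte="glucose",
    levels=[QCLevel("low control", target=4.2, sd=0.10),
            QCLevel("high control", target=15.0, sd=0.30)],
    rules=["one control beyond 3 SD", "two consecutive beyond 2 SD"],
    frequency="every 8 h of patient testing",
)
```

A structured record like this makes the director-approved plan explicit and reviewable for each analyte.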
Considerations for Point-of-Care Testing
Internal QC of POC instruments offers extra challenges compared with QC in the central laboratory. The main reasons are that POC instruments are often operated by persons without laboratory training; they often use methodologic principles that are different from those in the central laboratory; they often have “built-in” controls; and the number of measurements can be small, making the use of QC samples in the traditional way expensive. In cartridge-based and some strip-based instruments, the manufacturer often has placed the technology in the cartridge or strip together with QCs, and in some cases, QC rules are built in so that patient results cannot be reported unless the QC is “satisfactory.” The instrument is then merely an electronic reader that often has incorporated an “electronic quality control” that verifies the electronics of the measurement procedure. The electronic instrument checks do not verify the reagents in the cartridges or strips, and unless each cartridge has internal QC materials, the reagent cartridges or strips should be checked at delivery and then at intervals (e.g., with the arrival of a new shipment or lot or at a suitable interval such as weekly or monthly depending on the cartridges/strips).
Not all POC instruments include enhanced QC features. In these cases, one must rely on daily liquid QC performed by the operator. The limitation of using the liquid QC sample in this situation is that it checks only whether one disposable cartridge or strip meets the performance specifications. This limitation requires an assumption that all devices in a lot were manufactured uniformly and will perform equivalently.
How internal QC should be performed and supervised also depends on the location of the POC instruments. In a hospital, it is now possible with real-time bidirectional connectivity between the POC devices and the central laboratory to transfer both patient and QC results and to set lock-out parameters for conformance to a QC protocol. As technology advances, the general trend is for more sophisticated POC devices with built-in control systems to be incorporated to minimize or prevent the possibility for an incorrect result.
Corrective Action When a Quality Control Result Indicates a Measurement Problem
A QC alert occurs when a QC result fails an evaluation rule, which indicates that an analytical problem may exist. A QC alert means there is a high probability that the measurement procedure is producing results that have the potential to be unreliable for patient care and testing must be stopped until the problem is resolved. Fig. 7.6 presents a generalized troubleshooting sequence. QC materials can deteriorate after opening because of improper handling and storage or because of unstable analytes. Thus repeating the measurement on a new vial of the QC material is a useful step to determine if the alert was caused by deteriorated QC material rather than by a measurement procedure problem. In this situation, if the result for the new QC sample is acceptable, testing of patient samples can resume. One caution when the repeat QC result is near acceptability limits is to consider whether the repeat and original results are essentially the same. It is not acceptable to repeat the QC until a value happens to be just within the acceptable limit. In this situation the probability is high that a measurement problem exists, and this possibility should be investigated. In addition, current and preceding QC results should be examined for a trend in bias that indicates a measurement issue that needs to be corrected. These precautions in evaluating repeat results for a new QC sample can be challenging or impossible for automated evaluation by computer systems, thus requiring the laboratory technologist to be vigilant in reviewing results.
FIG. 7.6 Generalized troubleshooting sequence showing the initial steps after an unacceptable quality control (QC) result. The details of troubleshooting the defect may be different for different rules violations or if more than one unacceptable QC result was obtained.
When repeat testing of a new QC sample does not resolve the alert situation, the instrument and reagents should be inspected for component deterioration, empty reagent containers, mechanical problems, and so on. In many cases, it will be necessary to recalibrate. When the problem is identified and corrected, QC samples should be measured to verify the correction, and all patient samples since the time of the last acceptable QC results, or the time when the error condition occurred, should be measured again. The laboratory director must establish acceptable criteria to determine if the repeat results agree adequately to permit reporting of original results without issuing a corrected report. Otherwise, corrected results must be reported. The criteria for acceptability of repeated tests are based mainly on the TEa described earlier with consideration of the measurement procedure performance characteristics.
It may be difficult to establish the time when an error condition occurred. One approach is to repeat every few samples back to the time of the last acceptable QC results. The repeated results are then compared with acceptable criteria for repeated results agreement to identify a point in time when the error condition occurred. When selecting the samples to repeat, it is important to ensure that a substantial representation of the potentially erroneous samples is repeated and that samples at a concentration consistent with that of the unacceptable QC are represented. Alternatively, groups of 10 patient samples can be repeated, again ensuring that samples at a concentration consistent with that of the unacceptable QC are represented, until all repeat results in at least two sequential groups are within acceptable criteria for repeated results. When the point at which the error condition was likely to have occurred is identified, all patient samples must be repeated from that point until the unacceptable QC result was obtained. Any assessment of the point at which an error condition occurred by repeating selected patient samples has a risk to incorrectly identify that point, and laboratories are encouraged to repeat enough patient samples to have confidence in the assessment.
Verifying Quality Control Evaluation Parameters After a Reagent Lot Change
Changing reagent lots can cause a shift in QC results when there is no change in results for patient samples. Because the matrix-related noncommutability between a QC material and a reagent can change with a different reagent lot, QC results are not a reliable indicator of a measurement procedure’s performance for patient samples after a reagent lot change. In the example in Fig. 7.7, QC values for the high-concentration control shifted after the change to a new lot of reagents, but there was no change in results for the low control. A comparison of results for a panel of patient samples assayed using the new and old reagent lots showed equivalent results. Consequently, the change in QC values for the high-concentration material was due to a difference in matrix-related noncommutability bias between the QC material and each of the reagent lots.
It is necessary to use patient samples to verify the consistency of results between old and new lots of reagents because of the unpredictability of a matrix-related noncommutability bias being present for QC materials. Fig. 7.8 presents a protocol to verify or adjust QC material target values after a reagent lot change. A group of patient samples and the QC samples are measured using both the current (old) and new reagent lots. The first step is to verify that results for a group of patient samples measured with the new reagent lot are consistent with results from the current (old) lot. The patient sample results, not the QC results, provide the basis for verifying that the new reagent lot is acceptable for use. If a problem is identified, the calibration of the new reagent lot must be investigated and corrected, or the new reagent lot may be defective and should not be used. When evaluating the patient results, keep in mind that the calibration of the old reagent lot may have drifted and should be verified before concluding that the new reagent lot is not giving acceptable results for the patient samples.
FIG. 7.7 Levey-Jennings plot showing impact of a reagent lot change on matrix bias with quality control (QC) samples. Modified with permission from Miller, W. G., & Nichols, J. H. [2010]. Quality control. In: W. A. Clarke [Ed.], Contemporary practice in clinical chemistry [2nd ed.]. Washington, DC: AACC Press.
The number of patient samples to use for verifying the performance of a new reagent lot will depend on the measuring interval, the imprecision of a measurement procedure, and the concentrations at which clinical decisions are made. CLSI document EP26 recommends a minimum of three patient samples and more patient samples depending on the number of important clinical decision concentrations and the imprecision of a measurement procedure. This CLSI guideline includes a statistical analysis to determine if a difference in patient results is less than a critical difference that would represent risk for an inappropriate patient care decision based on a particular laboratory test result. An alternate approach is to select 5 to 10 patient samples that span the measuring interval and use a difference plot to evaluate average performance over the interval of concentrations represented by the patient samples. The laboratory must establish acceptance criteria for the agreement of patient results for old and new lot measurements consistent with the relatively small number of samples used, the analytical performance characteristics of a measurement procedure, and the clinical requirements for interpreting results.
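The difference plot approach mentioned above can be sketched as follows. This is a minimal illustration, not the CLSI EP26 statistical analysis; the sample results and the 5% acceptance limit are hypothetical, and the laboratory's own criteria would apply:

```python
def lot_comparison(old, new, limit_pct):
    """Percent difference (new vs. old reagent lot) for each patient sample,
    flagging any difference that exceeds the laboratory's acceptance limit."""
    diffs = [100.0 * (n - o) / o for o, n in zip(old, new)]
    mean_diff = sum(diffs) / len(diffs)          # average bias across interval
    flagged = [d for d in diffs if abs(d) > limit_pct]
    return mean_diff, flagged

old_lot = [3.1, 5.4, 8.8, 12.6, 17.9]   # current-lot results (hypothetical)
new_lot = [3.2, 5.3, 8.9, 12.8, 17.7]   # new-lot results, same samples
mean_diff, flagged = lot_comparison(old_lot, new_lot, limit_pct=5.0)
```

An empty `flagged` list and a mean difference near zero would support accepting the new lot; any flagged sample, or a consistent mean bias, would prompt investigation of the new lot's calibration.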
When the results for patients are acceptable, the second step in Fig. 7.8 evaluates results for each QC material to determine if its target value is correct for use with the new lot of reagent(s). If the target value has changed, it must be adjusted to correct for the change in matrix-related noncommutability bias between old and new lots of reagent(s). This adjustment keeps the expected variability centered around the QC target value so that QC interpretive rules will remain valid.
FIG. 7.8 Process for assessment of potential matrix-related noncommutability bias on quality control (QC) samples after a reagent lot change. SD, Standard deviation. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Failure to make a target value adjustment will cause subsequent QC results to be evaluated incorrectly, as illustrated in Fig. 7.9. The shift in target value would cause some of the results shown by blue squares to exceed the old upper QC rule limit, when in reality there is no defect in patient results because the increase in QC results is caused by the matrix-related noncommutability bias with the new reagent lot. Similarly, the increased magnitude of the gap to the old lower QC rule limit will permit a low bias condition, as shown by the blue square points at sequence number 29 and 30, to be undetected. In most cases the SD for a QC material will be the same with any lot of reagent(s).
Note that a reagent lot–induced change in matrix-related noncommutability bias, which shifts the numeric values of the QC results, will cause an incorrect increase in the cumulative SD if all results are used for the calculation. For this reason, it is recommended to use the cumulative SD from a single reagent lot, or the pooled SD from more than one reagent lot, when determining the SD to use for interpreting QC rules.
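A pooled SD can be sketched as follows (a minimal illustration with hypothetical QC values): squared deviations are computed within each reagent lot and combined, so a between-lot shift in the QC values does not inflate the estimate the way a single SD over all results would:

```python
import math

def pooled_sd(lots):
    """Pooled SD across reagent lots: sums within-lot squared deviations and
    divides by the pooled degrees of freedom, excluding between-lot shifts."""
    ss, dof = 0.0, 0
    for results in lots:
        mean = sum(results) / len(results)
        ss += sum((x - mean) ** 2 for x in results)
        dof += len(results) - 1
    return math.sqrt(ss / dof)

# Hypothetical QC results; the second lot shows a matrix-related shift upward:
sd_pooled = pooled_sd([[10.0, 10.2, 9.8, 10.1], [10.6, 10.4, 10.5, 10.7]])
```

In this example the pooled SD reflects only the within-lot scatter (about 0.15), whereas an SD calculated over all eight results together would be roughly doubled by the between-lot shift.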
Experience in clinical laboratories has shown that there are changes, other than reagent lot changes, in measurement procedures that can also affect the QC values but not the results for patient samples. Such changes could be caused by instrument component replacement or other causes. In theory, there should be an assignable cause for such effects, but such a cause is not always identifiable. In practice, any condition that affects QC results but does not affect patient results is treated in the same manner as described for reagent lot changes. The important QC principle is that if the results for patient samples are consistent between the two conditions, then the target value for the QC sample should be adjusted, if necessary, to reflect its value under the new condition. Failure to adjust the QC target value will cause inappropriate acceptability criteria to be used for evaluating the QC results.
Review of Quality Control Data and the Effectiveness of the Quality Control Plan
The immediate use of QC data is to determine if the results for patient samples can be reported for use in clinical care decisions, as described in the preceding sections. In addition, QC data must be reviewed by laboratory management on a regular schedule. Typical review schedules are weekly by senior technologists or supervisors and at least monthly by the laboratory director. However, the laboratory director or supervisor should promptly review items such as reagent or calibrator lot change validations, changes in QC target values associated with reagent lot or other changes, EQA/PT results review, and other occurrences that may affect quality of the laboratory results.
The weekly review process should determine that correct follow-up of any QC alerts was conducted, that all patient samples that may have had erroneous results were repeated, that any corrected reports were issued, and that the process was properly documented in QC records. The monthly review should include any issues identified by the weekly review process, as well as examination of the Levey-Jennings chart or a computer-based report, to identify trends or changes in assay performance that may need to be addressed before they have effects on clinical care decisions. Note that automated systems to assist in the review of QC data are acceptable, and individual Levey-Jennings charts do not need to be examined every month. The monthly review should also include any adjustments made to QC parameters or the QC plan for a measurement procedure during the month.
FIG. 7.9 Illustration of the influence on the failure rate for a quality control (QC) rule when failing to adjust the target value for a matrix-related shift. QC results before a new reagent lot are shown as gray circles and after the new reagent lot as blue squares. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Points to Remember
• Internal QC samples are measured along with patient samples.
• The target value and SD expected for a QC sample are established by the laboratory.
• Results from QC samples are evaluated using interpretive rules that are established after considering the probability for false alerts and the probability for detecting errors that represent a risk of harm to a patient.
External Quality Assessment or Proficiency Testing
EQA/PT is used to evaluate measurement procedure performance by comparing a laboratory’s results with those of other laboratories for the same set of samples. An EQA/PT provider circulates a set of samples among a group of laboratories that can number several thousand in larger programs. Each laboratory measures the EQA/PT samples as if they were patient samples and reports the results to the EQA/PT provider for evaluation. The EQA/PT provider establishes target values for the EQA/PT samples and determines if the results for an individual laboratory are in close enough agreement with the target value to be consistent with acceptable measurement procedure performance.
EQA/PT is not available for some analytes because a particular measurement procedure may be new to the clinical laboratory or is not commonly performed or because analyte stability makes it difficult to include in an EQA/PT sample. In these situations the laboratory should use an alternate approach to periodically verify acceptable performance of the measurement procedure. CLSI guideline GP27 provides approaches for verifying measurement procedure performance when formal EQA/PT is not available.
Internal QC material manufacturers may provide a data analysis service that compares results from different laboratories using the same QC material by calculating group statistics for performance evaluation. As with EQA/PT evaluation, this type of interlaboratory QC data analysis allows a laboratory to verify that it is producing QC results that are consistent with those of other laboratories using the same measurement procedure. This information can be helpful for troubleshooting measurement procedure issues and for assessing performance of a new measurement procedure being introduced to a laboratory.
External Quality Assessment or Proficiency Testing Programs That Use Commutable Samples
EQA/PT programs that use commutable samples are preferred whenever available. Commutable samples are typically prepared by using an individual donor’s specimen or by pooling clinical patient samples with minimal processing or additives to avoid alteration of the sample matrix. When commutable EQA/PT samples can be prepared, the results reflect what would be expected if individual patient samples were sent to each of the different laboratories. Thus agreement among different laboratories and measurement procedures (harmonization) can be correctly evaluated. The agreement between an individual laboratory result and a reference measurement result gives an assessment of correct calibration for the laboratory. The agreement between an individual laboratory result and an all-methods mean gives an assessment of harmonization of results with other measurement procedures. The agreement between a measurement procedure group mean value and the reference measurement result gives an assessment of trueness and calibration traceability for the measurement procedure group. In addition, the agreement between a measurement procedure group mean value and an all results mean gives an assessment of harmonization of results. The information for measurement procedure groups is of particular interest to the producers of measurement procedures and can be used as part of a surveillance program for the calibration traceability scheme.
External Quality Assessment or Proficiency Testing Programs That Use Noncommutable Samples
The materials commonly used for EQA/PT samples are derived from blood, urine, or other body fluids but are altered in the process to manufacture EQA/PT samples such that the matrix is modified and the samples frequently do not have the same measurement characteristics as observed for unaltered clinical patient samples. In addition, some EQA/PT samples (e.g., urine, cerebrospinal fluid, or blood gas) are prepared as synthetic materials that are not derived from patient fluids. Consequently, many EQA/PT samples, as for QC samples, are noncommutable with authentic patient samples. The results for a noncommutable EQA/PT sample will have a different relationship in their numeric values between different measurement procedures and sometimes for different reagent lots within a measurement procedure than would be observed for patient samples.
It is a common practice for EQA/PT providers to organize results for noncommutable samples into “peer groups” of measurement procedures that represent similar technology expected to have the same result for a noncommutable EQA/PT sample. The mean or median value of the peer group results is the target value. Because the peer group mean value may be influenced by a matrix-related noncommutability bias, that mean value can be used only to evaluate laboratories using the same or very similar measurement procedures and cannot be used to evaluate laboratories using other measurement procedures or to evaluate if results from different measurement procedures agree with each other. If an individual laboratory’s results agree with those of the peer group, the individual laboratory can conclude that the measurement procedure was performing in conformance with the manufacturer’s specifications. An individual laboratory cannot use results for noncommutable EQA/PT samples to verify that a measurement procedure is calibrated correctly to be traceable to the reference system for an analyte.
However, even within a peer group using the same measurement procedure, differences can occur because laboratories use different reagent lots: the matrix of the EQA/PT material can influence results differently for different reagent lots even when patient samples give similar results. Therefore, in some cases, reagent lots should be registered as part of the EQA/PT reporting, and reagent lot–specific target values may even need to be assigned.
Reporting External Quality Assessment or Proficiency Testing Results When One Measurement Procedure Is Adjusted to Agree With Another Measurement Procedure
It is good laboratory practice to adjust the calibration of different measurement procedures for the same measurand within a large hospital system with several satellite laboratories, or within a group of hospitals under the same management, so that results for patient samples are consistent irrespective of which measurement procedure is used. Such harmonization of results is important for uniform use of reference intervals and decision thresholds within a hospital or clinic system. It is important to report EQA/PT results such that they can be properly evaluated against the peer group target value, which reflects the measurement procedure calibration established by the manufacturer. For an individual laboratory’s EQA/PT result to be evaluated against the peer group mean, that result must be reported to the EQA/PT provider after removing any calibration adjustments so that it is consistent with the manufacturer’s nonadjusted calibration.
The most convenient way to remove a calibration adjustment is to first measure the EQA/PT samples with the calibration adjustment applied to the measurement procedure, as would be the usual measurement process for patient samples. After the measurement, the EQA/PT results should be adjusted “in reverse” by mathematically removing the calibration adjustment factors, and the results should be reported to the EQA/PT provider with any adjustment factors removed. One should not recalibrate the instrument with a new set of calibrators for the purpose of measuring the EQA/PT samples because this practice would violate regulations requiring the EQA/PT material to be measured in the same manner as patient samples. This process permits the EQA/PT sample to be measured in the same manner as patient samples and the numeric result reported to the EQA/PT provider to reflect the actual measured result using the manufacturer’s calibration settings.
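Assuming, for illustration, that the harmonization adjustment is a simple linear transformation (adjusted = slope × measured + intercept), removing it mathematically amounts to inverting that transformation. The slope and intercept values below are hypothetical:

```python
def remove_adjustment(adjusted_result, slope, intercept):
    """Back out a linear harmonization adjustment
    (adjusted = slope * measured + intercept) so the value reported to the
    EQA/PT provider reflects the manufacturer's unadjusted calibration."""
    return (adjusted_result - intercept) / slope

# Hypothetical adjustment used in routine reporting: slope 1.05, intercept -0.2
reported = remove_adjustment(10.3, slope=1.05, intercept=-0.2)
```

The EQA/PT sample is still measured exactly as a patient sample would be; only the number reported to the provider is transformed back to the manufacturer's calibration scale.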
Interpretation of External Quality Assessment or Proficiency Testing Results
Many countries have regulations requiring EQA/PT and specifying the evaluation criteria for acceptable performance. When criteria are set by regulations, an EQA/PT provider is required to use them. When criteria are not set by regulations, the EQA/PT provider sets evaluation criteria on the basis of clinically acceptable performance, biological variation, or the analytical capability of the measurement procedures in use. EQA/PT evaluation criteria are usually designed to evaluate the total error of a single measurement. In some programs, measurements are made several times, and it is possible to separately assess the bias and the imprecision. The acceptability limits for EQA/PT include bias and imprecision components considered acceptable for clinical use of a result, plus other error components that are unique to EQA/PT samples such as between-laboratory variation in calibration; variable matrix-related noncommutability bias with different lots of reagent within a peer group; uncertainty in the target value; stability variability in the EQA/PT material, both in storage and shipping, and after reconstitution or opening in the laboratory; and homogeneity of the EQA/PT material vials. Consequently, the acceptability limits for EQA/PT samples are frequently larger than what might be expected for clinically acceptable total error with patient samples.
FIG. 7.10 Example of an external proficiency testing evaluation report sent to a participating laboratory. Part A uses conventional units and part B uses SI units. SD, Standard deviation; SDI, standard deviation interval. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Fig. 7.10 is an example of a typical evaluation report sent to a participating laboratory. Each reported result is compared with the mean result for the peer group using the same measurement procedure. The report also includes the SD for the distribution of results in the peer group, the number of laboratories in the peer group, and the SDI (also called a z-score), which expresses the reported result as the number of SDs it is from the mean value (SDI = [result − mean]/SD). The limits of acceptability are shown. Acceptability criteria may be a number of SDs from the mean value, a fixed percent from the mean value, or a fixed concentration from the mean value. For example, in Fig. 7.10, calcium acceptability criteria are ±1 mg/dL (0.25 mmol/L) from the mean value, and iron criteria are ±20% from the mean value.
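The SDI calculation given above can be expressed directly; this is a trivial sketch, and the calcium values used as an example are hypothetical:

```python
def sdi(result, peer_mean, peer_sd):
    """Standard deviation interval (z-score) of a reported result
    relative to the peer-group distribution: (result - mean) / SD."""
    return (result - peer_mean) / peer_sd

# Hypothetical calcium result of 9.3 mg/dL with peer mean 9.5 and peer SD 0.2:
value = sdi(9.3, 9.5, 0.2)   # approximately -1.0
```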
FIG. 7.11 Example of part of a feedback report to hemoglobin A1c (HbA1c) point-of-care (POC) users in a survey for general practitioners’ offices and nursing homes. Commutable external quality assessment/proficiency testing material was circulated in two levels and measured in duplicate. The participant is informed about the bias (mean of the two results) compared with a reference measurement procedure target (x-axis) and “precision” as the difference between the two results. The histogram represents the distribution of results among all participants (light blue) and for the participant’s method group (dark blue). The thick black line represents the interval for “good” results, and the thin black line represents the interval for “acceptable” results. Results outside these limits are characterized as “poor.” The triangle points to the result of the participant. Modified with permission from the Norwegian Quality Improvement of Primary Care Laboratories, the external quality assessment provider in Norway.
In Fig. 7.10 the calcium results are in close agreement with the peer group mean (SDI ranges from −0.2 to −1.4), and the laboratory can conclude that its results are consistent with those of others in the peer group using the same measurement procedure and that it is using the measurement procedure according to the manufacturer’s specifications. However, the iron results show greater variability, with one result at +3.5 SDI. Although all iron results are within the acceptability criteria, it is recommended to investigate the measurement procedure because a +3.5 SDI is more likely to be different from other participants in the peer group than to be in agreement with them.
Fig. 7.11 shows another type of evaluation report sent to a primary care office for hemoglobin A1c (HbA1c) for one of two EQA/PT samples. In this situation the EQA/PT provider is communicating directly with the clinician or the coworker in the general practice office, and the feedback must be easy to understand for nonlaboratory professionals. The EQA/PT result is evaluated as “good,” “acceptable,” or “poor.” The lot numbers of the reagent are registered so that the participant, in case of an aberrant result, can get information if the result was due to the measurement procedure used, the reagent lot used, or the performance of the user.
Fig. 7.12 shows a similar report from the same HbA1c survey provided to hospital laboratories. In addition to the figures about the distribution of results, information is provided on how different measurement procedures performed, as well as a historical overview of performance on consecutive EQA/PT samples and performance related to the concentration of the sample. The EQA/PT material used for the HbA1c is pooled fresh patient blood (commutable), and the target value is set by a reference measurement procedure and is therefore the same for all measurement procedures. Each sample was measured in duplicate (as requested by the EQA/PT provider), and the mean of the duplicate was used to estimate bias versus the reference measurement procedure. In the present example the performance was within the acceptability limits but with a generally high bias during the whole period. Because this observation was true for all the instruments using this measurement procedure, the EQA/PT organizer discussed the results with the manufacturer to solve the problem. Until the problem was solved (the manufacturer had to make a new calibrator), the participants were advised by the EQA/PT provider to use a correction factor when reporting their results for patient samples.
If an unacceptable EQA/PT result is identified, the measurement procedure must be investigated for possible causes and the necessary corrective action taken. Even when an EQA/PT result is within acceptability criteria, it is good laboratory practice to investigate results that are more than approximately 2.5 SDI from the peer group mean. When the SDI is 2.5, there is only about a 0.6% probability that a result drawn from the expected peer-group distribution would be that far above the mean; consequently, the probability is reasonably high that a measurement procedure problem exists and may need to be corrected.
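The 0.6% figure corresponds to the one-sided Gaussian tail probability beyond 2.5 SD. As a sketch, assuming the peer-group results are Gaussian, the tail probability can be computed with the complementary error function:

```python
import math

def two_sided_tail(z):
    """Probability that a result from a Gaussian peer-group distribution
    falls more than z SDs from the mean (both tails combined)."""
    return math.erfc(z / math.sqrt(2))

p = two_sided_tail(2.5)   # about 0.012 two-sided, i.e., about 0.006 per tail
```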
Common causes for EQA/PT failure are listed in Box 7.1. Incorrect handling and reporting are unique to EQA/PT events and may not reflect the process used in the laboratory for patient samples. Because the influence of reagent lots on noncommutability-related bias is well documented, a reagent lot–specific bias is a possible explanation when no other root cause can be identified.
Points to Remember
• An independent external organization circulates EQA/PT samples with unknown target values.
• When commutable “patient-like” material is used, a laboratory can compare its results with results from all other measurement procedures and often with a true value from a reference measurement procedure.
• When noncommutable material is used, a laboratory can compare its results only with results from participants in a “peer group” using a similar measurement procedure.
FIG. 7.12 Example of part of a feedback report to hemoglobin A1c (HbA1c) users in hospital laboratories. Same survey and same materials as presented in Fig. 7.11. The histogram represents the distribution of results among all participants (light blue) and for the participant’s method group (dark blue). Only limits for “acceptable” (Acc.) results are given (thin black lines in figures). Information about performance of measurement procedures is given in addition to a historical overview of percentage deviation from target values dependent on time and concentration of HbA1c. CV, Coefficient of variation; HPLC, high-performance liquid chromatography; SD, standard deviation. Modified with permission from the Norwegian Quality Improvement of Primary Care Laboratories, the external quality assessment provider in Norway.
BOX 7.1 Classification of Potential Problems Identified When Investigating Unacceptable External Quality Assessment or Proficiency Testing Resultsᵃ
1. Clerical errors
Incorrectly transcribed EQA/PT result from the instrument readout to the report form
The EQA/PT sample was mislabeled in the laboratory
Incorrect instrument or measurement procedure was reported on the results submission form
Incorrect units were reported
Decimal point was misplaced
2. Measurement procedure problems
Inadequate SOP
Problem with manufacture or preparation of reagents or calibrators (e.g., unstable)
Lot-to-lot variation in reagents or calibrators
Incorrect value assignment of calibrators
Measurement procedure lacks adequate specificity for the measurand
Measurement procedure lacks adequate sensitivity to measure the concentration
Carry-over from a previous sample
Inadequate QC procedures used
3. Equipment problems
Obstruction of instrument tubing or orifice by clot
Misalignment of instrument probes
Incorrect instrument data processing functions
Incorrect instrument setting
Automatic pipetter not calibrated to acceptable precision and accuracy
ᵃFrom Miller, W. G., Jones, G. R. D., Horowitz, G. L., & Weykamp, C. (2011). Proficiency testing/external quality assessment: current challenges and future directions. Clinical Chemistry, 57, 1670–1680.