2. Explain how to establish a target value and SD for use with a quality control material in the following situations: a new lot of quality control material replaces an existing lot for an analyte; a new measurement procedure replaces an existing procedure for the same analyte; a new measurement procedure for a new analyte is introduced to the laboratory.
3. Explain how to establish rules for evaluating quality control sample results to confirm that results for patient samples from a measurement procedure are acceptable to make medical decisions.
4. Explain how to verify quality control (QC) target values following a reagent lot change.
5. Explain the components of a quality control plan and how the plan is established.
6. Explain the actions taken when a failed quality control result is obtained.
7. Explain how to evaluate results from external quality assessment (EQA) or proficiency testing.
8. Explain why peer group evaluation is used in EQA and what information a laboratory gets about its measurement procedure performance from the data.
9. Explain the limitations in using peer group mean values to assess the relationship of results among different measurement procedures.
10. Explain the value of EQA that uses commutable samples.
11. Explain how quality control information is documented and reviewed by laboratory staff.
Key Words and Definitions
Coefficient of variation Also called relative standard deviation; calculated as the standard deviation (SD) divided by the mean and multiplied by 100 to express in percent.
External quality assessment Also called external quality control or proficiency testing; an assessment process in which samples that simulate patient specimens are received from an external organization and results compared to those from other laboratories or from a reference method to determine that a measurement procedure’s performance meets preestablished criteria to be suitable for use in medical decisions.
Internal quality control Term used to refer to quality control as defined here to confirm that results from a measurement procedure are suitable for use.
Levey-Jennings chart A graphical display with observed control values plotted on the y-axis and time (typically in days) shown on the x-axis. The y-axis shows the target value with plus and minus 1, 2, and 3 SDs indicated. Control limits can be indicated but are not practical to show on the chart when multiple control rules are used for evaluation of quality control results.
Mean The arithmetic average of a series of numbers such as results of repeated measurements of a quality control sample. The mean of a series of replicate results for a quality control material is used as the target value for future measurements of that quality control material.
Proficiency testing Another term for external quality assessment.
Quality assessment Protocols to confirm that the laboratory service meets the needs of medical providers who use laboratory results for medical decisions. Quality assessment includes preanalytical, analytical, and postanalytical components of quality.
Quality control Statistical and/or nonstatistical check protocols, typically using quality control samples that are surrogates for the patient samples being tested, to assess that results from a measurement procedure meet preestablished criteria to be suitable for use in medical decisions.
Quality control plan A document that describes the organization and operation of the internal quality control process.
Quality control rules A set of rules that results for one or more quality control samples must meet to make a decision that the measurement procedure performance is producing results for patient samples that are suitable for use in medical decisions.
Sigma metric A value that expresses the variation in performance of a measurement procedure relative to the allowable variability in results to be suitable for medical decisions expressed in SD units. Six-sigma performance means that six SDs of measurement procedure variation fit within the allowable limits for acceptable performance.
Standard deviation A statistical value that estimates the dispersion of replicate values around a mean value. A SD assumes a Gaussian distribution of values that is typically observed for numeric quality control results.
Standard deviation interval The number of SDs an individual result is from a mean value, calculated as the individual value minus the mean, divided by the SD. The SD interval can be positive or negative relative to the mean value.
The purpose of a clinical laboratory test is to provide information on the pathophysiological condition of an individual patient to assist with diagnosis, therapy, or assessment of risk for a disease. Internal quality control, frequently referred to as quality control (QC), is a statistical sampling approach performed by a laboratory on a regular schedule to assess performance of its measuring systems. External QC, frequently referred to as external quality assessment (EQA) or proficiency testing (PT), is an approach to assess measuring system performance by comparing results for samples measured by a group of different laboratories. Both types of QC are addressed in this chapter (see later section for EQA/PT).
Internal QC evaluates a measurement procedure by periodically measuring a QC sample for which the expected result is known in advance. If the result for a QC sample is within acceptable limits of the known value, the measurement procedure is verified to be performing as expected, and results for patient samples can be reported with high probability that they are suitable for clinical use. If a QC result is not within acceptable limits, the measurement procedure is not performing correctly, there is a high probability that results for patient samples are not suitable for clinical use, and corrective action is necessary. Patient sample measurements could need to be repeated when the measurement procedure has been restored to its stable performance condition. If erroneous results have already been reported before an error condition is identified, a corrected report must be issued.
Measurement procedures fall into one of two general categories from a QC plan perspective. One type is a “batch” measurement process in which the results for patient samples and QC samples are completed before the results are reported. The other type is a “continuous” measurement process in which patient sample results are reported during the interval between QC sample measurements. For continuous measurement procedures, there is a possibility that erroneous results have already been reported by the time an error condition is identified at the next QC sample measurement. In either category, QC procedures only identify error conditions present at the point in time when a QC sample is measured.
The design of a QC plan must consider the analytical performance capability of a measurement procedure and the risk of harm to a patient that might occur if an erroneous laboratory test result is used for a clinical care decision.
Points to Remember
Internal Quality Control
• The primary role of internal QC is to confirm that patient results are correct and to identify possible errors before they affect patient care decisions.
Measurement Procedure Performance as a Prerequisite for a Quality Control Plan
Analytical Bias and Imprecision
Fig. 7.1 illustrates the meaning of bias and imprecision for a measurement procedure. The horizontal axis represents the numeric value for an individual result, and the vertical axis represents the number of repeated measurements with the same value made on samples of a QC material. The blue line shows the dispersion of results for repeated measurements of the same QC material, which is the random imprecision of the measurement. The standard deviation (SD) is a measure of expected imprecision in a measurement procedure when it is performing within specifications. The mean of repeated measurements of a QC sample becomes the expected value for that QC sample.
Fig. 7.1B illustrates that if a systematic bias (error) occurs in the measurements, the mean value shifts to another value. Note that the imprecision is the same as before the bias occurred because it is unlikely, although not impossible, that a change in imprecision would occur at the same time as a bias shift. The primary purpose of measuring QC samples is to statistically evaluate the measurement procedure to verify that it continues to perform within the specifications consistent with its acceptable expected stable condition or to identify that a change in performance occurred that needs to be corrected. QC result acceptance criteria are based on the probability for an individual QC result to be different from the variability in results expected when the measurement procedure is performing in a stable condition within its specifications.
FIG. 7.1 (A) Distribution of results showing the mean value and expected imprecision for repeated measurements of a quality control sample. (B) Bias when a change in calibration has occurred. SD, Standard deviation. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Fig. 7.2 shows a Levey-Jennings chart, the most common presentation for evaluating QC results. This format shows each QC result sequentially over time and allows a quick visual assessment of performance. Assuming the measurement procedure is performing in a stable condition, the mean value represents the target (or expected) value for the QC result, and the SD lines represent the expected imprecision. Assuming a Gaussian (normal) distribution of imprecision, the results should be distributed uniformly around the mean, with results observed more frequently closer to the mean than near the extremes of the distribution. The data show fluctuations in performance at different intervals and illustrate the importance of calculating the SD over a long enough interval to appropriately include all sources of variability in the measurement procedure. Note that a small number of results in Fig. 7.2 are greater than 2 SDs, and three results slightly exceed 3 SDs, which is expected for an approximately Gaussian distribution of imprecision. The number of results expected within the standard deviation intervals (SDIs) is:
• ±1 SD = 68.3% of observations
• ±2 SD = 95.4% of observations
• ±3 SD = 99.7% of observations
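These coverage fractions follow directly from the Gaussian distribution; a short calculation (a sketch using only the Python standard library) reproduces them:

```python
import math

def gaussian_coverage(k):
    """Fraction of a Gaussian distribution lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/-{k} SD: {100 * gaussian_coverage(k):.1f}% of observations")
# prints 68.3%, 95.4%, 99.7%
```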
Interpretation of an individual QC result is based on its probability to be part of the expected distribution of results for the measurement procedure when the procedure is performing correctly. Note that evaluation of individual QC results may be performed by computer algorithms without visual examination of a Levey-Jennings chart.
FIG. 7.2 Levey-Jennings chart of quality control (QC) results (n = 1232) for a single lot of QC material used over a 10-month period. SD, Standard deviation. Reprinted with permission from Miller, W. G., & Nichols, J. H. [2011]. Quality control. In: W. A. Clarke [Ed.], Contemporary practice in clinical chemistry [2nd ed.]. Washington, DC: AACC Press.
Performance of a Measurement Procedure for Its Intended Medical Use
The frequency to measure QC samples and the criteria used to evaluate the QC results depend on how the performance of a measurement procedure relates to the medical requirements for interpreting results. The sigma metric is commonly used to assess how well a measurement procedure performs relative to the medical requirement. For laboratory measurements, the sigma metric is calculated as:
Sigma = (TEa − |bias|) / SD
where TEa is the total error allowed based on medical requirements, and the absolute value of bias and the SD refer to performance characteristics of the measurement procedure. The SD is estimated from the QC data. It is critically important that the estimate of SD be made using QC data that represent all or most components of variability that occur over an extended time period. The bias is difficult for a laboratory to estimate because doing so requires comparison against a reliable estimate of the true value, such as one from a reference measurement procedure. For internal QC, a laboratory is usually interested in determining whether a bias has occurred compared with the condition established by calibration of a measurement procedure, and any bias that has occurred will be corrected. Consequently, the bias is usually assumed to be zero for calculating sigma.
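As a worked illustration of this formula (the TEa, bias, and SD values below are hypothetical, chosen only for the example):

```python
def sigma_metric(tea, bias, sd):
    """Sigma metric: (TEa - |bias|) / SD, with all terms in the same units
    (e.g., all expressed as a percent of the target concentration)."""
    return (tea - abs(bias)) / sd

# Hypothetical example: TEa of 10%, bias assumed zero, long-term CV of 1.7%
print(sigma_metric(10.0, 0.0, 1.7))  # about 5.9: nearly six-sigma performance
```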
TEa represents the measurement procedure performance required for medical decisions based on a test result. TEa can be estimated using three models. The preferred model (model 1) is to set a performance specification based on an outcome study, such as the impact of analytical performance on the clinical outcome. Outcome studies are available for only a small number of analytes and are typically included in clinical laboratory practice guidelines.
Model 2 bases the TEa on a fraction of the within and between individual biological variations of the measurand. (For additional information on biological variation refer to Chapter 4.) This model minimizes the ratio of the “analytical noise” to the “biologic signal,” with an assumption that a small ratio will identify measurement procedure performance that relates to the medical requirements. Tables of optimal, desirable, and minimal TEa based on biological variation are available.
Model 3 bases the performance specifications on the “state of the art,” which is the performance capability of a measurement procedure usually derived from QC data. The laboratory should consult with clinical care providers to agree on an appropriate TEa for the patient population served.
Because sigma assumes a Gaussian or normal distribution for repeated measurements, the probability of a defect (i.e., an erroneous laboratory result) can be predicted. The term six-sigma refers to a condition when the variability in the measurement process is sufficiently smaller than the medical requirement that erroneous results are very uncommon. Fig. 7.3A shows a six-sigma test that has the TEa limits 6 SDs away from the center point of the distribution of variability in measurements. A small amount of bias or increased imprecision will have little influence on the number of erroneous results produced, and the risk of producing an erroneous result even with some loss of performance is very low. Consequently, less stringent QC is suitable.
Fig. 7.3B shows a three-sigma measurement procedure that has the TEa limits 3 SDs away from the center point of the expected distribution of variability in measurements. In the three-sigma situation, a small amount of bias or increased imprecision will cause the number of erroneous results to increase substantially. More frequent QC and more stringent acceptance criteria will allow the laboratory to identify when small changes in performance occur so they can be corrected to minimize the risk of harm to a patient from erroneous results being acted on to make clinical care decisions.
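The difference between the two situations can be quantified. Assuming zero bias and Gaussian imprecision, the probability that a result falls outside the TEa limits is the two-tailed Gaussian tail area beyond the sigma value (a sketch that does not account for bias drift or non-Gaussian behavior):

```python
import math

def defect_probability(sigma):
    """Two-tailed probability that a result falls outside the TEa limits,
    assuming zero bias and Gaussian imprecision."""
    return math.erfc(sigma / math.sqrt(2))

# Roughly 0.27% of results exceed TEa for a three-sigma procedure,
# versus on the order of 2 per billion for a six-sigma procedure
print(defect_probability(3))
print(defect_probability(6))
```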
FIG. 7.3 Test performance relative to the sigma scale to describe how well performance meets medical requirements expressed as the allowable total error (TEa). (A) A six-sigma measurement procedure. (B) A three-sigma measurement procedure. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Points to Remember
• The performance characteristics of a measurement procedure when it is performing in a stable in-control condition must be known.
• The allowable total error for a measurement procedure must be established based on requirements for using a laboratory result in patient care decisions.
• The sigma metric indicates the probability that erroneous results will occur when the measurement procedure is performing to its specifications.
Developing a Quality Control Plan and Implementing Internal Quality Control Procedures
Selection of Quality Control Materials
In general, two different concentrations of QC materials are necessary for adequate statistical QC. For quantitative measurement procedures, QC materials should be selected to provide analyte concentrations that represent clinical decision values over the analytical measuring interval of the measurement procedure. More than two concentrations of QC materials may be needed when there are several important medical decision values. In practice, laboratories are frequently limited by the concentrations available in commercial QC products. For procedures with extraction or other pretreatment steps, the QC materials used must be carried through the same pretreatment steps as patient samples. For qualitative tests, QC samples should assess the stability of the threshold for a classification decision.
The QC materials must be manufactured to provide a stable product that can be used for an extended time period when possible. Use of a single lot for a year or more allows reliable interpretive criteria to be established that will minimize the effort and uncertainty associated with QC lot changes.
Limitations of Quality Control Materials
An important limitation of most QC materials, which also applies to EQA or PT materials, is called noncommutability with patient samples. Fig. 7.4A shows that a commutable QC material gives a result that closely agrees with results for authentic patient samples with the same amount of analyte when measured by different procedures. Fig. 7.4B shows that results for a noncommutable QC material have a different relationship than observed for patient samples when measured by different procedures. QC and EQA/PT materials are typically noncommutable with patient samples because the serum or other biologic fluid matrix is usually altered from that of a patient sample during product manufacturing, for example, by use of partially purified human and nonhuman additives to achieve desired concentrations and various stabilization additives and processes. The impact of the matrix alteration on the recovery of a measurand is not predictable and is frequently different for different lots of QC material, for different lots of reagent within a given measurement procedure, and for different measurement procedures. Because of the noncommutability limitation, special procedures are required when changing lots of reagent or comparing QC results among two or more measurement procedures.
A second limitation of QC materials is gradual deterioration of the analyte during storage and after reconstitution, thawing, or vial opening.
Frequency to Measure Quality Control Samples
The frequency to measure QC samples depends on several considerations. Regulatory requirements may specify a minimum frequency for QC measurements. The instructions for use from measurement procedure manufacturers may specify a minimum or recommended QC frequency. The risk that clinical action will be taken on an erroneous result before a measurement error of clinical importance is detected is an important reason to measure QC samples more frequently than regulatory requirements or manufacturers' recommendations specify.
Analytical Stability of the Measurement Procedure
The more stable the measurement procedure, the less frequently a QC evaluation needs to be performed. Some measurement procedures, particularly point-of-care (POC) devices, have been designed with sophisticated built-in control procedures to mitigate the risk that an erroneous result may be produced. These measurement systems may be sufficiently stable and self-monitored to justify reduced frequency of traditional QC sample testing.
FIG. 7.4 Illustration of commutable and noncommutable materials. (A) Commutable materials (blue squares) have the same relationship between two measurement procedures as observed for patient samples (black diamonds). (B) Noncommutable materials have a different relationship than observed for patient samples. EQA, External quality assessment; PT, proficiency testing; QC, quality control.
Risk of Harm to a Patient and Number of Patients Who May be at Risk
More frequent QC sampling is appropriate to avoid the situation of discovering a measurement procedure defect many hours after a physician has made a clinical treatment or nontreatment decision based on an erroneous result. For example, QC samples on a 24-hour cycle might be measured at 9 a.m. each day. If QC results indicate a measurement procedure problem, the erroneous condition could have started at any time during the previous 24 hours. If the problem had occurred at 3 p.m. the previous day, erroneous results could have been reported for 18 hours, likely putting a large number of patients at risk of an inappropriate care decision.
The Clinical and Laboratory Standards Institute (CLSI) has published guideline EP23 addressing risk-based QC procedures. The document provides guidance on how to develop a QC plan based on evaluation of risk of harm to a patient and assessment of the effectiveness of risk mitigation procedures. In general terms the laboratory director makes a judgment that suitable built-in and laboratory-applied controls are in place and that a result has a high probability to be correct at the time it is reported for clinical use.
Event-Based Quality Control Sample Measurement
It is necessary when using a continuous measurement system to measure QC samples before and after scheduled events such as recalibration or maintenance that may alter the current performance condition. Each of these operations is intended to restore the measurement conditions to optimal specifications and to correct for any calibration drift or component deterioration that may have occurred. If QC samples are not measured before such scheduled events, a laboratory will not know if an error condition may have occurred since the last time QC samples were measured. It is also necessary to measure QC samples after these events to verify that the operations were performed correctly and that measurement procedure performance meets specifications before restarting to measure patient samples.
Establishing the Quality Control Target Value and Standard Deviation That Represent a Stable Measurement Operating Condition
QC target values and acceptable performance limits are established to optimize the probability to detect a measurement defect that is large enough to have an impact on clinical care while minimizing the frequency of “false alerts” caused by statistical limitations of the criteria used to evaluate QC results.
Quality Control Material Target Value
The generally accepted minimum protocol for target value assignment is to use the mean from a minimum of 10 measurements of the QC material on 10 different days when the measurement procedure is correctly calibrated and performing to its specifications. Because all sources of variability cannot be captured in 10 measurements, it is recommended to update the target value after more data have been acquired during use of the QC material. If a 10-day protocol is not possible (e.g., if an emergency replacement of a lot of QC material is necessary), a provisional target value can be established with fewer data but should be updated when additional QC results are available.
Quality Control Material Standard Deviation
The SD must represent the variability expected for a measurement procedure over an extended time interval to include all sources of variability when its performance meets its specifications. Measurement variability has short time interval components (such as pipetting volume), gradual changes (such as pipet seal deterioration or coating of cuvette or electrode surfaces), or variable and sometimes long intervals (such as calibration cycles, reagent replenishment, and maintenance procedures). An SD that represents stable measurement performance can usually be estimated from the cumulative SD over a 6- to 12-month period for a single lot of QC material because most expected sources of variation are likely to be represented. Fig. 7.5 illustrates the fluctuation in SD that occurred when calculated for monthly intervals compared with the relatively stable value observed for the cumulative SD after a period of 6 months. Note that the cumulative SD is not the average of the monthly values but is the SD determined from all individual results obtained over a time interval since the lot of QC material was first used. If the imprecision expected during normal stable operation is underestimated, the acceptable range for QC results will be too small, and the false-alert rate will be unacceptably high.
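The distinction between the cumulative SD and the average of interval SDs can be shown with a small sketch (the QC values below are invented for illustration; a deliberate between-month drift is included):

```python
import statistics

# Hypothetical QC results grouped by month, with slight between-month drift
monthly = [
    [100.2, 99.8, 100.5, 99.9],
    [101.1, 100.7, 101.3, 100.9],
    [99.5, 99.9, 99.4, 99.8],
]

# Averaging the monthly SDs misses the between-month component ...
avg_of_monthly_sds = statistics.mean(statistics.stdev(m) for m in monthly)

# ... whereas the cumulative SD is calculated from all individual results
all_results = [x for month in monthly for x in month]
cumulative_sd = statistics.stdev(all_results)

print(avg_of_monthly_sds, cumulative_sd)  # the cumulative SD is larger here
```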
When a measurement procedure has been established in a laboratory and a new lot of QC material is being introduced, the target value for the new lot of QC material is used along with the well-established SD from the previous lot. This practice is appropriate because in most cases, measurement imprecision is a property of the measurement procedure and equipment used and is unlikely to change with a different lot of QC material.
FIG. 7.5 Cumulative standard deviation (SD) versus single monthly values calculated from the data in Fig. 7.2. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
TABLE 7.1
Abbreviation Nomenclature for Quality Control Evaluation Rules
12S: One observation exceeds 2 SDs from the target value. The 12S rule is not recommended except for low sigma measurement procedures because it has a high false-alert rate. Detects bias or imprecision.
13S: One observation exceeds 3 SDs from the target value. Detects bias or large imprecision.
22S (22.5S): Two sequential observations, or observations for two QC samples measured at approximately the same time, exceed 2 SDs (or 2.5 SDs) from the target value in the same direction. Detects bias.
2 of 32S: Two of three observations for QC samples measured at approximately the same time exceed 2 SDs from the target value in the same direction. This type of rule is used when three QC materials are used for a measurement procedure. Detects bias.
R4S: The range between observations for two QC samples measured at approximately the same time, or between two sequential observations of the same QC sample, exceeds 4 SDs. Detects imprecision.
10x or 10m: Ten sequential observations for the same QC sample are on the same side of the target value (x or mean). The 10x rule is not recommended because it has an excessive false-alert rate.
81S (81.5S): Eight sequential observations for the same QC sample exceed 1 SD (or 1.5 SDs) in the same direction from the target value. Detects a bias trend.
CUSUM: Cumulative sum of the SDIs for the current and previous results. Detects a bias trend.
EWMA: Exponentially weighted moving average of the current and previous results, with newer results having more influence (weight). Detects a bias trend.
From Miller, W. G. (2016). Quality control. In: Henry’s clinical diagnosis and management by laboratory methods (23rd ed.). Philadelphia: Elsevier.
When a new measurement procedure replaces an existing procedure, the SD for the existing procedure can in many cases be used as the initial SD for the new measurement procedure until the SD for the new procedure has been established. The assumption is that the SD for the existing measurement procedure was appropriate to ensure that results were suitable for use in medical decisions; consequently, that SD is likely to be suitable for QC decisions for the new measurement procedure.
If a measurement procedure for a new analyte is introduced, there will not be any historical performance information. The SD is based on QC data obtained during the measurement procedure validation. A minimum of 20 observations on different days is recommended for the initial estimate of the SD. This initial estimate of SD will likely be an underestimate because it does not include all sources of variability and must be updated when sufficient QC results have been accumulated to include most sources of variability.
Quality Control Materials With Preassigned Values
Some QC materials are provided by the measurement procedure manufacturer with preassigned target values and acceptable ranges intended to confirm that the measurement procedure meets the manufacturer’s specifications. Such assigned values may be used to verify the manufacturer’s specifications. However, it is recommended that both the target value and the SD should be reevaluated and assigned by the laboratory after adequate replicate results have been obtained because the QC interpretive rules used in a single laboratory should reflect performance for the measurement procedure in that laboratory.
QC materials with assigned target values and SDs are also available from third-party manufacturers (i.e., manufacturers not affiliated with the measurement procedure manufacturer) and typically have values that are applicable to specifically stated measurement procedures and reagent lots to accommodate the influence of noncommutability. However, a laboratory should determine a target value and SD that reflect the operating conditions in that laboratory to ensure optimal assessment of QC results to identify an erroneous measurement condition.
Establishing Rules to Evaluate Quality Control Results
The acceptable range and rules for interpretation of QC results are based on the probability of detecting an analytical error condition with an acceptably small false-alert rate.
The conventional way to express QC interpretive rules is by using an abbreviation nomenclature popularized among clinical laboratories by Westgard and summarized in Table 7.1. Note that fractional SDIs can be used as in the 22.5S and 81.5S examples and that combinations of numbers of controls and limits can be used as appropriate for QC interpretive rules. Trend detection procedures such as cumulative sum (CUSUM) or exponentially weighted moving average (EWMA) are recommended if supported by an available computer system because they are more powerful for detecting trends than approaches based on counting the number of sequential observations exceeding a specified SDI. A trend rule can be set to give an alert as a warning that may not require discontinuing testing but indicate that a problem is developing that should be investigated.
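As one example of a trend detector, an EWMA of the SDIs can be sketched as follows (the smoothing weight of 0.2 and the interpretation threshold are illustrative choices, not standard values):

```python
def ewma(sdis, weight=0.2):
    """Exponentially weighted moving average of SD intervals;
    newer results carry more influence than older ones."""
    z = 0.0
    for x in sdis:
        z = weight * x + (1 - weight) * z
    return z

# A run of modest positive SDIs, none violating a 2 SD rule individually,
# steadily drives the EWMA upward, signaling a developing bias trend
print(ewma([1.0, 1.0, 1.0, 1.0, 1.0]))  # about 0.67 and still rising
```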
TABLE 7.2
Empirical Multirule for the Quality Control Data Presented in Fig. 7.2
13S: imprecision or bias
22.5S: bias
R4S: imprecision
81.5S: bias trend
From Miller, W. G. (2016). Quality control. In: Henry’s clinical diagnosis and management by laboratory methods (23rd ed.). Philadelphia: Elsevier.
It is recommended to improve the efficiency of QC interpretive rules by combining two or more rules and applying them simultaneously as multirule criteria. For example, the 13S/22S multirule identifies an error condition if one control exceeds ±3 SD from the target value or if two controls exceed ±2 SDs in the same direction from the target value. The 13S/22S multirule has a low false-alert rate but improved probability to detect a bias error of a given magnitude.
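A minimal sketch of the 13S/22S multirule, operating on results expressed as SD intervals (the function names are our own; laboratory middleware typically implements these rules internally):

```python
def sdi(value, target, sd):
    """SD interval: how many SDs a QC result lies from its target value."""
    return (value - target) / sd

def multirule_13s_22s(sdis):
    """Return True if the 1-3s or 2-2s condition is violated; sdis holds
    SD intervals in measurement order, most recent last."""
    if abs(sdis[-1]) > 3:  # 1-3s: one result beyond 3 SDs
        return True
    if len(sdis) >= 2:
        a, b = sdis[-2], sdis[-1]
        if abs(a) > 2 and abs(b) > 2 and a * b > 0:  # 2-2s: same direction
            return True
    return False

print(multirule_13s_22s([0.5, 1.1, -3.2]))  # True: 1-3s violated
print(multirule_13s_22s([2.3, 2.1]))        # True: 2-2s violated
print(multirule_13s_22s([2.3, -2.1]))       # False: opposite directions
```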
In practice, empirical judgment is frequently used to establish acceptance criteria (rules) to evaluate QC results based on data acquired over a long enough period of time to adequately estimate the expected variability when a measurement procedure is working correctly. An empirical approach can be used by obtaining a set of QC data that represents stable measurement procedure performance over a time interval expected to include most sources of variability. Using those data, the false-alert rate for a rule can be determined, and bias errors of different magnitudes can be added to estimate the ability of a rule, or a combination of rules, to identify that error. Table 7.2 gives an example of an empirically developed multirule for a 6-sigma measurement procedure. Such control rules should allow the laboratory to detect errors before they are of a magnitude that will affect clinical decisions. A 10x rule was not used because it would have increased the false-alert rate by 10.6%. A 10x rule or other rule that counts the number of sequential QC results on one side of the target value is not recommended because this condition typically does not indicate a problem with clinical interpretation of patient results when the magnitude of the difference from the target value is small. Counting the number of sequential results that exceed a larger SD from the target value, such as 81.5S in this example, is more likely to represent a measurement condition that might need investigation.
For measurement procedures with small sigma values, small deviations from the expected performance need to be identified. Consequently, more stringent QC practices need to be used, such as selecting rules like 1-2s that give an alert at smaller error conditions, using additional rules in a multirule set, measuring QC more frequently, using more than two QC samples, and not releasing patient results until QC assessment is complete for the time interval during which patient samples were measured. More stringent QC rules will have more false alerts, but this is an unavoidable cost when lower sigma measurement procedures are used.
Specifying the Quality Control Plan
The preceding subsections describe the considerations for each component in a QC plan. The laboratory director is responsible for considering the components, making judgments regarding the considerations, and approving the final plan for each analyte measured in a laboratory. A plan for internal QC specifies the following components:
• The number of QC samples to be measured and the approximate concentrations of analytes in those controls
• The target value for each QC sample
• The SD for each QC sample to be used in the QC rules
• The rules for evaluating the QC results
• The frequency to test the QC samples
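As an illustration only (the field names and example values below are hypothetical, not prescribed by any standard or regulation), the five components listed above could be captured in a simple data structure:

```python
from dataclasses import dataclass

@dataclass
class QCLevel:
    """One QC material at one approximate concentration."""
    material: str
    target: float   # established target value for this QC sample
    sd: float       # SD used in the QC rules for this sample

@dataclass
class QCPlan:
    """Internal QC plan for one analyte, per the components in the text."""
    analyte: str
    levels: list    # QC samples and their approximate concentrations
    rules: list     # interpretive rules for evaluating QC results
    frequency: str  # how often the QC samples are tested

plan = QCPlan(
    analyte="glucose",
    levels=[QCLevel("low control", target=4.2, sd=0.10),
            QCLevel("high control", target=15.0, sd=0.30)],
    rules=["one control beyond 3 SD", "two consecutive beyond 2 SD"],
    frequency="every 8 h of patient testing",
)
```

A structured record like this makes the director-approved plan explicit and reviewable for each analyte.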
Considerations for Point-of-Care Testing
Internal QC of POC instruments offers extra challenges compared with QC in the central laboratory. The main reasons are that POC instruments are often operated by persons without laboratory training; they often use methodologic principles that are different from those in the central laboratory; they often have “built-in” controls; and the number of measurements can be small, making the use of QC samples in the traditional way expensive. In cartridge-based and some strip-based instruments, the manufacturer often has placed the technology in the cartridge or strip together with QCs, and in some cases, QC rules are built in so that patient results cannot be reported unless the QC is “satisfactory.” The instrument is then merely an electronic reader that often has incorporated an “electronic quality control” that verifies the electronics of the measurement procedure. The electronic instrument checks do not verify the reagents in the cartridges or strips, and unless each cartridge has internal QC materials, the reagent cartridges or strips should be checked at delivery and then at intervals (e.g., with the arrival of a new shipment or lot or at a suitable interval such as weekly or monthly depending on the cartridges/strips).
Not all POC instruments include enhanced QC features. In these cases, one must rely on daily liquid QC performed by the operator. The limitation of using the liquid QC sample in this situation is that it checks only whether one disposable cartridge or strip meets the performance specifications. This limitation requires an assumption that all devices in a lot were manufactured uniformly and will perform equivalently.
How internal QC should be performed and supervised also depends on the location of the POC instruments. In a hospital, it is now possible with real-time bidirectional connectivity between the POC devices and the central laboratory to transfer both patient and QC results and to set lock-out parameters for conformance to a QC protocol. As technology advances, the general trend is for more sophisticated POC devices with built-in control systems to be incorporated to minimize or prevent the possibility for an incorrect result.
Corrective Action When a Quality Control Result Indicates a Measurement Problem
A QC alert occurs when a QC result fails an evaluation rule, which indicates that an analytical problem may exist. A QC alert means there is a high probability that the measurement procedure is producing results that have the potential to be unreliable for patient care and testing must be stopped until the problem is resolved. Fig. 7.6 presents a generalized troubleshooting sequence. QC materials can deteriorate after opening because of improper handling and storage or because of unstable analytes. Thus repeating the measurement on a new vial of the QC material is a useful step to determine if the alert was caused by deteriorated QC material rather than by a measurement procedure problem. In this situation, if the result for the new QC sample is acceptable, testing of patient samples can resume. One caution when the repeat QC result is near acceptability limits is to consider whether the repeat and original results are essentially the same. It is not acceptable to repeat the QC until a value happens to be just within the acceptable limit. In this situation the probability is high that a measurement problem exists, and this possibility should be investigated. In addition, current and preceding QC results should be examined for a trend in bias that indicates a measurement issue that needs to be corrected. These precautions in evaluating repeat results for a new QC sample can be challenging or impossible for automated evaluation by computer systems, thus requiring the laboratory technologist to be vigilant in reviewing results.
FIG. 7.6 Generalized troubleshooting sequence showing the initial steps after an unacceptable quality control (QC) result. The details of troubleshooting the defect may be different for different rules violations or if more than one unacceptable QC result was obtained.
When repeat testing of a new QC sample does not resolve the alert situation, the instrument and reagents should be inspected for component deterioration, empty reagent containers, mechanical problems, and so on. In many cases, it will be necessary to recalibrate. When the problem is identified and corrected, QC samples should be measured to verify the correction, and all patient samples since the time of the last acceptable QC results, or the time when the error condition occurred, should be measured again. The laboratory director must establish acceptable criteria to determine if the repeat results agree adequately to permit reporting of original results without issuing a corrected report. Otherwise, corrected results must be reported. The criteria for acceptability of repeated tests are based mainly on the TEa described earlier with consideration of the measurement procedure performance characteristics.
It may be difficult to establish the time when an error condition occurred. One approach is to repeat every few samples back to the time of the last acceptable QC results. The repeated results are then compared with acceptable criteria for repeated results agreement to identify a point in time when the error condition occurred. When selecting the samples to repeat, it is important to ensure that a substantial representation of the potentially erroneous samples is repeated and that samples at a concentration consistent with that of the unacceptable QC are represented. Alternatively, groups of 10 patient samples can be repeated, again ensuring that samples at a concentration consistent with that of the unacceptable QC are represented, until all repeat results in at least two sequential groups are within acceptable criteria for repeated results. When the point at which the error condition was likely to have occurred is identified, all patient samples must be repeated from that point until the unacceptable QC result was obtained. Any assessment of the point at which an error condition occurred by repeating selected patient samples has a risk to incorrectly identify that point, and laboratories are encouraged to repeat enough patient samples to have confidence in the assessment.
Verifying Quality Control Evaluation Parameters After a Reagent Lot Change
Changing reagent lots can cause a shift in QC results when there is no change in results for patient samples. Because the matrix-related noncommutability between a QC material and a reagent can change with a different reagent lot, QC results are not a reliable indicator of a measurement procedure’s performance for patient samples after a reagent lot change. In the example in Fig. 7.7, QC values for the high-concentration control shifted after the change to a new lot of reagents, but there was no change in results for the low control. A comparison of results for a panel of patient samples assayed using the new and old reagent lots showed equivalent results. Consequently, the change in QC values for the high-concentration material was due to a difference in matrix-related noncommutability bias between the QC material and each of the reagent lots.
It is necessary to use patient samples to verify the consistency of results between old and new lots of reagents because of the unpredictability of a matrix-related noncommutability bias being present for QC materials. Fig. 7.8 presents a protocol to verify or adjust QC material target values after a reagent lot change. A group of patient samples and the QC samples are measured using both the current (old) and new reagent lots. The first step is to verify that results for a group of patient samples measured with the new reagent lot are consistent with results from the current (old) lot. The patient sample results, not the QC results, provide the basis for verifying that the new reagent lot is acceptable for use. If a problem is identified, the calibration of the new reagent lot must be investigated and corrected, or the new reagent lot may be defective and should not be used. When evaluating the patient results, keep in mind that the calibration of the old reagent lot may have drifted and should be verified before concluding that the new reagent lot is not giving acceptable results for the patient samples.
FIG. 7.7 Levey-Jennings plot showing impact of a reagent lot change on matrix bias with quality control (QC) samples. Modified with permission from Miller, W. G., & Nichols, J. H. [2010]. Quality control. In: W. A. Clarke [Ed.], Contemporary practice in clinical chemistry [2nd ed.]. Washington, DC: AACC Press.
The number of patient samples to use for verifying the performance of a new reagent lot will depend on the measuring interval, the imprecision of a measurement procedure, and the concentrations at which clinical decisions are made. CLSI document EP26 recommends a minimum of three patient samples and more patient samples depending on the number of important clinical decision concentrations and the imprecision of a measurement procedure. This CLSI guideline includes a statistical analysis to determine if a difference in patient results is less than a critical difference that would represent risk for an inappropriate patient care decision based on a particular laboratory test result. An alternate approach is to select 5 to 10 patient samples that span the measuring interval and use a difference plot to evaluate average performance over the interval of concentrations represented by the patient samples. The laboratory must establish acceptance criteria for the agreement of patient results for old and new lot measurements consistent with the relatively small number of samples used, the analytical performance characteristics of a measurement procedure, and the clinical requirements for interpreting results.
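The difference plot approach mentioned above can be sketched as follows. This is a minimal illustration, not the CLSI EP26 statistical analysis; the sample results and the 5% acceptance limit are hypothetical, and the laboratory's own criteria would apply:

```python
def lot_comparison(old, new, limit_pct):
    """Percent difference (new vs. old reagent lot) for each patient sample,
    flagging any difference that exceeds the laboratory's acceptance limit."""
    diffs = [100.0 * (n - o) / o for o, n in zip(old, new)]
    mean_diff = sum(diffs) / len(diffs)          # average bias across interval
    flagged = [d for d in diffs if abs(d) > limit_pct]
    return mean_diff, flagged

old_lot = [3.1, 5.4, 8.8, 12.6, 17.9]   # current-lot results (hypothetical)
new_lot = [3.2, 5.3, 8.9, 12.8, 17.7]   # new-lot results, same samples
mean_diff, flagged = lot_comparison(old_lot, new_lot, limit_pct=5.0)
```

An empty `flagged` list and a mean difference near zero would support accepting the new lot; any flagged sample, or a consistent mean bias, would prompt investigation of the new lot's calibration.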
When the results for patients are acceptable, the second step in Fig. 7.8 evaluates results for each QC material to determine if its target value is correct for use with the new lot of reagent(s). If the target value has changed, it must be adjusted to correct for the change in matrix-related noncommutability bias between old and new lots of reagent(s). This adjustment keeps the expected variability centered around the QC target value so that QC interpretive rules will remain valid.
FIG. 7.8 Process for assessment of potential matrix-related noncommutability bias on quality control (QC) samples after a reagent lot change. SD, Standard deviation. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Failure to make a target value adjustment will cause subsequent QC results to be evaluated incorrectly, as illustrated in Fig. 7.9. The shift in target value would cause some of the results shown by blue squares to exceed the old upper QC rule limit, when in reality there is no defect in patient results because the increase in QC results is caused by the matrix-related noncommutability bias with the new reagent lot. Similarly, the increased magnitude of the gap to the old lower QC rule limit will permit a low bias condition, as shown by the blue square points at sequence number 29 and 30, to be undetected. In most cases the SD for a QC material will be the same with any lot of reagent(s).
Note that a reagent lot–induced change in matrix-related noncommutability bias, which shifts the numeric values of the QC results, will cause an incorrect increase in the cumulative SD if all results are used for the calculation. For this reason, it is recommended to use the cumulative SD from a single reagent lot, or the pooled SD from more than one reagent lot, when determining the SD to use for interpreting QC rules.
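A pooled SD can be sketched as follows (a minimal illustration with hypothetical QC values): squared deviations are computed within each reagent lot and combined, so a between-lot shift in the QC values does not inflate the estimate the way a single SD over all results would:

```python
import math

def pooled_sd(lots):
    """Pooled SD across reagent lots: sums within-lot squared deviations and
    divides by the pooled degrees of freedom, excluding between-lot shifts."""
    ss, dof = 0.0, 0
    for results in lots:
        mean = sum(results) / len(results)
        ss += sum((x - mean) ** 2 for x in results)
        dof += len(results) - 1
    return math.sqrt(ss / dof)

# Hypothetical QC results; the second lot shows a matrix-related shift upward:
sd_pooled = pooled_sd([[10.0, 10.2, 9.8, 10.1], [10.6, 10.4, 10.5, 10.7]])
```

In this example the pooled SD reflects only the within-lot scatter (about 0.15), whereas an SD calculated over all eight results together would be roughly doubled by the between-lot shift.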
Experience in clinical laboratories has shown that there are changes, other than reagent lot changes, in measurement procedures that can also affect the QC values but not the results for patient samples. Such changes could be caused by instrument component replacement or other causes. In theory, there should be an assignable cause for such effects, but such a cause is not always identifiable. In practice, any condition that affects QC results but does not affect patient results is treated in the same manner as described for reagent lot changes. The important QC principle is that if the results for patient samples are consistent between the two conditions, then the target value for the QC sample should be adjusted, if necessary, to reflect its value under the new condition. Failure to adjust the QC target value will cause inappropriate acceptability criteria to be used for evaluating the QC results.
Review of Quality Control Data and the Effectiveness of the Quality Control Plan
The immediate use of QC data is to determine if the results for patient samples can be reported for use in clinical care decisions, as described in the preceding sections. In addition, QC data must be reviewed by laboratory management on a regular schedule. Typical review schedules are weekly by senior technologists or supervisors and at least monthly by the laboratory director. However, the laboratory director or supervisor should promptly review items such as reagent or calibrator lot change validations, changes in QC target values associated with reagent lot or other changes, EQA/PT results review, and other occurrences that may affect quality of the laboratory results.
The weekly review process should determine that correct follow-up of any QC alerts was conducted, that all patient samples that may have had erroneous results were repeated, that any corrected reports were issued, and that the process was properly documented in QC records. The monthly review should include any issues identified by the weekly review process, as well as examination of the Levey-Jennings chart or a computer-based report, to identify trends or changes in assay performance that may need to be addressed before they have effects on clinical care decisions. Note that automated systems to assist in the review of QC data are acceptable, and individual Levey-Jennings charts do not need to be examined every month. The monthly review should also include any adjustments made to QC parameters or the QC plan for a measurement procedure during the month.
FIG. 7.9 Illustration of the influence on the failure rate for a quality control (QC) rule when failing to adjust the target value for a matrix-related shift. QC results before a new reagent lot are shown as gray circles and after the new reagent lot as blue squares. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Points to Remember
• Internal QC samples are measured along with patient samples.
• The target value and SD expected for a QC sample are established by the laboratory.
• Results from QC samples are evaluated using interpretive rules that are established after considering the probability for false alerts and the probability for detecting errors that represent a risk of harm to a patient.
External Quality Assessment or Proficiency Testing
EQA/PT is used to evaluate measurement procedure performance by comparing a laboratory’s results with those of other laboratories for the same set of samples. An EQA/PT provider circulates a set of samples among a group of laboratories that can number several thousand in larger programs. Each laboratory measures the EQA/PT samples as if they were patient samples and reports the results to the EQA/PT provider for evaluation. The EQA/PT provider establishes target values for the EQA/PT samples and determines if the results for an individual laboratory are in close enough agreement with the target value to be consistent with acceptable measurement procedure performance.
EQA/PT is not available for some analytes because a particular measurement procedure may be new to the clinical laboratory or is not commonly performed or because analyte stability makes it difficult to include in an EQA/PT sample. In these situations the laboratory should use an alternate approach to periodically verify acceptable performance of the measurement procedure. CLSI guideline GP27 provides approaches for verifying measurement procedure performance when formal EQA/PT is not available.
Internal QC material manufacturers may provide a data analysis service that compares results from different laboratories using the same QC material by calculating group statistics for performance evaluation. As with EQA/PT evaluation, this type of interlaboratory QC data analysis allows a laboratory to verify that it is producing QC results that are consistent with those of other laboratories using the same measurement procedure. This information can be helpful for troubleshooting measurement procedure issues and for assessing performance of a new measurement procedure being introduced to a laboratory.
External Quality Assessment or Proficiency Testing Programs That Use Commutable Samples
EQA/PT programs that use commutable samples are preferred whenever available. Commutable samples are typically prepared by using an individual donor’s specimen or by pooling clinical patient samples with minimal processing or additives to avoid alteration of the sample matrix. When commutable EQA/PT samples can be prepared, the results reflect what would be expected if individual patient samples were sent to each of the different laboratories. Thus agreement among different laboratories and measurement procedures (harmonization) can be correctly evaluated. The agreement between an individual laboratory result and a reference measurement result gives an assessment of correct calibration for the laboratory. The agreement between an individual laboratory result and an all-methods mean gives an assessment of harmonization of results with other measurement procedures. The agreement between a measurement procedure group mean value and the reference measurement result gives an assessment of trueness and calibration traceability for the measurement procedure group. In addition, the agreement between a measurement procedure group mean value and an all results mean gives an assessment of harmonization of results. The information for measurement procedure groups is of particular interest to the producers of measurement procedures and can be used as part of a surveillance program for the calibration traceability scheme.
External Quality Assessment or Proficiency Testing Programs That Use Noncommutable Samples
The materials commonly used for EQA/PT samples are derived from blood, urine, or other body fluids but are altered in the process to manufacture EQA/PT samples such that the matrix is modified and the samples frequently do not have the same measurement characteristics as observed for unaltered clinical patient samples. In addition, some EQA/PT samples (e.g., urine, cerebrospinal fluid, or blood gas) are prepared as synthetic materials that are not derived from patient fluids. Consequently, many EQA/PT samples, as for QC samples, are noncommutable with authentic patient samples. The results for a noncommutable EQA/PT sample will have a different relationship in their numeric values between different measurement procedures and sometimes for different reagent lots within a measurement procedure than would be observed for patient samples.
It is a common practice for EQA/PT providers to organize results for noncommutable samples into “peer groups” of measurement procedures that represent similar technology expected to have the same result for a noncommutable EQA/PT sample. The mean or median value of the peer group results is the target value. Because the peer group mean value may be influenced by a matrix-related noncommutability bias, that mean value can be used only to evaluate laboratories using the same or very similar measurement procedures and cannot be used to evaluate laboratories using other measurement procedures or to evaluate if results from different measurement procedures agree with each other. If an individual laboratory’s results agree with those of the peer group, the individual laboratory can conclude that the measurement procedure was performing in conformance with the manufacturer’s specifications. An individual laboratory cannot use results for noncommutable EQA/PT samples to verify that a measurement procedure is calibrated correctly to be traceable to the reference system for an analyte.
However, even within a peer group using the same measurement procedure, differences can occur because laboratories use different reagent lots: the matrix of the EQA/PT material can influence results differently for different reagent lots even when patient samples give similar results. Therefore, in some cases, reagent lots should be registered as part of the EQA/PT reporting, and reagent lot–specific target values may even need to be assigned.
Reporting External Quality Assessment or Proficiency Testing Results When One Measurement Procedure Is Adjusted to Agree With Another Measurement Procedure
It is good laboratory practice to adjust the calibration of different measurement procedures for the same measurand within a large hospital system with several satellite laboratories, or within a group of hospitals under the same management, so that results for patient samples are consistent irrespective of which measurement procedure is used. Such harmonization of results is important for uniform use of reference intervals and decision thresholds within a hospital or clinic system. It is important to report EQA/PT results such that they can be properly evaluated against the peer group target value, which reflects the measurement procedure calibration established by the manufacturer. For an individual laboratory’s EQA/PT result to be evaluated against the peer group mean, that result must be reported to the EQA/PT provider after removing any calibration adjustments so that it is consistent with the manufacturer’s nonadjusted calibration.
The most convenient way to remove a calibration adjustment is to first measure the EQA/PT samples with the calibration adjustment applied to the measurement procedure, as would be the usual measurement process for patient samples. After the measurement, the EQA/PT results should be adjusted “in reverse” by mathematically removing the calibration adjustment factors, and the results should be reported to the EQA/PT provider with any adjustment factors removed. One should not recalibrate the instrument with a new set of calibrators for the purpose of measuring the EQA/PT samples because this practice would violate regulations requiring the EQA/PT material to be measured in the same manner as patient samples. This process permits the EQA/PT sample to be measured in the same manner as patient samples and the numeric result reported to the EQA/PT provider to reflect the actual measured result using the manufacturer’s calibration settings.
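Assuming, for illustration, that the harmonization adjustment is a simple linear transformation (adjusted = slope × measured + intercept), removing it mathematically amounts to inverting that transformation. The slope and intercept values below are hypothetical:

```python
def remove_adjustment(adjusted_result, slope, intercept):
    """Back out a linear harmonization adjustment
    (adjusted = slope * measured + intercept) so the value reported to the
    EQA/PT provider reflects the manufacturer's unadjusted calibration."""
    return (adjusted_result - intercept) / slope

# Hypothetical adjustment used in routine reporting: slope 1.05, intercept -0.2
reported = remove_adjustment(10.3, slope=1.05, intercept=-0.2)
```

The EQA/PT sample is still measured exactly as a patient sample would be; only the number reported to the provider is transformed back to the manufacturer's calibration scale.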
Interpretation of External Quality Assessment or Proficiency Testing Results
Many countries have regulations requiring EQA/PT and specifying the evaluation criteria for acceptable performance. When criteria are set by regulations, an EQA/PT provider is required to use them. When criteria are not set by regulations, the EQA/PT provider sets evaluation criteria on the basis of clinically acceptable performance, biological variation, or the analytical capability of the measurement procedures in use. EQA/PT evaluation criteria are usually designed to evaluate the total error of a single measurement. In some programs, measurements are made several times, and it is possible to separately assess the bias and the imprecision. The acceptability limits for EQA/PT include bias and imprecision components considered acceptable for clinical use of a result, plus other error components that are unique to EQA/PT samples such as between-laboratory variation in calibration; variable matrix-related noncommutability bias with different lots of reagent within a peer group; uncertainty in the target value; stability variability in the EQA/PT material, both in storage and shipping, and after reconstitution or opening in the laboratory; and homogeneity of the EQA/PT material vials. Consequently, the acceptability limits for EQA/PT samples are frequently larger than what might be expected for clinically acceptable total error with patient samples.
FIG. 7.10 Example of an external proficiency testing evaluation report sent to a participating laboratory. Part A uses conventional units and part B uses SI units. SD, Standard deviation; SDI, standard deviation interval. From Miller, W. G. [2016]. Quality control. In: Henry’s clinical diagnosis and management by laboratory methods [23rd ed.]. Philadelphia: Elsevier.
Fig. 7.10 is an example of a typical evaluation report sent to a participating laboratory. Each reported result is compared with the mean result for the peer group using the same measurement procedure. The report also includes the SD for the distribution of results in the peer group, the number of laboratories in the peer group, and the SDI (also called a z-score), which expresses the reported result as the number of SDs it is from the mean value (SDI = [result − mean]/SD). The limits of acceptability are shown. Acceptability criteria may be a number of SDs from the mean value, a fixed percent from the mean value, or a fixed concentration from the mean value. For example, in Fig. 7.10, calcium acceptability criteria are ±1 mg/dL (0.25 mmol/L) from the mean value, and iron criteria are ±20% from the mean value.
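The SDI calculation given above can be expressed directly; this is a trivial sketch, and the calcium values used as an example are hypothetical:

```python
def sdi(result, peer_mean, peer_sd):
    """Standard deviation interval (z-score) of a reported result
    relative to the peer-group distribution: (result - mean) / SD."""
    return (result - peer_mean) / peer_sd

# Hypothetical calcium result of 9.3 mg/dL with peer mean 9.5 and peer SD 0.2:
value = sdi(9.3, 9.5, 0.2)   # approximately -1.0
```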
FIG. 7.11 Example of part of a feedback report to hemoglobin A1c (HbA1c) point-of-care (POC) users in a survey for general practitioners’ offices and nursing homes. Commutable external quality assessment/proficiency testing material was circulated in two levels and measured in duplicate. The participant is informed about the bias (mean of the two results) compared with a reference measurement procedure target (x-axis) and “precision” as the difference between the two results. The histogram represents the distribution of results among all participants (light blue) and for the participant’s method group (dark blue). The thick black line represents the interval for “good” results, and the thin black line represents the interval for “acceptable” results. Results outside these limits are characterized as “poor.” The triangle points to the result of the participant. Modified with permission from the Norwegian Quality Improvement of Primary Care Laboratories, the external quality assessment provider in Norway.
In Fig. 7.10 the calcium results are in close agreement with the peer group mean (SDI ranges from −0.2 to −1.4), and the laboratory can conclude that its results are consistent with those of others in the peer group using the same measurement procedure and that it is using the measurement procedure according to the manufacturer’s specifications. However, the iron results show greater variability, with one result at +3.5 SDI. Although all iron results are within the acceptability criteria, it is recommended to investigate the measurement procedure because a +3.5 SDI is more likely to be different from other participants in the peer group than to be in agreement with them.
Fig. 7.11 shows another type of evaluation report sent to a primary care office for hemoglobin A1c (HbA1c) for one of two EQA/PT samples. In this situation the EQA/PT provider is communicating directly with the clinician or the coworker in the general practice office, and the feedback must be easy to understand for nonlaboratory professionals. The EQA/PT result is evaluated as “good,” “acceptable,” or “poor.” The lot numbers of the reagent are registered so that the participant, in case of an aberrant result, can get information if the result was due to the measurement procedure used, the reagent lot used, or the performance of the user.
Fig. 7.12 shows a similar report from the same HbA1c survey provided to hospital laboratories. In addition to the figures about the distribution of results, information is provided on how different measurement procedures performed, as well as a historical overview of performance on consecutive EQA/PT samples and performance related to the concentration of the sample. The EQA/PT material used for the HbA1c is pooled fresh patient blood (commutable), and the target value is set by a reference measurement procedure and is therefore the same for all measurement procedures. Each sample was measured in duplicate (as requested by the EQA/PT provider), and the mean of the duplicate was used to estimate bias versus the reference measurement procedure. In the present example the performance was within the acceptability limits but with a generally high bias during the whole period. Because this observation was true for all the instruments using this measurement procedure, the EQA/PT organizer discussed the results with the manufacturer to solve the problem. Until the problem was solved (the manufacturer had to make a new calibrator), the participants were advised by the EQA/PT provider to use a correction factor when reporting their results for patient samples.
If an unacceptable EQA/PT result is identified, the measurement procedure must be investigated for possible causes and the necessary corrective action taken. Even when an EQA/PT result is within acceptability criteria, it is good laboratory practice to investigate results that are more than approximately 2.5 SDI from the peer group mean. When the SDI is 2.5, there is only about a 0.6% probability that a result drawn from the expected peer-group distribution would be that far above the mean; consequently, the probability is reasonably high that a measurement procedure problem exists and may need to be corrected.
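The 0.6% figure corresponds to the one-sided Gaussian tail probability beyond 2.5 SD. As a sketch, assuming the peer-group results are Gaussian, the tail probability can be computed with the complementary error function:

```python
import math

def two_sided_tail(z):
    """Probability that a result from a Gaussian peer-group distribution
    falls more than z SDs from the mean (both tails combined)."""
    return math.erfc(z / math.sqrt(2))

p = two_sided_tail(2.5)   # about 0.012 two-sided, i.e., about 0.006 per tail
```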
Common causes for EQA/PT failure are listed in Box 7.1. Incorrect handling and reporting are unique to EQA/PT events and may not reflect the process used in the laboratory for patient samples. Because the influence of reagent lots on noncommutability-related bias is well documented, a reagent lot–specific bias is a possible explanation when no other root cause can be identified.
Points to Remember
• An independent external organization circulates EQA/PT samples with unknown target values.
• When commutable “patient-like” material is used, a laboratory can compare its results with results from all other measurement procedures and often with a true value from a reference measurement procedure.
• When noncommutable material is used, a laboratory can compare its results only with results from participants in a “peer group” using a similar measurement procedure.
FIG. 7.12 Example of part of a feedback report to hemoglobin A1c (HbA1c) users in hospital laboratories. Same survey and same materials as presented in Fig. 7.11. The histogram represents the distribution of results among all participants (light blue) and for the participant’s method group (dark blue). Only limits for “acceptable” (Acc.) results are given (thin black lines in figures). Information about performance of measurement procedures is given in addition to a historical overview of percentage deviation from target values dependent on time and concentration of HbA1c. CV, Coefficient of variation; HPLC, high-performance liquid chromatography; SD, standard deviation. Modified with permission from the Norwegian Quality Improvement of Primary Care Laboratories, the external quality assessment provider in Norway.
BOX 7.1 Classification of Potential Problems Identified When Investigating Unacceptable External Quality Assessment or Proficiency Testing Resultsᵃ
1. Clerical errors
Incorrectly transcribed EQA/PT result from the instrument readout to the report form
The EQA/PT sample was mislabeled in the laboratory
Incorrect instrument or measurement procedure was reported on the results submission form
Incorrect units were reported
Decimal point was misplaced
2. Measurement procedure problems
Inadequate SOP
Problem with manufacture or preparation of reagents or calibrators (e.g., unstable)
Lot-to-lot variation in reagents or calibrators
Incorrect value assignment of calibrators
Measurement procedure lacks adequate specificity for the measurand
Measurement procedure lacks adequate sensitivity to measure the concentration
Carry-over from a previous sample
Inadequate QC procedures used
3. Equipment problems
Obstruction of instrument tubing or orifice by clot
Misalignment of instrument probes
Incorrect instrument data processing functions
Incorrect instrument setting
Automatic pipetter not calibrated to acceptable precision and accuracy
ᵃFrom Miller, W. G., Jones, G. R. D., Horowitz, G. L., & Weykamp, C. (2011). Proficiency testing/external quality assessment: current challenges and future directions. Clinical Chemistry, 57, 1670–1680.