Chapter Fifteen Standard scores and the normal curve

CHAPTER CONTENTS

Introduction

Standard scores (z scores)

Normal distributions

The standard normal curve

Calculations of areas under the normal curve

Critical values

Standard normal curves for the comparison of distributions

Summary

Self-assessment

True or false

Multiple choice

Introduction

In this chapter we will discuss standard scores and standard distributions. A standard score represents the position of a score or measurement in relation to an overall set of scores. Standard distributions are also useful for comparing scores from different sets of measurements. Standard scores are used in both clinical practice and research in the health sciences. In clinical practice, a patient’s score is often interpreted by comparing it with a known distribution; for example, a blood pressure or cholesterol measurement is compared with the distribution of such measurements to interpret the patient’s result.

The aims of this chapter are to:

1. Define ‘standard’ scores.

2. Describe the characteristics of normal and standard normal curves.

3. Show how standard normal curves can be used for calculating percentile ranks.

4. Show how standard normal curves can be used to compare scores from different distributions.

Standard scores (z scores)

Consider this example: infant A walked unaided at the age of 40 weeks, while infant B is 65 weeks old but still cannot walk. What sense can we make of these measurements? Could infant B need further clinical investigation in case he has some neurological abnormality? The fact that infant B is unable to walk at the age of 65 weeks is not very informative without additional information about how this compares with norms for other children. However, say that it is known that the distribution of walking ages is such that μ = 50 weeks and σ = 5 weeks. Assuming that the frequency distribution is normal, the frequency polygon representing the population would look something like that shown in Figure 15.1.

Figure 15.1 Age at which children walk unaided.

In this instance, infant B’s score is clearly above the mean. In fact, by inspection, we can see the infant’s score at this point of time was three standard deviations above (+ 3) the mean (65 = 50 + (3 × 5)). In contrast, infant A began walking earlier than the mean, his score of 40 being two standard deviations below (− 2) the mean. In general, any ‘raw’ score in a frequency distribution can be described in terms of its distance from the mean. The process of transforming a score into a measurement based on its distance from the mean in standard deviations is called standardizing the score. Such ‘transformed’ scores are called z scores or standard scores.

A z score represents how many standard deviations a given raw score is above or below the mean. The equations for transforming a specific raw score into a z score are, for a sample and for a population respectively:

\[ z = \frac{x - \bar{x}}{s} \qquad \text{or} \qquad z = \frac{x - \mu}{\sigma} \]

In these equations, x is the raw score, x̄ or μ is the mean of the distribution from which the score was drawn, and s or σ is the standard deviation of the distribution. That is, when we know the mean and standard deviation of a distribution, we can transform any raw score into a z score. Conversely, when the z score is known, we can use the above equations to calculate the corresponding raw score.

In the above example, the z scores corresponding to the infants’ raw scores are:

\[ z_A = \frac{40 - 50}{5} = -2 \qquad z_B = \frac{65 - 50}{5} = +3 \]

These calculations support our previous observations that A’s score was two standard deviations below the mean and B’s score was three standard deviations above the mean. In other words, A walked very early and B was a very late starter. The particular value of standardizing scores for understanding clinical or research evidence will be discussed in the context of the concepts of normal and standard normal distributions.
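The transformation described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `z_score` and `raw_score` are our own and do not come from the text, and the values are those of the walking-age example (μ = 50 weeks, σ = 5 weeks).

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Invert the transformation: recover the raw score from a z score."""
    return mean + z * sd


# Walking-age example from the text: mu = 50 weeks, sigma = 5 weeks.
z_a = z_score(40, 50, 5)   # infant A
z_b = z_score(65, 50, 5)   # infant B
print(z_a, z_b)            # -2.0 3.0
```

Calling `raw_score(3, 50, 5)` returns 65, which illustrates that the same relationship can be used in either direction.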

Normal distributions

Many variables measured in the biological, behavioural and clinical sciences are approximately ‘normally’ distributed. What is meant by a normal distribution is illustrated by the normal curve (see Fig. 15.2), which is a frequency polygon representing the theoretical distribution of population scores. We assume here that the variable x has been measured on an interval or ratio scale and that it is a continuous variable such as weight, height or blood pressure.

Figure 15.2 Standard normal distribution.

The normal curve has the following characteristics:

1. It is symmetrical about the mean, so that equal numbers of cases fall above and below the mean (mean = median = mode).

2. Relatively few cases fall into the high or low values of x. Most of the cases fall close to the mean. (For the theoretical normal distribution, the arms of the curve do not intersect with the x-axis, allowing for a few extreme scores.)

3. The precise equation for the normal curve was discovered by the mathematician Gauss, so that it is sometimes referred to as a Gaussian curve.

We need not worry about the actual formula. Rather, the point is that, given that the functional relationship between f and x is known, integral calculus can be used to calculate areas under the curve for any value of x. All normal curves have the same general mathematical form; whether we are graphing IQ or weight, the same bell-shape will appear. The only differences between the curves are the mean value and the amount of variation. This is why the mean and the standard deviation provide us with important information about any particular normal distribution. Note that it is unlikely that any real data are exactly normally distributed. Rather, the normal distribution is a mathematical model that is useful for representing real distributions.
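For readers who do wish to see it, the usual form of the Gaussian curve is reproduced below. It is added here for reference only and is not needed for the calculations in this chapter. In it, f(x) is the height of the curve at x, and μ and σ are the mean and standard deviation of the distribution.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```

Only μ and σ appear as parameters, which is why the mean and the standard deviation are enough to distinguish one normal curve from another.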

The standard normal curve

If we transform the raw scores of a variable into z scores and then plot the frequency polygon for the distribution, we will have a standard curve. If the original distribution was normal, then the frequency polygon will be a standard normal curve. Standard normal curves are identical regardless of the nature of the original variables.

By transforming raw scores into z scores we are adjusting for differences in means and standard deviations, which are the only things which distinguish between non-standardized normal curves.
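The adjustment for mean and standard deviation can be checked numerically. The sketch below standardizes a small set of scores and confirms that the resulting z scores have a mean of 0 and a standard deviation of 1. The weights are hypothetical values invented for illustration, the function name `standardize` is our own, and the population standard deviation (σ) is assumed.

```python
from statistics import fmean, pstdev


def standardize(scores):
    """Transform every raw score into a z score using the population SD."""
    m = fmean(scores)
    sd = pstdev(scores, mu=m)
    return [(x - m) / sd for x in scores]


# Hypothetical body weights in kg (illustrative values only).
weights_kg = [62.0, 70.5, 81.0, 55.5, 90.0, 74.0]
z = standardize(weights_kg)

print(round(fmean(z), 6))    # mean of the z scores, zero apart from rounding error
print(round(pstdev(z), 6))   # standard deviation of the z scores, 1.0
```

Note that standardizing changes only the location and scale of the scores; it does not make a non-normal set of scores normal.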

The standard normal curve has the following additional properties:

1. The mean is always 0 (zero). In the ‘infants’ walking age’ example, the z score corresponding to μ = 50 is:

\[ z = \frac{50 - 50}{5} = 0 \]

2. The mean = median = mode, as the curve is symmetrical.

3. The standard deviation of z scores is always 1 (one). For instance, the z score for 55 (which is one standard deviation above the mean) is:

\[ z = \frac{55 - 50}{5} = 1 \]

4. It is assumed that the total area under the curve adds up to 1.00. Since the normal curve is symmetrical, 0.5 of the area falls above z = 0 and 0.5 falls below z = 0. This is another way of saying that 50% of the total cases fall below the mean, and 50% of the cases fall above the mean (which is equal to the median).

5. More generally, we can use appropriate statistical tables to estimate the area under the standard normal curve for any given z scores. These areas are available in table form (see Appendix A) so that for any value of z we can read off the corresponding area.

6. The area under the curve between any two points is directly proportional to the percentage of cases falling between those two points. We can therefore use the standard normal curve to calculate the percentage of scores falling above, below or between any specified scores.

In the next subsection, we will examine the use of the table of areas under the standard normal curve to understand the meaning of measurements in relation to distributions.
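As an alternative to printed tables, the areas described in properties 4 to 6 can be computed directly. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper names `area_below` and `area_between` are our own.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_below(z):
    """Proportion of the total area lying below a given z score."""
    return STD_NORMAL.cdf(z)


def area_between(z_low, z_high):
    """Proportion of the total area lying between two z scores."""
    if z_low > z_high:
        z_low, z_high = z_high, z_low
    return STD_NORMAL.cdf(z_high) - STD_NORMAL.cdf(z_low)


print(area_below(0.0))                      # 0.5, half the cases fall below the mean
print(round(area_between(-1.0, 1.0), 4))    # 0.6827
print(round(area_between(0.0, 2.0), 4))     # 0.4772
```

The last value is the area between the mean and z = 2.00, the kind of entry that a table of areas under the standard normal curve lists.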

Calculations of areas under the normal curve

We have already examined the concept of percentile or centile ranks. The normal curve is useful for evaluating the percentile rank of scores in normal distributions. Appendix A gives the proportion of areas under the standard normal curve which lies:

• between the mean and a given z score

• beyond the z score.

Since normal distributions are symmetrical, the same proportions are also true for the area between the mean and any negative z score. Only the positive values are given in Appendix A.

Let us see how we can use this information to estimate the percentile ranks of the two infants’ walking ages. We have shown previously in our hypothetical example that for infant A, z = − 2. Let us now turn to Appendix A. In going down the column of z scores, we find that the area corresponding to z = 2.00 is 0.4772 (between) and 0.0228 (beyond).

We know that the area A1 under the curve in Figure 15.3 must be the area beyond z = 2.00, since the curve is symmetrical:

\[ A_1 = 0.5000 - 0.4772 = 0.0228 \]

Figure 15.3 Area (A1) corresponding to z = −2.

This proportion can be expressed as a percentage, so that 2.28% of the cases in the distribution fall below z = − 2. We have defined percentile rank for a score as the percentage of cases in a distribution falling up to and including a specific score. Therefore, the percentile rank for infant A’s walking is 2.28%. Of all children, only 2.28% learn to walk as early as or earlier than infant A. Clearly, he is doing well.

What then is the percentile rank for infant B’s performance? As you remember, z = + 3. Looking up the area corresponding to z = 3.00 in Appendix A we find the area (A1 in Fig. 15.4) is equal to 0.4987. Therefore, the proportion of scores falling up to and including z = + 3 is 0.5 + 0.4987 = 0.9987.

Figure 15.4 Area (A1) corresponding to z = +3.

Expressing this finding as a percentage, we find that 99.87% of children learn to walk by the age of 65 weeks. As we said earlier, infant B is still not walking. Perhaps further clinical tests are indicated, although we should keep in mind that an unusual or extreme score is not necessarily indicative of pathological states.
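The two percentile ranks worked out above can be reproduced without a table. This is a minimal sketch, assuming that walking ages are normally distributed with μ = 50 and σ = 5 weeks as in the example; the function name `percentile_rank` is our own.

```python
from statistics import NormalDist


def percentile_rank(x, mean, sd):
    """Percentage of a normal population scoring at or below x."""
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(x)


rank_a = percentile_rank(40, 50, 5)   # infant A, z = -2
rank_b = percentile_rank(65, 50, 5)   # infant B, z = +3
print(round(rank_a, 2), round(rank_b, 2))   # 2.28 99.87
```

The results agree with the values read from Appendix A (2.28% and 99.87%).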

Critical values

We can work the other way by determining the raw scores corresponding to areas under the normal curve in Appendix A. For example, say that the slowest 5% of infants are offered some special exercises in learning to walk. What would be the age at which the exercises should be offered, should the child not be walking? The key here is to determine the score that corresponds to the 95th percentile of the distribution. This can be represented as shown in Figure 15.5.

Figure 15.5 Determining z score of 95th percentile.

From Figure 15.5 we can see that we need to find the z score for which the area between the mean and that z score is 0.45, so that 0.50 + 0.45 = 0.95 of the cases fall below it. By consulting the normal distribution table (see Appendix A), it can be seen that the corresponding z score is approximately z = 1.65. A z score that marks off a specified area in this way is called a critical value.

Given the z score, we can calculate the corresponding raw score from the formula:

\[ x = \mu + z\sigma = 50 + (1.65 \times 5) = 58.25 \]

That is, if the slowest 5% are thought to be in need of help, then children somewhat over 58 weeks old and still not walking would be recommended for the remedial exercises. Of course, we can use the tables for reading off the z scores corresponding to any specified area or percentage.
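Working from an area back to a raw score can also be sketched in code. The example below uses the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later) in place of reading the table backwards; the variable names are our own, and the distribution is the walking-age example with μ = 50 and σ = 5 weeks.

```python
from statistics import NormalDist

MEAN_WEEKS = 50.0
SD_WEEKS = 5.0

# z score below which 95% of the standard normal distribution falls.
z_crit = NormalDist().inv_cdf(0.95)

# Convert the critical z score back into a raw score: x = mu + z * sigma.
cutoff_weeks = MEAN_WEEKS + z_crit * SD_WEEKS

print(round(z_crit, 3))        # 1.645
print(round(cutoff_weeks, 1))  # 58.2
```

The computed z score (about 1.645) differs slightly from the table-based 1.65 only because table entries are given to two decimal places; the conclusion of a cut-off somewhat over 58 weeks is unchanged.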

Standard normal curves for the comparison of distributions

One of the uses of standard distributions is that we can compare scores from entirely different distributions. For example, if a student scored 63 on test A, and 52 on test B, on which test did the student do better? If we define ‘better’ as solely in terms of raw scores, then clearly the student did better on test A. However, test A might have been easier than test B, so that if the overall performances of all students on the tests are taken into account, the student’s relative performance might be better on test B.

Therefore, using the formula for calculating z scores, and looking up the corresponding areas in Appendix A (do this yourself) we obtain the results shown in Table 15.1. Thus, the student performed better on test B, by scoring higher than 93% of other students sitting for the test.

Table 15.1 Scores for example in text

	Raw scores	z scores	Percentile ranks
Test A	x = 63	−0.25	40.1
Test B	x = 52	+1.50	93.3

This example illustrates that in some circumstances the meaning of specific scores has to be interpreted against ‘standards’.
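The percentile ranks in Table 15.1 can be checked from the z scores alone. The sketch below takes the two z scores from the table as given and assumes both sets of test scores are normally distributed; the dictionary names are our own.

```python
from statistics import NormalDist

# z scores taken from Table 15.1.
z_scores = {"test A": -0.25, "test B": 1.50}

# Percentile rank = percentage of cases at or below the score.
ranks = {test: round(100.0 * NormalDist().cdf(z), 1) for test, z in z_scores.items()}

print(ranks)   # {'test A': 40.1, 'test B': 93.3}
```

Although the raw score on test A is higher, the student's relative standing is higher on test B.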

Another use of standard distributions is in interpreting the meaning of the results of investigations in the health sciences. Let us examine the following hypothetical example. An investigator measured levels of blood cholesterol in a sample of 300 adults who are meat eaters, and 100 adults who are vegetarians. The results of the investigation are summarized in Table 15.2.

Table 15.2 Blood cholesterol

	Mean blood cholesterol (mg/cc)	Standard deviation
Meat eaters	0.6	0.15
Vegetarians	0.4	0.1

Now, imagine that you are a clinician working with patients with cardiac disorders and you are interested in the following questions:

1. Approximately what per cent of vegetarians had blood cholesterol levels greater than the average meat eater?

2. Approximately what per cent of meat eaters had blood cholesterol levels lower than that of the average vegetarian?

The percentage of cases of vegetarians with blood cholesterol greater than 0.6 (the mean for the meat eaters) is represented by area A1 in Figure 15.6.

Figure 15.6 Area A1 corresponds to the percentage of vegetarians with blood cholesterol higher than 0.6 mg/cc.

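Transforming the meat eaters’ mean of 0.6 into a z score within the vegetarians’ distribution gives:

\[ z = \frac{0.6 - 0.4}{0.1} = +2.00 \]

From Appendix A, the area beyond z = 2.00 is 0.0228. Therefore, approximately 2.3% of vegetarians had blood cholesterol levels higher than the average for meat eaters.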

Figure 15.7 demonstrates the area (A2) corresponding to the percentage of meat eaters with lower blood cholesterol than the average vegetarian.

Figure 15.7 Area A2 corresponds to the percentage of meat eaters with lower blood cholesterol than the mean for vegetarians.

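Transforming the vegetarians’ mean of 0.4 into a z score within the meat eaters’ distribution gives:

\[ z = \frac{0.4 - 0.6}{0.15} = -1.33 \]

From Appendix A, the area beyond z = 1.33 is 0.0918. Therefore, approximately 9.2% of meat eaters had lower blood cholesterol levels than the average vegetarian.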
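Both answers can be verified computationally. The sketch below assumes that cholesterol levels in each group are normally distributed with the means and standard deviations given in Table 15.2; the variable names are our own.

```python
from statistics import NormalDist

# Group distributions from Table 15.2 (assumed normal).
meat_eaters = NormalDist(mu=0.6, sigma=0.15)
vegetarians = NormalDist(mu=0.4, sigma=0.1)

# Question 1: vegetarians above the meat eaters' mean (area A1).
pct_veg_above = 100.0 * (1.0 - vegetarians.cdf(meat_eaters.mean))

# Question 2: meat eaters below the vegetarians' mean (area A2).
pct_meat_below = 100.0 * meat_eaters.cdf(vegetarians.mean)

print(round(pct_veg_above, 1))    # 2.3
print(round(pct_meat_below, 1))   # 9.1
```

The second result is about 9.1% rather than 9.2% because the code uses the unrounded z score (−1.333...), whereas a table lookup rounds z to −1.33 first. The difference has no practical importance.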

Summary

We found that if the mean and standard deviation for a given distribution have been calculated, then we can transform any raw score into a standard (or z) score. The z score represents how many standard deviations a specific score is above or below the mean. We described how to calculate this transformed score for a population or a sample. Also, we outlined the essential characteristics of the normal and the standard normal curves.

It was pointed out that if the original frequency distribution was approximately normal, then the table of normal curves (Appendix A) could be used to calculate percentile ranks of raw scores, or the percentage of scores falling between specified scores. Also, z scores were shown to be useful in comparing scores arising from two or more different normal distributions. The above information is applicable to clinical practice, for example in interpreting the significance of an individual’s assessment in relation to known population norms.

Self-assessment

Explain the meaning of the following terms:

normal curve

probability

standard normal curve

standard score

transformed score

z score

True or false

1. z scores express how many standard deviations a particular score is from the mean.

2. Negative z scores are further from the mean than positive z scores.

3. Even when the distribution of raw scores is skewed, the standardized distribution will be normal.

4. The mean of a standard normal distribution is always 1.0.

5. The total area under the standard normal curve is 1.0.

6. The area of a normal curve between any two designated z scores expresses the proportion or percentage of cases falling between the two points.

7. The greater the value of x̄ and s, the greater the value of the z scores in corresponding standard distributions.

8. About 10% of scores fall 3 standard deviations above the mean.

9. A standardized distribution has the same shape as the distribution from which it was derived.

10. Notwithstanding the level of skewness in a distribution, the standard normal curve is useful for determining the percentile rank of a score.

11. In a normal distribution, the higher the z score, the higher will be the frequency of the corresponding raw score.

12. 50% of scores fall between z = 0.5 and z = − 0.5.

13. In a normal curve, approximately 34% of the scores fall between z = 0 and z = − 1.

14. A percentile rank represents the number of cases falling above a particular score.

15. Given a bimodal distribution of raw scores, the standard normal curve is inappropriate for calculating percentile ranks.

16. z = − 2.58 has a percentile rank of 98 in a normal distribution.

17. z = 1.28 cuts off the highest 10% of scores in a normal distribution.

18. Numerous human characteristics are distributed approximately as a normal curve.

19. If 20% of scores fall into a given class interval, then the percentile rank of the upper real limit of the class interval is 20.

20. The percentile rank of z = 0 is always 50.

Multiple choice

1. Which of the following statements is true?

a A z score indicates how many standard units or deviations a raw score is above or below the mean.

b The mean of a standard normal distribution is always 0 (zero).

c The distribution of z scores takes the same shape as that of the raw scores from which they have been derived.

d All the above statements are true.

2. In an anatomy test, your result is equivalent to a standard or z score of 0.2. What does this z score imply?

a You performed poorly when compared to others.

b You performed very well when compared to others.

c Your result was slightly above average.

d Your result was slightly below average.

3. The z scores of three persons X, Y and Z in a statistical methods test were + 2.0, + 1.0 and 0.0, respectively. In terms of the original raw scores, which of the following statements is true?

a The raw score difference between X and Y is greater than the raw score difference between Y and Z.

b The raw score difference between X and Y is less than the raw score difference between Y and Z.

c The raw score difference between X and Y is equal to the raw score difference between Y and Z.

d No precise statement can be made about the relationships between the differences of the raw scores of X and Y and of Y and Z.

A group of patients has a mean weight of 80 kg, with a standard deviation of 10 kg. Questions 4 and 5 refer to these data.

4. What is the standard score (z) for a patient whose weight is 50 kg?

a +3

b +2

c −2

d −3

5. You are told that a patient’s weight is two standard deviations below the mean. What is this patient’s weight?

a 60 kg

b 55 kg

c 50 kg

d 45 kg

6. We develop a new method of treating spastic hemiplegia by giving weekly ultrasound massages to the affected muscles. In a consecutive study of 200 children treated by this method we find that the average number of weeks to full recovery is 8, with a standard deviation of 2 weeks. Therefore we conclude that (given a normal distribution):

a treatment may be stopped after 8 weeks

b half of all children will need treatment for longer than 8 weeks

c 90% of children will be fully recovered after 12 weeks of treatment.

d a and c.

7. A percentile rank:

a represents the frequency of occurrence of a particular category

b tells you whether or not a distribution is skewed

c can be used to estimate the range of distribution

d tells you what percentage of scores fall at or below a particular score.

Use this information in answering questions 8–12: a normally distributed set of scores has a mean of 40 and a standard deviation of 8.

8. A raw score of 24 corresponds to a z score of:

a 3.0

b − 3.0

c 1.5

d − 1.5

e − 2.0

9. A z score of 1.25 corresponds to a raw score of:

a 50

b 10

c 30

d 56.4

e 40

10. The percentile rank of a raw score of 48 is:

a 34.13

b 15.87

c 84.13

d 65.87

e incalculable from information given.

11. The percentage of scores between 32 and 44 is:

a 68.26

b 53.28

c 46.82

d 32.74

e 43.32

12. The raw score which cuts off the lowest 5% of the population (rounded to the nearest whole number) is:

a 38

b 13

c 27

d 53

e 42

Questions 13–16 refer to a standard normal distribution.

13. The percentage of cases falling above z = 0.35 is:

a 16.8%

b 34.1%

c 84.1%

d 36.3%

14. The percentage of cases falling between z = − 1 and z = + 1 is:

a 16.8%

b 33.6%

c 34.1%

d 68.3%

15. The percentage of cases falling between z = − 0.5 and z = + 2 is:

a 85.0%

b 66.9%

c 28.6%

d 68.2%

16. The percentage of cases falling either below z = − 2 or above z = + 2 is:

a 95.5%

b 68.2%

c 47.7%

d 4.6%

The following information should be used in answering questions 17–20: a test of reaction times has a mean of 10 and a standard deviation of 4 in the normal adult population.

17. A person scores 8. That person’s z score is:

a 2

b − 2

c − 0.5

d − 1

18. What percentage of the population would have scores up to and including 14 on this test?

a 84.13

b 15.87

c 65.87

d 34.13

19. What is the percentile rank of a score of 8 on this test?

a 19.15

b 30.85

c 80.85

d 53.28

20. What score (to the nearest whole number) would cut off the highest 10% of scores?

a 1

b 14

c 15

d 18

Questions 21–25 refer to the following data: the mean for a population is 500, with a standard deviation of 90; the scores are normally distributed.

21. The percentile rank of a score of 667 is:

a 4.14

b 92.7

c 3.22

d 96.86

22. The proportion of scores which lie above 650 is:

a 0.4535

b 0.9535

c 0.0475

d 0.885

23. The proportion of scores which lie between 460 and 600 is:

a 0.4394

b 0.5365

c 0.4406

d 0.4635

24. The raw score which lies at the 90th percentile is:

a 615.20

b 384.80

c 616.10

d 383.90

25. The proportion of scores between 300 and 400 is:

a 0.3665

b 0.4868

c 0.8533

d 0.1203
