Page 125 

Chapter Twelve Measurement

CHAPTER CONTENTS

Introduction 125
Operational definitions and measurement 125
Objective and subjective measures 126
Desirable properties of measurement tools and procedures 126
Reliability 127
Validity 128
Standardized measures and tests 130
Measurement scale types 130
1. Nominal scales 131
2. Ordinal scales 131
3. Interval scales 131
4. Ratio scales 132
Summary 133
Self-assessment 133
True or false 133
Multiple choice 134

Introduction

The term measurement refers to the procedure of attributing qualities or quantities to specific characteristics of objects, persons or events. Measurement is a key process in quantitative research, evaluation and clinical practice. If the measurement procedures in a study are inadequate, its usefulness will be limited. Similarly, in clinical practice, the validity of diagnoses and treatment decisions can be compromised by inadequate measurement processes and tools.

The aims of this chapter are to:

1. Discuss key issues in measurement procedures.
2. Describe good practice in measurement in both research and clinical settings.
3. Evaluate and discuss the different types of measurement scales.

Operational definitions and measurement

Sometimes researchers start off with rather vague views of how to measure the variables included in a study. For instance, if researchers are interested in measuring ‘levels of pain’ experienced by patients, then the researchers must convert their general ideas about measuring pain to a tightly defined statement of how this is to be measured. Depending on their theoretical interpretation of the concept of ‘pain’, and the practical requirements of the investigation, one of the many possible approaches to measurement of pain will then be selected.

  Page 126 

The process of converting theoretical ideas to a tightly defined statement of how variables are to be measured is called operationalization. It is important that researchers give exact details of how the measures were taken in order that others may judge their adequacy and appropriateness and be in a position to repeat the procedures in a new study. A study that is adequate in terms of design, sampling methods and sample size may nevertheless have poor validity due to the use of inadequate measurement techniques. Let us now discuss operationalization.

The operational definition of a variable is a statement of how the researcher in a particular study chooses to measure the variable in question. It should be unambiguous.

At the outset, let us note that in most circumstances there is no single best way of taking measurements. If a researcher claimed that her therapeutic techniques significantly increased ‘motor control’ in her sample of patients, the obvious question that arises is ‘What was meant by “motor control” and how was it measured?’ If our researcher replied that she was interested in motor control as measured by the Plunkett Motor Dexterity Task scores, she has, in fact, supplied her operational definition. Another researcher may challenge the adequacy of this definition and substitute her own, stating that patients’ self-ratings of control in various tracking tasks provide a more appropriate definition.

A good operational definition will contain enough information to enable another researcher or clinician to replicate the measurement techniques used in the original study. Similarly, a good operational definition of a clinically relevant variable will enable a fellow professional to replicate the original diagnostic or assessment procedures. An operational definition can be an unambiguous description, a photograph or diagram, or the specification of a brand name of a standard tool. In describing a piece of research, one must include operational definitions of the measuring apparatus and all procedures, so that readers are quite clear as to what has been done and with whom and when.

Objective and subjective measures

A distinction is commonly drawn between objective and subjective measures, often with overtones of suspicion directed towards so-called ‘subjective’ measures. Let us make a much less value-laden distinction and define them as follows: objective measurements involve the measurement of physical quantities and qualities using measurement equipment; subjective measures involve ratings or judgments by humans of quantities and qualities.

One should not confuse the distinction between objective and subjective measures with the distinction between good and bad measurement techniques. Equipment might be improperly calibrated, complicated to use, or become damaged during an investigation. For instance, a researcher might have a terrible set of weight scales that gives results far at variance with the correct measures. Much current measurement equipment is sophisticated and complex, and often cannot be calibrated accurately without a careful calibration procedure. Just because a machine is involved in measurement does not mean that the results will be accurate. Furthermore, many quantities and qualities associated with persons and clinical phenomena are difficult to measure objectively, such as the personal attractiveness of individuals, aspects of patient–therapist relationships, or the ‘quality’ of a patient’s gait.

Desirable properties of measurement tools and procedures

Measurement tools and procedures ought to yield measurements that are reproducible, accurate, applicable to the measurement task in hand and practical or easy to use. These properties are often given the technical terms of reliability, validity, applicability and practicability. These properties will be reviewed in detail in the following sections. Measurement theory and method are concerned with the development of measurement tools that maximize these properties.

  Page 127 

Before these specific test properties are reviewed, it is useful to review some basic concepts in test theory. In any measurement, we have three related concepts: the observed value or test score, the true value or test score and measurement error. Thus if I could be weighed on a completely accurate set of weighing scales, my true score might be 110 kg. However, the scales that I use in my bathroom might give me a reading of 100 kg. The difference between the observed score and my true score is the measurement error. This relationship can be expressed in the form of an equation such that:


Observed value = True value + Measurement error


Thus, measurement tools are designed to minimize measurement error, so that the observed value we obtain from our assessment process is close to the true value.
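For readers who would like to see this relationship in computational form, the following Python sketch simulates the weighing example above. The true weight and the spread of the random error are made-up values chosen purely for illustration.

```python
import random

random.seed(1)

true_value = 110.0   # hypothetical true weight in kg

def observe(error_sd=2.0):
    """Return one observed value: the true value plus random measurement error."""
    measurement_error = random.gauss(0, error_sd)
    return true_value + measurement_error

readings = [observe() for _ in range(5)]
print([round(r, 1) for r in readings])   # observed values scatter around the true 110 kg
```

Each simulated reading departs from the true value only by its random error, which is exactly the quantity a well-designed measurement tool tries to keep small.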

Reliability

Reliability is the property of reproducibility of the results of a measurement procedure or tool. There are several different ways in which reliability can be assessed. These include test–retest reliability, inter-observer reliability and internal consistency. Let us examine each.

Test–retest reliability

A common way to assess test reliability is to administer the same test twice to the same participants. The results obtained from the first test are then correlated with the second test. Reliability is generally measured by a correlation coefficient that may vary from −1 to +1 in value. A test–retest reliability of +0.8 or above is considered to be sound. When the measurement process involves clinical ratings, e.g. a clinician’s rating of the dependency level of a cerebrovascular accident (CVA) patient, test–retest reliability is sometimes termed intra-observer reliability, i.e. the same observer rates the same patients twice and the results are correlated.
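As a minimal sketch of how such a coefficient is obtained, the following Python code correlates two administrations of the same test. The scores are hypothetical and are not drawn from any study discussed in this chapter.

```python
import numpy as np

# Hypothetical scores for the same ten participants tested on two occasions
test_1 = np.array([12, 15, 9, 20, 14, 18, 11, 16, 13, 17])
test_2 = np.array([13, 14, 10, 19, 15, 17, 10, 18, 12, 16])

# Test-retest reliability expressed as the Pearson correlation between the two administrations
r = np.corrcoef(test_1, test_2)[0, 1]
print(f"test-retest reliability r = {r:.2f}")   # values of about +0.8 or above are considered sound
```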

Inter-observer (inter-rater) reliability

A common issue in clinical assessment is the extent to which clinicians agree with each other in their assessments of patients. The extent of agreement is generally determined by having two or more clinicians independently assess the same patients and then comparing the results using correlations. If the agreement (correlation) is high then we have high inter-observer or inter-rater reliability.

Table 12.1 illustrates examples of both high and low inter-observer reliability on ratings of patients on a 5-point scale. Let’s imagine that this scale measures the level of patient dependency and need for nursing support. As we mentioned earlier, the degree of reliability is quantitatively expressed by correlation coefficients. However, by inspection you can see that in Table 12.1 there is a high degree of disagreement in the two observers’ ratings in the ‘Low reliability’ column. In this instance the clinical ratings would be unreliable, and inappropriate to use in the research project. However, the outcome shown in the ‘High reliability’ column in Table 12.1 shows a high level of agreement.

Table 12.1 Inter-observer reliability

[table not reproduced: ratings of the same patients by two observers on the 5-point scale, with columns illustrating low and high inter-observer reliability]

A classic study of inter-rater and intra-rater reliability is provided by Coppleson et al (1970). Over an 11-month period, 29 biopsy slides with suspected Hodgkin’s disease were presented to three pathologists. The pathologists were asked to make a number of judgments about features of the specimens. The specimens were unlabelled and over the year of the study were presented on two occasions to each of the three observers. This permitted an assessment of the test–retest or intra-rater reliability of each observer. The three observers disagreed with themselves on 7, 8 and 9 occasions, respectively, out of the 29 specimens. Overall, inter-rater agreement was calculated at 76% or 54%, depending on the diagnostic classification system used by the observers. Studies of clinical judgment often find low levels of agreement between raters (see, for example, Doyle & Thomas 2000, Thomas et al 1991).
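One simple way of expressing this kind of agreement computationally is as the percentage of cases on which two raters make the same judgment. The Python sketch below uses invented ratings rather than the Coppleson et al (1970) data.

```python
# Hypothetical categorical judgments by two raters on the same six specimens
rater_a = ["positive", "negative", "positive", "positive", "negative", "positive"]
rater_b = ["positive", "negative", "negative", "positive", "negative", "positive"]

# Count the specimens on which the two raters agree
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)
print(f"inter-rater agreement = {percent_agreement:.0f}%")   # 5 of the 6 judgments agree
```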

  Page 128 

Internal consistency

Measurement tools will often consist of multiple items. For example, a test of your knowledge of research methods might include 50 items or questions. Similarly, a checklist designed to measure activities of daily living might have 20 items. The internal consistency of a test is the extent to which the results on the different items correlate with each other. If they tend to be highly correlated with each other, then the test is said to be internally consistent. Internal consistency is also measured by a form of correlation coefficient known as Cronbach’s alpha, and an alpha above 0.8 is considered to be a desirable property for a test.
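The following Python sketch shows one common way of calculating Cronbach’s alpha from a small matrix of item scores; the six respondents and four items are entirely hypothetical.

```python
import numpy as np

# Hypothetical responses of six people to a four-item scale (rows = people, columns = items)
items = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
])

k = items.shape[1]                                # number of items
item_variances = items.var(axis=0, ddof=1)        # variance of each item across respondents
total_variance = items.sum(axis=1).var(ddof=1)    # variance of respondents' total scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")          # values above about 0.8 are considered desirable
```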

Thus, the reliability or reproducibility of an assessment or test can be determined in several different ways including the test–retest, intra-rater, inter-rater and internal consistency methods.

Validity

Validity is concerned with the accuracy of the test procedure. Obtaining the same result upon repeated administrations, or agreement among independent observers, does not mean that the results are accurate. If I jump on the bathroom scales and get a reading of 40 kg, jump off, and then get back on and find the scales still reading 40 kg, this reading is certainly reliable, but it is obviously in error (for readers who do not know us, 40 kg as an observed score for our weight entails major measurement error!). Thus the adequate reproducibility or reliability of a test or assessment process is essential, but we also need the results to be accurate or valid.

A case study in clinical test validity

The early detection of breast cancer in women has been recognized as an important public health initiative in many countries. Common ways of detecting suspicious lumps include breast self-examination and mammography, an X-ray of the breasts. Mammography is a common screening procedure and some countries such as Australia have funded large-scale programmes to promote it.

However, commendable as these initiatives may be, there are some doubts about the validity of mammography as a diagnostic tool. Walker & Langlands (1986), in their classic research project, studied the mammography results of 218 women, who, through the use of a diagnostic biopsy, were known to have breast cancer. Of the 218 women with cancer, 95 (43.6%) had recorded a (false) negative mammography test result. Of these patients, 47 had delayed further investigation and treatment for almost a year, no doubt relieved and reassured by their ‘favourable’ test results. The delays in treatment, given what we know about the relationship between early intervention and improved prognosis, in all likelihood seriously compromised the health and ultimate survival of these women. In this instance, the accuracy (or lack of it) of the test results has very important consequences for the people concerned. Test quality is of profound importance in research and clinical practice, as is demonstrated by this example.

Types of test validity

As with reliability, test validity may be assessed in a number of different ways. These include content validity, sensitivity and specificity, and predictive validity.

Content or face validity

In many contexts it is difficult to find external measures to correlate with the measure to be validated. For example, an examination in a particular academic subject may be the sole measure of the student’s performance available to determine grades. How can it be determined whether the tests administered will be valid or not? One way is to write down all the material covered in the subject and then make sure that there is adequate sampling from the overall content of the material delivered in the subject. If this criterion is satisfied it can be said that the test has content or face validity. You may have had an experience where you felt that a subject assessment task had low content validity in that it did not reflect the material presented in the subject.

  Page 129 

Sensitivity and specificity

The concepts of sensitivity and specificity are most commonly applied to diagnostic tests, where the purpose of the test is to determine whether the patient has a particular problem or illness. There are four possible outcomes for a test result, as shown in Table 12.2.

Table 12.2 Possible outcomes of test results

                      Real situation
Test result           Disease present     Disease not present
Disease present       True positive       False positive
Disease not present   False negative      True negative

Sensitivity refers to the proportion of people who really have the disease who test positive (i.e. the proportion of true positives out of all people with the disease). Specificity refers to the proportion of individuals who really do not have the disease who test negative (i.e. the proportion of true negatives out of all people without the disease). If a diagnostic test has a sensitivity of 1.0 and specificity of 1.0 it is a perfectly accurate or valid test. Most clinical tests are not perfect and some have unknown or quite low sensitivity and specificity.
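As a worked illustration, the Python sketch below calculates sensitivity and specificity from the four cell counts of a table laid out as in Table 12.2. The counts are invented for illustration and do not come from any actual screening study.

```python
# Hypothetical 2 x 2 counts corresponding to the cells of Table 12.2
true_positive = 80     # test positive, disease present
false_positive = 20    # test positive, disease not present
false_negative = 20    # test negative, disease present
true_negative = 180    # test negative, disease not present

# Sensitivity: proportion of people with the disease whom the test correctly identifies
sensitivity = true_positive / (true_positive + false_negative)

# Specificity: proportion of people without the disease whom the test correctly clears
specificity = true_negative / (true_negative + false_positive)

print(f"sensitivity = {sensitivity:.2f}")   # 80 / 100 = 0.80
print(f"specificity = {specificity:.2f}")   # 180 / 200 = 0.90
```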

Predictive validity

Predictive validity is concerned with the ability of a test to predict future values of the same or other measures. Some tests are designed to assist with prognostic decisions, i.e. judgments about what is going to happen in the future, and for these tests high predictive validity is an important quality.

Let us examine an example of predictive validity. Suppose a researcher devised a screening rating scale, X, for selecting patients to participate in a rehabilitation programme, and that the effectiveness of the rehabilitation programme is assessed with rating scale Y. Each rating scale involves assigning scores of 1–10 to the patients’ performance. Table 12.3 illustrates two outcomes: low predictive validity and high predictive validity.

Table 12.3 Predictive validity

[table not reproduced: scores on screening scale X and outcome scale Y for six patients, with columns illustrating low and high predictive validity]

Although the calculation of correlation coefficients is needed to examine quantitatively the predictive validity of test X, it can be seen in Table 12.3 that in the ‘Low predictive validity’ column the scores on X are not clearly related to the scores on Y. On the other hand, in the ‘High predictive validity’ column, scores on the two variables correspond quite closely. Allowing for the fact that only six subjects were involved in this hypothetical study, it is clear that only the results in the ‘High predictive validity’ column are consistent with rating scale X being useful for predicting the outcome of the rehabilitation programme, as measured on scale Y.
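The following Python sketch illustrates the calculation; the screening and outcome scores are hypothetical values in the spirit of Table 12.3 rather than the actual figures from that table.

```python
import numpy as np

# Hypothetical scores for six patients (scales X and Y both range from 1 to 10)
screening_x  = np.array([2, 4, 5, 6, 8, 9])   # screening rating scale X
outcome_low  = np.array([7, 3, 9, 2, 5, 4])   # outcome scale Y: low predictive validity case
outcome_high = np.array([3, 4, 6, 6, 8, 9])   # outcome scale Y: high predictive validity case

# Predictive validity expressed as the correlation between X and the later outcome Y
for label, outcome_y in [("low", outcome_low), ("high", outcome_high)]:
    r = np.corrcoef(screening_x, outcome_y)[0, 1]
    print(f"{label} predictive validity case: r = {r:.2f}")
```

Only in the second case does scale X give a useful indication of how patients subsequently score on scale Y.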

At this point we should again refer to the concepts of internal and external validity. The concepts of predictive and content validity apply to the specific tests and measures a researcher or clinician uses. Internal and external validity refer to characteristics of the total research project or programme. Test validity should not be confused with other forms of research design validity such as external and internal validity.

External validity is concerned with the researcher’s ability to generalize her or his findings to other samples and settings, i.e. the generalizability of the study findings. The external validity or generalizability of a study is affected by the sample size, the method of sampling, and the design characteristics and measures used in the study. If we say a study has high external validity, we mean that its findings generalize to other settings and samples outside the study. It does not make sense to talk about the external validity of a particular test.

Similarly, internal validity is concerned with the design characteristics of experimental studies. If a study is internally valid, any effects/changes or lack thereof in the dependent (outcome) variable can be directly attributed to the manipulation of the independent variable. It is important not to confuse the meanings of these terms.

Standardized measures and tests

Because reliability and validity of measures are so important, many researchers have devoted considerable time and energy to the development of measuring instruments and procedures that have known levels of reliability and validity.

The development of measurement standards for physical dimensions such as weight, length and time has been fundamental for the growth of science. That is, we have standards for comparing our measurements of a variable and we can meaningfully communicate our findings to colleagues living anywhere in the world. There are a variety of clinical measures, for instance the Apgar tests for evaluating the viability of neonates, that represent internationally recognized standards for communicating information about attributes of persons or disease entities.

Furthermore, there are standards relevant to populations, in terms of which assessments of individuals become meaningful. For instance, there are standards for the stages of development of infants: levels of physical, emotional, intellectual and social development occurring as a function of age.

Some tests have been trialled on large samples, and their reliability and validity levels recorded. Tests that have been trialled in this way are known as standardized measures or tests. A large variety is available, particularly in the clinical and social areas. Bowling’s Measuring Health (1997) is a very useful source of descriptions of such tests. Many US firms and cooperatives market standardized tests. However, many researchers use tests and measures that have not been standardized, and do not report reliability levels in their publications. This is of particular concern in studies where subjective measures with incomplete operational definitions are employed.

Measurement scale types

Measurement can produce different types of numbers, in the sense that some numbers are assigned different meanings and implications from others. For instance, when we speak of Ward 1 and Ward 2, we are using numbers in a different sense from when we speak of infant A being 1 month old, and infant B being 2 months old. In the first instance, we used numbers for naming; in the second instance the numbers indicate quantities. There are four scale types, distinguished by the types of numbers produced by the measurement of a specific variable.

  Page 131 

1. Nominal scales

The ‘lowest’ level of the measurement scale types is the nominal scale, where the measurement of a variable involves the naming or categorization of possible values of the variable. The measurements produced are ‘qualitative’ in the sense that the categories are merely different from each other. If numbers are assigned to the categories they are merely labels and do not represent real quantities; for example, Ward 1 and Ward 2 might be renamed St. Agatha’s Ward and St. Martha’s Ward without conveying any less information. Table 12.4 shows some other examples of nominal scaling.

Table 12.4 Some examples of nominal scaling

Variable Possible values
Patients’ admission numbers 3085001, 3085002
Sex Male, female
Religion Catholic, Protestant, Jewish, Muslim, Hindu
Psychiatric diagnosis Manic-depressive, schizophrenic, neurotic
Blood type A, B, AB, O
Cause of death Cardiac failure, neoplasm, trauma

The only mathematical relationship pertinent to nominal scales is equivalence or non-equivalence; that is, A = B or A ≠ B. A specific value of a variable either falls into a specific category, or it does not. Thus, there is no logical relationship between the numerical value assigned to a category and the size, quantity or frequency of occurrence of that category. The arbitrary values of a nominal scale can be changed without any loss of information.

2. Ordinal scales

The next level of measurement involves rank ordering values of a variable. For example, 1st, 2nd or 3rd in a foot race are values on an ordinal scale. The numbers assigned on an ordinal scale signify order or rank.

With ordinal scales, statements about ranks can be made. Where A and B are values of the variable, we can say A > B or B > A. For instance, we can say Mrs Smith is more cooperative than Mr Jones (A > B), or Mr Jones is more cooperative than Ms Krax (B > C). We cannot, however, make any statements about the relative sizes of these differences. Examples of ordinal scales are shown in Table 12.5.

Table 12.5 Some examples of ordinal scales

Variable Possible values
Severity of condition Mild=1, moderate=2, severe=3, critical=4
Patients’ satisfaction with treatment Satisfied=1, undecided=2, dissatisfied=3
Age group Baby, infant, child, adult, geriatric
Cooperativeness with nurse or patients in a ward (In decreasing order) Mrs Smith, Mr Jones, Ms Krax

3. Interval scales

Examples of interval scales are shown in Table 12.6. For these scales, there is no absolute zero point; rather, an arbitrary zero point is assigned. For instance, 0°C does not represent the point at which there is no heat, but the freezing point of water. An IQ of zero would not mean no intelligence at all, but a serious intellectual or perceptual problem in using the materials of the test.

Table 12.6 Some examples of interval scales

Variable Possible values
Temperature (Celsius or Fahrenheit) −10°C, +20°C, +5°C, +10°F
Intelligence (IQ) 45, 100, 185
  Page 132 

The use of an interval scale enables identification of equal intervals between any two values of measurements: we can say A − B = B − C. For example, if A, B and C are taken as IQ scores, and A = 150, B = 100, and C = 50, then it is true that A − B = B − C. However, we cannot say that A = 3C (that A is three times as intelligent as C).

4. Ratio scales

Ratio scales have what is called a meaningful or non-arbitrary zero point. For example, on the Kelvin temperature scale, 0 K (absolute zero) represents an absence of heat, in that the molecules have stopped vibrating completely, whereas 0°C is simply the freezing point of water. The Centigrade or Celsius zero is an arbitrary one, tied to the freezing point of a particular compound. Thus the Kelvin scale is a ratio scale and the Celsius scale is an interval scale. Examples of variables measured on ratio scales are shown in Table 12.7.

Table 12.7 Variables measured on ratio scales

Variable Possible values
Weight 10 kg, 20 kg, 100 kg
Height 50 cm, 150 cm, 200 cm
Blood pressure 110 mmHg, 120 mmHg, 160 mmHg
Heart beats 10 per minute, 30 per minute, 50 per minute
Rate of firing of a neurone 10 per second, 20 per second, 30 per second
Protein per blood volume 2 mg/cc, 5 mg/cc, 10 mg/cc
Vocabulary 100 words, 1000 words, 30 000 words
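
A short worked example may make the ratio/interval distinction more concrete. The Python sketch below converts two Celsius temperatures to kelvins and shows why a statement such as ‘20°C is twice as hot as 10°C’ is not meaningful, whereas the corresponding ratio on the Kelvin scale is physically interpretable.

```python
# Ratio statements require a true zero: an illustrative temperature example
celsius_a, celsius_b = 10.0, 20.0

kelvin_a = celsius_a + 273.15   # convert to the Kelvin scale, whose zero is absolute zero
kelvin_b = celsius_b + 273.15

print(celsius_b / celsius_a)    # 2.0, but 20 degrees C is not 'twice as hot' as 10 degrees C
print(kelvin_b / kelvin_a)      # roughly 1.04: the physically meaningful ratio
```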

Table 12.8 compares the characteristics of different scales or levels of measurement.

Table 12.8 Characteristics of levels of measurement

Scale      Distinctiveness   Ordering of magnitude   Equal intervals   Absolute zero
Nominal    Yes               No                      No                No
Ordinal    Yes               Yes                     No                No
Interval   Yes               Yes                     Yes               No
Ratio      Yes               Yes                     Yes               Yes

Interval and ratio scales represent quantitative measurements. A ratio scale is the ‘highest’ scale of measurement, in the sense that it involves all the characteristics of the other scales as well as having an absolute zero. A measurement on a higher level can be transformed into one on a lower level, but not vice versa, because a higher-level measurement contains more information and permits more mathematical operations than one on a lower level.

Also, a given variable might be measured on one of several types of scales, depending on the needs of the investigator. Consider, for instance, the variable ‘height’. This variable could be measured on any of the four scales, as follows (a short code sketch after the list illustrates the same four recodings):

1. Ratio scale. The height of individuals above the ground, for example, 180 cm.
2. Interval scale. The height of individuals above an arbitrary surface; for example 100 cm above the surface of a bench.
3. Ordinal scale. The comparative heights of individuals, for example, rank-ordered from tallest to shortest.
4. Nominal scale. Categorizing individuals as, for example, ‘normal’ or ‘abnormal’.
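
The sketch below expresses the same four recodings of ‘height’ in Python. The bench height, the comparison sample and the ‘normal’ range are arbitrary values chosen only to illustrate the idea.

```python
# Recoding one underlying height measurement into the four scale types (arbitrary cut-offs)
height_cm = 180.0                                  # ratio: true zero at the ground

bench_height_cm = 80.0
height_above_bench = height_cm - bench_height_cm   # interval: zero point (the bench) is arbitrary

sample = [172.0, 180.0, 165.0, 191.0]
rank_tallest_first = sorted(sample, reverse=True).index(height_cm) + 1   # ordinal: rank only

category = "normal" if 150.0 <= height_cm <= 200.0 else "abnormal"       # nominal: category only

print(height_cm, height_above_bench, rank_tallest_first, category)
```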

The different types of measurement scales are important when considering statistical analysis of data. Statistics are numbers with special properties that are calculated from data. The type of measurement scale determines the type of statistic that is appropriate for the analysis. This issue is taken up later in this book.

  Page 133 

Summary

This chapter has outlined good measurement practice and its importance in research and clinical practice.

In good measurement practice we need to define concepts operationally, so that other investigators can also carry out or assess the measurement procedure. We also need to establish the reliability and validity of our measurements. A high degree of reliability and validity is necessary for minimizing measurement error. We have noted that measures involving the exercise of human judgment (i.e. ‘subjective’ measures) are not necessarily unreliable or invalid.

Four different scale types were discussed: nominal, ordinal, interval, and ratio. These scales have different characteristics, particularly in relation to the permissible mathematical operations. In subsequent sections, we shall see that the scale type involved in our measurements determines the descriptive and inferential statistics appropriate for describing and analysing the data.

Self-assessment

Explain the meaning of the following terms:

content validity
inter-observer reliability
interval scale
measurement
nominal scale
objective measures
operational definition
ordinal scale
predictive validity
ratio scale
standardized measures
standardized tests
subjective measures
test–retest reliability

True or false

1. The term ‘measurement’ refers to the assignment of qualities or quantities to specific aspects of objects or events.
2. An operational definition of a variable entails an explicit statement of how the variable is to be measured.
3. An objective measure is produced by the use of a measuring instrument.
4. A reliable test or measure will tend to produce accurate results.
5. A test–retest reliability as indicated by a correlation coefficient of 0.9 indicates a rather low reliability.
6. When an instrument is valid, we mean that it is measuring the characteristic which it is supposed to be measuring.
7. To establish the predictive validity of a test, we correlate the scores of individuals on the test with scores on other relevant measures.
8. A high correlation for scores between a college entrance examination and the subsequent examination results of students indicates that the entrance examination lacks predictive validity.
9. One of the advantages of using a standardized test is that its validity and reliability are already known.
10. Ordinal measures involve rank-ordering the values of a variable.
11. An interval scale has an absolute zero.
12. All arithmetical operations are permissible with measurements based on interval scales.
13. The levels of measurement which have the properties of distinctiveness, ordering of magnitude and equal intervals are the ordinal and nominal scales.
14. The statement ‘Anxiety is a feeling of impending injury’ is an example of an operational definition.
15. ‘Operationalization’ is defined as a statement specifying how a variable should be measured.
16. For a general concept like ‘intelligence’ only one operational definition is possible.
17. Subjective measures are necessarily unreliable.
18. Objective measures can be invalid and unreliable.
19. A correlation coefficient of 1.0 indicates an excellent test–retest reliability.
20. If a measurement is valid, then it is necessarily reliable.
21. A test may be highly reliable but invalid.
22. A low inter-observer reliability implies that the observed scores for a set of subjects on repeated tests tend to be unrelated to one another.
23. High predictive validity necessarily implies high content validity.
  Page 134 
24. With a nominal scale, we can only make statements about the distinctiveness of scores.
25. The mathematical statement A − B = B − C is in correct form, given an ordinal scale.
26. Nominal scales do not have the characteristic of ‘distinctiveness’.
27. The variable ‘motor functioning’ could be measured on either an ordinal or a nominal scale.
28. The variable ‘blood sugar level’ could be measured on either an ordinal or a ratio scale.
29. Ordinal scales are generally preferable to interval scales.
30. Statements such as y = zx can be made validly with ratio measures.

Multiple choice

1. Which of the following does not include an operational definition?
a Patients were encouraged to eat healthy food.
b Males under 60 who were currently in fulltime employment were the population for study.
c Anxiety was measured by the Spielberger Anxiety Scale.
d Patients who did not return for a scheduled follow-up appointment four weeks after initial treatment were classified as dropouts.
e Students who get over half the test items correct will be classified as having passed.
2. Subjective measures:
a are not operationally defined
b are always less valid than objective measures
c involve measuring physical attributes
d are not reliable because everybody’s subjective experience is different
e none of the above.
3. Which of the following statements does not include an operational definition of the dependent variable?
a In the present study, intelligence was measured on the Stanford Binet IQ test.
b In the present study, intelligence was measured in terms of the level of subjects’ knowledge of their cultures.
c In the present study, intelligence was measured by the number of hairs on the subjects’ heads.
d b and c
e a and c.
4. Objective measures:
a are always more valid and reliable than subjective measures
b involve extensive human intuition for interpretation
c are always more reliable, but not necessarily more valid, than subjective measures
d are used in experimental, but not in non-experimental, investigations
e involve the measurement of physical qualities and quantities using measuring equipment.
5. A test is assessed for its reliability and its predictive validity. Both these measures are expressed as correlation coefficients, with the reliability coefficient being 0.9 and the predictive validity coefficient being 0.2. This indicates that:
a the test is not reliable
b the test has face validity
c the test is reliable but does not appear to measure the variable of interest
d the test is reliable, so it must measure the variable of interest
e the test is a good one.
6. If the test–retest reliability of a measure is low, then it follows that:
a the scores for different people tend to be different
b the validity must be high
c the scores for different people tend to be the same
d the same person measured twice tends to produce different results.
7. If a test is valid then:
a it might be reliable
b it must be reliable
c the reliability is unaffected
d it must be unreliable.
8. The reading ‘64 kilograms’ is a value on a(n):
a ratio scale
b interval scale
c ordinal scale
d nominal scale.
9. In a study of weight problems in a sample of pre-adolescent children, the relevant variable was expressed as ‘percentage overweight’ or ‘underweight’, given the child’s height. This is an example of a(n):
a ordinal scale
b ratio scale
c nominal scale
d interval scale.
  Page 135 
10. The gender of patients is an example of a(n):
a ratio scale
b nominal scale
c ordinal scale
d interval scale.
11. Which of the following variables has been labelled with an incorrect measuring scale?
a The number of heart beats per minute: interval.
b Platform numbers at a railway station: nominal.
c Finishing order in a horse race: ordinal.
d Self-rating of anxiety levels on a five-point scale: ordinal.
12. ‘10th’ is a value on a(n):
a ratio scale
b interval scale
c ordinal scale
d nominal scale.
13. Response delay in milliseconds is an example of a(n):
a ratio scale
b ordinal scale
c interval scale
d nominal scale.
14. In a patient records system, patients are randomly assigned a unique identification number. These numbers represent a(n):
a nominal scale
b ratio scale
c interval scale
d ordinal scale.

Therapists assess levels of clients’ ‘independence’ using the following scale:

0: Totally dependent on assistance from other/s for the activity.
1: Maximum assistance from other/s; can assist in a limited way.
2: Minimum assistance from one person; contributes significantly in carrying out the activity.
3: Supervised by another person due to mental/physical limitations and/or to ensure safety.
4: Independent with aids. Safe and consistent; would need assistance/supervision without aids.
5: Independent. Safe and consistent without aids, supervision or assistance.

Questions 15–17 refer to this scale.
15. The above scale is:
a ratio
b interval
c nominal
d ordinal.
16. If the subjects are assigned scores by the therapist’s clinical judgment, then the measurement process is:
a subjective
b objective
c unreliable
d invalid
e a and b.
17. Given the following three interval scale independence scores for three clients, A, B and C:
A: 4
B: 2
C: 0
which of the following statements is (are) true?
a Client A is twice as independent as B.
b The difference in independence between clients A and B is the same as that between clients B and C.
c Client A is more independent than either B or C.
d b and c.
18. Which of the following measures of the variable ‘weight’ is nominal?
a Weight in kg.
b Weight as percentage overweight in relation to ‘healthy’ weight.
c Weight as obese/overweight/normal/underweight/grossly underweight.
d Weight as ‘normal against pathological’ (obese or grossly underweight).
19. If a variable is not defined operationally, then:
a the investigation of the variable might be difficult to replicate
b it might be difficult to explain how the variable was measured
c a and b
d neither a nor b.
20. The terms ‘external’ and ‘internal’ validity refer to:
a complete investigations
b specific measurements
c measurement scales, as a whole
d characteristics of standardized, objective measures.
  Page 136