Page 263 

Chapter Twenty-Three Critical evaluation of published research

CHAPTER CONTENTS

Introduction
Critical evaluation of the introduction
Adequacy of the literature review
Clearly defined aims or hypotheses
Selection of an appropriate research strategy
Selection of appropriate variables/information to be collected
Critical evaluation of the methods section
Research subjects/participants
Instruments/apparatus/tools
Procedure
Critical evaluation of the results
Critical evaluation of the discussion
Summary
Self-assessment
True or false
Multiple choice

Introduction

By the time a research report is published in a refereed journal, it has been critically scrutinized by several experts and, usually, changes have been made to the initial text by the author(s) to respond to the referees’ comments. Nevertheless, even this thorough evaluation procedure doesn’t necessarily guarantee the validity of the design or the conclusions presented in a published paper. Ultimately, you as a health professional must be responsible for judging the validity and relevance of published material to your own clinical activities. The evidence-based practice movement focuses on the ways in which practitioners can incorporate better procedures into their practice based upon well-founded research and evaluation evidence. The systematic review processes employed by bodies such as the Cochrane and Campbell Collaborations are intended to assist clinicians in the selection of interventions that are well proven (Ch. 24).

The proper attitude to take with published material, including systematic reviews, is hard-nosed scepticism, notwithstanding the authority of the source. This attitude is based on our understanding of the uncertain and provisional nature of scientific and professional knowledge. In addition, health researchers deal with the investigation of complex phenomena, where it is often impossible for ethical reasons to exercise the desired levels of control or to collect crucial information required to arrive at definitive conclusions. The aim of critical evaluation is to identify the strengths and weaknesses of a research publication, so as to ensure that patients receive assessment and treatment based on the best available evidence.


The aim of this chapter is to demonstrate how select concepts in research design, analysis and measurement can be applied to the critical evaluation of published research. The chapter is organized around the evaluation of specific sections of research publications.

The specific aims of this chapter are to:

1. Examine the criteria used for the critical evaluation of a research paper.
2. Discuss the implications of identifying problems in design and analysis in a given publication.
3. Outline briefly strategies for summarizing and analysing evidence from a set of papers.
4. Discuss the implications of critical evaluation of research for health care practices.

Critical evaluation of the introduction

The introduction of a paper essentially reflects the planning of the research. Inadequacies in this section might signal that the research project was erroneously conceived or poorly planned. The following issues are essential for evaluating this section.

Adequacy of the literature review

The literature review must be sufficiently complete to reflect the current state of knowledge in the area. Key papers should not be omitted, particularly when their results have direct relevance to the research hypotheses or aims. Researchers must be even-handed in presenting evidence that is unfavourable to their personal points of view. Systematic review procedures, such as those used by the Cochrane Collaboration, were developed precisely to avoid the biased inclusion or exclusion of work that supports or challenges a favoured point of view. A poor review of the literature can lead to the unfortunate situation of repeating earlier research, or of making mistakes that could have been avoided had the findings of previous work been incorporated into the formulation of the research design.

Clearly defined aims or hypotheses

As stated in Chapter 2, the aims or hypotheses of the research should be clearly stated. If this clarity in expression of the aims is lacking, then the rest of the paper will be compromised. In a quantitative research project, it is usual to see a statement of the hypotheses as well as the research aims. All research, whether qualitative or quantitative, should have a clear and recognizable statement of aim(s).

Selection of an appropriate research strategy

In formulating the aims of the investigation, the researcher must have taken into account the appropriate research strategy. For instance, if the demonstration of causal effects is required, a survey may be inappropriate for satisfying the aims of the research. If the purpose of the study is to explore the personal interpretations and meanings of participants then a qualitative strategy will be best. Some researchers now advocate mixed designs where multiple studies are performed to examine different perspectives of the same issues. Thus in a study of views concerning health practices, a focus group discussion may also be accompanied by a structured questionnaire even within the same study sample, so that the findings from each may be used to inform the total understanding of the research issue(s) under study.

Selection of appropriate variables/information to be collected

In a quantitative study, if the selection of the variables is inappropriate to the aims or questions being investigated, then the investigation will not produce useful results. Similarly, in a qualitative study, the information to be collected must be appropriate to the research aims and questions.


Critical evaluation of the methods section

A well-documented methods section is a necessary condition for understanding, evaluating and perhaps replicating a research project. In general, the critical evaluation of this section will allow a judgment of the validity of the investigation to be made.

Research subjects/participants

This section shows whether the study participants were representative of the intended target group or population, and indicates the adequacy of the sampling model used.

Sampling model used

In Chapter 3, we outlined a number of sampling models that can be employed to optimize the representativeness of a study sample. If the sampling model is inappropriate, then the sample might be unrepresentative, raising questions concerning the external validity of the research findings. In qualitative research, although the participant sampling method may be less formal than in a quantitative study, the issue of participant representativeness is still pertinent in terms of being able to apply the results more broadly.

Sample size/number of participants

Use of a small sample is not necessarily a fatal flaw in an investigation, provided the sample is representative. However, given a highly variable, heterogeneous population, a small sample will not be adequate to ensure representativeness (Ch. 3). A small sample size will also decrease the power of the statistical analysis in a quantitative study (Ch. 20). As discussed in the qualitative sampling section of this text, and unlike in quantitative sampling procedures, there is no widespread agreement among qualitative researchers about how many participants are needed in such studies.
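The link between sample size and statistical power can be illustrated with a small simulation. The sketch below is not from the chapter; the effect size, sample sizes and test (a two-sample z test on simulated normal data) are illustrative assumptions. It estimates, by Monte Carlo simulation, how often a true difference of half a standard deviation between group means is detected:

```python
import math
import random

def estimated_power(n, mean_diff, sd=1.0, trials=2000):
    """Monte Carlo estimate of the power of a two-tailed, two-sample
    z test (alpha = 0.05) to detect a true mean difference of `mean_diff`,
    with `n` participants per group drawn from normal populations."""
    crit = 1.96  # two-tailed critical z for alpha = 0.05
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0.0, sd) for _ in range(n)]
        b = [random.gauss(mean_diff, sd) for _ in range(n)]
        se = math.sqrt(sd**2 / n + sd**2 / n)
        z = (sum(b) / n - sum(a) / n) / se
        if abs(z) > crit:
            hits += 1
    return hits / trials

random.seed(1)
# A difference of 0.5 SD with only 10 participants per group...
small = estimated_power(10, 0.5)
# ...versus the same difference with 64 participants per group.
large = estimated_power(64, 0.5)
print(f"n=10 per group: power ~ {small:.2f}; n=64 per group: power ~ {large:.2f}")
```

In runs of this simulation the small sample typically misses the real difference (a Type II error), while the larger sample detects it far more reliably, which is why an underpowered study's negative result should be interpreted cautiously.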

Description of the study participants

A clear description of key participant characteristics (for example age, sex, type and severity of their condition) should be provided. When necessary and possible, demographic information concerning the population from which the participants have been drawn should be provided. If not, the reader cannot adequately judge the representativeness of the sample.

Instruments/apparatus/tools

The validity and reliability of observations and/or measurements are fundamental characteristics of good research. In this section, the investigator must demonstrate the adequacy of the tools used for the data collection.

Validity and reliability

The investigator should use standardized tools, or establish the validity and reliability of new tools used. A lack of proven validity and reliability will raise questions about the adequacy of the research findings.

Description of tools

A full description of the structure and use of novel tools should be presented so that they can be replicated by independent parties.

Procedure

A full description of how the investigation was carried out is required for both replication and for the evaluation of its internal and external validity. This requirement applies to both qualitative and quantitative studies.

Adequacy of the design

It was stated previously that a good design should minimize alternative conflicting interpretations of the data collected. For quantitative research aimed at studying causal relationships, poor design will result in uncontrolled influences by extraneous variables, muddying the identification of causal effects. In Section 3, we looked at a variety of threats to internal validity which must be considered when critically evaluating an investigation. In a qualitative study the theoretical approach taken in the study design or approach should be clearly stated.


Control groups

In quantitative research a common way of controlling for extraneous effects is the use of control groups (such as placebo, no treatment, conventional treatment). If control groups are not employed, then the internal validity of the investigation might be questioned. Also, if placebo or untreated groups are not present, the size of the effect due to the treatments might be difficult to estimate.

Subject assignment

When using an experimental design, care must be taken in the assignment of subjects so as to avoid significant initial differences between treatment groups. Even when quasi-experimental or natural comparison strategies are used, care must be taken to establish the equivalence of the groups.

Treatment parameters

It is important to describe all the treatments given to the different groups. If the treatments differ in intensity, or the administering personnel take different approaches, the internal validity of the project is threatened. The extent to which the intervention was delivered as intended is sometimes called treatment fidelity.

Rosenthal and Hawthorne effects

Whenever possible, intervention studies should use double- or single-blind procedures. If the participants, researchers or observers are aware of the aims and predicted outcomes of the investigation, then the validity of the investigation will be threatened through bias and expectancy effects. In qualitative research, it is very important that the research findings are not unduly influenced by the personal positions of the researchers in a way that obscures the meanings and interpretations of the research participants. Of course, the position of the researcher in any study, whether qualitative or quantitative, will to some extent influence the findings but this needs to be kept to a minimum.

Settings

The setting in which a study is carried out has implications for external (ecological) validity. An adequate description of the setting is necessary for evaluating the generalizability of the findings. The context of the investigation may have important effects on the study outcomes. Research conducted in the investigator’s lab or office may yield different results to the same work conducted in the field.

Times of treatments and observations

In intervention studies the sequence of any treatments and observations must be clearly indicated, so that issues such as series effects and confounding can be detected. Uncontrolled variability in treatment and observation times can compromise the internal validity of experimental, quasi-experimental or n = 1 designs.

Critical evaluation of the results

The results should represent a sound and, where appropriate, statistically correct summary and analysis of the data. Inadequacies in this section could indicate that inferences drawn by the investigator were erroneous.

Tables and graphs

Data should be correctly tabulated or drawn and adequately labelled for interpretation. Complete summaries of all the relevant findings should be presented.

Selection of statistics

Where appropriate both descriptive and inferential statistics must be selected according to specific rules. The selection of inappropriate statistics could distort the findings and lead to inappropriate inferences.
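As an illustration of matching the statistic to the data, the sketch below (the counts are hypothetical and not taken from any study discussed here) computes a Pearson chi-square statistic for a 2 × 2 table of frequencies, the kind of nominal-scale data for which a t test on means would be inappropriate:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2 x 2 table of counts:
        [[a, b],
         [c, d]]
    computed as the sum of (observed - expected)^2 / expected."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: treated vs control, improved vs not improved.
stat = chi_square_2x2(30, 10, 18, 22)
# Compare against the 5% critical value for 1 degree of freedom (3.84).
print(f"chi-square = {stat:.2f}, significant at 0.05: {stat > 3.84}")
```

Choosing a test for means here would distort the analysis; the chi-square test respects the frequency (nominal) nature of the data.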

Calculation of statistics

Clearly, both descriptive and inferential statistics must be correctly calculated. The use of computers generally ensures this, although some attention must be paid to gross errors when evaluating the data presented.


Methods of qualitative analysis

The methods chosen must complement the theoretical approach taken in the study and be performed according to the specified protocols.

Critical evaluation of the discussion

In the discussion, investigators draw inferences from the information or data they have collected in relation to the initial aims, questions, and/or hypotheses of the investigation. Unless the inferences are correctly made, the conclusions drawn might lead to useless and dangerous treatments being offered to clients.

Drawing correct inferences from the collected information/data

The inferences drawn from the collected information or data must take into account the limitations of the study and of the methods used to analyse it. In the quantitative domain we have seen, for instance in Chapter 16, that correlation does not necessarily imply causation, and that a lack of statistical significance could reflect a Type II error, that is, the incorrect missing of a real effect (see Ch. 20). In the qualitative domain, the findings must follow reasonably from the information collected in the investigation, according to the paradigm used.

Logically correct interpretations of the findings

Interpretations of the findings must follow from the information collected, without extraneous evidence being introduced. For instance, if the investigation used a single-participant design, the conclusions should not claim that a procedure is generally useful for the entire population.

Research protocol deviations

In interpreting the data or information collected in a study, the investigator must indicate, and take into account, unexpected deviations from the intended research protocols. For instance, in a quantitative study a placebo/active treatment code might be broken, or ‘contamination’ between control and experimental groups might be discovered. In a qualitative study, it could be that participants have conversed with each other about the research prior to one of the participants completing participation. If such deviations are discovered by investigators they are obliged to report these, so that the implications for the results might be taken into account.

Generalization from the findings

Strictly speaking, the data obtained from a given sample are generalizable only to the population from which the participants were drawn. This point is sometimes ignored by investigators and the findings are generalized to subjects or situations which were not considered in the original sampling plan. Qualitative researchers may vary in their willingness to claim generalizability of their findings outside the actual research participants but this must also be systematically considered.

Statistical and clinical significance

As was explained in Chapter 22, in quantitative studies, obtaining statistical significance does not necessarily imply that the results of an investigation are clinically applicable or useful. In deciding on clinical significance, factors such as the size of the effect, side effects and cost-effectiveness, as well as value judgments concerning outcome, must be considered.
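One common ingredient in judging the size of an effect is a standardized effect size such as Cohen's d. The sketch below uses hypothetical summary numbers (means, standard deviations and group sizes are illustrative assumptions, not data from the text):

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d: the difference between two group means expressed
    in units of the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical groups: means 50 vs 44 with a common SD of 12,
# i.e. a difference of half a standard deviation.
d = cohens_d(50, 44, 12, 12, 30, 30)
print(f"d = {d:.2f}")
```

By Cohen's conventional benchmarks a d of about 0.5 is a 'medium' effect, but as the text notes, whether such an effect is clinically worthwhile still depends on cost-effectiveness, side effects and value judgments about the outcome.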

Theoretical significance

It is necessary to relate the results of an investigation to previous relevant findings that have been identified in the literature review. Unless the results are logically related to the literature, the theoretical significance of the investigation remains unclear. The processes involved in comparing the findings of a set of related papers are introduced in the next subsection.

Table 23.1 summarizes some of the potential problems, and their implications, which might emerge in the critical evaluation of an investigation. A point which must be kept in mind is that, even where an investigation is flawed, useful knowledge might still be drawn from it. The aim of critical analysis is not to discredit or tear down published work, but to ensure that the reader understands its implications and limitations with respect to theory and practice.

Table 23.1 Checklist for evaluating published research

Problems which might be identified in a research article, and their possible implications:

1. Inadequate literature review: misrepresentation of the conceptual basis for the research.
2. Vague aims or hypotheses: research might lack direction; interpretation of evidence might be ambiguous.
3. Inappropriate research strategy: findings might not be relevant to the problem being investigated.
4. Inappropriate variables selected: measurements might not be related to the concepts being investigated.
5. Inadequate sampling method: sample might be biased; investigation could lack external validity.
6. Inadequate sample size: sample might be biased; statistical analysis might lack power.
7. Inadequate description of sample: application of findings to specific groups or individuals might be difficult.
8. Instruments lack validity or reliability: findings might represent measurement errors.
9. Inadequate design: investigation might lack internal validity, i.e. outcomes might be due to uncontrolled extraneous variables.
10. Lack of adequate control groups: investigation might lack internal validity; size of the effect difficult to estimate.
11. Biased subject assignment: investigation might lack internal validity.
12. Variations or lack of control of treatment parameters: investigation might lack internal validity.
13. Observer bias not controlled (Rosenthal effects): investigation might lack internal and external validity.
14. Subject expectations not controlled (Hawthorne effects): investigation might lack internal and external validity.
15. Research carried out in inappropriate setting: investigation might lack ecological validity.
16. Confounding of times at which observations and treatments are carried out: possible series effects; investigation might lack internal validity.
17. Inadequate presentation of descriptive statistics: the nature of the empirical findings might not be comprehensible.
18. Inappropriate statistics used to describe and/or analyse data: distortion of the decision process; false inferences might be drawn.
19. Erroneous calculation of statistics: false inferences might be drawn.
20. Drawing incorrect inferences from the data analysis (e.g. Type II error): false conclusions might be made concerning the outcome of an investigation.
21. Protocol deviations: investigation might lack external or internal validity.
22. Over-generalization of findings: external validity might be threatened.
23. Confusing statistical and clinical significance: treatments lacking clinical usefulness might be encouraged.
24. Findings not logically related to previous research findings: theoretical significance of the investigation remains doubtful.

Summary

The critical evaluation of published material at a level of detail suggested by this chapter can be a time-consuming, even pedantic, task. One undertakes such detailed analysis only when professional communications are of key importance, for example, when writing a formal literature review or when evaluating current evidence for adopting a new intervention or approach. Nevertheless, it is a necessary process for an in-depth understanding of the empirical and theoretical basis of your clinical practice.

Even when some problems are identified with a given research report, it is nevertheless likely that the report will provide some useful additional knowledge. Given the problems of generalization, an individual research project is usually insufficient for firmly deciding upon the truth of a hypothesis or the usefulness of a clinical intervention. Rather, as we will see in Chapter 24, the reader needs to scrutinize the range of relevant research and summarize the evidence using qualitative and quantitative review methods. In this way, individual research results can be evaluated in the context of the research area. Disagreements or controversies are ultimately useful for generating hypotheses for guiding new research and for advancing theory and practice.

Self-assessment

Explain the meaning of the following terms:

critical evaluation
protocol deviation

True or false

1. Critical analysis of a publication aims to identify the internal and external validity of the investigation.
2. If an investigation is published in a reputable journal by established investigators then the validity of the investigation can be taken for granted.
3. Random assignment of subjects to treatment groups ensures that the investigation uncovers causal effects.
4. The outcome of an investigation can be useful even with a small sample size.
5. If an investigation produces statistically significant results, its design must have been adequate.
6. Obtaining statistical significance in an investigation is a condition for the demonstration of the clinical significance of a quantitative study.
7. The replication of an investigation demonstrates the internal validity of the original investigation.
8. Without adequate controls the size of an effect might be difficult to estimate.
9. If a study is internally valid, the investigator is justified in generalizing the results to any other population.
10. Provided that the outcomes are statistically significant, it doesn’t matter which statistical tests were chosen to analyse the data.
11. If the design of an investigation is inadequate, none of the empirical findings are of scientific or clinical use.
12. Controversies in an area of science usually reflect the presence of fraudulently published evidence.
13. One of the problems with using human subjects for research is the expectations of the subjects concerning the purpose of the investigation.
14. Even poorly planned research can provide some useful results.
15. The application of the scientific method ensures the validity of a researcher’s conclusions.
16. Disagreements among researchers in an area are useful for generating new hypotheses.

Multiple choice

1. The aim of the critical analysis of a publication is to:
a identify the relevance of the results for clinical practice
b identify the internal and external validity of the investigation
c identify and attack incompetent researchers in one’s area of interest
d a and b.
2. If the internal validity of a study is adequate, then:
a the results will be statistically significant
b the results will be clinically useful
c the investigation may demonstrate causal effects
d a and b.
3. Say that an investigation has generated some interesting findings. However, you find that the investigators selected an inappropriate statistical test to analyse their findings. You should:
a regretfully discard the study as useless
b re-analyse the data from the descriptive statistics provided
c write to the investigators for their raw data, and re-analyse yourself
d b or c.
4. The reason one should evaluate the ‘literature’ as a whole is to:
a identify general patterns of findings in the area
b condense results from related papers into a single statistic
c identify and attempt to explain controversies in the area
d all of the above.
5. In judging the clinical significance of a well-designed investigation one should consider:
a the cost-effectiveness of the interventions
b the size of the therapeutic effects
c the possible undesirable side effects of the treatment
d all of the above.
An investigation was carried out in order to show that ‘prepared childbirth’ was an effective method for reducing pain during delivery. Ninety women attending a large hospital constituted the sample. Sixty of the women chose to participate in childbirth preparation, based on the Lamaze method, provided by trained instructors working at the hospital. This method encourages ‘natural’ (drug-free) childbirth through teaching physical and mental strategies for coping with pain or discomfort occurring during childbirth. The other 30 women chose not to attend the childbirth preparation programme. The level of pain experienced was assessed on the McGill Pain Questionnaire, which has been shown to be a valid and reliable interval scale for pain. It was administered following the childbirth. In addition the number of women seeking analgesia during childbirth was recorded as a measure of levels of discomfort experienced. The results for the investigation are shown in the table below.

Groups                                        Mean pain score    Number given medication
Women with no training (n = 30)               38                 24
Women with childbirth preparation (n = 60)    32                 49

(The difference in mean pain scores was statistically significant at α = 0.05; the difference in the number given medication was not statistically significant at α = 0.05.)

Questions 6–14 refer to the above investigation.
6. The strategy for the investigation is best described as:
a an experiment
b a quasi-experiment
c a correlational study
d an n = 90 design.
7. One of the problems with the above investigation was that:
a the subjects could not be randomly assigned to treatment groups
b the dependent variable was irrelevant to the aims
c basic ethical issues were not considered
d the instructors teaching the Lamaze method were incompetent.
8. From the information given above, it is clear that the investigators controlled for:
a Hawthorne effects
b Rosenthal effects
c subject assignment
d none of the above.
9. If you wanted to calculate the proportion of women with no training who had greater McGill pain scores than women with childbirth preparation, then the required statistics are:
a the distribution of t for n = 98
b the normal distribution
c the indices for reliability and validity
d the standard deviations for the two groups.
10. Which of the following statistical tests is most appropriate for analysing the significance of the data for the McGill pain scores?
a Mann–Whitney U
b Sign test
c z test for two means
d χ2 test.
11. Which of the following statistical tests is most appropriate for analysing the significance of the data for women requiring medication?
a Mann–Whitney U
b Sign test
c t test for two means
d χ2 test.
12. The lack of statistical significance for the data on medication implies that:
a the power for the test may have been too low
b equal sample sizes should have been used
c training has no effect
d both a and c.
13. The outcome of this investigation can be generalized to:
a women having children and undergoing Lamaze training
b women having children without Lamaze training
c women who choose the type of childbirth they undergo
d none of the above groups.
14. Considering the evidence provided, one concludes that:
a prepared childbirth is a waste of time
b there is evidence that Lamaze preparation at this hospital results in statistically significant reductions in pain during delivery
c women undergoing childbirth find Lamaze preparation useless at this hospital
d a and c.