Chapter Twenty-Three: Critical evaluation of published research
By the time a research report is published in a refereed journal, it has been critically scrutinized by several experts and, usually, changes have been made to the initial text by the author(s) to respond to the referees’ comments. Nevertheless, even this thorough evaluation procedure doesn’t necessarily guarantee the validity of the design or the conclusions presented in a published paper. Ultimately, you as a health professional must be responsible for judging the validity and relevance of published material to your own clinical activities. The evidence-based practice movement focuses on the ways in which practitioners can incorporate better procedures into their practice based upon well-founded research and evaluation evidence. The systematic review processes employed by bodies such as the Cochrane and Campbell Collaborations are intended to assist clinicians in the selection of interventions that are well proven (Ch. 24).
The proper attitude to take with published material, including systematic reviews, is hard-nosed scepticism, notwithstanding the authority of the source. This attitude is based on our understanding of the uncertain and provisional nature of scientific and professional knowledge. In addition, health researchers deal with the investigation of complex phenomena, where it is often impossible for ethical reasons to exercise the desired levels of control or to collect crucial information required to arrive at definitive conclusions. The aim of critical evaluation is to identify the strengths and weaknesses of a research publication, so as to ensure that patients receive assessment and treatment based on the best available evidence.
The aim of this chapter is to demonstrate how select concepts in research design, analysis and measurement can be applied to the critical evaluation of published research. The chapter is organized around the evaluation of specific sections of research publications.
The introduction of a paper essentially reflects the planning of the research. Inadequacies in this section might signal that the research project was erroneously conceived or poorly planned. The following issues are essential for evaluating this section.
The literature review must be sufficiently complete to reflect the current state of knowledge in the area. Key papers should not be omitted, particularly when their results have direct relevance to the research hypotheses or aims. Researchers must also be even-handed in presenting evidence that is unfavourable to their personal points of view. This is one reason for systematic review procedures, such as those used by the Cochrane Collaboration, which are designed to prevent the biased inclusion or exclusion of work that supports or challenges a favoured point of view. A poor review of the literature can lead to the unfortunate situation of repeating research, or making mistakes, that could have been avoided had the findings of previous work been incorporated into the formulation of the research design.
As stated in Chapter 2, the aims or hypotheses of the research should be clearly stated. If this clarity in expression of the aims is lacking, then the rest of the paper will be compromised. In a quantitative research project, it is usual to see a statement of the hypotheses as well as the research aims. All research, whether qualitative or quantitative, should have a clear and recognizable statement of aim(s).
In formulating the aims of the investigation, the researcher must have taken into account the appropriate research strategy. For instance, if the demonstration of causal effects is required, a survey may be inappropriate for satisfying the aims of the research. If the purpose of the study is to explore the personal interpretations and meanings of participants then a qualitative strategy will be best. Some researchers now advocate mixed designs where multiple studies are performed to examine different perspectives of the same issues. Thus in a study of views concerning health practices, a focus group discussion may also be accompanied by a structured questionnaire even within the same study sample, so that the findings from each may be used to inform the total understanding of the research issue(s) under study.
In a quantitative study, if the selection of the variables is inappropriate to the aims or questions being investigated, then the investigation will not produce useful results. Similarly, in a qualitative study, the information to be collected must be appropriate to the research aims and questions.
A well-documented methods section is a necessary condition for understanding, evaluating and perhaps replicating a research project. In general, the critical evaluation of this section will allow a judgment of the validity of the investigation to be made.
This section shows whether the study participants were representative of the intended target group or population, and whether the sampling model used was adequate.
In Chapter 3, we outlined a number of sampling models that can be employed to optimize the representativeness of a study sample. If the sampling model is inappropriate, then the sample might be unrepresentative, raising questions concerning the external validity of the research findings. In qualitative research, although the participant sampling method may be less formal than in a quantitative study, the issue of participant representativeness is still pertinent in terms of being able to apply the results more broadly.
Use of a small sample is not necessarily a fatal flaw of an investigation, provided the sample is representative. However, given a highly variable, heterogeneous population, a small sample will not be adequate to ensure representativeness (Ch. 3). A small sample size can also reduce the power of the statistical analysis in a quantitative study (Ch. 20). As discussed in the qualitative sampling section of this text, and unlike in quantitative sampling procedures, there is no widespread agreement among qualitative researchers about how many participants such studies require.
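To make the relationship between sample size and power concrete, the following sketch (illustrative only; the normal-approximation formula and the assumed effect size of 0.5 are not taken from the text) estimates the power of a two-group comparison at several sample sizes.

```python
# A minimal sketch (assumptions: a standardized effect size of 0.5 and a
# two-sided test at alpha = 0.05) of how power rises with sample size,
# using a normal approximation for a two-group comparison.
from scipy.stats import norm

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for detecting a standardized difference
    (Cohen's d) between two independent groups of equal size."""
    se = (2 / n_per_group) ** 0.5      # standard error of the difference, in d units
    z_crit = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    return 1 - norm.cdf(z_crit - effect_size / se)

for n in (10, 30, 100):
    print(f"n = {n:3d} per group: power = {approx_power(0.5, n):.2f}")
# With n = 10 per group, power is roughly 0.2; reaching the conventional
# 0.8 target requires closer to 64 participants per group for an effect
# of this size.
```

The exact figures depend on the test used, but the general pattern holds: small samples give low power and so increase the risk of missing real effects.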
A clear description of key participant characteristics (for example age, sex, type and severity of their condition) should be provided. When necessary and possible, demographic information concerning the population from which the participants have been drawn should be provided. If not, the reader cannot adequately judge the representativeness of the sample.
The validity and reliability of observations and/or measurements are fundamental characteristics of good research. In this section, the investigator must demonstrate the adequacy of the tools used for the data collection.
A full description of how the investigation was carried out is required both for replication and for the evaluation of its internal and external validity. This requirement applies to both qualitative and quantitative studies.
It was stated previously that a good design should minimize alternative conflicting interpretations of the data collected. For quantitative research aimed at studying causal relationships, poor design will result in uncontrolled influences by extraneous variables, muddying the identification of causal effects. In Section 3, we looked at a variety of threats to internal validity which must be considered when critically evaluating an investigation. In a qualitative study the theoretical approach taken in the study design or approach should be clearly stated.
In quantitative research a common way of controlling for extraneous effects is the use of control groups (such as placebo, no treatment, conventional treatment). If control groups are not employed, then the internal validity of the investigation might be questioned. Also, if placebo or untreated groups are not present, the size of the effect due to the treatments might be difficult to estimate.
When using an experimental design, care must be taken in the assignment of subjects so as to avoid significant initial differences between treatment groups. Even when quasi-experimental or natural comparison strategies are used, care must be taken to establish the equivalence of the groups.
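As a simple illustration of random assignment (a hypothetical sketch; the participant identifiers and group sizes are invented), the fragment below allocates participants to two groups at random, which is the usual way of avoiding systematic initial differences.

```python
# Hypothetical example of simple random assignment to two groups.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 invented participant codes
random.seed(42)               # fixed seed so the allocation can be reproduced
random.shuffle(participants)  # put the participants in random order

half = len(participants) // 2
treatment_group = sorted(participants[:half])
control_group = sorted(participants[half:])

print("Treatment:", treatment_group)
print("Control:  ", control_group)
# Randomization makes systematic initial differences unlikely, but baseline
# equivalence of the groups should still be checked and reported.
```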
It is important to describe all the treatments given to the different groups. If the treatments differ in intensity, or the administering personnel take different approaches, the internal validity of the project is threatened. The degree to which the intervention is delivered as intended is sometimes called treatment fidelity.
Whenever possible, intervention studies should use double- or single-blind procedures. If the participants, researchers or observers are aware of the aims and predicted outcomes of the investigation, then the validity of the investigation will be threatened through bias and expectancy effects. In qualitative research, it is very important that the research findings are not unduly influenced by the personal positions of the researchers in a way that obscures the meanings and interpretations of the research participants. Of course, the position of the researcher in any study, whether qualitative or quantitative, will to some extent influence the findings but this needs to be kept to a minimum.
The setting in which a study is carried out has implications for external (ecological) validity. An adequate description of the setting is necessary for evaluating the generalizability of the findings. The context of the investigation may have important effects on the study outcomes. Research conducted in the investigator’s lab or office may yield different results to the same work conducted in the field.
In intervention studies the sequence of any treatments and observations must be clearly indicated, so that issues such as series and confounding effects can be detected. Unplanned variability in treatment and observation times can compromise the internal validity of experimental, quasi-experimental or n = 1 designs.
The results should represent a sound and, where appropriate, statistically correct summary and analysis of the data. Inadequacies in this section could indicate that inferences drawn by the investigator were erroneous.
Data should be correctly tabulated or graphed and adequately labelled for interpretation. Complete summaries of all the relevant findings should be presented.
Where appropriate, both descriptive and inferential statistics must be selected according to specific rules. The selection of inappropriate statistics could distort the findings and lead to inappropriate inferences.
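As a brief illustration of matching the statistic to the data (a hypothetical sketch with invented numbers, not an analysis from the text), a t-test suits the comparison of group means on a continuous measure, whereas a chi-square test suits the comparison of counts on a categorical measure.

```python
# Invented data illustrating how the type of variable guides the choice
# of inferential statistic.
import numpy as np
from scipy import stats

# Continuous outcome (e.g. a symptom score): compare group means with a t-test.
group_a = np.array([38, 42, 35, 40, 37, 41])
group_b = np.array([32, 30, 34, 29, 33, 31])
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Categorical outcome (e.g. improved yes/no): compare counts with a chi-square test.
counts = np.array([[18, 12],   # group A: improved, not improved
                   [25, 5]])   # group B: improved, not improved
chi2, chi_p, dof, expected = stats.chi2_contingency(counts)

print(f"t-test p = {t_p:.3f}; chi-square p = {chi_p:.3f}")
# Applying a t-test to the yes/no counts, or a chi-square test to the raw
# scores, would be an inappropriate selection of statistics.
```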
In the discussion, investigators draw inferences from the information or data they have collected in relation to the initial aims, questions, and/or hypotheses of the investigation. Unless the inferences are correctly made, the conclusions drawn might lead to useless and dangerous treatments being offered to clients.
The inferences drawn from the collected information or data must take into account the limitations of the study and of the methods used to analyse them. In the quantitative domain we have seen, for instance in Chapter 16, that correlation does not necessarily imply causation, and that a lack of statistical significance could reflect a Type II error, that is, the incorrect missing of a real trend or finding (see Ch. 20). In the qualitative domain, the findings must follow reasonably from the information collected in the investigation, according to the paradigm used.
Interpretations of the findings must follow from the information collected, without extraneous evidence being introduced. For instance, if the investigation used a single-participant design, the conclusions should not claim that a procedure is generally useful for the entire population.
In interpreting the data or information collected in a study, the investigator must indicate, and take into account, unexpected deviations from the intended research protocols. For instance, in a quantitative study a placebo/active treatment code might be broken, or ‘contamination’ between control and experimental groups might be discovered. In a qualitative study, participants might have conversed with each other about the research before all of them had completed their participation. If investigators discover such deviations they are obliged to report them, so that the implications for the results can be taken into account.
Strictly speaking, the data obtained from a given sample are generalizable only to the population from which the participants were drawn. This point is sometimes ignored by investigators and the findings are generalized to subjects or situations which were not considered in the original sampling plan. Qualitative researchers may vary in their willingness to claim generalizability of their findings outside the actual research participants but this must also be systematically considered.
As was explained in Chapter 22, in quantitative studies, obtaining statistical significance does not necessarily imply that the results of an investigation are clinically applicable or useful. In deciding on clinical significance, factors such as the size of the effect, side effects and cost-effectiveness, as well as value judgments concerning outcome, must be considered.
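The point can be made concrete with a small simulation (illustrative only; the data are generated, not taken from any study): with a large enough sample, a difference too small to matter clinically can still be statistically significant.

```python
# Simulated example: a 1-point difference on a 0-100 scale reaches
# statistical significance with very large groups, yet the effect size
# suggests it may have little clinical importance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000                                            # very large hypothetical groups
control = rng.normal(loc=50.0, scale=10.0, size=n)
treated = rng.normal(loc=51.0, scale=10.0, size=n)  # true difference of 1 point

t_stat, p_value = stats.ttest_ind(treated, control)
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = (treated.mean() - control.mean()) / pooled_sd

print(f"p = {p_value:.4g}, Cohen's d = {cohens_d:.2f}")
# Expect p far below 0.05 but d near 0.1: statistically significant,
# yet of doubtful clinical significance on its own.
```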
It is necessary to relate the results of an investigation to previous relevant findings that have been identified in the literature review. Unless the results are logically related to the literature, the theoretical significance of the investigation remains unclear. The processes involved in comparing the findings of a set of related papers are introduced in the next subsection.
Table 23.1 summarizes some of the potential problems, and their implications, which might emerge in the course of the critical evaluation of an investigation. A point which must be kept in mind is that, even where an investigation is flawed, useful knowledge might still be drawn from it. The aim of critical analysis is not to discredit or tear down published work, but to ensure that the reader understands its implications and limitations with respect to theory and practice.
Table 23.1 Checklist for evaluating published research
Problems which might be identified in a research article | Possible implications
---|---
Inadequate or biased review of the literature | Misrepresentation of the conceptual basis for the research
Aims or hypotheses vague or not stated | Research might lack direction; interpretation of evidence might be ambiguous
Inappropriate research strategy | Findings might not be relevant to the problem being investigated
Inappropriate selection of variables or information to be collected | Measurements might not be related to concepts being investigated
Inappropriate sampling model | Sample might be biased; investigation could lack external validity
Sample size too small | Sample might be biased; statistical analysis might lack power
Inadequate description of the sample | Application of findings to specific groups or individuals might be difficult
Measurements lacking validity or reliability | Findings might represent measurement errors
Inadequate control of extraneous variables | Investigation might lack internal validity; i.e. outcomes might be due to uncontrolled extraneous variables
Lack of adequate control groups | Investigation might lack internal validity; size of the effect difficult to estimate
Non-equivalent groups at the start of the study | Investigation might lack internal validity
Variation in the delivery of treatments (poor treatment fidelity) | Investigation might lack internal validity
Lack of blinding of participants, researchers or observers | Investigation might lack internal and external validity
Bias and expectancy effects | Investigation might lack internal and external validity
Inappropriate or poorly described setting | Investigation might lack ecological validity
Confounded sequencing of treatments and observations | Possible series effects; investigation might lack internal validity
Inadequate presentation of the data | The nature of the empirical findings might not be comprehensible
Inappropriate selection of statistics | Distortion of the decision process; false inferences might be drawn
Inferences that ignore the limitations of the study | False inferences might be drawn
Interpretations going beyond the information collected | False conclusions might be made concerning the outcome of an investigation
Unreported deviations from the research protocol | Investigation might lack external or internal validity
Overgeneralization of the findings | External validity might be threatened
Statistical significance confused with clinical significance | Treatments lacking clinical usefulness might be encouraged
Results not related to previous research findings | Theoretical significance of the investigation remains doubtful
The critical evaluation of published material at a level of detail suggested by this chapter can be a time-consuming, even pedantic, task. One undertakes such detailed analysis only when professional communications are of key importance, for example, when writing a formal literature review or when evaluating current evidence for adopting a new intervention or approach. Nevertheless, it is a necessary process for an in-depth understanding of the empirical and theoretical basis of your clinical practice.
Even when some problems are identified with a given research report, it is nevertheless likely that the report will provide some useful additional knowledge. Given the problems of generalization, an individual research project is usually insufficient for firmly deciding upon the truth of a hypothesis or the usefulness of a clinical intervention. Rather, as we will see in Chapter 24, the reader needs to scrutinize the range of relevant research and summarize the evidence using qualitative and quantitative review methods. In this way, individual research results can be evaluated in the context of the research area. Disagreements or controversies are ultimately useful for generating hypotheses for guiding new research and for advancing theory and practice.
Explain the meaning of the following terms:
Groups | Mean pain scores | Number given medication
---|---|---
Women with no training (n = 30) | 38 | 24
Women with childbirth preparation (n = 60) | 32 | 49

(The difference in mean pain scores was statistically significant at α = 0.05; the difference in the number given medication was not statistically significant at α = 0.05.)