Fig. 2.1 Levels of evidence hierarchy for therapeutic studies
Low-level evidence is more likely to be subject to bias. Bias is a systematic error that can make the results invalid. There are many kinds of bias, but important ones in orthopedics are selection bias, response bias, recall bias and bias due to confounding; these will be explained later on. Systematic error only results in bias when the inaccuracy affects the comparison groups unequally. In theory a well-conducted RCT should be free of bias because randomisation is used to assign patients to treatment groups, and this should result in groups that are balanced in all factors.
As the hierarchy of evidence is descended, bias becomes increasingly likely, and you need to be aware of this when you critically appraise articles.
Selection bias, sometimes referred to as sampling bias, is error due to an improper process of selecting the study population, i.e. the way subjects were identified, selected and included in the study.
Response bias, or loss-to-follow-up bias, can result in differences in characteristics (e.g. socio-demographic characteristics) between patients included in a study and those excluded, or between the comparison groups (e.g. cases and controls). For example, response to follow-up may depend on the socio-demographic characteristics of patients (sex, age, ethnicity, social class). Those who respond may be different from those who do not, leading to bias in the results. Response bias is common in case series and cross-sectional study types. Wherever possible, analysis of the demographics of non-responders should be carried out to determine whether they differ significantly from responders. In all study types response rates should be high, at the very least 70 %, to minimize this type of bias.
Another type of bias is recall bias. This is particularly common in cross-sectional and case-control studies. Patients may not be able to remember past events correctly. Wherever possible, any information obtained from patients should be verified from other sources such as patient records.
Confounding bias occurs when part of an observed relationship between 2 variables or factors involved in a disease is due to the action of a third, which is the actual factor responsible. Confounding arises because many aspects of behavior and health are related. Frequent confounders are gender, age, socio-economic status and co-morbidity. In RCTs randomisation ensures that potential confounding factors, known or unknown, are evenly distributed among the study groups. This is why this study type is highly regarded.
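To make the effect of confounding concrete, the short Python sketch below uses entirely hypothetical complication figures in which older patients are more likely to receive treatment A. The crude comparison suggests treatment A is worse, yet within each age stratum the rates are identical: the apparent difference is created by age alone.

```python
# Hypothetical example of confounding by age (all figures invented for illustration).
strata = {
    "young": {"A": (10, 100), "B": (30, 300)},   # (complications, patients)
    "old":   {"A": (90, 300), "B": (30, 100)},
}

# Crude (unstratified) complication rates: A appears worse than B
for group in ("A", "B"):
    events = sum(strata[s][group][0] for s in strata)
    total = sum(strata[s][group][1] for s in strata)
    print(f"Crude complication rate, treatment {group}: {events / total:.0%}")

# Stratified rates: within each age group the treatments are identical
for stratum in strata:
    for group in ("A", "B"):
        events, total = strata[stratum][group]
        print(f"{stratum} stratum, treatment {group}: {events / total:.0%}")
```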
The Process of Critical Appraisal
Critical appraisal (CA) to determine the validity of research findings is an established method used in evidence-based medicine (EBM). It is just one step in the process of EBM – the use of best evidence in making decisions about patient care. To facilitate CA, checklists that ask questions about the research have been developed, enabling the reader to judge its validity.
Critical appraisal checklists can be divided into generic and study type specific lists. For the novice reviewer a generic tool is appropriate until more experience is gained. When you are confident and able to identify the study type, you will be able to progress to using the study specific checklists described later in this chapter. The checklists in this chapter are generally from our own experience.
The Anatomy of a Scientific Manuscript
Manuscripts in orthopedic journals have a standard format as follows but with minor variations depending on the particular journal:
Abstract (structured or unstructured, with Medical Subject Heading (MeSH) keywords – keywords are sometimes placed at the end of the article before the references instead). MeSH keywords are used to describe precisely the content of journal articles.
Sponsorship/competing interests (usually on the title page).
Introduction.
Methods (or materials and patients, materials and methods, patients and methods).
Results.
Discussion.
Conclusion (sometimes absent).
References.
Appraisal for the New Reviewer
For those new to CA it may be best to start with a more general appraisal until confidence is gained. Read through the whole paper quickly first. Does it seem clearly written and easy to understand or does it appear that it has been rushed? You will probably find papers describing RCTs and meta-analyses the most structured because journals usually have guidelines as to how they should be formatted.
Next, you should be aware of the quality of the journal in which the research is published. This is partly measured by its impact factor (IF). Because it is based upon the number of citations of its papers it is not a fixed value but can vary from year to year.
For orthopedic journals an IF of 2.8 is regarded as high (e.g. JBJS (Br), recently renamed as The Bone & Joint Journal). For general medical journals it is much higher – the BMJ is currently about 16.
Basically, a journal is considered to be of good quality if it is peer reviewed – that is, each paper is reviewed by at least one expert in the subject area prior to acceptance for publication in the journal. In the higher-impact journals a paper is often reviewed independently by 3 experts, including a statistician if the work involves statistical tests.
To be accepted, a paper usually has to be approved by all reviewers although the editor does have the final decision, for example if one of the reviewers is doubtful.
Look also at the author names as given on the title page. You may be familiar with certain authors from your attendance at conferences etc. Related to the authors, is the institution – is it a center of excellence in orthopedics? This will give you more confidence in the validity of the research.
Conflict of interest is particularly important to look out for. This is usually at the bottom of the title page along with authors’ affiliations. The most common conflict of interest is that the authors have a financial affiliation with a company that manufactures the products used in the research. For example, many orthopedic surgeons are actively involved in the design of new implants and receive remuneration or gifts (e.g. holidays) for their involvement. This is an important part of the evolution of new devices for patient benefit, but it can lead to conscious or unconscious behavior that undermines the integrity and validity of research that involves such appliances.
When there is conflict of interest it is important that it has been recognized and dealt with. For example it might be stated that sponsors had no input into the protocol or conduct of the study. The reader must then decide whether any conflicts are important and might have influenced the validity of the study findings.
After reading through the whole paper, look in detail at each section as follows:
Introduction
What were the aims of the study? Look for this in the introduction or discussion (where it is often reiterated). It may be stated as a formal hypothesis (the null hypothesis) that the study aims to reject, for example “there is no difference in outcome between patients in the two treatment groups”. More usually it is stated as a general research question such as “the purpose of this study was to determine if treatment A is more effective than treatment B”.
Papers that do not have a clearly focused research question may be data dredging i.e. performing multiple statistical tests on the resulting data to see if anything of significance surfaces. This is bad science.
Methods
How were the patients selected for the study? Remember selection/sampling bias here. What were the exclusion criteria?
Are the details of statistical analysis described and appropriate? If so, what types of tests were used (e.g. t-test, Pearson’s) and were they the most appropriate for the data types? For continuous data, were efforts made to check the data for normality? If the data were normal then a parametric test should have been used; if the data were non-normal, then the median rather than the mean should be quoted and non-parametric tests used. Statistical significance should be stated and is almost always given as P < 0.05, with confidence intervals (CI) at 95 %.
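As an illustration of this workflow, the sketch below (with made-up data) checks two groups for normality and then chooses between a parametric and a non-parametric test, reporting means or medians accordingly. The variable names and values are purely hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=55, scale=10, size=30)   # hypothetical continuous outcome
group_b = rng.normal(loc=60, scale=10, size=30)

# Shapiro-Wilk test on each group: P > 0.05 means no evidence of non-normality
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group_a, group_b))

if normal:
    _, p = stats.ttest_ind(group_a, group_b)        # parametric test, report means
    print(f"means {group_a.mean():.1f} vs {group_b.mean():.1f}, t-test P = {p:.3f}")
else:
    _, p = stats.mannwhitneyu(group_a, group_b)     # non-parametric test, report medians
    print(f"medians {np.median(group_a):.1f} vs {np.median(group_b):.1f}, "
          f"Mann-Whitney P = {p:.3f}")
```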
Was a sample size calculation made? This will not be relevant for case reports and series but applies to cross-sectional and other studies higher in the hierarchy of evidence. If the sample size is too small for the effect size difference expected, then the study is unlikely to show statistical significance. The sample size calculation should include the minimum difference between the groups that is considered clinically significant (the effect size). For example, a 3-point decrease on the 0 to 10 visual analogue scale (VAS) for pain would be regarded as clinically significant.
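A sample size calculation of this kind can be sketched as follows; the standard deviation of 2.5 VAS points is an assumption for illustration only and would normally come from pilot data or the literature.

```python
from statsmodels.stats.power import TTestIndPower

clinically_important_difference = 3.0   # points on the 0-10 VAS (as in the text)
assumed_sd = 2.5                        # hypothetical standard deviation of VAS scores
effect_size = clinically_important_difference / assumed_sd   # standardized (Cohen's d)

# Solve for the number of patients per group at 5 % significance and 80 % power
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.05, power=0.80)
print(f"Roughly {n_per_group:.0f} patients needed in each group")
```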
In case series, where a number of patients are reviewed and comparisons are made between subgroups of patients, it is possible to perform a post-hoc (after the analysis) power analysis. This may show that a finding of no difference between the groups for a particular outcome could be due to an insufficient sample size rather than to the absence of a true difference.
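A post-hoc power calculation along these lines might look like the following sketch, in which both the subgroup size and the observed effect size are hypothetical; a power well below the conventional 80 % suggests the comparison was underpowered rather than the groups being truly equivalent.

```python
from statsmodels.stats.power import TTestIndPower

observed_effect_size = 0.4   # hypothetical standardized difference seen between subgroups
n_per_subgroup = 15          # hypothetical subgroup size

achieved_power = TTestIndPower().power(effect_size=observed_effect_size,
                                       nobs1=n_per_subgroup, alpha=0.05)
print(f"Achieved power: {achieved_power:.0%}")   # well below the usual 80 % target
```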
Other things to look out for in the methods are details of the surgical or other interventions used. Are they adequately described? What about the measures used to assess outcome? For example, if questionnaires have been used are they established ones or did the authors use their own questionnaires specifically designed for the study that may have not been validated?
Methods of measurement should be described in detail, e.g. was a goniometer used for measuring straight leg raise, or the less reliable visual estimation?
In cross-sectional survey-type studies, be particularly critical of whether the questions used are valid and reliable. How was the questionnaire developed? Was it piloted for reliability and validity? For example, does the questionnaire have content validity, i.e. are the questions asked relevant, and how did the results compare with similar validated questionnaires (criterion validity)? Did the same questionnaire give similar results when repeated soon after on the same patients (reliability)? Where standard questionnaires have been used (e.g. SF-36) this should not be a problem, assuming they have been used before in that particular patient group: questionnaires designed for adults may not be valid and reliable when used with children. Were answers to questions involving recall verified using other data sources such as patient records? This raises confidence in the results.
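As a simple illustration of a test-retest reliability check, the sketch below correlates hypothetical questionnaire scores collected twice from the same patients; in practice an intraclass correlation coefficient is usually preferred, but a plain correlation conveys the idea.

```python
from scipy import stats

# Hypothetical scores from the same 8 patients, administered two weeks apart
first_administration  = [22, 35, 41, 18, 30, 27, 39, 25]
second_administration = [24, 33, 40, 20, 31, 25, 41, 26]

r, p = stats.pearsonr(first_administration, second_administration)
print(f"Test-retest correlation r = {r:.2f} (P = {p:.4f})")
```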
Results
Are the demographics of patients described in detail (age, sex, pathology etc) and is a breakdown given for the groups where relevant (e.g. study and control groups)? Remember confounding factors.
What was the response rate? It is recommended to be at the very least 70 %, otherwise there could be sampling bias. Were the demographics of the non-responders, where known, given and were they similar to those of the responders? Otherwise the responders may be atypical and the results will be biased.
Are any deviations from the protocol described e.g. unexpected events, patient drop out?
Are confidence intervals (CI) given for values? These are more informative than P values alone as they indicate the possible range of values in the general population. Just a note here: if a P value is not significant then the CI for the difference should include zero (i.e. no difference), so check this – it may be that the statistics are not up to scratch. Always look at the statistical tables and figures carefully (e.g. graphs – see below on interpreting tables and graphs) and see if there are unusual values that don’t quite (metaphorically) add up (e.g. CI and P values – see above). Concerning P values, the smaller the value, the less likely the result is due to chance; e.g. P < 0.01 is more convincing than P = 0.048, which only borders on significance. Not all papers quote the exact P value; some simply use the expression P < 0.05.
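The consistency check between a P value and its confidence interval can be illustrated with the hypothetical sketch below: for a difference in means analysed with a t-test, a non-significant P value should be accompanied by a 95 % CI that includes zero.

```python
import numpy as np
from scipy import stats

# Hypothetical outcome scores for two treatment groups
group_a = np.array([42.0, 47.5, 51.0, 44.0, 49.5, 46.0, 50.0, 45.5])
group_b = np.array([44.5, 48.0, 52.5, 43.0, 50.5, 47.5, 51.0, 46.0])

_, p = stats.ttest_ind(group_a, group_b)          # pooled-variance t-test

n1, n2 = len(group_a), len(group_b)
diff = group_a.mean() - group_b.mean()
pooled_var = ((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se

print(f"Difference in means = {diff:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}, P = {p:.3f}")
# If P > 0.05 the CI should straddle zero; if it does not, query the statistics.
```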
Look for data dredging involving post-hoc analysis, where tests are run on the data to look for interesting results – only tests that were stated in the original hypothesis (e.g. to look for age or sex differences) should be performed. Otherwise some of the significant results may be due to chance, because P = 0.05 signifies that chance could create the result 1 time in 20.
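The arithmetic behind this warning is easy to demonstrate: at a 5 % significance level the probability of at least one spurious “significant” result rises rapidly with the number of unplanned tests, as the short sketch below shows.

```python
# Probability of at least one false-positive "significant" result at P < 0.05
for n_tests in (1, 5, 10, 20):
    chance_of_false_positive = 1 - 0.95 ** n_tests
    print(f"{n_tests:2d} unplanned tests: {chance_of_false_positive:.0%} "
          "chance of at least one spurious finding")
```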
Do the tables, figures and graphs match up with any description in the main text? Do the values add up within the tables, figures and graphs?
Discussion
Have the authors discussed how their findings fit in with what is already known about the subject? Do the results fit in with previous findings and, if not, is there an explanation by the authors? Are you aware of similar studies that have been omitted and are contradictory to their findings? Do the findings appear plausible from a medical viewpoint?
Look for any overstatement of the findings, i.e. over-extrapolation of the results, which may be only the authors’ opinion.
Have they discussed the strengths and weaknesses of their findings?
Interpreting Figures, Tables and Graphs
Tables and graphs are time consuming and difficult to produce. Even with the help of word processor templates it is easy for errors to creep in. But they often improve the clarity of a paper. Tables often contain a lot of information and may be difficult to decipher. Look for:
Self-explanatory title with units of measurements.
Labelling of rows and columns.
Are the rows/columns ordered, e.g. by age?
Numbers rounded to a sensible precision, e.g. 72.8 not 72.799. This will give you some indication of the standard of statistical input to the paper.
For figures make sure the appropriate type of figure has been used, that is graphs, histograms, bar charts, scatter plots or box plots. The axes should be labeled with the units.
With graphs watch out for scales that don’t start at zero – this may deceptively emphasize an effect. Histograms are for continuous grouped data e.g. age groups. They show the symmetry of the data and give some indication of the normality. This symmetry is related to the use of an appropriate statistical test. Parametric tests should be used for normal (symmetrical) data and non-parametric for non-normal (skewed) data.
Scatter plots show how 2 variables relate to each other, and often a correlation coefficient is given for the strength of the relationship. Bar charts are for discrete data e.g. blood groups, whereas graphs are for continuous data e.g. blood pressure or age. When data are grouped, e.g. age in 5-year intervals, information is lost within the groups and this may hide important detail present in the original raw data. With scatter plots look out for outliers (extreme values, e.g. age 99 instead of 9) that may have distorted summary values (e.g. means) listed in tables. Such outliers may be erroneous values that should have been screened out during data cleaning, or an explanation should be given for their inclusion.
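The distorting effect of a single outlier can be seen in the hypothetical sketch below, where an age recorded as 99 instead of 9 shifts the mean age and weakens the correlation between age and an outcome score.

```python
import numpy as np
from scipy import stats

ages_clean = np.array([6, 7, 8, 9, 9, 10, 11, 12])          # hypothetical ages
scores     = np.array([15, 18, 20, 22, 23, 25, 27, 30])     # hypothetical outcome scores

ages_error = ages_clean.copy()
ages_error[3] = 99    # data-entry error: 99 recorded instead of 9

for label, ages in (("clean data", ages_clean), ("with outlier", ages_error)):
    r, _ = stats.pearsonr(ages, scores)
    print(f"{label}: mean age = {ages.mean():.1f}, correlation with score r = {r:.2f}")
```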
Advanced Critical Appraisal
Although the foregoing generic appraisal guidelines are relevant for the less experienced reader, detailed study specific checklists are needed for a more robust critical appraisal.
The main epidemiological study types are:
Qualitative research.
Case reports.
Case series.
Cross sectional.
Case control.
Cohort.
RCT.
Systematic review/meta-analysis.
Advanced critical appraisal necessitates identifying the study type used in the paper. This may be given in the title or in the introduction or methods section. Often, apart from RCTs and systematic reviews/meta-analyses, it is not explicitly mentioned and you will need to decide for yourself. This ability will only come with experience.
If the paper has keywords and these are MeSH terms, then the type of study should be stated, so it is best to look there first. But not all study types have a MeSH term, e.g. case series is not given a term although case report is.
Most of the checklists detailed below are the ones we use from our own experience of carrying out research and publication. Before submission we ensure that we have covered the relevant check points mentioned for each study type.
The Study Types
Qualitative Research
This is rarely found in the orthopedic literature.
It provides information on qualities that are difficult to measure for example patient experience, emotions, social interactions, attitudes, and behaviour. Qualitative studies have their own study types such as descriptive, phenomenology and ethnography. Qualitative studies are often combined with quantitative methods.
Qualitative studies are prone to bias and for this reason are at the bottom of the hierarchy. A description of these and their detailed appraisal is beyond the scope of this chapter.
Case Reports
This type of study is common in orthopedics. It is a type of qualitative research and in the hierarchy of evidence it is at a low level. This study type is easy to identify as often they are in a separate section of a journal. They are regarded as having low validity but have been important in alerting clinicians to unusual events such as adverse reactions to treatment or conditions not seen before.
They are communicated in a narrative fashion, e.g. “an 11-year-old girl suffered a fall from standing and subsequently developed pain and …”, describing an unusual or novel outcome. They often do not follow the standard format of research papers, usually having a discussion and conclusion after the case report itself.
Case reports can lead to generation of new hypotheses that can be tested using a study higher in the hierarchy of evidence for example by an RCT. They also have a strong educational component providing unusual things to watch out for in your own patients. This is enhanced by the fact that many case reports also include a literature review of the subject.
There are journals (such as Case Connector of the JBJS (Am), www.caseconnector.jbjs.org) entirely devoted to case reports, while some other journals do not include them at all, e.g. International Orthopaedics. Because their publication adversely affects impact factor, their numbers are restricted. Because of their simplicity, case reports usually do not have conflict of interest statements.
The following checklist for the CA of a case report is modified from Chan and Bhandari [1]:
Does the case report include a literature review, usually in the discussion section?