The concept of evidence-based medicine has gained broad support in the medical community, because clinical decisions based on information from rigorous scientific study are most likely to provide optimal care. Researchers attempt to answer clinical questions using either observational studies or randomized controlled trials (RCTs). Observational studies currently dominate the surgical literature but provide a level of evidence inferior to RCTs. RCTs are ethically grounded in clinical equipoise and may further reduce the potential for bias or other confounding factors by blinding. This article discusses the barriers to implementation of surgical RCTs.
The concept of evidence-based medicine has gained broad support in the medical community, because clinical decisions based on information from rigorous scientific study are most likely to provide optimal care.
When researchers attempt to answer clinical questions, they must first decide whether to observe events as they occur in the subjects (an observational study) or to introduce an intervention and analyze its effects on the subjects (a randomized controlled trial). Observational studies, such as cohort studies (prospective or retrospective) and cross-sectional studies, currently dominate the surgical literature. These may be the best methods if researchers are studying the outcomes of uncommon treatments or diseases, if funding is scarce, or if there is only 1 acceptable intervention and the physician is ethically obligated to offer it. However, observational studies can demonstrate only an association between observations, and therefore provide a level of evidence inferior to that of the randomized controlled trial (RCT), which can demonstrate causality. RCTs are ethically grounded in clinical equipoise: if one treatment has not been proven better than another, a state of clinical equipoise exists, and subjects can ethically be assigned at random to an intervention, reducing the influence of extraneous factors on the outcome. The results observed in such a study are therefore more likely to be a consequence of the intervention.
Blinding may further reduce the potential for bias and confounding in an RCT. Blinded RCTs are the prevalent method used to compare pharmaceutical interventions, and theoretically they would also be the best way to compare different surgical procedures, or to compare surgery with nonoperative treatment options. Pharmaceutical trials can be designed so that subjects alone, or subjects and researchers, are blinded to the drug administered (single-blinded versus double-blinded studies). In double-blinded studies, the most stringent form of blinding, study participants, caregivers, and investigators do not know which treatment group a subject has been assigned to until the study has ended. Thus, randomization prevents bias at the time of allocation to a treatment group, and blinding prevents bias during data collection.
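The allocation step can be made concrete with a short sketch. The following Python snippet (a minimal illustration, not any particular trial's protocol) generates a permuted-block randomization list in which the treatments appear only as coded labels, so that investigators handling outcome data need not know which code corresponds to which treatment.

```python
# Minimal sketch: 1:1 permuted-block randomization with coded treatment labels.
# The key (e.g., A = drug, B = placebo) would be held by a third party until
# the trial ends, supporting allocation concealment and blinding.
import random

def block_randomization(n_subjects, block_size=4, labels=("A", "B"), seed=None):
    """Return an allocation sequence built from shuffled, balanced blocks."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        block = list(labels) * (block_size // len(labels))  # e.g., [A, A, B, B]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_subjects]

allocation = block_randomization(12, seed=2024)
print(allocation)  # investigators see only the coded sequence
```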
Blinding in surgical trials is much more challenging than in pharmaceutical trials. The surgeon cannot be blinded to the procedure, so a double-blind design is not feasible. Blinding of patients is possible in a surgical trial if the subjects in one of the treatment arms receive a placebo or sham surgery, but there are practical and ethical barriers to this practice.
RCTs are the gold standard in study design, and they play a vital role in increasing medical knowledge about treatment effectiveness and in supporting clinical decisions with evidence-based data. However, surgical RCTs are uncommon in the scientific literature in general and even scarcer in orthopedic surgery, because of the many challenges described in this article.
This article discusses the barriers to implementation of surgical RCTs, using the authors’ experience with a pediatric orthopedic RCT in which children are randomized to surgical versus nonsurgical treatment as a case study to illustrate some of the key points.
RCTs and evidence-based medicine
Because the data produced by blinded RCTs allow researchers to draw conclusions about cause-and-effect relationships between interventions and outcomes, blinded RCTs have become known as the gold standard of evidence-based medicine. The original definition of evidence-based medicine is “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.” Randomization does not guarantee that every variable that could potentially bias the results is accounted for, but it does reduce or eliminate selection bias on the part of the researcher and the subject. Because the results of blinded RCTs have the lowest chance of bias, they are assumed to provide the best scientific evidence available. Study design rigor is rated using the Oxford Center for Evidence-Based Medicine Levels of Evidence, which are used, or adapted for use, by medical journals when publishing results. According to this rating scheme, the blinded RCT provides the highest levels of evidence, Level 1 or Level 2 (see Table 1 for an adaptation of the Oxford Levels of Evidence used by the Journal of Bone and Joint Surgery).
Table 1. Types of Studies^a

| | Therapeutic Studies Investigating the Results of Treatment | Prognostic Studies Investigating the Effect of a Patient Characteristic on the Outcome of Disease | Diagnostic Studies Investigating a Diagnostic Test |
|---|---|---|---|
| Level 1 | Randomized controlled trial; systematic review^b of Level 1 RCTs (studies were homogeneous^c) | Prospective^d study; systematic review^b of Level 1 studies | Testing of previously developed diagnostic criteria in a consecutive series of patients; systematic review^b of Level 1 studies |
| Level 2 | Prospective^d cohort study^e; lesser-quality RCT; systematic review^b of Level 2 studies | Retrospective^f study; untreated controls from a previous RCT; systematic review^b of Level 2 studies | Development of diagnostic criteria on the basis of a consecutive series of patients; systematic review^b of Level 2 studies |
| Level 3 | Case-control study^g; retrospective^f cohort study^e; systematic review^b of Level 3 studies | Case-control study^g | Study of nonconsecutive patients (without a consistently applied reference standard); systematic review^b of Level 3 studies |
| Level 4 | Case series^h | Case series | Case-control study; poor reference standard |
| Level 5 | Expert opinion | Expert opinion | Expert opinion |
a A complete assessment of the quality of individual studies requires critical appraisal of all aspects of the study design.
b A combination of results from two or more previous studies.
c Studies provided consistent results.
d Study was started before the first patient enrolled.
e Patients treated one way (eg, with cemented hip arthroplasty) compared with patients treated another way (eg, with cementless hip arthroplasty) at the same institution.
f Study was started after the first patient enrolled.
g Patients identified for the study on the basis of their outcome (eg, failed total hip arthroplasty), called “cases,” are compared with those who did not have the outcome (eg, had a successful total hip arthroplasty), called “controls.”
h Patients treated one way with no comparison group of patients treated another way.
New surgical procedures are typically developed by a single surgeon or a group of surgeons who perform the new procedure and report the results in a case series or in a retrospective or prospective cohort study. Results at this level of evidence are potentially biased by patient selection and by the surgeon’s enthusiasm for the procedure, in addition to small study cohorts and problems with data collection and analysis. Furthermore, this approach does not allow a controlled comparison between surgical procedures, or a comparison of surgical treatment with a nonsurgical intervention.
Several barriers to achieving Level 1 evidence are inherent in the design of a surgical trial. Subject recruitment is difficult when the study design randomizes subjects to surgical versus nonsurgical treatment arms, because of a perceived lack of clinical equipoise on the part of the surgeon and/or subject preference for one arm. Randomizing subjects to 1 of 2 surgical treatment arms may be more agreeable to potential subjects because both treatments still involve surgery. Blinding is also difficult in surgical trials: the surgeon cannot be blinded to a subject’s treatment arm, and unless a study uses a placebo or sham surgery, it is difficult to blind the patient.
Sham surgical trials that blind the patient to the procedure performed are uncommon, but these trials have the potential to provide clinicians with definitive answers to surgical questions. One well-known orthopedic sham surgery study provided high-level evidence that arthroscopic surgery for arthritis of the knee was no more effective than the placebo or sham surgery. However, the use of sham surgery raises ethical concerns, as discussed later, because a subject undergoing sham surgery is exposed to the risks of surgery without the potential benefits.
Barriers to performing and publishing RCTs
There has been a recent increase in RCTs reported in the orthopedic literature. A review of 36,293 articles in orthopedic journals found 671 RCTs and 12 meta-analyses published between 1966 and 1999. Over 75% of the RCTs were published between 1990 and 1999 (528 RCTs), whereas only 122 RCTs were published between 1980 and 1989, and 21 RCTs were published between 1968 and 1979.
Despite recent increases, only about 7% of articles in surgical journals are RCTs. Another review found that only 3.4% of all articles in leading surgical journals were RCTs. Of RCTs published in the same leading surgical journals during a 10-year period, less than half compared the results of surgery with a nonsurgical alternative treatment. These deficiencies are well described, and the surgical literature lags behind other medical specialties in these respects.
Specialty surgical literature does not contain many RCTs. A search of the Journal of Hand Surgery (American) from 1976 to 1994 (volumes 1–19) found that only 1% of all the published articles were RCTs. A search of all English-language literature for clinical trials in pediatric surgery from 1966 to 1999 identified only 134 RCTs out of 80,377 articles (0.17%). These RCTs studied analgesia (49%), antibiotics (13%), extra-corporeal membrane oxygenation (7%), and treatment of gastrointestinal conditions, burns, congenital anomalies, trauma and cancer, minimally invasive surgery, and vascular access (each less than 5%). Furthermore, of these 134 RCTs, 81% (109 studies) compared 2 medical therapies in surgical patients, 12% (16 studies) compared 2 surgical treatments, and 7% (9 studies) compared medical versus surgical treatment.
Surgical RCTs are uncommon for several reasons, including ethical issues, patient and surgeon preferences, irreversibility of surgical treatment, expense and follow-up time, and difficulties associated with randomization and blinding. Barriers to conducting successful RCTs also include the relative infrequency of the disease state under consideration, lack of community equipoise regarding standards of care, limited availability of diagnostic tools, and the challenges of enrolling children in RCTs. Uncommon diseases cannot be studied at a single center, but multicenter trials are expensive and complicated to conduct. Although nonoperative interventions are more common in pediatric orthopedics than in other orthopedic subspecialties, studies of pediatric surgical interventions comparing the results of surgery with nonsurgical interventions are especially uncommon.
Clinical equipoise, or genuine uncertainty about the best treatment, is a necessary criterion for randomizing subjects to treatment. Because of the deficit of RCTs, surgeons tend to rely on anecdotal data when making treatment decisions, creating a bias against the presumption of clinical equipoise. Data from biased studies may prematurely eliminate or falsely alter assumptions of clinical equipoise, interfering with the conduct of more scientifically rigorous research.
Quality of published RCTs
The actual level of evidence provided by an RCT depends on the quality of the study design, including the methodology of randomization and blinding, and on the rigor of the analysis and interpretation of the results. A poorly conducted RCT can be just as biased as a study in which randomization and blinding never occurred; in fact, the quality of some of the few surgical RCTs that have reached publication appears to be poor, and reviews and meta-analyses have revealed major flaws in methodology and study design.
Several assessment tools can be used to evaluate the quality of RCTs; the most commonly used are the Chalmers index and the Jadad score. These tools are not entirely applicable to surgical RCTs because of the weight they give to blinding; the Jadad score, for example, was originally validated for RCTs assessing treatments for pain. Surgical trials automatically score poorly on these scales because of inherent design limitations: the surgeon cannot be blinded to the treatment, and the subject is blinded only in the rare sham surgery trial.
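The effect of this weighting can be seen in a simplified sketch of the 5-point Jadad score below (a paraphrase of the published scale written for illustration, not the validated instrument): without double-blinding, even an otherwise rigorous trial cannot score above 3 of 5.

```python
# Simplified rendering of the Jadad score (0-5). Points: +1 if described as
# randomized, +1/-1 if the randomization method is appropriate/inappropriate,
# +1 if described as double-blind, +1/-1 if the blinding method is
# appropriate/inappropriate, +1 if withdrawals and dropouts are described.
def jadad_score(randomized, randomization_method,
                double_blind, blinding_method,
                withdrawals_described):
    """randomization_method / blinding_method: 'appropriate', 'not described', or 'inappropriate'."""
    score = 0
    if randomized:
        score += 1
        score += {"appropriate": 1, "not described": 0, "inappropriate": -1}[randomization_method]
    if double_blind:
        score += 1
        score += {"appropriate": 1, "not described": 0, "inappropriate": -1}[blinding_method]
    if withdrawals_described:
        score += 1
    return max(score, 0)

# A rigorous but unblinded surgical RCT tops out at 3 of 5:
print(jadad_score(True, "appropriate", False, "not described", True))  # -> 3
# The same rigor plus appropriate double-blinding reaches 5:
print(jadad_score(True, "appropriate", True, "appropriate", True))     # -> 5
```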
Another assessment tool, the Detsky scale, evaluates the quality of RCTs without penalizing surgical trials for their inability to blind participants and researchers. Even using the Detsky scale, however, a Canadian study found that only 19% of all pediatric orthopedic RCTs published in 5 well-recognized journals between 1995 and 2005 met a satisfactory level of methodological quality. A separate study that used the Detsky scale to assess the quality of RCTs in the Journal of Bone and Joint Surgery from 1988 to 2000 found that only 40% of the studies met the standard of acceptability. Surgical trials received slightly lower quality scores than did drug trials (standardized Detsky scores of 63.9% versus 72.8%). This study also found that funding of a trial and involvement of at least 1 epidemiologist were associated with better RCT quality. Lack of funding may play a significant role in the quality of surgical RCTs, because surgical RCTs lag significantly behind drug trials in funding from the National Institutes of Health. Surgical trials are less likely to be funded than nonsurgical trials, and awards for surgical grants average 5% to 27% less than nonsurgical grants. In the study that identified 134 pediatric surgical RCTs over a 33-year period, only 13% cited a biostatistician as an author or a consultant, only 7% reported funding from either the National Institutes of Health or the Medical Research Council, and 65% mentioned no funding source at all.
The most common methodological flaws in pediatric surgical trials are failure to identify specific inclusion and exclusion criteria and failure to describe eligible patients who chose not to participate. The majority of studies that tracked patient withdrawals excluded the data from these patients from the statistical analysis. In addition, many RCTs state that the trial was randomized and blinded or double-blinded but do not describe the method of randomization or blinding. RCTs using flawed methods of randomization overstate the effect of the intervention: in an observational study that analyzed the quality of allocation concealment in 250 RCTs from 33 meta-analyses in the Cochrane Pregnancy and Childbirth Database, trials with inadequate concealment exaggerated the treatment effect by 41%, and trials with unclear concealment exaggerated it by 30%.
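One way to read these percentages is sketched below, treating the pooled effects as odds ratios and the exaggeration as a multiplicative distortion of the odds ratio; the baseline odds ratio is a hypothetical number chosen only to illustrate the arithmetic.

```python
# Minimal sketch with hypothetical numbers: what a "41% exaggeration" of the
# treatment effect can mean when effects are summarized as odds ratios
# (OR < 1 favoring treatment). The 0.80 baseline OR is an assumption.
or_well_concealed = 0.80     # assumed effect in trials with adequate concealment
exaggeration = 0.41          # reported exaggeration for inadequately concealed trials
or_poorly_concealed = or_well_concealed * (1 - exaggeration)
print(f"Apparent odds ratio with inadequate concealment: {or_poorly_concealed:.2f}")  # ~0.47
# The apparent benefit looks substantially larger than the assumed true effect.
```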
Additional methodological problems include failure to perform a power analysis, in which researchers identify the primary outcome measure before the study begins and determine whether the planned sample size is adequate to detect a clinically important difference. An analysis of 760 abstracts accepted for presentation at the British Association of Pediatric Surgeons Annual Congress during a 5-year period (1996–2000) revealed that 9 were RCTs. A review of these 9 trials showed that the number of subjects actually enrolled was far below what was needed to detect a significant difference between treatment arms, raising the likelihood of type 2 errors (an incorrect conclusion of no difference between treatment arms when a difference does exist). Underpowered studies such as these may lead to an incorrect interpretation of the data when a clinically important difference exists.
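As an illustration, a pre-study sample-size calculation for a two-arm trial with a continuous primary outcome might look like the sketch below (using the statsmodels package; the effect size is an assumption chosen only for illustration).

```python
# Minimal sketch of a pre-study power analysis for a two-sample t test,
# assuming a standardized effect size judged clinically important in advance.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5   # assumed Cohen's d for the primary outcome (illustrative)
alpha = 0.05        # two-sided type 1 error rate
power = 0.80        # 1 - beta: 80% chance of detecting the difference if it exists

n_per_arm = TTestIndPower().solve_power(effect_size=effect_size,
                                        alpha=alpha,
                                        power=power,
                                        ratio=1.0)
print(f"Approximately {n_per_arm:.0f} subjects per arm are required.")  # about 64
```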