Mark Phillips BSc PhD (Cand)
Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada
You have been asked to join a guideline panel working group that is tasked with developing a clinical practice recommendation for the use of hemiarthroplasty (HA) versus total hip arthroplasty (THA) for the management of displaced femoral neck fracture in patients over the age of 60. The team has decided that the important outcomes to evaluate within their recommendation are revision rates, one‐year mortality, and dislocation rates. Interested in understanding the best method for developing this recommendation, you have decided to investigate the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) approach to guideline recommendation development.
GRADE is a tool that has been developed to provide a transparent and thorough guide for rating the quality of evidence and strength of recommendations made within healthcare research.1 A GRADE assessment is conducted on a body of literature that was collated through a systematic review. The first step of recommendation development, after the guideline panel and clinical question to be answered have been defined, is to conduct a comprehensive systematic review that captures all evidence pertaining to the research question of interest.1 For this scenario, we are assuming that this systematic review of available literature on displaced femoral neck fracture management using HA or THA has already been conducted, and all relevant research evidence has been collected for the outcomes of interest.
It is important to note that GRADE is used to assess the quality of evidence for each individual outcome that will be considered within the clinical recommendation.1 This means that the GRADE approach would be repeated three times for the current scenario: assessing the quality of evidence for revision rates, one‐year mortality, and dislocation rates separately.
This is done because the body of evidence for each outcome may not be the same. For example, the quality ratings may differ because a large number of studies report revision rates, while fewer of them provide information on dislocation.
The GRADE framework then provides guidance on how the working group should proceed to develop a clinical recommendation on HA or THA use for displaced femoral neck fractures. After collecting all relevant evidence and assessing the quality of that evidence for each individual outcome, the GRADE approach provides a transparent framework to create clinical recommendations based on the strength and quality of the evidence. This includes decisions by the working group regarding the balance between desirable and undesirable consequences of using the treatment options.2 It also requires the working group to assign a strength to its recommendation, based on the available evidence. Recommendations may be considered either “strong” or “weak,” depending on the certainty that the working group has regarding the quality and magnitude of the evidence that has been evaluated.2
The GRADE approach to assessing quality of evidence takes the following concepts into consideration: the study design of the available evidence, risk of bias, imprecision, inconsistency, indirectness, and publication bias.3 When evaluating observational data, three further factors are assessed that can potentially increase the quality of evidence rating: all plausible confounders, magnitude of effect, and the presence of a dose‐response gradient.3 These considerations are each taken into account to provide a categorical quality rating of either very low, low, moderate, or high for each of the outcomes of interest.4 Evidence from randomized trials is initially regarded as high quality, but the evaluation of each of the considerations can influence the final rating given to the body of evidence
(Table 4.1).
Table 4.1 GRADE approach to rating quality of evidence. Source: Modified from Balshem, et al.3
A rating of high‐quality evidence by the guideline panel signifies that “[the panel] is very confident that the true effect lies close to that of the estimate of the effect”; while a rating of very low‐quality evidence is described as “[the panel] has very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect.”3 In order to determine an appropriate rating for the quality of evidence, each of the aforementioned considerations needs to be evaluated in detail for each of the included outcomes.
Assessors evaluating risk of bias can reduce the quality of evidence by one category if the risk is deemed to be serious, or by two categories if it is deemed to be very serious. For example, a body of randomized trial evidence would initially be categorized as high‐quality evidence. If there was considered to be a serious risk of bias within the evidence, the rating would then be categorized as “moderate quality.” A judgment of very serious risk of bias within the evidence would change the rating from high quality to low quality.5 In order to comprehensively evaluate the risk of bias for GRADE, there are a number of key study limitations that should be assessed in order to decide the risk of bias rating for a body of randomized trials:5
4 Healthcare Recommendations: Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) Approach
Case scenario
Top three questions
Question 1: What is GRADE?
Question 2: What are the components of a GRADE quality of evidence assessment, and how do you evaluate them for a body of evidence?
Quality of evidence assessment
| Study design | Quality of initial body of evidence | Decrease the quality rating if | Increase the quality rating if (for observational studies) | Final quality rating |
| --- | --- | --- | --- | --- |
| Randomized controlled trials | High (initial score of 4) | Risk of bias: −1 Serious, −2 Very serious | Large effect: +1 Large, +2 Very large | High (score of 4 or higher) |
| Observational studies | Low (initial score of 2) | Inconsistency: −1 Serious, −2 Very serious | Dose response gradient: +1 Present | Moderate (score of 3) |
| | | Indirectness: −1 Serious, −2 Very serious | Plausible residual effect: +1 Would reduce the demonstrated effect, +2 Confident that all plausible confounders are accounted for | Low (score of 2) |
| | | Imprecision: −1 Serious, −2 Very serious | | Very low (score of 1 or lower) |
| | | Publication bias: −1 Likely, −2 Very likely | | |
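The scoring arithmetic summarized in Table 4.1 can be sketched in a few lines of Python. This is only an illustrative bookkeeping aid, not part of the GRADE guidance itself: the function name `grade_quality`, the dictionary keys, and the choice to reject upgrades for randomized trials are all assumptions made for this example. The judgments that feed the numbers (whether a concern is serious or very serious, for instance) still have to be made by the panel.

```python
# Illustrative sketch of the Table 4.1 arithmetic. All names are hypothetical.

# Starting scores by study design, as given in Table 4.1.
INITIAL_SCORE = {"randomized": 4, "observational": 2}

# Domains that can lower the rating by 1 or 2 levels.
DOWNGRADE_DOMAINS = {
    "risk_of_bias", "inconsistency", "indirectness",
    "imprecision", "publication_bias",
}

# Factors that can raise the rating, with the maximum levels listed in Table 4.1.
MAX_UPGRADE = {"large_effect": 2, "dose_response": 1, "plausible_confounding": 2}


def grade_quality(study_design, downgrades=None, upgrades=None):
    """Return the final quality category for one outcome."""
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    score = INITIAL_SCORE[study_design]

    for domain, levels in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {domain}={levels}")
        score -= levels

    # Assumption: following Table 4.1, upgrades apply to observational studies only.
    if upgrades and study_design != "observational":
        raise ValueError("upgrades are only applied to observational studies here")
    for factor, levels in upgrades.items():
        if factor not in MAX_UPGRADE or not 0 <= levels <= MAX_UPGRADE[factor]:
            raise ValueError(f"invalid upgrade: {factor}={levels}")
        score += levels

    if score >= 4:
        return "High"
    if score == 3:
        return "Moderate"
    if score == 2:
        return "Low"
    return "Very low"


# Randomized trials with a serious risk of bias: 4 - 1 = 3
print(grade_quality("randomized", {"risk_of_bias": 1}))            # Moderate
# Randomized trials with a very serious risk of bias: 4 - 2 = 2
print(grade_quality("randomized", {"risk_of_bias": 2}))            # Low
# Observational studies showing a large effect: 2 + 1 = 3
print(grade_quality("observational", upgrades={"large_effect": 1}))  # Moderate
```

In the case scenario, the function would be called once per outcome (revision rates, one‐year mortality, and dislocation rates), since each outcome receives its own rating.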
Assessing risk of bias