Measurement of Treatment Outcomes in the Young Patient with a Hip Disorder
Nicholas G.H. Mohtadi
Introduction
The treatment outcome of patients can be measured from many perspectives. Society and the healthcare system have a different perspective than the healthcare practitioner and, most importantly, so does the patient. It should be made clear at the outset that no single outcome measure can serve all purposes at the same time. It is also necessary to recognize that each perspective has its own bias. From the healthcare system perspective, the financial impact of the treatment may be an important aspect to measure. From the patient’s perspective, the negative impact of their problem on their quality of life (QoL) and their improvement with treatment are paramount. From a surgeon’s perspective, the correction of deformity on x-ray and the type of surgical procedure could be critical issues with respect to outcome assessment.
Furthermore, outcomes can be subjective or objective, assessed generically, or in joint-specific or disease-specific terms and can be discriminative, evaluative, or predictive. Outcomes can be patient derived and reported, determined by a consensus of experts, administered in a variety of different ways, and can utilize various forms of responses (e.g., Likert scale, visual analog scale (VAS), ordinal scales, and nominal scales).
This chapter will attempt to provide a brief but general overview of outcome assessment and a focused review on the measurement of treatment outcomes for young patients with hip disorders.
Overview of Outcome Assessment
To measure the outcome of surgical treatment in a young patient with a hip disorder we must understand that the outcome is dependent on three independent components or variables.
First, we need to consider the patient. Patient demographics, the specific hip disorder, the natural history of this disorder, the extent of the disease, the impact of this problem on the patient, and any associated characteristics or comorbidities are likely to influence the outcome of surgical treatment. To better understand the results of surgical treatment between studies, it is necessary to understand the patient population, the sampling frame, and how the patients were selected into the study. We can account for many of these patient characteristics and match as many characteristics as possible. However, patients also have inherent biases. These biases are not likely to be known, predictable, or anticipated. Therefore, the only way to account for patient-related bias is to randomize patients to one treatment or the other. The randomized clinical trial (RCT) study design creates the best opportunity to address differences between patients when measuring the outcome of a particular treatment. This can be illustrated by using an example from the literature. In an excellent study comparing labral refixation/repair with labral debridement, Larson and Giveans (1,2) showed that the refixation group had better short- and long-term outcomes using the Modified Harris Hip Scale (MHHS). Every effort was made to account for and match differences between patient groups preoperatively. The authors also compared many variables at baseline to demonstrate whether there were any glaring differences between the groups. The authors rightfully stated that “Although other variables could have influenced these outcomes, these preliminary results indicate that labral refixation resulted in better HHS (MHHS) outcomes and a greater percentage of good to excellent results compared with the results of labral debridement in an earlier cohort” (2).
The “other variables” could very well have been differences in how the patients were chosen, the patient’s perception of their disease, their reaction to how they were treated, and so on. These patient biases cannot be accounted for in a retrospective design. The RCT minimizes the effect of bias between patients by randomly allocating patients to each treatment group.
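The random allocation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method of any study cited here: the function name `randomize`, the arm labels, and the patient identifiers are all hypothetical, and an actual trial would typically rely on a concealed, independently generated allocation schedule.

```python
import random


def randomize(patient_ids, arms=("treatment_A", "treatment_B"), seed=None):
    """Allocate patients to arms by shuffling, then alternating arms.

    Hypothetical helper for illustration: the shuffle makes the assignment
    independent of any known or unknown patient characteristic, and the
    alternation keeps the group sizes balanced.
    """
    rng = random.Random(seed)      # local generator so the seed is reproducible
    ids = list(patient_ids)
    rng.shuffle(ids)               # random order, unrelated to patient traits
    return {pid: arms[i % len(arms)] for i, pid in enumerate(ids)}


# Ten hypothetical patients, identified only by number.
allocation = randomize(range(1, 11), seed=42)
```

Because the order is shuffled before arms are assigned, neither the surgeon nor the patient influences which treatment is received, which is the property the RCT design depends on.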
The second independent component is that which is attributed to the surgeon/surgical procedure. Surgeons are in a relatively unique position in medicine because patient outcome is based to a greater or lesser extent on the surgical procedure performed. It is not necessarily appropriate for an individual surgeon to simply quote results from the literature and apply them to his/her patients. The administration of a medication would be expected to have a similar effect, and therefore a similar patient outcome, anywhere in the world, assuming that patients were similar in their disease state and demographic characteristics. The same cannot be said for surgical treatment on similar patients. Therefore, surgeons should be obligated to measure outcomes in some meaningful way. The impact of surgical experience is well understood. This has been addressed in the context of performing clinical trials by using an expertise-based design (3,4,5). In this design patients are randomized or matched to surgeons who perform a particular procedure. The surgeon is comfortable and experienced in the particular procedure. Therefore, the outcome of the study is more likely to pertain to the procedure, rather than the surgeon. Using the example of the Larson and Giveans study, they again addressed this limitation by stating that it is possible that “improvements seen in the later refixation group could be affected by improved techniques for treatment of FAI (femoroacetabular impingement)”. Another limitation of this study relates to the fact that one of the surgeon-based factors in the decision to perform labral refixation/repair was the quality of the labral tissue. If the labrum was not repairable then the patient would not have been included in the refixation group. They would have been included in the debridement group. If the labrum is important in the outcome then the surgeon’s decision to repair or not would influence the outcome.
It could be argued that this is a patient-related factor, that is, quality of the labrum; however, if the decision to repair is driven by the surgeon then it is a surgeon-based factor as well. One way to address this issue is to have the decision randomly determined, which may be ethically difficult. The authors effectively made this argument by stating, “with the growing awareness of the importance of the acetabular labrum and its potential for healing, we would not be comfortable randomly assigning patients to have the labrum resected.” Another way to address this issue is to refix/repair every patient and then divide the groups based on the quality of labral tissue.
The surgical procedure can also be measured by considering surgical time, time under traction (i.e., in arthroscopic hip surgery), whether or not a complication occurred, pre- and postoperative x-ray findings, as well as many other technical details of the surgical procedure. Assessing the surgical procedure may also be necessary to provide outcome information for third parties such as insurance companies, government organizations, or Workers Compensation Boards. In these cases reporting information on key performance indicators, such as measures of safety, use of antibiotic and deep venous thrombosis prophylaxis, hospital stay, and costs would be necessary (6). These types of outcomes are typically objective measures, which are observed, counted, and described in a reliable and valid way. Assuming that an unbiased observer documents this information independently, it can be trusted and assumed to be factual.
The third independent variable or component (the most important within the context of this chapter) is the outcome measure itself. Using the same example of the Larson (2) study, the primary outcome was the MHHS. They also measured pain on a VAS (0 to 10) and the SF-12 (7). The Harris Hip Scale was developed and published in 1969, long before the concepts of evidence-based medicine or patient-reported outcomes (PROs) were on the radar screen (8). This clinician-based outcome was modified to eliminate the component of measuring range of motion (9). Therefore, an argument can be made that the outcome used in this study was less than optimal, that is, not patient-based, and therefore the conclusions are biased on that basis alone. In the Larson study, if the pain VAS or SF-12 outcomes had been used as the primary outcome, the conclusion would have been that there was no difference between the two procedures (2).
Outcome measures can be called instruments, tools, scales, scores, indices, measures, outcomes, or questionnaires. These terms are used interchangeably for the purpose of this chapter. Outcome measures can be classified in many ways. Simply put, the outcome of treatment can be anything that is measured or observed. It can range from something as simple as measuring the range of motion to a complex, multifaceted, disease-specific, health-related quality-of-life (HRQoL) outcome questionnaire (10,11,12,13). The purpose of outcome measures can be classified as either disease specific, such as those tools created to assess osteoarthritis, or joint specific, such as those created to assess the outcome of any pathology of the hip. These measures can also be classified according to the person who completes the assessment. Traditionally, outcomes have been assessed by clinicians and include objective measures such as radiographic assessments. The clinician also asks the patient about pain and other subjective measures. These “clinician-based” or “clinician-administered” tools may introduce bias due to the way they are administered but more importantly may not capture the patient’s perceived outcomes (14). Therefore, more recently, patient-based and administered tools have been created. The objective of the tool must also be considered. If the goal is to follow patients over time and to assess changes, an evaluative index is necessary, because it can measure the magnitude of longitudinal change in an individual or a group of individuals (15). If the objective is to differentiate among patients to determine treatment, a discriminative index should be used, because it distinguishes between individuals or groups (15). It is very important to understand that the properties of each outcome measure change depending on the objective of the tool. One of the key properties of an evaluative index is the demonstration of responsiveness (15). 
Responsiveness refers to the ability of the outcome measure or instrument to detect within-patient change over time (16). A discriminative index needs to differentiate between patients at a particular point in time, in other words, to distinguish patients with more or less severe “disease” states. Guyatt (16) has explained the differences between these two types of instruments by using the statistical concept of quantifying the signal-to-noise ratio. The better the signal-to-noise ratio the better the instrument. “If the variability between patients (the signal) is much greater than the variability within patients (the noise), an instrument will be deemed reliable” (16). Discriminative instruments need to be highly reliable and the questions included in these instruments must enhance the ability to measure variability. Evaluative instruments are subtly different in that they need to detect change over time, and responsiveness is a reflection of that change. Responsiveness is “directly related to the magnitude of the difference in score in patients who have improved or deteriorated (the signal) and the extent to which patients who have not changed provide more or less the same scores (the noise)” (16). If the change over time is clinically meaningful, then a responsive instrument will be able to measure whether or not specific treatment (i.e., surgery) has improved a patient’s outcome. Finally, it is very important to understand how each item in the PRO is determined. It is this initial item pool, created through the process of item generation, that is critical (16,17). Once a comprehensive item pool is identified then the final set of items is reduced and formulated into the questionnaire (17). “The procedure for achieving comprehensiveness is different when selecting an item pool for an evaluative instrument than for either a discriminative or predictive tool” (15). In a discriminative index it would be important to have the majority of the respondents answer the questions, whereas in an evaluative index all relevant and important aspects should be included to measure clinically important outcomes (15).
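The two signal-to-noise ratios described above can be made concrete with a small numerical sketch. The following Python example is an illustration under stated assumptions rather than the formulas of the cited references: the function names, the simplified variance-ratio form of reliability (not a formal intraclass correlation), the choice of mean change divided by the standard deviation of change in stable patients, and all scores are hypothetical.

```python
from statistics import mean, pvariance, stdev


def reliability_ratio(scores_by_patient):
    """Discriminative signal-to-noise, simplified (assumed form).

    Signal: variance of each patient's mean score (between patients).
    Noise: average variance of repeated scores within each patient.
    Returns signal / (signal + noise), a value between 0 and 1.
    """
    patient_means = [mean(scores) for scores in scores_by_patient]
    between = pvariance(patient_means)
    within = mean([pvariance(scores) for scores in scores_by_patient])
    return between / (between + within)


def responsiveness_ratio(change_improved, change_stable):
    """Evaluative signal-to-noise, simplified (assumed form).

    Signal: mean score change in patients who truly changed.
    Noise: spread of score change in patients who did not change.
    """
    return mean(change_improved) / stdev(change_stable)


# Hypothetical data: three stable patients, each scored twice (0 to 100 scale).
repeated_scores = [[40, 42], [60, 58], [80, 82]]
reliability = reliability_ratio(repeated_scores)

# Hypothetical score changes after treatment, and in unchanged patients.
improved = [20, 25, 15, 30]
stable = [1, -2, 0, 2, -1]
responsiveness = responsiveness_ratio(improved, stable)
```

In this made-up data set the patients differ widely from one another while each patient's repeated scores barely move, so the reliability ratio is close to 1; likewise, the improved patients change by far more than the scatter seen in stable patients, so the responsiveness ratio is well above 1.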
In today’s complex world of healthcare delivery, driven by issues of cost, efficacy, safety, effectiveness, and pay for performance, the key person in every respect remains the patient. Therefore, we are obligated to measure outcome from the patient’s perspective. Previously reported outcome measures have predominantly been clinician based. These have included measurements of deformity, range of motion, or the so-called objective measures such as evaluation of radiographs. More recently the logical and important trend has been to measure PROs (18). PROs are considered to be the reference standard for reporting clinical trials. It is necessary to distinguish outcomes that are self-administered (i.e., by the patient) from those that are not only self-administered but also patient derived or determined. There is some debate regarding what constitutes a PRO (19,20,21,22,23). The commonly accepted definition is “any report coming directly from patients, without interpretation by physicians or others, about how they function or feel in relation to a health condition and its therapy” (21). This definition works very well for simple outcomes such as measuring pain intensity over time using a VAS. The PRO takes on a different context when one is attempting to measure more complicated concepts such as QoL. A more complete definition of a PRO is that it is collected from a patient but, more importantly, the “information gained is necessarily of direct concern to the patient” (19). It is well recognized that the patient perspective is different from that of the clinician and, most importantly, the surgeon (14). Therefore, if we accept the first and simple definition of a PRO, that is, the patient is the source of the information, it becomes critical to define and/or label the content, construct, or concept of the specific PRO (24).
This content includes direct subjective assessment by the patient of elements of their health including: symptoms, function, well-being, health-related quality-of-life (HRQoL), perceptions about treatment, satisfaction with care received, and satisfaction with professional communication. The patient is asked to summarize his or her evaluation of the disease, treatment, or healthcare system interactions through various modes, providing perceptions related to the condition, its impact, and its functional implications” (22). It is evident from the literature that there is discussion and debate regarding the definition of a PRO, the context in which it is measured, the importance of patient input, and how it is analyzed and reported (19,20,21,22,23,24,25).
A recent systematic review of the literature identified three PROs for patients with FAI and labral tears (26). The authors identified the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), the Non-Arthritic Hip Score (NAHS), and the Hip Outcome Score (HOS). Critical appraisal of these outcomes leads to the conclusion that the WOMAC is patient-based and self-administered. However, it is disease specific for patients with osteoarthritis, who are older than those with FAI (27,28,29). The NAHS is made up of 20 questions, 10 of which are taken directly from the WOMAC and the remaining questions determined by consensus from pilot test interviews with patients of varying educational levels as well as with health professionals (30