Assessing Outcomes After Hip Surgery

CHAPTER 8 Assessing Outcomes After Hip Surgery




Introduction


The assessment of outcomes after any type of surgery can be categorized in a variety of ways. Simply put, the outcome of a procedure can be anything that is measured or observed. It can range from something as simple as a range-of-motion measurement to a complex, multifaceted, disease-specific, health-related quality-of-life questionnaire.


Outcomes can be considered objective, meaning that they are undistorted by emotion or personal bias and based on observable phenomena. Outcomes can also be described as subjective, meaning that the effect takes place within the mind and is modified by individual bias. The irony of this categorization when it comes to measuring outcomes in medicine or surgery is that we consider something like an x-ray to demonstrate objective outcomes but a visual analog pain assessment to show subjective outcomes. In fact, the interpretation of the x-ray is open to observer bias and therefore has a component of subjectivity. By contrast, a patient’s response to a visual analog pain scale can be reproduced and assessed for error, so it has the essential properties of an objective measurement. Whether an outcome is objective or subjective is less important than whether the thing being measured reflects the true outcome of a particular procedure.


With regard to hip outcome measures, several authors have addressed this area in the past. In 1972, Andersson assessed 77 patients with nine different methods and converted the final outcomes into a categorical scale of “good,” “fair,” and “bad.” The results were very disparate: the proportion of good outcomes ranged from 97.5% to as low as 30%, depending on the method used. The author’s final conclusion emphasized the importance of achieving agreement about which outcome measure should be used. In 1990, Callaghan and colleagues came to similar conclusions. In 1993, Bryant and colleagues used a statistical approach to analyze two separate groups of patients. They identified three core factors that were statistically independent: walking distance, hip flexion, and pain. These factors represented independent variables with respect to the outcome of hip arthroplasty. The authors concluded that combining these variables into a composite score is “arbitrary and without scientific foundation.” More recently, authors have recognized the need to assess health-related quality of life as a measure of patients’ health status. Ethgen and colleagues performed a systematic review of outcomes related to hip and knee arthroplasty. They identified several important outcome measures, but their focus was on whether arthroplasty surgery improves quality of life. They stated the following: “If clinicians are interested in going beyond the pathophysiology…, if they seek to perceive the broader implications of diseases and strategies implemented to counter these diseases, it is necessary to consider outcomes that encompass several dimensions of health, as a health-related quality-of-life instrument does.” With this compelling statement in mind, this chapter will focus on the quality and methodology of outcome measures that have been created or used by orthopedic surgeons to assess the management of traumatic and degenerative conditions of the hip.
The essential information is based on a systematic review performed by the authors.


Forty-one clinical rating systems for measuring outcomes in orthopedic patients with hip disease were identified. We will start with a general statement about outcomes and present a historic perspective. We will then classify the tools according to whether they were clinician or patient based, their method of administration (i.e., clinician-administered or self-administered), and their purpose (i.e., evaluative, discriminative, or predictive). In the next part, we will critically appraise each tool for its quality by evaluating its creation methodology, looking at its population of interest, and reviewing its psychometric properties (i.e., reliability, validity, and responsiveness). The final part of the chapter will describe the development of a new health-related quality-of-life instrument designed for young, active patients with hip problems.



Outcome assessment: general


Outcome measures can be classified as either disease specific, such as tools created to assess osteoarthritis, or joint specific, such as tools created to assess the outcome of any pathology of the hip. These measures can also be classified according to the person who completes the assessment. Traditionally, outcomes have been assessed by clinicians and include objective measures such as radiographic assessments; the clinician also asks the patient about pain and other subjective measures. These “clinician-based” or “clinician-administered” tools may not capture the outcomes that patients themselves perceive. Therefore, more recently, “patient-based” or “patient-administered” tools have been created.


The objective of the tool must also be considered. If the goal is to follow patients over time and to assess changes, an evaluative index is necessary, because it can measure the magnitude of longitudinal change in an individual or a group of individuals. If the objective is to differentiate among patients to determine treatment, a discriminative index should be used, because it distinguishes among individuals or groups. Finally, to prognosticate, a predictive index can be used to classify individuals into a set of predefined measurement categories.


The second factor to consider when choosing an outcome measure is the quality of the tool, which is determined by the creation methodology. The creation of a questionnaire should follow a structured methodology that includes generating and reducing items. After it has been created, the tool must be tested for psychometric properties (i.e., reliability, validity, and responsiveness) within the target population.


Reliability refers to the ability of the tool to yield consistent and reproducible results. The questionnaire must be reproducible, so that the results are the same for the same patient with the same amount of pathology on two separate occasions, whether measured by the same rater or by different raters. In addition, the tool must be internally consistent: all of its items should pertain to the concept that is being assessed.
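The chapter does not name the statistic used to judge internal consistency; a widely used one is Cronbach’s alpha, which compares the variance of the individual items with the variance of the total score. The minimal Python sketch below assumes a complete response matrix (one row per patient, one column per item, no missing answers); the function name `cronbach_alpha` and the sample data are illustrative only.

```python
from statistics import pvariance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a complete patient-by-item response matrix.

    responses[i][j] is patient i's score on item j (no missing values assumed).
    """
    n_items = len(responses[0])
    if n_items < 2 or any(len(row) != n_items for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    # Variance of each item across patients.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    # Variance of each patient's total (summed) score.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: three items that rise and fall together across four patients.
consistent = [[1, 1, 2], [2, 2, 2], [3, 3, 4], [4, 4, 4]]
print(round(cronbach_alpha(consistent), 3))
```

When every item tracks the same underlying concept, alpha approaches 1; items that pull in unrelated directions lower it.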


Validity refers to how well an instrument fulfills the function for which it is being used. The different types of validity are face, content, construct, and criterion validity. Face validity ensures that the questionnaire “looks good” or appears to measure the intended content or trait. Without face validity, the tool will not be accepted. Content validity refers to the comprehensiveness of the instrument and how well the items represent all relevant concerns. Construct validity is the extent to which a particular tool can be shown to measure a hypothetical construct. Many of the factors that affect the ultimate outcome of a treatment (e.g., patient satisfaction) are intangible and therefore difficult to test. Construct validity tests these intangible qualities by proposing and testing logical relationships among different tools to measure similar or different outcomes. Construct validity can be tested with the use of convergent validity (when a high positive correlation is desired) and divergent validity (when a high negative correlation is desired). Finally, criterion validity is the validity of the questionnaire as compared with a gold standard. However, there is often no gold standard against which a particular questionnaire or tool can be compared.
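Convergent and divergent validity are usually checked by correlating scores from the new tool with scores from other instruments. The Python sketch below computes a Pearson correlation and tests the expected direction as the text describes it (high positive for convergent, high negative for divergent). The 0.5 threshold, the function names, and all scores are hypothetical assumptions for illustration, not published cutoffs or data.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists of at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / math.sqrt(var_x * var_y)


def construct_check(new_tool, comparator, expected, threshold=0.5):
    """Return (r, supported) for a hypothesized convergent or divergent relationship.

    threshold=0.5 is an illustrative assumption, not a published cutoff.
    """
    r = pearson_r(new_tool, comparator)
    if expected == "convergent":
        return r, r >= threshold  # high positive correlation desired
    if expected == "divergent":
        return r, r <= -threshold  # high negative correlation desired
    raise ValueError("expected must be 'convergent' or 'divergent'")


# Hypothetical scores for five patients.
new_function = [80, 65, 90, 40, 55]    # higher = better function
other_function = [78, 60, 95, 45, 50]  # established function score
disability = [15, 30, 5, 70, 45]       # higher = worse disability
print(construct_check(new_function, other_function, "convergent"))
print(construct_check(new_function, disability, "divergent"))
```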


Responsiveness refers to the ability of an instrument to measure change. However, this may be limited by ceiling or floor effects. Ceiling effects occur when the ability to record improvement is limited by the maximum obtainable value, and floor effects occur when the ability to record deterioration is limited by the minimum obtainable value.
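Ceiling and floor effects can be screened by counting how many patients already sit at the maximum or minimum obtainable score. In this minimal Python sketch, the 15% cutoff is an assumption (it is a commonly used convention, not a figure from this chapter), and the function names and scores are hypothetical.

```python
def ceiling_floor_fractions(scores, min_score, max_score):
    """Fraction of patients at the maximum (ceiling) and minimum (floor) score."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    at_ceiling = sum(1 for s in scores if s >= max_score) / n
    at_floor = sum(1 for s in scores if s <= min_score) / n
    return at_ceiling, at_floor


def flag_effects(scores, min_score, max_score, cutoff=0.15):
    """Flag a ceiling or floor effect when more than `cutoff` of patients hit the limit.

    cutoff=0.15 is an assumed convention, not a value taken from the chapter.
    """
    ceiling, floor = ceiling_floor_fractions(scores, min_score, max_score)
    return {"ceiling": ceiling > cutoff, "floor": floor > cutoff}


# Hypothetical 0-100 scores for a young, active group after surgery.
postop = [100, 100, 95, 100, 88, 100, 72, 100, 91, 100]
print(ceiling_floor_fractions(postop, 0, 100))
print(flag_effects(postop, 0, 100))
```

A patient already scoring 100 cannot register further improvement, which is why a large fraction at the ceiling limits responsiveness.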



Historic summary


There are many outcome measures that have been created for use in the population of orthopedic patients with hip disease. Most outcomes have been created for older patients who either require hip replacement or have a fracture.


The first published outcome tool for the hip was created by Ferguson and Howorth to assess the operative management of children with slipped capital femoral epiphysis. The authors measured the range of motion of the hip in all planes and multiplied each measurement by a different modifier to give the motions different weightings. This index of mobility was then modified by Gade, who changed the weighting of the motions for use in a total hip arthroplasty population. In 1954, Shepherd modified this index to use it as the mobility assessment for his tool, which also includes assessment of pain and function as well as the patient’s own assessment. Shepherd’s modification was further refined by Harris in 1969.
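To make the weighting idea concrete, here is a minimal Python sketch of a weighted mobility index: each range-of-motion measurement is multiplied by a modifier and the products are summed. The weights are hypothetical placeholders chosen for illustration; they are not the modifiers published by Ferguson and Howorth, Gade, Shepherd, or Harris, and the name `mobility_index` is likewise illustrative.

```python
# Illustrative weights only; these are NOT the published Ferguson-Howorth,
# Gade, Shepherd, or Harris modifiers.
HYPOTHETICAL_WEIGHTS = {
    "flexion": 1.0,
    "abduction": 0.8,
    "external_rotation": 0.4,
    "adduction": 0.3,
    "internal_rotation": 0.3,
    "extension": 0.1,
}


def mobility_index(
    rom_degrees: dict[str, float],
    weights: dict[str, float] = HYPOTHETICAL_WEIGHTS,
) -> float:
    """Weighted sum of hip range of motion (degrees) across the weighted planes."""
    missing = weights.keys() - rom_degrees.keys()
    if missing:
        raise ValueError(f"missing measurements: {sorted(missing)}")
    return sum(rom_degrees[motion] * w for motion, w in weights.items())


patient = {"flexion": 100, "abduction": 30, "external_rotation": 30,
           "adduction": 20, "internal_rotation": 10, "extension": 10}
print(mobility_index(patient))
```

Passing a different weight table, which is in effect what Gade did, changes the index without changing the underlying measurements.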


The first functional assessment appears to have come from France; it was published by Judet and Judet in 1952 and then, in a different form, by Merle d’Aubigné. The Merle d’Aubigné-Postel hip score was created in 1954 to grade the functional value of the hip in 405 patients who were treated with arthroplasty for fractures of the femoral head or neck, osteoarthritis, or congenital dislocations. The score has subsequently been adapted in several ways: Charnley used it to assess low-friction total hip arthroplasties, Dutton and colleagues modified it to assess patients with hip resurfacing, and Matta and colleagues adapted it for patients with acetabular fractures. The Merle d’Aubigné-Postel score continues to be widely used to assess hip arthroplasty in Europe. It was also used by Letournel and Judet to assess acetabular fracture treatment, and it has since become the primary outcome measure for patients with acetabular fractures.


In North America, the Harris Hip Score is more commonly used to assess total hip arthroplasty. This score was created in 1969 “in an effort to encompass all the important variables into a single reliable figure, which is both reproducible and reasonably objective. The system was also designed to be equally applicable to different hip problems and different methods of treatment.” This disease-specific measure was originally created to evaluate patients treated with mold arthroplasty for traumatic arthritis after a hip dislocation or an acetabular fracture. In 1973, Ilstrup and colleagues modified the Harris Hip Score to create a computerized method of following the results of total hip arthroplasty. Over a 40-year period, many other scores were developed as clinician assessments of arthritis.


In 1968, Goodwin developed what was considered to be a predictive tool for patients treated for hip fracture. In 1982, Keene and Anderson developed the Hip Fracture Functional Rating Scale as a predictive measure to help with discharge planning and placement. Other outcome measures were developed to assess traumatic hip dislocations and to assess patients with slipped capital femoral epiphysis.


Before 1985, all measures were clinician based, and none of these tools was created with a standardized methodologic process; rather, one or more clinicians created each tool on the basis of what they felt to be clinically relevant. Since 1987, when the methodology for creating a tool was described, newer tools have followed a structured creation format. Most tools have still been directed toward the patient population with arthritis, which generally includes patients with an average age of more than 70 years.



Classification (Table 8-1)


All tools created before 1986 were clinician based, whereas all but two of the tools created after 1986 are patient based. Of the patient-based tools, 10 are self-administered, 3 are meant to be administered by a clinician, and 1 can be either self- or clinician-administered. There are 3 predictive tools; all of the other tools are evaluative, and there are no discriminative tools. The scores have been created mainly for two populations: middle-aged to elderly patients undergoing hip replacement and elderly patients with hip fractures. Several other tools were created to be inclusive of all ages. One tool was created to capture higher activity levels in an older arthritic population, and two tools, the Non-Arthritic Hip Score and the Hip Outcome Score, were created specifically to capture the concerns of young, active patients with pre-arthritic hips.





Psychometrics (Table 8-2)


Table 8-2 shows the evaluative tools that have been tested for reliability, validity, and responsiveness. Of the clinician-based tools, only the Harris Hip Score and the Lequesne Index have been tested for reliability and shown to have internal consistency. Ten of the patient-based measures have demonstrated internal consistency. Two others, the Total Hip Arthroplasty Outcome Evaluation and the Hip Rating Questionnaire, have also been tested for internal consistency, but with poor results. The majority of the questionnaires have demonstrated adequate reproducibility (test-retest reliability). All tools in Table 8-2 were presumed to have both face and content validity, because patients and clinicians were involved in their creation or because the tools have been used by other surgeons in clinical practice or research. All have been tested for construct validity against other questionnaires. Only three tools have been tested for criterion validity: the Hip Rating Questionnaire was compared with the 6-minute walk test; the Musculoskeletal Function Assessment (MFA) was compared with stair climbing and walking speed; and the Lower-Extremity Measure was compared with the timed up-and-go test.
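The chapter does not say which coefficient the individual studies used for test-retest reliability. One common choice is the intraclass correlation coefficient; the Python sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form described by Shrout and Fleiss as ICC(2,1). The function name and data are illustrative, and a given study may have used a different ICC form.

```python
def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is patient i's score on occasion (or rater) j.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with >= 2 patients and >= 2 occasions")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]                      # per patient
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]  # per occasion
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest scores for three patients.
identical = [[10, 10], [20, 20], [30, 30]]
shifted = [[10, 12], [20, 22], [30, 32]]  # every retest 2 points higher
print(icc_2_1(identical), icc_2_1(shifted))
```

Because this form measures absolute agreement, a systematic 2-point shift at retest pulls the coefficient slightly below 1 even though the patients keep the same rank order.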




Recommendations



General Musculoskeletal Complaints


There are two outcome measures that have been well designed for general musculoskeletal complaints of the lower extremity: the MFA and the American Academy of Orthopaedic Surgeons Outcomes Questionnaires (AAOS Outcomes Questionnaires). The MFA was developed in 1996 to detect differences in function among patients with musculoskeletal disorders of the extremities. It is an evaluative, self-administered, patient-based tool that is appropriate for adult patients (average age, approximately 40 years) with various disorders, including upper-extremity injuries (45%), lower-extremity injuries (45%), repetitive-motion disorders (6%), osteoarthritis (3%), and rheumatoid arthritis (2%). It was created by a group of clinicians that included academic and community orthopedic surgeons as well as rehabilitation medicine specialists, physical therapists, and occupational therapists. It was developed with a formal methodology that included item generation and item reduction. Items were identified by a review of existing scores and by interviews with patients and clinicians. Item reduction followed a formal process of determining the items that were prevalent, important, representative, and measurable. The MFA is consistent and reproducible, and face and content validity were ensured during its creation. Good construct validity was shown by comparing MFA scores with physicians’ ratings of patient functioning. When tested for criterion validity against stair climbing and self-selected walking speed, however, the tool showed poor agreement. This lack of correlation likely reflects the fact that the MFA was designed for a broad range of musculoskeletal disorders, including upper-extremity injuries.


The AAOS Outcomes Questionnaires were “designed for the efficient collection of outcomes data from patients of all ages with musculoskeletal conditions.” The outcome tools were separated into a Lower Limb Core Scale, a Hip and Knee Core Scale, a Sports/Knee Module, and a Foot and Ankle Module. The Lower Limb Core and the Hip and Knee Core Scales are essentially identical instruments with seven questions; the only essential difference is that the words “lower limb” are replaced with “hip/knee.” This self-administered, patient-based, evaluative tool was created in 2004 with the use of a modified group technique that involved surgeons and health-services researchers. The group identified items after a review of the literature and then reduced these items by consensus. It is appropriate for adult patients (average age, approximately 48 years) with hip or knee complaints. It is reliable, with good internal consistency and reproducibility, as demonstrated by a test followed by a retest 24 hours later. Although patients were not involved in the generation or reduction of items, they were asked whether the questionnaires addressed their concerns, to ensure face and content validity. The AAOS tool has shown good to excellent construct validity against the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC; Pearson r, 0.89), the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36; Pearson r, 0.7), physician assessments of pain (Pearson r, 0.69), and physician assessments of function (Pearson r, 0.73). The AAOS questionnaire is responsive, but it may show slight ceiling effects.



Osteoarthritis of the Hip


The best tool for general osteoarthritis of the hip is the Hip Disability and Osteoarthritis Outcome Score (HOOS). The HOOS is a patient-based, self-administered, evaluative tool created from the WOMAC. The WOMAC is a patient-based self-assessment tool that was initially developed in 1988 for patients with symptomatic osteoarthritis of the hip or knee. Its items were generated from interviews with 100 patients with osteoarthritis, and it was tested in a group of patients with an average age of 71 years who were undergoing total hip arthroplasty. The WOMAC has been found to be both consistent and reproducible, and face and content validity were ensured during its development. Construct validity was determined by testing it against the SF-36. The tool is also responsive; however, it does not capture the concerns of more active patients.


The HOOS was developed in part to capture the higher activity levels of patients with hip osteoarthritis. Its questions were generated by interviewing more than 100 patients with hip disability, with or without hip osteoarthritis. Items were reduced by factor analysis to 40, and they include all of the WOMAC items in unchanged form. The tool has high reliability for all components of the questionnaire and high internal consistency. Content validity was ensured by having a subgroup of 26 patients rate the relevance and importance of each item on a Likert scale, with “1” indicating that the item was irrelevant and unimportant and “3” indicating that it was very relevant and very important. Construct validity was evaluated by comparing HOOS scores with those of the SF-36 general health status questionnaire. Responsiveness to clinical change was evaluated by calculating standardized response means and comparing the results with those of the WOMAC. This tool is well suited to patients with an average age of 65 to 70 years who have primary hip osteoarthritis and are undergoing total hip replacement.
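The standardized response mean mentioned above is the mean within-patient change divided by the standard deviation of that change; a larger value indicates a more responsive instrument. The minimal Python sketch below assumes paired preoperative and postoperative scores for the same patients; the function name and numbers are hypothetical, not HOOS or WOMAC data.

```python
from statistics import mean, stdev


def standardized_response_mean(before: list[float], after: list[float]) -> float:
    """SRM = mean within-patient change / standard deviation of that change."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [post - pre for pre, post in zip(before, after)]
    sd = stdev(changes)  # sample standard deviation of the change scores
    if sd == 0:
        raise ValueError("change scores do not vary; SRM is undefined")
    return mean(changes) / sd


# Hypothetical preoperative and postoperative subscale scores (0-100).
pre = [40, 50, 60, 45]
post = [60, 65, 85, 60]
print(round(standardized_response_mean(pre, post), 2))
```

Comparing two instruments in this way means computing the SRM for each on the same patients and checking which one registers the change more strongly relative to its own noise.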

Jul 24, 2016 | Posted in MUSCULOSKELETAL MEDICINE
