Rating Systems and Outcomes of Total Hip Arthroplasty

Conor J. Hurson and Michael J. Dunbar

Key Points

• The outcome of THA is generally excellent, yielding a large standard effect size.

• The large standard effect size associated with THA introduces a paradox of outcomes assessment, in that patients find it subjectively difficult to interpret subtle differences in outcomes associated with the introduction of new THA technology in the face of overall improvement post surgery.

• The gold standard for assessment of outcomes after hip arthroplasty is prosthesis survivorship. It is limited by the fact that revision status is a relatively blunt metric and generally is nonrepresentative of function, degree of pain relief, and overall patient satisfaction after hip arthroplasty.

• Higher-precision metrics and scoring systems are necessary to help introduce the next phase of surgical innovation.

• Survivorship for THA is generally improving over time.

• All rating systems are a construct for the “true” outcome. As such, all rating systems are subject to variable bias.


The Oxford English Dictionary defines outcome as the result or effect of treatment.1 Rating systems are tools or metrics used to assess an outcome. Numerous types of rating systems have been proposed for assessment of the outcome of total hip arthroplasty (THA). These include objective metrics, usually completed by the surgeon or allied health care professional, and subjective metrics, usually completed by the patient. Traditionally, and early in the development of THA, outcomes have concentrated on objective outcomes to record the success of the THA in terms of survivorship or reduction in complications. This was a necessary first step in the introduction, evolution, and refinement of THA as a viable and highly successful procedure. However, because the World Health Organization (WHO) has defined health as “…not only the absence of infirmity and disease but also a state of physical, mental and social well-being,”2 subjective outcomes have become more prevalent to augment some of the blunter objective outcome metrics.

Hip arthroplasty has been shown to have a major impact on health-related quality of life when preoperative status and postoperative status are compared.3-7 Such profound results make preoperative and postoperative comparisons of different prosthetic designs, surgical techniques, and so forth, using a given questionnaire, difficult to interpret and potentially irrelevant because the assumed subtle differences in questionnaire results would be lost in the large signal. Paradoxically, the signal for preoperative and postoperative comparisons after hip arthroplasty is so loud (large) that in effect it functions as noise and obscures the subtler signal of interest. Additionally, because innovation in hip arthroplasty has become progressively more incremental, it is inherently difficult to distinguish between subtle changes in treatments (Fig. 61-1). Therefore, a surgeon should carefully consider the various outcome metrics available before deciding which is most effective and appropriate for the desired application.
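The paradox described above can be illustrated with a small calculation. The following is a minimal sketch, assuming that the standard effect size is taken as the mean change in score divided by the standard deviation of the preoperative scores (one common convention; others exist). The scores and the two implant groups are hypothetical and are not drawn from any study.

```python
from statistics import mean, stdev

def standardized_effect_size(pre, post):
    """Mean change divided by the standard deviation of the baseline scores."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(pre)

# Hypothetical questionnaire scores (0-100, higher is better).
pre = [20, 25, 30, 35, 40]

# Implant A: every patient improves by 50 points.
post_a = [score + 50 for score in pre]
# Implant B: every patient improves by 52 points (a subtle advantage).
post_b = [score + 52 for score in pre]

es_a = standardized_effect_size(pre, post_a)
es_b = standardized_effect_size(pre, post_b)

print(f"Effect size, implant A: {es_a:.2f}")
print(f"Effect size, implant B: {es_b:.2f}")
print(f"Difference between implants: {es_b - es_a:.2f}")
```

In this invented example both implants produce an effect size above 6, whereas the difference between them is about 0.25. The comparison of interest is therefore small relative to the overall improvement after surgery.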


Figure 61-1 Asymptote of surgical innovation for total hip arthroplasty. Phase 1 innovation involved radical changes in technology from innovation to innovation (largely historic). Phase 2 represents modern, substantially more subtle innovative changes.

Although some consensus has been reached regarding which categories of outcome metrics should be applied to arthroplasty patients, no agreement exists as to which specific metrics are most appropriate. Instead, multitudes of metrics have been put forward in the literature, and new metrics continue to be introduced. Researchers are consequently forced to choose a metric based on its published psychometric properties, or based on precedent and extraneous political factors. This practice has led to significant variation in the reporting of outcomes post arthroplasty. Although general trends in outcomes can be contrasted with various outcome metrics, subtler differences in outcomes are lost in the psychometric variability between outcome tools. The need for consensus on the most appropriate outcome metrics for wide-scale employment was alluded to by Lord Kelvin when he said, “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.”8

Outcome metrics have been criticized because of the perception that they yield “soft” data, at least in comparison with more standardized technological laboratory tests that permeate the medical field, such as serum potassium or hemoglobin. Such tests are believed to yield “hard” data because the method for such tests is well described, the precisions are high, and the reproducibility is excellent. Still, the perception that questionnaires yield only soft data must not prevent clinically relevant questionnaire data from being utilized, because these data, perhaps more than any other, speak to the humanistic side, or art, of medicine.


Objective Outcome Metrics

One of the earliest assessments of hip mobility was devised in 1931 by Fergusson and Howarth to assess patients with slipped upper femoral epiphysis.9 In this purely objective assessment, points were allocated for hip flexion and abduction, as well as for adduction and hyperextension. In 1954, Merle d’Aubigné and Postel further developed the evaluation tool into a hip scoring system by including a subjective component that scored pain.10

The hip scoring system of Merle d’Aubigné and Postel was modified by Charnley in 1972, and has since become one of the most widely used hip assessment tools by orthopedic surgeons. The Charnley modification of the Merle d’Aubigné and Postel hip score assesses hip movements, pain, and walking. It is important to note that these categories require input from both the patient and the surgeon (or subjective and objective information), and the scores from each section are not combined for a total score.

The Harris Hip Score, developed in 1969, is one of the most widely used scoring systems to report outcomes following THA. It is a clinician-completed scoring system used to evaluate pain, function, range of motion, and absence of deformity. The function of a patient is decided by walking habits and the ability to do specified activities.

Surgeon-derived outcome measures can differ significantly from patient satisfaction after THA.11 Discrepancies between patient and surgeon perspectives are particularly large when patients are dissatisfied with replacement surgery. Thus, appraising the success of a THA using objective information exclusively will account for only a portion of the complete picture; therefore, subjective metrics are an important component of a hip scoring system.

Inclusion of Subjective Metrics

Subjective components in scoring systems became increasingly prevalent after the WHO altered its definition of health. Objective measures are unable to capture the mental and social well-being of patients. Unfortunately, subjectivity can make it difficult to make comparisons between different scoring systems; even the same scoring system reproduced in different languages can produce inconsistent results.12

Reporting of total scores can have the effect of blurring results. Individual section scores may not be proportional and cannot be meaningfully added together. Proportionality is particularly difficult to achieve when objective (e.g., radiologic measurements) and subjective (e.g., patient pain assessments) composite scores are combined.

Four types of questionnaires include at least some subjective components: General Health, Disease Specific, Joint Specific, and Patient Specific. Each of these will be discussed in the “Basic Science” section of this chapter (Box 61-1).

Precision Objective Metrics

Over the past decade, high-precision metrics such as radiostereometric analysis (RSA) and gait analysis have become increasingly available. Currently, these techniques are used mainly as research tools, but as they evolve and less expensive surrogates are validated, their use will become more pervasive and essential to assessment of arthroplasty outcomes.


The gold standard for assessment of outcomes after THA is prosthesis survivorship. This survival analysis has been a powerful tool in the long-term assessment of replacement arthroplasty and allows comparison among types or series of joint replacements. Survivorship analysis was first used in orthopedics by Dobbs13 in 1980 and remains popular today.

Survival analysis provides very useful information; for this reason, many countries have developed joint replacement registries. These are national databases that monitor the survival of implants based on many variables, such as material, fixation technique, and size.

Although survival analysis is an essential tool for measuring THA outcomes, it is a crude one. Survivorship is based on an endpoint (e.g., revision, death) and often fails to account for the complexity of the variables involved. An implant can be revised (or not revised) for a variety of reasons; therefore large sample sizes can be required to allow conclusions about a particular procedure. Hence, survival analysis has no predictive power, and its applications are limited to post hoc and trend analyses.

Chaotic Innovation

Unfortunately, the introduction of new technology usually does not follow a stepwise algorithm and could be considered chaotic. The initial step, preclinical testing, is robust in North America. However, instead of being incorporated into prospective randomized studies before general release, new technologies are often made immediately available to a wide surgical community. Little emphasis is placed on formal study of clinical outcomes. Only a few specialized academic centers conduct prospective randomized studies; this again introduces a reporting bias into the literature. Finally, most published studies on new technology are retrospective in nature, often published after the technology has already changed. See Figure 61-2 for an illustration that demonstrates chaotic innovation.


Figure 61-2 Unfortunately, the introduction of new technology usually does not follow a stepwise algorithm and could be considered chaotic.

Basic Science

Outcome Metrics

With the advent of prosthetic components that demonstrate predictably good results, it became evident that more formalized outcome metrics were necessary. The initial response was that surgeons assessed the results of their interventions. Purely surgeon-derived outcome assessments were quickly shown to be inadequate without subjective data. Sir John Charnley, in 1972, modified the Merle d’Aubigné and Postel hip score to assess the outcome of his prosthesis; this system has become one of the most widely used hip scoring systems. This score assesses hip movement, pain, and walking. It is important to note that these categories require input from both the patient and the surgeon.

Survival Analysis

The Kaplan-Meier14 method is most commonly used to estimate prosthesis survival and to construct survival plots. It provides results that are independent of time intervals, in that survival is estimated at every failure time. Statistically significant differences can be assessed by using the log rank test. However, the log rank test does not allow adjustment for confounding factors. Relative risks for revision can be assessed and adjustments made for differences between compared groups (e.g., age, gender, diagnosis, other confounding factors) by using the Cox multiple regression model.
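The statement that survival is estimated at every failure time can be made concrete with a short sketch. The function below is an illustrative product-limit (Kaplan-Meier) estimator written in plain Python; the function name, variable names, and follow-up data are hypothetical and are not taken from the chapter. Hips censored at the same time as a revision are assumed to still be at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, revised):
    """Return [(time, survival)] estimated at each distinct failure time.

    times   -- follow-up in years for each hip
    revised -- True if follow-up ended in revision, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Revisions observed at time t.
        failures = sum(1 for ti, r in zip(times, revised) if ti == t and r)
        # All hips whose follow-up ends at time t (revised or censored).
        leaving = sum(1 for ti in times if ti == t)
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical series of eight hips.
times = [2, 3, 3, 5, 8, 10, 10, 10]
revised = [True, False, True, True, False, False, False, False]

curve = kaplan_meier(times, revised)
for t, s in curve:
    print(f"year {t}: estimated survival {s:.3f}")
```

With these invented data the curve steps down only at years 2, 3, and 5, where revisions occur; censored hips reduce the number at risk without producing a step. In practice an established statistical package would be used rather than hand-written code, and that package would also supply the log rank test and Cox regression.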

A 95% confidence interval should be given when survival results are presented. These can be presented in tables or on curves (Fig. 61-3). Murray and associates15 recommended the inclusion of a “worst case” curve—in which all patients lost to follow-up are considered failures—to provide a statistically accurate statement of survival. In addition, Lettin and colleagues16 recommended that at least 40 surviving subjects are required to produce reliable results.


Figure 61-3 An example of a survival curve with a 95% confidence interval. The survival curve is represented by the solid line, for the sample of subjects. Confidence intervals, represented by the dotted line, become wider as the sample size decreases over time.
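The confidence interval and the "worst case" curve can be sketched in the same way. The example below is a minimal illustration, assuming Greenwood's formula for the variance of the Kaplan-Meier estimate and a simple normal approximation for the 95% limits (other interval methods exist and may behave better with small samples). The status labels and data are hypothetical.

```python
import math

def km_with_ci(times, failed, z=1.96):
    """Kaplan-Meier estimate with Greenwood-based 95% limits.

    Returns [(time, survival, lower, upper)] at each failure time.
    """
    at_risk = len(times)
    survival, var_sum = 1.0, 0.0
    rows = []
    for t in sorted(set(times)):
        d = sum(1 for ti, f in zip(times, failed) if ti == t and f)
        leaving = sum(1 for ti in times if ti == t)
        if d:
            survival *= 1 - d / at_risk
            if at_risk > d:
                # Greenwood's running sum: d / (n * (n - d)).
                var_sum += d / (at_risk * (at_risk - d))
            se = survival * math.sqrt(var_sum)
            lower = max(0.0, survival - z * se)
            upper = min(1.0, survival + z * se)
            rows.append((t, survival, lower, upper))
        at_risk -= leaving
    return rows

# Hypothetical series: each hip is revised, lost to follow-up, or still in situ.
times = [2, 3, 3, 5, 8, 10, 10, 10]
status = ["revised", "lost", "revised", "revised",
          "lost", "in_situ", "in_situ", "in_situ"]

# Standard curve: patients lost to follow-up are censored.
standard = km_with_ci(times, [s == "revised" for s in status])
# Worst-case curve: patients lost to follow-up are counted as failures.
worst = km_with_ci(times, [s in ("revised", "lost") for s in status])

for label, rows in (("standard", standard), ("worst case", worst)):
    t, s, lo, hi = rows[-1]
    print(f"{label}: survival {s:.3f} at year {t} (95% CI {lo:.3f} to {hi:.3f})")
```

In this invented series the final survival estimate falls from 0.600 on the standard curve to 0.375 on the worst-case curve, which shows how strongly loss to follow-up can influence a reported result when the series is small.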

Revision is a definite and easily reproducible endpoint, but it can be influenced by extraneous factors such as a patient’s fitness for surgery and severity of pain. Other endpoints, such as the presence of severe pain, low functional scores, and radiographic failure, should also be included.

Arthroplasty Registries

Arthroplasty registries use prosthesis survival as the primary outcome. Survival analysis is a definitive metric that facilitates comparison of outcomes between nations. Currently, 15 national arthroplasty registries have the potential to compare and contrast survivorship outcomes.17-20 However, such comparisons are limited with respect to variations in demographics, including age at time of operation, diagnostic groupings, body mass index, gender, and activity levels. Research efforts are directed at defining the demographics of each nation/center in detail so that the denominator of the comparative data can be determined. Without this level of research, comparison of outcomes in survivorship between nations/centers is prone to misinterpretation. Furthermore, the specific method of defining survivorship should be standardized.15 For example, Cox’s regression is a particularly useful method because it accounts for other factors such as age and gender, which are known to have an effect on outcomes. If such factors are not considered in outcome analyses, reported differences in survival curves between various prostheses are difficult to interpret, particularly on an international basis.

Arthroplasty registries function best as a surveillance tool for implant failure. As such, favorable and unfavorable trends in the outcomes of certain prostheses can be easily determined and disseminated back to the orthopedic community in a quality improvement feedback cycle. However, because arthroplasty registries are surveillance tools, there is an inherent lag in the reporting of outcomes; this creates a potential for suboptimal implants or techniques to penetrate into and become part of the clinical norm before detection by the registry. A more accurate and predictive form of survivorship analysis would have the advantage of limiting new technology and techniques to fewer patients than is necessary to see trends with arthroplasty registries.

The gold standard for assessment of outcomes after hip arthroplasty is prosthesis survivorship. However, modern advances in prosthetic design and technique are such that the threshold for joint arthroplasty has moved from salvage operations performed in extreme cases to an intervention designed to improve the quality of life in patients who otherwise might cope without surgery. Hence, judging the success of the surgery may relate more to subtler improvements in quality of life, including relief of pain and improvement in function. Furthermore, technological innovation has improved the design of prostheses, ensuring survival in situ, barring infection, for at least a decade with relative certainty.18,21,22 Consequently, the homogeneity of current prostheses (with respect to stable and lasting designs) has produced an emerging emphasis on quantifying subtler outcomes after arthroplasty.

Limitation of Survivorship Analysis

Arthroplasty registries rely on revision status as the sole endpoint for defining outcomes after arthroplasty. Revision status is a useful measure because it is relatively easy to define and the incidence of revision is definite. Although definitive, revision status is a relatively blunt metric and is generally nonrepresentative of function, degree of pain relief, and overall patient satisfaction after hip arthroplasty. Furthermore, different surgeons have different thresholds for performing revisions, and not all patients who require revision surgery undergo the procedure because of coexisting medical problems, personal wishes, and so forth.23 Revision status yields data only on the small minority of operations that fail.24 The same set of arguments generally holds true for the outcome of continuous migration, as defined by RSA, which essentially acts as a surrogate for revision status. Although some evidence suggests that subjective outcomes may be correlated with RSA-defined migration patterns, this phenomenon is not widely reported in the literature.25

Subjective Outcomes

Protagoras mused that “man is a measure of all things.”26 The implication of this statement speaks to the conceptualization that the distinction between mind and body is blurred, or indeed that there is no distinction at all. Although the Western philosophical distinction between mind and body has its origins with the ancient Greeks, it was the works of René Descartes that formalized the modern distinction between mind and body.27 According to Descartes, the rational soul is an entity distinct from the body that may or may not be aware of signals passing through the body via interfibrillar spaces. The interfibrillar spaces (i.e., the sensory nervous system) were “extended” into the physical world, while the rational soul (i.e., consciousness) was not. This distinction between mind and body has persisted into modern Western medical thought.

In 1947, the WHO defined health as follows: “Health is not only the absence of infirmity and disease but also a state of physical, mental and social well-being.” This definition reintroduced the concept that mind and body are in fact one and the “well-being” of the mind and body combined represents health. Subsequently, the measurement of health moved from simply defining the success of a procedure by determining its effect on infirmity and disease, to the more ambitious approach of defining what effect the intervention had on physical, mental, and social well-being. By this definition, it was no longer adequate to define the outcome of a hip arthroplasty, for example, simply by stating what the range of motion was or what the impact was on mobility. Instead, a more comprehensive metric was needed.

The definition of health put forth by the WHO was perhaps the impetus for the modern movement to measure physical, mental, and social well-being. The first attempts at quantifying general health involved single-item global ratings that were designed to augment organ-specific or more physiologic outcomes. Over time, a large number of questionnaires were developed that asked more questions around various aspects of health, such that separate scores for each of these health domains were generated. Domains that attempted to account for physical, mental, and social well-being included Emotional Reaction, Sleep, Social Isolation, Body Pain, and Social Functioning, for example. Advanced study and refinement of these tools continue today. The introduction and evolution of generic (or general) health measurements have been well documented.28 Measurements of this sort are often referred to as “subjective” and are difficult to quantify. Still, some form of logical metric was imperative for further research. This dilemma was eloquently alluded to by Lord Kelvin when he said, “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.”8 The WHO continues to be interested in this area of outcomes research. At a workshop in January of 2000 under the umbrella of the Bone and Joint Decade 2000-2010, the need to standardize outcome metrics for musculoskeletal research was discussed.29

Although the WHO definition of health may be largely responsible for the emergence of general health outcome questionnaires, the first aspect of the definition, that is, “…the absence of infirmity and disease…,” has not been lost on researchers. A similar evolution has come about in health outcome questionnaires focused on the organ (or site) or physiologic process (disease).

Subjective Health Outcome Questionnaires

Psychometric Considerations: What Makes a Good Questionnaire?

Definition of Psychometrics

Psychometrics can be defined as “the scientific measurement of mental capacities and processes and of personality.”30 In other words, psychometrics is the process that allows researchers to apply scientific methods to the measurement of subjective outcomes. In practical terms, the published psychometric properties of a questionnaire pertain mostly to validation of the questionnaire, or to defining how well the questionnaire measures what it is supposed to measure, in a global sense. The validation process usually involves three specific aspects of questionnaire testing: validity, reliability, and responsiveness.
