Assessing Clinical Results and Outcome Measures






  • CHAPTER OUTLINE

    • Survivorship Analysis
    • Arthroplasty Registers
    • Methods of Early Prediction of Failure
      • Statistical Models
      • Radiologic Models
    • Subjective Outcome Measures
      • Validity
      • Reliability
      • Responsiveness
    • Frequently Employed Outcome Measures
    • Interpreting Results of Subjective Outcome Measures
    • Identification of Modifiable Patient Factors
    • Summary



The concept of outcome measurement in arthroplasty surgery is multifaceted and requires consideration of several aspects. In its bluntest form, outcome is related to longevity of the prosthesis (i.e., survivorship). Although this outcome is simple to quantify, it gives no information on the performance of the implant clinically or its impact on patients’ lives—it does not give a true measure of the value of the procedure in either personal or societal terms. Such a measure is increasingly important in the current socioeconomic climate where the cost of health interventions must be justified.


In addition to the use of outcome measures to prove the efficacy of arthroplasty relative to other health interventions, there is the issue of quality improvement (i.e., comparing different prostheses or techniques) and of clinical governance, to enable individuals and institutions to assess, compare, and improve their performances. This chapter considers the various outcome measures in current use, their relative strengths and limitations, and areas of development in the attempt to refine them.




SURVIVORSHIP ANALYSIS


A description of outcome as determined by implant survivorship is often included in cohort studies, case series, and randomized prospective trials. It is usually reported in the statistical form of life tables or Kaplan-Meier curves. Interpreting results presented in this form poses several challenges. First, different studies may choose different definitions of failure, which makes direct comparison invalid. Second, survival curves are difficult to interpret when patient numbers are small. This is particularly evident on the right-hand side of such curves, where dramatic drops occur because each single failure accounts for an increasingly large proportion of the shrinking remaining study group. Third, study subjects may be lost to follow-up or may die during the follow-up period. These cases are usually handled with the “worst-scenario” method, in which failure is assumed, so the reported failure rate most likely does not represent the true rate. Perhaps the most relevant problem with making inferences from this type of study is that these studies often represent the work of high-volume surgeons in centers of excellence, and the results may not be directly extrapolated to the wider community or to different populations. Finally, the reporting surgeon may be the innovator of the prosthesis, which opens the study to potential bias.
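The right-hand instability of survival curves described above can be illustrated with a minimal sketch of the Kaplan-Meier product-limit estimate. The function name `kaplan_meier` and the ten-implant cohort are hypothetical and chosen only for illustration. The sketch treats patients who die or are lost to follow-up as censored, which is the standard Kaplan-Meier convention and not the worst-scenario method.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], failed: List[bool]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of a Kaplan-Meier curve.

    times[i]  : follow-up time of implant i (for example, in years)
    failed[i] : True if the implant reached the failure end point at times[i],
                False if it was censored (patient died or was lost to follow-up)
    """
    if len(times) != len(failed):
        raise ValueError("times and failed must have the same length")

    # The curve only steps down at times where at least one failure occurred.
    failure_times = sorted({t for t, f in zip(times, failed) if f})
    survival = 1.0
    curve = []
    for t in failure_times:
        # Implants still under observation just before time t.
        at_risk = sum(1 for x in times if x >= t)
        failures = sum(1 for x, f in zip(times, failed) if f and x == t)
        survival *= 1.0 - failures / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical cohort of ten implants: one failure at year 2, one at year 9.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
failed = [False, True, False, False, False, False, False, False, True, False]
for t, s in kaplan_meier(times, failed):
    print(f"year {t}: estimated survival {s:.2f}")
```

In this made-up cohort, the failure at year 2 occurs with nine implants at risk and lowers estimated survival by about 11 percentage points, whereas the single failure at year 9 occurs with only two implants at risk and halves it. A worst-scenario analysis could be approximated by setting `failed` to `True` for every patient lost to follow-up.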




ARTHROPLASTY REGISTERS


Following the success of Sweden, many countries (Australia, Canada, Denmark, Finland, Hungary, Norway, New Zealand, United Kingdom) are addressing the need for standardized outcome information, relevant both to the general orthopedic community and to field experts in subspecialized centers, by creating national joint replacement registers. Because Sweden has one of the longest-running registers, we use it as an example of how national registers can be instrumental in defining and influencing outcomes.


Sweden began its register in 1979 with the mission of improving outcomes in hip arthroplasty. By a process of continual review, the Swedish register has developed its data collection from simple demographics pertaining to primary arthroplasty (number of interventions per year or clinic and types of implant) to three separate databases that record more comprehensive patient characteristics for primary and revision procedures, together with technical details of the operations. It aims to describe the epidemiology of hip replacement surgery and to identify risk factors for poor outcome by studying revisions. The register uses revision (exchange or extraction of one or both components) as a reliable but strict end point for failure. This end point has been shown to be valid. Although this definition eliminates the problem of defining clinical failure, it must be borne in mind that the register consequently underestimates the actual failure rate. For example, patients’ comorbidities may prevent further surgery, patients may be unwilling to undergo surgery, or patients may be on a lengthy waiting list at the time the assessment is made.


An important strength of the Swedish hip registry is that it collects information from all public and private clinics in Sweden, and so the data it provides reflect the results achieved by the “average” surgeon. Results are continually fed back to contributing institutions, allowing them to compare performance with the national average and consider the implants and techniques they are using. This register has been successful not only in determining failure rates and identifying risk factors, but also in improving the quality of total hip replacement in terms of implant safety and greater efficacy of surgical and cementing techniques.


Registers essentially act as surveillance tools and are useful for monitoring the performance of new prostheses or techniques. Although they provide good information to this effect by dealing with large numbers and results from throughout the orthopedic community (not just specialist centers), there is an inherent lag time between the occurrence of a problem and its recognition.




METHODS OF EARLY PREDICTION OF FAILURE


The lag period is of obvious concern when a prosthesis doomed to early failure gains popularity and widespread use before its deficiencies have come to light. This situation has led to the question of whether use of continuous monitoring methods can give early warning of suboptimal outcomes.


Statistical Models


Continuous monitoring methods are statistical testing procedures that have been used in manufacturing and industry (and, less extensively, in medicine) for many years. These methods are used for the prospective monitoring of an intervention once it is in use, in order to identify unacceptable or poor performance as early as possible. By predetermining an acceptable revision rate and setting boundaries to reduce the probability of a false alert, this type of cumulative statistical model may give advance warning of a failing implant design or suboptimal surgical technique. National joint registries could offer a platform for this type of monitoring.
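One widely used member of this family is the cumulative sum (CUSUM) chart. The text does not name a specific method, so the following is only a minimal sketch under stated assumptions: each implant is scored as revised or not revised, `p0` is the predetermined acceptable revision rate, `p1` is a rate regarded as unacceptable, and `threshold` is the alert boundary. All names and numeric values are hypothetical.

```python
import math
from typing import Iterable, Optional


def bernoulli_cusum(outcomes: Iterable[bool], p0: float, p1: float,
                    threshold: float) -> Optional[int]:
    """Return the 1-based index of the case that triggers an alert, or None.

    outcomes  : sequence of cases in chronological order, True = revised
    p0        : acceptable revision rate (assumed value)
    p1        : unacceptable revision rate, p1 > p0 (assumed value)
    threshold : alert boundary; raising it lowers the chance of a false alert
                but delays detection
    """
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("require 0 < p0 < p1 < 1")

    # Log-likelihood ratio weights: a revision pushes the score up,
    # a surviving implant pulls it down slightly.
    w_revised = math.log(p1 / p0)
    w_ok = math.log((1.0 - p1) / (1.0 - p0))

    score = 0.0
    for i, revised in enumerate(outcomes, start=1):
        # The score is reset at zero so that past good results cannot
        # mask a later run of failures.
        score = max(0.0, score + (w_revised if revised else w_ok))
        if score >= threshold:
            return i
    return None


# Hypothetical series: ten unrevised implants followed by a cluster of revisions.
series = [False] * 10 + [True, False, True, True, True]
print(bernoulli_cusum(series, p0=0.05, p1=0.10, threshold=2.0))
```

The choice of `threshold` embodies the trade-off described in the text: a higher boundary reduces false alerts at the cost of a later warning.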


Radiologic Models


Radiostereometric analysis (RSA) is a technique used to predict long-term implant stability by studying the implant's early behavior. At the time of surgery, small tantalum markers are embedded into the host bone so that the position of the implant can be precisely established. Postoperatively, biplanar x-rays are taken through a calibration cage, which has known fiducial (reference) points. The images are analyzed with an RSA software package that calculates micromotion between the implant and bone in three dimensions. These three measurements are converted into a single measure of overall motion, the maximal total point motion. By repeating the x-ray analysis at 6-month intervals, the maximal total point motions can be plotted against time.
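As a simplified illustration of how three directional measurements combine into one value, the sketch below takes the displacement of each tracked point along the x, y, and z axes, computes the length of each displacement vector, and reports the largest. This is an assumption about the arithmetic only: real RSA software derives point displacements from rigid-body kinematics, and the function name and sample values here are hypothetical.

```python
import math
from typing import Iterable, Tuple


def maximal_total_point_motion(displacements: Iterable[Tuple[float, float, float]]) -> float:
    """Return the largest 3D displacement length among the tracked points.

    displacements : (dx, dy, dz) per point, in millimetres, relative to the
                    reference examination
    """
    lengths = [math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in displacements]
    if not lengths:
        raise ValueError("at least one point displacement is required")
    return max(lengths)


# Hypothetical displacements (mm) of two points on an implant.
points = [(0.3, 0.4, 0.0), (0.1, 0.2, 0.2)]
print(f"{maximal_total_point_motion(points):.2f} mm")
```

Computing this value at each follow-up examination gives the series that is plotted against time.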


RSA has shown that the implant either stabilizes over time or continues to migrate. The difference between these two patterns can be detected one year postoperatively. This method is extremely precise and has been shown to be accurate and reliable in predicting implant survivorship with regard to aseptic loosening. It essentially acts as a surrogate marker for revision status. It is particularly useful because it has sufficient accuracy and power that groups of 30 patients can be used to study new technologies, limiting the number of patients exposed to the risk of design failures and producing an early warning of unacceptable instability long before it becomes evident clinically. RSA can also be used to compare directly the efficacy, with respect to implant stability, of different surgical techniques, for instance, reaming of the subchondral plate for cemented acetabular components or the use of different surgical approaches.


The precision and accuracy of RSA make this type of analysis the gold standard for measuring implant migration. Because the technique requires specialized radiographic equipment, insertion of marker beads, and expert interpretation of results, its use at present is restricted to prospective research in specialized centers. This limitation introduces the risk of selection and outcome biases. The question arises as to whether alternative measurement techniques, although inferior to RSA in terms of precision and accuracy, may be adequate for detecting early movement at a threshold that is still predictive of later failure.


Direct methods of measurement have been shown to be too imprecise to detect this level of early movement, even with careful standardization of patient positioning and the use of modern measurement tools. Adequate precision can be achieved using EBRA-Digital (Ein Bild Roentgen Analyse). This system measures two-dimensional migrations from digitized plain radiographs using software programs that include elements to measure the components, to exclude radiographs with significant positioning artifacts from the measurement series, and to interpret the measurements. Although it is precise enough to characterize two-dimensional migration patterns and identify patients at risk for later aseptic loosening within two years of surgery, it is not as precise as RSA and requires more subjects in order to have equivalent power in a prospective study. EBRA-Digital is suitable for use in the multicenter trial setting. Collection of data from this wider pool of subjects reduces the selection and outcome biases associated with studies from specialist centers, potentially providing surrogate outcome information that is more generalizable to the wider orthopedic community.


Although we now have surveillance methods in the form of registries and predictive techniques such as RSA, these methods are useful only for observing outcomes as determined by implant survival. We have the necessary information to choose implants and techniques that give reproducible results in terms of longevity, but we lack information as to how these implants perform in terms of improving either the specific disease state or the patient’s overall well-being. The use of subjective outcome measures is required.




SUBJECTIVE OUTCOME MEASURES


A wealth of outcome measures are used in the literature to report subjective outcomes in hip replacement surgery, but there is little consensus regarding which are the most suitable, and it remains a challenge for the individual clinician to select the most appropriate metrics and to apply and interpret them correctly. Subjective outcome measures may be split into two broad categories: disease-specific or site-specific questionnaires (e.g., Harris Hip Score, Oxford Hip Score, Western Ontario and McMaster Universities Osteoarthritis Index [WOMAC]) and general health outcome questionnaires (e.g., SF-36, Nottingham Health Profile).


Whichever type of metric is chosen, a basic requirement for its appropriate use is that it has been psychometrically validated. Psychometrics is the science of measuring mental capabilities and processes. Psychometric validation tests the measure in question against three basic criteria to ensure that its results can be interpreted in a scientific manner: validity, reliability, and responsiveness.


Validity


Validity is the ability of an instrument to measure that which it claims to measure. There are several angles from which validity should be assessed. Face validity refers to whether the questionnaire seems to measure what it is intended to measure: essentially, whether the items on the questionnaire superficially make sense and whether the questionnaire can be easily understood. Poorly structured response options, hard-to-interpret rubrics, illogical responses, and double negatives leave the questionnaire open to obvious criticism regarding its reliability and internal consistency. Even the most commonly used questionnaires have examples of items that leave much to individual interpretation.


Construct validity refers to whether there is evidence that the questionnaire actually measures what it claims to measure and reflects the concept being measured. A special case of construct validity is termed criterion validity, where the measure is compared with a gold standard. Because this standard does not exist for outcome measures pertaining to arthroplasty surgery, questionnaires instead are validated against a previously validated questionnaire. This is obviously suboptimal because any insufficiencies or flaws in the original questionnaire’s validity are perpetuated.


Content validity refers to whether the questionnaire is adequate (in terms of number and range of items) to test the area of interest properly so that correct inferences can be made. Many questionnaires tend to have more items grouped in the mid range of the scales being measured, leaving the extremes insufficiently challenged. This leads to floor and ceiling effects, where the patient achieves either the lowest or highest possible scores, and any clinical change in the direction of that extreme thereafter cannot be reflected by the measure. Similarly, a group of patients at one extreme on the measure may have heterogeneity that remains undetected.
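A simple way to screen a data set for floor and ceiling effects is to count the proportion of respondents who score exactly the minimum or maximum possible value. The sketch below assumes a hypothetical questionnaire scored from 0 to 100 and made-up postoperative scores. The function name is illustrative, and no particular cut-off for an unacceptable proportion is implied.

```python
from typing import Sequence, Tuple


def floor_ceiling_proportions(scores: Sequence[float], min_score: float,
                              max_score: float) -> Tuple[float, float]:
    """Return (floor proportion, ceiling proportion) for a set of scores.

    scores    : one total score per patient
    min_score : lowest score the questionnaire can produce
    max_score : highest score the questionnaire can produce
    """
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score)
    at_ceiling = sum(1 for s in scores if s == max_score)
    return at_floor / n, at_ceiling / n


# Hypothetical postoperative scores on a 0-100 scale (higher is better).
postoperative = [100, 100, 95, 100, 80, 100, 60, 100, 0, 100]
floor, ceiling = floor_ceiling_proportions(postoperative, 0, 100)
print(f"floor: {floor:.0%}, ceiling: {ceiling:.0%}")
```

In this made-up sample, six of ten patients sit at the maximum score, so any further improvement in those patients, and any differences among them, would be invisible to the measure.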


An important concept regarding validity is that of noise. All measures produce a signal. The closer this signal is to that expected for the condition (by comparing it with the gold standard or with what is expected from previously validated metrics), the more valid the construct is. Any part of the signal that is not directly related to the condition of interest is termed “noise” (Fig. 4-1).


Posted Jun 10, 2019, in ORTHOPEDIC
