Effectiveness Evaluation of the Shoulder

CHAPTER 7 Effectiveness Evaluation of the Shoulder



The assessment of outcome following treatment is an important component of providing clinical care to patients. Patients, their relatives, employers, third-party payers, colleagues in other fields of medicine, hospital administrators, and government officials are all stakeholders in the health care system. Stakeholders need to know whether or not the health care interventions that are occurring in the system are effective, the degree to which they are affecting the health of patients, and, in relative terms, the cost-effectiveness of treatment. Duckworth and colleagues1 demonstrated the wide variability in the clinical expression of patients with full-thickness rotator cuff tears and showed the importance of documenting the clinical expression of shoulder pathology at initial evaluation in individual patients when treatment is being considered.


The concepts of quality assurance and continuous improvement demand that measurements be available to use as yardsticks against which to assess the outcome of treatment. There has been an explosion of interest in the assessment of general health outcome and in the assessment of outcome in the shoulder. This chapter details the history of outcome assessment in the shoulder and the methodology by which outcome measures themselves are assessed, critically analyzes the outcome measures currently available, and makes recommendations regarding the current use of outcome measures in clinical practice.




HISTORY


Codman2 is credited with introducing the concept of outcome assessment in the early part of the 20th century. He espoused the concept of the “end result idea” wherein the clinician critically evaluates the results of treating all surgical cases in order to identify and understand treatment failures so that the care of future patients could be improved.


Until recently, outcome following shoulder reconstruction was usually assessed in terms of treatment efficacy. Observer-based measures such as range of motion, stability, and deformity, together with radiographic assessment of fracture union or prosthesis alignment were used to determine whether or not a procedure had been effective. Such observer-based methodology ignores the consumer of the product when assessing its effectiveness and does not address the larger concept of patient health and the role of the shoulder disorder in affecting the general well-being of the individual.


Scientific analysis has demonstrated that observer-based assessment of shoulder function can be inconsistent. Hayes and coworkers3 found that visual estimation, goniometry, still photography, and reach, assessed for six different shoulder motions, had standard errors of measurement of 14 to 25 degrees between raters and 11 to 23 degrees for the same rater on different occasions. Ostor and colleagues4 determined the inter-rater reproducibility of clinical tests for rotator cuff function. Fair concordance between assessments was found in 40 of 55 observations, and moderate concordance was found in only 21 of 55 instances. Tzannes and coauthors5 found some tests of shoulder instability to be reliable among examiners, provided that pain was not used as a criterion for a positive test. However, their study involved a small group of 13 patients who were preselected by referral to a shoulder specialist with complaints of instability. Terwee and colleagues6 found interobserver agreement to be low for assessing active and passive elevation of the shoulder, especially for patients with high pain severity and disability. Rudiger and coworkers7 found that even “objective” measurements of shoulder mobility by patients and surgeons correlated poorly in a prospective study. Dowrick and colleagues8 compared self-reported and independently observed disability in an orthopaedic trauma population. They found that disability level ratings varied greatly and that observers consistently rated the disability levels lower than participants did. Hickey and associates9 found that experienced manipulative physiotherapists had difficulty determining the symptomatic status of patients using observational motion analysis.


The comparison of the results of various health care interventions in different fields of medicine is hampered by the absence of a common measurement instrument. Patient-based outcome assessment tools are widely accepted for assessing general health. General health outcome measures have also been used to determine the effect of shoulder conditions on patient well-being, and patient-based outcome instruments have been developed for the shoulder and the upper extremity as a whole. Cook and colleagues10 assessed four shoulder outcome measures and found them to exhibit good internal consistency across surgical status, in contrast to the inconsistency observed when observer-based systems are assessed.




TYPES OF OUTCOME MEASURES


The decision about which outcome measure to use should be based on many factors, including the population being studied, the purpose of the assessment (routine assessment vs. clinical research), training required, time required for administration and scoring, and availability of normative data. Generic health instruments, joint-specific instruments, limb-specific instruments, and disease-specific instruments are available for outcome assessment in the shoulder. Beaton and Schemitsch11 have reviewed the development of measures of health-related quality-of-life and physical-function instruments, noting the development of many outcome measures in the 1990s and reports of their reliability, validity, and responsiveness (see later).





Limb-Specific Outcome Instruments


Limb-specific outcome instruments are based on the supposition that the upper extremity functions as a kinematic chain. According to this paradigm, the shoulder and the elbow, forearm, and wrist position the hand for grasping and manipulating the environment. Studies of limb-specific instruments have shown close correlation with joint-specific and disease-specific measures, although they tend not to have the same degree of responsiveness. Nevertheless, a limb-specific instrument may be appropriate when the diagnosis is less certain or when more than one part of the extremity is affected. For example, a condition involving the shoulder, elbow, wrist, or hand affects, to a greater or lesser extent, the ability to use a telephone.


The Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire focuses on functional limitations, symptoms, and psychosocial problems. Patient-completed limb-specific functional questionnaires such as the DASH questionnaire have been developed with careful attention to psychometric principles of instrument design. Whole-limb questionnaires can be used to assess functional outcome following the treatment of specific joint disorders in the upper extremity. The use of outcome measures that are designed to assess the function of the entire limb can help to determine the relative impact of disorders affecting various anatomic sites in the upper extremity.


It may be appropriate to use a combination of a joint-specific, a limb-specific, a disease-specific, and a generic health status instrument for detailed research studies. However, for most clinicians, the use of a generic and either a joint-specific or a limb-specific outcome instrument would be appropriate for routine assessment and follow-up.



Disease-Specific Outcome Instruments


Disease-specific outcome measures are designed to assess specific conditions in individual joints. An example of a disease-specific outcome measure is the Western Ontario Shoulder Instability Index (WOSI). As a general rule, disease-specific outcome measures are very responsive to small changes in the condition for which they were designed. The disadvantage of disease-specific outcome measures is their limited usefulness in comparing outcomes across different disorders, anatomic sites, and populations and the need for a plethora of outcome measures to assess all conditions affecting the shoulder and elbow.


Kirkley and coworkers12 developed a disease-specific quality-of-life measurement tool for patients with shoulder instability. The steps included identification of a specific patient population, item generation, item reduction, and pretesting. The final instrument had 21 items and demonstrated validity, reliability, and responsiveness. The instrument’s responsiveness compared favorably to that of five other shoulder outcome instruments, a general health outcome measure, and measurement of range of motion. The authors suggested that their instrument be used as a primary outcome measure in patients with shoulder instability. In a similar manner, Watson and colleagues12a developed a disease-specific questionnaire to assess the outcome of glenohumeral instability, termed the Melbourne Instability Shoulder Scale.


Kirkley and coauthors13 also developed a disease-specific outcome instrument for patients with rotator cuff disorders (the Western Ontario Rotator Cuff index). This instrument has been translated into Turkish, and the translated version has been shown to be valid and reliable.14 Kirkley’s group15 also developed an outcome measure specifically for patients with glenohumeral osteoarthritis (the Western Ontario Osteoarthritis of the Shoulder Index).



APPLICATION OF OUTCOME MEASURES


The lack of a widely accepted outcome measure can lead to confusion when one attempts to determine the severity of an impairment. Patient-completed functional questionnaires have been developed and tested by psychometric and clinometric methods. When an observer questions a patient with regard to function and then records the response, the possibility of observer bias is introduced. Patient-completed questionnaires can be answered by telephone and mail, do not require a physical examination, and can be used to derive raw scores rather than categorical rankings.





Categorical Rankings Versus Aggregate Scores


A review of the literature reveals a plethora of different definitions for categorical rankings that have been used to describe the outcome after operations on the shoulder. The use of categorical ranking might lead clinicians to mistakenly assume that the categorical rankings that have been used in independently designed scoring systems describe similar levels of impairment. The variable definition of terms for categorical rankings hinders accurate communication between investigators and is an impediment to the objective comparison of the results of different studies. For example, the developers of different scoring systems have assigned different weights to each domain and different ranges of values to each categorical ranking. The outcome measures currently in general use for the shoulder are scoring systems based on assessment domains such as range of motion, pain, and ability to perform daily activities, which are scored separately. Scores are then aggregated and assigned categorical rankings that range from excellent to poor. It is possible that the same patient can have different raw scores and different (or similar) categorical rankings, depending on which scoring system is used.
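The effect of differing domain weights and category cutoffs can be illustrated with a minimal sketch. The two scoring systems below are hypothetical: their names, weights, cutoffs, and the patient's domain ratings are assumptions chosen for illustration and do not correspond to any published shoulder instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringSystem:
    """A hypothetical observer-based scoring system (not a published instrument)."""
    name: str
    weights: dict   # domain -> maximum points; the points sum to 100
    cutoffs: tuple  # (minimum aggregate score, label), highest first

    def aggregate(self, domain_percent):
        # Each domain is rated 0-100 (percent of best possible) and
        # contributes in proportion to the weight this system assigns it.
        return sum(self.weights[d] * domain_percent[d] for d in self.weights) / 100

    def rank(self, domain_percent):
        # Map the aggregate score to this system's own categorical ranking.
        score = self.aggregate(domain_percent)
        for minimum, label in self.cutoffs:
            if score >= minimum:
                return label
        return "poor"


# Illustrative weights and cutoffs only; both systems are assumptions.
system_a = ScoringSystem(
    name="System A",
    weights={"pain": 50, "motion": 20, "activities": 30},
    cutoffs=((90, "excellent"), (75, "good"), (60, "fair")),
)
system_b = ScoringSystem(
    name="System B",
    weights={"pain": 15, "motion": 50, "activities": 35},
    cutoffs=((85, "excellent"), (70, "good"), (50, "fair")),
)

# One hypothetical patient: little pain, limited motion, moderate function.
patient = {"pain": 90, "motion": 50, "activities": 70}

for system in (system_a, system_b):
    print(system.name, system.aggregate(patient), system.rank(patient))
```

With these assumed values, the same patient scores 76.0 ("good") under System A, which weights pain heavily, and 63.0 ("fair") under System B, which weights motion heavily. Neither the raw scores nor the rankings are interchangeable between the two systems.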


Scoring systems based on categorical rankings compartmentalize results into several categories or domains that clinicians use in decision making. Such scoring systems may be more appealing to clinicians than other outcome measures such as patient-completed functional questionnaires. However, the admixture of clinical and functional criteria can create a confusing array of variables, and questioning of a patient by an observer introduces the possibility of observer bias during the collection and interpretation of data. The differences among scoring systems are so pervasive that categorical rankings cannot be relied upon to provide meaningful comparisons either within the same cohort of patients or between cohorts. The results of studies that are based on categorical rankings of different scoring systems cannot be compared or combined. Patient-completed functional questionnaires can be valid and reliable instruments for assessing the shoulder and are not limited by observer bias.



ASSESSING OUTCOME MEASURES


A wide variety of outcome measures are available (Box 7-1). Because most clinicians use an outcome measure as an initial assessment tool and as a method of determining a patient’s progress over time, the initial selection of an appropriate outcome measure is important. It is necessary to assess the measurement properties of different instruments to be certain that the outcome is being appraised accurately. Outcome measurements have varying strengths depending on the population being assessed or the reason for using the instrument. An instrument for an outcome measure must be context specific, and selection should be based on evidence that the instrument has the necessary measurement properties in the population being sampled for a study or assessment. The quality of an outcome measure can be assessed objectively, and given the plethora of outcome measures that have been developed, it is advisable to use an outcome measure for which there are data on its measurement properties. The measurement properties of an outcome measure that are important to clinicians are validity, reliability, and responsiveness.



BOX 7-1 Types of Outcome Instruments to Evaluate the Shoulder







Validity


Conceptually, an instrument is considered valid if it is measuring what it is supposed to measure. Different terms are used to describe different facets of validity. In face validity, the items (questions) chosen appear to make sense to the subject using the instrument. Face validity is the simplest and weakest form of validity. Content validity is satisfied when it is proved that the scale measures all important aspects of the condition to be examined. Construct validity is the degree to which an outcome measure can be shown to be associated with other measures that have a specific relationship with the system being measured. Testing of construct validity builds confidence in an outcome measure. Comparison of the data generated by the outcome measure to patient-derived and physician-derived assessment of the severity of the impairment, the level of pain, the ability to perform normal activities of daily living, and the responses on other contemporary patient-completed questionnaires can be used to test validity. Convergent validity determines if an outcome measure correlates with similar scales or dimensions of a scale. Divergent validity demonstrates lack of correlation between dissimilar scales or dimensions of a scale. Discriminant validity is the ability of an outcome measure to discriminate across the levels of severity for a patient population. A variety of statistical measures are used in determining validity, including Pearson’s product moment correlation coefficient. A detailed discussion of the methodology of determining validity is outside the scope of this chapter.
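As a minimal sketch of how Pearson’s product moment correlation coefficient might be used to examine convergent and divergent validity, the example below correlates scores from a new questionnaire with scores from a similar scale and with an unrelated measurement. All data and variable names are hypothetical assumptions for illustration, and a real validation study would also report sample size, confidence intervals, and the population studied.

```python
import math


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for six patients (assumed data, not study results).
new_questionnaire = [20, 35, 50, 65, 80, 90]
similar_scale = [25, 30, 55, 60, 85, 88]             # measures a related construct
unrelated_measure = [170, 182, 165, 178, 168, 175]   # e.g., height in cm

convergent_r = pearson_r(new_questionnaire, similar_scale)
divergent_r = pearson_r(new_questionnaire, unrelated_measure)

print(f"convergent r = {convergent_r:.2f}")
print(f"divergent r = {divergent_r:.2f}")
```

In this assumed data set, the new questionnaire correlates strongly with the similar scale and shows almost no correlation with the unrelated measure, which is the pattern that convergent and divergent validity testing looks for.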


Sep 8, 2016 | Posted in PHYSICAL MEDICINE & REHABILITATION
