Patient-Related Outcome Measures for Shoulder Surgery and Rehabilitation

Lori A. Michener, PhD, PT, ATC

Phillip W. McClure, PT, PhD

Dr. McClure or an immediate family member serves as a board member, owner, officer, or committee member of the Journal of Orthopaedic and Sports Physical Therapy. Dr. Michener or an immediate family member has received nonincome support (such as equipment or services), commercially derived honoraria, or other non-research–related funding (such as paid travel) from the Journal of Orthopaedic and Sports Physical Therapy; and serves as a board member, owner, officer, or committee member of the ACOEM–American College of Occupational and Environmental Medicine, the American Physical Therapy Association–Orthopaedic Section, the ASSET–American Society of Shoulder and Elbow Therapists, the Journal of Orthopaedic and Sports Physical Therapy, and the Shoulder and Elbow–British Elbow and Shoulder Society.

Introduction

Outcome measures can be used to systematically assess the effects of surgical interventions and rehabilitation for musculoskeletal shoulder disorders. Ernest Codman, MD, was the first to describe this “end result idea” in 1905. He was subsequently also the first physician to describe the outcomes of the patients he treated in order to improve quality of care. To systematically assess the outcomes of care, the outcome measures selected should be simple enough to perform routinely, have adequate psychometric properties, and be relevant to the patient. Patient-rated outcome (PRO) measures can meet these requirements; they can be used to assess the effects of care for both individuals and groups of patients, and to guide treatment decision-making in daily clinical practice for individual patients. By providing the patient perspective, PROs can assess the direct impact of a shoulder disorder and its treatment on the patient’s daily activities and participation in work and recreational pursuits, which facilitates the assessment of the effectiveness and efficacy of treatment interventions. Moreover, PROs aid in the assessment of cost-effectiveness and value.

Clinician-rated impairment measures, such as shoulder range of motion (ROM) and strength, assess muscle and joint capacity in isolation and thus do not specifically determine the ability of the patient to use the upper extremity during required daily activities. These assessments of tissue or joint impairments are therefore inadequate to comprehensively assess outcomes of care. PROs assess the patient perspective, which is frequently more reflective of the desired functional status and is ultimately the most important aspect of the outcome of care. Moreover, most PROs meet the requirement of simple routine use. Why, then, are they not being used consistently by clinicians? Reported reasons for the lack of PRO use are unfamiliarity with the measures, time considerations for the patient and clinician, and lack of knowledge of how to interpret and use the final scores. This chapter provides an overview of PROs for shoulder disorders, including how to interpret and use the scores for treatment decision-making and quality improvement.

Psychometric Measurement Properties of Patient-Rated Outcomes

PRO measures should have established measurement properties of reliability, validity, error estimates, and responsiveness. These psychometric properties and common terms are defined in Table 2.1. The properties of a PRO should be evaluated in a sample of patients in whom the PRO is intended to be used. A PRO score should be reliable and reproducible over time in patients whose shoulder symptoms and function/disability are not changing. Items used to generate the total score or subscale scores should demonstrate internal consistency, which is the homogeneity of the items used to form the score. Types of validity of a scale include: (1) convergent validity: correlation with other similar scales or measures of shoulder disability, (2) divergent validity: lack of relation to scales that are not measures of shoulder disability, and (3) discriminant validity: ability to discriminate between patients with different levels of disability.

For daily clinical use and interpretation of a scale, the most helpful metrics are the error and responsiveness values. Error estimates are used to interpret a single score or a change score on the scale. The standard error of the measure (SEM) is an estimate of the error in a score when a patient completes the scale just one time; SEM = standard deviation × √(1 − r), where r is the internal consistency coefficient or the test-retest reliability coefficient. The minimal detectable change (MDC), also known as the smallest detectable difference (SDD), is the error associated with change scores (e.g., before and after treatment); MDC = SEM × √2. Calculated this way, the SEM and MDC both correspond to a 68% confidence interval (CI). To determine the 90% CI, the SEM and MDC are each multiplied by the corresponding z-score of 1.64, giving the SEM₉₀ and MDC₉₀, respectively. A critical feature of these error estimates is that they are in the same units or points as the scale itself; therefore, unlike a standardized reliability coefficient, they allow a direct interpretation of error.
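The error formulas above can be worked through in a few lines of Python. This is a minimal sketch, not a validated tool: the function names are illustrative, and the standard deviation (10 points) and reliability coefficient (0.91) are hypothetical inputs chosen only to show the arithmetic.

```python
import math

Z_90 = 1.64  # z-score used in the text for the 90% confidence level


def standard_error_of_measure(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in scale points (68% level)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem):
    """MDC = SEM * sqrt(2), the error of a change score (68% level)."""
    return sem * math.sqrt(2.0)


# Hypothetical inputs, for illustration only.
sd = 10.0            # standard deviation of the scale scores, in points
reliability = 0.91   # test-retest ICC or internal consistency coefficient

sem = standard_error_of_measure(sd, reliability)
mdc = minimal_detectable_change(sem)
sem_90 = sem * Z_90
mdc_90 = mdc * Z_90

print(f"SEM = {sem:.2f}, MDC = {mdc:.2f}")
print(f"SEM90 = {sem_90:.2f}, MDC90 = {mdc_90:.2f}")
```

Because every result stays in the units of the scale, the MDC₉₀ from such a calculation can be compared directly with a patient's observed change in points.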

Table 2.1 PSYCHOMETRIC MEASUREMENT PROPERTIES OF PATIENT-RATED OUTCOMES

Property	Definition
Test-retest reliability; Intraclass correlation coefficient (ICC)	Consistency of scale scores in a stable population
Internal consistency, Cronbach’s alpha	Homogeneity of the items (questions) of the scale
Error Estimates	Error in the scale score based on reliability of the scale
Standard Error of the Measure (SEM)	Error value associated with a single score with 68% confidence bounds unless otherwise stated; distribution-based error value
Minimal Detectable Change (MDC); Smallest Detectable Difference (SDD)	Error value associated with a change score (e.g., change in score between pretreatment and posttreatment) with 68% confidence bounds unless otherwise stated; distribution-based error value
Responsiveness	Ability of a scale to measure clinical change
Standardized Response Mean (SRM)	Mean change score/standard deviation of change scores
Effect Size (ES)	Mean change score/standard deviation of baseline scores
Minimally Clinically Important Difference (MCID)	The smallest amount of change in the score associated with patient-rated improvement; anchor-based method for identifying the minimal change that is clinically meaningful
Substantial Clinical Benefit (SCB)/Major Clinically Important Improvement (MCII)	The amount of change in the score associated with substantial or major patient-rated improvement; anchor-based method for calculating change that is a large clinically meaningful change
Validity	Degree by which a scale measures what it intends to measure. Variety of types, which include: construct, convergent, divergent, factorial, discriminant

Responsiveness, as an aspect of validity, is the ability of a scale to measure clinical change when it has occurred. Two established metrics, effect size (ES) and standardized response mean (SRM), are used to assess the size of treatment effect over time by calculating the magnitude of the change that has occurred and taking into account the variability (standard deviation) of the scores. To determine if the change is clinically meaningful, an external criterion of improvement is used as an anchor to establish meaningful clinical change. The minimally clinically important difference (MCID), also known as the minimal important change (MIC) value, is the PRO change value that indicates the minimally clinically important change in the PRO score. A more recent term—the substantial clinical benefit (SCB), also known as major clinically important improvement (MCII)—is the PRO change value of substantial or large change that may be expected over a longer period of treatment or as a long-term outcome. The MCID and SCB are both anchor-based metrics that can be used to determine if change in a PRO score is clinically meaningful.
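The two distribution-based responsiveness metrics can be computed directly from paired baseline and follow-up scores. The sketch below uses invented scores for five patients on a scale where higher is better; the function names are illustrative, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form is intended.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change (follow-up minus baseline)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES = mean change score / standard deviation of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM = mean change score / standard deviation of change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical scores for five patients, for illustration only.
baseline = [40, 50, 60, 45, 55]
follow_up = [60, 65, 85, 60, 80]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Both values describe the size of change relative to variability in the sample. Neither indicates whether the change matters to the patient, which is the role of the anchor-based MCID and SCB.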

Patient-Rated Outcomes for Musculoskeletal Shoulder Disorders

Shoulder pain and symptoms, difficulty with activities and participation, and patient satisfaction with shoulder use are the major elements assessed in PRO measures for shoulder disorders. PROs that are commonly used to assess outcomes of surgery and rehabilitation and that have established measurement properties are presented in Table 2.2. These measures include two PROs, the Disabilities of the Arm, Shoulder, and Hand (DASH) and the Quick-DASH, which consider both upper extremities as a single unit when assessing symptoms and disability. This is accomplished by asking the patient to answer based on the ability to perform the tasks “regardless of how you perform the task,” stating that “it doesn’t matter which hand or arm you use to perform the activity.” Thus, the DASH and Quick-DASH do not allow for the specific assessment of only the involved shoulder. Three other PROs assess function/disability of the involved shoulder specifically, two of which also assess pain. The American Shoulder and Elbow Surgeons (ASES) Patient Self-Report, developed by the American Shoulder and Elbow Surgeons, assesses pain and function/disability considering the right or left shoulder individually with respect to the level of difficulty with daily tasks. The ASES function subscale has 10 questions, with difficulty rated on a 4-point Likert scale. The Simple Shoulder Test (SST) epitomizes its name: it contains 12 simple items of daily function with dichotomous (yes/no) response options. The University of Pennsylvania Shoulder Score (Penn) assesses function and pain as well as patient satisfaction with shoulder use for the involved shoulder. The Penn uses an 11-point numeric rating for pain and satisfaction, and a 4-point Likert scale for the functional items.
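As an illustration of how subscale weights translate into a total score, the following sketch scores the ASES using the weights listed in Table 2.2 (pain worth 50 of 100 points, 10 function items worth 50 of 100 points, 100 = no disability). It assumes pain is rated 0 to 10 with 10 as the worst pain, so it is reversed before weighting, and that each function item is rated 0 to 3 with 3 as no difficulty. The function name and the example responses are hypothetical.

```python
def ases_score(pain, function_items):
    """Total ASES score on a 0-100 scale, where 100 = no disability.

    pain: 0-10 rating, assumed here to run from 0 (none) to 10 (worst).
    function_items: ten ratings, each 0-3, assumed 3 = no difficulty.
    """
    if not 0 <= pain <= 10:
        raise ValueError("pain must be between 0 and 10")
    if len(function_items) != 10:
        raise ValueError("expected 10 function items")
    if any(not 0 <= item <= 3 for item in function_items):
        raise ValueError("each function item must be between 0 and 3")

    pain_component = (10 - pain) * 5                  # rescaled to 0-50
    function_component = sum(function_items) * 50 / 30  # 30 raw pts to 0-50
    return pain_component + function_component


# Hypothetical patient: pain rated 4, every function item rated 2.
score = ases_score(4, [2] * 10)
print(f"ASES = {score:.1f}")
```

The same pattern of rescaling each subscale to its share of the total applies to the other weighted measures, although item counts, response options, and scoring direction differ for each scale.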

Table 2.2 SCALE MEASUREMENT PROPERTIES OF PATIENT-RELATED OUTCOMES

Outcome	Scale Dimensions and Scoring	Test-Retest Reliability and Internal Consistency	Error Estimate (SEM, MDC)	Validity	Responsiveness (SRM, ES, MCID)
ASES: American Shoulder & Elbow Standardized Form	Pain: 1 item, 10 pts/50% of total; Function: 10 items, 30 pts/50% of total; Scoring: 0–100 pts, 100 = no disability	Test-retest reliability ICC: Range = 0.84–0.96, Average = 0.91; Internal consistency α: Range = 0.61–0.96	SEM = 6.7; MDC 90% CI = 9.4	Content; Construct	SRM: Range = 0.5–1.6, Average = 1.1; ES: Range = 0.9–3.5, Average = 1.3; MCID = 6.4, 12–17
Constant Shoulder Score	Pain: 1 item, 15 pts/15% of total; Function: 4 items, 20 pts/20% of total; Clinician measure: Range of Motion: 4 items, 40 pts/40% of total; Strength: 1 item, 25 pts/25% of total; Scoring: 0–100 pts, 100 = no disability	Test-retest reliability ICC: Range = 0.80–0.96; Internal consistency α: Range = 0.61–0.96	SEM = 4.5; Error estimate: SD = 8.86 (95% CI = 15, 20 pts)	Content; Construct	SRM: Range = 0.59–2.16; ES: Range = 0.20–2.72; MCID = 5.4
DASH: Disabilities of the Arm, Shoulder, and Hand	Symptoms: 5 items, 25 pts/16.7% of total; Disability: 25 items, 125 pts/83.3% of total; Scoring: 0–100%, 0 = no disability	Test-retest reliability ICC: Range = 0.77–0.98, Average = 0.90; Internal consistency α: Range = 0.92–0.98	SEM: Range = 2.8–5.2, Average = 4.5; MDC 90% CI: Range = 6.6–12.2, Average = 10.5	Content; Construct	SRM: Range = 0.5–2.2, Average = 1.1; ES: Range = 0.4–1.4, Average = 1.1; MCID: Range = 10.2–10.8
Quick DASH	Symptoms: 3 items, 15 pts/37.5% of total; Disability: 8 items, 40 pts/62.5% of total; Scoring: 0–100%, 0 = no disability	Test-retest reliability ICC: Range = 0.90–0.94; Internal consistency α: Range = 0.92–0.95	SEM: Range = 3.3–10.2; MDC 95% CI: Range = 11–13.3	Content; Construct	SRM: Range = 0.63–1.1; ES = 1.02 and 1.26; MCID = 8.0, 15.9
Penn: University of Pennsylvania Shoulder Score	Pain: 3 items, 30 pts/30% of total; Satisfaction: 1 item, 10 pts/10% of total; Function: 20 items, 60 pts/60% of total; Scoring: 0–100 pts, 100 = no disability	Test-retest reliability ICC = 0.94; Internal consistency α = 0.93	SEM 90% CI = 8.5; MDC 90% CI = 12.1	Content; Construct	SRM = 1.27; ES = 1.01; MCID = 11.4 (SD 9.5)
SST: Simple Shoulder Test	Function: 12 items, 12 pts/100% of total; Scoring: 0–12 pts, 12 = full function	Test-retest reliability ICC: Range = 0.97–0.99, Average = 0.98; Internal consistency α = 0.85	SEM = 11.65; MDC 95% CI = 32.3	Content; Construct	SRM: Range = 0.8–1.8, Average = 0.9; ES = 0.8; MCID = 2.33 points
WORC: Western Ontario Rotator-Cuff Index	Physical Symptoms: 6 items, 100 pts/28.7% of total; Sports/Rec: 4 items, 100 pts/19% of total; Work: 4 items, 100 pts/19% of total; Lifestyle: 4 items, 100 pts/19% of total; Emotional: 3 items, 100 pts/14.3% of total; Scoring: 0–100%, 0 = no disability	Test-retest reliability ICC: Range = 0.84–0.89; Internal consistency α: Range = 0.91–0.95	SEM = 6.9; MDC 95% CI = 19.1	Construct	SRM = 0.91–2.1; ES = 0.96–1.37; MCID = 11.7–13.1
PSFS: Patient Specific Functional Scale	Function: 3–5 items, 10 pts each, average of item total; Scoring: 0–10, 10 = no disability	Test-retest reliability ICC = 0.71; Internal consistency: none	SEM = 1.1; MDC = 3.0 (90% CI = 1.7, 4.2)	Construct	MCID = 1.2
CI = confidence interval, ES = effect size, ICC = intraclass correlation coefficient, MDC = minimal detectable change, MCID = minimally clinically important difference, SEM = standard error of the measure, SRM = standardized response mean
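The tabulated thresholds can be used to interpret an individual patient's change score. The sketch below is one possible decision rule, not a published algorithm: it compares an improvement against both the MDC (is the change larger than measurement error?) and the MCID (is the change large enough to matter to the patient?). The thresholds are the Penn values from Table 2.2; the patient scores, function name, and wording of the labels are hypothetical.

```python
def interpret_change(improvement, mdc, mcid):
    """Classify a change score against error (MDC) and importance (MCID).

    improvement: change in the direction of better function, in scale points.
    """
    exceeds_error = improvement >= mdc
    clinically_important = improvement >= mcid
    if exceeds_error and clinically_important:
        return "improved beyond measurement error and the MCID"
    if clinically_important:
        return "meets the MCID but is within measurement error"
    if exceeds_error:
        return "beyond measurement error but below the MCID"
    return "no detectable or clinically important improvement"


# Penn thresholds as listed in Table 2.2.
PENN_MDC_90 = 12.1
PENN_MCID = 11.4

# Hypothetical patient scores (Penn: 0-100, 100 = no disability).
baseline, follow_up = 55.0, 70.0
result = interpret_change(follow_up - baseline, PENN_MDC_90, PENN_MCID)
print(result)
```

For scales scored in the opposite direction (DASH, Quick DASH, and WORC, where 0 = no disability), the improvement would be computed as baseline minus follow-up before applying the same comparison.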