Measuring Clinical Outcomes in Rheumatic Disease
Melanie J. Harrison
Lisa A. Mandl
KEY POINTS
Understanding the purpose, design, proper application, and clinical meaning of outcomes instruments is necessary for correct interpretation of the medical literature.
The heterogeneity of rheumatic diseases often requires disease-specific assessment using validated instruments designed to detect clinically relevant outcomes that are unique to each condition.
Generic outcomes instruments are useful to compare different rheumatic disease populations with one another or with nonrheumatic disease populations with respect to common clinical manifestations, such as pain, disability, and general health.
Many different clinical outcomes are often assessed, either alone (e.g., tender joint count and pain) or as a composite [e.g., disease activity score (DAS)], in order to capture the overall effect of rheumatic diseases.
Clinical outcomes instruments are constantly changing to keep pace with the growing knowledge of the clinical manifestations and underlying pathophysiology of rheumatic diseases.
Many instruments have been developed to assess clinical outcomes. Some are designed for use in specific diseases, whereas others are applicable to more generalized populations; all instruments are intended to provide measurable assessments of patients’ physical and psychosocial status. Outcomes instruments are used routinely in rheumatic disease studies.
It is important to understand what these tools measure, in what settings they can be applied, and how their results are interpreted.
GENERAL CONCEPTS OF OUTCOMES MEASUREMENT
Clinical outcomes instruments are essential for clinical research and are useful for the evaluation of the status or change in status of an individual in the clinical setting. Instruments objectify what we see clinically, standardize measurement, and allow for comparison. However, reducing complex clinical information to statistically manageable data may result in oversimplification and loss of important detail.
Data reduction. Outcomes analysis requires packaging of data qualitatively or quantitatively. Qualitative outcomes categorize individuals. For example, classification criteria classify subjects as having or not having a disease, and American College of Rheumatology (ACR) response criteria classify patients as improved or not improved. Quantitative outcomes are expressed as scores and usually have a neutral or normal value for comparison.
Standardization. Explicit procedures for distributing, administering, and scoring instruments are necessary to minimize error. Specific parameters to define each variable/outcome are needed to ensure that all evaluations are made in the same way; this is called operationalization.
Reliability. The precision of a measure can be described as its consistency, reproducibility, or reliability. It is divided into external and internal consistency.
External consistency. Results are reproducible and have limited variability when the same measure is performed on the same patient by the same assessor on different occasions (intraobserver) or by different assessors simultaneously (interobserver).
Internal consistency. Responses to items within an instrument that measure the same attribute are similar.
Validity. Instruments should accurately represent and measure what they purport to measure.
Content validity. Is the instrument logical and meaningful, that is, does it make intuitive sense?
Construct validity. Does the instrument examine the theoretical concepts and appropriately evaluate the relation between specific variables that are believed to explain the phenomenon being studied?
Criterion validity. Do the results of the instrument correlate with an external measure of the true value, for example, a “gold standard”?
Responsiveness. Instruments that are used at two points in time to assess patient status over time should be sensitive to change; the results should vary consistently with the status of the patient (i.e., if the patient improves clinically, the change in outcome measurement should reflect improvement; if the patient’s condition does not change, neither should the measurement).
Objectivity. The influence of individual interpretation by the patient or the assessor should be limited.
Feasibility. Instruments and their measurements must be practical and not cumbersome to use, easily understood by those administering and responding to them, patient-friendly, and scored without difficulty.
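Internal consistency, as defined above, is commonly quantified with Cronbach's alpha, a statistic that this chapter does not name but that is widely used for this purpose. The following is a minimal sketch, assuming a small set of hypothetical questionnaire responses in which every item is meant to measure the same attribute; the function name and the data are illustrative only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate internal consistency for items measuring one attribute.

    responses: list of respondents, each a list of k numeric item scores.
    Returns Cronbach's alpha; values closer to 1 suggest that the items
    give similar answers for the same respondent.
    """
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three respondents, three items that agree perfectly.
example = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(example)
```

When every item gives the same answer for each respondent, as in the hypothetical data above, alpha reaches its maximum of 1. Real instruments score lower, and what counts as acceptable depends on the intended use.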
CLASSIFICATION CRITERIA
There is no single definitive diagnostic test for most rheumatic diseases because many of these conditions are frustratingly heterogeneous. Classification criteria have been developed to allow accurate categorization of patients for clinical research studies.
Classification criteria are imperfect. Sensitivity and specificity are maximized, but each is less than 100%. These criteria undergo revisions periodically as knowledge of these diseases grows.
Classification criteria are often used as the basis for subject selection for studies of specific diseases. However, they are also used in epidemiologic surveys as the outcome measure (e.g., incidence and prevalence studies).
Classification criteria are not intended to serve as a diagnostic tool.
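The sensitivity and specificity of a set of classification criteria can be estimated by comparing the criteria with a reference classification, such as expert clinical diagnosis. The sketch below is illustrative only: the function name, the choice of reference standard, and the five hypothetical subjects are assumptions, not data from any study.

```python
def sensitivity_specificity(meets_criteria, has_disease):
    """Compare classification criteria with a reference standard.

    meets_criteria: list of bool, True if the subject fulfills the criteria.
    has_disease:    list of bool, True if the reference standard says the
                    subject has the disease.
    Returns (sensitivity, specificity) as fractions between 0 and 1.
    """
    if len(meets_criteria) != len(has_disease):
        raise ValueError("both lists must describe the same subjects")

    tp = fp = tn = fn = 0
    for criteria, disease in zip(meets_criteria, has_disease):
        if disease and criteria:
            tp += 1  # diseased subject correctly classified
        elif disease and not criteria:
            fn += 1  # diseased subject missed by the criteria
        elif not disease and criteria:
            fp += 1  # nondiseased subject wrongly classified
        else:
            tn += 1  # nondiseased subject correctly excluded

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one nondiseased subject")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical subjects: criteria result versus reference diagnosis.
criteria = [True, True, False, False, True]
reference = [True, False, False, True, True]
sens, spec = sensitivity_specificity(criteria, reference)
```

In this hypothetical sample the criteria identify two of three diseased subjects and exclude one of two nondiseased subjects, which illustrates why neither measure reaches 100%.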
FUNCTIONAL STATUS QUESTIONNAIRES
I. HEALTH ASSESSMENT QUESTIONNAIRE (HAQ)
The HAQ measures functional disability, that is, how well individuals manage activities in their daily lives. The HAQ was one of the first patient self-report instruments. Previously, outcomes measurement relied on physicians’ assessment or laboratory data. Although originally developed to evaluate arthritic conditions, the HAQ is now considered a generic instrument.
Content. Individual questions that comprise the HAQ disability index evaluate the patient’s ability over the preceding 2 weeks to perform various activities that fall into eight component areas: hygiene, dressing and grooming, arising, eating, walking, reach, grip, and outdoor activities.
Scoring. HAQ scores range from 0 to 3; higher scores indicate greater disability, and a 0.22-point change indicates a clinically important difference. The commonly used two-page version includes two visual analogue scales (VAS), one for pain and one for global health, and the HAQ disability index. The VAS global health scale is a validated measure of health-related quality of life and correlates strongly with other quality of life measurement tools.
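The scoring rules above (a 0 to 3 range, higher scores indicating greater disability, and a 0.22-point change as clinically important) can be applied to two visits as in the sketch below. It assumes that the 0.22-point threshold applies equally to improvement and to worsening, which the text does not state; the function name and labels are hypothetical.

```python
HAQ_MIN = 0.0
HAQ_MAX = 3.0
CLINICALLY_IMPORTANT_CHANGE = 0.22  # threshold taken from the text above


def classify_haq_change(baseline, follow_up,
                        threshold=CLINICALLY_IMPORTANT_CHANGE):
    """Label the change in HAQ score between two visits.

    Assumption: the same threshold is used in both directions.
    Lower scores mean less disability, so a decrease is an improvement.
    """
    for score in (baseline, follow_up):
        if not HAQ_MIN <= score <= HAQ_MAX:
            raise ValueError("HAQ scores must lie between 0 and 3")

    # Round to avoid floating-point noise in the subtraction.
    change = round(follow_up - baseline, 3)
    if change <= -threshold:
        return "improved"
    if change >= threshold:
        return "worsened"
    return "no clinically important change"


# Hypothetical patient whose score falls from 1.5 to 1.0.
result = classify_haq_change(1.5, 1.0)
```

A patient whose score moves from 1.0 to 1.125 would be labeled as having no clinically important change under this assumption, because the difference is smaller than 0.22.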
There are different versions of the HAQ. The multidimensional HAQ (MDHAQ) identifies patients who have important functional limitations but do not register a high enough HAQ score to be classified as “disabled.” The modified HAQ (mHAQ) includes fewer original items and additional questions regarding patient satisfaction and self-perception of health and performance.
Why is the HAQ important? The HAQ prospectively captures the effect of chronic illness over time. HAQ scores have been shown to correlate with work disability, health services utilization, and mortality, among other clinically important outcomes, especially in the rheumatoid arthritis (RA) population.
II. ARTHRITIS IMPACT MEASUREMENT SCALE (AIMS)
The AIMS evaluates health-related quality of life in patients with arthritis.
Content. The AIMS 2 is a 78-item questionnaire which evaluates physical, social, and emotional well-being. It is self-administered and takes approximately 20 minutes to complete. There is also a validated shorter version of the AIMS, which consists of 28 questions and takes approximately 8 minutes to complete.
Scoring. Items are additive. Lower scores indicate better quality of life. Single items are grouped into five scales: general physical health, affect, symptoms, work role, and social interaction. The numeric totals for the scales vary based on the number of individual items contained within each scale.
OSTEOARTHRITIS MEASURES
I. WESTERN ONTARIO AND MCMASTER UNIVERSITIES OSTEOARTHRITIS INDEX (WOMAC)
The WOMAC measures important, potentially modifiable clinical outcomes in hip and knee osteoarthritis (OA). It is recommended by the Osteoarthritis Research Society International for use in clinical trials of knee and hip OA.