Chapter 46 Clinical Measures, Metrics, and Indices
Systemic lupus erythematosus (SLE) is a protean, multisystem complex disease characterized by remissions and exacerbations. The SLE disease course varies from flares to persistently active disease (PAD), from disease improvements to remissions.1,2 Patients with SLE may experience events that are related to lupus disease activity, chronic irreversible damage, and adverse events from the medications, all of which impact their quality of life. Monitoring each of these aspects is challenging but essential for the successful management of patients. The use of validated and reliable tools is therefore fundamental for the management of patients with lupus and to allow for comparisons among patients from different centers.
The assessment of patients with lupus includes the determination of five domains: (1) disease activity, (2) chronic damage resulting from lupus activity or its treatment, (3) adverse events of drugs, (4) health-related quality of life (HRQoL), and (5) economic impact (Table 46-1).3 To date, there is no universal agreement on the optimal tools for assessing each of the five domains in SLE. Whether in research or in clinical care settings, investigators and rheumatologists must identify the tools suited to their particular research or clinical needs. This chapter describes the available measures for assessing these domains in patients with lupus.
A number of measures have been developed to assess disease activity, damage, and HRQoL in patients with lupus. In some instances, instruments have been specifically developed for lupus, whereas in other scenarios, generic instruments are used that have been developed for other chronic diseases. The following sections describe the development and use of the instruments in SLE.
Disease activity can be defined as a reversible clinical or laboratory manifestation, reflecting the immunologic and inflammatory manifestation of organ involvement from lupus at a specific point in time.4 The ability to quantify and grade disease activity, whether in a clinical practice or in research settings, is important. For this purpose, several measures have been developed and adopted to assess disease activity. Appropriate measures must be shown to be reliable and valid, as well as sensitive to change. In addition, the practical applicability of the measure includes the ease of administration, the low costs of data collection and method of scoring, and the ease of score interpretation.5 Two types of disease activity measures have been developed. Global indices describe the overall burden of inflammatory disease, whereas organ-specific indices relate to disease activity within each organ system, either individually or incorporated into one summary score.
The Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) is a global disease activity index that was initially developed and introduced in 1985. This index was modeled on clinicians’ global judgment. A group of experienced rheumatologists with expertise in lupus participated in the development of this index. The use of the nominal group process ensured that the resulting index, SLEDAI, represented the consensus of the developers. From the initial list of 37 descriptors derived from the literature that have been used to describe disease activity in lupus, 24 of the most important descriptors were retained for the development of SLEDAI. The elimination of the 13 descriptors occurred in the first phase of development (preconference ratings) and was accomplished by 15 clinicians. SLEDAI is thus based on the presence of 24 descriptors in 9 organ systems. Based on the experts’ evaluation of 1400 case scenarios, multiple regression models were used to derive the weighted scores for each descriptor. Most of the definitions of the descriptors were based on the American College of Rheumatology (ACR) glossary of rheumatic disease terms, and they were further refined throughout the development process of SLEDAI.6,7 The scores of the descriptors were derived from the values obtained through the regression models and ranged from 1 to 8 with a total possible score of 105. The initial validation of SLEDAI was conducted throughout the primary development phase, and descriptors were used to evaluate disease activity on a cohort database from the University of Toronto Lupus Clinic. 
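The weighted-sum structure of SLEDAI can be illustrated with a minimal sketch. The individual weights below are an assumption taken from the published SLEDAI form; the text above states only that weights range from 1 to 8, that there are 24 descriptors, and that the maximum total is 105. The descriptor keys and the function name are illustrative, not part of any official tool.

```python
# Descriptor weights as printed on the published SLEDAI form (an assumption
# here; the chapter text gives only the 1-8 range and the 105 maximum).
SLEDAI_WEIGHTS = {
    "seizure": 8, "psychosis": 8, "organic_brain_syndrome": 8,
    "visual_disturbance": 8, "cranial_nerve_disorder": 8,
    "lupus_headache": 8, "cerebrovascular_accident": 8, "vasculitis": 8,
    "arthritis": 4, "myositis": 4, "urinary_casts": 4,
    "hematuria": 4, "proteinuria": 4, "pyuria": 4,
    "rash": 2, "alopecia": 2, "mucosal_ulcers": 2,
    "pleurisy": 2, "pericarditis": 2,
    "low_complement": 2, "increased_dna_binding": 2,
    "fever": 1, "thrombocytopenia": 1, "leukopenia": 1,
}


def sledai_score(present):
    """Sum the weights of the descriptors judged present and due to lupus.

    `present` is any iterable of descriptor keys; duplicates count once.
    """
    present = set(present)
    unknown = present - set(SLEDAI_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown descriptors: {sorted(unknown)}")
    return sum(SLEDAI_WEIGHTS[d] for d in present)


# A patient with arthritis, rash, and low complement: 4 + 2 + 2
example_total = sledai_score(["arthritis", "rash", "low_complement"])
```

Because each descriptor is simply present or absent, the total moves only when a descriptor appears or resolves completely.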
Each SLEDAI descriptor is precisely defined, and a manifestation must have been present within the 10-day period before the assessment to be recorded.4 The intrarater and interrater reliability of SLEDAI was shown during the development phase on a set of case scenarios of patients with lupus among the investigators.4 Rheumatologists from four countries have successfully used SLEDAI in a multicenter study, confirming its reliability in real patients.8 Furthermore, SLEDAI reproducibility has been demonstrated when used in routine clinical visits and among less experienced observers (e.g., rheumatologist trainees) in the assessment of disease activity in patients with lupus.9,10 SLEDAI has been shown to correlate with other validated measures of disease activity.8,10,11 Moreover, SLEDAI has been used in both research and clinical settings and as a predictive variable and outcome measure in prognostic studies of lupus.10,12,13 It has also shown sensitivity to change over time and validity in the assessment of childhood lupus.14–16 Lupus disease activity, as determined by SLEDAI, has been associated with mortality and survival in studies of patients with lupus and has been the major determinant of damage accrual.17,18 SLEDAI is highly prognostic for mortality in the next 6 months, with increasing relative risks of 1.28 for SLEDAI 1 through 5, 2.34 for SLEDAI 6 through 10, 4.74 for SLEDAI 11 through 19, and 14.11 for SLEDAI higher than 20 (Figure 46-1).19
(From Gladman DD, Ibañez D, Urowitz MB: Systemic lupus erythematosus disease activity index 2000. J Rheumatol 29(2):288–291, 2002.)
In 1992 a modification of SLEDAI was developed in Mexico in an attempt to reduce the cost inherent in a SLEDAI calculation by eliminating the laboratory tests included in SLEDAI.20 The Mexican version of SLEDAI (Mex-SLEDAI) excludes immunologic descriptors. Moreover, some clinical and laboratory manifestations were added (fatigue, mononeuritis, and myelitis clustered in the descriptor neurologic disorder; peritonitis grouped with serositis; creatinine increase grouped with renal disorders; and hemolysis and lymphopenia grouped with leukopenia) and others were excluded (lupus headache, visual disturbance, and pyuria). The total number of variables in the Mex-SLEDAI was reduced to 10. In addition, investigators modified the definitions for a few descriptors. Different weighted scores were assigned to Mex-SLEDAI, as compared with SLEDAI, with a maximum score of 32.20 The Mex-SLEDAI was originally validated in Spanish-speaking countries.20 In 2004, the modifications of SLEDAI 2000 (SLEDAI-2K) were incorporated for the first time into the Mex-SLEDAI version and applied to patients of non-Hispanic descent (Mex-SLEDAI-2K).21 Mex-SLEDAI-2K was shown to have convergent validity with SLEDAI-2K and the revised systemic lupus activity measure (SLAM-R), as well as moderate correlation (r = 0.54) with physician’s global assessment (PGA).21 Nevertheless, the sensitivity to change of the Mex-SLEDAI needs to be studied further.20,21 Mex-SLEDAI has not been used extensively in clinical trials and is limited to a few centers in Latin America.
The Safety of Estrogens in Lupus Erythematosus–National Assessment Trial (SELENA) proposed a new modification of SLEDAI to which a composite flare outcome—the SELENA-SLEDAI Flare Index (SFI)—was added.22 In this version of SLEDAI, several descriptors were modified. The definition of the descriptor seizure was modified in SELENA-SLEDAI to exclude seizures that were due to past irreversible central nervous system damage, and the descriptor cerebrovascular accident was modified to exclude hypertensive causes. However, these modifications were unnecessary; in the original SLEDAI, these two descriptors are scored as present only if the features are attributed to lupus disease activity.4 The descriptor visual disturbance was modified to include scleritis and episcleritis. This modification has not been validated because these features do not reflect the same changes included under “visual” in the original SLEDAI and may not deserve a score of 8. In the descriptor cranial nerve disorder, “include vertigo due to lupus” was added to the definition. Nevertheless, vertigo is one of the manifestations of vestibulocochlear cranial nerve involvement and was intended to be reflected in the original SLEDAI, because it is one of the manifestations of the cranial nerve disturbance. The definitions of pleurisy and pericarditis were modified by adding the phrase “classic and severe” to ensure the attribution of the descriptors to lupus disease activity. 
More importantly, SLEDAI and SLEDAI-2K mandate the presence of subjective (e.g., pleuritic or pericardial pain) and objective (e.g., rub, effusion, electrocardiographic or echocardiographic confirmation, or pleural thickening) findings for pleurisy and pericarditis to be scored as present.4,23 In the SELENA-SLEDAI, researchers accepted the presence of either the objective or subjective findings to score the descriptor as present.22 In the SELENA-SLEDAI, arthritis is scored if more than two joints are active, whereas SLEDAI-2K defined arthritis as two or more actively inflamed joints as in the definition of lupus arthritis in the ACR glossary of terms. The SELENA-SLEDAI defines proteinuria as new onset or recent increase of more than 0.5 gm/24 h as in the original SLEDAI. However, SLEDAI-2K modified the descriptor proteinuria to be >0.5 gm/24 hours.23 As in the original SLEDAI, the score ranges from 0 to 105 (eFigure 46-2).22,23 Despite the modifications in some of the descriptors, SELENA-SLEDAI appears similar to SLEDAI-2K. Importantly, no validation of all of the modifications introduced in SELENA-SLEDAI has been made. Thus the SELENA-SLEDAI version lacks the stringent validation steps that are essential before a measure can be used in clinical trials or research settings. The authors of this text believe that the SLEDAI-2K could serve well as the SLEDAI component of the SELENA instrument, which also includes a flare measure.
(Petri M, Kim MY, Kalunian KC, et al: Combined oral contraceptives in women with systemic lupus erythematosus. N Engl J Med 353(24):2550–2558, 2005.)
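The differences in descriptor definitions between SLEDAI-2K and SELENA-SLEDAI described above can be expressed as simple predicates. The following is a minimal sketch covering only serositis (pleurisy or pericarditis) and arthritis; the function names and the string labels for the two indices are illustrative assumptions, and attribution to lupus is assumed to have been judged already.

```python
def serositis_present(subjective, objective, index):
    """Pleurisy or pericarditis rule, given two boolean findings.

    SLEDAI-2K requires both a subjective finding (pleuritic or pericardial
    pain) and an objective one (rub, effusion, ECG/echo confirmation, or
    pleural thickening). SELENA-SLEDAI accepts either one alone.
    """
    if index == "SLEDAI-2K":
        return subjective and objective
    if index == "SELENA-SLEDAI":
        return subjective or objective
    raise ValueError(f"unsupported index: {index}")


def arthritis_present(active_joints, index):
    """Arthritis rule based on the count of actively inflamed joints.

    SLEDAI-2K scores two or more joints; SELENA-SLEDAI scores more than two.
    """
    if index == "SLEDAI-2K":
        return active_joints >= 2
    if index == "SELENA-SLEDAI":
        return active_joints > 2
    raise ValueError(f"unsupported index: {index}")
```

For example, a patient with exactly two inflamed joints scores arthritis under SLEDAI-2K but not under SELENA-SLEDAI, whereas pleuritic pain without objective confirmation scores under SELENA-SLEDAI only.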
SLEDAI-2K was introduced in 2002 and validated.23 In the glossary of the original SLEDAI, certain descriptors were scored as active only if they were new; thus persistently active manifestations were not scored. This would lead to an apparent improvement that, in fact, had not occurred. Among SLEDAI descriptors, rash, alopecia, and mucosal ulcers had been scored only if they were new or recurrent, and proteinuria only if there was new onset or a recent increase of more than 0.5 grams in 24 hours. SLEDAI-2K was modified to allow the documentation of ongoing disease activity in four descriptors: rash, alopecia, mucosal ulcers, and proteinuria.4 Thus SLEDAI-2K includes the presence of any inflammatory rash, alopecia, or mucosal ulcers, and new, recurrent, or persistent proteinuria greater than 0.5 grams in 24 hours. As in the original SLEDAI, all the descriptors in SLEDAI-2K must be attributed to lupus activity.23 In the validation phase of SLEDAI-2K against SLEDAI, the entire cohort of the University of Toronto Lupus Clinic was used. Of 18,636 visits, 78% of the scores were concordant in SLEDAI-2K and SLEDAI. In the remaining 22% of the visits, the differences were the result of proteinuria, rash, alopecia, and mucosal ulcers. SLEDAI-2K at presentation was equivalent to SLEDAI at presentation as a predictor of mortality. Moreover, SLEDAI-2K described disease activity at different activity levels in a comparable manner with the original SLEDAI. SLEDAI-2K was equivalent to SLEDAI in describing changes in disease activity from one visit to the next (see Figure 46-1).23
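The effect of this change on persistent manifestations can be shown with a minimal sketch. The status labels, the function name, and the simplification of proteinuria to a status flag (with the 0.5 g/24 h threshold assumed to have been applied beforehand, and a recent increase treated as "new") are assumptions made for illustration only.

```python
# The four descriptors whose handling of persistent activity changed.
PERSISTENCE_SENSITIVE = {"rash", "alopecia", "mucosal_ulcers", "proteinuria"}
VALID_STATUSES = {"absent", "new", "recurrent", "persistent"}


def is_scored(descriptor, status, index):
    """Return True if an active, lupus-attributed descriptor is scored.

    status is one of 'absent', 'new', 'recurrent', or 'persistent'.
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if index not in {"SLEDAI", "SLEDAI-2K"}:
        raise ValueError(f"unsupported index: {index}")
    if status == "absent":
        return False
    if index == "SLEDAI" and descriptor in PERSISTENCE_SENSITIVE:
        # Original SLEDAI: these four count only when new or recurrent.
        return status in {"new", "recurrent"}
    # SLEDAI-2K, and all other descriptors in the original SLEDAI.
    return True
```

Under this sketch a persistent rash contributes nothing to the original SLEDAI but still contributes to SLEDAI-2K, which is the source of the discordant visits reported in the validation study.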
In the original SLEDAI and its 2000 modification, the time frame for the individual components was a 10-day period before the assessment.4,23 Other major disease activity indices for SLE measure disease activity in the preceding 30 days.24–26 Moreover, the usual time frame of observations within a clinical trial is 30 days; thus validating SLEDAI-2K 30 days against SLEDAI-2K 10 days was relevant. The first validation study was conducted on 149 patients who were seen over 9 weeks at the University of Toronto Lupus Clinic. The results showed that SLEDAI-2K 30 days is similar to SLEDAI-2K 10 days in both patients who were in remission and patients with a spectrum of disease activity levels.27 The second study validated SLEDAI-2K 30 days against SLEDAI-2K 10 days in a group of 41 patients who were followed at monthly intervals for 12 months. These studies confirmed that having a manifestation of active lupus present at 11 to 30 days before a visit and a complete resolution in the 10 days before the visit is unusual. Therefore the 30-day time frame for SLEDAI-2K should now be used in clinical studies and clinical trials to describe disease activity in patients with SLE.27,28
One drawback of SLEDAI-2K is that it can detect only complete (100%) resolution of an active descriptor and thus cannot reflect a partial improvement in a disease manifestation. A second drawback is that SLEDAI-2K does not detect a worsening of an already active descriptor; the descriptor simply continues to be scored as present. Although SLEDAI-2K is a global index that generates a total score reflecting overall disease activity, disease activity in each of its nine organ systems can be derived if required in clinical trials. The practical applicability of SLEDAI-2K in clinical settings, its ease of administration, and its simplicity in scoring are fundamental properties. These benefits have made SLEDAI-2K one of the most commonly used global disease activity measures in longitudinal observational studies and clinical trials.
The Systemic Lupus Activity Measure (SLAM) index was introduced in Boston and first published in 1989 to measure global disease activity. The SLAM index uses disease manifestations derived from the American Rheumatology Association Council on SLE and includes 31 items (23 clinical and 7 laboratory) in 11 systems with a total possible score of 86. The SLAM index assesses global disease severity in the previous month.6,11,26 Most clinical and laboratory items are categorized as present or absent and are then scored from 0 to 3, based on severity, without considering the significance of the organ involved.11,29 For instance, mild fatigue or oral ulcers are scored similarly to lupus headache or seizure. A few items can score only 1 or 2, in particular, fatigue, oral ulcers, headache, alopecia, Raynaud phenomenon, lymphadenopathy, and hepatomegaly or splenomegaly (eFigure 46-3). The revision, the SLAM-R index, includes 23 clinical manifestations and the same 7 laboratory parameters and has a possible range of 0 to 81, with a score of 7 being considered clinically significant.29 The definitions of several items were modified in the SLAM-R index; in particular, pleurisy, pericarditis, and pneumonitis were dropped because of the difficulty in scoring. The definitions and weighting of fatigue, stroke syndrome, seizure, and headache were modified.29 Unlike SLEDAI-2K, the SLAM-R index does not include immunologic tests.
The SLAM index and its updated version, SLAM-R, are reliable and valid in measuring disease activity across cultures and when compared with other disease activity measures.8,11 Moreover, the SLAM and SLAM-R indices have been shown to capture patients’ assessments better than the other indices, and this could be explained by the presence of subjective items in these indices that reflect patients’ perceptions of the disease.30,31 The SLAM-R index is valid for assessing disease activity of childhood lupus.15,16 A potential drawback in the SLAM-R index is that it includes subjective items, such as fatigue, shortness of breath, chest pain, abdominal pain, myalgia, and arthralgia, which are then scored by their severity. Although these items reflect the patients’ perceptions of the disease, as in other indices, these items should only be scored if the assessor believes they are attributed to lupus disease activity. Nevertheless, the assessment of these items has been associated with ambiguity in research settings and clinical trials, and a score of 7 on the SLAM-R index is not unusual for subjective complaints that can be misinterpreted as lupus activity. Although the SLAM and SLAM-R indices have been used in clinical trials and research settings in the assessment of adult and childhood lupus and are sensitive to change, the previously listed drawbacks should be considered.8,15,16,30
The European Consensus Lupus Activity Measurement (ECLAM) index was first published in 1992 by the Consensus Study Group of the European Workshop for Rheumatology Research. The ECLAM index was developed on the basis of the analysis of 704 patients with lupus from 29 centers in 14 countries.25,32,33 The 15 items of the ECLAM index were derived through univariate analysis to reflect the best clinical and laboratory features of SLE and weighted according to their respective coefficients as determined using multivariate regression analyses. In the initial development and validation steps of the ECLAM index, the PGA was considered the criterion construct “gold standard” for lupus disease activity. The ECLAM index evaluates disease activity over the previous month, and the maximum possible score is 10 (eFigure 46-4). The ECLAM index has been shown to be reliable, valid, and sensitive to change, when compared against other indices including the SLEDAI and the British Isles Lupus Assessment Group (BILAG) index.30 The ECLAM index can be used to evaluate disease activity retrospectively from the data provided in clinical charts, as shown in a study conducted on 64 patients.34 The ECLAM index has been validated for the assessment of disease activity in childhood-onset lupus.15 However, the ECLAM index has not been extensively used in clinical trials.
(Vitali C, Bencivelli W, Isenberg DA, et al: Disease activity in systemic lupus erythematosus: report of the Consensus Study Group of the European Workshop for Rheumatology Research. II. Identification of the variables indicative of disease activity and their use in the development of an activity score. The European Consensus Study Group for Disease Activity in SLE. Clin Exp Rheumatol 10(5):541–547, 1992.)
The lupus activity index (LAI) was proposed in 1989 to assess the global disease activity over the previous 2 weeks.10 The LAI includes five sections, eight organ systems, and three laboratory measures. The PGA, as well as the score for treatment with corticosteroids and immunosuppressive drugs, is part of this index. The severity of the disease is based on the physician’s judgment. The overall score reflects the mean of the PGA, physician’s judgment of the severity of clinical manifestations, degree of laboratory abnormalities, and treatment. The score of the LAI ranges from 0 to 3.10 The validity of the LAI was demonstrated in a study on 150 patients in which the LAI was modified (M-LAI) so as not to contain the PGA; the correlation of the M-LAI with the PGA was 0.64. The interrater and intrarater reliability of the LAI was shown in a study conducted on six patients in routine practice.10 The LAI has performed well in assessing disease activity when compared with other disease activity measures and has been sensitive to change; nonetheless, its use has been limited as compared with other disease activity measures.30
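The averaging described for the LAI can be sketched as follows. This is a simplified illustration that assumes each of the four components has already been reduced to a single value on the 0 to 3 scale; the published LAI form specifies how each section is sub-scored, and that detail is not reproduced here. The function and parameter names are hypothetical.

```python
def lai_overall(pga, clinical_severity, laboratory, treatment):
    """Mean of four component scores, each assumed to lie on a 0-3 scale."""
    components = (pga, clinical_severity, laboratory, treatment)
    for value in components:
        if not 0 <= value <= 3:
            raise ValueError("each component is expected on a 0-3 scale")
    return sum(components) / len(components)


# PGA 2, clinical severity 1, laboratory 1, treatment 2
example_lai = lai_overall(2, 1, 1, 2)
```

Because the result is a mean of components bounded by 0 and 3, the overall score stays within the same 0 to 3 range stated in the text.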
The SLE activity index score (SIS) is a global index developed by clinicians at the National Institutes of Health (NIH). The SIS includes 17 clinical items and is based on clinical manifestations and subjective features reflecting the perception of the patients on the disease, in particular, fatigue, arthralgia, and myalgia, as well as laboratory items (eFigure 46-5). The SIS is a weighted index, and the scores range from 0 to 52. The SIS assesses disease activity over the previous week and categorizes disease activity into inactive, mildly active, moderately active, active, and very active. The SIS is a valid index that has been adopted in some clinical trials and research settings.33,35 The validity of the SIS index has been demonstrated against other disease activity indices, in particular, the SLEDAI, SLAM, and BILAG indices. In this study, all four indices were closely correlated with each other (r = 0.86 between SIS and SLAM); nevertheless, the SIS has not been used as extensively as the SLEDAI or the BILAG index.33,36
The BILAG index was proposed by a group of investigators from different centers in the United Kingdom, and its first version was published in 1988.37 This index was developed using a nominal consensus approach and is based on the principle of the physician’s intention to treat. BILAG comprises 86 items, including clinical signs, symptoms, and laboratory variables, in 8 systems. The items recorded must have been attributed to active lupus and present during the 4 weeks before the assessment.37 Based on the presence of certain features in each system, a system is categorized into one of four levels: A for action; B for beware; C for content; and D for discount (eFigure 46-6).24 The BILAG index was shown to have good between-rater reliability and to be valid when compared with the “gold standard” criterion (i.e., starting or increasing disease-modifying therapy).24 Further validation of the BILAG index showed that disease activity in different systems in SLE does not follow a common pattern. This study recommended the use of the individual BILAG components rather than the total BILAG score as a primary endpoint in clinical and epidemiologic studies.38 The sensitivity of the BILAG index to change over time was shown in a study on 23 patients who were prospectively followed every 2 weeks for up to 40 weeks, with a standardized response mean of 0.57.30 The BILAG index was adapted and validated in the assessment of SLE in children.15 The BILAG index has been found to be reliable and valid in several studies conducted by the BILAG group and other investigators and has correlated with other disease activity measures, in particular, the SLEDAI and the SLAM index.11,14,15,24,30,38,39 The BILAG index has been successfully used in clinical trials and research settings and has been particularly effective for demonstrating new organ flares.8,15,39–43
(Yee CS, Cresswell L, Farewell V, et al: Numerical scoring for the BILAG-2004 index. Rheumatology (Oxford) 49(9):1665–1669, 2010.)
The classic BILAG index has undergone a series of revisions to the current BILAG-2004.11,24,37 The members of the BILAG proposed the BILAG-2004 index, which included further changes in some divisions of organs and systems; refinements in the definitions of some items, in particular, the neurologic system; the removal of items attributed to damage rather than reflecting lupus disease activity, in particular, avascular necrosis and tendon contracture; and modifications in the glossary and scoring.43 As in the classic BILAG index, the BILAG-2004 index is based on the physician’s intention to treat.43 The BILAG-2004 index contains 97 items, whereas the classic BILAG index has only 86. The system vasculitis was removed and its items were included in other systems, and the gastrointestinal and ophthalmic systems were added.43 In the classic BILAG index, all items that are improving can only contribute to a C score, which does not reflect the appropriate level of disease activity for more severe manifestations.43 In the BILAG-2004 index, features that contribute to an A score when recorded as being the same, worse, or new will contribute to a B score when improving (Figure 46-7).43
(From Isenberg DA, Rahman A, Allen E, et al: BILAG 2004. Development and initial validation of an updated version of the British Isles Lupus Assessment Group’s disease activity index for patients with systemic lupus erythematosus. Rheumatology (Oxford) 44(7):902–906, 2005.)
A complete history and physical examination is required to determine disease activity by the BILAG-2004. The BILAG-2004 index generates a score for each of the nine systems assessed. The scoring of lupus disease activity in each system is graded A through E, based on the assessment of the clinical features and/or the laboratory findings for the appropriate system and representing disease activity. Like the classic BILAG index, the BILAG-2004 is a transitional index that is able to capture changing severity of clinical manifestations. The items in each system are rated using a scale from 0 through 4 (0 = not present, 1 = improving, 2 = same, 3 = worse, and 4 = new), and some items are scored as present or absent, reflecting disease activity over the last 4 weeks, as compared with the previous 4 weeks. The classic BILAG index and its versions, including the BILAG-2004 index, are ordinal scale indices, and an additive numerical scoring scheme for the BILAG-2004 index is available (A grade = 12 points, B = 8, C = 1, D = 0, and E = 0).44 This scoring system is mainly adopted in studies in which the BILAG-2004 index needs to be compared with other numerical indices or to facilitate the statistical analysis, if required; however, the BILAG-2004 index was not designed to be used in this way.43,44 The British Lupus Integrated Prospective System (BLIPS) is a computerized program that calculates the BILAG scores with the option to derive the SLEDAI, the SLAM-R index, the Systemic Lupus International Collaborating Clinics (SLICC)/ACR Damage Index, and the Medical Outcomes Study (MOS) Short Form 36 (SF-36).45 BLIPS has also undergone further refinement to reflect the BILAG-2004 index, and several amendments have been made to the other activity indices.43,45
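The additive numerical scheme quoted above (A = 12, B = 8, C = 1, D = 0, E = 0) amounts to a lookup and a sum over the systems assessed. The following minimal sketch shows that conversion; the function name and the system labels in the example are illustrative assumptions, and the sketch does not derive the letter grades themselves, which depends on the BILAG-2004 glossary.

```python
# Points per grade, as given in the text for the BILAG-2004 numerical scheme.
BILAG2004_POINTS = {"A": 12, "B": 8, "C": 1, "D": 0, "E": 0}


def bilag2004_numeric(grades):
    """Convert per-system letter grades into an additive numerical score.

    `grades` maps a system name to its grade letter, 'A' through 'E'.
    """
    total = 0
    for system, grade in grades.items():
        if grade not in BILAG2004_POINTS:
            raise ValueError(f"invalid grade {grade!r} for system {system!r}")
        total += BILAG2004_POINTS[grade]
    return total


# Hypothetical assessment: one A, one B, one C, and two inactive systems.
example_grades = {
    "mucocutaneous": "A",
    "musculoskeletal": "B",
    "renal": "C",
    "neuropsychiatric": "D",
    "ophthalmic": "E",
}
example_bilag_total = bilag2004_numeric(example_grades)
```

With nine systems the scheme has a ceiling of 108 (nine A grades), and a single C grade contributes little next to an A or B, so the total is dominated by the most active systems.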
The BILAG-2004 index has been able to discriminate among patients and has shown good reliability and high levels of physician agreement in almost all systems.43 The reliability of the BILAG-2004 index was evaluated in a larger study involving 11 centers across the United Kingdom with the participation of 14 raters and 97 patients. This study showed that the BILAG-2004 is a reliable index to assess SLE activity and recommended the training of raters to ensure its optimal performance.46 More recently, the construct validity of the BILAG-2004 index was confirmed by its association with the erythrocyte sedimentation rate (ESR), C3 level, C4 level, anti–double-stranded DNA (anti-dsDNA), and, more importantly, the SLEDAI-2K index.47 The criterion validity of BILAG-2004, defined as change in therapy, was confirmed by the association between the BILAG-2004 index and the increase in therapy.47 In this study, higher SLEDAI-2K scores were significantly associated with overall BILAG-2004 scores reflecting higher disease activity. Although the BILAG index has been extensively used in clinical trials, its routine use in long-term studies has some drawbacks, in particular, its limited practical applicability, the complicated glossary of the clinical features, and the scoring analysis, which requires a specialized computer program.