Introduction
How do we know that our interventions “work”? The success or effectiveness of an intervention can best be understood in terms of whether it consistently achieves the goal/s for which the intervention is intended. Outcomes research is the science of measuring effectiveness. Wennberg, one of the pioneers of the outcomes research movement, defined the imperative to “sort out what works in medicine and to learn how to make clinical decisions that reflect more truly the needs and wants of individual patients.” This sentiment is embodied in the definition of evidence-based medicine (EBM) that is characterized by the “…conscientious, explicit and judicious use of the current best evidence (derived from systematic research) in making decisions about the care of individual patients.” Indeed, when designing (or appraising) a clinical trial or comparative observational study, the spirit of EBM requires that the effectiveness of the intervention/s of interest should be judged with the use of meaningful, validated outcome measures that reflect the goals and expectations of patients. These goals may seem self-evident, but too often the perspectives of individual patients are not taken into sufficient consideration during clinical decision making, much less during the measurement of outcomes.
Outcomes assessment has come a long way from its origins, exemplified by Codman, to the current era of comparative effectiveness research (CER). This chapter provides an overview of outcomes assessment to give a context for the measurement of conventional and patient-reported outcomes in the management of pediatric fractures. Some of the limitations of current assessments will be highlighted, and a framework will be suggested for conceptualizing “meaningful” outcomes and developing instruments to measure such outcomes.
What are Outcomes?
Outcomes can be defined as the effects of health care on the health status of patients and populations. They constitute one of the three dimensions of care, along with process and structure, in Donabedian’s framework for evaluating the quality of care. An outcome is what happens to a patient as a consequence of an intervention or of the passage of time (natural history). An intervention can be associated with multiple outcomes. Outcomes can be desirable (benefits) or undesirable (harms). An outcome is desirable when the intended goal has been achieved. The goals of an intervention can be reactive, aiming to correct or restore a recognizable symptom or problem (e.g., a limb-lengthening procedure to correct an acquired limb-length inequality from a growth arrest). In the absence of any signs or symptoms, but with some indication of risk factors, interventions can be preventive or prophylactic, intended to prevent some future harm (e.g., contralateral epiphysiodesis after a high-risk physeal injury). Consequently, some outcomes occur early, while others become evident only after a period of time. Outcomes may not come to light for many years (e.g., open reduction of an intraarticular fracture to prevent osteoarthritis in adulthood). An undesirable outcome can be expected and inevitable (e.g., an incisional scar after surgery); or expected some of the time, such as a known side effect of the treatment (e.g., pin-site infection of external fixation); or unexpected, when it is considered an adverse event. A complication is an undesirable outcome associated with an injury or its treatment, occurring some of the time (e.g., avascular necrosis after a femoral neck fracture). The likelihood (probability) or risk of such a complication may not always be reliably quantifiable, and there may be measures that can be taken to reduce or prevent such risks. Undesirable outcomes can be transient or reversible, or permanent.
An understanding of these concepts is essential for shared decision making, the central tenet of patient-centered clinical care. Most clinicians recognize these as key elements to be considered during discussions with patients about treatment recommendations and the process for obtaining informed consent. The evidence to guide these discussions must be derived from high-quality research that measures these harms and benefits. CER is involved in the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.
Frameworks of Health and Disease and the Evaluation of Outcomes
An evolution has occurred in the conceptualization of health from the traditional medical model in which health was characterized merely by the absence of disease to more complex models that take a more holistic account of the human experience. This evolution has been accompanied by changes in the way we measure health and disease, reflected by an array of outcome measures that quantify a wide range of health-related phenomena, including physical, mental, and social status and quality of life.
The International Classification of Functioning, Disability and Health (ICF)
The International Classification of Functioning, Disability and Health (ICF), developed by the World Health Organization (WHO), provides a unified, standard language that classifies health and health-related domains of individuals or populations and provides a framework to measure health and disability associated with any health condition. The ICF complements the International Classification of Diseases (ICD) system, which codifies these conditions. The model has been adapted for children to develop the ICF for Children and Youth (ICF-CY). The ICF domains comprise a list of body structures and functions and a list of activities and participation, which can be influenced by contextual factors such as the environment and personal characteristics (Fig. 8-1).
In the ICF framework, the term body structures refers to the anatomic parts of the body affected by the health condition of interest (e.g., effect of an injury on bones, muscles, and neurovascular structures), and the term body functions refers to physiologic and psychologic functions of various body systems (e.g., range of motion). Intact body structures and body functions allow for activities. An activity refers to the completion of a specific task or action (e.g., throwing or running), which, when performed for a particular purpose or role, is referred to as participation (e.g., playing baseball). Participation implies doing things that one wants to do. Participation in life roles is a key component of quality of life. Disruption of body structures and body functions, associated with a health condition, results in impairments. For instance, pediatric injury may be associated with impairments of the musculoskeletal system (e.g., femur fracture) and body function (e.g., joint range of motion). Impairments of body structures and body functions may lead to limitation in activities. Limitation in an activity (e.g., inability to run) can result in restriction of participation (e.g., being dropped from the soccer team). The impact of a health condition and its treatment on activities and participation constitutes what are often called “functional” outcomes. The ICF framework includes the consideration of contextual factors that can be strong determinants, either as facilitators or barriers, of functional outcomes. These include external environmental factors, such as home/school/community, socioeconomic status, and access to health care, as well as personal factors such as demographic characteristics, culture and upbringing, lifestyle preferences, motivation, and personality traits. Contextual factors should be considered because they can explain the gap between what one can do (capacity) and what one actually does in daily life (performance).
Technical Versus Functional Outcomes
In the management of fractures, the immediate objective of interventions is to restore and maintain alignment and length until the fracture unites and consolidates. In the ICF framework, these are outcomes at the level of impairments of body structures and body functions. Goldberg refers to these as technical (or clinical) outcomes. These are important indicators of the success of the intervention in achieving its “technical” objectives (e.g., anatomic alignment and bony union after open reduction and internal fixation). However, there is an expectation that these will lead to the functional outcomes that patients and parents ultimately want, which is to return to full activities and participation without restrictions (e.g., return to playing soccer). It is important to measure functional outcomes separately because these are related to the ultimate goals of the patient. A technically successful outcome cannot always be assumed to result in a functionally successful outcome. A functionally successful outcome may not always require technical perfection. Indeed, more harm than good might arise from one’s pursuit, and even achievement, of technical perfection. For example, an open reduction of a radial neck fracture to restore anatomic alignment might lead to a worse functional outcome than accepting some magnitude of malalignment that is completely compatible with a perfect functional result. Functional outcomes are more meaningful indicators of effectiveness than technical/clinical outcomes.
The Priority Framework for Outcomes Evaluation
Outcomes are most meaningful when they are aligned with patient priorities. Living with a health condition is associated with a set of experiences that includes current symptoms or “complaints” and/or potential future consequences related to the natural history of the health condition of interest. These experiences and the knowledge of future consequences can be associated with a set of concerns about the health condition and/or its treatment, resulting in certain desires (wishes) and expectations of these treatments and outcomes. Concerns, desires, and expectations can be collectively considered to be patient priorities. Elicitation of these priorities enables patients to define a set of goals related to their priorities, which can influence their choice (or preference) for specific treatment options. The elicitation of patients’ priorities might provide important insight into hitherto unknown patient preferences, which, in turn, might influence the process of informed choice and decision making and true informed consent and may facilitate the evaluation of outcomes that matter most to patients. Patient-based outcomes instruments will only be meaningful if the questions asked of patients reflect what is relevant and important to them.
These concepts have been incorporated in a priority/goal framework for evaluation of outcomes. In the center of the framework is the health condition of interest (e.g., pediatric fractures). Living with the health condition leads to a set of priorities (concerns, needs, desires, and expectations), which are fundamental to defining the goals that are derived from these priorities. Different stakeholders might have different priorities (e.g., child’s, parents’, family’s, surgeon’s, and societal) that might overlap but may not be concordant. Understanding priorities and goals is crucial for making decisions about interventions that will best address the identified priorities and goals or for designing or developing new treatments/interventions where these do not exist or are insufficiently effective in addressing the priorities and goals. Interventions must be held accountable in terms of these goals. Their effectiveness is evaluated (e.g., in clinical trials or cohort studies or at the level of the individual patient) with the use of outcome measures that specifically incorporate the goals and priorities of the patient population. In this framework, defining goals, choosing interventions, and evaluating outcomes or developing valid measures to do so, all come back to and depend on an understanding of patient priorities and the goals that arise from these priorities (Fig. 8-2).
In the management of fractures, achieving anatomic alignment and bony union is a means to an end. The ultimate goal is to ensure that the injured patient is restored to the preinjured state. If the desired goals include that the injured limb should look, feel, and work as well as it did before the injury, these goals should be embedded in any outcome measure that purports to evaluate effectiveness in terms that are meaningful for patients.
Outcome Measures: General Considerations
Whose Perspective Prevails?
Outcome measures are tools used to assess a change in particular attributes that are deemed meaningful to a person’s life over time. To the extent that the patient’s perspective is recognized to be preeminent in making judgments about effectiveness, the use of patient-reported outcome (PRO) measures is now considered the standard when the effectiveness of interventions is evaluated. PROs should be derived from patients themselves, particularly for outcomes that pertain to personal experience (e.g., pain, body image, and self-esteem). This is challenging in the context of pediatric conditions. When children are too young or too cognitively immature to respond, one has to rely on the report of the child’s parent(s). A parent(s)’ report must be recognized to be a proxy for what the child might report. The views of older children can and must be taken into consideration, but their perspective might differ from those of their parents. Parents may or may not recognize that their priorities might be different from those of their children, and parents may not agree with each other. The level of agreement between parents and children is usually good for domains reflecting physical activity, functioning, and symptoms but poorer for domains reflecting more social or emotional issues. Proxies and children may not agree about many issues, but both perspectives may be valid and should be considered during decision making, as well as in measuring outcomes.
Some outcomes are more important than others, and different stakeholders will have different perspectives on the relative importance of different outcomes. Clinical outcomes are more relevant to patients, and nonclinical outcomes (e.g., length of stay and cost) are of interest to hospital administrators, payers, and health policy makers.
Generic Versus Condition-Specific Measures
Outcome measures can be generic or disease- or condition-specific. Generic outcome measures include those that measure general physical function, health status, and well-being. These have the advantage of comparing outcomes across different clinical conditions and interventions and are particularly useful to policy makers who might be interested in understanding the relative value of some types of interventions over others for purposes of health care utilization, planning, and resource allocation. However, generic outcome measures may assess some things that are not relevant to the condition and may neglect other issues of crucial importance. Generic measures are usually less sensitive to change than condition-specific measures that are designed to focus on issues relevant to the condition of interest. Condition-specific measures have more limited applicability. For instance, an outcome measure designed to evaluate outcomes of lower extremity fractures in children may not be relevant for other musculoskeletal conditions in children, let alone nonmusculoskeletal pediatric conditions.
Mortality, Health, and Quality of Life
When the primary goal of an intervention is to save or extend life, mortality (survival) must be the primary outcome. However, every life saved ought to be a life worth living, and who is better placed to make that judgment than the person whose life has been prolonged? Quality of life (QOL) is defined as “individuals’ perceptions of their position in life in the context of the culture and value systems in which they live, and in relation to their goals, expectations, standards and concerns.” Health-related quality of life (HRQL) refers to the health-related factors that contribute toward the goodness and meaning of life and how one perceives one’s ability to fulfill certain life roles. Health itself is defined as “a state of complete physical, mental and social well being, and not merely the absence of disease or infirmity.” HRQL is therefore a multidimensional construct that encompasses physical, mental, and social well-being, as well as role attainment, daily functioning, and participation in community life. HRQL measures provide a more complete picture of an individual and are complementary to traditional biomedical measures and functional assessments. Functional status refers to the degree to which an individual is able to perform socially allocated roles without physical or mental limitations.
Capacity and Performance
Capacity is the term used to describe an activity or task that an individual can do in a standardized environment, whereas performance refers to tasks or activities that the individual actually carries out or does do in daily life. Some outcome instruments are measures of capacity or capability, whereas others are measures of performance.
Properties of a Good Outcome Measure
There are several factors to consider in the selection of a PRO measure. The outcome measure must be relevant to the goals of the intervention. It should measure issues that are important to the children being evaluated and to their parents and should be in a format that is accessible, comprehensible, and not unduly burdensome. The purpose of the selected outcome measure must be considered. There are three main types of outcome measures. Discriminative measures are able to distinguish patients from each other, for instance, those with a disorder from those without and those with more severe involvement from those less severely involved. Predictive measures provide some indication of a future outcome (e.g., Stulberg criteria of the morphology of the femoral head and acetabulum at skeletal maturity predict the risk of osteoarthritis of the hip in the future). Prediction is also concerned with classifying patients. For example, outcome measures may be used in diagnosis and screening to identify individuals for suitable forms of treatment. Evaluative measures are designed to measure change over time, such as after an intervention. Any measurement tool must be shown to be psychometrically sound (reliable and valid) to achieve the purpose for which the measure is intended.
Reliability is a fundamental requirement of any valid measure. This is the property of reproducibility or consistency. A reliable measure is one that produces the same result when it is administered repeatedly, provided that there has been no change in the subject or attribute that is being measured. Reliability reflects the amount of error that is inherent in any measurement. Reliability can be assessed between repeated administrations over a time frame in which one would not expect to observe a change (test–retest reliability), by the same rater (intrarater reliability), and between different raters of the same subjects (interrater reliability). Reliability is usually measured with the use of statistical tests of concordance or agreement rather than just correlation. The two common measures of reliability are the intraclass correlation coefficient (ICC) used for continuous data and the weighted Kappa for ordinal data. Reliability is expressed as a numeric value between 0 and 1, where 1 represents perfect agreement or concordance. Internal consistency refers to a special type of reliability to assess how well the items within a scale correlate with each other to measure a single dimension (e.g., physical function). The most common way of measuring internal consistency is with the use of the Cronbach alpha.
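The two statistics named above for continuous and multi-item data can be sketched in a few lines of code. The following is a minimal illustration only, using Python's standard library. The function names and the sample scores are hypothetical, and the ICC shown is the simplest one-way random-effects form, ICC(1,1), which is an assumption on our part, since the text does not specify which ICC model should be used.

```python
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the total (summed) score across respondents.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def icc_one_way(ratings):
    """One-way random-effects ICC(1,1).

    ratings: list of subjects, each a list of k ratings (raters or occasions).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = mean(x for row in ratings for x in row)
    subject_means = [mean(row) for row in ratings]
    # Between-subject mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (disagreement between repeated ratings).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: rater B scores every subject exactly 5 points higher
# than rater A. The two raters are perfectly correlated, yet they do not agree.
offset_ratings = [[10, 15], [20, 25], [30, 35]]
print(round(icc_one_way(offset_ratings), 3))
```

In the hypothetical data, the second rater is perfectly correlated with the first but systematically 5 points higher, so the ICC falls below 1. This illustrates why agreement statistics are preferred over simple correlation when reliability is assessed.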
Reliability and consistency do not ensure accuracy. A measure can be perfectly reliable but very inaccurate. For instance, a patient might be asked to recall the maximum distance he has walked in the last week. He may consistently (reliably) report this to be 100 meters, when, in fact, if it were to be measured objectively, the distance is only 50 meters. Many factors play a role in influencing the accuracy of a measure, including the ability to recall or to make estimates accurately.
A measure is valid when it measures what it was intended to measure. The validity of a scale is the degree of confidence one can place on inferences about people based on their scores from that scale. An outcome measure has face validity when the items in the measure appear to be measuring what they are supposed to. It is an indication of the sensibility of the measure. Content validity examines the extent to which the attribute of interest is comprehensively sampled by the items or questions in the instrument so that all the relevant and important content or domains are represented. Face and content validity reflect a judgment about whether an outcome measure is reasonable and appropriate for the purpose for which it was intended. These judgments are usually sought during the development of the measure from patients with the condition and experts who work with these patients. Criterion validity is assessed when an instrument correlates with another instrument or measure that is regarded as a more accurate measure (gold standard) of the “criterion”. Concurrent validity is one type of criterion validity in which a new outcome measure is compared with another criterion measure by administration of both at the same time. Often, such a gold standard measure does not exist, particularly for subjective attributes like pain or measures of health. Validation then takes on a process of hypothesis testing. Construct validity examines the logical relations that should exist between a measure and characteristics of patients and patient groups. For example, an outcome measure can be tested on two groups of patients: one known to have the condition and the other not, or one known to be more severely involved and the other just mildly involved. One would test the hypothesis that these groups would be rated differently on the outcome measure, thereby demonstrating known-groups or extreme-groups validity.
Convergent validity, another type of construct validity, is shown when the scales of a measure correlate as expected with the related scales of another measure but not with unrelated scales (divergent validity). When applied to the population of interest, a discriminative instrument should be sufficiently sensitive to detect small (but meaningful) differences between patients. An instrument should be free from ceiling effects, which occur when many subjects of the population of interest achieve the highest possible score on the measure because it is not sensitive enough to discriminate or distinguish higher functioning subjects from each other. When an instrument is less discriminative of lower functioning subjects and rates them all at the lowest end of the scale, it suffers from floor effects.
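Ceiling and floor effects can be screened for by counting how many respondents sit at the extremes of the scale. The sketch below is illustrative only. The function name, the sample scores, and the 15% flagging threshold are assumptions and are not taken from this chapter; any threshold should be justified for the instrument and population in question.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Return the proportions of respondents at the lowest and highest possible scores."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling


# Hypothetical scores on a 0 to 100 functional scale after fracture healing.
scores = [100, 100, 100, 95, 80, 100, 60, 100]
floor, ceiling = floor_ceiling_proportions(scores, min_score=0, max_score=100)

# Illustrative threshold only (an assumption, not a value from the text).
THRESHOLD = 0.15
print("floor effect suspected:", floor > THRESHOLD)
print("ceiling effect suspected:", ceiling > THRESHOLD)
```

In this hypothetical sample, five of eight children score the maximum, so the instrument cannot distinguish among the higher functioning children.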
An outcome instrument that is intended to measure effectiveness (evaluative measure) must possess the property of responsiveness or sensitivity to change, which is the ability to measure change over time or after interventions. Responsiveness must be tested in longitudinal studies and can be quantified by either the standardized response mean (SRM), which is the ratio of the mean change to the standard deviation of that change, or the effect size (ES), which is the ratio of the mean change to the standard deviation of the initial measurements.
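The two responsiveness statistics defined above differ only in their denominators, which a short sketch makes explicit. The function names and the paired scores below are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the standard deviation of the change."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def effect_size(baseline, follow_up):
    """ES: mean change divided by the standard deviation of the baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical paired scores for four patients before and after treatment.
baseline = [40, 50, 60, 70]
follow_up = [55, 60, 80, 85]

print(round(standardized_response_mean(baseline, follow_up), 2))
print(round(effect_size(baseline, follow_up), 2))
```

Because the SRM depends on the variability of the change scores and the ES on the variability at baseline, the two can differ substantially for the same data, so it helps to state which one is being reported.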
Outcome Measures for Pediatric Fractures
The ultimate goals of fracture management are to restore form and function. Restoration of function generally requires the fracture to unite and soft tissues (e.g., muscles and ligaments) to heal, en route to the resolution of pain, stiffness, weakness, and fatigue, the restoration of range of motion and strength, and return to full use. Bones are the lever arms on which the muscles act, and bone alignment and length influence lever arm function. Form (appearance) is influenced by alignment, length, and (absence of) atrophy and scarring. Restoration of alignment and length is common to both goals and is therefore the most common outcome measured when the effectiveness of fracture treatments is evaluated. These are outcomes at the level of body structures and body functions in the ICF framework or the technical outcomes of Goldberg.
Radiographic Outcomes of Alignment and Length
Alignment and length outcomes are measured on radiographs, recorded at some sufficiently elapsed time point after the injury to allow for remodeling. Conventionally, alignment has been reported in terms of residual angulation at the fracture site, which is by far the most common outcome reported in fracture studies. In these studies, malalignment is typically defined based on some threshold magnitude of angulation that is believed to be clinically significant in terms of either its physiologic and biomechanical impact now or in the future or its external visibility (appearance). For example, in a multicenter randomized trial comparing early spica cast with external fixation of femur fractures in children, the primary outcome measure reported was (the rate of) fracture malunion at 2 years after the fracture, defined as any of the following: limb-length difference of greater than 2 cm, more than 15° of anterior or posterior angulation, or more than 10° of varus or valgus angulation.
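The composite malunion definition quoted from the trial above can be expressed as a simple rule. The sketch below encodes only the three thresholds stated in the text. The class, field, and function names are hypothetical, and treating each measurement as an absolute magnitude is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RadiographicResult:
    """Hypothetical container for measurements taken 2 years after the fracture."""
    limb_length_difference_cm: float
    sagittal_angulation_deg: float  # anterior or posterior angulation
    coronal_angulation_deg: float   # varus or valgus angulation


def is_malunion(result: RadiographicResult) -> bool:
    """Apply the trial's definition: any one criterion exceeded counts as malunion."""
    return (
        abs(result.limb_length_difference_cm) > 2.0
        or abs(result.sagittal_angulation_deg) > 15.0
        or abs(result.coronal_angulation_deg) > 10.0
    )


# Hypothetical patient: acceptable length and sagittal alignment, 12 degrees of varus.
patient = RadiographicResult(
    limb_length_difference_cm=1.0,
    sagittal_angulation_deg=8.0,
    coronal_angulation_deg=12.0,
)
print(is_malunion(patient))
```

A dichotomous rule of this kind labels a child as having a malunion on the basis of a single threshold being exceeded, regardless of appearance or function.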
A number of serious limitations are associated with the use of residual angulation at the fracture site as a valid measure of alignment, especially as the primary outcome measure of fracture management, particularly in children. First, the remodeling of long bone fractures occurs predominantly at the physes by asymmetric growth to correct the alignment and less so at the fracture site. In femur fractures, Wallace and Hoffman demonstrated that 85% of angular deformity had corrected at an average of 45 months, but only about 26% of that remodeling occurred at the fracture site. This would imply that measuring residual angulation at the fracture site will not take into account where most of the remodeling and correction has occurred, which will potentially lead to overreporting of “malunions.” This problem will be especially true for fractures treated by nonoperative or closed methods because these are less likely to achieve anatomic alignment at the outset (Fig. 8-3). For instance, in the trial of early spica casting versus external fixation of femur fractures in children described above, the rates of malunion were reported to be 45% and 16%, respectively. The majority of these malunions were due to residual angulation at the fracture site. The authors concluded that early spica casting was associated with a three times greater rate of malunion than external fixation. However, no corrective osteotomies or other corrective procedures were reported to address these malunions for either group. Furthermore, no difference was recorded in functional outcomes or parental and child satisfaction, which were excellent in both groups. One possible explanation for this wide gap between the radiographic outcomes and patient-reported outcomes might be that the measurement of residual angulation at the fracture site is misleading because it fails to account for correction that has occurred at the physes.
This might also explain why the rate of surgery to correct these “malunions” is usually far lower than the rate of reported malunions in these studies.
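A rough worked example shows how large the discrepancy between fracture-site angulation and overall alignment could be. The sketch below applies the Wallace and Hoffman proportions cited above to a hypothetical initial deformity of 20°. It assumes, purely for illustration, that these average proportions apply to an individual child and that physeal correction reduces the overall deformity degree for degree. The function name is hypothetical.

```python
def remodeling_breakdown(
    initial_deg, fraction_corrected=0.85, fraction_at_fracture_site=0.26
):
    """Split total angular correction into fracture-site and physeal components."""
    total_correction = initial_deg * fraction_corrected
    at_fracture_site = total_correction * fraction_at_fracture_site
    at_physes = total_correction - at_fracture_site
    return {
        # Deformity of the bone as a whole after all remodeling.
        "residual_overall_deg": initial_deg - total_correction,
        # Angulation still measurable at the fracture site itself.
        "residual_at_fracture_site_deg": initial_deg - at_fracture_site,
        "correction_at_physes_deg": at_physes,
    }


# Hypothetical 20-degree initial angular deformity.
for name, value in remodeling_breakdown(20.0).items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, about 15.6° of angulation would still be measured at the fracture site even though the overall deformity has fallen to 3°, which is consistent with the overreporting of “malunions” described above.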