Trial Design, Measurement, and Analysis of Clinical Investigations

Evidence-Based Medicine and Clinical Investigation

Today more than ever, clinicians are encouraged to practice evidence-based medicine (i.e., the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients). Inherent to this is the need to appraise the usefulness and quality of clinically relevant research. The strength of evidence depends on many factors, including the rigor of the study design; the selection of patients and appropriate controls; and the meticulousness and appropriateness with which the data were gathered, analyzed, interpreted, and reported.

Clinical research is viewed as a continuum, beginning with basic biomedical research, progressing to clinical science and knowledge, and culminating in improved public health. Two major “translational blocks” are identified that impede efforts to apply science to better human health in an expeditious fashion. The first translational block occurs between basic biomedical research and clinical science and knowledge, and the second occurs between clinical science and knowledge and improved health. Contributing factors to the first block include a lack of willing research participants, regulatory burden, fragmented infrastructure, incompatible databases, and a lack of qualified investigators. Contributing to the second block are career disincentives, practice limitations, high research costs, and lack of funding. These obstacles should remain foremost in the reader’s mind, with the realization that the design and analysis of studies must be grounded in what is reasonable from logistical, practical, ethical, and economic points of view. This chapter provides readers with enough clinical, epidemiological, and biostatistical skills to assess the literature critically and to determine independently the “strength of the evidence.” It also promotes basic skills that facilitate the design, conduct, and reporting of clinical research. Although this chapter emphasizes clinical research, many of the concepts discussed here translate readily to the realm of basic science. Whether working in the laboratory or in the clinic, an investigator must understand basic concepts, such as frequency distributions and statistical inference.

Definition of Clinical Research

The Nathan Report defines clinical research as “studies of living human subjects, including the laboratory-based development of new forms of technology; studies of the mechanisms of human disease and evaluations of therapeutic interventions (which are known collectively as translational research ); clinical trials, outcome studies, and health care research; and epidemiological and behavioral studies.” This area of research includes mechanisms of human disease, therapeutic interventions, clinical trials, and the development of new technologies. Human research, per this definition, excludes in vitro studies that use human tissues but do not deal directly with patients. Conversely, laboratory or translational research (“bench to bedside”), provided that the identity of the patients from whom the cells or tissues under study are derived is known, constitutes clinical research.

The National Institutes of Health (NIH) considers genomic and behavioral studies as categories of clinical research. In this chapter, clinical trials are considered separately from other types of clinical studies.

Patient-Oriented Research

The NIH definition of clinical research groups four categories of investigation under the major heading of patient-oriented research: (1) mechanisms of human disease, (2) therapeutic interventions, (3) clinical trials, and (4) development of new technologies. This chapter emphasizes clinical trial classification and methods. Many of the concepts and much of the terminology presented herein can be generalized to the conduct of clinical studies in any of these four areas of clinical research.

General Terminology and Basic Concepts of Clinical Studies

All clinical investigations may be divided broadly into observational or experimental studies. In observational studies, there is no artificial manipulation of any factor that is to be assessed in the study, and there is no active manipulation of the patient. The subjects have received the “etiological” agent by mechanisms other than active assignment or randomization; examples include exposure to medications, atmospheric pollutants, and occupational toxins. Observational studies may be either retrospective or prospective. Retrospective implies that the data already exist and are retrieved using a systematic approach, but missing data are not retrievable or cannot be verified. In prospective observational studies, a cohort is observed prospectively through time, and data are gathered on an ongoing basis. In this case, missing data may possibly be retrieved for purposes of the study, and standardized methods of data verification can be employed. Experimental studies are those in which the investigator actively manipulates a study factor, such as the therapeutic regimen, or some other parameter. In experimental studies, the subjects are observed prospectively, some active maneuver is conducted, and the results of this maneuver are then observed.

Objectives and Hypotheses

The first step in conducting a clinical study is a research question or hypothesis. A research question is a clear, focused, concise, complex, and arguable query around which a particular research project is centered. Hypotheses are derived from the research question. They are declarative statements about the predicted relationship between two or more variables. Hypotheses are testable, meaning that the variables that are part of a hypothesis must be observable, measurable, and analyzable. However, when formally testing for statistical significance, the hypothesis should be stated as a null hypothesis.

The null hypothesis, H 0 , represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that, on average, the new drug is no better than the current drug. Hence the statement for H 0 would be: “There is no difference between the two drugs on average.” Special consideration is given to the null hypothesis because it relates to the statement being tested, whereas the alternative hypothesis relates to the statement to be accepted if/when the null hypothesis is rejected. The final conclusion, once the test has been carried out, is always given in terms of the null hypothesis. We either “reject H 0 in favor of H 1 ” or “do not reject H 0 ”; we never conclude “reject H 1 ,” or even “accept H 1 .” If we conclude “do not reject H 0 ,” this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H 0 in favor of H 1 . Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
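The logic above can be made concrete with a simple two-proportion z-test of H0: "there is no difference between the two drugs on average." The sketch below is a minimal illustration, not taken from the chapter; the response counts (40 of 100 responders on the new drug versus 25 of 100 on the current drug) are hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 (no difference between groups)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical responder counts: new drug 40/100, current drug 25/100
z, p = two_proportion_z_test(40, 100, 25, 100)
# We "reject H0 in favor of H1" only if p falls below the chosen alpha
# (conventionally 0.05); otherwise we "do not reject H0."
```

Note that a small p-value leads to rejecting H0, not to "accepting H1"; a large p-value simply means the evidence against H0 is insufficient.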

The primary objective of a research study should be coupled with the hypothesis of the study. Study objectives define the specific aims of a study and should be clearly stated in the introduction of the research protocol. The study objective is an active statement about how the study is going to answer the specific research question.

The relation between the research question, hypothesis, and study objectives is exemplified by a study by de Benedetti and colleagues on a randomized trial of tocilizumab in systemic juvenile idiopathic arthritis (SJIA).

  • Research question: How does tocilizumab compare with a placebo in managing the signs and symptoms of SJIA?

  • Research hypothesis: SJIA signs and symptoms are significantly more improved in patients who receive biweekly intravenous tocilizumab for 12 weeks compared with individuals who receive placebo.

  • Objective: To investigate the clinical efficacy and safety of tocilizumab in children with systemic juvenile idiopathic arthritis.

Hypothesis-Generating Versus Hypothesis-Testing Studies

The design of a clinical investigation depends on whether the study intends to generate hypotheses to be tested in future studies or to test specific hypotheses for which the investigator has some existing evidence to support the belief that they are true or not true. Hypothesis-generating studies are considered exploratory . Studies that are designed as tests of hypotheses, for which there are preliminary data, are often called pivotal or confirmatory studies. A given study may have confirmatory and exploratory aspects. Each type of study has distinct advantages and disadvantages. The design chosen is always deeply influenced by reality: what is economically, logistically, ethically, and scientifically possible.

A common exercise used by methodologists is to design the best theoretical experiment to answer the research question posed, without regard to time, money, ethics, patient availability, or anything else that could degrade the quality of the study; a related approach is known as the infinite data set. Realizing that there is no such thing as the perfect clinical study, the designer eliminates the most unrealistic “requirement.” For example, it is not likely that one could enroll 300 children with active granulomatous angiitis who would agree to the possibility of being randomly assigned to a placebo for 1 year. The study is compromised further and further by reality until one arrives at what can be done in consideration of all the issues. If the resulting study design and its protocol are unacceptable scientifically, perhaps the question cannot (and should not) be answered. The decision to pursue or not to pursue the “compromised” study, based in reality, is one of the most difficult in the entire research process.

Epidemiological Studies

Clinical epidemiology is a medical science that studies the frequency and determinants of disease development, as well as the diagnostic and therapeutic approaches to disease management in clinical practice. Epidemiology and biostatistics comprise the basic tools of the clinical investigator. Epidemiological methods can be used to answer questions in the following categories.

Studies in descriptive epidemiology typically concern themselves with patterns of disease occurrence with respect to person, place, or time. Descriptive epidemiological studies serve as hypothesis-generating studies for studies of causation, much the same way as small exploratory clinical trials serve as preliminary studies for therapeutic confirmatory trials. The person variable is concerned with who experiences the disease. A basic tenet is that the disease does not occur at random, but is more likely to develop in some people than in others. Personal factors of potential importance include age, sex, race, ethnicity, socioeconomic status, existing morbidity, health habits, genetics, and epigenetics (i.e., heritable alterations in gene expression caused by mechanisms other than changes in DNA sequence). The place variable is concerned with where the disease develops. Variation in place of occurrence can be evaluated at the local, regional, or national level. The time variable is concerned with variation in the occurrence of disease in time and its seasonality or periodicity.

A hypothetical example of a descriptive epidemiological study is the investigation of a group of workers in a factory who are suspected of having environmentally acquired lupus. The epidemiologist would investigate the detailed characteristics of the workers to determine whether there are patterns among the workers who do and do not have lupus. Do all types of workers (management through hourly manufacturing employees) show the same rate of disease development? Are people living close to the factory or its effluent affected? Systematic investigation of the patterns of disease allows a more precise hypothesis of causation, particularly if some exposure or dose level is found to be more strongly associated with the illness.

Frequency of Disease Occurrence and Prognosis

The frequency of disease occurrence is an important aspect of understanding a disease process. It can be measured in numerous ways. Epidemiological theory states that incidence is best estimated from prospective studies; prevalence may be calculated by prospective or retrospective approaches.

Incidence is the rate at which newly diagnosed cases develop over time in a population. Mathematically, incidence is equal to the number of new cases (numerator) divided by the number of persons at risk in the population multiplied by the time (duration) of observation (denominator). This rate is expressed in units of cases/person-time. Persons who already have the disease are excluded from the denominator because they are no longer at risk of developing it. Incidence is related to the concept of risk, defined as the proportion of unaffected individuals who, on average, contract the disease over a specific period. Risk of a disease is equal to the number of new cases divided by the number of persons at risk. Risk has no units and can have values between 0 (no new occurrences) and 1 (the entire population becomes affected during the risk period).

Prevalence is the total number of existing cases in a defined population, either at a point in time or during some time period. Mathematically, prevalence is equal to the number of existing cases divided by the total number of persons in the population. Prevalence is expressed in different ways: as a proportion (0 to 1), as a percentage (0 to 100), or by actual numbers using a convenient denominator (e.g., cases per 1,000 children). Point prevalence is the number of new and old cases in a defined population at a given “instant” in time. Period prevalence is the number of new and old cases that exist in a defined population during a given time period (e.g., 1 year).
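These definitions reduce to simple arithmetic. The following sketch uses entirely hypothetical counts to show how incidence (in cases per person-year), risk, and point prevalence are computed.

```python
# Hypothetical cohort: 2,000 at-risk children followed for 5 years; 50 new cases.
new_cases = 50
persons_at_risk = 2000
years_of_observation = 5

# Incidence: new cases / (persons at risk x duration), in cases per person-year
incidence_rate = new_cases / (persons_at_risk * years_of_observation)

# Risk: new cases / persons at risk; unitless, between 0 and 1
risk = new_cases / persons_at_risk

# Point prevalence (hypothetical): 120 existing cases in a population of 10,000
existing_cases = 120
population = 10000
point_prevalence = existing_cases / population  # may be reported per 1,000, etc.
```

With these numbers, the incidence rate is 0.005 cases per person-year, the 5-year risk is 0.025, and the point prevalence is 0.012 (12 cases per 1,000).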

Prognosis refers to the possible outcomes of a disease and the frequency with which they can be expected to occur. Prognostic factors need not cause the outcome; they must merely be associated with an outcome strongly enough to predict it. Prognosis is narrower in focus and shorter in time horizon than the consequences of disease and treatment that are considered in the field of outcomes research. The six most frequently measured outcomes in outcomes research are known as the six D’s: (1) death, (2) disease, (3) disability, (4) discomfort, (5) dissatisfaction, and (6) dollars. Prognostic studies often use a prospective cohort design. Studies of prognosis in JIA have included the sex of the patient, the age at onset, and a variety of clinical and laboratory variables to estimate outcome. Prognostic studies may also evaluate DNA and RNA, including pharmacogenetics.

Etiology and Risk of a Disease

In his presidential address to the Royal Society of Medicine in January 1965, Sir Austin Bradford Hill gave his now famous speech titled “The Environment and Disease: Association or Causation?” Hill described what have become known as the Bradford Hill criteria, sometimes characterized as Koch’s postulates for epidemiologists. These criteria describe what evidence should be considered when assessing causation of disease. Satisfaction of all of these criteria is neither necessary nor sufficient to establish causation, but they serve as a useful guide and include the following:

  • 1. Strength of the association: How strong is the association between the factor and the outcome? For example, how significant is the probability (P) value of the association between dietary intake of calcium and bone mineral density among children with juvenile idiopathic arthritis (JIA)?

  • 2. Consistency of the association: Does the association between factor and disease persist from one study to the next, even if study designs and patient samples vary substantially?

  • 3. Specificity of the association: Is the association limited to specific alleles and types of disease, with little association between the alleles and other diseases? As the study of causation has advanced, including genetic risk, the issue of specificity is considered less important than it previously was.

  • 4. Temporal correctness: Did the exposure to the factor occur before the disease? Temporal correctness becomes more difficult to establish in diseases with extended time intervals between exposure and the onset of clinical manifestations.

  • 5. Biological gradient: Is there a dose-response relationship between the factor and the disease? For example, does increasing the dose or duration of exposure to cyclophosphamide result in a subsequent increase in the frequency of malignancy?

  • 6. Biological plausibility: Does the association make sense with what is currently understood about the disease and its pathogenesis?

  • 7. Coherence: Is the association consistent with laboratory science investigations of the disease?

  • 8. Experiment: Does the association hold up under experimental conditions? If one reduces the dose or duration of exposure to cyclophosphamide, is there a corresponding decrease in the frequency of malignancy?

  • 9. Analogy: Are there similar factors that are accepted to be the cause of similar diseases?

No single study can prove indisputably that a potential etiological factor causes a disease, complication, or adverse event. Rather, it is the accumulating body of knowledge concerning factor and disease, or treatment and outcome, that finally permits the conclusion that the evidence is sufficient to establish a causal link between the two.

Risk of a disease is the likelihood, usually quantified as an incidence rate or cumulative incidence proportion, that an individual will develop a given disease in a given time period. There are several risk measures, among them the absolute risk of a disease, which is the chance of developing the disease over a time period (see Table 6-1). The same absolute risk can be expressed in different ways. For example, say you have a 1 in 10 risk of developing a certain disease in your life. This can also be expressed as a 10% risk, or a risk of 0.1. The absolute risks from different exposures can be compared with each other by calculating the absolute risk reduction through simple subtraction. For example, if the absolute risk of developing an unwanted outcome with drug A is 5% and with drug B is 15%, then the absolute risk reduction associated with drug A would be 10%. A related measure is the number needed to treat, which is the inverse of the absolute risk reduction. In this example, the number needed to treat is 10 (1 / 0.10 = 10), meaning that for every 10 patients treated with drug A instead of drug B, 1 occurrence of the unwanted outcome is avoided.
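The drug A versus drug B arithmetic above can be written out directly; the sketch below simply reproduces those numbers.

```python
risk_drug_b = 0.15  # absolute risk of the unwanted outcome with drug B (15%)
risk_drug_a = 0.05  # absolute risk with drug A (5%)

# Absolute risk reduction: simple subtraction of the two absolute risks
arr = risk_drug_b - risk_drug_a   # 0.10, i.e., 10%

# Number needed to treat: the inverse of the absolute risk reduction
nnt = 1 / arr                     # 10 patients treated to avoid 1 outcome
```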


Table 6-1. Terms Associated with Risk Factors and Disease

                        Disease Present   Disease Absent
Risk factor positive          a                 b
Risk factor negative          c                 d

The 2 × 2 table may be used to calculate associations between the risk factor and the disease:

Incidence: (a + c) / (a + b + c + d). Number of new cases among those at risk.
Absolute risk: Synonymous with incidence.
Attributable risk: [a / (a + b)] − [c / (c + d)]. Incidence among those with the risk factor minus incidence among those without the risk factor (sometimes expressed as a percentage of the incidence rate among those with the risk factor).
Relative risk: [a / (a + b)] ÷ [c / (c + d)]. Incidence among those exposed divided by incidence among those not exposed.
Odds ratio: (a × d) / (b × c). Approximation to the relative risk used in case-control studies.
Case exposure rate: a / (a + c). Among those with the disease, the proportion who had the risk factor.
Control exposure rate: b / (b + d). Among those without the disease, the proportion who had the risk factor.

An important concept in the study of disease etiology is relative risk or risk ratio (RR) . RR is used to compare the risk in two different groups of people. For example, research suggests that smokers have a higher risk of developing heart disease compared with (relative to) nonsmokers. An RR ranges from 0 to infinity. The RR indicates the strength of the association between the risk factor and the disease outcome and is calculated by dividing the absolute risk in the group exposed to a risk factor by the absolute risk in the unexposed group. An RR value statistically significantly larger than 1 indicates the exposure is associated with increased risk of disease; an RR value not statistically significantly different from 1 indicates there is no association between the exposure and the risk of disease; and an RR value statistically significantly less than 1 indicates the exposure is associated with decreased risk of disease; that is, the exposure is protective.

An RR used frequently in genetic studies is lambda, indicating familial aggregation of cases. An example is lambda sibling (λs), calculated as the prevalence of a disorder in biological siblings of individuals with the disease divided by the prevalence of the disease in the general population. The λs for systemic lupus erythematosus (SLE) has been estimated to be 30, meaning that a sibling of an individual with SLE is 30 times more likely to develop SLE than a member of the general population. LOD scores (logarithm of the odds) are distinct from λs and are commonly used to estimate genetic linkage in families between genetic traits or biomarkers.
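The λs computation is a simple ratio of prevalences. The sketch below uses assumed, purely illustrative prevalence values chosen to yield the cited estimate of 30 for SLE; actual prevalences vary by population.

```python
# Assumed (illustrative) prevalences; not measured values from the chapter.
prevalence_in_siblings = 0.003     # prevalence of SLE among siblings of cases
prevalence_in_population = 0.0001  # prevalence of SLE in the general population

# Lambda sibling: sibling prevalence divided by general-population prevalence
lambda_s = prevalence_in_siblings / prevalence_in_population  # about 30
```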

Table 6-1 presents related terms that are relevant to risk and shows how each may be calculated using a 2 × 2 contingency table. Disease state (present or absent) is considered the dependent variable and is usually placed in the columns ( x axis). The risk factor (positive or negative) is considered the independent variable and is usually placed in the rows ( y axis).

The RR is calculated differently from the odds ratio and these two terms are not interchangeable. Odds ratios (ORs) are used in case-control studies because the retrospective selection of controls does not generally allow for the determination of true incidence rates. Instead, the OR is calculated by dividing the odds of exposure among the cases by the odds of exposure among the controls. The OR is frequently reported in sophisticated epidemiologic studies because it is the effect measure derived from regression models that have binary (yes/no) outcomes (i.e., logistic regression).
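Both effect measures come straight from the 2 × 2 layout of Table 6-1. The sketch below uses hypothetical counts for a rare disease to show that, in that setting, the OR closely approximates the RR.

```python
def rr_and_or(a, b, c, d):
    """Relative risk and odds ratio from a 2 x 2 table laid out as in Table 6-1:
    a/b = exposed with/without disease; c/d = unexposed with/without disease."""
    relative_risk = (a / (a + b)) / (c / (c + d))  # incidence ratio
    odds_ratio = (a * d) / (b * c)                 # cross-product ratio
    return relative_risk, odds_ratio

# Hypothetical rare-disease counts: 8/1,000 exposed and 2/1,000 unexposed affected
rr, or_ = rr_and_or(8, 992, 2, 998)
# rr is exactly 4.0; or_ is about 4.02, close to the RR because the disease is rare
```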

Diagnosis of Disease and Classification, and Response Criteria

The diagnosis of disease, as it applies to epidemiology, refers to the performance of screening and diagnostic tests used in populations, rather than the process of differential diagnosis in individual patients. Classification criteria are typically used to identify homogeneous populations for studies, with the intent of facilitating hypothesis testing. Classification criteria typically employ a set of core variables fashioned into an algorithm. Examples are the classification criteria for juvenile dermatomyositis or scleroderma. Although the criteria are often used in clinical practice to support the diagnosis of a rheumatic disease, patients can be diagnosed with a rheumatic disease even if they do not fulfill the classification criteria for that disease. Conversely, response criteria provide measures of change in response to therapy. Examples are the criteria for flare and for improvement in JIA.

Validity of a Diagnostic or Screening Test or Set of Criteria

The validity of a diagnostic or screening test or set of criteria involves various parameters, as shown in Table 6-2 . The table typically is constructed with the presence or absence of disease as the column labels (i.e., x axis) and the test results as the row labels (i.e., y axis). The patients in row 1, column 1, are called true positives ; patients in row 1, column 2, are false positives ; patients in row 2, column 1, are false negatives ; and patients in row 2, column 2, are true negatives . Sensitivity, specificity, positive and negative predictive values, false-positive rate and false-negative rate, and reliability are terms used to describe the validity of a screening test. Of note, the positive and negative predictive values of a diagnostic test are dependent upon the prevalence of the condition in the population. By contrast, sensitivity and specificity are generally considered to be inherent properties of diagnostic tests.


Table 6-2. Estimating the Validity of a Diagnostic Test

                   Disease Present       Disease Absent
Test positive      True positive (TP)    False positive (FP)
Test negative      False negative (FN)   True negative (TN)

The 2 × 2 table may be used to calculate measures of the test’s validity:

Sensitivity: TP / (TP + FN). Proportion (or percentage) of persons with the disease who test positive.
Specificity: TN / (TN + FP). Proportion (or percentage) of persons without the disease who test negative.
Positive predictive value: TP / (TP + FP). Proportion (or percentage) of persons who test positive who have the disease.
Negative predictive value: TN / (TN + FN). Proportion (or percentage) of persons who test negative who do not have the disease.
False-positive rate: FP / (TP + FP). Proportion (or percentage) of persons who test positive who do not have the disease.
False-negative rate: FN / (TN + FN). Proportion (or percentage) of persons who test negative who have the disease.
Reliability (also called reproducibility): The ability of a test to yield the same result on retesting.
Likelihood ratio positive (LR+): sensitivity / (1 − specificity). The magnitude of the increase in the odds of disease given a positive test result.
Likelihood ratio negative (LR−): (1 − sensitivity) / specificity. The magnitude of the decrease in the odds of disease given a negative test result.

The utility of diagnostic tests can be evaluated with the use of likelihood ratios. The likelihood ratio positive (LR+) and likelihood ratio negative (LR−) are defined in Table 6-2. If one estimates the pretest odds of a patient having a disease, performs the diagnostic test, and then multiplies by the appropriate corresponding likelihood ratio (LR+ for a positive test result and LR− for a negative result), the result is the posttest odds of the patient having the disease. Odds and probabilities can be interconverted easily [odds = probability / (1 − probability)]. Posttest probabilities of disease generated from likelihood ratios are more refined estimates than the positive predictive value of the test, because they rely on an individual patient’s probability of disease rather than the prevalence of disease in the population.

A widely used tool that allows visual comparison of the performance of a set of different criteria, or of different cut points for a diagnostic or screening test, is the receiver operating characteristic (ROC) curve. An ROC curve is a plot of the true-positive rate against the false-positive rate, with sensitivity on the y axis and (1 − specificity) on the x axis. An ROC curve shows the trade-off between sensitivity and specificity for different criteria or cut points; the more closely the curve follows the upper left corner of the ROC space, the more accurate the test. Conversely, the closer the curve approaches the 45° diagonal of the ROC space, the less accurate the test. The overall quality of a test can be summarized by the area under the ROC curve (AUC), which ranges between 0 and 1; the larger the AUC, the more accurately the test predicts the disease in terms of sensitivity and specificity. Tests with an AUC of 0.5 or lower are no better than chance at predicting whether the disease is present. ROC analysis is commonly used to assess the quality of new criteria, diagnostic tests, or predictive tests. Figure 6-1 provides a sample ROC curve of a new laboratory test for the prediction of a flare of lupus nephritis, with a guide to the interpretation of the AUC.
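For a small data set, an ROC curve and its AUC can be computed by sweeping a threshold over the observed test values and applying the trapezoidal rule. The biomarker values below are hypothetical; higher values are assumed to indicate disease (here, an impending flare).

```python
def roc_points(scores_diseased, scores_healthy):
    """(FPR, TPR) pairs for every threshold of a continuous test,
    where higher scores are assumed to indicate disease."""
    thresholds = sorted(set(scores_diseased + scores_healthy), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in scores_diseased) / len(scores_diseased)  # sensitivity
        fpr = sum(s >= t for s in scores_healthy) / len(scores_healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical biomarker values: perfect separation of flare vs. no-flare patients
auc_perfect = auc(roc_points([8, 7, 6, 5], [4, 3, 2, 1]))  # AUC = 1.0
```

A test with completely overlapping score distributions yields an AUC of 0.5, the chance-level diagonal described above.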


The receiver operating characteristic (ROC) curve, a plot of the true-positive rate versus the false-positive rate (i.e., sensitivity on the y axis and 1 − specificity on the x axis). Sensitivity and specificity range between 0 and 1 (or 0% and 100%). ROC curves allow one to observe the trade-off between sensitivity and specificity at various cut points of a diagnostic test. Tests of a certain outcome (disease) with ROC curves that follow the 45° diagonal have an area under the ROC curve of 0.5; these tests are not useful for predicting the outcome (diagnosing the disease). The point of the ROC curve nearest the upper left corner provides the statistically best trade-off of sensitivity and specificity. Depending on the cutoff point chosen along the ROC curve, the laboratory test yields a certain NPV and PPV. The AUC (0 to 1) serves as an overall measure of the quality of the test for the diagnosis (here: risk of impending lupus nephritis flare). AUC, Area under the ROC curve; NPV, negative predictive value; PPV, positive predictive value.

(Modified from C.H. Hinze, M. Suzuki, M. Klein-Gitelman, et al., Neutrophil gelatinase-associated lipocalin is a predictor of the course of global and renal childhood-onset systemic lupus erythematosus disease activity, Arthritis Rheum . 60 [2009] 2772–2781.)

Epidemiological Study Designs Aimed at Establishment of Associations and Cause-Effect Relationships

Case-Control Retrospective Study

One of the most common study designs used to establish an association, or a cause-effect relationship, is the observational case-control retrospective study. In this setting, patients who have the disease are compared with patients who do not have the disease, and data documenting prior exposure to some agent are ascertained retrospectively.

The most frequent statistic to come from this type of study is the OR (see Table 6-1 ). Provided the disease is rare, the OR estimate is numerically similar to the RR. The choice of an appropriate control group is crucial for the correct inferences to be made about a prior exposure. Controls are chosen with the intent to adjust for personal, socioeconomic, or environmental factors that may influence the development of a disease.

There is no “gold standard” for selecting control subjects, but guidelines exist. One basic principle is that the control patients should be representative of the underlying population from which the cases were derived (i.e., if the controls had developed the disease, they would have been identified as cases for the study). Advantages of case-control studies include efficiency, low cost, quick results, and low risk to study subjects. They are particularly advantageous when studying rare outcomes because the persons with the outcome or disease of interest are identified from the beginning of the study. There are several disadvantages of case-control studies, however. The temporal relationship of exposure and disease may be obscured; historical information may be incomplete, inaccurate, or not verifiable; a detailed study of mechanisms of disease is often impossible; and if the study is not well done, results may be biased. For example, persons who develop a disease may remember prior exposures differently than those who do not, because they suspect a possible causal association in their own minds (i.e., recall bias).

Prospective Cohort Study

The observational approach that most closely resembles an experiment is the prospective cohort study. In this design, a population is defined from which the sample is drawn. Exposure to some factor is established, and subjects are categorized as having been either “exposed” or “not exposed” to a factor thought to contribute to the risk of some outcome. Each of the two cohorts is monitored prospectively to observe whether the outcome develops. Relative risk is the statistic most commonly used to describe the results of this type of study. The identification of exposed persons presents several problems. The first is to identify the exposed persons correctly and measure the degree of exposure. This may be done by selecting subjects with some type of unusual occupational or environmental exposure.

The advantages of a prospective observational cohort study are that a clear temporal relationship between exposure and disease is established, and the study may yield information about the length of induction (incubation) of the disease. The design facilitates the study of rare exposures and allows direct calculation of disease incidence rates and thus relative rates or risks.

The disadvantages of cohort studies include the potential for loss to follow-up or alteration of behavior because of the long follow-up time that may be necessary. Cohort studies are not particularly suited for rare diseases when the outcome is onset of the disease. Detailed studies of the mechanisms of the disease typically are impossible in cohort studies. An example of a cohort study designed to detect disease causation is the study by Inman and associates, who prospectively observed a cohort of persons exposed to Salmonella typhimurium infection to determine whether reactive arthritis developed. Prospective cohorts or registries also are useful in pediatric rheumatology when the aim is to identify risk factors for development of certain complications or outcomes in a group of children who, typically, have the same disease but vary in predictor or risk variables.

Prospective Observational Registries

A patient registry has been defined as the organized collection of uniform observational data to evaluate specific outcomes for a defined population of persons. Registries may be developed to examine the natural history of disease, to analyze the effectiveness and safety of treatments, to measure quality of care, or for other purposes. The primary data in registries may be generated by medical encounters (e.g., physician assessments and the results of investigative studies), and these data may be additionally linked to secondary data sources that are collected for other purposes (e.g., outpatient pharmacy billing data). Advantages of prospective observational patient registries include the study of “real-world” patients and conditions, with resultant excellent generalizability, and the ability to examine clinical questions for which a randomized clinical trial is impractical or unethical. The main disadvantage of registries is common to all observational studies: the potential for bias, especially confounding by indication.

Health Services Research

Health services research (HSR) can be defined as the multidisciplinary field of scientific investigation that studies how social factors, financing systems, organizational structures and processes, health technologies, and personal behaviors affect access to health care, the quality and cost of health care, and ultimately individual health and well-being. Research domains are individuals, families, organizations, institutions, communities, and populations.

HSR uses a multitude of methods and techniques, and in the following section the ones commonly or increasingly used in pediatric rheumatology are summarized. A comprehensive list can be found at .

Among the key methods of HSR are systematic reviews, which use explicit methods to perform a thorough literature search and critical appraisal of individual studies to identify valid and applicable evidence. Systematic reviews summarize the existing evidence and identify gaps in current knowledge. They are often considered the prerequisite for meta-analyses, a statistical procedure for synthesizing quantitative results from different studies. Meta-analysis can be used to overcome the reduced statistical power of smaller studies, making it a powerful analytical technique. The standard estimates derived by meta-analytic methods are the combined probability and average effect size for a set of studies, the stability of these results, and the factors associated with differential treatment outcomes. The evaluation of meta-analyses should include assessment of whether there is a biased selection of studies and judgment about the quality of the data included, as well as the conceptual, methodological, and statistical soundness of the studies. Common challenges in meta-analyses include differences in the outcomes reported by the individual studies or significant heterogeneity in the results of the individual studies, such that an aggregate estimate may not be easily interpreted. Despite these potential shortcomings, meta-analysis is a valid approach for overcoming the issues of reduced power because of a small study population or a rare outcome. Important meta-analyses that have influenced medical decisions in rheumatology include those of the cyclooxygenase-2 inhibitor rofecoxib, which ultimately led to the withdrawal of the product from the market.
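The synthesis step can be sketched in miniature. The following is a fixed-effect (inverse-variance) pooling of hypothetical per-study effect estimates; a real meta-analysis would also assess heterogeneity and consider a random-effects model:

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect sizes
    (e.g., log odds ratios). Returns the pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]         # precision weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = 1.0 / math.sqrt(sum(weights))
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors from three small studies
pooled, se = pool_fixed_effect([0.5, 0.3, 0.7], [0.2, 0.1, 0.3])
print(round(pooled, 4), round(se, 4))  # 0.3694 0.0857
```

Note how the second (most precise) study dominates the pooled estimate, pulling it toward 0.3; this weighting is what lets a meta-analysis recover power that no single small study has.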

The Cochrane Collaboration ( ) is an international not-for-profit and independent organization that produces and disseminates systematic reviews and meta-analyses of health care interventions and promotes the search for evidence in the form of clinical trials and other studies of interventions.

Decision analysis is another HSR method that is aimed at supporting evidence-based medical decision making. Decision analysis is a means of making complicated medical decisions by including all of the factors that could possibly affect the outcome. Decision analysis uses the form of a decision tree as a diagrammatic representation of the possible outcomes and events that are considered in the decision analytical model. This includes outlining the problem, laying out the options and possible outcomes in explicit detail, assessing the probabilities and values of each outcome, and selecting the “best choice.” Few decision analyses have been published in pediatric rheumatology owing to the large amount of evidence generally required to construct an informative decision model, but an example is a decision analysis about the treatment of monoarthritis of the knee in JIA.

Cost-effectiveness analyses are special types of decision analyses that address questions of the cost of health interventions compared with health outcomes. Cost-effectiveness analyses are often done to assess whether the additional costs of new medications are worth paying for by society. Cost-effectiveness analyses often use quality-adjusted life years (QALYs) to represent the value of different health outcomes. Using this metric, one year of “perfect health” is assigned a value of 1.0, and death is assigned a value of zero. Individual health states (e.g., moderately active polyarthritis) can be assigned values on this scale using various methodologies to elicit patient preferences. The amount of time spent in a particular health state is then multiplied by its assigned value to determine the number of QALYs for that health outcome. A value often mentioned is the incremental cost-effectiveness ratio (ICER), which compares the differences between the costs and health outcomes of the medication being evaluated and the most cost-effective current treatment. One of the challenges of cost-effectiveness studies is selecting the appropriate perspective for the determination of costs. For example, the results of cost-effectiveness analyses are often sensitive to whether or not the indirect costs of disease (e.g., lost time away from employment) are included.
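The QALY and ICER arithmetic described above can be made concrete. All utilities and costs below are hypothetical, chosen only to illustrate the calculation:

```python
def qalys(health_states):
    """Sum of (years spent in state) x (utility of state), utilities on a 0-1 scale."""
    return sum(years * utility for years, utility in health_states)

# Hypothetical trajectories: 5 years in a better state, then 5 years in a worse one
new_drug_qalys = qalys([(5, 0.95), (5, 0.65)])    # 8.0 QALYs
comparator_qalys = qalys([(5, 0.85), (5, 0.55)])  # 7.0 QALYs

# ICER: incremental cost per incremental QALY gained (hypothetical costs)
new_drug_cost, comparator_cost = 50_000, 20_000
icer = (new_drug_cost - comparator_cost) / (new_drug_qalys - comparator_qalys)
print(round(icer))  # 30000 (cost units per QALY gained)
```

Decision makers then compare the ICER against a willingness-to-pay threshold; whether indirect costs are included can shift this number substantially, as noted above.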

Outcomes research is part of HSR and designed to evaluate the impact of health care on health or economic outcomes. Large population-level data sets are often used to conduct outcomes research, although primary data collection is sometimes conducted. Where large data sets are used, these are often gleaned from administrative or financial data, which may not be ideal for research purposes.

Outcomes research includes pharmacovigilance (i.e., the process of detecting, assessing, understanding, and preventing adverse effects of approved drugs), drawing on data from postmarketing reports, including adverse drug event reports. Pharmacoepidemiology is the study of the use and effects of drugs in large groups of people. To complement the information obtained from phase IV studies, pharmacoepidemiology studies frequently make use of data that were collected for other purposes, such as administrative claims billing data. These data sources generally provide detailed patient-level information about physician diagnoses, hospitalizations and other resource utilization, and outpatient medication prescription fills, but they do not contain clinical information, such as physician assessments and results of investigative studies. An example of a pharmacoepidemiology study in pediatric rheumatology is the use of administrative data from the United States Medicaid program to examine the rates of hospitalized infection associated with different treatments in JIA.

HSR also includes treatment guidelines and benchmark development to standardize therapies and obtain quality parameters for treatment effectiveness.

Other key areas of HSR are the development and validation of classification criteria, response criteria, and disease outcome measures. As the science of clinical research advances, we must update our standards for considering classification and response criteria. Disease outcome measures allow the comparison of patients in a standardized fashion. Details of how to develop and validate classification and response criteria and outcome measures in terms of their reliability, validity, and diagnostic accuracy can be found elsewhere.

Regulatory Affairs and Clinical Trials: Useful Guidelines

For simplicity, the generic term drug is used in the following discussions. It should be considered synonymous with any medicinal product, vaccine, or biologic agent. The principles discussed can also apply to interventional procedures such as surgery and radiotherapy. Clinical epidemiologists are frequently concerned with evaluating the effectiveness and safety of new therapies.

The Code of Federal Regulations of the U.S. Food and Drug Administration (FDA), and in particular Title 21 (Food and Drug), is the most relevant to clinical researchers in the United States. Regulatory activities for clinical research are described in the Good Clinical Practice (GCP) guidance developed by the International Conference on Harmonisation (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use. The ICH GCPs represent an international quality standard that various regulatory agencies around the world can transpose into regulations for conducting clinical research. The GCPs include guidelines for human rights protection and how clinical trials should be conducted, and define the responsibilities and roles of clinical investigators and sponsors.

Links to relevant guidance documents for clinical researchers can be found most quickly at the website of the FDA ( ) and the website of the ICH ( ). Of particular relevance to pediatric rheumatologists is the FDA document titled “Guidance for Industry: Clinical Development Programs for Drugs, Devices and Biological Products for the Treatment of Rheumatoid Arthritis.” This document summarizes the position of the FDA on what clinical development programs should consist of, and it provides a framework for conducting studies used to obtain regulatory agency approval of therapies for rheumatoid arthritis or JIA. More recently the FDA issued a draft guidance document specifying pediatric study plans.

Classification of Clinical Trials by Initiator, INDs, NDAs, and BLAs

Before general and specific considerations for individual trials can be discussed, an understanding of the various systems of classification of clinical trials is essential. Clinical trials may be initiated either by industry or by an individual investigator. Trials that are part of a clinical development program and conducted under a sponsor’s (pharmaceutical company’s) Investigational New Drug (IND) submission are usually initiated by industry. Trials undertaken under a sponsor’s IND are used frequently by the sponsor in its submission to obtain approval for a new drug, known in the United States and elsewhere as a New Drug Application (NDA). If the NDA is approved by the regulatory agency, the drug may be marketed and labeled for the specific indication (i.e., disease or conditions) stated in the NDA. If the new agent is a biotechnology-derived pharmaceutical, such as a monoclonal antibody, the company files a Biologic License Application (BLA), which is analogous to an NDA for conventional drugs.

Investigator-initiated protocols are typically, but not always, conducted after the drug has been approved for market. The main objective of investigator-initiated protocols may be to study new dosage regimens or use in diseases other than those for which the drug has obtained an indication. Many such trials are exploratory rather than confirmatory. Funding for investigator-initiated protocols in pediatric rheumatology has come from government agencies, manufacturers, and foundations. Details of the rules and regulations for medication approval in Europe are available at .

Classification of Clinical Trials by Phase

Phase 0 trial is a term sometimes used to refer to the preclinical or theoretical phase of agent development during which the uses of the drug based on animal models and cellular assays are explored. Clinical drug development programs are often described as consisting of four temporal phases, numbered I through IV by the pharmaceutical industry and by regulatory agencies ( Fig. 6-2 ).


Study types and the phases of development in which they are performed. This graph shows that study types are not synonymous with phases of development. BLA , Biologic license application; HPB , health protection branch; IND , investigational new drug; NDA , new drug application; NDS , new drug submission.

Phase I

Phase I studies are human pharmacology trials with a focus on pharmacokinetics and pharmacodynamics. Both are important in determining a drug’s effect. In recent years, pharmacogenomics (i.e., the investigation of how genes affect a person’s response to drugs) has been studied increasingly. The goal is to develop medications and drug regimens tailored to the genetic makeup of a person.

Pharmacokinetics can be defined as the study of the time course of drug absorption, distribution, metabolism, and excretion. Study of a drug’s pharmacokinetics may progress throughout a clinical development program. This can occur in separate studies or as part of larger trials to determine efficacy and safety. These studies are necessary to assess the clearance of the drug and to anticipate possible accumulation of the drug or its metabolites and the potential for drug-drug interactions. Assessing pharmacokinetics in subpopulations, such as those with impaired renal function or hepatic failure, or in very young children is another important aspect of this phase of studies. Pharmacokinetic data are usually expressed using the following terms:

  • Area under the time-concentration curve (AUC, or AUC0-24 if measured over a 24-hour period) is a measure of the total amount of drug absorbed; it is frequently estimated after the drug has reached steady-state levels.

  • Peak concentration (Cmax) is the maximum concentration reached at a particular dosage.

  • Time to peak concentration (Tmax) is used together with Cmax to measure the rate of absorption.

  • Cumulative percentage of drug recovered (Ac%) usually relates to urine data and is the cumulative amount of drug recovered over a specific period (e.g., 24 hours) divided by the initial dose.

  • Elimination (or terminal) half-life (t1/2) is a measure of how long it takes to clear a drug from the system.
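A minimal sketch of how several of these parameters might be computed from hypothetical time-concentration data (linear trapezoidal AUC; terminal half-life from the last two points of a log-linear decline):

```python
import math

times = [0, 1, 2, 4, 8, 12]              # hours (hypothetical sampling schedule)
concs = [0.0, 10.0, 8.0, 6.0, 3.0, 1.5]  # drug concentration (e.g., mg/L)

# AUC by the linear trapezoidal rule over the sampled interval
auc = sum((t2 - t1) * (c1 + c2) / 2
          for t1, t2, c1, c2 in zip(times, times[1:], concs, concs[1:]))

# Peak concentration and time to peak
c_max = max(concs)
t_max = times[concs.index(c_max)]

# Terminal elimination half-life from the last two points:
# k is the elimination rate constant, t1/2 = ln(2) / k
k = math.log(concs[-2] / concs[-1]) / (times[-1] - times[-2])
t_half = math.log(2) / k

print(auc, c_max, t_max, round(t_half, 2))  # 55.0 10.0 1 4.0
```

In practice, dedicated software fits compartmental models and extrapolates the AUC to infinity; the trapezoidal sum here is the simplest noncompartmental estimate.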

Pharmacodynamics is the study of the physiological effects of drugs on the body and the mechanisms of drug action. These studies also typically observe the relationship of drug blood levels to clinical response or to adverse drug events. They may provide early estimates of drug activity and potential efficacy, and they help to establish the dosage regimen used in later phases of drug development.

Phase I studies also provide estimation of initial drug safety and tolerability. Drug safety refers to the frequency of adverse drug effects (i.e., physical or laboratory toxicity that could possibly be related to the drug) that are treatment emergent (i.e., they develop during treatment and were not present before treatment) or that become worse during treatment compared with the pretreatment state.

Drug tolerability refers to how well subjects are able to tolerate overt adverse drug effects. An adverse drug effect is distinguished from an adverse event (or experience), which refers to any untoward experience that occurs while a patient is receiving the medication, whether or not it is attributable to the drug. The seriousness of an adverse event dictates how quickly it must be reported to regulatory agencies and to others who may have ongoing experimental protocols. A serious adverse event is defined as one that results in death, is life-threatening, requires inpatient hospitalization or prolongation of existing hospitalization, results in persistent or significant disability or incapacity, or is a congenital anomaly or birth defect. Investigators conducting pharmaceutical industry–sponsored studies should be aware that companies may have their own, more strict definition of serious adverse events. The term severity is distinguished from serious in that severity refers to the intensity of a specific event, whereas serious refers to the outcome or consequences of the event.

Phase II

Phase II studies are the earliest attempt to establish efficacy in the intended patient population. Many are called therapeutic exploratory studies and form the basis for later trials. The hypotheses may be less well defined than in later studies. These studies may use a variety of different types of study design, including comparisons with baseline status or concurrent controls. In these studies, the eligibility criteria are typically very narrow, leading to a homogeneous population that is carefully monitored for safety. Further studies may establish the drug’s safety and efficacy in a broader population after it is determined that a drug does have activity.

Another aim of phase II is to determine more precisely the doses and regimens for later studies. A further important goal is to determine potential study end points, therapeutic regimens (including the use of concurrent medications), and subsets of the disease population (mild versus severe).

Phase III

The primary objective of phase III studies is to confirm a therapeutic benefit. The most typical kind of study is the therapeutic confirmatory study, which provides firm evidence of an agent’s efficacy and safety. This type of trial always has a predefined hypothesis that is tested. These studies also estimate (with substantial precision) the size of the treatment effect attributable to the drug. Typical for phase III studies are blinding and randomization of treatment allocation. Also incorporated in phase III development are further exploration of the dose-response relationship, study of the drug in a wider population and in different stages of the disease, and the effects of adding other drugs to the agent being investigated. These studies continue to add information to the accumulating safety database.

Phase IV

Phase IV postmarketing surveillance and pharmacovigilance studies aim to accumulate longer-term safety data from large numbers of subjects followed for extended periods, even after the drug has been discontinued in the patient. These types of studies begin after the drug reaches the market and extend the prior demonstration of the drug’s safety, efficacy, and dose. The most frequent phase IV study is one of therapeutic use, where it is shown how the drug performs when used in the everyday setting, by patients who may have comorbid conditions or are taking a host of concurrent medications or both. An example of the importance of postmarketing surveillance studies is the discovery of the association between rofecoxib use and increased risk of cardiac events. The FDA has published a guidance document for designers of surveillance and vigilance studies.

Heterogeneity and Investigation in Children

The specific population to be studied is delineated by the inclusion/exclusion criteria. Developers of trials must attempt to reach a compromise between limiting the heterogeneity of the sample and not making the criteria so strict that recruitment of eligible subjects becomes untenable or threatens to restrict the generalizability of results.

The heterogeneity of the patient population that would be allowed to enroll in the trial is influenced by the phase of development. Early exploratory studies are often concerned with whether a drug has any effect whatsoever. In these trials, one may use a very narrow subgroup of the total patient population for which the agent may eventually be labeled. Later phase confirmatory trials typically relax the eligibility criteria to allow for a broader, more heterogeneous sample of the target population. Still, if the criteria for enrollment are too broad, interpretation of treatment effects becomes difficult.

Investigations in children typically are conducted after considerable data have been gathered in an adult population with a similar disease. If clinical development includes children, it is usually appropriate to start with older children before extending the studies to younger children. The exception to the “adults first” rule is when a medication is developed to treat a condition that occurs only in childhood.

The procedures for the development and testing of drugs for children are far from satisfactory; many drugs used to treat children are licensed for use only in adults, drugs are often unavailable in formats suitable for children, and clinical trials involving children raise complex ethical issues. The use of adult products at lower doses or on a less-frequent basis may pose risks to children, as may the use of unlicensed and off-label medicines.

Two U.S. federal acts now mandate that drugs initially developed for use in adults be studied and labeled for children. The Pediatric Research Equity Act (PREA) (formerly referred to as the Pediatric Rule ) requires manufacturers to assess the safety and effectiveness of a drug or biologic product in children if the disease for which the drug was developed in adults also occurs in children. The Best Pharmaceuticals for Children Act (BPCA) provides manufacturers with pediatric exclusivity incentives, provides a process for “off-patent” drug development, and requires pediatric study results be incorporated into labeling. For special issues relevant to trials in children, the reader is referred to the FDA guidance document on pediatric research. In 2012, both the BPCA and PREA were made permanent and no longer require renewal by Congress.

Similar legislation was introduced by the European Medicines Agency (EMA) in 2007. To obtain the right to market their medications for use in adults within the European Union, companies are now required to study medicines in pediatric subjects and develop age-appropriate formulations. The Paediatric Committee (PDCO), based at the EMA, is responsible for agreeing on the Paediatric Investigation Plan (PIP) with the companies. The PIP contains a full proposal of all the studies, and their timings, necessary to support the pediatric use of an individual product. More recently, the FDA also provided additional guidance for industry about the development of pediatric study plans in line with what has been proposed by the EMA, reflecting the updates provided in the FDA Safety and Innovation Act (FDASIA) of 2012. As a reward or incentive for conducting these studies, companies are entitled to extensions of patent protection and market exclusivity.


The study’s phase, objectives, ethics, and feasibility influence the specific design of a trial. New designs have appeared in recent years that reduce the time during which children receive placebo or a known inferior medication. More than one type of design may be used to answer the same question. If the same results and conclusions are reached regardless of the design and analysis used, the results are said to show robustness. Although a study may be designed as being “pivotal,” it is rare that any single trial establishes incontrovertible evidence of an agent’s clinical worth.

Comparative and Noncomparative Studies

A comparative study implies that some type of comparison is made between the drug under investigation at a particular dosage level and a placebo, another dosage level of the investigational drug, or an active comparator (an existing drug known to be effective for the specific condition). Noncomparative trials involve no such comparisons with the investigational agent. Studies that compare the agent with placebo or an active comparator are called controlled studies. For a discussion and guidance on the proper selection of a control group, the reader is referred to a guidance document by the FDA. Studies that involve dose escalation or that compare the pharmacokinetics and pharmacodynamics for differing dosage levels of the same drug are not considered controlled in the usual sense.

Open Label Versus Blinded Studies

Phase I and phase IV studies are usually open label, meaning that everyone involved with the study, including the patient and the physician, knows what the patient is receiving. The chief purpose is to gather longer-term safety and efficacy data. Investigator-initiated protocols may also be open if the intent is simply to gather additional information about an agent in another disease or at a dosage level other than that indicated in the label. As one would expect, the possibility of bias in interpretation of safety and efficacy information with open studies is much greater than with blinded studies. Note that phase II and phase III studies may have an open-label extension phase, during which patients who took part in the comparative phase openly receive the investigational drug for an extended period.

Beginning either in late phase I or early phase II, blinded, controlled (comparative) studies are performed. Blinding refers to the masking of individuals involved in the assessment of the patient and, in some situations, of the data analyst. The purpose of blinding is to prevent identification of the treatment until any opportunity for bias has passed. These biases include (but are not limited to) decisions about whether to enroll a patient, allocation of patients, clinical assessment of end points, and approaches to data analysis and interpretation. Designs in which the assessor and the patient are blinded are called double-blind designs. Designs in which only the patient or only the assessor is blinded to the treatment are called single-blind designs. Studies should attempt to maintain blinding until the final patient has completed the study, although this has proved difficult in certain pediatric studies of severe diseases. Clinical studies in humans typically have a steering committee to provide oversight of the trial and a data safety and monitoring plan to provide ongoing monitoring. In large trials and in trials that carry more than minimal risk, the data safety and monitoring plan often includes the formation of a data safety and monitoring board, which meets regularly to assess trial safety, progress, and quality.

Certain studies present challenges to the maintenance of blinding because blinding is either unethical or impractical. Surgical versus nonsurgical interventions prevent the patient and the surgeon from being blinded because they know whether surgery was performed. In this situation, a blind assessor may be used to evaluate the patient’s condition. The blind assessor may be a physician, nurse, or other health professional who evaluates the patient’s response to treatment but is unaware of the treatment being given.

Double-dummy design to maintain blinding.

Another situation in which blinding of the patient is difficult is when the dosage administration regimen is different for two drugs being compared. An example in rheumatology is the comparison of methotrexate (administered once weekly) versus hydroxychloroquine (administered once daily). In this case, the double-dummy design can be a useful way to maintain the blind. In the example mentioned, patients who are to receive methotrexate take active methotrexate once per week and dummy hydroxychloroquine each day, whereas patients who are to receive hydroxychloroquine receive active hydroxychloroquine each day and dummy methotrexate once per week. Double-dummy designs are limited by ethical issues involving repeated infusions or other aggressive means of delivering the “dummy” agent.


Randomization

The purpose of randomization is to introduce a deliberate element of chance into patient assignment to the treatment groups. Randomization reduces (but does not eliminate) the chance of an unequal distribution of known or unknown prognostic factors among the treatment groups. It also reduces possible bias in the selection and allocation of subjects. Many randomization schemes are currently employed. The simplest form of randomization is unrestricted randomization. Patients are assigned to one of two or more treatment groups by a sequential list of treatments. The list of treatments is known as the randomization schedule.

Blocked randomization is commonly used to ensure that equal numbers of patients are placed in each treatment group ( Table 6-3 ). Note that in Table 6-3 the assignment to groups is not sequential, but when the block is full, an equal number of patients will have been enrolled into each group. If the blocks are too small, there is a risk of unblinding. If the blocks are too large, they may not be completely filled, increasing the likelihood of unequal assignment to the groups. In more recent pediatric rheumatology studies involving two groups, block sizes of six to eight have been used. Clinical investigators are never made aware of block size during the trial.
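A permuted-block schedule of the kind shown in Table 6-3 can be sketched as follows. The two-arm design, block size of 4, and fixed seed are arbitrary choices for illustration:

```python
import random

def blocked_randomization(n_blocks, block_size=4, arms=("A", "B"), seed=42):
    """Generate a permuted-block randomization schedule: within each block,
    an equal number of assignments to each arm, in random order."""
    assert block_size % len(arms) == 0, "block size must be a multiple of the number of arms"
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible
    per_arm = block_size // len(arms)
    schedule = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm  # e.g., ["A", "B", "A", "B"]
        rng.shuffle(block)            # random order within the block
        schedule.extend(block)
    return schedule

schedule = blocked_randomization(n_blocks=5)
print(schedule[:4])                              # first (shuffled) block of 4
print(schedule.count("A"), schedule.count("B"))  # 10 10
```

Note the trade-off described above: every completed block is exactly balanced, but if an assessor learns the block size, the final assignments in a block become predictable, which is why block size is concealed from investigators (or varied at random).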

