Issues in Study Design and Reporting

Hasan Yazici1, Emmanuel Lesaffre2, 3 and Yusuf Yazici4



(1)
Division of Rheumatology, Department of Medicine, Cerrahpasa Medical Faculty, University of Istanbul, Istanbul, Turkey

(2)
Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands

(3)
L-Biostat, KU Leuven, Leuven, Belgium

(4)
Division of Rheumatology, NYU Hospital for Joint Diseases, New York University, New York, USA

 



We plan to deviate somewhat in this chapter from the usual accounts of ethical issues in medicine. First, we are mainly concerned here with ethical issues as they relate to study design and reporting. Second, we propose that ethical transgressions in medicine can be classified into two main categories, which we like to call pudendal and cerebral. Table 1 gives a brief, but incomplete, list of what we mean.


Table 1
Some Pudendal and Cerebral Forms of Ethical Transgression

Pudendal
- Direct bribery to promote drugs and medical equipment
- Indirect bribery in the form of gifts, lavish meals and meeting facilities
- Straightforward plagiarism or falsification of data
- Disregarding patient autonomy in drug trials, mainly in the form of conducting them among those less privileged by geography and/or social status

Cerebral
- Conducting/participating in unnecessary drug trials for the main purpose of promotion
- Adjusting control groups in the direction of confirming the study hypothesis, i.e. exclusion of effective comparators in drug studies
- Many forms of statistical misconduct in data analysis and presentation, including resorting to pseudoscience in the latter
- Preparing practically incomprehensible informed consent forms which only please the corporate lawyer

The distinguishing characteristic of the pudendal forms is that there is no real debate in the medical community that they represent ethical transgressions, and this awareness is shared with the public. This is much akin to US Supreme Court Justice Potter Stewart's well-known definition of pornography: "But I know it when I see it…" [1]. Those we like to call cerebral, on the other hand, are perhaps more subtle. Not only is the lay public rather unaware of most of them, but we are afraid that a substantial portion of our colleagues in the medical community do not readily recognize them as transgressions and/or fully appreciate their importance. Furthermore, and with some wishful thinking, we reason that, as every seasoned hunter knows, one must always aim a bit higher to hit a target; overcoming a cerebral problem might therefore help us in overcoming the pudendal as well.


1. Ethical Issues in Study Design


To remind ourselves briefly, clinical studies can be either observational or interventional. In turn, either type can be retrospective (usually observational), prospective (most frequently interventional) or cross-sectional (always observational).

The main ethical issue in study design, and this surely extends into the reporting phase, centers on whether the investigators tailor their work solely in the direction of the results they wish to observe and promote. Leaving aside pudendal examples such as painting the experimental mice to show what one wants to see, as in the notorious Summerlin affair [2], there are many ways, at first sight seemingly acceptable, of proving oneself right.


Informed Consent


In any study involving humans, informed consent is the basis of our morality and legitimacy. Good accounts of its history, application and legal dimensions are available [3]. Here we emphasize several specific issues which we believe are of current and important concern in practice and research.

There is a prevailing opinion that informed consent forms have become unduly long [4]. We agree, but are unfortunately unaware of formal data. The consent forms we have to ask our patients to sign are mainly prepared, on behalf of pharmaceutical companies or central grant-giving bodies, by an army of lawyers or bureaucrats whose main concern is, we strongly suspect, to avert any potential lawsuits.

Another issue relates to continuation studies, which are popular especially in drug industry sponsored trials. The usual scenario is that a certain drug turns out to be effective in a, let us say, 12-month study against placebo and/or the conventional treatment. At the end of this time the investigators and/or the sponsors decide to continue with the study, often in an open, or sometimes in a blind, design. The common arguments for this continuation are to assess the beneficial effects over a longer time period; to assess any potential side effects, again over a longer time period; and to continue prescribing effective medication, still again for a longer time period, to patients who participated in the study. The downside, however, is that perhaps no important additional data are collected from these continuation studies, and at least some of them are mainly seeding studies for promotional purposes [5, 6]. Frequently the second informed consent form, the one that deals with the extension arm of the study, is not included in the trial report. The important issue here is whether that document explicitly warns patients that they may be agreeing to continue taking a drug which was just proven to be inferior to the new remedy. This scenario surely does not comply with the dictum of equipoise and patient autonomy. Some years ago we explicitly brought this up in relation to a well-known trial in rheumatology, and the rather unfortunate investigator/sponsor reply was that this should not cause any concern because "Patients and investigators were always free to stop participation in the trial" [7, 8].

The exact wording of the informed consent form is of special importance in safety trials. In this line, and to the best of our knowledge, there are at least two currently ongoing trials assessing the cardiovascular safety of celecoxib, the well-known Cox-2 inhibitor, versus conventional non-steroidal anti-inflammatory drugs (NSAIDs). One is in the USA, the Prospective Randomized Evaluation Of Celecoxib Integrated Safety vs Ibuprofen Or Naproxen (PRECISION), and the other in Europe, the Standard Care Versus Celecoxib Outcome Trial (SCOT). The rationale as well as the methodology of these trials have been published in peer-reviewed journals [9, 10]. Both are non-inferiority trials, with the primary outcome being the occurrence of fatal or non-fatal myocardial infarction, or stroke, during the study period. This design is based on the hypothesis that there will not be meaningfully more of these outcome events among the celecoxib users as compared to the traditional NSAID users. A legitimate question is then how one gets informed consent for such a trial. Or, more simply, how does one tell the patient:

‘You will be taking medication A (the traditional NSAID), which is associated with a small extra risk of heart attack and/or stroke. Here is medication B (celecoxib), which is not only easier on your stomach but does not cause meaningfully more heart attacks or strokes as compared to medication A. The main reason you are being enrolled in this trial is to formally test that medication B does not substantially increase your chance of having more heart attacks or strokes as compared to medication A.’

No osteoarthritis patient has to take celecoxib or an NSAID; such a patient may very well seek and get pain relief with weight loss, physical therapy, or simple analgesics like paracetamol. How is it then morally possible to expose a patient to a drug whose use is associated with a, however small, potentially increased risk of cardiovascular events or stroke? Moreover, how can this potential increase be worded in the informed consent document that thousands of patients have to sign before they take part in these studies? This exact issue was brought up some years ago, with no satisfactory answer from the primary investigator [11].
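To make the statistical logic of such a non-inferiority design concrete, here is a minimal Python sketch. All numbers are invented for illustration: the event counts and the 2-percentage-point margin are our assumptions, not figures from PRECISION or SCOT.

```python
from scipy.stats import norm

def noninferior_risk_difference(events_new, n_new, events_std, n_std,
                                margin=0.02, alpha=0.05):
    """Wald-type non-inferiority assessment on the risk difference.

    The new drug is declared non-inferior if the upper bound of the
    one-sided (1 - alpha) confidence interval for
    (event rate on new drug) - (event rate on standard drug)
    stays below the pre-specified margin.
    """
    p_new, p_std = events_new / n_new, events_std / n_std
    diff = p_new - p_std
    se = (p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std) ** 0.5
    upper = diff + norm.ppf(1 - alpha) * se
    return diff, upper, upper < margin

# Hypothetical counts: 45/4000 cardiovascular events on the new drug
# versus 40/4000 on the comparator, with an assumed margin of 0.02.
diff, upper, ok = noninferior_risk_difference(45, 4000, 40, 4000)
print(f"risk difference {diff:.4f}, upper bound {upper:.4f}, non-inferior: {ok}")
```

Note that the verdict hinges entirely on the pre-specified margin, which is precisely the quantity a patient would need to understand from the consent form.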

More transparency is a pressing need in informed consent forms, as we have previously proposed [12]. The exact wording of these forms is currently available only to the sponsors, the federal agencies, the investigators, the study participants and the institutional review boards. When the related article finally appears, the readers, the journals and, very importantly, the reviewers have no way of judging whether there has been any breach of patient autonomy. Our proposal is that a copy of all informed consent forms (and subsequent alterations) should be in the public domain. Considerations of trade secrets are not realistic, since these forms are seen by the patients anyway; nor can concerns about journal page allocation be of practical consequence in this digital age.

Our suggestion for transparency would discourage the industry and the investigators from embarking on promotional trials [13]: were the potential trial patients actually told that this agent had already been shown to be an effective medicine to start with? Finally, our proposal would possibly also help to prevent the well-known pudendal forms of ethical transgression in drug research in underdeveloped regions of the globe.


Control Groups


A common way of proving oneself right is not giving due importance to the specificity of the observations made. That is, in many studies, often observational ones, the researchers focus on underpinning their prior beliefs, neglecting the possibility that their initial reflections may be wrong. The litmus test for the specificity of what is observed is, of course, the control group.

The main reason for the inclusion of a control group, or groups, in any study is to assess specificity. If you are studying disease A and you hypothesize that an asset X is characteristically present among patients with disease A, it is fairly obvious that you also have to look for the presence of X among healthy people and among patients with diseases other than A. The smaller the frequency of the asset X in these groups (your control groups), the more specific X becomes for disease A. The other (but often neglected) reason for having a control group is to assess the validity of a new measuring device. Suppose a laboratory method has been used to assess hypercoagulability in disease A, and hypercoagulability was found to be increased. Then, when our aim is to assess hypercoagulability, this time in disease B, we might again include a group of patients with disease A. The behavior of the diagnostic test in this control group will then greatly help us to interpret our findings when we study disease B.

Above we pleaded for including a control group in almost every study, but the question is how the control group should be chosen. Too often the control group consists of healthy people only. This solo healthy control group design is quite inadequate, in that it does not tell us anything about the specificity of our observations for the disease we are studying. At the end of the study we might indeed observe differences between the disease being studied and the healthy controls in any parameter we have studied. However, this does not mean the differences we observe are specific to that disease. In order to say that, we have to study other diseases (the diseased controls) and not observe these same changes. This specificity is crucial not only for diagnostic tests but, almost equally importantly, for our understanding of disease mechanisms. In a survey of publications about Behçet's syndrome, among 282 full articles reporting original research in 15 high impact factor general medicine and subspecialty journals, we saw that 9.3 % had not included a control group, while 58 % had included only healthy controls. In addition, this survey revealed a strikingly low frequency, 6/37 (12.8 %), of diseased controls in genetic association studies [14]. It is noteworthy that the authoritative STREGA (Strengthening the Reporting of Genetic Association Studies) position paper, an extension of STROBE (Strengthening the Reporting of Observational Studies in Epidemiology), does not list the inclusion of diseased controls as an important issue [15].

Let us now briefly consider how what we discuss relates to the important issue of the discovery of the association of pyrin (MEFV) mutations with Familial Mediterranean Fever (FMF). This association was first described by two independent research consortia in 1997, using positional cloning [16, 17]. It is to be noted that around one half of all patients with FMF do not have a family history of the same disease, so from the very start we did not know how the described association applied to non-familial FMF patients. Furthermore, as is true for the vast majority of association studies, there were no diseased controls. Another issue was that these pyrin mutations could be present in the heterozygote form in a condition which, in the majority of cases, had a recessive pattern of inheritance. Until recently these caveats were basically ignored, and most articles about FMF began with a sentence to the effect that FMF was a disease caused by pyrin mutations. These assertive sentences did not even acknowledge that there might be additional cause(s) of FMF, obviously including mutations in other genes working in tandem.

Since 1997 we have, however, understood that these pyrin mutations can also be increased in many other diseases, ranging from Behçet's syndrome to endometriosis [18, 45], and that they are associated with disease severity in the former.

There is little doubt that pyrin is a very important molecule in the inflammatory cascade. However, we also know that the presence of pyrin mutations is not specific for FMF, let alone causative. To complicate the issue further, we now acknowledge that the genetics of FMF is much more involved than we once thought [20]. Some years ago one of us (HY) proposed that the lack of diseased controls in the original MEFV work had hindered the advancement of FMF research for over a decade [21]: "Had the initial FMF-pyrin work included diseased controls, for example patients with other auto or otherwise inflammatory conditions, the students of FMF would have been where they are now quite a number of years ago." One reviewer commented: "We would respond that had the two FMF consortia used a Behçet's or Crohn's disease control group, they might still be searching for the gene now." This praise of non-specificity by the reviewer was most curious.

The mere presence of diseased controls in a study design does not guarantee that the specificity issue is adequately addressed. The ultimate guideline in selecting control groups is, as much as possible, to assess the specificity of our observations for the clinical condition we are studying. So if our aim is to establish the specificity of a laboratory biomarker in one disease, we should also search for it a) among healthy people; b) among patients with diseases whose clinical manifestations are similar to those of the disease we are studying; and c) in other diseases which are not clinically similar but in which we strongly suspect our new laboratory biomarker will turn out to be positive. We should also not forget that our search for specificity does not end even after we include these three types of control groups. Very often we still have to check both the sensitivity and the specificity of the new biomarker among the young and the old, among the severe and the less severe, and among patients from different ethnic or social backgrounds.
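As a toy illustration of points a) to c) above, the following Python sketch (all counts invented) computes the sensitivity of a hypothetical biomarker in the index disease and its specificity against each type of control group separately:

```python
def sensitivity(tp, fn):
    """Fraction of patients with the index disease who test positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of a control group who test negative."""
    return tn / (tn + fp)

# Invented counts of marker-positive / marker-negative subjects per group.
index_disease = (80, 20)                      # 80 test positive, 20 negative
controls = {
    "healthy controls":           (5, 95),    # (positive, negative)
    "clinically similar disease": (40, 60),
    "suspected-positive disease": (55, 45),
}

print(f"sensitivity in index disease: {sensitivity(*index_disease):.2f}")
for name, (fp, tn) in controls.items():
    print(f"specificity vs {name}: {specificity(tn, fp):.2f}")
```

Against healthy controls alone the marker looks excellent (specificity 0.95 here), yet against the two diseased control groups its specificity collapses to 0.60 and 0.45; a solo healthy control design would never reveal this.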

When we design a study to search for the specificity of a serological marker of inflammation (A) for patients with rheumatoid arthritis (RA), it gives us limited information to use a group of patients with osteoarthritis as the diseased control group. A much more informative group would be patients with another disease which runs with systemic inflammation, like systemic lupus erythematosus (SLE) or ankylosing spondylitis. In this example, selecting osteoarthritis as a control group for rheumatoid arthritis is called undermatching of the control group [22]. Undermatching exists where it is, or should be, obvious to the investigator that the specificity of the asset being studied will turn out to be excellent, merely by virtue of the fact that the clinical and laboratory features of the selected diseased control group are inherently quite different from those of the disease under study. Since osteoarthritis is basically a non-inflammatory disease, it would be most unlikely for an inflammatory marker to be present in osteoarthritis. Therefore, your undermatching of the control group would cause you to conclude erroneously that what you had observed in RA was quite specific for RA. Overmatching of the control group, on the other hand, exists where it will, by definition, be difficult for the investigator to determine the specificity of a parameter for a disease A, by virtue of the fact that the selected control group inherently has many features in common with disease A. Let us assume that, based on several individual case reports, we suspect that patients with Behçet's syndrome (BS) in general have enlarged prostate glands. Reasoning that almost every patient who attends a urology clinic has his prostate examined, we consider that this would be a diseased control group very easy to study. Hence, we take the whole urology outpatient population for one full week as our diseased control group and examine every patient for his prostate size. This is surely overmatching the control group, in that most patients who go to a urology clinic have enlarged prostates to start with.

Similarly, in randomized controlled trials of new remedies, the way we choose our control group may considerably influence the results of the study. In particular, undermatching the traditional medication for efficacy is an important problem. This happens not uncommonly when you compare the efficacy of a new drug using as the comparator a group of patients who you already know had not responded well to the traditional remedy. For example, in the first randomized controlled trial of cyclosporine in BS, from Japan [23], patients with resistant eye disease were randomized to receive cyclosporine or colchicine. It is well known that in Japan all patients with BS receive colchicine from the start. So in this particular example the authors should have known that the eye disease in the group randomized to colchicine would not respond to this drug in any event, since these patients had already used it, with no effect on their eye disease, before the trial started. To randomize a portion of the patients to the same drug on which they had progressed to develop resistant disease is an obvious example of undermatching.

An interesting example of undermatching can involve not only the control but also the active arms of a randomized controlled trial. In this case we can say that all the groups were undermatched to the hypothesis being tested. In a recent double-blind, randomized international trial of a new Syk inhibitor in rheumatoid arthritis, the authors reported that the new medication was superior to placebo among patients who were non-responsive to the traditional agent, methotrexate [24]. The design was a three-arm study. All patients were using methotrexate at baseline, at a dose ranging from 7.5 to 25 mg/week. One group received the new drug at a higher dose, the second group at a lower dose, while the third group received placebo. At the end of the study the patients in both the low and the high dose groups did better than those who received placebo. Curiously, however, this good response was observed only among the patients from Eastern Europe and Latin America, and not among those from the United States. We suggested that this might have been because patients from Eastern Europe and Latin America had been on a lower dose of methotrexate before being labeled "inadequate responders" and enrolled in the trial, as compared to US patients. In other words, this proposed effect of the new agent might have disappeared had the patients received the established effective dose of the traditional remedy before enrolling in the trial.

Undermatching the control group may serve to prove a desirable outcome, as can be seen in the Actemra versus Methotrexate double-Blind Investigative Trial In monotherapy (AMBITION) study [25]. The aim of this trial was to compare the efficacy of monotherapy with tocilizumab or methotrexate (MTX) in methotrexate-naïve patients with rheumatoid arthritis. The primary outcomes were ACR responses. At the end of the trial, tocilizumab monotherapy was superior to monotherapy with MTX. This was a very interesting result, not found up to that time with any other biologic in the same setting. In fact, biologic monotherapy had generally been similar in efficacy to MTX monotherapy, while the combination of a biologic plus MTX was superior to either agent used alone when studied among MTX-naïve patients with RA. Thus this trial in effect implied that tocilizumab was rather different in efficacy from other biologics. However, on a closer look one realized that about 35 % of the patients had been on MTX prior to being enrolled in the trial, and their MTX treatment had been discontinued prior to enrollment for reasons other than "inefficacy or adverse events". But why would anyone stop MTX if it was efficacious and without adverse events? Also, it is rather bizarre to enroll patients on MTX, a drug they had taken before, compare them to a group of patients starting a brand new drug, and conclude that the new drug is better than MTX. The efficacy of monotherapy in the AMBITION study could not be confirmed in the FUNCTION study, which looked at the same ACR responses and was presented as an abstract in 2013 [26]. In this properly done MTX-naïve trial, the ACR scores with MTX and tocilizumab monotherapy were not different, reinforcing the paradigm that monotherapy with a biologic has efficacy similar to MTX, and that the combination is better, when tested among MTX-naïve patients.


Power Calculations


This is another important issue in study design. We need power calculations basically to control the probability of a Type II error (denoted as β), which is simply missing a real difference between the two arms of a study due to an inadequate sample size. To avoid this, calculations are made at the design stage to select the minimum number of study subjects necessary to limit this error to, say, 0.20. The information needed for a sample size calculation is [27]:

1)
the anticipated clinically important difference in the primary outcome between the experimental treatment and the control treatment. For continuous outcomes this is sometimes expressed as the effect size, which is the difference in treatment means of the primary outcome divided by the standard deviation of that outcome.

2)
the selected probability of a Type I error (denoted as α).

3)
the aimed power, which is the probability of detecting the aimed clinical difference and is equal to 1 − β.
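These three ingredients map directly onto the standard normal-approximation formula for the number of subjects per arm when comparing two means. The short Python sketch below is our own illustration of that textbook formula, not a procedure taken from reference [27]:

```python
import math
from scipy.stats import norm

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-sided,
    two-sample comparison of means:
        n = 2 * (z_{1 - alpha/2} + z_{1 - beta})^2 / effect_size^2
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # quantile corresponding to the aimed power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A moderate anticipated effect size of 0.5 SD with the conventional
# alpha = 0.05 and power = 0.80 requires roughly 63 subjects per arm.
print(n_per_arm(0.5))   # 63
print(n_per_arm(0.25))  # 252: halving the effect size quadruples the sample
```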

 

An underpowered study raises ethical concerns, especially for interventional studies. It is simply not justified to expose any subject (patients, but also animals) to an intervention if the likelihood of demonstrating a beneficial effect is slight. This being the case, it is strange that sample size determinations are quite uncommon in investigative rheumatology. In the methodological audit of Behçet's syndrome publications [14], only 3.0 % of 280 original articles reported any power calculations. Included among these original articles were 6 drug trials, the study type which usually has the highest frequency of power calculations; still, only 1 of these 6 trials had a power calculation.


Equipoise


The concept of equipoise is an important principle in study design. In the context of a drug study, equipoise simply means that prior to the study the investigators assume an equal chance of efficacy for the active and the control arms. Equipoise, or mainly its lack in many randomized clinical trials, was the subject of an important debate a decade ago [28, 29].

Fries and Krishnan [28] argued that the main reason that practically only randomized clinical trials with positive results are published is not publication bias, as commonly assumed, but design bias. By the time a new drug reaches Phase III of development, equipoise may no longer hold in the randomized controlled trial: from the evidence accumulated in Phases I and II, the investigators are quite often fairly certain that the drug under study should work. Felson and Glantz, on the other hand, disagreed [29] and maintained that publication bias, rather than the lack of equipoise, was the main reason that only trials with positive results got published.
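The design-bias argument lends itself to a small simulation. In the hypothetical Python sketch below, adequately powered trials of a drug with a genuine 0.5 SD effect (equipoise already lost by Phase III) come out positive about 80 % of the time, with no publication bias involved at all:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def fraction_positive(true_effect, n_per_arm=64, n_trials=2000, alpha=0.05):
    """Simulate two-arm trials and count how often the two-sided t-test
    comes out significant in favour of the active arm."""
    positives = 0
    for _ in range(n_trials):
        control = rng.normal(0.0, 1.0, n_per_arm)
        active = rng.normal(true_effect, 1.0, n_per_arm)
        t, p = ttest_ind(active, control)
        if p < alpha and t > 0:
            positives += 1
    return positives / n_trials

print(fraction_positive(0.0))  # genuine equipoise, no real effect: ~0.025
print(fraction_positive(0.5))  # equipoise already lost: ~0.80 positive trials
```

Under this reading, a literature dominated by positive Phase III trials needs no publication bias at all, which is exactly Fries and Krishnan's point.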

We have the impression that the concept of equipoise, which usually is not adhered to in drug industry sponsored trials, is respected more in publicly funded trials, or in trials where the comparator is not a placebo but the best available active drug.

As expected, the debate about equipoise in controlled clinical trials continues, and there is special concern that equipoise can be jeopardized in trials with an adaptive design [30], in which the a priori assumption about the possible efficacy of the drug being tested changes during the course of the trial.


Popper’s Falsification


The main discussion about cerebral ethics as it relates to study design centers on whether investigators adhere to the general principle of falsification as the main business of any scientific investigation. According to Popper [31], a hypothesis can only be accepted (temporarily) when a rigorous attempt to falsify it turns out to be unsuccessful. In brief, Popper's philosophy tells investigators that the hypothesis they have formulated (through experience, knowledge, hard work and, on occasion, genius) can only be accepted once it has undergone an agonizing, deductive and, above all, honest self-falsification. From this perspective, any breach of self-falsification may be considered an ethical transgression. To quote Feynman: "For example, if you're doing an experiment, you should report everything that you think might make it invalid–not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked–to make sure the other fellow can tell they have been eliminated." [32]

There surely have been severe criticisms of Popper's hypothesis-driven, self-falsification-centered approach to the scientific method [33, 34]. Namely, it has been argued that science does not progress through deductive self-falsification; on the contrary, it is inductive reasoning based on accumulated knowledge, and the design of experiments to prove our hypotheses, that keeps scientific progress going. The Bayesian approach to knowledge accumulation is surely in this line (see chapter "A review of statistical approaches for the analysis of data in rheumatology"). Finally, the critics of Popper explicitly bring up the point that no investigator actually wants to disprove himself.
