52 Power and Sample Size Calculation

10.1055/b-0035-122052


Nan van Geloven, Rob de Haan, Marcel Dijkgraaf, Michael Tanck, Johannes B. Reitsma

52.1 Why Perform a Sample Size Calculation Prior to a Study?

Why should the sample size be determined before a study starts? The main argument for a sample size calculation is ethical. If the number of subjects in a study is too small to detect an effect that exists in the population, the subjects are studied in vain, and the study can easily lead to a false-negative conclusion. On the other hand, including too many subjects is also undesirable. If the intervention turns out to be effective, too many subjects (those in the control group) have missed out on it; if it is not effective, too many have been exposed to an ineffective intervention. For these reasons, every trial should consider how many subjects are appropriate to answer the study question. A sample size calculation before the study helps determine the number of subjects that is both necessary and sufficient. Moreover, it forces one to focus on a clinically relevant effect, rather than following the erroneous strategy of including subjects until a clinically irrelevant effect reaches statistical significance.

For clinical studies, several regulatory authorities require a sample size calculation before the inclusion of subjects starts. The CONSORT statement (a guideline for reporting clinical trials) states that researchers should determine the study size beforehand and report how it was determined in the methods section of the resulting paper. For experimental research the guidelines are less strict, although animal ethics committees will typically require a detailed sample size calculation before approving a research protocol. Finally, a sample size calculation benefits the logistical planning of a study and helps you, as a researcher, to conduct your studies in a more controlled manner.

52.2 What Are Power and Statistical Significance?

The term “power” pops up everywhere in medical research, certainly in sample size calculations. Often, the term power is interpreted as a synonym for the number of patients tested in a study. “Our study did not have enough power to control for possible confounders” is understood as “you didn't test enough patients to account for several effects.” “Our study had 80% power to detect an odds ratio of 1.1 at a significance level of 5%” is understood as “you have tested enough patients to pick up a possible effect.”

Although these interpretations are not entirely wrong, using the concept of power in a sample size calculation requires understanding its exact meaning. Formally, the power of a study testing the null hypothesis H₀ against the alternative hypothesis H₁ is the probability that the test, based on a sample from the population, rejects H₀, given that H₀ is false in the whole population.

So the power is the probability of correctly rejecting a null hypothesis (rejecting H₀ when it should be rejected). Because in most tests H₀ states that there is no difference between groups or no effect of an intervention (e.g., H₀ = “no difference in survival between treated and control groups”), rejecting H₀ means you have reason to believe there is a difference. In other words, power reflects the ability of a test based on a sample to detect an effect that is present in the population (a true positive).

The power of a study is closely related to the so-called type II error (β), the probability of falsely accepting H₀ (i.e., failing to reject H₀ when it is false). The power of a study equals 1 – β, the probability of rightfully rejecting H₀ (Table 52.1).
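To make the relation between power and β concrete, here is a minimal Python sketch (standard library only) that computes the power of a two-sided, two-sample z-test comparing two group means. The function name `power_two_sample_z` and the scenario (normally distributed outcome, known common standard deviation σ = 10, equal group sizes of 64, a true difference Δ of 5 units) are illustrative assumptions, not part of the original text; β is then simply 1 minus the computed power.

```python
from statistics import NormalDist


def power_two_sample_z(n_per_group: int, delta: float, sigma: float,
                       alpha: float = 0.05) -> float:
    """Power of a two-sided, two-sample z-test comparing two means.

    Illustrative assumptions: normally distributed outcomes, a known common
    standard deviation sigma, and equal group sizes.
    """
    z = NormalDist()                               # standard normal distribution
    se = sigma * (2 / n_per_group) ** 0.5          # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)              # two-sided critical value
    shift = abs(delta) / se                        # mean of the test statistic under H1
    # Probability that |Z| exceeds z_crit when Z ~ N(shift, 1): rejecting H0
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


power = power_two_sample_z(n_per_group=64, delta=5, sigma=10)
beta = 1 - power                                   # type II error probability
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

In this scenario the sketch gives a power of roughly 0.81, so β is roughly 0.19. Setting `delta=0` (H₀ true) makes the function return α itself, because the test then rejects H₀ only by chance.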

**Table 52.1** **Possible Conclusions and Errors of a Study in Relation to the Truth**
| Study conclusion | Whole population: effect exists (H₁ is true) | Whole population: no effect exists (H₀ is true) |
|---|---|---|
| Effect observed (H₁ appears true) | True positive: power (1 – β) | False positive: type I error (α) |
| No effect observed (H₀ appears true) | False negative: type II error (β) | True negative (1 – α) |

The significance level α also appears in Table 52.1. Alpha is the probability of falsely rejecting H₀, that is, of falsely detecting an effect (a false positive). Note that α only concerns situations in which no true effect exists in the population, whereas β only concerns situations in which a true effect exists.
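The four cells of Table 52.1 can also be illustrated by simulation. The sketch below works under illustrative assumptions (normal outcomes with known σ = 10, 64 subjects per group, a two-sided z-test at α = 0.05; the helper name `rejection_rate` is hypothetical) and repeats a two-group “study” many times. When the simulated population has no effect, the proportion of studies that reject H₀ estimates the type I error α; when a true difference of 5 units exists, it estimates the power 1 – β.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(delta, sigma=10.0, n=64, alpha=0.05, n_trials=5000, seed=1):
    """Fraction of simulated two-group studies in which a two-sided z-test rejects H0.

    delta is the true difference in population means (0 means H0 is true).
    Outcomes are drawn from normal distributions with known common SD sigma.
    """
    rng = random.Random(seed)                      # reproducible random numbers
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    se = sigma * (2 / n) ** 0.5                    # SE of the difference in means
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(delta, sigma) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:                        # study concludes "effect observed"
            rejections += 1
    return rejections / n_trials


type_i_rate = rejection_rate(delta=0.0)  # H0 true: false positives, estimates alpha
power_est = rejection_rate(delta=5.0)    # H1 true: true positives, estimates 1 - beta
print(f"estimated alpha = {type_i_rate:.3f} (true negatives: {1 - type_i_rate:.3f})")
print(f"estimated power = {power_est:.3f} (type II error: {1 - power_est:.3f})")
```

The estimates fluctuate around the nominal α and the theoretical power; increasing `n_trials` reduces this simulation noise.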

In a sample size calculation, one determines the number of patients needed to test the hypothesis with sufficiently high power at a sufficiently small significance level. In this way, one protects oneself against both false-negative and false-positive conclusions.
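For the common case of comparing two means, this balance can be written down explicitly. Assuming a normally distributed outcome with the same standard deviation σ in both groups, equal group sizes, and a two-sided test, a standard normal-approximation formula for the number of subjects per group needed to detect a difference Δ is given below, where z_q denotes the q-quantile of the standard normal distribution. This is offered as an illustration only; other outcomes, tests, and designs require other formulas.

```latex
n_{\text{per group}} = \frac{2\,\sigma^{2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\Delta^{2}}

% Worked example (illustrative values): alpha = 0.05 two-sided, power 1 - beta = 0.80,
% Delta = 5, sigma = 10
n_{\text{per group}} = \frac{2 \cdot 10^{2} \cdot (1.960 + 0.842)^{2}}{5^{2}}
  = \frac{200 \cdot 7.85}{25} \approx 62.8 \;\Rightarrow\; 63
```

For a one-sided test, z_{1−α/2} is replaced by z_{1−α}. Demanding more power or a smaller α increases the z-values and therefore n, whereas a larger clinically relevant difference Δ or a smaller σ decreases n.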

52.3 What Information is Needed to Calculate a Sample Size?

To calculate a sample size, you need information on each of the following:

- Desired power of the study (1 – β). How much power do you want the study to have? Or, stated differently, how certain do you want to be of avoiding a type II error?
- Desired significance level (α). How certain do you want to be of avoiding a type I error?
- Desired test direction. Will the test be one-sided or two-sided?
- Clinically relevant (or expected) difference. What difference or effect are you trying to detect?
- Expected variance/standard deviation. How much variation is expected among subjects in the same study group?
- Test to be used in the statistical analysis. How will the hypothesis be tested in the analysis phase of the study?
- Attrition rate. How many included subjects are expected to be unavailable for the analysis (e.g., because of dropout or loss to follow-up)?
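The inputs listed above map directly onto the arguments of a sample size function. Below is a minimal Python sketch under stated assumptions: the analysis is a two-sample comparison of means with equal group sizes and a common standard deviation, the normal approximation is used instead of the exact t-distribution, and attrition is handled by dividing the number of analysable subjects by (1 – attrition rate). The function name `sample_size_two_means` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sd, alpha=0.05, power=0.80,
                          two_sided=True, attrition=0.0):
    """Per-group sample size for comparing two means (normal approximation).

    delta:     clinically relevant difference between the group means
    sd:        expected common standard deviation within each group
    alpha:     significance level (type I error)
    power:     desired power, i.e. 1 - beta
    two_sided: test direction
    attrition: expected fraction of included subjects lost before analysis
    Returns (subjects to analyse per group, subjects to include per group).
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    if not 0.0 <= attrition < 1.0:
        raise ValueError("attrition must lie in [0, 1)")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n_analysed = math.ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)
    n_included = math.ceil(n_analysed / (1 - attrition))  # inflate for dropout
    return n_analysed, n_included


print(sample_size_two_means(delta=5, sd=10))                   # -> (63, 63)
print(sample_size_two_means(delta=5, sd=10, attrition=0.15))   # -> (63, 75)
print(sample_size_two_means(delta=5, sd=10, two_sided=False))  # -> (50, 50)
```

Because the normal approximation ignores the extra uncertainty of estimating σ from the data, calculations based on the t-distribution typically give slightly larger numbers (about 64 per group for the first example).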