Review article
On the Proper Use of the Crossover Design in Clinical Trials
Part 18 of a Series on Evaluation of Scientific Publications
Background: Many clinical trials have a crossover design. Certain considerations that are relevant to the crossover design, but play no role in standard parallel-group trials, must receive adequate attention in trial planning and data analysis for the results to be of scientific value.
Methods: The authors present the basic statistical methods required for the analysis of crossover trials, referring to standard statistical texts.
Results: In the simplest and most common scenario, a crossover trial involves two treatments that are administered consecutively to each patient recruited into the study. The main purpose of the design is to provide a basis for separating treatment effects from period effects. This is achieved by computing the treatment effects separately in two sequence groups formed by randomization. The difference between the treatment effects can be assessed by means of a standard t-test for independent samples, using the intraindividual differences between the outcomes in the two periods as the raw data. The existence of carryover effects must be ruled out for this method to be valid. This assumption is usually checked using a pretest, which is also described in this article. Finally, we briefly discuss the use of nonparametric tests instead of t-tests and more complicated designs with more than two test periods and/or treatments.
Conclusion: Crossover trials in which the results are not analyzed separately by sequence group are of limited, if any, scientific value. It is also essential to guard against carryover effects. Whenever ignoring such effects proves unjustified, the treatment effect must be assessed solely from the data obtained during the first trial period. Even the use of this restricted dataset yields results whose validity is not beyond question.
The crossover design has a long history in the planning of scientific trials ([1], sect. 1.4) and forms the basis of a large number of clinical studies year after year. Trials in almost all clinical disciplines use the crossover design, but it accounts for a particularly high proportion of studies in the “CNS specialties” (neurology and psychiatry) and of trials on pain treatment. One example of the latter is the frequently cited study of the analgesic effect of synthetic cannabinoids (2). This was a classic crossover trial involving a total of 21 patients with chronic neuropathic pain. In two consecutive treatment periods, each one week long, every patient received four or eight externally indistinguishable capsules daily. These capsules contained either placebo or dimethylheptyl-THC-11-carbonic acid (CT-3). The primary endpoint was the change in pain intensity at the end of each treatment period, measured using a visual analog scale (VAS).
The essential feature distinguishing a crossover trial from a conventional parallel-group trial is that each subject or patient serves as his/her own control. The crossover design thus avoids problems of comparability of study and control groups with regard to confounding variables (e.g., age and sex). Moreover, the crossover design is advantageous regarding the power of the statistical test carried out to confirm the existence of a treatment effect: Crossover trials require smaller sample sizes than parallel-group trials to meet the same criteria in terms of type I and type II error risks.
To exploit these advantages to the full, a few specific pitfalls must be avoided in the planning and analysis of crossover trials. The two trial periods in which the patient receives the different treatments whose effects are being compared must be separated by a washout phase that is sufficiently long to rule out any carryover effect. In other words, the effect of the first treatment must have disappeared completely before the beginning of the second period. In addition, researchers analyzing the data of crossover trials often proceed as though they were performing a simple pre/post comparison. Unfortunately this error can be observed time and time again, even in renowned journals (3–8). Crossover trials in which the paired t-test (or any other procedure for paired samples) was used for analysis are methodologically flawed and do not contribute to evidence-based evaluation of the treatments concerned.
Correct procedure for statistical analysis
The formal structure of a crossover trial for comparison of two treatments A and B is shown in Figure 1 (where A is placebo and B is CT-3). The two phases that each patient has to complete in the course of the trial are usually referred to as the two study periods ([10], p. 79). The efficacy of A and B is assessed on the basis of the within-subject difference between the two treatments with regard to the outcome variable. The crucial difference between a crossover trial and a simple study yielding paired observations is as follows: In planning a crossover trial, it must be taken into account that patients who receive treatment A in period 1 and treatment B in period 2 (or vice versa) may show systematic differences in outcome even when A and B have identical effects (e.g., when the same drug is given each time), because of time effects. As a consequence, researchers planning and analyzing a crossover trial have to take special precautions to avoid any confounding (11, 12) of treatment effects and period effects. A simple example of a period effect is familiarization with the study situation.
Main steps of confirmatory data analysis (Boxes 1 and 2)
Patients are assigned randomly to the two sequence groups A–B and B–A, comparison of which forms the basis for confirmatory analysis.
The crucial variable for analysis is the within-subject difference in outcome between the two study periods. In order to assess the difference between treatment effects, a statistically valid test for independent samples has to be carried out with the values obtained for this variable.
The assumption that the washout phase was long enough to rule out a carryover effect should be checked in a preliminary test. To this end, the sum of the values measured in the two periods is calculated for each subject and compared across the two sequence groups by means of another test for independent samples. If this test yields a statistically significant result, the usual test for differences between the effects of the two treatments should not be applied.
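The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure given in the boxes: it assumes that the "test for independent samples" is the pooled-variance two-sample t-test, it uses only the standard library, and the function names and the outcome values are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    m, n = len(x), len(y)
    df = m + n - 2
    pooled_var = ((m - 1) * variance(x) + (n - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / m + 1 / n))
    return t, df


def crossover_statistics(group_ab, group_ba):
    """Each argument is a list of (period 1 outcome, period 2 outcome) pairs."""
    # Within-subject differences (period 1 minus period 2): treatment-effect test.
    diff_ab = [p1 - p2 for p1, p2 in group_ab]
    diff_ba = [p1 - p2 for p1, p2 in group_ba]
    # Within-subject sums over both periods: carryover pretest.
    sum_ab = [p1 + p2 for p1, p2 in group_ab]
    sum_ba = [p1 + p2 for p1, p2 in group_ba]
    return {
        "treatment": pooled_t(diff_ab, diff_ba),
        "carryover": pooled_t(sum_ab, sum_ba),
    }


# Made-up illustrative outcomes, not data from any trial.
group_ab = [(5.0, 3.0), (6.0, 5.0), (7.0, 4.0), (4.0, 3.0)]
group_ba = [(3.0, 5.0), (4.0, 6.0), (5.0, 6.0), (2.0, 5.0)]
result = crossover_statistics(group_ab, group_ba)
```

Each statistic is referred to the t distribution with m + n − 2 degrees of freedom. With these made-up numbers the carryover statistic is about 0.18 and the treatment statistic about 5.96, both on 6 degrees of freedom, where the two-sided 5% critical value is 2.447. The carryover statistic is inspected first, since a significant pretest means the treatment test should not be applied.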
Calculation of power and sample sizes, efficiency
As in any clinical study (17), the planning of a crossover trial should include a well-grounded calculation of sample sizes, based on precise specification of the power of the test used to establish the primary hypothesis. In the case of the crossover design, this is the test for differences between the treatment effects. Planning of the trial will generally be done under the assumption that the washout phase is long enough to rule out carryover effects.
In principle, the procedure needed for calculation of power and sample sizes for a crossover trial is the same as that which is familiar from the t-test for unpaired samples (18). The sole difference lies in the specification of the assumptions under which a predefined power (e.g., 80%) should be attained (Box 3a).
One important question is whether the crossover design is superior or inferior in efficiency as compared with a standard two-arm study yielding data from one single study period. Efficiency here refers to the sample sizes required by the two designs to achieve the same power under otherwise identical conditions.
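As a rough illustration of such a calculation, the sketch below applies the familiar normal approximation for the two-sided unpaired test to the within-subject period differences. It is not a reproduction of Box 3a. The names `delta_diff` and `sd_diff` are hypothetical: `delta_diff` is the assumed difference between the two sequence groups in the mean period difference (under the usual model without carryover, twice the difference between the treatment effects), and `sd_diff` is the assumed standard deviation of the period differences within each sequence group.

```python
from math import ceil
from statistics import NormalDist


def patients_per_sequence_group(delta_diff, sd_diff, alpha=0.05, power=0.80):
    """Approximate size of each sequence group (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # quantile for the two-sided significance level
    z_power = z.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) * sd_diff / delta_diff) ** 2)


# Hypothetical planning values: group difference equal to one standard deviation.
n_per_group = patients_per_sequence_group(delta_diff=1.0, sd_diff=1.0)
```

Because the normal approximation ignores the extra uncertainty of the t distribution, it tends to give slightly smaller numbers than an exact t-based calculation, so the result is best read as a lower bound for planning.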
Under the usual statistical model assumptions for the parametric analysis of crossover trials (19), this question can be answered by means of the approximate equation shown in Box 3b. The formula implies that the crossover design is always the more efficient. Since the variance due to measurement error is generally smaller than that which can be ascribed to between-subject variability, the difference is very often substantial. In a situation where the between-subject variance is twice as large as that due to measurement error, for instance, six times as many patients are required to achieve the same power in a parallel-group study as in a crossover trial. From the cost-efficiency viewpoint, however, it must be taken into account that the crossover design involves twice as many measurements per patient. Moreover, the time required for a crossover trial is increased because every patient has to complete two study periods separated by a washout phase.
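The factor of six can be checked with a short calculation. Box 3b is not reproduced in this text, so the expression below is an assumption: it is the ratio that follows from the usual additive model with independent between-subject and measurement-error components, and it agrees with the worked figure given above. The function and parameter names are hypothetical.

```python
def sample_size_ratio(var_between, var_error):
    """Approximate ratio of patients needed: parallel-group design / crossover design.

    Assumes each single measurement has variance var_between + var_error, while the
    within-subject period difference is affected by measurement error only.
    """
    return 2 * (var_between + var_error) / var_error


# Between-subject variance twice as large as the measurement-error variance.
ratio = sample_size_ratio(var_between=2.0, var_error=1.0)
```

Under this expression the ratio never falls below 2, which is reached when there is no between-subject variability at all, consistent with the statement that the crossover design is always the more efficient in terms of patient numbers.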
Modifications and generalizations
The confirmatory procedures described above are based on unpaired t-statistics and assume (approximate) normality of the distributions to be analyzed. Not infrequently, however, only a weaker model assumption seems realistic, according to which the variables under analysis have distributions of some unspecified form that is common to both sequence groups. The medians of these distributions are assumed to decompose into a sum of terms representing the respective effects of treatment and period, as well as possible carryover effects. A strategy for confirmatory analysis that remains valid under these weaker conditions consists in replacing two-sample t-tests with Wilcoxon rank sum tests (20) throughout. Thus, the Wilcoxon test is used as a pretest to ascertain the negligibility of the carryover effects, with the subject-wise sums C_{1}(X), ..., C_{m}(X), C_{1}(Y), ..., C_{n}(Y) as data (as described, for example, in [13]), and similarly to test for differences between the treatment effects.
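A rank sum test of this kind can be sketched as follows. The sketch uses the large-sample normal approximation without a tie correction of the variance, which is a simplification chosen here for brevity and not something the text prescribes. For small groups, exact p-values from a statistics package would be preferable. The function names and the data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Ranks 1..N in the pooled sample; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Returns (W, z, two-sided p), where W is the rank sum of x in the pooled sample."""
    m, n = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:m])
    expected = m * (m + n + 1) / 2
    sd = sqrt(m * n * (m + n + 1) / 12)  # variance under the null, no tie correction
    z = (w - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Made-up within-subject period differences for the two sequence groups.
diff_ab = [2.0, 1.0, 3.0, 1.5]
diff_ba = [-2.0, -2.5, -1.0, -3.0]
w, z, p = rank_sum_test(diff_ab, diff_ba)
```

The same function applied to the subject-wise sums instead of the differences gives the nonparametric version of the carryover pretest.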
A modification of a much more fundamental kind concerning the comparative evaluation of the treatment effects comes into play whenever a crossover trial is carried out in order to establish the bioequivalence of two different formulations of the same drug product. In this scenario the “statistical logic” of the test is radically altered: The alternative hypothesis that the researchers are seeking to confirm now specifies that there is essentially no difference between the treatments (drug formulations) A and B. A systematic account of basic principles and important special procedures for testing for equivalence is given in Wellek (21). Furthermore, methods for the evaluation of equivalence studies will be the subject of a future article in this Series on Evaluation of Scientific Publications.
Another important modification, albeit relatively rarely employed in medical studies, is extension of the trial to more than two measurement periods. The number of periods need not then be identical with the number of treatments being compared. For bioequivalence studies, for example, a replicated crossover design with a total of four periods is recommended, with treatments A and B each given twice (22). As a rule the analysis of multiperiod crossover studies is relatively complicated and requires special software for linear regression models with mixed effects (1).
Discussion
The popularity of the crossover design for both clinical and experimental studies remains undiminished, and not infrequently the word “crossover” appears in the very title of the publication. In far too many cases, however, the critical reader will realize that the statistical analysis of the results falls far short of the standards laid out here. The most common error is failure to accommodate stratification by sequence group: the investigators proceed as would be appropriate in analyzing a study with a fixed order of treatments, performing a paired t-test or a Wilcoxon signed-rank test. Proceeding in this way puts the validity of the results of a crossover trial into question: In an extreme case, a significant result will merely mean that a pronounced period effect could be established, while the efficacy of the treatments themselves was practically identical.
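The mechanism behind this error can be made explicit with a small calculation. The sketch assumes the usual additive model without carryover, in which `tau` denotes the true difference between the treatment effects (A minus B) and `pi` the period effect (period 1 minus period 2). Both symbols, like the function names, are introduced here for illustration only. With m patients in sequence A–B and n in sequence B–A, the naive paired analysis averages the A-minus-B differences over all patients, whereas the correct analysis compares the period differences between the sequence groups.

```python
def expected_naive_paired_mean(tau, pi, m, n):
    """Expected mean of the A-minus-B differences pooled over both sequence groups."""
    # Sequence A-B contributes tau + pi per patient, sequence B-A contributes tau - pi.
    return (m * (tau + pi) + n * (tau - pi)) / (m + n)


def expected_stratified_estimate(tau, pi):
    """Expected value of half the between-group difference in mean period differences."""
    mean_diff_ab = tau + pi    # period 1 (A) minus period 2 (B)
    mean_diff_ba = -tau + pi   # period 1 (B) minus period 2 (A)
    return (mean_diff_ab - mean_diff_ba) / 2


# No treatment difference, pronounced period effect, fixed order (all patients A-B).
naive = expected_naive_paired_mean(tau=0.0, pi=2.0, m=10, n=0)
stratified = expected_stratified_estimate(tau=0.0, pi=2.0)
```

In this fixed-order case the naive mean equals the period effect although the treatments do not differ, while the stratified estimate is zero. With equally sized sequence groups the period effect cancels from the naive mean, but it still inflates the variance of the pooled differences, so the paired test remains inappropriate.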
Another pitfall to be avoided in crossover trials presents itself right at the beginning: In the planning phase, it is crucial to make the washout phase long enough to definitively rule out a carryover effect from one treatment period to the next. The pretest performed as an initial step of the confirmatory analysis of the study data essentially serves to reveal such a shortcoming in planning. Even the primary literature on applied statistics provides no conclusive answer to the question of how one should proceed when the pretest yields a significant result. For a long time the established biometric practice in the presence of a significant carryover effect in a two-period crossover trial was to analyze the data from the first study period just as if they had been obtained from a conventional parallel-group study. This procedure is still routinely followed, although it was shown more than 20 years ago that the unpaired t-test, used as part of such a two-stage procedure, no longer exhibits its basic properties and may, under certain circumstances, become strongly anticonservative in the sense of markedly exceeding the target significance level (23).
Conflict of interest statement
The authors declare that no conflict of interest exists.
Manuscript received on 12 July 2011, revised version accepted on 10 November 2011.
Translated from the original German by David Roseveare.
Corresponding author
Prof. Dr. rer. nat. Maria Blettner
Institut für Medizinische Biometrie
Epidemiologie u. Informatik der
Johannes Gutenberg-Universität
Obere Zahlbacher Straße 69
55131 Mainz
blettner@imbei.uni-mainz.de
Prof. Wellek, Prof. Blettner
1. Jones B, Kenward MG: Design and analysis of cross-over trials. 2nd edition. Boca Raton: Chapman & Hall/CRC 2003.
2. Karst M, Salim K, Burstein S, Conrad I, Hoy L, Schneider U: Analgesic effect of the synthetic cannabinoid CT-3 on chronic neuropathic pain. A randomized controlled trial. JAMA 2003; 290: 1757–62.
3. Ganesan A, Crum-Cianflone N, Higgins J, et al.: High dose atorvastatin decreases cellular markers of immune activation without affecting HIV-1 RNA levels: results of a double-blind randomized placebo controlled clinical trial. J Infect Dis 2011; 203: 756–64.
4. Davis AR, Westhoff CL, Stanczyk FZ: Carbamazepine coadministration with an oral contraceptive: effects on steroid pharmacokinetics, ovulation, and bleeding. Epilepsia 2011; 52: 243–7.
5. Black KJ, Koller JM, Campbell MC, Gusnard DA, Bandak SI: Quantification of indirect pathway inhibition by the adenosine A2a antagonist SYN115 in Parkinson disease. J Neurosci 2010; 30: 16284–92.
6. Mellor DD, Sathyapalan T, Kilpatrick ES, Beckett S, Atkin SL: High-cocoa polyphenol-rich chocolate improves HDL cholesterol in Type 2 diabetes patients. Diabet Med 2010; 27: 1318–21.
7. Chung KA, Lobb BM, Nutt JG, Horak FB: Effects of a central cholinesterase inhibitor on reducing falls in Parkinson disease. Neurology 2010; 75: 1263–9.
8. Page TH, Turner JJ, Brown AC, et al.: Nonsteroidal anti-inflammatory drugs increase TNF production in rheumatoid synovial membrane cultures and whole blood. J Immunol 2010; 185: 3694–701.
9. Kabisch M, Ruckes C, Seibert-Grafe M, Blettner M: Randomized controlled trials: part 17 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011; 108(39): 663–8.
10. Lehmacher W: Verlaufskurven und Crossover. Statistische Analyse von Verlaufskurven im Zwei-Stichproben-Vergleich und von Crossover-Versuchen. In: Überla K, Reichertz PL, Victor N (eds.): Medizinische Informatik und Statistik, Vol 67. Berlin: Springer 1987.
11. Ressing M, Blettner M, Klug SJ: Data analysis of epidemiological studies: part 11 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107(11): 187–92.
12. Sauerbrei W, Blettner M: Interpreting results in 2 x 2 tables: extensions and problems: part 9 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106(48): 795–800.
13. du Prel JB, Röhrig B, Hommel G, Blettner M: Choosing statistical tests: part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107(19): 343–8.
14. du Prel JB, Hommel G, Röhrig B, Blettner M: Confidence interval or p-value? Part 4 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106(19): 335–9.
15. Graff-Lonnevig V, Browaldh L: Twelve hours bronchodilating effect of inhaled formoterol in children with asthma: a double-blind cross-over study versus salbutamol. Clin Exp Allergy 1990; 20: 429–32.
16. Senn S: Crossover designs. In: Armitage P, Colton T (eds.): Encyclopedia of biostatistics, Volume 2. Chichester: John Wiley & Sons 1998: 1033–49.
17. du Prel JB, Röhrig B, Blettner M: Critical appraisal of scientific articles: part 1 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106(7): 100–5.
18. Röhrig B, du Prel JB, Wachtlin D, Kwiecien R, Blettner M: Sample size calculation in clinical trials: part 13 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107(31–32): 552–6.
19. Grizzle JE: The two-period change-over design and its use in clinical trials. Biometrics 1965; 21: 467–80.
20. Koch GG: The use of non-parametric methods in the statistical analysis of the two-period change-over design. Biometrics 1972; 28: 577–84.
21. Wellek S: Testing statistical hypotheses of equivalence and noninferiority. 2nd edition. Boca Raton: Chapman & Hall/CRC 2010.
22. Food and Drug Administration (FDA): Guidance for industry: Statistical approaches to establishing bioequivalence. Rockville, MD: Center for Drug Evaluation and Research (CDER) 2001.
23. Freeman P: The performance of the two-stage analysis of two-treatment, two-period crossover trials. Statistics in Medicine 1989; 8: 1421–32.
