On the Proper Use of the Crossover Design in Clinical Trials
Part 18 of a Series on Evaluation of Scientific Publications
Background: Many clinical trials have a crossover design. Certain considerations that are relevant to the crossover design, but play no role in standard parallel-group trials, must receive adequate attention in trial planning and data analysis for the results to be of scientific value.
Methods: The authors present the basic statistical methods required for the analysis of crossover trials, referring to standard statistical texts.
Results: In the simplest and most common scenario, a crossover trial involves two treatments which are consecutively administered in each patient recruited in the study. The main purpose served by the design is to provide a basis for separating treatment effects from period effects. This is achieved via computing the treatment effects separately in two sequence groups formed via randomization. The differences between treatment effects can be assessed by means of a standard t-test for independent samples using the intra-individual differences between the outcomes in both periods as the raw data. The existence of carryover effects must be ruled out for this method to be valid. This assumption is usually checked using a pre-test, which is also described in this article. Finally, we briefly discuss the use of nonparametric tests instead of t-tests and more complicated designs with more than two test periods and/or treatments.
Conclusion: Crossover trials in which the results are not analyzed separately by sequence group are of limited, if any, scientific value. It is also essential to guard against carryover effects. Whenever ignoring such effects proves unjustified, the treatment effect must be analyzed solely via an analysis of the data obtained during the first trial period. Even the use of this restricted dataset yields results whose validity is not beyond question.
The crossover design has a long history in the planning of scientific trials (, sect. 1.4) and forms the basis of a large number of clinical studies year after year. Trials in almost all clinical disciplines use the crossover design, but it accounts for a particularly high proportion of studies in the “CNS specialties”—neurology and psychiatry—and of trials on pain treatment. One example of the latter is the frequently cited study of the analgesic effect of synthetic cannabinoids (2). This was a classic crossover trial involving a total of 21 patients with chronic neuropathic pain. In two consecutive treatment periods, both one week long, each patient received four or eight externally indistinguishable capsules daily. These capsules contained either placebo or dimethyl-heptyl-THC-11-carbonic acid (CT-3). The primary endpoint was the change in pain intensity at the end of each treatment period, measured using a visual analog scale (VAS).
The essential feature distinguishing a crossover trial from a conventional parallel-group trial is that each proband or patient serves as his/her own control. The crossover design thus avoids problems of comparability of study and control groups with regard to confounding variables (e.g., age and sex). Moreover, the crossover design is advantageous regarding the power of the statistical test carried out to confirm the existence of a treatment effect: Crossover trials require smaller sample sizes than parallel-group trials to meet the same criteria in terms of type I and type II error risks.
To exploit these advantages to the full, a few specific pitfalls must be avoided in the planning and analysis of crossover trials. The two trial periods in which the patient receives the different treatments whose effects are being compared must be separated by a washout phase that is sufficiently long to rule out any carryover effect. In other words, the effect of the first treatment must have disappeared completely before the beginning of the second period. Researchers analyzing the data of crossover trials often proceed as though they were performing a simple pre/post comparison. Unfortunately this error can be observed time and time again, even in renowned journals (3–8). Crossover trials in which the paired t-test (or any other procedure for paired samples) was used for analysis are methodologically flawed and do not contribute to evidence-based evaluation of the treatments concerned.
Correct procedure for statistical analysis
The formal structure of a crossover trial for comparison of two treatments A and B is shown in Figure 1 (where A is placebo and B is CT-3). The two phases that each patient has to complete in the course of the trial are usually referred to as the two study periods (, p. 79). The efficacy of A and B is assessed on the basis of the within-subject difference between the two treatments with regard to the outcome variable. The crucial difference between a crossover trial and a simple study yielding paired observations is as follows: In planning a crossover trial, it must be taken into account that patients who receive treatment A in period 1 and treatment B in period 2 (or vice versa) may show systematic differences in outcome even when A and B have identical effects (e.g., when the same drug is given each time), because of time effects. As a consequence, researchers planning and analyzing a crossover trial have to take special precautions to avoid any confounding (11, 12) of treatment effects and period effects. A simple example of a period effect is familiarization with the study situation.
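The risk of confounding can be made explicit with the additive model commonly assumed for the two-period, two-treatment crossover design. The notation below is introduced here for illustration only and is not taken from the article: \mu is an overall mean, \pi_j the effect of period j, \tau_A and \tau_B the treatment effects, s_{ik} a random subject effect, and e_{ijk} the measurement error. Carryover is assumed to be absent.

```latex
\begin{align*}
Y_{ijk} &= \mu + \pi_j + \tau_{d(i,j)} + s_{ik} + e_{ijk},
  \quad s_{ik} \sim N(0,\sigma_S^2),\ e_{ijk} \sim N(0,\sigma_e^2) \\
E\,[Y_{11k} - Y_{12k}] &= (\pi_1 - \pi_2) + (\tau_A - \tau_B) \quad \text{(sequence A--B)} \\
E\,[Y_{21k} - Y_{22k}] &= (\pi_1 - \pi_2) - (\tau_A - \tau_B) \quad \text{(sequence B--A)}
\end{align*}
```

Under these assumptions the period term \pi_1 - \pi_2 appears in both sequence groups with the same sign, so it cancels when the two groups are compared. The difference between the two expectations is 2(\tau_A - \tau_B), which is why the analysis compares within-subject differences between sequence groups rather than pooling them.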
Main steps of confirmatory data analysis (Boxes 1 and 2)
- Patients are assigned randomly to the two sequence groups A–B and B–A, comparison of which forms the basis for confirmatory analysis.
- The crucial variable for analysis is the within-subject difference in outcome between the two study periods. In order to assess the difference between treatment effects, a statistically valid test for independent samples has to be carried out with the values obtained for this variable.
- The assumption that the washout phase was long enough to rule out a carryover effect should be checked in a preliminary test. To this end, the sum of the values measured in the two periods is calculated for each subject and compared across the two sequence groups by means of another test for independent samples. If this test yields a statistically significant result, the usual test for differences between the effects of the two treatments should not be applied.
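The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from Boxes 1 and 2 (which are not reproduced here): the function names and the data are made up, a pooled-variance t statistic is assumed, and only the test statistic and its degrees of freedom are computed, so the p-value still has to be taken from a t distribution.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    m, n = len(x), len(y)
    df = m + n - 2
    pooled_var = ((m - 1) * variance(x) + (n - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / m + 1 / n))
    return t, df

def crossover_statistics(seq_ab, seq_ba):
    """seq_ab, seq_ba: lists of (period1, period2) outcomes, one tuple per subject."""
    # Pre-test for carryover: subject-wise sums compared across sequence groups
    sums_ab = [p1 + p2 for p1, p2 in seq_ab]
    sums_ba = [p1 + p2 for p1, p2 in seq_ba]
    # Main test: subject-wise differences (period 1 minus period 2)
    diffs_ab = [p1 - p2 for p1, p2 in seq_ab]
    diffs_ba = [p1 - p2 for p1, p2 in seq_ba]
    return {
        "carryover": pooled_t(sums_ab, sums_ba),
        "treatment": pooled_t(diffs_ab, diffs_ba),
        # Half the between-group difference of mean differences estimates A minus B
        "effect_estimate": (mean(diffs_ab) - mean(diffs_ba)) / 2,
    }

# Made-up illustrative data (not from any trial cited in this article)
seq_ab = [(5.0, 3.0), (6.0, 4.5), (4.0, 3.0), (7.0, 5.0)]  # A first, then B
seq_ba = [(3.0, 5.5), (4.0, 6.0), (2.5, 4.0), (5.0, 6.5)]  # B first, then A
result = crossover_statistics(seq_ab, seq_ba)
```

In this toy dataset the carryover statistic is close to zero while the treatment statistic is large, which is the pattern under which the main test may be interpreted.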
Calculation of power and sample sizes, efficiency
As in any clinical study (17), the planning of a crossover trial should include a well-grounded calculation of sample sizes, based on precise specification of the power of the test used to establish the primary hypothesis. In the case of the crossover design, this is the test for differences between the treatment effects. Planning of the trial will generally be done under the assumption that the washout phase is long enough to rule out carryover effects.
In principle, the procedure needed for calculation of power and sample sizes for a crossover trial is the same as that which is familiar from the t-test for unpaired samples (18). The sole difference lies in the specification of the assumptions under which a predefined power (e.g., 80%) should be attained (Box 3a).
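As a rough illustration of such a calculation, the sketch below uses the normal approximation to the two-sample test applied to the within-subject differences. It is not the formula of Box 3a, which is not reproduced here. The function name and parameters are hypothetical; `delta` is the assumed difference between the treatment effects and `sigma_e` the assumed within-subject (measurement error) standard deviation. A calculation based on the t distribution would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def crossover_n_per_sequence(delta, sigma_e, alpha=0.05, power=0.80):
    """Approximate number of subjects per sequence group for a two-sided test.

    Assumes no carryover. The estimated treatment difference then has
    variance sigma_e**2 / m with m subjects in each sequence group.
    """
    z = NormalDist().inv_cdf
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 * sigma_e ** 2 / delta ** 2)

# Hypothetical planning values: difference of 1 unit, within-subject SD of 2 units
n_per_group = crossover_n_per_sequence(delta=1.0, sigma_e=2.0)
```

With these planning values the approximation yields 32 subjects per sequence group, i.e., 64 in total.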
One important question is whether the crossover design is superior or inferior in efficiency as compared with a standard two-arm study yielding data from one single study period. Efficiency here refers to the sample sizes required by the two designs to achieve the same power under otherwise identical conditions.
Under the usual statistical model assumptions for the parametric analysis of crossover trials (19), this question can be answered by means of the approximate equation shown in Box 3b. The formula implies that the crossover design is always the more efficient. Since the variance due to measurement error is generally smaller than that which can be ascribed to between-subject variability, the difference is very often substantial. In a situation where the between-subject variance is twice as large as that due to measurement error, for instance, six times as many patients are required to achieve the same power in a parallel-group study as in a crossover trial. From the cost-efficiency viewpoint, however, it must be taken into account that the crossover design involves twice as many measurements per patient. Moreover, the time required for a crossover trial is increased because every patient has to complete two study periods separated by a washout phase.
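Box 3b is not reproduced here, but the numerical example in the text is consistent with the following approximate relation, derived under the assumption of an additive model with between-subject variance \sigma_S^2, measurement error variance \sigma_e^2, and no carryover. The symbols are introduced for illustration; N denotes the total number of patients in each design.

```latex
\begin{align*}
\operatorname{Var}(\hat{\delta}_{\text{parallel}}) &= \frac{4\,(\sigma_S^2 + \sigma_e^2)}{N_{\text{parallel}}},
\qquad
\operatorname{Var}(\hat{\delta}_{\text{crossover}}) = \frac{2\,\sigma_e^2}{N_{\text{crossover}}} \\
\frac{N_{\text{parallel}}}{N_{\text{crossover}}} &\approx \frac{2\,(\sigma_S^2 + \sigma_e^2)}{\sigma_e^2}
 = 2\left(1 + \frac{\sigma_S^2}{\sigma_e^2}\right)
\end{align*}
```

Setting \sigma_S^2 = 2\sigma_e^2 gives 2(1 + 2) = 6, the factor quoted above. Even with no between-subject variability at all, the ratio is 2, because each crossover patient contributes two measurements.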
Modifications and generalizations
The confirmatory procedures described above, which are based on unpaired t-statistics, assume (approximate) normality of the distributions to be analyzed. Not infrequently, however, only a weaker model assumption seems realistic, according to which the variables under analysis have distributions of some unspecified form that is common to both sequence groups. The medians of these distributions are assumed to decompose into a sum of terms representing the respective effects of treatment and period, as well as possible carryover effects. A strategy for confirmatory analysis that remains valid under these weaker conditions consists in replacing the two-sample t-tests with Wilcoxon rank sum tests (20) throughout. Thus, the Wilcoxon test is used as a pre-test to ascertain the negligibility of the carryover effects, with the subject-wise sums C1(X), ..., Cm(X), C1(Y), ..., Cn(Y) as data (as described, for example, in ), and similarly to test for differences between the treatment effects.
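A minimal sketch of the rank sum test applied to within-subject differences is given below. It is an illustration under stated assumptions rather than a reference implementation: the function name and the data are made up, the p-value uses the large-sample normal approximation without continuity or tie correction, and for groups as small as in this example an exact procedure would be preferable.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_test(x, y):
    """Wilcoxon rank sum test with midranks and a normal approximation.

    Returns (W, two-sided p), where W is the rank sum of sample x.
    """
    pooled = sorted(x + y)

    def midrank(v):
        # Average of the 1-based positions occupied by value v in the pooled sample
        first = pooled.index(v) + 1
        count = pooled.count(v)
        return first + (count - 1) / 2

    m, n = len(x), len(y)
    w = sum(midrank(v) for v in x)
    mean_w = m * (m + n + 1) / 2
    sd_w = sqrt(m * n * (m + n + 1) / 12)  # no tie correction in this sketch
    z = (w - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p

# Made-up within-subject differences (period 1 minus period 2)
diffs_ab = [2.0, 1.5, 1.0, 2.5]      # sequence A-B
diffs_ba = [-2.5, -2.0, -1.5, -1.0]  # sequence B-A
w, p = rank_sum_test(diffs_ab, diffs_ba)
```

The same function can be applied to the subject-wise sums to carry out the pre-test for carryover.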
A modification of a much more fundamental kind concerning the comparative evaluation of the treatment effects comes into play whenever a crossover trial is carried out in order to establish the bioequivalence of two different formulations of the same drug product. In this scenario the “statistical logic” of the test is radically altered: The alternative hypothesis that the researchers are seeking to confirm now specifies that there is essentially no difference between the treatments (drug formulations) A and B. A systematic account of basic principles and important special procedures for testing for equivalence is given in Wellek (21). Furthermore, methods for the evaluation of equivalence studies will be the subject of a future article in this Series on Evaluation of Scientific Publications.
Another important modification, albeit relatively rarely employed in medical studies, is extension of the trial to more than two measurement periods. The number of periods need not then be identical with the number of treatments being compared. For bioequivalence studies, for example, a replicated crossover design with a total of four periods is recommended, with treatments A and B each given twice (22). As a rule the analysis of multiperiod crossover studies is relatively complicated and requires special software for linear regression models with mixed effects (1).
The popularity of the crossover design for both clinical and experimental studies remains undiminished, and not infrequently the word “crossover” already appears in the title of the publication. In far too many cases, however, the critical reader will realize that the statistical analysis of the results falls far short of the standards laid out here. The most common error is failure to take stratification by sequence group into account: The investigators proceed as would be appropriate when analyzing a study with a fixed order of treatments, performing a paired t-test or a Wilcoxon signed-rank test. Proceeding in this way puts the validity of the results of a crossover trial in question: In an extreme case, a significant result will merely mean that a pronounced period effect could be established, while the efficacy of the treatments in themselves was practically identical.
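The following toy calculation illustrates the point. The data are entirely made up: both treatments are assumed to have identical effects, every subject scores 2 points lower in period 2, and the sequence groups are deliberately unequal in size. The variable names are hypothetical.

```python
from statistics import mean

# Hypothetical outcomes as (period1, period2). Treatments A and B are identical
# in effect, but every subject scores 2 points lower in period 2.
seq_ab = [(10.0, 8.0), (12.0, 10.0), (9.0, 7.0),
          (11.0, 9.0), (13.0, 11.0), (10.0, 8.0)]  # A first, then B
seq_ba = [(11.0, 9.0), (9.0, 7.0)]                 # B first, then A

# Naive paired analysis: pool all A-minus-B differences, ignoring sequence group
a_minus_b = [p1 - p2 for p1, p2 in seq_ab] + [p2 - p1 for p1, p2 in seq_ba]
naive_estimate = mean(a_minus_b)

# Sequence-stratified analysis: period differences compared between the groups
d_ab = [p1 - p2 for p1, p2 in seq_ab]
d_ba = [p1 - p2 for p1, p2 in seq_ba]
stratified_estimate = (mean(d_ab) - mean(d_ba)) / 2

print(naive_estimate, stratified_estimate)  # 1.0 0.0
```

The pooled paired analysis attributes part of the period effect to the treatment, whereas the comparison between sequence groups correctly returns zero. With equally sized groups the naive estimate would be unbiased here, but the period effect would still inflate the variance of the pooled differences; with a fixed treatment order the entire period effect would be mistaken for a treatment effect.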
Another pitfall to be avoided in crossover trials presents itself right at the beginning: In the planning phase, it is crucial to make the washout phase long enough to definitively rule out a carryover effect from one treatment period to the next. The pre-test performed as an initial step of the confirmatory analysis of the study data essentially serves to reveal such a shortcoming in planning. Even the primary literature on applied statistics provides no conclusive answer to the question of how one should proceed when the pre-test yields a significant result. For a long time the established biometric practice in the presence of a significant carryover effect in a two-period crossover trial was to analyze the data from the first study period just as if they had been obtained from a conventional parallel-group study. This procedure is still routinely followed, although it was shown more than 20 years ago that the unpaired t-test, used as part of such a two-stage procedure, no longer exhibits its basic properties and may, under certain circumstances, become strongly anticonservative in the sense of markedly exceeding the target significance level (23).
Conflict of interest statement
The authors declare that no conflict of interest exists.
Manuscript received on 12 July 2011, revised version accepted on 10 November 2011.
Translated from the original German by David Roseveare.
Prof. Dr. rer. nat. Maria Blettner
Institut für Medizinische Biometrie
Epidemiologie u. Informatik der
Obere Zahlbacher Straße 69
Prof. Wellek, Prof. Blettner
1. Jones B, Kenward MG: Design and analysis of cross-over trials. 2nd edition. Boca Raton: Chapman & Hall/CRC 2003.
2. Karst M, Salim K, Burstein S, Conrad I, Hoy L, Schneider U: Analgesic effect of the synthetic cannabinoid CT-3 on chronic neuropathic pain. A randomized controlled trial. JAMA 2003; 290: 1757–62.
3. Ganesan A, Crum-Cianflone N, Higgins J, et al.: High dose atorvastatin decreases cellular markers of immune activation without affecting HIV-1 RNA levels: results of a double-blind randomized placebo controlled clinical trial. J Infect Dis 2011; 203: 756–64.
4. Davis AR, Westhoff CL, Stanczyk FZ: Carbamazepine coadministration with an oral contraceptive: effects on steroid pharmacokinetics, ovulation, and bleeding. Epilepsia 2011; 52: 243–7.
5. Black KJ, Koller JM, Campbell MC, Gusnard DA, Bandak SI: Quantification of indirect pathway inhibition by the adenosine A2a antagonist SYN115 in Parkinson disease. J Neurosci 2010; 30: 16284–92.
6. Mellor DD, Sathyapalan T, Kilpatrick ES, Beckett S, Atkin SL: High-cocoa polyphenol-rich chocolate improves HDL cholesterol in Type 2 diabetes patients. Diabet Med 2010; 27: 1318–21.
7. Chung KA, Lobb BM, Nutt JG, Horak FB: Effects of a central cholinesterase inhibitor on reducing falls in Parkinson disease. Neurology 2010; 75: 1263–9.
8. Page TH, Turner JJ, Brown AC, et al.: Nonsteroidal anti-inflammatory drugs increase TNF production in rheumatoid synovial membrane cultures and whole blood. J Immunol 2010; 185: 3694–701.
9. Kabisch M, Ruckes C, Seibert-Grafe M, Blettner M: Randomized controlled trials: part 17 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2011; 108(39): 663–8.
10. Lehmacher W: Verlaufskurven und Crossover. Statistische Analyse von Verlaufskurven im Zwei-Stichproben-Vergleich und von Cross-over-Versuchen. In: Überla K, Reichertz PL, Victor N (eds.): Medizinische Informatik und Statistik, Vol 67. Berlin: Springer 1987.
11. Ressing M, Blettner M, Klug SJ: Data analysis of epidemiological studies: part 11 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107(11): 187–92.
12. Sauerbrei W, Blettner M: Interpreting results in 2 x 2 tables: extensions and problems: part 9 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106(48): 795–800.
13. du Prel JB, Röhrig B, Hommel G, Blettner M: Choosing statistical tests: part 12 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107(19): 343–8.
14. du Prel JB, Hommel G, Röhrig B, Blettner M: Confidence interval or p-value? Part 4 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106(19): 335–9.
15. Graff-Lonnevig V, Browaldh L: Twelve hours bronchodilating effect of inhaled formoterol in children with asthma: a double-blind cross-over study versus salbutamol. Clin Exp Allergy 1990; 20: 429–32.
16. Senn S: Crossover designs. In: Armitage P, Colton T (eds.): Encyclopedia of biostatistics, Volume 2. Chichester: John Wiley & Sons 1998: 1033–49.
17. du Prel JB, Röhrig B, Blettner M: Critical appraisal of scientific articles: part 1 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106(7): 100–5.
18. Röhrig B, du Prel JB, Wachtlin D, Kwiecien R, Blettner M: Sample size calculation in clinical trials: part 13 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2010; 107(31–32): 552–6.
19. Grizzle JE: The two-period change-over design and its use in clinical trials. Biometrics 1965; 21: 467–80.
20. Koch GG: The use of non-parametric methods in the statistical analysis of the two-period changeover design. Biometrics 1972; 28: 577–84.
21. Wellek S: Testing statistical hypotheses of equivalence and noninferiority. 2nd edition. Boca Raton: Chapman & Hall/CRC 2010.
22. Food and Drug Administration (FDA): Guidance for industry: Statistical approaches to establishing bioequivalence. Rockville, MD: Center for Drug Evaluation and Research (CDER) 2001.
23. Freeman P: The performance of the two-stage analysis of two treatment, two period crossover trials. Statistics in Medicine 1989; 8: 1421–32.