Part 19 of a Series on Evaluation of Scientific Publications
Background: The early detection of cancer and other diseases is generally considered beneficial, yet there is evidence that in some diseases screening may be of limited benefit. To clarify this issue, we present the statistical principles that underlie screening.
Methods: We define screening and discuss the conditions for its successful use. We give illustrative examples from among the currently recommended types of screening in Germany and from the recent medical literature, particularly with regard to mammography.
Results: Certain specific conditions must be fulfilled for screening to be beneficial (benefit is usually measured as reduced mortality): The screening procedure must be of high quality, and the screening intervals must be well adapted to the distribution of the sojourn time. Alongside its benefits, screening can also cause harm, particularly to the many participants who receive a false positive test result. According to German law, potential participants are entitled to receive all the information they need to make an informed decision about screening.
Conclusion: Just like clinical interventions, screening programs should be evaluated before they are introduced or, at the latest, at the time of their introduction.
Many physicians would readily agree with the statement that it is in a patient’s interests for a disease, particularly cancer, to be detected as early as possible. Behind this lies the conviction that early detection leads to more successful, or at least less aggressive, treatment. Early detection programs for breast cancer were therefore begun as early as the 1960s, and many others followed (1). In recent years, however, critical voices have repeatedly been raised, disputing in particular the benefit of screening mammography; the most recent of these is a study by Autier et al. (2). Similarly, the pros and cons of PSA (prostate-specific antigen) screening are the subject of heated discussion (3). This article describes the methodological basis of screening.
The principle of screening
To understand how such widely varying opinions on the benefits of (particular) screening measures have arisen, it is helpful to consider the principle behind screening.
Morrison (5) defines screening as follows: “Screening for disease is the examination of asymptomatic people in order to classify them as likely or unlikely to have the disease that is the object of screening. People who appear likely to have a disease are investigated further to arrive at a final diagnosis. Those people who are found to have the disease are then treated.”
In other words, screening is not part of general preventive healthcare: It is always directed at a specific disease. The target group consists of people who have not been diagnosed with a disease and are not suspected of having a disease (Box 1).
Screening normally involves two stages: Following a test that is as sensitive as possible but not necessarily specific, individuals are divided into those who have tested negative and those who have tested positive. Those who have tested positive then undergo a confirmation test that is as specific as possible. This allows the disease to be either diagnosed or ruled out. This confirmation test identifies diseased (true positive) and healthy (false positive) persons. One-stage screening, such as colonoscopy, is the exception rather than the rule (Box 2).
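The overall accuracy of such a two-stage procedure can be sketched in Python under a simplifying assumption that the text does not make: that the errors of the two tests are independent of each other given the true disease status (real tests often violate this). A participant then ends up with a confirmed positive result only if both tests are positive. The function name and the accuracy values below are hypothetical.

```python
def serial_two_stage(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Overall sensitivity and specificity of a two-stage procedure in which only
    first-stage positives receive the confirmation test and a result counts as
    positive only if both tests are positive.

    Assumes the two tests' errors are independent given true disease status.
    """
    sensitivity = se1 * se2                  # a case must be detected by both tests
    specificity = 1 - (1 - sp1) * (1 - sp2)  # a healthy person is misclassified only if both tests err
    return sensitivity, specificity

# Hypothetical values: sensitive but unspecific screening test, specific confirmation test
se, sp = serial_two_stage(se1=0.95, sp1=0.90, se2=0.90, sp2=0.98)
print(f"overall sensitivity {se:.3f}, overall specificity {sp:.3f}")
# overall sensitivity 0.855, overall specificity 0.998
```

With these hypothetical values, the combined procedure loses some sensitivity compared with the first test alone but gains a great deal of specificity, which is why the first stage is chosen to be as sensitive as possible.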
The phrase “early detection” expresses the fundamental idea behind screening: An earlier diagnosis is expected to imply a lower stage of disease that is more likely to be treated successfully. This implicitly assumes that if left untreated the disease would progress to forms with worse prognoses.
The Figure shows what happens over time to someone who develops a given disease during his/her lifetime. The disease begins at a particular point in time. Somewhat later, it becomes detectable in principle, e.g. once a solid tumor has reached a minimum size. The period from this point until the person would be diagnosed clinically, even without screening, is known as the “preclinical phase” (or “sojourn time”). The length of this preclinical phase depends primarily on the disease in question but also varies between individuals.
Screening can only bring the diagnosis forward if it is performed during this preclinical phase. The length of time by which the diagnosis is brought forward is called the “lead time.” Because a disease detected by screening is never diagnosed clinically, the time at which clinical diagnosis would have occurred remains unknown, so lead time cannot be observed in individual cases. Put simply, the average age at diagnosis in a screened group is expected to be lower than in a comparison group by the average lead time.
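The relationship between the preclinical phase, the screening dates, and the lead time can be illustrated with a small, hypothetical Python sketch. Unlike in reality, the age at which each disease becomes detectable and the age at which it would be diagnosed clinically are assumed to be known, and the test is assumed to detect every case that is screened during its preclinical phase. The function `lead_time` and the example cases are illustrative only.

```python
from statistics import mean

def lead_time(detectable_age: float, clinical_age: float, screen_ages: list[float]) -> float:
    """Years by which screening brings the diagnosis forward for one person.

    Assumes a perfectly sensitive test: the first screen that falls inside the
    preclinical phase [detectable_age, clinical_age) detects the disease.
    Returns 0.0 if no screen falls inside that phase (clinical diagnosis).
    """
    for age in sorted(screen_ages):
        if detectable_age <= age < clinical_age:
            return clinical_age - age
    return 0.0

# Hypothetical cases: (age at which the disease becomes detectable, age at clinical diagnosis)
cases = [(51.0, 54.0), (55.5, 56.5), (60.0, 67.0), (52.5, 53.5)]
screens = [50, 52, 54, 56, 58, 60, 62, 64, 66, 68]  # every two years from age 50

lead_times = [lead_time(d, c, screens) for d, c in cases]
print(lead_times)  # [2.0, 0.5, 7.0, 0.0]; the last case is an interval case

age_without = mean(c for _, c in cases)                            # clinical diagnosis only
age_with = mean(c - lt for (_, c), lt in zip(cases, lead_times))   # diagnosis with screening
print(age_without - age_with, mean(lead_times))  # both 2.375
```

In this toy example, the mean age at diagnosis falls by exactly the mean lead time. The fourth case becomes detectable just after the screen at age 52 and is diagnosed clinically before the next screen at 54, so it would be counted as an interval case.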
When a disease is diagnosed following an earlier negative screening result, this result is retrospectively described as a “false negative” or an “interval case.” It is usually no longer possible to determine why the earlier screening result was negative: Had the tumor not yet developed, was it not yet detectable, or was it missed (“screening failure”)? These distinctions play a role in quality assurance. An individual patient is formally classified as a “false negative” even if his/her tumor had not yet developed when the test was performed.
A screening program, e.g. for breast cancer, consists of the following:
- A test procedure (in this case a mammogram)
- A defined group of people to be included (in this case women aged 50 to 69 years)
- The testing frequency (in this case every two years).
Programs also differ in how eligible individuals are approached; in Germany, personal written invitations are sent only for mammography screening.
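The three components of a program can be captured in a small data structure. The Python sketch below is hypothetical (the class and field names are not taken from any real system) and assumes, for simplicity, that a participant is screened exactly at the start of the age range and then at every interval; real invitation schedules depend on date of birth and local organization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScreeningProgram:
    """Hypothetical representation of the three program components listed above."""
    test: str            # the test procedure
    min_age: int         # youngest eligible age (inclusive)
    max_age: int         # oldest eligible age (inclusive)
    interval_years: int  # testing frequency

    def screening_ages(self) -> list[int]:
        """Ages at screening for someone who starts at min_age and attends every round."""
        return list(range(self.min_age, self.max_age + 1, self.interval_years))

mammography = ScreeningProgram(test="mammogram", min_age=50, max_age=69, interval_years=2)
print(mammography.screening_ages())       # [50, 52, 54, 56, 58, 60, 62, 64, 66, 68]
print(len(mammography.screening_ages()))  # 10
```

Under these assumptions, the 50-to-69 age range with a two-year interval yields at most ten screening examinations per woman.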
For most diseases, the aim of early detection is to prolong life. Depending on the disease, however, screening may target an endpoint other than death, e.g. one that can be prevented or delayed, such as heart attack, blindness, or amputation. Success is defined as a significant reduction in mortality (or in the frequency of the other endpoint) in the eligible population.
When should screening be performed?
Current statutory preventive care in Germany covers breast, colon, skin, cervical, and prostate cancer (6). The statutory early detection examination for prostate cancer does not include the PSA test. With the exception of neonatal metabolic screening, preventive measures not connected with cancer (check-ups, antenatal and pediatric preventive care) are less specifically directed at particular target diseases.
How are the diseases for which screening is offered selected, and how is the screening method chosen? The literature contains a number of recommendations on this subject (4, 7, 8). The disease must represent a “considerable problem”; in other words, it must affect many people and/or have serious consequences. For example, breast cancer is the most common form of cancer and the most common cause of cancer-related death among women in Germany (e3). There must also be sufficient evidence that (almost) all those affected progress through the stages preclinical → clinical → endpoint. Nonprogressive and transient diseases are therefore excluded from screening. The preclinical phase must be sufficiently long; its length can be estimated from study data. For example, the average preclinical phase of breast cancer in Sweden has been estimated at approximately three years (9). Treatment of preclinical cases must be significantly more likely to succeed than treatment of clinically identified cases. Aggressively growing cancers, very rare cancers, and cancers that can be treated successfully when diagnosed clinically are therefore not suitable for screening. Whether prostate cancer is always progressive is contentious (3).
The selected procedure must also be valid, low-risk, and acceptable. To some extent, this assessment is a matter of judgment. For example, with respect to colon cancer, testing stool samples for occult blood is not particularly accurate, but it is a low-risk first step and is relatively widely accepted (10). Colonoscopy, in contrast, is accurate but is neither low-risk nor widely accepted. Its risks include infection, perforation, hemorrhage, complications of the sedation that is usually needed, and cardiovascular problems related to bowel cleansing (11, 12, e4). Isolated deaths have been reported (12). In this controversial area, German law leaves the choice of procedure to the individual (6).
Measures of the validity of a screening test are its sensitivity (what proportion of the actual cases does the test identify?) and its specificity (what proportion of healthy people does the screening test itself correctly classify as healthy?). A high positive predictive value (what proportion of those who test positive at screening actually have the disease?) is also desirable. High sensitivity means few false negatives, while high specificity or a high positive predictive value means few false positives.
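These measures can be computed from a 2×2 table of screening result versus true disease status. The Python sketch below uses invented counts for 1000 participants, 10 of whom have the disease; they are not data from any real program, and the function name is hypothetical.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Validity measures from a 2x2 table (screening result vs. true disease status)."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual cases the test identifies
        "specificity": tn / (tn + fp),  # share of healthy people classified as healthy
        "ppv": tp / (tp + fp),          # share of positives who actually have the disease
    }

# Invented counts for 1000 participants, 10 of whom have the disease
metrics = screening_metrics(tp=9, fp=50, fn=1, tn=940)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 0.9, 'specificity': 0.949, 'ppv': 0.153}
```

Even though sensitivity (90%) and specificity (about 95%) look high in this example, only about 15% of positive results are true positives, because so few participants have the disease.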
The conditions stated above are required for success but are not in themselves sufficient. Even a program that meets all these conditions may not necessarily be successful. To date, most evidence on this comes from reviews of international studies. Every screening procedure should really undergo evaluation similar to a clinical study before it is introduced.
The ethical dimension of screening
Unlike patients receiving therapy, the overwhelming majority of screening participants do not have the disease being tested for. All screening participants, however, bear the risks of the screening method. Of all the screening procedures used in Germany, these risks are highest for colonoscopy (12). Mammography also entails (low-dose) exposure to radiation. Only those who have the disease and test positive at screening (true positives) can benefit from screening, and among the true positives, only those who subsequently live longer and/or have a better quality of life actually benefit from early detection. People diagnosed through screening also include those who would have undergone equally intensive treatment and/or had the same survival time if their disease had been identified clinically and treated later. These people do not benefit from screening and are sometimes actually worse off because of it, as earlier diagnosis may prolong the period during which they live as patients.
The increasing success of treatment for advanced cancers also reduces the benefit of early detection. Those who receive a positive screening result for a disease that would never have become manifest during their lifetime are significantly worse off as a result of screening. This is known as overdiagnosis (13) and seems to occur particularly frequently with prostate cancer (3). Overall, only a few participants in any given screening program are true positives. For example, in screening mammography the current average is 8 true positives per 1000 participants (14). It is not known how many of these women are actually prevented from dying of breast cancer by screening; Welch and Frankel (e5) estimate 2 at most.
For the small group of false negatives, the negative result can be falsely reassuring, which in individual cases may delay the diagnosis if unclear symptoms appear soon afterwards.
True negatives (healthy people who test negative at screening), generally the largest group of screening participants, do usually benefit from screening, as they tend to perceive medical confirmation that they are healthy in a positive light. However, this benefit is sustained only for those who receive such confirmation each time they are screened.
A false positive test result is usually followed by a confirmation test, which can be invasive and risky; examples include colonoscopy following a positive stool test, or biopsy following a positive mammogram. The anxiety caused by suspected disease, which lasts until the confirmation test has been performed, is a further burden. False positives usually outnumber true positives. For example, the positive predictive value of screening mammography in Germany is currently 15.4%. In other words, for approximately 85% of all screening participants who receive a positive mammogram result, the finding is not confirmed by the confirmation test (14). To reduce the number of biopsies, the confirmation test in Germany consists of two stages: Before a biopsy is performed, further imaging examinations (mammography, ultrasound) are carried out. The positive predictive value of this second stage is 49.1%. In each screening round, an average of approximately 4.5% of all participants receive a false positive result (14). Every participant runs this risk at each of her screening examinations, of which there are up to ten. International studies have found that the probability of a woman obtaining at least one false positive result in a screening mammography program ranges from 20% to 63%, depending on the program (15–18).
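Two simple calculations put these figures into perspective. The first estimates the chance of at least one false positive over ten rounds, assuming (unrealistically) a constant false positive rate of 4.5% per round and independence between rounds. The second checks what the figure of 8 true positives per 1000 participants and the positive predictive value of 15.4% imply about the number of false positives, assuming both figures refer to the same screening rounds. The function name is hypothetical.

```python
def cumulative_false_positive_risk(per_round_rate: float, rounds: int) -> float:
    """Probability of at least one false positive result over several rounds.

    Simplifying assumptions: the same false positive rate in every round and
    independence between rounds. Real programs violate both (first rounds,
    for example, tend to produce more false positives), so this is a rough guide.
    """
    return 1 - (1 - per_round_rate) ** rounds

# Figures quoted in the text for German screening mammography (14)
risk = cumulative_false_positive_risk(per_round_rate=0.045, rounds=10)
print(f"at least one false positive in 10 rounds: {risk:.0%}")  # 37%

# Plausibility check: positives and false positives per 1000 participants implied by
# 8 true positives per 1000 and a positive predictive value of 15.4%
true_positives = 8
ppv = 0.154
positives = true_positives / ppv
false_positives = positives - true_positives
print(round(positives), round(false_positives))  # 52 44
```

The first result lies within the international range of 20% to 63%. The second implies about 44 false positives per 1000 participants, i.e. roughly 4.4% per round, which is consistent with the reported figure of approximately 4.5% given rounding.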
For a given test in a given group of screening participants, sensitivity and specificity cannot both be increased at the same time: Fewer false negatives means more false positives, and vice versa. A balance must therefore be struck, including with regard to the age range and frequency of screening. The heated debate on this subject shows that this is not easy.
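Why fewer false negatives means more false positives can be seen by moving the cut-off of a single test. The Python sketch below assumes, purely for illustration, that a continuous marker is normally distributed in healthy and in diseased people with the hypothetical means and standard deviations shown, and that values at or above the threshold count as positive.

```python
from statistics import NormalDist

# Hypothetical marker distributions (arbitrary units), not taken from any real test
healthy = NormalDist(mu=10, sigma=2)
diseased = NormalDist(mu=14, sigma=2)

def sens_spec(threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when values >= threshold count as positive."""
    sensitivity = 1 - diseased.cdf(threshold)  # diseased people at or above the cut-off
    specificity = healthy.cdf(threshold)       # healthy people below the cut-off
    return sensitivity, specificity

for threshold in (11, 12, 13):
    se, sp = sens_spec(threshold)
    print(f"threshold {threshold}: sensitivity {se:.2f}, specificity {sp:.2f}")
# threshold 11: sensitivity 0.93, specificity 0.69
# threshold 12: sensitivity 0.84, specificity 0.84
# threshold 13: sensitivity 0.69, specificity 0.93
```

Lowering the threshold raises sensitivity at the cost of specificity, and vice versa; only a better test, i.e. less overlap between the two distributions, could improve both.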
In view of these potential benefits and harms, it is not only legislators who must decide which screening measures to offer as “statutory preventive care”; each individual to whom this care is offered must also make a decision. In addition to statutory preventive care, physicians may also offer other procedures (e.g. PSA screening) as individual health services that patients pay for themselves.
To achieve the highest possible level of success, a high participation rate is needed (4). One possible way of increasing willingness to participate is the relatively expensive procedure of issuing personal written invitations, with appointments, to those who are eligible for screening.
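The importance of participation can be illustrated with a deliberately simple model. It assumes that screening reduces mortality only among those who participate, that non-participants would otherwise have the same mortality as participants (no self-selection), and a hypothetical 25% reduction among participants. Under these assumptions, the relative reduction in the eligible population is the participation rate multiplied by the reduction among participants.

```python
def population_reduction(participation: float, reduction_in_participants: float) -> float:
    """Relative mortality reduction in the whole eligible population.

    Simplified model: no effect among non-participants and no self-selection,
    so the effect among participants is diluted in proportion to participation.
    """
    return participation * reduction_in_participants

# Hypothetical 25% mortality reduction among those who actually attend
for participation in (0.3, 0.5, 0.7, 0.9):
    effect = population_reduction(participation, 0.25)
    print(f"participation {participation:.0%}: reduction in eligible population {effect:.1%}")
```

For example, 70% participation turns a 25% reduction among participants into a reduction of about 17.5% in the eligible population.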
However, in addition to the right to self-determination and the right not to know, people must also have the opportunity to weigh the facts and decide whether or not to take part without being put under any pressure. The critical aspects of screening and its ethical basis are stated comprehensively and clearly in the detailed explanation of the changes made to the German chronic care guidelines in 2007 (19, 20, e6, e7). In these guidelines lawmakers decided not to offer any financial reward for screening participation, and instead to provide counseling before individuals decide whether or not to take part in screening. Similar counseling has long been called for internationally (17).
Physicians’ duty to provide counseling was first established in Germany in 2002, in the amendment to the cancer screening guidelines that introduced screening colonoscopy (21). It arose from the choice between stool test and colonoscopy mentioned above. Germany’s Federal Joint Committee (G-BA, Gemeinsamer Bundesausschuss) has specified the form such counseling should take (21). Essentially, these specifications can be applied to all preventive care examinations: It is the physician’s task to inform individuals in such a way that they can decide independently, i.e. give informed consent. The G-BA stipulates that information on the following must be provided:
- Frequency of the disease
- Clinical picture of the disease
- The aims and underlying concept of screening
- Efficacy (sensitivity, specificity) and effectiveness
- Disadvantages (discomfort, risks)
- Action to be taken if the test is positive.
As part of this provision of information, individuals must be given the information leaflet stipulated by law (22, e8). However, these information sheets have already been criticized for relaying only one point of view (23, e9).
Screening measures must undergo experimental evaluation
At the beginning of this article we cited studies that cast doubt on the success, or at least on the magnitude of the success, of screening mammography (1, 2). The principle behind screening suggests possible reasons for this that even technically perfect mammograms or error-free confirmation tests cannot overcome. If the length of the preclinical phase varies widely between patients, a substantial proportion of patients will have very short preclinical phases; biologically, these are rapidly progressing tumors. These patients would receive false negative results particularly often, because their entire preclinical phase can easily fall between two screening examinations, and so they would not benefit from screening. In the same situation, a relatively high proportion of patients will have very long preclinical phases, i.e. slowly progressing tumors. These would mostly be detected by screening, but for a considerable proportion of these patients earlier diagnosis may not improve the success of treatment. From the physician’s point of view, this constellation creates the impression that a diagnosis made because of symptoms has a poor prognosis and a diagnosis made by screening has a good prognosis, which in turn suggests that screening yields a significant benefit. This is called “length time bias”, because it is the varying length of the preclinical phase that creates the mistaken impression that screening is beneficial.
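Length time bias can be reproduced in a small simulation. The Python sketch below is illustrative and rests on assumptions not made in the text: preclinical phases are exponentially distributed with a mean of three years (the order of magnitude estimated for breast cancer in Sweden (9)), screening takes place every two years, the test detects every case screened during its preclinical phase, and the time from the start of the preclinical phase to the next screen is uniformly distributed over the screening interval. The function name is hypothetical.

```python
import random
from statistics import mean

def simulate_sojourns(n: int = 100_000, mean_sojourn: float = 3.0,
                      interval: float = 2.0, seed: int = 1) -> tuple[list[float], list[float]]:
    """Return the preclinical phase lengths of screen-detected and interval cases.

    Illustrative assumptions: exponentially distributed preclinical phases,
    a perfectly sensitive test, regular screens every `interval` years, and a
    uniformly distributed time from the start of the phase to the next screen.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    screen_detected: list[float] = []
    interval_cases: list[float] = []
    for _ in range(n):
        sojourn = rng.expovariate(1 / mean_sojourn)  # preclinical phase length (years)
        wait = rng.uniform(0, interval)               # time until the next screen
        if wait < sojourn:
            screen_detected.append(sojourn)           # a screen falls inside the phase
        else:
            interval_cases.append(sojourn)            # phase ends before the next screen
    return screen_detected, interval_cases

detected, interval_cases = simulate_sojourns()
share = len(detected) / (len(detected) + len(interval_cases))
print(f"share of cases detected by screening: {share:.2f}")
print(f"mean preclinical phase, screen-detected: {mean(detected):.1f} years")
print(f"mean preclinical phase, interval cases:  {mean(interval_cases):.1f} years")
```

Under these assumptions, roughly 73% of cases are detected by screening, and every interval case has a preclinical phase shorter than the two-year interval. The screen-detected cases are therefore enriched with slowly progressing disease, even though nothing in the simulation allows screening to change the course of any disease.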
If the length of the preclinical phase varies considerably, only a relatively small group of patients, those with preclinical phases of intermediate length, will benefit; this is not sufficient to achieve a substantial decrease in mortality among all those entitled to screening.
This, too, makes it important to evaluate screening procedures in depth. If screening yields little or no benefit, the price paid by the many false positives, and by overdiagnosed patients in particular, is fundamentally too high.
Sufficiently large studies with sufficiently long follow-up are needed, with an unscreened comparison group and a subsequent comparison of the relevant endpoint (e.g. mortality) (24). The comparison group must be comparable to the screened group in terms of incidence, endpoint, and treatment. This can be achieved by randomizing those who are invited to take part, but randomization is not essential (25, e10). The most important characteristics of non-participants must also be known so that the extent of self-selection can be estimated (e11). With the exception of neuroblastoma screening, no screening measure in Germany has yet been preceded by such a model project or study. During the trial of neuroblastoma screening it became clear that screening provided no benefit and even resulted in potential harm in the form of frequent overdiagnoses, and it was therefore not introduced (25).
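The core comparison in such a study can be sketched as follows: disease-specific mortality in the screened (or invited) group is compared with that in the comparison group. The Python sketch below uses invented counts and a standard approximate 95% confidence interval for the risk ratio (the Wald interval on the log scale); the function name is hypothetical, and real evaluations typically use person-time, account for the length of follow-up, and reflect the study design, so this is a simplification.

```python
import math

def mortality_comparison(deaths_screened: int, n_screened: int,
                         deaths_control: int, n_control: int, z: float = 1.96):
    """Risk ratio with an approximate 95% CI (Wald interval on the log scale)
    and the absolute difference in deaths per 1000, screened vs. comparison group."""
    risk_screened = deaths_screened / n_screened
    risk_control = deaths_control / n_control
    rr = risk_screened / risk_control
    se_log_rr = math.sqrt(1 / deaths_screened - 1 / n_screened
                          + 1 / deaths_control - 1 / n_control)
    ci = (math.exp(math.log(rr) - z * se_log_rr), math.exp(math.log(rr) + z * se_log_rr))
    per_1000 = 1000 * (risk_control - risk_screened)
    return rr, ci, per_1000

# Invented counts: 50,000 people per group followed for the same period
rr, ci, per_1000 = mortality_comparison(150, 50_000, 200, 50_000)
print(f"risk ratio {rr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f}); "
      f"{per_1000:.1f} fewer deaths per 1000")
# risk ratio 0.75 (95% CI 0.61 to 0.93); 1.0 fewer deaths per 1000
```

In this hypothetical example, a relative reduction of 25% corresponds to only one death avoided per 1000 people over the study period, which illustrates why such studies must be large and long.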
Conflict of interest statement
PD Dr. Spix has received travel subsidies and lecture fees from Sanofi and the German Social Accident Insurance Institution for the Energy, Textile, Electrical and Media Products Sectors (Berufsgenossenschaft Energie Textil Elektro Medienerzeugnisse [BG ETEM]).
Prof. Blettner has received consultancy fees from AstraZeneca and Astellas.
Manuscript received on 28 October 2010, revised version accepted on 24 January 2012.
Translated from the original German by Caroline Devitt, MA.
PD Dr. rer. nat. et med. habil. Claudia Spix
Deutsches Kinderkrebsregister am
Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI)
Universitätsmedizin der Johannes Gutenberg-Universität Mainz
55101 Mainz, Germany
Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI) at the University Medical Center of the Johannes Gutenberg University Mainz: Prof. Dr. med. Blettner
1. Gøtzsche PC: Relation between breast cancer mortality and screening effectiveness: systematic review of the mammography trials. Dan Med Bull 2011; 58: A4246.
2. Autier P, Boniol M, Gavin A, Vatten LJ: Breast cancer mortality in neighbouring European countries with different levels of screening but similar access to treatment: trend analysis of WHO mortality database. BMJ 2011; 343: d4411.
3. Djulbegovic M, Beyth RJ, Neuberger MM, et al.: Screening for prostate cancer: systematic review and meta-analysis of randomised controlled trials. BMJ 2010; 341: c4543.
4. Cancer prevention. In: dos Santos Silva I: Cancer epidemiology: principles and methods. IARC 1999.
5. Morrison AS: Screening in chronic disease. Oxford: Oxford University Press 1992.
6. Bundesministerium für Gesundheit: Vorsorge- und Früherkennungsangebote in der GKV. www.bmg.bund.de/fileadmin/dateien/Leistungen_110331.pdf. 2010.
7. Anonymous: Position paper. Recommendations on cancer screening in the European Union. Advisory Committee on Cancer Prevention. Eur J Cancer 2000; 36: 1473–8.
8. Giersiepen K, Hense HW, Klug SJ, Antes G, Zeeb H: Entwicklung, Durchführung und Evaluation von Programmen zur Krebsfrüherkennung. Ein Positionspapier. ZaeFQ 2007; 101: 43–9.
9. Chiu SY, Duffy S, Yen AM, Tabár L, Smith RA, Chen HH: Effect of baseline breast density on breast cancer incidence, stage, mortality, and screening parameters: 25-year follow-up of a Swedish mammographic screening. Cancer Epidemiol Biomarkers Prev 2010; 19: 1219–28.
10. Haug U, Brenner H: A simulation model for colorectal cancer screening: potential of stool tests with various performance characteristics compared with screening colonoscopy. Cancer Epidemiol Biomarkers Prev 2005; 14: 422–8.
11. Brenner H, Altenhofen L, Hoffmeister M: Eight years of colonoscopic bowel cancer screening in Germany: initial findings and projections. Dtsch Arztebl Int 2010; 107(43): 753–9.
12. Mansmann U, Crispin A, Henschel V, et al.: Epidemiology and quality control of 245 000 outpatient colonoscopies. Dtsch Arztebl Int 2008; 105(24): 434–40.
13. Jørgensen KJ, Gøtzsche PC: Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends. BMJ 2009; 339: b2587.
14. Malek D, Rabe P, et al. for the Kooperationsgemeinschaft Mammographie: Evaluationsbericht 2005–2007. Ergebnisse des Mammographie-Screening-Programms in Deutschland. 2009.
15. Hubbard RA, Miglioretti DL: Modelling the cumulative risk of a false-positive screening test. Stat Methods Med Res 2010; 19: 429–49.
16. Njor SH, Olsen AH, Schwartz W, Vejborg I, Lynge E: Predicting the risk of a false-positive test for women following a mammography screening programme. J Med Screen 2007; 14: 94–7.
17. Woloshin S, Schwartz LM: Numbers needed to decide. JNCI 2009; 101: 1163–5.
18. Hubbard RA, Kerlikowske K, Flowers CI, et al.: Cumulative probability of false-positive recall or biopsy recommendation after 10 years of screening mammography: a cohort study. Ann Intern Med 2011; 155: 481–92.
19. Gemeinsamer Bundesausschuss: Chroniker-Richtlinie (§ 62 SGB V) in der Version vom 19.7.2007. www.g-ba.de/downloads/62-492-140/RL_Chroniker-2007-07-19.pdf
20. Gemeinsamer Bundesausschuss: Bericht der Arbeitsgruppe Zuzahlung des UA Prävention zum Regelungsauftrag des § 62 Abs. 1 Satz 3 SGB V. Stand: 30.05.2007. www.g-ba.de/downloads/40-268-416/2007-05-30-Abschlu%C3%9F_verpfl-Frueh
21. Bundesministerium für Gesundheit: Bekanntmachung [1540 A] des Bundesausschusses der Ärzte und Krankenkassen über eine Änderung der Richtlinien über die Früherkennung von Krebserkrankungen (Krebsfrüherkennungs-Richtlinien) vom 21. Juni 2002. www.g-ba.de/downloads/39-261-61/2002-06-21-KFU-Kolon.pdf
22. Merkblatt Darmkrebs-Früherkennung.
23. Griebenow B: Vor- und Nachteile darstellen. Dtsch Arztebl 2008; 105(33): A 1724.
24. Hakama M: Chapter 2. Planning and designing of screening programmes. 13–28. In: Sankila R, Demaret E, Hakama M, Lynge E, Schouten LJ, Parkin DM for the European Network of Cancer Registries. Evaluation and monitoring of screening programmes. European Commission, Europe Against Cancer Programme. Brussels-Luxembourg, 2000. European Communities, 2001.
25. Schilling FH, Spix C, Berthold F, et al.: Neuroblastom-Früherkennung im Alter von einem Jahr in Deutschland. Eine kontrollierte populationsbezogene Studie mit unerwartetem Ausgang. Dtsch Arztebl 2003; 100(25): A 1739–46.
e1. Bautsch W: Requirements and assessment of laboratory tests: part 5 of a series on evaluation of scientific publications. Dtsch Arztebl Int 2009; 106: 403–6.
e2. Anon: Statistik-Quiz Sensitivität und Spezifität. Dtsch Arztebl 2010; 107(31): 551.
e3. Robert Koch-Institut, Zentrum für Krebsregisterdaten: Krebs in Deutschland 2005/2006, Häufigkeiten und Trends. www.rki.de/cln_234/DE/Content/GBE/DachdokKrebs/krebs__node.html?__nnn=true
e4. Klug SJ: Colonoscopy screening in Germany: a success story? Dtsch Arztebl Int 2010; 107(43): 751–2.
e5. Welch HG, Frankel BA: Likelihood that a woman with screen-detected breast cancer has had her “life saved” by that screening. Arch Intern Med 2011; 171: 2043–6.
e6. Kassenärztliche Bundesvereinigung: Beschluss zu Änderungen des Einheitlichen Bewertungsmaßstabes (EBM) durch den Bewertungsausschuss nach § 87 Abs. 1 Satz 1 SGB V aufgrund des Beschlusses des Gemeinsamen Bundesausschusses über eine Neufassung der Krebsfrüherkennungs-Richtlinie vom 18. Juni 2009 in seiner 201. Sitzung (schriftliche Beschlussfassung) mit Wirkung zum 1. Oktober 2009. Dtsch Arztebl 2009; 106(41): A 2026.
e7. Gemeinsamer Bundesausschuss: Krebsfrüherkennungs-Richtlinie in der Version vom 16.12.2010. www.g-ba.de/downloads/62-492-510/RL_KFU_2010-12-16.pdf
e8. Gemeinsamer Bundesausschuss: Merkblätter für Patienten. www.g-ba.de/institution/service/publikationen/merkblaetter/merkblaetter/2011.
e9. Deutsches Netzwerk für Evidenzbasierte Medizin e.V.: Stellungnahme des Fachbereichs Patienteninformation. Kriterien zur Erstellung von Patienteninformationen zu Krebsfrüherkennungsuntersuchungen. Berlin 2008. www.ebm-netzwerk.de/netzwerkarbeit/images/stelungnahme_dnebm_080630.pdf
e10. Hakama M, Pukkala E, Heikkilä M, Kallio M: Effectiveness of the public health policy for breast cancer screening in Finland: population based cohort study. BMJ 1997; 314: 864–7.
e11. Lagrèze WA: Vision screening in preschool children: do the data support universal screening? Dtsch Arztebl Int 2010; 107(28–29): 495–9.