
Review article

Requirements and Assessment of Laboratory Tests

Part 5 of a Series on Evaluation of Scientific Publications

Dtsch Arztebl Int 2009; 106(24): 403-6; DOI: 10.3238/arztebl.2009.0403

Bautsch, W

Background: Current laboratory tests exhibit high sensitivity and specificity combined with comparatively low costs thus favoring broad and uncritical ordering habits.
Methods: Introduction of Bayes’ theorem and discussion of its implications for laboratory test results in a mostly non-technical form, accompanied by a selective literature review.
Results and conclusions: According to Bayes’ theorem, the positive predictive value of a laboratory test result depends directly on the prevalence of the disease in a given patient cohort. Thus, the clinical value of a given test result depends critically on a precise indication. Where detailed information on disease prevalence is missing, ordering tests that are not indicated in a given patient is clinically useless and undesirable. These considerations are valid irrespective of ethical or economic considerations.
Key words: laboratory diagnostics, blood analysis, diagnosis, PSA test, borreliosis
In practice, laboratory tests are often ordered in a highly uncritical manner. They are comparatively cheap (for example, in comparison to imaging procedures), but highly sensitive and specific. This seems to imply that measuring many different laboratory parameters will supply clinically relevant information on the disease quickly, with little effort and relatively cheaply. This is even taken to be the case if the tested parameters have little or nothing to do with the patient's symptoms. This includes routine profiles (which may be very extensive), as well as screening for diseases such as cancer, which should be diagnosed before clinical symptoms develop, and for infectious diseases such as borreliosis, which develop in phases.

This overlooks the fact that the reliability of test results depends on a clear indication. Although this aspect is frequently mentioned in public discussions of the value of screening (1), it is also important in daily medical practice. This does not of course apply to recommended screening tests (such as neonatal screening), as these issues are explicitly considered in the recommendations.

The present article sketches the underlying relationships in a largely non-mathematical form and explains the consequences for ordering diagnostic tests in daily medical practice. This problem is related to statistics, an area in which intuitive ideas are often misleading. The underlying problem is displayed in the following multiple choice question:

A laboratory test (for example, for borreliosis) has a diagnostic specificity of 98%. How probable is it that a patient who gives a positive test result does in fact have this disease?

a) You have to know the sensitivity too to be able to answer this question.
b) 98%
c) (1-specificity) × 100 (%) = 2%
d) None of these answers is correct.

Readers who can answer this question correctly can stop here (the solution is at the end of the article). For all other readers, this article can be very helpful in practical medical work, as the underlying problem appears repeatedly in many different variations.

Confronted with this problem, most people attempt to solve it with the help of specificity alone. The specificity states the proportion of healthy subjects for whom a negative test result is (correctly) given. Conversely, 1-specificity gives the proportion of healthy subjects for whom a positive result is wrongly given (false positive rate). The intuitive tendency is to think that we now have all the necessary information and that the probability is 98%. However, this is wrong. The correct solution of the problem requires two additional pieces of information, the test sensitivity and the prevalence of the disease in the test cohort. The latter is the proportion of persons with the illness relative to all persons for whom the doctor has ordered this test. The reason for this is explained in Box 1.
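The dependence on all three quantities can be written compactly with Bayes' theorem. The following is the standard textbook formulation, given here as an illustration and not reproduced from Box 1; the symbols p (prevalence in the test cohort), Se (sensitivity) and Sp (specificity) are notation chosen for this sketch.

```latex
\[
\mathrm{PPV}
  = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)\,(1 - p)}
\]
```

The numerator is the proportion of tested persons with a genuine positive result, and the second term in the denominator is the proportion with a false positive result. If p is very small, the false positive term dominates the denominator even for a highly specific test.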

Use additional parameters
What are the consequences for the example of borreliosis testing discussed in the introduction? The prevalence of active borreliosis in the population is not precisely known. Estimates range from 10 to 237 cases per 100 000 inhabitants (2), with major differences between the regions (3). The Robert Koch Institute published a value of 25 per 100 000 for Germany in 2003 (4). This value will be used in the following, to simplify the calculations. Modern serological immune tests for borreliosis coupled to the recommended immunoblot are assumed to be at least 98% specific (5), although this figure is not known exactly and probably depends on the test system. We will assume that the specificity is 98%. If we neglect the fact that the sensitivity of the test is less than 100%, 25 genuine positive results are obtained for 100 000 tests in the population. There are, however, two additional fundamental problems in the interpretation of serological test results for borreliosis, which also exist when tests are only ordered for strict indications.

- A negative test result does not reliably exclude active borreliosis—particularly in the early stages—as the tests are less than 100% sensitive.
- The available serological tests cannot reliably distinguish between active borreliosis and a titer after recovery from borreliosis, so that even unambiguously positive serological findings per se are not an indication for treatment.

Aside from the 25 genuine positive results, there will also be about 2000 false positive test results, as 1-specificity = 2% of the almost 100 000 persons tested who do not have the disease. There will therefore be a total of about 2025 positive test results, of which 25 are caused by active borreliosis. This corresponds to a probability of about 1.2% (25/2025) that a test person with a positive test result really is suffering from active borreliosis. It follows that this test is clearly unsuitable for population screening, as it is almost 99% certain that a positive result is wrong.
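The counting argument for population screening can be reproduced in a few lines. The following is a minimal sketch in Python, not part of the original article; the function name `screening_counts` is hypothetical, and the default sensitivity of 100% is the simplification made in the text.

```python
def screening_counts(n_tested, prevalence, specificity, sensitivity=1.0):
    """Return (true_positives, false_positives, ppv) for a tested cohort.

    sensitivity=1.0 mirrors the simplifying assumption made in the text.
    """
    diseased = n_tested * prevalence
    healthy = n_tested - diseased
    true_pos = diseased * sensitivity           # genuine positive results
    false_pos = healthy * (1.0 - specificity)   # healthy persons testing positive
    ppv = true_pos / (true_pos + false_pos)     # positive predictive value
    return true_pos, false_pos, ppv


# Figures from the text: 25 cases per 100 000 inhabitants, 98% specificity.
tp, fp, ppv = screening_counts(100_000, 25 / 100_000, 0.98)
print(f"genuine positives: {tp:.0f}")
print(f"false positives:   {fp:.0f}")
print(f"PPV:               {ppv:.2%}")
```

With these inputs the positive predictive value comes out at a little over 1%, in line with the estimate above.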

The physician can, however, influence the prevalence of the disease in the test cohort, that is, among the patients for whom the test is ordered. Thus, if borreliosis testing is ordered for every patient, whatever the symptoms, the reliability of the individual results is close to that for population screening, as everyone goes to the doctor at one time or another. The reliability of a positive result is then close to zero.

The situation is quite different if the test is ordered for a specific indication, for example, if the patient comes with acute peripheral facial palsy. The prevalence of borreliosis in patients with acute facial palsy has not been very well studied. A recent Norwegian article gives a value of about 10% (6); the value for children is certainly greater. The results are quite different for this patient cohort. Of 1000 tests, 18 will be false positive (1-specificity = 2% of 900 negative patients), but there will be 100 genuine positive findings. The probability that a patient with a positive test result genuinely has borreliosis is then 100/(100 + 18) × 100 ≈ 85%. This figure will certainly be greater for children.
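The effect of the indication can be made visible by computing the positive predictive value for each of the prevalence figures quoted in the text. This is an illustrative Python sketch, not taken from the article; the function name `positive_predictive_value` and the cohort labels are hypothetical, and the specificity of 98% and sensitivity of 100% are the assumptions used in the text.

```python
def positive_predictive_value(prevalence, specificity=0.98, sensitivity=1.0):
    """Bayes' theorem for a positive result; defaults follow the text's assumptions."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence figures quoted in the text for different test cohorts.
cohorts = {
    "population, lower estimate (10 per 100 000)": 10 / 100_000,
    "population, Robert Koch Institute (25 per 100 000)": 25 / 100_000,
    "population, upper estimate (237 per 100 000)": 237 / 100_000,
    "acute peripheral facial palsy (about 10%)": 0.10,
}

for label, prevalence in cohorts.items():
    ppv = positive_predictive_value(prevalence)
    print(f"{label}: PPV = {ppv:.1%}")
```

The test itself is identical in every row; only the cohort changes. Passing a lower `sensitivity` shows how much of the estimate depends on that simplification.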

Conclusion
Sensitivity and specificity are test-specific properties which the physician cannot actively influence. This assumes that the test is properly performed and evaluated, including the steps before and after the analysis. On the other hand, the reliability of a positive test result (the positive predictive value) is critically dependent on the prevalence of the disease in the test cohort, and this is something the physician can influence. As a matter of principle, tests should only be ordered when they are indicated, as it is only then that the test result can be clinically evaluated. Results from non-indicated tests are clinically useless without a well founded database on the prevalence of the disease, and such tests should therefore not be ordered. This is unrelated to economic or ethical considerations.

Although the borreliosis test was used as an example, this applies to all laboratory tests. The arguments apply equally well to other investigations, including X-ray, endoscopic, sonographic, electrocardiographic or clinical procedures. If the test or investigation is not indicated, this reduces its positive predictive value and increases the number of false positive test results.

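The correct answer to the initial multiple choice question was d: the question cannot be answered without knowing both the sensitivity and the prevalence of the disease in the test cohort.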

Conflict of interest statement
The author declares that no conflict of interest exists according to the guidelines of the International Committee of Medical Journal Editors.

Manuscript received on 6 February 2007, revised version accepted on 19 October 2007.

Translated from the original German by Rodney A. Yeates, M.A., Ph.D.


Corresponding author
Prof. Dr. med. habil., Dr. rer. nat. Wilfried Bautsch
Institut für Mikrobiologie,
Immunologie und Krankenhaushygiene
Städtisches Klinikum Braunschweig gGmbH
Celler Str. 38
38814 Braunschweig, Germany
w.bautsch@klinikum-braunschweig.de

References
1. Bögermann C, Rübben H: Früherkennung des Prostatakarzinoms. Dtsch Arztebl 2007; 104(8): A 503–4.
2. O'Connell S, Granström M, Gray JS, Stanek G: Epidemiology of European Lyme borreliosis. Zentralbl Bakteriol 1998; 287: 229–40.
3. Talaska T: Borreliose-Epidemiologie. Brandenburgisches Ärzteblatt 2002; 11: 338–40. www.laekb.de/15/15Beitraege/95021TH0211.pdf
4. Mehnert WH, Krause G: Surveillance of Lyme borreliosis in Germany, 2002 and 2003. Euro Surveill 2005; 10: 83–5.
5. Goettner G, Schulte-Spechtel U, Hillermann R, Liegl G, Wilske B, Fingerle V: Improvement of Lyme borreliosis serodiagnosis by a newly developed recombinant immunoglobulin G (IgG) and IgM line immunoblot assay and addition of VlsE and DbpA homologues. J Clin Microbiol 2005; 43: 3602–9.
6. Ljostad U, Okstad S, Topstadt T, Mygland A, Monstad P: Acute peripheral facial palsy in adults. J Neurol 2005; 252: 672–6.