TOWARDS BAYESIAN EVALUATION OF SEROPREVALENCE STUDIES

Bayes’ Theorem represents a mathematical formalization of common sense. What we know about the world today is what we knew yesterday plus what the data told us. A lack of understanding of this concept is the source of many errors and wrong judgements in the current COVID-19 pandemic. In this contribution, we show how to use the framework of Bayesian inference to produce a reasonable estimate of seroprevalence from studies that use a single binary test. Bayes’ Theorem sometimes produces results that seem counter-intuitive at first sight. It is important to realize that reality may differ from its image as represented by test results. The extent to which these two worlds differ depends on the performance of the test (i.e. its sensitivity and specificity) and on the prevalence of the tested condition.


In the age of the coronavirus, testing of various kinds has become enormously widespread. Unfortunately, what has not become widespread is an understanding of the test results. The most common test, PCR, is used to detect the virus (more precisely, particular fragments of it) in a sample collected by a nasopharyngeal swab. The number of PCR-positive cases can be used to assess the Case Fatality Rate

Version March 9, 2021 submitted to Journal Not Specified

of 0.23%. In people under 70 years, the median IFR reached 0.05%. Both numbers are likely to be overestimated because an unknown proportion of the population defeats the virus at the level of cellular immunity (and probably even becomes immune) without producing antibodies at all [5]. This seems to be the case especially for children [6].

Despite the fundamental importance of various forms of testing, not enough attention has been paid to the correct interpretation of the test results. In this paper, we want to explain this issue in three successive steps of increasing complexity. We use the example of antibody tests here, but the same logic should be used for any test whose results are converted to a binary answer (positive or negative). This applies to all antibody tests (laboratory or rapid tests), all PCR tests (full RT-qPCR, antigen testing, etc.), and many more medical tests unrelated to the coronavirus, or even tests unrelated to health (such as A/B testing [7]).

A binary test primer
Each test with a binary outcome has a certain accuracy which is never perfect. Let us fix ideas by

In practice, we test a subject and observe the test result, say T+. Since neither sens nor spec is perfect, a positive test result does not necessarily imply that the antibodies are present (it may be a false positive). Thus, we want to make inference about the probability that the antibodies are present, provided the test came out positive. We use Bayes' Theorem to obtain

It is important to realize that the posterior probability p(A+|T+), i.e. the probability that a positively tested subject indeed has the antibodies, depends not only on the parameters of the test (sens and spec) but also on the prevalence. For example, the Euroimmun ELISA test for IgA anti-SARS-CoV-2 antibodies has a declared sensitivity of 98.6% and specificity of 92.0%. If the prevalence is assumed to
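The dependence of p(A+|T+) on the prevalence can be illustrated with a minimal Python sketch. It assumes the standard form of Bayes' Theorem for a binary test, p(A+|T+) = sens·prev / [sens·prev + (1 − spec)(1 − prev)], with sens = p(T+|A+) and spec = p(T−|A−). The function name `posterior_positive` and the prevalence values in the loop are hypothetical and chosen only for illustration; the test parameters are the declared Euroimmun values quoted above.

```python
def posterior_positive(sens: float, spec: float, prev: float) -> float:
    """Probability that antibodies are present given a positive test, p(A+|T+)."""
    true_pos = sens * prev                   # p(T+|A+) p(A+)
    false_pos = (1.0 - spec) * (1.0 - prev)  # p(T+|A-) p(A-)
    return true_pos / (true_pos + false_pos)


# Declared parameters of the Euroimmun IgA ELISA quoted in the text.
SENS, SPEC = 0.986, 0.920

# Hypothetical prevalence values, chosen only for illustration.
for prev in (0.01, 0.05, 0.20, 0.50):
    post = posterior_positive(SENS, SPEC, prev)
    print(f"prev = {prev:.2f} -> p(A+|T+) = {post:.3f}")
```

Under these assumptions, a prevalence of 1% gives a posterior of roughly 11%, so most positive results would be false positives even though the declared sensitivity and specificity look high.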

A single test study
In a typical seroprevalence study, the question is how widespread a certain antibody is in a given population. Thus, we want to make inference on the prevalence. A test of known parameters is used and a random sample of N subjects is drawn from the population. The study yields data which consist of K positive test results and N − K negative test results. Bayes' Theorem, this time written in terms of probability densities [11], states that

$$p(\mathrm{prev} \mid \mathrm{data}) \propto p(\mathrm{data} \mid \mathrm{prev})\, p(\mathrm{prev}). \tag{1}$$

The proportionality sign (∝) means that the posterior density p(prev|data) must be normalized to unit area. The posterior density represents a degree of belief about the prevalence, taking into account all the available data. Some assumption must be made about the prevalence that we want to estimate.

This is the first principle of Bayesian inference: you cannot make inference without assumptions. It is sensible to model the prior density p(prev) as a beta distribution centered around our prior beliefs.

For example, if the study is performed at the very beginning of the pandemic, the prevalence is almost certainly very low, and so p(prev) = beta(1, 10) may be a sensible prior (see Figure 1).
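To get a feeling for what the beta(1, 10) prior expresses, the following sketch prints a few of its summaries using only the Python standard library. The helper names `beta_pdf` and `beta1_cdf` are hypothetical, and the closed-form CDF used here is valid only for the special case where the first shape parameter equals 1.

```python
import math

A, B = 1.0, 10.0  # shape parameters of the beta(1, 10) prior from the text


def beta_pdf(x: float, a: float = A, b: float = B) -> float:
    """Density of the beta(a, b) distribution at x in [0, 1]."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)


def beta1_cdf(x: float, b: float = B) -> float:
    """CDF of beta(1, b); this closed form holds only when a = 1."""
    return 1.0 - (1.0 - x) ** b


prior_mean = A / (A + B)  # mean of a beta(a, b) distribution
print(f"prior mean          = {prior_mean:.3f}")
print(f"P(prev < 0.10)      = {beta1_cdf(0.10):.3f}")
print(f"P(prev < 0.25)      = {beta1_cdf(0.25):.3f}")
print(f"density at prev = 0 = {beta_pdf(0.0):.1f}")
```

The density is highest at zero and most of the probability mass lies below a prevalence of 0.25, which matches the belief that the prevalence is low without excluding larger values outright.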

Figure 1. The simulated results of a seroprevalence study with N = 1000 subjects, out of whom K = 200 came out positive. A single test with the parameters sens = 0.7 and spec = 0.9 was used. The dashed line represents the prior and the thick line of the same color represents the posterior density. Notice that the prior has a negligible effect on the posterior, if the number of subjects is sufficiently high.

Now let us evaluate the likelihood, i.e. p(data|prev). The likelihood is interpreted as the probability of obtaining the observed data if the true prevalence was known and equal to prev. This is a rather simple calculation because

$$p(\mathrm{data} \mid \mathrm{prev}) \propto \left[p(T{+} \mid \mathrm{prev})\right]^{K} \left[p(T{-} \mid \mathrm{prev})\right]^{N-K}.$$
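The likelihood and the prior can be combined numerically according to equation (1). The sketch below evaluates the unnormalized posterior on a grid and normalizes it to unit area, using the values from Figure 1 and the beta(1, 10) prior. It assumes that p(T+|prev) = sens·prev + (1 − spec)(1 − prev), which follows from the law of total probability, and that p(T−|prev) = 1 − p(T+|prev). The function names and the grid size are hypothetical choices, and this is not necessarily the procedure used to produce the figure.

```python
import math

N, K = 1000, 200        # subjects tested and positive results (Figure 1)
SENS, SPEC = 0.7, 0.9   # test parameters (Figure 1)
A, B = 1.0, 10.0        # beta(1, 10) prior from the text


def p_positive(prev: float) -> float:
    """Assumed p(T+|prev): true positives plus false positives."""
    return SENS * prev + (1.0 - SPEC) * (1.0 - prev)


def log_unnormalized_posterior(prev: float) -> float:
    """Log of likelihood times prior, dropping terms that do not depend on prev."""
    p_pos = p_positive(prev)
    log_lik = K * math.log(p_pos) + (N - K) * math.log(1.0 - p_pos)
    log_prior = (A - 1.0) * math.log(prev) + (B - 1.0) * math.log(1.0 - prev)
    return log_lik + log_prior


# A midpoint grid on (0, 1) avoids log(0) at the endpoints.
M = 2000
grid = [(i + 0.5) / M for i in range(M)]
log_post = [log_unnormalized_posterior(p) for p in grid]

# Subtract the maximum before exponentiating to avoid underflow,
# then normalize the density to unit area.
peak = max(log_post)
weights = [math.exp(v - peak) for v in log_post]
area = sum(weights) / M
density = [w / area for w in weights]

posterior_mean = sum(p * d for p, d in zip(grid, density)) / M
posterior_mode = grid[density.index(max(density))]
print(f"posterior mean = {posterior_mean:.3f}")
print(f"posterior mode = {posterior_mode:.3f}")
```

Under these assumptions the posterior should concentrate near prev ≈ 1/6, the prevalence at which p(T+|prev) equals the observed positive fraction K/N = 0.2, with the prior pulling it only slightly towards lower values.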