Modelling COVID-19 infection risk for rational COVID-19 test allocation
Motivation:
To understand the potential effectiveness of a peer-to-peer not-a-tracing (NAT) system against an unknown pathogen, we may consider how a data-driven approach to defining an infection risk function at each node of a graph may allow us to allocate limited testing capacity in a rational manner. Key challenges shall become clear after we have analysed the structure of our problem.
Initially, it appears that the main challenge involves finding a reliable definition of the risk function when we are in the low-data regime.
Graph structure and state space:
We shall assume that our graph is a small-world network sampled from an epidemiological state space:
a. Each node $v$ represents an individual whose set of neighbours (i.e. physical contacts) may be represented by $\mathcal{N}(v)$.
b. We shall assume that at any instant a particular node is either susceptible or infected, so for a graph with $n$ nodes there are $2^n$ possible states.
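The graph model above can be sketched in a few lines of Python. This is a minimal stdlib-only implementation of a Watts-Strogatz-style small-world network with binary susceptible/infected node states; the function name, parameter defaults, and 5% seeding rate are illustrative assumptions, not values from the text.

```python
import random

def sample_small_world(n=100, k=6, p=0.1, seed=42):
    """Small-world contact graph as an adjacency dict (Watts-Strogatz style).
    Each node starts on a ring linked to its k nearest neighbours; each
    edge has its far endpoint rewired uniformly with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Ring-lattice edges: node v connects to the next k/2 nodes around the ring.
    edges = [(v, (v + j) % n) for v in range(n) for j in range(1, k // 2 + 1)]
    for u, v in edges:
        if rng.random() < p:  # rewire to a fresh endpoint (no self-loops/duplicates)
            v = rng.choice([w for w in range(n) if w != u and w not in adj[u]])
        adj[u].add(v)
        adj[v].add(u)
    return adj

G = sample_small_world()

# Each node is susceptible (False) or infected (True); a full configuration
# of the n-node graph is one of 2**n possible states.
rng = random.Random(0)
state = {v: rng.random() < 0.05 for v in G}  # ~5% seeded as infected (arbitrary)

neighbours_of_0 = G[0]  # the contact set N(0) of node 0
```

The adjacency-dict representation keeps the neighbour lookup $\mathcal{N}(v)$ an O(1) operation, which is the access pattern a per-node risk function needs.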
Data collection:
We shall assume that individuals in this social network use a NAT app where:
a. They log clinically relevant phenotypic data (aka phenotypic space): age, biological sex, pre-existing medical conditions
b. Symptoms (aka symptom space): cough, sore throat, temperature, sense of smell
We may assume that symptoms are logged on a daily basis.
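One possible schema for the records such an app would store is sketched below. The class and field names are hypothetical; only the attributes themselves (age, biological sex, pre-existing conditions, and the four daily symptoms) come from the text.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Phenotype:
    """Clinically relevant phenotypic data, logged once per user."""
    age: int
    biological_sex: str
    preexisting_conditions: tuple  # e.g. ("asthma",)

@dataclass(frozen=True)
class SymptomReport:
    """One entry in the daily symptom log."""
    day: date
    cough: bool
    sore_throat: bool
    temperature_c: float
    loss_of_smell: bool

# A user's log is simply a list of daily reports.
log = [
    SymptomReport(date(2020, 4, 1), cough=True, sore_throat=False,
                  temperature_c=37.9, loss_of_smell=True),
]
```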
Modelling infection risk using machine learning:
Given that test kits are limited, we need a method for prioritising test allocation. This may be done using a model of infection risk. Mathematically, in order to determine whether a particular individual should take a test, we use a parametric risk function:
\begin{equation} \mathcal{R}(\theta): \mathbb{R}^l \times \mathbb{R}^{d} \rightarrow [0,1] \end{equation}
where $l$ is the dimension of the symptom space and $d$ is the dimension of the phenotypic space. Furthermore, for each vertex $v$ there is a feature map $\phi$ such that $\phi(v) \in \mathbb{R}^l \times \mathbb{R}^{d}$.
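As a concrete sketch, the risk function and the resulting test-allocation rule might look like the following. The logistic form of $\mathcal{R}(\theta)$ is an assumption (the text only requires a parametric map into $[0,1]$), and the weights are placeholders, not fitted values.

```python
import math

def risk(theta, symptoms, phenotype):
    """Parametric risk R(theta): R^l x R^d -> [0, 1], here a logistic
    model over the concatenated symptom and phenotype features."""
    features = list(symptoms) + list(phenotype)
    assert len(theta) == len(features) + 1  # one weight per feature, plus a bias
    z = theta[-1] + sum(w * x for w, x in zip(theta, features))
    return 1.0 / (1.0 + math.exp(-z))  # squash the score into [0, 1]

def allocate_tests(theta, population, n_kits):
    """Rank individuals by modelled risk and give kits to the top n_kits."""
    ranked = sorted(
        population,
        key=lambda rec: risk(theta, rec["symptoms"], rec["phenotype"]),
        reverse=True,
    )
    return ranked[:n_kits]
```

With all weights at zero the model is maximally uncertain (every individual scores 0.5), which makes the connection to the low-data regime below explicit: until $\theta$ can be learned, the ranking is only as good as the prior encoded in the weights.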
$\mathcal{R}(\theta)$ in the low-data regime:
In the low-data regime, learning is unstable and there are no convergence guarantees for $\mathcal{R}(\theta)$. However, we still need a reliable definition of $\mathcal{R}(\theta)$. In such a regime it may be possible to have $\mathcal{R}(\theta)$ either hard-coded by a team of experts, or we may use a function that is pre-trained on data with similar properties.
I must add that in this regime we don't do any learning, just function evaluations, or what some in the machine learning community would call inference.
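A hard-coded, expert-specified risk function for this regime might look like the sketch below: pure evaluation, no parameters learned. Every threshold and weight here is illustrative, invented for the example, and is emphatically not clinical guidance.

```python
def expert_risk(symptoms, phenotype):
    """Expert-specified (hard-coded) risk score in [0, 1] for the
    low-data regime. Evaluation only: nothing here is learned.
    All thresholds/weights are illustrative placeholders."""
    cough, sore_throat, temperature_c, loss_of_smell = symptoms
    score = 0.0
    if temperature_c >= 38.0:   # fever
        score += 0.4
    if loss_of_smell:           # anosmia
        score += 0.3
    if cough:
        score += 0.1
    if sore_throat:
        score += 0.05
    # Phenotypic risk modifiers: age and any pre-existing conditions.
    if phenotype["age"] >= 65 or phenotype.get("conditions"):
        score += 0.15
    return min(score, 1.0)
```

Because the function is deterministic and parameter-free, deploying it is exactly the "inference only" setting described above, and it can later be replaced by a learned $\mathcal{R}(\theta)$ with the same signature once enough data accumulates.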
Discussion:
In the large-data regime, we can use some form of privacy-preserving machine learning. However, I think it makes sense to first focus on the low-data regime problem. I suspect that the authors of [3] might have a reasonable candidate for $\mathcal{R}(\theta)$.
Finally, I’d like to add that if $\mathcal{R}(\theta)$ is good enough, not only is machine learning unnecessary, but it may also be used as a proxy measure for test outcomes.
References:

1. Alexander A. Alemi, Matthew Bierbaum, Christopher R. Myers, James P. Sethna. You Can Run, You Can Hide: The Epidemiology and Statistical Mechanics of Zombies. arXiv. 2015.

2. Jussi Taipale, Paul Romer, Sten Linnarsson. Population-scale testing can suppress the spread of COVID-19. medRxiv. 2020.

3. Hagai Rossman, Ayya Keshet, Smadar Shilo, Amir Gavrieli, Tal Bauman, Ori Cohen, Esti Shelly, Ran Balicer, Benjamin Geiger, Yuval Dor, Eran Segal. A framework for identifying regional outbreak and spread of COVID-19 from one-minute population-wide surveys. Nature. 2020.