Model¶

class covid19_inference.model.Cov19Model(new_cases_obs, data_begin, fcast_len, diff_data_sim, N_population, name='', model=None)[source]¶

Model class used to create a covid-19 propagation dynamics model. Parameters below are passed to the constructor. Attributes (Variables) are available after creation and can be accessed from every instance. Some background:

The simulation starts diff_data_sim days before the data.

The data has a certain length, on which the inference is based. This length is given by new_cases_obs.

After the inference, a forecast takes of length fcast_len takes place, starting on the day after the last data point in new_cases_obs.

In total, traces produced by a model run have the length sim_len = diff_data_sim + data_len + fcast_len

Date ranges include both boundaries. For example, if data_begin is March 1 and data_end is March 3 then data_len will be 3.

Parameters

new_cases_obs (1 or 2d array) – If the array is two-dimensional, an hierarchical model will be constructed. First dimension is then time, the second the region/country.
data_begin (datatime.datetime) – Date of the first data point
fcast_len (int) – Number of days the simulations runs longer than the data
diff_data_sim (int) – Number of days the simulation starts earlier than the data. Should be significantly longer than the delay between infection and report of cases.
N_population (number or 1d array) – Number of inhabitance in region, needed for the S(E)IR model. Is ideally 1 dimensional if new_cases_obs is 2 dimensional
name (string) – suffix appended to the name of random variables saved in the trace
model – specify a model, if this one should expand another

Variables

new_cases_obs (1 or 2d array) – as passed during construction
data_begin (datatime.datetime) – date of the first data point in the data
data_end (datatime.datetime) – date of the last data point in the data
sim_begin (datatime.datetime) – date at which the simulation begins
sim_end (datatime.datetime) – date at which the simulation ends (should match fcast_end)
fcast_begin (datatime.datetime) – date at which the forecast starts (should be one day after data_end)
fcast_end (datatime.datetime) – data at which the forecast ends
data_len (int) – total number of days in the data
sim_len (int) – total number of days in the simulation
fcast_len (int) – total number of days in the forecast
diff_data_sim (int) – difference in days between the simulation begin and the data begin. The simulation starting time is usually earlier than the data begin.

Example

with Cov19Model(**params) as model:
    # Define model here

covid19_inference.model.modelcontext(model)[source]¶: return the given model or try to find it in the context if there was none supplied.

covid19_inference.model.student_t_likelihood(new_cases_inferred, pr_beta_sigma_obs=30, nu=4, offset_sigma=1, model=None, data_obs=None, name_student_t='_new_cases_studentT', name_sigma_obs='sigma_obs')[source]¶

Set the likelihood to apply to the model observations (model.new_cases_obs) We assume a StudentT distribution because it is robust against outliers [Lange1989]. The likelihood follows:

$P(\text{data\_obs}) &\sim StudentT(\text{mu} = \text{new\_cases\_inferred}, sigma =\sigma, \text{nu} = \text{nu})\\ \sigma &= \sigma_r \sqrt{\text{new\_cases\_inferred} + \text{offset\_sigma}}$

The parameter $\sigma_r$ follows a HalfCauchy prior distribution with parameter beta set by pr_beta_sigma_obs. If the input is 2 dimensional, the parameter $\sigma_r$ is different for every region.

Parameters

new_cases_inferred (TensorVariable) – One or two dimensonal array. If 2 dimensional, the first dimension is time and the second are the regions/countries
pr_beta_sigma_obs (float) – The beta of the HalfCauchy prior distribution of $\sigma_r$ .
nu (float) – How flat the tail of the distribution is. Larger nu should make the model more robust to outliers. Defaults to 4 [Lange1989].
offset_sigma (float) – An offset added to the sigma, to make the inference procedure robust. Otherwise numbers of new_cases_inferred would lead to very small errors and diverging likelihoods. Defaults to 1.
model – The model on which we want to add the distribution
data_obs (array) – The data that is observed. By default it is model.new_cases_ob
name_student_t – The name under which the studentT distribution is saved in the trace.
name_sigma_obs – The name under which the distribution of the observable error is saved in the trace

Returns

None

References

Lange1989(1,2): Lange, K., Roderick J. A. Little, & Jeremy M. G. Taylor. (1989). Robust Statistical Modeling Using the t Distribution. Journal of the American Statistical Association, 84(408), 881-896. doi:10.2307/2290063

covid19_inference.model.SIR(lambda_t_log, mu, pr_I_begin=100, model=None, return_all=False, save_all=False)[source]¶

Implements the susceptible-infected-recovered model.

$I_{new}(t) &= \lambda_t I(t-1) \frac{S(t-1)}{N} \\ S(t) &= S(t-1) - I_{new}(t) \\ I(t) &= I(t-1) + I_{new}(t) - \mu I(t)$

The prior distribution of the recovery rate $\mu$ is set to $LogNormal(\text{log(pr\_median\_mu)), pr\_sigma\_mu})$ . And the prior distribution of $I(0)$ to $HalfCauchy(\text{pr\_beta\_I\_begin})$

Parameters

lambda_t_log (TensorVariable) – time series of the logarithm of the spreading rate, 1 or 2-dimensional. If 2-dimensional the first dimension is time.
mu (TensorVariable) – the recovery rate $\mu$ , typically a random variable. Can be 0 or 1-dimensional. If 1-dimensional, the dimension are the different regions.
pr_I_begin (float or array_like or TensorVariable) – Prior beta of the Half-Cauchy distribution of $I(0)$ .
pr_median_mu (float or array_like) – Prior for the median of the lognormal distrubution of the recovery rate $\mu$ .
pr_sigma_mu (float or array_like) – Prior for the sigma of the lognormal distribution of recovery rate $\mu$ .
model (Cov19Model) – if none, it is retrieved from the context
return_all (bool) – if True, returns new_I_t, I_t, S_t otherwise returns only new_I_t
save_all (bool) – if True, saves new_I_t, I_t, S_t in the trace, otherwise it saves only new_I_t

Returns

new_I_t (TensorVariable) – time series of the number daily newly infected persons.
I_t (TensorVariable) – time series of the infected (if return_all set to True)
S_t (TensorVariable) – time series of the susceptible (if return_all set to True)

covid19_inference.model.SEIR(lambda_t_log, pr_beta_I_begin=100, pr_beta_new_E_begin=50, pr_median_mu=0.125, pr_mean_median_incubation=4, pr_sigma_median_incubation=1, sigma_incubation=0.4, pr_sigma_mu=0.2, model=None, return_all=False, save_all=False, name_median_incubation='median_incubation')[source]¶

Implements a model similar to the susceptible-exposed-infected-recovered model. Instead of a exponential decaying incubation period, the length of the period is lognormal distributed. The complete equation is:

$E_{\text{new}}(t) &= \lambda_t I(t-1) \frac{S(t)}{N} \\ S(t) &= S(t-1) - E_{\text{new}}(t) \\ I_\text{new}(t) &= \sum_{k=1}^{10} \beta(k) E_{\text{new}}(t-k) \\ I(t) &= I(t-1) + I_{\text{new}}(t) - \mu I(t) \\ \beta(k) & = P(k) \sim LogNormal(\text{log}(d_{\text{incubation}})), \text{sigma\_incubation})$

The recovery rate $\mu$ and the incubation period is the same for all regions and follow respectively:

$P(\mu) &\sim LogNormal(\text{log(pr\_median\_mu)), pr\_sigma\_mu}) \\ P(d_{\text{incubation}}) &\sim Normal(\text{pr\_mean\_median\_incubation, pr\_sigma\_median\_incubation})$

The initial number of infected and newly exposed differ for each region and follow prior HalfCauchy distributions:

$E(t) &\sim HalfCauchy(\text{pr\_beta\_E\_begin}) \:\: \text{ for} \: t \in \{-9, -8, ..., 0\}\\ I(0) &\sim HalfCauchy(\text{pr\_beta\_I\_begin}).$

Parameters

lambda_t_log (TensorVariable) – time series of the logarithm of the spreading rate, 1 or 2-dimensional. If 2-dimensional, the first dimension is time.
pr_beta_I_begin (float or array_like) – Prior beta of the HalfCauchy distribution of $I(0)$ .
pr_beta_new_E_begin (float or array_like) – Prior beta of the HalfCauchy distribution of $E(0)$ .
pr_median_mu (float or array_like) – Prior for the median of the Lognormal distribution of the recovery rate $\mu$ .
pr_mean_median_incubation – Prior mean of the Normal distribution of the median incubation delay $d_{\text{incubation}}$ . Defaults to 4 days [Nishiura2020], which is the median serial interval (the important measure here is not exactly the incubation period, but the delay until a person becomes infectious which seems to be about 1 day earlier as showing symptoms).
pr_sigma_median_incubation – Prior sigma of the Normal distribution of the median incubation delay $d_{\text{incubation}}$ . Default is 1 day.
sigma_incubation – Scale parameter of the Lognormal distribution of the incubation time/ delay until infectiousness. The default is set to 0.4, which is about the scale found in [Nishiura2020], [Lauer2020].
pr_sigma_mu (float or array_like) – Prior for the sigma of the lognormal distribution of recovery rate $\mu$ .
model (Cov19Model) – if none, it is retrieved from the context
return_all (bool) – if True, returns new_I_t, new_E_t, I_t, S_t otherwise returns only new_I_t
save_all (bool) – if True, saves new_I_t, new_E_t, I_t, S_t in the trace, otherwise it saves only new_I_t
name_median_incubation (str) – The name under which the median incubation time is saved in the trace

Returns

new_I_t (TensorVariable) – time series of the number daily newly infected persons.
new_E_t (TensorVariable) – time series of the number daily newly exposed persons. (if return_all set to True)
I_t (TensorVariable) – time series of the infected (if return_all set to True)
S_t (TensorVariable) – time series of the susceptible (if return_all set to True)

References

Nishiura2020(1,2): Nishiura, H.; Linton, N. M.; Akhmetzhanov, A. R. Serial Interval of Novel Coronavirus (COVID-19) Infections. Int. J. Infect. Dis. 2020, 93, 284–286. https://doi.org/10.1016/j.ijid.2020.02.060.
Lauer2020: Lauer, S. A.; Grantz, K. H.; Bi, Q.; Jones, F. K.; Zheng, Q.; Meredith, H. R.; Azman, A. S.; Reich, N. G.; Lessler, J. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application. Ann Intern Med 2020. https://doi.org/10.7326/M20-0504.

covid19_inference.model.delay_cases(new_I_t, pr_median_delay=10, pr_sigma_median_delay=0.2, pr_median_scale_delay=0.3, pr_sigma_scale_delay=None, model=None, save_in_trace=True, name_delay='delay', name_delayed_cases='new_cases_raw', len_input_arr=None, len_output_arr=None, diff_input_output=None)[source]¶

Convolves the input by a lognormal distribution, in order to model a delay:

$y_\text{delayed}(t) &= \sum_{\tau=0}^T y_\text{input}(\tau) LogNormal[log(\text{delay}), \text{pr\_median\_scale\_delay}](t - \tau)\\ log(\text{delay}) &= Normal(log(\text{pr\_sigma\_delay}), \text{pr\_sigma\_delay})$

For clarification: the $LogNormal$ distribution is a function evaluated at $t - \tau$ .

If the model is 2-dimensional, the $log(\text{delay})$ is hierarchically modelled with the hierarchical_normal() function using the default parameters except that the prior $\sigma$ of $\text{delay}_\text{L2}$ is HalfNormal distributed (error_cauchy=False).

Parameters

new_I_t (TensorVariable) – The input, typically the number newly infected cases $I_{new}(t)$ of from the output of SIR() or SEIR().
pr_median_delay (float) – The mean of the normal distribution which models the prior median of the LogNormal delay kernel.
pr_sigma_median_delay (float) – The standart devaiation of normal distribution which models the prior median of the LogNormal delay kernel.
pr_median_scale_delay (float) – The scale (width) of the LogNormal delay kernel.
pr_sigma_scale_delay (float) – If it is not None, the scale is of the delay is kernel follows a prior LogNormal distribution, with median pr_median_scale_delay and scale pr_sigma_scale_delay.
model (Cov19Model) – if none, it is retrieved from the context
save_in_trace (bool) – whether to save $y_\text{delayed}$ in the trace
name_delay (str) – The name under which the delay is saved in the trace, suffixes and prefixes are added depending on which variable is saved.
name_delayed_cases (str) – The name under which the delay is saved in the trace, suffixes and prefixes are added depending on which variable is saved.
len_input_arr – Length of new_I_t. By default equal to model.sim_len. Necessary because the shape of theano tensors are not defined at when the graph is built.
len_output_arr (int) – Length of the array returned. By default it set to the length of the cases_obs saved in the model plus the number of days of the forecast.
diff_input_output (int) – Number of days the returned array begins later then the input. Should be significantly larger than the median delay. By default it is set to the model.diff_data_sim.

Returns

new_cases_inferred (TensorVariable) – The delayed input $y_\text{delayed}(t)$ , typically the daily number new cases that one expects to measure.

covid19_inference.model.week_modulation(new_cases_raw, week_modulation_type='abs_sine', pr_mean_weekend_factor=0.3, pr_sigma_weekend_factor=0.5, week_end_days=(6, 7), model=None, save_in_trace=True)[source]¶

Adds a weekly modulation of the number of new cases:

$\text{new\_cases} &= \text{new\_cases\_raw} \cdot (1-f(t))\,, \qquad\text{with}\\ f(t) &= f_w \cdot \left(1 - \left|\sin\left(\frac{\pi}{7} t- \frac{1}{2}\Phi_w\right)\right| \right),$

if week_modulation_type is "abs_sine" (the default). If week_modulation_type is "step", the new cases are simply multiplied by the weekend factor on the days set by week_end_days

The weekend factor $f_w$ follows a Lognormal distribution with median pr_mean_weekend_factor and sigma pr_sigma_weekend_factor. It is hierarchically constructed if the input is two-dimensional by the function hierarchical_normal() with default arguments.

The offset from Sunday $\Phi_w$ follows a flat VonMises distribution and is the same for all regions.

Parameters

new_cases_raw (TensorVariable) – The input array, can be one- or two-dimensional
week_modulation_type (str) – The type of modulation, accepts "step" or "abs_sine (the default).
pr_mean_weekend_factor (float) – Sets the prior mean of the factor $f_w$ by which weekends are counted.
pr_sigma_weekend_factor (float) – Sets the prior sigma of the factor $f_w$ by which weekends are counted.
week_end_days (tuple of ints) – The days counted as weekend if week_modulation_type is "step"
model (Cov19Model) – if none, it is retrieved from the context
save_in_trace (bool) – If True (default) the new_cases are saved in the trace.

Returns

new_cases (TensorVariable)

covid19_inference.model.make_change_point_RVs(change_points_list, pr_median_lambda_0, pr_sigma_lambda_0=1, model=None)[source]¶

Parameters

priors_dict –
change_points_list –
model –

covid19_inference.model.lambda_t_with_sigmoids(change_points_list, pr_median_lambda_0, pr_sigma_lambda_0=0.5, model=None)[source]¶

Parameters

change_points_list –
pr_median_lambda_0 –
pr_sigma_lambda_0 –
model (Cov19Model) – if none, it is retrieved from the context

covid19_inference.model.hierarchical_normal(name, name_sigma, pr_mean, pr_sigma, len_L2, w=1.0, error_fact=2.0, error_cauchy=True)[source]¶

Implements an hierarchical normal model:

$x_\text{L1} &= Normal(\text{pr\_mean}, \text{pr\_sigma})\\ y_{i, \text{L2}} &= Normal(x_\text{L1}, \sigma_\text{L2})\\ \sigma_\text{L2} &= HalfCauchy(\text{error\_fact} \cdot \text{pr\_sigma})$

It is however implemented in a non-centered way, that the second line is changed to:

$y_{i, \text{L2}} &= x_\text{L1} + Normal(0,1) \cdot \sigma_\text{L2}$

See for example https://arxiv.org/pdf/1312.0906.pdf

Parameters

name (str) – Name under which $x_\text{L1}$ and $y_\text{L2}$ saved in the trace. '_L1' and '_L2' is appended
name_sigma (str) – Name under which $\sigma_\text{L2}$ saved in the trace. '_L2' is appended.
pr_mean (float) – Prior mean of $x_\text{L1}$
pr_sigma (float) – Prior sigma for $x_\text{L1}$ and (muliplied by error_fact) for $\sigma_\text{L2}$
len_L2 (int) – length of $y_\text{L2}$
error_fact (float) – Factor by which pr_sigma is multiplied as prior for sigma_text{L2}
error_cauchy (bool) – if False, a $HalfNormal$ distribution is used for $\sigma_\text{L2}$ instead of $HalfCauchy$

Returns

y (TensorVariable) – the random variable $y_\text{L2}$
x (TensorVariable) – the random variable $x_\text{L1}$

covid19_inference.model.make_prior_I(lambda_t_log, mu, pr_median_delay, pr_sigma_I_begin=2, n_data_points_used=5, model=None)[source]¶

Builds the prior for I begin by solving the SIR differential from the first data backwards. This decorrelates the I_begin from the lambda_t at the beginning, allowing a more efficient sampling. The example_one_bundesland runs about 30% faster with this prior, instead of a HalfCauchy.

Parameters

lambda_t_log (TensorVariable) –
mu (TensorVariable) –
pr_median_delay (float) –
pr_sigma_I_begin (float) –
n_data_points_used (int) –
model (Cov19Model) – if none, it is retrieved from the context

Returns

I_begin (TensorVariable)