Model

class covid19_inference.model.Cov19Model(new_cases_obs, data_begin, fcast_len, diff_data_sim, N_population, name='', model=None)[source]

Model class used to create a covid-19 propagation dynamics model. Parameters below are passed to the constructor. Attributes (Variables) are available after creation and can be accessed from every instance. Some background:

  • The simulation starts diff_data_sim days before the data.

  • The data has a certain length, on which the inference is based. This length is given by new_cases_obs.

  • After the inference, a forecast takes of length fcast_len takes place, starting on the day after the last data point in new_cases_obs.

  • In total, traces produced by a model run have the length sim_len = diff_data_sim + data_len + fcast_len

  • Date ranges include both boundaries. For example, if data_begin is March 1 and data_end is March 3 then data_len will be 3.

Parameters
  • new_cases_obs (1 or 2d array) – If the array is two-dimensional, an hierarchical model will be constructed. First dimension is then time, the second the region/country.

  • data_begin (datatime.datetime) – Date of the first data point

  • fcast_len (int) – Number of days the simulations runs longer than the data

  • diff_data_sim (int) – Number of days the simulation starts earlier than the data. Should be significantly longer than the delay between infection and report of cases.

  • N_population (number or 1d array) – Number of inhabitance in region, needed for the S(E)IR model. Is ideally 1 dimensional if new_cases_obs is 2 dimensional

  • name (string) – suffix appended to the name of random variables saved in the trace

  • model – specify a model, if this one should expand another

Variables
  • new_cases_obs (1 or 2d array) – as passed during construction

  • data_begin (datatime.datetime) – date of the first data point in the data

  • data_end (datatime.datetime) – date of the last data point in the data

  • sim_begin (datatime.datetime) – date at which the simulation begins

  • sim_end (datatime.datetime) – date at which the simulation ends (should match fcast_end)

  • fcast_begin (datatime.datetime) – date at which the forecast starts (should be one day after data_end)

  • fcast_end (datatime.datetime) – data at which the forecast ends

  • data_len (int) – total number of days in the data

  • sim_len (int) – total number of days in the simulation

  • fcast_len (int) – total number of days in the forecast

  • diff_data_sim (int) – difference in days between the simulation begin and the data begin. The simulation starting time is usually earlier than the data begin.

Example

with Cov19Model(**params) as model:
    # Define model here
covid19_inference.model.modelcontext(model)[source]

return the given model or try to find it in the context if there was none supplied.

covid19_inference.model.student_t_likelihood(new_cases_inferred, pr_beta_sigma_obs=30, nu=4, offset_sigma=1, model=None, data_obs=None, name_student_t='_new_cases_studentT', name_sigma_obs='sigma_obs')[source]

Set the likelihood to apply to the model observations (model.new_cases_obs) We assume a StudentT distribution because it is robust against outliers [Lange1989]. The likelihood follows:

P(\text{data\_obs}) &\sim StudentT(\text{mu} = \text{new\_cases\_inferred}, sigma =\sigma,
\text{nu} = \text{nu})\\
\sigma &= \sigma_r \sqrt{\text{new\_cases\_inferred} + \text{offset\_sigma}}

The parameter \sigma_r follows a HalfCauchy prior distribution with parameter beta set by pr_beta_sigma_obs. If the input is 2 dimensional, the parameter \sigma_r is different for every region.

Parameters
  • new_cases_inferred (TensorVariable) – One or two dimensonal array. If 2 dimensional, the first dimension is time and the second are the regions/countries

  • pr_beta_sigma_obs (float) – The beta of the HalfCauchy prior distribution of \sigma_r.

  • nu (float) – How flat the tail of the distribution is. Larger nu should make the model more robust to outliers. Defaults to 4 [Lange1989].

  • offset_sigma (float) – An offset added to the sigma, to make the inference procedure robust. Otherwise numbers of new_cases_inferred would lead to very small errors and diverging likelihoods. Defaults to 1.

  • model – The model on which we want to add the distribution

  • data_obs (array) – The data that is observed. By default it is model.new_cases_ob

  • name_student_t – The name under which the studentT distribution is saved in the trace.

  • name_sigma_obs – The name under which the distribution of the observable error is saved in the trace

Returns

None

References

Lange1989(1,2)

Lange, K., Roderick J. A. Little, & Jeremy M. G. Taylor. (1989). Robust Statistical Modeling Using the t Distribution. Journal of the American Statistical Association, 84(408), 881-896. doi:10.2307/2290063

covid19_inference.model.SIR(lambda_t_log, mu, pr_I_begin=100, model=None, return_all=False, save_all=False)[source]

Implements the susceptible-infected-recovered model.

I_{new}(t) &= \lambda_t I(t-1)  \frac{S(t-1)}{N}   \\
S(t) &= S(t-1) - I_{new}(t)  \\
I(t) &= I(t-1) + I_{new}(t) - \mu  I(t)

The prior distribution of the recovery rate \mu is set to LogNormal(\text{log(pr\_median\_mu)), pr\_sigma\_mu}). And the prior distribution of I(0) to HalfCauchy(\text{pr\_beta\_I\_begin})

Parameters
  • lambda_t_log (TensorVariable) – time series of the logarithm of the spreading rate, 1 or 2-dimensional. If 2-dimensional the first dimension is time.

  • mu (TensorVariable) – the recovery rate \mu, typically a random variable. Can be 0 or 1-dimensional. If 1-dimensional, the dimension are the different regions.

  • pr_I_begin (float or array_like or TensorVariable) – Prior beta of the Half-Cauchy distribution of I(0).

  • pr_median_mu (float or array_like) – Prior for the median of the lognormal distrubution of the recovery rate \mu.

  • pr_sigma_mu (float or array_like) – Prior for the sigma of the lognormal distribution of recovery rate \mu.

  • model (Cov19Model) – if none, it is retrieved from the context

  • return_all (bool) – if True, returns new_I_t, I_t, S_t otherwise returns only new_I_t

  • save_all (bool) – if True, saves new_I_t, I_t, S_t in the trace, otherwise it saves only new_I_t

Returns

  • new_I_t (TensorVariable) – time series of the number daily newly infected persons.

  • I_t (TensorVariable) – time series of the infected (if return_all set to True)

  • S_t (TensorVariable) – time series of the susceptible (if return_all set to True)

covid19_inference.model.SEIR(lambda_t_log, pr_beta_I_begin=100, pr_beta_new_E_begin=50, pr_median_mu=0.125, pr_mean_median_incubation=4, pr_sigma_median_incubation=1, sigma_incubation=0.4, pr_sigma_mu=0.2, model=None, return_all=False, save_all=False, name_median_incubation='median_incubation')[source]

Implements a model similar to the susceptible-exposed-infected-recovered model. Instead of a exponential decaying incubation period, the length of the period is lognormal distributed. The complete equation is:

E_{\text{new}}(t) &= \lambda_t I(t-1) \frac{S(t)}{N}   \\
S(t) &= S(t-1) - E_{\text{new}}(t)  \\
I_\text{new}(t) &= \sum_{k=1}^{10} \beta(k) E_{\text{new}}(t-k)   \\
I(t) &= I(t-1) + I_{\text{new}}(t) - \mu  I(t) \\
\beta(k) & = P(k) \sim LogNormal(\text{log}(d_{\text{incubation}})), \text{sigma\_incubation})

The recovery rate \mu and the incubation period is the same for all regions and follow respectively:

P(\mu) &\sim LogNormal(\text{log(pr\_median\_mu)), pr\_sigma\_mu}) \\
P(d_{\text{incubation}}) &\sim Normal(\text{pr\_mean\_median\_incubation, pr\_sigma\_median\_incubation})

The initial number of infected and newly exposed differ for each region and follow prior HalfCauchy distributions:

E(t)  &\sim HalfCauchy(\text{pr\_beta\_E\_begin}) \:\: \text{ for} \: t \in \{-9, -8, ..., 0\}\\
I(0)  &\sim HalfCauchy(\text{pr\_beta\_I\_begin}).

Parameters
  • lambda_t_log (TensorVariable) – time series of the logarithm of the spreading rate, 1 or 2-dimensional. If 2-dimensional, the first dimension is time.

  • pr_beta_I_begin (float or array_like) – Prior beta of the HalfCauchy distribution of I(0).

  • pr_beta_new_E_begin (float or array_like) – Prior beta of the HalfCauchy distribution of E(0).

  • pr_median_mu (float or array_like) – Prior for the median of the Lognormal distribution of the recovery rate \mu.

  • pr_mean_median_incubation – Prior mean of the Normal distribution of the median incubation delay d_{\text{incubation}}. Defaults to 4 days [Nishiura2020], which is the median serial interval (the important measure here is not exactly the incubation period, but the delay until a person becomes infectious which seems to be about 1 day earlier as showing symptoms).

  • pr_sigma_median_incubation – Prior sigma of the Normal distribution of the median incubation delay d_{\text{incubation}}. Default is 1 day.

  • sigma_incubation – Scale parameter of the Lognormal distribution of the incubation time/ delay until infectiousness. The default is set to 0.4, which is about the scale found in [Nishiura2020], [Lauer2020].

  • pr_sigma_mu (float or array_like) – Prior for the sigma of the lognormal distribution of recovery rate \mu.

  • model (Cov19Model) – if none, it is retrieved from the context

  • return_all (bool) – if True, returns new_I_t, new_E_t, I_t, S_t otherwise returns only new_I_t

  • save_all (bool) – if True, saves new_I_t, new_E_t, I_t, S_t in the trace, otherwise it saves only new_I_t

  • name_median_incubation (str) – The name under which the median incubation time is saved in the trace

Returns

  • new_I_t (TensorVariable) – time series of the number daily newly infected persons.

  • new_E_t (TensorVariable) – time series of the number daily newly exposed persons. (if return_all set to True)

  • I_t (TensorVariable) – time series of the infected (if return_all set to True)

  • S_t (TensorVariable) – time series of the susceptible (if return_all set to True)

References

Nishiura2020(1,2)

Nishiura, H.; Linton, N. M.; Akhmetzhanov, A. R. Serial Interval of Novel Coronavirus (COVID-19) Infections. Int. J. Infect. Dis. 2020, 93, 284–286. https://doi.org/10.1016/j.ijid.2020.02.060.

Lauer2020

Lauer, S. A.; Grantz, K. H.; Bi, Q.; Jones, F. K.; Zheng, Q.; Meredith, H. R.; Azman, A. S.; Reich, N. G.; Lessler, J. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application. Ann Intern Med 2020. https://doi.org/10.7326/M20-0504.

covid19_inference.model.delay_cases(new_I_t, pr_median_delay=10, pr_sigma_median_delay=0.2, pr_median_scale_delay=0.3, pr_sigma_scale_delay=None, model=None, save_in_trace=True, name_delay='delay', name_delayed_cases='new_cases_raw', len_input_arr=None, len_output_arr=None, diff_input_output=None)[source]

Convolves the input by a lognormal distribution, in order to model a delay:

y_\text{delayed}(t) &= \sum_{\tau=0}^T y_\text{input}(\tau) LogNormal[log(\text{delay}), \text{pr\_median\_scale\_delay}](t - \tau)\\
log(\text{delay}) &= Normal(log(\text{pr\_sigma\_delay}), \text{pr\_sigma\_delay})

For clarification: the LogNormal distribution is a function evaluated at t - \tau.

If the model is 2-dimensional, the log(\text{delay}) is hierarchically modelled with the hierarchical_normal() function using the default parameters except that the prior \sigma of \text{delay}_\text{L2} is HalfNormal distributed (error_cauchy=False).

Parameters
  • new_I_t (TensorVariable) – The input, typically the number newly infected cases I_{new}(t) of from the output of SIR() or SEIR().

  • pr_median_delay (float) – The mean of the normal distribution which models the prior median of the LogNormal delay kernel.

  • pr_sigma_median_delay (float) – The standart devaiation of normal distribution which models the prior median of the LogNormal delay kernel.

  • pr_median_scale_delay (float) – The scale (width) of the LogNormal delay kernel.

  • pr_sigma_scale_delay (float) – If it is not None, the scale is of the delay is kernel follows a prior LogNormal distribution, with median pr_median_scale_delay and scale pr_sigma_scale_delay.

  • model (Cov19Model) – if none, it is retrieved from the context

  • save_in_trace (bool) – whether to save y_\text{delayed} in the trace

  • name_delay (str) – The name under which the delay is saved in the trace, suffixes and prefixes are added depending on which variable is saved.

  • name_delayed_cases (str) – The name under which the delay is saved in the trace, suffixes and prefixes are added depending on which variable is saved.

  • len_input_arr – Length of new_I_t. By default equal to model.sim_len. Necessary because the shape of theano tensors are not defined at when the graph is built.

  • len_output_arr (int) – Length of the array returned. By default it set to the length of the cases_obs saved in the model plus the number of days of the forecast.

  • diff_input_output (int) – Number of days the returned array begins later then the input. Should be significantly larger than the median delay. By default it is set to the model.diff_data_sim.

Returns

new_cases_inferred (TensorVariable) – The delayed input y_\text{delayed}(t), typically the daily number new cases that one expects to measure.

covid19_inference.model.week_modulation(new_cases_raw, week_modulation_type='abs_sine', pr_mean_weekend_factor=0.3, pr_sigma_weekend_factor=0.5, week_end_days=(6, 7), model=None, save_in_trace=True)[source]

Adds a weekly modulation of the number of new cases:

\text{new\_cases} &= \text{new\_cases\_raw} \cdot (1-f(t))\,, \qquad\text{with}\\
f(t) &= f_w \cdot \left(1 - \left|\sin\left(\frac{\pi}{7} t- \frac{1}{2}\Phi_w\right)\right| \right),

if week_modulation_type is "abs_sine" (the default). If week_modulation_type is "step", the new cases are simply multiplied by the weekend factor on the days set by week_end_days

The weekend factor f_w follows a Lognormal distribution with median pr_mean_weekend_factor and sigma pr_sigma_weekend_factor. It is hierarchically constructed if the input is two-dimensional by the function hierarchical_normal() with default arguments.

The offset from Sunday \Phi_w follows a flat VonMises distribution and is the same for all regions.

Parameters
  • new_cases_raw (TensorVariable) – The input array, can be one- or two-dimensional

  • week_modulation_type (str) – The type of modulation, accepts "step" or "abs_sine (the default).

  • pr_mean_weekend_factor (float) – Sets the prior mean of the factor f_w by which weekends are counted.

  • pr_sigma_weekend_factor (float) – Sets the prior sigma of the factor f_w by which weekends are counted.

  • week_end_days (tuple of ints) – The days counted as weekend if week_modulation_type is "step"

  • model (Cov19Model) – if none, it is retrieved from the context

  • save_in_trace (bool) – If True (default) the new_cases are saved in the trace.

Returns

new_cases (TensorVariable)

covid19_inference.model.make_change_point_RVs(change_points_list, pr_median_lambda_0, pr_sigma_lambda_0=1, model=None)[source]
Parameters
  • priors_dict

  • change_points_list

  • model

covid19_inference.model.lambda_t_with_sigmoids(change_points_list, pr_median_lambda_0, pr_sigma_lambda_0=0.5, model=None)[source]
Parameters
  • change_points_list

  • pr_median_lambda_0

  • pr_sigma_lambda_0

  • model (Cov19Model) – if none, it is retrieved from the context

covid19_inference.model.hierarchical_normal(name, name_sigma, pr_mean, pr_sigma, len_L2, w=1.0, error_fact=2.0, error_cauchy=True)[source]

Implements an hierarchical normal model:

x_\text{L1} &= Normal(\text{pr\_mean}, \text{pr\_sigma})\\
y_{i, \text{L2}} &= Normal(x_\text{L1}, \sigma_\text{L2})\\
\sigma_\text{L2} &= HalfCauchy(\text{error\_fact} \cdot \text{pr\_sigma})

It is however implemented in a non-centered way, that the second line is changed to:

y_{i, \text{L2}} &= x_\text{L1} +  Normal(0,1) \cdot \sigma_\text{L2}

See for example https://arxiv.org/pdf/1312.0906.pdf

Parameters
  • name (str) – Name under which x_\text{L1} and y_\text{L2} saved in the trace. '_L1' and '_L2' is appended

  • name_sigma (str) – Name under which \sigma_\text{L2} saved in the trace. '_L2' is appended.

  • pr_mean (float) – Prior mean of x_\text{L1}

  • pr_sigma (float) – Prior sigma for x_\text{L1} and (muliplied by error_fact) for \sigma_\text{L2}

  • len_L2 (int) – length of y_\text{L2}

  • error_fact (float) – Factor by which pr_sigma is multiplied as prior for sigma_text{L2}

  • error_cauchy (bool) – if False, a HalfNormal distribution is used for \sigma_\text{L2} instead of HalfCauchy

Returns

covid19_inference.model.make_prior_I(lambda_t_log, mu, pr_median_delay, pr_sigma_I_begin=2, n_data_points_used=5, model=None)[source]

Builds the prior for I begin by solving the SIR differential from the first data backwards. This decorrelates the I_begin from the lambda_t at the beginning, allowing a more efficient sampling. The example_one_bundesland runs about 30% faster with this prior, instead of a HalfCauchy.

Parameters
Returns

I_begin (TensorVariable)