Model

If you are familiar with pymc3, then looking at the example below should explain how our model works. Otherwise, here is a quick overview:

  • First, we have to create an instance of the base class (that is derived from pymc3s model class). It has some convenient properties to get the range of the data, simulation length and so forth.

  • We then add details that base model. They correspond to the actual (physical) model features, such as the change points, the reporting delay and the week modulation.

    • Every feature has its own function that takes in arguments to set prior assumptions.

    • Sometimes they also take in input (data, reported cases … ) but none of the functions perform any actual modifications on the data. They only tell pymc3 what it is supposed to do during the sampling.

    • None of our functions actually modifies any data. They rather define ways how pymc3 should modify data during the sampling.

    • Most of the feature functions add variables to the pymc3.trace, see the function arguments that start with name_.

  • In pymc3 it is common to use a context, as we also do in the example. Everything within the block with cov19.model.Cov19Model(**params_model) as this_model: automatically applies to this_model. Alternatively, you could provide a keyword to each function model=this_model.

Example

import datetime

import pymc3 as pm
import numpy as np
import covid19_inference as cov19

# limit the data range
bd = datetime.datetime(2020, 3, 2)

# download data
jhu = cov19.data_retrieval.JHU(auto_download=True)
new_cases = jhu.get_new(country="Germany", data_begin=bd)

# set model parameters
params_model = dict(
    new_cases_obs=new_cases,
    data_begin=bd,
    fcast_len=28,
    diff_data_sim=16,
    N_population=83e6,
)

# change points like in the paper
change_points = [
    dict(pr_mean_date_transient=datetime.datetime(2020, 3, 9)),
    dict(pr_mean_date_transient=datetime.datetime(2020, 3, 16)),
    dict(pr_mean_date_transient=datetime.datetime(2020, 3, 23)),
]

# create model instance and add details
with cov19.model.Cov19Model(**params_model) as this_model:
    # apply change points, lambda is in log scale
    lambda_t_log = cov19.model.lambda_t_with_sigmoids(
        pr_median_lambda_0=0.4,
        pr_sigma_lambda_0=0.5,
        change_points_list=change_points,
    )

    # prior for the recovery rate
    mu = pm.Lognormal(name="mu", mu=np.log(1 / 8), sigma=0.2)

    # new Infected day over day are determined from the SIR model
    new_I_t = cov19.model.SIR(lambda_t_log, mu)

    # model the reporting delay, our prior is ten days
    new_cases_inferred_raw = cov19.model.delay_cases(
        cases=new_I_t,
        pr_mean_of_median=10,
    )

    # apply a weekly modulation, fewer reports during weekends
    new_cases_inferred = cov19.model.week_modulation(new_cases_inferred_raw)

    # set the likeliehood
    cov19.model.student_t_likelihood(new_cases_inferred)

# run the sampling
trace = pm.sample(model=this_model, tune=50, draws=10, init="advi+adapt_diag")

Model Base Class

class covid19_inference.model.Cov19Model(new_cases_obs, data_begin, fcast_len, diff_data_sim, N_population, data_end=None, name='', model=None, shifted_cases=True)[source]

Abstract base class for the dynamic model of covid-19 propagation. Derived from pymc3.Model.

Parameters below are passed to the constructor.

Attributes (Variables) are available after creation and can be accessed from every instance. Some background:

  • The simulation starts diff_data_sim days before the data.

  • The data has a certain length, on which the inference is based. This length is given by new_cases_obs.

  • After the inference, a forecast takes of length fcast_len takes place, starting on the day after the last data point in new_cases_obs.

  • In total, traces produced by a model run have the length sim_len = diff_data_sim + data_len + fcast_len

  • Date ranges include both boundaries. For example, if data_begin is March 1 and data_end is March 3 then data_len will be 3.

Parameters
  • new_cases_obs (1 or 2d array) – If the array is two-dimensional, an hierarchical model will be constructed. First dimension is then time, the second the region/country.

  • data_begin (datatime.datetime) – Date of the first data point

  • fcast_len (int) – Number of days the simulations runs longer than the data

  • diff_data_sim (int) – Number of days the simulation starts earlier than the data. Should be significantly longer than the delay between infection and report of cases.

  • N_population (number or 1d array) – Number of inhabitance in region, needed for the S(E)IR model. Is ideally 1 dimensional if new_cases_obs is 2 dimensional

  • name (string) – suffix appended to the name of random variables saved in the trace

  • model – specify a model, if this one should expand another

  • shifted_cases (bool) – when enabled (True), interprets short intervals of zero cases as days, where no reporting happens and adds model cases to next non-zero-case day

Variables
  • new_cases_obs (1 or 2d array) – as passed during construction

  • data_begin (datatime.datetime) – date of the first data point in the data

  • data_end (datatime.datetime) – date of the last data point in the data

  • sim_begin (datatime.datetime) – date at which the simulation begins

  • sim_end (datatime.datetime) – date at which the simulation ends (should match fcast_end)

  • fcast_begin (datatime.datetime) – date at which the forecast starts (should be one day after data_end)

  • fcast_end (datatime.datetime) – data at which the forecast ends

  • data_len (int) – total number of days in the data

  • sim_len (int) – total number of days in the simulation

  • fcast_len (int) – total number of days in the forecast

  • diff_data_sim (int) – difference in days between the simulation begin and the data begin. The simulation starting time is usually earlier than the data begin.

Example

with Cov19Model(**params) as model:
    # Define model here
property untransformed_freeRVs

Returns the names of all free parameters of the model, usefull for plotting!

Returns

list – all variable names

Compartmental models

SIR — susceptible-infected-recovered

covid19_inference.model.SIR(lambda_t_log, mu, name_new_I_t='new_I_t', name_I_begin='I_begin', name_I_t='I_t', name_S_t='S_t', pr_I_begin=100, model=None, return_all=False)[source]

Implements the susceptible-infected-recovered model.

Parameters
  • lambda_t_log (TensorVariable) – time series of the logarithm of the spreading rate, 1 or 2-dimensional. If 2-dimensional the first dimension is time.

  • mu (TensorVariable) – the recovery rate \mu, typically a random variable. Can be 0 or 1-dimensional. If 1-dimensional, the dimension are the different regions.

  • name_new_I_t (str, optional) – Name of the new_I_t variable

  • name_I_begin (str, optional) – Name of the I_be  gin variable

  • name_I_t (str, optional) – Name of the I_t variable, set to None to avoid adding as trace variable.

  • name_S_t (str, optional) – Name of the S_t variable, set to None to avoid adding as trace variable.

  • pr_I_begin (float or array_like or Variable) – Prior beta of the Half-Cauchy distribution of I(0). if type is at.Constant, I_begin will not be inferred by pymc3

  • model (Cov19Model) – if none, it is retrieved from the context

  • return_all (bool) – if True, returns name_new_I_t, name_I_t, name_S_t otherwise returns only name_new_I_t

Returns

  • new_I_t (TensorVariable) – time series of the number daily newly infected persons.

  • I_t (TensorVariable) – time series of the infected (if return_all set to True)

  • S_t (TensorVariable) – time series of the susceptible (if return_all set to True)

More Details

I_{new}(t) &= \lambda_t I(t-1)  \frac{S(t-1)}{N}   \\
S(t) &= S(t-1) - I_{new}(t)  \\
I(t) &= I(t-1) + I_{new}(t) - \mu  I(t)

The prior distributions of the recovery rate \mu and I(0) are set to

\mu &\sim \text{LogNormal}\left[
        \log(\text{pr\_median\_mu}), \text{pr\_sigma\_mu}
    \right] \\
I(0) &\sim \text{HalfCauchy}\left[
        \text{pr\_beta\_I\_begin}
    \right]

SEIR-like — susceptible-exposed-infected-recovered

covid19_inference.model.SEIR(lambda_t_log, mu, name_new_I_t='new_I_t', name_new_E_t='new_E_t', name_I_t='I_t', name_S_t='S_t', name_I_begin='I_begin', name_new_E_begin='new_E_begin', name_median_incubation='median_incubation', pr_I_begin=100, pr_new_E_begin=50, pr_median_mu=0.125, pr_mean_median_incubation=4, pr_sigma_median_incubation=1, sigma_incubation=0.4, pr_sigma_mu=0.2, model=None, return_all=False)[source]

Implements a model similar to the susceptible-exposed-infected-recovered model. Instead of a exponential decaying incubation period, the length of the period is lognormal distributed.

Parameters
  • lambda_t_log (TensorVariable) – time series of the logarithm of the spreading rate, 1 or 2-dimensional. If 2-dimensional, the first dimension is time.

  • mu (TensorVariable) – the recovery rate \mu, typically a random variable. Can be 0 or 1-dimensional. If 1-dimensional, the dimension are the different regions.

  • name_new_I_t (str, optional) – Name of the new_I_t variable

  • name_I_t (str, optional) – Name of the I_t variable

  • name_S_t (str, optional) – Name of the S_t variable

  • name_I_begin (str, optional) – Name of the I_begin variable

  • name_new_E_begin (str, optional) – Name of the new_E_begin variable

  • name_median_incubation (str) – The name under which the median incubation time is saved in the trace

  • pr_I_begin (float or array_like) – Prior beta of the HalfCauchy distribution of I(0). if type is at.Variable, I_begin will be set to the provided prior as a constant.

  • pr_new_E_begin (float or array_like) – Prior beta of the HalfCauchy distribution of E(0).

  • pr_median_mu (float or array_like) – Prior for the median of the Lognormal distribution of the recovery rate \mu.

  • pr_mean_median_incubation – Prior mean of the Normal distribution of the median incubation delay d_{\text{incubation}}. Defaults to 4 days [Nishiura2020], which is the median serial interval (the important measure here is not exactly the incubation period, but the delay until a person becomes infectious which seems to be about 1 day earlier as showing symptoms).

  • pr_sigma_median_incubation (number or None) – Prior sigma of the Normal distribution of the median incubation delay d_{\text{incubation}}. If None, the incubation time will be fixed to the value of pr_mean_median_incubation instead of a random variable Default is 1 day.

  • sigma_incubation – Scale parameter of the Lognormal distribution of the incubation time/ delay until infectiousness. The default is set to 0.4, which is about the scale found in [Nishiura2020], [Lauer2020].

  • pr_sigma_mu (float or array_like) – Prior for the sigma of the lognormal distribution of recovery rate \mu.

  • model (Cov19Model) – if none, it is retrieved from the context

  • return_all (bool) – if True, returns name_new_I_t, name_new_E_t, name_I_t, name_S_t otherwise returns only name_new_I_t

Returns

  • name_new_I_t (TensorVariable) – time series of the number daily newly infected persons.

  • name_new_E_t (TensorVariable) – time series of the number daily newly exposed persons. (if return_all set to True)

  • name_I_t (TensorVariable) – time series of the infected (if return_all set to True)

  • name_S_t (TensorVariable) – time series of the susceptible (if return_all set to True)

More Details

E_{\text{new}}(t) &= \lambda_t I(t-1) \frac{S(t)}{N}   \\
S(t) &= S(t-1) - E_{\text{new}}(t)  \\
I_\text{new}(t) &= \sum_{k=1}^{10} \beta(k) E_{\text{new}}(t-k)   \\
I(t) &= I(t-1) + I_{\text{new}}(t) - \mu  I(t) \\
\beta(k) & = P(k) \sim \text{LogNormal}\left[
        \log(d_{\text{incubation}}), \text{sigma\_incubation}
    \right]

The recovery rate \mu and the incubation period is the same for all regions and follow respectively:

P(\mu) &\sim \text{LogNormal}\left[
       \text{log(pr\_median\_mu), pr\_sigma\_mu}
   \right] \\
P(d_{\text{incubation}}) &\sim \text{Normal}\left[
       \text{pr\_mean\_median\_incubation, pr\_sigma\_median\_incubation}
   \right]

The initial number of infected and newly exposed differ for each region and follow prior HalfCauchy distributions:

E(t)  &\sim \text{HalfCauchy}\left[
        \text{pr\_beta\_E\_begin}
    \right] \:\: \text{for} \: t \in {-9, -8, ..., 0}\\
I(0)  &\sim \text{HalfCauchy}\left[
        \text{pr\_beta\_I\_begin}
    \right].

References

  • Nishiura2020(1,2)

    Nishiura, H.; Linton, N. M.; Akhmetzhanov, A. R. Serial Interval of Novel Coronavirus (COVID-19) Infections. Int. J. Infect. Dis. 2020, 93, 284–286. https://doi.org/10.1016/j.ijid.2020.02.060.

  • Lauer2020

    Lauer, S. A.; Grantz, K. H.; Bi, Q.; Jones, F. K.; Zheng, Q.; Meredith, H. R.; Azman, A. S.; Reich, N. G.; Lessler, J. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application. Ann Intern Med 2020. https://doi.org/10.7326/M20-0504.

Likelihood

covid19_inference.model.student_t_likelihood(cases, name_student_t='_new_cases_studentT', name_sigma_obs='sigma_obs', pr_beta_sigma_obs=30, nu=4, offset_sigma=1, model=None, data_obs=None, sigma_shape=None)[source]

Set the likelihood to apply to the model observations (model.new_cases_obs) We assume a StudentT distribution because it is robust against outliers [Lange1989]. The likelihood follows:

P(\text{data\_obs}) &\sim StudentT(\text{mu} = \text{new\_cases\_inferred}, sigma =\sigma,
\text{nu} = \text{nu})\\
\sigma &= \sigma_r \sqrt{\text{new\_cases\_inferred} + \text{offset\_sigma}}

The parameter \sigma_r follows a HalfCauchy prior distribution with parameter beta set by pr_beta_sigma_obs. If the input is 2 dimensional, the parameter \sigma_r is different for every region, this can be changed be using the sigma_shape Parameter.

Parameters
  • cases (TensorVariable) – The daily new cases estimated by the model. Will be compared to the real world data data_obs. One or two dimensonal array. If 2 dimensional, the first dimension is time and the second are the regions/countries

  • name_student_t – The name under which the studentT distribution is saved in the trace.

  • name_sigma_obs – The name under which the distribution of the observable error is saved in the trace

  • pr_beta_sigma_obs (float) – The beta of the HalfCauchy prior distribution of \sigma_r.

  • nu (float) – How flat the tail of the distribution is. Larger nu should make the model more robust to outliers. Defaults to 4 [Lange1989].

  • offset_sigma (float) – An offset added to the sigma, to make the inference procedure robust. Otherwise numbers of cases would lead to very small errors and diverging likelihoods. Defaults to 1.

  • model – The model on which we want to add the distribution

  • data_obs (array) – The data that is observed. By default it is model.new_cases_obs

  • sigma_shape (int, array) – Shape of the sigma distribution i.e. the data error term.

Returns

None

References

Lange1989(1,2)

Lange, K., Roderick J. A. Little, & Jeremy M. G. Taylor. (1989). Robust Statistical Modeling Using the t Distribution. Journal of the American Statistical Association, 84(408), 881-896. doi:10.2307/2290063

Spreading Rate

covid19_inference.model.lambda_t_with_sigmoids(change_points_list, pr_median_lambda_0, pr_sigma_lambda_0=0.5, model=None, name_lambda_t='lambda_t', hierarchical=None, sigma_lambda_cp=None, sigma_lambda_week_cp=None, prefix_lambdas='', shape=None)[source]

Builds a time dependent spreading rate \lambda_t with change points. The change points are marked by a transient with a sigmoidal shape.

Todo

Add a bit more detailed documentation.

Parameters
  • change_points_list

  • pr_median_lambda_0

  • pr_sigma_lambda_0

  • model (Cov19Model) – if none, it is retrieved from the context

Returns

lambda_t_log

covid19_inference.model.uncorrelated_prior_I(lambda_t_log, mu, pr_median_delay, name_I_begin='I_begin', name_I_begin_ratio_log='I_begin_ratio_log', pr_sigma_I_begin=2, n_data_points_used=5, model=None)[source]

Builds the prior for I begin by solving the SIR differential from the first data backwards. This decorrelates the I_begin from the lambda_t at the beginning, allowing a more efficient sampling. The example_one_bundesland runs about 30% faster with this prior, instead of a HalfCauchy.

Parameters
  • lambda_t_log (TensorVariable) – Description

  • mu (TensorVariable) – Description

  • pr_median_delay (float) – Description

  • name_I_begin (str, optional) – Description

  • name_I_begin_ratio_log (str, optional) – Description

  • pr_sigma_I_begin (float) – Description

  • n_data_points_used (int) – Description

  • model (Cov19Model) – if none, it is retrieved from the context

  • lambda_t_log

  • mu

  • pr_median_delay

  • pr_sigma_I_begin

  • n_data_points_used

Returns

I_begin (TensorVariable)

Delay

covid19_inference.model.delay_cases(cases, name_delay='delay', name_cases=None, name_width='delay-width', pr_mean_of_median=10, pr_sigma_of_median=0.2, pr_median_of_width=0.3, pr_sigma_of_width=None, model=None, len_input_arr=None, len_output_arr=None, diff_input_output=None, seperate_on_axes=True, num_seperated_axes=None, use_gamma=False)[source]

Convolves the input by a lognormal distribution, in order to model a delay:

  • We have a kernel (a distribution) of delays, one realization of this kernel is applied to each pymc3 sample.

  • The kernel has a median delay D and a width that correspond to this one sample. Doing the ensemble average over all samples and the respective kernels, we get two distributions: one of the median delay D and one of the width.

  • The (normal) distribution of the median of D is specified using pr_mean_of_median and pr_sigma_of_median.

  • The (lognormal) distribution of the width of the kernel of D is specified using pr_median_of_width and pr_sigma_of_width. If pr_sigma_of_width is None, the width is fixed (skipping the second distribution).

Parameters
  • cases (TensorVariable) – The input, typically the number of newly infected cases from the output of SIR() or SEIR().

  • name_delay (str) – The name under which the delay is saved in the trace, suffixes and prefixes are added depending on which variable is saved. Default : “delay”

  • name_cases (str or None) – The name under which the delayed cases are saved in the trace. If None, no variable will be added to the trace. Default: “delayed_cases”

  • pr_mean_of_median (float) – The mean of the normal distribution which models the prior median of the LogNormal delay kernel. Default: 10.0 (days)

  • pr_sigma_of_median (float) – The standart devaiation of normal distribution which models the prior median of the LogNormal delay kernel. Default: 0.2

  • pr_median_of_width (float) – The scale (width) of the LogNormal delay kernel. Default: 0.3

  • pr_sigma_of_width (float or None) – Whether to put a prior distribution on the scale (width) of the distribution of the delays, too. If a number is provided, the scale of the delay kernel follows a prior LogNormal distribution, with median pr_median_scale_delay and scale pr_sigma_scale_delay. Default: None, and no distribution is applied.

  • model (Cov19Model or None) – The model to use. Default: None, model is retrieved automatically from the context

Other Parameters
  • len_input_arr – Length of new_I_t. By default equal to model.sim_len. Necessary because the shape of aesara tensors are not defined at when the graph is built.

  • len_output_arr (int) – Length of the array returned. By default it set to the length of the cases_obs saved in the model plus the number of days of the forecast.

  • diff_input_output (int) – Number of days the returned array begins later then the input. Should be significantly larger than the median delay. By default it is set to the model.diff_data_sim.

  • seperate_on_axes (Bool) – This decides whether or not the delay is applied on every axes separately. I.e. Different delay times for the different axes. If None no axes is modelled separately!

  • num_seperated_axes (int or None) – If you are not using separated axes, this is the number of axes.

Returns

delayed_cases (TensorVariable) – The delayed input y_\text{delayed}(t), typically the daily number new cases that one expects to measure.

More Details

y_\text{delayed}(t) &= \sum_{\tau=0}^T y_\text{input}(\tau)
\text{LogNormal}\left[
    \log(\text{delay}), \text{pr\_median\_scale\_delay}
\right](t - \tau) \\
\log(\text{delay}) &= \text{Normal}\left[
    \log(\text{pr\_sigma\_delay} ), \text{pr\_sigma\_delay}
\right]

The LogNormal distribution is a function evaluated at t - \tau.

If the model is 2-dimensional (hierarchical), the \log(\text{delay}) is hierarchically modelled with the hierarchical_normal() function using the default parameters except that the prior sigma of delay_L2 is HalfNormal distributed (error_cauchy=False).

Week modulation

covid19_inference.model.week_modulation(cases, name_cases=None, name_weekend_factor='weekend_factor', name_offset_modulation='offset_modulation', week_modulation_type='abs_sine', pr_mean_weekend_factor=0.3, pr_sigma_weekend_factor=0.5, weekend_days=(6, 7), model=None)[source]

Adds a weekly modulation of the number of new cases:

\text{new\_cases} &= \text{new\_cases\_raw} \cdot (1-f(t))\,, \qquad\text{with}\\
f(t) &= f_w \cdot \left(1 - \left|\sin\left(\frac{\pi}{7} t- \frac{1}{2}\Phi_w\right)\right| \right),

if week_modulation_type is "abs_sine" (the default). If week_modulation_type is "step", the new cases are simply multiplied by the weekend factor on the days set by weekend_days

The weekend factor f_w follows a Lognormal distribution with median pr_mean_weekend_factor and sigma pr_sigma_weekend_factor. It is hierarchically constructed if the input is two-dimensional by the function hierarchical_normal() with default arguments.

The offset from Sunday \Phi_w follows a flat VonMises distribution and is the same for all regions.

Parameters
  • cases (TensorVariable) – The input array of daily new cases, can be one- or two-dimensional

  • name_cases (str or None,) – The name under which to save the cases as a trace variable. Default: None, cases are not stored in the trace.

  • week_modulation_type (str) – The type of modulation, accepts "step" or "abs_sine (the default).

  • pr_mean_weekend_factor (float, at.Variable) – Sets the prior mean of the factor f_w by which weekends are counted.

  • pr_sigma_weekend_factor (float) – Sets the prior sigma of the factor f_w by which weekends are counted.

  • weekend_days (tuple of ints) – The days counted as weekend if week_modulation_type is "step"

  • model (Cov19Model) – if none, it is retrieved from the context

Returns

new_cases (TensorVariable)

Model utilities

covid19_inference.model.utility.hierarchical_normal(pr_mean, pr_sigma, name_L1='delay_hc_L1', name_L2='delay_hc_L2', name_sigma='delay_hc_sigma', model=None, error_fact=2.0, error_cauchy=True, shape=None)[source]

Implements an hierarchical normal model:

x_\text{L1} &= Normal(\text{pr\_mean}, \text{pr\_sigma})\\
y_{i, \text{L2}} &= Normal(x_\text{L1}, \sigma_\text{L2})\\
\sigma_\text{L2} &= HalfCauchy(\text{error\_fact} \cdot \text{pr\_sigma})

It is however implemented in a non-centered way, that the second line is changed to:

y_{i, \text{L2}} &= x_\text{L1} +  Normal(0,1) \cdot \sigma_\text{L2}

See for example https://arxiv.org/pdf/1312.0906.pdf

Parameters
  • name_L1 (str) – Name under which x_\text{L1} is saved in the trace.

  • name_L2 (str) – Name under which x_\text{L2} is saved in the trace. The non-centered distribution in addition saved with a suffix _raw added.

  • name_sigma (str) – Name under which \sigma_\text{L2} is saved in the trace.

  • pr_mean (float) – Prior mean of x_\text{L1}

  • pr_sigma (float) – Prior sigma for x_\text{L1} and (muliplied by error_fact) for \sigma_\text{L2}

  • len_L2 (int) – length of y_\text{L2}

  • error_fact (float) – Factor by which pr_sigma is multiplied as prior for sigma_text{L2}

  • error_cauchy (bool) – if False, a HalfNormal distribution is used for \sigma_\text{L2} instead of HalfCauchy

Returns

covid19_inference.model.utility.tt_lognormal(x, mu, sigma)[source]

Calculates a lognormal pdf for integer spaced x input.

covid19_inference.model.utility.tt_gamma(x, mu=None, sigma=None, alpha=None, beta=None)[source]

Calculates a gamma distribution pdf for integer spaced x input. Parametrized similarly to Gamma