Plotting

We provide a number of plotting functions that can be used to recreate our plots or to create completely new visualizations. If you are familiar with matplotlib, you should have no problem using them extensively.

We provide three different types of functions here:

  • High level functions These can be used to create figures similar to those in our paper, Dehning et al. arXiv:2004.01105. They are neat little one-liners that create a good-looking plot from our model, but do not have many customization options.

  • Low level functions These extend the normal matplotlib plotting functions and can be used to plot arbitrary data. They have a lot of customization options, but it can take some time to get good-looking plots with them.

  • Helper functions These are mainly functions that manipulate data or retrieve data from our model. Most of the time you will not need them; they are documented here only for completeness.

If you just want to recreate our figures with different colors, the easiest way is to change the default rc parameters.

covid19_inference.plot.rcParams.get_rcparams_default()

Get a Param (dict) of the default parameters. Here we set our default values. Assigned once to module variable rcParamsDefault on load.

covid19_inference.plot.rcParams.set_rcparams(par)

Sets the rc parameters used for plotting. The provided instance of Param has to have the following keys (attributes):

Variables
  • locale (str) – region settings, passed to setlocale(). Default: “en_US”

  • date_format (str) – Format of the date on the x axis of time-like data (see https://strftime.org/). Examples for April 1, 2020: “%m/%d” gives 04/01, “%-d. %B” gives 1. April. Default: “%b %-d”, which gives Apr 1

  • date_show_minor_ticks (bool) – Whether to show the minor ticks (for every day). Default: True

  • rasterization_zorder (int or None) – Rasterizes plotted content below this value, set to None to keep everything a vector. Default: -1

  • draw_ci_95 (bool) – For time series plots, indicate 95% Confidence interval via fill between. Default: True

  • draw_ci_75 (bool) – For time series plots, indicate 75% Confidence interval via fill between. Default: False

  • draw_ci_50 (bool) – For time series plots, indicate 50% Confidence interval via fill between. Default: False

  • color_model (str) – Base color used for model plots; any mpl-compatible color code, e.g. “C0” or “#303030”. Default: “tab:green”

  • color_data (str) – Base color used for data Default: “tab:blue”

  • color_annot (str) – Color to use for annotations Default: “#646464”

  • color_prior (str) – Color used for priors in distributions Default: “#708090”

  • color_posterior (str) – Color used in posterior plotting

Example

import covid19_inference as cov

# Get the default parameters
pars = cov.plot.get_rcparams_default()

# Change parameters
pars["locale"] = "de_DE"
pars["color_data"] = "tab:purple"

# Set parameters
cov.plot.set_rcparams(pars)
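The effect of a date_format value can be checked with Python's own strftime, independently of this package. The following is a minimal sketch that sticks to portable directives, since “%-d” (day without leading zero) is a platform-specific extension and is not available everywhere. The variable names are only illustrative.

```python
import datetime

# April 1, 2020, the example date used in the parameter description
day = datetime.date(2020, 4, 1)

# Zero-padded month and day
numeric = day.strftime("%m/%d")

# %B is the full month name, %b the abbreviated one; both follow the
# active locale (plain English unless setlocale() was called)
full_month = day.strftime("%B")
short_month = day.strftime("%b")

print(numeric)  # 04/01
print(full_month, short_month)
```

Because the month names depend on the active locale, changing the locale parameter also changes how “%b” and “%B” are rendered.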

High level functions

covid19_inference.plot.timeseries_overview(model, idata, start=None, end=None, region=None, color=None, save_to=None, offset=0, annotate_constrained=True, annotate_watermark=True, axes=None, forecast_label='Forecast', forecast_heading='$\\bf Forecasts\\!:$', add_more_later=False)

Create the time series overview similar to the one in our paper, Dehning et al. arXiv:2004.01105. Contains \lambda, new cases, and cumulative cases.

Parameters
  • model (Cov19Model) –

  • idata (arviz.InferenceData) – needed for the data

  • offset (int) – offset that needs to be added to the (cumulative sum of) new cases at time model.data_begin to arrive at cumulative cases

  • start (datetime.datetime) – only used to set xrange in the end

  • end (datetime.datetime) – only used to set xrange in the end

  • color (str) – main color to use, default from rcParam

  • save_to (str or None) – path where to save the figures. default: None, not saving figures

  • annotate_constrained (bool) – show the “unconstrained”/“constrained” annotation in the lambda panel

  • annotate_watermark (bool) – show our watermark

  • axes (np.array of mpl axes) – provide an array of existing axes (from previously calling this function) to add more traces. Data will not be added again. Ideally call this first with add_more_later=True

  • forecast_label (str) – legend label for the forecast, default: “Forecast”

  • forecast_heading (str) – if add_more_later, how to label the forecast section. default: “$\bf Forecasts\!:$”

  • add_more_later (bool) – set this to true if you plan to add multiple models to the plot. changes the layout (and the color of the fit to past data)

Returns

  • fig (mpl figure)

  • axes (np array of mpl axes (insets not included))

Todo

  • Replace offset with an instance of data class that should yield the cumulative cases. We should not do calculations here.

covid19_inference.plot.distribution.distribution(model, idata, key, nSamples_prior=1000, title='', dist_math='x', indices=None, ax=None)

High level plotting function for distribution overviews. Only works if the distribution is one- or two-dimensional.

Parameters
  • model (Cov19Model) – The model used to create the inference data

  • idata (arviz.InferenceData) – The inference data containing the posterior samples

  • key (str) – The variable of interest (which should be plotted)

  • nSamples_prior (int) – Number of samples to draw for the prior kernel density estimation.

  • indices (array-like int) – Which dimensions of the variable to plot. Default: None, i.e. all

Low level functions

covid19_inference.plot.timeseries._timeseries(x, y, ax=None, what='data', draw_ci_95=None, draw_ci_75=None, draw_ci_50=None, date_format=True, alpha_ci=None, **kwargs)

Low-level function to plot anything that has a date on the x-axis.

Parameters
  • x (array of datetime.datetime) – times for the x axis

  • y (array, 1d or 2d) – data to plot. If 2d, the first dim is realization and the second dim is time matching x, and the CI is plotted as fill_between (if CI is enabled in the rc params). If 1d, the first dim is time matching x.

  • ax (mpl axes element, optional) – plot into an existing axes element. default: None

  • what (str, optional) – what type of data is provided in x. Sets the style used for plotting: “data” for data points, “fcast” for model forecast (prediction), “model” for model reproduction of data (past)

  • date_format (bool, optional) – Automatically convert the index to dates. Default: True

  • kwargs (dict, optional) – passed directly to the mpl plotting call.

Returns

ax
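How a 2d y (realizations × time) can be turned into interval bands is sketched below. This is an illustration only and assumes that the bands are central percentile intervals around the median; the package may compute them differently. All names and the random samples are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical samples: 200 realizations (first dim), 10 time steps (second dim)
y = rng.normal(loc=100.0, scale=10.0, size=(200, 10))

# Central line: median over the realizations
median = np.median(y, axis=0)

# Lower and upper bounds of the 95%, 75% and 50% intervals per time step
bands = {}
for level in (95, 75, 50):
    tail = (100 - level) / 2
    lower, upper = np.percentile(y, [tail, 100 - tail], axis=0)
    bands[level] = (lower, upper)

print(median.shape, bands[95][0].shape)
```

Each lower/upper pair has one value per time step, so it could be handed to matplotlib's fill_between together with the dates in x.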

covid19_inference.plot.distribution._distribution(array_posterior, array_prior, dist_name, dist_math, suffix='', ax=None)

Low level function to plot posterior and prior from arrays.

Parameters
  • array_posterior, array_prior (array) – Sampling data as arrays, should be filtered beforehand. If None, it does not get plotted!

  • dist_name (str) – name of distribution for plotting

  • dist_math (str) – math of distribution for plotting

  • suffix (str, optional) – Suffix for the plot title e.g. “age_group_1” Default: “”

  • ax (mpl axes element, optional) – Plot into an existing axes element Default: None

Example

In this example we use the low level time series function to plot the new daily cases and deaths reported by the Robert Koch Institute.

import datetime
import matplotlib.pyplot as plt
import covid19_inference as cov19

# Data retrieval, i.e. download new data from the Robert Koch Institute
rki = cov19.data_retrieval.RKI()
rki.download_all_available_data()

new_deaths = rki.get_new(
    value="deaths",
    data_begin=datetime.datetime(2020, 3, 15),  # arbitrary date
    data_end=datetime.datetime.today(),
)

new_cases = rki.get_new(
    value="confirmed",
    data_begin=datetime.datetime(2020, 3, 15),
    data_end=datetime.datetime.today(),
)

# Create a multiplot
fig, axes = plt.subplots(2,1, figsize=(12,6))

# Plot the new cases onto axes[0]
cov19.plot._timeseries(
    x=new_cases.index,
    y=new_cases,
    ax=axes[0],
    what="model", #We define model here to get a line instead of data points
)

# Plot the new deaths onto axes[1]
cov19.plot._timeseries(
    x=new_deaths.index,
    y=new_deaths,
    ax=axes[1],
    what="model", #We define model here to get a line instead of data points
)

# Label the plots

axes[0].set_title("New cases")

axes[1].set_title("New deaths")

# Show the figure
fig.show()

[Figure: time series of new cases and new deaths produced by the example above]

Helper functions

covid19_inference.plot.utils.get_array_from_idata(idata, var, from_type='posterior')

Reshapes and returns a numpy array from an arviz InferenceData object.

Parameters
  • idata (arviz.InferenceData) – InferenceData object

  • var (str) – Variable name

  • from_type (str, optional) – Type of data to return. Options are “posterior” (posterior samples), “prior” (prior samples), and others; check the idata attributes for options

Returns

array (numpy.ndarray with chain and samples flattened)
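What flattening chain and samples means can be sketched with plain numpy. arviz stores sampled variables with leading chain and draw dimensions; the snippet below assumes such an array with hypothetical sizes and merges the two leading dimensions. It is not the package's implementation.

```python
import numpy as np

# Hypothetical posterior samples with dims (chain, draw, time)
n_chain, n_draw, n_time = 4, 250, 30
samples = np.arange(n_chain * n_draw * n_time, dtype=float).reshape(
    n_chain, n_draw, n_time
)

# Merge chain and draw into one sample dimension, keep the remaining dims
flat = samples.reshape(n_chain * n_draw, *samples.shape[2:])

print(flat.shape)  # (1000, 30)
```

With this layout, row 250 of the flattened array is the first draw of the second chain.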

covid19_inference.plot.utils.get_array_from_idata_via_date(model, idata, var, start=None, end=None, dates=None)

Parameters
  • model (Cov19Model) –

  • idata (arviz.InferenceData) –

  • var (str) – the variable name in the trace

  • start (datetime.datetime) – get all data for a range from start to end. (both boundary dates included)

  • end (datetime.datetime) –

  • dates (list of datetime.datetime objects, optional) – the dates for which to get the data. Default: None, will return all available data.

Returns

  • data (nd array, 3 dim) – the elements from the trace matching the dates. Dimensions are as follows: dim 0: samples (if no samples, only one entry); dim 1: data with time matching the returned dates (if compatible variable); dim 2: region (if no regions, only one entry)

  • dates (pandas DatetimeIndex) – the matching dates. This is essentially an array of dates that can be passed to matplotlib
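The selection by start and end, with both boundary dates included, can be illustrated with the standard library alone. The sketch below uses a hypothetical daily time axis and made-up values; it only demonstrates the inclusive range and does not reproduce how the function maps dates to the model.

```python
import datetime

# Hypothetical daily time axis and one value per date
data_begin = datetime.date(2020, 3, 1)
all_dates = [data_begin + datetime.timedelta(days=i) for i in range(10)]
values = [10 * i for i in range(10)]

start = datetime.date(2020, 3, 3)
end = datetime.date(2020, 3, 6)

# Keep every entry with start <= date <= end (both boundaries included)
selected = [(d, v) for d, v in zip(all_dates, values) if start <= d <= end]
dates = [d for d, _ in selected]
data = [v for _, v in selected]

print(len(dates), data)  # 4 [20, 30, 40, 50]
```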

Example

import covid19_inference as cov

model, idata = cov.create_example_instance()
y, x = cov.plot.utils.get_array_from_idata_via_date(
    model, idata, "lambda_t", model.data_begin, model.data_end
)
ax = cov.plot._timeseries(x, y[:, :, 0], what="model")

covid19_inference.plot.timeseries._new_cases_to_cum_cases(x, y, what, offset=0)

Converts new cases to cumulative cases. This conversion got ugly quickly because the dimensionality of y needs to be checked.

Parameters
  • x (pandas DatetimeIndex array) – will be padded accordingly

  • y (1d or 2d numpy array) – new cases matching dates in x. If 1d, we assume raw data (no samples). If 2d, we assume results from a trace, with 0th dim samples and 1st dim new cases matching x.

  • what (str) – dirty workaround to differentiate between traces and raw data: “data” or “trace”

  • offset (int or array like) – added to cum sum (should be the known cumulative case number at the first date provided in x)

Returns

  • x_cum (pandas DatetimeIndex array) – dates of the cumulative cases

  • y_cum (nd array) – cumulative cases matching x_cum and the dimension of input y
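The core of the conversion for the 1d case is a running sum shifted by the offset. The following sketch uses only the standard library and made-up numbers. It assumes the offset is the cumulative count known before the first entry, and it leaves out the padding of the dates and the 2d (samples) case that the actual function handles.

```python
from itertools import accumulate

# Hypothetical daily new cases, one entry per date in x
new_cases = [5, 8, 13, 21]

# Assumed here: cumulative count known before the first entry
offset = 100

# Running sum of the new cases, shifted by the offset
cum_cases = [offset + c for c in accumulate(new_cases)]

print(cum_cases)  # [105, 113, 126, 147]
```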

Example

cum_dates, cum_cases = _new_cases_to_cum_cases(new_dates, new_cases, what="data")

covid19_inference.plot.distribution._get_mpl_text_coordinates(text, ax)

Helper to get the coordinates of a text object in the coordinates of the axes element [0,1]. Used for the rectangle backdrop.

Returns: x_min, x_max, y_min, y_max

covid19_inference.plot.distribution._add_mpl_rect_around_text(text_list, ax, x_padding=0.05, y_padding=0.05, **kwargs)

Add a rectangle to the axes (behind the text).

Provide a list of text elements and possible options passed to mpl.patches.Rectangle, e.g. facecolor="grey", alpha=0.2, zorder=99.
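The geometry behind such a backdrop can be sketched without matplotlib: take the extents of all text elements (in the x_min, x_max, y_min, y_max form returned by the coordinate helper), form their common bounding box, and grow it by the padding. The function name enclosing_rect and the way the padding is applied are assumptions made for illustration, not the package's code.

```python
def enclosing_rect(boxes, x_padding=0.05, y_padding=0.05):
    """Return (x, y, width, height) of a padded box around all boxes.

    boxes: list of (x_min, x_max, y_min, y_max) tuples in axes coordinates.
    """
    x_min = min(b[0] for b in boxes) - x_padding
    x_max = max(b[1] for b in boxes) + x_padding
    y_min = min(b[2] for b in boxes) - y_padding
    y_max = max(b[3] for b in boxes) + y_padding
    return x_min, y_min, x_max - x_min, y_max - y_min


# Two hypothetical text extents in axes coordinates
rect = enclosing_rect([(0.1, 0.4, 0.8, 0.9), (0.1, 0.5, 0.7, 0.8)])
print(rect)
```

The result is ordered as lower-left corner, width, height, which matches the positional arguments of mpl.patches.Rectangle.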

covid19_inference.plot.utils.format_k(prec)

Format y axis tick labels, e.g. 10_000 as 10 k. format_k(0)(1200, 1000.0) gives “1 k”; format_k(1)(1200, 1000.0) gives “1.2 k”.
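The two calls in the description suggest a factory that takes the precision and returns a tick formatter with matplotlib's (value, position) signature. A minimal re-creation that reproduces the two documented outputs might look as follows; format_k_sketch is a hypothetical stand-in, and the real function may treat other inputs (for example values below 1000) differently.

```python
def format_k_sketch(prec):
    """Return a tick formatter that shows values in thousands."""

    def formatter(x, pos):
        # pos is the tick position handed over by matplotlib; unused here
        return f"{x / 1000:.{prec}f} k"

    return formatter


print(format_k_sketch(0)(1200, 1000.0))  # 1 k
print(format_k_sketch(1)(1200, 1000.0))  # 1.2 k
```

A formatter of this shape can be wrapped in matplotlib.ticker.FuncFormatter and set as the major formatter of the y axis.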

covid19_inference.plot.utils.format_date_xticks(ax, minor=None)

covid19_inference.plot.distribution._truncate_number(number, precision)

covid19_inference.plot.distribution._string_median_CI(arr, prec=2)

covid19_inference.plot.utils.add_watermark(ax, mark='Dehning et al. 10.1126/science.abb9789')

Add our watermark (by default the paper reference) to an axes as (upper right) title.

covid19_inference.plot.rcParams

alias of covid19_inference.plot.rcParams