Debugging¶
This is some pointer to help debugging models and sampling issues
General approach for nans/infs during sampling¶
The idea of this approach is to sample from the prior and then run the model. If the log likelihood is then -inf, there is a problem, and the output of the theano functions is inspected.
Sample from prior:
from pymc3.util import (
get_untransformed_name,
is_transformed_name)
varnames = list(map(str, model.vars))
for name in varnames:
if is_transformed_name(name):
varnames.append(get_untransformed_name(name))
with model:
points = pm.sample_prior_predictive(var_names = varnames)
points_list = []
for i in range(len(next(iter(points.values())))):
point_dict = {}
for name, val in points.items():
point_dict[name] = val[i]
points_list.append(point_dict)
points_list is a list of the starting points for the model, sampled from the prior. Then to run the model and print the log-likelihood:
fn = model.fn(model.logpt)
for point in points_list[:]:
print(fn(point))
To monitor the output and save it in a file (for use in ipython). Learned from: http://deeplearning.net/software/theano/tutorial/debug_faq.html#how-do-i-step-through-a-compiled-function
%%capture cap --no-stderr
def inspect_inputs(i, node, fn):
print(i, node, "input(s) value(s):", [input[0] for input in fn.inputs],
end='')
def inspect_outputs(i, node, fn):
print(" output(s) value(s):", [output[0] for output in fn.outputs])
fn_monitor = model.fn(model.logpt,
mode=theano.compile.MonitorMode(
pre_func=inspect_inputs,
post_func=inspect_outputs).excluding(
'local_elemwise_fusion', 'inplace'))
fn = model.fn(model.logpt)
for point in points_list[:]:
if fn(point) < -1e10:
print(fn_monitor(point))
break
In a new cell:
with open('output.txt', 'w') as f:
f.write(cap.stdout)
Then one can open output.txt in a text editor, and follow from where infs or nans come from by following the inputs and outputs up through the graph
Sampler: MCMC (Nuts)¶
Divergences¶
During sampling, a significant fraction of divergences are a sign that the sampler doesn’t sample the whole posterior. In this case the model should be reparametrized. See this tutorial for a typical example: https://docs.pymc.io/notebooks/Diagnosing_biased_Inference_with_Divergences.html
And these papers include some more details: https://pdfs.semanticscholar.org/7b85/fb48a077c679c325433fbe13b87560e12886.pdf https://arxiv.org/pdf/1312.0906.pdf
Bad initial energy¶
This typically occurs when some distribution in the model can’t be evaluated at the starting point of chain. Run this to see which distribution throws nans or infs:
for RV in model.basic_RVs:
print(RV.name, RV.logp(model.test_point))
However, this is only evaluates the test_point. When PyMC3 starts sampling, it adds some jitter around this test_point, which then could lead to nans. Run this to add jitter and then evaluate the logp:
chains=4
for RV in model.basic_RVs:
print(RV.name)
for _ in range(chains):
mean = {var: val.copy() for var, val in model.test_point.items()}
for val in mean.values():
val[...] += 2 * np.random.rand(*val.shape) - 1
print(RV.logp(mean))
This code could potentially change in newer versions of PyMC3 (this is tested in 3.8). Read the source code, to know which random jitter PyMC3 currently adds at beginning.
Nans occur during sampling¶
Run the sampler with the debug mode of Theano.
from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=False, big_is_error=False,
optimizer='o1')
trace = pm.sample(mode=mode)
However this doesn’t lead to helpful messages if nans occur during gradient evaluations.
Sampler: Variational Inference¶
There exist some ways to track parameters during sampling. An example:
with model:
advi = pm.ADVI()
print(advi.approx.group)
print(advi.approx.mean.eval())
print(advi.approx.std.eval())
tracker = pm.callbacks.Tracker(
mean=advi.approx.mean.eval, # callable that returns mean
std=advi.approx.std.eval # callable that returns std
)
approx = advi.fit(100000, callbacks=[tracker],
obj_optimizer=pm.adagrad_window(learning_rate=1e-3),)
#total_grad_norm_constraint=10) #constrains maximal gradient, could help
print(approx.groups[0].bij.rmap(approx.params[0].eval()))
plt.plot(tracker['mean'])
plt.plot(tracker['std'])
For the tracker, the order of the parameters is saved in:
approx.ordering.by_name
and the indices encoded there in the slc field. To plot the mean value of a given parameter name, run:
plt.plot(np.array(tracker['mean'])[:, approx.ordering.by_name['name'].slc]
The debug mode is set with the following parameter:
from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=False, big_is_error=False,
optimizer='o1')
approx = advi.fit(100000, callbacks=[tracker],
fn_kwargs={'mode':mode})