arviz_stats.loo
arviz_stats.loo(data, pointwise=None, var_name=None, reff=None, log_weights=None, pareto_k=None, log_jacobian=None)
Compute Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV).
Estimates the expected log pointwise predictive density (elpd) using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV). Also calculates LOO’s standard error and the effective number of parameters. The method is described in [1] and [2].
Parameters:
- data : xarray.DataTree or InferenceData
  Input data. It should contain the posterior and the log_likelihood groups.
- pointwise : bool, optional
  If True, the pointwise predictive accuracy will be returned. Defaults to rcParams["stats.ic_pointwise"].
- var_name : str, optional
  The name of the variable in the log_likelihood group storing the pointwise log likelihood data to use for the loo computation.
- reff : float, optional
  Relative MCMC efficiency, ess / n, i.e. the number of effective samples divided by the number of actual samples. Computed from the trace by default. A sketch of supplying this manually follows the parameter list.
- log_weights : xarray.DataArray, optional
  Smoothed log weights. Must have the same shape as the log likelihood data. Defaults to None; if not provided, it will be computed using the PSIS-LOO method. Must be provided together with pareto_k, or both must be None.
- pareto_k : xarray.DataArray, optional
  Pareto shape values. Must have the same shape as the log likelihood data. Defaults to None; if not provided, it will be computed using the PSIS-LOO method. Must be provided together with log_weights, or both must be None.
- log_jacobian : xarray.DataArray, optional
  Log-Jacobian adjustment for variable transformations. Required when the model was fitted on transformed response data \(z = T(y)\) but you want to compute the ELPD on the original response scale \(y\). The value should be \(\log\left|\frac{dz}{dy}\right|\) (the log absolute value of the derivative of the transformation). Must be a DataArray with dimensions matching the observation dimensions.
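The sketch below supplies reff manually instead of letting loo compute it. This is only an illustration: it assumes arviz_stats exposes a top-level ess function and that load_arviz_data returns a DataTree whose posterior group can be converted to a Dataset, and the averaging scheme (mean ESS across all posterior variables, divided by the total number of draws) is one reasonable choice rather than necessarily the library's internal formula.

import numpy as np
from arviz_base import load_arviz_data
from arviz_stats import ess, loo  # assumes a top-level `ess` is exposed

data = load_arviz_data("centered_eight")
posterior = data.posterior.to_dataset()  # assumes a DataTree input
n_samples = posterior.sizes["chain"] * posterior.sizes["draw"]

# Illustrative relative efficiency: average the ESS of every posterior
# variable, then divide by the actual number of draws.
ess_ds = ess(posterior)
ess_mean = np.mean([ess_ds[var].values.mean() for var in ess_ds.data_vars])
loo_data = loo(data, reff=float(ess_mean) / n_samples)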
Returns:
ELPDData
  Object with the following attributes:
  elpd: expected log pointwise predictive density
  se: standard error of the elpd
  p: effective number of parameters
  n_samples: number of samples
  n_data_points: number of data points
  warning: True if the estimated shape parameter of the Pareto distribution is greater than good_k
  elpd_i: DataArray with the pointwise predictive accuracy, only if pointwise=True
  pareto_k: array of Pareto shape values, only if pointwise=True
  good_k: for a sample size S, the threshold is computed as min(1 - 1/log10(S), 0.7); see the sketch after this list
  approx_posterior: True if an approximate posterior was used
  log_weights: smoothed log weights
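As a concrete check of the good_k formula, here is a minimal sketch for the sample size used in the examples below (S = 2000):

import numpy as np

S = 2000  # posterior sample size, as in the examples below
good_k = min(1 - 1 / np.log10(S), 0.7)
print(round(good_k, 3))  # 0.697, which the summary table displays rounded as 0.70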
See also
compare
Compare models based on their ELPD.
arviz_plots.plot_compare
Summary plot for model comparison.
References
[1] Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5) (2017). https://doi.org/10.1007/s11222-016-9696-4. arXiv preprint: https://arxiv.org/abs/1507.04544.
[2] Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72) (2024). https://jmlr.org/papers/v25/19-556.html. arXiv preprint: https://arxiv.org/abs/1507.02646.
Examples
Calculate LOO of a model:
In [1]: from arviz_stats import loo
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data("centered_eight")
   ...: loo_data = loo(data)
   ...: loo_data

Out[1]:
Computed from 2000 posterior samples and 8 observations log-likelihood matrix.

         Estimate       SE
elpd_loo   -30.78     1.35
p_loo        0.95        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        8  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%
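Because the returned ELPDData stores the smoothed log_weights (and, when pointwise=True, the pareto_k values), they can be passed back to loo to skip recomputing PSIS. A minimal sketch, reusing data from above; remember that both arguments must be supplied together:

# Reuse the PSIS quantities from a previous call.
loo_pw = loo(data, pointwise=True)  # pareto_k is attached when pointwise=True
loo_reused = loo(
    data,
    log_weights=loo_pw.log_weights,
    pareto_k=loo_pw.pareto_k,
)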
Return the pointwise values:
In [2]: loo_data.elpd_i

Out[2]:
<xarray.DataArray 'obs' (school: 8)> Size: 64B
array([-4.89190585, -3.41968688, -3.86727944, -3.46496198, -3.4780878 ,
       -3.49926442, -4.2004696 , -3.95934834])
Coordinates:
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
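The per-observation Pareto k diagnostics can be inspected the same way. A small sketch: sorting them from largest to smallest is a quick way to spot the observations whose elpd_i estimates are least reliable (any value above good_k triggers the warning flag):

# Observations with the largest k are the least reliable; compare against
# loo_data.good_k (about 0.7 here).
k = loo_data.pareto_k
print(k.sortby(k, ascending=False))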
If a model was fit on a transformed response \(z = T(y)\) (e.g., \(z=\sqrt{y}\)), you can report LOO results on the original \(y\) scale by adding the log-Jacobian of the forward transform to each pointwise log-likelihood value via the log_jacobian argument. For example, with \(z=\sqrt{y}\) (and \(y>0\)), the derivative of \(z\) with respect to \(y\) is \(\tfrac{dz}{dy} = \tfrac{1}{2\sqrt{y}}\), so the log-Jacobian is \(\log\!\left|\tfrac{dz}{dy}\right| = -\log 2 - \tfrac{1}{2}\log y\):
In [3]: import numpy as np
   ...: import xarray as xr
   ...: from arviz_stats import loo
   ...: from arviz_base import load_arviz_data
   ...: data = load_arviz_data("centered_eight")
   ...: y = data.observed_data["obs"]
   ...: y_positive = y - y.min() + 1  # Positive values for sqrt
   ...: log_jacobian_values = -np.log(2) - 0.5 * np.log(y_positive)
   ...: log_jacobian = xr.DataArray(
   ...:     log_jacobian_values,
   ...:     dims=y.dims,
   ...:     coords=y.coords
   ...: )
   ...: loo_data_adjusted = loo(data, log_jacobian=log_jacobian)
   ...: loo_data_adjusted

Out[3]:
Computed from 2000 posterior samples and 8 observations log-likelihood matrix.

         Estimate       SE
elpd_loo   -44.79     2.49
p_loo        0.95        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        8  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%
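Because the log-Jacobian enters additively in each pointwise term, the adjusted total equals the unadjusted elpd plus the summed adjustment; indeed, -30.78 plus the summed log-Jacobian gives the -44.79 reported above. A quick sketch to verify, reusing loo_data and loo_data_adjusted from the examples above:

# The adjustment is additive per observation, so the two totals differ by
# exactly the summed log-Jacobian (up to floating point error).
print(np.isclose(
    loo_data_adjusted.elpd,
    loo_data.elpd + float(log_jacobian.sum()),
))  # expected: True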