arviz_stats.loo_i#
- arviz_stats.loo_i(i, data, var_name=None, reff=None, log_weights=None, pareto_k=None)[source]#
Compute PSIS-LOO-CV for a single observation.
Estimates the expected log pointwise predictive density (elpd) using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV) for a single observation. The method is described in [1] and [2].
- Parameters:
- i
int
|dict
| scalar Observation selector. Must be one of:
int: Positional index in flattened observation order across all observation dimensions.
dict: Label-based mapping
{obs_dim: coord_value}
for all observation dimensions. Uses.sel
semantics.scalar label: Only when there is exactly one observation dimension.
- data
xarray.DataTree
orInferenceData
Input data. It should contain the posterior and the log_likelihood groups.
- var_name
str
, optional The name of the variable in log_likelihood groups storing the pointwise log likelihood data to use for loo computation.
- reff
float
, optional Relative MCMC efficiency,
ess / n
i.e. number of effective samples divided by the number of actual samples. Computed from trace by default.- log_weights
xarray.DataArray
, optional Smoothed log weights for observation i. If not provided, will be computed using PSIS. Must be provided together with pareto_k or both must be None.
- pareto_k
float
, optional Pareto shape value for observation i. If not provided, will be computed using PSIS. Must be provided together with log_weights or both must be None.
- i
- Returns:
ELPDData
Object with the following attributes:
elpd: expected log pointwise predictive density for observation i
se: standard error (set to 0.0 as SE is undefined for a single observation)
p: effective number of parameters for observation i
n_samples: number of samples
n_data_points: 1 (single observation)
warning: True if the estimated shape parameter of Pareto distribution is greater than
good_k
elpd_i:
DataArray
with single valuepareto_k:
DataArray
with single Pareto shape valuegood_k: For a sample size S, the threshold is computed as
min(1 - 1/log10(S), 0.7)
log_weights: Smoothed log weights for observation i
Notes
This function is useful for testing log-likelihood functions and getting detailed diagnostics for individual observations. It’s particularly helpful when debugging PSIS-LOO-CV computations for large datasets using
loo_subsample
with the PLPD approximation method, or when verifying log-likelihood implementations withloo_moment_match
.Since this computes PSIS-LOO-CV for a single observation, the standard error is set to 0.0 as variance cannot be computed from a single value.
References
[1]Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5) (2017) https://doi.org/10.1007/s11222-016-9696-4 arXiv preprint https://arxiv.org/abs/1507.04544.
[2]Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72) (2024) https://jmlr.org/papers/v25/19-556.html arXiv preprint https://arxiv.org/abs/1507.02646
Examples
Calculate LOO for a single observation using the school name:
In [1]: from arviz_stats import loo_i ...: from arviz_base import load_arviz_data ...: import xarray as xr ...: data = load_arviz_data("centered_eight") ...: loo_i({"school": "Choate"}, data) ...: Out[1]: Computed from 2000 posterior samples and 1 observations log-likelihood matrix. Estimate SE elpd_loo -4.89 0.00 p_loo 0.28 - ------ Pareto k diagnostic values: Count Pct. (-Inf, 0.70] (good) 1 100.0% (0.70, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0%
If you prefer simple integer indexing across flattened observations, you can use the index:
In [2]: loo_i(0, data) Out[2]: Computed from 2000 posterior samples and 1 observations log-likelihood matrix. Estimate SE elpd_loo -4.89 0.00 p_loo 0.28 - ------ Pareto k diagnostic values: Count Pct. (-Inf, 0.70] (good) 1 100.0% (0.70, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0%
For multi-dimensional data, specify all observation dimensions. For example, with data that has two observation dimensions (y_dim_0 and y_dim_1), you can select by index:
In [3]: import arviz_base as azb ...: import numpy as np ...: np.random.seed(0) ...: idata = azb.from_dict({ ...: "posterior": {"theta": np.random.randn(2, 100, 3, 4)}, ...: "log_likelihood": {"y": np.random.randn(2, 100, 3, 4)}, ...: "observed_data": {"y": np.random.randn(3, 4)}, ...: }) ...: loo_i({"y_dim_0": 1, "y_dim_1": 2}, idata) ...: Out[3]: Computed from 200 posterior samples and 1 observations log-likelihood matrix. Estimate SE elpd_loo -0.53 0.00 p_loo 0.95 - ------ Pareto k diagnostic values: Count Pct. (-Inf, 0.57] (good) 1 100.0% (0.57, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0%
With a single observation dimension, you can pass a single label directly:
In [4]: loo_i("Choate", data) Out[4]: Computed from 2000 posterior samples and 1 observations log-likelihood matrix. Estimate SE elpd_loo -4.89 0.00 p_loo 0.28 - ------ Pareto k diagnostic values: Count Pct. (-Inf, 0.70] (good) 1 100.0% (0.70, 1] (bad) 0 0.0% (1, Inf) (very bad) 0 0.0%