arviz_stats.loo_i

arviz_stats.loo_i(i, data, var_name=None, reff=None, log_weights=None, pareto_k=None)

Compute PSIS-LOO-CV for a single observation.

Estimates the expected log pointwise predictive density (elpd) using Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV) for a single observation. The method is described in [1] and [2].
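To make the estimated quantity concrete, the following is a minimal sketch of importance-sampling LOO for one observation, following the estimator in [1]. It is not the library's implementation: it uses raw importance ratios and skips the Pareto smoothing step that loo_i applies, and the helper names logsumexp and is_loo_single are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a sequence of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def is_loo_single(log_lik):
    """Plain importance-sampling LOO for one observation (no Pareto smoothing).

    log_lik holds log p(y_i | theta_s) for each posterior draw s.
    Returns (elpd_i, p_i). Hypothetical helper, for illustration only.
    """
    n = len(log_lik)
    # Raw log importance ratios: r_s is proportional to 1 / p(y_i | theta_s).
    log_w = [-ll for ll in log_lik]
    # elpd_i = log( sum_s w_s p(y_i | theta_s) / sum_s w_s )
    numerator = logsumexp([w + ll for w, ll in zip(log_w, log_lik)])
    elpd_i = numerator - logsumexp(log_w)
    # lpd_i is the log of the posterior-mean likelihood (nothing held out).
    lpd_i = logsumexp(log_lik) - math.log(n)
    # Effective number of parameters for this observation.
    return elpd_i, lpd_i - elpd_i


elpd_i, p_i = is_loo_single([-1.0, -2.0, -3.0])
```

With raw ratios, elpd_i reduces to the log of the harmonic mean of the per-draw likelihoods, so it can never exceed lpd_i and p_i is non-negative. PSIS replaces the largest raw ratios with smoothed values before this step to stabilize the estimate.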

Parameters:
i : int | dict | scalar

Observation selector. Must be one of:

  • int: Positional index in flattened observation order across all observation dimensions.

  • dict: Label-based mapping {obs_dim: coord_value} for all observation dimensions. Uses .sel semantics.

  • scalar label: Only when there is exactly one observation dimension.
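As an illustration of how a positional index relates to per-dimension positions, here is a small sketch that converts a flat index into a position along each observation dimension. It assumes row-major (C-order) flattening, which the description above does not state explicitly, and the helper name unravel is hypothetical.

```python
def unravel(flat_index, shape):
    """Convert a flat index into per-dimension positions.

    Assumes row-major (C-order) flattening; this ordering is an
    assumption for illustration, not something this page confirms.
    """
    coords = []
    # Peel off the fastest-varying (last) dimension first.
    for size in reversed(shape):
        flat_index, pos = divmod(flat_index, size)
        coords.append(pos)
    return tuple(reversed(coords))


# For observations with shape (3, 4), flat index 6 is row 1, column 2.
position = unravel(6, (3, 4))
```

Under that assumption, an int selector of 6 and a dict selector naming position 1 on the first dimension and position 2 on the second would refer to the same observation, provided the coordinates are the default integer ranges.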

data : xarray.DataTree or InferenceData

Input data. It should contain the posterior and the log_likelihood groups.

var_name : str, optional

Name of the variable in the log_likelihood group that stores the pointwise log likelihood data used for the LOO computation.

reff : float, optional

Relative MCMC efficiency, ess / n, i.e. the effective sample size divided by the actual number of samples. Computed from the trace by default.

log_weights : xarray.DataArray, optional

Smoothed log weights for observation i. If not provided, they are computed using PSIS. Must be provided together with pareto_k; otherwise both must be None.

pareto_k : float, optional

Pareto shape value for observation i. If not provided, it is computed using PSIS. Must be provided together with log_weights; otherwise both must be None.

Returns:
ELPDData

Object with the following attributes:

  • elpd: expected log pointwise predictive density for observation i

  • se: standard error (set to 0.0 as SE is undefined for a single observation)

  • p: effective number of parameters for observation i

  • n_samples: number of samples

  • n_data_points: 1 (single observation)

  • warning: True if the estimated shape parameter of the Pareto distribution is greater than good_k

  • elpd_i: DataArray with single value

  • pareto_k: DataArray with single Pareto shape value

  • good_k: threshold for the Pareto shape diagnostic. For a sample size S, it is computed as min(1 - 1/log10(S), 0.7)

  • log_weights: Smoothed log weights for observation i
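The good_k formula in the list above can be checked by hand. The sketch below evaluates it with the standard library only; the function name good_k_threshold is hypothetical and is not part of arviz_stats.

```python
import math


def good_k_threshold(n_samples):
    """Evaluate min(1 - 1/log10(S), 0.7) for a sample size S.

    Hypothetical helper that mirrors the formula given for good_k.
    """
    return min(1 - 1 / math.log10(n_samples), 0.7)


small = good_k_threshold(200)   # two chains of 100 draws
large = good_k_threshold(2000)  # e.g. the centered_eight posterior
```

For 200 samples the threshold rounds to 0.57, and for 2000 samples it rounds to 0.70. These are the bin edges shown in the Pareto k diagnostic tables of the examples on this page.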

See also

loo

Compute LOO for all observations

compare

Compare models based on their ELPD.

Notes

This function is useful for testing log-likelihood functions and getting detailed diagnostics for individual observations. It’s particularly helpful when debugging PSIS-LOO-CV computations for large datasets using loo_subsample with the PLPD approximation method, or when verifying log-likelihood implementations with loo_moment_match.

Since this computes PSIS-LOO-CV for a single observation, the standard error is set to 0.0 as variance cannot be computed from a single value.

References

[1]

Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5) (2017). https://doi.org/10.1007/s11222-016-9696-4 arXiv preprint https://arxiv.org/abs/1507.04544

[2]

Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72) (2024). https://jmlr.org/papers/v25/19-556.html arXiv preprint https://arxiv.org/abs/1507.02646

Examples

Calculate LOO for a single observation using the school name:

In [1]: from arviz_stats import loo_i
   ...: from arviz_base import load_arviz_data
   ...: import xarray as xr
   ...: data = load_arviz_data("centered_eight")
   ...: loo_i({"school": "Choate"}, data)
   ...: 
Out[1]: 
Computed from 2000 posterior samples and 1 observations log-likelihood matrix.

         Estimate       SE
elpd_loo    -4.89     0.00
p_loo        0.28        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        1  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%

To use positional integer indexing over the flattened observations instead, pass an int:

In [2]: loo_i(0, data)
Out[2]: 
Computed from 2000 posterior samples and 1 observations log-likelihood matrix.

         Estimate       SE
elpd_loo    -4.89     0.00
p_loo        0.28        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        1  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%

For multi-dimensional data, specify all observation dimensions in the dict. For example, with data that has two observation dimensions (y_dim_0 and y_dim_1) carrying default integer coordinates, you can select an observation by its coordinate values:

In [3]: import arviz_base as azb
   ...: import numpy as np
   ...: np.random.seed(0)
   ...: idata = azb.from_dict({
   ...:     "posterior": {"theta": np.random.randn(2, 100, 3, 4)},
   ...:     "log_likelihood": {"y": np.random.randn(2, 100, 3, 4)},
   ...:     "observed_data": {"y": np.random.randn(3, 4)},
   ...: })
   ...: loo_i({"y_dim_0": 1, "y_dim_1": 2}, idata)
   ...: 
Out[3]: 
Computed from 200 posterior samples and 1 observations log-likelihood matrix.

         Estimate       SE
elpd_loo    -0.53     0.00
p_loo        0.95        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.57]   (good)        1  100.0%
   (0.57, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%

With a single observation dimension, you can pass a single label directly:

In [4]: loo_i("Choate", data)
Out[4]: 
Computed from 2000 posterior samples and 1 observations log-likelihood matrix.

         Estimate       SE
elpd_loo    -4.89     0.00
p_loo        0.28        -
------

Pareto k diagnostic values:
                         Count   Pct.
(-Inf, 0.70]   (good)        1  100.0%
   (0.70, 1]   (bad)         0    0.0%
    (1, Inf)   (very bad)    0    0.0%