arviz_stats.loo_pit

arviz_stats.loo_pit(data, var_names=None, log_weights=None)

Compute leave-one-out (PSIS-LOO) probability integral transform (PIT) values.

The LOO-PIT values are \(p(\tilde{y}_i \le y_i \mid y_{-i})\), where \(y_i\) is the observed data for index \(i\) and \(\tilde{y}_i\) is a posterior predictive sample for index \(i\). The notation \(y_{-i}\) indicates that the \(i\)-th observation has been left out. LOO-PIT values are computed using the PSIS-LOO-CV method described in [1] and [2].
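The definition above can be read as a weighted empirical CDF: for each observation, sum the normalized importance weights of the posterior predictive draws that fall at or below the observed value. The following is a minimal, dependency-free sketch of that reading, not the library's implementation. The function name `loo_pit_sketch` and the toy inputs are hypothetical, and the log weights are assumed to be already smoothed (for example by PSIS).

```python
import math


def loo_pit_sketch(y_obs, y_pred, log_weights):
    """Weighted PIT per observation (illustrative only).

    y_obs: list of n observed values.
    y_pred: list of S draws, each a list of n posterior predictive values.
    log_weights: list of S draws, each a list of n unnormalized log weights.
    """
    pit = []
    for i in range(len(y_obs)):
        lw = [row[i] for row in log_weights]
        # Subtract the maximum before exponentiating for numerical stability.
        max_lw = max(lw)
        w = [math.exp(v - max_lw) for v in lw]
        total = sum(w)
        # Weighted fraction of predictive draws at or below the observation.
        below = sum(w_s for w_s, row in zip(w, y_pred) if row[i] <= y_obs[i])
        pit.append(below / total)
    return pit


# Hypothetical toy input: 4 draws, 2 observations, equal weights.
y_obs = [0.5, 2.0]
y_pred = [[0.0, 1.0], [1.0, 3.0], [0.2, 1.5], [0.9, 2.5]]
log_weights = [[0.0, 0.0]] * 4
pit = loo_pit_sketch(y_obs, y_pred, log_weights)
```

With equal weights the result reduces to the ordinary posterior predictive PIT; unequal weights shift each value toward the draws that matter most once observation \(i\) is left out.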

Parameters:
data: xarray.DataTree or InferenceData

Input data. It should contain posterior, posterior_predictive, observed_data and log_likelihood groups.

var_names: str or list of str, optional

Names of the variables to be used to compute the LOO-PIT values. If None, all variables are used. The function assumes that the observed and log_likelihood variables share the same names.

log_weights: DataArray or ELPDData, optional

Smoothed log weights. Can be either:

  • A DataArray with the same shape as y_pred (the posterior predictive samples).

  • An ELPDData object from a previous arviz_stats.loo call.

Defaults to None. If not provided, the weights are computed using the PSIS-LOO method.

Returns:
loo_pit: array or xarray.DataArray

Value of the LOO-PIT at each observed data point.

References

[1]

Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5) (2017) https://doi.org/10.1007/s11222-016-9696-4 arXiv preprint https://arxiv.org/abs/1507.04544.

[2]

Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research. 25(72) (2024) https://jmlr.org/papers/v25/19-556.html arXiv preprint https://arxiv.org/abs/1507.02646.

Examples

Calculate LOO-PIT values using the observed values themselves as the test quantity.

In [1]: from arviz_stats import loo_pit
   ...: from arviz_base import load_arviz_data
   ...: dt = load_arviz_data("centered_eight")
   ...: loo_pit(dt)
   ...: 
Out[1]: 
<xarray.Dataset> Size: 576B
Dimensions:  (school: 8)
Coordinates:
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    obs      (school) float64 64B 0.9435 0.638 0.3167 ... 0.4025 0.9025 0.6553

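Calculate LOO-PIT values using the squared difference between each observation and mu as the test quantity. To do this, create a new DataTree that copies the posterior and log_likelihood groups and defines new observed_data and posterior_predictive groups. The observed quantity uses the posterior median of mu, while the posterior predictive quantity uses the value of mu from each draw.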

In [2]: from arviz_base import from_dict
   ...: new_dt = from_dict({"posterior": dt.posterior,
   ...:                 "log_likelihood": dt.log_likelihood,
   ...:                 "observed_data": {
   ...:                     "obs": (dt.observed_data.obs
   ...:                            - dt.posterior.mu.median(dim=("chain", "draw")))**2},
   ...:                 "posterior_predictive": {
   ...:                     "obs": (dt.posterior_predictive.obs - dt.posterior.mu)**2}})
   ...: loo_pit(new_dt)
   ...: 
Out[2]: 
<xarray.Dataset> Size: 128B
Dimensions:    (obs_dim_0: 8)
Coordinates:
  * obs_dim_0  (obs_dim_0) int64 64B 0 1 2 3 4 5 6 7
Data variables:
    obs        (obs_dim_0) float64 64B 0.8737 0.2437 0.3575 ... 0.775 0.2967
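
The test-quantity construction in the example above can also be traced by hand on a few numbers. The sketch below uses only the standard library; all values are hypothetical, and equal weights stand in for the PSIS weights, so it illustrates the transformation of both sides rather than the leave-one-out step itself.

```python
import statistics

# Hypothetical toy values: 4 posterior draws of mu and 2 observations.
mu_draws = [0.0, 1.0, 2.0, 3.0]
y_obs = [1.0, 4.0]
y_pred = [[0.5, 2.0], [1.5, 0.0], [2.5, 5.0], [2.0, 3.5]]  # indexed [draw][obs]

# Observed test quantity: squared distance to the posterior median of mu.
mu_median = statistics.median(mu_draws)
t_obs = [(y - mu_median) ** 2 for y in y_obs]

# Predictive test quantity: squared distance to mu from the same draw.
t_pred = [[(y - mu) ** 2 for y in row] for row, mu in zip(y_pred, mu_draws)]

# Equal weights stand in for the PSIS weights (assumption for brevity).
n_draws = len(t_pred)
pit = [
    sum(row[i] <= t_obs[i] for row in t_pred) / n_draws
    for i in range(len(t_obs))
]
```

The same transformation must be applied to the observed and the predictive values; otherwise the two sides of the comparison \(\tilde{y}_i \le y_i\) are no longer on the same scale.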