arviz_stats.summary

arviz_stats.summary(data, var_names=None, filter_vars=None, group='posterior', coords=None, sample_dims=None, kind='all', ci_prob=None, ci_kind=None, round_to=2, skipna=False)

Create a data frame with summary statistics and/or diagnostics.

Parameters:
data: xarray.DataTree, DataSet or InferenceData
var_names: list of str, optional

Names of variables to include in the summary. If None, all variables are included.

filter_vars: {None, “like”, “regex”}, default None

Used for var_names only. If None (default), interpret var_names as the exact variable names. If “like”, interpret var_names as substrings of the variable names. If “regex”, interpret var_names as regular expressions matched against the variable names.

group: str

Select a group for summary. Defaults to “posterior”.

coords: dict, optional

Coordinates defining a subset over the selected group.

sample_dims: str or sequence of hashable, optional

Defaults to rcParams["data.sample_dims"].

kind: {‘all’, ‘stats’, ‘diagnostics’, ‘all_median’, ‘stats_median’, ‘diagnostics_median’, ‘mc_diagnostics’}, default ‘all’
  • all: mean, sd, ci, ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.

  • stats: mean, sd, and ci.

  • diagnostics: ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.

  • all_median: median, mad, ci, ess_median, ess_tail, r_hat, mcse_median.

  • stats_median: median, mad, and ci.

  • diagnostics_median: ess_median, ess_tail, r_hat, mcse_median.

  • mc_diagnostics: mcse_mean, ess_mean, and min_ss.

ci_prob: float, optional

Probability for the credible interval. Defaults to rcParams["stats.ci_prob"].

ci_kind: {“hdi”, “eti”}, optional

Type of credible interval. Defaults to rcParams["stats.ci_kind"]. If kind is stats_median or all_median, ci_kind is forced to “eti”.

round_to: int or “none”

Number of decimals used to round results. Defaults to 2. Use “none” to return raw numbers.

skipna: bool

If True, ignore NaN values when computing the summary statistics. Defaults to False.

Returns:
pandas.DataFrame

See also

arviz.rhat

Compute estimate of rank normalized split R-hat for a set of traces.

arviz.ess

Calculate the effective sample size of a set of traces.

arviz.mcse

Calculate Markov Chain Standard Error statistic.

plot_ess

Plot quantile, local or evolution of effective sample sizes (ESS).

plot_mcse

Plot quantile, local or evolution of Markov Chain Standard Error (MCSE).

Examples

In [1]: from arviz_base import load_arviz_data
   ...: from arviz_stats import summary
   ...: data = load_arviz_data("non_centered_eight")
   ...: summary(data, var_names=["mu", "tau"])
   ...: 
Out[1]: 
     mean    sd  eti94_lb  eti94_ub  ...  ess_tail  r_hat  mcse_mean  mcse_sd
mu   4.37  3.29     -1.92     10.48  ...   1060.38    1.0       0.08     0.06
tau  3.72  3.10      0.19     11.16  ...    868.12    1.0       0.08     0.09

[2 rows x 9 columns]
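
The eti94_lb and eti94_ub columns in the output above appear to be the lower and upper bounds of a 94% equal-tailed interval (ci_kind "eti" with ci_prob 0.94). As a rough illustration of what such an interval means, the sketch below computes one from a list of draws using only the standard library. The helper name equal_tailed_interval is hypothetical, and the linear-interpolation quantile rule is an assumption; this is not arviz_stats' own implementation.

```python
def equal_tailed_interval(draws, ci_prob=0.94):
    """Hypothetical helper: (lower, upper) bounds of an equal-tailed interval.

    An equal-tailed interval leaves the same probability mass,
    (1 - ci_prob) / 2, in each tail of the distribution of draws.
    """
    ordered = sorted(draws)
    tail = (1 - ci_prob) / 2

    def quantile(q):
        # Assumed quantile rule: linear interpolation between the two
        # nearest order statistics.
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return quantile(tail), quantile(1 - tail)


# Evenly spaced toy "draws" on [0, 1], used only for illustration.
draws = [i / 100 for i in range(101)]
lower, upper = equal_tailed_interval(draws, ci_prob=0.94)
print(round(lower, 2), round(upper, 2))
```

With ci_prob=0.94 each tail holds 3% of the draws, so the bounds sit at the 3rd and 97th percentiles. A highest density interval (ci_kind "hdi") is instead the narrowest interval containing the requested probability, so the two can differ for skewed posteriors.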

You can use filter_vars to select variables without having to specify all the exact names. Use filter_vars="like" to select based on partial naming:

In [2]: summary(data, var_names=["the"], filter_vars="like")
Out[2]: 
                           mean    sd  eti94_lb  ...  r_hat  mcse_mean  mcse_sd
theta_t[Choate]            0.34  1.07     -1.74  ...    1.0       0.02     0.02
theta_t[Deerfield]         0.14  0.97     -1.70  ...    1.0       0.02     0.01
theta_t[Phillips Andover] -0.08  1.00     -1.87  ...    1.0       0.02     0.01
theta_t[Phillips Exeter]   0.05  0.92     -1.69  ...    1.0       0.02     0.01
theta_t[Hotchkiss]        -0.17  0.93     -1.87  ...    1.0       0.02     0.02
theta_t[Lawrenceville]    -0.09  0.89     -1.70  ...    1.0       0.02     0.01
theta_t[St. Paul's]        0.40  0.95     -1.38  ...    1.0       0.02     0.01
theta_t[Mt. Hermon]        0.09  0.99     -1.76  ...    1.0       0.02     0.01
theta[Choate]              6.42  5.66     -2.09  ...    1.0       0.13     0.15
theta[Deerfield]           5.02  4.82     -3.99  ...    1.0       0.10     0.10
theta[Phillips Andover]    3.88  5.45     -7.60  ...    1.0       0.13     0.13
theta[Phillips Exeter]     4.51  4.69     -4.65  ...    1.0       0.10     0.09
theta[Hotchkiss]           3.50  4.80     -6.21  ...    1.0       0.11     0.10
theta[Lawrenceville]       4.04  4.77     -5.43  ...    1.0       0.12     0.11
theta[St. Paul's]          6.51  5.23     -2.08  ...    1.0       0.12     0.12
theta[Mt. Hermon]          4.85  5.49     -4.55  ...    1.0       0.12     0.16

[16 rows x 9 columns]

Use filter_vars="regex" to select based on regular expressions, and prefix the variables you want to exclude with ~. Here, we exclude from the summary all the variables starting with the letter t:

In [3]: summary(data, var_names=["~^t"], filter_vars="regex")
Out[3]: 
    mean    sd  eti94_lb  eti94_ub  ...  ess_tail  r_hat  mcse_mean  mcse_sd
mu  4.37  3.29     -1.92     10.48  ...   1060.38    1.0       0.08     0.06

[1 rows x 9 columns]
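
The selection rules used in the examples above can be sketched in plain Python. The helper below, select_names, is hypothetical and only mimics the behavior described for var_names and filter_vars (exact match, substring match, regular-expression search, and exclusion with a ~ prefix). It assumes the patterns are either all prefixed with ~ or none are, and it is not the library's actual implementation.

```python
import re


def select_names(all_names, var_names, filter_vars=None):
    """Hypothetical helper mimicking the documented var_names/filter_vars rules."""
    # Assumption: patterns are either all "~"-prefixed (exclusion) or none are.
    exclude = bool(var_names) and all(p.startswith("~") for p in var_names)
    patterns = [p[1:] if exclude else p for p in var_names]

    def matches(name, pattern):
        if filter_vars is None:
            return name == pattern  # exact variable name
        if filter_vars == "like":
            return pattern in name  # substring of the variable name
        if filter_vars == "regex":
            return re.search(pattern, name) is not None
        raise ValueError(f"unknown filter_vars: {filter_vars!r}")

    hits = [n for n in all_names if any(matches(n, p) for p in patterns)]
    if exclude:
        # Keep everything that did NOT match an excluded pattern.
        return [n for n in all_names if n not in hits]
    return hits


# Variable names taken from the outputs shown above.
names = ["mu", "theta_t", "tau", "theta"]
exact = select_names(names, ["mu", "tau"])
like = select_names(names, ["the"], filter_vars="like")
regex = select_names(names, ["~^t"], filter_vars="regex")
print(exact, like, regex)
```

Under these assumptions the three calls pick out the same variables as the three summaries above: mu and tau, then theta_t and theta, then mu alone.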