arviz_stats.summary
- arviz_stats.summary(data, var_names=None, filter_vars=None, group='posterior', coords=None, sample_dims=None, kind='all', ci_prob=None, ci_kind=None, round_to=2, skipna=False)
Create a data frame with summary statistics and/or diagnostics.
- Parameters:
- data: xarray.DataTree, DataSet or InferenceData
- var_names: list of str, optional
Names of variables to include in the summary. If None, all variables are included.
- filter_vars: {None, "like", "regex"}, default None
Used for var_names only. If None (default), interpret var_names as the real variable names. If "like", interpret var_names as substrings of the real variable names. If "regex", interpret var_names as regular expressions on the real variable names.
- group: str
Select a group for summary. Defaults to "posterior".
- coords: dict, optional
Coordinates defining a subset over the selected group.
- sample_dims: str or sequence of hashable, optional
Defaults to rcParams["data.sample_dims"].
- kind: {"all", "stats", "diagnostics", "all_median", "stats_median", "diagnostics_median", "mc_diagnostics"}, default "all"
  - all: mean, sd, ci, ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.
  - stats: mean, sd, and ci.
  - diagnostics: ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.
  - all_median: median, mad, ci, ess_median, ess_tail, r_hat, mcse_median.
  - stats_median: median, mad, and ci.
  - diagnostics_median: ess_median, ess_tail, r_hat, mcse_median.
  - mc_diagnostics: mcse_mean, ess_mean, and min_ss.
- ci_prob: float, optional
Probability for the credible interval. Defaults to rcParams["stats.ci_prob"].
- ci_kind: {"hdi", "eti"}, optional
Type of credible interval. Defaults to rcParams["stats.ci_kind"]. If kind is stats_median or all_median, ci_kind is forced to "eti".
- round_to: int
Number of decimals used to round results. Defaults to 2. Use "none" to return raw numbers.
- skipna: bool
If True, ignore NaN values when computing the summary statistics. Defaults to False.
- Returns:
pandas.DataFrame
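For quick reference, the kind options described in the parameter list can be written as a lookup table. This is only a sketch: KIND_COLUMNS and expected_columns are hypothetical names that are not part of arviz_stats, the contents are transcribed from the parameter description, and the idea that "ci" expands into a lower-bound and an upper-bound column is an assumption inferred from the eti94_lb and eti94_ub columns in the example output on this page.

```python
# Hypothetical lookup table transcribed from the `kind` parameter description.
# It is NOT part of arviz_stats; it only restates the documentation.
KIND_COLUMNS = {
    "all": ["mean", "sd", "ci", "ess_bulk", "ess_tail", "r_hat", "mcse_mean", "mcse_sd"],
    "stats": ["mean", "sd", "ci"],
    "diagnostics": ["ess_bulk", "ess_tail", "r_hat", "mcse_mean", "mcse_sd"],
    "all_median": ["median", "mad", "ci", "ess_median", "ess_tail", "r_hat", "mcse_median"],
    "stats_median": ["median", "mad", "ci"],
    "diagnostics_median": ["ess_median", "ess_tail", "r_hat", "mcse_median"],
    "mc_diagnostics": ["mcse_mean", "ess_mean", "min_ss"],
}


def expected_columns(kind="all"):
    """Count table columns, assuming "ci" becomes two columns (lower, upper)."""
    return sum(2 if name == "ci" else 1 for name in KIND_COLUMNS[kind])


print(expected_columns("all"))
```

Under that assumption, kind="all" yields nine columns, which agrees with the "[2 rows x 9 columns]" footer in the Examples section.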
See also
arviz.rhat
Compute estimate of rank normalized split R-hat for a set of traces.
arviz.ess
Calculate the effective sample size of a set of traces.
arviz.mcse
Calculate Markov Chain Standard Error statistic.
plot_ess
Plot quantile, local or evolution of effective sample sizes (ESS).
plot_mcse
Plot quantile, local or evolution of Markov Chain Standard Error (MCSE).
Examples
In [1]: from arviz_base import load_arviz_data
   ...: from arviz_stats import summary
   ...: data = load_arviz_data("non_centered_eight")
   ...: summary(data, var_names=["mu", "tau"])
   ...:
Out[1]:
     mean    sd  eti94_lb  eti94_ub  ...  ess_tail  r_hat  mcse_mean  mcse_sd
mu   4.37  3.29     -1.92     10.48  ...   1060.38    1.0       0.08     0.06
tau  3.72  3.10      0.19     11.16  ...    868.12    1.0       0.08     0.09

[2 rows x 9 columns]
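The eti94_lb and eti94_ub columns above are the bounds of a 94% equal-tailed interval (ci_kind="eti" with ci_prob=0.94): each tail outside the interval holds (1 - ci_prob) / 2 of the draws. The pure-Python sketch below illustrates that definition on toy data. The helper names quantile and equal_tailed_interval are hypothetical, and the linear-interpolation quantile rule is an assumption; arviz_stats may compute quantiles differently.

```python
def quantile(sorted_draws, q):
    """Linearly interpolated quantile of pre-sorted values, with 0 <= q <= 1."""
    pos = q * (len(sorted_draws) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_draws) - 1)
    frac = pos - lower
    return sorted_draws[lower] + frac * (sorted_draws[upper] - sorted_draws[lower])


def equal_tailed_interval(draws, ci_prob=0.94):
    """Return (lower, upper), leaving (1 - ci_prob) / 2 of the draws in each tail."""
    tail = (1 - ci_prob) / 2
    ordered = sorted(draws)
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


# Toy "posterior draws" 0, 1, ..., 100 (illustrative only, not real samples).
draws = list(range(101))
lb, ub = equal_tailed_interval(draws, ci_prob=0.94)
print(round(lb, 2), round(ub, 2))
```

For these toy draws the bounds land at about 3 and 97, the 3% and 97% quantiles.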
You can use filter_vars to select variables without having to specify all the exact names. Use filter_vars="like" to select based on partial naming:

In [2]: summary(data, var_names=["the"], filter_vars="like")
Out[2]:
                            mean    sd  eti94_lb  ...  r_hat  mcse_mean  mcse_sd
theta_t[Choate]             0.34  1.07     -1.74  ...    1.0       0.02     0.02
theta_t[Deerfield]          0.14  0.97     -1.70  ...    1.0       0.02     0.01
theta_t[Phillips Andover]  -0.08  1.00     -1.87  ...    1.0       0.02     0.01
theta_t[Phillips Exeter]    0.05  0.92     -1.69  ...    1.0       0.02     0.01
theta_t[Hotchkiss]         -0.17  0.93     -1.87  ...    1.0       0.02     0.02
theta_t[Lawrenceville]     -0.09  0.89     -1.70  ...    1.0       0.02     0.01
theta_t[St. Paul's]         0.40  0.95     -1.38  ...    1.0       0.02     0.01
theta_t[Mt. Hermon]         0.09  0.99     -1.76  ...    1.0       0.02     0.01
theta[Choate]               6.42  5.66     -2.09  ...    1.0       0.13     0.15
theta[Deerfield]            5.02  4.82     -3.99  ...    1.0       0.10     0.10
theta[Phillips Andover]     3.88  5.45     -7.60  ...    1.0       0.13     0.13
theta[Phillips Exeter]      4.51  4.69     -4.65  ...    1.0       0.10     0.09
theta[Hotchkiss]            3.50  4.80     -6.21  ...    1.0       0.11     0.10
theta[Lawrenceville]        4.04  4.77     -5.43  ...    1.0       0.12     0.11
theta[St. Paul's]           6.51  5.23     -2.08  ...    1.0       0.12     0.12
theta[Mt. Hermon]           4.85  5.49     -4.55  ...    1.0       0.12     0.16

[16 rows x 9 columns]
Use filter_vars="regex" to select based on regular expressions, and prefix the variables you want to exclude with ~. Here, we exclude from the summary all the variables starting with the letter t:

In [3]: summary(data, var_names=["~^t"], filter_vars="regex")
Out[3]:
    mean    sd  eti94_lb  eti94_ub  ...  ess_tail  r_hat  mcse_mean  mcse_sd
mu  4.37  3.29     -1.92     10.48  ...   1060.38    1.0       0.08     0.06

[1 rows x 9 columns]
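The selection rules used in the examples above (exact names, "like" substrings, "regex" patterns, and the ~ prefix for exclusion) can be mimicked in a few lines of plain Python. This is a rough sketch of the documented behaviour, not the library's implementation: select_names is a hypothetical helper, the list of variable names is written out by hand for illustration, and the handling of a mix of included and excluded patterns is an assumption.

```python
import re


def select_names(var_names, all_names, filter_vars=None):
    """Hypothetical helper mimicking the documented filter_vars modes."""
    if var_names is None:
        return list(all_names)

    def matches(pattern, name):
        if filter_vars is None:
            return pattern == name  # exact variable name
        if filter_vars == "like":
            return pattern in name  # substring match
        if filter_vars == "regex":
            return re.search(pattern, name) is not None
        raise ValueError(f"unknown filter_vars: {filter_vars!r}")

    # Patterns prefixed with "~" exclude variables instead of selecting them.
    excluded = [p[1:] for p in var_names if p.startswith("~")]
    included = [p for p in var_names if not p.startswith("~")]

    # Assumption: with no included patterns, start from all variables.
    return [
        name
        for name in all_names
        if (not included or any(matches(p, name) for p in included))
        and not any(matches(p, name) for p in excluded)
    ]


# Variable names written out by hand for illustration.
names = ["mu", "theta_t", "tau", "theta"]
print(select_names(["the"], names, filter_vars="like"))
print(select_names(["~^t"], names, filter_vars="regex"))
```

With these names, the "like" call keeps theta_t and theta, and the "regex" call with "~^t" keeps only mu, which is consistent with the rows shown in Out[2] and Out[3].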