Neural information measures (spynal.info)¶
Compute measures of neural information about task/behavioral variables
Overview¶
Functionality for computing information in neural activity about experimental variables (task conditions and/or behavior) using various different methods, which have different assumptions about distribution of neural activity and task->activity encoding model:
- PEVPercent of variance in neural activity explained by experimental variable(s).
Assumes encoding described by linear model, and data is ~ normally distributed with same variance for all conditions. Can handle any arbitary combinations of multi-level categorical or continuous task variables.
- dprimeDistance between data means normalized by their pooled standard deviations.
Assumes task->activity encoding is based on a change in mean activity, data is normal. Can only handle two-sample data (ie contrasts between two categorical task conditions).
- area under ROCArea under ROC curve for discriminating activity between two conditions.
Assumes encoding is based on a change in mean activity, but no assumption of any specific data distributions. Can only handle two-sample data (ie contrasts between two categorical task conditions).
- mutual informationMutual information between neural activity and task variable.
Reflects how much knowing the neural activity reduces uncertainty about the task variable. Makes NO assumptions about data distribution or encoding model. But, as a result, requires more data to reliably estimate. Can only handle two-sample data (ie contrasts between two categorical task conditions).
- decodingAccuracy in decoding (classifying) task variable from neural activity
Unlike other methods here, this is based on a decoding (activity->task), rather than encoding (task->activity), model. Unlike other models, this method is multivariate (looks at information across multiple neural channels), rather than univariate (looks at information only in a single channel at a time). Commonly-used linear decoders assume activity->task decoding is described by a linear model, and often also assume activity is distributed as a multivariate normal. However, user can also use any custom decoding model (with different assumptions) with this function. Can only handle multi-level categorical (multinomial) task variables.
Most functions perform operations in a mass-univariate (or mass-multivariate) manner. This means that rather than embedding function calls in for loops over channels, timepoints, etc., like this:
for channel in channels:
for timepoint in timepoints:
results[timepoint,channel] = compute_something(data[timepoint,channel])
You can instead execute a single call on ALL the data, labeling the relevant axis for the computation (usually trials/observations here), and it will run in parallel (vectorized) across all channels, timepoints, etc. in the data, like this:
results = compute_something(data, axis)
Function list¶
High-level information computation wrapper functions¶
neural_info : Compute neural information using any given method
neural_info_2groups : Compute binary neural info with 2 data groups as inputs
neural_info_ngroups : Compute multi-class neural info with n data groups as inputs
Specific information measures¶
decode : Compute accuracy of decoding task conds from neural activity
mutual_info : Compute Shannon mutual information btwn response and task conds
auroc : Compute area under Receiver Operating Curve
dprime : Compute d-prime (Cohen’s d) – difference in means/pooledSD
- pevCompute percent explained variance by task conds (with optional stats)
anova1 : Compute PEV and stats using 1-way ANOVA
anova2 : Compute PEV and stats using 2-way ANOVA
regress : Compute PEV and stats using multi-factor linear regression
Dependencies¶
patsy : Python package for describing statistical models
Function reference¶
- neural_info(data, labels, axis=0, method='pev', keepdims=True, **kwargs)¶
High-level interface for computing mass-univariate neural information about some task/behavioral variable(s)
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Neural data to measure information in Axis axis should correspond to observations (trials), while rest of axis(s) can be any independent data series (channels, time points, frequencies, etc.) that will be analyzed separately using the same labels.
labels (array-like, shape=(n_obs,) or (n_obs,n_terms)) – List of labels or design matrix to measure information about. Must be same length as data.shape along dimension axis
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
method ({'pev','dprime','auroc','mutual_info','decode'}, default: 'pev') –
Method to use to compute information. See specific functions for details.
’pev’ : Percent of data variance explained by model, using
pev()’dprime’ : d’ metric (difference in means/pooled SD), using
dprime()’auroc’ : Area under ROC curve metric for discriminating labels, using
auroc()’mutual_info’ : Mutual information btwn data and labels, using
mutual_info()’decode’ : Classification accuracy of decoding labels from data, using
decode()
keepdims (bool, default: True) – If True, retains reduced observations axis in output, even if length=1. If False, removes reduced length-1 observations axis from output. NOTE: Return shape rules differ btwn info methods; see method function for details.
**kwargs – All other kwargs passed directly to specific information method function
- Returns:
info – Measure of information in data about labels.
Return shape rules differ somewhat between methods; see specific functions for details. Generally, 1d data -> scalar info. For n-d data, info has same as data, with observation axis reduced to length = 1 (or = n_terms for pev/regression method) if keepdims is True, but with axis removed from output if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…) or (…,n_terms,…)
- neural_info_2groups(data1, data2, axis=0, method='pev', keepdims=True, **kwargs)¶
Alternative interface to compute mass-univariate neural information about some binary (ie 2-group) task/behavioral variable, with the inputs being the data for each of the two possible conditions: (data1, data2)
Note: Some methods (dprime, auroc) are faster with this method and it is thus preferred, but otherwise the main reason to use it is for convenience if your data is already formatted into two groups
NOTE: Only parameters differing from primary interface (
neural_info()) described here.- Parameters:
data1 (ndarray, shape=(...,n_obs1,...)) – Set of values for one condition/group to be compared.
data2 (ndarray, shape=(...,n_obs2,...)) – Set of values for a second condition/group to be compared. Can have different number of observations (trials), but must otherwise have same shape as data1.
- Returns:
info – Measure of information in data about difference between data1 vs data2 vs … data_k.
Return shape rules differ somewhat between methods; see specific functions for details. Generally, 1d data -> scalar info. For n-d data, info has same as data, with observation axis reduced to length = 1 (or = n_terms for pev/regression method) if keepdims is True, but with axis removed from output if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…) or (…,n_terms,…)
- neural_info_ngroups(*args, axis=0, method='pev', keepdims=True, **kwargs)¶
Alternative to compute mass-univariate neural information about some multi-class variable, with the inputs being the data for each of the possible conditions 1-k: (data1, data2, …, data_k)
NOTE: Some methods (dprime, auroc, mutual_info) are designed for only binary comparisons and will raise an error if you try to call them with this function
NOTE: Only parameters differing from primary interface (
neural_info()) described here.- Parameters:
data1...data_k (ndarray, shape=(...,n_obs[j],...)) – Sets of values for each data distribution to be compared. Can have different number of observations (trials), but must otherwise all have same shape.
- Returns:
info – Measure of information in data about difference between data1 vs data2 vs … data_k.
Return shape rules differ somewhat between methods; see specific functions for details. Generally, 1d data -> scalar info. For n-d data, info has same as data, with observation axis reduced to length = 1 (or = n_terms for pev/regression method) if keepdims is True, but with axis removed from output if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…) or (…,n_terms,…)
- decode(data, labels, axis=0, feature_axis=1, decoder='LDA', cv='auto', seed=None, groups=None, as_pct=False, return_stats=False, stats=None, keepdims=True, **kwargs)¶
Mass-multivariate population decoding analysis using given classifier method
Returns cross-validated decoding accuracy, and (optionally) additional stats from decoding classifier.
Computation is performed over trial/observation axis, using features along feature_axis in mass-univariate fashion across data series in all other data dimensions (time points, frequencies, etc.).
- Parameters:
data (ndarray, shape=(...,n_obs,...,n_features,...)) – Neural data to decode labels from. Arbitrary shape, but axis should correspond to observations (trials) and feature_axis should correspond to decoder features (eg neural channels), while rest of axis(s) can be any independent data series (time points, frequencies, etc.) that are analyzed separately (ie separate decoder fit and evaluated at each time point, frequency, etc.).
labels (array-like, shape=(n_obs,)) – Labels/target values for each observation (trial) to predict
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
feature_axis (int, default: 1 (second axis)) – Axis of data array corresponding to decoder features (usually distinct neural channels/electrodes/units)
decoder ({'LDA','logistic','SVM'} or sklearn classifier object, default: 'LDA') –
Classifier method to use for decoding. Options:
- ’LDA’Linear discriminant analysis using
sklearn.discriminant_analysis.LinearDiscriminantAnalysis Unlike scikit’s empirical prior, sets a uniform prior unless specified otherwise (bc any imbalance in emprical probabilities of task conditions is usually happenstance, not predictive and not something we want to use)
- ’logistic’Logistic regression using sklearn.linear_model.LogisticRegression
Sets penalty=’none’ unless specified otherwise in kwargs, unlike scikit’s default ‘l2’ (L2 regularization) bc seems safer to make users opt-in.
- ’SVM’Support vector machine classification using sklearn.svm.SVC
Sets kernel=’linear’ unless specified otherwise in kwargs, unlike scikit’s default ‘rbf’ (nonlinear radial basis functions) bc linear classifiers are much more common in the field (and safer to make users opt-in to nonlinear)
Can also input scikit-learn classifier object instance or custom objects that follow sklearn API (ie has ‘fit’ and ‘score’ methods).
cv (str or sklearn.model_selection "Splitter" object, default: StratifiedKFold(n_splits=5,shuffle=True)) – Determines how cross-validation is done. Set = ‘auto’ for default. Set = None for no cross-validation. Can use custom objects that follow “Splitter’ object API (ie has ‘split’ method).
seed (int, default: None (unseeded random numbers)) – Random generator seed for repeatable results
groups (array-like, shape=(n_groups,), default: unique(labels) (all distinct values)) – Which group labels from labels to use. Useful for computing information on subset of groups/classes in labels.
as_pct (bool, default: False) – Set=True to return decoding accuracy as a percent (range ~ 0-100). Set=False to return accuracy as a proportion (range ~ 0-1)
return_stats (bool, default: False) – Set=True to return additional classifier stats
stats (str or list of str, default: ['predict','prob'] (if return_stats=True)) –
List of additional classifier stats to return. Options:
’predict’ : Predicted class for each trial/observation
’prob’ : Posterior probability for each class, for each trial/observation
’decision’ : “Decision function” confidence score for each trial/observation.
keepdims (bool, default: True) – If True, retains axis and feature_axis reduced to length-one axes in output. If False, removes axis and feature_axis axes from output.
**kwargs – All other kwargs passed directly to decoding object constructor
- Returns:
accuracy (float or ndarray, shape=(…,[1,]…,[1,]…)) – Decoding accuracy. Accuracy given as proportion or percent correct, depending on value of as_pct. Chance = [100 x] 1/n_classes.
If data is 2d (just observation and feature axes), returned as scalar value. For n-d data, it has same shape as data, with axis and feature_axis reduced to length 1 if keepdims is True, or with axis and feature_axis removed if keepdims is False.
stats (dict, optional) – Additional per-trial decoding-related stats, as requested in input argument stats. May include the following:
- ’predict’ndarray, shape=(…,_n_obs,…,[1,]…)
Predicted class for each observation/trial. Same shape as accuracy, but axis has length n_obs.
- ’prob’ndarray, shape=(…,_n_obs,…,n_classes,…)
Posterior probabilty for each class and each observation (trial). Same shape as accuracy, but axis has length n_obs and feature_axis has length n_classes.
- ’decision’ndarray, shape=(…,_n_obs,…,[1,]…) or (…,_n_obs,…,n_classes,…)
”Decision function” confidence score for each trial/observation. For most classifiers, this is proportional to distance from separating hyperplane. Same shape as accuracy, but axis has length n_obs. For binary classification, feature_axis is reduced to length 1 and it reflects the signed decision in favor of class 2. For multi-class classification, it reflects the signed score in favor of each class, for each observation.
Examples
accuracy = decode(data,labels,return_stats=False)
accuracy,stats = decode(data,labels,return_stats=True)
References
- mutual_info(data, labels, axis=0, bins=None, resp_entropy=None, groups=None, keepdims=True)¶
Mass-univariate mutual information between set of discrete-valued (or discretized) neural responses and categorical experimental conditions (often referred to as “stimuli” in theoretical treatments).
NOTE: Currently only 2-class (binary) conditions are supported
info = 0 indicates no mutual information between responses and experimental conditions info = 1 indicates maximum possible information btwn responses and conditions
Computation is performed over trial/observation axis, in mass-univariate fashion across data series in all other data dimensions (channels, time points, frequencies, etc.).
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data values to compute mutual information with labels. If it is not discrete-valued (eg spike counts), it will be discretized using bins
labels (array-like, shape=(n_obs,)) – List of categorical group labels labelling observations from each group. NOTE: Should be only two groups represented, unless sub-selecting two groups using groups argument.
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
bins (array-like, shape=(n_bins,2) or string, default: 'fd' (Freedman–Diaconis rule)) – Non-integer data must be binned for mutual information computation. Bins can be given either explicitly, as an array of bin [left,right] edges, or as a string indicating the type of binning rule to use in np.histogram_bin_edges(). Data is binned only if it is non-integer-valued or if a value is input for bins.
resp_entropy (ndarray, shape=(...,1,...)) – Total response entropy. Can optionally compute and input this to save repeated calculations (eg for distinct contrasts on same data).
groups (array-like, shape=(2,), default: unique(labels) (all distinct values)) – Which group labels from labels to use. Useful for computing information on subset of groups/classes in labels.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from output.
- Returns:
info – Mutual information between responses and experimental conditions (in bits). For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True, or with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…)
Notes
Computes Shannon mutual information using standard equation (cf. Dayan & Abbott, eqn. 4.7): I = H - Hnoise = -Sum(p(r)*log(p(r)) + Sum(p(cat)*p(r|s)*log(p(r|s)))
where H = total response entropy, Hnoise = noise entropy, p(r) is response probability distribution, and p(r|s) is conditional probability of response given experimental condition
Freedman–Diaconis rule for binning: bin width = 2*interquartile_range(data)/cuberoot(n)). See np.histogram_bin_edges() for details.
References
Dayan & Abbott (2001), “Theoretical neuroscience” ch. 4.1
- mutual_information(data, labels, axis=0, bins=None, resp_entropy=None, groups=None, keepdims=True)¶
Alias of
mutual_info(). See there for details.
- auroc(data, labels, axis=0, signed=True, groups=None, keepdims=True)¶
Mass-univariate area-under-ROC-curve metric of discriminability between two data distributions
Calculates area under receiver operating characteristic curve (AUROC) relating hits to false alarms in a binary classification/discrimination
AUROC = 0.5 indicates no difference between distributions, and AUROC = 1 indicates data distributions are completely discriminable.
Computation is performed over trial/observation axis, in mass-univariate fashion across data series in all other data dimensions (channels, time points, frequencies, etc.).
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data values for both distributions to be compared.
labels (array-like, shape=(n_obs,)) – List of group labels labelling observations from each group. Should be only two groups represented, unless sub-selecting two groups using groups argument.
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
signed (bool, default: True) –
If True, returns standard AUROC. If False, returns AUROC rectified around 0.5, corresponding to finding “preferred” distribution for each data series (non-trial dims).
NOTE: rectified AUROC is a biased metric–its expected value is > 0 even for identical data distributions. You might want to use resampling methods to estimate the bias and correct for it.
groups (array-like, shape=(2,), default: unique(labels) (all distinct values in labels)) – Which group labels from labels to use. Useful for enforcing a given order to groups (eg sign to signed AUROC), or for computing info btwn pairs of groups within data with > 2 groups.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from output.
- Returns:
roc – AUROC between groups along given axis. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True, or with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…)
- dprime(data, labels, axis=0, signed=True, groups=None, keepdims=True)¶
Mass-univariate d’ metric of difference between two data distributions
d’ = 0 indicates no difference between distribution means. d’ is unbounded, and increases monotonically with the difference btwn group means and inversely with the pooled std deviation.
NOTE: To enforce a specific sign for the results, it is strongly advised to use the groups argument. The returned dprime values will reflect the groups[0] - groups[1] difference. Otherwise, the difference order (and resulting sign) is based on the sorted done by taking the
np.unique()of the input labels.Computation is performed over trial/observation axis, in mass-univariate fashion across data series in all other data dimensions (channels, time points, frequencies, etc.).
Note: This is actually a wrapper around _dprime_2groups(), which accepts two data distributions as arguments, and is faster.
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data values for both distributions to be compared.
labels (array-like, shape=(n_obs,)) – List of group labels labelling observations from each group. Should be only two groups represented, unless sub-selecting two groups using groups argument.
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
signed (bool, default: True) –
If True, returns standard d’. If False, returns absolute value of d’, corresponding to finding “preferred” distribution for each data series (non-trial dims).
NOTE: absolute d’ is a biased metric–its expected value is > 0 even for identical data distributions. You might want to use resampling methods estimate the bias and correct for it.
groups (array-like, shape=(2,), default: unique(labels) (all distinct values in labels)) – Which group labels from labels to use. Useful for enforcing a given order to groups (eg sign to signed d’), or for computing info btwn pairs of groups within data with > 2 groups.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from output.
- Returns:
d – d’ between groups along given axis. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True, or with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…)
Notes
Calculates d’ (aka Cohen’s d) metric of difference btwn two distributions, under (weak-ish) assumption they are IID normal, using formula:: d’ = (mu1 - mu2) / sd_pooled
References
Dayan & Abbott (2001) “Theoretical Neuroscience” eqn. 3.4 (p.91)
- pev(data, labels, axis=0, model=None, omega=True, as_pct=True, return_stats=False, keepdims=True, **kwargs)¶
Mass-univariate percent explained variance (PEV) analysis.
Computes the percentage (or proportion) of variance explained in data by predictors in design matrix/list of labels, using one of a few types of linear models.
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data to fit with linear model. axis should correspond to observations (trials), while any other axes can be any independent data series (channels, time points, frequencies, etc.) that will be fit separately using the same list of group labels.
labels (array-like, shape=(n_obs,n_terms) or patsy DesignMatrix object) – Design matrix (group labels for ANOVA models, or regressors for regression model) for each observation (trial). labels.shape[0] must be same length as data.shape[axis].
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
model (str, default: (we attempt to infer from labels; safest to set explicitly)) –
Type of linear model to fit, in order to compute PEV.
omega (bool, default: True) – If True, uses bias-corrected omega-squared formula for PEV. If False uses eta-squared/R-squared formula, which is positively biased.
as_pct (bool, default: True) – If True, return PEV as a percent (range ~ 0-100). If False, return PEV as a proportion (range ~ 0-1)
return_stats (bool, default: False) – If True, return several stats on model fit (eg F-stat,p) in addition to PEV If False, just return PEV.
keepdims (bool, default: True) – If True, retains reduced observations axis in output, even if length=1. If False, removes reduced length-1 observations axis from output. NOTE: Return shape rules differ btwn model types; see model function for details.
**kwargs – All other kwargs passed directly to model function. See those for details.
- Returns:
exp_var (float or ndarray, shape=(…,[n_terms,]…)) – Percent (or proportion) of variance in data explained by labels Shape is usually same as data, with observation axis reduced to length = n_terms.
For single-term ‘anova1’ model: If data is 1d, exp_var is just returned as single scalar. If data is n-d and keepdims is True, it’s returned as same shape as data, with axis reduced to length-1. If data is n-d and keepdims is False, it’s returned as same shape as data, with axis removed.
stats (dict, optional) – If return_stats set, statistics on each fit also returned. See model functions for specific stats returned by each.
Examples
exp_var = pev(data,labels,return_stats=False)
exp_var,stats = pev(data,labels,return_stats=True)
References
Snyder & Lawson (1993) https://doi.org/10.1080/00220973.1993.10806594
- percent_explained_variance(data, labels, axis=0, model=None, omega=True, as_pct=True, return_stats=False, keepdims=True, **kwargs)¶
Alias of
pev(). See there for details.
- anova1(data, labels, axis=0, omega=True, groups=None, gm_method='mean_of_obs', as_pct=True, return_stats=False, keepdims=True)¶
Mass-univariate 1-way ANOVA analyses of one or more data vector(s) on single list of group labels
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data to fit with ANOVA model. axis should correspond to observations (trials), while any other axes can be any independent data series (channels, time points, frequencies, etc.) that will be fit separately using the same list of group labels.
labels (array-like, shape=(n_obs,)) – Group labels for each observation (trial), identifying which group/factor level each observation belongs to. Must be same length as data.shape[axis].
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
omega (bool, default: True) – If True, uses bias-corrected omega-squared formula for PEV. If False uses eta-squared formula, which is positively biased.
groups (array-like, shape=(n_groups,), default: unique(labels) (all distinct values)) – Which group labels from labels to use. Useful for enforcing a given order to groups (reflected in mu’s), or for computing info btwn pairs of groups within data with > 2 groups.
gm_method (str, default: 'mean_of_obs') –
Method used to calculate grand mean for ANOVA formulas. Options:
’mean_of_obs’ : Mean of all observations (more standard ANOVA formula)
’mean_of_means’ : Mean of group means–less downward-bias of PEV,F for unbalanced grp n’s
as_pct (bool, default: True) – If True, return PEV as a percent (range ~ 0-100). If False, return PEV as a proportion (range ~ 0-1)
return_stats (bool, default: False) – If True, return several stats on model fit (eg F-stat,p) in addition to PEV If False, just return PEV.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from output.
- Returns:
exp_var (float or ndarray, shape=(…,[1,]…)) – Percent (or proportion) of variance in data explained by labels. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True, or with axis removed if keepdims is False.
stats (dict, optional) – If return_stats set, statistics on each fit also returned:
- pfloat or ndarray, shape=(…,[1,]…)
F-test p values for each datapoint. Same shape as exp_var.
- Ffloat or ndarray, shape=(…,[1,]…)
F-statistic for each datapoint. Same shape as exp_var.
- mundarray, shape=(…,n_groups,…)
Group mean for each group/level. Note different shape rules.
- nndarray, shape=(n_groups,)
Number of observations (trials) in each group/level
Examples
exp_var = anova1(data,labels,return_stats=False)
exp_var,stats = anova1(data,labels,return_stats=True)
References
Snyder & Lawson (1993) https://doi.org/10.1080/00220973.1993.10806594
- anova2(data, labels, axis=0, interact=None, omega=True, partial=False, total=False, gm_method='mean_of_obs', as_pct=True, return_stats=False, keepdims=True)¶
Mass-univariate 2-way ANOVA analyses of one or more data vector(s) on single set of group labels
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data to fit with ANOVA model. axis should correspond to observations (trials), while rest of axis(s) are any independent data series (channels, time points, frequencies, etc.) that will be fit separately using the same list of group labels.
labels (array-like, shape=(n_obs,n_terms=2 or 3)) – Group labels for each observation (trial), identifying which group (factor level) each observation belongs to. Can either set interaction term labels for column 3, or set interact = True and we will auto-generate interaction term. labels.shape[0] must be same length as data.shape[axis].
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
interact (bool, default: True iff labels has 3rd column (for interaction)) – Determines whether an interaction term is included in model. If True, but but no 3rd entry is given in labels, we auto-generate an interaction term based on all unique combinations of levels in labels[:,0] & labels[:,1].
omega (bool, default: True) – If True, uses bias-corrected omega-squared formula for PEV. If False uses eta-squared formula, which is positively biased.
partial (bool, default: False) – Determines method used to calc PEV – full-model or partial. If False, uses standard full-model PEV = SS_factor / SS_total. Increase in PEV for one factor will decrease all others. If True, uses partial factor PEV = SS_factor / (SS_factor + SS_error). Factor EV’s are therefore independent of each other.
total (bool, default: False) – If True, total PEV summed across all model terms is appended to to end of terms axis in exp_var.
as_pct (bool, default: True) – If True, return PEV as a percent (range ~ 0-100). If False, return PEV as a proportion (range ~ 0-1)
gm_method (str, default: 'mean_of_obs') –
Method used to calculate grand mean for ANOVA formulas. Options:
’mean_of_obs’ : Mean of all observations (more standard ANOVA formula)
- ’mean_of_means’Mean of group (cell) means–less downward-bias of PEV,F
for unbalanced grp n’s. For this option, MUST have interact==True.
return_stats (bool, default: False) – If True, return several stats on model fit (eg F-stat,p) in addition to PEV If False, just return PEV.
keepdims (bool, default: True) – NOTE: This argument is not used, only kept for consistent API with other info funcs.
- Returns:
exp_var (ndarray, shape=(…,n_terms,…)) – Percent (or proportion) of variance in data explained by labels. Shape is same as data, with observation axis reduced to length = n_terms. NOTE: shape rules for returns are different from other info functions bc axis contains model terms and cannot be reduced.
stats (dict, optional) – If return_stats is True, statistics on each fit also returned:
- pndarray, shape=(…,n_terms,…)
F-test p values for each datapoint. Same shape as exp_var.
- Fndarray, shape=(…,n_terms,…)
F-statistic for each datapoint. Same shape as exp_var.
- mulist, shape=(n_terms,) of [ndarray, shape=(…,n_groups,…)]
Group mean for each group (level), in a separate list element for each model term (b/c n_groups not necesarily same for different terms).
- nlist, shape=(n_terms,) of [ndarray, shape=(n_groups,)]
Number of observations (trials) in each group/level, in a separate list element for each model term.
Examples
exp_var = anova2(data,labels,return_stats=False)
exp_var,stats = anova2(data,labels,return_stats=True)
References
Zar “Biostatistical Analysis” 4th ed.
Snyder & Lawson (1993) https://doi.org/10.1080/00220973.1993.10806594
- regress(data, labels, axis=0, col_terms=None, omega=True, constant=True, partial=False, total=False, as_pct=True, return_stats=False, keepdims=True)¶
Mass-univariate ordinary least squares regression analyses of one or more data vector(s) on single design matrix
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data to fit with regression model. axis should correspond to observations (trials), while rest of axis(s) are any independent data series (channels, time points, frequencies, etc.) that will be fit separately using the same list of group labels.
labels (array-like, shape=(n_obs,n_params) or patsy DesignMatrix object) – Regression design matrix. Each row corresponds to a distinct observation (trial), and each column to a distinct predictor (coefficient to fit). If constant is True, a constant (intercept) column will be appended to end of labels, if not already present. labels.shape[0] (number of rows) be same length as data.shape[axis].
axis (int, default: 0 (first axis)) – Axis of data array to perform analysis on, corresponding to trials/observations
col_terms (array-like, shape=(n_params,), default: (assume 1:1 mapping from term:column)) – Lists regression term (eg as integer or string name) corresponding to each column (predictor) in labels. Mapping may not be 1:1 due to multiple dummy-variable columns arising from categorical terms with > 2 levels. PEV/stats are computed separately for all columns/predictors of each term pooled together. If labels is a DesignMatrix, we obtain this from its attributes. Otherwise, we assume 1:1 mapping from term:column (col_terms = np.arange(n_params)).
omega (bool, default: True) – If True, uses bias-corrected omega-squared formula for PEV, If False uses eta-squared formula, which is positively biased.
constant (bool, default: True) – If True, ensures there is a constant column in labels to fit an intercept/bias term (appends if missing, does nothing if present).
partial (bool, default: False) – If True, uses partial-factor formula for PEV, where each term’s EV is expressed relative to only that term + error variance, and thus changes in one term’s EV do not necessarily affect other terms. If False, the standard full-model PEV formula is used.
total (bool, default: False) – If True, total PEV summed across all model terms is appended to to end of terms axis in exp_var.
as_pct (bool, default: True) – If True, return PEV as a percent (range ~ 0-100). If False, return PEV as a proportion (range ~ 0-1)
return_stats (bool, default: False) – If True, return several stats on model fit (eg F-stat,p) in addition to PEV If False, just return PEV.
keepdims (bool, default: True) – If True, retains reduced observations axis in output, even if length=1. If False, removes reduced length-1 observations axis from output.
- Returns:
exp_var (float or ndarray, shape=(…,[n_terms,]…)) – Percent (or proportion) of variance in data explained by labels. Shape is usually same as data, with observation axis reduced to length = n_terms.
For single-term ‘anova1’ model: If data is 1d, exp_var is just returned as single scalar. If data is n-d and keepdims is True, it’s returned as same shape as data, with axis reduced to length-1. If data is n-d and keepdims is False, it’s returned as same shape as data, with axis removed.
stats (dict, optional) – If return_stats is True, statistics on each fit also returned:
- pfloat or ndarray, shape=(…,[n_terms,]…)
F-test p values for each datapoint. Same shape as exp_var.
- Ffloat or ndarray, shape=(…,[n_terms,]…)
F-statistic for each datapoint. Same shape as exp_var.
- Bndarray, shape=(…,n_params,…)
Fitted regression coefficients for each predictor (column in labels). Same shape as exp_var, but with B.shape[axis] = n_params.
Examples
exp_var = regress(data,labels,return_stats=False)
exp_var,stats = regress(data,labels,return_stats=True)
References
Draper & Smith “Applied Regression Analysis” (1998) sxn’s 1.3, 6.1
Snyder & Lawson (1993) https://doi.org/10.1080/00220973.1993.10806594
- patsy_terms_to_columns(labels)¶
Given a patsy DesignMatrix, map model terms to design matrix columns, returning a vector listing the term corresponding to each column.
Note that this correspondence may not be 1:1 due to categorical terms generating multiple dummy columns.
- Parameters:
labels (patsy DesignMatrix object, shape=(n_observations,n_columns)) –
- Returns:
col_terms – Lists term name (from labels.design_info.term_names) corresponding to each column in design matrix labels.
- Return type:
ndarray, shape=(n_columns,), dtype=object(str)
- eta_squared(SS_model, SS_total)¶
Alias of
_R_squared(). See there for details.