General utility functions (spynal.utils)
General-purpose Python utilities for data preprocessing and analysis
Overview
Functionality includes:
basic statistics: z-scoring, t/F-stats, SNR measures (Fano,CV,etc.), correlation
numerical methods: interpolation, setting random seed
functions to reshape data arrays and dynamically index into specific array axes
functions for dealing w/ Numpy “object” arrays (similar to Matlab cell arrays)
various other useful little utilities
Function list
Basic statistics
zscore : Mass univariate Z-score data along given axis (or whole array)
fano : Fano factor (variance/mean) of data
cv : Coefficient of Variation (SD/mean) of data
cv2 : Local Coefficient of Variation (Holt 1996) of data
lv : Local Variation (Shinomoto 2009) of data
one_sample_tstat : Mass univariate 1-sample t-statistic
paired_tstat : Mass univariate paired-sample t-statistic
two_sample_tstat : Mass univariate 2-sample t-statistic
one_way_fstat : Mass univariate 1-way F-statistic
two_way_fstat : Mass univariate 2-way (with interaction) F-statistic
correlation : Pearson product-moment correlation btwn two variables
rank_correlation : Spearman rank correlation btwn two variables
Numerical utility functions
set_random_seed : Seed Python/Numpy random number generators with given seed
interp1 : Interpolate 1d data vector at given index values
gaussian : Evaluate parameterized 1D Gaussian function at given datapoint(s)
gaussian_2d : Evaluate parameterized 2D Gaussian function at given datapoint(s)
gaussian_nd : Evaluate parameterized N-D Gaussian function at given datapoint(s)
is_symmetric : Test if matrix is symmetric
is_positive_definite : Test if matrix is symmetric positive (semi)definite
setup_sliding_windows : Generates set of sliding windows using given parameters
Data indexing and reshaping functions
index_axis : Dynamically index into arbitrary axis of ndarray
axis_index_slices : Generates list of slices for dynamic axis indexing
standardize_array : Reshapes array to 2D w/ axis relevant for analysis at start or end
undo_standardize_array : Undoes effect of standardize_array after analysis
data_labels_to_data_groups : Convert (data,labels) pair to tuple of (data_1,data_2,…,data_k)
data_groups_to_data_labels : Convert tuple of (data_1,data_2,…,data_k) to (data,labels) pair
Other utilities
iarange : np.arange(), but with an inclusive endpoint
unsorted_unique : np.unique(), but without sorting values
isarraylike : Tests if variable is “array-like” (ndarray, list, or tuple)
isnumeric : Tests if array dtype is numeric (int, float, or complex)
ispc : Tests if running on Windows OS
ismac : Tests if running on MacOS
isunix : Tests if running on Linux/UNIX (but not MacOS)
object_array_equal : Determine if two object arrays are equal
object_array_compare : Compare each object within an object ndarray
concatenate_object_array : Concatenates objects across one/more axes of object ndarray
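The behavior described for iarange and unsorted_unique can be approximated in plain NumPy. This is a minimal sketch under stated assumptions, not spynal's implementation; the names iarange_sketch and unsorted_unique_sketch are hypothetical, and the endpoint is assumed to lie a whole number of steps from the start.

```python
import numpy as np

def iarange_sketch(start, stop, step=1):
    """Hypothetical stand-in: like np.arange(), but includes the endpoint."""
    # Assumes (stop - start) is a whole multiple of step
    n = int(round((stop - start) / step)) + 1
    return start + step * np.arange(n)

def unsorted_unique_sketch(x):
    """Hypothetical stand-in: unique values in order of first appearance."""
    x = np.asarray(x)
    # np.unique sorts values; return_index gives where each first occurred
    _, first_idx = np.unique(x, return_index=True)
    return x[np.sort(first_idx)]

print(iarange_sketch(0, 5))
print(unsorted_unique_sketch([3, 1, 3, 2, 1]))
```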
Function reference
- zscore(data, axis=None, time_range=None, time_axis=None, timepts=None, ddof=0, zerotol=1e-06, return_stats=False)
Z-score data along given axis (or over entire array)
Optionally also returns mean,SD (eg, to compute on training set and apply to test set)
- Parameters:
data (array-like, shape=(...,n_obs,...)) – Data to z-score. Arbitrary dimensionality.
axis (int or tuple of ints or None, default: None (compute z-score across entire data array)) – Array axis or axes to compute mean/SD along for z-scoring (usually corresponding to distinct trials/observations). If tuple given, mean/SD are computed along all axes listed in tuple. If None, computes mean/SD across entire array (analogous to np.mean/std).
time_range (array-like, shape=(2,), default: None (compute mean/SD over all time points)) – Optionally allows for computing mean/SD within a given time window, then using these to z-score ALL timepoints (eg compute mean/SD within a “baseline” window, then use to z-score all timepoints). Set=[start,end] of time window. If set, MUST also provide values for time_axis and timepts.
time_axis (int, optional) – Axis corresponding to timepoints. Only necessary if time_range is set.
timepts (array-like, shape=(n_timepts,), optional) – Time sampling vector for data. Only necessary if time_range is set, unused otherwise.
ddof (int, default: 0) – Sets divisor for computing SD = N - ddof. Set=0 for max likelihood estimate, set=1 for unbiased (N-1 denominator) estimate
zerotol (float, default: 1e-6) – Any SD values < zerotol are treated as 0, and corresponding z-scores set = np.nan
return_stats (bool, default: False) – If True, also returns computed mean, SD. If False, only returns z-scored data.
- Returns:
data (ndarray, shape=(…,n_obs,…)) – Z-scored data. Same shape as input data.
mean (ndarray, shape=(…,1,…), optional) – Computed means for z-score. Only returned if return_stats is True. Same shape as input data, with axis reduced to length 1.
sd (ndarray, shape=(…,1,…), optional) – Computed standard deviations for z-score. Only returned if return_stats is True. Same shape as input data, with axis reduced to length 1.
Examples
data = zscore(data, return_stats=False)
data, mu, sd = zscore(data, return_stats=True)
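The train/test use case mentioned above (compute mean/SD on one set, apply to another) can be sketched in plain NumPy. This illustrates the arithmetic only, under the assumption that z-scoring is (data - mean) / SD with SDs below zerotol mapped to NaN; it is not spynal's code, and the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 4))   # (n_obs, n_features)
test = rng.normal(5.0, 2.0, size=(20, 4))

axis, ddof, zerotol = 0, 0, 1e-6

# Mean/SD computed on the training set only, kept as length-1 axes
mu = train.mean(axis=axis, keepdims=True)
sd = train.std(axis=axis, ddof=ddof, keepdims=True)

# SDs below tolerance are treated as 0, which yields NaN z-scores
sd = np.where(sd < zerotol, np.nan, sd)

train_z = (train - mu) / sd
test_z = (test - mu) / sd     # same statistics applied to held-out data
```

With the function itself, the statistics would come from the return_stats=True form shown in the Examples.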
- fano(data, axis=None, ddof=0, keepdims=True)
Computes Fano factor of data along a given array axis or across entire array
np.nan is returned for cases where the mean ~ 0
Fano factor = variance/mean
Fano factor has an expected value of 1 for a Poisson distribution/process.
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data of arbitrary shape
axis (int, default: None (compute across entire array)) – Array axis to compute Fano factor on (usually corresponding to distinct trials/observations). If None, computes Fano factor across entire array (analogous to np.mean/var).
ddof (int, default: 0) – Sets divisor for computing variance = N - ddof. Set=0 for max likelihood estimate, set=1 for unbiased (N-1 denominator) estimate.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
fano – Fano factor of data. For 1d data or axis=None, a single scalar value is returned. Otherwise, it’s an array w/ same shape as data, but with axis reduced to length 1 if keepdims is True, and with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,] …)
- cv(data, axis=None, ddof=0, keepdims=True)
Compute Coefficient of Variation of data, along a given array axis or across entire array
np.nan is returned for cases where the mean ~ 0
CV = standard deviation/mean
CV has an expected value of 1 for the inter-event intervals (eg ISIs) of a Poisson process.
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data of arbitrary shape
axis (int, default: None (compute across entire array)) – Array axis to compute CV on (usually corresponding to distinct trials/observations). If None, computes CV across entire array (analogous to np.mean/std).
ddof (int, default: 0) – Sets divisor for computing SD = N - ddof. Set=0 for max likelihood estimate, set=1 for unbiased (N-1 denominator) estimate.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
CV – CV (SD/mean) of data. For 1d data or axis=None, a single scalar value is returned. Otherwise, it’s an array w/ same shape as data, but with axis reduced to length 1 if keepdims is True, and with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,] …)
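The two ratios defined above (Fano = variance/mean, CV = SD/mean) can be sketched in NumPy as follows. This is an illustration, not the library's implementation; the function names and the zerotol threshold used to decide when the mean is "~ 0" are assumptions.

```python
import numpy as np

def fano_sketch(data, axis=None, ddof=0, keepdims=True, zerotol=1e-6):
    """Hypothetical sketch: variance / mean, NaN where mean is ~0."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=axis, keepdims=keepdims)
    var = data.var(axis=axis, ddof=ddof, keepdims=keepdims)
    mean = np.where(np.abs(mean) < zerotol, np.nan, mean)
    return var / mean

def cv_sketch(data, axis=None, ddof=0, keepdims=True, zerotol=1e-6):
    """Hypothetical sketch: SD / mean, NaN where mean is ~0."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=axis, keepdims=keepdims)
    sd = data.std(axis=axis, ddof=ddof, keepdims=keepdims)
    mean = np.where(np.abs(mean) < zerotol, np.nan, mean)
    return sd / mean

# Two data series (rows), three observations each; second row has mean 0
counts = np.array([[2, 4, 6], [0, 0, 0]])
f = fano_sketch(counts, axis=1)
c = cv_sketch(counts, axis=1)
```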
- coefficient_of_variation(data, axis=None, ddof=0, keepdims=True)
Alias of cv(). See there for details.
- cv2(data, axis=0, keepdims=True)
Compute local Coefficient of Variation (CV2) of data, along a given array axis
CV2 reduces effects of slow changes in data (eg changes in spike rate) on measure of variation by only comparing adjacent data values (eg adjacent ISIs).
CV2 has an expected value of 1 for a Poisson process.
Typically used as measure of local variation in inter-spike intervals.
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data of arbitrary shape
axis (int, default: 0) – Array axis to compute CV2 on. Unlike CV, computing CV2 over entire array is not permitted and will raise an error, as it is a locally-defined measure.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
CV2 – CV2 of data. For 1d data, a single scalar value is returned. Otherwise, it’s an array w/ same shape as data, but with axis reduced to length 1 if keepdims is True, and with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,] …)
References
Holt et al. (1996) Journal of Neurophysiology https://doi.org/10.1152/jn.1996.75.5.1806
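As a rough illustration of "only comparing adjacent data values", the CV2 measure of Holt et al. averages 2|x[i+1] - x[i]| / (x[i+1] + x[i]) over adjacent pairs. The NumPy sketch below follows that definition; the function name is hypothetical and it is not taken from spynal's source.

```python
import numpy as np

def cv2_sketch(data, axis=0, keepdims=True):
    """Hypothetical sketch of CV2 along one axis (eg a sequence of ISIs)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[axis]
    later = np.take(data, np.arange(1, n), axis=axis)
    earlier = np.take(data, np.arange(0, n - 1), axis=axis)
    # Mean over adjacent pairs of 2*|difference| / sum
    return np.mean(2 * np.abs(later - earlier) / (later + earlier),
                   axis=axis, keepdims=keepdims)

# Column 0 alternates (irregular); column 1 is perfectly regular
isis = np.array([[1.0, 2.0], [3.0, 2.0], [1.0, 2.0]])
print(cv2_sketch(isis, axis=0))
```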
- lv(data, axis=0, keepdims=True)
Compute Local Variation (LV) of data along a given array axis
LV reduces effects of slow changes in data (eg changes in spike rate) on measure of variation by only comparing adjacent data values (eg adjacent ISIs).
LV has an expected value of 1 for a Poisson process.
Typically used as measure of local variation in inter-spike intervals.
- Parameters:
data (ndarray, shape=(...,n_obs,...)) – Data of arbitrary shape
axis (int, default: 0) – Array axis to compute LV on. Unlike CV, computing LV over entire array is not permitted and will raise an error, as it is a locally-defined measure.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
LV – LV of data. For 1d data, a single scalar value is returned. Otherwise, it’s an array w/ same shape as data, but with axis reduced to length 1 if keepdims is True, and with axis removed if keepdims is False.
- Return type:
float or ndarray, shape=(…,[1,]…)
References
Shinomoto et al. (2009) PLoS Computational Biology https://doi.org/10.1371/journal.pcbi.1000433
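For comparison with CV2, the LV measure of Shinomoto et al. is 3/(n-1) times the sum over adjacent pairs of ((x[i] - x[i+1]) / (x[i] + x[i+1]))^2, ie 3 times the mean of that squared ratio. A minimal NumPy sketch follows; the function name is hypothetical and this is not spynal's implementation.

```python
import numpy as np

def lv_sketch(data, axis=0, keepdims=True):
    """Hypothetical sketch of Local Variation along one axis."""
    data = np.asarray(data, dtype=float)
    n = data.shape[axis]
    later = np.take(data, np.arange(1, n), axis=axis)
    earlier = np.take(data, np.arange(0, n - 1), axis=axis)
    ratio_sq = ((earlier - later) / (earlier + later)) ** 2
    # 3/(n-1) * sum over the n-1 adjacent pairs == 3 * mean
    return 3 * np.mean(ratio_sq, axis=axis, keepdims=keepdims)

print(lv_sketch([1.0, 3.0, 1.0], keepdims=False))
```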
- one_sample_tstat(data, axis=0, mu=0, keepdims=True)
Mass univariate 1-sample t-statistic, relative to expected mean under null mu
t = (mean(data) - mu) / SEM(data)
- Parameters:
data (ndarray, shape=(...,n,...)) – Data to compute stat on. axis should correspond to distinct observations/trials; other axes are treated as independent data series, and stat is computed separately for each.
axis (int, default: 0 (1st axis)) – Axis of data corresponding to distinct trials/observations.
mu (float, default: 0) – Expected mean under the null hypothesis.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
t – 1-sample t-statistic for data. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True or removed if keepdims is False.
- Return type:
float or ndarray, shape=(…[,1,]…)
- paired_tstat(data1, data2, axis=0, d=0, keepdims=True)
Mass univariate paired-sample t-statistic, relative to mean difference under null d
d_obs = data1 - data2
t = (mean(d_obs) - d) / SEM(d_obs)
- Parameters:
data1/data2 (ndarray, shape=(...,n,...)) – Data from two groups to compare. Shape is arbitrary, but must be same for data1,2.
axis (int, default: 0 (1st axis)) – Axis of data corresponding to distinct trials/observations.
d (float, default: 0) – Hypothetical difference in means under null hypothesis
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
t – Paired-sample t-statistic for given data. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True or removed if keepdims is False.
- Return type:
float or ndarray, shape=(…[,1,]…)
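The one-sample and paired formulas above reduce to the same computation, with the paired case operating on the differences data1 - data2. A NumPy sketch follows, assuming SEM is the sample SD (N-1 denominator) divided by sqrt(n); the function names are hypothetical and this is not the library's code.

```python
import numpy as np

def one_sample_tstat_sketch(data, axis=0, mu=0, keepdims=True):
    """Hypothetical sketch: t = (mean(data) - mu) / SEM(data)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[axis]
    mean = data.mean(axis=axis, keepdims=keepdims)
    # Assumption: SEM uses the unbiased (ddof=1) SD estimate
    sem = data.std(axis=axis, ddof=1, keepdims=keepdims) / np.sqrt(n)
    return (mean - mu) / sem

def paired_tstat_sketch(data1, data2, axis=0, d=0, keepdims=True):
    """Hypothetical sketch: one-sample t-stat on paired differences."""
    d_obs = np.asarray(data1, dtype=float) - np.asarray(data2, dtype=float)
    return one_sample_tstat_sketch(d_obs, axis=axis, mu=d, keepdims=keepdims)

t1 = one_sample_tstat_sketch([1, 2, 3, 4, 5], keepdims=False)
tp = paired_tstat_sketch([2, 4, 6, 8, 10], [1, 2, 3, 4, 5], keepdims=False)
```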
- two_sample_tstat(data1, data2, axis=0, equal_var=True, d=0, keepdims=True)
Mass univariate 2-sample t-statistic, relative to mean difference under null d
t = (mean(data1) - mean(data2) - d) / pooledSE(data1,data2)
(where the formula for pooled SE differs depending on equal_var)
- Parameters:
data1 (ndarray, shape=(...,n1,...)) – Data from one group to compare.
data2 (ndarray, shape=(...,n2,...)) – Data from a second group to compare. Need not have the same n as data1, but all other dim’s must be same size/shape. For both, axis should correspond to distinct observations/trials; other axes are treated as independent data series, and stat is computed separately for each.
axis (int, default: 0 (1st axis)) – Axis of data corresponding to distinct trials/observations.
equal_var (bool, default: True) – If True, compute standard t-stat assuming equal population variances for 2 groups. If False, compute Welch’s t-stat, which does not assume equal population variances.
d (float, default: 0) – Hypothetical difference in means under null hypothesis
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
t – 2-sample t-statistic for data. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True or removed if keepdims is False.
- Return type:
float or ndarray, shape=(…[,1,]…)
References
Indep t-test : https://en.wikipedia.org/wiki/Student%27s_t-test#Independent_two-sample_t-test
Welch’s test : https://en.wikipedia.org/wiki/Welch%27s_t-test
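The difference between the two standard-error formulas selected by equal_var can be sketched as below, following the standard Student and Welch definitions given in the references. The function name is hypothetical and the code is an illustration, not spynal's implementation.

```python
import numpy as np

def two_sample_tstat_sketch(data1, data2, axis=0, equal_var=True, d=0):
    """Hypothetical sketch of the Student (pooled) or Welch t-statistic."""
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)
    n1, n2 = data1.shape[axis], data2.shape[axis]
    var1 = data1.var(axis=axis, ddof=1)
    var2 = data2.var(axis=axis, ddof=1)
    if equal_var:
        # Pooled variance, weighted by each group's degrees of freedom
        pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
        se = np.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch: each group contributes its own variance / n
        se = np.sqrt(var1 / n1 + var2 / n2)
    return (data1.mean(axis=axis) - data2.mean(axis=axis) - d) / se

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
t_pooled = two_sample_tstat_sketch(a, b, equal_var=True)
t_welch = two_sample_tstat_sketch(a, b, equal_var=False)
```

With equal group sizes, as here, the two standard errors coincide; they differ when n1 and n2 differ and the variances are unequal.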
- one_way_fstat(data, labels, axis=0, groups=None, keepdims=True)
Mass univariate 1-way F-statistic on given data and labels
F = var(between groups) / var(within groups)
- Parameters:
data (ndarray, shape=(...,n,...)) – Data to compute stat on. axis should correspond to distinct observations/trials; other axes are treated as independent data series, and stat is computed separately for each
labels (array-like, shape=(n,)) – Group labels for each observation (trial), identifying which group (factor level) each observation belongs to.
axis (int, default: 0 (1st axis)) – Axis of data corresponding to distinct trials/observations.
groups (array-like, shape=(n_groups,), optional, default: np.unique(labels)) – List of labels for each group (condition). Used to test only a subset of labels.
keepdims (bool, default: True) – If True, retains reduced observations axis as length-one axes in output. If False, removes reduced observations axis from outputs.
- Returns:
F – F-statistic for data. For 1d data, returned as scalar value. For n-d data, it has same shape as data, with axis reduced to length 1 if keepdims is True or removed if keepdims is False.
- Return type:
float or ndarray, shape=(…[,1,]…)
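The ratio F = var(between groups) / var(within groups) can be made concrete with the usual one-way ANOVA mean squares. The sketch below is an assumption-labeled illustration (hypothetical function name, no groups or keepdims handling), not the library's code.

```python
import numpy as np

def one_way_fstat_sketch(data, labels, axis=0):
    """Hypothetical sketch: F = MS_between / MS_within along one axis."""
    data = np.moveaxis(np.asarray(data, dtype=float), axis, 0)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = data.shape[0], len(groups)
    grand_mean = data.mean(axis=0)
    ss_between, ss_within = 0.0, 0.0
    for g in groups:
        grp = data[labels == g]
        grp_mean = grp.mean(axis=0)
        ss_between = ss_between + grp.shape[0] * (grp_mean - grand_mean) ** 2
        ss_within = ss_within + ((grp - grp_mean) ** 2).sum(axis=0)
    # Mean squares: divide by between (k-1) and within (n-k) df
    return (ss_between / (k - 1)) / (ss_within / (n - k))

F = one_way_fstat_sketch([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```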
- two_way_fstat(data, labels, axis=0, groups=None)
Mass univariate 2-way (with interaction) F-statistic on given data and labels
F = var(between groups) / var(within groups)
- Parameters:
data (ndarray, shape=(...,n,...)) – Data to compute stat on. axis should correspond to distinct observations/trials; other axes are treated as independent data series, and stat is computed separately for each
labels (array-like, shape=(n,n_terms=2|3)) – Group labels for each model term and observation (trial), identifying which group (factor level) each observation belongs to for each term. First 2 columns should reflect main effects, and optional third column should be their interaction.
axis (int, default: 0 (1st axis)) – Axis of data corresponding to distinct trials/observations.
groups (array-like, shape=(n_terms,) of [array-like, shape=(n_groups(term),)], default: all) – List of group labels to use for each model term. Used to test only a subset of labels. Defaults to using all values in labels.
- Returns:
F – F-statistic for given data. Same shape as data, with axis reduced to length=n_terms.
- Return type:
ndarray, shape=(…,n_terms,…)
References
Zar “Biostatistical Analysis” ch.12
- correlation(data1, data2, axis=None, keepdims=True)
Compute Pearson product-moment (standard) correlation between two variables, in mass-bivariate fashion
axis is treated as observations (eg trials), which correlation is computed over. Correlations are computed separately across all other array dims (eg, timepoints, freqs, etc). If axis is None, correlations are computed across entire 1-d flattened (unrolled) arrays.
Correlations range from -1 (perfect anti-correlation) to +1 (perfect positive correlation), with 0 indicating a lack of correlation.
Pearson correlation only identifies linear relationships between variables. If a nonlinear (monotonic) relationship is suspected, consider using rank_correlation instead.
- Parameters:
data1 (ndarray, shape=(n,) or (...,n,...)) – Paired data to compute correlations between. Can be 1d vectors or multi-dim arrays, but must have same shape.
data2 (ndarray, shape=(n,) or (...,n,...)) – Paired data to compute correlations between. Can be 1d vectors or multi-dim arrays, but must have same shape.
axis (int or None, default: None (compute across entire flattened array)) – Array axis to treat as observations and compute correlations over. Correlations are computed in mass-bivariate fashion across all other array axes. If axis=None, correlation is computed across entire 1d flattened arrays.
keepdims (bool, default: True) – If False, correlation axis is removed (squeezed out) from output. If True, axis is kept in output as singleton (length 1) axis.
- Returns:
r – Correlation between data1 & data2. For 1d data, r is a float. For multi-d data, r is same shape as data, but with axis reduced to length 1 (if keepdims is True) or removed (if keepdims is False).
- Return type:
float or ndarray, shape=(…,[1,]…)
- rank_correlation(data1, data2, axis=None, keepdims=True)
Computes Spearman rank correlation between two variables, in mass-bivariate fashion
Input axis is treated as observations (eg trials), which correlation is computed over. Correlations are computed separately across all other array dims (eg, timepoints, freqs, etc). If axis is None, correlations are computed across entire 1d flattened (unrolled) arrays.
Each data is sorted into rank-order separately, and the resulting ranks are entered into a standard (Pearson) correlation. This identifies any monotonic relationship between variables, and thus should be favored when a nonlinear relationship is suspected.
Correlations range from -1 (perfect anti-correlation) to +1 (perfect positive correlation), with 0 indicating a lack of correlation.
- Parameters:
data1 (ndarray, shape=(n,) or (...,n,...)) – Paired data to compute correlations between. Can be 1d vectors or multi-dim arrays, but must have same shape.
data2 (ndarray, shape=(n,) or (...,n,...)) – Paired data to compute correlations between. Can be 1d vectors or multi-dim arrays, but must have same shape.
axis (int or None, default: None (compute across entire flattened array)) – Array axis to treat as observations and compute correlations over. Correlations are computed in mass-bivariate fashion across all other array axes. If axis=None, correlation is computed across entire 1d flattened arrays.
keepdims (bool, default: True) – If False, correlation axis is removed (squeezed out) from output. If True, axis is kept in output as singleton (length 1) axis.
- Returns:
rho – Correlation between data1 & data2. For 1d data, rho is a float. For multi-d data, rho is same shape as data, but with axis reduced to length 1 (if keepdims is True) or removed (if keepdims is False).
- Return type:
float or ndarray, shape=(…,[1,]…)
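The description above (rank each variable separately, then apply a Pearson correlation to the ranks) can be sketched in NumPy. This is an illustration only: the function names are hypothetical, and the simple double-argsort ranking assumes there are no tied values, which a full implementation would need to handle.

```python
import numpy as np

def pearson_sketch(x, y, axis=0):
    """Hypothetical sketch of Pearson correlation along one axis."""
    xd = x - x.mean(axis=axis, keepdims=True)
    yd = y - y.mean(axis=axis, keepdims=True)
    return (xd * yd).sum(axis=axis) / np.sqrt(
        (xd ** 2).sum(axis=axis) * (yd ** 2).sum(axis=axis))

def rank_sketch(x, axis=0):
    """Ranks 0..n-1 along axis. Assumes no ties."""
    return np.argsort(np.argsort(x, axis=axis), axis=axis).astype(float)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3                     # monotonic but nonlinear in x

r = pearson_sketch(x, y)                              # below 1
rho = pearson_sketch(rank_sketch(x), rank_sketch(y))  # monotonic relation
```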
- set_random_seed(seed=None)
Seed built-in Python and Numpy random number generators with given value
- Parameters:
seed (int or str, default: (use current clock time)) – Seed to use. If string given, converts each char to ascii and sums the resulting values. If no seed given, seeds based on current clock time.
- Returns:
seed – Actual integer seed used
- Return type:
int
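The seeding rules described above can be sketched as follows. This is a guess at the behavior from the parameter description (string converted by summing character codes, clock time as fallback), with a hypothetical function name; the exact clock-based seed used by the library is not specified here.

```python
import random
import time
import numpy as np

def set_random_seed_sketch(seed=None):
    """Hypothetical sketch: seed Python and Numpy RNGs, return int seed."""
    if seed is None:
        # Assumption: whole seconds of current clock time
        seed = int(time.time())
    elif isinstance(seed, str):
        # Sum of character codes, eg 'ab' -> 97 + 98
        seed = sum(ord(ch) for ch in seed)
    random.seed(seed)
    np.random.seed(seed)
    return seed

used = set_random_seed_sketch("ab")
first = np.random.rand(3)
set_random_seed_sketch("ab")
second = np.random.rand(3)     # same draws after re-seeding
```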
- randperm(n, k=None)
Generates a random permutation of the integers 0 : n-1, or selects a length-k random subset.
Emulates Matlab randperm().
- Parameters:
n (int) – Length of sequence to permute / subsample from
k (int, default: n (generate random permutation of 0:n-1)) – Length of subset to select. Default generates random permutation w/o sub-selection.
- Returns:
perm – Length-k random subset/permutation of integers 0:n-1
- Return type:
ndarray, shape=(k,), dtype=int
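One way to get the described behavior in NumPy is to permute 0:n-1 and keep the first k entries. The sketch below assumes that approach; the function name and the optional rng argument are hypothetical and not part of the documented signature.

```python
import numpy as np

def randperm_sketch(n, k=None, rng=None):
    """Hypothetical sketch: random permutation of 0..n-1, or k-subset."""
    rng = np.random.default_rng() if rng is None else rng
    k = n if k is None else k
    # Full permutation, truncated to the first k entries
    return rng.permutation(n)[:k]

perm = randperm_sketch(5)
subset = randperm_sketch(10, 3)
```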
- interp1(x, y, xinterp, axis=0, **kwargs)
Interpolate data over one dimension to new sampling vector
Convenience wrapper around scipy.interpolate.interp1d() w/o weird call structure
- Parameters:
x (array-like, shape=(n_orig,)) – Original 1d sampling vector
y (array-like, shape=(...,n_orig,...)) – Original data sampled at values in x. May contain multiple data vectors sampled along same sampling vector x. The length of y along the interpolation axis axis must be equal to the length of x.
xinterp (array-like, shape=(n_interp,)) – Desired interpolated sampling vector. Typically n_interp > n_orig.
axis (int, default: 0) – Specifies the axis of y along which to interpolate. Defaults to 1st axis.
**kwargs – Any additional keyword args are passed as-is to scipy.interpolate.interp1d
- Returns:
yinterp – Data in y interpolated to sampling in xinterp
- Return type:
ndarray, shape=(…,n_interp,…)
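To illustrate interpolating several data vectors along one axis of y, the sketch below uses np.interp (linear interpolation only) applied along the chosen axis. This deliberately swaps in NumPy for the scipy.interpolate.interp1d call that the function wraps, so it does not cover the extra keyword options; the function name is hypothetical.

```python
import numpy as np

def interp1_linear_sketch(x, y, xinterp, axis=0):
    """Hypothetical linear-only stand-in using np.interp along one axis."""
    y = np.asarray(y, dtype=float)
    # Each 1d slice of y along `axis` is interpolated independently
    return np.apply_along_axis(lambda v: np.interp(xinterp, x, v), axis, y)

x = np.array([0.0, 1.0, 2.0])
y = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])   # (n_orig, 2)
yinterp = interp1_linear_sketch(x, y, [0.5, 1.5], axis=0)
```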
- gaussian(points, center=0.0, width=1.0, amplitude=1.0, baseline=0.0)
Evaluate a 1D Gaussian function with given parameters at given datapoint(s)
Parameter values can be set for Gaussian center (mean), width (SD), amplitude, and additive baseline/offset. Defaults are set to generate standard normal function (mean=0, sd=1, amp=1, baseline=0).
- Parameters:
points (float or ndarray, shape=(n_datapoints,)) – Datapoints to evaluate Gaussian function at
center (float, default: 0.0) – Center (mean) of Gaussian function
width (float, default: 1.0) – Width (standard deviation) of Gaussian function
amplitude (float, default: 1.0) – Gaussian amplitude (multiplicative gain)
baseline (float, default: 0.0 (no offset)) – Additive baseline value for Gaussian function
- Returns:
f_x – Gaussian function with given parameters evaluated at each given datapoint. Returned as float for single datapoint input, as array for multiple datapoints.
- Return type:
float or ndarray, shape=(n_datapoints,)
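A plausible form for this parameterization is amplitude * exp(-(x - center)^2 / (2 * width^2)) + baseline. The sketch below assumes that form, with amplitude acting as a peak gain rather than a normalization to unit area; the function name is hypothetical and the library's exact scaling may differ.

```python
import numpy as np

def gaussian_sketch(points, center=0.0, width=1.0, amplitude=1.0, baseline=0.0):
    """Hypothetical sketch of a parameterized 1D Gaussian (peak = amplitude)."""
    points = np.asarray(points, dtype=float)
    z = (points - center) / width
    return amplitude * np.exp(-0.5 * z ** 2) + baseline

at_peak = gaussian_sketch(2.0, center=2.0, width=0.5, amplitude=3.0, baseline=1.0)
one_sd = gaussian_sketch(1.0)     # one SD from center, default parameters
```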
- gaussian_1d(points, center=0.0, width=1.0, amplitude=1.0, baseline=0.0)
Alias of gaussian(). See there for details.
- gaussian_2d(points, center_x=0.0, center_y=0.0, width_x=1.0, width_y=1.0, amplitude=1.0, baseline=0.0, orientation=0.0)
Evaluate a 2D Gaussian function with given parameters at given datapoint(s)
Parameter values can be set for Gaussian centers (means), widths (SDs), amplitude, additive baseline, and rotation. Defaults are set to generate unrotated 2D standard normal function (mean=(0,0), sd=(1,1), amp=1, no baseline offset, no rotation).
- Parameters:
points (ndarray, shape=(n_datapoints,2=[x,y])) – Datapoints to evaluate 2D Gaussian function at. Each row is a distinct datapoint x to evaluate f(x) at, and the 2 columns correspond to the 2 dimensions (x and y) of the 2D Gaussian function.
center_x/y (float, default: 0.0) – Center (mean) of Gaussian function along x and y dims
width_x/y (float, default: 1.0) – Width (standard deviation) of Gaussian function along x and y dims
amplitude (float, default: 1.0) – Gaussian amplitude (multiplicative gain)
baseline (float, default: 0.0 (no offset)) – Additive baseline value for Gaussian function
orientation (float, default: 0.0 (axis-aligned)) – Orientation (radians CCW from + x-axis) of 2D Gaussian. 0 = oriented along standard x/y axes (non-rotated); pi/4 (45 deg) = oriented along positive diagonal
- Returns:
f_x – 2D Gaussian function with given parameters evaluated at each given datapoint. Returned as float for single datapoint input, as array for multiple datapoints.
- Return type:
float or ndarray, shape=(n_datapoints,)
- gaussian_nd(points, center=None, width=None, covariance=None, amplitude=1.0, baseline=0.0, check=True)
Evaluate an N-D Gaussian function with given parameters at given datapoint(s).
Parameter values can be set for Gaussian center (mean), width (SD) or covariance, amplitude, and additive baseline.
Gaussian shape can be set in one of two ways (but NOT using both):
width : computes an axis-aligned (0 off-diagonal covariance) Gaussian with SD’s = width
covariance : computes an N-D Gaussian with full variance/covariance matrix = covariance
Defaults are set to generate N-D standard normal function (mean=0’s, sd=1’s, no off-diagonal covariance, amp=1, no baseline offset).
- Parameters:
points (ndarray, shape=(n_datapoints,n_dims)) – Datapoints to evaluate N-D Gaussian function at. Each row is a distinct datapoint x to evaluate f(x) at, and each column is a distinct dimension of the N-dimensional Gaussian function.
center (ndarray, shape=(n_dims,) or scalar, default: (0.0,...,0.0) (0 for all dims)) – Center (mean) of Gaussian function along each dim. Scalar value expanded to n_dims.
width (ndarray, shape=(n_dims,) or scalar, default: (1.0,...,1.0) (1 for all dims)) – Width (standard deviation) of Gaussian function along each dim. Scalar value expanded to n_dims. NOTE: Can input values for either width OR covariance. Setting both raises an error.
covariance (ndarray, shape=(n_dims,n_dims), default: (identity matrix: var's=1, covar's=0)) – Variance/covariance matrix for N-D Gaussian. Diagonals are variances for each dim, off-diagonals are covariances btwn corresponding dims. Must be a symmetric, positive semi-definite matrix. Alternative method for setting function width/shape, allowing non-axis-aligned Gaussian. NOTE: Can input values for either width OR covariance. Setting both raises an error.
amplitude (scalar, default: 1.0) – Gaussian amplitude (multiplicative gain)
baseline (scalar, default: 0.0) – Additive baseline value for Gaussian function
check (bool, default: True) – If True, checks if covariance is symmetric positive semidefinite; else skips slow check
- Returns:
f_x – N-D Gaussian function evaluated at each given datapoint. Returned as float for single datapoint input, as array for multiple datapoints.
- Return type:
float or ndarray, shape=(n_datapoints,)
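The relation between the width and covariance options can be sketched as below: an axis-aligned Gaussian is the special case covariance = diag(width^2). The code assumes an unnormalized form amplitude * exp(-0.5 * d' inv(C) d) + baseline and omits the symmetry/definiteness check; the function name is hypothetical and this is not spynal's implementation.

```python
import numpy as np

def gaussian_nd_sketch(points, center=0.0, width=None, covariance=None,
                       amplitude=1.0, baseline=0.0):
    """Hypothetical sketch of an N-D Gaussian set by width OR covariance."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n_dims = points.shape[1]
    if width is not None and covariance is not None:
        raise ValueError("Set width OR covariance, not both")
    if covariance is None:
        width = 1.0 if width is None else width
        # Axis-aligned case: variances on the diagonal, zero covariances
        covariance = np.diag(np.broadcast_to(width, (n_dims,)).astype(float) ** 2)
    dev = points - center
    # Squared Mahalanobis distance of each datapoint from the center
    maha = np.einsum('ij,ij->i', dev, np.linalg.solve(covariance, dev.T).T)
    return amplitude * np.exp(-0.5 * maha) + baseline

f_default = gaussian_nd_sketch([[0.0, 0.0], [1.0, 0.0]])
f_cov = gaussian_nd_sketch([[1.0, 1.0]], covariance=[[1.0, 0.5], [0.5, 1.0]])
```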
- is_symmetric(X)
Test if matrix is symmetric
- Parameters:
X (ndarray or Numpy matrix, shape=Any) – Matrix to test
- Returns:
symmetric – True only if X is square and symmetric
- Return type:
bool
References
https://stackoverflow.com/questions/16266720/find-out-if-matrix-is-positive-definite-with-numpy
- is_positive_definite(X, semi=False)
Test if matrix is symmetric positive (semi)definite
- Parameters:
X (ndarray or Numpy matrix, shape=Any) – Matrix to test
semi (bool, default: False) – If True, tests if positive semi-definite. If False, tests if positive definite.
- Returns:
pos_def – True only if X is square and symmetric positive (semi)definite
- Return type:
bool
References
https://stackoverflow.com/questions/16266720/find-out-if-matrix-is-positive-definite-with-numpy
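A common way to implement these two tests, along the lines discussed in the linked thread, is a transpose comparison for symmetry plus an attempted Cholesky factorization for positive definiteness. The sketch below assumes that approach, and uses an eigenvalue check with a small tolerance for the semi-definite case; the function names and the tolerance are hypothetical.

```python
import numpy as np

def is_symmetric_sketch(X):
    """Hypothetical sketch: True only if X is square and equals its transpose."""
    X = np.asarray(X)
    return bool(X.ndim == 2 and X.shape[0] == X.shape[1] and np.allclose(X, X.T))

def is_positive_definite_sketch(X, semi=False, tol=1e-10):
    """Hypothetical sketch: symmetric positive (semi)definite test."""
    X = np.asarray(X, dtype=float)
    if not is_symmetric_sketch(X):
        return False
    if semi:
        # Semi-definite: all eigenvalues >= 0, up to rounding tolerance
        return bool(np.all(np.linalg.eigvalsh(X) >= -tol))
    try:
        np.linalg.cholesky(X)   # succeeds only for positive definite X
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite_sketch(np.eye(3)))
```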
- setup_sliding_windows(width, lims, step=None, reference=None, force_int=False, exclude_end=None)
Generate set of sliding windows using given parameters
- Parameters:
width (scalar) – Full width of each window
lims (array-like, shape=(2,)) – [start end] of full range of domain you want windows to sample
step (scalar, default: step = width (ie, perfectly non-overlapping windows)) – Spacing between start of adjacent windows
reference (scalar, default: None (just start at lims[0])) – Optionally sets a reference value at which one window starts; the remaining windows are determined from there. eg, set = 0 to have a window start at x=0, or set = -width/2 to have a window centered at x=0
force_int (bool, default: False (don't round)) – If True, rounds window starts,ends to integer values.
exclude_end (bool, default: True if force_int==True, otherwise False) – If True, excludes the endpoint of each (integer-valued) sliding win from the definition of that win, to prevent double-sampling (eg, the range for a 100 ms window is [1,99], not [1,100])
- Returns:
windows – Sequence of sliding window [start,end]’s
- Return type:
ndarray, shape=(n_wins,2)
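The basic width/lims/step relationship can be sketched as below. This covers only the simplest case and rests on assumptions: windows start at lims[0], only windows that fit entirely within lims are kept, and the reference, force_int and exclude_end options are ignored. The function name is hypothetical.

```python
import numpy as np

def sliding_windows_sketch(width, lims, step=None):
    """Hypothetical sketch: (n_wins, 2) array of window [start, end] pairs."""
    step = width if step is None else step   # default: non-overlapping
    # Number of windows of given width that fit within lims
    n_wins = int(np.floor((lims[1] - lims[0] - width) / step + 1e-9)) + 1
    starts = lims[0] + step * np.arange(n_wins)
    return np.column_stack((starts, starts + width))

non_overlapping = sliding_windows_sketch(100, (0, 500))
half_overlapping = sliding_windows_sketch(100, (0, 500), step=50)
```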
- index_axis(data, axis, idxs)
Utility to dynamically index into an arbitrary axis of an ndarray
Similar to the Numpy take and compress functions, but this can take integer indexes, boolean indexes, or a slice object, and is generally much faster.
- Parameters:
data (ndarray, shape=Any) – Array of arbitrary shape, to index into given axis of.
axis (int) – Axis of ndarray to index into
idxs (array-like, shape=(n_selected,), dtype=int or array-like, shape=(axis_len,), dtype=bool or slice object) – Indexing into given axis of array to perform, given as list of integer indexes, as boolean vector, or as slice object
- Returns:
data – Input array with indexed values selected from given axis.
- Return type:
ndarray
- axis_index_slices(axis, idxs, ndim)
Generate list of slices, with ‘:’ for all axes except idxs for axis, to use for dynamic indexing into an arbitrary axis of an ndarray
- Parameters:
axis (int) – Axis of ndarray to index into
idxs (array-like, shape=(n_selected,), dtype=int or array-like, shape=(axis_len,), dtype=bool or slice object) – Indexing into given axis of array to perform, given as list of integer indexes, as boolean vector, or as slice object
ndim (int) – Number of dimensions in ndarray to index into
- Returns:
slices – Indexing tuple to use to index into given axis of ndarray as: selected_values = array[slices]
- Return type:
tuple of slices
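The indexing tuple described above (':' on every axis except the target one) can be built in a few lines of NumPy. This is a sketch of the general technique, with hypothetical function names, rather than a copy of the library's code.

```python
import numpy as np

def axis_index_slices_sketch(axis, idxs, ndim):
    """Hypothetical sketch: tuple with slice(None) everywhere except `axis`."""
    slices = [slice(None)] * ndim
    slices[axis] = idxs
    return tuple(slices)

def index_axis_sketch(data, axis, idxs):
    """Hypothetical sketch: index into one axis, leaving the others intact."""
    return data[axis_index_slices_sketch(axis, idxs, data.ndim)]

data = np.arange(24).reshape(2, 3, 4)
by_int = index_axis_sketch(data, 1, [0, 2])
by_bool = index_axis_sketch(data, 1, np.array([True, False, True]))
by_slice = index_axis_sketch(data, 1, slice(0, 2))
```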
- standardize_array(data, axis=0, target_axis=0)
Reshape multi-dimensional data array to standardized 2D array (matrix-like) form, with axis shifted to target_axis for analysis
- Parameters:
data (ndarray, shape=(...,n,...)) – Data array of arbitrary shape.
axis (int, default: 0) – Axis of data to move to target_axis for subsequent analysis
target_axis (int, default: 0) – Array axis to move axis to for subsequent analysis. NOTE: MUST be 0 (first axis) or -1 (last axis).
- Returns:
data (ndarray, shape=(n,m) or (m,n)) – Data array w/ axis moved to target_axis, and all other axes unwrapped into single dimension, where m = prod(shape[axes != axis])
NOTE: Even 1d (vector) data is expanded into 2d (n,1) | (1,n) array to standardize for calling code.
data_shape (tuple, shape=(data.ndim,)) – Original shape of input data array
- undo_standardize_array(data, data_shape, axis=0, target_axis=0)
Undo effect of standardize_array() – reshapes data array from unwrapped 2D (matrix-like) form back to ~ original multi-dimensional form, with axis shifted back to original location (but allowing that data.shape[axis] may have changed)
- Parameters:
data (ndarray, shape=(axis_len,m) or (m,axis_len)) – Standardized data array – with axis moved to target_axis, and all axes != target_axis unwrapped into single dimension, where m = prod(shape[axes != axis])
data_shape (tuple, shape=(data_orig.ndim,)) – Original shape of data array. Second output of standardize_array.
axis (int, default: 0) – Axis of original data moved to target_axis, which will be shifted back to original axis
target_axis (int, default: 0) – Array axis that axis was moved to for subsequent analysis. NOTE: MUST be 0 (first axis) or -1 (last axis).
- Returns:
data – Data array reshaped back to original shape
- Return type:
ndarray, shape=(…,axis_len,…)
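The standardize/undo round trip can be sketched with np.moveaxis and reshape. The code below is an illustration under assumptions (hypothetical function names, C-order flattening of the non-analysis axes), not the library's implementation. It also shows the case where the analysis axis changes length, eg after taking a mean.

```python
import numpy as np

def standardize_sketch(data, axis=0, target_axis=0):
    """Hypothetical sketch: move `axis` to front/back, flatten the rest."""
    data = np.asarray(data)
    data_shape = data.shape
    data = np.moveaxis(data, axis, target_axis)
    if target_axis == 0:
        data = data.reshape(data.shape[0], -1)     # (n, m)
    else:
        data = data.reshape(-1, data.shape[-1])    # (m, n)
    return data, data_shape

def undo_standardize_sketch(data, data_shape, axis=0, target_axis=0):
    """Hypothetical sketch: restore original layout (axis length may differ)."""
    axis = axis % len(data_shape)
    other = [s for i, s in enumerate(data_shape) if i != axis]
    if target_axis == 0:
        data = data.reshape([data.shape[0]] + other)
    else:
        data = data.reshape(other + [data.shape[-1]])
    return np.moveaxis(data, target_axis, axis)

data = np.arange(24.0).reshape(2, 3, 4)
flat, shape = standardize_sketch(data, axis=1, target_axis=0)      # (3, 8)
restored = undo_standardize_sketch(flat, shape, axis=1, target_axis=0)
means = undo_standardize_sketch(flat.mean(axis=0, keepdims=True), shape, axis=1)
```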
- data_labels_to_data_groups(data, labels, axis=0, groups=None, max_groups=None)
Convert (data,labels) pair to tuple of (data_1,data_2,…,data_k) where each data_j corresponds to all datapoints in input data associated with a given label value (eg group/condition/etc.).
- Parameters:
data (ndarray, shape=(...,N,...)) – Array of multi-class data. Arbitrary shape, but axis must correspond to observations/trials and have same length as labels.
labels (array-like, shape=(N,)) – List of labels corresponding to each observation in data.
axis (int, default: 0) – Axis of data array corresponding to observations/trials in labels
groups (array-like, shape=(n_groups,), default: np.unique(labels) (all unique values)) – Which group labels from labels to include. Useful to ensure a specific group order in outputs or to retain only subset of groups in labels.
max_groups (int, default: None) – Maximum number of allowed groups in data. Raises an error if len(groups) > max_groups. Set=None to allow any number of groups.
- Returns:
data_1,…,data_k – n_groups arrays of data corresponding to each group in groups, each returned in a separate variable. Shape is same as input data on all axes except axis, which is reduced to the n for each group.
- Return type:
ndarray (…,n_j,…)
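A minimal NumPy sketch of the splitting behavior described above. The function name `labels_to_groups_sketch` is hypothetical and the error type is an assumption; it shows the idea, not the library's code.

```python
import numpy as np

def labels_to_groups_sketch(data, labels, axis=0, groups=None, max_groups=None):
    """Hypothetical stand-in: split `data` along `axis` into one array per group label."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    if groups is None:
        groups = np.unique(labels)          # all unique label values, sorted
    if max_groups is not None and len(groups) > max_groups:
        raise ValueError("Too many groups: %d > %d" % (len(groups), max_groups))
    # Select the observations belonging to each group along the observation axis
    return tuple(np.take(data, np.flatnonzero(labels == g), axis=axis)
                 for g in groups)

data = np.arange(12).reshape(6, 2)
labels = ['a', 'b', 'a', 'b', 'a', 'a']
data_a, data_b = labels_to_groups_sketch(data, labels, axis=0)
```

Passing groups explicitly (for example groups=['b','a']) would fix the output order, which matters when the result is unpacked into named variables as here.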
- data_groups_to_data_labels(*data, axis=0, groups=None)¶
Convert tuple of (data_1,data_2,…,data_k) to (data,labels) pair, where a unique label is associated with all datapoints in each data group data_j (eg group/condition/etc.).
- Parameters:
data_1,…,data_k (ndarray (...,n_j,...)) – n_groups arrays of data corresponding to each group in groups, each input in a separate variable. Shape is arbitrary, but axis must correspond to observations/trials and all axes but axis must have same length across all data arrays.
axis (int, default: 0) – Axis of data arrays corresponding to observations/trials in labels
groups (array-like, shape=(n_groups,), default: integers from 0 to n_groups-1) – List of names of each group in input data to use in labels.
- Returns:
data (ndarray, shape=(…,N,…)) – Array of multi-class data. Shape is same as input data on all axes except axis, which expands to the sum of all group n’s.
labels (array-like, shape=(N,)) – List of labels corresponding to each observation in data.
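The reverse conversion can be sketched the same way. The name `groups_to_labels_sketch` is hypothetical; the sketch only assumes the documented default of integer group labels.

```python
import numpy as np

def groups_to_labels_sketch(*data, axis=0, groups=None):
    """Hypothetical stand-in: stack per-group arrays and build a matching label vector."""
    if groups is None:
        groups = np.arange(len(data))       # default labels 0..n_groups-1
    sizes = [np.asarray(d).shape[axis] for d in data]
    labels = np.repeat(np.asarray(groups), sizes)
    return np.concatenate(data, axis=axis), labels

group0 = np.zeros((2, 3))
group1 = np.ones((4, 3))
stacked, stacked_labels = groups_to_labels_sketch(group0, group1, axis=0)
```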
- iarange(*args, **kwargs)¶
Implements np.arange() with an inclusive endpoint. Same inputs as np.arange(), same output, except ends at stop, not stop - 1 (or more generally stop - step).
Like np.arange, iarange can be called with a varying number of positional arguments:
- iarange(stop) : Values are generated within the closed interval [0,stop] (in other words, the interval including both start AND stop).
- iarange(start,stop) : Values are generated within the closed interval [start,stop].
- iarange(start,stop,step) : Values are generated within the closed interval [start,stop], with spacing between values given by step.
- Parameters:
start (int, default: 0) – Starting index for range
stop (int, default: 0) – Inclusive ending index for range
step (int, default: 1) – Stepping value for range
**kwargs – Any other kwargs passed directly to the np.arange() function
- Returns:
range – Array of evenly spaced values from start to stop (inclusive), in steps of length step
- Return type:
ndarray
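One way to get an inclusive endpoint for integer arguments is to nudge stop by one in the direction of step before calling np.arange(). The sketch below assumes integer inputs only, and `iarange_sketch` is a hypothetical name, not the library function.

```python
import numpy as np

def iarange_sketch(*args, **kwargs):
    """Hypothetical stand-in for an inclusive-endpoint arange (integer arguments only)."""
    if len(args) == 1:
        start, stop, step = 0, args[0], 1
    elif len(args) == 2:
        start, stop, step = args[0], args[1], 1
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError("expected 1 to 3 positional arguments")
    # Shift the exclusive endpoint one past `stop`, in the direction of travel
    return np.arange(start, stop + (1 if step > 0 else -1), step, **kwargs)

r1 = iarange_sketch(5)           # 0..5
r2 = iarange_sketch(2, 10, 2)    # 2,4,...,10
r3 = iarange_sketch(2, 9, 2)     # 9 is not on the grid, so the last value is 8
r4 = iarange_sketch(5, 1, -2)    # descending ranges work too
```

Shifting by one rather than by step avoids overshooting stop when stop does not fall on the step grid, as in the third call.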
- unsorted_unique(x, axis=None, **kwargs)¶
Implements np.unique() without sorting, ie maintaining original order of unique elements as they are found in x.
- Parameters:
x (ndarray, shape:Any) – Array to find unique values in
axis (int, default: None (unique values over entire array)) – Axis of array to find unique values on. If None, finds unique values in entire array.
**kwargs – All other keyword arguments passed directly to np.unique
- Returns:
unique – Unique values in x, in order in which they appear in x
- Return type:
ndarray
References
https://stackoverflow.com/questions/15637336/numpy-unique-with-order-preserved
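The order-preserving approach can be sketched for the axis=None case as follows: ask np.unique() for the index of each value's first occurrence, then read the values back in index order. `unsorted_unique_sketch` is a hypothetical name and the sketch does not handle the axis argument.

```python
import numpy as np

def unsorted_unique_sketch(x):
    """Hypothetical stand-in: unique values of flattened `x`, in order of first appearance."""
    x = np.asarray(x).ravel()
    _, first_idx = np.unique(x, return_index=True)  # index of first occurrence of each value
    return x[np.sort(first_idx)]

ordered = unsorted_unique_sketch([3, 1, 3, 2, 1])
```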
- isarraylike(x)¶
Test if variable x is “array-like”: np.ndarray, list, or tuple
Returns True if x is array-like, False otherwise
- isnumeric(x)¶
Test if dtype of ndarray x is numeric (some subtype of int,float,complex)
Returns True if x.dtype is numeric, False otherwise
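Both checks map onto standard Python and NumPy calls. The sketch below is one plausible way to write them, with hypothetical `_sketch` names; whether the library treats bool as numeric is not stated above, and here it is assumed not to.

```python
import numpy as np

def isarraylike_sketch(x):
    """Hypothetical stand-in: True for ndarray, list, or tuple."""
    return isinstance(x, (np.ndarray, list, tuple))

def isnumeric_sketch(x):
    """Hypothetical stand-in: True if dtype is an int, float, or complex subtype."""
    return bool(np.issubdtype(np.asarray(x).dtype, np.number))

checks = (isarraylike_sketch([1, 2]),
          isarraylike_sketch("abc"),
          isnumeric_sketch(np.array([1.5, 2.0])),
          isnumeric_sketch(np.array(["a", "b"])))
```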
- isunix()¶
Return true iff current system OS is Linux/UNIX (but not Mac OS)
- ismac()¶
Return true iff current system OS is Mac OS
- ispc()¶
Return true iff current system OS is PC Windows
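The three OS checks can be approximated with the standard library. This is a sketch under the assumption that "Linux/UNIX" means any POSIX system other than Mac OS; the `_sketch` names are hypothetical.

```python
import os
import platform

def isunix_sketch():
    """Hypothetical stand-in: POSIX system that is not Mac OS."""
    return os.name == "posix" and platform.system() != "Darwin"

def ismac_sketch():
    """Hypothetical stand-in: Mac OS reports itself as 'Darwin'."""
    return platform.system() == "Darwin"

def ispc_sketch():
    """Hypothetical stand-in: Windows."""
    return platform.system() == "Windows"

os_flags = [isunix_sketch(), ismac_sketch(), ispc_sketch()]
```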
- object_array_equal(data1, data2, comp_func=numpy.array_equal, reduce_func=numpy.all)¶
Determine if each object element within two object arrays is equal
- Parameters:
data1, data2 (ndarray, shape=Any) – Two arrays to determine elementwise equality of. Must have same shape if using anything other than defaults for comp_func, reduce_func (bc we have no way of knowing how to deal with this).
comp_func (callable, default: np.array_equal (True iff elements have same shape and values)) – Comparison function used to determine equality of each element.
reduce_func (callable, default: np.all (True iff ALL objects in array are elementwise True)) – Optional function to reduce equality results for each element across entire array. If None, no reduction of the comparison results is performed.
- Returns:
equal – Reflects equality of each object element in data1,data2. If reduce_func is None, this is the elementwise equality of each object, and has same shape as data1,2. Otherwise, elementwise equality is reduced across the array using reduce_func, and this returns as a single scalar bool.
If data1,2 have different shapes: we return False if comp_func is array_equal and reduce_func is np.all; otherwise an error is raised (don’t know how to compare).
- Return type:
bool or ndarray, shape=data.shape, dtype=bool
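The documented logic (compare each pair of elements, then optionally reduce) can be sketched as below. The names `object_array_equal_sketch` and `make_object_array` are hypothetical, and the error type raised on a shape mismatch is an assumption.

```python
import numpy as np

def make_object_array(items):
    """Hypothetical helper: 1-D object array holding one ndarray per item."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = np.asarray(item)
    return arr

def object_array_equal_sketch(data1, data2, comp_func=np.array_equal,
                              reduce_func=np.all):
    """Hypothetical stand-in: per-element equality of two object arrays."""
    if data1.shape != data2.shape:
        if comp_func is np.array_equal and reduce_func is np.all:
            return False                    # differently shaped arrays are simply unequal
        raise ValueError("data1 and data2 must have the same shape")
    equal = np.empty(data1.shape, dtype=bool)
    for idx in np.ndindex(data1.shape):
        equal[idx] = comp_func(data1[idx], data2[idx])
    return equal if reduce_func is None else reduce_func(equal)

a = make_object_array([[1, 2], [3, 4, 5]])
b = make_object_array([[1, 2], [3, 4, 5]])
c = make_object_array([[1, 2], [3, 4, 6]])

same = object_array_equal_sketch(a, b)
per_element = object_array_equal_sketch(a, c, reduce_func=None)
```

The explicit loop is needed because comparing object arrays directly with == would try to broadcast the ragged inner arrays rather than compare them one element at a time.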
- object_array_compare(data1, data2, comp_func=numpy.equal, reduce_func=None)¶
Compares object elements within two object arrays using given comparison function
- Parameters:
data1, data2 (ndarray, shape=Any (but data1.shape = data2.shape)) – Two arrays to determine elementwise equality of
comp_func (callable, default: np.equal (True/False for each value w/in each object element)) – Comparison function used to compare each object element.
reduce_func (callable, default: None (don't perform any reduction on result)) – Optional function to reduce comparison results for each element across entire array. If None, no reduction of the comparison results is performed.
- Returns:
equal – Reflects comparison of each object element in data1,data2. If reduce_func is None, this is the elementwise comparison of each object, and has same shape as data1,2. Otherwise, elementwise comparison is reduced across the array using reduce_func, and this returns as a single scalar bool.
- Return type:
ndarray | bool
- concatenate_object_array(data, axis=None, elem_axis=0, sort=False)¶
Concatenate objects across one or more axes of an object array. Useful for concatenating spike timestamp or waveform data across trials, units, etc.
- Parameters:
data (ndarray, shape=Any, dtype=object (containing 1d lists/arrays)) – Object array whose elements are to be concatenated
axis (int or list of int or None, default: None) – Axis(s) of object array to concatenate object array elements across. Set = list of ints to concatenate across multiple axes. Set = None to concatenate across all axes in data.
elem_axis (int, default: 0) – Axis of each element of object array to concatenate along. Note: This is the axis of the contained elements that are being concatenated, as opposed to the axis of object array container to concatenate along (which is axis).
sort (bool, default: False) – If True, re-sorts items (eg spike timestamps) in concatenated object array elements.
- Returns:
data – Concatenated object(s). If axis is None, returns as single list extracted from object array. Otherwise, returns as object ndarray with all concatenated axes reduced to singletons.
- Return type:
list or ndarray, dtype=object
Examples
data = [[[1, 2], [3, 4, 5]],
        [[6, 7, 8], [9, 10]]]
concatenate_object_array(data,axis=0) >> [[1,2,6,7,8], [3,4,5,9,10]]
concatenate_object_array(data,axis=1) >> [[1,2,3,4,5], [6,7,8,9,10]]
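The example above can be reproduced with a small NumPy sketch that handles a single integer axis. `concat_objects_sketch` is a hypothetical name, and the sketch omits the axis=None and list-of-axes cases.

```python
import numpy as np

def concat_objects_sketch(data, axis=0, elem_axis=0, sort=False):
    """Hypothetical stand-in: concatenate element arrays across one axis of an object array."""
    out_shape = list(data.shape)
    out_shape[axis] = 1                     # concatenated axis becomes a singleton
    out = np.empty(out_shape, dtype=object)
    moved = np.moveaxis(data, axis, -1)     # views, so writes below land in `out`
    out_moved = np.moveaxis(out, axis, -1)
    for idx in np.ndindex(moved.shape[:-1]):
        joined = np.concatenate([np.asarray(e) for e in moved[idx]], axis=elem_axis)
        if sort:
            joined = np.sort(joined, axis=elem_axis)
        out_moved[idx + (0,)] = joined
    return out

# Build the 2x2 object array from the example, one element at a time
data = np.empty((2, 2), dtype=object)
data[0, 0] = np.array([1, 2])
data[0, 1] = np.array([3, 4, 5])
data[1, 0] = np.array([6, 7, 8])
data[1, 1] = np.array([9, 10])

across_rows = concat_objects_sketch(data, axis=0)   # shape (1, 2)
across_cols = concat_objects_sketch(data, axis=1)   # shape (2, 1)
```

Filling the object array element by element avoids NumPy trying to interpret the nested lists as a regular numeric array.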