Class: STATISTICS
Main facade class providing all statistical operations. The class is stateless, so one instance can be shared safely or new instances created as needed.
Descriptive Statistics Methods
- mean(data) - Arithmetic mean using Welford's numerically stable algorithm
- median(data) - 50th percentile (middle value)
- variance(data) - Population variance (>= 0.0)
- std_dev(data) - Standard deviation (>= 0.0)
- sum(data) - Sum using Kahan compensated summation
- min_value(data) - Minimum value in array
- max_value(data) - Maximum value in array
- range(data) - Difference between max and min (>= 0.0)
- percentile(data, p) - p-th percentile where p in [0, 100]
- quartiles(data) - Returns array of [Q1, Q2, Q3]
- mode(data) - Most frequent value
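The two algorithms named in the list above, Welford's running mean and Kahan compensated summation, can be sketched as follows. This is an illustrative Python rendering, not the library's source; the names `welford_mean` and `kahan_sum` are hypothetical.

```python
from typing import Sequence


def welford_mean(data: Sequence[float]) -> float:
    """Running mean: each new value moves the mean by (x - mean) / n."""
    if not data:
        raise ValueError("data must be non-empty")
    mean = 0.0
    for n, x in enumerate(data, start=1):
        # Updating incrementally avoids building one large intermediate sum.
        mean += (x - mean) / n
    return mean


def kahan_sum(data: Sequence[float]) -> float:
    """Compensated summation: track the rounding error lost at each addition."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in data:
        y = x - compensation          # re-apply the error from the previous step
        t = total + y                 # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total
```

Both routines make a single pass over the data and use constant extra memory.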
Bivariate Analysis Methods
- covariance(x, y) - Joint dispersion of x and y (requires x.count = y.count)
- correlation(x, y) - Pearson correlation in [-1, 1]; symmetric, so correlation(x, y) = correlation(y, x) (requires x.count = y.count)
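A minimal Python sketch of the two bivariate measures, under stated assumptions: the covariance here is the population form (divides by n), which the document does not confirm for the library, and the final clamp to [-1, 1] is one possible way to keep rounding drift inside the documented range.

```python
import math
from typing import Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Population covariance (assumption: divides by n, not n - 1)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance scaled by both standard deviations."""
    cov = covariance(x, y)
    sd_x = math.sqrt(covariance(x, x))
    sd_y = math.sqrt(covariance(y, y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    r = cov / (sd_x * sd_y)
    # Clamp so floating-point drift cannot push the result outside [-1, 1].
    return max(-1.0, min(1.0, r))
```

Swapping the arguments only reorders commutative multiplications, which is why the result is symmetric.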
Regression
- linear_regression(x, y) - Returns REGRESSION_RESULT with slope, intercept, R²
Hypothesis Testing Methods
- t_test_one_sample(data, mu_0) - Tests whether the mean of data equals mu_0
- t_test_two_sample(x, y) - Welch's t-test for unequal variances
- t_test_paired(x, y) - Paired t-test on differences
- chi_square_test(observed, expected) - Chi-square goodness-of-fit
- anova(groups) - One-way ANOVA (requires >= 3 groups)
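To make the test statistics above concrete, here is a hedged Python sketch of Welch's t statistic with its Welch-Satterthwaite degrees of freedom, plus the chi-square goodness-of-fit statistic. The helper names (`sample_variance`, `welch_t`, `chi_square_statistic`) are hypothetical, and no p-value is computed.

```python
import math
from typing import Sequence, Tuple


def sample_variance(data: Sequence[float]) -> float:
    """Unbiased sample variance (divides by n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - 1)


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) without assuming equal variances."""
    if len(x) < 2 or len(y) < 2:
        raise ValueError("each sample needs at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Squared standard error contributed by each sample.
    se2_x = sample_variance(x) / len(x)
    se2_y = sample_variance(y) / len(y)
    if se2_x + se2_y == 0.0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean_x - mean_y) / math.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation; generally not an integer.
    dof = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    return t, dof


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("observed and expected must be non-empty and of equal length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Because the Welch degrees of freedom are fractional in general, a result type that stores them needs a real-valued field rather than an integer one.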
Class: TEST_RESULT
Immutable result object from hypothesis tests.
Features
- statistic - Test statistic value (t, chi-square, F)
- p_value - P-value in [0, 1] (v1.0 returns a placeholder of 0.5)
- degrees_of_freedom - Degrees of freedom of the reference distribution
- conclusion(alpha) - Returns true if p_value < alpha
- is_significant(alpha) - Convenience alias; returns the same value as conclusion(alpha)
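The shape of such a result object can be sketched in Python with a frozen dataclass. This is an illustration only: the class name `HypothesisTestResult` is hypothetical, and the range check in `__post_init__` is an assumed stand-in for the library's own contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class HypothesisTestResult:
    statistic: float
    p_value: float
    degrees_of_freedom: float

    def __post_init__(self) -> None:
        # Mirrors the documented range of p_value.
        if not 0.0 <= self.p_value <= 1.0:
            raise ValueError("p_value must be in [0, 1]")

    def conclusion(self, alpha: float) -> bool:
        """True when the p-value falls below the significance level."""
        return self.p_value < alpha

    def is_significant(self, alpha: float) -> bool:
        """Alias for conclusion(alpha)."""
        return self.conclusion(alpha)
```

With a placeholder p_value of 0.5, this decision rule returns false for every alpha up to and including 0.5.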
Class: REGRESSION_RESULT
Immutable linear regression output.
Features
- slope - Regression slope (y = slope*x + intercept)
- intercept - Y-intercept value
- r_squared - R² in [0, 1]; values closer to 1 indicate a better fit
- predict(x) - Predict y value for new x
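As a sketch of how slope, intercept, and R² relate, the following Python example fits an ordinary least-squares line and returns an immutable result with a `predict` method. It assumes ordinary least squares and R² = 1 - SS_res / SS_tot; the document does not state which formulas the library uses, and the names here are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RegressionResult:
    slope: float
    intercept: float
    r_squared: float

    def predict(self, x: float) -> float:
        """Evaluate the fitted line y = slope * x + intercept."""
        return self.slope * x + self.intercept


def linear_regression(x: Sequence[float], y: Sequence[float]) -> RegressionResult:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        raise ValueError("x is constant; the slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² compares residual variation with total variation in y.
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    r_squared = 1.0 if ss_tot == 0.0 else 1.0 - ss_res / ss_tot
    # Clamp so rounding cannot push the value outside [0, 1].
    r_squared = max(0.0, min(1.0, r_squared))
    return RegressionResult(slope, intercept, r_squared)
```

For example, fitting x = [1, 2, 3, 4] against y = [3, 5, 7, 9] recovers the line y = 2x + 1, so `predict(5.0)` gives 11.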
Class: CLEANED_STATISTICS
Data cleaning utilities.
Methods
- remove_nan(data) - Return array with NaN values removed
- remove_infinite(data) - Return array with infinite values removed
- clean(data) - Remove both NaN and infinite values
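The three cleaning operations map directly onto standard floating-point predicates. A minimal Python sketch, using the document's method names for readability (the bodies are illustrative, not the library's source):

```python
import math
from typing import List, Sequence


def remove_nan(data: Sequence[float]) -> List[float]:
    """Keep every value that is not NaN (infinities are kept)."""
    return [v for v in data if not math.isnan(v)]


def remove_infinite(data: Sequence[float]) -> List[float]:
    """Keep every value that is not +inf or -inf (NaN is kept)."""
    return [v for v in data if not math.isinf(v)]


def clean(data: Sequence[float]) -> List[float]:
    """Keep only finite values, dropping both NaN and infinities."""
    return [v for v in data if math.isfinite(v)]
```

Each function returns a new list and leaves the input untouched, which matches the "Return array" wording above.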
Contract Guarantees
All features are specified with Design by Contract. Key guarantees:
Preconditions (require)
- Most descriptive statistics require non-empty data
- Correlation and covariance require arrays of same length
- ANOVA requires at least 3 groups
Postconditions (ensure)
- Variance and std_dev are always non-negative
- Correlation is always in [-1, 1]
- R-squared is always in [0, 1]
- P-values are always in [0, 1]
- Quartiles are ordered: Q1 <= Q2 <= Q3
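The require/ensure pairing can be imitated in Python with assertions, shown here for variance and std_dev. This is a rough analogue under stated assumptions: it uses the population formula listed for variance(data), derives std_dev from it (which the document does not confirm), and Python assertions are only a stand-in for contract clauses checked by the language.

```python
import math
from typing import Sequence


def variance(data: Sequence[float]) -> float:
    # require: data is non-empty
    assert len(data) > 0, "precondition violated: data must be non-empty"
    n = len(data)
    mean = sum(data) / n
    result = sum((v - mean) ** 2 for v in data) / n
    # ensure: result is non-negative
    assert result >= 0.0, "postcondition violated: variance must be >= 0"
    return result


def std_dev(data: Sequence[float]) -> float:
    # The precondition is inherited from variance.
    result = math.sqrt(variance(data))
    # ensure: result is non-negative
    assert result >= 0.0, "postcondition violated: std_dev must be >= 0"
    return result
```

A caller that passes an empty array fails at the precondition, so the fault is attributed to the caller rather than to the routine.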