Code Recipes
Recipe 1: Outlier Detection
Detect values more than 3 standard deviations from the mean:
local
	stats: STATISTICS
	data: ARRAY [REAL_64]
	mean_val, std_val: REAL_64
	i: INTEGER
do
	create stats.make
	-- `data` is assumed to be populated before this point.
	mean_val := stats.mean (data)
	std_val := stats.std_dev (data)
	from i := data.lower until i > data.upper loop
		if (data [i] - mean_val).abs > 3 * std_val then
			print ("Outlier detected at index " + i.out + ": " + data [i].out + "%N")
		end
		i := i + 1
	end
end
Recipe 2: Group Comparison
Compare two groups and report if they differ significantly:
local
	stats: STATISTICS
	control, treatment: ARRAY [REAL_64]
	result: TEST_RESULT
do
	create stats.make
	-- `control` and `treatment` are assumed to be populated.
	result := stats.t_test_two_sample (control, treatment)
	print ("Control mean: " + stats.mean (control).out + "%N")
	print ("Treatment mean: " + stats.mean (treatment).out + "%N")
	print ("t-statistic: " + result.statistic.out + "%N")
	if result.is_significant (0.05) then
		print ("RESULT: Treatment has significant effect (p < 0.05)%N")
	else
		print ("RESULT: No significant difference (p >= 0.05)%N")
	end
end
Recipe 3: Relationship Strength
Measure how strongly two variables are related:
local
	stats: STATISTICS
	var1, var2: ARRAY [REAL_64]
	corr: REAL_64
do
	create stats.make
	-- `var1` and `var2` are assumed to be populated and of equal size.
	corr := stats.correlation (var1, var2)
	if corr > 0.9 then
		print ("Very strong positive relationship%N")
	elseif corr > 0.7 then
		print ("Strong positive relationship%N")
	elseif corr > 0.5 then
		print ("Moderate positive relationship%N")
	elseif corr > 0.3 then
		print ("Weak positive relationship%N")
	elseif corr > -0.3 then
		print ("Little or no relationship%N")
	elseif corr > -0.5 then
		print ("Weak negative relationship%N")
	elseif corr > -0.7 then
		print ("Moderate negative relationship%N")
	elseif corr > -0.9 then
		print ("Strong negative relationship%N")
	else
		print ("Very strong negative relationship%N")
	end
end
Recipe 4: Prediction from Model
Build a model and make predictions for new data:
local
	stats: STATISTICS
	x_training, y_training: ARRAY [REAL_64]
	x_new: REAL_64
	result: REGRESSION_RESULT
	y_predicted: REAL_64
do
	create stats.make
	-- `x_training` and `y_training` are assumed to be populated and index-aligned.
	-- Build regression model on training data
	result := stats.linear_regression (x_training, y_training)
	-- Print model
	print ("Model: y = " + result.slope.out + " * x + " + result.intercept.out + "%N")
	print ("R-squared: " + result.r_squared.out + "%N")
	-- Make prediction for new x value
	x_new := 42.0
	y_predicted := result.predict (x_new)
	print ("Prediction for x=" + x_new.out + ": y=" + y_predicted.out + "%N")
end
Recipe 5: Data Quality Assessment
Assess and clean data with issues:
local
	stats: STATISTICS
	clean: CLEANED_STATISTICS
	raw_data, clean_data: ARRAY [REAL_64]
do
	create stats.make
	create clean.make
	-- `raw_data` is assumed to be populated.
	-- Assess raw data
	print ("Original data size: " + raw_data.count.out + "%N")
	if clean.has_nan (raw_data) then
		print ("WARNING: Data contains NaN values%N")
	end
	if clean.has_infinite (raw_data) then
		print ("WARNING: Data contains infinite values%N")
	end
	-- Clean data
	clean_data := clean.clean (raw_data)
	print ("Cleaned data size: " + clean_data.count.out + "%N")
	print ("Removed " + (raw_data.count - clean_data.count).out + " invalid entries%N")
	-- Proceed with analysis
	if clean_data.count >= 2 then
		print ("Mean of clean data: " + stats.mean (clean_data).out + "%N")
	end
end
Design Patterns
Pattern 1: Exploratory Data Analysis (EDA)
Quick summary of a dataset:
local
	stats: STATISTICS
	data: ARRAY [REAL_64]
do
	create stats.make
	-- `data` is assumed to be populated.
	print ("=== Data Summary ===%N")
	print ("Count: " + data.count.out + "%N")
	print ("Min: " + stats.min_value (data).out + "%N")
	print ("Q1: " + stats.quartiles (data) [1].out + "%N")
	print ("Median: " + stats.median (data).out + "%N")
	print ("Mean: " + stats.mean (data).out + "%N")
	print ("Q3: " + stats.quartiles (data) [3].out + "%N")
	print ("Max: " + stats.max_value (data).out + "%N")
	print ("Std Dev: " + stats.std_dev (data).out + "%N")
end
Pattern 2: Hypothesis Testing Pipeline
Standardized workflow for statistical testing:
1. Formulate hypothesis (null and alternative)
2. Collect data and choose alpha level (e.g., 0.05)
3. Check assumptions (normality, equal variance)
4. Run appropriate test
5. Interpret results based on p-value
6. Report conclusions
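The steps above can be sketched against this library's two-sample t-test. This is a minimal outline, not a complete workflow: step 3 (assumption checks) is represented only by a comment, and the routine name and arguments are illustrative.

```eiffel
run_hypothesis_test (control, treatment: ARRAY [REAL_64])
		-- Sketch of steps 2-6 of the pipeline; assumes the hypothesis
		-- was formulated (step 1) and assumptions checked (step 3) beforehand.
	local
		stats: STATISTICS
		result: TEST_RESULT
		alpha: REAL_64
	do
		create stats.make
		alpha := 0.05
			-- Step 2: alpha chosen before looking at results
		result := stats.t_test_two_sample (control, treatment)
			-- Step 4: run the appropriate test
		if result.is_significant (alpha) then
			-- Steps 5-6: interpret and report
			print ("Reject the null hypothesis (alpha = " + alpha.out + ")%N")
		else
			print ("Fail to reject the null hypothesis%N")
		end
	end
```

Note that `is_significant` depends on the p-value, which is a placeholder in v1.0 (see Troubleshooting below).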
Pattern 3: Model Validation
Use train/test split for regression:
-- Split data into training (80%) and test (20%)
-- Train model on training set
-- Evaluate R-squared on test set
-- If R-squared is high, model generalizes well
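One way to realize this split, assuming 1-based, index-aligned arrays. `subarray` comes from the standard ARRAY class; the test-set R-squared is computed by hand (1 - SS_res / SS_tot) since the library's `r_squared` describes the training fit.

```eiffel
validate_model (x, y: ARRAY [REAL_64])
		-- Sketch: train on the first 80% of the data,
		-- then score predictions on the held-out 20%.
	local
		stats: STATISTICS
		result: REGRESSION_RESULT
		split, i: INTEGER
		ss_res, ss_tot, test_mean: REAL_64
	do
		create stats.make
		split := (x.count * 4) // 5
		result := stats.linear_regression (x.subarray (1, split), y.subarray (1, split))
		test_mean := stats.mean (y.subarray (split + 1, y.count))
		from i := split + 1 until i > y.count loop
			ss_res := ss_res + (y [i] - result.predict (x [i])) ^ 2
			ss_tot := ss_tot + (y [i] - test_mean) ^ 2
			i := i + 1
		end
		print ("Test R-squared: " + (1 - ss_res / ss_tot).out + "%N")
	end
```

A random shuffle before splitting would be preferable when the data is ordered; it is omitted here for brevity.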
Pattern 4: Multiple Comparisons
When comparing many groups, use ANOVA instead of multiple t-tests to control Type I error.
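If your version of the library offers an ANOVA feature, the pattern looks roughly like this. `one_way_anova` and its manifest-array argument are hypothetical names, not confirmed API; substitute whatever your release provides.

```eiffel
local
	stats: STATISTICS
	group_a, group_b, group_c: ARRAY [REAL_64]
	result: TEST_RESULT
do
	create stats.make
	-- `one_way_anova` is a hypothetical feature name used for illustration.
	result := stats.one_way_anova (<<group_a, group_b, group_c>>)
	if result.is_significant (0.05) then
		print ("At least one group mean differs%N")
	end
end
```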
Troubleshooting
Q: I get a precondition violation
A: Check preconditions before calling features. For example, `mean` requires non-empty data, so verify `data.count > 0` before calling it.
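A minimal guard, reusing the `stats` and `data` declarations from Recipe 1:

```eiffel
if data.count > 0 then
	print ("Mean: " + stats.mean (data).out + "%N")
else
	print ("No data to analyze%N")
end
```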
Q: My correlation is NaN
A: This happens when variance is zero (all values identical). Check your data and handle this edge case.
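A guard for this case, reusing the declarations from Recipe 3, is to check that both variables actually vary before asking for the correlation:

```eiffel
if stats.std_dev (var1) > 0 and stats.std_dev (var2) > 0 then
	corr := stats.correlation (var1, var2)
	print ("Correlation: " + corr.out + "%N")
else
	print ("Correlation undefined: at least one variable is constant%N")
end
```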
Q: R-squared is negative
A: This shouldn't happen in v1.0 - it's clamped to [0, 1]. If you see it, report a bug.
Q: P-values are always 0.5
A: Yes - in v1.0, p-values are placeholders. This will be fixed when distribution CDFs are implemented.
Q: Data cleaning lost too much data
A: Use remove_nan and remove_infinite separately to see which values are problematic. Investigate why data has NaN/infinite values.
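For example, assuming `remove_nan` and `remove_infinite` each return a filtered copy (their exact signatures are not shown in this guide), the two counts can be compared side by side:

```eiffel
local
	clean: CLEANED_STATISTICS
	raw_data, no_nan, no_inf: ARRAY [REAL_64]
do
	create clean.make
	-- `raw_data` is assumed to be populated.
	no_nan := clean.remove_nan (raw_data)
	no_inf := clean.remove_infinite (raw_data)
	print ("NaN entries: " + (raw_data.count - no_nan.count).out + "%N")
	print ("Infinite entries: " + (raw_data.count - no_inf.count).out + "%N")
end
```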