User Guide - simple

Getting Started

Installation

Add simple_statistics to your ECF file:

<library name="simple_statistics" location="$SIMPLE_EIFFEL/simple_statistics/simple_statistics.ecf"/>

Basic Setup

To use the library, create an instance of STATISTICS:

local
    stats: STATISTICS
do
    create stats.make
    -- stats is now ready to use

The STATISTICS class is stateless - you can share a single instance across your application or create new instances as needed.

Comprehensive Examples

Example 1: Analyzing Test Scores

local
    stats: STATISTICS
    scores: ARRAY [REAL_64]
    mean, median, std: REAL_64
do
    create stats.make
    scores := {ARRAY [REAL_64]} << 85.0, 92.0, 78.0, 88.0, 95.0, 82.0 >>

    -- Calculate descriptive statistics
    mean := stats.mean (scores)              -- Average: 86.67
    median := stats.median (scores)          -- Middle value: 86.5
    std := stats.std_dev (scores)            -- Variability: 6.12

    print ("Class Performance Report%N")
    print ("Mean Score: " + mean.out + "%N")
    print ("Median Score: " + median.out + "%N")
    print ("Std Dev: " + std.out + "%N")

Example 2: Correlation Analysis

local
    stats: STATISTICS
    hours_studied, test_scores: ARRAY [REAL_64]
    correlation: REAL_64
do
    create stats.make
    hours_studied := {ARRAY [REAL_64]} << 1.0, 2.0, 3.0, 4.0, 5.0 >>
    test_scores := {ARRAY [REAL_64]} << 65.0, 72.0, 81.0, 88.0, 95.0 >>

    correlation := stats.correlation (hours_studied, test_scores)

    if correlation > 0.9 then
        print ("Strong positive correlation: more study time = higher scores%N")
    elseif correlation > 0.7 then
        print ("Moderate positive correlation%N")
    else
        print ("Weak or no correlation%N")
    end

Example 3: Linear Regression

local
    stats: STATISTICS
    x, y: ARRAY [REAL_64]
    result: REGRESSION_RESULT
do
    create stats.make
    x := {ARRAY [REAL_64]} << 1.0, 2.0, 3.0, 4.0, 5.0 >>
    y := {ARRAY [REAL_64]} << 2.0, 4.0, 6.0, 8.0, 10.0 >>

    result := stats.linear_regression (x, y)

    print ("Regression Equation: y = " + result.slope.out + "*x + " + result.intercept.out + "%N")
    print ("R-squared: " + result.r_squared.out + "%N")
    print ("Prediction at x=6: " + result.predict (6.0).out + "%N")

Example 4: Hypothesis Testing

local
    stats: STATISTICS
    group1, group2: ARRAY [REAL_64]
    result: TEST_RESULT
do
    create stats.make
    group1 := {ARRAY [REAL_64]} << 100.0, 105.0, 110.0, 95.0, 108.0 >>
    group2 := {ARRAY [REAL_64]} << 98.0, 102.0, 104.0, 96.0, 101.0 >>

    -- Test if groups have significantly different means
    result := stats.t_test_two_sample (group1, group2)

    if result.is_significant (0.05) then
        print ("Groups are significantly different at 0.05 level%N")
    else
        print ("No significant difference between groups%N")
    end

    print ("t-statistic: " + result.statistic.out + "%N")
    print ("p-value: " + result.p_value.out + "%N")

Best Practices

Data Validation

Always ensure your data is clean before analysis:

local
    stats: STATISTICS
    clean: CLEANED_STATISTICS
    raw_data, clean_data: ARRAY [REAL_64]
do
    create stats.make
    create clean.make

    -- Remove invalid values
    clean_data := clean.clean (raw_data)

    -- Now safe to analyze
    if clean_data.count > 0 then
        print ("Mean: " + stats.mean (clean_data).out + "%N")
    end

Precondition Handling

All features have preconditions. Check them before calling:

-- Check that data is not empty before calling mean
if not data.is_empty then
    avg := stats.mean (data)
else
    print ("Error: cannot compute mean of empty array%N")
end

-- Check that arrays have same length
if x.count = y.count then
    corr := stats.correlation (x, y)
else
    print ("Error: arrays must have same length%N")
end

Numerical Stability

The library uses numerically stable algorithms:

Welford's algorithm for mean and variance (one pass, stable)
Kahan summation for sum (compensated addition)
Linear interpolation for percentiles

These ensure accurate results even with extreme magnitudes or large datasets.

Contract Verification

All features in simple_statistics are specified with Design by Contract. This means:

Preconditions (require): What must be true when calling the feature (e.g., data not empty)
Postconditions (ensure): What is guaranteed to be true after execution
Invariants: Properties that remain true throughout object lifetime

Example from `mean`:

mean (data: ARRAY [REAL_64]): REAL_64
    require
        data_not_empty: not data.is_empty
    do
        -- Implementation
    ensure
        result_is_average: True  -- result is the arithmetic mean
    end

These contracts are verified by the implementation and checked during testing. You can rely on them.

Next Steps

Quick API - Fast reference for common operations
API Reference - Complete documentation
Cookbook - More real-world examples and patterns