Installation & Setup

Get AnoFox Statistics running in minutes. Choose the installation method that works for you.

System Requirements

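Requirement  | Version               | Notes
-------------|-----------------------|--------------------------------------
DuckDB       | 1.4.2 or later        | Required
OS           | Linux, macOS, Windows | Most platforms supported
Architecture | x86_64, ARM64         | WASM also supported
Memory       | 2 GB minimum          | Depends on model size and data volume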
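
The DuckDB requirement can be checked from a terminal before installing anything. The following is a minimal sketch, assuming the duckdb CLI is on PATH and that duckdb --version prints the version as its first field (for example v1.4.2). The helpers version_ge and normalize_version are hypothetical names introduced for illustration; they are not part of DuckDB or AnoFox.

```shell
#!/bin/sh
# Sketch: check that the installed DuckDB CLI meets the 1.4.2 minimum.
# version_ge and normalize_version are hypothetical helper names.

# version_ge A B: succeeds (exit 0) when version A >= version B.
# Relies on `sort -V` (version sort), available in GNU coreutils
# and in recent BSD/macOS sort.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Strip an optional leading "v" so "v1.4.2" becomes "1.4.2".
normalize_version() {
    printf '%s\n' "${1#v}"
}

if command -v duckdb >/dev/null 2>&1; then
    installed=$(normalize_version "$(duckdb --version | awk '{print $1}')")
    if version_ge "$installed" "1.4.2"; then
        echo "DuckDB $installed is new enough"
    else
        echo "DuckDB $installed is too old; 1.4.2 or later is required"
    fi
else
    echo "duckdb CLI not found on PATH"
fi
```

Development builds may report version strings with extra suffixes, so treat the result as a quick sanity check rather than a definitive answer.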

Installation Methods

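Option 1: From the Community Registry (Recommended)

The easiest way to install AnoFox Statistics is from the DuckDB community extension registry.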

INSTALL anofox_statistics FROM community;
LOAD anofox_statistics;

Advantages:

  • ✅ Automatic updates
  • ✅ No build required
  • ✅ Works on all platforms
  • ✅ Recommended for most users

Next step: Verify your installation


Option 2: From GitHub (Bleeding Edge)

Build the latest development version from source.

Prerequisites

  • C++ compiler: GCC 7+ or Clang 5+
  • CMake: 3.21 or later
  • Git: To clone the repository
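
A quick way to see which of these prerequisites are already present is sketched below. It only checks that the tools are on PATH, not their versions; have_tool is a hypothetical helper name, and the compiler binaries g++ and clang++ are assumed to be the usual names on your system.

```shell
#!/bin/sh
# Sketch: report which build prerequisites are on PATH.
# have_tool is a hypothetical helper name.

# have_tool NAME: succeeds when NAME resolves to a command.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Either compiler is enough.
if have_tool g++ || have_tool clang++; then
    echo "ok: C++ compiler"
else
    echo "missing: C++ compiler (g++ or clang++)"
fi

for tool in cmake git; do
    if have_tool "$tool"; then
        echo "ok: $tool"
    else
        echo "missing: $tool"
    fi
done
```

To compare versions against the minimums listed above, run g++ --version, clang++ --version, and cmake --version.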

Build Steps

# Clone the repository
git clone https://github.com/datazoode/anofox.git
cd anofox/statistics

# Create build directory
mkdir build && cd build

# Configure CMake
cmake ..

# Build the extension
make -j$(nproc)

# Install (optional)
sudo make install
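
The make -j$(nproc) step assumes the nproc utility, which ships with GNU coreutils and is not available by default on macOS. A portable way to pick the job count might look like the sketch below; cpu_count is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: choose a parallel job count on Linux and macOS.
# cpu_count is a hypothetical helper name.

cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        # Linux (GNU coreutils)
        nproc
    elif n=$(sysctl -n hw.ncpu 2>/dev/null); then
        # macOS and the BSDs
        echo "$n"
    else
        # Unknown platform: fall back to a serial build
        echo 1
    fi
}

echo "would run: make -j$(cpu_count)"
```

With CMake 3.12 or later, cmake --build . --parallel is another option that avoids the platform check.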

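Load from Custom Path

Locally built extensions are unsigned, so DuckDB must be started with unsigned extensions allowed (for example, duckdb -unsigned) before the file can be loaded.

LOAD 'path/to/build/release/anofox_statistics.so';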

Verify Installation

Test that the extension loads and functions correctly.

-- Load extension
LOAD anofox_statistics;

-- Create sample data
CREATE TABLE sample_data AS
SELECT
    i::INTEGER AS x,
    (i * 2.5 + RANDOM() * 10)::DOUBLE AS y
FROM range(100) t(i);

-- Test OLS regression
SELECT
    coefficient[1] AS intercept,
    coefficient[2] AS slope,
    r_squared,
    rmse
FROM anofox_statistics_ols(
    'sample_data',
    'y',
    ARRAY['x']
);

Example output (illustrative; exact values vary between runs because the sample data uses RANDOM()):

intercept | slope | r_squared | rmse
----------|-------|-----------|-----
-4.3      | 2.5   | 0.98      | 2.1

If the query returns one row with a slope close to 2.5, the installation is successful.


Troubleshooting

Extension Won't Load

Error: Extension "anofox_statistics" not found

Solution 1: Install from community registry

INSTALL anofox_statistics FROM community;
LOAD anofox_statistics;

Solution 2: Verify DuckDB version

SELECT version();
-- Should be 1.4.2 or later

Solution 3: Force a reinstall

FORCE INSTALL anofox_statistics FROM community;

Function Not Found

Error: Unknown function anofox_statistics_ols

Solution: Extension may not be loaded in this session

LOAD anofox_statistics;

-- Then try again
SELECT * FROM anofox_statistics_ols(...);

Build Fails

Error: CMake not found or compilation errors

Solutions:

# Install cmake (macOS)
brew install cmake

# Install cmake (Ubuntu/Debian)
sudo apt-get install cmake

# Install cmake (Windows)
# Download from https://cmake.org/download/

Insufficient Observations

Error: Insufficient observations for regression

Reason: Need at least p+1 observations (p = number of predictors)

Solution: Use more rows or fewer predictors

-- Bad: 5 rows, 5 predictors (need 6+ rows)
SELECT * FROM anofox_statistics_ols(
    'small_table',
    'y',
    ARRAY['x1', 'x2', 'x3', 'x4', 'x5']
);

-- Good: 100 rows, 5 predictors
SELECT * FROM anofox_statistics_ols(
    'large_table',
    'y',
    ARRAY['x1', 'x2', 'x3', 'x4', 'x5']
);
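
The p+1 rule is simple arithmetic and can be checked before calling the function. The sketch below is a pre-flight check with a hypothetical helper, enough_rows, which is not an AnoFox function. It assumes you already know the row count, for example from SELECT count(*) FROM small_table.

```shell
#!/bin/sh
# Sketch: the "at least p + 1 observations" rule as a pre-flight check.
# enough_rows is a hypothetical helper name, not an AnoFox function.

# enough_rows ROWS PREDICTORS: succeeds when ROWS >= PREDICTORS + 1.
enough_rows() {
    [ "$1" -ge $(( $2 + 1 )) ]
}

# With 5 predictors the minimum is 6 rows.
for rows in 5 6 100; do
    if enough_rows "$rows" 5; then
        echo "$rows rows, 5 predictors: ok"
    else
        echo "$rows rows, 5 predictors: insufficient"
    fi
done
```

Meeting the minimum only avoids the error. A fit with barely more rows than predictors leaves almost no data for estimating error, so more rows are generally better.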

Configuration

Persistent Extension Loading

To load the extension automatically in the DuckDB CLI, create a .duckdbrc file in your home directory:

-- ~/.duckdbrc
LOAD anofox_statistics;
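
If you script your setup, the line can be added idempotently so that repeated runs do not duplicate it. This is a minimal sketch; ensure_line is a hypothetical helper name, and it assumes the target file, if it exists, ends with a newline.

```shell
#!/bin/sh
# Sketch: append a line to a file only if it is not already present.
# ensure_line is a hypothetical helper name.

# ensure_line FILE LINE
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Calling ensure_line "$HOME/.duckdbrc" 'LOAD anofox_statistics;' would then leave exactly one copy of the LOAD statement in the file, however often it runs.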

Performance Tips

For Large Datasets

  1. Use aggregate functions for massive datasets
-- Efficient: fit one model per group in a single query
-- (my_table is a placeholder name; TABLE itself is a reserved word)
SELECT
    group_id,
    result.coefficients[1] AS intercept,
    result.r_squared
FROM (
    SELECT
        group_id,
        anofox_statistics_ols_agg(y, ARRAY[x1, x2]) AS result
    FROM my_table
    GROUP BY group_id
);
  2. Create indexes on key columns
CREATE INDEX idx_table_group ON my_table(group_id);
CREATE INDEX idx_table_y ON my_table(y);
  3. Sample data for exploration
SELECT * FROM anofox_statistics_ols(
    'SELECT * FROM large_table USING SAMPLE 10000',
    'y',
    ARRAY['x1', 'x2']
);

Version Compatibility

AnoFox Version | DuckDB Min | DuckDB Max | Notes
---------------|------------|------------|----------------
1.0            | 1.4.2      | Latest     | Initial release
