Installation & Setup
Get AnoFox Statistics running in minutes. Choose the installation method that works for you.
System Requirements
| Requirement | Version | Notes |
|---|---|---|
| DuckDB | 1.4.2 or later | Required |
| OS | Linux, macOS, Windows | Most platforms supported |
| Architecture | x86_64, ARM64 | WASM also supported |
| Memory | 2GB minimum | Depends on model size and data volume |
Installation Methods
Option 1: From DuckDB Community Registry (Recommended)
The easiest way to install AnoFox Statistics.
INSTALL anofox_statistics FROM community;
LOAD anofox_statistics;
Advantages:
- ✅ Easy updates with a single UPDATE EXTENSIONS; statement
- ✅ No build required
- ✅ Prebuilt for every platform the community registry supports
- ✅ Recommended for most users
Next step: Verify your installation
Option 2: From GitHub (Bleeding Edge)
Build the latest development version from source.
Prerequisites
- C++ compiler: GCC 7+ or Clang 5+
- CMake: 3.21 or later
- Git: To clone the repository
Build Steps
# Clone the repository
git clone https://github.com/datazoode/anofox.git
cd anofox/statistics
# Create build directory
mkdir build && cd build
# Configure CMake
cmake ..
# Build the extension (on macOS, use $(sysctl -n hw.ncpu) in place of $(nproc))
make -j$(nproc)
# Install (optional)
sudo make install
Load from Custom Path
-- Locally built extensions are unsigned: start DuckDB with duckdb -unsigned before loading
LOAD 'path/to/build/release/anofox_statistics.so';
Verify Installation
Test that the extension loads and functions correctly.
-- Load extension
LOAD anofox_statistics;
-- Create sample data
CREATE TABLE sample_data AS
SELECT
i::INTEGER as x,
(i * 2.5 + RANDOM() * 10)::DOUBLE as y
FROM range(100) t(i);
-- Test OLS regression
SELECT
coefficient[1] as intercept,
coefficient[2] as slope,
r_squared,
rmse
FROM anofox_statistics_ols(
'sample_data',
'y',
ARRAY['x']
);
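Expected output (approximate: RANDOM() draws new noise on every run, so exact values vary):
intercept | slope | r_squared | rmse
----------|-------|-----------|------
~5.0 | ~2.5 | ~0.998 | ~2.9
The intercept should land near 5 because RANDOM() * 10 averages 5. The slope should be near 2.5, r_squared close to 1, and rmse near 2.9, the standard deviation of uniform noise on [0, 10) (10 / sqrt(12) ≈ 2.89). If you see results like these, the installation is working.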
Troubleshooting
Extension Won't Load
Error: Extension "anofox_statistics" not found
Solution 1: Install from community registry
INSTALL anofox_statistics FROM community;
LOAD anofox_statistics;
Solution 2: Verify DuckDB version
SELECT version();
-- Should be 1.4.2 or later
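If you check the version from a script (for example in CI), compare version components numerically rather than as strings. The helper below is a hypothetical sketch, not part of AnoFox or DuckDB; it assumes version() returns text like v1.4.2 and uses the 1.4.2 minimum from the requirements table.

```python
# Hypothetical helper: does a DuckDB version string meet the documented minimum?
import re

MIN_DUCKDB = (1, 4, 2)  # minimum from the System Requirements table


def parse_version(text: str) -> tuple[int, ...]:
    # Accepts strings like 'v1.4.2'; tolerates a missing 'v' and trailing
    # suffixes such as '-dev123'.
    match = re.match(r"\s*v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: tuple[int, ...] = MIN_DUCKDB) -> bool:
    # Tuples compare component by component, so 1.10.0 correctly sorts after 1.4.2.
    return parse_version(text) >= minimum


if __name__ == "__main__":
    for candidate in ("v1.4.2", "v1.10.0", "v1.4.1"):
        print(candidate, meets_minimum(candidate))
```

A plain string comparison would wrongly treat "v1.10.0" as older than "v1.4.2", which is why the sketch compares integer tuples.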
Solution 3: Try explicit installation
INSTALL anofox_statistics VERSION 'latest' FROM community;
Function Not Found
Error: Unknown function anofox_statistics_ols
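Solution: The extension is not loaded in the current session. LOAD lasts only for the running DuckDB process, so run it again after every restart (or set up Persistent Extension Loading, described below).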
LOAD anofox_statistics;
-- Then try again
SELECT * FROM anofox_statistics_ols(...);
Build Fails
Error: CMake not found or compilation errors
Solutions:
# Install cmake (macOS)
brew install cmake
# Install cmake (Ubuntu/Debian)
sudo apt-get install cmake
# Install cmake (Windows)
# Download from https://cmake.org/download/
Insufficient Observations
Error: Insufficient observations for regression
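Reason: With an intercept, the model estimates p+1 coefficients (p = number of predictors), so it needs at least p+1 observations. At exactly p+1 rows the fit passes through every point, rmse is 0, and standard errors cannot be estimated, so use noticeably more rows than the minimum.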
Solution: Use more rows or fewer predictors
-- Bad: 5 rows, 5 predictors (needs at least 6 rows)
SELECT * FROM anofox_statistics_ols(
'small_table',
'y',
ARRAY['x1', 'x2', 'x3', 'x4', 'x5']
);
-- Good: 100 rows, 5 predictors
SELECT * FROM anofox_statistics_ols(
'large_table',
'y',
ARRAY['x1', 'x2', 'x3', 'x4', 'x5']
);
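The p+1 rule comes from the design matrix: with an intercept (which the examples in this guide report), the matrix has p+1 columns, and its rank can never exceed its number of rows. The sketch below checks that rank exactly on random illustrative data. The helper names are hypothetical, and this is not how the extension validates its input.

```python
# Why OLS with an intercept needs at least p + 1 rows: the design matrix X
# has p + 1 columns, and its rank can never exceed its number of rows.
# Helper names and the random data are illustrative only.
import random
from fractions import Fraction


def matrix_rank(rows: list[list[int]]) -> int:
    # Exact Gaussian elimination over rationals, so no floating-point tolerance.
    m = [[Fraction(v) for v in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(n_rows: int, n_predictors: int, seed: int = 0) -> list[list[int]]:
    # Column of ones for the intercept, then p random integer predictors.
    rng = random.Random(seed)
    return [[1] + [rng.randint(-50, 50) for _ in range(n_predictors)] for _ in range(n_rows)]


def is_estimable(n_rows: int, n_predictors: int, seed: int = 0) -> bool:
    # Unique OLS coefficients require full column rank, i.e. rank == p + 1.
    return matrix_rank(design_matrix(n_rows, n_predictors, seed)) == n_predictors + 1


def residual_df(n_rows: int, n_predictors: int) -> int:
    # n - (p + 1): zero means an exact fit with nothing left to estimate error from.
    return n_rows - (n_predictors + 1)


if __name__ == "__main__":
    print(is_estimable(5, 5))    # 5 rows but 6 columns: rank <= 5
    print(is_estimable(100, 5))  # 100 rows of random data
    print(residual_df(6, 5))     # the bare minimum leaves no residual degrees of freedom
```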
Configuration
Persistent Extension Loading
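To load the extension automatically in the DuckDB CLI, add a LOAD statement to ~/.duckdbrc. The CLI runs this file at startup; other clients, such as the Python API, do not read it.
-- ~/.duckdbrc
LOAD anofox_statistics;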
Performance Tips
For Large Datasets
- Use the aggregate function to fit one model per group in a single query
-- Efficient: one regression per group_id in a single GROUP BY pass
SELECT
group_id,
result.coefficients[1] as intercept,
result.r_squared
FROM (
SELECT
group_id,
anofox_statistics_ols_agg(y, ARRAY[x1, x2]) as result
FROM large_table
GROUP BY group_id
) per_group;
- Create indexes only for highly selective filters (DuckDB's ART indexes speed up point lookups, not full scans or GROUP BY aggregation)
CREATE INDEX idx_large_table_group ON large_table(group_id);
- Sample data for exploration
CREATE TABLE large_table_sample AS
SELECT * FROM large_table USING SAMPLE 10000 ROWS;
SELECT * FROM anofox_statistics_ols(
'large_table_sample',
'y',
ARRAY['x1', 'x2']
);
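Aggregate regression scales well because a simple linear fit depends only on a handful of running sums per group, so a single scan over the table is enough. The sketch below shows that idea for one predictor in plain Python. It is an illustration under that simplification, not the extension's implementation: anofox_statistics_ols_agg accepts several predictors, and its internals are not described here. The class and function names are hypothetical.

```python
# Hypothetical sketch: one-pass, per-group simple regression from running sums.
from collections import defaultdict


class OlsAccumulator:
    """Running sums for y = a + b * x; enough to recover a, b, and R^2."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = self.syy = 0.0

    def update(self, x: float, y: float) -> None:
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y

    def result(self) -> dict[str, float]:
        if self.n < 2:
            raise ValueError("need at least p + 1 = 2 rows for one predictor")
        # Centered sums of squares and cross-products.
        cxx = self.sxx - self.sx * self.sx / self.n
        cxy = self.sxy - self.sx * self.sy / self.n
        cyy = self.syy - self.sy * self.sy / self.n
        if cxx == 0:
            raise ValueError("x is constant in this group")
        slope = cxy / cxx
        intercept = (self.sy - slope * self.sx) / self.n
        # R^2 is undefined when y is constant; report NaN in that case.
        r_squared = cxy * cxy / (cxx * cyy) if cyy else float("nan")
        return {"intercept": intercept, "slope": slope, "r_squared": r_squared}


def grouped_ols(rows):
    """rows: iterable of (group_id, x, y); one scan, like GROUP BY group_id."""
    accumulators = defaultdict(OlsAccumulator)
    for group_id, x, y in rows:
        accumulators[group_id].update(x, y)
    return {g: acc.result() for g, acc in accumulators.items()}


if __name__ == "__main__":
    data = [("a", float(x), 1 + 2 * x) for x in range(10)]
    data += [("b", float(x), -3 + 0.5 * x) for x in range(10)]
    print(grouped_ols(data))
```

Raw sums like these can lose precision when x values are large or far from zero; a production implementation would more likely center the data or use Welford-style updates.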
Version Compatibility
| AnoFox Version | DuckDB Min | DuckDB Max | Notes |
|---|---|---|---|
| 1.0 | 1.4.2 | Latest | Initial release |
Next Steps
- Quickstart — Run your first regression
- Basic Workflow — Complete end-to-end example
- API Reference — All functions