Anomaly Detection
Statistical and machine learning methods for outlier detection.
Quick Reference
| Function | Description | SQL Signature |
|---|---|---|
anofox_zscore_anomaly | Z-score outlier | (value, threshold) OVER () -> BOOLEAN |
anofox_iqr_anomaly | IQR outlier | (value, multiplier) OVER () -> BOOLEAN |
anofox_isolation_forest_anomaly | Multivariate ML | (features[], contamination) OVER () -> BOOLEAN |
anofox_dbscan_anomaly | Density-based | (features[], eps, min_pts) OVER () -> BOOLEAN |
Statistical Methods (2)
anofox_zscore_anomaly
Detect outliers using Z-score method.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
value | DOUBLE | Yes | - | Column to analyze |
threshold | DOUBLE | Yes | - | Z-score threshold (2.0, 2.5, or 3.0) |
Method: z = (value - mean) / std_dev, anomaly = |z| > threshold
Thresholds:
| Threshold | % in Distribution | Outliers |
|---|---|---|
| 2.0 | 95% | 5% of data |
| 2.5 | 98.8% | 1.2% of data |
| 3.0 | 99.7% | 0.3% of data |
Example
SELECT
transaction_id,
amount,
anofox_zscore_anomaly(amount, 3.0) OVER () as is_outlier
FROM transactions
WHERE anofox_zscore_anomaly(amount, 3.0) OVER () = TRUE;
anofox_iqr_anomaly
Detect outliers using Interquartile Range.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
value | DOUBLE | Yes | - | Column to analyze |
multiplier | DOUBLE | Yes | - | IQR multiplier (1.5, 2.0, or 3.0) |
Method:
- IQR = Q3 - Q1
- lower = Q1 - multiplier * IQR
- upper = Q3 + multiplier * IQR
- anomaly = value < lower OR value > upper
Example
SELECT
transaction_id,
amount,
anofox_iqr_anomaly(amount, 1.5) OVER () as is_outlier
FROM transactions;
Pros over Z-score:
- Robust to extreme outliers
- Works with any distribution
- Doesn't require normality assumption
Machine Learning Methods (2)
anofox_isolation_forest_anomaly
Detect outliers using Isolation Forest algorithm.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
features | DOUBLE[] | Yes | - | Array of numeric columns |
contamination | DOUBLE | Yes | - | Expected proportion (0.0-1.0) |
Contamination guidance:
| Value | Expectation | Use Case |
|---|---|---|
| 0.01 | 1% anomalies | High quality data |
| 0.05 | 5% anomalies | Normal data |
| 0.10 | 10% anomalies | Messy data |
Example
SELECT
customer_id,
amount,
frequency,
anofox_isolation_forest_anomaly(
[amount, frequency, days_since_purchase],
0.05
) OVER () as is_anomaly
FROM customers
WHERE anofox_isolation_forest_anomaly(
[amount, frequency, days_since_purchase],
0.05
) OVER () = TRUE;
anofox_dbscan_anomaly
Detect outliers using Density-Based Clustering.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
features | DOUBLE[] | Yes | - | Array of numeric columns |
eps | DOUBLE | Yes | - | Distance threshold (0.1-2.0) |
min_pts | INTEGER | Yes | - | Min points for cluster (5-20) |
Example
SELECT
customer_id,
amount,
frequency,
anofox_dbscan_anomaly(
[amount, frequency],
0.5, -- eps
10 -- min_pts
) OVER () as is_anomaly
FROM customers;
Method Comparison
| Method | Type | Speed | Interpretation | Use Case |
|---|---|---|---|---|
| Z-Score | Statistical | Fast | Easy | Single column, normal dist |
| IQR | Statistical | Fast | Easy | Single column, robust |
| Isolation Forest | ML | Medium | Hard | Multiple columns |
| DBSCAN | ML | Slow | Medium | Density-based clusters |
Decision Tree
How many features?
├─ ONE column
│ ├─ Normal distribution? → Z-Score
│ └─ Any distribution? → IQR
│
└─ MULTIPLE columns
├─ Need interpretability? → DBSCAN
└─ Maximize detection? → Isolation Forest
Window Function Usage
-- Global detection
anofox_zscore_anomaly(amount, 3.0) OVER ()
-- Partition by group
anofox_zscore_anomaly(amount, 3.0) OVER (PARTITION BY region)
-- Rolling window
anofox_zscore_anomaly(amount, 3.0) OVER (
PARTITION BY customer_id
ORDER BY date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
)
Multi-Stage Detection
SELECT
transaction_id,
amount,
CASE
WHEN anofox_zscore_anomaly(amount, 3.0) OVER () THEN 'z_score'
WHEN anofox_iqr_anomaly(amount, 1.5) OVER () THEN 'iqr'
ELSE NULL
END as anomaly_method
FROM transactions;