BOM Analysis Beyond Excel: SQL-Native Product Similarity in DuckDB

February 26, 2026 · 6 min read

AnoFox Development Team

Your new laptop launches in 6 weeks. Finance wants a demand forecast. Problem: zero sales history.

The S&OP playbook says: find similar products, use their history as analogs. But "which products are similar?" means comparing BOMs. And comparing BOMs often means... Excel. VLOOKUP. Manual counting. Hours later, you have five candidates and zero confidence it's complete.

The typical enterprise has 5,000-50,000 active products. Each BOM of one of these products might have 10-200 components. That's millions of potential comparisons. You need a system that can answer "what's similar to our new laptop?" in seconds, not hours.

BOM comparison challenge: manual Excel vs SQL-native approach

The Example: Consumer Electronics BOMs

We'll demonstrate with a real dataset: the Figshare Consumer Electronics BOM dataset (CC0 licensed). It contains material composition data for 156 consumer electronics products. The material compositions are unfortunately not real BOMs, but close enough for our purposes:

156 products: iPhones, Samsung Galaxy phones, laptops, TVs, gaming consoles, printers
11 material types: aluminum, copper, PCB, Li-ion battery, plastics, steel, glass
1,097 BOM relationships: product-to-material mappings with mass quantities

This dataset lets us test: Can we find iPhone analogs? Do laptops cluster together? Does BOM similarity predict product similarity?

Here's what two BOMs look like— first for a smartphone:

And the second for a CRT TV:

Both share core materials (aluminum, copper, PCB, plastic, steel). But the discriminating materials differ: smartphones have Li-ion batteries and flat panel glass; CRT TVs have CRT glass and no battery. This is why they cluster into different categories.

The Solution: SQL-Native BOM Similarity

We built anofox-similarity—a DuckDB extension that brings graph-based product similarity directly into SQL.

-- Your new laptop launches in 6 weeks. Find forecast analogs.
SELECT * FROM find_similar_materials('LAPTOP-NEW-2026', k := 5);

That's it. One function call. Results in milliseconds.

No Python scripts. No data exports. No external services. Your BOM data stays in DuckDB, and similarity analysis becomes just another SQL query.

Real product similarity isn't just about component overlap. A laptop and a tablet might share 80% of their materials—but have completely different demand patterns, descriptions, and assembly structures.

anofox-similarity uses three signals to find truly similar products:

Multi-modal similarity: combining structural, textual, and transactional signals

1. Structural Similarity (BOM Topology)

The extension analyzes the graph structure of your BOMs using the Weisfeiler-Lehman (WL) kernel. This captures:

Which components are used together
How materials are assembled into subassemblies
The depth and shape of the BOM hierarchy

Two products with identical parts but different assembly structures will score differently—because structure affects manufacturing complexity, cost, and quality.

2. Textual Similarity (Product Descriptions)

Product names and descriptions carry semantic meaning. "Dell Latitude E6430" is textually similar to "HP EliteBook 8470p"—both are business laptops from the early 2010s.

The extension uses an E5-small embedding model (runs locally, no API calls) to convert descriptions into 384-dimensional vectors. Products with similar descriptions cluster together.

3. Transactional Similarity (Consumption Patterns)

Products with similar demand patterns are often good forecast analogs—even if their BOMs differ slightly. The extension extracts time-series features from goods movements:

Consumption trends and seasonality
Volatility and coefficient of variation
Autocorrelation at different lags

This helps find products that behave similarly, not just look similar.

Fused Embeddings

All three signals combine into a single 768-dimensional embedding per product:

fused_embedding = [structural || textual || transactional]

Vector search then finds the k-nearest neighbors in milliseconds—even across 50,000+ products.

Finding Forecast Analogs

Let's load the consumer electronics dataset and find analogs for a laptop:

-- Load the extension
INSTALL anofox_similarity FROM community;
LOAD anofox_similarity;

-- Load the Figshare consumer electronics data
CREATE TABLE materials AS SELECT * FROM read_csv('materials.csv');
CREATE TABLE bom_items AS SELECT * FROM read_csv('bom_items.csv');

-- Initialize embeddings (builds structural + textual vectors)
SELECT similarity_init();

Now find the most similar products:

-- Find the 5 most similar products using multi-modal search
SELECT * FROM find_similar_materials(
    'DELL-LATITUDE-E6430-2012',
    k := 5,
    method := 'combined'  -- uses structural + textual fusion
);

Results:

material_id	description	similarity
HP-ELITEBOOK-8470P-2012	HP EliteBook 8470p (2012)	0.94
LENOVO-THINKPAD-T430-2012	Lenovo ThinkPad T430 (2012)	0.93
DELL-LATITUDE-E6420-2011	Dell Latitude E6420 (2011)	0.92
APPLE-MACBOOK-PRO-13-2012	MacBook Pro 13" (2012)	0.85
MICROSOFT-SURFACE-PRO-2013	Surface Pro (2013)	0.76

Business laptops from the same era cluster tightly—they share similar BOMs, similar names ("EliteBook", "ThinkPad", "Latitude"), and likely similar demand patterns. The MacBook scores lower (different brand positioning), and tablets diverge further.

From Similarity to Forecast

For cold-start forecasting, use cold_start_analogs() to get weighted analogs directly:

-- Get forecast analogs with normalized weights
SELECT * FROM cold_start_analogs(
    'NEW-LAPTOP-2026',
    k := 3,
    method := 'combined'
);

Results:

analog_material_id	similarity_score	forecast_weight
HP-ELITEBOOK-8470P-2012	0.94	0.36
LENOVO-THINKPAD-T430-2012	0.93	0.35
DELL-LATITUDE-E6420-2011	0.92	0.29

Interpretation: Your new laptop's forecast should weight HP EliteBook's history at 36%, ThinkPad T430 at 35%, and Dell Latitude E6420 at 29%. Feed these to your forecasting model for a data-driven analog forecast.

Beyond Cold-Start Forecasting

While this post focuses on the S&OP use case, BOM similarity enables other workflows:

Material rationalization: Find near-duplicate SKUs for consolidation
Impact analysis: Which products are affected by a component change?
Product family clustering: Group products into variant families automatically

These are covered in Part 2 and Part 3 of this series.

Try It Yourself

-- Install the extension
INSTALL anofox_similarity FROM community;
LOAD anofox_similarity;

-- Load your BOM data (or use the Figshare sample dataset)
-- Initialize similarity engine
SELECT similarity_init();

-- Find similar products for forecasting
SELECT * FROM find_similar_materials('YOUR-NEW-PRODUCT', k := 5);

-- Get weighted analogs for cold-start forecasting
SELECT * FROM cold_start_analogs('YOUR-NEW-PRODUCT', k := 3);

In this series:

BOM Analysis Beyond Excel — SQL-native product similarity for cold-start forecasting (this post)
Understanding Similarity Algorithms — From Jaccard to WL kernels to multi-modal fusion
Detecting Product Lifecycles — Automatic predecessor/successor detection

All of this runs in SQL. No external services, no data exports, no new tools to learn.

Reference Documentation

📖 Full Similarity Reference →

The Example: Consumer Electronics BOMs​

The Solution: SQL-Native BOM Similarity​

How It Works: Multi-Modal Similarity​

1. Structural Similarity (BOM Topology)​

2. Textual Similarity (Product Descriptions)​

3. Transactional Similarity (Consumption Patterns)​

Fused Embeddings​

Finding Forecast Analogs​

From Similarity to Forecast​

Beyond Cold-Start Forecasting​

Try It Yourself​

Reference Documentation​