When Postgres Beat Machine Learning

Client

E-commerce Platform

Role

Technical Consultant

Timeline

1 Week

Tech Stack

PostgreSQLSQLData ModelingPython

The TL;DR

Client wanted an AI recommendation engine. After analysis, I discovered their "ML problem" was actually a data quality issue. A well-designed Postgres query solved it 10x faster and cheaper.

Key Results

vs $5k/month AI service

10x

Faster

90%

Complexity Reduction

✅

The Result

Instead of building an ML pipeline with embeddings, vector stores, and model serving infrastructure, we wrote a 50-line SQL query that runs in 12ms and costs nothing extra.

The Problem: 'We Need AI for This'

An e-commerce platform wanted to show "recommended products" based on user behavior. Their initial plan:

Proposed ML Solution:

Collect user clickstream data
Train collaborative filtering model
Generate product embeddings
Deploy model serving infrastructure
Subscribe to $5K/month ML platform

Estimated timeline: 3 months • Estimated cost: $15K setup + $5K/month

⚠️

The Real Problem

After spending 2 hours analyzing their data, I discovered they weren't trying to predict future behavior—they just wanted to show products that similar customers bought together.

This is a database query, not a machine learning problem.

The Solution: SQL + Good Data Modeling

Instead of ML, we used basic relational database operations:

What We Did

Created a materialized view of product co-purchases
Added indexes on frequently queried columns
Wrote a query that finds "products bought together"
Cached results for 1 hour (99% hit rate)

What We Avoided

No model training pipeline
No vector database
No model serving infrastructure
No ML platform subscription
No data science team needed

The 'AI' Solution (It's Just SQL) sql

-- Materialized view: products frequently bought together
CREATE MATERIALIZED VIEW product_recommendations AS
SELECT 
    p1.product_id as source_product,
    p2.product_id as recommended_product,
    COUNT(DISTINCT o.order_id) as co_purchase_count,
    AVG(p2.rating) as avg_rating,
    SUM(p2.revenue) as total_revenue
FROM order_items p1
JOIN order_items p2 
    ON p1.order_id = p2.order_id 
    AND p1.product_id != p2.product_id
JOIN products p2 ON p2.product_id = p2.product_id
JOIN orders o ON o.order_id = p1.order_id
WHERE o.created_at > NOW() - INTERVAL '90 days'
GROUP BY p1.product_id, p2.product_id
HAVING COUNT(DISTINCT o.order_id) >= 3;

-- Create indexes for fast lookups
CREATE INDEX idx_recommendations_source 
    ON product_recommendations(source_product, co_purchase_count DESC);

-- The query that powers recommendations (12ms avg)
SELECT 
    recommended_product,
    co_purchase_count,
    avg_rating
FROM product_recommendations
WHERE source_product = $1
ORDER BY 
    co_purchase_count DESC,
    avg_rating DESC
LIMIT 10;

💭

Why This Works

Materialized view: Pre-computed results, refreshed hourly
Smart indexing: Lookups are O(log n) instead of full table scans
Recency filter: Only last 90 days of data (keeps it relevant)
Minimum threshold: Only show products bought together 3+ times (quality filter)

The Outcome

Deployed to production in 3 days (vs 3 months for ML solution).

SQL vs ML Approach

Metric	Before	After
Development Time	3 months	3 days
Monthly Cost	$5,000	$0
Query Latency	~100ms (estimated)	12ms
Infrastructure Complexity	High	None
Maintenance Burden	ML team required	Standard DB ops

✅

Business Impact

Recommendations live in 3 days instead of 3 months
Saved $15K setup + $5K/month ongoing costs
10% increase in average order value from recommendations
No new infrastructure to maintain

I was ready to hire a data science team. Codefred showed me we already had everything we needed in our database. This honesty saved us months of work and tens of thousands of dollars.

F

Founder

E-commerce Platform

The Lesson

Not every data problem needs AI. Sometimes the best solution is boring, fast, and already in your stack.

⚠️

When to Use SQL Instead of ML

You have structured data: Relational databases excel at this
Rules are deterministic: "Show products bought together" is a query, not a prediction
You need explainability: SQL results are easy to debug and explain
Latency matters: Database indexes beat model inference for speed
Budget is tight: SQL is free; ML infrastructure isn't

💡

When You Actually Need ML

Predicting future behavior (not just historical patterns)
Unstructured data (text, images, audio)
Complex patterns humans can't codify
Personalization at massive scale (millions of users)

The Honest Consultant Test

A good consultant tells you when you don't need their most expensive solution. I could have built the ML pipeline and billed for 3 months. Instead, I solved it in 3 days with SQL.

Your success > My billable hours.

Need honest technical guidance?

I'll tell you if AI won't solve your problem. Sometimes you need better SQL, not better models. Book a consultation for an honest assessment.

Book Free Call Book $500 Audit

💳 No payment required to book • 📅 Free 15-min discovery call