Back to Case Studies

When Postgres Beat Machine Learning

Client
E-commerce Platform
Role
Technical Consultant
Timeline
1 Week
Tech Stack
PostgreSQLSQLData ModelingPython

The TL;DR

Client wanted an AI recommendation engine. After analysis, I discovered their "ML problem" was actually a data quality issue. A well-designed Postgres query solved it 10x faster and cheaper.

Key Results

$0
vs $5k/month AI service
10x
Faster
90%
Complexity Reduction

The Result

Instead of building an ML pipeline with embeddings, vector stores, and model serving infrastructure, we wrote a 50-line SQL query that runs in 12ms and costs nothing extra.

The Problem: 'We Need AI for This'

An e-commerce platform wanted to show "recommended products" based on user behavior. Their initial plan:

Proposed ML Solution:

  • Collect user clickstream data
  • Train collaborative filtering model
  • Generate product embeddings
  • Deploy model serving infrastructure
  • Subscribe to $5K/month ML platform

Estimated timeline: 3 months • Estimated cost: $15K setup + $5K/month

⚠️

The Real Problem

After spending 2 hours analyzing their data, I discovered they weren't trying to predict future behavior—they just wanted to show products that similar customers bought together.

This is a database query, not a machine learning problem.

The Solution: SQL + Good Data Modeling

Instead of ML, we used basic relational database operations:

What We Did

  1. Created a materialized view of product co-purchases
  2. Added indexes on frequently queried columns
  3. Wrote a query that finds "products bought together"
  4. Cached results for 1 hour (99% hit rate)

What We Avoided

  • No model training pipeline
  • No vector database
  • No model serving infrastructure
  • No ML platform subscription
  • No data science team needed
The 'AI' Solution (It's Just SQL) sql
-- Materialized view: products frequently bought together
CREATE MATERIALIZED VIEW product_recommendations AS
SELECT 
    p1.product_id as source_product,
    p2.product_id as recommended_product,
    COUNT(DISTINCT o.order_id) as co_purchase_count,
    AVG(p2.rating) as avg_rating,
    SUM(p2.revenue) as total_revenue
FROM order_items p1
JOIN order_items p2 
    ON p1.order_id = p2.order_id 
    AND p1.product_id != p2.product_id
JOIN products p2 ON p2.product_id = p2.product_id
JOIN orders o ON o.order_id = p1.order_id
WHERE o.created_at > NOW() - INTERVAL '90 days'
GROUP BY p1.product_id, p2.product_id
HAVING COUNT(DISTINCT o.order_id) >= 3;

-- Create indexes for fast lookups
CREATE INDEX idx_recommendations_source 
    ON product_recommendations(source_product, co_purchase_count DESC);

-- The query that powers recommendations (12ms avg)
SELECT 
    recommended_product,
    co_purchase_count,
    avg_rating
FROM product_recommendations
WHERE source_product = $1
ORDER BY 
    co_purchase_count DESC,
    avg_rating DESC
LIMIT 10;
💭

Why This Works

  • Materialized view: Pre-computed results, refreshed hourly
  • Smart indexing: Lookups are O(log n) instead of full table scans
  • Recency filter: Only last 90 days of data (keeps it relevant)
  • Minimum threshold: Only show products bought together 3+ times (quality filter)

The Outcome

Deployed to production in 3 days (vs 3 months for ML solution).

SQL vs ML Approach

MetricBeforeAfter
Development Time3 months3 days
Monthly Cost$5,000$0
Query Latency~100ms (estimated)12ms
Infrastructure ComplexityHighNone
Maintenance BurdenML team requiredStandard DB ops

Business Impact

  • Recommendations live in 3 days instead of 3 months
  • Saved $15K setup + $5K/month ongoing costs
  • 10% increase in average order value from recommendations
  • No new infrastructure to maintain
"

I was ready to hire a data science team. Codefred showed me we already had everything we needed in our database. This honesty saved us months of work and tens of thousands of dollars.

F
Founder
E-commerce Platform

The Lesson

Not every data problem needs AI. Sometimes the best solution is boring, fast, and already in your stack.

⚠️

When to Use SQL Instead of ML

  • You have structured data: Relational databases excel at this
  • Rules are deterministic: "Show products bought together" is a query, not a prediction
  • You need explainability: SQL results are easy to debug and explain
  • Latency matters: Database indexes beat model inference for speed
  • Budget is tight: SQL is free; ML infrastructure isn't
💡

When You Actually Need ML

  • Predicting future behavior (not just historical patterns)
  • Unstructured data (text, images, audio)
  • Complex patterns humans can't codify
  • Personalization at massive scale (millions of users)

The Honest Consultant Test

A good consultant tells you when you don't need their most expensive solution. I could have built the ML pipeline and billed for 3 months. Instead, I solved it in 3 days with SQL.

Your success > My billable hours.

Need honest technical guidance?

I'll tell you if AI won't solve your problem. Sometimes you need better SQL, not better models. Book a consultation for an honest assessment.

💳 No payment required to book • 📅 Free 15-min discovery call