When Postgres Beat Machine Learning
The TL;DR
Client wanted an AI recommendation engine. After analysis, I discovered their "ML problem" was actually a data quality issue. A well-designed Postgres query solved it 10x faster and cheaper.
Key Results
The Result
Instead of building an ML pipeline with embeddings, vector stores, and model serving infrastructure, we wrote a 50-line SQL query that runs in 12ms and costs nothing extra.
The Problem: 'We Need AI for This'
An e-commerce platform wanted to show "recommended products" based on user behavior. Their initial plan:
Proposed ML Solution:
- Collect user clickstream data
- Train collaborative filtering model
- Generate product embeddings
- Deploy model serving infrastructure
- Subscribe to $5K/month ML platform
Estimated timeline: 3 months • Estimated cost: $15K setup + $5K/month
The Real Problem
After spending 2 hours analyzing their data, I discovered they weren't trying to predict future behavior—they just wanted to show products that similar customers bought together.
This is a database query, not a machine learning problem.
The Solution: SQL + Good Data Modeling
Instead of ML, we used basic relational database operations:
What We Did
- Created a materialized view of product co-purchases
- Added indexes on frequently queried columns
- Wrote a query that finds "products bought together"
- Cached results for 1 hour (99% hit rate)
What We Avoided
- No model training pipeline
- No vector database
- No model serving infrastructure
- No ML platform subscription
- No data science team needed
-- Materialized view: products frequently bought together
CREATE MATERIALIZED VIEW product_recommendations AS
SELECT
p1.product_id as source_product,
p2.product_id as recommended_product,
COUNT(DISTINCT o.order_id) as co_purchase_count,
AVG(p2.rating) as avg_rating,
SUM(p2.revenue) as total_revenue
FROM order_items p1
JOIN order_items p2
ON p1.order_id = p2.order_id
AND p1.product_id != p2.product_id
JOIN products p2 ON p2.product_id = p2.product_id
JOIN orders o ON o.order_id = p1.order_id
WHERE o.created_at > NOW() - INTERVAL '90 days'
GROUP BY p1.product_id, p2.product_id
HAVING COUNT(DISTINCT o.order_id) >= 3;
-- Create indexes for fast lookups
CREATE INDEX idx_recommendations_source
ON product_recommendations(source_product, co_purchase_count DESC);
-- The query that powers recommendations (12ms avg)
SELECT
recommended_product,
co_purchase_count,
avg_rating
FROM product_recommendations
WHERE source_product = $1
ORDER BY
co_purchase_count DESC,
avg_rating DESC
LIMIT 10;Why This Works
- Materialized view: Pre-computed results, refreshed hourly
- Smart indexing: Lookups are O(log n) instead of full table scans
- Recency filter: Only last 90 days of data (keeps it relevant)
- Minimum threshold: Only show products bought together 3+ times (quality filter)
The Outcome
Deployed to production in 3 days (vs 3 months for ML solution).
SQL vs ML Approach
| Metric | Before | After |
|---|---|---|
| Development Time | 3 months | 3 days |
| Monthly Cost | $5,000 | $0 |
| Query Latency | ~100ms (estimated) | 12ms |
| Infrastructure Complexity | High | None |
| Maintenance Burden | ML team required | Standard DB ops |
Business Impact
- Recommendations live in 3 days instead of 3 months
- Saved $15K setup + $5K/month ongoing costs
- 10% increase in average order value from recommendations
- No new infrastructure to maintain
I was ready to hire a data science team. Codefred showed me we already had everything we needed in our database. This honesty saved us months of work and tens of thousands of dollars.
The Lesson
Not every data problem needs AI. Sometimes the best solution is boring, fast, and already in your stack.
When to Use SQL Instead of ML
- You have structured data: Relational databases excel at this
- Rules are deterministic: "Show products bought together" is a query, not a prediction
- You need explainability: SQL results are easy to debug and explain
- Latency matters: Database indexes beat model inference for speed
- Budget is tight: SQL is free; ML infrastructure isn't
When You Actually Need ML
- Predicting future behavior (not just historical patterns)
- Unstructured data (text, images, audio)
- Complex patterns humans can't codify
- Personalization at massive scale (millions of users)
The Honest Consultant Test
A good consultant tells you when you don't need their most expensive solution. I could have built the ML pipeline and billed for 3 months. Instead, I solved it in 3 days with SQL.
Your success > My billable hours.
Need honest technical guidance?
I'll tell you if AI won't solve your problem. Sometimes you need better SQL, not better models. Book a consultation for an honest assessment.
💳 No payment required to book • 📅 Free 15-min discovery call