Overview

SQL is no longer just a reporting language.
In modern applications—payments, filings, customer analytics, fraud detection, and compliance—SQL directly powers business decisions.

Bad SQL can:

Slow down production databases

Inflate AWS RDS / Redshift bills

Break dashboards

Produce wrong financial numbers

Good SQL can:

Make systems faster

Reduce cloud cost

Improve developer productivity

Build trust in business data

This tutorial teaches how SQL should be written in real systems, not just how to make queries run.

What You Will Learn

By the end of this guide, you will know how to:

Write clean, readable, professional SQL

Fetch only the data you actually need

Break complex logic into CTEs

Avoid slow subqueries

Write index-friendly filters

Prevent duplicate data and wrong metrics

Use COUNT correctly in business reports

This is designed for:

Our Real-World Data Model

We will use a realistic business schema:

customers(customer_id, name, email, state)
orders(order_id, customer_id, order_date, status, total_amount)
payments(payment_id, order_id, payment_date, amount, payment_status)

Think of this like:

Business Requirement

Show total revenue and number of paid orders per state for the last 30 days.

This is exactly the kind of query used in finance, growth, and compliance dashboards.

Write Clean & Formatted SQL

❌ Bad (hard to read, risky to change)

select c.state,sum(p.amount),count(o.order_id)
from customers c,orders o,payments p
where c.customer_id=o.customer_id and o.order_id=p.order_id
and date(p.payment_date)>=date(now()-interval 30 day)
group by c.state;

✅ Good (production-grade)

SELECT
    c.state,
    SUM(p.amount)      AS total_revenue,
    COUNT(o.order_id)  AS paid_orders
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
JOIN payments p
    ON o.order_id = p.order_id
WHERE p.payment_date >= CURRENT_DATE - INTERVAL 30 DAY
  AND p.payment_status = 'SUCCESS'
GROUP BY c.state;

Clean SQL saves hours during:

Fetch Only What You Need

❌ Bad

SELECT *
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL 30 DAY;

Large tables may have 50+ columns → slow & expensive.

✅ Good

SELECT
    order_id,
    customer_id,
    total_amount,
    status
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL 30 DAY;

Less data = faster queries = lower cloud cost.

Split Logic Using CTEs

When queries grow, CTEs keep them readable.

Business rule:

WITH paid_orders AS (
    SELECT
        o.order_id,
        o.customer_id
    FROM orders o
    JOIN payments p
        ON o.order_id = p.order_id
    WHERE p.payment_status = 'SUCCESS'
      AND p.payment_date >= CURRENT_DATE - INTERVAL 30 DAY
      AND o.status = 'COMPLETED'
),
state_revenue AS (
    SELECT
        c.state,
        p.amount
    FROM paid_orders po
    JOIN customers c
        ON po.customer_id = c.customer_id
    JOIN payments p
        ON po.order_id = p.order_id
)
SELECT
    state,
    SUM(amount) AS total_revenue
FROM state_revenue
GROUP BY state;

CTEs let teams:

Prefer JOINs Over Subqueries

❌ Bad (runs repeatedly per row)

SELECT
    c.state,
    (SELECT SUM(p.amount)
     FROM payments p
     WHERE p.order_id IN (
         SELECT o.order_id
         FROM orders o
         WHERE o.customer_id = c.customer_id
     )
    ) AS revenue
FROM customers c;

✅ Good

SELECT
    c.state,
    SUM(p.amount) AS revenue
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
JOIN payments p
    ON o.order_id = p.order_id
GROUP BY c.state;

Fewer scans = faster dashboards.

Write Index-Friendly Filters

Assume payment_date is indexed.❌ Bad

WHERE DATE(payment_date) = '2026-01-01'

This disables the index.

✅ Good

WHERE payment_date >= '2026-01-01'
  AND payment_date <  '2026-01-02'

This uses the index → huge performance gain.

Don’t Use DISTINCT to Hide Bad Joins

❌ Bad

SELECT DISTINCT c.customer_id
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
JOIN payments p
    ON o.order_id = p.order_id;

This hides data duplication.

✅ Correct

SELECT c.customer_id
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    JOIN payments p
        ON o.order_id = p.order_id
    WHERE o.customer_id = c.customer_id
      AND p.payment_status = 'SUCCESS'
);

This returns only real paying customers.

**COUNT(*) vs COUNT(column)**

SELECT
    COUNT(*)          AS total_orders,
    COUNT(payment_id) AS paid_orders
FROM orders o
LEFT JOIN payments p
    ON o.order_id = p.order_id;

If 100 orders exist but only 80 are paid:

COUNT(*) = 100

COUNT(payment_id) = 80

Using the wrong one leads to wrong revenue reports.

Why This Matters

In production systems:

Finance uses these numbers

Marketing runs campaigns

Compliance files reports

Leadership makes decisions

One wrong query can mean:

Wrong revenue

Wrong growth numbers

Wrong business strategy

Good SQL = trustworthy data.

Overview

What You Will Learn

Our Real-World Data Model

Business Requirement

Write Clean & Formatted SQL

❌ Bad (hard to read, risky to change)

✅ Good (production-grade)

Fetch Only What You Need

❌ Bad

✅ Good

Split Logic Using CTEs

Prefer JOINs Over Subqueries

❌ Bad (runs repeatedly per row)

✅ Good

Write Index-Friendly Filters

✅ Good

Don’t Use DISTINCT to Hide Bad Joins

❌ Bad

✅ Correct

COUNT(*) vs COUNT(column)

Why This Matters

Share this:

Related

Leave a Reply Cancel reply

**COUNT(*) vs COUNT(column)**