SQL CTE vs Subquery

CTEs vs. Subqueries: The Definitive Guide to Choosing the Right Tool

Introduction

Every SQL developer has been there: you’ve written a perfectly functional query, only to have a code reviewer suggest, “This would be cleaner as a CTE.” Or perhaps you’ve used a CTE and been told, “A simple subquery would suffice here.”

The debate between Common Table Expressions (CTEs) and subqueries isn’t just about personal preference—it’s about writing SQL that is maintainable, performant, and clear in its intent. In this guide, we’ll explore both approaches, understand their strengths and weaknesses, and develop a framework for making the right choice every time.

Understanding the Basics

What is a Subquery?

A subquery is simply a query nested within another query. It’s a query that helps another query get its job done. Think of it as a supporting actor that sets the stage for the main performance.

Here’s a straightforward example:

SELECT *
FROM premium_transactions
WHERE payment_code IN (
    SELECT payment_code
    FROM payment_methods
    WHERE is_automatic = 1
);

The inner query identifies automatic payment codes, and the outer query uses those results to filter transactions. It’s clean, concise, and gets the job done.

However, subqueries can quickly become difficult to parse when nested multiple levels deep:

SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date > '2023-01-01'
    AND order_id IN (
        SELECT order_id
        FROM order_items
        WHERE price > (
            SELECT AVG(price)
            FROM products
            WHERE category = 'Electronics'
        )
    )
);

This deeply nested structure calculates average electronics prices, filters order items, then orders, then customers—all in a way that requires mental gymnastics to follow.

What is a Common Table Expression?

A CTE is a named temporary result set that exists only for the duration of a query. Defined using the WITH keyword, CTEs act like temporary views that you can reference multiple times within the same statement.

Here’s a simple example:

WITH avg_salary AS (
    SELECT AVG(salary) as average
    FROM employees
)
SELECT e.employee_id, e.name, e.salary
FROM employees e
CROSS JOIN avg_salary
WHERE e.salary > avg_salary.average;

The CTE avg_salary calculates once and can be referenced as many times as needed in the main query.

CTEs truly shine in multi-step transformations:

WITH sales_per_customer AS (
    SELECT customer_id, SUM(amount) as total_spent
    FROM orders
    GROUP BY customer_id
),
top_customers AS (
    SELECT customer_id
    FROM sales_per_customer
    WHERE total_spent > 5000
)
SELECT c.*
FROM customers c
INNER JOIN top_customers tc ON c.customer_id = tc.customer_id;

This query builds logic incrementally: calculate totals, filter for high spenders, then enrich with full customer details. Each step is clear and follows a logical progression.

When to Use CTEs

1. Readability and Organization Matter

Your code is read far more often than it’s written. When a query involves complex logic, CTEs make that logic transparent.

Compare the deeply nested subquery from earlier with this CTE version:

WITH avg_electronics_price AS (
    SELECT AVG(price) as avg_price
    FROM products
    WHERE category = 'Electronics'
),
qualifying_items AS (
    SELECT order_id, customer_id
    FROM order_items
    WHERE price > (SELECT avg_price FROM avg_electronics_price)
),
qualifying_orders AS (
    SELECT DISTINCT customer_id
    FROM orders o
    INNER JOIN qualifying_items qi ON o.order_id = qi.order_id
    WHERE o.order_date > '2023-01-01'
)
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM qualifying_orders);

The transformation is immediately clearer. Each CTE has a descriptive name that documents its purpose, making the query self-documenting.

2. Building Incremental Logic

When your query requires step-by-step transformations, CTEs provide a natural structure. Each CTE represents one logical step, making it easier to:

Debug specific stages
Test intermediate results
Modify individual steps without affecting others
Understand the analytical thought process

This is particularly valuable in analytical queries where you’re building insights layer by layer:

WITH daily_sales AS (
    SELECT 
        DATE(order_date) as sale_date,
        SUM(amount) as daily_total
    FROM orders
    GROUP BY DATE(order_date)
),
moving_average AS (
    SELECT 
        sale_date,
        daily_total,
        AVG(daily_total) OVER (
            ORDER BY sale_date 
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) as seven_day_avg
    FROM daily_sales
),
anomaly_detection AS (
    SELECT 
        sale_date,
        daily_total,
        seven_day_avg,
        CASE 
            WHEN daily_total > seven_day_avg * 1.5 THEN 'Above Normal'
            WHEN daily_total < seven_day_avg * 0.5 THEN 'Below Normal'
            ELSE 'Normal'
        END as pattern
    FROM moving_average
)
SELECT * FROM anomaly_detection
WHERE pattern != 'Normal'
ORDER BY sale_date DESC;

3. Reusing Logic Multiple Times

If you need to reference the same result set multiple times, CTEs eliminate redundant calculations:

-- Without CTE (subquery repeated)
SELECT 
    (SELECT MAX(salary) FROM employees) as max_salary,
    (SELECT MIN(salary) FROM employees) as min_salary,
    (SELECT MAX(salary) FROM employees) - (SELECT MIN(salary) FROM employees) as salary_range;

-- With CTE (calculated once, used multiple times)
WITH salary_stats AS (
    SELECT 
        MAX(salary) as max_salary,
        MIN(salary) as min_salary
    FROM employees
)
SELECT 
    max_salary,
    min_salary,
    max_salary - min_salary as salary_range
FROM salary_stats;

The CTE version is more efficient (calculates once) and more maintainable (change the logic in one place).

4. Recursive Operations

CTEs have a unique capability that subqueries lack: recursion. This is essential for hierarchical data like organizational charts, bill of materials, or graph traversal:

WITH RECURSIVE employee_hierarchy AS (
    -- Anchor: Start with top-level employees
    SELECT 
        employee_id,
        name,
        manager_id,
        1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: Add employees under each manager
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        eh.level + 1
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;

Subqueries simply cannot handle recursive operations—this is CTE-exclusive territory.

When to Use Subqueries

1. Simple, One-Time Operations

When the logic is straightforward and used only once, a subquery keeps things concise:

UPDATE products
SET price = price * 1.1
WHERE category_id IN (
    SELECT category_id
    FROM categories
    WHERE name = 'Electronics'
);

Adding a CTE here would be overkill. The subquery is clear, compact, and fit for purpose.

2. EXISTS and NOT EXISTS Checks

Subqueries pair naturally with existence checks:

SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date > CURRENT_DATE - INTERVAL '90 days'
);

The EXISTS clause is optimized to stop searching as soon as it finds a match, making this pattern very efficient.

3. Correlated Subqueries for Row-by-Row Comparisons

When you need to compare each row against aggregate values, correlated subqueries are often the most intuitive approach:

SELECT 
    p.product_id,
    p.product_name,
    p.price
FROM products p
WHERE p.price > (
    SELECT AVG(price)
    FROM products
    WHERE category_id = p.category_id
);

This finds products priced above their category average. While you could use a CTE with a join, the correlated subquery directly expresses the intent.

4. Inline Scalar Values

For single-value calculations, subqueries in the SELECT clause can be concise:

SELECT 
    customer_id,
    customer_name,
    (SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) as order_count,
    (SELECT MAX(order_date) FROM orders WHERE customer_id = c.customer_id) as last_order_date
FROM customers c;

Note: This can be inefficient for large datasets since it executes the subquery for every row. A CTE with a JOIN or window functions might perform better.

Performance Considerations

Query Optimizer Behavior

In modern database systems (PostgreSQL 12+, SQL Server 2017+, MySQL 8.0+), the query optimizer often treats CTEs and subqueries similarly. Both get optimized, inlined, or materialized based on what the optimizer determines is most efficient.

However, there are important differences:

Materialization: Some databases (particularly older PostgreSQL versions) always materialize CTEs, meaning the result set is computed and stored once before the main query runs. This can be beneficial (avoid recalculation) or detrimental (forced to compute even if only a few rows are needed).

Optimization Barriers: In some database versions, CTEs create “optimization fences” that prevent the optimizer from pushing down predicates or reordering operations. Subqueries may offer more optimization flexibility.

Modern Behavior: PostgreSQL 12+ introduced the MATERIALIZED and NOT MATERIALIZED hints:

-- Force materialization
WITH sales_summary AS MATERIALIZED (
    SELECT customer_id, SUM(amount) as total
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM sales_summary WHERE total > 1000;

-- Prevent materialization (inline it)
WITH recent_orders AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '7 days'
)
SELECT * FROM recent_orders;

Performance Guidelines

For large result sets that are reused multiple times: CTEs may perform better due to single calculation
For small, simple filters: Subqueries often perform identically and may offer more optimization opportunities
For recursive operations: CTEs are your only option
Always profile: Use EXPLAIN ANALYZE to understand actual execution plans and performance

Database-Specific Considerations

PostgreSQL

Pre-12: CTEs always materialized (optimization fence)
12+: Intelligent optimization unless forced with MATERIALIZED keyword
Strong support for recursive CTEs

SQL Server

CTEs optimized similarly to subqueries
Excellent recursive CTE performance
OPTION (MAXRECURSION n) to control recursive depth

MySQL

8.0+: Full CTE support including recursive
Generally optimizes CTEs and subqueries similarly
Earlier versions: No CTE support (use derived tables)

Oracle

Calls them “subquery factoring”
MATERIALIZE hint to force materialization
Strong optimization for both approaches

Decision Framework

Use CTEs when:

✅ Logic involves multiple steps or transformations
✅ Readability and maintainability are priorities
✅ The same result set is needed multiple times
✅ You need recursive operations
✅ Query will be reviewed, modified, or maintained by others
✅ Complex business logic that benefits from self-documenting code

Use Subqueries when:

✅ Logic is simple and used once
✅ Working with EXISTS/NOT EXISTS checks
✅ Correlated row-by-row comparisons
✅ Quick one-off queries or ad-hoc analysis
✅ Inline scalar values in SELECT clause
✅ Simple filtering in WHERE clauses

Best Practices

1. Name CTEs Descriptively

-- Bad
WITH t1 AS (...), t2 AS (...), t3 AS (...)

-- Good
WITH monthly_sales AS (...), 
     top_performers AS (...), 
     final_rankings AS (...)

2. Keep CTEs Focused

Each CTE should do one thing well. If a CTE becomes too complex, break it into multiple CTEs.

3. Comment Complex Logic

WITH customer_segments AS (
    -- Segment customers by lifetime value
    -- LTV = total spend over all time
    SELECT 
        customer_id,
        SUM(amount) as lifetime_value,
        CASE 
            WHEN SUM(amount) > 10000 THEN 'VIP'
            WHEN SUM(amount) > 5000 THEN 'Premium'
            ELSE 'Standard'
        END as segment
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM customer_segments;

4. Test Intermediate Results

One advantage of CTEs is that you can easily test each step:

-- Test just the first CTE
WITH first_step AS (
    SELECT ...
)
SELECT * FROM first_step;

5. Be Aware of Scope

CTEs are scoped to a single statement. If you need to use the same logic across multiple queries, consider creating a view instead.

Conclusion

The choice between CTEs and subqueries isn’t about one being universally better than the other. It’s about selecting the right tool for the specific job at hand.

CTEs excel at clarity: They transform complex logic into readable, maintainable code that documents the analytical process. When your query tells a story that unfolds in steps, CTEs are your narrative structure.

Subqueries excel at simplicity: When you need a quick filter or existence check, subqueries keep your code concise without unnecessary ceremony.

A good rule of thumb: Start with readability. If a subquery makes your intent clear, use it. If you find yourself nesting multiple levels or repeating logic, refactor to CTEs. And when in doubt, remember that your future self (or your teammates) will thank you for choosing the more readable option.

As SQL developers, our goal isn’t just to write queries that work—it’s to write queries that communicate clearly, perform efficiently, and can be maintained over time. Understanding when to use CTEs versus subqueries is a key skill in achieving that goal.

Remember: Profile your queries, understand your database’s optimizer behavior, and always prioritize code that your team can understand and maintain. The best query is the one that works correctly, performs adequately, and can be understood six months from now.

recent posts

about

Like this:

Leave a ReplyCancel reply