CTEs vs. Subqueries: The Definitive Guide to Choosing the Right Tool
Introduction
Every SQL developer has been there: you’ve written a perfectly functional query, only to have a code reviewer suggest, “This would be cleaner as a CTE.” Or perhaps you’ve used a CTE and been told, “A simple subquery would suffice here.”
The debate between Common Table Expressions (CTEs) and subqueries isn’t just about personal preference—it’s about writing SQL that is maintainable, performant, and clear in its intent. In this guide, we’ll explore both approaches, understand their strengths and weaknesses, and develop a framework for making the right choice every time.
Understanding the Basics
What is a Subquery?
A subquery is simply a query nested within another query. It’s a query that helps another query get its job done. Think of it as a supporting actor that sets the stage for the main performance.
Here’s a straightforward example:
SELECT *
FROM premium_transactions
WHERE payment_code IN (
SELECT payment_code
FROM payment_methods
WHERE is_automatic = 1
);
The inner query identifies automatic payment codes, and the outer query uses those results to filter transactions. It’s clean, concise, and gets the job done.
However, subqueries can quickly become difficult to parse when nested multiple levels deep:
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE order_date > '2023-01-01'
AND order_id IN (
SELECT order_id
FROM order_items
WHERE price > (
SELECT AVG(price)
FROM products
WHERE category = 'Electronics'
)
)
);
This deeply nested structure calculates average electronics prices, filters order items, then orders, then customers—all in a way that requires mental gymnastics to follow.
What is a Common Table Expression?
A CTE is a named temporary result set that exists only for the duration of a query. Defined using the WITH keyword, CTEs act like temporary views that you can reference multiple times within the same statement.
Here’s a simple example:
WITH avg_salary AS (
SELECT AVG(salary) as average
FROM employees
)
SELECT e.employee_id, e.name, e.salary
FROM employees e
CROSS JOIN avg_salary
WHERE e.salary > avg_salary.average;
The CTE avg_salary calculates once and can be referenced as many times as needed in the main query.
CTEs truly shine in multi-step transformations:
WITH sales_per_customer AS (
SELECT customer_id, SUM(amount) as total_spent
FROM orders
GROUP BY customer_id
),
top_customers AS (
SELECT customer_id
FROM sales_per_customer
WHERE total_spent > 5000
)
SELECT c.*
FROM customers c
INNER JOIN top_customers tc ON c.customer_id = tc.customer_id;
This query builds logic incrementally: calculate totals, filter for high spenders, then enrich with full customer details. Each step is clear and follows a logical progression.
When to Use CTEs
1. Readability and Organization Matter
Your code is read far more often than it’s written. When a query involves complex logic, CTEs make that logic transparent.
Compare the deeply nested subquery from earlier with this CTE version:
WITH avg_electronics_price AS (
SELECT AVG(price) as avg_price
FROM products
WHERE category = 'Electronics'
),
qualifying_items AS (
SELECT order_id, customer_id
FROM order_items
WHERE price > (SELECT avg_price FROM avg_electronics_price)
),
qualifying_orders AS (
SELECT DISTINCT customer_id
FROM orders o
INNER JOIN qualifying_items qi ON o.order_id = qi.order_id
WHERE o.order_date > '2023-01-01'
)
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM qualifying_orders);
The transformation is immediately clearer. Each CTE has a descriptive name that documents its purpose, making the query self-documenting.
2. Building Incremental Logic
When your query requires step-by-step transformations, CTEs provide a natural structure. Each CTE represents one logical step, making it easier to:
- Debug specific stages
- Test intermediate results
- Modify individual steps without affecting others
- Understand the analytical thought process
This is particularly valuable in analytical queries where you’re building insights layer by layer:
WITH daily_sales AS (
SELECT
DATE(order_date) as sale_date,
SUM(amount) as daily_total
FROM orders
GROUP BY DATE(order_date)
),
moving_average AS (
SELECT
sale_date,
daily_total,
AVG(daily_total) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_avg
FROM daily_sales
),
anomaly_detection AS (
SELECT
sale_date,
daily_total,
seven_day_avg,
CASE
WHEN daily_total > seven_day_avg * 1.5 THEN 'Above Normal'
WHEN daily_total < seven_day_avg * 0.5 THEN 'Below Normal'
ELSE 'Normal'
END as pattern
FROM moving_average
)
SELECT * FROM anomaly_detection
WHERE pattern != 'Normal'
ORDER BY sale_date DESC;
3. Reusing Logic Multiple Times
If you need to reference the same result set multiple times, CTEs eliminate redundant calculations:
-- Without CTE (subquery repeated)
SELECT
(SELECT MAX(salary) FROM employees) as max_salary,
(SELECT MIN(salary) FROM employees) as min_salary,
(SELECT MAX(salary) FROM employees) - (SELECT MIN(salary) FROM employees) as salary_range;
-- With CTE (calculated once, used multiple times)
WITH salary_stats AS (
SELECT
MAX(salary) as max_salary,
MIN(salary) as min_salary
FROM employees
)
SELECT
max_salary,
min_salary,
max_salary - min_salary as salary_range
FROM salary_stats;
The CTE version is more efficient (calculates once) and more maintainable (change the logic in one place).
4. Recursive Operations
CTEs have a unique capability that subqueries lack: recursion. This is essential for hierarchical data like organizational charts, bill of materials, or graph traversal:
WITH RECURSIVE employee_hierarchy AS (
-- Anchor: Start with top-level employees
SELECT
employee_id,
name,
manager_id,
1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Add employees under each manager
SELECT
e.employee_id,
e.name,
e.manager_id,
eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;
Subqueries simply cannot handle recursive operations—this is CTE-exclusive territory.
When to Use Subqueries
1. Simple, One-Time Operations
When the logic is straightforward and used only once, a subquery keeps things concise:
UPDATE products
SET price = price * 1.1
WHERE category_id IN (
SELECT category_id
FROM categories
WHERE name = 'Electronics'
);
Adding a CTE here would be overkill. The subquery is clear, compact, and fit for purpose.
2. EXISTS and NOT EXISTS Checks
Subqueries pair naturally with existence checks:
SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date > CURRENT_DATE - INTERVAL '90 days'
);
The EXISTS clause is optimized to stop searching as soon as it finds a match, making this pattern very efficient.
3. Correlated Subqueries for Row-by-Row Comparisons
When you need to compare each row against aggregate values, correlated subqueries are often the most intuitive approach:
SELECT
p.product_id,
p.product_name,
p.price
FROM products p
WHERE p.price > (
SELECT AVG(price)
FROM products
WHERE category_id = p.category_id
);
This finds products priced above their category average. While you could use a CTE with a join, the correlated subquery directly expresses the intent.
4. Inline Scalar Values
For single-value calculations, subqueries in the SELECT clause can be concise:
SELECT
customer_id,
customer_name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) as order_count,
(SELECT MAX(order_date) FROM orders WHERE customer_id = c.customer_id) as last_order_date
FROM customers c;
Note: This can be inefficient for large datasets since it executes the subquery for every row. A CTE with a JOIN or window functions might perform better.
Performance Considerations
Query Optimizer Behavior
In modern database systems (PostgreSQL 12+, SQL Server 2017+, MySQL 8.0+), the query optimizer often treats CTEs and subqueries similarly. Both get optimized, inlined, or materialized based on what the optimizer determines is most efficient.
However, there are important differences:
Materialization: Some databases (particularly older PostgreSQL versions) always materialize CTEs, meaning the result set is computed and stored once before the main query runs. This can be beneficial (avoid recalculation) or detrimental (forced to compute even if only a few rows are needed).
Optimization Barriers: In some database versions, CTEs create “optimization fences” that prevent the optimizer from pushing down predicates or reordering operations. Subqueries may offer more optimization flexibility.
Modern Behavior: PostgreSQL 12+ introduced the MATERIALIZED and NOT MATERIALIZED hints:
-- Force materialization
WITH sales_summary AS MATERIALIZED (
SELECT customer_id, SUM(amount) as total
FROM orders
GROUP BY customer_id
)
SELECT * FROM sales_summary WHERE total > 1000;
-- Prevent materialization (inline it)
WITH recent_orders AS NOT MATERIALIZED (
SELECT * FROM orders WHERE order_date > CURRENT_DATE - INTERVAL '7 days'
)
SELECT * FROM recent_orders;
Performance Guidelines
- For large result sets that are reused multiple times: CTEs may perform better due to single calculation
- For small, simple filters: Subqueries often perform identically and may offer more optimization opportunities
- For recursive operations: CTEs are your only option
- Always profile: Use
EXPLAIN ANALYZEto understand actual execution plans and performance
Database-Specific Considerations
PostgreSQL
- Pre-12: CTEs always materialized (optimization fence)
- 12+: Intelligent optimization unless forced with MATERIALIZED keyword
- Strong support for recursive CTEs
SQL Server
- CTEs optimized similarly to subqueries
- Excellent recursive CTE performance
- OPTION (MAXRECURSION n) to control recursive depth
MySQL
- 8.0+: Full CTE support including recursive
- Generally optimizes CTEs and subqueries similarly
- Earlier versions: No CTE support (use derived tables)
Oracle
- Calls them “subquery factoring”
- MATERIALIZE hint to force materialization
- Strong optimization for both approaches
Decision Framework
Use CTEs when:
- ✅ Logic involves multiple steps or transformations
- ✅ Readability and maintainability are priorities
- ✅ The same result set is needed multiple times
- ✅ You need recursive operations
- ✅ Query will be reviewed, modified, or maintained by others
- ✅ Complex business logic that benefits from self-documenting code
Use Subqueries when:
- ✅ Logic is simple and used once
- ✅ Working with EXISTS/NOT EXISTS checks
- ✅ Correlated row-by-row comparisons
- ✅ Quick one-off queries or ad-hoc analysis
- ✅ Inline scalar values in SELECT clause
- ✅ Simple filtering in WHERE clauses
Best Practices
1. Name CTEs Descriptively
-- Bad
WITH t1 AS (...), t2 AS (...), t3 AS (...)
-- Good
WITH monthly_sales AS (...),
top_performers AS (...),
final_rankings AS (...)
2. Keep CTEs Focused
Each CTE should do one thing well. If a CTE becomes too complex, break it into multiple CTEs.
3. Comment Complex Logic
WITH customer_segments AS (
-- Segment customers by lifetime value
-- LTV = total spend over all time
SELECT
customer_id,
SUM(amount) as lifetime_value,
CASE
WHEN SUM(amount) > 10000 THEN 'VIP'
WHEN SUM(amount) > 5000 THEN 'Premium'
ELSE 'Standard'
END as segment
FROM orders
GROUP BY customer_id
)
SELECT * FROM customer_segments;
4. Test Intermediate Results
One advantage of CTEs is that you can easily test each step:
-- Test just the first CTE
WITH first_step AS (
SELECT ...
)
SELECT * FROM first_step;
5. Be Aware of Scope
CTEs are scoped to a single statement. If you need to use the same logic across multiple queries, consider creating a view instead.
Conclusion
The choice between CTEs and subqueries isn’t about one being universally better than the other. It’s about selecting the right tool for the specific job at hand.
CTEs excel at clarity: They transform complex logic into readable, maintainable code that documents the analytical process. When your query tells a story that unfolds in steps, CTEs are your narrative structure.
Subqueries excel at simplicity: When you need a quick filter or existence check, subqueries keep your code concise without unnecessary ceremony.
A good rule of thumb: Start with readability. If a subquery makes your intent clear, use it. If you find yourself nesting multiple levels or repeating logic, refactor to CTEs. And when in doubt, remember that your future self (or your teammates) will thank you for choosing the more readable option.
As SQL developers, our goal isn’t just to write queries that work—it’s to write queries that communicate clearly, perform efficiently, and can be maintained over time. Understanding when to use CTEs versus subqueries is a key skill in achieving that goal.
Remember: Profile your queries, understand your database’s optimizer behavior, and always prioritize code that your team can understand and maintain. The best query is the one that works correctly, performs adequately, and can be understood six months from now.
More from Python and Data Analysis Insights substack https://benjaminbennettalexander.substack.com/p/ctes-vs-subqueries-which-one-should

Leave a Reply