Writing efficient and maintainable SQL queries is a critical skill for anyone working with databases. Whether you’re a beginner or looking to refine your approach, understanding best practices can dramatically improve your query performance, readability, and reliability. This guide will walk you through essential principles with practical examples you can try yourself.
Understanding the Foundation
Before diving into specific techniques, it’s important to recognize that good SQL queries balance three key concerns: performance, readability, and maintainability. A query that runs fast but is impossible for your teammates to understand creates technical debt. Conversely, a beautifully formatted query that takes minutes to execute helps no one.
Use Explicit Column Names Instead of SELECT *
One of the most common shortcuts developers take is using SELECT * to retrieve all columns from a table. While convenient during development, this practice can lead to performance issues and maintenance headaches.
When you use SELECT *, you’re transferring potentially unnecessary data over the network, increasing memory usage, and making your code brittle to schema changes. If someone adds a large text or binary column to your table later, your query suddenly becomes much slower without any changes to your code.
Instead, explicitly name the columns you need:
-- Avoid this
SELECT * FROM employees WHERE department_id = 5;
-- Do this instead
SELECT employee_id, first_name, last_name, email, hire_date
FROM employees
WHERE department_id = 5;
This approach makes your intentions clear, avoids fetching data you don’t use, and reduces the risk that a schema change silently alters what your application receives.
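To see the brittleness concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption made for illustration; the article's advice is not tied to SQLite). The photo column is a hypothetical later addition. The SELECT * result changes shape after the schema change, while the explicit column list does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " first_name TEXT, last_name TEXT, department_id INTEGER)"
)
conn.execute("INSERT INTO employees VALUES (1, 'John', 'Smith', 5)")


def column_names(sql):
    """Return the names of the columns a query produces."""
    return [col[0] for col in conn.execute(sql).description]


star_sql = "SELECT * FROM employees WHERE department_id = 5"
explicit_sql = (
    "SELECT employee_id, first_name, last_name "
    "FROM employees WHERE department_id = 5"
)

star_before = column_names(star_sql)
explicit_before = column_names(explicit_sql)

# Simulate a later schema change: someone adds a large binary column.
conn.execute("ALTER TABLE employees ADD COLUMN photo BLOB")

star_after = column_names(star_sql)
explicit_after = column_names(explicit_sql)

print("SELECT *      :", star_before, "->", star_after)
print("explicit list :", explicit_before, "->", explicit_after)
```

Code that reads SELECT * results by position, or that ships every column to a client, is affected by the new column even though the query text never changed.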
Leverage Indexes Effectively
Indexes are the backbone of query performance, but they require thoughtful application. When you create an index on a column, the database builds a data structure that allows rapid lookups, similar to an index in a book.
The most significant performance gains come from indexing columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses. However, indexes aren’t free: they consume disk space and slow down INSERT, UPDATE, and DELETE operations because every index on the table must be updated along with the data.
Consider this scenario:
-- Without an index on last_name, this scans the entire table
SELECT employee_id, first_name, last_name
FROM employees
WHERE last_name = 'Smith';
-- Create an index to speed this up
CREATE INDEX idx_employees_last_name ON employees(last_name);
For queries that filter on several columns at once, a composite index can be powerful:
-- Create a composite index for queries filtering on multiple columns
CREATE INDEX idx_employees_dept_status
    ON employees(department_id, employment_status);
-- This query benefits from the composite index
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 5
  AND employment_status = 'ACTIVE';
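Remember that the order of columns in composite indexes matters. The database can use the index for queries that filter on the leftmost columns, but generally not for queries that filter only on columns later in the index. With the index above, a filter on department_id alone can use it, while a filter on employment_status alone usually cannot.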
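The leftmost-column rule can be checked against a real planner. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in (an assumption; other engines print different plans and some can skip over a leading column in certain cases). The plan helper is a hypothetical convenience function, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " first_name TEXT, last_name TEXT,"
    " department_id INTEGER, employment_status TEXT)"
)
conn.execute(
    "CREATE INDEX idx_employees_dept_status "
    "ON employees(department_id, employment_status)"
)


def plan(sql):
    """Return SQLite's plan as one string; the last field of each row is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filter on the leading column only: the composite index is usable.
leading_plan = plan(
    "SELECT first_name FROM employees WHERE department_id = 5"
)
# Filter on both columns: the index is usable for both conditions.
both_plan = plan(
    "SELECT first_name FROM employees "
    "WHERE department_id = 5 AND employment_status = 'ACTIVE'"
)
# Filter on the trailing column only: the planner falls back to a table scan.
trailing_plan = plan(
    "SELECT first_name FROM employees WHERE employment_status = 'ACTIVE'"
)

print("leading :", leading_plan)
print("both    :", both_plan)
print("trailing:", trailing_plan)
```

If queries regularly filter on employment_status alone, that is a sign they may need their own index rather than relying on the composite one.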
Write Readable Queries with Consistent Formatting
SQL queries can become complex quickly, especially when dealing with multiple joins, subqueries, and conditions. Consistent formatting makes queries easier to understand, debug, and modify.
Adopt a formatting convention and stick to it. Here’s an example of well-formatted SQL:
SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    SUM(s.sale_amount) AS total_sales
FROM employees e
    INNER JOIN departments d
        ON e.department_id = d.department_id
    INNER JOIN sales s
        ON e.employee_id = s.employee_id
WHERE e.employment_status = 'ACTIVE'
  AND s.sale_date >= '2024-01-01'
GROUP BY
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name
HAVING SUM(s.sale_amount) > 10000
ORDER BY total_sales DESC;
Notice how each clause starts on a new line, joins are indented, and related conditions are aligned. This structure makes the query’s logic immediately apparent.
Use JOINs Instead of Subqueries When Possible
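While both JOINs and subqueries can achieve similar results, a JOIN is often at least as fast. Many modern optimizers rewrite a simple IN subquery into a join internally, so the two forms frequently produce the same plan, but older or simpler engines may handle the subquery less efficiently.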
Compare these approaches:
-- Using a subquery (slower on some engines)
SELECT e.employee_id, e.first_name, e.last_name
FROM employees e
WHERE e.department_id IN (
    SELECT department_id
    FROM departments
    WHERE location = 'New York'
);
-- Using a JOIN (often at least as fast)
SELECT e.employee_id, e.first_name, e.last_name
FROM employees e
    INNER JOIN departments d
        ON e.department_id = d.department_id
WHERE d.location = 'New York';
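The JOIN version gives the optimizer more freedom on engines that don’t rewrite subqueries well, and it lets you select columns from departments if you need them later. The two forms return the same rows here only because department_id is unique in departments; joining to a table with several matching rows per employee would duplicate employees. Subqueries remain the better choice in some cases, particularly correlated existence checks or complex aggregations.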
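Before swapping a subquery for a JOIN, it is worth confirming that both return the same rows. The sketch below, which assumes SQLite via Python's sqlite3 module and a trimmed-down version of the article's tables, shows one case where the two forms agree (joining on a unique key) and one where the JOIN multiplies rows (joining to sales, where an employee can appear several times). The ids helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, first_name TEXT, department_id INTEGER
    );
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, employee_id INTEGER);
    INSERT INTO departments VALUES (1, 'New York'), (2, 'San Francisco');
    INSERT INTO employees VALUES (1, 'John', 1), (2, 'Jane', 2), (3, 'Bob', 1);
    INSERT INTO sales VALUES (1, 1), (2, 1), (3, 3);
""")


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Unique join key (departments.department_id): both forms agree.
in_departments = ids("""
    SELECT e.employee_id FROM employees e
    WHERE e.department_id IN (
        SELECT department_id FROM departments WHERE location = 'New York'
    )
    ORDER BY e.employee_id
""")
join_departments = ids("""
    SELECT e.employee_id FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id
    WHERE d.location = 'New York'
    ORDER BY e.employee_id
""")

# Non-unique join key (sales.employee_id): the JOIN repeats employee 1.
in_sales = ids("""
    SELECT e.employee_id FROM employees e
    WHERE e.employee_id IN (SELECT employee_id FROM sales)
    ORDER BY e.employee_id
""")
join_sales = ids("""
    SELECT e.employee_id FROM employees e
    INNER JOIN sales s ON e.employee_id = s.employee_id
    ORDER BY e.employee_id
""")

print(in_departments, join_departments)  # same rows
print(in_sales, join_sales)              # the JOIN has an extra row
```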
Avoid Functions on Indexed Columns in WHERE Clauses
When you apply a function to a column in a WHERE clause, the database usually can’t use an ordinary index on that column, because the index stores the raw column values rather than the function’s results. The result is often a full table scan, which can be devastating for performance on large tables.
-- This prevents index usage on hire_date
-- (YEAR() is MySQL syntax; in PostgreSQL, use EXTRACT(YEAR FROM hire_date))
SELECT employee_id, first_name, last_name
FROM employees
WHERE YEAR(hire_date) = 2023;
-- This allows the index to be used
SELECT employee_id, first_name, last_name
FROM employees
WHERE hire_date >= '2023-01-01'
  AND hire_date < '2024-01-01';
The second query is functionally equivalent but allows the database to use an index on hire_date, potentially making it orders of magnitude faster.
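The difference shows up directly in a query plan. This sketch assumes SQLite through Python's sqlite3 module; SQLite has no YEAR() function, so strftime('%Y', ...) stands in for it, and dates are stored as ISO-8601 text. The plan helper is hypothetical. Both queries return the same rows, but only the range form can search the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, first_name TEXT, hire_date TEXT
    );
    CREATE INDEX idx_employees_hire_date ON employees(hire_date);
    INSERT INTO employees VALUES
        (1, 'John', '2022-12-31'),
        (2, 'Jane', '2023-01-01'),
        (3, 'Bob',  '2023-12-31'),
        (4, 'Ann',  '2024-01-01');
""")

# Function applied to the column (SQLite stand-in for YEAR(hire_date) = 2023).
function_sql = (
    "SELECT employee_id, first_name FROM employees "
    "WHERE strftime('%Y', hire_date) = '2023'"
)
# Half-open range on the bare column.
range_sql = (
    "SELECT employee_id, first_name FROM employees "
    "WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01'"
)


def plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


function_rows = sorted(conn.execute(function_sql).fetchall())
range_rows = sorted(conn.execute(range_sql).fetchall())
function_plan = plan(function_sql)
range_plan = plan(range_sql)

print(function_rows == range_rows)
print("function:", function_plan)
print("range   :", range_plan)
```

Some databases also let you index the expression itself (an expression or function-based index), which is an alternative when rewriting the predicate is not practical.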
Use EXISTS Instead of COUNT for Existence Checks
When you need to determine whether rows exist that meet certain criteria, using EXISTS is more efficient than counting rows. EXISTS stops searching as soon as it finds a single matching row, while COUNT must examine all matching rows.
-- Inefficient: counts all matching rows
SELECT employee_id, first_name, last_name
FROM employees e
WHERE (SELECT COUNT(*)
       FROM sales s
       WHERE s.employee_id = e.employee_id) > 0;
-- Efficient: stops at first match
SELECT employee_id, first_name, last_name
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM sales s
    WHERE s.employee_id = e.employee_id
);
Notice also that within the EXISTS subquery, we use SELECT 1 rather than selecting specific columns. EXISTS only cares whether rows exist, not what data they contain, and most databases ignore the select list here, so SELECT 1 is mainly a convention that makes the intent clear.
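The two forms are interchangeable in terms of results, which makes the rewrite safe to apply mechanically. A small check, assuming SQLite via Python's sqlite3 module and a cut-down copy of the article's tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, employee_id INTEGER);
    INSERT INTO employees VALUES (1, 'John'), (2, 'Jane'), (3, 'Bob');
    INSERT INTO sales VALUES (1, 1), (2, 1), (3, 3);
""")

# Existence check written as a count comparison.
count_rows = conn.execute("""
    SELECT e.employee_id FROM employees e
    WHERE (SELECT COUNT(*) FROM sales s
           WHERE s.employee_id = e.employee_id) > 0
    ORDER BY e.employee_id
""").fetchall()

# Same check written with EXISTS.
exists_rows = conn.execute("""
    SELECT e.employee_id FROM employees e
    WHERE EXISTS (SELECT 1 FROM sales s
                  WHERE s.employee_id = e.employee_id)
    ORDER BY e.employee_id
""").fetchall()

# The negated form finds employees with no sales at all.
no_sales_rows = conn.execute("""
    SELECT e.employee_id FROM employees e
    WHERE NOT EXISTS (SELECT 1 FROM sales s
                      WHERE s.employee_id = e.employee_id)
    ORDER BY e.employee_id
""").fetchall()

print(count_rows, exists_rows, no_sales_rows)
```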
Be Careful with OR Conditions and Indexes
OR conditions can prevent effective index usage, especially when the conditions reference different columns. The database often can’t use indexes efficiently when faced with complex OR logic.
-- May not use indexes effectively
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 5 OR salary > 100000;
-- Consider using UNION for better performance
SELECT employee_id, first_name, last_name
FROM employees
WHERE department_id = 5
UNION
SELECT employee_id, first_name, last_name
FROM employees
WHERE salary > 100000;
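The UNION approach allows the database to use a separate index for each condition, potentially offering better performance. UNION (rather than UNION ALL) matters here: it removes the duplicates that would otherwise appear for employees who satisfy both conditions, and that deduplication has its own cost. Many databases can also combine indexes for an OR on their own, so compare both forms on your own database and data.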
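The following sketch illustrates why the rewrite needs UNION and not UNION ALL. It assumes SQLite via Python's sqlite3 module and made-up sample rows in which employee 1 matches both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER
    );
    INSERT INTO employees VALUES
        (1, 5, 120000),  -- matches both conditions
        (2, 5,  60000),  -- matches the department condition only
        (3, 2, 150000),  -- matches the salary condition only
        (4, 2,  50000);  -- matches neither
""")


def ids(sql):
    """Run a query and return the employee ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


or_ids = ids(
    "SELECT employee_id FROM employees "
    "WHERE department_id = 5 OR salary > 100000"
)
union_ids = ids(
    "SELECT employee_id FROM employees WHERE department_id = 5 "
    "UNION "
    "SELECT employee_id FROM employees WHERE salary > 100000"
)
union_all_ids = ids(
    "SELECT employee_id FROM employees WHERE department_id = 5 "
    "UNION ALL "
    "SELECT employee_id FROM employees WHERE salary > 100000"
)

print(or_ids)         # OR form
print(union_ids)      # UNION matches the OR form
print(union_all_ids)  # UNION ALL repeats employee 1
```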
Use LIMIT to Control Result Sets
When working with large tables, use LIMIT (TOP in SQL Server, or FETCH FIRST in standard SQL and Oracle) whenever you don’t need all results. This is especially important during development and testing.
-- Good practice during development
SELECT employee_id, first_name, last_name
FROM employees
WHERE employment_status = 'ACTIVE'
ORDER BY hire_date DESC
LIMIT 100;
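This prevents accidentally pulling millions of rows into your application or overwhelming your display logic. In production, pagination with LIMIT and OFFSET keeps responses small, but the database still has to read and discard every skipped row, so deep pages get slower. Keyset pagination, which filters on the last value seen instead of using OFFSET, avoids that cost.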
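As a rough sketch of the two pagination styles, the code below pages through ten rows first with LIMIT and OFFSET and then with a keyset condition on employee_id. It assumes SQLite via Python's sqlite3 module; the function names, PAGE_SIZE, and the generated rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(i, f"Employee{i:02d}") for i in range(1, 11)],
)

PAGE_SIZE = 4


def offset_page(page_number):
    """Fetch a page by skipping page_number * PAGE_SIZE rows."""
    sql = (
        "SELECT employee_id, last_name FROM employees "
        "ORDER BY employee_id LIMIT ? OFFSET ?"
    )
    return conn.execute(sql, (PAGE_SIZE, page_number * PAGE_SIZE)).fetchall()


def keyset_page(last_seen_id):
    """Fetch the page that follows the last id the caller has already seen."""
    sql = (
        "SELECT employee_id, last_name FROM employees "
        "WHERE employee_id > ? ORDER BY employee_id LIMIT ?"
    )
    return conn.execute(sql, (last_seen_id, PAGE_SIZE)).fetchall()


offset_pages = [offset_page(n) for n in range(3)]

keyset_pages = []
last_seen_id = 0  # smaller than any real id, so the first page starts at the top
while True:
    page = keyset_page(last_seen_id)
    if not page:
        break
    keyset_pages.append(page)
    last_seen_id = page[-1][0]

print([len(p) for p in offset_pages], [len(p) for p in keyset_pages])
```

Keyset pagination needs a stable, unique sort key (here the primary key) and only supports moving page by page, which is the trade-off for avoiding the skipped-row cost.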
Understand and Use EXPLAIN
Every major database system provides tools to show you how queries will be executed. In MySQL and PostgreSQL, this is the EXPLAIN command. Learning to read execution plans is one of the most valuable skills for query optimization.
EXPLAIN SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    COUNT(s.sale_id) AS sale_count
FROM employees e
    LEFT JOIN sales s ON e.employee_id = s.employee_id
WHERE e.department_id = 5
GROUP BY e.employee_id, e.first_name, e.last_name;
The output shows you whether indexes are being used, the estimated number of rows scanned, the join type, and other critical information. Look for table scans on large tables, high row estimates, or expensive operations that might indicate optimization opportunities.
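Plans are most useful when compared before and after a change. The sketch below runs the query above through SQLite's EXPLAIN QUERY PLAN (used here as a stand-in via Python's sqlite3 module; MySQL and PostgreSQL print different output) and then again after adding an index on the join column. The index name idx_sales_employee_id and the plan_steps helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, department_id INTEGER
    );
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        employee_id INTEGER
    );
""")

QUERY = """
    SELECT e.employee_id, e.first_name, e.last_name,
           COUNT(s.sale_id) AS sale_count
    FROM employees e
    LEFT JOIN sales s ON e.employee_id = s.employee_id
    WHERE e.department_id = 5
    GROUP BY e.employee_id, e.first_name, e.last_name
"""


def plan_steps(sql):
    """Return the description of each step in SQLite's plan for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan_steps(QUERY)

# Index the column used to join sales to employees, then ask again.
conn.execute("CREATE INDEX idx_sales_employee_id ON sales(employee_id)")
plan_after = plan_steps(QUERY)

print("before:")
for step in plan_before:
    print("  ", step)
print("after:")
for step in plan_after:
    print("  ", step)
```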
Use Transactions Appropriately
When you need to execute multiple related statements that should succeed or fail as a unit, wrap them in a transaction. This ensures data consistency and integrity.
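-- BEGIN TRANSACTION works in PostgreSQL, SQL Server, and SQLite;
-- MySQL uses START TRANSACTION instead
BEGIN TRANSACTION;
UPDATE accounts
    SET balance = balance - 100
    WHERE account_id = 1;
UPDATE accounts
    SET balance = balance + 100
    WHERE account_id = 2;
COMMIT;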
If any statement fails, you can ROLLBACK the transaction to undo all changes. This is critical for operations like transferring money between accounts, where partial completion would leave your data in an inconsistent state.
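In application code, the rollback usually lives in an error handler. Here is a minimal sketch using Python's sqlite3 module; the transfer function, the CHECK constraint that forbids negative balances, and the starting balances are all assumptions added for illustration. The credit is applied first so that a failing debit demonstrates the rollback undoing an earlier statement.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the BEGIN / COMMIT / ROLLBACK statements below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    " account_id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 150), (2, 50)")


def transfer(source_id, target_id, amount):
    """Move amount between accounts; return True on commit, False on rollback."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
            (amount, target_id),
        )
        # Violates the CHECK constraint if the source would go negative.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
            (amount, source_id),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # also undoes the credit applied above
        return False


first_ok = transfer(1, 2, 100)   # 150 -> 50 and 50 -> 150
second_ok = transfer(1, 2, 100)  # would leave account 1 at -50
balances = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id"
).fetchall()

print(first_ok, second_ok, balances)
```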
Avoid SELECT DISTINCT When Possible
DISTINCT forces the database to sort or hash the results in order to deduplicate them, which can be expensive on large result sets. Often, the need for DISTINCT indicates a problem with your query logic or data model, such as a join that multiplies rows.
-- If you're getting duplicates, understand why:
-- here the join returns one row per sale, not one per employee
SELECT DISTINCT e.first_name, e.last_name
FROM employees e
    INNER JOIN sales s ON e.employee_id = s.employee_id;
-- Alternative: group by the employee's key to get one row per employee
SELECT e.first_name, e.last_name
FROM employees e
    INNER JOIN sales s ON e.employee_id = s.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name;
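The GROUP BY approach makes it clear why you’re getting one row per employee and is easier to extend with aggregations. Note that the two queries are not interchangeable: DISTINCT on the names merges different employees who share a name, while grouping by employee_id keeps them separate. Grouping still has to deduplicate, so don’t expect it to be faster by itself; if you only need employees who have at least one sale, an EXISTS subquery avoids producing the duplicates in the first place.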
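The difference between the variants is easiest to see with two employees who share a name. This sketch assumes SQLite via Python's sqlite3 module and invented sample rows; the EXISTS variant is an addition for comparison, not one of the queries shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
    );
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, employee_id INTEGER);
    -- Two different people named John Smith, plus Bob, who has no sales.
    INSERT INTO employees VALUES
        (1, 'John', 'Smith'), (2, 'John', 'Smith'), (3, 'Bob', 'Johnson');
    INSERT INTO sales VALUES (1, 1), (2, 1), (3, 2);
""")

# Deduplicates on the selected columns, so both John Smiths collapse into one row.
distinct_rows = conn.execute("""
    SELECT DISTINCT e.first_name, e.last_name
    FROM employees e
    INNER JOIN sales s ON e.employee_id = s.employee_id
""").fetchall()

# One row per employee key, so both John Smiths are kept.
group_rows = conn.execute("""
    SELECT e.first_name, e.last_name
    FROM employees e
    INNER JOIN sales s ON e.employee_id = s.employee_id
    GROUP BY e.employee_id, e.first_name, e.last_name
""").fetchall()

# No join, so no duplicates are produced in the first place.
exists_rows = conn.execute("""
    SELECT e.first_name, e.last_name
    FROM employees e
    WHERE EXISTS (SELECT 1 FROM sales s WHERE s.employee_id = e.employee_id)
""").fetchall()

print(len(distinct_rows), len(group_rows), len(exists_rows))
```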
Practical Exercise: Putting It All Together
Here’s a sample database schema you can create to practice these concepts:
-- Create sample tables
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    location VARCHAR(100)
);
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    hire_date DATE,
    salary DECIMAL(10, 2),
    department_id INT,
    employment_status VARCHAR(20),
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
CREATE TABLE sales (
    sale_id INT PRIMARY KEY,
    employee_id INT,
    sale_date DATE,
    sale_amount DECIMAL(10, 2),
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
-- Insert sample data (explicit column lists keep the inserts valid if columns are added later)
INSERT INTO departments (department_id, department_name, location) VALUES
    (1, 'Sales', 'New York'),
    (2, 'Engineering', 'San Francisco'),
    (3, 'Marketing', 'New York');
INSERT INTO employees
    (employee_id, first_name, last_name, email, hire_date, salary, department_id, employment_status)
VALUES
    (1, 'John', 'Smith', '[email protected]', '2020-01-15', 75000, 1, 'ACTIVE'),
    (2, 'Jane', 'Doe', '[email protected]', '2019-03-22', 95000, 2, 'ACTIVE'),
    (3, 'Bob', 'Johnson', '[email protected]', '2021-07-10', 65000, 1, 'ACTIVE');
INSERT INTO sales (sale_id, employee_id, sale_date, sale_amount) VALUES
    (1, 1, '2024-01-15', 5000),
    (2, 1, '2024-02-20', 7500),
    (3, 3, '2024-01-30', 3000);
Now try writing queries that apply the best practices discussed above. For example, find all active employees in New York departments with their total sales, using proper formatting, explicit columns, and appropriate joins.
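If you want something to compare your attempt against, here is one possible answer to that example, run through SQLite with Python's sqlite3 module. It is a sketch rather than the only correct solution: the schema is trimmed to the columns the query needs, and the choice of LEFT JOIN with COALESCE (so that employees without sales would show a total of 0) is an assumption about what the report should contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        department_id INT PRIMARY KEY,
        department_name VARCHAR(100),
        location VARCHAR(100)
    );
    CREATE TABLE employees (
        employee_id INT PRIMARY KEY,
        first_name VARCHAR(50),
        last_name VARCHAR(50),
        department_id INT,
        employment_status VARCHAR(20),
        FOREIGN KEY (department_id) REFERENCES departments(department_id)
    );
    CREATE TABLE sales (
        sale_id INT PRIMARY KEY,
        employee_id INT,
        sale_date DATE,
        sale_amount DECIMAL(10, 2),
        FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
    );
    INSERT INTO departments (department_id, department_name, location) VALUES
        (1, 'Sales', 'New York'),
        (2, 'Engineering', 'San Francisco'),
        (3, 'Marketing', 'New York');
    INSERT INTO employees
        (employee_id, first_name, last_name, department_id, employment_status)
    VALUES
        (1, 'John', 'Smith', 1, 'ACTIVE'),
        (2, 'Jane', 'Doe', 2, 'ACTIVE'),
        (3, 'Bob', 'Johnson', 1, 'ACTIVE');
    INSERT INTO sales (sale_id, employee_id, sale_date, sale_amount) VALUES
        (1, 1, '2024-01-15', 5000),
        (2, 1, '2024-02-20', 7500),
        (3, 3, '2024-01-30', 3000);
""")

# Explicit columns, one clause per line, joins on keys, filters on bare columns.
results = conn.execute("""
    SELECT
        e.employee_id,
        e.first_name,
        e.last_name,
        d.department_name,
        COALESCE(SUM(s.sale_amount), 0) AS total_sales
    FROM employees e
        INNER JOIN departments d
            ON e.department_id = d.department_id
        LEFT JOIN sales s
            ON e.employee_id = s.employee_id
    WHERE e.employment_status = 'ACTIVE'
      AND d.location = 'New York'
    GROUP BY
        e.employee_id,
        e.first_name,
        e.last_name,
        d.department_name
    ORDER BY total_sales DESC
""").fetchall()

for row in results:
    print(row)
```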
Conclusion
Mastering SQL query best practices is an ongoing journey. As you work with different databases and larger datasets, you’ll develop intuition for what works well in your specific context. The key is to always think about performance, readability, and maintainability in balance. Use your database’s profiling tools, test your queries with realistic data volumes, and never stop learning about how your specific database engine optimizes queries. These foundational practices will serve you well regardless of which database system you’re using.
