Your application has been running smoothly in production for months, handling thousands of queries per day without issue. Then suddenly, response times spike to 30 seconds for what should be millisecond queries. The culprit? A seemingly innocent query that worked fine on small datasets but becomes a performance nightmare as data grows. Sound familiar?
Database performance tuning is where theoretical knowledge meets harsh production reality. It's not enough to know that indexes speed up queries—you need to understand when a covering index outperforms a clustered index, why the query optimizer sometimes ignores your carefully crafted indexes, and how to rewrite queries that look elegant but execute terribly.
This lesson will transform you from someone who adds indexes reactively to someone who designs indexing strategies proactively. You'll learn to read execution plans like a detective examining crime scene evidence, understand when conventional wisdom fails, and develop the intuition to spot performance bottlenecks before they reach production.
What you'll learn:
You should be comfortable writing complex SQL queries with joins, subqueries, and window functions. Experience with at least one major database system (PostgreSQL, MySQL, SQL Server, Oracle) is essential, along with basic understanding of how databases store and retrieve data. Familiarity with EXPLAIN/EXPLAIN PLAN output will be helpful but not required.
Before diving into indexing strategies, we need to understand how databases execute queries. Every query goes through several phases: parsing, optimization, execution planning, and execution. The query optimizer is your frenemy—sometimes brilliant, sometimes stubbornly obtuse.
Let's start with a realistic scenario. Imagine you're working with an e-commerce platform that tracks customer orders, products, and reviews:
-- Sample schema for our examples
CREATE TABLE customers (
customer_id BIGINT PRIMARY KEY,
email VARCHAR(255) UNIQUE,
registration_date DATE,
country_code CHAR(2),
customer_tier ENUM('bronze', 'silver', 'gold', 'platinum'),
total_lifetime_value DECIMAL(10,2)
);
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT,
order_date DATETIME,
order_status VARCHAR(20),
order_total DECIMAL(10,2),
shipping_country CHAR(2),
payment_method VARCHAR(20)
);
CREATE TABLE order_items (
order_id BIGINT,
product_id BIGINT,
quantity INT,
unit_price DECIMAL(8,2),
discount_amount DECIMAL(8,2)
);
CREATE TABLE products (
product_id BIGINT PRIMARY KEY,
category_id INT,
brand_id INT,
product_name VARCHAR(500),
base_price DECIMAL(8,2),
weight_grams INT,
is_active BOOLEAN
);
Now consider this query that seems straightforward but hides several performance traps:
SELECT
c.email,
c.customer_tier,
COUNT(DISTINCT o.order_id) as order_count,
SUM(o.order_total) as total_spent,
AVG(oi.unit_price) as avg_item_price
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.registration_date >= '2023-01-01'
AND p.category_id IN (15, 23, 47, 89)
AND o.order_status = 'completed'
GROUP BY c.customer_id, c.email, c.customer_tier
HAVING COUNT(DISTINCT o.order_id) > 5
ORDER BY total_spent DESC
LIMIT 100;
This query joins four tables, applies multiple filters, performs aggregations, and sorts results. On a dataset with millions of customers and orders, this could take minutes without proper indexing and optimization.
Let's examine the execution plan to understand what's happening:
-- PostgreSQL example
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT /* your query here */;
-- SQL Server example
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT /* your query here */;
-- MySQL example
EXPLAIN FORMAT=JSON
SELECT /* your query here */;
The execution plan reveals the database's strategy: which tables it scans first, which join algorithms it uses, and where it applies filters. Common performance killers you'll spot include:
Pro tip: Always capture execution plans for both your test environment and production. Query optimizers make different decisions based on data distribution, table statistics, and available memory.
Indexes are not just "make queries faster" magic—they're data structures with specific strengths, weaknesses, and use cases. Understanding these nuances separates competent database professionals from exceptional ones.
B-tree indexes excel at range queries, equality searches, and ordered traversals. But their effectiveness depends heavily on selectivity and data distribution:
-- Highly selective - excellent for B-tree
CREATE INDEX idx_customers_email ON customers (email);
-- Low selectivity - potentially wasteful
CREATE INDEX idx_customers_tier ON customers (customer_tier);
-- Composite index - column order matters
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
The composite index above demonstrates a crucial principle: column order in composite indexes affects performance dramatically. Leading with customer_id makes this index useful for:
Covering indexes include all columns needed for a query, eliminating the need for key lookups:
-- Before: requires key lookup to fetch order_total
SELECT customer_id, order_date, order_total
FROM orders
WHERE customer_id = 12345
AND order_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Create covering index
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (order_total);
-- Now the query can be satisfied entirely from the index
Covering indexes are especially powerful for reporting queries that need to aggregate data:
-- This aggregation query benefits from a covering index
CREATE INDEX idx_orders_analytics
ON orders (order_status, order_date)
INCLUDE (customer_id, order_total, shipping_country);
SELECT
order_status,
COUNT(*) as order_count,
SUM(order_total) as revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY order_status;
Filtered indexes include only rows that meet specific conditions, making them smaller and more efficient for targeted queries:
-- Only index active products
CREATE INDEX idx_products_active_category
ON products (category_id, brand_id)
WHERE is_active = true;
-- Only index recent orders
CREATE INDEX idx_orders_recent
ON orders (customer_id, order_date, order_total)
WHERE order_date >= '2024-01-01';
-- Only index high-value customers
CREATE INDEX idx_customers_premium
ON customers (country_code, registration_date)
WHERE customer_tier IN ('gold', 'platinum');
Filtered indexes are particularly valuable for:
Modern query optimizers can combine multiple single-column indexes through index intersection. This works well when:
-- Separate indexes
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (order_status);
CREATE INDEX idx_orders_date ON orders (order_date);
-- Query that can use index intersection
SELECT order_id, order_total
FROM orders
WHERE customer_id = 12345
AND order_status = 'completed'
AND order_date >= '2024-01-01';
However, index intersection has limitations:
Sometimes the most dramatic performance improvements come not from better indexes but from fundamentally rewriting queries. Here are battle-tested techniques that can transform query performance.
Correlated subqueries often perform poorly because they execute once for each row in the outer query:
-- Poor performing correlated subquery
SELECT
c.customer_id,
c.email,
c.total_lifetime_value
FROM customers c
WHERE c.customer_id IN (
SELECT o.customer_id
FROM orders o
WHERE o.order_date >= '2024-01-01'
AND o.order_total > 100.00
);
-- Rewritten as JOIN with EXISTS for better performance
SELECT DISTINCT
c.customer_id,
c.email,
c.total_lifetime_value
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2024-01-01'
AND o.order_total > 100.00
);
-- Or even better, eliminate the subquery entirely
SELECT DISTINCT
c.customer_id,
c.email,
c.total_lifetime_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
AND o.order_total > 100.00;
Window functions can replace expensive self-joins and multiple passes through data:
-- Inefficient: multiple subqueries
SELECT
o.order_id,
o.order_date,
o.order_total,
(SELECT COUNT(*) FROM orders o2
WHERE o2.customer_id = o.customer_id
AND o2.order_date <= o.order_date) as order_sequence,
(SELECT AVG(order_total) FROM orders o3
WHERE o3.customer_id = o.customer_id) as customer_avg_order
FROM orders o
WHERE o.order_date >= '2024-01-01';
-- Efficient: single pass with window functions
SELECT
order_id,
order_date,
order_total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) as order_sequence,
AVG(order_total) OVER (
PARTITION BY customer_id
) as customer_avg_order
FROM orders
WHERE order_date >= '2024-01-01';
Sometimes the best performance optimization is questioning whether you need perfect normalization. Consider this common scenario:
-- Normalized approach requiring joins
SELECT
p.product_name,
c.category_name,
b.brand_name,
SUM(oi.quantity * oi.unit_price) as revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
JOIN brands b ON p.brand_id = b.brand_id
WHERE oi.order_date >= '2024-01-01'
GROUP BY p.product_id, p.product_name, c.category_name, b.brand_name;
For frequently accessed reporting data, strategic denormalization can eliminate joins:
-- Denormalized reporting table
CREATE TABLE product_sales_summary (
product_id BIGINT,
product_name VARCHAR(500),
category_id INT,
category_name VARCHAR(100),
brand_id INT,
brand_name VARCHAR(100),
order_date DATE,
daily_quantity INT,
daily_revenue DECIMAL(12,2)
);
-- Much faster query on denormalized data
SELECT
product_name,
category_name,
brand_name,
SUM(daily_revenue) as revenue
FROM product_sales_summary
WHERE order_date >= '2024-01-01'
GROUP BY product_id, product_name, category_name, brand_name;
This approach requires careful management of data consistency, typically through:
Table partitioning can dramatically improve query performance by eliminating large portions of data from consideration:
-- Partition orders by year
CREATE TABLE orders_2024 (
CHECK (order_date >= '2024-01-01' AND order_date < '2025-01-01')
) INHERITS (orders);
CREATE TABLE orders_2023 (
CHECK (order_date >= '2023-01-01' AND order_date < '2024-01-01')
) INHERITS (orders);
-- Queries automatically use partition elimination
SELECT COUNT(*) FROM orders
WHERE order_date BETWEEN '2024-06-01' AND '2024-06-30';
-- Only scans orders_2024 partition
Query optimizer hints are powerful tools that should be used sparingly and with deep understanding. They override the optimizer's decisions, which can lead to performance improvements in specific cases but can also cause performance degradation when data patterns change.
Sometimes the optimizer chooses suboptimal indexes, especially in complex queries with multiple viable options:
-- SQL Server syntax
SELECT /*+ INDEX(orders, idx_orders_customer_date) */
customer_id, order_date, order_total
FROM orders
WHERE customer_id = 12345
AND order_date >= '2024-01-01';
-- MySQL syntax
SELECT customer_id, order_date, order_total
FROM orders USE INDEX (idx_orders_customer_date)
WHERE customer_id = 12345
AND order_date >= '2024-01-01';
-- PostgreSQL (less common, more often solved with better statistics)
SELECT customer_id, order_date, order_total
FROM orders
WHERE customer_id = 12345
AND order_date >= '2024-01-01';
Different join algorithms have different performance characteristics:
-- Force hash join for better performance on large datasets
SELECT /*+ USE_HASH(c,o) */
c.email,
COUNT(o.order_id)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2023-01-01'
GROUP BY c.customer_id, c.email;
-- Force nested loop join for small result sets
SELECT /*+ USE_NL(c,o) */
c.email,
o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = 12345;
Warning: Hints should be thoroughly tested and documented. They can cause performance regressions when data volumes or patterns change. Always have a rollback plan and monitoring in place.
Each database system has unique features and optimization opportunities. Let's explore some advanced techniques for major platforms.
PostgreSQL offers several powerful performance features:
-- Partial indexes for complex conditions
CREATE INDEX idx_orders_problematic
ON orders (customer_id, order_date)
WHERE order_status IN ('pending', 'processing')
AND order_total > 1000;
-- Expression indexes for computed values
CREATE INDEX idx_customers_name_search
ON customers (LOWER(email));
-- GIN indexes for array and JSON data
CREATE INDEX idx_product_tags
ON products USING GIN (tags);
-- Functional indexes with custom operators
CREATE INDEX idx_products_tsearch
ON products USING GIN (to_tsvector('english', product_name));
PostgreSQL's query planner can also benefit from custom statistics:
-- Increase statistics target for better cardinality estimates
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
-- Create extended statistics for correlated columns
CREATE STATISTICS orders_correlation (dependencies)
ON customer_id, order_date FROM orders;
SQL Server offers columnstore indexes for analytical workloads:
-- Clustered columnstore for fact tables
CREATE CLUSTERED COLUMNSTORE INDEX cci_order_items
ON order_items_analytics;
-- Non-clustered columnstore for mixed workloads
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_orders_analytics
ON orders (customer_id, order_date, order_total, order_status);
-- Memory-optimized tables for high-concurrency scenarios
CREATE TABLE orders_memory_optimized (
order_id BIGINT PRIMARY KEY NONCLUSTERED,
customer_id BIGINT INDEX ix_customer NONCLUSTERED,
order_date DATETIME2 INDEX ix_date NONCLUSTERED,
order_total DECIMAL(10,2)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
MySQL's InnoDB engine has specific tuning opportunities:
-- Optimize for InnoDB's clustered index structure
CREATE TABLE orders_optimized (
order_id BIGINT AUTO_INCREMENT PRIMARY KEY,
customer_id BIGINT,
order_date DATETIME,
order_total DECIMAL(10,2),
-- Put frequently accessed columns first
order_status VARCHAR(20),
shipping_address TEXT,
-- Secondary indexes
KEY idx_customer_date (customer_id, order_date),
KEY idx_status_date (order_status, order_date)
) ENGINE=InnoDB;
-- Use covering indexes to avoid primary key lookups
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date, order_status, order_total);
Let's work through a complete performance optimization scenario. You've been asked to optimize this slow-running analytics query that executives run daily:
-- Problematic query taking 45+ seconds
SELECT
c.country_code,
c.customer_tier,
DATE_FORMAT(o.order_date, '%Y-%m') as order_month,
COUNT(DISTINCT c.customer_id) as unique_customers,
COUNT(o.order_id) as total_orders,
SUM(o.order_total) as total_revenue,
AVG(o.order_total) as avg_order_value,
SUM(CASE WHEN o.order_total > 200 THEN 1 ELSE 0 END) as high_value_orders
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
AND o.order_status = 'completed'
AND c.customer_tier IN ('silver', 'gold', 'platinum')
GROUP BY c.country_code, c.customer_tier, DATE_FORMAT(o.order_date, '%Y-%m')
ORDER BY c.country_code, order_month DESC;
First, examine what the database is doing:
EXPLAIN ANALYZE
SELECT /* original query */;
Common issues you'll likely find:
Based on the execution plan analysis, create indexes that support the query pattern:
-- Support the WHERE conditions
CREATE INDEX idx_customers_tier_country
ON customers (customer_tier, country_code)
WHERE customer_tier IN ('silver', 'gold', 'platinum');
CREATE INDEX idx_orders_status_date_customer
ON orders (order_status, order_date, customer_id)
WHERE order_status = 'completed';
-- Consider a covering index for the aggregation
CREATE INDEX idx_orders_analytics_covering
ON orders (order_status, order_date, customer_id)
INCLUDE (order_total)
WHERE order_status = 'completed'
AND order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 25 MONTH);
Eliminate function calls in GROUP BY and optimize the join order:
-- Optimized version
WITH monthly_orders AS (
SELECT
o.customer_id,
YEAR(o.order_date) as order_year,
MONTH(o.order_date) as order_month,
COUNT(o.order_id) as customer_orders,
SUM(o.order_total) as customer_revenue,
SUM(CASE WHEN o.order_total > 200 THEN 1 ELSE 0 END) as customer_high_value_orders
FROM orders o
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
AND o.order_status = 'completed'
GROUP BY o.customer_id, YEAR(o.order_date), MONTH(o.order_date)
)
SELECT
c.country_code,
c.customer_tier,
CONCAT(mo.order_year, '-', LPAD(mo.order_month, 2, '0')) as order_month,
COUNT(DISTINCT c.customer_id) as unique_customers,
SUM(mo.customer_orders) as total_orders,
SUM(mo.customer_revenue) as total_revenue,
AVG(mo.customer_revenue / mo.customer_orders) as avg_order_value,
SUM(mo.customer_high_value_orders) as high_value_orders
FROM customers c
JOIN monthly_orders mo ON c.customer_id = mo.customer_id
WHERE c.customer_tier IN ('silver', 'gold', 'platinum')
GROUP BY c.country_code, c.customer_tier, mo.order_year, mo.order_month
ORDER BY c.country_code, mo.order_year DESC, mo.order_month DESC;
For queries that run frequently, consider pre-computing the results:
-- Create summary table updated nightly
CREATE TABLE customer_monthly_summary (
customer_id BIGINT,
country_code CHAR(2),
customer_tier VARCHAR(20),
order_year INT,
order_month INT,
order_count INT,
total_revenue DECIMAL(12,2),
high_value_orders INT,
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (customer_id, order_year, order_month),
KEY idx_country_tier_date (country_code, customer_tier, order_year, order_month)
);
-- ETL process to populate (run nightly)
INSERT INTO customer_monthly_summary
SELECT
c.customer_id,
c.country_code,
c.customer_tier,
YEAR(o.order_date) as order_year,
MONTH(o.order_date) as order_month,
COUNT(o.order_id) as order_count,
SUM(o.order_total) as total_revenue,
SUM(CASE WHEN o.order_total > 200 THEN 1 ELSE 0 END) as high_value_orders,
CURRENT_TIMESTAMP
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 25 MONTH)
AND o.order_status = 'completed'
AND c.customer_tier IN ('silver', 'gold', 'platinum')
GROUP BY c.customer_id, c.country_code, c.customer_tier,
YEAR(o.order_date), MONTH(o.order_date)
ON DUPLICATE KEY UPDATE
order_count = VALUES(order_count),
total_revenue = VALUES(total_revenue),
high_value_orders = VALUES(high_value_orders),
last_updated = VALUES(last_updated);
-- Now the report query is blazing fast
SELECT
country_code,
customer_tier,
CONCAT(order_year, '-', LPAD(order_month, 2, '0')) as order_month,
COUNT(DISTINCT customer_id) as unique_customers,
SUM(order_count) as total_orders,
SUM(total_revenue) as total_revenue,
AVG(total_revenue / order_count) as avg_order_value,
SUM(high_value_orders) as high_value_orders
FROM customer_monthly_summary
WHERE order_year * 100 + order_month >=
YEAR(DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)) * 100 +
MONTH(DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH))
GROUP BY country_code, customer_tier, order_year, order_month
ORDER BY country_code, order_year DESC, order_month DESC;
This approach transforms a 45-second query into a sub-second response, with the trade-off being slightly stale data (updated nightly) and additional storage requirements.
One of the most common mistakes is creating too many indexes. Each index has costs:
-- Don't do this - too many overlapping indexes
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (customer_id, order_date);
CREATE INDEX idx3 ON orders (customer_id, order_date, order_status);
CREATE INDEX idx4 ON orders (customer_id, order_status);
CREATE INDEX idx5 ON orders (order_date);
CREATE INDEX idx6 ON orders (order_date, customer_id);
-- Better approach - fewer, more strategic indexes
CREATE INDEX idx_orders_primary ON orders (customer_id, order_date, order_status);
CREATE INDEX idx_orders_date_only ON orders (order_date)
WHERE order_date >= '2024-01-01'; -- filtered for recent data only
Monitor index usage regularly:
-- PostgreSQL: Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan < 10;
-- SQL Server: Find unused indexes
SELECT
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
s.user_seeks,
s.user_scans,
s.user_lookups,
s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.user_seeks + s.user_scans + s.user_lookups = 0
AND i.type > 0; -- Exclude heaps
Query optimizers rely on statistics to make decisions. Outdated or inaccurate statistics lead to poor execution plans:
-- Update statistics regularly, especially after large data loads
ANALYZE TABLE customers;
UPDATE STATISTICS orders WITH FULLSCAN;
-- Check for data skew that might affect optimizer decisions
SELECT
customer_tier,
COUNT(*) as customer_count,
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () as percentage
FROM customers
GROUP BY customer_tier;
If you discover severe data skew (e.g., 95% of customers are 'bronze' tier), consider:
Focus on the biggest performance bottlenecks first:
-- Don't spend time optimizing this if it runs once per day
SELECT COUNT(*) FROM archive_table WHERE created_date < '2020-01-01';
-- Focus on queries like this that run hundreds of times per minute
SELECT product_name, current_inventory
FROM products
WHERE product_id = ?;
Use query profiling tools to identify your actual performance bottlenecks:
-- Enable query logging to find slow queries
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
SET GLOBAL log_queries_not_using_indexes = 'ON';
Query plans can change due to statistics updates, data growth, or database configuration changes. Monitor for plan regressions:
-- SQL Server: Enable Query Store for plan history
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;
-- PostgreSQL: Use pg_stat_statements extension
CREATE EXTENSION pg_stat_statements;
-- Monitor for queries with degraded performance
SELECT
query,
calls,
total_time,
total_time/calls as avg_time,
rows,
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
Performance tuning isn't a one-time activity—it requires ongoing monitoring and maintenance. Develop a systematic approach:
Set up alerts for key performance metrics:
-- Track query performance trends
SELECT
DATE(query_start_time) as query_date,
AVG(duration_ms) as avg_duration,
MAX(duration_ms) as max_duration,
COUNT(*) as query_count
FROM query_performance_log
WHERE query_text LIKE '%orders%'
GROUP BY DATE(query_start_time)
ORDER BY query_date DESC;
-- Monitor index effectiveness
SELECT
table_name,
index_name,
reads_per_write,
CASE
WHEN reads_per_write < 10 THEN 'Consider dropping'
WHEN reads_per_write > 100 THEN 'Very effective'
ELSE 'Moderate effectiveness'
END as effectiveness
FROM (
SELECT
table_name,
index_name,
CASE WHEN user_updates > 0
THEN (user_seeks + user_scans) / user_updates
ELSE user_seeks + user_scans
END as reads_per_write
FROM index_usage_stats
) effectiveness_analysis;
Establish routine maintenance procedures:
-- Weekly: Update statistics on high-churn tables
UPDATE STATISTICS orders WITH FULLSCAN;
UPDATE STATISTICS customers;
-- Monthly: Reorganize fragmented indexes
SELECT
table_name,
index_name,
avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE avg_fragmentation_in_percent > 30;
-- Quarterly: Review and optimize worst-performing queries
SELECT TOP 10
query_hash,
SUM(total_worker_time) as total_cpu,
SUM(execution_count) as total_executions,
SUM(total_worker_time) / SUM(execution_count) as avg_cpu_per_execution
FROM sys.dm_exec_query_stats
GROUP BY query_hash
ORDER BY total_cpu DESC;
Maintain documentation of your optimization decisions:
-- Document index rationale
COMMENT ON INDEX idx_orders_customer_date IS
'Created for daily sales reports. Covers 85% of order-related queries.
Review if report requirements change.';
-- Track performance baselines
CREATE TABLE performance_baselines (
query_id VARCHAR(100),
baseline_date DATE,
avg_duration_ms DECIMAL(10,2),
query_description TEXT
);
This documentation proves invaluable when team members change or when revisiting optimization decisions months later.
Database performance tuning combines deep technical knowledge with practical problem-solving skills. You've learned to approach performance issues systematically: analyze execution plans to understand bottlenecks, design indexes strategically rather than reactively, and rewrite queries when structural changes yield better results than index optimization.
The key insights from this lesson:
Index design is an architectural decision, not a tactical one. Consider data access patterns, query workloads, and maintenance costs holistically rather than adding indexes to fix individual slow queries.
Query rewriting often outperforms index optimization for complex analytical queries. Don't hesitate to restructure queries fundamentally when performance demands it.
Database-specific features matter at the expert level. Each platform offers unique optimization opportunities that generic approaches miss.
Monitoring and maintenance are essential for sustained performance. Query plans change, data grows, and access patterns evolve—successful tuning requires ongoing attention.
Your next steps should focus on deepening expertise in your primary database platform while building experience with the systematic approach demonstrated here. Consider these advanced topics:
The skills you've developed here—reading execution plans, understanding optimizer behavior, and thinking systematically about performance trade-offs—transfer directly to these advanced scenarios. Start applying these techniques to your production systems, but remember: always test thoroughly and maintain rollback plans when implementing performance changes.
Learning Path: Advanced SQL Queries