Database Performance Tuning: Indexing Strategies and Query Rewriting

Your application has been running smoothly in production for months, handling thousands of queries per day without issue. Then suddenly, response times spike to 30 seconds for what should be millisecond queries. The culprit? A seemingly innocent query that worked fine on small datasets but becomes a performance nightmare as data grows. Sound familiar?

Database performance tuning is where theoretical knowledge meets harsh production reality. It's not enough to know that indexes speed up queries—you need to understand when a covering index outperforms a clustered index, why the query optimizer sometimes ignores your carefully crafted indexes, and how to rewrite queries that look elegant but execute terribly.

This lesson will transform you from someone who adds indexes reactively to someone who designs indexing strategies proactively. You'll learn to read execution plans like a detective examining crime scene evidence, understand when conventional wisdom fails, and develop the intuition to spot performance bottlenecks before they reach production.

What you'll learn:

How to analyze query execution plans to identify specific performance bottlenecks
Advanced indexing strategies including covering indexes, filtered indexes, and index intersection
Query rewriting techniques that can improve performance by orders of magnitude
When and how to denormalize data structures for performance without sacrificing data integrity
How to use database-specific optimizer hints and when they're necessary vs. harmful
Performance monitoring strategies that help you tune proactively rather than reactively

Prerequisites

You should be comfortable writing complex SQL queries with joins, subqueries, and window functions. Experience with at least one major database system (PostgreSQL, MySQL, SQL Server, Oracle) is essential, along with basic understanding of how databases store and retrieve data. Familiarity with EXPLAIN/EXPLAIN PLAN output will be helpful but not required.

Understanding Query Execution: The Foundation of Performance Tuning

Before diving into indexing strategies, we need to understand how databases execute queries. Every query goes through several phases: parsing, optimization, execution planning, and execution. The query optimizer is your frenemy—sometimes brilliant, sometimes stubbornly obtuse.

Let's start with a realistic scenario. Imagine you're working with an e-commerce platform that tracks customer orders, products, and reviews:

-- Sample schema for our examples
CREATE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    email VARCHAR(255) UNIQUE,
    registration_date DATE,
    country_code CHAR(2),
    customer_tier ENUM('bronze', 'silver', 'gold', 'platinum'),
    total_lifetime_value DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    order_date DATETIME,
    order_status VARCHAR(20),
    order_total DECIMAL(10,2),
    shipping_country CHAR(2),
    payment_method VARCHAR(20)
);

CREATE TABLE order_items (
    order_id BIGINT,
    product_id BIGINT,
    quantity INT,
    unit_price DECIMAL(8,2),
    discount_amount DECIMAL(8,2)
);

CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    category_id INT,
    brand_id INT,
    product_name VARCHAR(500),
    base_price DECIMAL(8,2),
    weight_grams INT,
    is_active BOOLEAN
);

Now consider this query that seems straightforward but hides several performance traps:

SELECT 
    c.email,
    c.customer_tier,
    COUNT(DISTINCT o.order_id) as order_count,
    SUM(o.order_total) as total_spent,
    AVG(oi.unit_price) as avg_item_price
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE c.registration_date >= '2023-01-01'
    AND p.category_id IN (15, 23, 47, 89)
    AND o.order_status = 'completed'
GROUP BY c.customer_id, c.email, c.customer_tier
HAVING COUNT(DISTINCT o.order_id) > 5
ORDER BY total_spent DESC
LIMIT 100;

This query joins four tables, applies multiple filters, performs aggregations, and sorts results. On a dataset with millions of customers and orders, this could take minutes without proper indexing and optimization.

Let's examine the execution plan to understand what's happening:

-- PostgreSQL example
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT /* your query here */;

-- SQL Server example  
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT /* your query here */;

-- MySQL example
EXPLAIN FORMAT=JSON 
SELECT /* your query here */;

The execution plan reveals the database's strategy: which tables it scans first, which join algorithms it uses, and where it applies filters. Common performance killers you'll spot include:

Sequential scans on large tables instead of index seeks
Nested loop joins when hash joins would be more efficient
Sorting operations on large result sets
Late filtering where filters are applied after expensive joins

Pro tip: Always capture execution plans for both your test environment and production. Query optimizers make different decisions based on data distribution, table statistics, and available memory.

The Art and Science of Index Design

Indexes are not just "make queries faster" magic—they're data structures with specific strengths, weaknesses, and use cases. Understanding these nuances separates competent database professionals from exceptional ones.

B-Tree Indexes: The Workhorses

B-tree indexes excel at range queries, equality searches, and ordered traversals. But their effectiveness depends heavily on selectivity and data distribution:

-- Highly selective - excellent for B-tree
CREATE INDEX idx_customers_email ON customers (email);

-- Low selectivity - potentially wasteful
CREATE INDEX idx_customers_tier ON customers (customer_tier);

-- Composite index - column order matters
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

The composite index above demonstrates a crucial principle: column order in composite indexes affects performance dramatically. Leading with customer_id makes this index useful for:

Queries filtering by customer_id alone
Queries filtering by customer_id and order_date
But NOT queries filtering by order_date alone

Covering Indexes: The Performance Multipliers

Covering indexes include all columns needed for a query, eliminating the need for key lookups:

-- Before: requires key lookup to fetch order_total
SELECT customer_id, order_date, order_total 
FROM orders 
WHERE customer_id = 12345 
    AND order_date BETWEEN '2024-01-01' AND '2024-01-31';

-- Create covering index
CREATE INDEX idx_orders_covering 
ON orders (customer_id, order_date) 
INCLUDE (order_total);

-- Now the query can be satisfied entirely from the index

Covering indexes are especially powerful for reporting queries that need to aggregate data:

-- This aggregation query benefits from a covering index
CREATE INDEX idx_orders_analytics 
ON orders (order_status, order_date) 
INCLUDE (customer_id, order_total, shipping_country);

SELECT 
    order_status,
    COUNT(*) as order_count,
    SUM(order_total) as revenue,
    COUNT(DISTINCT customer_id) as unique_customers
FROM orders 
WHERE order_date >= '2024-01-01'
GROUP BY order_status;

Filtered Indexes: Precision Tools

Filtered indexes include only rows that meet specific conditions, making them smaller and more efficient for targeted queries:

-- Only index active products
CREATE INDEX idx_products_active_category 
ON products (category_id, brand_id) 
WHERE is_active = true;

-- Only index recent orders
CREATE INDEX idx_orders_recent 
ON orders (customer_id, order_date, order_total)
WHERE order_date >= '2024-01-01';

-- Only index high-value customers
CREATE INDEX idx_customers_premium 
ON customers (country_code, registration_date)
WHERE customer_tier IN ('gold', 'platinum');

Filtered indexes are particularly valuable for:

Tables with logical deletions (WHERE deleted_at IS NULL)
Time-series data where queries focus on recent records
Multi-tenant applications (WHERE tenant_id = ?)

Index Intersection: When Multiple Indexes Combine

Modern query optimizers can combine multiple single-column indexes through index intersection. This works well when:

-- Separate indexes
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (order_status);
CREATE INDEX idx_orders_date ON orders (order_date);

-- Query that can use index intersection
SELECT order_id, order_total
FROM orders 
WHERE customer_id = 12345 
    AND order_status = 'completed'
    AND order_date >= '2024-01-01';

However, index intersection has limitations:

It's less efficient than a single composite index for frequently used combinations
The optimizer might not choose intersection when it would be optimal
Each additional index increases storage and maintenance costs

Advanced Query Rewriting Techniques

Sometimes the most dramatic performance improvements come not from better indexes but from fundamentally rewriting queries. Here are battle-tested techniques that can transform query performance.

Subquery to Join Conversion

Correlated subqueries often perform poorly because they execute once for each row in the outer query:

-- Poor performing correlated subquery
SELECT 
    c.customer_id,
    c.email,
    c.total_lifetime_value
FROM customers c
WHERE c.customer_id IN (
    SELECT o.customer_id 
    FROM orders o 
    WHERE o.order_date >= '2024-01-01'
        AND o.order_total > 100.00
);

-- Rewritten as JOIN with EXISTS for better performance
SELECT DISTINCT
    c.customer_id,
    c.email,
    c.total_lifetime_value
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id
        AND o.order_date >= '2024-01-01'
        AND o.order_total > 100.00
);

-- Or even better, eliminate the subquery entirely
SELECT DISTINCT
    c.customer_id,
    c.email,
    c.total_lifetime_value
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
    AND o.order_total > 100.00;

Window Functions for Complex Aggregations

Window functions can replace expensive self-joins and multiple passes through data:

-- Inefficient: multiple subqueries
SELECT 
    o.order_id,
    o.order_date,
    o.order_total,
    (SELECT COUNT(*) FROM orders o2 
     WHERE o2.customer_id = o.customer_id 
     AND o2.order_date <= o.order_date) as order_sequence,
    (SELECT AVG(order_total) FROM orders o3 
     WHERE o3.customer_id = o.customer_id) as customer_avg_order
FROM orders o
WHERE o.order_date >= '2024-01-01';

-- Efficient: single pass with window functions
SELECT 
    order_id,
    order_date,
    order_total,
    ROW_NUMBER() OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
    ) as order_sequence,
    AVG(order_total) OVER (
        PARTITION BY customer_id
    ) as customer_avg_order
FROM orders
WHERE order_date >= '2024-01-01';

Strategic Denormalization

Sometimes the best performance optimization is questioning whether you need perfect normalization. Consider this common scenario:

-- Normalized approach requiring joins
SELECT 
    p.product_name,
    c.category_name,
    b.brand_name,
    SUM(oi.quantity * oi.unit_price) as revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
JOIN brands b ON p.brand_id = b.brand_id
WHERE oi.order_date >= '2024-01-01'
GROUP BY p.product_id, p.product_name, c.category_name, b.brand_name;

For frequently accessed reporting data, strategic denormalization can eliminate joins:

-- Denormalized reporting table
CREATE TABLE product_sales_summary (
    product_id BIGINT,
    product_name VARCHAR(500),
    category_id INT,
    category_name VARCHAR(100),
    brand_id INT,
    brand_name VARCHAR(100),
    order_date DATE,
    daily_quantity INT,
    daily_revenue DECIMAL(12,2)
);

-- Much faster query on denormalized data
SELECT 
    product_name,
    category_name,
    brand_name,
    SUM(daily_revenue) as revenue
FROM product_sales_summary
WHERE order_date >= '2024-01-01'
GROUP BY product_id, product_name, category_name, brand_name;

This approach requires careful management of data consistency, typically through:

ETL processes that rebuild summary tables nightly
Triggers that maintain consistency in real-time
Application-level logic that updates denormalized data

Partitioning for Large Tables

Table partitioning can dramatically improve query performance by eliminating large portions of data from consideration:

-- Partition orders by year
CREATE TABLE orders_2024 (
    CHECK (order_date >= '2024-01-01' AND order_date < '2025-01-01')
) INHERITS (orders);

CREATE TABLE orders_2023 (
    CHECK (order_date >= '2023-01-01' AND order_date < '2024-01-01')
) INHERITS (orders);

-- Queries automatically use partition elimination
SELECT COUNT(*) FROM orders 
WHERE order_date BETWEEN '2024-06-01' AND '2024-06-30';
-- Only scans orders_2024 partition

Understanding Query Optimizer Hints and When to Use Them

Query optimizer hints are powerful tools that should be used sparingly and with deep understanding. They override the optimizer's decisions, which can lead to performance improvements in specific cases but can also cause performance degradation when data patterns change.

Index Hints: Forcing Specific Index Usage

Sometimes the optimizer chooses suboptimal indexes, especially in complex queries with multiple viable options:

-- SQL Server syntax
SELECT /*+ INDEX(orders, idx_orders_customer_date) */
    customer_id, order_date, order_total
FROM orders 
WHERE customer_id = 12345 
    AND order_date >= '2024-01-01';

-- MySQL syntax
SELECT customer_id, order_date, order_total
FROM orders USE INDEX (idx_orders_customer_date)
WHERE customer_id = 12345 
    AND order_date >= '2024-01-01';

-- PostgreSQL (less common, more often solved with better statistics)
SELECT customer_id, order_date, order_total
FROM orders 
WHERE customer_id = 12345 
    AND order_date >= '2024-01-01';

Join Hints: Controlling Join Algorithms

Different join algorithms have different performance characteristics:

-- Force hash join for better performance on large datasets
SELECT /*+ USE_HASH(c,o) */
    c.email, 
    COUNT(o.order_id)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2023-01-01'
GROUP BY c.customer_id, c.email;

-- Force nested loop join for small result sets
SELECT /*+ USE_NL(c,o) */
    c.email, 
    o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = 12345;

Warning: Hints should be thoroughly tested and documented. They can cause performance regressions when data volumes or patterns change. Always have a rollback plan and monitoring in place.

Database-Specific Optimization Techniques

Each database system has unique features and optimization opportunities. Let's explore some advanced techniques for major platforms.

PostgreSQL: Advanced Features

PostgreSQL offers several powerful performance features:

-- Partial indexes for complex conditions
CREATE INDEX idx_orders_problematic 
ON orders (customer_id, order_date) 
WHERE order_status IN ('pending', 'processing') 
    AND order_total > 1000;

-- Expression indexes for computed values
CREATE INDEX idx_customers_name_search 
ON customers (LOWER(email));

-- GIN indexes for array and JSON data
CREATE INDEX idx_product_tags 
ON products USING GIN (tags);

-- Functional indexes with custom operators
CREATE INDEX idx_products_tsearch 
ON products USING GIN (to_tsvector('english', product_name));

PostgreSQL's query planner can also benefit from custom statistics:

-- Increase statistics target for better cardinality estimates
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Create extended statistics for correlated columns
CREATE STATISTICS orders_correlation (dependencies) 
ON customer_id, order_date FROM orders;

SQL Server: Specialized Index Types

SQL Server offers columnstore indexes for analytical workloads:

-- Clustered columnstore for fact tables
CREATE CLUSTERED COLUMNSTORE INDEX cci_order_items 
ON order_items_analytics;

-- Non-clustered columnstore for mixed workloads
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_orders_analytics
ON orders (customer_id, order_date, order_total, order_status);

-- Memory-optimized tables for high-concurrency scenarios
CREATE TABLE orders_memory_optimized (
    order_id BIGINT PRIMARY KEY NONCLUSTERED,
    customer_id BIGINT INDEX ix_customer NONCLUSTERED,
    order_date DATETIME2 INDEX ix_date NONCLUSTERED,
    order_total DECIMAL(10,2)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

MySQL: InnoDB Optimizations

MySQL's InnoDB engine has specific tuning opportunities:

-- Optimize for InnoDB's clustered index structure
CREATE TABLE orders_optimized (
    order_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    customer_id BIGINT,
    order_date DATETIME,
    order_total DECIMAL(10,2),
    -- Put frequently accessed columns first
    order_status VARCHAR(20),
    shipping_address TEXT,
    
    -- Secondary indexes
    KEY idx_customer_date (customer_id, order_date),
    KEY idx_status_date (order_status, order_date)
) ENGINE=InnoDB;

-- Use covering indexes to avoid primary key lookups
CREATE INDEX idx_orders_covering 
ON orders (customer_id, order_date, order_status, order_total);

Hands-On Exercise: Complete Performance Optimization

Let's work through a complete performance optimization scenario. You've been asked to optimize this slow-running analytics query that executives run daily:

-- Problematic query taking 45+ seconds
SELECT 
    c.country_code,
    c.customer_tier,
    DATE_FORMAT(o.order_date, '%Y-%m') as order_month,
    COUNT(DISTINCT c.customer_id) as unique_customers,
    COUNT(o.order_id) as total_orders,
    SUM(o.order_total) as total_revenue,
    AVG(o.order_total) as avg_order_value,
    SUM(CASE WHEN o.order_total > 200 THEN 1 ELSE 0 END) as high_value_orders
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
    AND o.order_status = 'completed'
    AND c.customer_tier IN ('silver', 'gold', 'platinum')
GROUP BY c.country_code, c.customer_tier, DATE_FORMAT(o.order_date, '%Y-%m')
ORDER BY c.country_code, order_month DESC;

Step 1: Analyze the Current Execution Plan

First, examine what the database is doing:

EXPLAIN ANALYZE 
SELECT /* original query */;

Common issues you'll likely find:

Sequential scans on the customers table for customer_tier filtering
Sequential scans on orders for date filtering
Expensive sorting operation at the end
Repeated evaluation of DATE_FORMAT function

Step 2: Create Targeted Indexes

Based on the execution plan analysis, create indexes that support the query pattern:

-- Support the WHERE conditions
CREATE INDEX idx_customers_tier_country 
ON customers (customer_tier, country_code)
WHERE customer_tier IN ('silver', 'gold', 'platinum');

CREATE INDEX idx_orders_status_date_customer 
ON orders (order_status, order_date, customer_id)
WHERE order_status = 'completed';

-- Consider a covering index for the aggregation
CREATE INDEX idx_orders_analytics_covering
ON orders (order_status, order_date, customer_id)
INCLUDE (order_total)
WHERE order_status = 'completed' 
    AND order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 25 MONTH);

Step 3: Rewrite the Query for Better Performance

Eliminate function calls in GROUP BY and optimize the join order:

-- Optimized version
WITH monthly_orders AS (
    SELECT 
        o.customer_id,
        YEAR(o.order_date) as order_year,
        MONTH(o.order_date) as order_month,
        COUNT(o.order_id) as customer_orders,
        SUM(o.order_total) as customer_revenue,
        SUM(CASE WHEN o.order_total > 200 THEN 1 ELSE 0 END) as customer_high_value_orders
    FROM orders o
    WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)
        AND o.order_status = 'completed'
    GROUP BY o.customer_id, YEAR(o.order_date), MONTH(o.order_date)
)
SELECT 
    c.country_code,
    c.customer_tier,
    CONCAT(mo.order_year, '-', LPAD(mo.order_month, 2, '0')) as order_month,
    COUNT(DISTINCT c.customer_id) as unique_customers,
    SUM(mo.customer_orders) as total_orders,
    SUM(mo.customer_revenue) as total_revenue,
    AVG(mo.customer_revenue / mo.customer_orders) as avg_order_value,
    SUM(mo.customer_high_value_orders) as high_value_orders
FROM customers c
JOIN monthly_orders mo ON c.customer_id = mo.customer_id
WHERE c.customer_tier IN ('silver', 'gold', 'platinum')
GROUP BY c.country_code, c.customer_tier, mo.order_year, mo.order_month
ORDER BY c.country_code, mo.order_year DESC, mo.order_month DESC;

Step 4: Consider a Materialized Summary Table

For queries that run frequently, consider pre-computing the results:

-- Create summary table updated nightly
CREATE TABLE customer_monthly_summary (
    customer_id BIGINT,
    country_code CHAR(2),
    customer_tier VARCHAR(20),
    order_year INT,
    order_month INT,
    order_count INT,
    total_revenue DECIMAL(12,2),
    high_value_orders INT,
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
    PRIMARY KEY (customer_id, order_year, order_month),
    KEY idx_country_tier_date (country_code, customer_tier, order_year, order_month)
);

-- ETL process to populate (run nightly)
INSERT INTO customer_monthly_summary 
SELECT 
    c.customer_id,
    c.country_code,
    c.customer_tier,
    YEAR(o.order_date) as order_year,
    MONTH(o.order_date) as order_month,
    COUNT(o.order_id) as order_count,
    SUM(o.order_total) as total_revenue,
    SUM(CASE WHEN o.order_total > 200 THEN 1 ELSE 0 END) as high_value_orders,
    CURRENT_TIMESTAMP
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 25 MONTH)
    AND o.order_status = 'completed'
    AND c.customer_tier IN ('silver', 'gold', 'platinum')
GROUP BY c.customer_id, c.country_code, c.customer_tier, 
         YEAR(o.order_date), MONTH(o.order_date)
ON DUPLICATE KEY UPDATE
    order_count = VALUES(order_count),
    total_revenue = VALUES(total_revenue),
    high_value_orders = VALUES(high_value_orders),
    last_updated = VALUES(last_updated);

-- Now the report query is blazing fast
SELECT 
    country_code,
    customer_tier,
    CONCAT(order_year, '-', LPAD(order_month, 2, '0')) as order_month,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(order_count) as total_orders,
    SUM(total_revenue) as total_revenue,
    AVG(total_revenue / order_count) as avg_order_value,
    SUM(high_value_orders) as high_value_orders
FROM customer_monthly_summary
WHERE order_year * 100 + order_month >= 
      YEAR(DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH)) * 100 + 
      MONTH(DATE_SUB(CURRENT_DATE, INTERVAL 24 MONTH))
GROUP BY country_code, customer_tier, order_year, order_month
ORDER BY country_code, order_year DESC, order_month DESC;

This approach transforms a 45-second query into a sub-second response, with the trade-off being slightly stale data (updated nightly) and additional storage requirements.

Common Mistakes & Troubleshooting

Over-Indexing: The Performance Tax

One of the most common mistakes is creating too many indexes. Each index has costs:

-- Don't do this - too many overlapping indexes
CREATE INDEX idx1 ON orders (customer_id);
CREATE INDEX idx2 ON orders (customer_id, order_date);
CREATE INDEX idx3 ON orders (customer_id, order_date, order_status);
CREATE INDEX idx4 ON orders (customer_id, order_status);
CREATE INDEX idx5 ON orders (order_date);
CREATE INDEX idx6 ON orders (order_date, customer_id);

-- Better approach - fewer, more strategic indexes
CREATE INDEX idx_orders_primary ON orders (customer_id, order_date, order_status);
CREATE INDEX idx_orders_date_only ON orders (order_date) 
    WHERE order_date >= '2024-01-01'; -- filtered for recent data only

Monitor index usage regularly:

-- PostgreSQL: Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes 
WHERE idx_scan < 10;

-- SQL Server: Find unused indexes
SELECT 
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.user_seeks + s.user_scans + s.user_lookups = 0
    AND i.type > 0; -- Exclude heaps

Ignoring Data Skew and Statistics

Query optimizers rely on statistics to make decisions. Outdated or inaccurate statistics lead to poor execution plans:

-- Update statistics regularly, especially after large data loads
ANALYZE TABLE customers;
UPDATE STATISTICS orders WITH FULLSCAN;

-- Check for data skew that might affect optimizer decisions
SELECT 
    customer_tier,
    COUNT(*) as customer_count,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () as percentage
FROM customers 
GROUP BY customer_tier;

If you discover severe data skew (e.g., 95% of customers are 'bronze' tier), consider:

Creating filtered indexes for uncommon values
Using hints for queries targeting skewed data
Partitioning tables by skewed columns

Premature Optimization and Micro-Optimizations

Focus on the biggest performance bottlenecks first:

-- Don't spend time optimizing this if it runs once per day
SELECT COUNT(*) FROM archive_table WHERE created_date < '2020-01-01';

-- Focus on queries like this that run hundreds of times per minute
SELECT product_name, current_inventory 
FROM products 
WHERE product_id = ?;

Use query profiling tools to identify your actual performance bottlenecks:

-- Enable query logging to find slow queries
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;
SET GLOBAL log_queries_not_using_indexes = 'ON';

Neglecting Query Plan Stability

Query plans can change due to statistics updates, data growth, or database configuration changes. Monitor for plan regressions:

-- SQL Server: Enable Query Store for plan history
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;

-- PostgreSQL: Use pg_stat_statements extension
CREATE EXTENSION pg_stat_statements;

-- Monitor for queries with degraded performance
SELECT 
    query,
    calls,
    total_time,
    total_time/calls as avg_time,
    rows,
    100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements 
ORDER BY total_time DESC 
LIMIT 10;

Monitoring and Maintenance Strategy

Performance tuning isn't a one-time activity—it requires ongoing monitoring and maintenance. Develop a systematic approach:

Proactive Monitoring

Set up alerts for key performance metrics:

-- Track query performance trends
SELECT 
    DATE(query_start_time) as query_date,
    AVG(duration_ms) as avg_duration,
    MAX(duration_ms) as max_duration,
    COUNT(*) as query_count
FROM query_performance_log
WHERE query_text LIKE '%orders%'
GROUP BY DATE(query_start_time)
ORDER BY query_date DESC;

-- Monitor index effectiveness
SELECT 
    table_name,
    index_name,
    reads_per_write,
    CASE 
        WHEN reads_per_write < 10 THEN 'Consider dropping'
        WHEN reads_per_write > 100 THEN 'Very effective'
        ELSE 'Moderate effectiveness'
    END as effectiveness
FROM (
    SELECT 
        table_name,
        index_name,
        CASE WHEN user_updates > 0 
             THEN (user_seeks + user_scans) / user_updates 
             ELSE user_seeks + user_scans 
        END as reads_per_write
    FROM index_usage_stats
) effectiveness_analysis;

Regular Maintenance Tasks

Establish routine maintenance procedures:

-- Weekly: Update statistics on high-churn tables
UPDATE STATISTICS orders WITH FULLSCAN;
UPDATE STATISTICS customers;

-- Monthly: Reorganize fragmented indexes
SELECT 
    table_name,
    index_name,
    avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE avg_fragmentation_in_percent > 30;

-- Quarterly: Review and optimize worst-performing queries
SELECT TOP 10
    query_hash,
    SUM(total_worker_time) as total_cpu,
    SUM(execution_count) as total_executions,
    SUM(total_worker_time) / SUM(execution_count) as avg_cpu_per_execution
FROM sys.dm_exec_query_stats
GROUP BY query_hash
ORDER BY total_cpu DESC;

Documentation and Knowledge Management

Maintain documentation of your optimization decisions:

-- Document index rationale
COMMENT ON INDEX idx_orders_customer_date IS 
'Created for daily sales reports. Covers 85% of order-related queries. 
Review if report requirements change.';

-- Track performance baselines
CREATE TABLE performance_baselines (
    query_id VARCHAR(100),
    baseline_date DATE,
    avg_duration_ms DECIMAL(10,2),
    query_description TEXT
);

This documentation proves invaluable when team members change or when revisiting optimization decisions months later.

Summary & Next Steps

Database performance tuning combines deep technical knowledge with practical problem-solving skills. You've learned to approach performance issues systematically: analyze execution plans to understand bottlenecks, design indexes strategically rather than reactively, and rewrite queries when structural changes yield better results than index optimization.

The key insights from this lesson:

Index design is an architectural decision, not a tactical one. Consider data access patterns, query workloads, and maintenance costs holistically rather than adding indexes to fix individual slow queries.

Query rewriting often outperforms index optimization for complex analytical queries. Don't hesitate to restructure queries fundamentally when performance demands it.

Database-specific features matter at the expert level. Each platform offers unique optimization opportunities that generic approaches miss.

Monitoring and maintenance are essential for sustained performance. Query plans change, data grows, and access patterns evolve—successful tuning requires ongoing attention.

Your next steps should focus on deepening expertise in your primary database platform while building experience with the systematic approach demonstrated here. Consider these advanced topics:

Distributed query optimization for sharded or federated databases
Columnar storage and analytical database design patterns
Query caching strategies and materialized view management
Performance testing frameworks for validating optimizations under realistic load

The skills you've developed here—reading execution plans, understanding optimizer behavior, and thinking systematically about performance trade-offs—transfer directly to these advanced scenarios. Start applying these techniques to your production systems, but remember: always test thoroughly and maintain rollback plans when implementing performance changes.

Database Performance Tuning: Advanced Indexing Strategies and Query Rewriting for Production Systems