NULL Handling in SQL: IS NULL, COALESCE, and NULLIF

You're analyzing a customer database when you discover that critical fields are missing values. Customer records have NULL email addresses, NULL phone numbers, and NULL registration dates. Your dashboard is showing misleading metrics because NULL values are being excluded from calculations. Your data pipeline is failing because downstream systems can't handle NULLs properly.

Welcome to one of the most fundamental challenges in data work: NULL handling. NULLs aren't just "missing data" — they represent unknown, undefined, or inapplicable values that can derail your analysis if you don't handle them correctly. Unlike empty strings or zeros, NULLs have unique behavior that trips up even experienced analysts.

In this lesson, you'll master the three essential SQL functions for NULL handling: IS NULL for detection, COALESCE for substitution, and NULLIF for controlled NULL creation. You'll learn when to use each function, how to combine them for complex data cleaning scenarios, and how to avoid the subtle bugs that NULL handling can introduce.

What you'll learn:

How to reliably detect NULL values using IS NULL and IS NOT NULL operators
When and how to use COALESCE to provide fallback values and clean messy data
How NULLIF creates controlled NULLs to handle edge cases and invalid data
Advanced patterns for combining these functions in real-world data cleaning scenarios
Performance considerations and common pitfalls when working with NULLs in production

Prerequisites

You should be comfortable with basic SQL SELECT statements, WHERE clauses, and understand the concept of NULL values. Familiarity with aggregate functions (COUNT, SUM, AVG) will help you understand the examples, but isn't required.

Understanding NULL Behavior in SQL

Before diving into NULL handling functions, you need to understand how NULLs behave differently from other values. This isn't academic — these behaviors cause real bugs in production systems.

NULL represents "unknown" or "not applicable," which means it has three-valued logic: true, false, or unknown. This creates counterintuitive behavior:

-- These all return NULL, not true or false
SELECT NULL = NULL;        -- NULL (not true!)
SELECT NULL <> 'hello';    -- NULL (not true!)
SELECT 5 + NULL;           -- NULL
SELECT 'Hello' || NULL;    -- NULL

This behavior affects your WHERE clauses in ways that catch developers off-guard:

-- Sample customer data
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    phone VARCHAR(20),
    loyalty_tier VARCHAR(20)
);

INSERT INTO customers VALUES 
(1, 'Sarah Johnson', 'sarah@example.com', '555-0101', 'Gold'),
(2, 'Mike Chen', 'mike@example.com', NULL, 'Silver'),
(3, 'Anna Rodriguez', NULL, '555-0103', 'Gold'),
(4, 'David Kim', 'david@example.com', '555-0104', NULL),
(5, 'Lisa Wong', NULL, NULL, 'Bronze');

-- This query excludes customers with NULL phones
SELECT name, phone FROM customers WHERE phone <> '555-0101';
-- Returns: Sarah Johnson, Anna Rodriguez, David Kim
-- Missing: Mike Chen, Lisa Wong (their NULLs didn't match the <> condition)

The customers with NULL phone numbers disappear from your results because NULL <> '555-0101' evaluates to NULL, which is treated as false in WHERE clauses.

Critical insight: NULL comparisons always return NULL, never true or false. This means standard comparison operators won't work for finding or excluding NULL values.

Detecting NULLs with IS NULL and IS NOT NULL

The IS NULL and IS NOT NULL operators are your primary tools for detecting missing data. Unlike comparison operators, these return true or false, not NULL.

Finding Missing Data

-- Find customers missing email addresses
SELECT customer_id, name, email 
FROM customers 
WHERE email IS NULL;

-- Results:
-- 3, Anna Rodriguez, NULL
-- 5, Lisa Wong, NULL

-- Find customers missing both email AND phone
SELECT customer_id, name, email, phone 
FROM customers 
WHERE email IS NULL AND phone IS NULL;

-- Results:
-- 5, Lisa Wong, NULL, NULL

Data Quality Analysis

IS NULL becomes powerful when you're auditing data quality across multiple fields:

-- Comprehensive data quality check
SELECT 
    customer_id,
    name,
    CASE WHEN email IS NULL THEN 'Missing Email' ELSE 'Has Email' END as email_status,
    CASE WHEN phone IS NULL THEN 'Missing Phone' ELSE 'Has Phone' END as phone_status,
    CASE WHEN loyalty_tier IS NULL THEN 'No Tier' ELSE loyalty_tier END as tier_status
FROM customers
ORDER BY customer_id;

For large datasets, you'll want aggregate views of missing data:

-- Summary of missing data by field
SELECT 
    COUNT(*) as total_customers,
    COUNT(email) as has_email,
    COUNT(*) - COUNT(email) as missing_email,
    ROUND((COUNT(*) - COUNT(email)) * 100.0 / COUNT(*), 1) as pct_missing_email,
    COUNT(phone) as has_phone,
    COUNT(*) - COUNT(phone) as missing_phone,
    ROUND((COUNT(*) - COUNT(phone)) * 100.0 / COUNT(*), 1) as pct_missing_phone
FROM customers;

-- Results show: 5 total, 3 have email (40% missing), 3 have phone (40% missing)

Pro tip: The COUNT() function automatically ignores NULL values, which is why COUNT(email) only counts non-NULL emails. Use COUNT(*) to count all rows including those with NULLs.

Excluding NULLs from Analysis

When you want complete records only:

-- Customers with complete contact information
SELECT customer_id, name, email, phone, loyalty_tier
FROM customers 
WHERE email IS NOT NULL 
  AND phone IS NOT NULL 
  AND loyalty_tier IS NOT NULL;

-- Results:
-- 1, Sarah Johnson, sarah@example.com, 555-0101, Gold

Providing Default Values with COALESCE

COALESCE returns the first non-NULL value from a list of expressions. It's your go-to function for providing fallback values and cleaning up messy data displays.

Basic COALESCE Usage

-- Provide fallback values for missing contact info
SELECT 
    customer_id,
    name,
    COALESCE(email, 'No email provided') as email_display,
    COALESCE(phone, 'No phone provided') as phone_display,
    COALESCE(loyalty_tier, 'Standard') as tier_display
FROM customers;

-- Results include:
-- 2, Mike Chen, mike@example.com, No phone provided, Silver
-- 3, Anna Rodriguez, No email provided, 555-0103, Gold
-- 5, Lisa Wong, No email provided, No phone provided, Bronze

Multi-Level Fallbacks

COALESCE really shines when you have multiple potential sources for the same information:

-- Extended customer table with backup contact methods
CREATE TABLE customer_contacts (
    customer_id INT,
    primary_email VARCHAR(100),
    backup_email VARCHAR(100),
    work_email VARCHAR(100),
    primary_phone VARCHAR(20),
    mobile_phone VARCHAR(20),
    emergency_contact VARCHAR(20)
);

INSERT INTO customer_contacts VALUES
(1, 'sarah@example.com', NULL, 'sarah@work.com', '555-0101', '555-1001', '555-2001'),
(2, NULL, 'mike.backup@example.com', NULL, NULL, '555-1002', '555-2002'),
(3, NULL, NULL, NULL, '555-0103', NULL, '555-2003'),
(4, 'david@example.com', 'david.alt@example.com', 'david@work.com', NULL, NULL, NULL);

-- Find the best available contact method for each customer
SELECT 
    customer_id,
    COALESCE(primary_email, backup_email, work_email, 'No email available') as best_email,
    COALESCE(primary_phone, mobile_phone, emergency_contact, 'No phone available') as best_phone
FROM customer_contacts;

-- Results:
-- 1, sarah@example.com, 555-0101
-- 2, mike.backup@example.com, 555-1002  
-- 3, No email available, 555-0103
-- 4, david@example.com, No phone available

COALESCE in Calculations

COALESCE becomes critical when you're doing calculations with potentially NULL values:

-- Order data with potential NULL values
CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    base_amount DECIMAL(10,2),
    tax_amount DECIMAL(10,2),
    shipping_cost DECIMAL(10,2),
    discount_amount DECIMAL(10,2)
);

INSERT INTO orders VALUES
(1001, 1, 100.00, 8.50, 15.00, NULL),
(1002, 2, 250.00, NULL, NULL, 25.00),
(1003, 3, 75.00, 6.25, 10.00, 5.00),
(1004, 4, 500.00, 42.50, NULL, NULL);

-- Calculate total order amount, treating NULLs as zero
SELECT 
    order_id,
    base_amount,
    COALESCE(tax_amount, 0) as tax_amount,
    COALESCE(shipping_cost, 0) as shipping_cost,
    COALESCE(discount_amount, 0) as discount_amount,
    base_amount + 
    COALESCE(tax_amount, 0) + 
    COALESCE(shipping_cost, 0) - 
    COALESCE(discount_amount, 0) as total_amount
FROM orders;

-- Results:
-- 1001, 100.00, 8.50, 15.00, 0.00, 123.50
-- 1002, 250.00, 0.00, 0.00, 25.00, 225.00
-- 1003, 75.00, 6.25, 10.00, 5.00, 86.25
-- 1004, 500.00, 42.50, 0.00, 0.00, 542.50

Without COALESCE, any NULL value would make the entire calculation NULL:

-- This fails - returns NULL for orders with any NULL components
SELECT 
    order_id,
    base_amount + tax_amount + shipping_cost - discount_amount as total_amount
FROM orders;

-- Results:
-- 1001, NULL (because discount_amount is NULL)
-- 1002, NULL (because tax_amount and shipping_cost are NULL)
-- 1003, 86.25 (only complete record)
-- 1004, NULL (because shipping_cost and discount_amount are NULL)

Creating Controlled NULLs with NULLIF

NULLIF returns NULL if two expressions are equal, otherwise returns the first expression. This might seem backwards, but it's incredibly useful for data cleaning and handling edge cases.

Converting Invalid Values to NULL

NULLIF excels at converting "placeholder" values into proper NULLs:

-- Raw data import with placeholder values
CREATE TABLE imported_customers (
    customer_id INT,
    name VARCHAR(100),
    email VARCHAR(100),
    phone VARCHAR(20),
    age INT,
    income DECIMAL(10,2)
);

INSERT INTO imported_customers VALUES
(1, 'Sarah Johnson', 'sarah@example.com', '555-0101', 28, 65000.00),
(2, 'Mike Chen', 'N/A', 'Unknown', 0, -1.00),
(3, 'Anna Rodriguez', '', '555-0103', 35, 0.00),
(4, 'David Kim', 'david@example.com', '555-0104', -999, 75000.00),
(5, 'Lisa Wong', 'TBD', 'NULL', 25, 999999.99);

-- Clean up the data by converting placeholders to NULL
SELECT 
    customer_id,
    name,
    NULLIF(NULLIF(NULLIF(email, 'N/A'), ''), 'TBD') as email,
    NULLIF(NULLIF(phone, 'Unknown'), 'NULL') as phone,
    NULLIF(NULLIF(age, 0), -999) as age,
    NULLIF(NULLIF(income, -1.00), 999999.99) as income
FROM imported_customers;

-- Results:
-- 1, Sarah Johnson, sarah@example.com, 555-0101, 28, 65000.00
-- 2, Mike Chen, NULL, NULL, NULL, NULL
-- 3, Anna Rodriguez, NULL, 555-0103, 35, 0.00
-- 4, David Kim, david@example.com, 555-0104, NULL, 75000.00  
-- 5, Lisa Wong, NULL, NULL, 25, NULL

You can chain NULLIF calls to handle multiple placeholder values, but a CASE statement is often clearer for complex scenarios:

-- More readable approach for multiple placeholder values
SELECT 
    customer_id,
    name,
    CASE 
        WHEN email IN ('N/A', '', 'TBD', 'Unknown') THEN NULL 
        ELSE email 
    END as email,
    CASE 
        WHEN phone IN ('Unknown', 'NULL', '', 'N/A') THEN NULL 
        ELSE phone 
    END as phone,
    CASE 
        WHEN age <= 0 OR age = -999 THEN NULL 
        ELSE age 
    END as age
FROM imported_customers;

Avoiding Division by Zero

NULLIF prevents division by zero errors by converting zero denominators to NULL:

-- Sales performance metrics
CREATE TABLE sales_reps (
    rep_id INT,
    rep_name VARCHAR(100),
    calls_made INT,
    meetings_scheduled INT,
    deals_closed INT
);

INSERT INTO sales_reps VALUES
(1, 'Alex Turner', 150, 45, 12),
(2, 'Jessica Park', 200, 60, 18),
(3, 'Robert Davis', 0, 0, 0),    -- New hire, no activity yet
(4, 'Maria Garcia', 175, 0, 0),  -- Made calls but no meetings
(5, 'James Wilson', 125, 35, 8);

-- Calculate conversion rates, avoiding division by zero
SELECT 
    rep_name,
    calls_made,
    meetings_scheduled,
    deals_closed,
    ROUND(meetings_scheduled * 100.0 / NULLIF(calls_made, 0), 1) as meeting_conversion_rate,
    ROUND(deals_closed * 100.0 / NULLIF(meetings_scheduled, 0), 1) as close_rate
FROM sales_reps;

-- Results:
-- Alex Turner, 150, 45, 12, 30.0, 26.7
-- Jessica Park, 200, 60, 18, 30.0, 30.0  
-- Robert Davis, 0, 0, 0, NULL, NULL
-- Maria Garcia, 175, 0, 0, 0.0, NULL
-- James Wilson, 125, 35, 8, 28.0, 22.9

Without NULLIF, you'd get division by zero errors that crash your query.

Data Validation and Cleaning

NULLIF helps identify and clean invalid data patterns:

-- Product data with potential quality issues
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10,2),
    weight_kg DECIMAL(8,3)
);

INSERT INTO products VALUES
(1, 'Laptop Computer', 'Electronics', 999.99, 2.500),
(2, 'Coffee Mug', 'Kitchenware', 15.99, 0.350),
(3, 'Digital Camera', 'Electronics', 0.00, 0.000),  -- Price/weight errors
(4, 'Notebook', 'Office', 5.99, 0.200),
(5, 'Smartphone', 'Electronics', 599.99, 0.000);   -- Weight error

-- Identify products with data quality issues
SELECT 
    product_id,
    product_name,
    price,
    weight_kg,
    CASE 
        WHEN NULLIF(price, 0.00) IS NULL THEN 'Missing Price'
        WHEN NULLIF(weight_kg, 0.000) IS NULL THEN 'Missing Weight'
        ELSE 'OK'
    END as data_quality_flag
FROM products;

-- Results flag products 3 and 5 as having data quality issues

Combining NULL Handling Functions

Real-world data cleaning often requires combining these functions to handle complex scenarios. Here are patterns you'll use repeatedly in production.

The NULL Hierarchy Pattern

When you have multiple potential sources and want to establish a clear priority:

-- Customer contact preferences with fallback logic
CREATE TABLE customer_contact_prefs (
    customer_id INT,
    preferred_email VARCHAR(100),
    work_email VARCHAR(100),
    personal_email VARCHAR(100),
    sms_number VARCHAR(20),
    home_phone VARCHAR(20),
    contact_preference VARCHAR(20)
);

INSERT INTO customer_contact_prefs VALUES
(1, 'sarah.pref@example.com', 'sarah@work.com', 'sarah.personal@gmail.com', '555-1001', '555-2001', 'email'),
(2, NULL, NULL, 'mike@gmail.com', '555-1002', NULL, 'sms'),
(3, NULL, 'anna@work.com', NULL, NULL, '555-2003', 'phone'),
(4, 'david@example.com', NULL, NULL, NULL, NULL, 'email'),
(5, '', '', '', '', '', 'email');  -- Empty strings need cleaning

-- Determine best contact method based on preference and availability
SELECT 
    customer_id,
    contact_preference,
    CASE contact_preference
        WHEN 'email' THEN 
            COALESCE(
                NULLIF(preferred_email, ''), 
                NULLIF(work_email, ''), 
                NULLIF(personal_email, ''),
                'No email available'
            )
        WHEN 'sms' THEN 
            COALESCE(
                NULLIF(sms_number, ''), 
                NULLIF(home_phone, ''),
                'No phone available'
            )
        WHEN 'phone' THEN 
            COALESCE(
                NULLIF(home_phone, ''), 
                NULLIF(sms_number, ''),
                'No phone available'
            )
        ELSE 'Unknown preference'
    END as contact_method
FROM customer_contact_prefs;

-- Results:
-- 1, email, sarah.pref@example.com
-- 2, sms, 555-1002
-- 3, phone, 555-2003
-- 4, email, david@example.com
-- 5, email, No email available

Data Standardization Pattern

Combining all three functions to clean and standardize messy data:

-- Survey responses with inconsistent data entry
CREATE TABLE survey_responses (
    response_id INT,
    age_group VARCHAR(20),
    income_range VARCHAR(30),
    satisfaction_score VARCHAR(10)
);

INSERT INTO survey_responses VALUES
(1, '25-34', '$50,000-$75,000', '8'),
(2, 'N/A', 'Prefer not to say', '10'),
(3, '', '$25,000-$50,000', 'N/A'),
(4, '35-44', '0', '7'),
(5, 'Unknown', 'NULL', ''),
(6, '45-54', '$75,000-$100,000', '99');  -- Invalid satisfaction score

-- Standardize and clean the survey data
SELECT 
    response_id,
    -- Clean age_group: convert various "unknown" values to NULL
    CASE 
        WHEN NULLIF(TRIM(age_group), '') IN ('N/A', 'Unknown', 'NULL') THEN NULL
        ELSE NULLIF(TRIM(age_group), '')
    END as age_group_clean,
    
    -- Clean income_range: standardize "prefer not to say" and invalid entries
    CASE 
        WHEN NULLIF(TRIM(income_range), '') IN ('Prefer not to say', 'NULL', '0') THEN NULL
        ELSE NULLIF(TRIM(income_range), '')
    END as income_range_clean,
    
    -- Clean satisfaction_score: convert to integer, handle invalid values
    CASE 
        WHEN NULLIF(TRIM(satisfaction_score), '') IN ('N/A', 'NULL') THEN NULL
        WHEN CAST(NULLIF(TRIM(satisfaction_score), '') AS INT) > 10 THEN NULL
        ELSE CAST(NULLIF(TRIM(satisfaction_score), '') AS INT)
    END as satisfaction_score_clean,
    
    -- Provide a data quality assessment
    CASE 
        WHEN COALESCE(
            NULLIF(TRIM(age_group), ''),
            NULLIF(TRIM(income_range), ''),
            NULLIF(TRIM(satisfaction_score), '')
        ) IS NULL THEN 'Incomplete Response'
        ELSE 'Complete Response'
    END as response_quality
FROM survey_responses;

Hands-On Exercise: Customer Data Cleanup Pipeline

Let's build a comprehensive data cleanup pipeline that demonstrates all the NULL handling techniques in a realistic scenario. You're working with a customer database that's been populated from multiple sources with inconsistent data entry practices.

-- Create the messy customer dataset
CREATE TABLE raw_customer_data (
    import_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email_address VARCHAR(100),
    phone_number VARCHAR(20),
    date_of_birth DATE,
    annual_income DECIMAL(12,2),
    employment_status VARCHAR(30),
    credit_score INT,
    last_login_date DATE,
    account_balance DECIMAL(12,2),
    marketing_opt_in VARCHAR(10)
);

-- Insert messy data that represents real-world import scenarios
INSERT INTO raw_customer_data VALUES
-- Good records
(1, 'Sarah Johnson', 'sarah.johnson@email.com', '555-123-4567', '1985-03-15', 65000.00, 'Employed', 750, '2023-12-01', 1250.50, 'Yes'),
(2, 'Michael Chen', 'mike.chen@gmail.com', '555-987-6543', '1990-07-22', 82000.00, 'Self-employed', 720, '2023-11-28', -150.75, 'No'),

-- Records with various NULL/empty patterns
(3, 'Anna Rodriguez', '', '555-555-0123', NULL, NULL, 'Unknown', 0, NULL, 0.00, ''),
(4, 'David Kim', 'david@example.com', 'N/A', '1988-12-03', -1.00, 'Retired', 999, '1900-01-01', 999999.99, 'TRUE'),
(5, '', 'lisa.wong@email.com', 'NULL', '2000-01-01', 0.00, '', 680, NULL, NULL, 'FALSE'),
(6, 'Robert Taylor', 'INVALID', '000-000-0000', '1975-05-18', 95000.00, 'Employed', 800, '2023-12-02', 2500.00, 'Y'),
(7, 'Jennifer White', 'jenny@email.com', '555-111-2222', NULL, 55000.00, 'Student', -999, '2023-11-30', 0, 'N'),
(8, 'TBD', 'TBD', 'TBD', '1980-01-01', 999999.00, 'TBD', 500, '2023-12-01', -999999.99, 'TBD');

-- Your task: Create a comprehensive cleanup query
SELECT 
    import_id,
    
    -- Step 1: Clean customer names
    CASE 
        WHEN NULLIF(TRIM(customer_name), '') IN ('TBD', 'Unknown', 'NULL') THEN NULL
        ELSE NULLIF(TRIM(customer_name), '')
    END as clean_customer_name,
    
    -- Step 2: Validate and clean email addresses (basic validation)
    CASE 
        WHEN NULLIF(TRIM(email_address), '') IN ('TBD', 'Unknown', 'NULL', 'INVALID', 'N/A') THEN NULL
        WHEN NULLIF(TRIM(email_address), '') NOT LIKE '%@%' THEN NULL
        ELSE NULLIF(TRIM(email_address), '')
    END as clean_email,
    
    -- Step 3: Clean phone numbers
    CASE 
        WHEN NULLIF(TRIM(phone_number), '') IN ('TBD', 'Unknown', 'NULL', 'N/A', '000-000-0000') THEN NULL
        ELSE NULLIF(TRIM(phone_number), '')
    END as clean_phone,
    
    -- Step 4: Handle invalid dates (assuming anything before 1930 or after today is invalid)
    CASE 
        WHEN date_of_birth < '1930-01-01' THEN NULL
        WHEN date_of_birth > CURRENT_DATE THEN NULL
        ELSE date_of_birth
    END as clean_date_of_birth,
    
    -- Step 5: Clean income (negative values and obvious placeholders become NULL)
    NULLIF(NULLIF(NULLIF(annual_income, -1.00), 0.00), 999999.00) as clean_annual_income,
    
    -- Step 6: Standardize employment status
    CASE 
        WHEN NULLIF(TRIM(employment_status), '') IN ('TBD', 'Unknown', 'NULL', '') THEN NULL
        ELSE NULLIF(TRIM(employment_status), '')
    END as clean_employment_status,
    
    -- Step 7: Clean credit scores (0, negative, or obvious placeholders)
    CASE 
        WHEN credit_score <= 0 OR credit_score = 999 OR credit_score = -999 THEN NULL
        WHEN credit_score > 850 THEN NULL  -- Scores typically max at 850
        ELSE credit_score
    END as clean_credit_score,
    
    -- Step 8: Handle invalid login dates
    CASE 
        WHEN last_login_date < '2020-01-01' THEN NULL  -- Assume system started in 2020
        WHEN last_login_date > CURRENT_DATE THEN NULL
        ELSE last_login_date
    END as clean_last_login,
    
    -- Step 9: Clean account balances (remove obvious placeholders)
    NULLIF(NULLIF(annual_income, 999999.99), -999999.99) as clean_account_balance,
    
    -- Step 10: Standardize boolean opt-in values
    CASE 
        WHEN UPPER(TRIM(marketing_opt_in)) IN ('YES', 'TRUE', 'Y', '1') THEN 'Yes'
        WHEN UPPER(TRIM(marketing_opt_in)) IN ('NO', 'FALSE', 'N', '0') THEN 'No'
        ELSE NULL
    END as clean_marketing_opt_in,
    
    -- Add data quality indicators
    CASE 
        WHEN NULLIF(TRIM(customer_name), '') IS NULL THEN 'Missing Name'
        WHEN NULLIF(TRIM(email_address), '') IS NULL THEN 'Missing Email'
        ELSE 'Complete Core Data'
    END as data_quality_flag,
    
    -- Count of NULL fields after cleaning
    (CASE WHEN NULLIF(TRIM(customer_name), '') IN ('TBD', 'Unknown', 'NULL') OR NULLIF(TRIM(customer_name), '') IS NULL THEN 1 ELSE 0 END +
     CASE WHEN NULLIF(TRIM(email_address), '') IN ('TBD', 'Unknown', 'NULL', 'INVALID', 'N/A') OR NULLIF(TRIM(email_address), '') NOT LIKE '%@%' OR NULLIF(TRIM(email_address), '') IS NULL THEN 1 ELSE 0 END +
     CASE WHEN NULLIF(TRIM(phone_number), '') IN ('TBD', 'Unknown', 'NULL', 'N/A', '000-000-0000') OR NULLIF(TRIM(phone_number), '') IS NULL THEN 1 ELSE 0 END
    ) as missing_contact_fields
    
FROM raw_customer_data
ORDER BY import_id;

Now, create a summary report showing the cleanup results:

-- Cleanup effectiveness summary
WITH cleaned_data AS (
    -- [Insert the cleanup query from above here - abbreviated for space]
    SELECT import_id, 
           CASE WHEN NULLIF(TRIM(customer_name), '') IN ('TBD', 'Unknown', 'NULL') THEN NULL ELSE NULLIF(TRIM(customer_name), '') END as clean_customer_name,
           CASE WHEN NULLIF(TRIM(email_address), '') IN ('TBD', 'Unknown', 'NULL', 'INVALID', 'N/A') OR NULLIF(TRIM(email_address), '') NOT LIKE '%@%' THEN NULL ELSE NULLIF(TRIM(email_address), '') END as clean_email
    FROM raw_customer_data
)
SELECT 
    'Original Data' as dataset,
    COUNT(*) as total_records,
    COUNT(customer_name) as has_name,
    COUNT(email_address) as has_email,
    ROUND(COUNT(customer_name) * 100.0 / COUNT(*), 1) as pct_complete_name,
    ROUND(COUNT(email_address) * 100.0 / COUNT(*), 1) as pct_complete_email
FROM raw_customer_data

UNION ALL

SELECT 
    'Cleaned Data' as dataset,
    COUNT(*) as total_records,
    COUNT(clean_customer_name) as has_name,
    COUNT(clean_email) as has_email,
    ROUND(COUNT(clean_customer_name) * 100.0 / COUNT(*), 1) as pct_complete_name,
    ROUND(COUNT(clean_email) * 100.0 / COUNT(*), 1) as pct_complete_email
FROM cleaned_data;

This exercise demonstrates how NULL handling functions work together to solve real data quality problems. The key insight is that data cleaning is rarely about just one function — it's about combining IS NULL detection, COALESCE fallbacks, and NULLIF conversions to create robust data pipelines.

Common Mistakes & Troubleshooting

Here are the most frequent NULL handling mistakes that cause production issues, along with how to diagnose and fix them:

Mistake 1: Using = NULL Instead of IS NULL

-- WRONG: This never finds NULL values
SELECT * FROM customers WHERE email = NULL;  -- Returns 0 rows always

-- CORRECT: Use IS NULL
SELECT * FROM customers WHERE email IS NULL;

Debugging tip: If your NULL-finding queries return zero rows unexpectedly, check for = NULL comparisons.

Mistake 2: Forgetting About Three-Valued Logic

-- DANGEROUS: This excludes NULL values silently
SELECT customer_id, loyalty_tier 
FROM customers 
WHERE loyalty_tier <> 'Bronze';

-- SAFER: Be explicit about NULL handling
SELECT customer_id, loyalty_tier 
FROM customers 
WHERE loyalty_tier <> 'Bronze' OR loyalty_tier IS NULL;

Debugging tip: When row counts don't match expectations, check if NULLs are being excluded by comparison operators.

Mistake 3: COALESCE Type Mismatches

-- PROBLEMATIC: Mixing incompatible types
SELECT 
    customer_id,
    COALESCE(email, phone, customer_id) as contact_method  -- Error or unexpected behavior
FROM customers;

-- BETTER: Ensure compatible types
SELECT 
    customer_id,
    COALESCE(email, phone, 'No contact info') as contact_method
FROM customers;

Mistake 4: Not Handling Empty Strings vs NULLs

-- INCOMPLETE: Only handles NULLs, not empty strings
SELECT COALESCE(email, 'No email') FROM customers;

-- COMPLETE: Handles both NULLs and empty strings
SELECT COALESCE(NULLIF(TRIM(email), ''), 'No email') FROM customers;

Mistake 5: Performance Issues with Complex COALESCE

-- SLOW: Multiple subqueries in COALESCE
SELECT 
    customer_id,
    COALESCE(
        (SELECT email FROM customer_contacts WHERE type = 'primary'),
        (SELECT email FROM customer_contacts WHERE type = 'backup'),
        (SELECT email FROM customer_contacts WHERE type = 'work')
    ) as best_email
FROM customers;

-- FASTER: Join first, then COALESCE
WITH contact_pivot AS (
    SELECT 
        customer_id,
        MAX(CASE WHEN type = 'primary' THEN email END) as primary_email,
        MAX(CASE WHEN type = 'backup' THEN email END) as backup_email,
        MAX(CASE WHEN type = 'work' THEN email END) as work_email
    FROM customer_contacts
    GROUP BY customer_id
)
SELECT 
    c.customer_id,
    COALESCE(cp.primary_email, cp.backup_email, cp.work_email, 'No email') as best_email
FROM customers c
LEFT JOIN contact_pivot cp ON c.customer_id = cp.customer_id;

Troubleshooting NULL Behavior

When NULL handling isn't working as expected, use these diagnostic queries:

-- Debug: Show exact values including NULLs and empty strings
SELECT 
    customer_id,
    email,
    CASE 
        WHEN email IS NULL THEN 'IS NULL'
        WHEN email = '' THEN 'EMPTY STRING'
        WHEN TRIM(email) = '' THEN 'WHITESPACE ONLY'
        ELSE 'HAS VALUE: ' || email
    END as email_diagnosis
FROM customers;

-- Debug: Test your COALESCE logic step by step
SELECT 
    customer_id,
    primary_email,
    CASE WHEN primary_email IS NULL THEN 'NULL' ELSE 'NOT NULL' END as primary_check,
    backup_email,
    CASE WHEN backup_email IS NULL THEN 'NULL' ELSE 'NOT NULL' END as backup_check,
    COALESCE(primary_email, backup_email, 'DEFAULT') as coalesce_result
FROM customer_contacts;

Summary & Next Steps

You now have mastery over SQL's three essential NULL handling functions:

IS NULL / IS NOT NULL for reliable detection of missing data — remember that standard comparison operators won't work with NULLs due to three-valued logic.

COALESCE for providing fallback values and creating robust calculations. Use it whenever you need the first non-NULL value from a list, especially in calculations where NULLs would break your math.

NULLIF for converting problematic values to NULL, particularly useful for handling "placeholder" values in imported data and preventing division by zero errors.

The real power comes from combining these functions in data cleaning pipelines. You'll use patterns like COALESCE(NULLIF(TRIM(field), ''), 'Default Value') constantly to handle both NULL and empty string scenarios.

Key takeaways for production use:

Always be explicit about how you want NULLs handled rather than letting them be excluded silently
Use NULLIF to clean data at the source rather than trying to fix it downstream
Consider performance implications when using complex COALESCE expressions
Test your NULL handling logic with realistic messy data, not just clean examples

Next steps to deepen your skills:

Window Functions with NULLs: Learn how functions like LAG(), LEAD(), and FIRST_VALUE() handle NULLs in time series data
Advanced CASE Statements: Move beyond simple COALESCE to complex conditional logic for data validation
Database-Specific NULL Functions: Explore functions like ISNULL (SQL Server), NVL (Oracle), and IFNULL (MySQL) for cross-platform compatibility
NULL Handling in Joins: Understand how NULLs affect JOIN behavior and when to use techniques like FULL OUTER JOIN

The principles you've learned here form the foundation for all advanced data cleaning and validation work. Practice these patterns with your own messy datasets — you'll be surprised how often you need exactly these techniques in real-world data projects.