Master SQL Aggregate Functions: GROUP BY, HAVING, COUNT, SUM, AVG

SQL🌱 Foundation10 min readMay 6, 2026Updated May 6, 2026

Table of Contents

SQL Aggregate Functions: GROUP BY, HAVING, COUNT, SUM, AVG

Imagine you're analyzing sales data for a retail company. You have a database table with thousands of individual transactions, but your manager doesn't want to see every single purchase. They want answers to questions like: "What's our average order value by region?" or "Which product categories generated the most revenue last quarter?"

This is where SQL aggregate functions become essential. Instead of looking at individual rows, you need to summarize and group your data to find meaningful patterns and insights.

By the end of this lesson, you'll transform from someone who can only query individual records into someone who can slice and dice data to answer real business questions. You'll understand how to group related data together and perform calculations across those groups.

What you'll learn:

How to use GROUP BY to organize data into meaningful categories
When and how to apply COUNT, SUM, and AVG aggregate functions
How to filter grouped results with the HAVING clause
The difference between WHERE and HAVING (a common source of confusion)
How to combine multiple aggregate functions in a single query
Common pitfalls and how to avoid them

Prerequisites

You should be comfortable with basic SQL SELECT statements, including WHERE clauses for filtering individual rows. If you can write queries like SELECT * FROM customers WHERE city = 'Chicago', you're ready to proceed.

Understanding the Problem: When Individual Rows Aren't Enough

Let's start with a realistic scenario. You're working with an e-commerce database that contains an orders table with the following structure:

CREATE TABLE orders (
    order_id INT,
    customer_id INT,
    product_category VARCHAR(50),
    region VARCHAR(50),
    order_amount DECIMAL(10,2),
    order_date DATE
);

Here's some sample data:

INSERT INTO orders VALUES
(1001, 501, 'Electronics', 'North', 899.99, '2024-01-15'),
(1002, 502, 'Clothing', 'South', 149.50, '2024-01-16'),
(1003, 503, 'Electronics', 'North', 299.99, '2024-01-16'),
(1004, 501, 'Books', 'North', 79.99, '2024-01-17'),
(1005, 504, 'Clothing', 'West', 199.99, '2024-01-17'),
(1006, 505, 'Electronics', 'South', 549.99, '2024-01-18'),
(1007, 502, 'Books', 'South', 39.99, '2024-01-18'),
(1008, 506, 'Clothing', 'East', 89.99, '2024-01-19');

A basic SELECT statement shows you every individual order:

SELECT * FROM orders;

But what if your manager asks: "How much revenue did each region generate?" Looking at eight individual rows won't give you a quick answer. You need to group the data by region and sum up the amounts.

Your First Aggregate Function: COUNT

Let's start with the simplest aggregate function: COUNT. This function tells you how many rows match your criteria.

SELECT COUNT(*) FROM orders;

This returns 8 because there are 8 orders in total. The asterisk (*) means "count all rows."

But COUNT becomes much more powerful when combined with GROUP BY. Let's count how many orders each region received:

SELECT region, COUNT(*) 
FROM orders 
GROUP BY region;

Results:

region | count
-------|------
North  |   3
South  |   3  
West   |   1
East   |   1

Here's what's happening step by step:

GROUP BY region tells SQL to organize all rows into groups based on their region value
COUNT(*) counts how many rows are in each group
SELECT region includes the region name in the output so you know which count belongs to which region

Important: When you use GROUP BY, every column in your SELECT clause must either be in the GROUP BY clause or be an aggregate function. You can't mix grouped and non-grouped columns.

Working with SUM: Adding Up Values

COUNT tells you "how many," but SUM tells you "how much." Let's calculate the total revenue by region:

SELECT region, SUM(order_amount) as total_revenue
FROM orders 
GROUP BY region;

Results:

region | total_revenue
-------|-------------
North  |     1279.97
South  |      739.48
West   |      199.99
East   |       89.99

The as total_revenue part creates an alias—a more readable name for the calculated column. Without it, you'd see something like sum(order_amount) as the column header.

You can also combine multiple aggregate functions:

SELECT region, 
       COUNT(*) as order_count,
       SUM(order_amount) as total_revenue
FROM orders 
GROUP BY region;

This shows both how many orders each region had and their total value.

Calculating Averages with AVG

AVG calculates the arithmetic mean of a numeric column. Let's find the average order value by region:

SELECT region, 
       AVG(order_amount) as avg_order_value
FROM orders 
GROUP BY region;

Results:

region | avg_order_value
-------|---------------
North  |      426.66
South  |      246.49
West   |      199.99
East   |       89.99

Now you can see that while the North region has the same number of orders as the South, their average order value is much higher.

Tip: AVG automatically ignores NULL values. If you have 3 orders with values 100, 200, and NULL, AVG will return 150 (the average of 100 and 200), not 100.

Combining Multiple Aggregate Functions

Real analysis often requires multiple metrics. Let's create a comprehensive regional summary:

SELECT region,
       COUNT(*) as order_count,
       SUM(order_amount) as total_revenue,
       AVG(order_amount) as avg_order_value,
       MIN(order_amount) as smallest_order,
       MAX(order_amount) as largest_order
FROM orders 
GROUP BY region
ORDER BY total_revenue DESC;

This query gives you a complete picture of each region's performance, sorted by total revenue from highest to lowest.

Filtering Groups with HAVING

Sometimes you want to filter your results after grouping. This is where HAVING comes in. HAVING is to GROUP BY what WHERE is to individual rows.

Let's say you only want to see regions with more than 2 orders:

SELECT region, 
       COUNT(*) as order_count,
       SUM(order_amount) as total_revenue
FROM orders 
GROUP BY region
HAVING COUNT(*) > 2;

Results:

region | order_count | total_revenue
-------|-------------|-------------
North  |           3 |     1279.97
South  |           3 |      739.48

The West and East regions are filtered out because they each have only 1 order.

WHERE vs. HAVING: A Critical Distinction

This is one of the most confusing aspects for beginners. Here's the key difference:

WHERE filters individual rows before grouping
HAVING filters groups after grouping and aggregation

Let's see this in action. First, let's filter orders before grouping:

SELECT region, COUNT(*) as order_count
FROM orders 
WHERE order_amount > 100  -- Filter individual orders first
GROUP BY region;

This counts orders by region, but only includes orders over $100 in the count.

Now let's filter groups after aggregation:

SELECT region, COUNT(*) as order_count
FROM orders 
GROUP BY region
HAVING COUNT(*) > 1;  -- Filter groups after counting

This shows all regions that have more than 1 order, regardless of order amounts.

You can use both in the same query:

SELECT region, 
       COUNT(*) as order_count,
       AVG(order_amount) as avg_order_value
FROM orders 
WHERE order_date >= '2024-01-17'  -- Only recent orders
GROUP BY region
HAVING COUNT(*) >= 2              -- Only regions with 2+ recent orders
ORDER BY avg_order_value DESC;

Grouping by Multiple Columns

You can group by more than one column to create more detailed breakdowns. Let's analyze revenue by both region and product category:

SELECT region, 
       product_category,
       COUNT(*) as order_count,
       SUM(order_amount) as total_revenue
FROM orders 
GROUP BY region, product_category
ORDER BY region, product_category;

Results:

region | product_category | order_count | total_revenue
-------|------------------|-------------|-------------
East   | Clothing         |           1 |        89.99
North  | Books            |           1 |        79.99
North  | Electronics      |           2 |      1199.98
South  | Books            |           1 |        39.99
South  | Clothing         |           1 |       149.50
South  | Electronics      |           1 |       549.99
West   | Clothing         |           1 |       199.99

This gives you a much more granular view of your business. You can see, for example, that Electronics performs very well in the North region but appears only once in the South.

Working with Dates and Time Periods

Grouping by dates is incredibly common in business analysis. Let's group orders by month:

SELECT DATE_TRUNC('month', order_date) as order_month,
       COUNT(*) as order_count,
       SUM(order_amount) as monthly_revenue
FROM orders 
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY order_month;

Note: The DATE_TRUNC function syntax varies by database system. In MySQL, you might use DATE_FORMAT(order_date, '%Y-%m') instead. In SQL Server, you could use FORMAT(order_date, 'yyyy-MM').

Hands-On Exercise

Now it's your turn to practice. Using the orders table we've been working with, write queries to answer these business questions:

Customer Analysis: Which customer (by customer_id) has placed the most orders, and what's their total spending?
Product Performance: What's the average order value for each product category? Which category has the highest average?
Regional Trends: Show regions that have an average order value above $200, including their order count and total revenue.
Category-Region Analysis: Find all region-category combinations that have generated more than $500 in total revenue.

Take a moment to work through these before looking at the solutions below.

Solutions:

Customer Analysis:

SELECT customer_id,
       COUNT(*) as order_count,
       SUM(order_amount) as total_spending
FROM orders 
GROUP BY customer_id
ORDER BY order_count DESC, total_spending DESC;

Product Performance:

SELECT product_category,
       AVG(order_amount) as avg_order_value,
       COUNT(*) as order_count
FROM orders 
GROUP BY product_category
ORDER BY avg_order_value DESC;

Regional Trends:

SELECT region,
       COUNT(*) as order_count,
       SUM(order_amount) as total_revenue,
       AVG(order_amount) as avg_order_value
FROM orders 
GROUP BY region
HAVING AVG(order_amount) > 200
ORDER BY avg_order_value DESC;

Category-Region Analysis:

SELECT region,
       product_category,
       SUM(order_amount) as total_revenue,
       COUNT(*) as order_count
FROM orders 
GROUP BY region, product_category
HAVING SUM(order_amount) > 500
ORDER BY total_revenue DESC;

Common Mistakes & Troubleshooting

Mistake #1: Mixing grouped and non-grouped columns

This query will fail:

SELECT region, customer_id, COUNT(*)
FROM orders 
GROUP BY region;

Error: You can't select customer_id because it's not in the GROUP BY clause. Each region group contains multiple customer_ids, so SQL doesn't know which one to display.

Fix: Either add customer_id to GROUP BY or remove it from SELECT.

Mistake #2: Using WHERE instead of HAVING for aggregate conditions

This won't work as expected:

SELECT region, COUNT(*) as order_count
FROM orders 
WHERE COUNT(*) > 2  -- Wrong!
GROUP BY region;

Error: WHERE is evaluated before grouping, so COUNT(*) doesn't exist yet.

Fix: Use HAVING:

SELECT region, COUNT(*) as order_count
FROM orders 
GROUP BY region
HAVING COUNT(*) > 2;  -- Correct!

Mistake #3: Forgetting that COUNT(*) includes all rows, even those with NULL values

If you want to count only non-NULL values in a specific column, use COUNT(column_name) instead of COUNT(*).

Mistake #4: Not handling NULL values in aggregate functions

SUM, AVG, MIN, and MAX ignore NULL values, but this might not be what you expect. If you need to include NULLs in calculations, use functions like COALESCE to convert them to zero first:

SELECT region, 
       AVG(COALESCE(order_amount, 0)) as avg_including_nulls
FROM orders 
GROUP BY region;

Summary & Next Steps

You've now mastered the fundamental aggregate functions that turn raw data into business insights:

COUNT() tells you how many records exist in each group
SUM() adds up numeric values across groups
AVG() calculates the mean of numeric values
GROUP BY organizes your data into meaningful categories
HAVING filters groups based on aggregate conditions
WHERE vs. HAVING - WHERE filters before grouping, HAVING filters after

These functions form the backbone of data analysis in SQL. You can now answer questions like "What's our revenue by region?" or "Which product categories are performing best?" - the types of queries that drive business decisions.

Next steps in your SQL journey:

Learn window functions for more advanced analytics
Explore subqueries to combine aggregate results with detailed data
Study JOIN operations to analyze data across multiple tables
Practice with date/time functions for time-series analysis

Start applying these concepts to your own datasets. The more you practice grouping and aggregating real data, the more natural these patterns will become. Soon you'll be the person others come to when they need to "make sense of all this data."

Learning Path: SQL Fundamentals

Master SQL Aggregate Functions: GROUP BY, HAVING, COUNT, SUM, AVG

SQL Aggregate Functions: GROUP BY, HAVING, COUNT, SUM, AVG

Prerequisites

Understanding the Problem: When Individual Rows Aren't Enough

Your First Aggregate Function: COUNT

Working with SUM: Adding Up Values

Calculating Averages with AVG

Combining Multiple Aggregate Functions

Filtering Groups with HAVING

WHERE vs. HAVING: A Critical Distinction

Grouping by Multiple Columns

Working with Dates and Time Periods

Hands-On Exercise

Common Mistakes & Troubleshooting

Summary & Next Steps

Related Articles

Temporal Data Mastery: Writing Queries for Time-Series, Date Ranges, and Slowly Changing Dimensions

Subqueries and Correlated Subqueries: Writing Queries Within Queries

Database Performance Tuning: Advanced Indexing Strategies and Query Rewriting for Production Systems

Related Articles

SQL⚡ Practitioner
Temporal Data Mastery: Writing Queries for Time-Series, Date Ranges, and Slowly Changing Dimensions
21 min

SQL🌱 Foundation
Subqueries and Correlated Subqueries: Writing Queries Within Queries
15 min

SQL🔥 Expert
Database Performance Tuning: Advanced Indexing Strategies and Query Rewriting for Production Systems
20 min