
Imagine you're analyzing sales data for a retail company. You have a database table with thousands of individual transactions, but your manager doesn't want to see every single purchase. They want answers to questions like: "What's our average order value by region?" or "Which product categories generated the most revenue last quarter?"
This is where SQL aggregate functions become essential. Instead of looking at individual rows, you need to summarize and group your data to find meaningful patterns and insights.
By the end of this lesson, you'll transform from someone who can only query individual records into someone who can slice and dice data to answer real business questions. You'll understand how to group related data together and perform calculations across those groups.
What you'll learn:
You should be comfortable with basic SQL SELECT statements, including WHERE clauses for filtering individual rows. If you can write queries like SELECT * FROM customers WHERE city = 'Chicago', you're ready to proceed.
Let's start with a realistic scenario. You're working with an e-commerce database that contains an orders table with the following structure:
CREATE TABLE orders (
order_id INT,
customer_id INT,
product_category VARCHAR(50),
region VARCHAR(50),
order_amount DECIMAL(10,2),
order_date DATE
);
Here's some sample data:
INSERT INTO orders VALUES
(1001, 501, 'Electronics', 'North', 899.99, '2024-01-15'),
(1002, 502, 'Clothing', 'South', 149.50, '2024-01-16'),
(1003, 503, 'Electronics', 'North', 299.99, '2024-01-16'),
(1004, 501, 'Books', 'North', 79.99, '2024-01-17'),
(1005, 504, 'Clothing', 'West', 199.99, '2024-01-17'),
(1006, 505, 'Electronics', 'South', 549.99, '2024-01-18'),
(1007, 502, 'Books', 'South', 39.99, '2024-01-18'),
(1008, 506, 'Clothing', 'East', 89.99, '2024-01-19');
A basic SELECT statement shows you every individual order:
SELECT * FROM orders;
But what if your manager asks: "How much revenue did each region generate?" Looking at eight individual rows won't give you a quick answer. You need to group the data by region and sum up the amounts.
Let's start with the simplest aggregate function: COUNT. This function tells you how many rows match your criteria.
SELECT COUNT(*) FROM orders;
This returns 8 because there are 8 orders in total. The asterisk (*) means "count all rows."
But COUNT becomes much more powerful when combined with GROUP BY. Let's count how many orders each region received:
SELECT region, COUNT(*)
FROM orders
GROUP BY region;
Results:
region | count
-------|------
North | 3
South | 3
West | 1
East | 1
Here's what's happening step by step:
Important: When you use GROUP BY, every column in your SELECT clause must either be in the GROUP BY clause or be an aggregate function. You can't mix grouped and non-grouped columns.
COUNT tells you "how many," but SUM tells you "how much." Let's calculate the total revenue by region:
SELECT region, SUM(order_amount) as total_revenue
FROM orders
GROUP BY region;
Results:
region | total_revenue
-------|-------------
North | 1279.97
South | 739.48
West | 199.99
East | 89.99
The as total_revenue part creates an alias—a more readable name for the calculated column. Without it, you'd see something like sum(order_amount) as the column header.
You can also combine multiple aggregate functions:
SELECT region,
COUNT(*) as order_count,
SUM(order_amount) as total_revenue
FROM orders
GROUP BY region;
This shows both how many orders each region had and their total value.
AVG calculates the arithmetic mean of a numeric column. Let's find the average order value by region:
SELECT region,
AVG(order_amount) as avg_order_value
FROM orders
GROUP BY region;
Results:
region | avg_order_value
-------|---------------
North | 426.66
South | 246.49
West | 199.99
East | 89.99
Now you can see that while the North region has the same number of orders as the South, their average order value is much higher.
Tip: AVG automatically ignores NULL values. If you have 3 orders with values 100, 200, and NULL, AVG will return 150 (the average of 100 and 200), not 100.
Real analysis often requires multiple metrics. Let's create a comprehensive regional summary:
SELECT region,
COUNT(*) as order_count,
SUM(order_amount) as total_revenue,
AVG(order_amount) as avg_order_value,
MIN(order_amount) as smallest_order,
MAX(order_amount) as largest_order
FROM orders
GROUP BY region
ORDER BY total_revenue DESC;
This query gives you a complete picture of each region's performance, sorted by total revenue from highest to lowest.
Sometimes you want to filter your results after grouping. This is where HAVING comes in. HAVING is to GROUP BY what WHERE is to individual rows.
Let's say you only want to see regions with more than 2 orders:
SELECT region,
COUNT(*) as order_count,
SUM(order_amount) as total_revenue
FROM orders
GROUP BY region
HAVING COUNT(*) > 2;
Results:
region | order_count | total_revenue
-------|-------------|-------------
North | 3 | 1279.97
South | 3 | 739.48
The West and East regions are filtered out because they each have only 1 order.
This is one of the most confusing aspects for beginners. Here's the key difference:
Let's see this in action. First, let's filter orders before grouping:
SELECT region, COUNT(*) as order_count
FROM orders
WHERE order_amount > 100 -- Filter individual orders first
GROUP BY region;
This counts orders by region, but only includes orders over $100 in the count.
Now let's filter groups after aggregation:
SELECT region, COUNT(*) as order_count
FROM orders
GROUP BY region
HAVING COUNT(*) > 1; -- Filter groups after counting
This shows all regions that have more than 1 order, regardless of order amounts.
You can use both in the same query:
SELECT region,
COUNT(*) as order_count,
AVG(order_amount) as avg_order_value
FROM orders
WHERE order_date >= '2024-01-17' -- Only recent orders
GROUP BY region
HAVING COUNT(*) >= 2 -- Only regions with 2+ recent orders
ORDER BY avg_order_value DESC;
You can group by more than one column to create more detailed breakdowns. Let's analyze revenue by both region and product category:
SELECT region,
product_category,
COUNT(*) as order_count,
SUM(order_amount) as total_revenue
FROM orders
GROUP BY region, product_category
ORDER BY region, product_category;
Results:
region | product_category | order_count | total_revenue
-------|------------------|-------------|-------------
East | Clothing | 1 | 89.99
North | Books | 1 | 79.99
North | Electronics | 2 | 1199.98
South | Books | 1 | 39.99
South | Clothing | 1 | 149.50
South | Electronics | 1 | 549.99
West | Clothing | 1 | 199.99
This gives you a much more granular view of your business. You can see, for example, that Electronics performs very well in the North region but appears only once in the South.
Grouping by dates is incredibly common in business analysis. Let's group orders by month:
SELECT DATE_TRUNC('month', order_date) as order_month,
COUNT(*) as order_count,
SUM(order_amount) as monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY order_month;
Note: The DATE_TRUNC function syntax varies by database system. In MySQL, you might use
DATE_FORMAT(order_date, '%Y-%m')instead. In SQL Server, you could useFORMAT(order_date, 'yyyy-MM').
Now it's your turn to practice. Using the orders table we've been working with, write queries to answer these business questions:
Customer Analysis: Which customer (by customer_id) has placed the most orders, and what's their total spending?
Product Performance: What's the average order value for each product category? Which category has the highest average?
Regional Trends: Show regions that have an average order value above $200, including their order count and total revenue.
Category-Region Analysis: Find all region-category combinations that have generated more than $500 in total revenue.
Take a moment to work through these before looking at the solutions below.
Solutions:
SELECT customer_id,
COUNT(*) as order_count,
SUM(order_amount) as total_spending
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC, total_spending DESC;
SELECT product_category,
AVG(order_amount) as avg_order_value,
COUNT(*) as order_count
FROM orders
GROUP BY product_category
ORDER BY avg_order_value DESC;
SELECT region,
COUNT(*) as order_count,
SUM(order_amount) as total_revenue,
AVG(order_amount) as avg_order_value
FROM orders
GROUP BY region
HAVING AVG(order_amount) > 200
ORDER BY avg_order_value DESC;
SELECT region,
product_category,
SUM(order_amount) as total_revenue,
COUNT(*) as order_count
FROM orders
GROUP BY region, product_category
HAVING SUM(order_amount) > 500
ORDER BY total_revenue DESC;
Mistake #1: Mixing grouped and non-grouped columns
This query will fail:
SELECT region, customer_id, COUNT(*)
FROM orders
GROUP BY region;
Error: You can't select customer_id because it's not in the GROUP BY clause. Each region group contains multiple customer_ids, so SQL doesn't know which one to display.
Fix: Either add customer_id to GROUP BY or remove it from SELECT.
Mistake #2: Using WHERE instead of HAVING for aggregate conditions
This won't work as expected:
SELECT region, COUNT(*) as order_count
FROM orders
WHERE COUNT(*) > 2 -- Wrong!
GROUP BY region;
Error: WHERE is evaluated before grouping, so COUNT(*) doesn't exist yet.
Fix: Use HAVING:
SELECT region, COUNT(*) as order_count
FROM orders
GROUP BY region
HAVING COUNT(*) > 2; -- Correct!
Mistake #3: Forgetting that COUNT(*) includes all rows, even those with NULL values
If you want to count only non-NULL values in a specific column, use COUNT(column_name) instead of COUNT(*).
Mistake #4: Not handling NULL values in aggregate functions
SUM, AVG, MIN, and MAX ignore NULL values, but this might not be what you expect. If you need to include NULLs in calculations, use functions like COALESCE to convert them to zero first:
SELECT region,
AVG(COALESCE(order_amount, 0)) as avg_including_nulls
FROM orders
GROUP BY region;
You've now mastered the fundamental aggregate functions that turn raw data into business insights:
These functions form the backbone of data analysis in SQL. You can now answer questions like "What's our revenue by region?" or "Which product categories are performing best?" - the types of queries that drive business decisions.
Next steps in your SQL journey:
Start applying these concepts to your own datasets. The more you practice grouping and aggregating real data, the more natural these patterns will become. Soon you'll be the person others come to when they need to "make sense of all this data."
Learning Path: SQL Fundamentals