Imagine you're working with employee data at a growing tech company. You need to find everyone who reports to a specific manager, including people who report to those reports, and their reports too — essentially mapping out the entire organizational hierarchy beneath any given person. Or maybe you're analyzing a retail chain and need to trace how products flow through different distribution levels, from warehouses to regional centers to local stores.
These scenarios involve hierarchical data — information organized in parent-child relationships that can extend multiple levels deep. While regular SQL queries excel at working with flat data, they struggle with these tree-like structures where you don't know in advance how many levels you need to traverse.
That's where recursive Common Table Expressions (CTEs) come in. Recursive CTEs let you write queries that call themselves, building results incrementively by following relationships through as many levels as exist in your data. By the end of this lesson, you'll understand how to use this powerful technique to query hierarchical data structures with confidence.
What you'll learn:
Before diving into recursive CTEs, you should be comfortable with:
If you need a refresher on regular CTEs, review that topic first, as recursive CTEs build directly on those foundations.
A recursive CTE is a CTE that references itself within its own definition. Think of it like a mathematical function that calls itself with slightly different parameters each time, eventually building up a complete result.
The classic example is calculating factorials: 5! = 5 × 4!, where 4! = 4 × 3!, and so on. Each step uses the result from the previous step, continuing until reaching a base case (1! = 1).
Recursive CTEs work similarly. They start with a "base case" (called the anchor member), then repeatedly apply a rule (the recursive member) until no new rows can be generated.
Here's the basic syntax structure:
WITH RECURSIVE cte_name AS (
-- Anchor member (base case)
SELECT initial_data
FROM some_table
WHERE base_condition
UNION [ALL]
-- Recursive member (the recursive step)
SELECT modified_data
FROM some_table
JOIN cte_name ON join_condition
WHERE recursive_condition
)
SELECT * FROM cte_name;
Let's break this down:
Note: Some databases (like PostgreSQL) require the
RECURSIVEkeyword, while others (like SQL Server) don't use it but support the same functionality.
To make this concrete, let's work with an employee hierarchy table. This represents a common real-world scenario where each employee has a manager, creating a tree structure:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT,
department VARCHAR(50),
salary DECIMAL(10,2),
hire_date DATE
);
INSERT INTO employees VALUES
(1, 'Sarah Chen', NULL, 'Executive', 150000, '2018-01-15'),
(2, 'Marcus Johnson', 1, 'Engineering', 120000, '2019-03-20'),
(3, 'Lisa Rodriguez', 1, 'Sales', 110000, '2019-06-10'),
(4, 'David Kim', 2, 'Engineering', 95000, '2020-02-12'),
(5, 'Amanda Foster', 2, 'Engineering', 98000, '2020-04-18'),
(6, 'Robert Taylor', 3, 'Sales', 85000, '2020-07-25'),
(7, 'Jennifer Wu', 3, 'Sales', 88000, '2020-09-30'),
(8, 'Michael Brown', 4, 'Engineering', 75000, '2021-01-08'),
(9, 'Emily Davis', 4, 'Engineering', 78000, '2021-03-15'),
(10, 'Chris Wilson', 6, 'Sales', 65000, '2021-05-20'),
(11, 'Jessica Moore', 6, 'Sales', 68000, '2021-08-12');
This creates a hierarchy where Sarah Chen (CEO) is at the top, Marcus and Lisa report to her, and so on. Let's visualize this structure:
Sarah Chen (CEO)
├── Marcus Johnson (Engineering Manager)
│ ├── David Kim (Senior Engineer)
│ │ ├── Michael Brown (Engineer)
│ │ └── Emily Davis (Engineer)
│ └── Amanda Foster (Senior Engineer)
└── Lisa Rodriguez (Sales Manager)
├── Robert Taylor (Senior Sales Rep)
│ ├── Chris Wilson (Sales Rep)
│ └── Jessica Moore (Sales Rep)
└── Jennifer Wu (Senior Sales Rep)
Let's start with a fundamental question: "Who reports to Marcus Johnson, either directly or indirectly?"
Here's how we solve this with a recursive CTE:
WITH RECURSIVE reporting_chain AS (
-- Anchor: Start with Marcus Johnson
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE name = 'Marcus Johnson'
UNION ALL
-- Recursive: Find everyone who reports to someone in our current result set
SELECT e.employee_id, e.name, e.manager_id, rc.level + 1
FROM employees e
JOIN reporting_chain rc ON e.manager_id = rc.employee_id
)
SELECT level, name, employee_id
FROM reporting_chain
ORDER BY level, name;
Let's trace through how this works:
Iteration 1 (Anchor):
Iteration 2 (Recursive):
Iteration 3 (Recursive):
Iteration 4 (Recursive):
The result shows the complete hierarchy under Marcus:
level | name | employee_id
------|----------------|------------
1 | Marcus Johnson | 2
2 | Amanda Foster | 5
2 | David Kim | 4
3 | Emily Davis | 9
3 | Michael Brown | 8
Key insight: The
levelcolumn helps track how many steps away from the starting point each person is. This is incredibly useful for understanding the depth of your hierarchy.
Another common pattern is starting from any employee and finding their complete reporting chain up to the CEO. Let's find everyone above Emily Davis:
WITH RECURSIVE management_chain AS (
-- Anchor: Start with Emily Davis
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE name = 'Emily Davis'
UNION ALL
-- Recursive: Find each person's manager
SELECT e.employee_id, e.name, e.manager_id, mc.level + 1
FROM employees e
JOIN management_chain mc ON e.employee_id = mc.manager_id
)
SELECT level, name,
CASE WHEN manager_id IS NULL THEN 'CEO' ELSE 'Reports to ID: ' || manager_id END as role
FROM management_chain
ORDER BY level;
This traverses upward through the hierarchy:
level | name | role
------|----------------|------------------
1 | Emily Davis | Reports to ID: 4
2 | David Kim | Reports to ID: 2
3 | Marcus Johnson | Reports to ID: 1
4 | Sarah Chen | CEO
Notice how we changed the JOIN condition: instead of finding subordinates (e.manager_id = rc.employee_id), we're finding managers (e.employee_id = mc.manager_id).
Sometimes you only want to traverse a limited number of levels. Most databases provide ways to limit recursion depth. Here's how to find only the direct and immediate subordinates (2 levels) under Sarah Chen:
WITH RECURSIVE reporting_chain AS (
-- Anchor: Start with Sarah Chen
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE name = 'Sarah Chen'
UNION ALL
-- Recursive: Find subordinates, but stop at level 3
SELECT e.employee_id, e.name, e.manager_id, rc.level + 1
FROM employees e
JOIN reporting_chain rc ON e.manager_id = rc.employee_id
WHERE rc.level < 3 -- Stop recursion after level 3
)
SELECT level, name
FROM reporting_chain
ORDER BY level, name;
The WHERE rc.level < 3 condition prevents the recursive member from continuing beyond the desired depth.
You can also use database-specific options. In PostgreSQL, for example:
WITH RECURSIVE reporting_chain AS (
-- ... same CTE definition ...
)
SELECT * FROM reporting_chain
OPTION (MAXRECURSION 3);
Recursive CTEs become particularly powerful when you combine them with aggregation. Let's calculate the total salary cost for each manager, including all their subordinates at any level:
WITH RECURSIVE team_hierarchy AS (
-- Anchor: Get all managers (people who have subordinates)
SELECT DISTINCT m.employee_id, m.name as manager_name,
e.employee_id as subordinate_id, e.name as subordinate_name,
e.salary, 1 as level
FROM employees m
JOIN employees e ON m.employee_id = e.manager_id
UNION ALL
-- Recursive: Find subordinates of subordinates
SELECT th.employee_id, th.manager_name,
e.employee_id, e.name, e.salary, th.level + 1
FROM team_hierarchy th
JOIN employees e ON th.subordinate_id = e.manager_id
),
salary_totals AS (
SELECT employee_id, manager_name,
COUNT(*) as total_subordinates,
SUM(salary) as total_team_salary
FROM team_hierarchy
GROUP BY employee_id, manager_name
)
SELECT st.manager_name,
st.total_subordinates,
st.total_team_salary,
m.salary as manager_salary,
st.total_team_salary + m.salary as total_cost
FROM salary_totals st
JOIN employees m ON st.employee_id = m.employee_id
ORDER BY total_cost DESC;
This query reveals the true cost of each part of the organization:
manager_name | total_subordinates | total_team_salary | manager_salary | total_cost
----------------|-------------------|-------------------|----------------|------------
Marcus Johnson | 4 | 336000 | 120000 | 456000
Lisa Rodriguez | 4 | 306000 | 110000 | 416000
David Kim | 2 | 153000 | 95000 | 248000
Robert Taylor | 2 | 133000 | 85000 | 218000
Real-world data often contains multiple separate hierarchies. Let's add a second company division:
INSERT INTO employees VALUES
(12, 'Thomas Anderson', NULL, 'Technology', 160000, '2018-05-01'),
(13, 'Nina Patel', 12, 'IT', 105000, '2019-08-15'),
(14, 'Carlos Rivera', 12, 'Security', 108000, '2020-01-20'),
(15, 'Sophie Zhang', 13, 'IT', 82000, '2021-02-28'),
(16, 'Alex Thompson', 14, 'Security', 79000, '2021-06-10');
Now we have two separate hierarchies (one under Sarah Chen, another under Thomas Anderson). To process all hierarchies at once, we can modify our recursive CTE:
WITH RECURSIVE full_hierarchy AS (
-- Anchor: All top-level managers (no manager_id)
SELECT employee_id, name, manager_id,
employee_id as root_manager, name as root_name,
1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Build each hierarchy
SELECT e.employee_id, e.name, e.manager_id,
fh.root_manager, fh.root_name, fh.level + 1
FROM employees e
JOIN full_hierarchy fh ON e.manager_id = fh.employee_id
)
SELECT root_name as organization_head,
level, name,
COUNT(*) OVER (PARTITION BY root_manager) as org_size
FROM full_hierarchy
ORDER BY root_manager, level, name;
This shows both hierarchies with their respective sizes:
organization_head | level | name | org_size
-------------------|-------|-------------------|----------
Sarah Chen | 1 | Sarah Chen | 11
Sarah Chen | 2 | Lisa Rodriguez | 11
Sarah Chen | 2 | Marcus Johnson | 11
...
Thomas Anderson | 1 | Thomas Anderson | 5
Thomas Anderson | 2 | Carlos Rivera | 5
Thomas Anderson | 2 | Nina Patel | 5
...
Now it's time to practice with a realistic scenario. You're analyzing a product category hierarchy for an e-commerce company. Create and populate this table:
CREATE TABLE product_categories (
category_id INT PRIMARY KEY,
category_name VARCHAR(100),
parent_category_id INT,
product_count INT
);
INSERT INTO product_categories VALUES
(1, 'Electronics', NULL, 0),
(2, 'Clothing', NULL, 0),
(3, 'Home & Garden', NULL, 0),
(4, 'Computers', 1, 0),
(5, 'Mobile Devices', 1, 0),
(6, 'Audio', 1, 0),
(7, 'Men''s Clothing', 2, 0),
(8, 'Women''s Clothing', 2, 0),
(9, 'Laptops', 4, 45),
(10, 'Desktop PCs', 4, 23),
(11, 'Smartphones', 5, 78),
(12, 'Tablets', 5, 34),
(13, 'Headphones', 6, 56),
(14, 'Speakers', 6, 29),
(15, 'Shirts', 7, 120),
(16, 'Pants', 7, 89),
(17, 'Dresses', 8, 156),
(18, 'Shoes', 8, 203);
Your task: Write a recursive CTE that calculates the total product count for each category, including all products in its subcategories. For example, "Electronics" should show the sum of all products in Computers, Mobile Devices, Audio, and their subcategories.
Try this yourself before looking at the solution below.
Solution:
WITH RECURSIVE category_totals AS (
-- Anchor: Start with leaf categories (those with actual product counts)
SELECT category_id, category_name, parent_category_id,
product_count, 1 as level
FROM product_categories
WHERE product_count > 0
UNION ALL
-- Recursive: Roll up totals to parent categories
SELECT p.category_id, p.category_name, p.parent_category_id,
ct.product_count, ct.level + 1
FROM product_categories p
JOIN category_totals ct ON p.category_id = ct.parent_category_id
),
aggregated_totals AS (
SELECT category_id, category_name, parent_category_id,
SUM(product_count) as total_products
FROM category_totals
GROUP BY category_id, category_name, parent_category_id
)
SELECT pc.category_name,
COALESCE(at.total_products, 0) as total_products,
CASE WHEN pc.parent_category_id IS NULL THEN 'Top Level'
ELSE 'Subcategory' END as category_type
FROM product_categories pc
LEFT JOIN aggregated_totals at ON pc.category_id = at.category_id
ORDER BY pc.parent_category_id NULLS FIRST, pc.category_name;
This should show Electronics with 265 total products (45+23+78+34+56+29), Clothing with 568 products (120+89+156+203), and so on.
Infinite Recursion: The most dangerous mistake is creating a cycle in your data that causes infinite recursion. Always include proper termination conditions:
-- BAD: No protection against cycles
WITH RECURSIVE dangerous_query AS (
SELECT employee_id, manager_id, 1 as level
FROM employees
WHERE employee_id = 1
UNION ALL
SELECT e.employee_id, e.manager_id, dq.level + 1
FROM employees e
JOIN dangerous_query dq ON e.manager_id = dq.employee_id
-- No termination condition!
)
-- GOOD: Multiple safety measures
WITH RECURSIVE safe_query AS (
SELECT employee_id, manager_id, 1 as level,
CAST(employee_id as VARCHAR(1000)) as path
FROM employees
WHERE employee_id = 1
UNION ALL
SELECT e.employee_id, e.manager_id, sq.level + 1,
sq.path || ',' || e.employee_id
FROM employees e
JOIN safe_query sq ON e.manager_id = sq.employee_id
WHERE sq.level < 10 -- Depth limit
AND sq.path NOT LIKE '%,' || e.employee_id || ',%' -- Cycle detection
)
Performance Issues: Recursive CTEs can be expensive. Always:
UNION ALL instead of UNION when you don't need to eliminate duplicatesIncorrect JOIN Logic: Double-check your JOIN conditions. The most common error is reversing the parent-child relationship:
-- Wrong: This finds managers of people in the CTE
JOIN employees e ON e.employee_id = cte.manager_id
-- Right: This finds subordinates of people in the CTE
JOIN employees e ON e.manager_id = cte.employee_id
Missing Anchor Cases: Ensure your anchor member returns the correct starting points. If you want all hierarchies, start with rows where parent_id IS NULL. If you want a specific subtree, start with the specific root node.
Performance Tip: When working with large hierarchical datasets, consider materializing frequently-accessed hierarchy paths in a separate table to avoid repeated recursive computation.
Recursive CTEs are a powerful tool for working with hierarchical and tree-structured data. You now understand how to:
The key to mastering recursive CTEs is practice with real-world hierarchical data. Look for opportunities in your own work — organizational charts, product categories, geographic regions, network structures, or any data with parent-child relationships.
Next steps to advance your skills:
Recursive CTEs unlock entirely new categories of analytical questions. Once you're comfortable with the basic patterns, you'll find yourself reaching for this technique whenever you encounter data with natural tree structures — and you'll be amazed at how many business problems actually involve hierarchical relationships.
Learning Path: Advanced SQL Queries