
Picture this: it's the first Monday of a new quarter, and you just received an email with 12 Excel files attached—one monthly sales report for each of your regional offices. Your manager needs a consolidated analysis by end of day, and manually copying and pasting data from each file isn't just tedious—it's a recipe for errors. Sound familiar?
This scenario plays out in organizations everywhere: HR departments combining monthly employee reports, finance teams consolidating expense sheets, or marketing analysts merging campaign data from different sources. Power Query transforms what used to be hours of manual work into a few clicks and an automated process that updates whenever your source files change.
By the end of this lesson, you'll build a robust system that automatically combines data from multiple files in a folder, handles new files as they're added, and maintains data quality throughout the process—no manual intervention required.
What you'll learn: • How to connect Power Query to an entire folder of files at once • The step-by-step process of combining files with different structures • How to filter and transform data during the combination process • Techniques for handling common file naming patterns and data inconsistencies • How to create a dynamic system that automatically includes new files
Before diving in, you should have:
When you connect Power Query to a folder, you're not just opening files—you're creating a live connection to everything in that location. Think of it like subscribing to a folder rather than just reading its contents once.
Here's what makes this powerful: Power Query reads the folder's file list, then applies the same transformation steps to each file automatically. When you add new files to the folder later, they're included the next time you refresh your data.
Let's create realistic sample data to work with. Imagine you're managing sales data from three regional offices, and each office sends you their monthly reports in separate Excel files.
First, create a new folder on your computer called "Monthly_Sales_Reports". Inside this folder, create three Excel files:
North_Region_Jan2024.xlsx with this data:
Date | Product | Sales_Rep | Revenue
2024-01-05 | Laptop | Sarah Chen| 1200
2024-01-12 | Monitor | Mike Davis| 350
2024-01-18 | Keyboard | Sarah Chen| 75
2024-01-25 | Mouse | Lisa Wong | 45
South_Region_Jan2024.xlsx with this data:
Date | Product | Salesperson | Revenue
2024-01-08 | Laptop | Tom Rodriguez| 1450
2024-01-15 | Tablet | Amy Foster | 650
2024-01-22 | Headphones | Tom Rodriguez| 120
West_Region_Jan2024.xlsx with this data:
Date | Item | Rep_Name | Sales_Amount
2024-01-03 | Desktop | Kelly Park | 890
2024-01-10 | Printer | David Kim | 275
2024-01-17 | Scanner | Kelly Park | 180
2024-01-24 | Webcam | David Kim | 95
Notice the intentional inconsistencies—different column names for the same data (Sales_Rep vs. Salesperson vs. Rep_Name), and slightly different column headers (Revenue vs. Sales_Amount). This mirrors real-world scenarios where different departments or systems produce similar but not identical file formats.
Open Excel and navigate to the Data tab. In the Get Data section, click Get Data > From File > From Folder.
In the dialog box that appears, browse to your "Monthly_Sales_Reports" folder and click OK.
Power Query opens with a preview showing all files in your folder. You'll see something like this:
Content | Name | Extension | Date accessed | Date modified | Date created
[Binary] | North_Region_Jan2024.xlsx | .xlsx | 1/15/2024 | 1/15/2024 | 1/15/2024
[Binary] | South_Region_Jan2024.xlsx | .xlsx | 1/15/2024 | 1/15/2024 | 1/15/2024
[Binary] | West_Region_Jan2024.xlsx | .xlsx | 1/15/2024 | 1/15/2024 | 1/15/2024
This folder view is your control center. Each row represents one file, and the Content column contains the actual file data (shown as [Binary] because Power Query hasn't opened the files yet).
Pro tip: The folder connection updates automatically. If you add "East_Region_Jan2024.xlsx" to this folder later and refresh your data, it will automatically be included in your combined dataset.
Click the Combine Files button at the bottom of the preview window. Power Query launches a helper dialog that walks you through the combination process.
Power Query automatically selects the first file as a template and shows you a preview of its contents. You'll see your North Region data displayed in a table format:
Date | Product | Sales_Rep | Revenue
2024-01-05 | Laptop | Sarah Chen| 1200
2024-01-12 | Monitor | Mike Davis| 350
2024-01-18 | Keyboard | Sarah Chen| 75
2024-01-25 | Mouse | Lisa Wong | 45
If your files have multiple worksheets, you can specify which sheet to use from the dropdown. Since our sample files have data on Sheet1, we'll stick with the default.
Click OK to proceed with the combination.
Power Query combines all files and displays the merged dataset. You'll notice several important additions:
However, you'll immediately spot the column naming issue we introduced intentionally:
Date | Product | Sales_Rep | Revenue | Salesperson | Rep_Name | Item | Sales_Amount
2024-01-05 | Laptop | Sarah Chen| 1200 | null | null | null | null
2024-01-08 | Laptop | null | null | Tom Rodriguez| null | null | 1450
2024-01-03 | null | null | null | null | Kelly Park| Desktop | 890
Power Query created separate columns for each unique column name it encountered. This is the expected behavior, but it's not what we want for analysis.
Now comes the crucial part: transforming this messy combined dataset into something usable. We need to merge similar columns and standardize the structure.
First, let's create a single Sales_Representative column by merging the three rep name columns. In the Power Query Editor:
The COALESCE function concept applies here—Power Query takes the first non-null value from the selected columns for each row.
Your result will show:
Sales_Representative
Sarah Chen
Tom Rodriguez
Kelly Park
Now do the same for the product columns:
And for the revenue columns:
That Source.Name column contains valuable information, but it's messy with the full filename and extension. Let's extract just the region name:
After these transformations, your Region column should show:
Region
North_
South_
West_
Clean up those trailing underscores with one more replace operation: replace "_" with "".
Power Query often imports numerical data as text when combining files with inconsistent structures. Let's fix the data types:
For the Date column:
Power Query automatically handles the conversion and flags any rows that can't be converted, allowing you to investigate and fix data quality issues.
Sometimes you only want files that meet certain criteria. You can filter before combining by using the folder's file properties.
Go back to the folder connection step (before clicking Combine Files). Notice you can filter by:
To filter by date, click the dropdown arrow in the Date modified column header and choose Date Filters > After. Select a date 30 days ago to only include recent files.
You can add calculated columns to enhance your combined dataset. Let's add a column that shows the month and year from the filename:
Text.Middle([Source.Name], Text.PositionOf([Source.Name], "_") + 1, 7)This formula extracts the "Jan2024" portion from filenames like "North_Region_Jan2024.xlsx".
Real-world files often have more complex differences than just column names. Here's how to handle common scenarios:
Different starting rows: If some files have headers on row 1 and others on row 3, use the Use First Row as Headers feature selectively, or create custom logic to detect and skip variable header rows.
Extra columns: Files might have comment columns or metadata that you don't need. Remove unwanted columns after combination by selecting them and clicking Remove Columns.
Different worksheets: If files have data on different sheets, you can specify worksheet names in the combination dialog or create logic to find the sheet with actual data.
The real power of this approach becomes apparent when you automate the entire process. Let's build a system that requires zero manual intervention when new files arrive.
Instead of hardcoding your folder path, create a parameter that makes your solution portable:
Now modify your folder connection to use this parameter:
Folder.Files(FolderPath)This makes your query portable—you can easily point it to different folders for different projects.
Real files contain real problems. Build error handling into your process:
let
Source = Folder.Files(FolderPath),
FilteredFiles = Table.SelectRows(Source, each ([Extension] = ".xlsx") and ([Attributes]?[Size]? > 1000)),
AddedCustom = Table.AddColumn(FilteredFiles, "FileContent",
each try Excel.Workbook([Content], null, true) otherwise null),
RemovedErrors = Table.SelectRows(AddedCustom, each [FileContent] <> null),
// Continue with combination steps...
in
RemovedErrors
This approach:
try...otherwise to handle corrupt files gracefullyAdd validation steps to catch data problems early:
let
// ... previous steps ...
AddedValidation = Table.AddColumn(CombinedData, "DataQuality",
each if [Sales_Revenue] > 0 and [Sales_Representative] <> null
then "Valid"
else "Review Required"),
FilteredValid = Table.SelectRows(AddedValidation, each [DataQuality] = "Valid")
in
FilteredValid
This creates a data quality flag and optionally filters out problematic rows for separate review.
Now let's put everything together in a practical exercise that reinforces all the concepts we've covered.
Exercise Requirements: Create a Power Query solution that combines monthly expense reports from different departments, handling the inconsistencies shown below:
Create these three files in a folder called "Department_Expenses":
HR_Expenses_Q1.xlsx:
Transaction_Date | Category | Employee_Name | Cost
2024-01-15 | Travel | Alice Johnson | 450.00
2024-02-20 | Supplies | Bob Wilson | 125.50
2024-03-10 | Training | Alice Johnson | 800.00
Marketing_Expenses_Q1.xlsx:
Date | Expense_Type | Staff_Member | Amount
2024-01-22 | Conference | Carol Davis | 650.00
2024-02-15 | Software | Dave Miller | 299.99
2024-03-05 | Advertising | Carol Davis | 1200.00
IT_Expenses_Q1.xlsx:
Purchase_Date | Item | Buyer | Price
2024-01-08 | Hardware | Eve Chen | 750.00
2024-02-12 | License | Frank Torres | 199.00
2024-03-18 | Equipment | Eve Chen | 425.00
Your Task:
Solution Approach:
Connect to Folder: Use Data > Get Data > From Folder and select your "Department_Expenses" folder
Combine Files: Click Combine Files and proceed with the combination
Standardize Columns:
Extract Department:
Text.BeforeDelimiter([Source.Name], "_")Filter and Add Quarter:
Summary Calculation:
Expected Result:
Department | Total_Expenses
HR | 1375.50
Marketing | 2149.99
IT | 1374.00
The Problem: You hardcode the folder path, then your query breaks when you move files or share the workbook with colleagues.
What Happens: Power Query shows "DataSource.Error: Couldn't find file" when the hardcoded path doesn't exist on another computer.
The Fix: Always use parameters for file paths:
let
FolderPath = Excel.CurrentWorkbook(){[Name="FolderPath"]}[Content]{0}[Column1],
Source = Folder.Files(FolderPath)
in
Source
Create a single-cell named range called "FolderPath" in your worksheet where users can easily update the path.
The Problem: You combine files without checking for structural differences, leading to misaligned data.
What Happens: Power Query creates separate columns for similar data (like "Revenue" and "Sales"), resulting in mostly empty columns and fragmented data.
The Fix: Always examine your sample files first and build column standardization into your process:
let
// After combining files
StandardizedColumns = Table.RenameColumns(Source, {
{"Revenue", "Sales_Amount"},
{"Sales", "Sales_Amount"},
{"Rep", "Sales_Rep"},
{"Salesperson", "Sales_Rep"}
})
in
StandardizedColumns
The Problem: You don't set proper data types, causing calculation errors and unexpected sorting behavior.
What Happens: Numbers stored as text sort alphabetically (1, 10, 2, 3) instead of numerically (1, 2, 3, 10), and SUM functions return errors.
The Fix: Explicitly set data types after combining:
let
TypedColumns = Table.TransformColumnTypes(Source, {
{"Date", type date},
{"Revenue", type number},
{"Quantity", Int64.Type}
})
in
TypedColumns
The Problem: Your combination works fine with 5 files but becomes extremely slow with 50 files because you're not optimizing the process.
What Happens: Each refresh takes progressively longer, eventually timing out or consuming excessive memory.
The Fix: Use folder filtering and implement incremental refresh strategies:
let
// Only process files from the last 6 months
CutoffDate = Date.AddMonths(Date.From(DateTime.LocalNow()), -6),
FilteredFiles = Table.SelectRows(Source,
each [Date modified] >= CutoffDate and [Extension] = ".xlsx")
in
FilteredFiles
The Problem: You try to refresh data while Excel files in your source folder are open in other applications.
What Happens: Power Query throws sharing violation errors or skips locked files without warning.
The Fix: Build error handling that gracefully manages locked files:
let
AddedContent = Table.AddColumn(FilteredFiles, "Content",
each try Excel.Workbook([Content])
otherwise #table({"Error"}, {{"File locked or corrupted"}}))
in
AddedContent
You've now mastered the fundamental skill of combining multiple files with Power Query—a capability that transforms how you handle repetitive data consolidation tasks. Let's recap the key concepts you've learned:
The folder connection approach creates a dynamic link to your data sources, automatically including new files and maintaining consistency across refreshes. By understanding how to standardize inconsistent column structures, handle different file formats, and build error handling into your processes, you're equipped to tackle real-world data combination challenges.
The parameter-driven setup you learned makes your solutions portable and maintainable, while the data quality checks ensure your combined datasets remain reliable as source files change. These aren't just technical skills—they're productivity multipliers that free you from manual data preparation tasks.
Natural next steps to expand your Power Query expertise:
Master Advanced Transformations: Dive deeper into Power Query's transformation capabilities, including pivoting, unpivoting, and complex column operations. These skills become essential when your source files have more complex structures than simple tabular data.
Learn Power Query's M Language: While the visual interface handles most scenarios, understanding M (Power Query's formula language) unlocks advanced customization. You'll be able to create sophisticated logic for handling edge cases and building reusable functions.
Explore Incremental Data Refresh: For scenarios involving hundreds of files or very large datasets, learn techniques for processing only changed or new files rather than reprocessing everything on each refresh. This becomes critical for maintaining performance in enterprise environments.
Each of these topics builds naturally on the foundation you've established here, taking you from basic file combination toward becoming a Power Query expert who can handle any data integration challenge.
Learning Path: Power Query Essentials