Picture this: you've just connected Power BI to your company's sales database. The data comes in, and almost immediately you notice the problems. Dates are stored as text strings. Region codes are inconsistently capitalized — "NORTHEAST," "Northeast," and "north east" all coexist in the same column. The revenue figures include currency symbols that make them impossible to aggregate. And someone, at some point, decided that null values should be represented as the string "N/A" rather than actual nulls. Your model is a mess before you've written a single DAX formula.
This is where Power Query earns its place as one of the most valuable tools in the Power BI stack. Power Query is the transformation layer that sits between your raw data sources and your data model — it's where you impose structure, enforce consistency, and shape data into something your model can actually use intelligently. Getting this layer right means your DAX stays cleaner, your reports run faster, and your data is genuinely trustworthy. Getting it wrong means patching problems forever at the wrong layer.
By the end of this lesson, you'll have a working command of Power Query's most important capabilities: connecting to sources, applying the right transformations in the right order, writing M formulas for things the UI can't handle alone, and building queries that are maintainable, performant, and correct.
What you'll learn:
You should be comfortable loading data into Power BI and have spent some time in the Power Query Editor UI — you know what Applied Steps are, you've renamed a column or changed a data type before. You don't need to know the M language yet, but you should have at least seen it briefly. Some familiarity with DAX is helpful for understanding why we push work to Power Query rather than DAX, but it's not required.
Before touching any transformation, you need to understand what Power Query actually does with your data and when it does it.
Power Query processes data through a pipeline of steps, each represented as an M expression. These steps are lazy — they don't execute until Power BI refreshes the dataset. When refresh happens, Power Query evaluates the entire pipeline against the source, and the result gets loaded into the data model as a table. Between refreshes, the data in your model is static; Power Query is dormant.
This has a critical implication: every transformation you apply in Power Query runs once at refresh time, whereas every DAX calculation runs repeatedly at query time. If you have a complex text-cleaning operation that touches a million rows, doing it in Power Query means it runs once per refresh. Doing it in a DAX calculated column means it runs every time the model processes that column. Power Query almost always wins for row-level transformations.
The pipeline also introduces the concept of query folding — one of the most important (and most misunderstood) performance levers in Power Query. When Power Query can translate your transformation steps into the native query language of the source (SQL for relational databases, OData queries for REST sources, etc.), it pushes the computation to the source system and only pulls back the result. This is dramatically faster than pulling all the raw data and transforming it locally in Power Query's own engine.
Key principle: Transformations that fold to the source are almost always faster than those that don't. Watch your Applied Steps carefully — once folding breaks, all subsequent steps run locally.
Open the Power Query Editor by clicking "Transform Data" in the Home ribbon. The interface has four zones you'll live in:
The Queries pane (left side) lists every query in your file. Queries can load to the model or remain as staging queries — more on that shortly.
The formula bar shows the M expression for whichever Applied Step is currently selected. This is your window into what's really happening.
The Applied Steps pane (right side) shows the sequence of transformations for the selected query. You can click any step to see an intermediate state of your data — this is invaluable for debugging.
The data preview (center) shows a sample of the data at the currently selected step. Note that this is a sample, not the full dataset, which matters when you're debugging edge cases that only appear in a small percentage of rows.
One habit to develop immediately: name your queries and steps descriptively. Default step names like "Replaced Value1" and "Replaced Value2" become completely unreadable when you come back to a query three months later. Right-click any step and rename it. Similarly, rename your queries from the default "Query1" to something meaningful like "Sales_Raw" or "dim_Customers."
Every action you take in the Power Query Editor generates an M expression. Understanding M is what separates someone who can click through transformations from someone who can actually maintain and extend a complex query.
M is a functional, case-sensitive, expression-based language. The output of every query is a table, and each step in Applied Steps is a named expression that takes the previous step as input and returns a transformed version.
Here's a minimal example. When you import a CSV and rename a column, the generated M looks like this:
let
Source = Csv.Document(File.Contents("C:\Data\sales.csv"), [Delimiter=",", Columns=8, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"OrderDate", type date}, {"Revenue", type number}, {"Region", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type", {{"Rev", "Revenue"}, {"Reg", "Region"}})
in
#"Renamed Columns"
The let block defines a sequence of named expressions. Each name references the previous name as its input — Changed Type takes Promoted Headers as input, and so on. The in block specifies what the query returns, which is always the last named expression.
Notice that step names with spaces must be wrapped in #"...". This is M's way of allowing identifiers that would otherwise be invalid.
You can edit this directly in the Advanced Editor (Home → Advanced Editor). This is where you'll make changes that the UI can't express easily — conditional logic, custom functions, dynamic parameters.
Tip: When you're learning M, generate transformations using the UI first, then open the Advanced Editor to read what was generated. You'll internalize the function signatures much faster this way than reading documentation in isolation.
The most common real-world data quality problem is inconsistent text values. Let's say you have a Region column that contains "NORTHEAST", "Northeast", "north east", and "NE" — all meaning the same thing. You need to standardize these before they reach your model, or every region-based analysis will be fragmented.
The UI's "Replace Values" and "Transform" options get you partway there, but for multi-condition replacements, a custom column with if/then/else or a lookup table approach is more maintainable.
Here's a custom column expression that standardizes region codes:
if Text.Upper(Text.Trim([Region])) = "NORTHEAST" or Text.Upper(Text.Trim([Region])) = "NE" then "Northeast"
else if Text.Upper(Text.Trim([Region])) = "SOUTHEAST" or Text.Upper(Text.Trim([Region])) = "SE" then "Southeast"
else if Text.Upper(Text.Trim([Region])) = "MIDWEST" or Text.Upper(Text.Trim([Region])) = "MW" then "Midwest"
else if Text.Upper(Text.Trim([Region])) = "WEST" then "West"
else "Unknown"
Notice we're calling Text.Trim and Text.Upper before any comparison. This makes the comparison case-insensitive and removes leading/trailing whitespace in one move — two of the most common sources of false mismatches.
For a larger set of replacements, this conditional chain becomes unwieldy. A better approach is a replacement table: create a small reference query that maps raw values to clean values, then merge it with your main query. This way, business users can maintain the mapping in Excel without touching the Power Query logic.
The UI's "Split Column" handles simple delimiter-based splits cleanly. But real data often has irregular structure. Consider a FullAddress column containing values like "123 Main St, Boston, MA 02101" — you want to extract the state abbreviation.
Rather than splitting on commas (which gives you three columns you have to further process), use Text.BetweenDelimiters to extract exactly what you need:
Text.BetweenDelimiters([FullAddress], ", ", ", ", 1, 0)
This extracts the text between the second and third comma-space, which reliably gives you the state. The third and fourth arguments are occurrence indexes — zero-based counts of which delimiter occurrence to use as the start and end.
For merging, the UI's "Merge Columns" does the job for simple concatenation. When you need conditional merging — for example, building a display name that uses "FirstName LastName" when both are present but falls back to just one when the other is null — use a custom column:
if [FirstName] = null and [LastName] = null then "Unknown"
else if [FirstName] = null then [LastName]
else if [LastName] = null then [FirstName]
else [FirstName] & " " & [LastName]
Unpivoting is one of the most transformative (and underused) operations in Power Query, and it's where many analysts discover just how powerful the tool is.
Imagine you've received a sales report where each month is its own column:
| Product | Jan_Sales | Feb_Sales | Mar_Sales |
|---|---|---|---|
| Widget A | 4500 | 5200 | 4800 |
| Widget B | 3100 | 2900 | 3400 |
This wide format is readable for humans but terrible for Power BI. You can't create a date-based slicer, you can't write a single DAX measure that aggregates across months, and adding April means adding a new column and rewriting every report. Unpivoting transforms this into:
| Product | Month | Sales |
|---|---|---|
| Widget A | Jan_Sales | 4500 |
| Widget A | Feb_Sales | 5200 |
| Widget A | Mar_Sales | 4800 |
| Widget B | Jan_Sales | 3100 |
| ... and so on. |
Select the Product column, then use Transform → Unpivot Other Columns. This tells Power Query: keep the selected columns fixed, and turn every other column into an attribute-value pair. The result is the tall format above.
After unpivoting, you typically need to clean the attribute column — strip "_Sales" from the month names, parse out the actual month, and create proper dates. Use a combination of Text.Replace, Text.Start, and Date.FromText depending on how the month labels are structured.
Warning: Use "Unpivot Other Columns" rather than "Unpivot Selected Columns" when your source might add new month columns in the future. "Unpivot Other Columns" dynamically handles any new columns; "Unpivot Selected Columns" only processes the columns you explicitly named at design time.
Pivoting is the reverse — turning distinct values in one column into separate columns with aggregated values. This is useful for creating summary structures, but be thoughtful about it in Power BI. Most of the time, the unpivoted form plus a DAX measure handles reporting more flexibly than a pre-pivoted table.
Date columns need special attention because they come in wildly inconsistent formats from different source systems. A date stored as "20231115" (YYYYMMDD as a plain integer), "11/15/2023" (US format text), and "2023-11-15T00:00:00Z" (ISO 8601 with timezone) are all the same date — but Power Query treats them completely differently.
For numeric dates in YYYYMMDD format, the most reliable conversion is:
Date.FromText(Text.From([DateColumn]), [Format="yyyyMMdd"])
This explicitly specifies the format, which avoids locale-dependent parsing bugs. Never use Date.From(Text.From([DateColumn])) without specifying a format — the interpretation will depend on the regional settings of whoever is refreshing the dataset.
For ISO 8601 timestamps with timezone offsets, use DateTimeZone.FromText and then convert to local time if needed:
DateTimeZone.ToLocal(DateTimeZone.FromText([TimestampColumn]))
Warning: Be very deliberate about timezone handling. If your source data is UTC and your users are in multiple timezones, decide at the Power Query layer how you want to handle this, and document it. Silent timezone conversion errors are a significant source of reporting discrepancies.
Parameters are variables you define once and reference throughout your queries. They're invaluable for environment switching (dev/test/prod), dynamic file paths, and date-range filters.
Create a parameter by right-clicking in the Queries pane → New Parameter. Give it a name, a type (Text, Number, Date, etc.), and a current value. Then reference it in any query step.
A common pattern is a ServerName parameter for database connections:
let
Source = Sql.Database(ServerName, DatabaseName),
SalesTable = Source{[Schema="dbo", Item="Sales"]}[Data]
in
SalesTable
When you need to switch from your development server to production, you change one parameter value, and every query that references it updates automatically. This is dramatically better than hunting through query after query looking for hardcoded connection strings.
For date-based filtering, parameters let you make queries dynamic. Create a ReportStartDate parameter of type Date, then use it in a filter step:
#"Filtered Rows" = Table.SelectRows(#"Previous Step", each [OrderDate] >= ReportStartDate)
Custom functions are reusable M expressions that accept inputs and return a value or table. They're essential when you need to apply the same complex transformation logic to multiple queries, or when you need to invoke an API endpoint for each row of a table.
Here's a simple but practical example: a function that cleans a phone number column by stripping everything except digits:
(rawPhone as text) as text =>
let
Digits = {"0","1","2","3","4","5","6","7","8","9"},
CleanedList = List.Select(Text.ToList(rawPhone), each List.Contains(Digits, _)),
Cleaned = Text.Combine(CleanedList)
in
Cleaned
Save this as a query named fnCleanPhone. Then in any query that has a phone number column, add a custom column that calls it:
fnCleanPhone([PhoneNumber])
Important: Custom functions that are invoked row-by-row can kill query folding. When you call a custom function for each row, Power Query must process every row locally — it cannot fold this to the source. For large datasets, pre-filter and reduce row counts before invoking custom functions.
The Power Query merge operation is a join between two queries. The UI is straightforward, but the join type selection trips people up. A refresher:
A common production use case: you have a Sales table and a CustomerSegments table that maps customer IDs to segments. You want to add the segment to the sales data. Use a Left Outer merge on CustomerID, then expand only the Segment column from the merged table.
The M for this looks like:
#"Merged Queries" = Table.NestedJoin(Sales, {"CustomerID"}, CustomerSegments, {"CustomerID"}, "CustomerSegments", JoinKind.LeftOuter),
#"Expanded CustomerSegments" = Table.ExpandTableColumn(#"Merged Queries", "CustomerSegments", {"Segment"}, {"Segment"})
Notice Table.ExpandTableColumn — the merge step just creates a nested table column; the expand step is what actually flattens the joined data into columns. The UI does both automatically, but knowing this distinction matters when debugging merge results.
Appending combines multiple tables with the same structure by stacking them vertically. The classic scenario is monthly files from an external system — twelve separate CSVs that need to become one table.
Rather than appending each file manually, use the "From Folder" connector. Point it at the folder containing your CSVs, and Power Query creates a query that automatically includes any new file added to that folder at the next refresh. This is a much more maintainable pattern than manually adding new files each month.
The auto-generated query from "From Folder" is functional but usually needs cleanup — you'll need to filter to only the file type you want, handle files with different schemas, and often strip out header rows that got included as data rows if the files use inconsistent formatting.
As your Power Query solution grows, you'll have queries that shouldn't load into the data model but exist to support other queries. These are staging queries.
A practical example: suppose you connect to a database, apply filters and initial cleaning, and then split the result into two destination tables — one for order headers and one for line items. Rather than connecting to the database twice and applying the same initial filtering twice, create a staging query that does the connection and initial cleaning, then create two downstream queries that reference it and do their respective final transformations.
To prevent a query from loading to the model, right-click it in the Queries pane and uncheck "Enable Load." It becomes an italic query that processes during refresh but doesn't create a table in the model. This keeps your model clean and reduces unnecessary data load.
Your query dependency graph matters for both performance and maintainability. Power Query evaluates queries in dependency order — staging queries first, then queries that depend on them. If two queries are independent, Power Query may parallelize their evaluation. Keep dependencies intentional and documented.
Let's put everything together with a realistic scenario. You're building a Power BI report for a regional retail company. You have three data sources:
Step 1: Load and stage the transactions
Create a query that loads the transactions CSV. In the initial cleaning steps:
Name this query Transactions_Staged and disable its load to the model.
Step 2: Clean transaction data
Create a new query that references Transactions_Staged. In this query:
Add a custom column to parse the date, since it comes in as "MM/DD/YYYY" text:
Date.FromText([TransactionDate], [Format="MM/dd/yyyy"])
Add a custom column to clean revenue — strip the dollar sign and convert to decimal:
Number.FromText(Text.Replace([Revenue], "$", ""))
Add a conditional column to categorize transaction size:
if [CleanRevenue] >= 10000 then "Large"
else if [CleanRevenue] >= 1000 then "Medium"
else "Small"
Name this query Transactions_Clean and enable its load.
Step 3: Load and clean the store reference table
Load the store Excel file. The store names need standardizing — use Replace Values for known variations, then trim and proper-case the result using Text.Proper([StoreName]).
Create a column for the store's region grouping using the conditional logic pattern from earlier. Name this query dim_Stores.
Step 4: Load the product catalog
The product catalog has the month-as-column structure from our earlier discussion. Load it, unpivot the month columns, then clean the Month column to extract a proper date. Create a YearMonth column using:
Date.FromText("01/" & Text.Replace([Month], "Sales_", "") & "/2023", [Format="dd/MMM/yyyy"])
Name this query dim_Products_Sales.
Step 5: Merge stores into transactions
In Transactions_Clean, do a Left Outer merge with dim_Stores on the store ID column. Expand only the StoreName, Region, and StoreSize columns. Remove the original raw store ID column.
At the end of this exercise, you have a clean transaction table that's enriched with store attributes, built on a staged base query, with all data type handling and text cleaning done correctly. This is the foundation of a trustworthy model.
The number one support issue with Power Query is "it worked yesterday, and now it's failing with a type error." This almost always means the source data changed — a column that was always numeric now has a text value in one row, or a date column has a blank.
Fix this by being defensive about type conversion. Instead of Table.TransformColumnTypes (which throws on any non-conforming value), use custom columns with try ... otherwise syntax:
try Number.FromText([Revenue]) otherwise null
This converts what it can and returns null for anything it can't parse, rather than failing the entire refresh.
You added a step and suddenly your refresh takes three times as long. The likely culprit is a query folding break. To check whether a step is folding, right-click the step in Applied Steps. If "View Native Query" is available and clickable, the step is folding. If it's grayed out, folding has broken at or before this step.
Common folding breakers:
The fix is usually to push your non-foldable steps to the end of the pipeline — do as much filtering and column removal as possible (these typically fold) before you apply the transformation that breaks folding. The goal is to minimize the rows and columns that the local engine has to process.
If you reference Query A from Query B and then try to reference Query B from Query A, Power Query will give you a circular reference error. This usually happens when people try to build lookup tables that reference the same source as the main query without proper staging. The fix is always to create a proper staging query and have both downstream queries reference that, rather than each other.
If a column name changes in the source, every downstream step that references that column will fail with a "Column not found" error. The fix is to update the step where the column name is first used. A preventative measure is to rename columns at the very first opportunity — typically immediately after Promoted Headers — so all downstream steps use your standardized names, not the source's names. When the source changes a column name, you only need to fix it in one place.
Some practitioners try to reduce the number of Applied Steps for performance reasons. This is usually misguided — Power Query compiles the step sequence into a single evaluation plan, and having twenty steps instead of five has negligible performance impact compared to folding behavior, row counts, and data types.
Where step count does matter is maintainability. Don't combine multiple unrelated transformations into a single step just to keep the list short. Steps are your audit trail; readable steps are worth having.
You now have a practitioner-level grasp of Power Query as a transformation layer. The key mental model: Power Query is where data becomes trustworthy, and everything in it runs once at refresh time. Do the row-level work here. Let DAX handle analytical calculations.
The patterns that will serve you most in production:
Where to go next:
List.Generate, Table.Group, and recursive functions for scenarios the UI simply cannot handle.The transformation layer is where data quality is either built or neglected. Build it carefully, document your steps, and your downstream model and reports will be proportionally more reliable.
Learning Path: Getting Started with Power BI