
Your Sales fact table has 800 million rows. Every morning at 6 AM, your Power BI dataset refresh kicks off, and by the time executives arrive at 8 AM, it's still running — or worse, it's failed because it timed out trying to reload four years of transaction history just to pick up yesterday's 50,000 new records. You've increased the gateway VM size twice, your IT team is grumbling about compute costs, and your business users are losing faith in the reports they depend on to make decisions.
This is the incremental refresh problem, and it's one of the most common scaling challenges in enterprise Power BI deployments. The solution isn't more hardware — it's a smarter refresh strategy that only processes what's actually changed. Power BI's incremental refresh feature lets you define a rolling window of data that gets refreshed on every cycle while keeping historical partitions frozen in place, permanently loaded and untouched. The result: refresh times that drop from hours to minutes, gateway load that plummets, and datasets that can realistically scale to billions of rows.
By the end of this lesson, you'll be able to design and implement a production-grade incremental refresh strategy from scratch. We'll cover the full lifecycle: configuring the RangeStart/RangeEnd parameters, writing queries that fold correctly, defining the refresh policy, verifying partitions in SSMS or Tabular Editor, handling the edge cases that catch most practitioners off guard, and monitoring the whole thing in production.
What you'll learn:
You should already be comfortable with:
You'll need a Power BI Premium or Premium Per User (PPU) license — incremental refresh configuration is available to everyone in Desktop, but the partition-based execution only activates on Premium capacity or PPU. You'll also need an XMLA endpoint-enabled workspace if you want to inspect or manually manage partitions, which we'll do later in this lesson.
Before touching a single parameter, you need to understand the mechanics. Most practitioners treat incremental refresh as a black box and then wonder why it isn't working. Let's open the box.
When you define an incremental refresh policy and publish your dataset, Power BI Service does something you won't see in the Desktop file: it splits your single query into multiple time-partitioned partitions automatically. Each partition covers a specific time range — say, one month of data for historical periods, or one day for recent data. The partition boundaries are defined by the policy you configure.
On each scheduled refresh cycle, the service evaluates which partitions fall within the "incremental window" (the range you want to keep fresh) and refreshes only those. Partitions outside that window — your historical data — are left completely untouched. They're already loaded into the in-memory columnar store and stay there until you explicitly process them or the data rolls out of your defined total range.
The critical enabler for this to work is query folding. When Power BI applies the RangeStart and RangeEnd parameter filters to your query, those filters must be pushed down to the data source as predicates — not evaluated in the Power Query engine after fetching the data. If folding breaks, Power BI has to pull the entire source table to filter it locally, which defeats the entire purpose and will likely cause your refresh to time out or run out of memory.
This is why the data source and connector matter enormously. SQL Server, Azure SQL, Synapse Analytics, Snowflake, BigQuery, and most enterprise-grade connectors support folding. SharePoint lists, Excel files, and many REST API connectors do not. If your source doesn't support folding, incremental refresh won't save you — you'll need a different strategy.
Key insight: Incremental refresh is fundamentally a partitioning strategy managed by Power BI Service. Desktop is only where you define the policy. The actual partition management happens in the cloud after publish.
Power BI's incremental refresh engine uses two reserved parameter names to communicate partition boundaries to your query. These aren't just any parameters — the names are case-sensitive and must be exact.
In Power BI Desktop, open Power Query Editor and create two new parameters:
Parameter 1:
RangeStartDate/Time1/1/2022 12:00:00 AM (use a date that's within your historical data range)Parameter 2:
RangeEndDate/Time1/1/2023 12:00:00 AMThe data type must be Date/Time — not Date, not Text, not Any. This is one of the most common mistakes. If you set it to Date, the comparison logic in your M query will type-mismatch against datetime columns and either error out or fail to fold properly.
The current values you assign in Desktop are only used when you load/refresh data locally in Desktop — they're your "development window." In production, Power BI Service overwrites these values with the actual partition boundaries.
Warning: Never set RangeStart and RangeEnd to span your full dataset during Desktop development. If you do, Desktop will try to load everything whenever you hit refresh. Set them to a small range — one or two months — to keep local development fast.
With your parameters defined, now you need to wire them into your data query. Open your fact table query in Power Query Editor. Here's what a properly configured query looks like for a SQL Server source:
let
Source = Sql.Database("prod-sql-server.database.windows.net", "SalesWarehouse"),
dbo_FactSales = Source{[Schema="dbo",Item="FactSales"]}[Data],
#"Filtered Rows" = Table.SelectRows(
dbo_FactSales,
each [TransactionDateTime] >= RangeStart
and [TransactionDateTime] < RangeEnd
)
in
#"Filtered Rows"
The filter uses >= for RangeStart and < for RangeEnd. This is intentional — the half-open interval [RangeStart, RangeEnd) ensures that partition boundaries don't overlap and you don't double-count records that land exactly on a boundary.
Your date column must be of datetime type in the source. If it's a plain date column, you'll need to cast or compare carefully — most SQL connectors handle implicit type coercion, but it's worth verifying.
This is not optional. You must verify that your filter is folding. In Power Query Editor, right-click the last applied step ("Filtered Rows") and check whether "View Native Query" is available and not grayed out. If it's available, click it — you should see SQL like this:
SELECT
[TransactionDateTime],
[SalesOrderID],
[CustomerID],
[ProductID],
[Quantity],
[UnitPrice],
[TotalAmount],
[StoreID]
FROM [dbo].[FactSales]
WHERE [TransactionDateTime] >= @RangeStart
AND [TransactionDateTime] < @RangeEnd
If "View Native Query" is grayed out, folding has broken at some step in your query. You need to identify where it broke and fix it before proceeding. Common folding breakers include:
Table.Buffer() anywhere in the chainTable.TransformColumnTypes() calls with locale argumentsPro tip: Apply your RangeStart/RangeEnd filter as early in the query steps as possible — ideally immediately after the source step. Every step before the filter must fold; every step after can potentially break folding without hurting the partition push-down.
Sometimes your date column isn't stored as a datetime — it's stored as an integer in the format YYYYMMDD, which is extremely common in data warehouses. You have two options:
Option 1: Filter on the integer using a calculated comparison (requires a foldable expression)
let
Source = Sql.Database("prod-sql-server.database.windows.net", "SalesWarehouse"),
dbo_FactSales = Source{[Schema="dbo",Item="FactSales"]}[Data],
RangeStartInt = Int32.From(
Date.Year(RangeStart) * 10000
+ Date.Month(RangeStart) * 100
+ Date.Day(RangeStart)
),
RangeEndInt = Int32.From(
Date.Year(RangeEnd) * 10000
+ Date.Month(RangeEnd) * 100
+ Date.Day(RangeEnd)
),
#"Filtered Rows" = Table.SelectRows(
dbo_FactSales,
each [DateKey] >= RangeStartInt
and [DateKey] < RangeEndInt
)
in
#"Filtered Rows"
Test this carefully with "View Native Query" — this arithmetic sometimes folds and sometimes doesn't depending on the connector version. If it doesn't fold, use Option 2.
Option 2: Push a view or stored procedure to the database that exposes a proper datetime column, then query that instead. This is the more reliable enterprise approach and separates your data access layer from your Power BI transformation logic.
With your query verified, right-click your fact table in the Fields pane in Power BI Desktop and select "Incremental refresh and real-time data." This opens the policy configuration dialog.
You'll configure two primary settings:
Archive data starting: How far back in total you want to keep data. For a sales fact table, this might be 5 years. Everything older than this will be removed from the dataset on the next refresh.
Incrementally refresh data starting: The rolling window that gets refreshed on every cycle. This is typically 1 day, 3 days, or 1 week depending on your data latency requirements and whether late-arriving data is a concern.
The right sizes depend on your business requirements and data characteristics. Here's a realistic framework:
Total history (archive window): Match your reporting requirements. If your executives need 3-year trend comparisons, you need at least 3 years. Add a buffer — 4 years if they need 3.
Incremental refresh window: This is about late-arriving data tolerance. If your source systems occasionally post-date transactions (a common issue in retail, logistics, and finance), you need a wider refresh window to catch corrections.
A common enterprise pattern is:
This means the last 7 days get refreshed every cycle, while the prior 5 years minus 7 days sit in frozen historical partitions. For a dataset with daily new records, this is usually more than enough to catch late arrivals.
If your data has a 30-day correction window (common in financial systems where month-end adjustments flow in throughout the following month), you'd set the incremental window to 35 days or 2 months to be safe.
Warning: A wider incremental refresh window means more data is processed on each refresh cycle. There's a real cost tradeoff. If you set your incremental window to 3 months and you have 100 million rows per month, you're processing 300 million rows every refresh. Measure your source query performance before committing to a wide window.
If your source table has a LastModifiedDateTime or UpdatedAt column, you can enable the "Only refresh complete periods" and "Detect data changes" options. This tells Power BI to query the max value of that column for each partition and only refresh a partition if its maximum update timestamp has changed since the last refresh.
In the "Detect data changes" field, you'd enter the column name: LastModifiedDateTime.
This is powerful for scenarios where only a small fraction of recent records are actually updated on any given day. Instead of refreshing all 7 days of data, Power BI might only refresh 2 days because those are the only partitions that actually have new or modified records.
Note: "Detect data changes" requires the column to be accessible and foldable. It also requires your source to correctly maintain the update timestamp — if records get modified without updating this column, you'll have stale data in your dataset without knowing it.
After configuring the policy, close the dialog, save your file, and publish to a Premium or PPU workspace. This is where incremental refresh comes to life.
When you publish for the first time, you still need to trigger a full refresh to load your historical data. Go to the dataset in Power BI Service, open Settings, and confirm the data source credentials are configured for your gateway (if on-premises) or direct cloud connection.
Trigger the first manual refresh. This initial refresh will take a while — it's loading your full archive window of data and creating all the partitions. For a 5-year dataset with monthly partitioning, you'll see roughly 60 historical partitions get created plus some more granular recent partitions.
After this first full load, subsequent scheduled refreshes will only touch the incremental window. That's when you see the dramatic time savings.
Power BI automatically determines partition granularity based on your policy:
| Incremental Window | Historical Partition Size |
|---|---|
| Less than 1 day | Hour |
| 1 day to 1 month | Day |
| 1 month to 1 year | Month |
| More than 1 year | Year |
The more recent partitions (within the incremental window) get finer granularity; older historical partitions get coarser granularity. This is automatic and generally sensible, but it matters for understanding your partition count when using Tabular Editor.
This is where enterprise practitioners separate themselves from casual users. Power BI's incremental refresh is powerful, but it's partially opaque through the standard UI. To really understand and control what's happening, you need to connect to the XMLA endpoint.
In your workspace settings (Premium workspaces), find the XMLA endpoint URL — it looks like powerbi://api.powerbi.com/v1.0/myorg/YourWorkspaceName.
Open SSMS and connect with "Analysis Services" as the server type, pasting the XMLA URL as the server name. Use your Power BI credentials. Once connected, expand the database (your dataset) and navigate to Tables to see all your objects, then expand a table to see Partitions.
For a properly configured incremental refresh dataset, you'll see something like:
FactSales
└── Partitions
├── FactSales-2019-Q1 (historical, year granularity)
├── FactSales-2019-Q2
├── FactSales-2020-M01 (month granularity)
├── FactSales-2020-M02
...
├── FactSales-2024-M11-D01 (day granularity, recent)
├── FactSales-2024-M11-D02
...
└── FactSales_IncrementalRefresh (the active refresh partition)
You can right-click any partition and choose "Process Partition" to manually force a refresh of just that slice. This is invaluable when:
Tabular Editor (free version works for this) connects to the XMLA endpoint and gives you a more developer-friendly view. Connect via "File > Open > Model from DB (PowerBI / SSAS)" and paste the XMLA endpoint.
In the partition editor, you can see and modify the M query for each partition, which reveals exactly how Power BI has parameterized your base query:
let
Source = Sql.Database("prod-sql-server.database.windows.net", "SalesWarehouse"),
dbo_FactSales = Source{[Schema="dbo",Item="FactSales"]}[Data],
#"Filtered Rows" = Table.SelectRows(
dbo_FactSales,
each [TransactionDateTime] >= #datetime(2024, 11, 1, 0, 0, 0)
and [TransactionDateTime] < #datetime(2024, 12, 1, 0, 0, 0)
)
in
#"Filtered Rows"
Notice how Power BI has replaced RangeStart and RangeEnd with hardcoded datetime literals for each partition. That's the partition isolation working correctly.
Warning: Do not manually edit partition queries in Tabular Editor unless you deeply understand what you're doing. Power BI Service owns the partition management lifecycle. If you manually alter partitions and then a scheduled refresh runs, the service may overwrite or conflict with your changes. Use manual partition processing only for emergency backfills, not for routine changes.
Theory is clean; production is messy. Here are the scenarios that actually come up in enterprise deployments.
Suppose your incremental refresh window is 7 days, but your finance team informs you that a posting error from 45 days ago was corrected at the source. That corrected data won't be picked up by your next refresh because it's outside the window.
Your options:
For ad-hoc corrections, Option 1 is usually best. For systemic late-arrival patterns, Option 3 is the sustainable enterprise solution.
If someone adds or removes a column from your source table, it will break your dataset. Schema changes require a republish of the Desktop file with the updated query. When you republish, Power BI will regenerate the partition structure — but it won't automatically reload historical partitions with the new schema.
After a schema change republish, you'll likely need to do a full refresh of the dataset to rebuild all partitions with the updated column structure. Plan for this in your change management process.
If you trigger "Refresh Now" from the Power BI Service dataset page, it refreshes all partitions within the incremental window — not a full historical reload. If you need to do a complete historical reload (after a schema change or data quality issue), you must do one of the following:
POST /datasets/{datasetId}/refreshes with {"type": "full"}The REST API approach is cleanest for automation:
curl -X POST \
'https://api.powerbi.com/v1.0/myorg/groups/{workspaceId}/datasets/{datasetId}/refreshes' \
-H 'Authorization: Bearer {token}' \
-H 'Content-Type: application/json' \
-d '{"type": "full"}'
If your Premium capacity supports it, you can extend incremental refresh with hybrid tables — where the most recent partition is served via DirectQuery rather than Import. This means your "today's data" is always live from the source, without waiting for the next scheduled refresh.
To enable this, check "Get the latest data in real time with DirectQuery" in the incremental refresh policy dialog. Power BI will configure the most recent partition as DirectQuery and all historical partitions as Import. You get import-speed performance for historical analysis and real-time freshness for current-day data.
The tradeoff: DirectQuery has performance implications for complex DAX measures. Test your most expensive measures against the live partition before committing to this approach.
Let's build this end-to-end. You'll implement a complete incremental refresh solution for a sales analytics dataset against a SQL Server source.
Scenario: You're a Power BI developer at a retail company. Your FactSales table in SQL Server has 5 years of daily transaction data — approximately 400 million rows. Refreshing the full table takes 3.5 hours. Your SLA is for data to be no more than 4 hours old by 8 AM. You need to get refresh time under 30 minutes.
Step 1: Assess your source table
First, query your source to understand data volume and distribution:
SELECT
CAST(TransactionDateTime AS DATE) AS TransactionDate,
COUNT(*) AS RowCount,
MAX(LastModifiedDateTime) AS MaxModifiedDate
FROM dbo.FactSales
WHERE TransactionDateTime >= DATEADD(MONTH, -3, GETDATE())
GROUP BY CAST(TransactionDateTime AS DATE)
ORDER BY TransactionDate DESC;
This tells you how many rows arrive per day (helps size your incremental window) and whether LastModifiedDateTime is being maintained (enables "Detect data changes").
Step 2: Create parameters in Power Query
Create RangeStart as Date/Time with value 10/1/2024 12:00:00 AM and RangeEnd as Date/Time with value 11/1/2024 12:00:00 AM. These give you a one-month development window.
Step 3: Build the query with folding
let
Source = Sql.Database(
"prod-sql-server.database.windows.net",
"RetailWarehouse",
[Query=null, CommandTimeout=null]
),
FactSales = Source{[Schema="dbo", Item="FactSales"]}[Data],
FilteredByDate = Table.SelectRows(
FactSales,
each [TransactionDateTime] >= RangeStart
and [TransactionDateTime] < RangeEnd
),
RemovedUnneededCols = Table.SelectColumns(
FilteredByDate,
{
"SalesOrderID", "TransactionDateTime", "CustomerID",
"ProductID", "StoreID", "Quantity", "UnitPrice",
"TotalAmount", "ChannelCode", "LastModifiedDateTime"
}
),
RenamedCols = Table.RenameColumns(
RemovedUnneededCols,
{{"ChannelCode", "SalesChannel"}}
)
in
RenamedCols
Verify "View Native Query" shows your date filter predicates in the generated SQL.
Step 4: Configure the policy
In the incremental refresh dialog:
Step 5: Publish and run initial refresh
Publish to your Premium workspace. Trigger an initial full refresh from the Service. Monitor it in the refresh history — expect this to take 45-90 minutes for the initial load (still better than 3.5 hours for subsequent runs).
Step 6: Verify partition structure
Connect to the XMLA endpoint in SSMS. Navigate to the FactSales table partitions. You should see approximately 5 year-granularity partitions for older data, monthly partitions for mid-range data, and day-granularity partitions for the last 10 days.
Step 7: Schedule and measure
Set a daily scheduled refresh at 5 AM. After the first automatic refresh completes, check refresh history duration. Target: under 30 minutes.
Mistake 1: Parameters are the wrong data type
Symptom: Refresh errors mentioning type mismatch, or "View Native Query" shows no date filter.
Fix: Delete and recreate both parameters with explicit Date/Time type. Even Date is wrong.
Mistake 2: Query folding breaks due to a step before the date filter
Symptom: "View Native Query" is grayed out on the filter step. Refresh processes the full table. Fix: Move the date filter step immediately after the source connection. Check each intermediate step using "View Native Query" to find where folding breaks. Restructure or move problematic transformations to after the filter.
Mistake 3: The incremental window is too narrow for late-arriving data
Symptom: Historical reports show sudden drops or inconsistencies for periods just outside the refresh window, especially after month-end.
Fix: Increase the incremental window to match your source system's correction window. Query your source for the distribution of LastModifiedDateTime - TransactionDateTime to understand the actual lag.
Mistake 4: Publishing overwrites manual partition changes
Symptom: Manually processed historical partitions get reverted after the next Desktop publish. Fix: Avoid republishing the Desktop file unless the query or policy has genuinely changed. Use the XMLA endpoint for operational partition management; Desktop publish is for structural changes only.
Mistake 5: Gateway timeout during initial full load
Symptom: The first refresh after publishing fails with gateway timeout errors.
Fix: For very large datasets, the initial full load can exceed gateway timeout limits. Use the Power BI REST API with {"type": "full", "commitMode": "partialBatch"} to allow partial commits, or consider loading the dataset in segments by temporarily setting a narrow initial range and gradually widening it.
Mistake 6: Historical data drift after a source truncate-and-reload
Symptom: Someone reloaded the source warehouse, but your Power BI dataset still shows the old data in historical partitions that weren't refreshed. Fix: When source data is rebuilt from scratch, you need a full dataset refresh. Frozen historical partitions are frozen — they don't self-heal. Build a monitoring query that periodically compares row counts between your source and each Power BI partition to detect drift early.
Debugging with Refresh History
Power BI Service refresh history shows duration and status, but for detailed diagnostics, use the Activity Log via REST API or Azure Log Analytics if your capacity is connected. The logs show individual partition refresh times, which helps you identify which specific partition is causing slowness.
GET https://api.powerbi.com/v1.0/myorg/groups/{workspaceId}/datasets/{datasetId}/refreshes
The response includes per-refresh metadata. For partition-level detail, you'll need SSMS connected to XMLA — right-click the table and look at "Partition Properties" after a refresh to see last processed timestamps per partition.
Incremental refresh transforms Power BI from a dataset refresh bottleneck into a sustainable enterprise-scale analytics platform. The core principles you've implemented here:
Where to go from here:
The investment you've made in understanding incremental refresh properly — not just clicking through a wizard but understanding the partitioning engine, the folding requirements, and the operational management — is what makes the difference between a fragile refresh setup and a production-grade enterprise analytics system.
Learning Path: Enterprise Power BI