
Picture this: You inherit a Power Query solution that processes monthly sales data from five different systems. The original creator left detailed documentation... just kidding. You're staring at a single massive query with cryptic column names, hardcoded values scattered throughout, and transformations that seem to work by magic. When you try to add a new data source, everything breaks in spectacular fashion.
Sound familiar? This scenario plays out in organizations worldwide where Power Query solutions start simple but evolve into unmaintainable monsters. The difference between a query that works today and one that works reliably for months or years lies in following disciplined practices from the start.
By the end of this lesson, you'll know how to architect Power Query solutions that your future self (and your colleagues) will thank you for. You'll build ETL processes that are transparent, debuggable, and adaptable to changing business requirements.
What you'll learn:
- Structuring queries into modular, layered components with reusable functions
- Managing parameters and configuration systematically across environments
- Building error handling and data quality checks into your pipelines
- Naming and documenting queries so teammates can maintain them
- Keeping performance healthy as data grows, and testing your ETL systematically
Prerequisites: You should be comfortable with basic Power Query operations, including connecting to data sources, applying transformations, and working with the Advanced Editor. We'll assume you understand M language fundamentals and can read existing Power Query code.
Maintainable ETL isn't just about clean code—it's about building systems that survive contact with the real world. Business requirements change, data sources evolve, team members come and go. Your Power Query solutions need to handle these realities gracefully.
The core principle is predictable behavior under change. When something needs to be modified, you should be able to:
- Locate the relevant logic quickly
- Change it in exactly one place
- Predict which downstream queries are affected
This mindset shapes every decision from naming conventions to error handling strategies.
The biggest mistake in Power Query development is creating monolithic queries that try to do everything at once. Instead, break your ETL process into focused, reusable components that can be tested and maintained independently.
Create custom functions for any logic you'll use more than once. Here's a practical example from a financial reporting system:
// Function: CleanCurrencyText
// Returns nullable number: unparseable input yields null rather than an error
(inputText as text) as nullable number =>
let
// Remove common currency symbols and formatting
CleanedText = Text.Replace(
Text.Replace(
Text.Replace(inputText, "$", ""),
",", ""
),
"(", "-"
),
// Handle parentheses notation for negative numbers
FinalText = if Text.EndsWith(CleanedText, ")")
then Text.Replace(CleanedText, ")", "")
else CleanedText,
// Convert to number with error handling
Result = try Number.FromText(FinalText) otherwise null
in
Result
This function handles common currency formatting issues in one place. When you discover a new edge case (maybe some reports use € symbols), you fix it once rather than hunting through multiple queries.
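When that day comes, one option is to drive the replacements from a list, so a new symbol is a one-line change. The sketch below is illustrative, not part of the lesson's code: the function name, the default symbol list, and the optional parameter are assumptions layered on top of CleanCurrencyText.

```m
// Hypothetical: CleanCurrencyTextEx — the symbol list is configurable
(inputText as text, optional symbols as list) as nullable number =>
let
    // Default symbols to strip; extend this list when a new format appears
    SymbolList = if symbols = null then {"$", "€", "£", ","} else symbols,
    // Remove each symbol in turn
    Stripped = List.Accumulate(SymbolList, inputText,
        (state, symbol) => Text.Replace(state, symbol, "")),
    // Same parentheses-as-negative convention as CleanCurrencyText
    Negated = if Text.StartsWith(Stripped, "(") and Text.EndsWith(Stripped, ")")
        then "-" & Text.Range(Stripped, 1, Text.Length(Stripped) - 2)
        else Stripped,
    Result = try Number.FromText(Negated) otherwise null
in
    Result
```

Because the symbol list is a parameter, edge-case fixes become data changes rather than logic changes.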
Organize your queries into logical layers that build upon each other:
Raw Data Layer: Direct connections to source systems with minimal transformation
// Query: Raw_SalesData
let
Source = Sql.Database("prod-server", "SalesDB"),
SalesTable = Source{[Schema="dbo",Item="Sales"]}[Data]
in
SalesTable
Standardization Layer: Apply consistent formatting and business rules
// Query: Standardized_Sales
let
Source = Raw_SalesData,
// Apply consistent data types
TypedColumns = Table.TransformColumnTypes(Source, {
{"OrderDate", type date},
{"Amount", Currency.Type},
{"CustomerID", type text}
}),
// Standardize customer ID format (pad to 8 digits with leading zeros)
CleanCustomerID = Table.TransformColumns(TypedColumns, {
{"CustomerID", each Text.PadStart(Text.From(_), 8, "0"), type text}
})
in
CleanCustomerID
Business Logic Layer: Complex calculations and derived fields
// Query: Enriched_Sales
let
Source = Standardized_Sales,
// Add fiscal year calculation
AddFiscalYear = Table.AddColumn(Source, "FiscalYear",
each if Date.Month([OrderDate]) >= 7
then Date.Year([OrderDate]) + 1
else Date.Year([OrderDate])
),
// Merge with customer segment data
MergeSegments = Table.NestedJoin(AddFiscalYear, {"CustomerID"},
CustomerSegments, {"CustomerID"}, "SegmentData", JoinKind.LeftOuter),
// Expand segment information
ExpandSegments = Table.ExpandTableColumn(MergeSegments, "SegmentData",
{"Segment", "Region"}, {"CustomerSegment", "CustomerRegion"})
in
ExpandSegments
This layered approach provides several benefits:
- Each layer can be previewed and debugged independently
- Standardized data is cleaned once and reused by every downstream query
- A change in a source system touches only the raw layer
- Business logic stays readable because it isn't tangled with cleanup steps
Create a library of utility functions that standardize common operations across your organization:
// Query: DateUtils
[
// Get the last day of month for any date
EndOfMonth = (inputDate as date) as date =>
Date.EndOfMonth(inputDate),
// Convert various date formats to standard date
StandardizeDate = (inputText as text) as nullable date =>
let
// Normalize separators to "-"; ambiguous forms (MM/DD vs DD/MM) are resolved
// by the current culture — pass an explicit culture to Date.FromText if sources disagree
CleanText = Text.Replace(inputText, "/", "-"),
ParsedDate = try Date.FromText(CleanText) otherwise null
in
ParsedDate,
// Calculate business days between two dates
BusinessDaysBetween = (startDate as date, endDate as date) as number =>
let
DaysList = List.Dates(startDate, Duration.Days(endDate - startDate) + 1, #duration(1,0,0,0)),
BusinessDays = List.Select(DaysList, each Date.DayOfWeek(_) <> 0 and Date.DayOfWeek(_) <> 6),
Count = List.Count(BusinessDays)
in
Count
]
Reference these utilities in other queries:
let
Source = SomeDataSource,
AddEndOfMonth = Table.AddColumn(Source, "MonthEnd",
each DateUtils[EndOfMonth]([TransactionDate]))
in
AddEndOfMonth
Hardcoded values are the enemy of maintainable ETL. They make queries brittle and difficult to deploy across environments. Power Query parameters provide a clean solution, but they need to be managed systematically.
Create a centralized configuration query that handles environment differences:
// Query: Config
let
// Determine environment from a deployment setting. (M has no built-in
// function to read the current user or machine name, so drive this from
// a text parameter — here a hypothetical EnvironmentName parameter set
// per deployment.)
IsDevelopment = EnvironmentName = "Development",
// Environment-specific settings
Settings = if IsDevelopment then [
DatabaseServer = "dev-sql-server",
DatabaseName = "SalesDB_Dev",
FilePath = "C:\DevData\",
EmailNotifications = false
] else [
DatabaseServer = "prod-sql-server",
DatabaseName = "SalesDB",
FilePath = "\\prod-share\data\",
EmailNotifications = true
],
// Add common settings that don't vary by environment
AllSettings = Settings & [
MaxRetries = 3,
TimeoutMinutes = 30,
DefaultDateFormat = "yyyy-MM-dd"
]
in
AllSettings
Use configuration values throughout your queries:
// Query: ProductData
let
ServerName = Config[DatabaseServer],
DatabaseName = Config[DatabaseName],
Source = Sql.Database(ServerName, DatabaseName),
ProductTable = Source{[Schema="dbo",Item="Products"]}[Data]
in
ProductTable
Many ETL processes need flexible date ranges. Create parameters that support both absolute and relative dates:
// Parameter: ProcessingMode (Text)
// Values: "Incremental", "Full", "DateRange"
// Parameter: StartDate (Date)
// Only used when ProcessingMode = "DateRange"
// Parameter: EndDate (Date)
// Only used when ProcessingMode = "DateRange"
// Query: DateRange_Calculator
let
Mode = ProcessingMode,
Today = DateTime.Date(DateTime.LocalNow()),
// Bind the parameters to local names first: inside a record,
// StartDate = StartDate would refer to the record's own field
// and fail with a cyclic reference
ParamStart = StartDate,
ParamEnd = EndDate,
DateRange = if Mode = "Incremental" then [
StartDate = Date.AddDays(Today, -7), // Last 7 days
EndDate = Today
] else if Mode = "Full" then [
StartDate = #date(2020, 1, 1), // Historical start
EndDate = Today
] else [ // DateRange mode
StartDate = ParamStart,
EndDate = ParamEnd
]
in
DateRange
This approach allows the same query to handle daily incremental loads, historical refreshes, and ad-hoc date range requests without modification.
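A downstream query might consume the calculator like this (a sketch — the Raw_SalesData and OrderDate names are carried over from earlier examples):

```m
// Query: Filtered_Sales (illustrative consumer of DateRange_Calculator)
let
    Range = DateRange_Calculator,
    Source = Raw_SalesData,
    // Keep rows inside the computed window; against a SQL source this
    // filter can fold down to a WHERE clause
    InRange = Table.SelectRows(Source,
        each [OrderDate] >= Range[StartDate] and [OrderDate] <= Range[EndDate])
in
    InRange
```

Switching from incremental to full-history processing then requires changing only the ProcessingMode parameter, not this query.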
Database connections often need different authentication methods across environments:
// Query: DatabaseConnection
let
ServerName = Config[DatabaseServer],
DatabaseName = Config[DatabaseName],
// Choose connection options based on environment. Note: Sql.Database does
// not accept a username or password in its options record — credentials are
// managed through Power Query's data source settings, so only non-credential
// options can vary here
Connection = if Config[UseWindowsAuth] then
Sql.Database(ServerName, DatabaseName)
else
Sql.Database(ServerName, DatabaseName, [
CreateNavigationProperties = false
])
in
Connection
Production ETL systems must handle errors gracefully and provide clear diagnostic information. Power Query's error handling capabilities, combined with good practices, create robust solutions.
Don't just catch errors—provide meaningful alternatives and logging:
// Query: Customer_Data_With_ErrorHandling
let
// Attempt to load customer data with multiple fallback strategies
PrimaryAttempt = try Sql.Database(Config[PrimaryServer], Config[DatabaseName]){[Schema="dbo",Item="Customers"]}[Data] otherwise null,
SecondaryAttempt = if PrimaryAttempt = null then
try Sql.Database(Config[BackupServer], Config[DatabaseName]){[Schema="dbo",Item="Customers"]}[Data] otherwise null
else PrimaryAttempt,
FileBackupAttempt = if SecondaryAttempt = null then
try Excel.Workbook(File.Contents(Config[BackupFilePath] & "CustomerBackup.xlsx")) otherwise null
else SecondaryAttempt,
// Create an error report if all attempts failed
ErrorReport = if FileBackupAttempt = null then
Table.FromRows({
{"Error", "Failed to load customer data from all sources", DateTime.LocalNow()}
}, {"Type", "Message", "Timestamp"})
else FileBackupAttempt,
// Add metadata about data source used
FinalResult = if FileBackupAttempt <> null then
Table.AddColumn(FileBackupAttempt, "DataSource",
each if PrimaryAttempt <> null then "Primary"
else if SecondaryAttempt <> null then "Secondary"
else "File Backup")
else ErrorReport
in
FinalResult
Some data quality issues affect individual fields rather than entire records. Handle these at the column level:
let
Source = RawCustomerData,
// Clean phone numbers with error tracking
CleanPhoneNumbers = Table.AddColumn(Source, "CleanPhone",
each let
OriginalPhone = [Phone],
CleanAttempt = try Text.Select(OriginalPhone, {"0".."9"}) otherwise null,
ValidLength = if CleanAttempt <> null and Text.Length(CleanAttempt) = 10
then CleanAttempt
else null
in ValidLength
),
// Track data quality issues
AddQualityFlags = Table.AddColumn(CleanPhoneNumbers, "PhoneQualityIssue",
each if [CleanPhone] = null and [Phone] <> null then "Invalid Format"
else if [CleanPhone] = null and [Phone] = null then "Missing"
else "Valid"
)
in
AddQualityFlags
Create a standardized error logging mechanism:
// Query: ErrorLogger
(ErrorSource as text, ErrorDetails as text, optional AdditionalData as record) as table =>
let
BaseError = [
Timestamp = DateTime.LocalNow(),
Source = ErrorSource,
Details = ErrorDetails,
Environment = Config[Environment],
User = Environment.UserName()
],
CompleteError = if AdditionalData = null
then BaseError
else BaseError & AdditionalData,
ErrorTable = Table.FromRecords({CompleteError})
in
ErrorTable
Use the error logger throughout your queries:
let
DataLoad = try SomeComplexOperation otherwise ErrorLogger("CustomerLoad", "Failed to connect to customer database", [
Server = Config[DatabaseServer],
AttemptTime = DateTime.LocalNow()
])
in
DataLoad
Code that explains itself reduces maintenance burden and knowledge transfer time. Good naming conventions and strategic comments make queries readable by team members who didn't write them.
Establish and follow consistent naming patterns across your organization:
Queries: Use PascalCase with prefixes that indicate purpose
- Raw_CustomerData (direct source connection)
- Clean_CustomerData (standardized and validated)
- Dim_Customer (dimension table for reporting)
- Fact_Sales (fact table for reporting)
- Util_DateFunctions (reusable utilities)
Columns: Use descriptive names that include units and context
// Instead of: "Amount"
// Use: "SalesAmountUSD" or "OrderTotalExcludingTax"
// Instead of: "Date"
// Use: "OrderDate" or "ShipmentDate"
// Instead of: "Flag"
// Use: "IsActiveCustomer" or "HasDiscount"
Steps: Name each transformation step clearly
let
Source = Raw_SalesData,
FilterCurrentYear = Table.SelectRows(Source, each Date.Year([OrderDate]) = Date.Year(DateTime.Date(DateTime.LocalNow()))),
AddCalculatedFields = Table.AddColumn(FilterCurrentYear, "ProfitMargin", each [Revenue] - [Cost]),
RemoveTestData = Table.SelectRows(AddCalculatedFields, each not Text.StartsWith([CustomerID], "TEST")),
SortByOrderDate = Table.Sort(RemoveTestData, {{"OrderDate", Order.Descending}})
in
SortByOrderDate
Don't comment everything—comment the non-obvious decisions and business rules:
let
Source = CustomerData,
// Business Rule: Customers are considered "Active" if they've placed an order
// in the last 18 months OR have a signed contract with future start date
AddActiveFlag = Table.AddColumn(Source, "IsActive",
each let
LastOrderDate = [MostRecentOrderDate],
ContractEndDate = [ContractEndDate],
EighteenMonthsAgo = Date.AddMonths(DateTime.Date(DateTime.LocalNow()), -18)
in
(LastOrderDate <> null and LastOrderDate >= EighteenMonthsAgo) or
(ContractEndDate <> null and ContractEndDate >= DateTime.Date(DateTime.LocalNow()))
),
// Performance Note: This lookup is expensive but necessary for compliance reporting
// Consider pre-calculating if this query times out
MergeComplianceData = Table.NestedJoin(AddActiveFlag, {"CustomerID"},
ComplianceRegistry, {"ID"}, "ComplianceInfo", JoinKind.LeftOuter)
in
MergeComplianceData
Add metadata to complex queries that explains their purpose and assumptions:
// Query: Monthly_Revenue_Report
// Purpose: Generates monthly revenue summaries for executive dashboard
// Data Sources: Sales database, customer master data, exchange rates
// Update Frequency: Daily at 6 AM
// Business Owner: Finance Team (contact: finance@company.com)
// Last Modified: 2024-01-15
// Dependencies: Raw_Sales, Dim_Customer, ExchangeRates
//
// Assumptions:
// - Revenue recognition follows accrual accounting
// - Multi-currency transactions use month-end exchange rates
// - Cancelled orders are excluded from revenue calculations
//
let
// Implementation follows...
Performance problems often emerge gradually as data volumes grow. Building performance-conscious habits from the start prevents many issues.
Query folding allows Power Query to push operations back to the data source, dramatically improving performance. Structure your queries to maximize folding opportunities:
// Good: Operations that can fold back to SQL Server
let
Source = Sql.Database("server", "database"),
SalesTable = Source{[Schema="dbo",Item="Sales"]}[Data],
FilteredRows = Table.SelectRows(SalesTable, each [OrderDate] >= #date(2024, 1, 1)),
SelectedColumns = Table.SelectColumns(FilteredRows, {"OrderID", "CustomerID", "Amount"}),
SortedTable = Table.Sort(SelectedColumns, {{"Amount", Order.Descending}})
in
SortedTable
// This translates to efficient SQL:
// SELECT OrderID, CustomerID, Amount
// FROM Sales
// WHERE OrderDate >= '2024-01-01'
// ORDER BY Amount DESC
Avoid operations that break query folding when possible:
// This breaks folding - complex M functions can't translate to SQL
AddCustomColumn = Table.AddColumn(Source, "ComplexCalculation",
each if Text.Contains([ProductName], "Premium") and [Amount] > 1000
then [Amount] * 1.15
else [Amount])
Large table joins require careful consideration of join types and merge algorithms:
// Efficient approach: Use appropriate join types and pre-filter
let
// Filter large tables before joining
RecentOrders = Table.SelectRows(Orders, each [OrderDate] >= Date.AddMonths(DateTime.Date(DateTime.LocalNow()), -6)),
ActiveCustomers = Table.SelectRows(Customers, each [Status] = "Active"),
// Use inner join when you only need matching records
OrderCustomerJoin = Table.NestedJoin(RecentOrders, {"CustomerID"},
ActiveCustomers, {"CustomerID"}, "CustomerInfo", JoinKind.Inner),
// Expand only needed columns
ExpandedResult = Table.ExpandTableColumn(OrderCustomerJoin, "CustomerInfo",
{"CustomerName", "Segment"}, {"CustomerName", "CustomerSegment"})
in
ExpandedResult
For very large datasets, consider processing data in chunks:
// Function: ProcessLargeDataset
(tableName as text, batchSize as number) as table =>
let
Source = Sql.Database(Config[DatabaseServer], Config[DatabaseName]),
// Get total record count
RecordCount = Table.RowCount(Source{[Schema="dbo",Item=tableName]}[Data]),
// Calculate number of batches needed
BatchCount = Number.RoundUp(RecordCount / batchSize),
// Process in batches (assumes the table has a numeric ID column to page by)
ProcessBatch = (batchNumber as number) as table =>
let
Offset = batchNumber * batchSize,
BatchQuery = "SELECT * FROM " & tableName &
" ORDER BY ID OFFSET " & Text.From(Offset) &
" ROWS FETCH NEXT " & Text.From(batchSize) & " ROWS ONLY",
BatchData = Sql.Database(Config[DatabaseServer], Config[DatabaseName], [Query = BatchQuery])
in
BatchData,
// Combine all batches
AllBatches = List.Generate(
() => 0,
each _ < BatchCount,
each _ + 1,
each ProcessBatch(_)
),
CombinedResult = Table.Combine(AllBatches)
in
CombinedResult
Systematic testing prevents ETL failures that can cascade through downstream systems and reports.
Create test queries for your custom functions:
// Query: Test_CleanCurrencyText
let
// Test data with various currency formats
TestCases = Table.FromRows({
{"$1,234.56", 1234.56},
{"$(1,234.56)", -1234.56},
{"1234.56", 1234.56},
{"$0.00", 0},
{"invalid", null},
{"$", null}
}, {"Input", "Expected"}),
// Apply the function to test inputs
AddActualResults = Table.AddColumn(TestCases, "Actual",
each CleanCurrencyText([Input])),
// Check if results match expectations
AddTestResult = Table.AddColumn(AddActualResults, "TestResult",
each if [Expected] = [Actual] then "PASS"
else if [Expected] = null and [Actual] = null then "PASS"
else "FAIL")
in
AddTestResult
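You can roll the per-case results into a single verdict that's easy to scan after each refresh. This builds on the test query above; the Test_Summary name is an assumption:

```m
// Query: Test_Summary
let
    Results = Test_CleanCurrencyText,
    Failures = Table.SelectRows(Results, each [TestResult] = "FAIL"),
    // One row: total cases, failure count, and an overall verdict
    Summary = Table.FromRecords({[
        TotalCases = Table.RowCount(Results),
        Failed = Table.RowCount(Failures),
        Verdict = if Table.RowCount(Failures) = 0 then "ALL PASS" else "FAILING"
    ]})
in
    Summary
```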
Build validation checks into your ETL processes:
// Query: DataQuality_Sales
let
Source = ProcessedSalesData,
// Check for required fields
MissingOrderID = Table.SelectRows(Source, each [OrderID] = null or [OrderID] = ""),
MissingCustomerID = Table.SelectRows(Source, each [CustomerID] = null or [CustomerID] = ""),
MissingAmount = Table.SelectRows(Source, each [Amount] = null),
// Check for data anomalies
NegativeAmounts = Table.SelectRows(Source, each [Amount] < 0),
FutureDates = Table.SelectRows(Source, each [OrderDate] > DateTime.Date(DateTime.LocalNow())),
// Create quality report
QualityReport = Table.FromRows({
{"Missing Order ID", Table.RowCount(MissingOrderID)},
{"Missing Customer ID", Table.RowCount(MissingCustomerID)},
{"Missing Amount", Table.RowCount(MissingAmount)},
{"Negative Amounts", Table.RowCount(NegativeAmounts)},
{"Future Dates", Table.RowCount(FutureDates)}
}, {"Issue Type", "Count"}),
// Add timestamp and threshold flags
AddMetadata = Table.AddColumn(QualityReport, "ExceedsThreshold",
each [Count] > 100) // Adjust threshold as needed
in
AddMetadata
Test end-to-end data flow with sample datasets:
// Query: IntegrationTest_SalesFlow
let
// Create test dataset
TestData = Table.FromRows({
{"ORD001", "CUST001", #date(2024, 1, 15), 1500.00},
{"ORD002", "CUST002", #date(2024, 1, 16), 750.50},
{"ORD003", "CUST001", #date(2024, 1, 17), 2200.75}
}, {"OrderID", "CustomerID", "OrderDate", "Amount"}),
// Run through complete transformation pipeline
Step1_Standardized = fnStandardizeData(TestData),
Step2_Enriched = fnEnrichWithCustomerData(Step1_Standardized),
Step3_Calculated = fnAddCalculatedFields(Step2_Enriched),
// Validate final results
FinalValidation = Table.AddColumn(Step3_Calculated, "ValidationResult",
each if [OrderID] <> null and
[CustomerName] <> null and
[ProfitMargin] <> null
then "PASS"
else "FAIL")
in
FinalValidation
Let's put these practices together by building a complete customer analytics ETL pipeline. This exercise simulates a real-world scenario where you need to combine customer data from multiple sources, apply business rules, and prepare it for analysis.
You're building an ETL pipeline for a retail company that needs to analyze customer behavior across multiple channels. The requirements:
- Combine customer records from the CRM database and the e-commerce API
- Support separate development and production configurations
- Run in either incremental or full-history mode, controlled by parameters
- Score data quality and keep running (with diagnostics) when a source fails
First, set up your configuration structure:
// Query: Config
let
Environment = "Development", // Change to "Production" when deploying
Settings = if Environment = "Development" then [
CRMConnectionString = "Data Source=dev-crm-server;Initial Catalog=CRM_Dev;",
EcommerceApiUrl = "https://dev-api.ecommerce.com/customers",
OutputPath = "C:\DevOutput\",
EnableDetailedLogging = true
] else [
CRMConnectionString = "Data Source=prod-crm-server;Initial Catalog=CRM;",
EcommerceApiUrl = "https://api.ecommerce.com/customers",
OutputPath = "\\prod-share\analytics\",
EnableDetailedLogging = false
],
CommonSettings = [
MaxRetryAttempts = 3,
DefaultCustomerSegments = {"Bronze", "Silver", "Gold", "Platinum"},
LifetimeValueThreshold = 1000
]
in
Settings & CommonSettings
// Parameter: ProcessingMode (Text)
// Default: "Incremental"
// Parameter: LookbackDays (Whole Number)
// Default: 30
Create reusable functions for common operations:
// Query: CustomerUtils
[
// Function to standardize phone numbers
CleanPhoneNumber = (phoneInput as nullable text) as nullable text =>
let
CleanedPhone = if phoneInput = null then null
else Text.Select(phoneInput, {"0".."9"}),
FormattedPhone = if CleanedPhone <> null and Text.Length(CleanedPhone) = 10
then "(" & Text.Middle(CleanedPhone, 0, 3) & ") " &
Text.Middle(CleanedPhone, 3, 3) & "-" &
Text.Middle(CleanedPhone, 6, 4)
else null
in
FormattedPhone,
// Function to calculate customer segment based on lifetime value
AssignCustomerSegment = (lifetimeValue as nullable number) as text =>
if lifetimeValue = null then "Unknown"
else if lifetimeValue >= 5000 then "Platinum"
else if lifetimeValue >= 2000 then "Gold"
else if lifetimeValue >= 500 then "Silver"
else "Bronze",
// Function to handle data quality scoring
CalculateDataQualityScore = (inputRecord as record) as number =>
let
Checks = {
if Record.HasFields(inputRecord, "Email") and inputRecord[Email] <> null then 1 else 0,
if Record.HasFields(inputRecord, "Phone") and inputRecord[Phone] <> null then 1 else 0,
if Record.HasFields(inputRecord, "Address") and inputRecord[Address] <> null then 1 else 0,
if Record.HasFields(inputRecord, "FirstName") and inputRecord[FirstName] <> null then 1 else 0,
if Record.HasFields(inputRecord, "LastName") and inputRecord[LastName] <> null then 1 else 0
},
Score = List.Sum(Checks) / List.Count(Checks) * 100
in
Score
]
Build your raw data connections with error handling:
// Query: Raw_CRM_Customers
let
ConnectionAttempt = try Sql.Database("crm-server", "CRM_Database") otherwise null,
DataLoad = if ConnectionAttempt <> null then
try ConnectionAttempt{[Schema="dbo",Item="Customers"]}[Data]
otherwise Table.FromRows({}, {"CustomerID", "FirstName", "LastName", "Email", "Phone", "JoinDate"})
else
Table.FromRows({}, {"CustomerID", "FirstName", "LastName", "Email", "Phone", "JoinDate"}),
// Add metadata about data source status
AddSourceMetadata = if Table.RowCount(DataLoad) > 0 then
Table.AddColumn(DataLoad, "SourceSystem", each "CRM")
else
Table.AddColumn(DataLoad, "SourceSystem", each "CRM_ERROR")
in
AddSourceMetadata
// Query: Raw_Ecommerce_Customers
let
DateRange = if ProcessingMode = "Incremental" then
[StartDate = Date.AddDays(DateTime.Date(DateTime.LocalNow()), -LookbackDays),
EndDate = DateTime.Date(DateTime.LocalNow())]
else
[StartDate = #date(2020, 1, 1),
EndDate = DateTime.Date(DateTime.LocalNow())],
ApiCall = try Web.Contents(Config[EcommerceApiUrl], [
Query = [
start_date = Date.ToText(DateRange[StartDate], "yyyy-MM-dd"),
end_date = Date.ToText(DateRange[EndDate], "yyyy-MM-dd")
]
]) otherwise null,
ParsedData = if ApiCall <> null then
try Json.Document(ApiCall) otherwise {}
else {},
ConvertToTable = if List.Count(ParsedData) > 0 then
Table.FromList(ParsedData, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
else
Table.FromRows({}, {"Column1"}),
ExpandRecords = if Table.RowCount(ConvertToTable) > 0 then
Table.ExpandRecordColumn(ConvertToTable, "Column1",
{"customer_id", "email", "total_orders", "total_spent", "last_order_date"},
{"CustomerID", "Email", "TotalOrders", "TotalSpent", "LastOrderDate"})
else
Table.FromRows({}, {"CustomerID", "Email", "TotalOrders", "TotalSpent", "LastOrderDate"})
in
ExpandRecords
Create standardized versions of your raw data:
// Query: Standardized_CRM_Customers
let
Source = Raw_CRM_Customers,
// Apply consistent data types and formatting
StandardizedTypes = Table.TransformColumnTypes(Source, {
{"CustomerID", type text},
{"FirstName", type text},
{"LastName", type text},
{"Email", type text},
{"Phone", type text},
{"JoinDate", type date}
}),
// Clean and format phone numbers
CleanPhones = Table.TransformColumns(StandardizedTypes, {
{"Phone", CustomerUtils[CleanPhoneNumber], type text}
}),
// Standardize email format
CleanEmails = Table.TransformColumns(CleanPhones, {
{"Email", each if _ <> null then Text.Lower(Text.Trim(_)) else null, type text}
}),
// Add data quality score
AddQualityScore = Table.AddColumn(CleanEmails, "DataQualityScore",
each CustomerUtils[CalculateDataQualityScore](_))
in
AddQualityScore
Merge your data sources and add business calculations:
// Query: Unified_Customer_Data
let
CRMData = Standardized_CRM_Customers,
EcommerceData = Raw_Ecommerce_Customers,
// Full outer join on email to capture customers from both systems.
// Note: the CRM emails were lower-cased during standardization — normalize
// the e-commerce emails the same way first, or case differences will block matches
CombinedData = Table.NestedJoin(CRMData, {"Email"},
EcommerceData, {"Email"}, "EcommerceInfo", JoinKind.FullOuter),
// Expand e-commerce data
ExpandedData = Table.ExpandTableColumn(CombinedData, "EcommerceInfo",
{"TotalOrders", "TotalSpent", "LastOrderDate"},
{"EcommerceTotalOrders", "EcommerceTotalSpent", "EcommerceLastOrderDate"}),
// Calculate lifetime value (combining multiple data points)
AddLifetimeValue = Table.AddColumn(ExpandedData, "LifetimeValue",
each let
EcommerceValue = if [EcommerceTotalSpent] <> null then [EcommerceTotalSpent] else 0,
// Could add other value calculations here (support tickets resolved, etc.)
TotalValue = EcommerceValue
in
TotalValue),
// Assign customer segments
AddSegment = Table.AddColumn(AddLifetimeValue, "CustomerSegment",
each CustomerUtils[AssignCustomerSegment]([LifetimeValue])),
// Calculate days since last order
AddRecency = Table.AddColumn(AddSegment, "DaysSinceLastOrder",
each if [EcommerceLastOrderDate] <> null then
Duration.Days(DateTime.Date(DateTime.LocalNow()) - [EcommerceLastOrderDate])
else null)
in
AddRecency
Create error tracking and data quality monitoring:
// Query: DataQuality_Report
let
Source = Unified_Customer_Data,
// Count various data quality issues
TotalRecords = Table.RowCount(Source),
MissingEmails = Table.RowCount(Table.SelectRows(Source, each [Email] = null)),
MissingNames = Table.RowCount(Table.SelectRows(Source, each [FirstName] = null or [LastName] = null)),
LowQualityRecords = Table.RowCount(Table.SelectRows(Source, each [DataQualityScore] < 60)),
OrphanedRecords = Table.RowCount(Table.SelectRows(Source, each [SourceSystem] = null)),
// Create summary report (SafeTotal guards the ratios so an empty load
// doesn't raise a division-by-zero error)
SafeTotal = List.Max({TotalRecords, 1}),
QualityMetrics = Table.FromRows({
{"Total Records", TotalRecords, if TotalRecords > 0 then "OK" else "ERROR"},
{"Missing Emails", MissingEmails, if MissingEmails / SafeTotal < 0.1 then "OK" else "WARNING"},
{"Missing Names", MissingNames, if MissingNames / SafeTotal < 0.05 then "OK" else "WARNING"},
{"Low Quality Score", LowQualityRecords, if LowQualityRecords / SafeTotal < 0.2 then "OK" else "WARNING"},
{"Orphaned Records", OrphanedRecords, if OrphanedRecords = 0 then "OK" else "ERROR"}
}, {"Metric", "Count", "Status"}),
// Add percentages
AddPercentages = Table.AddColumn(QualityMetrics, "Percentage",
each if TotalRecords > 0 then Number.Round([Count] / TotalRecords * 100, 2) else 0),
// Add timestamp
AddTimestamp = Table.AddColumn(AddPercentages, "CheckTimestamp",
each DateTime.LocalNow())
in
AddTimestamp
This exercise demonstrates how maintainable ETL principles work together in a real scenario. The modular design makes it easy to add new data sources, the parameter system supports different processing modes, and comprehensive error handling provides visibility into data quality issues.
Even with best practices, certain issues appear frequently in Power Query development. Recognizing these patterns helps you debug problems quickly and avoid repeating them.
Problem: Adding custom columns or using complex M functions early in the query prevents operations from being pushed to the data source.
// Problematic approach - breaks folding immediately
let
Source = Sql.Database("server", "database"),
SalesTable = Source{[Schema="dbo",Item="Sales"]}[Data],
// This custom column breaks query folding
AddCustomField = Table.AddColumn(SalesTable, "CustomCalc", each [Amount] * 1.08),
// Subsequent operations must happen in Power Query instead of SQL
FilteredData = Table.SelectRows(AddCustomField, each [OrderDate] >= #date(2024, 1, 1))
in
FilteredData
Solution: Defer custom calculations until after filterable operations:
let
Source = Sql.Database("server", "database"),
SalesTable = Source{[Schema="dbo",Item="Sales"]}[Data],
// Apply filters first to reduce data volume
FilteredData = Table.SelectRows(SalesTable, each [OrderDate] >= #date(2024, 1, 1)),
SelectedColumns = Table.SelectColumns(FilteredData, {"OrderID", "Amount", "CustomerID"}),
// Now add custom calculations on the smaller dataset
AddCustomField = Table.AddColumn(SelectedColumns, "CustomCalc", each [Amount] * 1.08)
in
AddCustomField
Problem: Different parts of your query handle errors differently, creating unpredictable behavior:
// Inconsistent error handling
let
Source = try ExternalDataSource otherwise null, // Returns null on error
ProcessedData = try SomeTransformation(Source) otherwise #table({}, {}), // Returns empty table on error
FinalStep = ProcessedData ?? DefaultTable // Different fallback logic
in
FinalStep
Solution: Establish consistent error handling patterns:
let
// Standardized error handling approach
LoadData = try ExternalDataSource otherwise CreateErrorTable("DataLoad", "Source connection failed"),
ValidateData = if Table.RowCount(LoadData) = 0 then
CreateErrorTable("DataValidation", "No data returned")
else LoadData,
ProcessData = try SomeTransformation(ValidateData) otherwise CreateErrorTable("Processing", "Transformation failed")
in
ProcessData
Problem: Creating too many parameters makes queries difficult to manage and deploy:
// Too many parameters
ServerName, DatabaseName, UserName, Password, TableName, StartDate, EndDate,
FilterColumn, FilterValue, SortColumn, SortDirection, MaxRows, TimeoutSeconds
Solution: Group related parameters into configuration records:
// Organized parameter structure
DataSourceConfig = [
Server = "prod-server",
Database = "SalesDB",
Credentials = [User = "service_account", Password = "encrypted_value"]
],
ProcessingConfig = [
DateRange = [Start = #date(2024, 1, 1), End = DateTime.Date(DateTime.LocalNow())],
Filters = [Column = "Status", Value = "Active"],
Options = [MaxRows = 10000, TimeoutSeconds = 300]
]
Symptoms: Queries that used to complete quickly now take much longer or time out.
Diagnostic Steps:
- Check whether query folding still occurs (right-click the last step and look for View Native Query; if it's unavailable, folding has broken)
- Compare row counts between layers to find where data volume balloons
- Use the Query Diagnostics tools to see which step consumes the time
Common Solutions:
- Filter rows and drop unneeded columns as early as possible
- Reorder steps so foldable operations run before custom M logic
- Buffer small tables that are scanned repeatedly, and pre-aggregate large ones at the source
// Add query diagnostics to identify bottlenecks
let
StartTime = DateTime.LocalNow(),
Source = SomeDataSource,
AfterSourceTime = DateTime.LocalNow(),
ProcessedData = SomeExpensiveTransformation(Source),
AfterProcessingTime = DateTime.LocalNow(),
// Add timing information for debugging. Caveat: M evaluates lazily, so
// these timestamps are indicative only and may not bracket the actual work
AddDiagnostics = Table.AddColumn(ProcessedData, "ProcessingMetrics",
each [
SourceLoadTime = Duration.TotalSeconds(AfterSourceTime - StartTime),
ProcessingTime = Duration.TotalSeconds(AfterProcessingTime - AfterSourceTime),
TotalTime = Duration.TotalSeconds(AfterProcessingTime - StartTime)
])
in
AddDiagnostics
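One common fix when the same table feeds several expensive steps is to pin it in memory with Table.Buffer. This is a sketch, not a universal cure — buffering loads the entire table into memory and prevents folding past that point, so measure before and after (the Standardized_Sales and Amount names are carried over from earlier examples):

```m
let
    Source = Standardized_Sales,
    // Evaluate the upstream query once and reuse the in-memory copy
    Buffered = Table.Buffer(Source),
    TotalRows = Table.RowCount(Buffered),
    TotalAmount = List.Sum(Buffered[Amount]),
    Summary = #table({"Rows", "TotalAmount"}, {{TotalRows, TotalAmount}})
in
    Summary
```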
Problem: Automatic data type detection produces inconsistent results across refreshes.
// Problematic - relies on automatic type detection
let
Source = Excel.Workbook(File.Contents("data.xlsx")),
Sheet1 = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
// Types inferred from sample data might be wrong
FirstRowAsHeader = Table.PromoteHeaders(Sheet1, [PromoteAllScalars=true])
in
FirstRowAsHeader
Solution: Explicitly specify data types:
let
Source = Excel.Workbook(File.Contents("data.xlsx")),
Sheet1 = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
FirstRowAsHeader = Table.PromoteHeaders(Sheet1, [PromoteAllScalars=true]),
// Explicitly set expected types
TypedColumns = Table.TransformColumnTypes(FirstRowAsHeader, {
{"OrderID", type text},
{"OrderDate", type date},
{"Amount", Currency.Type},
{"CustomerID", type text}
}, "en-US")
in
TypedColumns
Building maintainable Power Query ETL solutions requires discipline and planning, but the investment pays dividends in reduced maintenance overhead and increased reliability. The practices we've covered—modular architecture, systematic parameter management, comprehensive error handling, clear documentation, performance optimization, and thorough testing—work together to create robust data processing systems.
The key takeaways for your daily practice:
Start with structure: Before writing transformations, plan your query architecture. Identify reusable components and create a logical layer structure that separates concerns.
Parameterize from the beginning: Hardcoded values seem convenient initially but become maintenance nightmares. Build parameter management into your workflow from day one.
Design for failure: Systems fail in unexpected ways. Your error handling strategy should provide clear diagnostics and graceful degradation paths.
Think like a team: Write code that your colleagues (and future you) can understand and modify. Clear naming and strategic documentation make complex queries approachable.
Optimize proactively: Performance problems often appear gradually. Build performance consciousness into your development process rather than treating it as an afterthought.
Test systematically: Manual testing doesn't scale with complexity. Create repeatable test processes that catch issues before they reach production.
Next steps to deepen your expertise:
Advanced M Programming: Explore recursive functions, advanced list operations, and custom data connectors to handle complex scenarios.
Integration Patterns: Learn how to integrate Power Query solutions with other Microsoft platforms like Power Automate, Azure Data Factory, and Power BI dataflows.
Enterprise Deployment: Study deployment strategies for enterprise environments, including source control integration, automated testing, and change management processes.
Custom Connectors: Build custom data connectors for proprietary systems or complex APIs that require specialized authentication or data handling.
The practices demonstrated in this lesson form the foundation for professional-grade ETL development. As you apply them in your own projects, you'll discover that well-structured Power Query solutions not only work better—they adapt more easily to changing requirements and scale more gracefully with your organization's growing data needs.
Remember: maintainable ETL isn't about perfection from the start. It's about building systems that can evolve gracefully under the pressure of real-world requirements. Start applying these practices incrementally, and you'll see immediate improvements in both the reliability and maintainability of your data processing solutions.