
You're staring at an API endpoint that returns a deeply nested JSON structure, and your boss needs a clean table by tomorrow morning. Or maybe you've inherited a legacy system that exports data as XML files with multiple namespaces and complex hierarchies. These scenarios are exactly where Power Query's M language shines—but only if you know how to navigate the complexities of semi-structured data.
While Power Query provides excellent automatic parsing for simple JSON and XML files, real-world data sources are rarely simple. They contain arrays of objects, mixed data types, optional fields, and nested structures that can make your head spin. The built-in wizards will get you started, but they often leave you with expanded columns full of "Record" and "List" values that need serious cleanup.
What you'll learn: how to navigate deeply nested JSON, handle XML namespaces and attributes, optimize performance on large files, build defensive error handling, and combine multiple semi-structured sources into a single clean table.
This lesson assumes you're comfortable with Power Query basics and have written custom M functions before. You should understand M's type system, know how to work with records and lists, and be familiar with the Table functions. If you haven't worked with the advanced editor in Power Query, spend some time there first.
M treats JSON and XML as hierarchical structures built from three core types: records (key-value pairs), lists (ordered collections), and primitives (text, numbers, etc.). This is fundamentally different from how databases treat data, and it's the key to unlocking complex transformations.
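To make this concrete, here is a minimal sketch of the three core types and how navigation chains them together (the values are illustrative):

```
let
    // A record: key-value pairs, accessed with [field]
    SampleRecord = [id = 1001, name = "Acme Corp"],
    // A list: an ordered collection, accessed with {index}
    SampleList = {100, 200, 300},
    // Real payloads nest them, so navigation alternates [field] and {index}
    Nested = [items = {[sku = "A-1"], [sku = "A-2"]}],
    FirstSku = Nested[items]{0}[sku]  // "A-1"
in
    FirstSku
```

Every transformation in this lesson is some combination of these two access patterns, applied step by step down the tree.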
Let's start with a realistic scenario. You're working with a CRM API that returns customer data in this format:
{
  "customers": [
    {
      "id": 1001,
      "name": "Acme Corp",
      "contacts": [
        {
          "type": "primary",
          "person": "John Smith",
          "methods": {
            "email": "john@acme.com",
            "phone": "+1-555-0123"
          }
        },
        {
          "type": "billing",
          "person": "Jane Doe",
          "methods": {
            "email": "billing@acme.com"
          }
        }
      ],
      "orders": [
        {"date": "2024-01-15", "amount": 15000},
        {"date": "2024-02-10", "amount": 8500}
      ]
    },
    {
      "id": 1002,
      "name": "Tech Solutions LLC",
      "contacts": [
        {
          "type": "primary",
          "person": "Mike Johnson",
          "methods": {
            "email": "mike@techsolutions.com",
            "phone": "+1-555-0456"
          }
        }
      ],
      "orders": []
    }
  ]
}
The automatic JSON parser will give you a table with "contacts" and "orders" columns full of List values. Here's how to navigate this structure methodically.
First, let's understand what we're working with:
let
    Source = Json.Document(File.Contents("C:\data\customers.json")),
    // The top level is a single record with one "customers" field
    // holding a list, so plain field access gets us straight there
    CustomersList = Source[customers]
in
    CustomersList
This gives us the list of customer records. Now we can see the structure clearly and decide how to flatten it based on our business requirements.
The key to working with complex JSON is understanding that you're traversing a tree structure. Each navigation step should be intentional and handle the possibility that data might be missing or malformed.
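One way to make each step defensive is M's optional access operators: [field]? and {index}? return null instead of raising an error when a field or position is missing. A minimal sketch against the file above:

```
let
    Source = Json.Document(File.Contents("C:\data\customers.json")),
    // Optional field access: null if "customers" is absent
    Customers = Source[customers]?,
    // Optional item access: null if the list is empty
    FirstCustomer = if Customers <> null then Customers{0}? else null,
    FirstName = if FirstCustomer <> null then FirstCustomer[name]? else null
in
    FirstName
```

This pattern trades hard failures for nulls you can handle downstream, which is usually what you want with data you don't control.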
Let's extract customer information along with their primary contact details:
let
Source = Json.Document(File.Contents("C:\data\customers.json")),
CustomersList = Source[customers],
// Convert to table and expand basic customer info
CustomersTable = Table.FromList(CustomersList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
ExpandCustomerInfo = Table.ExpandRecordColumn(CustomersTable, "Column1",
{"id", "name", "contacts", "orders"},
{"CustomerID", "CustomerName", "Contacts", "Orders"}),
// Add column to extract primary contact
AddPrimaryContact = Table.AddColumn(ExpandCustomerInfo, "PrimaryContact",
each
let
ContactsList = [Contacts],
PrimaryContactRecord = List.Select(ContactsList, each [type] = "primary"),
PrimaryContact = if List.Count(PrimaryContactRecord) > 0
then PrimaryContactRecord{0}
else null
in
PrimaryContact
),
// Expand primary contact details
ExpandPrimaryContact = Table.ExpandRecordColumn(AddPrimaryContact, "PrimaryContact",
{"person", "methods"},
{"PrimaryContactPerson", "ContactMethods"}),
// Extract contact methods
AddContactEmail = Table.AddColumn(ExpandPrimaryContact, "PrimaryEmail",
each
let
Methods = [ContactMethods],
Email = if Methods <> null and Record.HasFields(Methods, "email")
then Methods[email]
else null
in
Email
),
AddContactPhone = Table.AddColumn(AddContactEmail, "PrimaryPhone",
each
let
Methods = [ContactMethods],
Phone = if Methods <> null and Record.HasFields(Methods, "phone")
then Methods[phone]
else null
in
Phone
),
// Clean up intermediate columns
RemoveIntermediateColumns = Table.RemoveColumns(AddContactPhone, {"Contacts", "ContactMethods"}),
// Add order summary
AddOrderSummary = Table.AddColumn(RemoveIntermediateColumns, "TotalOrderValue",
each
let
OrdersList = [Orders],
OrderAmounts = List.Transform(OrdersList, each [amount]),
TotalValue = List.Sum(OrderAmounts)
in
TotalValue
),
FinalCleanup = Table.RemoveColumns(AddOrderSummary, {"Orders"})
in
FinalCleanup
This approach demonstrates several critical patterns: converting a list of records to a table before expanding, checking for optional fields with Record.HasFields before accessing them, extracting nested values through intermediate columns, and removing those intermediate columns once they have served their purpose.
Performance tip: When working with large JSON files, consider using Json.Document with the encoding parameter explicitly set, and avoid expanding columns you don't need. Each expansion operation creates a new table in memory.
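For example, passing the encoding explicitly (assuming the file is UTF-8) spares Power Query from having to infer it:

```
Source = Json.Document(File.Contents("C:\data\customers.json"), TextEncoding.Utf8)
```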
Real-world JSON often contains arrays of mixed objects or optional fields that vary by record. Here's how to handle these situations elegantly.
Consider this product catalog JSON where products can have different attribute structures:
{
  "products": [
    {
      "sku": "LAPTOP-001",
      "name": "Professional Laptop",
      "category": "electronics",
      "attributes": {
        "screen_size": "15.6",
        "ram": "16GB",
        "storage": "512GB SSD"
      },
      "pricing": [
        {"region": "US", "currency": "USD", "price": 1299},
        {"region": "EU", "currency": "EUR", "price": 1199}
      ]
    },
    {
      "sku": "DESK-001",
      "name": "Standing Desk",
      "category": "furniture",
      "attributes": {
        "material": "bamboo",
        "height_range": "28-48 inches",
        "weight_capacity": "150 lbs"
      },
      "pricing": [
        {"region": "US", "currency": "USD", "price": 599}
      ]
    }
  ]
}
Here's a robust approach to handle this mixed structure:
let
Source = Json.Document(File.Contents("C:\data\products.json")),
ProductsList = Source[products],
ProductsTable = Table.FromList(ProductsList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
ExpandBasicInfo = Table.ExpandRecordColumn(ProductsTable, "Column1",
{"sku", "name", "category", "attributes", "pricing"},
{"SKU", "ProductName", "Category", "Attributes", "Pricing"}),
// Create a function to safely extract attributes based on product category
ExtractAttributeValue = (attributeRecord as record, fieldName as text) as any =>
if attributeRecord <> null and Record.HasFields(attributeRecord, fieldName)
then Record.Field(attributeRecord, fieldName)
else null,
// Add category-specific attribute columns
AddElectronicsAttributes = Table.AddColumn(ExpandBasicInfo, "ScreenSize",
each if [Category] = "electronics"
then ExtractAttributeValue([Attributes], "screen_size")
else null),
AddRAM = Table.AddColumn(AddElectronicsAttributes, "RAM",
each if [Category] = "electronics"
then ExtractAttributeValue([Attributes], "ram")
else null),
AddStorage = Table.AddColumn(AddRAM, "Storage",
each if [Category] = "electronics"
then ExtractAttributeValue([Attributes], "storage")
else null),
// Add furniture attributes
AddMaterial = Table.AddColumn(AddStorage, "Material",
each if [Category] = "furniture"
then ExtractAttributeValue([Attributes], "material")
else null),
AddDimensions = Table.AddColumn(AddMaterial, "Dimensions",
each if [Category] = "furniture"
then ExtractAttributeValue([Attributes], "height_range")
else null),
// Handle pricing - let's extract US pricing specifically
AddUSPrice = Table.AddColumn(AddDimensions, "USPrice",
each
let
PricingList = [Pricing],
USPricing = List.Select(PricingList, each [region] = "US"),
Price = if List.Count(USPricing) > 0
then USPricing{0}[price]
else null
in
Price
),
// Clean up
RemoveIntermediateColumns = Table.RemoveColumns(AddUSPrice, {"Attributes", "Pricing"})
in
RemoveIntermediateColumns
XML processing in M can be more challenging than JSON because of namespaces, attributes, and the distinction between element content and structure. Let's work with a realistic example from an e-commerce order export:
<?xml version="1.0" encoding="UTF-8"?>
<orders xmlns="http://company.com/orders/v1"
        xmlns:customer="http://company.com/customer/v1"
        xmlns:product="http://company.com/products/v1">
  <order id="ORD-001" date="2024-01-15">
    <customer:info>
      <customer:id>CUST-1001</customer:id>
      <customer:name>Acme Corporation</customer:name>
      <customer:address type="billing">
        <customer:street>123 Business Ave</customer:street>
        <customer:city>Chicago</customer:city>
        <customer:state>IL</customer:state>
        <customer:zip>60601</customer:zip>
      </customer:address>
    </customer:info>
    <items>
      <item>
        <product:sku>LAPTOP-001</product:sku>
        <product:name>Professional Laptop</product:name>
        <quantity>2</quantity>
        <unit-price>1299.00</unit-price>
      </item>
      <item>
        <product:sku>MOUSE-001</product:sku>
        <product:name>Wireless Mouse</product:name>
        <quantity>2</quantity>
        <unit-price>49.99</unit-price>
      </item>
    </items>
  </order>
  <order id="ORD-002" date="2024-01-16">
    <customer:info>
      <customer:id>CUST-1002</customer:id>
      <customer:name>Tech Solutions LLC</customer:name>
      <customer:address type="billing">
        <customer:street>456 Innovation Dr</customer:street>
        <customer:city>Austin</customer:city>
        <customer:state>TX</customer:state>
        <customer:zip>78701</customer:zip>
      </customer:address>
    </customer:info>
    <items>
      <item>
        <product:sku>DESK-001</product:sku>
        <product:name>Standing Desk</product:name>
        <quantity>1</quantity>
        <unit-price>599.00</unit-price>
      </item>
    </items>
  </order>
</orders>
Here's how to navigate this XML structure effectively:
let
Source = Xml.Tables(File.Contents("C:\data\orders.xml")),
// Navigate to the orders table
OrdersTable = Source{0}[Table],
// Extract order attributes (id and date are XML attributes)
// id and date are XML attributes; Xml.Tables surfaces them as text
// columns named "Attribute:id" and "Attribute:date", so a rename suffices
ExpandOrderAttributes = Table.RenameColumns(OrdersTable, {{"Attribute:id", "OrderID"}}),
ExpandDateAttributes = Table.RenameColumns(ExpandOrderAttributes, {{"Attribute:date", "OrderDate"}}),
// Navigate to customer info - this requires understanding the XML namespace structure
ExpandCustomerInfo = Table.ExpandTableColumn(ExpandDateAttributes, "http://company.com/customer/v1:info",
{"http://company.com/customer/v1:id", "http://company.com/customer/v1:name", "http://company.com/customer/v1:address"}),
// Extract customer ID and name
ExtractCustomerID = Table.ExpandTableColumn(ExpandCustomerInfo, "http://company.com/customer/v1:id", {"#text"}),
RenameCustomerID = Table.RenameColumns(ExtractCustomerID, {{"#text", "CustomerID"}}),
ExtractCustomerName = Table.ExpandTableColumn(RenameCustomerID, "http://company.com/customer/v1:name", {"#text"}),
RenameCustomerName = Table.RenameColumns(ExtractCustomerName, {{"#text", "CustomerName"}}),
// Handle the address - it's a complex nested structure
ExtractAddress = Table.AddColumn(RenameCustomerName, "BillingAddress",
each
let
AddressTable = [#"http://company.com/customer/v1:address"],
// Address contains multiple child elements
AddressElements = Table.Group(AddressTable, {}, {
{"Street", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:street")[#text])},
{"City", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:city")[#text])},
{"State", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:state")[#text])},
{"Zip", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:zip")[#text])}
}),
FormattedAddress = AddressElements[Street]{0} & ", " &
AddressElements[City]{0} & ", " &
AddressElements[State]{0} & " " &
AddressElements[Zip]{0}
in
FormattedAddress
),
// Now handle the line items - this is where XML gets tricky
ExpandItems = Table.ExpandTableColumn(ExtractAddress, "items", {"item"}),
ExpandLineItems = Table.ExpandListColumn(ExpandItems, "item"),
// Each item is now a table with product information
ExtractLineItemDetails = Table.AddColumn(ExpandLineItems, "LineItemDetails",
each
let
ItemTable = [item],
// Create a record from the line item table
SKURow = Table.SelectRows(ItemTable, each [Name] = "http://company.com/products/v1:sku"),
SKU = if Table.RowCount(SKURow) > 0 then SKURow[#text]{0} else null,
NameRow = Table.SelectRows(ItemTable, each [Name] = "http://company.com/products/v1:name"),
ProductName = if Table.RowCount(NameRow) > 0 then NameRow[#text]{0} else null,
QtyRow = Table.SelectRows(ItemTable, each [Name] = "quantity"),
Quantity = if Table.RowCount(QtyRow) > 0 then Number.FromText(QtyRow[#text]{0}) else null,
PriceRow = Table.SelectRows(ItemTable, each [Name] = "unit-price"),
UnitPrice = if Table.RowCount(PriceRow) > 0 then Number.FromText(PriceRow[#text]{0}) else null,
LineTotal = if Quantity <> null and UnitPrice <> null then Quantity * UnitPrice else null
in
[SKU = SKU, ProductName = ProductName, Quantity = Quantity, UnitPrice = UnitPrice, LineTotal = LineTotal]
),
// Expand the line item details
ExpandLineItemDetails = Table.ExpandRecordColumn(ExtractLineItemDetails, "LineItemDetails",
{"SKU", "ProductName", "Quantity", "UnitPrice", "LineTotal"}),
// Clean up intermediate columns
CleanupColumns = Table.RemoveColumns(ExpandLineItemDetails, {
"http://company.com/customer/v1:address",
"item"
}),
// Convert data types
ConvertTypes = Table.TransformColumnTypes(CleanupColumns, {
{"OrderDate", type date},
{"Quantity", Int64.Type},
{"UnitPrice", Currency.Type},
{"LineTotal", Currency.Type}
})
in
ConvertTypes
XML Navigation Tip: When working with XML namespaces in M, the column names include the full namespace URI. Always examine the intermediate results to understand the exact column naming convention Power Query uses.
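A quick way to discover those names is to dump the column list of each intermediate table before writing any expansion steps; a sketch against the file above:

```
let
    Source = Xml.Tables(File.Contents("C:\data\orders.xml")),
    OrdersTable = Source{0}[Table],
    // Returns namespace-qualified names such as
    // "http://company.com/customer/v1:info" alongside plain ones like "items"
    ColumnNames = Table.ColumnNames(OrdersTable)
in
    ColumnNames
```

Copy the names from this output into your ExpandTableColumn calls rather than typing them by hand.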
When working with large JSON or XML files, performance becomes critical. Here are proven strategies to optimize your M code:
For files larger than 100MB, consider whether you need to process the entire file at once:
let
    // Json.Document still loads the whole file, but specifying the
    // encoding explicitly avoids an extra detection pass
    Source = Json.Document(File.Contents("large-file.json"), TextEncoding.Utf8),
    // Process large arrays in chunks to keep intermediate tables small
    ProcessInBatches = (data as list, batchSize as number) as table =>
        let
            BatchCount = Number.RoundUp(List.Count(data) / batchSize),
            Batches = List.Generate(
                () => 0,
                each _ < BatchCount,
                each _ + 1,
                each List.Range(data, _ * batchSize, batchSize)
            ),
            ProcessedBatches = List.Transform(Batches, each Table.FromList(_, Splitter.SplitByNothing())),
            CombinedResult = Table.Combine(ProcessedBatches)
        in
            CombinedResult,
    // Apply the batcher to the array you need; the field name depends on your file
    Result = ProcessInBatches(Source[data], 10000)
in
    Result
Don't expand columns you don't need:
let
    Source = Json.Document(File.Contents("data.json")),
    // Assumes the top level is a record with an "items" array;
    // convert that array to a single-column table first
    ItemsTable = Table.FromList(Source[items], Splitter.SplitByNothing()),
    // Instead of expanding every field (e.g. via Record.FieldNames),
    // expand only what you need; fields like internal_metadata and
    // debug_info are then never materialized at all
    ExpandSelected = Table.ExpandRecordColumn(ItemsTable, "Column1",
        {"id", "name", "created_date"},
        {"ID", "Name", "CreatedDate"})
in
    ExpandSelected
Apply filters as early as possible in your transformation pipeline:
let
Source = Json.Document(File.Contents("transactions.json")),
TransactionsList = Source[transactions],
// Filter before expanding - much faster
RecentTransactions = List.Select(TransactionsList,
each Date.FromText([date]) >= #date(2024, 1, 1)),
TransactionsTable = Table.FromList(RecentTransactions, Splitter.SplitByNothing()),
// Now expand only the filtered data
ExpandTransactions = Table.ExpandRecordColumn(TransactionsTable, "Column1",
{"id", "date", "amount", "description"})
in
ExpandTransactions
Semi-structured data is notorious for inconsistencies. Here's how to build robust error handling into your M code:
let
Source = Json.Document(File.Contents("api-data.json")),
// Create a safe field extractor function
// Create a safe field extractor function
// (use the "is" operator for type tests; comparing Value.Type results with "=" is unreliable)
SafeFieldExtract = (input as any, fieldName as text, defaultValue as any) as any =>
try
if input <> null and input is record and Record.HasFields(input, fieldName)
then Record.Field(input, fieldName)
else defaultValue
otherwise defaultValue,
// Create a safe list processor
SafeListProcess = (listValue as any, processor as function) as any =>
try
if listValue <> null and listValue is list
then processor(listValue)
else null
otherwise null,
// Create validation function for required fields
ValidateRecord = (record as any, requiredFields as list) as record =>
let
ValidationResults = List.Transform(requiredFields,
each [
Field = _,
HasValue = SafeFieldExtract(record, _, null) <> null,
Value = SafeFieldExtract(record, _, null)
]),
AllValid = List.AllTrue(List.Transform(ValidationResults, each [HasValue])),
ErrorMessage = if not AllValid
then "Missing required fields: " & Text.Combine(
List.Transform(List.Select(ValidationResults, each not [HasValue]), each [Field]), ", ")
else null
in
[
IsValid = AllValid,
ErrorMessage = ErrorMessage,
Data = record
],
ProcessedData = Table.FromList(Source[items], Splitter.SplitByNothing()),
// Add validation
AddValidation = Table.AddColumn(ProcessedData, "Validation",
each ValidateRecord([Column1], {"id", "name", "price"})),
// Filter out invalid records and log errors
ValidRecords = Table.SelectRows(AddValidation, each [Validation][IsValid] = true),
InvalidRecords = Table.SelectRows(AddValidation, each [Validation][IsValid] = false),
// Process valid records
ExtractValidData = Table.AddColumn(ValidRecords, "CleanData", each [Validation][Data]),
ExpandCleanData = Table.ExpandRecordColumn(ExtractValidData, "CleanData",
{"id", "name", "price"}, {"ID", "Name", "Price"}),
FinalTable = Table.RemoveColumns(ExpandCleanData, {"Column1", "Validation"})
in
FinalTable
Let's put everything together with a realistic scenario. You're tasked with combining customer data from three sources: customer profiles from a JSON REST API, billing history from an XML export, and support tickets from a JSON file.
Here's the complete solution:
let
// Source 1: Customer Profiles (JSON API)
GetCustomerProfiles = () =>
let
ApiResponse = Web.Contents("https://api.company.com/customers", [
Headers = [Authorization = "Bearer " & ApiKey],
Query = [limit = "1000", active = "true"]
]),
JsonData = Json.Document(ApiResponse),
CustomersList = JsonData[data],
ProfilesTable = Table.FromList(CustomersList, Splitter.SplitByNothing()),
ExpandProfiles = Table.ExpandRecordColumn(ProfilesTable, "Column1",
{"customer_id", "name", "email", "segment", "created_date"},
{"CustomerID", "CustomerName", "Email", "Segment", "CreatedDate"}),
ConvertTypes = Table.TransformColumnTypes(ExpandProfiles, {
{"CustomerID", type text},
{"CreatedDate", type datetime}
})
in
ConvertTypes,
// Source 2: Billing History (XML)
GetBillingHistory = () =>
let
XmlSource = Xml.Tables(File.Contents("C:\data\billing_export.xml")),
BillingTable = XmlSource{0}[Table],
ExpandPayments = Table.ExpandTableColumn(BillingTable, "payment",
{"customer_id", "amount", "date", "status"}),
ExtractPaymentData = Table.TransformColumns(ExpandPayments, {
{"customer_id", each _{0}[#text]},
{"amount", each Number.FromText(_{0}[#text])},
{"date", each Date.FromText(_{0}[#text])},
{"status", each _{0}[#text]}
}),
RenameColumns = Table.RenameColumns(ExtractPaymentData, {
{"customer_id", "CustomerID"},
{"amount", "PaymentAmount"},
{"date", "PaymentDate"},
{"status", "PaymentStatus"}
}),
// Aggregate to customer level
GroupPayments = Table.Group(RenameColumns, {"CustomerID"}, {
{"TotalPayments", each List.Sum([PaymentAmount]), type number},
{"LastPaymentDate", each List.Max([PaymentDate]), type date},
{"PaymentCount", each Table.RowCount(_), Int64.Type}
})
in
GroupPayments,
// Source 3: Support Tickets (JSON File)
GetSupportTickets = () =>
let
JsonSource = Json.Document(File.Contents("C:\data\support_tickets.json")),
TicketsList = JsonSource[tickets],
TicketsTable = Table.FromList(TicketsList, Splitter.SplitByNothing()),
ExpandTickets = Table.ExpandRecordColumn(TicketsTable, "Column1",
{"ticket_id", "customer_id", "priority", "status", "created_at", "resolved_at"},
{"TicketID", "CustomerID", "Priority", "Status", "CreatedAt", "ResolvedAt"}),
// Handle optional resolved_at field
SafeResolvedAt = Table.TransformColumns(ExpandTickets, {
{"ResolvedAt", each if _ <> null then DateTime.FromText(_) else null}
}),
// Calculate resolution time
AddResolutionTime = Table.AddColumn(SafeResolvedAt, "ResolutionHours",
each if [ResolvedAt] <> null and [CreatedAt] <> null
then Duration.TotalHours([ResolvedAt] - DateTime.FromText([CreatedAt]))
else null),
// Aggregate to customer level
GroupTickets = Table.Group(AddResolutionTime, {"CustomerID"}, {
{"TotalTickets", each Table.RowCount(_), Int64.Type},
{"OpenTickets", each Table.RowCount(Table.SelectRows(_, each [Status] <> "closed")), Int64.Type},
{"AvgResolutionHours", each List.Average(List.RemoveNulls([ResolutionHours])), type number},
{"HighPriorityTickets", each Table.RowCount(Table.SelectRows(_, each [Priority] = "high")), Int64.Type}
})
in
GroupTickets,
// Combine all sources
CustomerProfiles = GetCustomerProfiles(),
BillingHistory = GetBillingHistory(),
SupportTickets = GetSupportTickets(),
// Join data sources
JoinBilling = Table.NestedJoin(CustomerProfiles, {"CustomerID"}, BillingHistory, {"CustomerID"},
"BillingData", JoinKind.LeftOuter),
ExpandBilling = Table.ExpandTableColumn(JoinBilling, "BillingData",
{"TotalPayments", "LastPaymentDate", "PaymentCount"}),
JoinSupport = Table.NestedJoin(ExpandBilling, {"CustomerID"}, SupportTickets, {"CustomerID"},
"SupportData", JoinKind.LeftOuter),
ExpandSupport = Table.ExpandTableColumn(JoinSupport, "SupportData",
{"TotalTickets", "OpenTickets", "AvgResolutionHours", "HighPriorityTickets"}),
// Handle nulls from left joins
FillNulls = Table.ReplaceValue(ExpandSupport, null, 0, Replacer.ReplaceValue,
{"TotalPayments", "PaymentCount", "TotalTickets", "OpenTickets", "HighPriorityTickets"}),
// Calculate customer health score
AddHealthScore = Table.AddColumn(FillNulls, "HealthScore",
each
let
PaymentScore = if [TotalPayments] > 10000 then 40
else if [TotalPayments] > 5000 then 30
else if [TotalPayments] > 1000 then 20
else 10,
SupportScore = if [OpenTickets] = 0 then 30
else if [OpenTickets] <= 2 then 20
else 10,
EngagementScore = if [PaymentCount] > 10 then 30
else if [PaymentCount] > 5 then 20
else 10
in
PaymentScore + SupportScore + EngagementScore
),
// Final data type conversions
FinalTypes = Table.TransformColumnTypes(AddHealthScore, {
{"TotalPayments", Currency.Type},
{"LastPaymentDate", type date},
{"AvgResolutionHours", type number}
})
in
FinalTypes
This example demonstrates several production patterns: wrapping each source in its own function, aggregating detail rows to the customer grain before joining, using left outer joins so customers without billing or support history are kept, and replacing the resulting nulls before computing derived metrics like the health score.
Production Tip: When building multi-source pipelines like this, consider implementing each source as a separate function in Power Query. This makes debugging easier and allows for better code reuse.
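Concretely, each Get* function above can be pasted into its own blank query in the Queries pane and then invoked from the combining query like any library function; a sketch (the query name is illustrative):

```
// Contents of a standalone query named "GetCustomerProfiles"
() =>
    let
        ApiResponse = Web.Contents("https://api.company.com/customers"),
        JsonData = Json.Document(ApiResponse),
        ProfilesTable = Table.FromList(JsonData[data], Splitter.SplitByNothing())
    in
        ProfilesTable

// In the combining query:
// CustomerProfiles = GetCustomerProfiles()
```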
Problem: Expanding every available column leads to performance issues and cluttered data models.
Solution: Always be selective about which columns you expand. Inspect the available fields on a sample record first with Record.FieldNames():
// Inspect available fields before expanding
let
Source = Json.Document(File.Contents("data.json")),
// Assumes the top level of data.json is a JSON array of objects
SampleRecord = Source{0},
AvailableFields = Record.FieldNames(SampleRecord)
in
AvailableFields
Problem: Assuming all records have the same structure leads to errors when field types vary.
Solution: Always use defensive navigation:
// Wrong approach: assumes Amount is always a number
BadExtraction = [Amount],
// Correct approach: test the actual type with the "is" operator
GoodExtraction = if [Amount] is number
then [Amount]
else if [Amount] is text
then Number.FromText([Amount])
else null
Problem: List operations fail when lists are empty or null.
Solution: Always check list contents before processing:
SafeListSum = (listValue as any) as number =>
if listValue <> null and listValue is list and List.Count(listValue) > 0
then List.Sum(listValue)
else 0
Problem: Hard-coding namespace URIs makes code brittle.
Solution: Extract namespaces programmatically or use partial matching:
// Create a helper function for namespace-aware field extraction
ExtractXmlField = (xmlTable as table, localName as text) as any =>
let
MatchingRows = Table.SelectRows(xmlTable,
each Text.EndsWith([Name], ":" & localName) or [Name] = localName),
Result = if Table.RowCount(MatchingRows) > 0
then MatchingRows[#text]{0}
else null
in
Result
When processing large files, you might encounter memory limitations. Here's a diagnostic approach:
// M can't report memory usage directly, but counting records at each
// stage shows where row counts explode in your query
let
Source = Json.Document(File.Contents("large-file.json")),
// Check record count at various stages
StageOneCount = List.Count(Source[data]),
ProcessedData = Table.FromList(Source[data], Splitter.SplitByNothing()),
StageTwoCount = Table.RowCount(ProcessedData),
// If memory issues occur, implement batching
BatchProcessor = (data as list, batchSize as number) =>
let
TotalRecords = List.Count(data),
BatchCount = Number.RoundUp(TotalRecords / batchSize),
ProcessBatch = (batchIndex as number) =>
let
StartIndex = batchIndex * batchSize,
BatchData = List.Range(data, StartIndex, batchSize),
BatchTable = Table.FromList(BatchData, Splitter.SplitByNothing())
in
BatchTable,
AllBatches = List.Transform({0..BatchCount-1}, ProcessBatch),
CombinedResult = Table.Combine(AllBatches)
in
CombinedResult
in
if StageOneCount > 50000
then BatchProcessor(Source[data], 10000)
else ProcessedData
Working effectively with JSON and XML in Power Query's M language requires understanding that you're navigating tree structures, not flat tables.
The techniques you've learned here—defensive navigation, performance optimization, error handling, and multi-source integration—form the foundation for building robust data pipelines that can handle the complexity of real-world semi-structured data.
Your next steps should be to apply these patterns to your own data sources and to build up a reusable library of helper functions like SafeFieldExtract and ExtractXmlField.
Remember that Power Query is just one tool in the modern data stack. As you become more comfortable with M, consider when other approaches—like Azure Functions for complex JSON processing or dedicated ETL tools for high-volume XML processing—might be more appropriate for your specific use cases.