Wicked Smart Data
LearnArticlesAbout
Sign InSign Up
LearnArticlesAboutContact
Sign InSign Up
Wicked Smart Data

The go-to platform for professionals who want to master data, automation, and AI — from Excel fundamentals to cutting-edge machine learning.

Platform

  • Learning Paths
  • Articles
  • About
  • Contact

Connect

  • Contact Us
  • RSS Feed

© 2026 Wicked Smart Data. All rights reserved.

Privacy PolicyTerms of Service
All Articles
Advanced JSON and XML Data Processing in Power Query M Language

Advanced JSON and XML Data Processing in Power Query M Language

Power Query⚡ Practitioner17 min readApr 19, 2026Updated Apr 19, 2026
Table of Contents
  • Prerequisites
  • Understanding M's Approach to Semi-Structured Data
  • Navigating JSON Hierarchies with Precision
  • Advanced JSON Pattern Matching
  • Working with XML Namespaces and Complex Structures
  • Performance Optimization for Large Semi-Structured Files
  • Streaming vs. Loading
  • Selective Field Extraction
  • Early Filtering
  • Error Handling and Data Validation
  • Hands-On Exercise: Building a Multi-Source Data Pipeline
  • Common Mistakes & Troubleshooting

Working with JSON and XML Data Sources in M

You're staring at an API endpoint that returns a deeply nested JSON structure, and your boss needs a clean table by tomorrow morning. Or maybe you've inherited a legacy system that exports data as XML files with multiple namespaces and complex hierarchies. These scenarios are exactly where Power Query's M language shines—but only if you know how to navigate the complexities of semi-structured data.

While Power Query provides excellent automatic parsing for simple JSON and XML files, real-world data sources are rarely simple. They contain arrays of objects, mixed data types, optional fields, and nested structures that can make your head spin. The built-in wizards will get you started, but they often leave you with expanded columns full of "Record" and "List" values that need serious cleanup.

What you'll learn:

  • How to navigate complex JSON and XML structures programmatically in M
  • Advanced techniques for extracting data from nested hierarchies and mixed arrays
  • Performance optimization strategies for large semi-structured datasets
  • Error handling patterns for missing or malformed data elements
  • When to use M vs. other approaches for JSON/XML processing

Prerequisites

This lesson assumes you're comfortable with Power Query basics and have written custom M functions before. You should understand M's type system, know how to work with records and lists, and be familiar with the Table functions. If you haven't worked with the advanced editor in Power Query, spend some time there first.

Understanding M's Approach to Semi-Structured Data

M treats JSON and XML as hierarchical structures built from three core types: records (key-value pairs), lists (ordered collections), and primitives (text, numbers, etc.). This is fundamentally different from how databases treat data, and it's the key to unlocking complex transformations.

Let's start with a realistic scenario. You're working with a CRM API that returns customer data in this format:

{
  "customers": [
    {
      "id": 1001,
      "name": "Acme Corp",
      "contacts": [
        {
          "type": "primary",
          "person": "John Smith",
          "methods": {
            "email": "john@acme.com",
            "phone": "+1-555-0123"
          }
        },
        {
          "type": "billing",
          "person": "Jane Doe",
          "methods": {
            "email": "billing@acme.com"
          }
        }
      ],
      "orders": [
        {"date": "2024-01-15", "amount": 15000},
        {"date": "2024-02-10", "amount": 8500}
      ]
    },
    {
      "id": 1002,
      "name": "Tech Solutions LLC",
      "contacts": [
        {
          "type": "primary",
          "person": "Mike Johnson",
          "methods": {
            "email": "mike@techsolutions.com",
            "phone": "+1-555-0456"
          }
        }
      ],
      "orders": []
    }
  ]
}

The automatic JSON parser will give you a table with "contacts" and "orders" columns full of List values. Here's how to navigate this structure methodically.

First, let's understand what we're working with:

let
    Source = Json.Document(File.Contents("C:\data\customers.json")),
    CustomersTable = Table.FromRecords({Source}),
    NavigateToCustomers = Table.ExpandRecordColumn(CustomersTable, "Column1", {"customers"}),
    CustomersList = Table.Column(NavigateToCustomers, "customers"){0}
in
    CustomersList

This gives us the list of customer records. Now we can see the structure clearly and decide how to flatten it based on our business requirements.

Navigating JSON Hierarchies with Precision

The key to working with complex JSON is understanding that you're traversing a tree structure. Each navigation step should be intentional and handle the possibility that data might be missing or malformed.

Let's extract customer information along with their primary contact details:

let
    Source = Json.Document(File.Contents("C:\data\customers.json")),
    CustomersList = Source[customers],
    
    // Convert to table and expand basic customer info
    CustomersTable = Table.FromList(CustomersList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandCustomerInfo = Table.ExpandRecordColumn(CustomersTable, "Column1", 
        {"id", "name", "contacts", "orders"}, 
        {"CustomerID", "CustomerName", "Contacts", "Orders"}),
    
    // Add column to extract primary contact
    AddPrimaryContact = Table.AddColumn(ExpandCustomerInfo, "PrimaryContact", 
        each 
            let
                ContactsList = [Contacts],
                PrimaryContactRecord = List.Select(ContactsList, each [type] = "primary"),
                PrimaryContact = if List.Count(PrimaryContactRecord) > 0 
                                 then PrimaryContactRecord{0} 
                                 else null
            in
                PrimaryContact
    ),
    
    // Expand primary contact details
    ExpandPrimaryContact = Table.ExpandRecordColumn(AddPrimaryContact, "PrimaryContact", 
        {"person", "methods"}, 
        {"PrimaryContactPerson", "ContactMethods"}),
    
    // Extract contact methods
    AddContactEmail = Table.AddColumn(ExpandPrimaryContact, "PrimaryEmail", 
        each 
            let
                Methods = [ContactMethods],
                Email = if Methods <> null and Record.HasFields(Methods, "email") 
                        then Methods[email] 
                        else null
            in
                Email
    ),
    
    AddContactPhone = Table.AddColumn(AddContactEmail, "PrimaryPhone", 
        each 
            let
                Methods = [ContactMethods],
                Phone = if Methods <> null and Record.HasFields(Methods, "phone") 
                        then Methods[phone] 
                        else null
            in
                Phone
    ),
    
    // Clean up intermediate columns
    RemoveIntermediateColumns = Table.RemoveColumns(AddContactPhone, {"Contacts", "ContactMethods"}),
    
    // Add order summary
    AddOrderSummary = Table.AddColumn(RemoveIntermediateColumns, "TotalOrderValue", 
        each 
            let
                OrdersList = [Orders],
                OrderAmounts = List.Transform(OrdersList, each [amount]),
                TotalValue = List.Sum(OrderAmounts)
            in
                TotalValue
    ),
    
    FinalCleanup = Table.RemoveColumns(AddOrderSummary, {"Orders"})
in
    FinalCleanup

This approach demonstrates several critical patterns:

  1. Defensive navigation: We always check if fields exist before accessing them
  2. Null handling: We explicitly handle cases where expected data might be missing
  3. Type awareness: We're conscious that nested structures might contain different data types
  4. Step-by-step expansion: Rather than trying to expand everything at once, we work methodically

Performance tip: When working with large JSON files, consider using Json.Document with the encoding parameter explicitly set, and avoid expanding columns you don't need. Each expansion operation creates a new table in memory.

Advanced JSON Pattern Matching

Real-world JSON often contains arrays of mixed objects or optional fields that vary by record. Here's how to handle these situations elegantly.

Consider this product catalog JSON where products can have different attribute structures:

{
  "products": [
    {
      "sku": "LAPTOP-001",
      "name": "Professional Laptop",
      "category": "electronics",
      "attributes": {
        "screen_size": "15.6",
        "ram": "16GB",
        "storage": "512GB SSD"
      },
      "pricing": [
        {"region": "US", "currency": "USD", "price": 1299},
        {"region": "EU", "currency": "EUR", "price": 1199}
      ]
    },
    {
      "sku": "DESK-001", 
      "name": "Standing Desk",
      "category": "furniture",
      "attributes": {
        "material": "bamboo",
        "height_range": "28-48 inches",
        "weight_capacity": "150 lbs"
      },
      "pricing": [
        {"region": "US", "currency": "USD", "price": 599}
      ]
    }
  ]
}

Here's a robust approach to handle this mixed structure:

let
    Source = Json.Document(File.Contents("C:\data\products.json")),
    ProductsList = Source[products],
    ProductsTable = Table.FromList(ProductsList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    
    ExpandBasicInfo = Table.ExpandRecordColumn(ProductsTable, "Column1", 
        {"sku", "name", "category", "attributes", "pricing"}, 
        {"SKU", "ProductName", "Category", "Attributes", "Pricing"}),
    
    // Create a function to safely extract attributes based on product category
    ExtractAttributeValue = (attributeRecord as record, fieldName as text) as any =>
        if attributeRecord <> null and Record.HasFields(attributeRecord, fieldName)
        then Record.Field(attributeRecord, fieldName)
        else null,
    
    // Add category-specific attribute columns
    AddElectronicsAttributes = Table.AddColumn(ExpandBasicInfo, "ScreenSize", 
        each if [Category] = "electronics" 
             then ExtractAttributeValue([Attributes], "screen_size") 
             else null),
             
    AddRAM = Table.AddColumn(AddElectronicsAttributes, "RAM", 
        each if [Category] = "electronics" 
             then ExtractAttributeValue([Attributes], "ram") 
             else null),
             
    AddStorage = Table.AddColumn(AddRAM, "Storage", 
        each if [Category] = "electronics" 
             then ExtractAttributeValue([Attributes], "storage") 
             else null),
             
    // Add furniture attributes
    AddMaterial = Table.AddColumn(AddStorage, "Material", 
        each if [Category] = "furniture" 
             then ExtractAttributeValue([Attributes], "material") 
             else null),
             
    AddDimensions = Table.AddColumn(AddMaterial, "Dimensions", 
        each if [Category] = "furniture" 
             then ExtractAttributeValue([Attributes], "height_range") 
             else null),
    
    // Handle pricing - let's extract US pricing specifically
    AddUSPrice = Table.AddColumn(AddDimensions, "USPrice", 
        each 
            let
                PricingList = [Pricing],
                USPricing = List.Select(PricingList, each [region] = "US"),
                Price = if List.Count(USPricing) > 0 
                        then USPricing{0}[price] 
                        else null
            in
                Price
    ),
    
    // Clean up
    RemoveIntermediateColumns = Table.RemoveColumns(AddUSPrice, {"Attributes", "Pricing"})
in
    RemoveIntermediateColumns

Working with XML Namespaces and Complex Structures

XML processing in M can be more challenging than JSON because of namespaces, attributes, and the distinction between element content and structure. Let's work with a realistic example from an e-commerce order export:

<?xml version="1.0" encoding="UTF-8"?>
<orders xmlns="http://company.com/orders/v1" 
        xmlns:customer="http://company.com/customer/v1"
        xmlns:product="http://company.com/products/v1">
  <order id="ORD-001" date="2024-01-15">
    <customer:info>
      <customer:id>CUST-1001</customer:id>
      <customer:name>Acme Corporation</customer:name>
      <customer:address type="billing">
        <customer:street>123 Business Ave</customer:street>
        <customer:city>Chicago</customer:city>
        <customer:state>IL</customer:state>
        <customer:zip>60601</customer:zip>
      </customer:address>
    </customer:info>
    <items>
      <item>
        <product:sku>LAPTOP-001</product:sku>
        <product:name>Professional Laptop</product:name>
        <quantity>2</quantity>
        <unit-price>1299.00</unit-price>
      </item>
      <item>
        <product:sku>MOUSE-001</product:sku>
        <product:name>Wireless Mouse</product:name>
        <quantity>2</quantity>
        <unit-price>49.99</unit-price>
      </item>
    </items>
  </order>
  <order id="ORD-002" date="2024-01-16">
    <customer:info>
      <customer:id>CUST-1002</customer:id>
      <customer:name>Tech Solutions LLC</customer:name>
      <customer:address type="billing">
        <customer:street>456 Innovation Dr</customer:street>
        <customer:city>Austin</customer:city>
        <customer:state>TX</customer:state>
        <customer:zip>78701</customer:zip>
      </customer:address>
    </customer:info>
    <items>
      <item>
        <product:sku>DESK-001</product:sku>
        <product:name>Standing Desk</product:name>
        <quantity>1</quantity>
        <unit-price>599.00</unit-price>
      </item>
    </items>
  </order>
</orders>

Here's how to navigate this XML structure effectively:

let
    Source = Xml.Tables(File.Contents("C:\data\orders.xml")),
    
    // Navigate to the orders table
    OrdersTable = Source{0}[Table],
    
    // Extract order attributes (id and date are XML attributes)
    ExpandOrderAttributes = Table.ExpandRecordColumn(OrdersTable, "Attribute:id", {"#text"}, {"OrderID"}),
    ExpandDateAttributes = Table.ExpandRecordColumn(ExpandOrderAttributes, "Attribute:date", {"#text"}, {"OrderDate"}),
    
    // Navigate to customer info - this requires understanding the XML namespace structure
    ExpandCustomerInfo = Table.ExpandTableColumn(ExpandDateAttributes, "http://company.com/customer/v1:info", 
        {"http://company.com/customer/v1:id", "http://company.com/customer/v1:name", "http://company.com/customer/v1:address"}),
    
    // Extract customer ID and name
    ExtractCustomerID = Table.ExpandTableColumn(ExpandCustomerInfo, "http://company.com/customer/v1:id", {"#text"}),
    RenameCustomerID = Table.RenameColumns(ExtractCustomerID, {{"#text", "CustomerID"}}),
    
    ExtractCustomerName = Table.ExpandTableColumn(RenameCustomerID, "http://company.com/customer/v1:name", {"#text"}),
    RenameCustomerName = Table.RenameColumns(ExtractCustomerName, {{"#text", "CustomerName"}}),
    
    // Handle the address - it's a complex nested structure
    ExtractAddress = Table.AddColumn(RenameCustomerName, "BillingAddress", 
        each 
            let
                AddressTable = [#"http://company.com/customer/v1:address"],
                // Address contains multiple child elements
                AddressElements = Table.Group(AddressTable, {}, {
                    {"Street", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:street")[#text])},
                    {"City", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:city")[#text])}, 
                    {"State", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:state")[#text])},
                    {"Zip", each List.First(Table.SelectRows(_, each [Name] = "http://company.com/customer/v1:zip")[#text])}
                }),
                FormattedAddress = AddressElements[Street]{0} & ", " & 
                                 AddressElements[City]{0} & ", " & 
                                 AddressElements[State]{0} & " " & 
                                 AddressElements[Zip]{0}
            in
                FormattedAddress
    ),
    
    // Now handle the line items - this is where XML gets tricky
    ExpandItems = Table.ExpandTableColumn(ExtractAddress, "items", {"item"}),
    ExpandLineItems = Table.ExpandListColumn(ExpandItems, "item"),
    
    // Each item is now a table with product information
    ExtractLineItemDetails = Table.AddColumn(ExpandLineItems, "LineItemDetails",
        each 
            let
                ItemTable = [item],
                // Create a record from the line item table
                SKURow = Table.SelectRows(ItemTable, each [Name] = "http://company.com/products/v1:sku"),
                SKU = if Table.RowCount(SKURow) > 0 then SKURow[#text]{0} else null,
                
                NameRow = Table.SelectRows(ItemTable, each [Name] = "http://company.com/products/v1:name"), 
                ProductName = if Table.RowCount(NameRow) > 0 then NameRow[#text]{0} else null,
                
                QtyRow = Table.SelectRows(ItemTable, each [Name] = "quantity"),
                Quantity = if Table.RowCount(QtyRow) > 0 then Number.FromText(QtyRow[#text]{0}) else null,
                
                PriceRow = Table.SelectRows(ItemTable, each [Name] = "unit-price"),
                UnitPrice = if Table.RowCount(PriceRow) > 0 then Number.FromText(PriceRow[#text]{0}) else null,
                
                LineTotal = if Quantity <> null and UnitPrice <> null then Quantity * UnitPrice else null
            in
                [SKU = SKU, ProductName = ProductName, Quantity = Quantity, UnitPrice = UnitPrice, LineTotal = LineTotal]
    ),
    
    // Expand the line item details
    ExpandLineItemDetails = Table.ExpandRecordColumn(ExtractLineItemDetails, "LineItemDetails", 
        {"SKU", "ProductName", "Quantity", "UnitPrice", "LineTotal"}),
    
    // Clean up intermediate columns
    CleanupColumns = Table.RemoveColumns(ExpandLineItemDetails, {
        "http://company.com/customer/v1:address", 
        "item"
    }),
    
    // Convert data types
    ConvertTypes = Table.TransformColumnTypes(CleanupColumns, {
        {"OrderDate", type date},
        {"Quantity", Int64.Type},
        {"UnitPrice", Currency.Type}, 
        {"LineTotal", Currency.Type}
    })
in
    ConvertTypes

XML Navigation Tip: When working with XML namespaces in M, the column names include the full namespace URI. Always examine the intermediate results to understand the exact column naming convention Power Query uses.

Performance Optimization for Large Semi-Structured Files

When working with large JSON or XML files, performance becomes critical. Here are proven strategies to optimize your M code:

Streaming vs. Loading

For files larger than 100MB, consider whether you need to process the entire file at once:

let
    // Instead of loading everything into memory
    // Source = Json.Document(File.Contents("large-file.json")),
    
    // Use streaming when possible
    Source = Json.Document(
        File.Contents("large-file.json"),
        TextEncoding.Utf8
    ),
    
    // Process in chunks if dealing with arrays
    ProcessInBatches = (data as list, batchSize as number) as table =>
        let
            BatchCount = Number.RoundUp(List.Count(data) / batchSize),
            Batches = List.Generate(
                () => 0,
                each _ < BatchCount,
                each _ + 1,
                each List.Range(data, _ * batchSize, batchSize)
            ),
            ProcessedBatches = List.Transform(Batches, each Table.FromList(_, Splitter.SplitByNothing())),
            CombinedResult = Table.Combine(ProcessedBatches)
        in
            CombinedResult
in
    ProcessInBatches

Selective Field Extraction

Don't expand columns you don't need:

let
    Source = Json.Document(File.Contents("data.json")),
    
    // Instead of expanding all columns
    // ExpandAll = Table.ExpandRecordColumn(Source, "Column1", 
    //     Record.FieldNames(Source[Column1]{0})),
    
    // Only expand what you need
    ExpandSelected = Table.ExpandRecordColumn(Source, "Column1", 
        {"id", "name", "created_date"}, 
        {"ID", "Name", "CreatedDate"}),
    
    // Use column removal strategically
    RemoveUnneeded = Table.RemoveColumns(ExpandSelected, {"internal_metadata", "debug_info"})
in
    RemoveUnneeded

Early Filtering

Apply filters as early as possible in your transformation pipeline:

let
    Source = Json.Document(File.Contents("transactions.json")),
    TransactionsList = Source[transactions],
    
    // Filter before expanding - much faster
    RecentTransactions = List.Select(TransactionsList, 
        each Date.FromText([date]) >= #date(2024, 1, 1)),
    
    TransactionsTable = Table.FromList(RecentTransactions, Splitter.SplitByNothing()),
    
    // Now expand only the filtered data
    ExpandTransactions = Table.ExpandRecordColumn(TransactionsTable, "Column1", 
        {"id", "date", "amount", "description"})
in
    ExpandTransactions

Error Handling and Data Validation

Semi-structured data is notorious for inconsistencies. Here's how to build robust error handling into your M code:

let
    Source = Json.Document(File.Contents("api-data.json")),
    
    // Create a safe field extractor function
    SafeFieldExtract = (record as any, fieldName as text, defaultValue as any) as any =>
        try
            if record <> null and Value.Type(record) = type record and Record.HasFields(record, fieldName)
            then Record.Field(record, fieldName)
            else defaultValue
        otherwise defaultValue,
    
    // Create a safe list processor
    SafeListProcess = (listValue as any, processor as function) as any =>
        try
            if listValue <> null and Value.Type(listValue) = type list
            then processor(listValue)
            else null
        otherwise null,
    
    // Create validation function for required fields
    ValidateRecord = (record as any, requiredFields as list) as record =>
        let
            ValidationResults = List.Transform(requiredFields, 
                each [
                    Field = _, 
                    HasValue = SafeFieldExtract(record, _, null) <> null,
                    Value = SafeFieldExtract(record, _, null)
                ]),
            AllValid = List.AllTrue(List.Transform(ValidationResults, each [HasValue])),
            ErrorMessage = if not AllValid 
                          then "Missing required fields: " & Text.Combine(
                              List.Select(ValidationResults, each not [HasValue])[Field], ", ")
                          else null
        in
            [
                IsValid = AllValid,
                ErrorMessage = ErrorMessage,
                Data = record
            ],
    
    ProcessedData = Table.FromList(Source[items], Splitter.SplitByNothing()),
    
    // Add validation
    AddValidation = Table.AddColumn(ProcessedData, "Validation", 
        each ValidateRecord([Column1], {"id", "name", "price"})),
    
    // Filter out invalid records and log errors
    ValidRecords = Table.SelectRows(AddValidation, each [Validation][IsValid] = true),
    InvalidRecords = Table.SelectRows(AddValidation, each [Validation][IsValid] = false),
    
    // Process valid records
    ExtractValidData = Table.AddColumn(ValidRecords, "CleanData", each [Validation][Data]),
    ExpandCleanData = Table.ExpandRecordColumn(ExtractValidData, "CleanData", 
        {"id", "name", "price"}, {"ID", "Name", "Price"}),
    
    FinalTable = Table.RemoveColumns(ExpandCleanData, {"Column1", "Validation"})
in
    FinalTable

Hands-On Exercise: Building a Multi-Source Data Pipeline

Let's put everything together with a realistic scenario. You're tasked with combining customer data from three sources:

  1. A JSON API endpoint with customer profiles
  2. An XML export from the billing system with payment history
  3. A JSON file with customer support tickets

Here's the complete solution:

let
    // Source 1: Customer Profiles (JSON API)
    GetCustomerProfiles = () =>
        let
            ApiResponse = Web.Contents("https://api.company.com/customers", [
                Headers = [Authorization = "Bearer " & ApiKey],
                Query = [limit = "1000", active = "true"]
            ]),
            JsonData = Json.Document(ApiResponse),
            CustomersList = JsonData[data],
            ProfilesTable = Table.FromList(CustomersList, Splitter.SplitByNothing()),
            ExpandProfiles = Table.ExpandRecordColumn(ProfilesTable, "Column1", 
                {"customer_id", "name", "email", "segment", "created_date"}, 
                {"CustomerID", "CustomerName", "Email", "Segment", "CreatedDate"}),
            ConvertTypes = Table.TransformColumnTypes(ExpandProfiles, {
                {"CustomerID", type text},
                {"CreatedDate", type datetime}
            })
        in
            ConvertTypes,
    
    // Source 2: Billing History (XML)
    GetBillingHistory = () =>
        let
            XmlSource = Xml.Tables(File.Contents("C:\data\billing_export.xml")),
            BillingTable = XmlSource{0}[Table],
            ExpandPayments = Table.ExpandTableColumn(BillingTable, "payment", 
                {"customer_id", "amount", "date", "status"}),
            ExtractPaymentData = Table.TransformColumns(ExpandPayments, {
                {"customer_id", each _{0}[#text]},
                {"amount", each Number.FromText(_{0}[#text])},
                {"date", each Date.FromText(_{0}[#text])},
                {"status", each _{0}[#text]}
            }),
            RenameColumns = Table.RenameColumns(ExtractPaymentData, {
                {"customer_id", "CustomerID"},
                {"amount", "PaymentAmount"},
                {"date", "PaymentDate"}, 
                {"status", "PaymentStatus"}
            }),
            // Aggregate to customer level
            GroupPayments = Table.Group(RenameColumns, {"CustomerID"}, {
                {"TotalPayments", each List.Sum([PaymentAmount]), type number},
                {"LastPaymentDate", each List.Max([PaymentDate]), type date},
                {"PaymentCount", each Table.RowCount(_), Int64.Type}
            })
        in
            GroupPayments,
    
    // Source 3: Support Tickets (JSON File)
    GetSupportTickets = () =>
        let
            JsonSource = Json.Document(File.Contents("C:\data\support_tickets.json")),
            TicketsList = JsonSource[tickets],
            TicketsTable = Table.FromList(TicketsList, Splitter.SplitByNothing()),
            ExpandTickets = Table.ExpandRecordColumn(TicketsTable, "Column1", 
                {"ticket_id", "customer_id", "priority", "status", "created_at", "resolved_at"}, 
                {"TicketID", "CustomerID", "Priority", "Status", "CreatedAt", "ResolvedAt"}),
            
            // Handle optional resolved_at field
            SafeResolvedAt = Table.TransformColumns(ExpandTickets, {
                {"ResolvedAt", each if _ <> null then DateTime.FromText(_) else null}
            }),
            
            // Calculate resolution time
            AddResolutionTime = Table.AddColumn(SafeResolvedAt, "ResolutionHours", 
                each if [ResolvedAt] <> null and [CreatedAt] <> null
                     then Duration.TotalHours([ResolvedAt] - DateTime.FromText([CreatedAt]))
                     else null),
            
            // Aggregate to customer level
            GroupTickets = Table.Group(AddResolutionTime, {"CustomerID"}, {
                {"TotalTickets", each Table.RowCount(_), Int64.Type},
                {"OpenTickets", each Table.RowCount(Table.SelectRows(_, each [Status] <> "closed")), Int64.Type},
                {"AvgResolutionHours", each List.Average(List.RemoveNulls([ResolutionHours])), type number},
                {"HighPriorityTickets", each Table.RowCount(Table.SelectRows(_, each [Priority] = "high")), Int64.Type}
            })
        in
            GroupTickets,
    
    // Combine all sources
    CustomerProfiles = GetCustomerProfiles(),
    BillingHistory = GetBillingHistory(),
    SupportTickets = GetSupportTickets(),
    
    // Join data sources
    JoinBilling = Table.NestedJoin(CustomerProfiles, {"CustomerID"}, BillingHistory, {"CustomerID"}, 
        "BillingData", JoinKind.LeftOuter),
    ExpandBilling = Table.ExpandTableColumn(JoinBilling, "BillingData", 
        {"TotalPayments", "LastPaymentDate", "PaymentCount"}),
    
    JoinSupport = Table.NestedJoin(ExpandBilling, {"CustomerID"}, SupportTickets, {"CustomerID"}, 
        "SupportData", JoinKind.LeftOuter),
    ExpandSupport = Table.ExpandTableColumn(JoinSupport, "SupportData", 
        {"TotalTickets", "OpenTickets", "AvgResolutionHours", "HighPriorityTickets"}),
    
    // Handle nulls from left joins
    FillNulls = Table.ReplaceValue(ExpandSupport, null, 0, Replacer.ReplaceValue, 
        {"TotalPayments", "PaymentCount", "TotalTickets", "OpenTickets", "HighPriorityTickets"}),
    
    // Calculate customer health score
    AddHealthScore = Table.AddColumn(FillNulls, "HealthScore", 
        each 
            let
                PaymentScore = if [TotalPayments] > 10000 then 40 
                              else if [TotalPayments] > 5000 then 30
                              else if [TotalPayments] > 1000 then 20 
                              else 10,
                SupportScore = if [OpenTickets] = 0 then 30
                              else if [OpenTickets] <= 2 then 20
                              else 10,
                EngagementScore = if [PaymentCount] > 10 then 30
                                 else if [PaymentCount] > 5 then 20
                                 else 10
            in
                PaymentScore + SupportScore + EngagementScore
    ),
    
    // Final data type conversions
    FinalTypes = Table.TransformColumnTypes(AddHealthScore, {
        {"TotalPayments", Currency.Type},
        {"LastPaymentDate", type date},
        {"AvgResolutionHours", type number}
    })
in
    FinalTypes

This example demonstrates:

  • Handling multiple data source types within one query
  • Robust error handling for missing fields
  • Efficient aggregation and joining strategies
  • Business logic implementation (health score calculation)
  • Proper data type management

Production Tip: When building multi-source pipelines like this, consider implementing each source as a separate function in Power Query. This makes debugging easier and allows for better code reuse.

Common Mistakes & Troubleshooting

Mistake 1: Over-expanding Columns

Problem: Expanding every available column leads to performance issues and cluttered data models.

Solution: Always be selective about which columns you expand. Use Table.ColumnNames() to inspect available columns first:

// Inspect available columns before expanding
let
    Source = Json.Document(File.Contents("data.json")),
    SampleRecord = Source{0},
    AvailableFields = Record.FieldNames(SampleRecord)
in
    AvailableFields

Mistake 2: Ignoring Data Type Variations

Problem: Assuming all records have the same structure leads to errors when field types vary.

Solution: Always use defensive navigation:

// Wrong approach
BadExtraction = [Amount],

// Correct approach
GoodExtraction = if Value.Type([Amount]) = type number 
                 then [Amount]
                 else if Value.Type([Amount]) = type text 
                      then Number.FromText([Amount])
                      else null

Mistake 3: Not Handling Empty Arrays or Null Values

Problem: List operations fail when lists are empty or null.

Solution: Always check list contents before processing:

SafeListSum = (listValue as any) as number =>
    if listValue <> null and Value.Type(listValue) = type list and List.Count(listValue) > 0
    then List.Sum(listValue)
    else 0

Mistake 4: Inefficient XML Namespace Handling

Problem: Hard-coding namespace URIs makes code brittle.

Solution: Extract namespaces programmatically or use partial matching:

// Create a helper function for namespace-aware field extraction
ExtractXmlField = (xmlTable as table, localName as text) as any =>
    let
        MatchingRows = Table.SelectRows(xmlTable, 
            each Text.EndsWith([Name], ":" & localName) or [Name] = localName),
        Result = if Table.RowCount(MatchingRows) > 0 
                 then MatchingRows[#text]{0}
                 else null
    in
        Result

Troubleshooting Memory Issues

When processing large files, you might encounter memory limitations. Here's a diagnostic approach:

// Add this at different points in your query to monitor memory usage
let
    Source = Json.Document(File.Contents("large-file.json")),
    
    // Check record count at various stages
    StageOneCount = List.Count(Source[data]),
    
    ProcessedData = Table.FromList(Source[data], Splitter.SplitByNothing()),
    
    StageTwoCount = Table.RowCount(ProcessedData),
    
    // If memory issues occur, implement batching
    BatchProcessor = (data as list, batchSize as number) =>
        let
            TotalRecords = List.Count(data),
            BatchCount = Number.RoundUp(TotalRecords / batchSize),
            ProcessBatch = (batchIndex as number) =>
                let
                    StartIndex = batchIndex * batchSize,
                    BatchData = List.Range(data, StartIndex, batchSize),
                    BatchTable = Table.FromList(BatchData, Splitter.SplitByNothing())
                in
                    BatchTable,
            
            AllBatches = List.Transform({0..BatchCount-1}, ProcessBatch),
            CombinedResult = Table.Combine(AllBatches)
        in
            CombinedResult
in
    if StageOneCount > 50000 
    then BatchProcessor(Source[data], 10000)
    else ProcessedData

Summary & Next Steps

Working effectively with JSON and XML in Power Query's M language requires understanding that you're navigating tree structures, not flat tables. The key principles we've covered are:

  1. Navigate defensively: Always check for field existence and handle null values gracefully
  2. Be selective: Only expand the columns and process the data you actually need
  3. Handle variations: Real-world semi-structured data rarely follows a consistent schema
  4. Optimize early: Apply filters and transformations as early as possible in your pipeline
  5. Plan for scale: Consider memory usage and processing time for production datasets

The techniques you've learned here—defensive navigation, performance optimization, error handling, and multi-source integration—form the foundation for building robust data pipelines that can handle the complexity of real-world semi-structured data.

Your next steps should be:

  • Practice with your own organization's JSON and XML sources
  • Experiment with the Web.Contents function for API integration
  • Learn about Power Query's query folding capabilities for database sources
  • Explore advanced M functions like List.Generate for complex transformations

Remember that Power Query is just one tool in the modern data stack. As you become more comfortable with M, consider when other approaches—like Azure Functions for complex JSON processing or dedicated ETL tools for high-volume XML processing—might be more appropriate for your specific use cases.

Learning Path: Advanced M Language

Previous

Dynamic Queries with M: Build Flexible, Reusable Transformations

Next

Advanced JSON and XML Processing in Power Query M Language

Related Articles

Power Query🌱 Foundation

Advanced M: Iterators, Accumulators, and Recursive Patterns

13 min
Power Query🔥 Expert

Building a Reusable Function Library in Power Query

30 min
Power Query⚡ Practitioner

M Language Performance Patterns and Anti-Patterns: Optimize Power Query for Speed

15 min

On this page

  • Prerequisites
  • Understanding M's Approach to Semi-Structured Data
  • Navigating JSON Hierarchies with Precision
  • Advanced JSON Pattern Matching
  • Working with XML Namespaces and Complex Structures
  • Performance Optimization for Large Semi-Structured Files
  • Streaming vs. Loading
  • Selective Field Extraction
  • Early Filtering
  • Error Handling and Data Validation
  • Mistake 1: Over-expanding Columns
  • Mistake 2: Ignoring Data Type Variations
  • Mistake 3: Not Handling Empty Arrays or Null Values
  • Mistake 4: Inefficient XML Namespace Handling
  • Troubleshooting Memory Issues
  • Summary & Next Steps
  • Hands-On Exercise: Building a Multi-Source Data Pipeline
  • Common Mistakes & Troubleshooting
  • Mistake 1: Over-expanding Columns
  • Mistake 2: Ignoring Data Type Variations
  • Mistake 3: Not Handling Empty Arrays or Null Values
  • Mistake 4: Inefficient XML Namespace Handling
  • Troubleshooting Memory Issues
  • Summary & Next Steps