
Imagine you're working with a customer database where order dates arrive as "March 15, 2024 @ 2:30 PM EST" and customer names come in as "SMITH, JOHN A." Your analysis needs clean date columns for time-based calculations and properly formatted names for customer communications. Raw data rarely arrives in the format you need, and Power Query's text and date transformation capabilities are what separate efficient analysts from those still wrestling with Excel formulas.
Power Query provides a comprehensive toolkit for transforming messy text and date data into analysis-ready formats. Whether you're parsing timestamps from log files, cleaning imported text data, or performing complex date calculations, mastering these functions will dramatically improve your data preparation workflow.
What you'll learn: how to clean and standardize messy text with Text.Trim, Text.Proper, and conditional logic; extract patterns such as ratings, phone numbers, and email-like strings; parse mixed date formats with try ... otherwise fallback chains; perform business-day, recency, and time zone calculations; and optimize text- and date-heavy transformations for performance.
Prerequisites: You should be comfortable with Power Query's interface and basic M language syntax, and understand data types. Familiarity with creating custom columns and applying transformations through the UI will help you follow along more effectively.
Power Query treats text as a fundamental data type, but real-world text data comes with challenges: inconsistent formatting, embedded special characters, mixed case conventions, and cultural variations. Understanding how Power Query handles these scenarios is crucial for building robust transformations.
Let's start with a realistic dataset representing customer feedback data:
let
Source = Table.FromRecords({
[ID = 1, CustomerName = " JOHNSON, MARY E. ",
FeedbackDate = "2024-03-15T14:30:00Z",
Comment = "Great service! Will definitely return. 5/5 stars."],
[ID = 2, CustomerName = "smith, john a.",
FeedbackDate = "March 20th, 2024 at 10:15 AM",
Comment = "Product arrived damaged. Disappointed. 2/5"],
[ID = 3, CustomerName = "Dr. Patricia Williams-Chen",
FeedbackDate = "24/03/2024 16:45",
Comment = "Excellent customer support team! Highly recommend."],
[ID = 4, CustomerName = "MARTINEZ,CARLOS",
FeedbackDate = "2024-03-28",
Comment = "Average experience. Nothing special. 3 out of 5 stars"]
})
in
Source
This dataset demonstrates common text and date challenges: inconsistent name formatting, multiple date formats, varying whitespace, and unstructured comment text. Let's systematically address each issue.
The first step in text processing is cleaning. Power Query's Text.Trim, Text.Proper, and Text.Clean functions handle most basic cleanup:
CleanedNames = Table.AddColumn(Source, "CleanName", each
Text.Proper(Text.Trim([CustomerName]))
)
But this basic approach doesn't handle our specific formatting needs. Let's create a more sophisticated name standardization:
StandardizedNames = Table.AddColumn(Source, "StandardizedName", each
let
// Remove extra spaces and convert to proper case
trimmed = Text.Trim([CustomerName]),
properCase = Text.Proper(trimmed),
// Handle "LASTNAME, FIRSTNAME" format
nameResult = if Text.Contains(properCase, ",") then
let
parts = Text.Split(properCase, ","),
lastName = Text.Trim(List.First(parts)),
firstName = Text.Trim(List.Last(parts))
in
firstName & " " & lastName
else
properCase
in
nameResult
)
This approach handles comma-separated formats while preserving titles and hyphenated names. The key insight is using conditional logic within custom columns to handle different text patterns.
Real-world text data often contains patterns that require extraction. Let's extract star ratings from our comment field:
ExtractRatings = Table.AddColumn(StandardizedNames, "Rating", each
let
comment = [Comment],
// Look for patterns like "5/5", "3 out of 5", "4/5 stars"
pattern1 = Text.PositionOf(comment, "/5"),
pattern2 = Text.PositionOf(comment, " out of 5"),
extractedRating = if pattern1 >= 0 then
// Handle "X/5" format: take the character just before the slash
let
beforeSlash = Text.End(Text.Start(comment, pattern1), 1)
in
// Number.FromText raises an error on non-numeric text,
// so guard the conversion with try ... otherwise
try Number.FromText(beforeSlash) otherwise null
else if pattern2 >= 0 then
// Handle "X out of 5" format
let
beforePhrase = Text.Start(comment, pattern2),
lastChar = Text.End(beforePhrase, 1)
in
try Number.FromText(lastChar) otherwise null
else
null
in
extractedRating
)
This demonstrates Power Query's text position functions and conditional logic for pattern matching. Note that Number.FromText raises an error on non-numeric text rather than returning a value you can test, so the conversion must be guarded instead of converted blindly.
Date handling in Power Query requires understanding both the parsing capabilities and the cultural context of your data. Power Query's automatic date detection works well for standard formats, but real-world data often requires explicit parsing strategies.
Our sample data contains four different date formats. Let's create a robust date parser that handles all of them:
ParsedDates = Table.AddColumn(ExtractRatings, "ParsedDate", each
let
dateText = [FeedbackDate],
// Try ISO format first (most reliable); some environments reject the
// trailing "Z", so fall back to DateTimeZone.FromText for zoned text
isoResult = try DateTime.FromText(dateText)
otherwise (try DateTime.From(DateTimeZone.FromText(dateText)) otherwise null),
// Try standard formats
standardResult = if isoResult = null then
try Date.FromText(dateText) otherwise null
else
DateTime.Date(isoResult),
// Handle "March 20th, 2024 at 10:15 AM" format
longFormatResult = if standardResult = null then
try
let
// Remove ordinal indicators (st, nd, rd, th)
cleanText = Text.Replace(Text.Replace(Text.Replace(Text.Replace(
dateText, "st,", ","), "nd,", ","), "rd,", ","), "th,", ","),
// Remove "at" for datetime parsing
cleanerText = Text.Replace(cleanText, " at ", " "),
parsed = DateTime.FromText(cleanerText)
in
DateTime.Date(parsed)
otherwise null
else
standardResult,
// Handle DD/MM/YYYY format (European)
euroFormatResult = if longFormatResult = null then
try
let
parts = Text.Split(dateText, "/"),
day = Number.FromText(List.First(parts)),
month = Number.FromText(parts{1}),
yearPart = Text.Split(List.Last(parts), " "){0},
year = Number.FromText(yearPart)
in
#date(year, month, day)
otherwise null
else
longFormatResult
in
euroFormatResult
)
This parsing strategy uses a fallback approach: try the most reliable format first, then progressively handle more complex patterns. The try...otherwise construct is essential for robust date parsing.
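One way to keep such fallback chains maintainable is to generalize them into a small helper that folds a list of parser functions with List.Accumulate, keeping the first successful result. This is a sketch, not part of the original query; the function and usage names are illustrative:

```m
// Try each parser in order, returning the first non-error result
ParseWithFallbacks = (dateText as text, parsers as list) as nullable date =>
    List.Accumulate(
        parsers,
        null,
        (result, parser) =>
            if result <> null then result
            else try parser(dateText) otherwise null
    ),

// Hypothetical usage: try ISO datetime first, then a plain date
ParsedExample = ParseWithFallbacks(
    "2024-03-28",
    { each DateTime.Date(DateTime.FromText(_)), each Date.FromText(_) }
)
```

Adding a new format then means appending one function to the list instead of nesting another if/try layer.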
Once dates are properly parsed, you'll often need business-specific calculations. Let's add columns for business days since feedback and categorize feedback by recency:
BusinessDateLogic = Table.AddColumn(ParsedDates, "BusinessDaysAgo", each
let
feedbackDate = [ParsedDate],
today = Date.From(DateTime.LocalNow()),
daysDifference = Duration.Days(today - feedbackDate),
// Calculate business days (excluding weekends); passing Day.Monday makes
// 0..4 reliably mean Monday..Friday regardless of the culture default
businessDays = if daysDifference <= 0 then 0 else
List.Sum(
List.Transform(
{0..daysDifference - 1},
each if Date.DayOfWeek(Date.AddDays(feedbackDate, _), Day.Monday) < 5
then 1 else 0
)
)
in
businessDays
)
Then add a recency category:
FeedbackRecency = Table.AddColumn(BusinessDateLogic, "RecencyCategory", each
let
daysAgo = [BusinessDaysAgo]
in
if daysAgo <= 1 then "Today/Yesterday"
else if daysAgo <= 5 then "This Week"
else if daysAgo <= 30 then "This Month"
else "Older"
)
When working with datetime data, time zone handling becomes critical. Power Query provides functions for time zone conversion:
TimeZoneAdjustment = Table.AddColumn(Source, "LocalDateTime", each
let
// Parse text that carries zone information (such as a trailing "Z")
zonedDateTime = try DateTimeZone.FromText([FeedbackDate]) otherwise null,
// Convert to the machine's local time zone; for a fixed offset such as
// US Eastern, use DateTimeZone.SwitchZone(zonedDateTime, -5) instead
localDateTime = if zonedDateTime = null then null
else DateTimeZone.ToLocal(zonedDateTime)
in
localDateTime
)
For applications requiring precise time calculations, consider storing both UTC and local times, especially when data spans multiple time zones.
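A sketch of that dual-column approach, assuming the source text carries zone information such as a trailing "Z" (rows without it become null):

```m
// Keep the UTC instant and its local rendering side by side
DualTimeColumns =
    let
        AddUtc = Table.AddColumn(Source, "UtcDateTime", each
            try DateTimeZone.ToUtc(DateTimeZone.FromText([FeedbackDate]))
            otherwise null),
        AddLocal = Table.AddColumn(AddUtc, "LocalDateTime", each
            if [UtcDateTime] = null then null
            else DateTimeZone.ToLocal([UtcDateTime]))
    in
        AddLocal
```

The UTC column gives you a stable value for arithmetic and joins; the local column is for display.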
Beyond basic cleaning, text data often requires sophisticated transformations for analysis. Let's explore techniques for handling complex text scenarios.
Consider parsing structured text like addresses or product codes. Here's how to handle variable-length, delimited text:
ParseStructuredText = Table.AddColumn(Source, "ParsedAddress", each
let
// A literal example address; in practice this would be a column reference
address = "123 Main St, Apt 4B, Springfield, IL, 62701",
parts = Text.Split(address, ", "),
// Create a record with flexible field assignment
parsedAddress = [
Street = parts{0}?,
Unit = if List.Count(parts) > 4 then parts{1}? else null,
City = if List.Count(parts) > 4 then parts{2}? else parts{1}?,
State = if List.Count(parts) > 4 then parts{3}? else parts{2}?,
Zip = List.Last(parts)
]
in
parsedAddress
)
The ? operator safely handles cases where list indices might not exist, preventing errors when text structure varies.
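A minimal illustration of the difference between plain and optional access (the names here are illustrative):

```m
SafeIndexingDemo =
    let
        parts = Text.Split("one, two", ", ")
    in
        [
            First = parts{0}?,          // "one"
            Missing = parts{5}?,        // null — plain parts{5} would raise an error
            OptionalField = [A = 1][B]? // optional field access also returns null
        ]
```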
Power Query can perform basic text analysis for categorization purposes:
TextAnalysis = Table.AddColumn(Source, "SentimentIndicators", each
let
comment = Text.Lower([Comment]),
positiveWords = {"great", "excellent", "recommend", "fantastic", "amazing"},
negativeWords = {"disappointed", "terrible", "awful", "damaged", "worst"},
positiveCount = List.Sum(
List.Transform(positiveWords, each if Text.Contains(comment, _) then 1 else 0)
),
negativeCount = List.Sum(
List.Transform(negativeWords, each if Text.Contains(comment, _) then 1 else 0)
),
sentiment = if positiveCount > negativeCount then "Positive"
else if negativeCount > positiveCount then "Negative"
else "Neutral"
in
[PositiveWords = positiveCount, NegativeWords = negativeCount, Sentiment = sentiment]
)
This approach creates structured sentiment indicators from unstructured text, useful for basic text classification before more sophisticated analysis.
While Power Query doesn't have native regex support, you can achieve similar results with text functions for common patterns:
ExtractPatterns = Table.AddColumn(Source, "ExtractedInfo", each
let
text = [Comment],
// Extract email-like patterns (simplified)
emailPattern = let
atPosition = Text.PositionOf(text, "@"),
beforeAt = if atPosition > 0 then
Text.End(Text.Start(text, atPosition),
List.Min({atPosition, 20})) else "",
afterAt = if atPosition > 0 then
Text.Start(Text.End(text, Text.Length(text) - atPosition - 1), 20) else "",
email = if atPosition > 0 and Text.Contains(afterAt, ".") then
beforeAt & "@" & afterAt else null
in
email,
// Extract phone numbers (US format)
phoneExtraction = let
digits = Text.Select(text, {"0".."9"}),
phone = if Text.Length(digits) = 10 then
Text.Start(digits, 3) & "-" &
Text.Middle(digits, 3, 3) & "-" &
Text.End(digits, 4)
else null
in
phone
in
[Email = emailPattern, Phone = phoneExtraction]
)
When working with large datasets, text and date operations can become performance bottlenecks. Understanding optimization strategies is crucial for production environments.
Text operations are inherently expensive, so minimize unnecessary transformations:
// Inefficient: Multiple passes through the data
BadExample =
Table.AddColumn(Source, "Step1", each Text.Trim([CustomerName])),
AddStep2 = Table.AddColumn(BadExample, "Step2", each Text.Proper([Step1])),
AddStep3 = Table.AddColumn(AddStep2, "Final", each Text.Replace([Step2], ",", ""))
// Efficient: Single pass with combined logic
GoodExample = Table.AddColumn(Source, "CleanName", each
let
cleaned = Text.Replace(Text.Proper(Text.Trim([CustomerName])), ",", "")
in
cleaned
)
For complex transformations, consider using Table.TransformColumns to modify existing columns rather than creating new ones:
EfficientTransform = Table.TransformColumns(Source, {
{"CustomerName", each Text.Proper(Text.Trim(_)), type text},
{"FeedbackDate", each try Date.FromText(_) otherwise null, type nullable date}
})
Date calculations can be optimized by hoisting values that don't change per row out of the row-level function, and by using Power Query's built-in duration functions rather than manual arithmetic:
// Less efficient: recomputes "today" for every row
ManualDateCalc = Table.AddColumn(ParsedDates, "DaysDiff", each
Duration.Days(Date.From(DateTime.LocalNow()) - [ParsedDate])
)
// More efficient: evaluate "today" once, outside the per-row function
OptimizedDateCalc =
let
today = Date.From(DateTime.LocalNow())
in
Table.AddColumn(ParsedDates, "DaysDiff", each Duration.Days(today - [ParsedDate]))
For large text datasets, consider the memory implications of your transformations:
// Memory-efficient approach for large datasets
StreamlinedProcessing =
let
// Filter early to reduce data volume
FilteredData = Table.SelectRows(Source, each [CustomerName] <> null),
// Combine multiple text operations
ProcessedData = Table.TransformColumns(FilteredData, {
{"CustomerName", each
let
cleaned = Text.Trim(_),
proper = Text.Proper(cleaned),
final = if Text.Contains(proper, ",") then
Text.Trim(List.Last(Text.Split(proper, ","))) & " " &
Text.Trim(List.First(Text.Split(proper, ",")))
else proper
in
final
}
})
in
ProcessedData
Let's apply these concepts by building a comprehensive customer data processor that handles real-world messy data:
let
// Sample problematic customer data
RawData = Table.FromRecords({
[CustomerID = "CUST001", Name = " ANDERSON, JAMES R. ",
SignupDate = "2024-01-15T09:30:00Z", LastPurchase = "March 10th, 2024",
Phone = "5551234567", Email = " JAMES.ANDERSON@EMAIL.COM ",
Address = "123 Oak St, Suite 200, Chicago, IL, 60601"],
[CustomerID = "CUST002", Name = "martinez, maria",
SignupDate = "15/02/2024", LastPurchase = "2024-03-20 14:30",
Phone = "(555) 987-6543", Email = "maria.martinez@company.org",
Address = "456 Pine Ave, Denver, CO, 80202"],
[CustomerID = "CUST003", Name = "Dr. Patricia Williams-Chen PhD",
SignupDate = "Feb 28, 2024 @ 3:45 PM", LastPurchase = "28/03/2024",
Phone = "555.876.5432", Email = "p.williams.chen@university.edu",
Address = "789 Elm Blvd, Apt 15B, Austin, TX, 78701"]
}),
// Step 1: Clean and standardize names
CleanNames = Table.AddColumn(RawData, "StandardName", each
let
rawName = Text.Trim([Name]),
properCase = Text.Proper(rawName),
// Handle "LASTNAME, FIRSTNAME" format, but leave titled names alone
// (check the raw text: Text.Proper turns "PhD" into "Phd")
standardized = if Text.Contains(properCase, ",") and
not Text.Contains(rawName, "Dr.") and
not Text.Contains(rawName, "PhD") then
let
parts = Text.Split(properCase, ","),
lastName = Text.Trim(List.First(parts)),
firstName = Text.Trim(List.Last(parts))
in
firstName & " " & lastName
else
properCase
in
standardized
),
// Step 2: Parse multiple date formats
ParseDates = Table.AddColumn(CleanNames, "ParsedSignup", each
let
dateText = [SignupDate],
// Try ISO format first
isoResult = try DateTime.FromText(dateText) otherwise null,
// Try DD/MM/YYYY format
euroResult = if isoResult = null then
try
let
parts = Text.Split(dateText, "/"),
day = Number.FromText(List.First(parts)),
month = Number.FromText(parts{1}),
year = Number.FromText(parts{2})
in
#date(year, month, day)
otherwise null
else DateTime.Date(isoResult),
// Try "Feb 28, 2024 @ 3:45 PM" format
longResult = if euroResult = null then
try
let
cleanText = Text.Replace(dateText, " @ ", " "),
parsed = DateTime.FromText(cleanText)
in
DateTime.Date(parsed)
otherwise null
else euroResult
in
longResult
),
// Step 3: Standardize phone numbers
StandardizePhones = Table.AddColumn(ParseDates, "StandardPhone", each
let
phone = [Phone],
digitsOnly = Text.Select(phone, {"0".."9"}),
formatted = if Text.Length(digitsOnly) = 10 then
"(" & Text.Start(digitsOnly, 3) & ") " &
Text.Middle(digitsOnly, 3, 3) & "-" &
Text.End(digitsOnly, 4)
else phone
in
formatted
),
// Step 4: Clean email addresses
CleanEmails = Table.AddColumn(StandardizePhones, "StandardEmail", each
Text.Lower(Text.Trim([Email]))
),
// Step 5: Parse addresses into components
ParseAddresses = Table.AddColumn(CleanEmails, "AddressComponents", each
let
address = [Address],
parts = Text.Split(address, ", "),
components = [
Street = parts{0}?,
Unit = if List.Count(parts) > 4 then parts{1}? else null,
City = if List.Count(parts) > 4 then parts{2}? else parts{1}?,
State = if List.Count(parts) > 4 then parts{3}? else parts{2}?,
Zip = List.Last(parts)
]
in
components
),
// Step 6: Add business metrics
AddMetrics = Table.AddColumn(ParseAddresses, "CustomerMetrics", each
let
signupDate = [ParsedSignup],
today = Date.From(DateTime.LocalNow()),
daysSinceSignup = Duration.Days(today - signupDate),
metrics = [
DaysSinceSignup = daysSinceSignup,
CustomerTenure = if daysSinceSignup < 30 then "New"
else if daysSinceSignup < 90 then "Recent"
else "Established"
]
in
metrics
),
// Step 7: Final cleanup - remove intermediate columns and expand records
FinalResult =
let
// Remove original messy columns
CleanedTable = Table.RemoveColumns(AddMetrics,
{"Name", "SignupDate", "Phone", "Email", "Address"}),
// Expand the address components
ExpandAddress = Table.ExpandRecordColumn(CleanedTable,
"AddressComponents",
{"Street", "Unit", "City", "State", "Zip"}),
// Expand customer metrics
ExpandMetrics = Table.ExpandRecordColumn(ExpandAddress,
"CustomerMetrics",
{"DaysSinceSignup", "CustomerTenure"})
in
ExpandMetrics
in
FinalResult
This comprehensive example demonstrates how to chain multiple text and date transformations efficiently while handling various edge cases and data quality issues.
The most common issue with date parsing is cultural format assumptions. Power Query's automatic detection may interpret "01/02/2024" as January 2nd when you expect February 1st:
// Problem: Ambiguous date interpretation
ProblematicParsing = Date.FromText("01/02/2024") // Could be Jan 2 or Feb 1
// Solution: Explicit format specification
SafeParsing = Date.FromText("01/02/2024", [Format="dd/MM/yyyy"])
Always validate your date parsing results with sample data from different regions or systems.
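One quick validation technique (a sketch, assuming the options-record overload of Date.FromText) is to parse the same text under two cultures and flag rows where the interpretations disagree:

```m
AmbiguityCheck = Table.AddColumn(Source, "DateIsAmbiguous", each
    let
        usDate = try Date.FromText([FeedbackDate], [Culture = "en-US"]) otherwise null,
        gbDate = try Date.FromText([FeedbackDate], [Culture = "en-GB"]) otherwise null
    in
        usDate <> null and gbDate <> null and usDate <> gbDate
)
```

"01/02/2024" parses under both cultures but to different dates, so it would be flagged; "2024-03-28" parses identically and would not.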
Text functions in Power Query generally propagate nulls (Text.Proper(null) simply returns null), but operators such as & raise errors on null input, and silently propagated nulls can hide data quality problems downstream. Make null handling explicit:
// Problem: concatenation errors on null input
BadTextHandling = Table.AddColumn(Source, "Greeting", each
"Dear " & Text.Proper([CustomerName]) // errors if CustomerName is null
)
// Solution: null-safe processing
SafeTextHandling = Table.AddColumn(Source, "Greeting", each
if [CustomerName] = null then null
else "Dear " & Text.Proper([CustomerName])
)
Text processing on large datasets can cause memory issues. Monitor your query performance and consider these strategies:
Tip: Use the Power Query performance profiler to identify bottlenecks in text-heavy transformations. Text.Contains operations are particularly expensive when applied to large datasets.
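One mitigation worth trying (a sketch using the sentiment example from earlier) is to buffer small lookup lists once with List.Buffer, so they are held in memory rather than re-evaluated while scanning every row:

```m
BufferedSentiment =
    let
        // Buffer the keyword list once, outside the per-row function
        positiveWords = List.Buffer({"great", "excellent", "recommend"}),
        Scored = Table.AddColumn(Source, "PositiveHits", each
            let
                comment = Text.Lower([Comment])
            in
                List.Count(List.Select(positiveWords,
                    (word) => Text.Contains(comment, word))))
    in
        Scored
```

The gain depends on the source; measure with the performance profiler rather than assuming.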
// Optimize by filtering early
EfficientProcessing =
let
FilteredSource = Table.SelectRows(Source, each [CustomerName] <> null),
ProcessedData = Table.TransformColumns(FilteredSource, ...)
in
ProcessedData
When working with international data, be aware of Unicode normalization issues:
// Standardize text for consistent comparisons (note: Text.Clean removes
// control and non-printable characters; it does not perform Unicode
// normalization)
NormalizedText = Table.AddColumn(Source, "NormalizedName", each
let
normalized = Text.Clean([CustomerName]),
// Remove diacritics for standardization if needed
simplified = Text.Replace(Text.Replace(normalized, "é", "e"), "ñ", "n")
in
simplified
)
For production systems handling international data, consider more comprehensive Unicode normalization strategies.
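One way to scale the replacement approach beyond a few nested Text.Replace calls (a sketch; the character map here is illustrative, not exhaustive) is to fold a substitution table with List.Accumulate:

```m
FoldDiacritics = (input as text) as text =>
    List.Accumulate(
        { {"é", "e"}, {"è", "e"}, {"ñ", "n"}, {"ü", "u"}, {"ç", "c"} },
        input,
        (state, pair) => Text.Replace(state, pair{0}, pair{1})
    )
// FoldDiacritics("Peña Müller") returns "Pena Muller"
```

Extending coverage is then a matter of growing the list, not nesting more function calls.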
You've now mastered the essential techniques for handling text and date data in Power Query. These skills form the foundation for most data preparation tasks you'll encounter in professional environments.
Key concepts covered: cleaning and standardizing text with Text.Trim and Text.Proper; extracting patterns with text position and selection functions; parsing mixed date formats through try ... otherwise fallback chains; business-day, recency, and time zone calculations; and performance techniques such as filtering early and combining transformations.
Immediate next steps:
Advanced topics to explore:
The techniques you've learned here will serve as building blocks for more advanced Power Query scenarios, including complex data modeling, automated report generation, and enterprise data integration workflows.
Learning Path: Power Query Essentials