
Imagine you're working with a customer database where order dates arrive as "March 15, 2024 @ 2:30 PM EST" and customer names come in as "SMITH, JOHN A." Your analysis needs clean date columns for time-based calculations and properly formatted names for customer communications. Raw data rarely arrives in the format you need, and Power Query's text and date transformation capabilities are what separate efficient analysts from those still wrestling with Excel formulas.
Power Query provides a comprehensive toolkit for transforming messy text and date data into analysis-ready formats. Whether you're parsing timestamps from log files, cleaning imported text data, or performing complex date calculations, mastering these functions will dramatically improve your data preparation workflow.
What you'll learn: how to clean and standardize messy text with Text.Trim, Text.Proper, and conditional logic; extract patterns such as ratings, phone numbers, and email-like strings; parse mixed date formats with try ... otherwise fallback chains; perform business-day, recency, and time zone calculations; and optimize text- and date-heavy transformations for performance.
Prerequisites: You should be comfortable with Power Query's interface and basic M language syntax, and understand data types. Familiarity with creating custom columns and applying transformations through the UI will help you follow along more effectively.
Power Query treats text as a fundamental data type, but real-world text data comes with challenges: inconsistent formatting, embedded special characters, mixed case conventions, and cultural variations. Understanding how Power Query handles these scenarios is crucial for building robust transformations.
Let's start with a realistic dataset representing customer feedback data:
let
Source = Table.FromRecords({
[ID = 1, CustomerName = " JOHNSON, MARY E. ",
FeedbackDate = "2024-03-15T14:30:00Z",
Comment = "Great service! Will definitely return. 5/5 stars."],
[ID = 2, CustomerName = "smith, john a.",
FeedbackDate = "March 20th, 2024 at 10:15 AM",
Comment = "Product arrived damaged. Disappointed. 2/5"],
[ID = 3, CustomerName = "Dr. Patricia Williams-Chen",
FeedbackDate = "24/03/2024 16:45",
Comment = "Excellent customer support team! Highly recommend."],
[ID = 4, CustomerName = "MARTINEZ,CARLOS",
FeedbackDate = "2024-03-28",
Comment = "Average experience. Nothing special. 3 out of 5 stars"]
})
in
Source
This dataset demonstrates common text and date challenges: inconsistent name formatting, multiple date formats, varying whitespace, and unstructured comment text. Let's systematically address each issue.
The first step in text processing is cleaning. Power Query's Text.Trim, Text.Proper, and Text.Clean functions handle most basic cleanup:
CleanedNames = Table.AddColumn(Source, "CleanName", each
Text.Proper(Text.Trim([CustomerName]))
)
But this basic approach doesn't handle our specific formatting needs. Let's create a more sophisticated name standardization:
StandardizedNames = Table.AddColumn(Source, "StandardizedName", each
let
// Remove extra spaces and convert to proper case
trimmed = Text.Trim([CustomerName]),
properCase = Text.Proper(trimmed),
// Handle "LASTNAME, FIRSTNAME" format
nameResult = if Text.Contains(properCase, ",") then
let
parts = Text.Split(properCase, ","),
lastName = Text.Trim(List.First(parts)),
firstName = Text.Trim(List.Last(parts))
in
firstName & " " & lastName
else
properCase
in
nameResult
)
This approach handles comma-separated formats while preserving titles and hyphenated names. The key insight is using conditional logic within custom columns to handle different text patterns.
Real-world text data often contains patterns that require extraction. Let's extract star ratings from our comment field:
ExtractRatings = Table.AddColumn(StandardizedNames, "Rating", each
let
comment = [Comment],
// Look for patterns like "5/5", "3 out of 5", "4/5 stars"
pattern1 = Text.PositionOf(comment, "/5"),
pattern2 = Text.PositionOf(comment, " out of 5"),
extractedRating = if pattern1 >= 0 then
// Handle "X/5" format: take the character just before the slash
let
beforeSlash = Text.End(Text.Start(comment, pattern1), 1)
in
// Number.FromText raises an error on non-numeric text,
// so guard the conversion with try ... otherwise
try Number.FromText(beforeSlash) otherwise null
else if pattern2 >= 0 then
// Handle "X out of 5" format
let
beforePhrase = Text.Start(comment, pattern2),
lastChar = Text.End(beforePhrase, 1)
in
try Number.FromText(lastChar) otherwise null
else
null
in
extractedRating
)
This demonstrates Power Query's text position functions and conditional logic for pattern matching. Note that Number.FromText raises an error on non-numeric text rather than returning a value you can test, so the conversion must be guarded instead of converted blindly.
Date handling in Power Query requires understanding both the parsing capabilities and the cultural context of your data. Power Query's automatic date detection works well for standard formats, but real-world data often requires explicit parsing strategies.
Our sample data contains four different date formats. Let's create a robust date parser that handles all of them:
ParsedDates = Table.AddColumn(ExtractRatings, "ParsedDate", each
let
dateText = [FeedbackDate],
// Try ISO format first (most reliable); some environments reject the
// trailing "Z", so fall back to DateTimeZone.FromText for zoned text
isoResult = try DateTime.FromText(dateText)
otherwise (try DateTime.From(DateTimeZone.FromText(dateText)) otherwise null),
// Try standard formats
standardResult = if isoResult = null then
try Date.FromText(dateText) otherwise null
else
DateTime.Date(isoResult),
// Handle "March 20th, 2024 at 10:15 AM" format
longFormatResult = if standardResult = null then
try
let
// Remove ordinal indicators (st, nd, rd, th)
cleanText = Text.Replace(Text.Replace(Text.Replace(Text.Replace(
dateText, "st,", ","), "nd,", ","), "rd,", ","), "th,", ","),
// Remove "at" for datetime parsing
cleanerText = Text.Replace(cleanText, " at ", " "),
parsed = DateTime.FromText(cleanerText)
in
DateTime.Date(parsed)
otherwise null
else
standardResult,
// Handle DD/MM/YYYY format (European)
euroFormatResult = if longFormatResult = null then
try
let
parts = Text.Split(dateText, "/"),
day = Number.FromText(List.First(parts)),
month = Number.FromText(parts{1}),
yearPart = Text.Split(List.Last(parts), " "){0},
year = Number.FromText(yearPart)
in
#date(year, month, day)
otherwise null
else
longFormatResult
in
euroFormatResult
)
This parsing strategy uses a fallback approach: try the most reliable format first, then progressively handle more complex patterns. The try...otherwise construct is essential for robust date parsing.
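One way to keep such fallback chains maintainable is to generalize them into a small helper that folds a list of parser functions with List.Accumulate, keeping the first successful result. This is a sketch, not part of the original query; the function and usage names are illustrative:

```m
// Try each parser in order, returning the first non-error result
ParseWithFallbacks = (dateText as text, parsers as list) as nullable date =>
    List.Accumulate(
        parsers,
        null,
        (result, parser) =>
            if result <> null then result
            else try parser(dateText) otherwise null
    ),

// Hypothetical usage: try ISO datetime first, then a plain date
ParsedExample = ParseWithFallbacks(
    "2024-03-28",
    { each DateTime.Date(DateTime.FromText(_)), each Date.FromText(_) }
)
```

Adding a new format then means appending one function to the list instead of nesting another if/try layer.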
Once dates are properly parsed, you'll often need business-specific calculations. Let's add columns for business days since feedback and categorize feedback by recency:
BusinessDateLogic = Table.AddColumn(ParsedDates, "BusinessDaysAgo", each
let
feedbackDate = [ParsedDate],
today = Date.From(DateTime.LocalNow()),
daysDifference = Duration.Days(today - feedbackDate),
// Calculate business days (excluding weekends); passing Day.Monday makes
// 0..4 reliably mean Monday..Friday regardless of the culture default
businessDays = if daysDifference <= 0 then 0 else
List.Sum(
List.Transform(
{0..daysDifference - 1},
each if Date.DayOfWeek(Date.AddDays(feedbackDate, _), Day.Monday) < 5
then 1 else 0
)
)
in
businessDays
)
Then add a recency category:
FeedbackRecency = Table.AddColumn(BusinessDateLogic, "RecencyCategory", each
let
daysAgo = [BusinessDaysAgo]
in
if daysAgo <= 1 then "Today/Yesterday"
else if daysAgo <= 5 then "This Week"
else if daysAgo <= 30 then "This Month"
else "Older"
)
When working with datetime data, time zone handling becomes critical. Power Query provides functions for time zone conversion:
TimeZoneAdjustment = Table.AddColumn(Source, "LocalDateTime", each
let
// Parse text that carries zone information (such as a trailing "Z")
zonedDateTime = try DateTimeZone.FromText([FeedbackDate]) otherwise null,
// Convert to the machine's local time zone; for a fixed offset such as
// US Eastern, use DateTimeZone.SwitchZone(zonedDateTime, -5) instead
localDateTime = if zonedDateTime = null then null
else DateTimeZone.ToLocal(zonedDateTime)
in
localDateTime
)
For applications requiring precise time calculations, consider storing both UTC and local times, especially when data spans multiple time zones.
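A sketch of that dual-column approach, assuming the source text carries zone information such as a trailing "Z" (rows without it become null):

```m
// Keep the UTC instant and its local rendering side by side
DualTimeColumns =
    let
        AddUtc = Table.AddColumn(Source, "UtcDateTime", each
            try DateTimeZone.ToUtc(DateTimeZone.FromText([FeedbackDate]))
            otherwise null),
        AddLocal = Table.AddColumn(AddUtc, "LocalDateTime", each
            if [UtcDateTime] = null then null
            else DateTimeZone.ToLocal([UtcDateTime]))
    in
        AddLocal
```

The UTC column gives you a stable value for arithmetic and joins; the local column is for display.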
Beyond basic cleaning, text data often requires sophisticated transformations for analysis. Let's explore techniques for handling complex text scenarios.
Consider parsing structured text like addresses or product codes. Here's how to handle variable-length, delimited text:
ParseStructuredText = Table.AddColumn(Source, "ParsedAddress", each
let
// A literal example address; in practice this would be a column reference
address = "123 Main St, Apt 4B, Springfield, IL, 62701",
parts = Text.Split(address, ", "),
// Create a record with flexible field assignment
parsedAddress = [
Street = parts{0}?,
Unit = if List.Count(parts) > 4 then parts{1}? else null,
City = if List.Count(parts) > 4 then parts{2}? else parts{1}?,
State = if List.Count(parts) > 4 then parts{3}? else parts{2}?,
Zip = List.Last(parts)
]
in
parsedAddress
)
The ? operator safely handles cases where list indices might not exist, preventing errors when text structure varies.
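A minimal illustration of the difference between plain and optional access (the names here are illustrative):

```m
SafeIndexingDemo =
    let
        parts = Text.Split("one, two", ", ")
    in
        [
            First = parts{0}?,          // "one"
            Missing = parts{5}?,        // null — plain parts{5} would raise an error
            OptionalField = [A = 1][B]? // optional field access also returns null
        ]
```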
Power Query can perform basic text analysis for categorization purposes:
TextAnalysis = Table.AddColumn(Source, "SentimentIndicators", each
let
comment = Text.Lower([Comment]),
positiveWords = {"great", "excellent", "recommend", "fantastic", "amazing"},
negativeWords = {"disappointed", "terrible", "awful", "damaged", "worst"},
positiveCount = List.Sum(
List.Transform(positiveWords, each if Text.Contains(comment, _) then 1 else 0)
),
negativeCount = List.Sum(
List.Transform(negativeWords, each if Text.Contains(comment, _) then 1 else 0)
),
sentiment = if positiveCount > negativeCount then "Positive"
else if negativeCount > positiveCount then "Negative"
else "Neutral"
in
[PositiveWords = positiveCount, NegativeWords = negativeCount, Sentiment = sentiment]
)
This approach creates structured sentiment indicators from unstructured text, useful for basic text classification before more sophisticated analysis.
While Power Query doesn't have native regex support, you can achieve similar results with text functions for common patterns:
ExtractPatterns = Table.AddColumn(Source, "ExtractedInfo", each
let
text = [Comment],
// Extract email-like patterns (simplified)
emailPattern = let
atPosition = Text.PositionOf(text, "@"),
beforeAt = if atPosition > 0 then
Text.End(Text.Start(text, atPosition),
List.Min({atPosition, 20})) else "",
afterAt = if atPosition > 0 then
Text.Start(Text.End(text, Text.Length(text) - atPosition - 1), 20) else "",
email = if atPosition > 0 and Text.Contains(afterAt, ".") then
beforeAt & "@" & afterAt else null
in
email,
// Extract phone numbers (US format)
phoneExtraction = let
digits = Text.Select(text, {"0".."9"}),
phone = if Text.Length(digits) = 10 then
Text.Start(digits, 3) & "-" &
Text.Middle(digits, 3, 3) & "-" &
Text.End(digits, 4)
else null
in
phone
in
[Email = emailPattern, Phone = phoneExtraction]
)
When working with large datasets, text and date operations can become performance bottlenecks. Understanding optimization strategies is crucial for production environments.
Text operations are inherently expensive, so minimize unnecessary transformations:
// Inefficient: Multiple passes through the data
BadExample =
Table.AddColumn(Source, "Step1", each Text.Trim([CustomerName])),
AddStep2 = Table.AddColumn(BadExample, "Step2", each Text.Proper([Step1])),
AddStep3 = Table.AddColumn(AddStep2, "Final", each Text.Replace([Step2], ",", ""))
// Efficient: Single pass with combined logic
GoodExample = Table.AddColumn(Source, "CleanName", each
let
cleaned = Text.Replace(Text.Proper(Text.Trim([CustomerName])), ",", "")
in
cleaned
)
For complex transformations, consider using Table.TransformColumns to modify existing columns rather than creating new ones:
EfficientTransform = Table.TransformColumns(Source, {
{"CustomerName", each Text.Proper(Text.Trim(_)), type text},
{"FeedbackDate", each try Date.FromText(_) otherwise null, type nullable date}
})
Date calculations can be optimized by hoisting values that don't change per row out of the row-level function, and by using Power Query's built-in duration functions rather than manual arithmetic:
// Less efficient: recomputes "today" for every row
ManualDateCalc = Table.AddColumn(ParsedDates, "DaysDiff", each
Duration.Days(Date.From(DateTime.LocalNow()) - [ParsedDate])
)
// More efficient: evaluate "today" once, outside the per-row function
OptimizedDateCalc =
let
today = Date.From(DateTime.LocalNow())
in
Table.AddColumn(ParsedDates, "DaysDiff", each Duration.Days(today - [ParsedDate]))
For large text datasets, consider the memory implications of your transformations:
// Memory-efficient approach for large datasets
StreamlinedProcessing =
let
// Filter early to reduce data volume
FilteredData = Table.SelectRows(Source, each [CustomerName] <> null),
// Combine multiple text operations
ProcessedData = Table.TransformColumns(FilteredData, {
{"CustomerName", each
let
cleaned = Text.Trim(_),
proper = Text.Proper(cleaned),
final = if Text.Contains(proper, ",") then
Text.Trim(List.Last(Text.Split(proper, ","))) & " " &
Text.Trim(List.First(Text.Split(proper, ",")))
else proper
in
final
}
})
in
ProcessedData
Let's apply these concepts by building a comprehensive customer data processor that handles real-world messy data:
let
// Sample problematic customer data
RawData = Table.FromRecords({
[CustomerID = "CUST001", Name = " ANDERSON, JAMES R. ",
SignupDate = "2024-01-15T09:30:00Z", LastPurchase = "March 10th, 2024",
Phone = "5551234567", Email = " JAMES.ANDERSON@EMAIL.COM ",
Address = "123 Oak St, Suite 200, Chicago, IL, 60601"],
[CustomerID = "CUST002", Name = "martinez, maria",
SignupDate = "15/02/2024", LastPurchase = "2024-03-20 14:30",
Phone = "(555) 987-6543", Email = "maria.martinez@company.org",
Address = "456 Pine Ave, Denver, CO, 80202"],
[CustomerID = "CUST003", Name = "Dr. Patricia Williams-Chen PhD",
SignupDate = "Feb 28, 2024 @ 3:45 PM", LastPurchase = "28/03/2024",
Phone = "555.876.5432", Email = "p.williams.chen@university.edu",
Address = "789 Elm Blvd, Apt 15B, Austin, TX, 78701"]
}),
// Step 1: Clean and standardize names
CleanNames = Table.AddColumn(RawData, "StandardName", each
let
rawName = Text.Trim([Name]),
properCase = Text.Proper(rawName),
// Handle "LASTNAME, FIRSTNAME" format, but leave titled names alone
// (check the raw text: Text.Proper turns "PhD" into "Phd")
standardized = if Text.Contains(properCase, ",") and
not Text.Contains(rawName, "Dr.") and
not Text.Contains(rawName, "PhD") then
let
parts = Text.Split(properCase, ","),
lastName = Text.Trim(List.First(parts)),
firstName = Text.Trim(List.Last(parts))
in
firstName & " " & lastName
else
properCase
in
standardized
),
// Step 2: Parse multiple date formats
ParseDates = Table.AddColumn(CleanNames, "ParsedSignup", each
let
dateText = [SignupDate],
// Try ISO format first
isoResult = try DateTime.FromText(dateText) otherwise null,
// Try DD/MM/YYYY format
euroResult = if isoResult = null then
try
let
parts = Text.Split(dateText, "/"),
day = Number.FromText(List.First(parts)),
month = Number.FromText(parts{1}),
year = Number.FromText(parts{2})
in
#date(year, month, day)
otherwise null
else DateTime.Date(isoResult),
// Try "Feb 28, 2024 @ 3:45 PM" format
longResult = if euroResult = null then
try
let
cleanText = Text.Replace(dateText, " @ ", " "),
parsed = DateTime.FromText(cleanText)
in
DateTime.Date(parsed)
otherwise null
else euroResult
in
longResult
),
// Step 3: Standardize phone numbers
StandardizePhones = Table.AddColumn(ParseDates, "StandardPhone", each
let
phone = [Phone],
digitsOnly = Text.Select(phone, {"0".."9"}),
formatted = if Text.Length(digitsOnly) = 10 then
"(" & Text.Start(digitsOnly, 3) & ") " &
Text.Middle(digitsOnly, 3, 3) & "-" &
Text.End(digitsOnly, 4)
else phone
in
formatted
),
// Step 4: Clean email addresses
CleanEmails = Table.AddColumn(StandardizePhones, "StandardEmail", each
Text.Lower(Text.Trim([Email]))
),
// Step 5: Parse addresses into components
ParseAddresses = Table.AddColumn(CleanEmails, "AddressComponents", each
let
address = [Address],
parts = Text.Split(address, ", "),
components = [
Street = parts{0}?,
Unit = if List.Count(parts) > 4 then parts{1}? else null,
City = if List.Count(parts) > 4 then parts{2}? else parts{1}?,
State = if List.Count(parts) > 4 then parts{3}? else parts{2}?,
Zip = List.Last(parts)
]
in
components
),
// Step 6: Add business metrics
AddMetrics = Table.AddColumn(ParseAddresses, "CustomerMetrics", each
let
signupDate = [ParsedSignup],
today = Date.From(DateTime.LocalNow()),
daysSinceSignup = Duration.Days(today - signupDate),
metrics = [
DaysSinceSignup = daysSinceSignup,
CustomerTenure = if daysSinceSignup < 30 then "New"
else if daysSinceSignup < 90 then "Recent"
else "Established"
]
in
metrics
),
// Step 7: Final cleanup - remove intermediate columns and expand records
FinalResult =
let
// Remove original messy columns
CleanedTable = Table.RemoveColumns(AddMetrics,
{"Name", "SignupDate", "Phone", "Email", "Address"}),
// Expand the address components
ExpandAddress = Table.ExpandRecordColumn(CleanedTable,
"AddressComponents",
{"Street", "Unit", "City", "State", "Zip"}),
// Expand customer metrics
ExpandMetrics = Table.ExpandRecordColumn(ExpandAddress,
"CustomerMetrics",
{"DaysSinceSignup", "CustomerTenure"})
in
ExpandMetrics
in
FinalResult
This comprehensive example demonstrates how to chain multiple text and date transformations efficiently while handling various edge cases and data quality issues.
The most common issue with date parsing is cultural format assumptions. Power Query's automatic detection may interpret "01/02/2024" as January 2nd when you expect February 1st:
// Problem: Ambiguous date interpretation
ProblematicParsing = Date.FromText("01/02/2024") // Could be Jan 2 or Feb 1
// Solution: Explicit format specification
SafeParsing = Date.FromText("01/02/2024", [Format="dd/MM/yyyy"])
Always validate your date parsing results with sample data from different regions or systems.
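One quick validation technique (a sketch, assuming the options-record overload of Date.FromText) is to parse the same text under two cultures and flag rows where the interpretations disagree:

```m
AmbiguityCheck = Table.AddColumn(Source, "DateIsAmbiguous", each
    let
        usDate = try Date.FromText([FeedbackDate], [Culture = "en-US"]) otherwise null,
        gbDate = try Date.FromText([FeedbackDate], [Culture = "en-GB"]) otherwise null
    in
        usDate <> null and gbDate <> null and usDate <> gbDate
)
```

"01/02/2024" parses under both cultures but to different dates, so it would be flagged; "2024-03-28" parses identically and would not.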
Text functions in Power Query generally propagate nulls (Text.Proper(null) simply returns null), but operators such as & raise errors on null input, and silently propagated nulls can hide data quality problems downstream. Make null handling explicit:
// Problem: concatenation errors on null input
BadTextHandling = Table.AddColumn(Source, "Greeting", each
"Dear " & Text.Proper([CustomerName]) // errors if CustomerName is null
)
// Solution: null-safe processing
SafeTextHandling = Table.AddColumn(Source, "Greeting", each
if [CustomerName] = null then null
else "Dear " & Text.Proper([CustomerName])
)
Text processing on large datasets can cause memory issues. Monitor your query performance and consider these strategies:
Tip: Use the Power Query performance profiler to identify bottlenecks in text-heavy transformations. Text.Contains operations are particularly expensive when applied to large datasets.
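One mitigation worth trying (a sketch using the sentiment example from earlier) is to buffer small lookup lists once with List.Buffer, so they are held in memory rather than re-evaluated while scanning every row:

```m
BufferedSentiment =
    let
        // Buffer the keyword list once, outside the per-row function
        positiveWords = List.Buffer({"great", "excellent", "recommend"}),
        Scored = Table.AddColumn(Source, "PositiveHits", each
            let
                comment = Text.Lower([Comment])
            in
                List.Count(List.Select(positiveWords,
                    (word) => Text.Contains(comment, word))))
    in
        Scored
```

The gain depends on the source; measure with the performance profiler rather than assuming.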
// Optimize by filtering early
EfficientProcessing =
let
FilteredSource = Table.SelectRows(Source, each [CustomerName] <> null),
ProcessedData = Table.TransformColumns(FilteredSource, ...)
in
ProcessedData
When working with international data, be aware of Unicode normalization issues:
// Standardize text for consistent comparisons (note: Text.Clean removes
// control and non-printable characters; it does not perform Unicode
// normalization)
NormalizedText = Table.AddColumn(Source, "NormalizedName", each
let
normalized = Text.Clean([CustomerName]),
// Remove diacritics for standardization if needed
simplified = Text.Replace(Text.Replace(normalized, "é", "e"), "ñ", "n")
in
simplified
)
For production systems handling international data, consider more comprehensive Unicode normalization strategies.
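One way to scale the replacement approach beyond a few nested Text.Replace calls (a sketch; the character map here is illustrative, not exhaustive) is to fold a substitution table with List.Accumulate:

```m
FoldDiacritics = (input as text) as text =>
    List.Accumulate(
        { {"é", "e"}, {"è", "e"}, {"ñ", "n"}, {"ü", "u"}, {"ç", "c"} },
        input,
        (state, pair) => Text.Replace(state, pair{0}, pair{1})
    )
// FoldDiacritics("Peña Müller") returns "Pena Muller"
```

Extending coverage is then a matter of growing the list, not nesting more function calls.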
You've now mastered the essential techniques for handling text and date data in Power Query. These skills form the foundation for most data preparation tasks you'll encounter in professional environments.
Key concepts covered: cleaning and standardizing text with Text.Trim and Text.Proper; extracting patterns with text position and selection functions; parsing mixed date formats through try ... otherwise fallback chains; business-day, recency, and time zone calculations; and performance techniques such as filtering early and combining transformations.
Immediate next steps:
Advanced topics to explore:
The techniques you've learned here will serve as building blocks for more advanced Power Query scenarios, including complex data modeling, automated report generation, and enterprise data integration workflows.
Learning Path: Power Query Essentials