Building a Custom M Language Parser and Tokenizer for Dynamic Expression Evaluation in Power Query

Introduction

Imagine you're building a self-service reporting tool in Power BI where business analysts define their own calculated columns using a simple expression language — something like [Revenue] * 1.15 or IF([Region] = "North", [Revenue] * 0.9, [Revenue]). The expressions live in a configuration table, they change weekly, and you need Power Query to evaluate them dynamically against your dataset at refresh time. You can't hardcode anything. The expressions themselves are data.

This is the problem that breaks most Power Query practitioners. The standard advice is to reach for a scripting language or a backend service, but that's often not an option — you may be working in a locked-down environment, or the whole point is that the solution must live entirely within Power BI Desktop and the Power Query engine. What you need is the ability to parse an expression string into structured tokens, walk those tokens into an evaluation tree, and execute the result against live table data — all inside M.

That is exactly what we're building in this lesson. By the end, you will have a functioning mini expression evaluator written entirely in M. You'll understand how tokenizers and parsers work at a conceptual level, why M's functional and lazy evaluation model makes certain patterns possible that would be awkward in other languages, and where the hard limits of this approach live. This is not a toy demo — you will walk away with a reusable pattern you can adapt for real reporting infrastructure.

What you'll learn:

How to build a character-level tokenizer in M that converts an expression string into a structured list of typed tokens
How to implement a recursive descent parser that turns a flat token list into a nested expression tree (an AST)
How to write an evaluator that walks the AST and resolves values against a live Power Query record or table row
How to handle operator precedence, parenthesized sub-expressions, and named field references safely
The architectural trade-offs of doing this inside Power Query versus externalizing to Python or a database computed column

Prerequisites

This lesson assumes you are comfortable with advanced M — specifically:

Writing and composing custom functions with (x) => ... syntax
Working with records, lists, and tables as first-class values
Understanding let...in blocks and why evaluation is lazy
Using List.Generate, List.Accumulate, and recursive functions
Basic familiarity with what an Abstract Syntax Tree (AST) is — even just conceptually

If you've never written a recursive M function before, work through the recursion patterns in the intermediate M track first. The code here will make much more sense.

Why M Is Surprisingly Suited for This Problem

Before writing a single line of code, it's worth understanding why M isn't a crazy choice for this task — because your first instinct is probably "use Python" or "use a custom connector."

M is a purely functional language with immutable values and lazy evaluation. That combination is actually the bedrock of how all parsers and interpreters are designed in functional programming traditions. Languages like Haskell and ML (from which Power Query M takes heavy inspiration) have entire parser combinator libraries built on exactly the same principles we'll use here: functions that take input and return structured results, composed together without mutating any shared state.

The challenge with M isn't conceptual — it's practical. M lacks tail-call optimization, so deeply recursive grammars on long inputs will eventually hit stack limits. M has no native error-propagation monad, so we need to simulate it. And M functions cannot be dynamically constructed from strings at runtime using Expression.Evaluate without significant security warnings in Power BI Service. We'll address each of these constraints as we go.

The approach we'll use is called recursive descent parsing, which is the most practical parsing strategy for handwritten parsers. You write one function per grammar rule, and each function calls others to handle sub-expressions. It maps cleanly to M's function composition model.

Designing the Expression Grammar

We need to define what our expression language looks like before writing any parsing code. Let's design a grammar that covers the real use cases a business analyst would need:

expression   := comparison
comparison   := addition ( ( "=" | "<>" | "<" | ">" | "<=" | ">=" ) addition )*
addition     := multiplication ( ( "+" | "-" ) multiplication )*
multiplication := unary ( ( "*" | "/" ) unary )*
unary        := "-" unary | primary
primary      := NUMBER | STRING | FIELD_REF | "(" expression ")" | IF_EXPR
IF_EXPR      := "IF" "(" expression "," expression "," expression ")"
FIELD_REF    := "[" IDENTIFIER "]"

This grammar encodes operator precedence through its structure — multiplication binds tighter than addition because multiplication appears lower in the call stack than addition. This is the elegant trick at the heart of recursive descent: you don't need a precedence table; you just order your grammar rules.

Our token types will be:

Token Type	Examples
`Number`	`42`, `3.14`, `-0.5` (post-unary)
`String`	`"North"`, `"Q1 2024"`
`FieldRef`	`[Revenue]`, `[Customer Name]`
`Operator`	`+`, `-`, `*`, `/`, `=`, `<>`, `<=`, `>=`, `<`, `>`
`Comma`	`,`
`LParen`	`(`
`RParen`	`)`
`Keyword`	`IF`
`EOF`	end of input sentinel

Phase 1: Building the Tokenizer

The tokenizer (also called a lexer) has one job: take a raw string and return a list of typed token records. Each token knows its type and its value. The tokenizer is stateless — it never needs to look backward, and it consumes the input character by character from left to right.

Here's the complete tokenizer as a Power Query function. Study the structure carefully before we dissect it:

let
    Tokenize = (input as text) as list =>
    let
        chars = Text.ToList(input),
        n = List.Count(chars),

        // ── helpers ──────────────────────────────────────────────
        IsDigit  = (c) => c >= "0" and c <= "9",
        IsAlpha  = (c) => (c >= "a" and c <= "z") or (c >= "A" and c <= "Z") or c = "_",
        IsSpace  = (c) => c = " " or c = #(tab) or c = #(cr) or c = #(lf),

        Token = (type, value) => [Type = type, Value = value],

        // ── main loop ─────────────────────────────────────────────
        // State: [pos = current index, tokens = accumulated list]
        ScanLoop = (state as record) as list =>
            let
                pos    = state[pos],
                tokens = state[tokens]
            in
            if pos >= n then
                List.Combine({tokens, {Token("EOF", null)}})
            else
                let
                    c = chars{pos}
                in
                // Skip whitespace
                if IsSpace(c) then
                    @ScanLoop([pos = pos + 1, tokens = tokens])

                // Number literal
                else if IsDigit(c) then
                    let
                        numResult = ScanNumber(chars, pos, n),
                        newPos    = numResult[pos],
                        tok       = Token("Number", numResult[value])
                    in
                    @ScanLoop([pos = newPos, tokens = List.Combine({tokens, {tok}})])

                // String literal
                else if c = """" then
                    let
                        strResult = ScanString(chars, pos + 1, n),
                        newPos    = strResult[pos],
                        tok       = Token("String", strResult[value])
                    in
                    @ScanLoop([pos = newPos, tokens = List.Combine({tokens, {tok}})])

                // Field reference [FieldName]
                else if c = "[" then
                    let
                        fldResult = ScanFieldRef(chars, pos + 1, n),
                        newPos    = fldResult[pos],
                        tok       = Token("FieldRef", fldResult[value])
                    in
                    @ScanLoop([pos = newPos, tokens = List.Combine({tokens, {tok}})])

                // Two-character operators
                else if c = "<" and pos + 1 < n and chars{pos+1} = ">" then
                    @ScanLoop([pos = pos+2, tokens = List.Combine({tokens, {Token("Operator","<>")}})])
                else if c = "<" and pos + 1 < n and chars{pos+1} = "=" then
                    @ScanLoop([pos = pos+2, tokens = List.Combine({tokens, {Token("Operator","<=")}})])
                else if c = ">" and pos + 1 < n and chars{pos+1} = "=" then
                    @ScanLoop([pos = pos+2, tokens = List.Combine({tokens, {Token("Operator",">=")}})])

                // Single-character operators and punctuation
                else if c = "+" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator","+")}})])
                else if c = "-" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator","-")}})])
                else if c = "*" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator","*")}})])
                else if c = "/" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator","/")}})])
                else if c = "=" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator","=")}}})])
                else if c = "<" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator","<")}})])
                else if c = ">" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Operator",">")}})])
                else if c = "," then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Comma",null)}})])
                else if c = "(" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("LParen",null)}})])
                else if c = ")" then @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("RParen",null)}})])

                // Keywords and identifiers
                else if IsAlpha(c) then
                    let
                        idResult = ScanIdentifier(chars, pos, n),
                        newPos   = idResult[pos],
                        raw      = idResult[value],
                        tokType  = if Text.Upper(raw) = "IF" then "Keyword" else "Identifier",
                        tok      = Token(tokType, Text.Upper(raw))
                    in
                    @ScanLoop([pos = newPos, tokens = List.Combine({tokens, {tok}})])

                // Unknown character — emit error token and continue
                else
                    @ScanLoop([pos=pos+1, tokens=List.Combine({tokens, {Token("Error", c)}}})])
        ,

        // ── sub-scanners ──────────────────────────────────────────
        ScanNumber = (chars, start, n) as record =>
            let
                Advance = (state) =>
                    if state[pos] < n and (IsDigit(chars{state[pos]}) or chars{state[pos]} = ".") then
                        @Advance([pos = state[pos]+1, acc = state[acc] & chars{state[pos]}])
                    else
                        state,
                result = Advance([pos = start, acc = ""])
            in
            [pos = result[pos], value = Number.FromText(result[acc])],

        ScanString = (chars, start, n) as record =>
            let
                Advance = (state) =>
                    if state[pos] >= n then
                        state  // unterminated — caller gets what we have
                    else if chars{state[pos]} = """" then
                        [pos = state[pos]+1, acc = state[acc]]  // consume closing quote
                    else
                        @Advance([pos = state[pos]+1, acc = state[acc] & chars{state[pos]}]),
                result = Advance([pos = start, acc = ""])
            in
            [pos = result[pos], value = result[acc]],

        ScanFieldRef = (chars, start, n) as record =>
            let
                Advance = (state) =>
                    if state[pos] >= n then state
                    else if chars{state[pos]} = "]" then
                        [pos = state[pos]+1, acc = state[acc]]
                    else
                        @Advance([pos = state[pos]+1, acc = state[acc] & chars{state[pos]}]),
                result = Advance([pos = start, acc = ""])
            in
            [pos = result[pos], value = result[acc]],

        ScanIdentifier = (chars, start, n) as record =>
            let
                Advance = (state) =>
                    if state[pos] < n and (IsAlpha(chars{state[pos]}) or IsDigit(chars{state[pos]})) then
                        @Advance([pos = state[pos]+1, acc = state[acc] & chars{state[pos]}])
                    else
                        state,
                result = Advance([pos = start, acc = ""])
            in
            [pos = result[pos], value = result[acc]],

        // ── kick it off ──────────────────────────────────────────
        tokens = ScanLoop([pos = 0, tokens = {}])
    in
        tokens
in
    Tokenize

Let's unpack the key design decisions here.

State threading as records. Because M has no mutable variables, we simulate mutable state by passing a record through each recursive call. The record [pos = ..., tokens = ...] carries the current position in the character array and the accumulated token list. Each call returns an updated copy. This is the standard functional programming pattern for simulating stateful iteration — it's called state threading or a state monad in disguise.

The @ prefix for recursion. In M, @FunctionName inside a let binding is how you call a function recursively when the function refers to itself by name. Without the @, M would look for ScanLoop as a prior binding in scope, not the function being defined. This is critical — forget the @ and you'll get a baffling "The name 'ScanLoop' wasn't found in the current scope" error.

Sub-scanners return position-and-value records. Each ScanNumber, ScanString, etc. returns [pos = ..., value = ...]. The caller updates its loop position to result[pos]. This is the standard "consume and return new position" contract in hand-written lexers.

Warning: List.Combine({tokens, {tok}}) on every iteration makes the tokenizer O(n²) in the worst case because it copies the growing list each time. For expressions under a few hundred characters this is fine. For very long formula strings (thousands of characters), consider building the token list in reverse with {tok} & tokens and reversing at the end using List.Reverse.

Phase 2: Building the Recursive Descent Parser

The parser takes the flat list of tokens from the tokenizer and builds a nested record structure — the Abstract Syntax Tree (AST). Each node in the AST has a Kind field and children appropriate to that kind:

AST Node Kind	Fields
`Literal`	`Value` (the actual M value)
`FieldRef`	`Name` (the field name string)
`BinaryOp`	`Op`, `Left`, `Right`
`UnaryOp`	`Op`, `Operand`
`IfExpr`	`Condition`, `Then`, `Else`

The parser follows the same state-threading pattern as the tokenizer, but instead of a character position, we track a token index:

let
    Parse = (tokens as list) as record =>
    let
        n = List.Count(tokens),
        Peek  = (pos) => if pos < n then tokens{pos} else tokens{n-1},  // EOF guard
        Cur   = (pos) => Peek(pos)[Type],
        CurV  = (pos) => Peek(pos)[Value],

        // ── Consume a token of an expected type ──────────────────
        Expect = (pos, expectedType) as record =>
            if Cur(pos) = expectedType then
                [pos = pos + 1, ok = true]
            else
                error "Expected " & expectedType & " but got " & Cur(pos),

        // ── Grammar rules ─────────────────────────────────────────

        // expression := comparison
        ParseExpression = (pos) as record => ParseComparison(pos),

        // comparison := addition ( compOp addition )*
        ParseComparison = (pos) as record =>
            let
                left = ParseAddition(pos),
                CompOps = {"=", "<>", "<", ">", "<=", ">="},
                Fold = (state) =>
                    if Cur(state[pos]) = "Operator" and List.Contains(CompOps, CurV(state[pos])) then
                        let
                            op    = CurV(state[pos]),
                            right = ParseAddition(state[pos] + 1),
                            node  = [Kind="BinaryOp", Op=op, Left=state[node], Right=right[node]]
                        in
                        @Fold([pos = right[pos], node = node])
                    else
                        state,
                result = Fold([pos = left[pos], node = left[node]])
            in
            result,

        // addition := multiplication ( ("+"|"-") multiplication )*
        ParseAddition = (pos) as record =>
            let
                left = ParseMultiplication(pos),
                Fold = (state) =>
                    if Cur(state[pos]) = "Operator" and List.Contains({"+","-"}, CurV(state[pos])) then
                        let
                            op    = CurV(state[pos]),
                            right = ParseMultiplication(state[pos] + 1),
                            node  = [Kind="BinaryOp", Op=op, Left=state[node], Right=right[node]]
                        in
                        @Fold([pos = right[pos], node = node])
                    else
                        state,
                result = Fold([pos = left[pos], node = left[node]])
            in
            result,

        // multiplication := unary ( ("*"|"/") unary )*
        ParseMultiplication = (pos) as record =>
            let
                left = ParseUnary(pos),
                Fold = (state) =>
                    if Cur(state[pos]) = "Operator" and List.Contains({"*","/"}, CurV(state[pos])) then
                        let
                            op    = CurV(state[pos]),
                            right = ParseUnary(state[pos] + 1),
                            node  = [Kind="BinaryOp", Op=op, Left=state[node], Right=right[node]]
                        in
                        @Fold([pos = right[pos], node = node])
                    else
                        state,
                result = Fold([pos = left[pos], node = left[node]])
            in
            result,

        // unary := "-" unary | primary
        ParseUnary = (pos) as record =>
            if Cur(pos) = "Operator" and CurV(pos) = "-" then
                let
                    operand = @ParseUnary(pos + 1)
                in
                [pos = operand[pos], node = [Kind="UnaryOp", Op="-", Operand=operand[node]]]
            else
                ParsePrimary(pos),

        // primary := NUMBER | STRING | FIELD_REF | "(" expression ")" | IF_EXPR
        ParsePrimary = (pos) as record =>
            if Cur(pos) = "Number" then
                [pos = pos+1, node = [Kind="Literal", Value=CurV(pos)]]
            else if Cur(pos) = "String" then
                [pos = pos+1, node = [Kind="Literal", Value=CurV(pos)]]
            else if Cur(pos) = "FieldRef" then
                [pos = pos+1, node = [Kind="FieldRef", Name=CurV(pos)]]
            else if Cur(pos) = "LParen" then
                let
                    inner  = ParseExpression(pos + 1),
                    closed = Expect(inner[pos], "RParen")
                in
                [pos = closed[pos], node = inner[node]]
            else if Cur(pos) = "Keyword" and CurV(pos) = "IF" then
                ParseIfExpr(pos)
            else
                error "Unexpected token: " & Cur(pos) & " (value: " & Text.From(CurV(pos)) & ")",

        // IF "(" condition "," thenExpr "," elseExpr ")"
        ParseIfExpr = (pos) as record =>
            let
                p0   = Expect(pos+1, "LParen"),
                cond = ParseExpression(p0[pos]),
                p1   = Expect(cond[pos], "Comma"),
                thn  = ParseExpression(p1[pos]),
                p2   = Expect(thn[pos], "Comma"),
                els  = ParseExpression(p2[pos]),
                p3   = Expect(els[pos], "RParen"),
                node = [Kind="IfExpr", Condition=cond[node], Then=thn[node], Else=els[node]]
            in
            [pos = p3[pos], node = node],

        // Run it
        result = ParseExpression(0)
    in
        result[node]
in
    Parse

The mutual recursion here is worth examining carefully. ParseExpression calls ParseComparison, which calls ParseAddition, which calls ParseMultiplication, which calls ParseUnary, which calls ParsePrimary, which calls back to ParseExpression in the parenthesized sub-expression case and in ParseIfExpr. This cycle is what makes the parser capable of handling arbitrary nesting depth.

Important note on M and mutual recursion: In M, all let-bindings within a single let...in block are visible to each other regardless of definition order. This is M's simultaneous binding behavior (it evaluates bindings lazily). That's why ParsePrimary can call ParseExpression even though ParseExpression is defined "above" it in the source — M doesn't care about textual order for let bindings.

Let's trace what happens when we parse [Revenue] * 1.15:

ParseExpression(0) → ParseComparison(0) → ParseAddition(0) → ParseMultiplication(0)
ParseMultiplication calls ParseUnary(0) → ParsePrimary(0) → sees FieldRef "Revenue" → returns {pos=1, node=[Kind="FieldRef", Name="Revenue"]}
Back in ParseMultiplication, Fold checks: is position 1 an Operator in {"*", "/"}? Yes: *. It calls ParseUnary(2) → ParsePrimary(2) → sees Number 1.15 → returns {pos=3, node=[Kind="Literal", Value=1.15]}
The BinaryOp node is assembled: [Kind="BinaryOp", Op="*", Left=[Kind="FieldRef", Name="Revenue"], Right=[Kind="Literal", Value=1.15]]
Fold continues: position 3 is EOF — not a * or / — so we're done.

The final AST for [Revenue] * 1.15 is:

BinaryOp(*)
├── FieldRef("Revenue")
└── Literal(1.15)

And for IF([Region] = "North", [Revenue] * 0.9, [Revenue]):

IfExpr
├── Condition: BinaryOp(=)
│   ├── FieldRef("Region")
│   └── Literal("North")
├── Then: BinaryOp(*)
│   ├── FieldRef("Revenue")
│   └── Literal(0.9)
└── Else: FieldRef("Revenue")

Phase 3: Building the Evaluator

The evaluator walks the AST and produces a result value. It takes two arguments: an AST node, and a context — a record where each field represents a column value for the current row. For example: [Revenue = 15000, Region = "North", Discount = 0.05].

let
    Evaluate = (node as record, context as record) as any =>
        if node[Kind] = "Literal" then
            node[Value]

        else if node[Kind] = "FieldRef" then
            if Record.HasFields(context, {node[Name]}) then
                Record.Field(context, node[Name])
            else
                error "Field not found in context: " & node[Name]

        else if node[Kind] = "UnaryOp" then
            let
                operand = @Evaluate(node[Operand], context)
            in
            if node[Op] = "-" then -operand
            else error "Unknown unary operator: " & node[Op]

        else if node[Kind] = "BinaryOp" then
            let
                left  = @Evaluate(node[Left],  context),
                right = @Evaluate(node[Right], context),
                op    = node[Op]
            in
            if      op = "+"  then left + right
            else if op = "-"  then left - right
            else if op = "*"  then left * right
            else if op = "/"  then if right = 0 then error "Division by zero" else left / right
            else if op = "="  then left = right
            else if op = "<>" then left <> right
            else if op = "<"  then left < right
            else if op = ">"  then left > right
            else if op = "<=" then left <= right
            else if op = ">=" then left >= right
            else error "Unknown operator: " & op

        else if node[Kind] = "IfExpr" then
            let
                cond = @Evaluate(node[Condition], context)
            in
            if cond then
                @Evaluate(node[Then], context)
            else
                @Evaluate(node[Else], context)

        else
            error "Unknown AST node kind: " & node[Kind]
in
    Evaluate

Notice the IfExpr evaluation is genuinely short-circuit: we evaluate the condition first, and only then evaluate the appropriate branch. This is important both for correctness (avoiding side effects in the unexecuted branch) and for performance (don't compute what you don't need).

Also notice that the evaluator is completely separate from the parser and tokenizer. This separation of concerns is intentional and important. If you want to add a "compile step" later — such as caching the AST to avoid re-parsing on every row — you only need to change the orchestration layer, not the evaluator itself.

Phase 4: Wiring It All Together

Now we need an orchestration function that takes an expression string, tokenizes it, parses it, and applies it to every row of a table. The key insight is that we should parse the expression once and reuse the AST for every row. Parsing is expensive (relatively speaking in M); field lookup and arithmetic are cheap.

let
    ApplyExpression = (sourceTable as table, expressionText as text, newColumnName as text) as table =>
    let
        // Parse once
        tokens = Tokenize(expressionText),
        ast    = Parse(tokens),

        // Build a list of result values, one per row
        rowCount = Table.RowCount(sourceTable),
        results  = List.Generate(
            () => [i = 0],
            (state) => state[i] < rowCount,
            (state) => [i = state[i] + 1],
            (state) =>
                let
                    row     = Table.Row(sourceTable, state[i]),
                    context = Record.FromTable(
                        Table.RenameColumns(
                            Record.ToTable(row),
                            {{"Name","Name"},{"Value","Value"}}
                        )
                    ),
                    value   = Evaluate(ast, context)
                in
                value
        ),

        // Add results as a new column
        withIndex   = Table.AddIndexColumn(sourceTable, "_idx_", 0, 1),
        withResults = Table.AddColumn(withIndex, newColumnName, 
            each results{[_idx_]}),
        cleaned     = Table.RemoveColumns(withResults, {"_idx_"})
    in
        cleaned
in
    ApplyExpression

Performance note: Table.Row(sourceTable, i) is O(n) in the table size for each call in some engine implementations — this can make the whole loop O(n²) on large tables. A safer pattern is to first convert the table to a list of records using Table.ToRecords(sourceTable), store that list, and index into it. List indexing is O(1).

Here's the optimized version of the core loop:

    rowRecords = Table.ToRecords(sourceTable),
    results = List.Transform(
        rowRecords,
        (row) => Evaluate(ast, row)
    ),

List.Transform is cleaner and avoids the List.Generate overhead. Each row is already a record with field names matching the column names, which is exactly what our evaluator expects as context.

Phase 5: Handling a Configuration Table

The real power comes when expressions live in a metadata table. Suppose you have a Power BI model with a table called CalculatedColumnConfig:

ColumnName	Expression
`AdjustedRevenue`	`[Revenue] * (1 - [Discount])`
`IsNorthernAccount`	`IF([Region] = "North", 1, 0)`
`RevenueCategory`	`IF([Revenue] >= 50000, "Large", IF([Revenue] >= 10000, "Medium", "Small"))`

Note the nested IF in RevenueCategory — our grammar supports this because IF_EXPR arguments are full expression productions, which can themselves be IF_EXPR nodes.

let
    ApplyAllExpressions = (sourceTable as table, configTable as table) as table =>
    let
        configs = Table.ToRecords(configTable),

        // Fold over each config record, accumulating a growing table
        result = List.Accumulate(
            configs,
            sourceTable,
            (currentTable, cfg) =>
                ApplyExpression(currentTable, cfg[Expression], cfg[ColumnName])
        )
    in
        result
in
    ApplyAllExpressions

List.Accumulate is the right tool here: it threads the growing table through each expression application, so each subsequent expression can reference columns added by prior expressions. This means RevenueCategory could theoretically reference AdjustedRevenue if that comes first in the config table. You get a mini dependency ordering system for free just by controlling row order in the config table.

Caching Parsed ASTs for Performance

If you're applying the same expression to thousands of rows, you definitely don't want to re-parse on every row. But what if you're applying the same expression to thousands of tables — perhaps in a parameterized report where the config table is shared across 50 data sources?

The solution is to pre-build an "expression cache" — a record where each field is a column name and each value is the pre-parsed AST:

let
    BuildASTCache = (configTable as table) as record =>
    let
        configs = Table.ToRecords(configTable),
        pairs   = List.Transform(
            configs,
            (cfg) => {cfg[ColumnName], Parse(Tokenize(cfg[Expression]))}
        ),
        cache   = Record.FromList(
            List.Transform(pairs, (p) => p{1}),
            List.Transform(pairs, (p) => p{0})
        )
    in
        cache
in
    BuildASTCache

You call BuildASTCache once per query refresh and store the result. Then your row-level evaluator calls Record.Field(astCache, columnName) to retrieve the pre-built AST instead of re-parsing. In a 10,000-row table with 5 expression columns, this eliminates 49,995 redundant parse operations.

Hands-On Exercise

You now have all the pieces. Here's a realistic exercise to cement your understanding.

Scenario: You work for a retail analytics team. Your sales data has columns: [ProductCategory], [BasePrice], [Quantity], [CustomerTier] (values: "Gold", "Silver", "Bronze"). The pricing team updates discount rules weekly in a SharePoint list that flows into Power Query as a table.

Your task:

Create a table in Power Query called SalesData with at least 10 rows and the four columns above. Use realistic values — categories like "Electronics" and "Apparel", prices between 10 and 500, quantities 1–20, and a mix of customer tiers.
Create a PricingRules table with these three rows:

ColumnName	Expression
`DiscountedPrice`	`[BasePrice] * IF([CustomerTier] = "Gold", 0.75, IF([CustomerTier] = "Silver", 0.85, 0.95))`
`LineTotal`	`[DiscountedPrice] * [Quantity]`
`IsPremiumElectronics`	`IF([ProductCategory] = "Electronics", IF([BasePrice] >= 200, 1, 0), 0)`

Paste in the Tokenize, Parse, and Evaluate functions from this lesson (create them as separate Power Query queries named Tokenize, Parse, and Evaluate).
Build the ApplyAllExpressions function and apply it to SalesData using PricingRules.
Verify that LineTotal correctly uses DiscountedPrice from the same row (since DiscountedPrice appears first in the config table and ApplyAllExpressions uses List.Accumulate).

Stretch goal: Add a HighValueFlag expression: IF([LineTotal] >= 1000, "High", "Normal"). Notice that this requires LineTotal to already exist as a column. Confirm that putting it after LineTotal in the config table makes it work, and putting it before breaks it. Then design a dependency resolution step that sorts config rows so that any expression referencing another computed column always comes after it.

Common Mistakes & Troubleshooting

"The name 'ParseExpression' wasn't found in the current scope"

This happens when you define parser functions in separate let blocks instead of inside a single let that makes all bindings visible simultaneously. All parser functions (ParseExpression, ParseComparison, etc.) must live in the same let...in block so M's simultaneous-binding rule applies. Refactoring them into separate queries breaks the mutual recursion.

"Expression.Error: We cannot apply operator + to types Number and Null"

Your context record doesn't have the field the expression is looking for, and Record.Field returned null. The evaluator's FieldRef branch handles missing fields with an explicit error, but only if you spell the field name exactly as it appears in the table column name. Column names in Power Query are case-sensitive. [revenue] and [Revenue] are different field references. Make your FieldRef lookup case-insensitive using Record.FieldNames + List.FindText if your expression authors are not reliable about casing.

"Stack overflow" on complex nested expressions

M's recursion stack is finite. Deeply nested IF chains (more than ~50 levels) or very long chains of binary operations can overflow it. The practical fix is to redesign the grammar to handle flat AND/OR chains iteratively rather than recursively, and to limit nesting depth with a depth counter threaded through the parser state.

Parser returns wrong precedence

If you accidentally mix up the order in the call chain — for example, calling ParseAddition from ParseUnary instead of ParseMultiplication — operator precedence breaks silently. The expression 2 + 3 * 4 would evaluate as (2 + 3) * 4 = 20 instead of the correct 2 + (3 * 4) = 14. Always test precedence explicitly with known-answer expressions before deploying.

`Table.Row` performance degradation on large tables

As mentioned earlier, Table.Row(table, i) can be O(n) per call in some Power Query engine contexts. Always prefer Table.ToRecords(table) first, then index the resulting list. A list index operation (myList{i}) is O(1).

Tokenizer mangles floating point numbers

The ScanNumber function as written is a simplification. It doesn't handle scientific notation (1.5e10) or numbers starting with a decimal (.5). If your expression authors use these formats, extend ScanNumber to handle leading dots and e/E exponent markers. The fix is a few additional conditions in the IsDigit branch of the advance loop.

`Expect` throws an error that swallows the expression context

When Expect fires on a parse error, the error message only says what was expected and what was found — it doesn't say where in the expression. Improve error messages by threading the original expression string into the parser state and reporting the character offset of the problematic token alongside the token content.

Architecture Trade-offs: When Not to Do This in M

This has been a deep dive into what's possible, but a complete education requires honesty about the limits.

Use this approach when:

The expression language is small and well-defined
Expressions change frequently and must be managed by non-developers
The solution must live entirely inside Power Query / Power BI Desktop
Row counts are in the tens of thousands, not hundreds of millions
You control the expression authoring experience and can enforce constraints

Consider alternatives when:

You need to execute arbitrary M code (use Expression.Evaluate carefully in Power BI Report Server or with known-safe strings; avoid in Power BI Service without explicit admin consent due to formula injection risk)
Performance is critical at scale — a Python custom function or a computed column in your data warehouse will outperform a pure-M evaluator by orders of magnitude
The expression language needs functions beyond arithmetic and comparison — at that point, you're building a real programming language, and you should use a real language to do it
Security is a primary concern — any system where end-users author expressions that get executed is a formula injection attack surface. Sanitize and validate ruthlessly before this ever touches production data

The formula injection risk deserves specific attention. If user-authored expression strings are stored in a database or SharePoint list that other users can modify, a malicious actor could craft an expression that, if you ever accidentally passed it to Expression.Evaluate rather than your custom parser, could exfiltrate data. By using your own parser, you have complete control over what the language can express — your evaluator only handles the operations you've implemented. This is actually a security advantage of the custom parser approach over Expression.Evaluate.

Summary & Next Steps

You've built a full expression evaluation pipeline in pure M: a tokenizer that converts raw strings to typed tokens, a recursive descent parser that builds a structured AST respecting operator precedence, and an evaluator that walks the AST against live row contexts. These three components are cleanly separated and each can be extended independently.

The core intellectual achievements here are:

State threading — simulating mutable state in a pure functional language using record-passing recursion
Simultaneous bindings — leveraging M's lazy evaluation of let blocks to achieve mutual recursion without forward declarations
Grammar-as-code — encoding operator precedence directly in the parser call chain, making it self-documenting and easy to extend
Parse-once, evaluate-many — separating the expensive parsing step from the per-row evaluation step for practical performance

Where to go from here:

Extend the grammar with logical operators (AND, OR, NOT), string functions (CONTAINS, STARTS_WITH), and date arithmetic
Add a type checker — a pass over the AST that verifies type consistency before evaluation, generating cleaner errors than runtime failures
Build an expression validator UI — a small Power Query function that parses an expression and returns a structured validation result (valid/invalid + error location) suitable for surfacing in a Power App or Teams notification
Explore parser combinators — the M community has begun experimenting with combinator-style parsers where grammar rules are first-class functions that can be composed. It's a more scalable approach for larger grammars
Benchmark against Python — set up a realistic comparison: 50,000 rows, 5 expression columns, M evaluator vs. Python UDF in Power Query. The results will sharpen your intuition about when to reach for which tool

The skills you've developed here — recursive data structure traversal, functional state management, grammar design — transfer directly to understanding how Power Query's own M engine evaluates your queries. You've been inside the machine. That perspective changes how you write M forever.

Building a Custom M Language Parser and Tokenizer for Dynamic Expression Evaluation in Power Query

Building a Custom M Language Parser and Tokenizer for Dynamic Expression Evaluation in Power Query

Introduction

Prerequisites

Why M Is Surprisingly Suited for This Problem

Designing the Expression Grammar

Phase 1: Building the Tokenizer

Phase 2: Building the Recursive Descent Parser

Phase 3: Building the Evaluator

Phase 4: Wiring It All Together

Phase 5: Handling a Configuration Table

Caching Parsed ASTs for Performance

Hands-On Exercise

Common Mistakes & Troubleshooting

"The name 'ParseExpression' wasn't found in the current scope"

"Expression.Error: We cannot apply operator + to types Number and Null"

"Stack overflow" on complex nested expressions

Parser returns wrong precedence

`Table.Row` performance degradation on large tables

Tokenizer mangles floating point numbers

`Expect` throws an error that swallows the expression context

Architecture Trade-offs: When Not to Do This in M

Summary & Next Steps

Related Articles

Building Reusable Power Query Function Libraries: Parameters, Recursion, and Modular M Code Patterns

Implementing Custom Query Folding Logic in M: Keeping Transformations Native to the Data Source

Automating Incremental Data Refreshes in Power Query with Persistent State and Change Tracking

Related Articles

Power Query🔥 Expert
Building Reusable Power Query Function Libraries: Parameters, Recursion, and Modular M Code Patterns
29 min

Power Query⚡ Practitioner
Implementing Custom Query Folding Logic in M: Keeping Transformations Native to the Data Source
21 min

Power Query⚡ Practitioner
Automating Incremental Data Refreshes in Power Query with Persistent State and Change Tracking
22 min

Building a Custom M Language Parser and Tokenizer for Dynamic Expression Evaluation in Power Query

Building a Custom M Language Parser and Tokenizer for Dynamic Expression Evaluation in Power Query

Introduction

Prerequisites

Why M Is Surprisingly Suited for This Problem

Designing the Expression Grammar

Phase 1: Building the Tokenizer

Phase 2: Building the Recursive Descent Parser

Phase 3: Building the Evaluator

Phase 4: Wiring It All Together

Phase 5: Handling a Configuration Table

Caching Parsed ASTs for Performance

Hands-On Exercise

Common Mistakes & Troubleshooting

"The name 'ParseExpression' wasn't found in the current scope"

"Expression.Error: We cannot apply operator + to types Number and Null"

"Stack overflow" on complex nested expressions

Parser returns wrong precedence

`Table.Row` performance degradation on large tables

Tokenizer mangles floating point numbers

`Expect` throws an error that swallows the expression context

Architecture Trade-offs: When *Not* to Do This in M

Summary & Next Steps

Related Articles

Building Reusable Power Query Function Libraries: Parameters, Recursion, and Modular M Code Patterns

Implementing Custom Query Folding Logic in M: Keeping Transformations Native to the Data Source

Automating Incremental Data Refreshes in Power Query with Persistent State and Change Tracking

Architecture Trade-offs: When Not to Do This in M