Syntax Analysis

CDTk's parser builds a structured parse tree from a token stream using declarative Rule definitions. Like tokens, rules are public static fields in your Grammar subclass.

Parser Overview

After lexing, CDTk runs a recursive-descent parser guided by your rule tree. The parser starts at the grammar's root rule and recursively attempts to match sub-rules against the token stream. On a successful match the parser returns a ParseTree node; on failure it throws a ParseException.

The CDTk parser is intentionally greedy and left-to-right: it does not backtrack across Alt alternatives once a match starts consuming tokens. Structure your alternatives from most-specific to least-specific to avoid ambiguity.

Rule Constructors

ConstructorMeaningParse-tree result
Seq(r1, r2, …)All children must match in orderComposite node with N children
Alt(r1, r2, …)First matching alternative winsNode wrapping the matching child
Opt(r)Zero or one occurrenceNode wrapping child, or empty node
Rep(r)Zero or more occurrencesList node (may be empty)
Rep1(r)One or more occurrencesNon-empty list node
Ref(() => rule)Lazy nonterminal reference (enables recursion)Delegates to referenced rule at parse time
Token t (implicit)Match a single tokenTerminal leaf node

Seq and Alt

Seq builds a rule that matches each child in order — like a production in a BNF grammar. Alt is an ordered choice; the first alternative that matches without error is used:

// Matches: IDENT "=" expr ";"
public static Rule AssignStmt = Seq(IDENT, ASSIGN, ExprRule, SEMI);

// Matches either a return or an assignment
public static Rule Statement = Alt(ReturnStmt, AssignStmt);
Alt is greedy
If ReturnStmt partially matches and then fails, CDTk does not fall back to AssignStmt by default. Order alternatives from most-specific (longest prefix) to least-specific.

Opt, Rep, Rep1

Use repetition combinators to express optional parts and variable-length lists:

// Optional return type annotation
public static Rule ReturnType = Opt(Seq(ARROW, TypeName));

// Zero or more statements in a block
public static Rule Block = Seq(LBRACE, Rep(Statement), RBRACE);

// Comma-separated argument list with at least one argument
public static Rule ArgList = Seq(IDENT, Rep(Seq(COMMA, IDENT)));
public static Rule Params  = Alt(ArgList, Opt(IDENT));  // 0 or 1+ params

Nonterminal References (Ref)

C# static fields are initialized in declaration order, so a rule cannot directly reference another rule declared later in the same class. Use Ref(() => rule) to create a lazy reference resolved at parse time. This is also how you express recursion:

// Right-recursive expression: e.g. a + b + c
public static Rule ExprRule = Alt(
    Seq(IDENT, PLUS, Ref(() => ExprRule)),   // recursive case
    Seq(IDENT, MINUS, Ref(() => ExprRule)),  // recursive case
    INT,                                      // base case: literal
    IDENT                                     // base case: identifier
);

// Mutual recursion across rules — always use Ref for forward references
public static Rule IfStmt = Seq(
    KW_IF, LPAREN, Ref(() => ExprRule), RPAREN,
    Ref(() => Block)
);
💡
SdfxEncoder Walk
SdfxEncoder.BuildGrammarNode Walk() follows Ref() lambdas via reflection to include all rules reachable from the root in the SDXF binary output. Every rule referenced through Ref() will be serialized.

Root Rule

Override Grammar.RootRule to specify where the parser starts. If you don't override it, CDTk uses the first public static Rule field alphabetically — which is usually not what you want:

public class MyLang : Grammar {
    public static Rule ExprRule   = Alt(Seq(IDENT, PLUS, Ref(() => ExprRule)), IDENT);
    public static Rule StmtRule   = Alt(AssignStmt, ReturnStmt, IfStmt);
    public static Rule Program    = Rep1(StmtRule);

    public override Rule RootRule => Program;  // explicit root
}

The Parse Tree

After a successful parse, CDTk produces a tree of ParseNode objects. Each node is either:

  • Terminal — a single matched token. Name contains the scanned text (e.g., "if", "42").
  • Composite — an internal node with child ParseNode[] corresponding to Seq, Alt, Rep etc.

The tree is then passed to the semantic analysis phase which builds a SemanticTable.

SDXF Grammar Encoding

When you call SdfxEncoder.Encode(), the encoder serializes the grammar structure into a binary SDXF bundle so that the F# UAB engine can run CDTk translations without compiling C# code. Rules are serialized with these tags:

TagMeaning
0x40Rules container
0x41Individual rule entry (kind + children)
0x50Root rule ID

Full Example Grammar

A complete arithmetic expression grammar with operator precedence implemented via layered rules:

public class CalcGrammar : Grammar {
    public static Token PLUS   = Op("+");
    public static Token MINUS  = Op("-");
    public static Token STAR   = Op("*");
    public static Token SLASH  = Op("/");
    public static Token LPAREN = Punct("(");
    public static Token RPAREN = Punct(")");
    public static Token NUM    = Num();

    // Atom: number or parenthesised expression
    public static Rule Atom = Alt(
        Seq(LPAREN, Ref(() => Expr), RPAREN),
        NUM
    );

    // Term: handles * and /  (higher precedence)
    public static Rule Term = Alt(
        Seq(Atom, STAR,  Ref(() => Term)),
        Seq(Atom, SLASH, Ref(() => Term)),
        Atom
    );

    // Expr: handles + and -  (lower precedence)
    public static Rule Expr = Alt(
        Seq(Term, PLUS,  Ref(() => Expr)),
        Seq(Term, MINUS, Ref(() => Expr)),
        Term
    );

    public override Rule RootRule => Expr;
}