Syntax Analysis
CDTk's parser builds a structured parse tree from a token stream using declarative Rule definitions. Like tokens, rules are public static fields in your Grammar subclass.
Parser Overview
After lexing, CDTk runs a recursive-descent parser guided by your rule tree. The parser starts at the grammar's root rule and recursively attempts to match sub-rules against the token stream. On a successful match the parser returns a ParseTree node; on failure it throws a ParseException.
The CDTk parser is intentionally greedy and left-to-right: it does not backtrack across Alt alternatives once a match starts consuming tokens. Structure your alternatives from most-specific to least-specific to avoid ambiguity.
Rule Constructors
| Constructor | Meaning | Parse-tree result |
|---|---|---|
Seq(r1, r2, …) | All children must match in order | Composite node with N children |
Alt(r1, r2, …) | First matching alternative wins | Node wrapping the matching child |
Opt(r) | Zero or one occurrence | Node wrapping child, or empty node |
Rep(r) | Zero or more occurrences | List node (may be empty) |
Rep1(r) | One or more occurrences | Non-empty list node |
Ref(() => rule) | Lazy nonterminal reference (enables recursion) | Delegates to referenced rule at parse time |
Token t (implicit) | Match a single token | Terminal leaf node |
Seq and Alt
Seq builds a rule that matches each child in order — like a production in a BNF grammar. Alt is an ordered choice; the first alternative that matches without error is used:
// Matches: IDENT "=" expr ";"
public static Rule AssignStmt = Seq(IDENT, ASSIGN, ExprRule, SEMI);
// Matches either a return or an assignment
public static Rule Statement = Alt(ReturnStmt, AssignStmt);
ReturnStmt partially matches and then fails, CDTk does not fall back to AssignStmt by default. Order alternatives from most-specific (longest prefix) to least-specific.Opt, Rep, Rep1
Use repetition combinators to express optional parts and variable-length lists:
// Optional return type annotation
public static Rule ReturnType = Opt(Seq(ARROW, TypeName));
// Zero or more statements in a block
public static Rule Block = Seq(LBRACE, Rep(Statement), RBRACE);
// Comma-separated argument list with at least one argument
public static Rule ArgList = Seq(IDENT, Rep(Seq(COMMA, IDENT)));
public static Rule Params = Alt(ArgList, Opt(IDENT)); // 0 or 1+ params
Nonterminal References (Ref)
C# static fields are initialized in declaration order, so a rule cannot directly reference another rule declared later in the same class. Use Ref(() => rule) to create a lazy reference resolved at parse time. This is also how you express recursion:
// Right-recursive expression: e.g. a + b + c
public static Rule ExprRule = Alt(
Seq(IDENT, PLUS, Ref(() => ExprRule)), // recursive case
Seq(IDENT, MINUS, Ref(() => ExprRule)), // recursive case
INT, // base case: literal
IDENT // base case: identifier
);
// Mutual recursion across rules — always use Ref for forward references
public static Rule IfStmt = Seq(
KW_IF, LPAREN, Ref(() => ExprRule), RPAREN,
Ref(() => Block)
);
SdfxEncoder.BuildGrammarNode Walk() follows Ref() lambdas via reflection to include all rules reachable from the root in the SDXF binary output. Every rule referenced through Ref() will be serialized.Root Rule
Override Grammar.RootRule to specify where the parser starts. If you don't override it, CDTk uses the first public static Rule field alphabetically — which is usually not what you want:
public class MyLang : Grammar {
public static Rule ExprRule = Alt(Seq(IDENT, PLUS, Ref(() => ExprRule)), IDENT);
public static Rule StmtRule = Alt(AssignStmt, ReturnStmt, IfStmt);
public static Rule Program = Rep1(StmtRule);
public override Rule RootRule => Program; // explicit root
}
The Parse Tree
After a successful parse, CDTk produces a tree of ParseNode objects. Each node is either:
- Terminal — a single matched token.
Namecontains the scanned text (e.g.,"if","42"). - Composite — an internal node with child
ParseNode[]corresponding toSeq,Alt,Repetc.
The tree is then passed to the semantic analysis phase which builds a SemanticTable.
SDXF Grammar Encoding
When you call SdfxEncoder.Encode(), the encoder serializes the grammar structure into a binary SDXF bundle so that the F# UAB engine can run CDTk translations without compiling C# code. Rules are serialized with these tags:
| Tag | Meaning |
|---|---|
0x40 | Rules container |
0x41 | Individual rule entry (kind + children) |
0x50 | Root rule ID |
Full Example Grammar
A complete arithmetic expression grammar with operator precedence implemented via layered rules:
public class CalcGrammar : Grammar {
public static Token PLUS = Op("+");
public static Token MINUS = Op("-");
public static Token STAR = Op("*");
public static Token SLASH = Op("/");
public static Token LPAREN = Punct("(");
public static Token RPAREN = Punct(")");
public static Token NUM = Num();
// Atom: number or parenthesised expression
public static Rule Atom = Alt(
Seq(LPAREN, Ref(() => Expr), RPAREN),
NUM
);
// Term: handles * and / (higher precedence)
public static Rule Term = Alt(
Seq(Atom, STAR, Ref(() => Term)),
Seq(Atom, SLASH, Ref(() => Term)),
Atom
);
// Expr: handles + and - (lower precedence)
public static Rule Expr = Alt(
Seq(Term, PLUS, Ref(() => Expr)),
Seq(Term, MINUS, Ref(() => Expr)),
Term
);
public override Rule RootRule => Expr;
}