Pattern Matching

CDTk identifies language constructs through structural roles — string labels assigned to tokens that tell the translation engine how to map a construct from one language to another. Structural roles are the bridge between grammars.

Built-in Structural Roles

CDTk's translation engine recognises these roles in both the source and target grammar. Roles with the same name in both grammars are automatically mapped during translation:

Role string     Matches                        Translation example
IfKeyword       Conditional branch start       if → if → block_if (WASM)
ElseKeyword     Conditional else branch        else → else
LoopKeyword     Iteration construct            while → loop (WASM)
ReturnKeyword   Function return                return → return → return (WASM)
FuncKeyword     Function declaration           static → def → func (WASM)
ClassKeyword    Class or struct declaration    class → class
TypeKeyword     Return type annotation         void, int, string (preserved as-is)
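Mapping by role name amounts to a join on the role column. The sketch below is a hypothetical illustration, not CDTk's implementation: each structural map is modelled as a Dictionary<string, string> from token text to role (CDTk keys its Map by token definitions such as KW_IF), and the C# and Python entries are taken from the table above. Because several tokens share TypeKeyword, the sketch leaves that role out of the join, consistent with the "preserved as-is" row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical structural maps: token text -> role (entries from the table above).
var csharpRoles = new Dictionary<string, string> {
    ["if"] = "IfKeyword", ["else"] = "ElseKeyword", ["while"] = "LoopKeyword",
    ["return"] = "ReturnKeyword", ["static"] = "FuncKeyword", ["class"] = "ClassKeyword",
    ["void"] = "TypeKeyword", ["int"] = "TypeKeyword",
};
var pythonRoles = new Dictionary<string, string> {
    ["if"] = "IfKeyword", ["else"] = "ElseKeyword", ["while"] = "LoopKeyword",
    ["return"] = "ReturnKeyword", ["def"] = "FuncKeyword", ["class"] = "ClassKeyword",
};

// Join on role name: source token -> target token that carries the same role.
Dictionary<string, string> MapByRole(Dictionary<string, string> source,
                                     Dictionary<string, string> target)
{
    // Invert the target map (role -> first token with that role), skipping
    // TypeKeyword because those tokens are preserved rather than substituted.
    var tokenForRole = target
        .Where(kv => kv.Value != "TypeKeyword")
        .GroupBy(kv => kv.Value)
        .ToDictionary(g => g.Key, g => g.First().Key);

    var result = new Dictionary<string, string>();
    foreach (var (token, role) in source)
        if (tokenForRole.TryGetValue(role, out var targetToken))
            result[token] = targetToken;
    return result;
}

var mapping = MapByRole(csharpRoles, pythonRoles);
foreach (var (src, dst) in mapping)
    Console.WriteLine($"{src} -> {dst}");
```

Running it lists static -> def alongside the identity mappings (if -> if, while -> while, and so on); void and int never appear because TypeKeyword is excluded from the join.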

TypeKeyword and Return Types

The TypeKeyword role is special. CDTk's translation Step 3 and PrettyPrinter.format both check for this role to decide whether to preserve or strip a keyword in function signature output.

Without TypeKeyword, a keyword like void would be stripped during translation, since it has no matching structural role in a dynamically typed target grammar such as Python. With TypeKeyword, CDTk emits "function void Main(...)" in the intermediate representation, which the target grammar can then format correctly:

// In X86AsmGrammar (CRAB project)
public static Map Structural = new() {
    { KW_VOID, "TypeKeyword" },  // preserved in "function void Main(...)"
    { KW_INT,  "TypeKeyword" },  // preserved in "function int Add(...)"
    { KW_FN,   "FuncKeyword"  },
};

// In CDTk.fs — Step 3 logic
// tgtIsTypeKw3: if token has TypeKeyword role, emit it verbatim
// isTypeKw in PrettyPrinter.format: same check for text formatting

Tip: TypeKeyword prevents stripping
Without TypeKeyword, void and int would be silently dropped during translation to Python, which does not require type annotations. Assigning TypeKeyword preserves them in the intermediate output for grammars that do need them (e.g., C#, WASM, LLVM IR).

Defining Custom Roles

Any string can be used as a role. Define custom roles in a public static Map Structural field. CDTk will automatically discover and apply it via ApplyStaticMaps():

public static Map Structural = new() {
    // Standard roles
    { KW_FN,     "FuncKeyword"    },
    { KW_RETURN, "ReturnKeyword"  },
    { KW_IF,     "IfKeyword"      },

    // Custom roles for async/await support
    { KW_ASYNC,  "AsyncKeyword"   },
    { KW_AWAIT,  "AwaitKeyword"   },

    // Custom roles for module system
    { KW_IMPORT, "ImportKeyword"  },
    { KW_EXPORT, "ExportKeyword"  },

    // Type annotations
    { KW_VOID,   "TypeKeyword"    },
    { KW_INT,    "TypeKeyword"    },
    { KW_STRING, "TypeKeyword"    },
    { KW_BOOL,   "TypeKeyword"    },
};
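Because roles are plain strings, a custom role such as AsyncKeyword only has an effect when the target grammar assigns the same string. A quick way to spot roles with no counterpart is a set difference over the values of the two maps. This is a hypothetical diagnostic, not a CDTk API: the helper name, the token texts, and the string-keyed dictionaries are assumptions for illustration. Under the rules described in "Role Usage in Step 3", unmatched roles other than TypeKeyword lead to dropped tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical source grammar roles (token text -> role), including custom ones.
var sourceRoles = new Dictionary<string, string> {
    ["fn"] = "FuncKeyword", ["return"] = "ReturnKeyword", ["if"] = "IfKeyword",
    ["async"] = "AsyncKeyword", ["await"] = "AwaitKeyword",
    ["import"] = "ImportKeyword", ["export"] = "ExportKeyword",
    ["void"] = "TypeKeyword", ["int"] = "TypeKeyword",
};
// Hypothetical target grammar that supports async/await but has no module keywords.
var targetRoles = new Dictionary<string, string> {
    ["def"] = "FuncKeyword", ["return"] = "ReturnKeyword", ["if"] = "IfKeyword",
    ["async"] = "AsyncKeyword", ["await"] = "AwaitKeyword",
};

// Roles the source uses that the target never assigns. TypeKeyword is excluded
// because those tokens are preserved even without a match.
List<string> UnmatchedRoles(Dictionary<string, string> source, Dictionary<string, string> target)
{
    var targetSet = target.Values.ToHashSet();
    return source.Values
        .Distinct()
        .Where(role => role != "TypeKeyword" && !targetSet.Contains(role))
        .OrderBy(role => role, StringComparer.Ordinal)
        .ToList();
}

Console.WriteLine(string.Join(", ", UnmatchedRoles(sourceRoles, targetRoles)));
// prints: ExportKeyword, ImportKeyword
```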

Role Usage in Step 3 (Translation)

CDTk's F# translation engine (Step 3 in CDTk.fs) scans the structural maps of both the source and target grammars. When translating a token with a known role, it looks for a token in the target grammar that has the same role and substitutes it. If no match is found:

  • For any role other than TypeKeyword: the token is dropped (e.g., Python doesn't need static).
  • For TypeKeyword: the token is preserved verbatim, even without a match in the target.

// Illustrative Step 3 translation decision (pseudocode):
// for each token in source:
//   if token.role == "TypeKeyword"               => emit as-is
//   elif target has a token with the same role   => substitute that target token
//   elif token has another structural role       => drop (e.g., C# "static")
//   else (token has no structural role)          => emit verbatim
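The four outcomes in the pseudocode can be written as a small C# iterator. This is a sketch under stated assumptions, not CDTk's F# code: the role maps are hypothetical Dictionary<string, string> instances from token text to role, and the AccessModifier role on public is invented for the example. Here static carries FuncKeyword, as in the role table, so it is substituted with def rather than dropped; the hypothetical public token exercises the drop branch instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical role maps (token text -> role) for a C# source and a Python target.
var source = new Dictionary<string, string> {
    ["public"] = "AccessModifier",   // invented role with no Python counterpart
    ["static"] = "FuncKeyword",
    ["return"] = "ReturnKeyword",
    ["void"]   = "TypeKeyword",
};
var target = new Dictionary<string, string> {
    ["def"]    = "FuncKeyword",
    ["return"] = "ReturnKeyword",
};

// Applies the Step 3 decision to each source token.
IEnumerable<string> Translate(IEnumerable<string> tokens,
                              Dictionary<string, string> sourceRoles,
                              Dictionary<string, string> targetRoles)
{
    // role -> first target token that carries it
    var targetTokenForRole = targetRoles
        .GroupBy(kv => kv.Value)
        .ToDictionary(g => g.Key, g => g.First().Key);

    foreach (var token in tokens)
    {
        if (!sourceRoles.TryGetValue(token, out var role))
            yield return token;            // no structural role: emit verbatim
        else if (role == "TypeKeyword")
            yield return token;            // TypeKeyword: preserved even without a match
        else if (targetTokenForRole.TryGetValue(role, out var targetToken))
            yield return targetToken;      // same role in target: substitute
        // otherwise: structural role with no target counterpart, so the token is dropped
    }
}

var output = Translate(new[] { "public", "static", "void", "Main", "(", ")" }, source, target).ToList();
Console.WriteLine(string.Join(" ", output));
// prints: def void Main ( )
```

The result is intermediate output only; turning "def void Main ( )" into a valid Python signature is left to the target grammar's formatting.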

Role Usage in PrettyPrinter

The PrettyPrinter.format function also checks structural roles when formatting output. It uses isTypeKw to determine whether a keyword should remain inline in a function signature or be omitted:

// F# PrettyPrinter.format logic (simplified):
// let isTypeKw token = token.StructuralRole = Some "TypeKeyword"
// Tokens with TypeKeyword role appear in: "function void Main()" output
// All other structural keywords are omitted or replaced
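A C# rendering of the same check, wrapped in a hypothetical signature formatter, makes the rule concrete. The (Text, Role) tuple, the AccessModifier role, and FormatSignature are assumptions for illustration; only the TypeKeyword test itself mirrors the F# line above. The formatter keeps TypeKeyword tokens and tokens without a role, omits every other structural keyword, and produces the "function void Main()" shape quoted above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// C# analogue of isTypeKw: true when the token's role is "TypeKeyword".
bool IsTypeKw((string Text, string? Role) token) => token.Role == "TypeKeyword";

// Hypothetical formatter: keeps TypeKeyword tokens (and role-less tokens) inline,
// omits all other structural keywords, and emits "function <kept> <name>(<params>)".
string FormatSignature(IEnumerable<(string Text, string? Role)> header, string name, string parameters)
{
    var kept = header.Where(t => IsTypeKw(t) || t.Role is null).Select(t => t.Text);
    return string.Join(" ", new[] { "function" }.Concat(kept).Append($"{name}({parameters})"));
}

var mainHeader = new (string Text, string? Role)[] {
    ("public", "AccessModifier"),
    ("static", "FuncKeyword"),
    ("void",   "TypeKeyword"),
};
Console.WriteLine(FormatSignature(mainHeader, "Main", ""));
// prints: function void Main()
```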

Stdlib Name Mapping

CDTk builds a cross-language standard library name mapping so that, for example, Console.WriteLine in C# maps to print in Python. The mapping is built from a list of equivalent function signatures across grammars:

// Build a stdlib mapping between multiple grammars
var nameMap = Compiler.BuildStdlibNameMap(
    grammars: [new CSharpGrammar(), new PythonGrammar(), new WasmGrammar()],
    stdlibFunctions: stdlibList
);

// Override FunctionDeclPattern in your grammar to tell CDTk
// how to extract function names from source for stdlib mapping:
public override Regex FunctionDeclPattern =>
    new Regex(@"^\s*(?:public\s+)?(?:static\s+)?\w+\s+(\w+)\s*\(");
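To see what the FunctionDeclPattern override above actually captures, the same regular expression can be exercised with System.Text.RegularExpressions on a few sample lines. The sample lines and the ExtractFunctionName helper are illustrative assumptions, and applying the pattern one line at a time is an assumption about how CDTk feeds it source (without RegexOptions.Multiline, the ^ anchor only matches at the start of the input).

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Same pattern as the FunctionDeclPattern override above.
var functionDecl = new Regex(@"^\s*(?:public\s+)?(?:static\s+)?\w+\s+(\w+)\s*\(");

// Returns capture group 1 (the function name), or null when the line does not match.
string? ExtractFunctionName(string line)
{
    var m = functionDecl.Match(line);
    return m.Success ? m.Groups[1].Value : null;
}

string[] samples = {
    "public static void Main(string[] args)",
    "    static int Add(int a, int b)",
    "private static void Helper()",
    "public async Task<int> FetchAsync()",
    "    return Add(1, 2);",
};
foreach (var line in samples)
    Console.WriteLine($"{line.Trim(),-40} -> {ExtractFunctionName(line) ?? "(no match)"}");
```

Two limitations show up. Methods whose modifiers are not exactly public and/or static (private, async) or whose return type contains generics (Task<int>) are not matched, while a statement such as return Add(1, 2); is matched as if it declared Add. A grammar that relies on those cases would need to widen or tighten the pattern.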

Cross-Grammar Role Matching

Here is how the same function body is translated through three grammars using structural roles:

Construct              C# token   Python token   WASM token   Role
Function declaration   static     def            func         FuncKeyword
Return                 return     return         return       ReturnKeyword
Conditional            if         if             block_if     IfKeyword
Loop                   while      while          loop         LoopKeyword
Return type            int        (omitted)      i32          TypeKeyword