Compiler Description Toolkit
Build, translate & analyze programming languages with a hacker-grade .NET toolkit
CDTk provides a complete pipeline from grammar definition to binary output — all in pure .NET with zero runtime dependencies.
Define lexers and parsers using a clean C# DSL. Tokens, rules, and structural roles are declared as static fields — no code-gen step needed.
Translate source code between C#, Python, JavaScript, WASM, and LLVM IR using the semantic pipeline with round-trip fidelity.
Rich semantic tables map objects, morphisms, and two-cell transformations. The CBOR-based binary format ensures efficient serialization.
QUILL adds machine-learning-powered token classification using ML.NET with a 128-feature vector extracted from the UAB pipeline.
CRAB compiles to native x86 Windows PE executables via the integrated X86AsmParser, X86CodeGen, and PeWriter pipeline — zero LLVM needed.
Package input/output grammars and source files into a single binary SDXF bundle with tagged frames — perfect for toolchain distribution.
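The tagged-frame idea behind a bundle format like SDXF can be sketched in a few lines. The real SDXF layout is not described on this page, so the one-byte tag, the 4-byte big-endian length prefix, and the tag names below are assumptions made purely for illustration. The sketch is written in Python rather than CDTk's C#.

```python
import struct

# Hypothetical frame tags; the real SDXF tag values are not given on this page.
TAG_INPUT_GRAMMAR = 0x01
TAG_OUTPUT_GRAMMAR = 0x02
TAG_SOURCE_FILE = 0x03

# Assumed frame header: 1-byte tag followed by a 4-byte big-endian payload length.
HEADER = struct.Struct(">BI")


def pack_frames(frames):
    """Serialize (tag, payload) pairs as header + payload, back to back."""
    out = bytearray()
    for tag, payload in frames:
        out += HEADER.pack(tag, len(payload))
        out += payload
    return bytes(out)


def unpack_frames(blob):
    """Parse bytes produced by pack_frames back into (tag, payload) pairs."""
    frames = []
    offset = 0
    while offset < len(blob):
        if offset + HEADER.size > len(blob):
            raise ValueError("truncated frame header")
        tag, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        if offset + length > len(blob):
            raise ValueError("truncated frame payload")
        frames.append((tag, blob[offset:offset + length]))
        offset += length
    return frames
```

Because every frame carries its own tag and length, a reader can skip frames it does not understand, which is the usual reason for choosing this kind of layout in a distribution format.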
Install CDTk via NuGet and write your first grammar in under 5 minutes.
Add the NuGet package or clone the repo. Targets .NET 10 (QUILL targets .NET 8).
Subclass Grammar, declare tokens as static fields, and assign structural roles via a Map.
Use Compiler.CompileText() to translate between grammars, or CompileToBinary() for native output.
Customize output by overriding Grammar.Render(SemanticTable) to generate target-language syntax.
Package grammars and source files into a self-contained SDXF binary bundle for easy toolchain sharing.
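The "tokens as static fields" style from the steps above can be illustrated with a small stand-in. This is not CDTk's API: the Token and Grammar classes, the tokenize method, and the ROLES mapping are hypothetical names, and the sketch is in Python rather than C#. It only shows how class-level declarations can drive a lexer without a code-generation step.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token declaration: a regex plus a skip flag."""
    pattern: str
    skip: bool = False


class Grammar:
    """Hypothetical base class that discovers Token fields on its subclasses."""

    @classmethod
    def tokens(cls):
        # vars() preserves declaration order, so earlier tokens win ties.
        return [(name, tok) for name, tok in vars(cls).items()
                if isinstance(tok, Token)]

    @classmethod
    def tokenize(cls, text):
        specs = [(name, tok, re.compile(tok.pattern)) for name, tok in cls.tokens()]
        result = []
        pos = 0
        while pos < len(text):
            # First matching declaration wins (not longest match).
            for name, tok, regex in specs:
                match = regex.match(text, pos)
                if match and match.end() > pos:
                    if not tok.skip:
                        result.append((name, match.group()))
                    pos = match.end()
                    break
            else:
                raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        return result


class Arithmetic(Grammar):
    NUMBER = Token(r"\d+")
    IDENT = Token(r"[A-Za-z_]\w*")
    OP = Token(r"[+\-*/=]")
    WS = Token(r"\s+", skip=True)

    # Hypothetical structural-role map, keyed by token name.
    ROLES = {"NUMBER": "literal", "IDENT": "identifier", "OP": "operator"}
```

Calling Arithmetic.tokenize("x = 40 + 2") yields a list of (token name, text) pairs with whitespace dropped; each name can then be looked up in ROLES to find its structural role.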
+-------------------------------------------------------------+
|                  CDTk Translation Pipeline                  |
+-------------------------------------------------------------+
  Source Text / Binary
       |
       v
+-------------+    +--------------+    +------------------+
|  Grammar A  |--->|  UAB Parser  |--->|  SemanticTable   |
|  (tokens +  |    |  (F# core)   |    |  ObjectRow       |
|   rules)    |    |  Step 1-3    |    |  MorphismRow     |
+-------------+    +--------------+    |  TwoCellRow      |
                                       +--------+---------+
                                                |
                                                v
+-------------+    +--------------+    +------------------+
|  Grammar B  |<---|  Codegen     |<---|  Translator      |
|  (target)   |    |  Render()    |    |  Step 4-6        |
|             |    |  override    |    |  (morphism map)  |
+-------------+    +--------------+    +------------------+
       |
       v
  Output Text / Binary (PE EXE via CRAB, WASM via wasm-target)
Optional paths:
---------------------------------------------------------------
SDXF Bundle --> SdfxEncoder  --> UAB SdfxDecode --> Grammar
QUILL       --> ML.NET SDCA  --> Token Class    --> Semantic
CRAB        --> X86AsmParser --> X86CodeGen     --> PeWriter
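The flow in the diagram (parse into semantic rows, map them, render target text) can be mimicked with a toy example. The page names ObjectRow and MorphismRow but does not describe their fields, so the field names, the single "assign" operation, and the template-based renderer below are all assumptions. The sketch is in Python and is not CDTk code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRow:
    id: int
    kind: str   # assumed field, e.g. "identifier" or "literal"
    text: str


@dataclass(frozen=True)
class MorphismRow:
    label: str   # assumed operation name, e.g. "assign"
    source: int  # id of the ObjectRow the operation reads
    target: int  # id of the ObjectRow the operation writes


def parse_assignment(line):
    """Stand-in for steps 1-3: turn 'name = value' into semantic rows."""
    name, value = (part.strip() for part in line.split("=", 1))
    objects = [ObjectRow(0, "identifier", name), ObjectRow(1, "literal", value)]
    morphisms = [MorphismRow("assign", source=1, target=0)]
    return objects, morphisms


# Stand-in for steps 4-6: a map from operation label to a target-language template.
JS_TEMPLATES = {"assign": "let {target} = {source};"}


def render(objects, morphisms, templates):
    """Emit one line of target text per morphism, using the template map."""
    by_id = {row.id: row for row in objects}
    lines = []
    for m in morphisms:
        lines.append(templates[m.label].format(
            target=by_id[m.target].text,
            source=by_id[m.source].text,
        ))
    return "\n".join(lines)
```

Swapping JS_TEMPLATES for a different template map changes the output language without touching the parse step, which is the separation the diagram draws between the source grammar, the semantic table, and the target grammar.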
Deep-dive into each phase of the compiler pipeline with hands-on examples and reference pages.
Install CDTk, write your first grammar, and compile hello world in minutes.
Learn how CDTk tokenizes source text using the declarative token DSL.
Understand rule definitions, nonterminal references, and parse-tree construction.
Explore SemanticTable, ObjectRow, MorphismRow, and the CBOR binary format.
Override Grammar.Render() to generate target-language syntax from the parse tree.
Deep-dive into the structural role system and how patterns drive translation.
Configure error reporting, warnings, and compile-time diagnostic messages.
Build a complete programming language end-to-end with the CDTk grammar framework.
Real-world examples: C# <-> Python, WASM generation, PE EXE compilation, and more.
A systems programmer operating in the shadows of the compiler stack. CDTk is the result of years spent dissecting language runtimes, reverse-engineering binary formats, and pushing the limits of what a .NET toolkit can do.
No affiliation. No employer. No personal details. Just code — clean, fast, and documented.
CDTk is open source because knowledge should be free. If you find it useful, star the repo. If you find a bug, open an issue. If you want to contribute, PRs are welcome.
Complete website redesign with LESS CSS. Full ML.NET pipeline via QUILL. Upgraded to .NET 10.
x86 PE EXE output. SDXF binary bundle format. CBOR semantic tables.
Round-trip: C# <-> Python <-> JS <-> WASM <-> LLVM IR. F# core UAB pipeline.
First public version with lexer/parser DSL, structural roles, and basic code generation.
CDTk made writing a multi-target compiler feel like filling in a form. The grammar DSL is genuinely elegant.
Round-trip C# to WASM to C# with zero data loss? I thought that was impossible. CDTk does it in two lines.
The QUILL ML integration is what sold me. Token classification with a pre-trained 128-dim model out of the box.
Shipping a PE EXE from C# source with no LLVM install? CRAB is incredible. Pure .NET, pure win.
The F# UAB pipeline is beautifully composable. I extended it with a custom grammar in a weekend.