The front end parses zkc source files and produces an Abstract Syntax
Tree (AST) — a structured, in-memory representation of the program
that subsequent compiler phases can analyse and transform. The AST is
a tree where each node represents a syntactic construct. The root of
the tree is a program, which is a list of top-level
declarations (functions, memories, constants, type_alias). A function
declaration carries a list of typed variables (parameters, return
values, and locals) together with a sequence statements which
constitute its body.
The concrete node types are:
- Declarations (
pkg/zkc/compiler/ast/decl/) —Function,Memory(read-only input, write-once output, or read/write RAM),ConstantandTypeAlias. - Statements (
pkg/zkc/compiler/ast/stmt/) —Assign,IfGoto(conditional branch),Goto(unconditional branch),Return, andFail. - Expressions (
pkg/zkc/compiler/ast/expr/) —Add,Sub,Mul,Const(numeric literal),LocalAccess(read a local variable), andNonLocalAccess(reference to a constant or memory declared elsewhere). - Types (
pkg/zkc/compiler/ast/data/) — currentlyUnsignedInt(uN) andTuple(a composite of multipleuNfields used for multi-word memory buses).
A subtle aspect here is the handling of structured control flow
(e.g. if/else, while, for) as found in zkc source files.
Specifically, it is lowered during parsing into an unstructured form
comprising of conditional (IfGoto) and unconditional branches (Goto).
As an example, consider the following zkc function:
// compute n^m
fn pow(n:u4, m:u4) -> (r:u4) {
var i:u8 = 0
r = 1
while i < m {
r = r * n
i = i + 1
}
return
}
This can be compiled (using e.g. zkc compile --ir test.zkc) into
the following AST form:
fn pow(n:u4, m:u4) -> (r:u4) {
var i:u8
[0] i = 0
[1] r = 1
[2] if i>=m goto 6
[3] r = r * n
[4] i = i + 1
[5] goto 2
[6] return
Here, the Program Counter (PC) locations are given on the left of
each instruction. We can see how the original while loop was
transformed into a flat instruction sequence using a conditional
if/goto branch and an unconditional goto. Here, for example,
goto 2 indicates that control flow branches to instruction [2] at
this point.
The parser (pkg/zkc/compiler/parser/) converts a zkc source file
into an UnlinkedSourceFile — a list of declarations whose cross-file
references are still unresolved string names. It proceeds in the
usual fashion using two phases:
-
Lexing — the source text is tokenised into a flat token stream. Whitespace and comments are discarded. The lexer recognises keywords (
fn,const,pub,memory,input,output,if,while,for,return,fail,var,include,type…), identifiers, numeric literals (decimal, hex, binary, and2^Nexponent form), and operator symbols. -
Recursive-descent parsing — the token stream is consumed top-down by a hand-written recursive-descent parser. At the top level it dispatches on the leading keyword to parse each declaration kind (
parseFunction,parseConstant,parseInputOutputMemory,parseReadWriteMemory,parseInclude). Inside a function body, statements are parsed one at a time; structured control flow (if/else,while,for) is immediately lowered to sequences ofIfGotoandGotoinstructions so that all later phases work with a uniform, flat instruction list. A source map is built in parallel, recording the span of the original source file that corresponds to each instruction (for use in later error messages).
The parser halts and returns the accumulated errors on the first declaration that cannot be parsed, rather than attempting error recovery.
References between declarations in the AST (a function calling another
function, an expression reading a constant defined in a different
file, etc.) are initially represented as unresolved symbols —
plain string names. The linker (pkg/zkc/compiler/linker.go)
replaces these with resolved symbols that point directly to the
target declaration, and reports errors when symbols cannot be found,
or are of wrong kind, etc.
ZkC is somewhat unusual in that it uses an integer-range type
system (pkg/zkc/compiler/validate/typing/). Rather than tracking
only the declared type of a variable, the type checker propagates the
maximum possible value of every sub-expression. This allows an
accurate calculation for the minimum bit-width required to store any
evaluation of that expression (which is needed later when lowering
into the IR layer).
Control-flow analysis (pkg/zkc/compiler/validate/control_flow.go)
checks certain properties of every function using a dataflow
analysis:
-
Definite assignment — every variable must be definitely assigned before it is read.
-
Termination — every control-flow path must end at a
returnorfailinstruction and, furthermore, all return values must be assigned whenever areturnstatement is reached. -
Reachability - every instruction must be reachable via a path from the start of the function.
Errors are reported when these properties do not hold. For example, the following program fails definite assignment:
fn f(n:u4) -> (r:u8) {
var i:u4
r = i + n
return
}
Attempting to compile this produces an error variable i possibly undefined which is generated by the control-flow analysis.
Code generation (pkg/zkc/compiler/codegen/) translates the validated
program into a boot machine --- a representation of the program at
the IR layer below. This is intentionally simpler than the AST
representation and many features found at the AST layer do not exist
at the IR layer (e.g. compound types such as structs or fixed-size
arrays).