Writing a code generator

AST graph WebAssembly logo

Dr. Paul Bone
paul.bone.id.au

About these slides

I these slides use some JavaScript and WebAssembly in addition to the reveal.js library. I tested this with Firefox 63, I did not test on other browsers or attempt to polyfill things (such as arrow functions) for older browsers. If it doesn't work try a more recent browser.

I have added some exposition slides, to help make reading the slides without the presentation clearer.

Press SPACEBAR to go to the next slide / reveal fragments.

Some images are fairly obviously not my own work. Diagrams, code and prose are Copyright (C) 2018 Paul Bone. You may view this information, but may not duplicate it, pass it off as your own work etc.

Who am I?

  • I work on the garbage collector of Firefox's JavaScript engine.
  • I work on compilers for fun.
  • Because compilers are fun.
  • I started on compilers when working on my honours' project in 2007, I was lucky.

Compilers

Compilers are made of a pipeline of operations, transforming different data structures.

  • Text
  • Abstract Syntax Tree
  • Assembly / Bytecode

This talk is about code generation.

Why code generation?

People assume it's hard.

Or worse, eldritch knowledge kept secret by a sacred order of complier authors.

1992 — Teen Talk Barbie
"Math class is tough"

1994 — Lisa vs Malibu Stacy
Malibu Stacy: "Thinking too much gives you wrinkles"
Lisa Lionheart: "Trust in yourself, and you can achieve anything"

Exposition

The point I'm trying to make with these slides and what I was saying verbally; is that we shouldn't tell ourselves "X is hard" because if we do, then we'll never attempt it. In some cases, this can even lead to gross misunderstandings held about a topic.

So you're saying it's easy?

Saying something is easy can also be a problem.

What I'm getting at is that it's not as hard as you may think. It's achievable.

I also like to say that it's fun.

Any topic can be deep

These are all deep topics, but you can learn the basics in an afternoon.

  • Music performance
  • Pottery
  • Quantum mechanics
  • Barista

The basics are fairly basic, but the depths can get really deep.

WebAssembly

  • Cross-platform virtual-machine for the web
  • Open format
  • Compact binary format & human readable text format (s-exprs)
  • Safe: memory-safe, sandboxed
  • Under development/specification
  • MVP ready
  • Available in the 4 most popular browsers now!
  • webassembly.org

WebAssembly is stack-based

f= 9c 5 +32
Stack
push 32;
push 35; // c
push 9;
mul;
push 5;
div;
add;

Datatypes

  • i32, i64, f32, f64 and function types
  • More to come

2's complement arithmetic:

  • Most integer operations are sign-agnostic
  • division, less-than, greater-than etc have signed/unsigned versions
  • Floats need their own operations

Relational operators use i32

The relational operators (equal, not-equal, less-than etc) return i32 regardless of their input type.

Conditional branch consumes an i32

// x (64-bit) is on the stack
i64.const 5
i64.lt_s      // (i64 i64 - i32)
if ...        // (i64 -)

Control flow

Control flow is structured

instr= atomic-instr BLOCK instr* END IF type THEN instr* ELSE instr* END LOOP type instr* END
  • Block-like instructions all have labels
  • Only branch to a label only from within its block

This is fairly unusual for an IR, it allows: break and continue but not arbitrary goto.

Indirect calls

call_indirect x

where x is a function type.

  1. Pops the i32 from the top-of-stack, let it be a
  2. Indexes the function table with a, let it be f.
  3. Verify that f has type x.
  4. Call f.

What's missing

Probably coming in a future version:

  • GC
  • Returning multiple values
  • Tail-calls
  • DOM integration

Are being considered / might be considered:

  • Unstructured control flow
  • Stack shuffling (but there are locals)

I don't work WebAssembly, including it's specification.

Exposition

The next slides show a Hypothetical language. This is designed to be as simple as possible in order to demonstrate code generation. This is not supposed to be a good code generator but a basic code generator.

Hypothetical language

Abstract Syntax Tree

data Func = Func {
        name        :: String,
        args        :: [String],
        body        :: Expr
    }
data Expr = BOp Op Expr Expr
          | Let String Expr Expr
          | Case Expr [PatExpr]
          | Call String [Expr]
          | Var String
          | Lit32 Integer

data Op = Add
        | Subtract
        | ...
        | LessThan
        | LessThanOrEqual
        | ...
        | NotEqual
        | Equal
data PatExpr = PatExpr Pattern Expr

data Pattern = Number Integer
             | Wildcard

Example program

c= 35 f= 9c 5 +32
Let "c" (Lit32 35)
        BOp Add
            (BOp Div
                 (BOp Mul (Lit32 9)
                          (Var "c"))
                 (Lit32 5))
            (Lit32 32)

What it doesn't have

  • Only one data type 32-bit signed integer.
  • Higher order values / calls.
  • Closures
  • Any data structure, eg: cons cell

Generating WebAssembly

Visit each node depth first, generate instructions.

i32.const 9
i32.get_local 0
i32.mul
i32.const 5
i32.div
i32.const 32
i32.add

How did we know where this variable was?

Allocate locals

data Symtab a = Symtab {
                    symbols     :: M.Map a Int,
                    next_id     :: Int
                }

build_locals :: Ast.Expr -> State (S.Symtab String) Ast.Expr
build_locals (Ast.Let var0 let0 in0) =
    do do_rename <- s_member var0
       (var, in2) <- if do_rename
                        then renaming
                        else do s_new var0
                                return (var0, in0)
       let_ <- build_locals let0
       in_ <- build_locals in2
       return (Ast.Let var let_ in_)
  • I have no duplicate variables, this could have been more optimal.
  • Add each variable to the symbol table.

Get and set locals

Get

Var x:

  1. Lookup the var in the symbol table.
  2. Generate get_local id.
  3. Which will read the local and push its value onto the stack.

Set

Let x e1 e2:

  1. Generate the code for e1.
  2. Lookup the var in the symbol table.
  3. Generate set_local id.
  4. Which will take the result of e1 from the stack and save it to the local
  5. Generate the code for e2.

Case

...
get_local 1 // Step 1.
set_local 2 // Step 3.
get_local 2 // Step 4.1
i32.const 1 // cont.
i32.eq      // cont.
(if (result i32)
  (then
    (i32.const 1)) // 4.2.
  (else            // 4.1.
    (get_local 0)  // 4.2.
    (i32.const 1)
    i32.sub
    ...)
)

WebAssembly makes this very easy with its nested blocks.

Case expr pats

  1. Generate the code for expr
  2. Create a new variable.
  3. Generate: i32.set_local v.
  4. for each pattern: PatExpr pat expr
    1. Generate test code for pat.
    2. Generate code for expr.
  5. Generate: unreachable

Exposition: This slide kind-of has a bug. I may fix it.

Demo!

Status:
Input:
Output:

Source:

-- Sorry, the syntax doesn't actually support
-- multi-line functions.
public fib n = let test = n < 2 in case test of
	1 -> 1
	0 -> fib(n - 1) + fib(n - 2)

WebAssembly Text Format (WAT)

(func (export "fib")
  (param i32)
  (result i32)
  (local i32 i32)
  (get_local 0)
  (i32.const 2)
  i32.lt_s
  (set_local 1)
  (get_local 1)
  (set_local 2)
  (get_local 2)
  (i32.const 1)
  i32.eq
  (if (result i32)
    (then
      (i32.const 1))
    (else
      (get_local 2)
      (i32.const 0)
      i32.eq
      (if (result i32)
        (then
          (get_local 0)
          (i32.const 1)
          i32.sub
          (call 2)
          (get_local 0)
          (i32.const 2)
          i32.sub
          (call 2)
          i32.add)
        (else
          unreachable))))
  return)

Thank you

This code generator:
github.com/PaulBone/ast2wasm
Me:
paul.bone.id.au / @Paul_Bone

Code generation is challenging / rewarding / fun.