No Compiler, part 3: A second, tiny language embedded inside Lua

April 16th, 2016

In the last couple weeks I’ve been chasing this weird daydream of writing scripts that directly talk to the LLVM compiler backend library, thus making software without a compiler frontend. Each script would be its own tiny frontend. My plan was this would let me jump directly into the "interesting bits" of programming language development. I wrote a couple of blog posts about this project, but so far those posts have only been about me learning to use LLVM. I’ve sort of got a handle on that now, so it’s time for next steps.

Here’s what "next steps" means to me: Normally, if you were developing a programming language with LLVM, you’d design an "abstract syntax tree" that roughly captures the semantics of the language you are designing. You’d design a syntax that reduces to your AST, and then you’d write code to reduce your AST to the IR ("intermediate representation") that the LLVM library consumes. What I imagine is that in my not-a-compiler experiments, I’d still have an AST, just my program would be directly constructing the AST as a data structure in memory instead of building it from a textual representation (or in other words: a "programming language"). This would let me try out different programming language semantics without having to pick a syntax. Once I’ve settled on a semantics, I could grow a syntax around it.

Or… maybe not? The thing that makes this idea seem "weird" and not just developing a language compiler piecemeal is the thought that maybe I could never bother with the syntax and just do entire projects by writing these scripts that generate ASTs as data structures. In the Haskell world they have this very specific meaning to the word "DSL"— normally that’s short for "domain specific language" and refers to writing a little language as a helper for some task, but what Haskell users like to do is design little sub-languages entirely out of idiomatic Haskell. They inject a few functions and operators (in Haskell also functions) into a namespace and call only those, and because Haskell has flexible syntax and a weird compiler this is as good as writing a whole new language compiler. I think what I want to do is create a library that effectively constructs that kind of DSL within Lua. I’d be writing software and my "programming language" would be an idiomatic Lua that, when executed by the Lua interpreter, generates my program’s executable.

Housekeeping

What I’ve been doing so far is something a little similar— instead of designing an AST and constructing it in memory, I treat the in-memory representation of LLVM IR as if it were my "AST" and construct that in memory. This works okay, but it is clumsy. It’s clumsy first off because it is a clumsy thing to be doing— LLVM IR is basically assembly— and second off because my setup for calling the builder functions is a little clumsy. Whatever I build next will be based on the LLVM wrapper I’ve been using so far, so before I move on I should make my LLVM wrapper at least nice to use.

First things first: The version of LLVM I’ve been using doesn’t exist. I downloaded LLVM from MacPorts, and the newest version they had listed was 3.9, so I installed that. It turns out 3.9 hasn’t been released yet and I’m using code from LLVM’s development branch. (Jeremy is very thorough.) This is pretty bad for my purposes because apparently 3.9 still has significant API changes planned before official release, so if I check out this code on a new computer in two years and try to build it it won’t work. No good. Easy to fix though: I download a copy of LLVM 3.8, repeat the process of handmaking Lua bindings from my first blog post, and rename the 3.9 bindings to indicate it’s a particular beta version. When running the script you now have a choice of LLVM wrappers, 3_8 or 3_9-r259341.

Second off: again, these wrappers are clumsy. Here’s a stretch of code from my last blog post:

local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

-- printf divisor
local printfArgs = ffi.new("LLVMValueRef [2]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring")
printfArgs[1] = divisorDeref
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

There is a bunch of repetition and boilerplate here. LLVM.LLVM and builder.llvm.builder appear over and over. A lot of the vertical space is taken up by the ffi.new idiom, which I have to use over and over because the C functions accept C arrays, not Lua lists, and this ffi.new thing is how you make a C array in Lua. This all makes the code hard to read and therefore bugs hard to spot. There are some things I can do about this.

  1. LLVM.LLVM is annoying

    I’m using LLVM through the C interface. Because C has no namespacing, LLVM-C attempts to be a good citizen by prefixing every single symbol it exports (functions, structures, enums, everything) with "LLVM". Because Lua does have namespacing, it quarantines all the imported symbols inside the LLVM table. This means "LLVM.LLVM" is repeated endlessly throughout my code, like a constant background static.

    I can fix this using weird metatable magic. In Lua (unless you’re using the C FFI) the sole data structure is the table, which is a built-in dictionary backed by a hashtable or something. Object-like capabilities are offered using something called the "metatable"; each table may have an invisibly associated partner table containing a set of standardized functions, which get invoked when the interpreter hits an exceptional case while working with the real table. Functions that can be in the metatable include __index, which is called if a key is not found, or __newindex, which is called before adding a new key binding, or __add, which is called if you attempt to use arithmetic + on a table (normally illegal). The normal use of metatables is to emulate object-oriented programming; you can easily design a class system or prototype system and emulate it using the metatables. At core though an empty table with a custom metatable is basically a magical thing where a custom function gets called anytime you read from it, write to it, or do arithmetic on it.

    Here’s what I’m going to use this for:

    -- Pass this one of the metatables below
    function emptyWithMetatable(metatable)
        local table = {}
        setmetatable(table, metatable)
        return table
    end
    
    -- Transforms table.someCall into LLVM.LLVMSomeCall
    llvmMetatable = {
        __index = function(_, key)
            return LLVM["LLVM" .. capitalize(key)]
        end
    }
    
    llvm = emptyWithMetatable(llvmMetatable)

    The llvm object is empty, so every attempt to read from it invokes the __index function; the __index function capitalizes the lookup key and prepends "LLVM". In other words, saying llvm.int32Type is now equivalent to saying LLVM.LLVMInt32Type.

    Actually, I can do a little better than this— the code above means on every single LLVM function call we’ll be doing a string manipulation on the name, which is less efficient than it could be. But I can add a higher-order function and:

    -- Wrap an __index function in this and it will cache the results in the table,
    -- such that your __index function will only be called once per unique index.
    local function cacheAndReturn(fn)
        return function(table, key)
            local value = fn(table, key)
            table[key] = value
            return value
        end
    end
    
    -- Transforms table.someCall into LLVM.LLVMSomeCall
    llvmMetatable = {
        __index = cacheAndReturn(function(_, key)
            return LLVM["LLVM" .. capitalize(key)]
        end)
    }
    
    llvm = emptyWithMetatable(llvmMetatable)

    That might be a little harder to read, but now every time I look up a key in my magic llvm table, the metatable will cache the result in llvm. On the next attempt to read that key, Lua will just find the key, __index will not be called, and no function call or string manipulation will occur.

  2. Repetitive first argument

    The LLVM library was designed to be object-oriented; most functions act "on" an object, such as a "module" or a "builder". Again I am using the C interface, where the wrapper authors dealt with object methods by just always passing the relevant object as the first argument. In my scripts however I tend to create one module or one builder and use them the entire program, so passing builder.llvm.builder on every single function— as happens in the code snippet above— gets annoying.

    Again I can make this look a little prettier with metatables:

    -- Transforms table.key(input) into target.key(arg, input)
    -- In other words, this wraps a table while prebinding all function calls with one argument 
    function bind1Metatable(target, arg)
        return {
            __index = cacheAndReturn(function(table, key)
                local func = target[key]
                return function(...) return func(arg, ...) end
            end)
        }
    end
    
    builder = emptyWithMetatable(bind1Metatable(llvm, llvm.createBuilder()))

    I now have an object builder which holds an inner LLVM builder object, such that saying builder.buildBr(loop) is equivalent to saying LLVM.LLVMBuildBR(innerBuilder, loop). Again I’m using the cache wrapper, so every time I access a new function from builder it has to construct a prebound wrapper, but on future lookups it just regurgitates the wrapper from before. (Note that as a tiny, probably-inconsequential optimization I’m calculating the value of func outside of the constructed function, so that gets calculated only once, too).

  3. ffi.new is annoying

    This one doesn’t even require metatables to work around:

    function cArray(type, values)
        local arrayType = string.format("%s [%d]", type, #values)
        local array = ffi.new(arrayType)
        for i,v in ipairs(values) do
            array[i-1] = v
        end
        return array
    end

    With this helper function, cArray("int", {1,2,3}) gets me a C FFI array containing (1,2,3). Simple. Notice the index garbage since Lua is 1-indexed.

    Since I will almost always be using the types LLVMTypeRef and LLVMValueRef, I can simplify this a little further even:

    typeArray = func.bind1(cArray, "LLVMTypeRef") -- Uses Penlight 'func' library
    valueArray = func.bind1(cArray, "LLVMValueRef")

With these helpers, my snippet from above now looks like:

local valueDeref = builder.buildLoad(value, "ValueDeref")
local divisorDeref = builder.buildLoad(divisor, "DivisorDeref")

-- printf divisor
local printfArgs = valueArray {
    builder.buildGlobalStringPtr("%d ", "tmpstring"),
    divisorDeref
}
builder.buildCall(printfFn, printfArgs, 2, "tmpcall")

That is way more readable. In the adjusted code, most lines are half the length they were at the end of the last post. By the way, notice me saying valueArray { }; in Lua, if a function name is followed by curly brackets, Lua assumes you were trying to invoke the function with the table inside the { } as a single argument. I will be abusing this heavily later.

Here’s the code so far. Now I get to move on to the fun stuff.

Shadow language [A DSL]

It’s time to design my "DSL" (again: "domain-specific language") for specifying programs. After that last stretch, I’m pretty hyped up on lua metatable magic, so I wanna see how metatables can help me.

With empty tables with custom metatables (…ETCMs?), I have these magical semantic things that I can read from, assign to, treat like numbers, and treat like functions, all using the natural syntax for those things. It sure would be nice if reading variables, assigning fields, adding numbers and calling functions in my language could "look like" actually doing those things in normal syntax. I mean, who wants to write LISP? (LISP users, probably.) I imagine I could set up a world where I can say things like:

variables.x = 3
variables.y = variables.x + 5
functions.print( variables.y )

…where variables and functions are ETCMs; and the objects you get if you read from those tables are ETCMs; and when this code runs it might look like I’m assigning 3 to x, adding 5 to that to get y, and then calling print with the value of y, but I’m not actually doing any of these things. Instead, I’m implicitly calling custom metatable functions that are registering each of these actions as I perform them, and at the end I can extract the record of which metatable functions I called as the specification of my program. The final, real program would "shadow" the fake actions I performed in Lua program.

I decide I’m going to call this DSL "shadow language", and call the magical recording-interactions-only ETCMs you interact with in it "shadow objects", because referring to "the DSL" all the way through this blog post would get really confusing and "ETCM"s might be the worst piece of tech terminology I’ve ever heard.

Instead of trying to write a full spec for the shadow DSL, I decide to write a program in it. I rewrite my number-factoring program from the last blog post:

backend.module("main", function(e)
    e.fn.printf = e.fn(e.type.int32, {e.type.int8.ptr, e.type.vararg})
    
    e.fn.atoi = e.fn(e.type.int32, {e.type.int8.ptr})
    
    e.fn.printFactors = e.fn(nil, {e.type.int32}, function(e, input)
        e.var.value = input
        e.var.divisor = 2
        e.while(e.var.divisor > e.var.value, e.block(function(e)
            e.if(e.var.value % e.var.divisor == 0,
                e.block(function(e)
                    e.fn.printf:call("%d ", e.var.divisor)
                    e.var.value = e.var.value / e.var.divisor
                end),
                e.block(function(e)
                    e.var.divisor = e.var.divisor + 1
                end)
            )
        end))
    end)

    e.fn.main = e.fn(e.int32, {e.type.int32, e.type.int32.ptr}, function(e, argc, argv)
        e.if(argc == 2,
            e.block(function(e)
                e.fn.printFactors:call( e.fn.atoi( argv.idx(1) ) ) 
                e.return(0)
            end),
            e.block(function(e)
                e.fn.printf:call("Please enter exactly one number as argument")
                e.return(1)
            end)
        )
    end)
end)

There’s a few things happening here. I’ll break some of them down:

  • The environment: At all times, I’m working off of this "environment" variable e. e contains a container field var (previously variables), a container fn (previously functions), containers types and blocks, and some helpers like if and while. Each of the containers can be assigned to, to register or (for mutable things like variables) mutate a named item (a variable, function, type or block label). Containers can also be read from, to get a reference (maybe a forward reference) to a named item. Or containers can be called like a function, to create an item in place (for example notice I say e.fn.printFactors = e.fn(...).
  • Construction functions: When I ask a container to create a new function, or a new block, I pass in a lua function. This is because "code", in the shadow language, is encoded in a sequence of Lua statements. So if I want to nest a stretch of "code" inside another, I have to control the sequence of execution somehow. I do that by embedding the "inner" code in its own function, which will get passed its own environment, and will run when the DSL library is ready for it to run.
  • "call": One weird thing: In shadow language a function call doesn’t actually call the function but just returns a record of the idea that a function needs to be called to produce a particular value. When I say e.var.x = e.fn.atoi( "3" ) in the code above, the atoi( "3" ) isn’t a procedure that has an effect, it’s an item in an expression tree. This means if you called e.fn.aoti( "3" ) alone on a line it would have literally no effect, since you didn’t do anything that registers that particular expression tree as an instruction that needs to be emitted. So I need some way to tell the shadow language I want a standalone function call to be emitted, and the way I do this is the awkward :call on the handful of void-returning functions I call here. (If this paragraph confused you: This is my fault, not yours. This bit is confusing.)

So: I have a little DSL! Is it a good DSL? Well, the "printFactors" program is a lot shorter in this than it was in the last post, when I was writing out LLVM IR. I don’t really like how the new program looks, though: The e. everywhere is bugging me. That looks to me like boilerplate, and as previously insinuated, I don’t like boilerplate. When I see boilerplate my eyes start glazing over, and when my eyes glaze over I start missing bugs. Luckily this is something I can fix with more weirdo Lua magic! Lua has a ridiculously abusable-looking feature called setfenv(), which lets you take a function and, before you call it, change the global variables which that function perceives as existing. If I plan to use this, I can eliminate e entirely by making it implicit:

backend.module("main", function()
    fn.printf = fn(type.int32, {type.int8.ptr, type.vararg})
    
    fn.atoi = fn(type.int32, {type.int8.ptr})
    
    fn.printFactors = fn(nil, {type.int32}, function(input)
        var.value = input
        var.divisor = 2
        do.while(var.divisor > var.value, function()
            do.if(var.value % var.divisor == 0,
                function()
                    do.fn.printf("%d ", var.divisor)
                    var.value = var.value / var.divisor
                end,
                function()
                    var.divisor = var.divisor + 1
                end)
        end)
    end)

    fn.main = fn(int32, {type.int32, type.int32.ptr}, function(argc, argv)
        do.if(argc == 2,
            function()
                do.fn.printFactors( fn.atoi( argv.idx(1) ) )
                do.return(0)
            end,
            function()
                do.fn.printf("Please enter exactly one number as argument")
                do.return(1)
            end)
    end)
end)

Here the various members of efn, var, etc— are forcibly populated into each function’s scope before it’s executed. fn doesn’t exist before builder.module() gets called at the top, and it doesn’t exist afterward. It will only exist in the temporary global scope injected by setfenv(). To limit the number of variables I need to inject, I quarantine some of the helpers like if and while into a do object, implied to be the container for non-expression statements— which, conveniently, gives me a better way of handling that :call thing from before.

And once I have things set up like this, I… actually don’t mind this! This is sort of okay. I can read this code. It’s still a bit cumbersome, and the scattered function()s everywhere are more than a little odd, but I’d rather write this than COBOL. This is a good start.

So: I have now a program written in my little program-specification DSL. Can I get to a point where I can compile it?

Improvisational compiler-stage design

When I was originally daydreaming about the shadow DSL, I think I imagined that each of the metatable calls would translate directly into an LLVM Build invocation. If I said var.x + var.y, for example, that would call a metatable __add somewhere which every time it is invoked would just plain call builder.buildAdd( builder.buildLoad(x, ""), builder.buildLoad(y, "") ) and return the resulting LLVMValueRef. This would have been a bad idea even by the standards of this project. Besides being brittle, it wouldn’t work with the syntax in the "printFactors" sample; if I’m using the LLVM builder I need to alloca the variables first thing when generating the function, but with my current DSL I won’t know what variables a function uses until after I’ve executed the DSL code. I need to do something a little bit smarter. I need to create an AST.

Here, finally, I’m doing something normal that a normal compiler would do. There’s probably best practices for how to design an AST. I don’t know what those are so I decide to wing it. I try to imagine what simple structure could be easily constructed from the shadow DSL above, which could also be easily decomposed into LLVM IR.

I imagine I’ll have a tree made of objects of six types: AstModule, AstFn, AstBody, AstBlock, AstType, AstExpr. "Body" is separate from "Function" since external functions like printf don’t have bodies. And the spec for these objects… well, just writing out a sample and then assuming I can craft a library to fit it later seemed to work okay for the shadow language, so maybe it can work for prototyping the AST classes also. I sit down and write out code instantiating an AST for something like my printFactors program inline:

module = AstModule{name="main"}
int32 = AstType{literal=int32Type}
int8 = AstType{literal=int8Type}
void = AstType{}
int8ptr = AstType{ptr=int8}
int8ptrptr = AstType{ptr=int8ptr}
module.namedFns.printf = AstFn{ret=int32, args={int8ptr}, vararg=true}
module.namedFns.atoi = AstFn{ret=int32, args={int8ptr}}
module.namedFns.printFactors = AstFn{ret=void, args={int32}, body=AstBody{
    vars = { value=int32, divisor=int32 },
    namedBlocks={
        entry=AstBlock{
            exprs={
                AstExpr{op='store', var='value', value=AstExpr{op='getParam',idx=0}},
                AstExpr{op='store', var='divisor', value=AstExpr{op='literal',value=2}}
            },
            exit='loop'
        },
        loop=AstBlock{
            exprs={
                AstExpr{op='branch', cond=AstExpr{op='bin', op2='>=', value={
                    AstExpr{op='load', var='value'}, AstExpr{op='load', var='divisor'}
                }}, y='modulo', n='done'}
            }
        },
        modulo=AstBlock{
            exprs={
                AstExpr{op='branch', cond=AstExpr{op='bin', op2='==', value={
                    AstExpr{op='bin', op2='%', value={
                        AstExpr{op='load', var='value'}, AstExpr{op='load', var='divisor'}
                    }},
                    AstExpr{op='literal', value=0}, 
                }}, y='divide', n='increment'}
            }
        },
...

This… goes on, for quite a while. I’ll spare you pasting the entire thing. It’s actually kind of dreadful. The written-out AST turns out to be almost as long as the LLVM-assembly-written-out-in-lua blob from my last blog post, but is somehow less readable. I guess that’s probably okay— the AST isn’t designed to be human-readable or writable.

The job of my AST classes is going to be to hold the structure of the program and generate LLVM objects from that structure. They’re all going to inherit from an AstBase:

class.AstBase()
    function AstBase:_init(spec)
        self.spec = spec
    end

    function AstBase:llvm()
        if not self.cached then
            self.cached = self:render()
        end
        return self.cached
    end

The constructor here follows a pattern that I use in every single Lua program I write, but I’ve never seen anyone else do; the arguments to the constructor are taken as a structure, which I store as self.spec. Normally in OO programming when you write constructors you take an argument list and then your constructor assigns each positional argument to a named variable in the object. I hate writing these. It’s just the most tedious, unnecessary-feeling thing. So instead in Lua, where I’m allowed to do this, my constructors take a structure full of named fields— you’ll notice in my sample code above, the constructor invocations all use {} brackets instead of ()— and nothing needs to be unpacked. Notice that this means all the constructor arguments are effectively optional.

Ultimately what these AST classes exist for is to convert a spec structure into an LLVM primitive. You pass in the spec to the constructor, and you fetch back an LLVM primitive by calling llvm(). To implement an AST class you only need to implement render(), which llvm() calls (and caches the result of, to guarantee render() only gets called once even if a single object shows up multiple places in the tree).

I try implementing one of the classes— AstType, which I expect will be the simplest:

class.AstType(AstBase)
    function AstType:_init(spec) self:super(spec) end

    function AstType:render()
        if self.spec.literal then return self.spec.literal end
        if self.spec.ptr then
            return llvm.pointerType(self.spec.ptr:llvm(), 0)
        end
        return llvm.voidType()
    end

So AstType expects one of three different specs:

  • {literal=}, with a premade LLVM type object as literal; in this case llvm() returns the type object.
  • {ptr=}, with an AstType as ptr; in this case llvm() returns a pointer type of the child type.
  • Empty {}, in which case it returns the void type.

(The constructor in this code snippet is unnecessary, by the way, since the default behavior with Penlight classes is to call the super constructor. But it’s nice to have the constructor already pasted in.)

So this is pretty straightforward! I have a program written out as calls to the constructors of these AST classes; I have a class heirarchy; and now I just need to fill in those render() methods and I’ll be able to generate an entire LLVM module by calling llvm() once. Everything I’ve done in this project so far has proceeded really quickly, so I sit down on a Sunday morning and hope I’ll be able to finish implementing the AST classes that day.

Two weeks later…

Slightly less improvisational compiler-stage design

Okay. What am I actually doing here?

I want to be able to specify programs as data structures in memory. I want this so I can design programming interfaces that construct those data structures. At the moment I want this so that I can implement my shadow DSL, but that’s not the long term goal. I don’t really care about the shadow language; it’s an experiment for a silly blog post. But I believe I’ll be making more complicated things later, and when I do so I’ll be using this AST framework. Honestly what I’m probably going to do with the AST framework before too long is just plug LPEG into it, at which point— despite my protestations— I will have literally written a compiler.

At the moment, when you "render" this AST into a program, it does so using LLVM. But I could imagine, at some point, making a version of the AST framework that uses the same interface but generates (for example) Javascript instead of LLVM IR. (You can generate Javascript from LLVM IR, but it’s pretty ugly Javascript since LLVM IR’s model is simpler than Javascript is.) So it would be good to design this so it can easily be adapted to support code generation models that don’t look like LLVM’s.

At the moment the design doesn’t actually support even LLVM. Let’s talk for a moment about what LLVM does.

When I call one of those LLVM build functions:

local two = llvm.constInt(int32Type, 2, false)
local three = llvm.constInt(int32Type, 3, false)

builder.buildAdd(two, three, "sum")

When I call builder.buildAdd, executing that call has a side effect. It’s writing a line of LLVM assembly into a ledger somewhere:

%sum = add i32 2, i32 3

Sometimes I use a value that a build function returns:

local value = builder.buildAdd(two, three, "sum")
builder.buildRet(value)

What does buildAdd return? Well, it returns %sum from the assembly snippet. This last bit of Lua is telling the builder to write in its ledger:

%sum = add i32 2, i32 3
ret i32 %sum

Since LLVM calls have side effects, it matters when you call LLVM functions. There are some things order matters for and some it doesn’t; you can construct a type or a function at any time, for example. LLVMConstInt likewise does not care about order. But it does matter the order you create blocks in, since the first block added to a function is always the entry. And it matters the order you build expressions/instructions in, because when you build an instruction its "name" gets added to the current block and if you write your instruction into the ledger for the wrong block you might not be able to reference it later.

So this means my first-attempt design for AstBase, where ast objects render their contents exactly once the first time you try to read them, isn’t very good. On my second attempt, render gets replaced with three functions: register(), render(), and emit().

In my second-attempt design, compiling happens in two passes. First, every object in the entire AST gets a register() call. This allows the AST to inspect itself and make sure things are correct, and if anything the AST does is time-sensitive it can use register() as an opportunity to note that down, so that in the second pass it knows to do things in the correct order. On the second pass, each object gets called with exactly one of emit() or render(). emit() and render() are actually the same thing, but render() is for things which return a value after emitting themselves and emit() is for things that don’t. When you emit() a module, there’s no point in returning the LLVM module object, but when you emit an add instruction you will tend to immediately need the LLVM object it returns.

The two passes are managed by the AstModule object, which is the top-level entry to everything else. It doesn’t have a register phase; to create a program you create an AstModule object and call module:emit(). AstModule‘s emit looks like:

-- Register all functions
for k,v in pairs(self.namedFns) do
    v.name = k
    v:register()
end

-- Emit all functions
for _,v in pairs(self.namedFns) do
    v:emit()
end

-- Process
backend:verify()
for _,v in pairs(self.namedFns) do
    backend:optimize(v.llvmObj)
end

-- Write
backend:debugDump()
backend:emitToFile()

What happens in the register phase? Well, functions register themselves with the module; function bodies build a list of their blocks and a list of their local variables; and expressions notify their function bodies which blocks and variables they use. Every leaf in the AST gets visited once while this is happening. If any particular node has references to other AstBase objects, then when it is register()ed it needs to call register() on those as well.

Here a bit of a problem comes up, and it requires me to make one more tweak to my model. If an AstExpr has to notify its AstBody about variables it uses, that means it needs to be able to find its AstBody. Actually, potentially, an expr might need to be able to get references to any one of (1) its module, (2) its function, (3) its body or (4) its block. I briefly consider passing all four of these down as arguments to every render() call, or maybe having each AstExpr keep a reference to all four. Both of these options sound miserable. I go with something that might not be as good software engineering, but works: I track a global "current" module, function, body and block. I give AstBase a function:

function AstBase:recur(label, fn)
    local prev = backend[label] -- Allow nesting
    backend[label] = self
    fn()
    backend[label] = prev
end

The generation process for modules and blocks nests nicely— each expression belongs to at most one block, each block belongs to at most one function, etc. When we call register() on an AstBody, we’ll be working with that body until the function completes. So we can use the shared global backend object to track the fact of, for example, which body we are currently working on. The "recur" function takes a function to execute, and is responsible for setting the tracked field before execution and then unsetting it afterward. (It saves and re-sets "prev" rather than just writing to nil at the end because later on I figured out that sometimes you might have to interrupt processing one block to look at another and then come back.) So for example an AstBody could call:

self:recur('currentBody', function()
    -- Any external functions called in this stretch of code
    -- will be able to find self at backend.currentBody
end)

Armed with these two tweaks— the two-pass tree traversal, and the ‘current’ tracking— my model works much better. There’s just one more idea I need to introduce, and then I’ll be able to show you the code. If you look at the AST sample above, you’ll notice I refer to variables and blocks by name. In practice once I start writing the render() functions it turns out to be a little bit awkward to look things up by name every time you use them. Also, it would be sort of cool if when I’m writing the AST structures I could "inline" small anonymous blocks. For example, toward the end of the full version of my sample AST I wound up writing:

AstExpr{op='branch',
    -- Condition: Verify that argc, the first argument to main(), is equal to 2
    cond=AstExpr{op='bin', op2='==', value={
        AstExpr{op='getParam',idx=0}, AstExpr{op='literal',value=2}
    }},

    -- Condition succeed block
    y=AstBlock{exprs={
        -- [[Snip]]
        -- Here I call printFactors, print a newline, and return 0
    }},

    -- Condition fail block
    n=AstBlock{exprs={
        AstExpr{op='call', fn='printf', args={
            AstExpr{op='literal',value="Please enter exactly one number as argument.\n"}
        }},
        AstExpr{op='return', value=AstExpr{op='literal', value=1}}
    }}
}

Instead of giving names to the blocks to the y and n branches of this branch statement, I just stuck AstBlock objects in there.

What I decide is that by the time emit()/render() is called, all "by name" references should be converted to plain object references— block names to AstBlocks, var names to a new AstVar structure. So for example later on when I implement register() for the branch-type AstExpr, it’s going to say something like:

for _,k in ipairs({"y", "n"}) do
    if self.spec[k] then
        self.spec[k] = backend.currentBody:registerBlock(self.spec[k])
    end
end

registerBlock() will be a "sanitizer" function that takes either an AstBlock or a name; if it’s an AstBlock it will just return it, if it’s a name it will look up the AstBlock for that name and return it. AstExpr:register() is expected to sweep through and replace any block references the expr has with references to whatever registerBlock() returns.

Now, at this point some of you might point out that once I perform this operation, the "AST" is no longer properly a "tree" because the references can reconverge and potentially even loop. WELL MAYBE IT IS SOME KIND OF FREAKY MUTANT LARRY NIVEN SPACE TREE AND YOU SHOULD JUST DEAL.

ast.lua

I am now finally ready to write some code. First off, here’s my AstBase class:

-- Base class for all AST objects
class.AstBase()
    function AstBase:_init(spec)
        self.spec = spec
    end

    function AstBase:llvm() -- Generate then return llvmObj
        if not self.llvmObj then
            self.llvmObj = self:render()
        end
        return self.llvmObj
    end

    function AstBase:render() -- Some objects do not support llvm()
        error("Could not render this object. You may need to call emit() first")
    end

    function AstBase:recur(label, fn) -- Temporarily register with backend and run function
        local prev = backend[label] -- Allow nesting
        backend[label] = self
        fn()
        backend[label] = prev
    end

All this code has already appeared in this post— the only new thing is the error-throwing default implementation for render(). Next comes the first of my actual AST object classes:

-- Representation for a module (ie, an object file). Call emit() to create program
class.AstModule(AstBase)
    function AstModule:_init(spec)
        self:super(spec)
        self.namedFns = self.spec.namedFns or {}
        self.spec.namedFns = nil
    end

    function AstModule:recur(fn) AstBase.recur(self, 'currentModule', fn) end

I give the individual classes personalized versions of recur() which have their "name" baked in. Here’s AstModule:emit() again, in its final form:

function AstModule:emit()
     -- Set a global for the duration of this function
    if backend then error("AstModule:emit called, but backend already exists") end
    backend = Backend("main")
    
    self:recur(function()
        -- Register all functions
        for k,v in pairs(self.namedFns) do
            v.name = k
            v:register()
        end

        -- Emit all functions
        for _,v in pairs(self.namedFns) do
            v:emit()
        end

        -- Process
        backend:verify()
        for _,v in pairs(self.namedFns) do
            backend:optimize(v.llvmObj)
        end

        -- Write
        backend:debugDump()
        backend:emitToFile()
    end)
    
    self.backend = backend
    backend = nil
end

Where most of the functions are only responsible for registering themselves with the global backend object, AstModule has to register the backend object itself. One more thing and then we’re done with AstModule:

-- Lookup/sanitization for functions
function AstModule:fn(name)
    -- Note it is assumed ALL modules are registered manually before emission starts
    if type(name) == 'string' then
        return self.namedFns[name]
    else
        return name -- Assume this is a function object already
    end
end

This is one of those "sanitizer" functions I mentioned, since the module object is responsible for managing functions.

AstFn is pretty simple:

-- Representation for a function
class.AstFn(AstBase)
    function AstFn:_init(spec) self:super(spec) end

    function AstFn:recur(fn) AstBase.recur(self, 'currentFn', fn) end

    function AstFn:register()
        if not self.llvmObj then
            self.llvmObj = backend.module.addFunction(self.name, self:type())
            if self.spec.body then
                self:recur(function() self.spec.body:register() end)
            end
        end
    end

    function AstFn:emit()
        if self.spec.body then
            self:recur(function() self.spec.body:emit() end)
        end
    end

Other than registering the function with the module during register(), AstFn pretty much just defers to its body object. The only real complexity is in the helper that calculates the function’s type in LLVM terms:

-- Fetch the LLVMFunctionType for this function
function AstFn:type()
    if not self.llvmType then
        local args = {}
        for i,v in ipairs(self.spec.args or {}) do
            table.insert(args, v:llvm())
        end
        self.llvmType = llvm.functionType(self.spec.ret and self.spec.ret:llvm() or llvm.voidType(),
            typeArray(args), #args, self.spec.vararg and true or false)
    end
    return self.llvmType
end

ret and args are AstType objects, so we need to call llvm() on them to get the representations LLVM consumes. Moving on to AstBody:

-- Representation for a function body
class.AstBody(AstBase)
    function AstBody:_init(spec)
        self:super(spec)
        self.spec.namedBlocks = self.spec.namedBlocks or {}
        self.anonBlocks = {}
        self.namedVars = {}
    end

    function AstBody:recur(fn) AstBase.recur(self, 'currentBody', fn) end

Notice the "or" statement specifying a default for namedBlocks if one isn’t in the spec already. (This implies a slightly wierd wrinkle on my convention of storing some fields inside of "spec"; the AST objects can mutate the spec objects, so you’d better not pass a single spec structure to multiple constructors.)

function AstBody:register()
    self:recur(function()
        self.llvmAllocaBlock = llvm.appendBasicBlock(backend.currentFn.llvmObj, "")

        for k,v in pairs(self.spec.namedBlocks) do
            v.name = k
            v:register()
        end

        if not self.spec.entry then error("Block without entry") end
        self.entry = self:registerBlock(self.spec.entry)
    end)
end

The body’s register() phase is mainly concerned with its blocks. Notice two blocks are special; the "alloca block" is the entry block where I register variables, so it must be registered with the function before anything else happens. The entry block is the block the alloca block transfers control to when it’s done. entry is mandatory for an AstBody spec, but it can be either a block object or a name, so registerBlock() is used.

-- Lookup/sanitization for a block during register phase. Accepts either a name or a block object.
function AstBody:registerBlock(b)
    if type(b) == 'string' then return self.spec.namedBlocks[b] end
    b:register()
    return b
end

-- Lookup/sanitization for a var during register phase. Accepts either a name or a var object.
function AstBody:registerVar(v, create)
    if type(v) == 'string' then
        if self.namedVars[v] then
            v = self.namedVars[v]
        else
            local name = v
            v = AstVar{}
            v.name = name
        end
    end
    if create then v.valid = true end
    v:register()
    return v
end

Here’s those sanitize functions I mentioned. Two things to notice: The work of actually registering is passed off to the register implementations for AstBlock and AstVar; also, registerVar takes a "create" argument which discriminates whether we are writing to or reading from a function. If a variable is ever read from in a function where it’s never written to, I’ll want that to be an error.

function AstBody:emit()
    self:recur(function()
        -- Instantiate vars
        backend.builder.positionBuilderAtEnd(self.llvmAllocaBlock)

        local function alloca(v)
            v.llvmObj = backend.builder.buildAlloca(int32Type, v.name or "anon")
        end

        for _,v in pairs(self.namedVars) do alloca(v) end
        for _,v in ipairs(self.anonVars) do alloca(v) end

        backend.builder.buildBr(self.entry.llvmObj)

        -- Run blocks
        for k,v in pairs(self.spec.namedBlocks) do
            v:emit()
        end
        for _,v in ipairs(self.anonBlocks) do
            v:emit()
        end
    end)
end

In its emission phase, a body generates the "alloca" block itself— again consisting only of the alloca instructions that reserve room for its variables on the stack, followed by a jump into the entry block— and then directs its blocks to emit themselves in no particular order. Notice that with both vars and blocks, there’s a split between "named" and "anonymous" items; "anonymous" blocks or vars wind up being those AstBlocks or AstVars that just sort of get discovered while traversing the AST without ever being assigned to a name. You can see this in action in AstBlock:register():

-- Representation for a block within a function
class.AstBlock(AstBase)
    function AstBlock:_init(spec)
        self:super(spec)
        self.spec.exprs = self.spec.exprs or {}
    end

    function AstBlock:register()
        if not self.llvmObj then
            if not self.name then table.insert(backend.currentBody.anonBlocks, self) end
            self.llvmObj = llvm.appendBasicBlock(backend.currentFn.llvmObj, self.name or "anon")
            for _,v in ipairs(self.spec.exprs) do
                v:register()
            end
            if self.spec.exit then
                self.exit = backend.currentBody:registerBlock(self.spec.exit)
            end
        end
    end

Named blocks are already present in namedBlocks, in the AstBody‘s spec, but blocks without names are apparently being found somewhere in the tree and need to be added to anonBlocks when they’re discovered. Notice something a little bit subtle here: since blocks can point at each other through AstExprs that branch, register() on a single block might get called more than once, so the method has to detect it’s already been run and silently return in that case. Once again, technically my "Abstract Syntax Tree" is more of a directed graph and the register() process is doing a DFS on that graph.

function AstBlock:emit()
    backend.builder.positionBuilderAtEnd(self.llvmObj)
    for i,v in ipairs(self.spec.exprs) do
        v:llvm()
    end
    if self.exit then
        backend.builder.buildBr(self.exit.llvmObj)
    end
end

Emission phase for a block is simple. The expressions (instructions) get rendered, and then as a convenience the AstBlock emits the jump out itself.

AstType and AstVar are also simple:

-- Representation for a variable type
class.AstType(AstBase)
    function AstType:_init(spec) self:super(spec) end

    function AstType:render()
        if self.spec.literal then return self.spec.literal end
        if self.spec.ptr then
            return llvm.pointerType(self.spec.ptr:llvm(), 0)
        end
        return llvm.voidType()
    end

-- Representation for a variable within a function
class.AstVar(AstBase)
    function AstType:_init(spec) self:super(spec) end

    function AstVar:register()
        if self.registered then return end
        self.registered = true
        if self.name then
            backend.currentBody.namedVars[self.name] = self
        else
            table.insert(backend.currentBody.anonVars, self)
        end
    end

AstType I’ve actually inlined in this post already, and AstVar is pretty much the same registration behavior as AstBlock (with the difference that the variable names aren’t already known at the time the register phase starts).

The final class, AstExpr, is the most complicated one by far. Here’s its register phase:

-- Representation for an expression or statement within a function
class.AstExpr(AstBase)
    function AstExpr:_init(spec) self:super(spec) end

    function AstExpr:register()
        if self.registered then return end
        self.registered = true

        local function registerValue()
            self.spec.value:register()
        end

        local switch = {
            bin = function()
                self.spec.value[1]:register()
                self.spec.value[2]:register()
            end,
            ['return'] = function()
                if self.spec.value then registerValue() end
            end,
            store = function()
                self.var = backend.currentBody:registerVar(self.spec.var, true)
                registerValue()
            end,
            load = function()
                self.var = backend.currentBody:registerVar(self.spec.var)
            end,
            deref = function()
                registerValue()
                if self.spec.idx then self.spec.idx:register() end
            end,
            branch = function()
                if self.spec.cond then self.spec.cond:register() end
                for _,k in ipairs({"y", "n"}) do
                    if self.spec[k] then
                        self.spec[k] = backend.currentBody:registerBlock(self.spec[k])
                    end
                end
                if not self.spec.y then error("Branch lacks a destination") end
                if self.spec.cond and not self.spec.n then
                    error("Conditional branch lacks a \"no\"")
                end
            end,
            call = function()
                for _,v in ipairs(self.spec.args) do
                    v:register()
                end
            end
        }
        local fn = switch[self.spec.op]
        if fn then fn() end
    end

AstExpr is complicated because it’s actually a bunch of different things. It has an op field which indicates what, exactly, this expression is doing. Someone more committed to OOP than me might have just made a different AstExpr for each op, but I just use a switch statement. Lua doesn’t have switch statements. I fake it with a table full of closures.

Other than a tiny amount of error checking, the register phase for an AstExpr is concerned only with furthering the DFS on the directed graph that is the "AST". Expressions, depending on what their op is, might reference a number of other AstExprs, AstVars or AstBlocks. Each of those needs to get their register() called.

AstExpr‘s render() is another big switch statement, so let’s take it a little bit at a time.

function AstExpr:render()
    local function literal(v)
        local t = type(v)
        if t == 'number' then -- Accept unwrapped numbers and strings
            return llvm.constInt(int32Type, v, false)
        elseif t == 'string' then
            return backend.builder.buildGlobalStringPtr(v, "")
        else     -- Otherwise assume we've been passed an LLVM object
            return v
        end
    end

    local switch = { -- Lookup table on op
        literal = function()
            return literal(self.spec.value)
        end,

The literal op, used to represent a inlined value in the AST, has its own little helper function. I don’t think I had a particularly good reason why.

bin = function() -- Binary expression
    local a = self.spec.value[1]:llvm()
    local b = self.spec.value[2]:llvm()
    local function build(kind) return function()
        return backend.builder['build'..capitalize(kind)](a, b, "")
    end end
    local function iCmp(kind) return function()
        return backend.builder.buildICmp(llvm[kind], a, b, "")
    end end
    local switch2 = {
        ['>'] = iCmp('intSGT'),  ['<'] = iCmp('intSLT'),
        ['>='] = iCmp('intSGE'), ['<='] = iCmp('intSLE'),
        ['=='] = iCmp('intEQ'),  ['!='] = iCmp('intNE'),
        ['+'] = build('add'),    ['-'] = build('sub'),
        ['*'] = build('mul'),    ['/'] = build('sDiv'),
        ['%'] = build('sRem')
    }
    return switch2[self.spec.op2]()
end,

bin has an op2 and its own inner switch. A fun thing about building a switch from a table of functions is that I can build the switch cases themselves using higher-order functions! This results in an incredibly clean looking switch statement at the cost of two nearly inscrutable helper functions. FUNCTIONAL PROGRAMMING.

Most of the ops there’s not much to say about— here, at last, I’m only doing a straightforward conversion from spec members to LLVM builder calls—

['return'] = function()
    if self.spec.value then
        local value = self.spec.value:llvm()
        backend.builder.buildRet(value)
    else
        return backend.builder.buildRetVoid()
    end
end,
getParam = function()
    return llvm.getParam(backend.currentFn.llvmObj, self.spec.idx)
end,
call = function()
    if not self.spec.fn then error("Function not found "..self.spec.fn) end
    local llvmArgs = {}                    -- map :llvm() on self.spec.args
    for _,v in ipairs(self.spec.args) do
        table.insert(llvmArgs, v:llvm())
    end
    local fn = backend.currentModule:fn(self.spec.fn)
    return backend.builder.buildCall(fn.llvmObj, valueArray(llvmArgs), #llvmArgs, "")
end,

—but a couple things stand out.

store, load and branch wind up accessing the .llvmObj of the blocks and variables they reference directly, since blocks and variables don’t have a render() and are required to have been constructed before the code reaches this point:

store = function()
    return backend.builder.buildStore( self.spec.value:llvm(), self.var.llvmObj )
end,
load = function()
    return backend.builder.buildLoad( self.var.llvmObj, "" )
end,
branch = function()
    if self.spec.cond then
        return backend.builder.buildCondBr(self.spec.cond:llvm(), self.spec.y.llvmObj, self.spec.n.llvmObj)
    else
        return backend.builder.buildBr(self.spec.y.llvmObj)
    end
end,

Finally, deref combines the slightly confusing LLVM getelementptr instruction and a load into a single operation, which is slightly more apt for my uses:

        deref = function() -- TODO: Allow idx to be a list
            local ptr = self.spec.value:llvm()
            if self.spec.idx then
                ptr = backend.builder.buildGEP(ptr, valueArray {self.spec.idx:llvm()}, 1, "")
            end
            return backend.builder.buildLoad(ptr, "")
        end
    }
    return switch[self.spec.op]()
end

…and that’s it! I run my printFactors-as-an-AST sample file:

luajit tests/printFactorsAst.lua && cc -o main main.o && ./main 543

And the result is 3 181.

It works! The AST version of the printFactors program is a little shorter than the version where I write out all the LLVM-C calls by hand (144 lines vs. 99), but it requires this 370 line AST library so maybe that’s a bit of a wash. Whatever. Here’s the code so far.

This two-week detour completed, I finally get to start on the shadow language.

shadow.lua

The DSL framework’s external interface will be two functions: shadowFn, which returns an AstFn, and shadowModule, which returns an AstModule. This means I could potentially mix functions defined with shadowFn into a module defined with the AST interface and vice versa.

shadowModule takes the name of the module, and the "construction function" for the module. Remember, a "construction function" is where the shadow language is written— it’s a function with no arguments which I’m planning to inject the shadow-language symbols into. shadowModule winds up being pretty simple. All it has to do is build the list of injected symbols and then call the construction function:

-- Create an AstModule given a name and a construction function
function shadowModule(name, gen)
    local module = AstModule{name=name}
    if gen then
        local env = {
            fn = emptyWithMetatable {
                __newindex = function(table, key, value)
                    module.namedFns[key] = value
                end,
                __call = function(func, ...)
                    return shadowFunction(...)
                end
            },
            type = shadowType
        }
        execWithEnv(gen, env)
    end
    return module
end

…and at the stage where you’re only defining modules, it doesn’t have much to inject. The only injected symbols in the module construction function are fn and type. Let’s consider fn first. You can only do two things with fn: assign to it to define a new function (__newindex), or call it to make an AstFn (__call). Since fn has the same interface as shadowFn I can just re-call shadowFn, and since the return value of shadowFn is already an AstFn I can just stick the return directly into the AstModule I’m building. type I’ll come back to in a moment.

execWithEnv isn’t something that exists in Lua, I had to build it:

-- Takes a function and a table, and executes fn with env as its _G.
-- Resets everything at completion, but does not use env's metatable during execution.
function execWithEnv(fn, env)
    local envOldMetatable = getmetatable(env)
    local fnOldEnv = getfenv(fn)

    -- Make fn's original env available as a parent scope
    local envMetatable = {__index=fnOldEnv}
    if (envOldMetatable) then setmetatable(env, nil) end -- Double-setting metatable not allowed
    setmetatable(env, envMetatable)

    -- Run fn
    setfenv(fn, env)
    fn()

    -- Restore original state
    setfenv(fn, fnOldEnv)
    setmetatable(env, envOldMetatable)
end

setfenv is a pretty blunt instrument; it doesn’t preserve the original scope of the function you’re mutating, and the effects on the function it acts on are permanent. execWithEnv is a bit easier to work with, but making it that way takes some work.

Anyway, type. This is where things start to get interesting. Wiring the fn container in the module construction scope was simple because its inputs and outputs were just the AstFn objects from earlier. But the members of type are going to be shadow objects— magical semantic things loaded up with however much metatable trickery I need to get the behavior I want in my DSL.

What’s the behavior I want? Looking back at the original "printFactors" sample, I fetched some int types (int8 and int32) from this type container, and I used a .ptr field on the returned types to get pointer versions. So the only special behavior I need on a type object is that .ptr field:

-- Convert AstType to shadow type
local function shadowTypeMake(v)
    local shadowTypeMetatable = {__index=cacheAndReturn(function(type, key)
        if key == 'ptr' then
            return shadowTypeMake(AstType{ptr=type.__astObject})
        end
        return nil
    end)}

    local t = {__astObject=v}
    setmetatable(t, shadowTypeMetatable)
    return t
end

A single shadow type has its AstType hidden inside as __astObject, and the ptr field is wired up to create and cache the pointer version on demand. So that’s a type object; now I need to make the type container.

-- Dispenser for shadow types, currently contains only int[number]s
shadowType = {
    fromAst = shadowTypeMake,
    vararg = {} -- Arbitrary object for comparing by reference
}

(shadowType.vararg is what you pass in as the final argument type if you have a function, like printf, which uses C vararg.)

shadowType has these two fixed members, and then a few more members are generated by the metatable:

setmetatable(shadowType, {__index=cacheAndReturn(function(_, k)
    local llvmType = nil

    -- Automatically look up int constructors in LLVM headers
    local _,_,intmatch = string.find(k, '^int(%d+)$') -- # of bits
    if intmatch then
        llvmType = llvm['int'..intmatch..'Type']()
    end

    if llvmType then
        return shadowTypeMake(AstType{literal=llvmType})
    end

    return nil
end)})

This is completely gratuitous. I need to make available six int types (1, 8, 16, 32, 64 and 128 bits); I could have just taken six lines to populate six members of shadowType, but instead I do eight lines of metaprogramming to make those six objects be generated on demand. Look, I write C all day at work. Let me have my fun.

Modules and types out of the way, it’s time to take on functions. A first pass at shadowFunction could look something like this:

-- Create an AstFn given a return shadow type, a list of arg shadow types, and a construction function
function shadowFunction(ret, args, gen)
    local astArgs = {}
    local vararg = false

    for _,v in ipairs(args) do
        if shadowType.vararg == v then -- Not a real argument, just denotes this fn is vararg
            vararg = true
        else
            table.insert(astArgs, v.__astObject)
        end
    end

    local fn = AstFn{ret=ret.__astObject, args=astArgs, vararg=vararg}
    if gen then
        -- TODO: RUN CONSTRUCTION FUNCTION
    end
    return fn
end

In this first pass I don’t actually call the construction function or create a function body— I just calculate the fn’s argument and return types and register the function as existing. (The types are accepted here as shadow objects, so "calculating" the types just means fetching the .__astObject fields out of the shadow types, then checking to see if the argument list ends with the magic vararg value.)

With exactly this code I’ve got so far and without filling in that TODO, I’ve got code that can generate a module, generate a function with no body, and generate types. This is actually enough to do a test (even a test that executes, since I can inject functions with bodies if I use AstFn). I decide to set a simple test up:

require("lib-lua/simple")
require("lib-lua/practice/shadow")

shadowModule("main", function()
    fn.printf = fn(type.int32, {type.int8.ptr, type.vararg})
    
    fn.atoi = fn(type.int32, {type.int8.ptr})
end):emit()

And this gives me a working module!

; ModuleID = 'main'

declare i32 @printf(i8*, ...)

declare i32 @atoi(i8*)

…or, well, it gives me a working module once I fix five or eight typos, learn that if you don’t return nil at the end of a metatable __index function the default behavior is not what you expect, and then realize that I forgot to implement the vararg feature and go back and rewrite the last few paragraphs of this post to include it. Anyway! I’ve proven the shadow-language concept works.

Here’s the code so far.

Final lap

No more scaffolding. On the first post in this series, I daydreamed about writing a DSL in idiomatic Lua; now all I have to do to get that working is implement that TODO: RUN CONSTRUCTION FUNCTION from the last code stretch.

I decide to implement this one feature at a time, and write a small test for each feature I add. The simplest function body I could support here is:

require("lib-lua/simple")
require("lib-lua/practice/shadow")

shadowModule("main", function()
    fn.main = fn(int32, {type.int32, type.int8.ptr.ptr}, function()
        go._return(3)
    end)
end):emit()

This snippet is a little different from the syntax I was using before. Something that didn’t occur to me when I wrote the original "printFactors" sample is that do, while, if and return are pure reserved words in Lua— they’re not even legal names for fields. To work around this I renamed do to go, and for while/if/return, I… couldn’t decide what to do, so I tacked on underscores for now.

TODO: RUN CONSTRUCTION FUNCTION now becomes:

local function nextBlock() return AstBlock{exprs={}} end
local block = nextBlock()
fn.spec.body = AstBody{entry = block}

-- Given a Lua value, return an AstExpr
local function literal(v) return AstExpr{op='literal', value=v} end

-- Given either a Lua value or an AstExpr, return an AstExpr
local function literalFilter(v)
    if astLiteralAllowed(v) then return literal(v) else return v end
end

execWithEnv(gen, {
    go = {
        _return = function(v)
            table.insert(block.spec.exprs, AstExpr{op='return', value=literalFilter(v)})
        end
    }
})

When a construction function is present, I create a body and a block for the fn being built, and then I run the construction function in execWithEnv like I did with modules. Right now the only thing I inject in the construction environment for fns is go, and all it needs to contain is _return. All _return does when you invoke it— all basically any line of "code" in the construction functions is going to do— is append an AstExpr object representing some instruction (in this case return) to the current block.

The literalFilter() helper is something I need because things lke _return() consume AST objects, but the argument I pass in might instead be a native Lua value like "3". I need to figure out what I have and convert it to an AST, and I delegate the task of figuring out what I have to a new function in ast.lua:

function astLiteralAllowed(v)
    local t = type(v)
    return t == 'number' or t == 'string' or t == 'cdata'
end

I assume native Lua values are numbers, strings, or LLVM objects (which are easy to identify because Lua knows they came from C).

This simple test program doesn’t output anything, but I can treat its UNIX return value as "output" and verify it works:

$ (luajit tests/returnValueShadow.lua && cc -o main main.o && ./main) 2> /dev/null
$ echo $?
3

My next test program, I add in variables:

shadowModule("main", function()
    fn.main = fn(type.int32, {type.int32, type.int8.ptr.ptr}, function()
        var.x = 5
        var.x = 6
        go._return(var.x)
    end)
end):emit()

To make this test work this I inject var into the construction environment:

var = emptyWithMetatable({
    __index = function(_, key)
        return AstExpr{op='load', var=key}
    end,
    __newindex = function(_, key, value)
        table.insert(block.spec.exprs, AstExpr{op='store', var=key, value=literalFilter(value)})
    end
}),

When you write a value into var (__newindex) it writes a store instruction as an AstExpr to the end of the current block’s expr list; when you read a value it writes nothing to the expr list, but returns a load AST that a different, value-writing function (in this case _return) might use.

So far pretty simple. In the next test I add in arithmetic:

shadowModule("main", function()
    fn.main = fn(type.int32, {type.int32, type.int8.ptr.ptr}, function()
        var.x = 5
        var.y = var.x + 4
        go._return(var.x + var.y)
    end)
end):emit()

Suddenly things are less simple.

I’ve been vaguely toying with this idea of "shadow" versions of values— for example, the "shadow type" objects I created earlier— which if you remember are tables with a hidden __astObject inside, and a tricked-out metatable to allow manipulating that AST object with Lua syntax. For + to work on the value returned by var.x, it’s going to need to be a shadow value. I’ll create that shadow value with shadowExprMake():

-- Convert AstExpr to shadow expr
local function shadowExprMake(v)
    local function binop(op2)
        return function(l, r)
            return shadowExprMake( AstExpr{op='bin', op2=op2, value={literalFilter(l), literalFilter(r)}} )
        end
    end
    local shadowExprMetatable = {
        __add = binop('+'), __sub = binop('-'), __mul = binop('*'), __div = binop('/'),
        __mod = binop('%'), __eq = binop('=='), __lt = binop('<'), __le = binop('<=')
    }

    local t = {__astObject=v}
    setmetatable(t, shadowExprMetatable)
    return t
end

Do you follow all the levels of indirection involved here? Calling shadowExprMake() constructs a metatable where __add is a function created by a higher order function binop(). When that __add function is invoked, it will "recursively" call shadowExprMake() to create a new shadow object which corresponds to whatever performing the + operation on the targets would logically result in. The code is at once compact, elegant and extremely difficult to follow!

I have to change the existing code for this to work: Before, _return, and the = operation on var, were expecting AST objects, and var.varname returned an AST object. But now I’m going to be working with shadow exprs instead of AstExprs directly, so I’m going to need to change var.varname to return shadow exprs:

__index = function(_, key)
    return shadowExprMake(AstExpr{op='load', var=key})
end

And I have to change literalFilter— which again, _return and operator-= on var filter everything they use through— to assume non-literal values are shadow exprs:

-- Given either a Lua value or shadow value, return an AstExpr
local function filterToAst(v)
    if astLiteralAllowed(v) then return literal(v) else return v.__astObject end
end

I rename literalFilter while I’m at it to something a little less vague.

Next feature is argument parameters:

shadowModule("main", function()
    fn.main = fn(type.int32, {type.int32, type.int8.ptr.ptr}, function(argc,argv)
        var.x = argc        -- IE number of command-line arguments passed in
        var.y = var.x + 4
        go._return(var.x + var.y)
    end)
end):emit()

If a function takes arguments, the construction function will be passed in shadow-exprs for those arguments as the arguments to the construction function itself. This is pretty easy to implement because my AST already has a construct for "fetch the Nth parameter to the current function". I can modify shadowFunction, before it invokes the construction function, to create a table with one such AST object for each argument:

local genArgs = {}
for i=1,#args do
    table.insert(genArgs, shadowExprMake(AstExpr{op='getParam',idx=i-1}))
end

Then I can pass that in to my execWithArgs function, where it gets unpacked into the construction function invocation using weird Lua magic:

function execWithEnv(fn, env, args)
    -- [snip]

    -- Run fn
    setfenv(fn, env)
    fn(unpack(args or {}))

Shadow language code can now accept arguments when it defines a function. Next step is to allow the code to call functions. The test program for that would be:

shadowModule("main", function()
    fn.atoi = fn(typ.int32, {typ.int8.ptr})

    fn.main = fn(typ.int32, {typ.int32, typ.int8.ptr.ptr}, function(argc,argv)
        var.x = fn.atoi("3")
        go._return(var.x + 1)
    end)
end):emit()

Somewhere at about this point I realized that one of the injected symbols in my shadow language is type, which isn’t a reserved word in Lua, but it is the name of a frightfully useful builtin function, so I renamed the type container to typ.

Anyway, to make this work I inject an fn into the construction environment:

fn = emptyWithMetatable({
    __index = function(_, key)
        return emptyWithMetatable({
            __call = function(_, ...)
                local args = {...}
                for i=1,#args do
                    args[i] = filterToAst(args[i])
                end
                return shadowExprMake(AstExpr{op='call', fn=key, args=args})
            end
        })
    end
}),

I already injected an fn in the environment for module construction functions, but the fn for function construction functions (…ugh, did I just write that?) is separate and behaves differently. With the new fn you can look up a key on it and it returns a shadow object which will imitate a function you can call.

This fn implementation makes the test work, but it isn’t enough. As described above, I need two versions of fn: the expression form I just implemented, and a statement form (go.fn). Once again, this is because the expression form only creates an AST and does not actually add that AST to the current block.

To make this work I unpack the meat of my initial attempt at fn into a higher-order function:

local function makeFnInvoke(key)
    return function(...)
        local args = {...}
        for i=1,#args do
            args[i] = filterToAst(args[i])
        end
        return shadowExprMake(AstExpr{op='call', fn=key, args=args})
    end
end

Once I’ve done this fn looks like:

_fn = emptyWithMetatable({
    __index = function(_, key)
        return makeFnInvoke(key)
    end
}),

In other words, makeFnInvoke takes a key and returns a callable entity that creates an AST for the given function using the args passed into it. I can also use this to make the go.fn:

fn = emptyWithMetatable({
    __index = function(_, key)
        local fn = makeFnInvoke(key)
        return function(...)
            table.insert(block.spec.exprs, fn(...).__astObject)
        end
    end
}),

fn(...).__astObject. I’m just going to sit here and savor that for a moment. …and also feel a bit guilty.

Functions are done. Next feature is indexing arrays. The test program for arrays runs its first command line argument through atoi and returns it as a UNIX status code:

shadowModule("main", function()
    fn.atoi = fn(typ.int32, {typ.int8.ptr})

    fn.main = fn(typ.int32, {typ.int32, typ.int8.ptr.ptr}, function(argc,argv)
        var.x = fn.atoi( argv[1] )
        go._return(var.x + 10)
    end)
end):emit()

This, too, is very simple to implement. Most of these features have been very simple to implement. This one literally only requires three lines, a single field added to the shadow expr metatable:

__index = function(v, key)
    return shadowExprMake( AstExpr{op='deref', value=v.__astObject, idx=literalFilter(key)})
end

v is a shadow object; indexing it creates a shadow object that indexes the value inside it. Easy.

At about this point, things stop getting easy.

There’s only two features left to implement: if and while. These get interesting because they create not just values, but also blocks. Here’s my test for the if statement:

shadowModule("main", function()
    fn.printf = fn(typ.int32, {typ.int8.ptr, typ.vararg})

    fn.main = fn(typ.int32, {typ.int32, typ.int8.ptr.ptr}, function(argc,argv)
        go._if(argc >= 3,
            function()
                go.fn.printf("Argc %d less than 3\n", argc)
            end,
            function()
                go.fn.printf("Argc %d greater or equal to 3\n", argc)
            end)
        go.fn.printf("Done\n")
        go._return(0)
    end)
end):emit()

Up until now, there have only been construction functions in two places: Module construction when invoking shadowModule, and fn construction when invoking fn. But now we have construction functions nested inside the construction function of an single fn, constructing blocks within that fn.

Implementing that means a little bit of restructuring: For starters, I need to take my huge execWithEnv invocation I’ve been building and put it in a local function:

local function build(gen, genArgs)
    execWithEnv(gen, {
        -- All injected environment for an fn construction function goes here
    }, genArgs)
end

build(gen, genArgs)

This means if I’m inside one of the closures injected into the environment— for example, the one implementing _if— I will be able to recursively call build() from there if I want to invoke a construction function.

Before I can move on, I hit a really unpleasant oddity. I overloaded all of the normal operations you can perform on a table— +. -, >, ==. The idea of my overloads is that instead of performing whatever calculation the operator would normally describe, the overloaded version returns an AST describing the calculation. For + and - this works great. But the first time I try to run code using a comparison— the >= in the sample— it fails. The Lua interpreter claims that >= cannot be performed on a table.

It turns out, looking in the documentation, that comparison operators are different from other operators in Lua. First off, there are restrictions on when they even get invoked. There’s a semi-complicated algorithm Lua follows to decide whether to invoke a metatable comparison function and if so which one to invoke. One of the steps in the algorithm is that it only runs the comparison if the two tables have the same metatable. Here there’s an oddness. All of my shadow objects have the same metatable functionally. But they’re not the same in the VM memory, because I create a new copy of the metatable for every single shadow object I create. I rewrite that shadowExprMake so that only one metatable exists:

-- Go far out of way to define shadowExprMetatable outside closure so comparison operators work
local function binop(op2)
    return function(l, r)
        return shadowExprMake( AstExpr{op='bin', op2=op2, value={filterToAst(l), filterToAst(r)}} )
    end
end
local shadowExprMetatable = {
    -- Metatable members here
}

-- Convert AstType to shadow expr
function shadowExprMake(v)
    local t = {__astObject=v}
    setmetatable(t, shadowExprMetatable)
    return t
end

It still doesn’t work; my metatable comparison functions are now being invoked, but somehow they’re always evaluating to true. It turns out that apparently comparison operators in Lua coerce their return values to true and false. This doesn’t seem to be in the documentation. It’s just something that Lua and LuaJIT both do (a nice person on Twitter actually dug up the Lua 5 source code and confirmed this). I optimized my shadowExprMake() function for nothing!

Although the language forcing comparison operators to always return true or false probably fits most users’ expectations, it kind of kills my weird overloading idea. I’m forced to inject the comparison operators into the namespace as functions. The fn-construction context picks up a new symbol get:

get = {
    literal = filterToAst,
    lt = binop('<'), gt = binop('>'), lte = binop('<='), gte = binop('>=')
},

With that cleared up, I’m free to go ahead and implement the go._if function. This one’s complicated, so I’m going to go over it a little bit at a time.

_if = function(cond, y, n)
    local branchExpr = AstExpr{op='branch', cond=literalFilter(cond)}
    table.insert(block.spec.exprs, branchExpr)

We’ve been working with an existing "block"; we want to append to its end a conditional branch instruction. The conditional branch instruction I defined here has a condition, but it’s missing its y and n blocks; I’m not done editing it.

    local afterBlock = nextBlock()
    local yBlock = nextBlock()

This code creates a block for the code for the if’s "success" path, and a block for flow control to resume from after the if statement is completed.

    block = yBlock
    build(y)
    block.spec.exit = afterBlock

Notice something a bit weird: build is a local function, closure. It makes reference throughout to a variable block which is local to the shadowFunction() invocation that build exists inside. This makes block effectively a bit of shared state for one full run through shadowFunction(). So in this code I set block to a new value immediately before calling build(y); this means the y construction function will write its expressions into the block I specified.

After running, the y-block dumps into the afterblock.

    local nBlock = afterBlock
    if n then
        nBlock = nextBlock()
        block = nBlock
        build(n)
        block.spec.exit = afterBlock
    end

What’s happening in this code is that the "no" path is optional, so if the user specified an n construction function I want to branch to a new block for that, but if the user did not then a "no" result branches to the afterblock.

    branchExpr.spec.y = yBlock   branchExpr.spec.n = nBlock
    block = afterBlock
end

I have now created both the y and n blocks, so I can add links to them in the original branchExpr. The last thing I do before the function ends is set the shared block to the afterblock, so any statements post-_if will be written into that.

So that’s go._if. The go._while implementation— which I do not bother writing a test case for— is much similar:

_while = function(cond, loop)
    local branchBlock = nextBlock()
    local loopBlock = nextBlock()
    local afterBlock = nextBlock()

    block.spec.exit = branchBlock

    local branchExpr = AstExpr{op='branch', cond=filterToAst(cond), y=loopBlock, n=afterBlock}
    table.insert(branchBlock.spec.exprs, branchExpr)

    block = loopBlock
    build(loop)
    block.spec.exit = branchBlock

    block = afterBlock
end

This sets the current block to immediately enter into a "branch block"; the branch block contains only a conditional branch instruction, which jumps either to the loop body or to the block which follows the _while. Pretty straightforward.

I don’t write a test case for _while because at this point, I have enough features working together I can just use my original sample program— the printFactors program— as the test case! I pull it out again and try to run it. I run into three problems.

Problem #1: It’s ugly? It’s just. It’s really ugly. My attempts to avoid Lua keywords have left the syntax of the shadow language inconsistent. Some fields have underscores before them, some don’t. One symbol (typ) is misspelled rather than using an underscore, another switched to a slightly weird synonym (go). If I were writing this I’d never be able to remember what to use where.

I decide to change my symbol naming rules: If a field is "injected"— if it is part of the shadow language rather than being part of your computer program— it has an underscore before it. Period. This lets me avoid intentional misspelling, and it makes my syntax consistent. With the new rules, my "printFactors" program looks like this:

require("lib-lua/simple")
require("lib-lua/practice/shadow")

shadowModule("main", function()
    _fn.printf = _fn(_type.int32, {_type.int8.ptr, _type.vararg})
    
    _fn.atoi = _fn(_type.int32, {_type.int8.ptr})
    
    _fn.printFactors = _fn(nil, {_type.int32}, function(input)
        _var.value = input
        _var.divisor = 2
        _do._while( _get._gte(_var.value, _var.divisor), function()
            _do._if( _get._eq(_var.value % _var.divisor, 0),
                function()
                    _do._fn.printf("%d ", _var.divisor)
                    _var.value = _var.value / _var.divisor
                end,
                function()
                    _var.divisor = _var.divisor + 1
                end)
        end)
        _do._fn.printf("\n")
    end)

    _fn.main = _fn(_type.int32, {_type.int32, _type.int8.ptr.ptr}, function(argc, argv)
        _do._if( _get._eq(argc, 2),
            function()
                _do._fn.printFactors( _fn.atoi( argv[1] ) )
                _do._return(0)
            end,
            function()
                _do._fn.printf("Please enter exactly one number as argument\n")
                _do._return(1)
            end)
    end)
end):emit()

So _var has an underscore, and both _do and _if (i.e. _do._if) have underscores, because those are things installed by the language. But the value in _var.value doesn’t have an underscore because that is a user variable, and the lack of the underscore helps you tell that, and this means that the underscore actually serves a purpose and isn’t just a dodge on Lua’s reserved words list. int32 in _type.int32 does not have an underscore and I don’t know if that’s consistent or not.

This is all… not terrible! The new syntax is not as nice as the version I put earlier in this post, but if it’s a prototyping language rather than something I’d use day to day maybe it’s okay.

Problems #2 and #3 come in the form of fatal errors from LLVM’s verifier (which I figured out how to turn on between posts two and three in this series) when I try to run "printFactors". Both errors claim that I failed to end a block correctly— LLVM requires every block to end with either a branch statement or a return statement. (If the verifier hadn’t stopped me, this would have resulted in a crash when I tried to emit an object file.) The two errors turn out to have happened for different reasons.

Problem #2 has to do with void functions. _fn.printFactors returns void, and in my sample program I don’t end the function with a _do._return() because any sensible language will do that implicitly for you. But I never implemented the part where it does it for you. I do that now.

I go back to ast.lua and add this helper function:

function AstBlock:unterminated()
    local exprCount = #self.spec.exprs
    if exprCount < 1 then return true end
    local finalOp = self.spec.exprs[exprCount].spec.op
    return not (finalOp == 'branch' or finalOp == 'return')
end

Which returns true if the block fails to follow LLVM’s termination rules. All it does is check the final instruction’s opcode. Once I have this helper, I can implement automatic-return-for-void-functions easily by adding this to the end of shadowFunction():

if not ret and block:unterminated() then
    table.insert(block.spec.exprs, AstExpr({op='return'}))
end

(To cover a couple of details: not ret in this context denotes "no return type"; and I only have to run this code once per function, not every block, because the rules that got us to this point will have already set spec.exit on every block except the final one.)

Problem #3 is a little more subtle than this, and is an accidental byproduct of the way I chain blocks in a do._if or do._while. In my sample, main() has a return type. It ends with a do._if, both branches of which end with an explicit return. But! When I run shadowFunction() to process all this, the do._if silently creates a block for the code following the do._if. But there is no code following the do._if, therefore nothing ever puts a return into the implicitly created block, therefore LLVM is unhappy.

This is, unambiguously, the fault of the shadow language code— it’s specifying an invalid block. However, wisely or unwisely, I decide to fix the problem in the AST code. The shadow language created a nonsensical situation here, but it created a detectably nonsensical situation.

The thing that’s weird about the blocks in the do._if is that they have spec.exit set (because do._if sets it unconditionally) but they also end with a do._return (because that’s what’s in the code). I decide to make it a rule that if a block ends with a branch or return, any redundant spec.exit is ignored. This literally requires nothing more than adding the words and self:unterminated() to the code in ast.lua that processes self.spec.exit:

if self.spec.exit and self:unterminated() then
    self.exit = backend.currentBody:registerBlock(self.spec.exit)
end

Problem solved; if the redundant self.spec.exit is not processed, the useless extra block is never registered, which means it’s not in the function, which means LLVM never finds out about it and has nothing to complain about.

These last two errors fixed, I can build my program—

$ luajit generate.lua 2> main.asm && cc -o main main.o
$ wc -l main.asm 
      63 main.asm
$ ./main 1024
2 2 2 2 2 2 2 2 2 2 

The generated assembly is 63 lines, whereas the printFactors program in the shadow language is 36 lines— for the first time in this project, my code to generate a program is shorter than the assembly it generates!— and the program works.Implementing the "shadow language" completely took only three days; the writeup of it you’ve just finished reading took over a week.

Here’s the code so far; the "printFactors" program is in generate.lua and the test snippets are in tests. Since I showed you shadow.lua only in fragments, you can see its full final version here.

What did I just do here?

So: I now have something that looks a lot like source code in an invented language, can be written like source code, but is actually Lua that manually constructs a program.

Is this useful? Why did I do this?

I guess when I originally started this mini-project, I thought I was actually going to write programs in the shadow language. I thought implementing the shadow language would be easier than implementing a toy language, and that once I was done I’d be able to do cool things while writing code because I could harness the full power of Lua as a macro language. This Lua macro thing is something I can do with the shadow language I’ve got— I could dump this into one of my source files:

function swap(_var, f1, f2)
    _var.___TMP = _var[f1]
    _var[f1] = _var[f2]
    _var[f2] = _var.___TMP
end

And then anytime I’m in a construction function I could say swap(_var, 'x', 'y') and this would generate three lines of code to swap x and y. Neat, I guess?

In practice, none of this turns out to be compelling. The shadow language is just a little too ugly (both aesthetically and conceptually). The "printFactors" program is full of distracting meaningless functions and _s. The macro snippet above is neat but not particularly attractive, and is certainly not intuitive to read. The final shadow language is no better than any existing language, and even if I started adding features that justified the language’s existence I’d still be better off installing LPEG and writing a conventional language with a conventional parser to host those features. So that didn’t work out.

Still, I think what I made here turns out to be a useful stepping stone toward a larger project. Let’s say I next try to make an actual compiled programming language. Well, first off, I’ve now got a serviceable first-pass AST for it, so that’s cool. But even beyond that, I think the shadow language will be useful as a prototyping tool. As I go, I’ll want to design features— things to do with memory management or typechecking, maybe. I think adding interfaces to those features in the shadow language— which is just calling functions from Lua in a particular order, after all— will be a lot easier than fitting them into a "real" language syntax. And writing tests for my features in the shadow language will be easier than writing out ASTs by hand (since the ASTs are super easy to generate in code, but a total pain to write in a text editor).

With all this in mind, here’s what I think I’m going to work on next:

Yes, that’s right: I’m going to read the Legend of Zelda manga and eat Wheat Thins. Stay tuned!

Any code snippets in this post, or in the BitBucket revisions linked in this post, are available under the Creative Commons Zero license.

No Compiler, part 2: Turing Completeness

March 10th, 2016

Last week I made a blog post where I talked about making computer programs without a programming language, by writing Lua scripts that bind to the LLVM compiler backend library and call the instruction builder functions manually. I’m gonna keep playing with this, but before I keep going, people on Twitter pointed out some errors in the last post:

  • In the last post, I went to weird bother to construct the path to libLLVM, claiming that LuaJIT can’t figure out things like the standard operating system file extensions while loading libraries. @dotstdy points out that actually LuaJIT can figure out what to do just fine from ffi.load('LLVM'), as long as the library/DLL is in the normal OS search path. (I had only ever used LuaJIT for games, where you want to load specific binaries you shipped instead of anything the OS knows about.)
  • @whitequark points out several issues, notably:
    • In the last post I claimed lld, the LLVM project’s linker, is unique in being a linker that knows how to target multiple platforms. Actually GNU ld can do this also (although Darwin appears to be a notable hole in its platform support).
    • Some context: What LLVM frontends create is "LLVM IR", or Intermediate Representation. The Build functions in the LLVM API, which I’m calling, construct an in-memory version of this IR. The IR also has a textual representation, known as "LLVM assembly", which is dumped by the debug functions and consumed by llc. There’s also bitcode, which is a binary representation of the IR designed to be forward-compatible with future versions of LLVM. The true IR, however, is the in-memory representation.

      In the last post, I did not use this terminology correctly. I was thinking of the phrase "IR" as referring to the textual version (the LLVM assembly language) and so I repeatedly used the phrase "abstract syntax tree" to refer to the in-memory IR. This wasn’t a terminology I saw anywhere but I assumed it to at least make sense because "abstract syntax tree" is a sufficiently vague term. Unfortunately it turns out it’s an incorrect term in this case because in-memory LLVM IR has cycles (phi nodes) and is not a tree.

With that out of the way…

Parameters

In the last post I generated an executable that just ran printf(). Not really much of a "program". At that point I might as well have just been starting to write a compiler frontend— several LLVM frontend tutorials do start by doing exactly those steps.

If I’m going to be writing actual "programs" I need to learn to work with some more complicated things— at minimum variables, and loops/branches, and I need to be able to make and call functions. My goal for this post is going to be to write a small program that needs all these things— specifically, I’m going to write a program that factors numbers.

To get started, here was the code from the last post that emits the printf() call:

local printString = "LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\n"
local printfArgs = ffi.new("LLVMValueRef [1]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring")
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "tmpcall");

Let’s change this to add an integer argument to the printf():

local printString = "Number: %d\n"
local printNumber = 31337
local printfArgs = ffi.new("LLVMValueRef [2]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring")
printfArgs[1] = LLVM.LLVMConstInt(LLVM.LLVMInt32Type(), printNumber, false) -- no sign extend
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

printf() is a vararg function, so I don’t have to adjust its type definition to add the second argument. Running this prints (code so far):

Number: 31337

Let’s make the printed number an variable input by the user instead of a constant. I can get input from the command line if I adjust the prototype for main():

local int32Type = LLVM.LLVMInt32Type()
local charPtrType = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0)  -- address space 0 (arbitrary)

-- Main function prototype
local mainParams = ffi.new("LLVMTypeRef [2]")
mainParams[0] = int32Type
mainParams[1] = LLVM.LLVMPointerType(charPtrType, 0)
local mainType = LLVM.LLVMFunctionType(int32Type, mainParams, 2, false)
local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)

All seven lines of that was just to express int main(int argc, char *argv);. If I make this change I also ought to change my final return statement to return 0 instead of void:

LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 0, false))

In theory, my main function now has parameters, which the header file suggests I can extract with LLVMGetParam(). I test this by changing one line of the printf invocation from earlier:

printfArgs[1] = LLVM.LLVMGetParam(mainFn, 0)

Running this I get (code so far):

$ ./main 
Number: 1
$ ./main 0
Number: 2
$ ./main 1 2 3
Number: 4
$ ./main a a a a a a a a
Number: 9

Wow, something actually worked on the first try! By printing the first parameter to mainFn I’m printing "argc", the number of command-line arguments. (The name of the program counts as an argument, which is why running without arguments prints 1.)

Aggregates

Printing the number of arguments is not actually what I want. What I actually want to do right now is print the first argument after the name. This means I somehow need to be able to index a value from the "argv" array. It turns out there is more than one way to do this. I look through the LLVM assembly reference, and here’s what I learn.

Before I go on, I should make something clear if it wasn’t already. Essentially all of the LLVM functions I’m calling, both the arguments and the return values are of one of two C types: Either LLVMTypeRef or LLVMValueRef. Each of these C types corresponds to a whole hierarchy of C++ types, which I assume users of the C++ interface might specifically care about, but I don’t (because the C interface collapses down to LLVMTypeRef/LLVMValueRef and then Lua’s dynamic typing hides even those coarse types). Each value object pointed to by a LLVMValueRef has a type object (an LLVMTypeRef) associated with it. Functions that consume LLVMValues usually have rules about which LLVMTypes the provided values can have— but in most cases LLVMType violations are not noticed by the library until much, much later at which point you get an unrecoverable assert failure, so you have to be really certain you’re following the LLVMType rules before you call functions.

With that out of the way, here’s what the assembly documentation tells me. There are three general categories of data type (categories an LLVMTypeRef can fall into) we care about when thinking about arrays:

  • Pointers: This means the associated value is literally a memory address, pointing to data of some known type.
  • Aggregates: This refers to a block of data containing a series of values. Aggregates come in two types, Arrays and Structs; values in an Array are all of the same type, a Struct has an ordered series of known types for its members. In both cases the aggregate has a known length, and you look up values in the aggregate by index ("the fourth value in this array", or "the second field of this struct"). However arrays have the quirk that by default you’re allowed to look up values past the end of the array; in this case you just get… whatever was in memory after the end of the array. The docs in one place actually advocate creating a 0-length array and indexing past the end of it in order to represent a C-style unknown-length array; I guess in this case the 0-length array would functionally be the same as a pointer, except that LLVM exposes different operations for working with pointers and Arrays and maybe you’d prefer to be working with the Array toolset for some reason.
  • Vectors: These look a lot like an Array, but refer to a situation where the members of the collection are not independent but rather are expected to be packed together in a SSE-style SIMD register. I will not be using Vectors in this blog post; I mention them here only because the docs have the operations for Arrays and Vectors all mixed in together, and it’s important to be aware they’re not what I’ll be using.

All of the concepts mentioned above— pointers, Arrays, Structs and Vectors— are parameterized by type. You can’t just have "a Pointer" or "a Vector". You’d have a pointer to an int32, or a vector of int8s with 8 members.

There are four noteworthy operations we can do on the data types in the above categories:

  • load: Given a LLVMValueRef representing a pointer, returns the LLVMValueRef representing the value it points to.
  • getelementptr: This one is quirky enough the LLVM people wrote a whole documentation page explaining what it is and isn’t, but it’s not actually that complicated. The idea here is that you have a pointer to an Aggregate, and you want to get a pointer to some specific data item within the Aggregate. It doesn’t actually get you the data item— no memory is dereferenced here, the resulting LLVMValueRef does represent a pointer. getelementptr is just doing math with the information encoded in a LLVMType to figure out where in memory a specific value is located.
  • extractvalue: Given an LLVMValueRef representing an Aggregate and an LLVMValueRef representing an index, returns the LLVMValueRef representing the indexed value (the actual value) within the array.
  • extractelement: Same as extractvalue, but works on Vectors.

So! Armed with all of this information, it seems clear that there are two ways I could go about my original goal of indexing one item out of argv:

  • I could convert (cast?) the argv pointer to an array of length zero and use extractvalue.
  • I could use getelementptr on the pointer to look up the address corresponding to my index, then load from that address.

So I guess I just took fourteen paragraphs to answer the question "how do I index an item from an array". But at least it was straightforward to look up! It’s so great to have documentation. By the way, did you know it took me two full weeks to figure out which LLVM-C function creates a getelementptr instruction? I eventually worked out by accident that the C builder function calls it a "GEP" even though nothing else in the LLVM project uses this terminology. Documentation is so great.

Anyway, I go with the second option (the getelementptr route) and it turns out to be pretty straightforward:

-- Read argv[1]
local argvIndex = ffi.new("LLVMValueRef [1]")
argvIndex[0] = LLVM.LLVMConstInt(int32Type, 1, false)
local argvIndexAddr = LLVM.LLVMBuildGEP(builder.llvm.builder,
    LLVM.LLVMGetParam(mainFn, 1), argvIndex, 1, "ArgvIndex")
local argvValue = LLVM.LLVMBuildLoad(builder.llvm.builder, argvIndexAddr, "ArgvDeref")

Again: Make an LLVMValueRef for the index, use LLVMGetParam() to get an LLVMValueRef for argv, pass these into getelementptr, pass the result of that into load. One thing that might be a little nonobvious here is why the index is passed in as an array— LLVMBuildGEP() takes a list of indexes because it assumes you might want to nest aggregates, and it wants to let you use a single getelementptr instruction to look several nesting levels deep. So for example imagine you have a pointer to a two dimensional array of (x,y) points— an array containing arrays of struct {int x; int y; }s. If you wanted to get the address of array[3][4].y you could give getelementptr the index list [3, 4, 1]. I’m not interested in nesting here, so my index array has only one element.

argvValue now represents the string containing the first argument. I want it as a number, so I bring in the C atoi function and pass argvValue to that. This is no different from what I previously did to link in printf:

-- Atoi prototype
local atoiParams = ffi.new("LLVMTypeRef [1]")
atoiParams[0] = charPtrType
local atoiType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), atoiParams, 1, true)
local atoiFn = LLVM.LLVMAddFunction(builder.llvm.module, "atoi", atoiType)

-- Atoi call site
local atoiArgs = ffi.new("LLVMValueRef [1]")
atoiArgs[0] = argvValue
local atoiResult = LLVM.LLVMBuildCall(builder.llvm.builder, atoiFn, atoiArgs, 1, "atoicall")

I can now give this result to our printf function:

printfArgs[1] = atoiResult

And I now have a program that takes a command line argument, converts it to an int, then prints it. Running this I get (code so far):

$ ./main 4
Number: 4
$ ./main 149134
Number: 149134
$ ./main
Segmentation fault

"Oops".

Branching

The program so far blindly reads the second element of argv, so it of course crashes if argv is only one element long. I need an error case.

In the last post, I mentioned that LLVM has a concept of "basic blocks", which are branchless stretches of code. I’ve so far been putting the entire program in a single basic block, Entry. Entry is now going to potentially branch into one of two new basic blocks, ArgcValid and ArgcInvalid.

local argcValid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcValid")
local argcInvalid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcInvalid")

Next I stick this up near the start of the program, immediately after positioning the builder in Entry:

local argcCmp = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, -- argc == 2
    LLVM.LLVMGetParam(mainFn, 0), LLVM.LLVMConstInt(int32Type, 2, false), "ArgcCompare")
LLVM.LLVMBuildCondBr(builder.llvm.builder, argcCmp, argcValid, argcInvalid)

The br (branch) instruction has a conditional form that takes a LLVMValueRef and two basic blocks. It is implied but not specifically stated by the documentation that the LLVMValueRef needs to be the result of an icmp (integer compare) or fcmp (floating point compare) instruction. I want to do an integer compare, so I icmp parameter 0 (argc) with a constant 2 and branch to the two new basic blocks as appropriate.

I then stick this immediately before the rest of the program code:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcValid)

Everything I’ve written so far— the argv index, the atoi call, the printf— happens the same as before, just now that code is written into the ArgcValid block because that’s where the builder is positioned. Then at the end of the existing code, I stick in a new section:

-- argc != 2 case
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcInvalid)
    -- Print error
    local errorString = "Please enter exactly one number as argument.\n"
    local printfArgs = ffi.new("LLVMValueRef [1]")
    printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, errorString, "printferror")
    LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printferrorcall")

    -- Return failure
    LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 1, false))

The ArgcInvalid block does nothing but print an error message and return 1. I try this out and get (code so far):

$ ./main 4
Number: 4
$ ./main
Please enter exactly one number as argument.
$ ./main 3 4 5
Please enter exactly one number as argument.

This is the second time today something has worked on the first try. This is starting to get a little spooky.

Variables

My goal was to write a program that takes a number as input and factors it. So far what I have is a program that takes input. Time to factor.

I’m going to put the factoring in its own function. I define a prototype for it, taking an int as argument and returning void:

-- PrintFactors prototype
local printFactorsParams = ffi.new("LLVMTypeRef [1]")
printFactorsParams[0] = int32Type
local printFactorsType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), printFactorsParams, 1, false)
local printFactorsFn = LLVM.LLVMAddFunction(builder.llvm.module, "printFactors", printFactorsType)

Inside my ArgcValid block, I replace the printf with a call to the function:

-- PrintFactors call site
local printFactorsArgs = ffi.new("LLVMValueRef [1]")
printFactorsArgs[0] = atoiResult
LLVM.LLVMBuildCall(builder.llvm.builder, printFactorsFn, printFactorsArgs, 1, "")

-- Printf call site (newline)
local printfArgs = ffi.new("LLVMValueRef [1]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "\n", "printfnewline")
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printfnumbercall")

I do still have a printf there, but all it’s printing is "\n". As an amusing fact, since printFactors returns void, the name passed to LLVMBuildCall() MUST be "", exactly the value "" and no other, or the program crashes.

If I can implement the printFactors function, my program is done. Before I start, I plan out some pseudocode:

           value = param[0]           // Number being factored
           divisor = 2                // Candidate factor
           goto loop
loop:      if (divisor > value)       // Count divisor up, factor value down,
               goto done              // stop when the two pass each other.
           else
               goto modulo
modulo:    if (value % divisor == 0)
               goto divide
           else
               goto increment
divide:    printf("%d ", divisor)     // Factor found, print it
           value = value / divisor
           goto loop
increment: divisor = divisor + 1
           goto loop
done:      return

Actually, I say "pseudocode", but if you stuck semicolons on the ends of all the lines this would probably be valid C. Note in LLVM, every basic block has to end with a branch or return.

In this pseudocode, I repeatedly assign to the values value and divisor. This is a problem. LLVM doesn’t offer a way of modeling a "mutable" variable. LLVM offers load and store instructions to let you read and write to main memory, but the LLVMValues I’ve been working with— and you want to be working with LLVMValues, because they get to live in CPU registers— are based on something called the "SSA" model, where everything is assigned once and can never be changed unless you cheat using something called "phi nodes". This all sounds like a pain to learn and explain. So, I’m just not going to use it! The Kaleidoscope tutorial explains that LLVM ships with an optimization pass which promotes values read/written from memory to registers automatically, as long as that memory is allocated using an alloca in the entry block of a function. If I use this, I can avoid learning what "phi nodes" are at least one more day.

I start my printFactors implementation with those allocas:

local value = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Value")
local divisor = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Divisor")

LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMGetParam(printFactorsFn, 0), value)
LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 2, false), divisor)

Notice that value and divisor here are pointers. Writing values to them means writing to their memory locations.

I next define all the basic blocks I’m going to need in one go, and fall through into the loop:

local loop = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Loop")
local modulo = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Modulo")
local divide = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Divide")
local increment = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Increment")
local done = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Done")

LLVM.LLVMBuildBr(builder.llvm.builder, loop)

Here’s the basic block under the loop: label; all it does is calculate divisor > value and branch:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, loop)
    local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

    -- Test *divisor > *value and branch
    local compare = LLVM.LLVMBuildICmp(builder.llvm.builder,
        LLVM.LLVMIntSGT, divisorDeref, valueDeref, "DivisorCompare")
    LLVM.LLVMBuildCondBr(builder.llvm.builder, compare, done, modulo)

Notice I have to start by dereferencing the value and divisor pointers so I can actually use their values.

Here’s the modulo: block where I test whether the divisor divides evenly into the value:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, modulo)
    local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")
    
    -- *value % *divisor
    local modResult = LLVM.LLVMBuildSRem(builder.llvm.builder,
        valueDeref, divisorDeref, "ModResult")

    -- Test *value % *divisor == 0 and branch
    local dividesEvenly = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ,
        modResult, LLVM.LLVMConstInt(int32Type, 0, false), "DividesEvenly")
    LLVM.LLVMBuildCondBr(builder.llvm.builder, dividesEvenly, divide, increment)

Notice I start by dereferencing value and divisor again, even though from the perspective of the code I just did this and the previous loaded values are still valid. This is inefficient, but I don’t care, because the mem2reg optimization pass is going to erase all these loads and stores.

The divide: block is the most complicated, becuase it has to both print out the factor and divide it out from the value:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, divide)
    local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

    -- printf divisor
    local printfArgs = ffi.new("LLVMValueRef [2]")
    printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring")
    printfArgs[1] = divisorDeref
    LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

    -- *value = *value / *divisor
    local divideResult = LLVM.LLVMBuildExactSDiv(builder.llvm.builder,
        valueDeref, divisorDeref, "DivideResult")
    LLVM.LLVMBuildStore(builder.llvm.builder, divideResult, value)

    -- Restart loop
    LLVM.LLVMBuildBr(builder.llvm.builder, loop)

Here I use a store instruction to mutate an already-stored value for the first time. Also a tiny, interesting thing: The sdiv (signed divide) instruction has an "exact" form, where the backend gets to assume the divisor divides evenly. I know the divisor divides evenly because I just did a modulo check, so I use the exact form here. Does this actually make a difference to program performance? I have no idea!

If you followed all that, the increment: block is just more of the same:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, increment)
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

    -- *divisor = *divisor + 1
    local addResult = LLVM.LLVMBuildAdd(builder.llvm.builder, divisorDeref,
        LLVM.LLVMConstInt(int32Type, 1, false), "AddResult")
    LLVM.LLVMBuildStore(builder.llvm.builder, addResult, divisor)

    -- Restart loop
    LLVM.LLVMBuildBr(builder.llvm.builder, loop)

And the done: block is basically nothing, noteworthy only because it demonstrates a conditional br can branch to a basic block but not to a return.

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, done)
    LLVM.LLVMBuildRetVoid(builder.llvm.builder)

This is all some pretty dense code, so once I get it all typed out I do need a couple rounds of debugging. The first version of this code I attempted had the tragic line:

local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisorDeref, "DivisorDeref")

IE, instead of dereferencing divisor I’m dereferencing another basic block’s divisorDeref. I had to copy/paste that line like five times and apparently one time I messed it up. This results in a type error, but type errors in LLVM come in the form of halt-the-process assertion errors, so I only found this error by commenting out every single line of the function and then commenting them back in one by one until the error appeared. Once I fixed this I suddenly had a segmentation fault, because while commenting stuff in line by line I accidentally left a return void in the middle of a basic block and apparently it didn’t like that for some reason. Debugging the segmentation fault was a bit obnoxious, and required stepping through the x86-64 disassembly and matching up each instruction to a line of my Lua until I could find out where the error was happening. Both of these debug techniques were feasible because the function was short and the program is short, but I’m really going to have to find better solutions for debugging if I write longer programs with this.

Anyway, once I get those out of the way, I run my program and:

$ ./main 4
2 2 
$ ./main 13
13 
$ ./main 65536
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
$ ./main 2310
2 3 5 7 11 
$ ./main 2311
2311 

It works! It does have some slightly odd behavior on edge cases:

$ ./main 1

$ ./main 0

$ ./main -324

The loop in printFactors halts when the value is less than the divisor, and the divisor starts off at 2, so for inputs below 2 it just prints nothing. For 0 and 1 this is actually pretty good behavior: 0 and 1 have no prime factors, so printing nothing is correct (this is actually what the standard linux factor program does on these numbers). For negative numbers… uh… I don’t know what I expect it to do. Print -1 and then all the prime factors? Print an error? It would be pretty easy to code in a special behavior for negative numbers, but I don’t know what the behavior should be so I just leave it.

With that out of the way:

Optimization

There’s one last thing that’s a bit odd. My program is set to dump the program’s assembly to the console before it generates the exe, so that if anything is horribly wrong it will be obvious immediately. Well, something is wrong:

Loop:     ; preds = %Increment, %Divide, %PrintFactorsEntry
  %ValueDeref = load i32, i32* %Value
  %DivisorDeref = load i32, i32* %Divisor
  %DivisorCompare = icmp sgt i32 %DivisorDeref, %ValueDeref
  br i1 %DivisorCompare, label %Done, label %Modulo

Or, well, that’s correct, that’s what I entered, but wasn’t that mem2reg thing supposed to be erasing all the loads? I take a closer look at the Kaleidoscope sample code and discover it isn’t enough to just create the optimization pass manager like I did in the last post, I have to explicitly tell it to run. I give my builder object a new method:

function Builder:optimize(fn)
    LLVM.LLVMRunFunctionPassManager(self.llvm.passManager, fn)
end

…and then drop at the very end of my program, right before emitting the executable:

builder:optimize(mainFn)
builder:optimize(printFactorsFn)

What mimicking Kaleidoscope gets me is a "function pass manager", and you apparently have to invoke this individually on every single function in your program. There’s also a plain "pass manager" which operates on a whole module, and the header file implies it does different kinds of optimizations. I guess that’s another thing I’ll want to follow up on later. With the function pass manager being invoked, the loop: assembly looks very different:

Loop:                                             ; preds = %Increment, %Divide, %PrintFactorsEntry
  %Divisor.0 = phi i32 [ 2, %PrintFactorsEntry ], [ %Divisor.0, %Divide ], [ %AddResult, %Increment ]
  %Value.0 = phi i32 [ %0, %PrintFactorsEntry ], [ %DivideResult, %Divide ], [ %Value.0, %Increment ]
  %DivisorCompare = icmp sgt i32 %Divisor.0, %Value.0
  br i1 %DivisorCompare, label %Done, label %Modulo

I have no idea what this means. And I don’t have to! 😀

Done

My factoring program (code so far) is done! Excluding the setup with the builder object, the code is— imagine me taking a DEEP breath right now—

local int32Type = LLVM.LLVMInt32Type()
local charPtrType = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0)  -- address space 0 (arbitrary)

-- Main function prototype
local mainParams = ffi.new("LLVMTypeRef [2]")
mainParams[0] = int32Type
mainParams[1] = LLVM.LLVMPointerType(charPtrType, 0)
local mainType = LLVM.LLVMFunctionType(int32Type, mainParams, 2, false)
local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)

-- Printf prototype
local printfParams = ffi.new("LLVMTypeRef [1]")
printfParams[0] = charPtrType
local printfType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), printfParams, 1, true) -- vararg
local printfFn = LLVM.LLVMAddFunction(builder.llvm.module, "printf", printfType)

-- PrintFactors prototype
local printFactorsParams = ffi.new("LLVMTypeRef [1]")
printFactorsParams[0] = int32Type
local printFactorsType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), printFactorsParams, 1, false)
local printFactorsFn = LLVM.LLVMAddFunction(builder.llvm.module, "printFactors", printFactorsType)

-- Main function implementation
local mainEntry = LLVM.LLVMAppendBasicBlock(mainFn, "Entry")
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, mainEntry)

    -- Branch on argc
    local argcValid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcValid")
    local argcInvalid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcInvalid")
    local argcCmp = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, -- argc == 2
        LLVM.LLVMGetParam(mainFn, 0), LLVM.LLVMConstInt(int32Type, 2, false), "ArgcCompare")
    LLVM.LLVMBuildCondBr(builder.llvm.builder, argcCmp, argcValid, argcInvalid)

    -- argc == 2 case
    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcValid)
        -- Read argv[1]
        local argvIndex = ffi.new("LLVMValueRef [1]")
        argvIndex[0] = LLVM.LLVMConstInt(int32Type, 1, false)
        local argvIndexAddr = LLVM.LLVMBuildGEP(builder.llvm.builder,
            LLVM.LLVMGetParam(mainFn, 1), argvIndex, 1, "ArgvIndex")
        local argvValue = LLVM.LLVMBuildLoad(builder.llvm.builder, argvIndexAddr, "ArgvDeref")

        -- Atoi prototype
        local atoiParams = ffi.new("LLVMTypeRef [1]")
        atoiParams[0] = charPtrType
        local atoiType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), atoiParams, 1, true)
        local atoiFn = LLVM.LLVMAddFunction(builder.llvm.module, "atoi", atoiType)

        -- Atoi call site
        local atoiArgs = ffi.new("LLVMValueRef [1]")
        atoiArgs[0] = argvValue
        local atoiResult = LLVM.LLVMBuildCall(builder.llvm.builder, atoiFn, atoiArgs, 1, "atoicall")

        -- PrintFactors call site
        local printFactorsArgs = ffi.new("LLVMValueRef [1]")
        printFactorsArgs[0] = atoiResult
        LLVM.LLVMBuildCall(builder.llvm.builder, printFactorsFn, printFactorsArgs, 1, "")

        -- Printf call site (newline)
        local printfArgs = ffi.new("LLVMValueRef [1]")
        printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "\n", "printfnewline")
        LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printfnumbercall")

        -- Return success
        LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 0, false))

    -- argc != 2 case
    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcInvalid)
        -- Print error
        local errorString = "Please enter exactly one number as argument.\n"
        local printfArgs = ffi.new("LLVMValueRef [1]")
        printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, errorString, "printferror")
        LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printferrorcall")

        -- Return failure
        LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 1, false))

-- PrintFactors implementation
local printFactorsEntry = LLVM.LLVMAppendBasicBlock(printFactorsFn, "PrintFactorsEntry")
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, printFactorsEntry)
    -- Variables
    local value = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Value")
    local divisor = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Divisor")

    LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMGetParam(printFactorsFn, 0), value)
    LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 2, false), divisor)

    local loop = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Loop")
    local modulo = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Modulo")
    local divide = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Divide")
    local increment = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Increment")
    local done = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Done")

    LLVM.LLVMBuildBr(builder.llvm.builder, loop)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, loop)
        local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

        -- Test *divisor > *value and branch
        local compare = LLVM.LLVMBuildICmp(builder.llvm.builder,
            LLVM.LLVMIntSGT, divisorDeref, valueDeref, "DivisorCompare")
        LLVM.LLVMBuildCondBr(builder.llvm.builder, compare, done, modulo)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, modulo)
        local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")
        
        -- *value % *divisor
        local modResult = LLVM.LLVMBuildSRem(builder.llvm.builder,
            valueDeref, divisorDeref, "ModResult")

        -- Test *value % *divisor == 0 and branch
        local dividesEvenly = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ,
            modResult, LLVM.LLVMConstInt(int32Type, 0, false), "DividesEvenly")
        LLVM.LLVMBuildCondBr(builder.llvm.builder, dividesEvenly, divide, increment)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, divide)
        local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

        -- printf divisor
        local printfArgs = ffi.new("LLVMValueRef [2]")
        printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring")
        printfArgs[1] = divisorDeref
        LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

        -- *value = *value / *divisor
        local divideResult = LLVM.LLVMBuildExactSDiv(builder.llvm.builder,
            valueDeref, divisorDeref, "DivideResult")
        LLVM.LLVMBuildStore(builder.llvm.builder, divideResult, value)

        -- Restart loop
        LLVM.LLVMBuildBr(builder.llvm.builder, loop)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, increment)
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

        -- *divisor = *divisor + 1
        local addResult = LLVM.LLVMBuildAdd(builder.llvm.builder, divisorDeref,
            LLVM.LLVMConstInt(int32Type, 1, false), "AddResult")
        LLVM.LLVMBuildStore(builder.llvm.builder, addResult, divisor)

        -- Restart loop
        LLVM.LLVMBuildBr(builder.llvm.builder, loop)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, done)
        LLVM.LLVMBuildRetVoid(builder.llvm.builder)

So, what have I got here?

Well, for starters, I’ve achieved my goal— I’ve demonstrated I can write a useful program just by invoking the LLVM backend functions, and I have definitely entertained myself doing so. Looking upon what I have wrought I am amused as heck.

But this program is a behemoth! That is 145 lines of code for a program that you could have fully expressed in Perl in 62 characters:

($v)=@ARGV;$d=2;$v%$d?$d++:(print"$d\n")&&($v/=$d)until$d>$v

The generate.lua file is actually longer (9568 bytes) than the OS X executable it generates (8888 bytes). And the code is not just long, it is hard to read. It is dense, and repetitive, and consists mostly of boilerplate. If you look at the LLVM assembly generated by this code, it is both shorter and easier to understand the Lua code itself.

There’s a reason for this. I made a decision to write my not-compiler in Lua. But I’m not using Lua. I’m not taking advantage of Lua’s dynamism— well, never mind that, I’m not even treating Lua as a Turing-complete programming language. Every thing I do here, I do in the most literal and naive way. This was an intentional decision! The goal of this particular program was to be a proof-of-concept and help me learn to use the LLVM-C functions. But, I think I’m done with that goal now.

In the next post, I’m going to start designing abstractions. Okay, constructing LLVM IR by individually calling an LLVM-C function for each line of assembly results in something which is not easy to read or write. What if I designed a Lua API specifically to make constructing programs in code pleasant? What would that look like?

— — — — —

POSTSCRIPT: SOME THINGS I WOULD APPRECIATE ANSWERS ON IF ANYONE READING THIS KNOWS

  • The documentation refers to Aggregates and Vectors as derived types. Is a Pointer a "derived type"?
  • With Vectors, is a Vector required to be the same size as an SIMD register on the target platform? What does LLVM do if it isn’t (for example, an <8 x float> vector on x86-64)?
  • This is a paragraph I cut from this blog post because it broke up the flow and also I’m not sure if it is accurate. Is the following accurate?
    • The documentation calls some LLVM-C functions "instructions". Fundamentally, what an "instruction" does is assign a value to an SSA name (except for non-value-returning functions like br). If I chain together function calls, like passing the output of LLVMBuildLoad to LLVMBuildExactSDiv, this looks to me like a tree of function applications but LLVM internally is rewriting it into a sequential series of SSA name assignments. What I think of as an LLVMValueRef is, fundamentally, a single SSA name.

Any code snippets in this post, or in the BitBucket revisions linked in this post, are available under the Creative Commons Zero license.

No Compiler

March 2nd, 2016

On LLVM, using C libraries from Lua, and writing software without a compiler

I’ve been working on a programming language. I make video games, and the languages that exist for this all have drawbacks that get in my way when I’m making stuff, so I decided to make a new one. I made an interpreter, and it works, which is pretty awesome! But it’s too slow. For what I want to be doing, I decided, rather than an interpreter I ought to be writing a compiler. Once I realized this my project stalled out, because I just really did not want to write a compiler. It’s a lot of work, and a lot of things I’ve never done before, and I wasn’t really sure where to start, and I just really really hate writing parsers.

Then last week I got this weird idea.

If you look in a book about compilers, you’re inevitably going to see a diagram like this:

This diagram is from "Modern Compiler Implementation in Java", the book they gave me in college, and it shows all the steps a compiler takes in turning source code into an executable program. This diagram is also out of date. The modern diagram looks like this:

LLVM is a compiler "backend"— these terms are fuzzy, but people like taking that first diagram and dividing it in half between the "frontend", which in this diagram is the first three boxes, and the "backend", which is the rest. LLVM is the backend for Clang, but LLVM predates Clang and can be used as a stand-alone library. This has lead to a sort of renaissance in homegrown programming languages since now you don’t need to write your own compiler, you can just write a frontend and plug it into LLVM.

And increasingly, this is what everyone does. You could write your own backend, but LLVM is a good backend, and it has about a zillion targets, and it has the combined engineering efforts of Apple and Google (and, now, Microsoft) improving it, and in practice you’re not going to write a better one. You might be able to write a backend which is in some way more apt for your particular use case, but you’re not going to write a better general one. And actually we’re now entering a world where at minimum supporting LLVM as a backend option is mandatory. Apple is aggressively moving to a model where you are no longer allowed to write your own compiler, and for certain platforms (the "Watch", and for a while they were trying to force this for the "TV" as well) you can’t actually create software, you just create bitcode (a binary representation of that "abstract syntax tree" from the diagram) and Apple compiles it for you with their copy of the LLVM backend. Which is sort of horrifying, and yet another example of how Apple’s burgeoning monopoly power f— um, anyway.

Where I’m going with this:

Many years ago, I used to write a lot of audio software. I was making music, and I’d make little programs to make music with— tiny instruments and such. My big dream for a long time was to eventually write a sequencer. I’d play my little homemade software instruments, and I’d play normal physical instruments, but then if I wanted to fit them together into a song I’d have to use sequencer software someone else wrote. This meant I never really had as much control over how things were timed or mixed together as I wanted, so I wanted to write my own sequencer. But that turned out to be hard. There was a lot of UX to design and a lot of lines of GUI to write. All my attempts failed. Eventually what I started doing instead was writing little single-song sequencer programs with no GUI. I’d write a tiny C program for each song, and each program would load the recorded tracks and samples from wav files in its directory, mix them together exactly however I’d described in the C code that it should, and spit out a wav file at the end. This worked really well! The songs sounded good and it was fun as heck. And because I was doing the mixing "by hand" instead of trying to express what I wanted to the intermediary of a sequencing program, I could direct my code on a whim to do complex things that would have been ridiculously difficult to represent in a GUI, things "wait until track 3 goes over this amplitude twice and then immediately cut in track 4".

Now, with Emily (my programming language project), I feel like I have a similar problem. I have things I want my games to do but which I have difficulty expressing elegantly to the tools I have. So I want to make a new tool— but this ultimately means I’m going to design a language that I can express complex ideas to, and then write a compiler that turns that language into LLVM ASTs, and then only after I’ve done all that do I get to start actually expressing my ideas to the language I designed. But I’m stuck on the difficulty of those first two steps— finding a good fixed medium through which to convey my thoughts to my compiler frontend, and then getting the frontend to translate that into LLVM. The "UX" part.

So here’s my weird idea: What if I just did what I did back when I was making music, and cut Emily and her compiler out of the loop completely, and wrote the LLVM ASTs myself? I could easily imagine writing a program that calls the LLVM AST construction functions until it’s described a program, and then directs LLVM to emit an exe. I would basically be writing software with no compiler. Or, maybe a different way of looking at it, is I’d be writing a lot of very small compilers, with no programming language, where each compiler emits only a single fixed program. Suddenly the diagram would look like this:

This sounds like a terrible idea. It also sounds hilarious. I’m gonna try it.

LLVM-C

In CS, we sometimes talk about "metaprogramming". This is a loose phrase for the idea that a language has exposed facilities for you to write code that will write other code for you. Sometimes this works pretty literally— if you look at C defines, or C++ templates, or LISP macros, what you’ve sort of got is a second programming language, semi-awkwardly embedded halfway into the first one, which generates code in the first language before the compiler/interpreter tries to run it. When I write my little not-compilers, I’ll be doing something like this; the language I write my not-compilers in will be my metalanguage, and the language itself will just not exist (or maybe, technically, be LLVM IR). The underlying language will be way too simplistic for writing anything meaningful at all in it to be convenient, so whichever metalanguage I pick had better be expressive enough to let me build within it all the abstractions I’d normally expect when writing software.

I picked Lua. Lua is my favoritestest programming language in the whole world, and nothing I’ve ever used beats it for flexibility. I don’t really use it anymore, because I kept running into practical problems when actually running my Lua programs (FFI/library support isn’t as convenient as it should be, GC is mandatory, it works badly with threading). But none of that is a problem here, because I won’t actually be running any Lua— Lua will only be a kind of build script.

One of the nice things about using Lua is LuaJIT, which is an interpreter implementation with a fairly unique FFI (i.e., its way of talking to software written in C). Usually talking to C libraries from a higher-level language like Lua is a pain where you use wrapper scripts, which generate copies of every single C function you want to call. These "wrapper functions" laboriously convert from however your language thinks about data and functions to the way that C does and, back. This is awkward, and bug-prone, and occasionally you’ll hit a point where C is using data in some way your language doesn’t even know how to describe and just get completely stuck. LuaJIT on the other hand does this completely different and sort of wild thing where it directly parses C header files at runtime, and then it just knows how to pack memory in such a way as to replicate C structs and calling conventions. This makes LuaJIT good for writing a script that mostly calls functions from a C library; LuaJIT can create C data items in memory and call C functions with them the same way a C compiler can, so using the LLVM libraries should be as easy as it would be in C.

There’s still a bit of a problem, which is that the LLVM libraries aren’t actually written in C. They’re written in C++. This is bad. C++ is basically total poison for language compatibility. C++ does not have what’s called an "ABI", which means there’s no standard for how functions call each other when you’re using libraries. Each compiler just kinda comes up with its own thing. This means not only is it impossible for languages that aren’t C++ to call C++ code, it’s not even possible for C++. Two C++ libraries using the same compiler can talk to each other, but if the person who wrote a library you want to use was using a different compiler than you (which on Windows, will happen frequently) there is nothing you can do.

Now, it does turn out— in part because they realized this was a problem— that the LLVM people also distribute a library named LLVM-C, which is a set of C wrappers to the underlying C++ API. Yay! Unfortunately, it’s sort of only half-supported. The LLVM-C documentation describes itself as "This module exposes parts of the LLVM library as a C API"— "parts of"? What the IRC channel explained to me is that LLVM-C is whatever parts of LLVM that somebody realized they needed and added to LLVM-C to scratch that momentary itch, piecemeal, at random times over the last thirteen years. Parts are missing, and the parts that are missing are totally random. Apparently there are still major projects using the LLVM-C interface, so this is not a reason to give up or anything, but this is still going to be a concern to be aware of moving forward. (Interestingly, the Rust people implemented their own equivalent of LLVM-C because they found LLVM-C wasn’t comprehensive enough— but, the Rust project LLVM C bindings are also a random set of whichever parts of LLVM the Rust people needed, it’s just a different subset from LLVM-C, so this does not help us here.)

Anyway, LLVM-C it is. I install version 3.9 (the latest) and download a copy of llvm-c-kaleidoscope— a C port of the program from the LLVM tutorial. I never do get llvm-c-kaleidoscope to compile, but that’s okay, it’s just for reference.

I’m now ready to start writing some Lua! I set up a little project:

config.lua
generate.lua
lib-ffi/
    import.lua
    llvm/
        3_9/
            (LLVM headers in here)
lib-lua/
    import.lua
    pl/
        (Penlight in here)

Unfortunately, Lua doesn’t have a very good story for libraries. Unlike languages like Python that are expected to have a runtime with various niceties already installed on the computer, Lua was designed to be embedded into larger programs (for example, games). The assumption was whoever was doing the embedding would set up whatever libraries are needed. So if you’re running a Lua script standalone (like I’m doing here), you kind of need to set up a little environment around it with the basics. Here "the basics" are provided by Penlight, a third party "standard library" for Lua which includes a lot of basic stuff, like classes (yes, in Lua object oriented programming is something you get from a library— like I said, it’s flexible). Even just getting Penlight into the program is a little awkward; lib-lua/import.lua contains:

-- Lets internal references in Penlight work
package.path = package.path .. ";./lib-lua/?.lua"

class = require("pl/class")
pretty = require("pl/pretty")

Yech.

Of more interest is the lib-ffi directory. This, too, is a little awkward. Like I said, LuaJIT consumes C header files, but it doesn’t do it the way a compiler does– there’s no include path, or anything. Instead you call a function named ffi.cdef with the header file contents passed in as a string. Except you can’t just put those contents in verbatim– LuaJIT doesn’t understand C preprocessor directives, so you have to strip those out first. Like I said, this is part of why I stopped using Lua– every time you bring in a new library there’s this labor-intensive process of massaging it so Lua can talk to it. In this case it doesn’t seem so bad though, because I only have to do it once, for the LLVM library (and maybe Clanglib at some point). Any libraries I use beyond that are going to be talking to my generated executable, not my Lua script.

I figure out which headers I’m going to need, and fill out lib-ffi/import.lua with:

ffi = require( "ffi" )

-- Libraries

local libExt = ({
   OSX     = "dylib",
   Windows = "dll",
   Linux   = "so", BSD = "so", POSIX = "so", Other = "so"
})[ffi.os]

local function libPath(name)
    return config.llvmLibPath .. "/" .. name .. "." .. libExt
end

LLVM = ffi.load( libPath("libLLVM") )

-- Headers

local llvmDir = "lib-ffi/llvm/" .. config.llvmVersion .. "/"

require(llvmDir .. "Types")
require(llvmDir .. "Target")            -- Requires Types
require(llvmDir .. "TargetMachine")     -- Requires Types, Target
require(llvmDir .. "ExecutionEngine")   -- Requires Types, Target, TargetMachine
require(llvmDir .. "ErrorHandling")
require(llvmDir .. "Core")              -- Requires ErrorHandling, Types
require(llvmDir .. "Transforms/Scalar") -- Requires Types

There are a couple of things going on here.

When you compile a C program, there’s a compile phase (which if you’re using a library, is when you interpret its headers) and a link phase (which if you’re using a library, is when you link the library binary in). LuaJIT does all of this at runtime, but it still needs the headers and the libraries. You feed it the headers as strings so it knows what memory layout to use, and then you give it the direct path of the library to dynamically load. It’s all manual and there’s nothing intelligent like a linker in play, so I have to do the legwork of figuring out things like what the file extension for a dynamic library is on the current operating system. There’s also absolutely no way for me to figure out where the LLVM libraries are installed, so I expect whoever runs this script to provide that path before running, in the config.lua file.

Then we get to the headers. I choose to version these, and make the user say which version they want in config.lua.

My goal is to create LLVM "abstract syntax trees". LLVM, as I understand it, has three ways to specify an AST: As LLVM IR, which is a textual representation of ASTs which changes with every version of LLVM; as bitcode, which is a binary representation which is standardized and compatible between LLVM versions; or by calling the functions in the LLVM or LLVM-C API. My understanding is that the LLVM-C API is stable— if you write a program targeting LLVM 3.6, you should be able to recompile with LLVM 3.9 and it should work. (If this is not true, please, somebody warn me now.) However, just because the 3.6 and 3.9 APIs are source compatible does not mean you can use the 3.9 headers with the 3.6 libraries. And I’m going to do something weird— I’m going to embed the 3.9 headers directly into my source repository, because I have to, because I’m loading my hand-massaged versions of the headers rather than the ones on disk.

Right now everything I’m doing is just a weird experiment— I’m not designing any of this to be convenient for other people to use. However, it seems like it would be pretty bad if I designed this in a way it literally only works on this one computer, with this one version of LLVM. Weird experiment or no I need to leave the door open for another human to run this, especially since anything that will trip up another human will also trip up my own future self. I open that door by leaving space for multiple versions of the "massaged" LLVM headers to be checked into this repository: I check in the 3.9-compatible headers I’ve got (in a folder awkwardly named 3_9, since the Lua loader can’t handle directory names containing periods), and expect that people using a different version of LLVM in future can at least do the conversion process on their own headers and check that in alongside 3_9.

With that established, I copy all the 3.9 LLVM-C headers into my 3_9 folder and check them in unchanged— I won’t be using them, but I want them around for future reference— and select the handful I need, the handful from import.lua, to convert to Lua. This process is mostly simple; I rename to .lua, delete the #if guards and #includes from the top and bottom, and stick ffi.cdef[[ at the top and ]] at the end. (The square brackets create a multi-line string.) Easy! Except:

  • Since I removed the #includes, I don’t get dependency management done for me. This means I have to work out the dependency tree by hand and make sure I include everything in the right order (hence the weird comments in import.lua).
  • Core.h contains some trickery with #define— some C metaprogramming. I have to run the C preprocessor manually on this file (making sure to do so after I remove the #includes at the top, because otherwise I’d get the entirety of stdlib pulled in to the file).
  • Target.h is even worse— it contains this strange repeated preprocessor construction:

    /* Declare all of the target-initialization functions that are available. */
    #define LLVM_TARGET(TargetName) \
      void LLVMInitialize##TargetName##TargetInfo(void);
    #include "llvm/Config/Targets.def"
    #undef LLVM_TARGET  /* Explicit undef to make SWIG happier */

    What is this? Well, apparently, when the LLVM libraries were installed on my computer, the install built these .def files containing per-system configuration information. If you look in Targets.def in the main LLVM headers, you find it’s full of lines like

    LLVM_TARGET(AArch64)
    LLVM_TARGET(AMDGPU)
    LLVM_TARGET(ARM)
    ...

    This is a problem. The reason I believe I can move my "massaged" 3.9 headers to another computer and have the code still work with the local dynamic libraries is I assume things like structure layouts don’t vary between any two compiled copies of a single version of LLVM. But, the .defs contents will vary from computer to computer! The only possible solution here is to leave the metaprogramming macro garbage in Target.lua, and instruct anyone who tries to compile this code to run the preprocessor themselves (thus getting the .def contents from their own computer). The project now has a README instructing users to run this monstrosity before running the not-compiler:

    gcc -I/opt/local/libexec/llvm-3.9/include -E -P -C -xc-header ./lib-ffi/llvm/3_9/Target.source.lua > ./lib-ffi/llvm/3_9/Target.lua

    That’s… really unsatisfying, so it looks like if I keep with this experiment, I’m going to eventually need something like a build script.

generate.lua

3,135 words into this blog post, I am finally ready to start writing code! I translate the most minimal chunk of the llvm-c-kaleidoscope project’s main file into Lua as can stand alone:

require("config")
require("lib-lua/import")
require("lib-ffi/import")

class.Builder()
function Builder:_init(moduleName)
    self.llvm = {
        module = LLVM.LLVMModuleCreateWithName(moduleName),
        builder = LLVM.LLVMCreateBuilder()
    }

    local enginePtr = ffi.new("LLVMExecutionEngineRef[1]")
    local msgPtr = ffi.new("char *[1]")
    if LLVM.LLVMCreateExecutionEngineForModule(enginePtr, self.llvm.module, msg) == 1 then
        local err = ffi.string(msgPtr[0])
        LLVM.LLVMDisposeMessage(msgPtr[0]);
        error("LLVMCreateExecutionEngineForModule error: " + err)
    end
    self.llvm.engine = enginePtr[0]

    self.llvm.passManager = LLVM.LLVMCreateFunctionPassManagerForModule(self.llvm.module)
    LLVM.LLVMAddTargetData(LLVM.LLVMGetExecutionEngineTargetData(self.llvm.engine), self.llvm.passManager);
    LLVM.LLVMAddPromoteMemoryToRegisterPass(self.llvm.passManager);
    LLVM.LLVMAddInstructionCombiningPass(self.llvm.passManager);
    LLVM.LLVMAddReassociatePass(self.llvm.passManager);
    LLVM.LLVMAddGVNPass(self.llvm.passManager);
    LLVM.LLVMAddCFGSimplificationPass(self.llvm.passManager);
    LLVM.LLVMInitializeFunctionPassManager(self.llvm.passManager);
end

function Builder:dump()
    LLVM.LLVMDumpModule(self.llvm.module)
end

function Builder:dispose()
    LLVM.LLVMDisposePassManager(self.llvm.pass_manager);
    LLVM.LLVMDisposeBuilder(self.llvm.builder);
    LLVM.LLVMDisposeModule(self.llvm.module);
end

builder = Builder("main")
builder:dump()

Here I create a single empty "module", and I dump the corresponding LLVM IR. I run it and:

; ModuleID = 'main'

It works! I don’t know what a module is yet— I assume it corresponds to an object file— but I’ve at least demonstrated I can talk to LLVM and not crash.

Here’s the project I’ve written so far. I want to go a little further than this, though. I want to make an executable.

Emitting to a file

Assuming a module is an object file, there are two things I need to do from here: I need to add a "main" function as entry point; and I need to write the object file to disk. I attempt the second thing first, since I assume it will be easier. This assumption is wrong.

The Kaleidoscope sample program doesn’t emit files— it uses JIT— so this is the first LLVM thing I’m left to do on my own. I investigate. Apparently the way you emit a module as an object file is LLVMTargetMachineEmitToFile(). This function is a bit of a monster. It takes two configuration arguments, one of which is a complex structure called a TargetMachine whose constructor takes seven arguments, many of which are magic strings.

At this point I’m starting to feel the limitations of the LLVM documentation. There’s no documentation for LLVM-C as such, so I wind up looking everything up twice, once in the LLVM-C reference to figure out what the corresponding C++ function is, then into the C++ documentation to find out what the function actually does. Once I find the function I want, the explanation is invariably limited to the strict function definition; questions like "but when do I call this?" or "but where does the TargetMachine come from?" are outside the scope of the reference document. But the reference document is the only document, so at that point I’m left searching StackOverflow. Meanwhile StackOverflow only helps when the thing I want to do is common; in the case of emitting an object file, apparently this is enough of a weird or at least power-usery thing that most entry-level LLVM users don’t attempt it. Apparently everyone else is using a command line tool called llc that converts LLVM IR to object files.

The arguments to LlvmCreateTargetMachine() are complicated because LLVM is a cross-compiler; it could compile for your machine you’re sitting at, or some arbitrary number of other machines. How do I get the specification for the machine I am sitting at? I decide to pull up the source code for llc and see what it did. And it turns out there are a bunch of functions for exactly this, returning defaults corresponding to your current machine! Things like getDefaultTargetTriple() and getFeaturesStr(). These functions are in sys.h and CommandFlags.h. Of the C++ headers. They’re not exposed to C. Um.

I need to specify eight things: a "target", a "triple", a "CPU", a "features" list, and four options related to optimization and what you’re building (library, executable, etc). The last four are thankfully enums, so those are easy to work out just from looking at the headers, and I find a function that can derive the "target" from the "triple", so now I’m down to three mysteries. I know how to look up the "triple" for my own machine, but again I don’t want this software bound to one computer. I wind up begging on IRC and Twitter for about 48 hours, and people who have done this before help me out. What I find out:

  • The "triple" is the same "what do I compile?" string all compilers use; you can get a sample from any one compiler on a given machine with clang -version or gcc -dumpmachine.
  • There is after all a C-exposed LLVM.LLVMGetDefaultTargetTriple(). (I apparently missed this in my first few scans of the five thousand lines of headers.)
  • "CPU" is the same as -march on a traditional compiler— it specifies microarchitecture. If you specify armv7 here, you’ll get optimizations that the armv6 can’t support, but your software also won’t run on armv6 anymore.
  • The "CPU" and "features" strings can be empty strings. If you do this, it gives you the most conservative possible microarchitecture for the triple you asked for. (Which might be closer to what you want anyway than the "defaults" sys.h would give you— after all, ask for the "default" on a brand new PC and you might get something no one else can run.)
  • The "target" is something weird and LLVM-internal, and refers to which LLVM backend you want to run. You’ll want to get this by using the function I mentioned that derives it from the triple. (You can also look up targets by name, but the names are pretty arcane— there’s no apparent consistency or correspondence to the outside world in the naming scheme, it was just up to whoever wrote that one LLVM add-on. For example there’s one x86 target that works with both x86 and x86-64, but the 32 and 64 bit ARM targets are separate.)

Armed with all of this, I’m able to put together an emission function which populates everything from the default triple (with overloading allowed via config.lua):

function Builder:emitToFile()
    local targetRefPtr = ffi.new("LLVMTargetRef [1]")
    compileTriple = config.compileTriple or LLVM.LLVMGetDefaultTargetTriple()
    compileCpu = config.compileCpu or ""
    compileFeatures = config.compileFeatures or ""

    if LLVM.LLVMGetTargetFromTriple (compileTriple, targetRefPtr, errPtr) == 1 then
        llvmErr("LLVMGetTargetFromTriple")
    end
    if LLVM.LLVMTargetMachineEmitToFile(
            LLVM.LLVMCreateTargetMachine(targetRefPtr[0], compileTriple, compileCpu, compileFeatures,
                LLVM.LLVMCodeGenLevelAggressive, LLVM.LLVMRelocDefault, LLVM.LLVMCodeModelSmall),
        self.llvm.module, cstring(self.moduleName .. ".o"), LLVM.LLVMObjectFile, errPtr) == 1 then
        llvmErr("LLVMTargetMachineEmitToFile")
    end
end

And then things get weird.

My first sign of trouble is a cryptic message Unable to find target for this triple (no targets are registered). I waste time assuming I fed the function a bad triple before realizing what this really means: there are no targets, at all. It turns out before LLVMGetTargetFromTriple() will work, you have to call a special function to initialize any targets you might want to hypothetically load. There is a different initialization function for each target— LLVMInitializeAArch64Target() for 64-bit ARM vs LLVMInitializeX86Target(), for example. They apparently did it this way because most users link LLVM into their program statically, and partitioning the target backends into functions like this means you don’t have to pack the support code for whatever "MSP430" is in to your binary if you don’t plan to support it. That makes sense, but I’m linking dynamically, and I don’t know which platforms I do or don’t support ahead of time.

What a user like me— using the dynamic LLVM, not really certain what they intend to support— usually does is call the LLVMInitializeAllTargets() function. Which is defined in Target.h

/** LLVMInitializeAllTargets - The main program should call this function if it
    wants to link in all available targets that LLVM is configured to
    support. */
static inline void LLVMInitializeAllTargets(void) {
#define LLVM_TARGET(TargetName) LLVMInitialize##TargetName##Target();
#include "llvm/Config/Targets.def"
#undef LLVM_TARGET  /* Explicit undef to make SWIG happier */
}

…as an inline function. And now I’m stuck. LuaJIT can’t call those— at all. Again, LuaJIT calls functions by loading them from a dynamic library, but inline functions aren’t in the dynamic library, they’re C code embedded directly in the header file.

I get around this by doing something truly horrible. LLVMInitializeAllTargets() is an inline function generated from Targets.def using C preprocessor. I’m already generating Target.lua by running the preprocessor on a Lua file, in order to clear out preprocessor directives inside the embedded C header string. So I just go ahead and, uh… generate some Lua… using the C preprocessor.

function LLVMInitializeAllTargets()
  #define LLVM_TARGET(TargetName) LLVM.LLVMInitialize##TargetName##Target()
  #include "llvm/Config/Targets.def"
  #undef LLVM_TARGET
end

After the preprocessor runs, this gives me a Lua function that calls each of the functions that the inaccessible C LLVMInitializeAllTargets() would have been calling. I make three more of these functions (in addition to initializing the "targets" themselves, you need to also initialize the "target infos", "target MCs", and "ASM printers" for each target, or you will get completely misleading error messages). And it works!

If I call my big combined LLVMInitializeAll() function first, emitToFile() prints no errors and I get a 200-byte main.o file which file claims to be Mach-O 64-bit.

Okay! Now can I actually do something?

Doing something

My module needs some code. Let’s do the simplest possible thing: Call printf.

LLVM modules keep a symbol table containing all known functions. When the operating system tries to run the exe I’m making, it will start by looking up a function named "main" in this symbol table and calling it.

local mainType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), nil, 0, false)
local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)

I construct the type for a main function, then construct the function itself. AddFunction will add the function to the symbol table and then return the new function object.

local mainEntry = LLVM.LLVMAppendBasicBlock(mainFn, "entry")
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, mainEntry)

Since I will be adding code to the function, I need to create a "basic block" for the code to live in. "Basic block" is a standard term from compilers referring to a continuous section of code without any branches in it.

Earlier, I made a "builder object". This appears to be something like a text cursor. There’s a series of "Build" functions, and each inserts a bit of code at whatever point in the module the builder is positioned at. I add my new basic block to the function object, and then I position my builder at the end of the basic block. The basic block takes a name when you create it; as far as I can tell this name is arbitrary, and I assume it’s only there in case you want to direct a jump there from the end of some other basic block. I name it "entry".

Next I want to call printf. But first I have to tell LLVM what printf is:

local printfParams = ffi.new("LLVMTypeRef [1]")
printfParams[0] = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0) -- address space 0 (arbitrary)
local printfType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), printfParams, 1, true) -- vararg
local printfFn = LLVM.LLVMAddFunction(builder.llvm.module, "printf", printfType)

Again here I’m constructing a function and putting it in the symbol table. However, this time I am not going to be adding any code to it; I’m just putting it in the symbol table so LLVM knows it exists and what type it is. Later the linker will supply the implementation to go with that symbol table entry. The type on this one is a little more complicated than the one for the main function; it has arguments.

local printString = "LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\n"
local printfArgs = ffi.new("LLVMValueRef [1]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring")
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "tmpcall")

Now I’m ready to start using the builder. I Build a global string, and then I Build a call site. Both the string and the call site have "names". Like the basic block, the names appear to be arbitrary, although I cannot imagine a possible use for the names like I could with the basic block.

LLVM.LLVMBuildRetVoid(builder.llvm.builder)

Since I’m done, I cap off my basic block by returning void.

Thirteen lines of code. And that is actually all I need! My generator script spits out a new main.o and gives me a glimpse of what the code I just constructed looks like in LLVM IR:

; ModuleID = 'main'

@tmpstring = private unnamed_addr constant [47 x i8] c"LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\0A\00"

define void @main() {
entry:
  %tmpcall = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([47 x i8], [47 x i8]* @tmpstring, i32 0, i32 0))
  ret void
}

declare i32 @printf(i8*, ...)

That was straightforward, right? Let me take a moment to talk about what I had to do to get that thirteen lines of code.

Again, the LLVM documentation is not very helpful:

These reference documents I’ve been working with were created by Doxygen, which means unless someone went to the bother of commenting a particular function you get only these autogenerated blurbs. I mentioned that LLVMBuildCall() has a "Name" parameter and I don’t know what it does; well, what do I see if I check the documentation? Answer: The LLVMBuildCall() documentation is autogenerated and gives only a source line number. Well, but that’s LLVM-C documentation, right? What do I see if I check the "real" documentation, the C++ documentation? Answer: The IRBuilder::CreateCall() documentation is autogenerated and gives only a source line number. I never did figure out what the "Name" meant.

Now that I’ve finally got to generating ASTs I’ve encountered a new issue with the docs— many of the StackOverflow suggestions for things I need to do by this point are actually turning up as LLVM IR snippets. So now I get instructions given to me as LLVM IR, which I then have to figure out how to translate into C++, so that I can go from there to translating into C (so I can rewrite it in Lua). On the bright side though, LLVM IR turns out to be lavishly documented.

In the end, I figured out very little of the AST generation code by myself. The AST code above was constructed by consulting a mix of the llvm-c-kaleidoscope project, this tutorial written by IBM, and this sample code written by a French security researcher. The IBM and 0vercl0k samples do almost exactly the same things I’m trying to, and were tremendously helpful. But even coding with these great models to follow took some tinkering, since I still had to get past a problem I didn’t really talk about so far: LLVM reacts to most unexpected inputs by crashing. For example, LLVMCreateTargetMachine() above takes an empty string as a valid input; but pass in a NULL string and you get a crash.

A lot of these crashes are pretty easy to deal with. Early on in writing the above code, I was trying to use LLVMConstString() where I should have been using LLVMBuildGlobalStringPtr(), which meant I was passing in to printf the equivalent of a char[48] instead of the equivalent of a char *. The result was an assert:

This I could deal with. It’s not as specific as it could be, but the stack is clean, and the assert gave me a line number. I found out I could get MacPorts to show me the source code to my LLVM version (sudo port patch llvm-3.9 && `port work llvm-3.9`), and this let me look up the context, and after some testing with LLVMTypeOf()/LLVMDumpType() I understood what the problem was and where to start on fixing it.

The other problems I had were… not like that. I had a bug where I was trying to call LLVMBuildGlobalStringPtr() before creating the basic block and positioning the builder; the symptom was a segmentation fault deep inside the optimization pass manager when emitting the object file. I had a bug where I was failing to end my function with a ret void; the symptom was an assert failure, again deep in the optimization pass manager, with the totally mind-boggling assert error Assertion failed: (Val && "isa<> used on a null pointer"), function doit, file Casting.h. In both of these cases, there was basically nothing I could do to understand what went wrong or how to fix it, no breadcrumb trail back to my code. I got out of both of these situations by just checking to see if there was anything the IBM/0vercl0k samples did but I didn’t, and adding those things until the crashes went away.

The reason I’m going into all this is that these various issues taught me something: This is going to be a problem. The thirteen lines of code above are very simple, but constructing them took a full day and in the end was achieved only by following someone else’s example. A lot of the things I do next I will not have examples to follow. So this tells me if I want to make any progress I’m going to need to change how I’m doing things:

  • I need, at least at first, to stick to the heavily trafficked paths. There’s some entry points to this library that lots and lots of people have gone through, and some that are less heavily used. What I’m finding is whenever I veer into the second area things get difficult. As a user of LLVM-C rather than the standard libraries I’m already pushing it; I unfortunately probably can’t afford to be an outlier in other ways also. I’ll want to always code things the simple, standard way before trying anything clever, and I’m probably going to need to get used to testing LLVM IR before attempting things against the API.
  • I am going to need to get used to the internals of the LLVM code. I tried pulling the source code from MacPorts; that’s not enough. I’m going to need to build my own LLVM and get debug symbols (I might have been able to figure something out with the isa<> used on a null pointer error, for example, if I’d been able to inspect locals and find out what was null) and I’m probably going to need to read enough source code to understand the architecture of entire modules before I can understand some of the error messages I’m going to get. This is going to be, in other words, one of those libraries.

Linking

But, I now have a main.o file! Now I just have to link it and I’m done.

This part is obnoxious. My original goal was to write a program that does all the things a compiler does. But you sort of can’t write a linker. Linkers are more the domain of the operating system and its code loading system than they are the domain of language standards, and so effectively only the operating system vendor can really write one because only the OS vendor knows what the linker even does. Whatever your preferred C compiler is, it doesn’t actually link; it actually just finds the OS-supplied linker and invokes it as a command-line program. In other words, the chart I drew at the top of this post is not quite accurate. A more accurate chart would be:

To make my original claim of "no compiler" accurate I could maybe call the linker myself (thus leaving this post with instructions that only work on OS X), or maybe look into LLVM’s experimental linker lld which is someday supposed to be able to do the unthinkable and replace the OS linker. But why make things difficult? I take the cheap route:

cc main.o -o main

And my program works!

$ ./main 
LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!

The Lua script that acts as this program’s "compiler" is less than a hundred lines long.

So this is pretty promising! If I can call printf, I’ve got a basis for generating programs that do much more complicated things. It’s only a couple more steps to add variables or flow control, and once I get that done I’ll be able to do fun things like making tiny DSLs out of idiomatic Lua, or maybe even prototype some low-level systems for Emily. Here‘s the source code for my final Lua scripts in this post, and I will definitely be posting more about this in future.

Any code snippets in this post, or in the BitBucket revisions linked in this post, are available under the Creative Commons Zero license.

On LLVM, using C libraries from Lua, and writing software without a compiler

I’ve been working on a programming language. I make video games, and the languages that exist for this all have drawbacks that get in my way when I’m making stuff, so I decided to make a new one. I made an interpreter, and it works, which is pretty awesome! But it’s too slow. For what I want to be doing, I decided, rather than an interpreter I ought to be writing a compiler. Once I realized this my project stalled out, because I just really did not want to write a compiler. It’s a lot of work, and a lot of things I’ve never done before, and I wasn’t really sure where to start, and I just really really hate writing parsers.

Then last week I got this weird idea.

If you look in a book about compilers, you’re inevitably going to see a diagram like this:

This diagram is from "Modern Compiler Implementation in Java", the book they gave me in college, and it shows all the steps a compiler takes in turning source code into an executable program. This diagram is also out of date. The modern diagram looks like this:

LLVM is a compiler "backend"— these terms are fuzzy, but people like taking that first diagram and dividing it in half between the "frontend", which in this diagram is the first three boxes, and the "backend", which is the rest. LLVM is the backend for Clang, but LLVM predates Clang and can be used as a stand-alone library. This has lead to a sort of renaissance in homegrown programming languages since now you don’t need to write your own compiler, you can just write a frontend and plug it into LLVM.

And increasingly, this is what everyone does. You could write your own backend, but LLVM is a good backend, and it has about a zillion targets, and it has the combined engineering efforts of Apple and Google (and, now, Microsoft) improving it, and in practice you’re not going to write a better one. You might be able to write a backend which is in some way more apt for your particular use case, but you’re not going to write a better general one. And actually we’re now entering a world where at minimum supporting LLVM as a backend option is mandatory. Apple is aggressively moving to a model where you are no longer allowed to write your own compiler, and for certain platforms (the "Watch", and for a while they were trying to force this for the "TV" as well) you can’t actually create software, you just create bitcode (a binary representation of that "abstract syntax tree" from the diagram) and Apple compiles it for you with their copy of the LLVM backend. Which is sort of horrifying, and yet another example of how Apple’s burgeoning monopoly power f— um, anyway.

Where I’m going with this:

Many years ago, I used to write a lot of audio software. I was making music, and I’d make little programs to make music with— tiny instruments and such. My big dream for a long time was to eventually write a sequencer. I’d play my little homemade software instruments, and I’d play normal physical instruments, but then if I wanted to fit them together into a song I’d have to use sequencer software someone else wrote. This meant I never really had as much control over how things were timed or mixed together as I wanted, so I wanted to write my own sequencer. But that turned out to be hard. There was a lot of UX to design and a lot of lines of GUI to write. All my attempts failed. Eventually what I started doing instead was writing little single-song sequencer programs with no GUI. I’d write a tiny C program for each song, and each program would load the recorded tracks and samples from wav files in its directory, mix them together exactly however I’d described in the C code that it should, and spit out a wav file at the end. This worked really well! The songs sounded good and it was fun as heck. And because I was doing the mixing "by hand" instead of trying to express what I wanted to the intermediary of a sequencing program, I could direct my code on a whim to do complex things that would have been ridiculously difficult to represent in a GUI, things "wait until track 3 goes over this amplitude twice and then immediately cut in track 4".

Now, with Emily (my programming language project), I feel like I have a similar problem. I have things I want my games to do but which I have difficulty expressing elegantly to the tools I have. So I want to make a new tool— but this ultimately means I’m going to design a language that I can express complex ideas to, and then write a compiler that turns that language into LLVM ASTs, and then only after I’ve done all that do I get to start actually expressing my ideas to the language I designed. But I’m stuck on the difficulty of those first two steps— finding a good fixed medium through which to convey my thoughts to my compiler frontend, and then getting the frontend to translate that into LLVM. The "UX" part.

So here’s my weird idea: What if I just did what I did back when I was making music, and cut Emily and her compiler out of the loop completely, and wrote the LLVM ASTs myself? I could easily imagine writing a program that calls the LLVM AST construction functions until it’s described a program, and then directs LLVM to emit an exe. I would basically be writing software with no compiler. Or, maybe a different way of looking at it, is I’d be writing a lot of very small compilers, with no programming language, where each compiler emits only a single fixed program. Suddenly the diagram would look like this:

This sounds like a terrible idea. It also sounds hilarious. I’m gonna try it.

LLVM-C

In CS, we sometimes talk about "metaprogramming". This is a loose phrase for the idea that a language has exposed facilities for you to write code that will write other code for you. Sometimes this works pretty literally— if you look at C defines, or C++ templates, or LISP macros, what you’ve sort of got is a second programming language, semi-awkwardly embedded halfway into the first one, which generates code in the first language before the compiler/interpreter tries to run it. When I write my little not-compilers, I’ll be doing something like this; the language I write my not-compilers in will be my metalanguage, and the language itself will just not exist (or maybe, technically, be LLVM IR). The underlying language will be way too simplistic for writing anything meaningful at all in it to be convenient, so whichever metalanguage I pick had better be expressive enough to let me build within it all the abstractions I’d normally expect when writing software.

I picked Lua. Lua is my favoritestest programming language in the whole world, and nothing I’ve ever used beats it for flexibility. I don’t really use it anymore, because I kept running into practical problems when actually running my Lua programs (FFI/library support isn’t as convenient as it should be, GC is mandatory, it works badly with threading). But none of that is a problem here, because I won’t actually be running any Lua— Lua will only be a kind of build script.

One of the nice things about using Lua is LuaJIT, which is an interpreter implementation with a fairly unique FFI (i.e., its way of talking to software written in C). Usually talking to C libraries from a higher-level language like Lua is a pain where you use wrapper scripts, which generate copies of every single C function you want to call. These "wrapper functions" laboriously convert from however your language thinks about data and functions to the way that C does and, back. This is awkward, and bug-prone, and occasionally you’ll hit a point where C is using data in some way your language doesn’t even know how to describe and just get completely stuck. LuaJIT on the other hand does this completely different and sort of wild thing where it directly parses C header files at runtime, and then it just knows how to pack memory in such a way as to replicate C structs and calling conventions. This makes LuaJIT good for writing a script that mostly calls functions from a C library; LuaJIT can create C data items in memory and call C functions with them the same way a C compiler can, so using the LLVM libraries should be as easy as it would be in C.

There’s still a bit of a problem, which is that the LLVM libraries aren’t actually written in C. They’re written in C++. This is bad. C++ is basically total poison for language compatibility. C++ does not have what’s called an "ABI", which means there’s no standard for how functions call each other when you’re using libraries. Each compiler just kinda comes up with its own thing. This means not only is it impossible for languages that aren’t C++ to call C++ code, it’s not even possible for C++. Two C++ libraries using the same compiler can talk to each other, but if the person who wrote a library you want to use was using a different compiler than you (which on Windows, will happen frequently) there is nothing you can do.

Now, it does turn out— in part because they realized this was a problem— that the LLVM people also distribute a library named LLVM-C, which is a set of C wrappers to the underlying C++ API. Yay! Unfortunately, it’s sort of only half-supported. The LLVM-C documentation describes itself as "This module exposes parts of the LLVM library as a C API"— "parts of"? What the IRC channel explained to me is that LLVM-C is whatever parts of LLVM that somebody realized they needed and added to LLVM-C to scratch that momentary itch, piecemeal, at random times over the last thirteen years. Parts are missing, and the parts that are missing are totally random. Apparently there are still major projects using the LLVM-C interface, so this is not a reason to give up or anything, but this is still going to be a concern to be aware of moving forward. (Interestingly, the Rust people implemented their own equivalent of LLVM-C because they found LLVM-C wasn’t comprehensive enough— but, the Rust project LLVM C bindings are also a random set of whichever parts of LLVM the Rust people needed, it’s just a different subset from LLVM-C, so this does not help us here.)

Anyway, LLVM-C it is. I install version 3.9 (the latest) and download a copy of llvm-c-kaleidoscope— a C port of the program from the LLVM tutorial. I never do get llvm-c-kaleidoscope to compile, but that’s okay, it’s just for reference.

I’m now ready to start writing some Lua! I set up a little project:

config.lua
generate.lua
lib-ffi/
    import.lua
    llvm/
        3_9/
            (LLVM headers in here)
lib-lua/
    import.lua
    pl/
        (Penlight in here)

Unfortunately, Lua doesn’t have a very good story for libraries. Unlike languages like Python that are expected to have a runtime with various niceties already installed on the computer, Lua was designed to be embedded into larger programs (for example, games). The assumption was whoever was doing the embedding would set up whatever libraries are needed. So if you’re running a Lua script standalone (like I’m doing here), you kind of need to set up a little environment around it with the basics. Here "the basics" are provided by Penlight, a third party "standard library" for Lua which includes a lot of basic stuff, like classes (yes, in Lua object oriented programming is something you get from a library— like I said, it’s flexible). Even just getting Penlight into the program is a little awkward; lib-lua/import.lua contains:

-- Lets internal references in Penlight work
package.path = package.path .. ";./lib-lua/?.lua"

class = require("pl/class")
pretty = require("pl/pretty")

Yech.

Of more interest is the lib-ffi directory. This, too, is a little awkward. Like I said, LuaJIT consumes C header files, but it doesn’t do it the way a compiler does– there’s no include path, or anything. Instead you call a function named ffi.cdef with the header file contents passed in as a string. Except you can’t just put those contents in verbatim– LuaJIT doesn’t understand C preprocessor directives, so you have to strip those out first. Like I said, this is part of why I stopped using Lua– every time you bring in a new library there’s this labor-intensive process of massaging it so Lua can talk to it. In this case it doesn’t seem so bad though, because I only have to do it once, for the LLVM library (and maybe Clanglib at some point). Any libraries I use beyond that are going to be talking to my generated executable, not my Lua script.

I figure out which headers I’m going to need, and fill out lib-ffi/import.lua with:

ffi = require( "ffi" )

-- Libraries

local libExt = ({
   OSX     = "dylib",
   Windows = "dll",
   Linux   = "so", BSD = "so", POSIX = "so", Other = "so"
})[ffi.os]

local function libPath(name)
    return config.llvmLibPath .. "/" .. name .. "." .. libExt
end

LLVM = ffi.load( libPath("libLLVM") )

-- Headers

local llvmDir = "lib-ffi/llvm/" .. config.llvmVersion .. "/"

require(llvmDir .. "Types")
require(llvmDir .. "Target")            -- Requires Types
require(llvmDir .. "TargetMachine")     -- Requires Types, Target
require(llvmDir .. "ExecutionEngine")   -- Requires Types, Target, TargetMachine
require(llvmDir .. "ErrorHandling")
require(llvmDir .. "Core")              -- Requires ErrorHandling, Types
require(llvmDir .. "Transforms/Scalar") -- Requires Types

There are a couple of things going on here.

When you compile a C program, there’s a compile phase (which if you’re using a library, is when you interpret its headers) and a link phase (which if you’re using a library, is when you link the library binary in). LuaJIT does all of this at runtime, but it still needs the headers and the libraries. You feed it the headers as strings so it knows what memory layout to use, and then you give it the direct path of the library to dynamically load. It’s all manual and there’s nothing intelligent like a linker in play, so I have to do the legwork of figuring out things like what the file extension for a dynamic library is on the current operating system. There’s also absolutely no way for me to figure out where the LLVM libraries are installed, so I expect whoever runs this script to provide that path before running, in the config.lua file.

Then we get to the headers. I choose to version these, and make the user say which version they want in config.lua.

My goal is to create LLVM "abstract syntax trees". LLVM, as I understand it, has three ways to specify an AST: As LLVM IR, which is a textual representation of ASTs which changes with every version of LLVM; as bitcode, which is a binary representation which is standardized and compatible between LLVM versions; or by calling the functions in the LLVM or LLVM-C API. My understanding is that the LLVM-C API is stable— if you write a program targeting LLVM 3.6, you should be able to recompile with LLVM 3.9 and it should work. (If this is not true, please, somebody warn me now.) However, just because the 3.6 and 3.9 APIs are source compatible does not mean you can use the 3.9 headers with the 3.6 libraries. And I’m going to do something weird— I’m going to embed the 3.9 headers directly into my source repository, because I have to, because I’m loading my hand-massaged versions of the headers rather than the ones on disk.

Right now everything I’m doing is just a weird experiment— I’m not designing any of this to be convenient for other people to use. However, it seems like it would be pretty bad if I designed this in a way it literally only works on this one computer, with this one version of LLVM. Weird experiment or no I need to leave the door open for another human to run this, especially since anything that will trip up another human will also trip up my own future self. I open that door by leaving space for multiple versions of the "massaged" LLVM headers to be checked into this repository: I check in the 3.9-compatible headers I’ve got (in a folder awkwardly named 3_9, since the Lua loader can’t handle directory names containing periods), and expect that people using a different version of LLVM in future can at least do the conversion process on their own headers and check that in alongside 3_9.

With that established, I copy all the 3.9 LLVM-C headers into my 3_9 folder and check them in unchanged— I won’t be using them, but I want them around for future reference— and select the handful I need, the handful from import.lua, to convert to Lua. This process is mostly simple; I rename to .lua, delete the #if guards and #includes from the top and bottom, and stick ffi.cdef[[ at the top and ]] at the end. (The square brackets create a multi-line string.) Easy! Except:

  • Since I removed the #includes, I don’t get dependency management done for me. This means I have to work out the dependency tree by hand and make sure I include everything in the right order (hence the weird comments in import.lua).
  • Core.h contains some trickery with #define— some C metaprogramming. I have to run the C preprocessor manually on this file (making sure to do so after I remove the #includes at the top, because otherwise I’d get the entirety of stdlib pulled in to the file).
  • Target.h is even worse— it contains this strange repeated preprocessor construction:

    /* Declare all of the target-initialization functions that are available. */
    #define LLVM_TARGET(TargetName) \
      void LLVMInitialize##TargetName##TargetInfo(void);
    #include "llvm/Config/Targets.def"
    #undef LLVM_TARGET  /* Explicit undef to make SWIG happier */

    What is this? Well, apparently, when the LLVM libraries were installed on my computer, the install built these .def files containing per-system configuration information. If you look in Targets.def in the main LLVM headers, you find it’s full of lines like

    LLVM_TARGET(AArch64)
    LLVM_TARGET(AMDGPU)
    LLVM_TARGET(ARM)
    ...

    This is a problem. The reason I believe I can move my "massaged" 3.9 headers to another computer and have the code still work with the local dynamic libraries is I assume things like structure layouts don’t vary between any two compiled copies of a single version of LLVM. But, the .defs contents will vary from computer to computer! The only possible solution here is to leave the metaprogramming macro garbage in Target.lua, and instruct anyone who tries to compile this code to run the preprocessor themselves (thus getting the .def contents from their own computer). The project now has a README instructing users to run this monstrosity before running the not-compiler:

    gcc -I/opt/local/libexec/llvm-3.9/include -E -P -C -xc-header ./lib-ffi/llvm/3_9/Target.source.lua > ./lib-ffi/llvm/3_9/Target.lua

    That’s… really unsatisfying, so it looks like if I keep with this experiment, I’m going to eventually need something like a build script.

generate.lua

3,135 words into this blog post, I am finally ready to start writing code! I translate the most minimal chunk of the llvm-c-kaleidoscope project’s main file into Lua as can stand alone:

require("config")
require("lib-lua/import")
require("lib-ffi/import")

class.Builder()
function Builder:_init(moduleName)
    self.llvm = {
        module = LLVM.LLVMModuleCreateWithName(moduleName),
        builder = LLVM.LLVMCreateBuilder()
    }

    local enginePtr = ffi.new("LLVMExecutionEngineRef[1]")
    local msgPtr = ffi.new("char *[1]")
    if LLVM.LLVMCreateExecutionEngineForModule(enginePtr, self.llvm.module, msg) == 1 then
        local err = ffi.string(msgPtr[0])
        LLVM.LLVMDisposeMessage(msgPtr[0]);
        error("LLVMCreateExecutionEngineForModule error: " + err)
    end
    self.llvm.engine = enginePtr[0]

    self.llvm.passManager = LLVM.LLVMCreateFunctionPassManagerForModule(self.llvm.module)
    LLVM.LLVMAddTargetData(LLVM.LLVMGetExecutionEngineTargetData(self.llvm.engine), self.llvm.passManager);
    LLVM.LLVMAddPromoteMemoryToRegisterPass(self.llvm.passManager);
    LLVM.LLVMAddInstructionCombiningPass(self.llvm.passManager);
    LLVM.LLVMAddReassociatePass(self.llvm.passManager);
    LLVM.LLVMAddGVNPass(self.llvm.passManager);
    LLVM.LLVMAddCFGSimplificationPass(self.llvm.passManager);
    LLVM.LLVMInitializeFunctionPassManager(self.llvm.passManager);
end

function Builder:dump()
    LLVM.LLVMDumpModule(self.llvm.module)
end

function Builder:dispose()
    LLVM.LLVMDisposePassManager(self.llvm.pass_manager);
    LLVM.LLVMDisposeBuilder(self.llvm.builder);
    LLVM.LLVMDisposeModule(self.llvm.module);
end

builder = Builder("main")
builder:dump()

Here I create a single empty "module", and I dump the corresponding LLVM IR. I run it and:

; ModuleID = 'main'

It works! I don’t know what a module is yet— I assume it corresponds to an object file— but I’ve at least demonstrated I can talk to LLVM and not crash.

Here’s the project I’ve written so far. I want to go a little further than this, though. I want to make an executable.

Emitting to a file

Assuming a module is an object file, there are two things I need to do from here: I need to add a "main" function as entry point; and I need to write the object file to disk. I attempt the second thing first, since I assume it will be easier. This assumption is wrong.

The Kaleidoscope sample program doesn’t emit files— it uses JIT— so this is the first LLVM thing I’m left to do on my own. I investigate. Apparently the way you emit a module as an object file is LLVMTargetMachineEmitToFile(). This function is a bit of a monster. It takes two configuration arguments, one of which is a complex structure called a TargetMachine whose constructor takes seven arguments, many of which are magic strings.

At this point I’m starting to feel the limitations of the LLVM documentation. There’s no documentation for LLVM-C as such, so I wind up looking everything up twice, once in the LLVM-C reference to figure out what the corresponding C++ function is, then into the C++ documentation to find out what the function actually does. Once I find the function I want, the explanation is invariably limited to the strict function definition; questions like "but when do I call this?" or "but where does the TargetMachine come from?" are outside the scope of the reference document. But the reference document is the only document, so at that point I’m left searching StackOverflow. Meanwhile StackOverflow only helps when the thing I want to do is common; in the case of emitting an object file, apparently this is enough of a weird or at least power-usery thing that most entry-level LLVM users don’t attempt it. Apparently everyone else is using a command line tool called llc that converts LLVM IR to object files.

The arguments to LlvmCreateTargetMachine() are complicated because LLVM is a cross-compiler; it could compile for your machine you’re sitting at, or some arbitrary number of other machines. How do I get the specification for the machine I am sitting at? I decide to pull up the source code for llc and see what it did. And it turns out there are a bunch of functions for exactly this, returning defaults corresponding to your current machine! Things like getDefaultTargetTriple() and getFeaturesStr(). These functions are in sys.h and CommandFlags.h. Of the C++ headers. They’re not exposed to C. Um.

I need to specify eight things: a "target", a "triple", a "CPU", a "features" list, and four options related to optimization and what you’re building (library, executable, etc). The last four are thankfully enums, so those are easy to work out just from looking at the headers, and I find a function that can derive the "target" from the "triple", so now I’m down to three mysteries. I know how to look up the "triple" for my own machine, but again I don’t want this software bound to one computer. I wind up begging on IRC and Twitter for about 48 hours, and people who have done this before help me out. What I find out:

  • The "triple" is the same "what do I compile?" string all compilers use; you can get a sample from any one compiler on a given machine with clang -version or gcc -dumpmachine.
  • There is after all a C-exposed LLVM.LLVMGetDefaultTargetTriple(). (I apparently missed this in my first few scans of the five thousand lines of headers.)
  • "CPU" is the same as -march on a traditional compiler— it specifies microarchitecture. If you specify armv7 here, you’ll get optimizations that the armv6 can’t support, but your software also won’t run on armv6 anymore.
  • The "CPU" and "features" strings can be empty strings. If you do this, it gives you the most conservative possible microarchitecture for the triple you asked for. (Which might be closer to what you want anyway than the "defaults" sys.h would give you— after all, ask for the "default" on a brand new PC and you might get something no one else can run.)
  • The "target" is something weird and LLVM-internal, and refers to which LLVM backend you want to run. You’ll want to get this by using the function I mentioned that derives it from the triple. (You can also look up targets by name, but the names are pretty arcane— there’s no apparent consistency or correspondence to the outside world in the naming scheme, it was just up to whoever wrote that one LLVM add-on. For example there’s one x86 target that works with both x86 and x86-64, but the 32 and 64 bit ARM targets are separate.)

Armed with all of this, I’m able to put together an emission function which populates everything from the default triple (with overloading allowed via config.lua):

function Builder:emitToFile()
    local targetRefPtr = ffi.new("LLVMTargetRef [1]")
    compileTriple = config.compileTriple or LLVM.LLVMGetDefaultTargetTriple()
    compileCpu = config.compileCpu or ""
    compileFeatures = config.compileFeatures or ""

    if LLVM.LLVMGetTargetFromTriple (compileTriple, targetRefPtr, errPtr) == 1 then
        llvmErr("LLVMGetTargetFromTriple")
    end
    if LLVM.LLVMTargetMachineEmitToFile(
            LLVM.LLVMCreateTargetMachine(targetRefPtr[0], compileTriple, compileCpu, compileFeatures,
                LLVM.LLVMCodeGenLevelAggressive, LLVM.LLVMRelocDefault, LLVM.LLVMCodeModelSmall),
        self.llvm.module, cstring(self.moduleName .. ".o"), LLVM.LLVMObjectFile, errPtr) == 1 then
        llvmErr("LLVMTargetMachineEmitToFile")
    end
end

And then things get weird.

My first sign of trouble is a cryptic message Unable to find target for this triple (no targets are registered). I waste time assuming I fed the function a bad triple before realizing what this really means: there are no targets, at all. It turns out before LLVMGetTargetFromTriple() will work, you have to call a special function to initialize any targets you might want to hypothetically load. There is a different initialization function for each target— LLVMInitializeAArch64Target() for 64-bit ARM vs LLVMInitializeX86Target(), for example. They apparently did it this way because most users link LLVM into their program statically, and partitioning the target backends into functions like this means you don’t have to pack the support code for whatever "MSP430" is in to your binary if you don’t plan to support it. That makes sense, but I’m linking dynamically, and I don’t know which platforms I do or don’t support ahead of time.

What a user like me— using the dynamic LLVM, not really certain what they intend to support— usually does is call the LLVMInitializeAllTargets() function. Which is defined in Target.h

/** LLVMInitializeAllTargets - The main program should call this function if it
    wants to link in all available targets that LLVM is configured to
    support. */
static inline void LLVMInitializeAllTargets(void) {
#define LLVM_TARGET(TargetName) LLVMInitialize##TargetName##Target();
#include "llvm/Config/Targets.def"
#undef LLVM_TARGET  /* Explicit undef to make SWIG happier */
}

…as an inline function. And now I’m stuck. LuaJIT can’t call those— at all. Again, LuaJIT calls functions by loading them from a dynamic library, but inline functions aren’t in the dynamic library, they’re C code embedded directly in the header file.

I get around this by doing something truly horrible. LLVMInitializeAllTargets() is an inline function generated from Targets.def using C preprocessor. I’m already generating Target.lua by running the preprocessor on a Lua file, in order to clear out preprocessor directives inside the embedded C header string. So I just go ahead and, uh… generate some Lua… using the C preprocessor.

function LLVMInitializeAllTargets()
  #define LLVM_TARGET(TargetName) LLVM.LLVMInitialize##TargetName##Target()
  #include "llvm/Config/Targets.def"
  #undef LLVM_TARGET
end

After the preprocessor runs, this gives me a Lua function that calls each of the functions that the inaccessible C LLVMInitializeAllTargets() would have been calling. I make three more of these functions (in addition to initializing the "targets" themselves, you need to also initialize the "target infos", "target MCs", and "ASM printers" for each target, or you will get completely misleading error messages). And it works!

If I call my big combined LLVMInitializeAll() function first, emitToFile() prints no errors and I get a 200-byte main.o file which file claims to be Mach-O 64-bit.

Okay! Now can I actually do something?

Doing something

My module needs some code. Let’s do the simplest possible thing: Call printf.

LLVM modules keep a symbol table containing all known functions. When the operating system tries to run the exe I’m making, it will start by looking up a function named "main" in this symbol table and calling it.

local mainType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), nil, 0, false)
local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)

I construct the type for a main function, then construct the function itself. AddFunction will add the function to the symbol table and then return the new function object.

local mainEntry = LLVM.LLVMAppendBasicBlock(mainFn, "entry")
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, mainEntry)

Since I will be adding code to the function, I need to create a "basic block" for the code to live in. "Basic block" is a standard term from compilers referring to a continuous section of code without any branches in it.

Earlier, I made a "builder object". This appears to be something like a text cursor. There’s a series of "Build" functions, and each inserts a bit of code at whatever point in the module the builder is positioned at. I add my new basic block to the function object, and then I position my builder at the end of the basic block. The basic block takes a name when you create it; as far as I can tell this name is arbitrary, and I assume it’s only there in case you want to direct a jump there from the end of some other basic block. I name it "entry".

Next I want to call printf. But first I have to tell LLVM what printf is:

local printfParams = ffi.new("LLVMTypeRef [1]")
printfParams[0] = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0) -- address space 0 (arbitrary)
local printfType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), printfParams, 1, true) -- vararg
local printfFn = LLVM.LLVMAddFunction(builder.llvm.module, "printf", printfType)

Again here I’m constructing a function and putting it in the symbol table. However, this time I am not going to be adding any code to it; I’m just putting it in the symbol table so LLVM knows it exists and what type it is. Later the linker will supply the implementation to go with that symbol table entry. The type on this one is a little more complicated than the one for the main function; it has arguments.

local printString = "LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\n"
local printfArgs = ffi.new("LLVMValueRef [1]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring")
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "tmpcall")

Now I’m ready to start using the builder. I Build a global string, and then I Build a call site. Both the string and the call site have "names". Like the basic block, the names appear to be arbitrary, although I cannot imagine a possible use for the names like I could with the basic block.

LLVM.LLVMBuildRetVoid(builder.llvm.builder)

Since I’m done, I cap off my basic block by returning void.

Thirteen lines of code. And that is actually all I need! My generator script spits out a new main.o and gives me a glimpse of what the code I just constructed looks like in LLVM IR:

; ModuleID = 'main'

@tmpstring = private unnamed_addr constant [47 x i8] c"LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\0A\00"

define void @main() {
entry:
  %tmpcall = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([47 x i8], [47 x i8]* @tmpstring, i32 0, i32 0))
  ret void
}

declare i32 @printf(i8*, ...)

That was straightforward, right? Let me take a moment to talk about what I had to do to get that thirteen lines of code.

Again, the LLVM documentation is not very helpful:

These reference documents I’ve been working with were created by Doxygen, which means unless someone went to the bother of commenting a particular function you get only these autogenerated blurbs. I mentioned that LLVMBuildCall() has a "Name" parameter and I don’t know what it does; well, what do I see if I check the documentation? Answer: The LLVMBuildCall() documentation is autogenerated and gives only a source line number. Well, but that’s LLVM-C documentation, right? What do I see if I check the "real" documentation, the C++ documentation? Answer: The IRBuilder::CreateCall() documentation is autogenerated and gives only a source line number. I never did figure out what the "Name" meant.

Now that I’ve finally got to generating ASTs I’ve encountered a new issue with the docs— many of the StackOverflow suggestions for things I need to do by this point are actually turning up as LLVM IR snippets. So now I get instructions given to me as LLVM IR, which I then have to figure out how to translate into C++, so that I can go from there to translating into C (so I can rewrite it in Lua). On the bright side though, LLVM IR turns out to be lavishly documented.

In the end, I figured out very little of the AST generation code by myself. The AST code above was constructed by consulting a mix of the llvm-c-kaleidoscope project, this tutorial written by IBM, and this sample code written by a French security researcher. The IBM and 0vercl0k samples do almost exactly the same things I’m trying to, and were tremendously helpful. But even coding with these great models to follow took some tinkering, since I still had to get past a problem I didn’t really talk about so far: LLVM reacts to most unexpected inputs by crashing. For example, LLVMCreateTargetMachine() above takes an empty string as a valid input; but pass in a NULL string and you get a crash.

A lot of these crashes are pretty easy to deal with. Early on in writing the above code, I was trying to use LLVMConstString() where I should have been using LLVMBuildGlobalStringPtr(), which meant I was passing in to printf the equivalent of a char[48] instead of the equivalent of a char *. The result was an assert:

This I could deal with. It’s not as specific as it could be, but the stack is clean, and the assert gave me a line number. I found out I could get MacPorts to show me the source code to my LLVM version (sudo port patch llvm-3.9 && `port work llvm-3.9`), and this let me look up the context, and after some testing with LLVMTypeOf()/LLVMDumpType() I understood what the problem was and where to start on fixing it.

The other problems I had were… not like that. I had a bug where I was trying to call LLVMBuildGlobalStringPtr() before creating the basic block and positioning the builder; the symptom was a segmentation fault deep inside the optimization pass manager when emitting the object file. I had a bug where I was failing to end my function with a ret void; the symptom was an assert failure, again deep in the optimization pass manager, with the totally mind-boggling assert error Assertion failed: (Val && "isa<> used on a null pointer"), function doit, file Casting.h. In both of these cases, there was basically nothing I could do to understand what went wrong or how to fix it, no breadcrumb trail back to my code. I got out of both of these situations by just checking to see if there was anything the IBM/0vercl0k samples did but I didn’t, and adding those things until the crashes went away.

The reason I’m going into all this is that these various issues taught me something: This is going to be a problem. The thirteen lines of code above are very simple, but constructing them took a full day and in the end was achieved only by following someone else’s example. A lot of the things I do next I will not have examples to follow. So this tells me if I want to make any progress I’m going to need to change how I’m doing things:

  • I need, at least at first, to stick to the heavily trafficked paths. There’s some entry points to this library that lots and lots of people have gone through, and some that are less heavily used. What I’m finding is whenever I veer into the second area things get difficult. As a user of LLVM-C rather than the standard libraries I’m already pushing it; I unfortunately probably can’t afford to be an outlier in other ways also. I’ll want to always code things the simple, standard way before trying anything clever, and I’m probably going to need to get used to testing LLVM IR before attempting things against the API.
  • I am going to need to get used to the internals of the LLVM code. I tried pulling the source code from MacPorts; that’s not enough. I’m going to need to build my own LLVM and get debug symbols (I might have been able to figure something out with the isa<> used on a null pointer error, for example, if I’d been able to inspect locals and find out what was null) and I’m probably going to need to read enough source code to understand the architecture of entire modules before I can understand some of the error messages I’m going to get. This is going to be, in other words, one of those libraries.

Linking

But, I now have a main.o file! Now I just have to link it and I’m done.

This part is obnoxious. My original goal was to write a program that does all the things a compiler does. But you sort of can’t write a linker. Linkers are more the domain of the operating system and its code loading system than they are the domain of language standards, and so effectively only the operating system vendor can really write one because only the OS vendor knows what the linker even does. Whatever your preferred C compiler is, it doesn’t actually link; it actually just finds the OS-supplied linker and invokes it as a command-line program. In other words, the chart I drew at the top of this post is not quite accurate. A more accurate chart would be:

To make my original claim of "no compiler" accurate I could maybe call the linker myself (thus leaving this post with instructions that only work on OS X), or maybe look into LLVM’s experimental linker lld which is someday supposed to be able to do the unthinkable and replace the OS linker. But why make things difficult? I take the cheap route:

cc main.o -o main

And my program works!

$ ./main 
LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!

The Lua script that acts as this program’s "compiler" is less than a hundred lines long.

So this is pretty promising! If I can call printf, I’ve got a basis for generating programs that do much more complicated things. It’s only a couple more steps to add variables or flow control, and once I get that done I’ll be able to do fun things like making tiny DSLs out of idiomatic Lua, or maybe even prototype some low-level systems for Emily. Here‘s the source code for my final Lua scripts in this post, and I will definitely be posting more about this in future.

Any code snippets in this post, or in the BitBucket revisions linked in this post, are available under the Creative Commons Zero license.

My Lambdaconf talk on the Emily programming language

March 1st, 2016

You might have seen my posts here about Emily, my hybrid functional/object-oriented programming language project. I actually gave a talk at Lambdaconf in May of 2015 about the project, and it is now available on Youtube. While the documents on the Emily website are intended to be introductions to the language for a general audience, the talk was about design, and was aimed at functional programmers and programming language enthusiasts. I’m pretty happy with how it turned out. Here it is if you’d like to watch it:

BECOME A GREAT ARTIST IN JUST 10 SECONDS, updates

February 28th, 2015

Back in 2013, I made a game called BECOME A GREAT ARTIST IN JUST 10 SECONDS with Michael Brough for Ludum Dare 48. It is fun! You should play it.

In the time since, ARTIST has been featured at Indiecade 2014 as an official selection– here it is being demoed at the Night Games event–

Following which, ARTIST was selected as a finalist at the 2015 IGF festival. If you make it to the IGF festival this week, I’ll be out there demoing ARTIST on the expo floor, and Gamasutra has an interview with me and Michael about it.

One last thing. GREAT ARTIST was demoed twice at the Gamecity event in Britain, both times with custom controllers– once a special trackball version made by Niall Moody:

And once on an honest to goodness arcade cabinet, using a special two-player version of the game:

The two player version was only up for about six hours, but somehow in that time Rock Paper Shotgun managed to play it.

— — — —

If you’d like to try any of the things mentioned in this post out:

  1. Download the original version here
  2. Download the “expo” version here. This is the version shown at Indiecade and the IGF pavilion, it runs full screen and if left untouched for more than 20 seconds enters a “slideshow” mode cycling through artwork by Liz Ryerson.
  3. Download the two-player “arcade” version here, in case you have access to something like a cabinet– this one has special controls, which are explained in the readme.

GREAT ARTIST is open source, and all three of these versions can be found on the BitBucket page.

Announcing the Emily programming language, version 0.1

January 18th, 2015

This last year I’ve been working on a programming language. Around April, I sat down and I wrote out a list of all the things I want out of a language. Nothing that exists right now felt satisfying, so I decided to make my own. I’ve named this language Emily, and as of today I have a finished “version 0.1” release that you can download and run. This version is very primitive in certain ways, but you can write a program in it and it points enough toward what the language can eventually be that I’m excited.

An example

Here’s a small program in Emily:

width = 80

foreach ^upto ^perform = {
    counter = 0
    while ^(counter < upto) ^( perform counter; counter = counter + 1; )
}

inherit ^class = [ parent = class ]

line = [                               # Object for one line of printout
    createFrom ^old = {
        foreach width ^at { # Rule 135
            final = width - 1
            here   = old at
            before = old ( at == 0 ? final : at - 1 )
            after  = old ( at == final ? 0 : at + 1 )
            this.append: ( here && before && after ) \
                     || !( here || before || after )
        }
    }
    print ^ = {
        this.each ^cell { print: cell ? "*" : " " }
        println ""                                          # Next line
    }
]

repeatWith ^old = {  # Repeatedly print a line, then generate a new one
    do: old.print
    new = inherit line
    new.createFrom old
    repeatWith new
}

starting = inherit line        # Create a starting line full of garbage
next = 1
foreach width ^at (
    starting.append: at != next
    if (at == next) ^( next = next * 2 )
)
repeatWith starting                                             # Begin

This executes the 1d rule 135 cellular automata, or in other words, it prints strange pyramids:

*  * *** ******* *************** ******************************* ***************
      *   *****   *************   *****************************   **************
 ****   *  ***  *  ***********  *  ***************************  *  ************
  **  *     *       *********       *************************       **********
*       ***   *****  *******  *****  ***********************  *****  ********  *
  *****  *  *  ***    *****    ***    *********************    ***    ******
*  ***          *  **  ***  **  *  **  *******************  **  *  **  ****  ***
    *  ********         *               *****************               **    **
 **     ******  *******   *************  ***************  *************    **
    ***  ****    *****  *  ***********    *************    ***********  **    **
 **  *    **  **  ***       *********  **  ***********  **  *********      **
       **          *  *****  *******        *********        *******  ****    **
 *****    ********     ***    *****  ******  *******  ******  *****    **  **
  ***  **  ******  ***  *  **  ***    ****    *****    ****    ***  **        **
   *        ****    *           *  **  **  **  ***  **  **  **  *      ******
**   ******  **  **   *********                 *                 ****  ****  **
*  *  ****          *  *******  ***************   ***************  **    **    *
       **  ********     *****    *************  *  *************      **    **
******      ******  ***  ***  **  ***********       ***********  ****    **    *
*****  ****  ****    *    *        *********  *****  *********    **  **    **
 ***    **    **  **   **   ******  *******    ***    *******  **        **
  *  **    **        *    *  ****    *****  **  *  **  *****      ******    ****
        **    ******   **     **  **  ***               ***  ****  ****  **  **
*******    **  ****  *    ***          *  *************  *    **    **
 *****  **      **     **  *  ********     ***********     **    **    ********
  ***      ****    ***         ******  ***  *********  ***    **    **  ******
*  *  ****  **  **  *  *******  ****    *    *******    *  **    **      ****  *
       **               *****    **  **   **  *****  **       **    ****  **
******    *************  ***  **        *      ***      *****    **  **      ***
*****  **  ***********    *      ******   ****  *  ****  ***  **        ****  **
****        *********  **   ****  ****  *  **       **    *      ******  **    *
***  ******  *******      *  **    **         *****    **   ****  ****      **
 *    ****    *****  ****       **    *******  ***  **    *  **    **  ****
   **  **  **  ***    **  *****    **  *****    *      **       **      **  ****

Anyway, even without knowing the syntax, you might notice a few things looking at this program:

  • This language is extremely extensible. Near the beginning, I realized there were two things that I didn’t get a chance to put in the standard library: a foreach function, and a way to instantiate an object of a class. So I just implemented them in the program. Very simple.
  • Support for both [“unpure”] functional and object-oriented styles: Functions are being passed as values all over the place, functions are being created offhandedly (that’s the ^), objects with prototype inheritance are created very casually (that’s []), tail recursion works.
  • Clean, familiar syntax: At least, assuming you’ve been writing a lot of Python, JS, or Lua, this looks a lot like code you’ve written before, other than the ^s.

Why’s Emily special?: For language geeks

The one central New Idea in Emily is that everything is a function, or maybe put another way, objects and functions are interchangeable. You can think of an “object”, or a structure, as being a function that maps key names to values; Emily actually treats objects this way. If an object can be a function, then anything can be; 3.add can be the function that adds 3 to another number, 3 can be the function that maps .add to 3.add.

The important thing about this isn’t that functions specifically are important; what turns out to be important is simply that everything in Emily is the same thing. Everything acts like a unary (one-argument) function, so where other languages might have several “verbs” (field lookup, variable definition, function definition, function application, arithmetic) Emily has exactly one verb (function application). All you’re doing is applying functions in a particular order, and writing code just means plugging those functions together like pipes. Even the “conventional” syntax elements, like + or the . lookups, are just unary function applications in disguise. Even declaring a variable is done with a function call.

So why does this matter?

Why’s Emily special?: The practical side

I like to build abstractions in my code. If I’m writing a C program, and I find I’m writing for(int c = 0; c < upto; c++) {something} over and over, I get annoyed; I wish I could just define a “foreach upto {something}”, like I did in the pyramid program up top.

I like to mix different kinds of tools when I write software. I really like Lua, but Lua isn’t very good for certain kinds of binary manipulation and threading, so I write programs that are part C++ and part Lua. But this is tricky; the languages each have their own complexities, and the complexities clash. Languages don’t work well with each other.

I assert both these things get a lot easier if you have a language whose underlying model is very simple.

In Emily, where everything is function applications, building complex abstractions just means plugging the applications together in a particular order. The language itself doesn’t need to be this simple– again, if you look above, a lot of stuff is happening in the code that doesn’t look like a function application. But that complexity is something built on top of the language, rather than being a fundamental part of the model. This means it can be extended further, and it also means it can be replaced.

Syntax like + or =, for example, is actually performing macro transformations– 3 + 5 * 4 gets rewritten into 3.plus(5 .times 4). This makes these operators easy to extend– if you want to design an object that “acts like” a number, you just define an object that implements the .plus and .times methods. It also makes them possible to replace– the transformations are just little programs, and in a later version of Emily you’ll be able to define your own operator transformations, if there’s a different syntax you’d like better.

That’s not all that impressive, though– operator overloading is a pretty standard feature in languages, and macros are not the way you’d prefer to implement abstractions. Rather the more basic element in Emily of everything is function application is what opens up the really powerful possibilities. Let’s try something a little more unusual. As mentioned above, Emily has prototype inheritance. But it’s single inheritance– only one parent per object. What if you’d prefer multiple inheritance? Well, you can implement it yourself, by writing a single function:

# A function that returns a function-- it generates the fake "parent"
dualInherit ^parent1 ^parent2 = ^key {
    thisUpdate this: \
        (parent1.has key ? parent1 : parent2) key
}

hasAddOne = [
    addOne ^ = this.value = this.value + 1
]
hasAddTwo = [
    addTwo ^ = this.value = this.value + 2
]

child = [
    parent = dualInherit hasAddOne hasAddTwo
    value  = 0
]

do: child.addOne
do: child.addTwo
println: child.value

So what’s happening here? The object that child inherits from is chosen by setting the parent key. But objects and functions in Emily are interchangeable. So child inherits from a function. dualInherit takes two desired parents and manufactures a function which takes a key and executes it on whichever of the two parents knows how to respond. child then uses dualInherit to inherit from both the classes hasAddOne (from which it inherits the method addOne) and hasAddTwo (from which it inherits the method `addTwo).

What else?

There’s a couple more interesting things that made it into the language, even as early as this version is:

  • \version 0.1
    This feature is small, but I believes it solves a somewhat fundamental problem with programming languages. Each Emily program is encouraged to start with a line identifying the language version it was developed against. When an Emily interpreter– current or future– encounters the \version line, it makes a decision about whether that code can be run or not. If the hosting interpreter is backward-compatible with the code’s version, it just runs. But if backward-incompatible changes have been made to the language since then, it will either enter a compatibility mode or politely refuse to run. At some point, it will be possible to install these compatibility modes as pluggable modules.

    I write a lot of Python, and a huge running problem is compatibility between versions. In Python, as in most programming languages, the implementation version is the same as the language version. Python 2.4 runs Python 2.4, Python 2.7 runs Python 2.7, Python 3.1 runs Python 3.1, etc. Meanwhile Python 2.7 can run Python 2.4 code, but Python 3.1 can’t run Python 2.7 code, which means Python is competing with itself and nobody uses Python 3 because all the code’s written for 2.7. (And even before the big 3.0 switch, forward compatibility created a huge problem all by itself: If you had a program that used a feature from 2.5, but what you had installed was 2.4, you wouldn’t know it until you tried to run it and something would break strangely, possibly at runtime.)

    This is all silly! Language versions define interfaces, and interpreters are engines. We shouldn’t be holding back on upgrading our engines because the interface changed (and if there’s some reason a new engine can’t handle the old interface, it should at least fail very early). It’s generally possible at least in principle to convert between these interfaces, so it should be possible to install something that does conversion for an incompatible past interface (probably even a future one!). It should be possible to mix code written against different interfaces in the same program– maybe even the same file. There’s surely a point at which this becomes untenable (library cross compatibility probably gets awkward quick), but language implementors not being able to get updates adopted because nobody wants to lose back compatibility with 15-year-old versions doesn’t sound very tenable either.

    Anyway, for now: Just tag each file with its version, and all this becomes a lot easier to sort out later.

  • Proper Unicode support

    This really ought to be something we just expect of a language these days, but Emily is being developed for full Unicode support and the interpreter treats source files as UTF-8. Right now this only extends as far as handling Unicode whitespace– well, and the macro system supports unicode symbols, but since you can’t make macros yet that’s not so useful. As soon as possible though my goal is to implement UAX #31 so you can use Unicode in identifiers, and I’m hoping to work with coders fluent in non-latin-script languages to make sure Emily is usable for those purposes.

    Oh, also: At some point I’m just gonna make smart quotes work as quotes. If we’re gonna keep pasting them in to our code by accident, they might as well work.

  • Return continuations

    I’m… not sure I should be calling too much attention to this, but it is a bit unique. As I’ve said, everything in Emily is a function, at least in form. There are no “keywords” as syntax constructs– any “syntax” is just shorthand for function calls. So if I’m going to implement, say, return, it has to be something that can be treated like a function. return in Emily winds up being what’s called a “continuation”– a function-like object that when called just jumps to a particular place, in this case the end of the method call.

    This has an interesting side effect that I had not entirely intended:

    timeMachine ^ = (           # Function takes no arguments
        goBackward = return     # Store return in a variable. Yes, really.
        return 1                # Return. This line only runs once.
    )
    
    counter = do timeMachine    # The return value of "do TimeMachine" is 1, right?
    println: counter            # Only the first time-- every time we call goBackward,
    goBackward: counter + 1     # we return "counter + 1" from timeMachine,
                                # even though timeMachine already ended.
    

    …it turns out “return” for a particular function call can be treated like any other value, and even outlive the function call it was born in. As with any other kind of continuation, this creates the opportunity for some very powerful constructs (I’m trying to work out how I can implement Java-style named breaks with it) and also the opportunity for some really bad ideas. I’m going to see if I can find a way to encourage the former while maybe putting some limits on the latter.

What next?

What you see here is part of a more ambitious set of ideas (link goes to my original design writeup before I started writing any code). Here’s some things I want in Emily eventually; some of this may not be completely feasible, but I think even making it part of the way there would be great.

  • Types

    There’s a lot of things I want to try with typing in Emily, and I’m nervous about elaborating too far on what they are before I know for sure what I can pull off. But in general: Emily should have type annotations. Typing should be “gradual”; you should be able to have a part of your program with type annotations and a part without. There should be a prover for types, and an inference engine. If you assign something to a variable or argument which has a type annotation, that should mean a runtime check for correctness. There should be a “compile time” check also; the compile time check should be able to tell some calls are definitely type-correct and optimize out the correctness check, and it should be able to tell some calls are definitely type-incorrect and refuse to run the script until they’re fixed. (Right now, nothing stops you from typing 3(4) except that this will fail at runtime).

    This is all pretty standard for a language written in the last ten years. But I want to try some odder things, based around Emily’s core ideas that everything in the language is “the same thing”, interchangeable with functions, everything is extensible and there is no syntax “magic”. So: Types should be constructable at runtime. It should be possible to use a function as a type; there should be a syntax for turning x such x > 4 into a type. Some types should just be language-level assertions, like there should be a type for “this variable’s value is known at runtime” or “executing this function has no side-effects”. There should be an ability along the lines of assigning a type as a key for an object– something like [ .name="minusone"; ^x of int = x - 1 ] for a function that returns "minusone" when applied on .name or subtracts 1 when called with an int. In other words, I want average objects to be able to act like match or pattern-matching guards from a functional programming language. (This would mean that you could build up guards that inherit from other guards, which I kind of like.)

  • C++

    I need to be able to interoperate with other languages. I write video games, so there’s a whole bunch of C++ libraries I really need access to. Interoperability with C++ is hard. Interoperability with anything is hard. As alluded above I used to write a lot of Lua, and the interaction layer with C++ was just incredibly complicated and involved generating code with a Python script.

    Complicated or no, I’m going to have to write an interop layer for Emily. But I think that there are some design choices in Emily that are going to make this easier. Because Emily is so dynamic, I can generate a bunch of stuff at runtime that otherwise would have required a code generator, and Emily’s everything-is-functions rule means that wrappers can be much less awkward than they might in another language. Objects and classes and methods from different languages all have different semantics; if you have an object in one language which actually wraps a guest object from another language, you’re going to wind up with weird mismatches where the objects and methods you’re interacting with in the host language don’t quite behave like their analogues in the guest language (the “complexity clash” I mentioned earlier). But an object in Emily is a smooth ball; its semantics are just a matter of what arguments you feed it. If the type system is smart, it could even verify that the arguments you feed the wrappers are always consistent with the semantics of the guest object.

  • Reader macros

    Emily has macros that transform the AST into a different AST before it is executed. A lot of functional languages have the concept of a “reader macro”, which generates AST from whole cloth. There’s a piece of implied code called the “reader” which is consuming the text of your program and parsing it into an AST; \ will eventually be a way of sending messages to this reader, possibly enabling transformations that a normal macro couldn’t perform because it would need access to the program source as a string. Right now the only reader instruction is \version, and the only thing it can do is halt the program if the version is wrong. There will be more reader instructions in future (for example, \op will probably be the way to define the normal, non-reader macros).

    What really interests me though is the possibility of a \reader, which would cause the reader to just plain get out of the way and hand off parsing the rest of the file to a piece of Emily code. My daydream here is that if you could write a \reader that parses some completely other programming language and emits Emily ASTs (hopefully easy since Emily really only barely has an AST, the AST is just a tree of applications)– and if I’m right that it will be unusually easy to write Emily code that accepts other languages’ objects as guests– then Emily could become a useful intermediate layer that knows how to translate between several different languages and mediate their differences.

These are all Big Ideas though, and the ideas the language already has could stand some cleanup, so for 0.2 I’m going to be focused on basic functionality improvements (less confusing scoping, operators on strings, short-circuiting booleans, user-defined operators, unicode, package loading, IO).

Downloading and running Emily

As mentioned, Emily is available from a BitBucket page (or, if you can’t use Mercurial, its GitHub mirror), but not yet in any other form. You will need to compile it yourself; it’s written in Objective Caml, so you’ll need to install that first. For instructions, see https://bitbucket.org/runhello/emily/src/stable/doc/build.md. By the time you read this, there may also be install packages; see emilylang.org.

What’s “BitBucket”?

If you’re not familiar with BitBucket or Github, go to the Downloads page on BitBucket, click the “Branches” tab, and to the right of the word “stable” click “zip”. This will download a zip file of the most recent stable release.

Can’t I just download something?

If you happen to use “homebrew” for Mac OS X, Misty De Meo has helpfully put together a homebrew package. You can install Emily via brew by running:

brew install mistydemeo/emily/emily

Hopefully by the next release we’ll have Debian packages too!

Getting started

If you’re interested in Emily and want to give it a shot, a good thing to read next would be the tutorial. If you just want to know more, there’s a bunch of documentation including some design documents and information about modding the language in the docs folder.

License

Emily is MIT licensed, meaning it is open source and only attribution for the authors is required. The list of contributors is here.

Open source index, Indiecade

October 6th, 2014

Hi hi. Two small blog-relevant updates.

  1. It got frustrating to me how hard it is to get around the list of projects on my BitBucket page, so I made a sorted index of all my open source code. The list is hosted on BitBucket and includes all my game projects, including a number I’ve never released.

  2. The game BECOME A GREAT ARTIST IN JUST 10 SECONDS I made with Michael Brough is an Official Selection at Indiecade 2014. The game will be demoed at Indiecade’s Night Games event this Saturday, with this customized keyboard made by Rachel Yonamine:

    GREAT ARTIST keyboard

    Here’s our Indiecade trailer:

Emily programming language: Status update one

June 4th, 2014

About a month ago, I put up some plans for a programming language. It actually got some interest!, so I thought I’d go over some things that have happened since then.

First off, I decided the name of the language is definitely Emily, unless I’m somehow legally forced to call it something else. I made a website for it and an announcement-feed Twitter, at emilylang.org and @emilylanguage, respectively.

Second off, I now have a prototype implementation! It’s a very, very minimal implementation– you seriously couldn’t develop anything in it, and it’s missing most of the language’s better planned elements– but it does demonstrate a couple of the language’s striking features, and it’s turing-complete. (And the turing completeness part can be demonstrated in an amusing way: It turns out to be possible to embed Unlambda programs in it.)

Programs written in the language don’t look very pretty right now. Most of the “operators” in Emily, like = or +, are eventually intended to be defined as macros, and those macros aren’t implemented in the prototype yet, so setting a variable looks a little funny:

# Prints "3.0"
set .test 3
print test

And adding looks a little funny:

# Prints "7.0"
print (3 .plus 4)

In the final version of Emily, writing “3 + 4” will do the same thing as writing “3 .plus 4”– the only difference will be that + has operator precedence, and .plus does not.

There’s one macro that *is* in already– ^, or “lambda”, which turns whatever comes right after it into a function:

# Prints "4.0"
^x( print x ) 4

This defines an “anonymous function”– if you aren’t familiar with that idea, it’s a function that doesn’t have a name but just shows up in an expression, like a literal– that takes one argument and prints it. It then feeds the anonymous function the argument “4”.

With these couple things explained, here’s a more complicated Emily program that works in the current interpreter:

set .countup ^arg {
    set .count (arg.from)
    loop ^(
        print count
        print "\n"
        set .count ( count .plus (arg.step) )
        count.lt (arg.to)
    )
}

countup[ set .from 10; set .to 20; set .step 2 ]

Again, not very pretty, but this demonstrates a couple of interesting things that Emily tries to do.

To recap my last blog post a bit, everything in Emily is formally a function– everything. Objects are functions pretending to be objects, functions which take key names (in the form of strings like .plus— assume this is just a funny way of saying “plus”) and map them to values. Numbers are objects, which are functions. Scopes are objects (which are functions); the language sets up and shuffles around the scope functions silently, and it forwards any bare symbols it finds (like “set” or “count” in the examples) to the current scope function as a string argument. A lot of the time these various functions don’t really *act* like functions, of the kind we normally recognize (for this reason I’ve been calling them “blocks”, after Smalltalk) but the fact the language doesn’t distinguish the different kinds of functions semantically opens up a lot of possibilities for useful abstraction. The only “verb” in Emily is “apply”– apply function A to argument B. Everything else is just built on top of that.

Not a lot happens in this example, but a couple things do stand out: assigning a variable value just means invoking the “set” method on the current scope (i.e., passing .set to the current scope block, doing which returns a function that can be used to alter that scope block’s mappings). There are no special flow control constructs– “loop” is just a function that takes a closure as argument and executes it until it returns false. The thing that really stands out to me here though are the appearance of {} and []– the countup language body is wrapped in a {} to give it its own variable scope, and when we invoke countup the argument list is written as an “object literal”, denoted by []. What’s interesting here is how these two things are implemented by the language.

Parenthesis in Emily create “groups”– I have to call them that because I used the word “block” for something else already, maybe a mistake. Any parenthesis in Emily can contain multiple statements, separated by semicolons. A trick here is that group (), scoped group {}, and object literal [] are all the same thing, distinguished only by what the scope is within the group and what is returned at the end. More interestingly, there’s no “magic” to the group operators– effectively, what defines the different group types are a couple of lines of setup/teardown code that run at the beginning and end of the group, and that setup/teardown could in principle be written in Emily. In other words, {} and [] could in the final version just be implemented as macros! In the current implementation, all three group types are handled by the same code in the interpreter, with the difference being:

    (Assume any time a statement executes, its evaluated value is assigned to a variable named “last”)
    Plain groups: Set [scope] to the enclosing scope; after final statement return “last”.
    Scoped groups: Create a block [newscope], which prototype-inherits from the enclosing scope, and set [scope] to it. After final statement return “last”.
    Object literals: Create a block [object]. Create a block [newscope], which prototype-inherits from the enclosing scope. Assign [newscope].set = [object].set and [newscope].this = [object]. After final statement, return [object].

Did I get too language wonky there? The point I’m trying to make here is that “an object literal syntax” is a really basic thing to a language, but this language kind of doesn’t have one, because it doesn’t need one. That’s just normal code running inside of the [], and the thing that makes [] “special”– the setup/teardown– could have been implemented by an end user. This is exciting to me because it implies end users can implement other language extensions, things as complicated as an object literal syntax, on their own.

The implementation

The prototype implementation of Emily is hosted on BitBucket, although it will not do you a lot of good right now. The main problem is that it is written in Objective-C, which means that it will only easily build if you have OS X and XCode. Compiling on other operating systems is possible but will require setting up something like GNUStep. I haven’t investigated this myself, mostly because I intend to port to something-other-than-Objective-C rather than bother with GNUStep. If you do have a mac, or do have GNUStep set up, you’ll probably find it pretty easy to set up and extend. The current implementation is only about five or six source files. (Sorry if the BitBucket page is a little confusing, they just changed their layout and now it’s all weird– the cloud with the arrow in it is the “download source tarball” button.)

Remember, this is literally the very first thing I got to run at all– as of this exact writing there’s no code documentation, no documentation of the exact language as implemented so far (the closest is the sample code files), and there are known memory leaks. I’m not even giving this a version number yet. Still, it’s a start! If you want to try actually using or extending it, feel free to contact me and I’ll help you get set up.

I designed a programming language and it looks like this.

April 22nd, 2014

For reasons I talk about here, I’m going to try to create a programming language. So far I’ve got basically a first-draft design.

There is a specific idea I decided I want to explore, when I do this: Programming languages duplicate too much. Programming languages often have multiple syntaxes that do very similar things, or multiple underlying concepts that do very similar things. It is sometimes possible to collapse these similar things into one thing, and when we do, I usually like the results better.

For example, many languages (Perl, Python, PHP) have both a dictionary type and an object type, but the two are used in effectively the same way; on the other hand Lua collapses dictionaries and objects into one type (tables), and makes an object field lookup identical to a dictionary string lookup. Or most object oriented languages distinguish objects and classes, but prototype based languages show that you can get by with just objects; if objects can inherit from other objects, then a “class” is just a pattern for a particular kind of object. When you collapse ideas together like this, or build language features on top of existing features rather than adding new primitives, you reduce both the amount of mental overhead in thinking about code implementation and also the amount of redundant syntax. There’s usually a lot of redundant syntax. C++ uses . to access a field from a reference, and -> to access a field from a pointer. Most languages use (x) to indicate an argument to a function, and [x] to indicate an index to an array. Why? If pointers and references were just special cases of one underlying concept, or if arrays and functions were, you could use one syntax for each pair and you wouldn’t have to mentally track what each variable is, you wouldn’t have to do all the obnoxious manual refactoring when you suddenly decide to replace a reference with a pointer somewhere or vice versa.

In the language I’ve been thinking about, I started with Lua’s “Table” idea– what if i built objects out of dictionaries?– and decided to take it one step further, and build both objects and dictionaries out of functions. In this language, there’s one underlying data structure that functions, objects, dictionaries, and some other stuff besides are just special cases of– design patterns of.

Taking a cue from Smalltalk, I’m going to call this underlying structure “blocks”.

Blocks are just functions

A block, for purposes of this blog post, is a unary function. It takes exactly one argument, and it returns a value. Anywhere in this blog post I say “blocks”, I could have just written “functions”. I’m going to mostly use the “block” jargon instead of saying “functions” because some blocks will be “used like” functions and some very much will not be.

In my language, you’ll define a function just with “=”:

    addOne ^x = x + 1

The ^x is an argument binding. The ^ signals to the language that the right side of the = needs to be a function body (a closure). If on the next line you just said

    y = addOne 3

That would just assign 4 to the variable “y”, it would not create a function.

Blocks are pattern-matched functions

A big part of this project is going to be that I really like the ideas in functional languages like ML or Haskell, but I don’t actually enjoy *writing* in those languages. I like OO. I want a language that gives me the freedom and expressiveness of FP, but comfortably lets me code in the OO style I use in Python or Lua or C++. So I’m going to steal as many ideas from FP languages as I can. Three really important ideas I’m going to steal are closures, currying, and pattern matching.

In case you don’t know those languages, let me stop and explain pattern matching real quick. You know how C++ lets you function overload?

    // In C++
    void addOneHour(int &k) { k = (k + 1) % 12; }
    void addOneHour(float &k) { k = fod(k + 1.0, 12); }

Well, pattern matching is as if you could switch not just on type, but also on value:

    // In hypothetical-C++
    void addOneAbsolute(int &k where k > 0) { k = k + 1; }
    void addOneAbsolute(int &k where k < 0) { k = k - 1; } void addOneAbsolute(0) { } // Do nothing

That last line– the one demonstrating we could write a function whose pattern matches only *one single value*– is going to be important to this language. Why?

Blocks are dictionaries

In my language, if I want to assign more than one pattern to a single block, I just use = multiple times:

    factorial ^x = x * factorial (x - 1)
    factorial 0 = 1

“Factorial” is a block. The way I’m looking at it, a block is just a data structure which maps patterns to closures. It’s like a dictionary, but some of the keys (the ones with bound variables) match multiple values.

However we could not bother assigning any bound-variable patterns, and then we’d just have a dictionary or an array:

    nameOfMonth 1 = "January"
    nameOfMonth 2 = "February"
    nameOfMonth 3 = "March"
    ...

Blocks are objects

Here I want to introduce a data type called an “atom”. This is an idea stolen from Erlang (and possibly Ruby?). Technically an atom is an “interned string”. It’s something that the programmer sees as a string, but the compiler sees as an integer (or a pointer, or something which has a constant-time comparison). You get at the atom by putting a . before a symbol; the symbol is the name of the atom:

    x = .atomname

It’s cheaper to compare atoms than strings (.atomname == .atomname is cheaper than “atomname” == “atomname”) and cheaper to use them as dictionary lookup keys. This means atoms work well as keys for fields of an object. Objective-C for example actually uses atoms as the lookup keys for its method names, although it calls them “selectors”. In my language, this looks like:

    constants.pi = 3.14
    constants.e = 2.71
    constants.phi = 1.61

Notice this looks like normal object syntax from any number of languages. But formally, what we’re doing is adding matching patterns to a function. What’s cool about that is it means we’ll eventually be able to use machinery designed for functions, on objects. Like to skip ahead a bit, eventually we’ll be able to do something like

    map constants [.pi, .e, .phi]

and this will evaluate to an array [3.14, 2.71, 1.61].

What’s up with the square brackets? Oh, right. well, I think all this “constants.” nonsense is gonna get kinda tiresome. So let’s say there’s a syntax like:

    constants = [ pi = 3.14, e = 2.71, phi = 1.61 ]

Notice I say “pi” and not “.pi”– on the left side of an =, the initial “.” is implicit. More on that in a moment.

One other thing. Inside of the [ ], there exists an implicit “this” variable, corresponding to the object the [ ] creates. so if you say

    counter = [
        count = 0
        increment ^x = { this.count = this.count + x }
        decrement ^x = { this.count = this.count - x }
    ]
    counter.increment 1
    counter.increment 3

Then at the end of this string of code “counter.count” is equal to four.

Blocks are prototypes

What if we want more than one counter object? Well, you’ll notice an interesting consequence of our pattern matching concept. Let’s say I said:

    counter = [
        init ^x = { this.count = x }
        increment ^x = { this.count = this.count + x }
        decrement ^x = { this.count = this.count - x }
    ]

    counter_instance ^x = counter x
    counter_instance.init 3
    counter_instance.increment 5

When we say “counter_instance.whatever”, the language interprets this as calling the block counter_instance with the argument .whatever. So if counter_instance is defined to just re-call “counter”, then on the next line saying “counter_instance.init 3” will fetch the block stored in counter.init, and then that block gets called with the argument 3. The way the “this” binding works is special, such that counter_instance.init gets invoked “on” counter_instance– “this” is equal to counter_instance, not counter.

The syntax we used to make counter_instance “inherit” is pretty ugly, so let’s come up with a better one:

    counter_instance.ditch = counter

I haven’t explained much about how = works, but when we say “counter_instance ^x = “, what we’re really doing is taking a closure with an argument binding and adding it to counter_instance’s implementation-internal key-value store, with the key being a pattern object that matches “anything”. “.ditch” is a shortcut for that one match-anything key slot. In other words, by setting counter_instance.ditch to counter, we are saying that counter is counter_instance’s “prototype”.

Something to make clear here: the lines inside a [ ] aren’t “magic”, like C++ inside a struct declaration or anything. They’re just normal lines of code, like you’d find inside a { }. The difference is the insides of [ ] are using a specially prepared scope with access to a “this” and a “super”, and at the end of the [ ] the [ ] expression returns the scope into which all these values are being assigned (“this”). The upshot is you could easily have the first line of your [ ] be something like an “inherit counter;” call that sets the ditch and does some various other fix-up to make this prototype system act more like some other kind of object system, like a class system (I like classes). This sort of thing is possible because

Blocks are scopes

Like most languages, this one has a chain of scopes. You’ll notice above I offhandedly use both ( ) and { } ; these are the same thing, in that they’re a series of statements which evaluate to the value of the final statement:

    x = ( 1; 2; 3 )

…sets x equal to 3. The 1; and 2; are noops. (Semicolon is equivalent, in the examples I’ve given here, to line-ending. There’s also a comma which is effectively a semicolon but different and the difference is not worth explaining right now.)

The one difference between { } and ( ) is that { } places its values into a new scope. What is a scope? A scope is a block. When you say

    a = 4

The unbound variable a is atom-ized, and fed into the current scope block. In other words “a” by itself translates to “scope.a”. When you create a new inner scope, say by using { }, a new scope block is created, and its ditch is set to the block for the enclosing scope. The scope hierarchy literally uses the same mechanism as the prototype chain.

Block constituents are properties (or: blocks are assignment statements)

Non-language geeks may want to skip this section.

I’ve been pretty vague about what = does, and that’s because it has to do several layers of things (matching items that already exist, binding variables, wrapping closures, and actually performing assignment). However, ultimately = must write [pattern, closure] pairs into one or more blocks. = cannot, however, actually write anything by itself. Ultimately, when = decides it needs to assign something, it is calling a “set” method.

    a = 4

Is ultimately equivalent to

    scope.set .a 4

That = is sugar for .set is a small detail, but it has some neat consequences. For one thing, since everything that happens in this language is curryable, it means you can trivially make a function:

    a_mutator = set.a

…which when called will reassign the “a” variable within this current scope (remember, “set” by itself will just be “scope.set”). For another thing, this means you can create a “property” for a particular variable:

    set.a ^x = ( b = x + 1 )
    a = 3

After this code runs, “b” will be equal to 4 (and “a” will still be equal to a function that mutates “b”).

The existence of .set will also have some interesting effects once we have types and therefore a type checker. I’ve been kinda vague about whether = has “set” or “let” semantics– that is, if you assign to a variable does it auto-instantiate or must you predeclare it, if there is a variable by the assigned name in the ditch does assignment shadow in the assigned block or reassign in the parent block, etc. And the answer is it doesn’t much matter for purposes of this post, because any of the possible things that happen when you set a field (“not declared” error thrown, assigned to top-level block, assigned to a parent block) could just be all things that could and do happen in different blocks, depending on what that block’s .set is set to. For example, it would probably make sense for object blocks and scope blocks to have a different last-ditch “.set” behavior, or be sensible to allow different source files to have different “.set”s for their file-level scopes (“use strict”).

On that note, let’s talk about types. There’s a lot of very exciting stuff happening in the study of types in programming languages right now, both types as used in languages and types as used in extra-lingual static analysis tools. I don’t understand a lot of this research yet (and I want to learn) but I think I understand enough to have an idea of what’s possible with types right now, and that means I know how I want types in this language to work.

Blocks are types

Let’s say we have a syntax variable:type that we can use to constrain the arguments of a function.

    factorial ^x : int = x - 1

When this function is called, there will be a runtime check, if “x” is not an int, it will be a runtime failure. Let’s say we can use the a:b construct inside expressions too:

    square ^x = ( x * x:float ) :: stateless

Let’s say that :: instead of : indicates that the type is being applied not to the value returned by that parenthesis, but to the implicit “function” defined by the parenthesis itself. “stateless” is a type that applies to functions; if we assert a function is “stateless” we assert that it has no side-effects, and its resulting value depends only on its inputs. (In other words, it is what in another language might be called “pure”.)

There’s some kind of a inferred typing system in place. There’s a compile time type checker, and when it looks at that “square” function it can tell that since “x” is a float in one place inside the expression, that the “x” passed into square must itself be a float. It can also tell that since the only code executed in “square ^x” is stateless, that the function “square ^x” is also stateless. Actually the “stateless” is from the checker’s perspective unnecessary, since if the checker has enough information about x to know the * in (x * x) is a stateless operation– which, if it knows x is a float, it does know that– then square ^x would be stateless anyway.

There’s some kind of a gradual typing system in place. There is a compile-time step which, everywhere square ^x is called, tries to do some kind of a type-proving step and determine if the argument to square is a float. If it can prove the argument is a float, it actually omits the runtime check to save performance. If it *can’t* prove the argument is a float, or it can prove the argument *isn’t* a float, it adds the check and maybe prints some kind of compile-time warning. (To stress: some of these properties, like “stateless”, might be in many cases *impossible* to prove, in which case the checker is conservative and treats “can’t prove” as a failure.) Besides omitting safety checks, there are some other important kinds of optimizations that the type checker might be able to enable. Critically, and this will become important in a moment, if a function is stateless then it can be potentially executed at runtime.

So what are types? Well, they’re just functions. “int” and “stateless” are language-builtin functions that return true if their arguments are an int, or a provably stateless function, respectively. (For purposes of a type function, if the type *doesn’t* match, then either a runtime failure or a return false are okay.) Types are values, so you can construct new ones by combining them. Let’s say that this language has the || and && short-circuit boolean operators familiar from other languages, but it also has & and | which are “function booleans”– higher level functions, essentially, such that a | b returns a function f(x) which is true if either a(x) or b(x) is true. So if “stateless” and “nogc” are two of the builtin type functions, then we can say:

    inlineable = stateless | nogc

And if we want to define a totally unique type? Well, you just define a function:

    positive ^x = x > 0
    sqrt ^x : positive = x / x    # Note: There might be a bug here

Obviously you can’t use just any function here– there would have to be some specific type condition (probably something like the “inlineable” I describe above) that any function used as a type in a pattern would be required to conform to. This condition would begin and end with “whatever the type checker can efficiently prove to apply or not at compile-time”.

Let’s finally say there’s some sugar for letting you define these “type condition” functions at the same time you define the function to whose parameters they apply; we could reduce that last block down to

    sqrt (^x >= 0) = x / 2    # Square root implementation, WIP 2

One other bit of sugar that having a type system makes easy:

Blocks are argument lists

So everything so far has been a unary function, right? There’s only so much we can do with those. This language is set up for currying– that’s how method lookup works, after all– and I would like to offer explicit sugar for curry:

    curryadd ^x ^y = x + y

But ehh, I don’t actually like using currying for everything. I like argument lists. And I really, *really* like named arguments, like Python uses. Let’s say we have this syntax:

    divide [^numerator, ^denominator = 1] = numerator / denominator

The “parameters” block there? Is totally just a block. But there’s some kind of block wiring such that:

    divide [4, 2]           # Evaluates to 2
    divide [4]              # Evaluates to 4-- "denominator" has a default argument
    divide [9, denominator=3]                       # Evaluates to 3
    divide [denominator = 4, numerator = 16]        # Evaluates to 4
    divide [ ]       # Compile-time error -- assignment for "numerator" not matched

There’s some sort of block “matching” mechanism such that if the argument block can be wired to the parameter block, it will be. I don’t have an exact description handy of how the wiring works, but as long as blocks remember the order in which their (key, value) pairs are assigned, and as long as they can store (key, value) pairs where exactly one of key and value is (no value), then such a matching mechanism is at least possible.

My expectation is that almost all functions in this language will use the argument blocks for their parameters, and almost all invocations will have an argument block attached.

Blocks are macros

I wanna go back here and look at something closer: We’ve defined that there’s some subset of this language which can be run at compile time, and that the type checker can identify which functions are in that category. I think this is a pretty powerful concept, because it means the language can use *itself* as its macro language.

So far in this post, you’ve seen three main kinds of syntax in the code samples: Unary function application (again, a field lookup like a.b.c is really just a bunch of currying), “=”, and little extra operators like “+”. What I’m going to assert is that the extra operators– and also maybe =, and maybe even [ ]– are actually just rewrite rules. So for the line:

    3 + square 4

Before actually being executed, this line is transformed into

    3 .plus ( scope .square 4 )

“3”, like anything else, is a block. Like in Io or Self, adding three to four is just invoking a particular method on the 3 object. In this language “+”, the symbol, is just a shortcut for .plus, with parser rules to control grouping and precedence. (If we actually just wrote “3 .plus square 4”, then the currying would try to interpret this as “(3 .plus square) 4”, which is not what we want.)

There’s some kind of a syntax for defining line-rewrite rules, something like:

    op [ symbol = "!", precedence = 6, replace = .not, insert = .unary_postfix, group = .right_inclusive ]
    op [ symbol = "*", precedence = 5, replace = .times, insert = .infix, group = .both ]
    op [ symbol = "+", precedence = 4, replace = .plus, insert = .infix, group = .both ]
    op [ symbol = "==", precedence = 3, replace = .eq, insert = .infix, group = .both ]
    op [ symbol = "&&", precedence = 2, replace = .and, insert = .infix, group = .both ]
    op [ symbol = "||", precedence = 1, replace = .or, insert = .infix, group = .both ]

Which means for something like

    result = 12 * 2 + 9 == 3 + 8 * 4
    result = !parser.valid 34 && result

Ultimately what’s actually being executed is:

    scope .set .result ( ( ( 12 .times 2 ) .plus 9 ) .eq ( 3 .plus ( 8 .times 4 ) ) )
    scope .set .result ( ( ( scope .parser .valid 34 ) .not ) .and ( scope .result ) )

So this is fine for symbols like + and – which operate on two clearly-defined values, but what about something more complicated like “=”? Well, there ought to be some kind of way to pass “op” a .custom function, which takes in a list of lexed tokens representing a line and returns a transformed list of tokens. At that point you can do pretty much anything. “=” might be the *one* thing that you can’t implement this way because = does special things involving adding bindings. But short of that, custom “op”s would be sufficient even for things like, I don’t know, flow control:

    if ( a == 4 ) { k.x = 3 } else { k.x = 4 }

I may be getting into the language-geek weeds again here but I’m gonna walk through this: Let’s say I have a higher order function “if ^pred ^exec” which takes functions “pred” and “exec”, executes “pred” (pred is probably nullary… which I haven’t decided what that means in this language yet), if the result is true it executes “exec” and returns the void combinator (v ^x = v), if the result is false it returns a function which expects as argument either .elsif (in which case it returns if) or .else (in which case it returns a function that takes a nullary function as argument and evaluates it). We’ve now defined the familiar if…elsif…else construct entirely in terms of higher order functions, but actually *using* this construct would be pretty irritating, because the “pred” and “exec” blocks couldn’t just be ( ) or { } as people expect from other languages, they’d have to be function-ized (which means annoying extra typing, toss some ^s in or however lambdas are made in this language). But, we can declare “if”, “else” and “elsif” rewrite ops: “if” ^-izes the next two tokens and then replaces itself with just “if” again; “else” and “elsif” ^-ize the next one token each and then replace themselves with .else or .elsif. If we do this, then the familiar if… else syntax above just *works*.

…why am I going into all this, about “if” “else”? Well, because I want to stress that it means *flow control constructs can be implemented in the language itself*, and they will be truly first-class equals with builtins like “if” or “while”. In my eventual vision of this language, the *only* language-level syntactical elements are

    .
    ^
    ( )
    [ ]
    { }
    ;

And *everything* else, including comment indicators and the end-of-line statement-terminator, is just rewrite rules, ideally rewrite rules written in the language itself. Which implies if you don’t like the language’s syntax much, you could just unload the builtin “stdops” module that contains things like “+” and “if”, and substitute your own. “op” rules are local to scopes, so syntax could vary hugely file to file. Which… well, shouldn’t it? I know people who avoid entire languages because they don’t like one or two things about the syntax. Say, people who go “well, Objective-C has a neat object model, but I can’t get used to all those square brackets”. Or I in my last blog post specifically said that although they both have lots of features I like, I personally won’t use LISP because I can’t make visual sense of S-expressions, I won’t use Javascript because of the casting rules. None of this makes any sense! Languages should be about *features*. They should be models of computation, and we should be evaluating them based on how expressive that model is, based on the features of the underlying flow control or object model or type system or whatever. Syntax shouldn’t have to be part of the language selection process, and if languages let us put the sugar on ourselves instead of pre-sugaring everything then it wouldn’t have to be. I’m probably getting carried away here. What was I talking about? Did I say something just now about casting rules? Let’s talk about casting rules.

Blocks are language machinery

Some syntactical elements, like [ ] and =, might be too complex for the programmer to plausibly implement themselves. The programmer should still have a fair amount of control over these builtins work. One way to do this would be to have things like [ ] and = implicitly call functions that exist in the current scope. For example, instead of calling .set, = might call a function “assign” that exists in current scope; this would allow individual scopes to make policy decisions such as the variable auto-instantiation rules I mentioned earlier. [ ], at the moment it instantiates the new block, might call a function “setup” that exists in the current scope, allowing the programmer to do things like change the default ditch (base class) or the exact meaning of “inherit”. There might be a function that defines the default type constraints for numbers, or strings, or lines of code. Maybe somewhere there’s a Haskell fan who wants to be able to have every ( ) wrapped up to be ^-ized and every line wrapped in ( ) :: stateless, so that any code *they* write winds up being effectively lazy-evaluated and side-effect-free and they can only communicate with the rest of the language using unsafe monads. They should be able to do that.

One thing I definitely want in is for there to be something like a “fallback” function which, if a particular block is called with an argument whose type doesn’t fit any pattern the block has defined, attempts to map the argument to one of the patterns the block *can* handle. In other words questions about whether different but intraconvertable types like ints and floats can be converted between without a cast would be a decision made on a per-project or per-file basis. Or for example if there’s a function

    square ^x:int = x*x

and one of the patterns on the fallback block is

    fallback ^fn : function( ^type, _ ) [^x : type] = fn x    # Follow all that?

(Let’s assume “function” is a higher-order type function such that function(a,b) is the type of a function a -> b, and let’s assume _ has the magic property of “match anything, but don’t capture it” when used in a pattern.)

…then even though the function is only defined for (square x) we could totally get away with calling square[ x ], because the fallback function could match [ x ] to x.

Uh, incidentally, I’m not totally sure this thing with the fallback function is actually in general possible or possible to make performant. But as with most of the stuff in this language, I think it would be fun to try!

Blocks are C++ or Javascript objects in disguise, potentially

There’s one last thing I want to talk about here, although it’s one of the most important features from my perspective. The model we have here– where formally speaking all field accesses execute functions, all field assignments execute functions, and there’s some kind of type checker at work capable of tracking fine detail about what kinds of operations get performed on individual blocks– means that the underlying language-level implementation of “a block” could differ from block to block.

The model I’ve described here for blocks is extremely dynamic and flexible– *too* flexible, such that it would be very difficult to make code using all these dynamic features performant. Except not every block will be using all of the features blocks have. Some blocks will only contain “value” keys (i.e. never a ^var:type pattern), and the type inferrer will be able to prove this the case. The compiler/interpreter could represent this one block internally as a plain hashtable, rather than taking the overhead to enable executing arbitrary code on every access. Some blocks, despite being mutable, will have a fixed known set of keys; the language could maybe represent these in memory as plain structs, and translate atoms to fixed memory offsets at compile time.

And some blocks, at the programmer’s direction, might be doing something else altogether. It’s easy to imagine a “proxy object” where each invocation of an atom and an argument on the block is actually copying the atom and argument and shipping them into another thread or across a network, and the type checker ensures the contract is followed and objects are actually copyable; you could build an Erlang style messaging system this way.

Of particular interest to me, some blocks might actually be guests from some totally other system, say a different language with its own object model. An FFI for some other language could make wrapper blocks for that language’s objects, and put in place type guarantees that the programmer does not interact with those blocks in any way the guest language does not support. The two languages I’d personally really like to be able to interface with this way are C++ and Javascript, because these languages have valuable platform and library support, but also are languages I do not actually want to *write*.

C++ in particular interests me, because I’m not aware of any languages which are “higher level” in the sense that interests me but which can currently adequately interface with C++. C++ is actually pretty tricky to interface with– the big problem here, to my mind, being that method calling conventions (name mangling) vary from compiler to compiler. Actually, on some platforms (by which I mean “Windows”) it’s the case that shared libraries (DLLs) can’t be shared between compilers even if you are writing in C++ yourself. It would probably be necessary, if making a C++ FFI, to target one particular compiler (I’d vote Clang, because it’s extensible and has good platform support). Choosing to target one particular compiler would have a neat side effect: With some knowledge of the compiler’s implementation details, it *ought* to be possible to make blocks that inherit from C++ classes, and have those blocks actually construct fake vtables at runtime that jump into the compiled code for (or interpreter for) my language. Since in my language “classes” and “objects” get constructed by calling functions whose execution could be potentially deferred to runtime, it would be essentially invisible to the programmer when they say [ inherit QObject; objectName = “Block” ] whether a normal block or a pseudo-C++ class is being constructed.

Okay?

Anyway, here’s what I think I’ve got here. I started with one single idea (pattern-matched unary functions that remember the order in which their patterns were assigned), and asked the question “how much of what a language normally does could I collapse into this one concept?”. The answer turns out to be “very nearly EVERYTHING”, including stuff (like type specifications, macros and FFIs) that most languages would wind up inventing effectively an entire sub-language just to support (templates… ugh). I actually *do* want a new programming language, mostly because of that thing I mentioned with not liking any existing language’s C++ interop, and I actually do intend to at least attempt this project. I’m basically just gonna download the Clang source at some point and see how far I get. One thing in my favor is that since this language is based on a small number of simple things that interact in complex ways, I could probably get a minimal implementation going without too much difficulty (especially if I don’t go for types on the first pass).

Oh, one very final thing: I never said what all of this is called. In my head I’m planning to name this language either Emily, because despite being fundamentally OO it strikes me as a fairly “ML-y” language; or Emmy, after Emmy Noether. I’ll decide which later.

That’s all.

Note to commenters: Harsh criticisms are very much welcomed. Criticisms based on demographics or assumptions about my background are not. Thanks.

Statement of Intent

April 22nd, 2014

I think I have decided that these are the things I most want in a programming language:

    1a. Objects with string keys, which double as dictionaries.
    2a. Closures.
    3a. A way to store a closure in an object and invoke it with the relevant object bound to some variable within the function (“this”).
    4a. Prototype-based inheritance for objects.
    5a. The potential to instantiate a closure, or an arbitrary singleton object, inside of an expression (“lambda”).
    6a. Gradual typing (with calls from untyped code into typed code being protected by something like a check and an exception throw at runtime).
    7a. Two-way communication with C++ (C++ objects may be visible within the language, language objects may be passed into C++ code).
    8a. Two-way communication with Javascript.
    9a. Support for multiple concurrent threads of execution, with interaction between by means of message passing.
    10a. It must be possible for me to develop a program on one operating system and execute it on another operating system (i.e. there is a VM or a cross compiler).

And here are some things that I think would be very nice in a programming language:

    1b. Pattern matching.
    2b. Syntax support to allow me to declare curried functions, as transparently as if they were non-curried functions.
    3b. Function invocation with named arguments (i.e. Python keyword args).
    4b. Suspension and resumption of stacks (coroutines or generators).
    5b. Atoms (interned strings which can be used as object keys without paying for a string lookup).
    6b. Properties (i.e. overloading of assignment).
    7b. The language itself should be ultimately defined as a AST, with multiple possible syntaxes reducing down to that single AST.
    8b. Language features (ints, exceptions) should be possible to enable or disable for individual programs or pieces of code. Not all features fit all projects.
    9b. There should be a potential for (arbitrary and user-defined?) type “adjectives” which are assertions that are remembered as part of the type and checked as part of the type checker. (Is this object mutable? Is this object mutable with regard to its set of keys? Does this function have side effects? Are these two properties recursive for the related object/call graph? Does this function ever use “this”? Is this object ever accessed with a non-atom key? Is this integer greater than three?)
    10b. Potential to directly manipulate the scope of a closure (I.E. mappings of unbound variables in the function body) after function declaration.
    11b. The collapsing of functions and objects. (“Look up key ‘f’ on object a” and “Invoke method a with unary argument ‘f'” should be the same operation, same syntax, same behavior. Function scope and prototype chain should be, insofar as the user can tell, the same mechanism).
    12b. Customizable automatic casting rules (if we have 1b, applied by a user-selected pattern-matched function).
    13b. Syntax support for object “classes” that have the familiar semantics of Python or Java. Ideally, classes would just be a pattern implemented on top of prototypes, but the syntax should make this pattern easy and the class identity should be visible to things like debuggers.
    14b. Some simple user-configurable syntax redefinition (like the ability to define an operator ^ where invoking 3^4 should be actually interpreted as exponent(3,4), and ^ has a user-defined operator precedence, etc).
    15b. The ability to have some sort of object which is ultimately represented at runtime as a packed memory buffer, such that saying “set element 4 of x to 3” is actually writing “3” into a specific well-defined byte or bytes in memory. (This is a useful thing both because of performance optimizations it makes possible, and for interacting with hardware like GPUs which only understand byte streams.)

I would like to explore ways to get to a point where I have a programming environment that satisfies everything from the A list and some emotionally satisfying number of items from the B list. In seeking such a language, there are incidentally some conditions that any language I spend time using will have to satisfy. I don’t want to have an argument with anyone about this final list; just think of them as personal preferences.

    1x. It must not require me to program with S-expressions.
    2x. It must not require the use of the JVM or CLR. (I am potentially willing to compromise on the CLR.)
    3x. It must not force me to program in a way which avoids the use of side-effects or state.
    4x. It must not require me to use any closed-source compiler, core library, or development tool.

No language currently exists which provides everything on my A list. Some of the items on the B list, particularly 10b and 11b, do not exist in any language that I am aware of. Most of the most interesting items on the B list only seem to exist in languages which fail one or more items on the X list. Any currently existing language will be a compromise. How fair do these compromises look?

If I use Lua, I am happiest. Lua gets me 1a, 2a, 3a, 4a, 5a, 10a, 4b, 6b, and 13b, 15b if I’m using Luajit, and since I have 5a I can awkwardly approximate 3b. No language I know of does 1a better than Lua. Unfortunately, 7a and 9a, which Lua does *not* support, are two of my *most* important conditions– I literally can’t develop without those two, whereas missing 1a through 5a will merely make me sad.

    (Lua gets something close to 7a– two-way communication with *C*– and two-way communication with C can be used to construct trampolines that *approximate* communication with C++, but in my experience using this technique is so frustrating I would rather avoid Lua altogether than attempt it again. There are also incidentally projects which purport to offer 8a for Lua, but I have not seen these projects actually working and I believe that they would need additional development before being actually used.)

If I use Javascript, I do get 1a, 2a, 3a, 4a, 5a, 8a obviously, 10a, and 6b. If I am using Javascript ES6, I also get 9a, 3b and 4b. Actually, Javascript does pretty well against my lists. Unfortunately, some of the personally important items on the list Javascript satisfies, it messes up quite badly. For example it offers 4a, but uses a baroque method for defining an object with a prototype, and it does not offer any way when overriding a prototype method to invoke the “super” method. That last one’s pretty huge; “super” is a basic feature of all object systems which, since it is present in Self, I would argue without “super” you don’t have a prototype-based inheritance system at all. On similar lines, Javascript offers 2a but essentially ruins it by not (before ES6) offering proper block scope; if I can’t readily create a closure in (for example) a loop, that is so awkward I probably won’t ever use closures at all.

There is another problem with Javascript: It is an unending cavalcade of horrors. Javascript feels *shoddy*. Many extremely basic language features are riddled with weird exceptions, and exceptions to exceptions, such that you never feel certain what the code you have just written does; many kinds of simple typos result in a silent failure or unexpected behavior rather than any kind of error. (The type coercion rules alone would feel like a great argument against using Javascript even if everything else about the language were perfect.) I very much like what is in this toolbox, but the individual tools all feel as if they will fall apart in my hands if for example I hit anything too hard with that hammer. This shifting-sand problem is exacerbated by the dramatic variance between Javascript versions and implementations; the primary benefit of Javascript is its huge installed base, but only a small subset of the syntax is “safe” in the sense that an acceptable portion of your installed base is compatible with it. This is an unfortunate property when we are talking about mere syntactical convenience features: 3b and 4b are nice, but I would not literally give up users to get them.

    (I have not investigated if Javascript satisfies 7a).

If I use C++, I get 4a (or rather 13b, class-inheritance alone, which is close enough for me), 6a (not gradual typing but strong typing, which I don’t mind), 7a (sort of– assuming you never dynamically link against anything from a competing C++ compiler), 9a (with memory sharing instead of events, but you can add the events via a library), 10a (with *great* difficulty), 15b (although since 15b is *all* C++ offers for data storage, it isn’t very pleasant), and an extremely limited 14b that can be used to get 6b (although only for some kinds of variables). There is also a horrible, confusing mockery of 12b which really the language would be better off if they had not tried to offer this feature. If I use C++11, I get 2a, but it’s gross (the closures capture variables manually, rather than automatically; they’re only created if you use the “lambda” feature, which is totally separate from any other kind of function or method; and the syntax is incredibly ugly). C++11 also has “auto”, which is… one step closer to my ideal form of 6a than strong typing alone would be, I guess. Overall C++ and C++11 don’t offer *anything* exactly the way I’d most prefer it, and the only items on my list I’d even feel C++ is really “good enough” at are 4a/13b and 9a.

    (At this point one notices something interesting about my criteria 7a: Not even C++ does very well against it. This is a hard criteria to meet.)

I’ll leave it as an exercise to the reader to test my criteria against Python, ML, Haskell, and Erlang.

Looking at my options, I feel that if I could somehow hack Luajit to offer 6a and 7a and also support Lua Lanes (which provides 9a), I’d have a language I were totally satisfied with. I spent a while seriously considering doing just that. However, Luajit is big, and 7a in *particular* is such a hairy thing to implement that I wouldn’t want to attempt it in someone else’s codebase.

So I think if I want a language that satisfies all of these things, it would have to be a new language. Along those lines, I have written a follow-up post here.