No Compiler, part 2: Turing Completeness

Last week I made a blog post where I talked about making computer programs without a programming language, by writing Lua scripts that bind to the LLVM compiler backend library and call the instruction builder functions manually. I’m gonna keep playing with this, but before I keep going, people on Twitter pointed out some errors in the last post:

  • In the last post, I went to weird bother to construct the path to libLLVM, claiming that LuaJIT can’t figure out things like the standard operating system file extensions while loading libraries. @dotstdy points out that actually LuaJIT can figure out what to do just fine from ffi.load('LLVM'), as long as the library/DLL is in the normal OS search path. (I had only ever used LuaJIT for games, where you want to load specific binaries you shipped instead of anything the OS knows about.)
  • @whitequark points out several issues, notably:
    • In the last post I claimed lld, the LLVM project’s linker, is unique in being a linker that knows how to target multiple platforms. Actually GNU ld can do this also (although Darwin appears to be a notable hole in its platform support).
    • Some context: What LLVM frontends create is "LLVM IR", or Intermediate Representation. The Build functions in the LLVM API, which I’m calling, construct an in-memory version of this IR. The IR also has a textual representation, known as "LLVM assembly", which is dumped by the debug functions and consumed by llc. There’s also bitcode, which is a binary representation of the IR designed to be forward-compatible with future versions of LLVM. The true IR, however, is the in-memory representation.

      In the last post, I did not use this terminology correctly. I was thinking of the phrase "IR" as referring to the textual version (the LLVM assembly language) and so I repeatedly used the phrase "abstract syntax tree" to refer to the in-memory IR. This wasn’t a terminology I saw anywhere but I assumed it to at least make sense because "abstract syntax tree" is a sufficiently vague term. Unfortunately it turns out it’s an incorrect term in this case because in-memory LLVM IR has cycles (phi nodes) and is not a tree.

With that out of the way…

Parameters

In the last post I generated an executable that just ran printf(). Not really much of a "program". At that point I might as well have just been starting to write a compiler frontend— several LLVM frontend tutorials do start by doing exactly those steps.

If I’m going to be writing actual "programs" I need to learn to work with some more complicated things— at minimum variables, and loops/branches, and I need to be able to make and call functions. My goal for this post is going to be to write a small program that needs all these things— specifically, I’m going to write a program that factors numbers.

To get started, here was the code from the last post that emits the printf() call:

local printString = "LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\n"
local printfArgs = ffi.new("LLVMValueRef [1]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring")
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "tmpcall");

Let’s change this to add an integer argument to the printf():

local printString = "Number: %d\n"
local printNumber = 31337
local printfArgs = ffi.new("LLVMValueRef [2]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring")
printfArgs[1] = LLVM.LLVMConstInt(LLVM.LLVMInt32Type(), printNumber, false) -- no sign extend
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

printf() is a vararg function, so I don’t have to adjust its type definition to add the second argument. Running this prints (code so far):

Number: 31337

Let’s make the printed number an variable input by the user instead of a constant. I can get input from the command line if I adjust the prototype for main():

local int32Type = LLVM.LLVMInt32Type()
local charPtrType = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0)  -- address space 0 (arbitrary)

-- Main function prototype
local mainParams = ffi.new("LLVMTypeRef [2]")
mainParams[0] = int32Type
mainParams[1] = LLVM.LLVMPointerType(charPtrType, 0)
local mainType = LLVM.LLVMFunctionType(int32Type, mainParams, 2, false)
local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)

All seven lines of that was just to express int main(int argc, char *argv);. If I make this change I also ought to change my final return statement to return 0 instead of void:

LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 0, false))

In theory, my main function now has parameters, which the header file suggests I can extract with LLVMGetParam(). I test this by changing one line of the printf invocation from earlier:

printfArgs[1] = LLVM.LLVMGetParam(mainFn, 0)

Running this I get (code so far):

$ ./main 
Number: 1
$ ./main 0
Number: 2
$ ./main 1 2 3
Number: 4
$ ./main a a a a a a a a
Number: 9

Wow, something actually worked on the first try! By printing the first parameter to mainFn I’m printing "argc", the number of command-line arguments. (The name of the program counts as an argument, which is why running without arguments prints 1.)

Aggregates

Printing the number of arguments is not actually what I want. What I actually want to do right now is print the first argument after the name. This means I somehow need to be able to index a value from the "argv" array. It turns out there is more than one way to do this. I look through the LLVM assembly reference, and here’s what I learn.

Before I go on, I should make something clear if it wasn’t already. Essentially all of the LLVM functions I’m calling, both the arguments and the return values are of one of two C types: Either LLVMTypeRef or LLVMValueRef. Each of these C types corresponds to a whole hierarchy of C++ types, which I assume users of the C++ interface might specifically care about, but I don’t (because the C interface collapses down to LLVMTypeRef/LLVMValueRef and then Lua’s dynamic typing hides even those coarse types). Each value object pointed to by a LLVMValueRef has a type object (an LLVMTypeRef) associated with it. Functions that consume LLVMValues usually have rules about which LLVMTypes the provided values can have— but in most cases LLVMType violations are not noticed by the library until much, much later at which point you get an unrecoverable assert failure, so you have to be really certain you’re following the LLVMType rules before you call functions.

With that out of the way, here’s what the assembly documentation tells me. There are three general categories of data type (categories an LLVMTypeRef can fall into) we care about when thinking about arrays:

  • Pointers: This means the associated value is literally a memory address, pointing to data of some known type.
  • Aggregates: This refers to a block of data containing a series of values. Aggregates come in two types, Arrays and Structs; values in an Array are all of the same type, a Struct has an ordered series of known types for its members. In both cases the aggregate has a known length, and you look up values in the aggregate by index ("the fourth value in this array", or "the second field of this struct"). However arrays have the quirk that by default you’re allowed to look up values past the end of the array; in this case you just get… whatever was in memory after the end of the array. The docs in one place actually advocate creating a 0-length array and indexing past the end of it in order to represent a C-style unknown-length array; I guess in this case the 0-length array would functionally be the same as a pointer, except that LLVM exposes different operations for working with pointers and Arrays and maybe you’d prefer to be working with the Array toolset for some reason.
  • Vectors: These look a lot like an Array, but refer to a situation where the members of the collection are not independent but rather are expected to be packed together in a SSE-style SIMD register. I will not be using Vectors in this blog post; I mention them here only because the docs have the operations for Arrays and Vectors all mixed in together, and it’s important to be aware they’re not what I’ll be using.

All of the concepts mentioned above— pointers, Arrays, Structs and Vectors— are parameterized by type. You can’t just have "a Pointer" or "a Vector". You’d have a pointer to an int32, or a vector of int8s with 8 members.

There are four noteworthy operations we can do on the data types in the above categories:

  • load: Given a LLVMValueRef representing a pointer, returns the LLVMValueRef representing the value it points to.
  • getelementptr: This one is quirky enough the LLVM people wrote a whole documentation page explaining what it is and isn’t, but it’s not actually that complicated. The idea here is that you have a pointer to an Aggregate, and you want to get a pointer to some specific data item within the Aggregate. It doesn’t actually get you the data item— no memory is dereferenced here, the resulting LLVMValueRef does represent a pointer. getelementptr is just doing math with the information encoded in a LLVMType to figure out where in memory a specific value is located.
  • extractvalue: Given an LLVMValueRef representing an Aggregate and an LLVMValueRef representing an index, returns the LLVMValueRef representing the indexed value (the actual value) within the array.
  • extractelement: Same as extractvalue, but works on Vectors.

So! Armed with all of this information, it seems clear that there are two ways I could go about my original goal of indexing one item out of argv:

  • I could convert (cast?) the argv pointer to an array of length zero and use extractvalue.
  • I could use getelementptr on the pointer to look up the address corresponding to my index, then load from that address.

So I guess I just took fourteen paragraphs to answer the question "how do I index an item from an array". But at least it was straightforward to look up! It’s so great to have documentation. By the way, did you know it took me two full weeks to figure out which LLVM-C function creates a getelementptr instruction? I eventually worked out by accident that the C builder function calls it a "GEP" even though nothing else in the LLVM project uses this terminology. Documentation is so great.

Anyway, I go with the second option (the getelementptr route) and it turns out to be pretty straightforward:

-- Read argv[1]
local argvIndex = ffi.new("LLVMValueRef [1]")
argvIndex[0] = LLVM.LLVMConstInt(int32Type, 1, false)
local argvIndexAddr = LLVM.LLVMBuildGEP(builder.llvm.builder,
    LLVM.LLVMGetParam(mainFn, 1), argvIndex, 1, "ArgvIndex")
local argvValue = LLVM.LLVMBuildLoad(builder.llvm.builder, argvIndexAddr, "ArgvDeref")

Again: Make an LLVMValueRef for the index, use LLVMGetParam() to get an LLVMValueRef for argv, pass these into getelementptr, pass the result of that into load. One thing that might be a little nonobvious here is why the index is passed in as an array— LLVMBuildGEP() takes a list of indexes because it assumes you might want to nest aggregates, and it wants to let you use a single getelementptr instruction to look several nesting levels deep. So for example imagine you have a pointer to a two dimensional array of (x,y) points— an array containing arrays of struct {int x; int y; }s. If you wanted to get the address of array[3][4].y you could give getelementptr the index list [3, 4, 1]. I’m not interested in nesting here, so my index array has only one element.

argvValue now represents the string containing the first argument. I want it as a number, so I bring in the C atoi function and pass argvValue to that. This is no different from what I previously did to link in printf:

-- Atoi prototype
local atoiParams = ffi.new("LLVMTypeRef [1]")
atoiParams[0] = charPtrType
local atoiType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), atoiParams, 1, true)
local atoiFn = LLVM.LLVMAddFunction(builder.llvm.module, "atoi", atoiType)

-- Atoi call site
local atoiArgs = ffi.new("LLVMValueRef [1]")
atoiArgs[0] = argvValue
local atoiResult = LLVM.LLVMBuildCall(builder.llvm.builder, atoiFn, atoiArgs, 1, "atoicall")

I can now give this result to our printf function:

printfArgs[1] = atoiResult

And I now have a program that takes a command line argument, converts it to an int, then prints it. Running this I get (code so far):

$ ./main 4
Number: 4
$ ./main 149134
Number: 149134
$ ./main
Segmentation fault

"Oops".

Branching

The program so far blindly reads the second element of argv, so it of course crashes if argv is only one element long. I need an error case.

In the last post, I mentioned that LLVM has a concept of "basic blocks", which are branchless stretches of code. I’ve so far been putting the entire program in a single basic block, Entry. Entry is now going to potentially branch into one of two new basic blocks, ArgcValid and ArgcInvalid.

local argcValid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcValid")
local argcInvalid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcInvalid")

Next I stick this up near the start of the program, immediately after positioning the builder in Entry:

local argcCmp = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, -- argc == 2
    LLVM.LLVMGetParam(mainFn, 0), LLVM.LLVMConstInt(int32Type, 2, false), "ArgcCompare")
LLVM.LLVMBuildCondBr(builder.llvm.builder, argcCmp, argcValid, argcInvalid)

The br (branch) instruction has a conditional form that takes a LLVMValueRef and two basic blocks. It is implied but not specifically stated by the documentation that the LLVMValueRef needs to be the result of an icmp (integer compare) or fcmp (floating point compare) instruction. I want to do an integer compare, so I icmp parameter 0 (argc) with a constant 2 and branch to the two new basic blocks as appropriate.

I then stick this immediately before the rest of the program code:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcValid)

Everything I’ve written so far— the argv index, the atoi call, the printf— happens the same as before, just now that code is written into the ArgcValid block because that’s where the builder is positioned. Then at the end of the existing code, I stick in a new section:

-- argc != 2 case
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcInvalid)
    -- Print error
    local errorString = "Please enter exactly one number as argument.\n"
    local printfArgs = ffi.new("LLVMValueRef [1]")
    printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, errorString, "printferror")
    LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printferrorcall")

    -- Return failure
    LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 1, false))

The ArgcInvalid block does nothing but print an error message and return 1. I try this out and get (code so far):

$ ./main 4
Number: 4
$ ./main
Please enter exactly one number as argument.
$ ./main 3 4 5
Please enter exactly one number as argument.

This is the second time today something has worked on the first try. This is starting to get a little spooky.

Variables

My goal was to write a program that takes a number as input and factors it. So far what I have is a program that takes input. Time to factor.

I’m going to put the factoring in its own function. I define a prototype for it, taking an int as argument and returning void:

-- PrintFactors prototype
local printFactorsParams = ffi.new("LLVMTypeRef [1]")
printFactorsParams[0] = int32Type
local printFactorsType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), printFactorsParams, 1, false)
local printFactorsFn = LLVM.LLVMAddFunction(builder.llvm.module, "printFactors", printFactorsType)

Inside my ArgcValid block, I replace the printf with a call to the function:

-- PrintFactors call site
local printFactorsArgs = ffi.new("LLVMValueRef [1]")
printFactorsArgs[0] = atoiResult
LLVM.LLVMBuildCall(builder.llvm.builder, printFactorsFn, printFactorsArgs, 1, "")

-- Printf call site (newline)
local printfArgs = ffi.new("LLVMValueRef [1]")
printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "\n", "printfnewline")
LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printfnumbercall")

I do still have a printf there, but all it’s printing is "\n". As an amusing fact, since printFactors returns void, the name passed to LLVMBuildCall() MUST be "", exactly the value "" and no other, or the program crashes.

If I can implement the printFactors function, my program is done. Before I start, I plan out some pseudocode:

           value = param[0]           // Number being factored
           divisor = 2                // Candidate factor
           goto loop
loop:      if (divisor > value)       // Count divisor up, factor value down,
               goto done              // stop when the two pass each other.
           else
               goto modulo
modulo:    if (value % divisor == 0)
               goto divide
           else
               goto increment
divide:    printf("%d ", divisor)     // Factor found, print it
           value = value / divisor
           goto loop
increment: divisor = divisor + 1
           goto loop
done:      return

Actually, I say "pseudocode", but if you stuck semicolons on the ends of all the lines this would probably be valid C. Note in LLVM, every basic block has to end with a branch or return.

In this pseudocode, I repeatedly assign to the values value and divisor. This is a problem. LLVM doesn’t offer a way of modeling a "mutable" variable. LLVM offers load and store instructions to let you read and write to main memory, but the LLVMValues I’ve been working with— and you want to be working with LLVMValues, because they get to live in CPU registers— are based on something called the "SSA" model, where everything is assigned once and can never be changed unless you cheat using something called "phi nodes". This all sounds like a pain to learn and explain. So, I’m just not going to use it! The Kaleidoscope tutorial explains that LLVM ships with an optimization pass which promotes values read/written from memory to registers automatically, as long as that memory is allocated using an alloca in the entry block of a function. If I use this, I can avoid learning what "phi nodes" are at least one more day.

I start my printFactors implementation with those allocas:

local value = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Value")
local divisor = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Divisor")

LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMGetParam(printFactorsFn, 0), value)
LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 2, false), divisor)

Notice that value and divisor here are pointers. Writing values to them means writing to their memory locations.

I next define all the basic blocks I’m going to need in one go, and fall through into the loop:

local loop = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Loop")
local modulo = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Modulo")
local divide = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Divide")
local increment = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Increment")
local done = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Done")

LLVM.LLVMBuildBr(builder.llvm.builder, loop)

Here’s the basic block under the loop: label; all it does is calculate divisor > value and branch:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, loop)
    local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

    -- Test *divisor > *value and branch
    local compare = LLVM.LLVMBuildICmp(builder.llvm.builder,
        LLVM.LLVMIntSGT, divisorDeref, valueDeref, "DivisorCompare")
    LLVM.LLVMBuildCondBr(builder.llvm.builder, compare, done, modulo)

Notice I have to start by dereferencing the value and divisor pointers so I can actually use their values.

Here’s the modulo: block where I test whether the divisor divides evenly into the value:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, modulo)
    local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")
    
    -- *value % *divisor
    local modResult = LLVM.LLVMBuildSRem(builder.llvm.builder,
        valueDeref, divisorDeref, "ModResult")

    -- Test *value % *divisor == 0 and branch
    local dividesEvenly = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ,
        modResult, LLVM.LLVMConstInt(int32Type, 0, false), "DividesEvenly")
    LLVM.LLVMBuildCondBr(builder.llvm.builder, dividesEvenly, divide, increment)

Notice I start by dereferencing value and divisor again, even though from the perspective of the code I just did this and the previous loaded values are still valid. This is inefficient, but I don’t care, because the mem2reg optimization pass is going to erase all these loads and stores.

The divide: block is the most complicated, becuase it has to both print out the factor and divide it out from the value:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, divide)
    local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

    -- printf divisor
    local printfArgs = ffi.new("LLVMValueRef [2]")
    printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring")
    printfArgs[1] = divisorDeref
    LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

    -- *value = *value / *divisor
    local divideResult = LLVM.LLVMBuildExactSDiv(builder.llvm.builder,
        valueDeref, divisorDeref, "DivideResult")
    LLVM.LLVMBuildStore(builder.llvm.builder, divideResult, value)

    -- Restart loop
    LLVM.LLVMBuildBr(builder.llvm.builder, loop)

Here I use a store instruction to mutate an already-stored value for the first time. Also a tiny, interesting thing: The sdiv (signed divide) instruction has an "exact" form, where the backend gets to assume the divisor divides evenly. I know the divisor divides evenly because I just did a modulo check, so I use the exact form here. Does this actually make a difference to program performance? I have no idea!

If you followed all that, the increment: block is just more of the same:

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, increment)
    local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

    -- *divisor = *divisor + 1
    local addResult = LLVM.LLVMBuildAdd(builder.llvm.builder, divisorDeref,
        LLVM.LLVMConstInt(int32Type, 1, false), "AddResult")
    LLVM.LLVMBuildStore(builder.llvm.builder, addResult, divisor)

    -- Restart loop
    LLVM.LLVMBuildBr(builder.llvm.builder, loop)

And the done: block is basically nothing, noteworthy only because it demonstrates a conditional br can branch to a basic block but not to a return.

LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, done)
    LLVM.LLVMBuildRetVoid(builder.llvm.builder)

This is all some pretty dense code, so once I get it all typed out I do need a couple rounds of debugging. The first version of this code I attempted had the tragic line:

local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisorDeref, "DivisorDeref")

IE, instead of dereferencing divisor I’m dereferencing another basic block’s divisorDeref. I had to copy/paste that line like five times and apparently one time I messed it up. This results in a type error, but type errors in LLVM come in the form of halt-the-process assertion errors, so I only found this error by commenting out every single line of the function and then commenting them back in one by one until the error appeared. Once I fixed this I suddenly had a segmentation fault, because while commenting stuff in line by line I accidentally left a return void in the middle of a basic block and apparently it didn’t like that for some reason. Debugging the segmentation fault was a bit obnoxious, and required stepping through the x86-64 disassembly and matching up each instruction to a line of my Lua until I could find out where the error was happening. Both of these debug techniques were feasible because the function was short and the program is short, but I’m really going to have to find better solutions for debugging if I write longer programs with this.

Anyway, once I get those out of the way, I run my program and:

$ ./main 4
2 2 
$ ./main 13
13 
$ ./main 65536
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
$ ./main 2310
2 3 5 7 11 
$ ./main 2311
2311 

It works! It does have some slightly odd behavior on edge cases:

$ ./main 1

$ ./main 0

$ ./main -324

The loop in printFactors halts when the value is less than the divisor, and the divisor starts off at 2, so for inputs below 2 it just prints nothing. For 0 and 1 this is actually pretty good behavior: 0 and 1 have no prime factors, so printing nothing is correct (this is actually what the standard linux factor program does on these numbers). For negative numbers… uh… I don’t know what I expect it to do. Print -1 and then all the prime factors? Print an error? It would be pretty easy to code in a special behavior for negative numbers, but I don’t know what the behavior should be so I just leave it.

With that out of the way:

Optimization

There’s one last thing that’s a bit odd. My program is set to dump the program’s assembly to the console before it generates the exe, so that if anything is horribly wrong it will be obvious immediately. Well, something is wrong:

Loop:     ; preds = %Increment, %Divide, %PrintFactorsEntry
  %ValueDeref = load i32, i32* %Value
  %DivisorDeref = load i32, i32* %Divisor
  %DivisorCompare = icmp sgt i32 %DivisorDeref, %ValueDeref
  br i1 %DivisorCompare, label %Done, label %Modulo

Or, well, that’s correct, that’s what I entered, but wasn’t that mem2reg thing supposed to be erasing all the loads? I take a closer look at the Kaleidoscope sample code and discover it isn’t enough to just create the optimization pass manager like I did in the last post, I have to explicitly tell it to run. I give my builder object a new method:

function Builder:optimize(fn)
    LLVM.LLVMRunFunctionPassManager(self.llvm.passManager, fn)
end

…and then drop at the very end of my program, right before emitting the executable:

builder:optimize(mainFn)
builder:optimize(printFactorsFn)

What mimicking Kaleidoscope gets me is a "function pass manager", and you apparently have to invoke this individually on every single function in your program. There’s also a plain "pass manager" which operates on a whole module, and the header file implies it does different kinds of optimizations. I guess that’s another thing I’ll want to follow up on later. With the function pass manager being invoked, the loop: assembly looks very different:

Loop:                                             ; preds = %Increment, %Divide, %PrintFactorsEntry
  %Divisor.0 = phi i32 [ 2, %PrintFactorsEntry ], [ %Divisor.0, %Divide ], [ %AddResult, %Increment ]
  %Value.0 = phi i32 [ %0, %PrintFactorsEntry ], [ %DivideResult, %Divide ], [ %Value.0, %Increment ]
  %DivisorCompare = icmp sgt i32 %Divisor.0, %Value.0
  br i1 %DivisorCompare, label %Done, label %Modulo

I have no idea what this means. And I don’t have to! 😀

Done

My factoring program (code so far) is done! Excluding the setup with the builder object, the code is— imagine me taking a DEEP breath right now—

local int32Type = LLVM.LLVMInt32Type()
local charPtrType = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0)  -- address space 0 (arbitrary)

-- Main function prototype
local mainParams = ffi.new("LLVMTypeRef [2]")
mainParams[0] = int32Type
mainParams[1] = LLVM.LLVMPointerType(charPtrType, 0)
local mainType = LLVM.LLVMFunctionType(int32Type, mainParams, 2, false)
local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)

-- Printf prototype
local printfParams = ffi.new("LLVMTypeRef [1]")
printfParams[0] = charPtrType
local printfType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), printfParams, 1, true) -- vararg
local printfFn = LLVM.LLVMAddFunction(builder.llvm.module, "printf", printfType)

-- PrintFactors prototype
local printFactorsParams = ffi.new("LLVMTypeRef [1]")
printFactorsParams[0] = int32Type
local printFactorsType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), printFactorsParams, 1, false)
local printFactorsFn = LLVM.LLVMAddFunction(builder.llvm.module, "printFactors", printFactorsType)

-- Main function implementation
local mainEntry = LLVM.LLVMAppendBasicBlock(mainFn, "Entry")
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, mainEntry)

    -- Branch on argc
    local argcValid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcValid")
    local argcInvalid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcInvalid")
    local argcCmp = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, -- argc == 2
        LLVM.LLVMGetParam(mainFn, 0), LLVM.LLVMConstInt(int32Type, 2, false), "ArgcCompare")
    LLVM.LLVMBuildCondBr(builder.llvm.builder, argcCmp, argcValid, argcInvalid)

    -- argc == 2 case
    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcValid)
        -- Read argv[1]
        local argvIndex = ffi.new("LLVMValueRef [1]")
        argvIndex[0] = LLVM.LLVMConstInt(int32Type, 1, false)
        local argvIndexAddr = LLVM.LLVMBuildGEP(builder.llvm.builder,
            LLVM.LLVMGetParam(mainFn, 1), argvIndex, 1, "ArgvIndex")
        local argvValue = LLVM.LLVMBuildLoad(builder.llvm.builder, argvIndexAddr, "ArgvDeref")

        -- Atoi prototype
        local atoiParams = ffi.new("LLVMTypeRef [1]")
        atoiParams[0] = charPtrType
        local atoiType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), atoiParams, 1, true)
        local atoiFn = LLVM.LLVMAddFunction(builder.llvm.module, "atoi", atoiType)

        -- Atoi call site
        local atoiArgs = ffi.new("LLVMValueRef [1]")
        atoiArgs[0] = argvValue
        local atoiResult = LLVM.LLVMBuildCall(builder.llvm.builder, atoiFn, atoiArgs, 1, "atoicall")

        -- PrintFactors call site
        local printFactorsArgs = ffi.new("LLVMValueRef [1]")
        printFactorsArgs[0] = atoiResult
        LLVM.LLVMBuildCall(builder.llvm.builder, printFactorsFn, printFactorsArgs, 1, "")

        -- Printf call site (newline)
        local printfArgs = ffi.new("LLVMValueRef [1]")
        printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "\n", "printfnewline")
        LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printfnumbercall")

        -- Return success
        LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 0, false))

    -- argc != 2 case
    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcInvalid)
        -- Print error
        local errorString = "Please enter exactly one number as argument.\n"
        local printfArgs = ffi.new("LLVMValueRef [1]")
        printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, errorString, "printferror")
        LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printferrorcall")

        -- Return failure
        LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 1, false))

-- PrintFactors implementation
local printFactorsEntry = LLVM.LLVMAppendBasicBlock(printFactorsFn, "PrintFactorsEntry")
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, printFactorsEntry)
    -- Variables
    local value = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Value")
    local divisor = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Divisor")

    LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMGetParam(printFactorsFn, 0), value)
    LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 2, false), divisor)

    local loop = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Loop")
    local modulo = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Modulo")
    local divide = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Divide")
    local increment = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Increment")
    local done = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Done")

    LLVM.LLVMBuildBr(builder.llvm.builder, loop)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, loop)
        local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

        -- Test *divisor > *value and branch
        local compare = LLVM.LLVMBuildICmp(builder.llvm.builder,
            LLVM.LLVMIntSGT, divisorDeref, valueDeref, "DivisorCompare")
        LLVM.LLVMBuildCondBr(builder.llvm.builder, compare, done, modulo)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, modulo)
        local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")
        
        -- *value % *divisor
        local modResult = LLVM.LLVMBuildSRem(builder.llvm.builder,
            valueDeref, divisorDeref, "ModResult")

        -- Test *value % *divisor == 0 and branch
        local dividesEvenly = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ,
            modResult, LLVM.LLVMConstInt(int32Type, 0, false), "DividesEvenly")
        LLVM.LLVMBuildCondBr(builder.llvm.builder, dividesEvenly, divide, increment)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, divide)
        local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref")
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

        -- printf divisor
        local printfArgs = ffi.new("LLVMValueRef [2]")
        printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring")
        printfArgs[1] = divisorDeref
        LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");

        -- *value = *value / *divisor
        local divideResult = LLVM.LLVMBuildExactSDiv(builder.llvm.builder,
            valueDeref, divisorDeref, "DivideResult")
        LLVM.LLVMBuildStore(builder.llvm.builder, divideResult, value)

        -- Restart loop
        LLVM.LLVMBuildBr(builder.llvm.builder, loop)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, increment)
        local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref")

        -- *divisor = *divisor + 1
        local addResult = LLVM.LLVMBuildAdd(builder.llvm.builder, divisorDeref,
            LLVM.LLVMConstInt(int32Type, 1, false), "AddResult")
        LLVM.LLVMBuildStore(builder.llvm.builder, addResult, divisor)

        -- Restart loop
        LLVM.LLVMBuildBr(builder.llvm.builder, loop)

    LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, done)
        LLVM.LLVMBuildRetVoid(builder.llvm.builder)

So, what have I got here?

Well, for starters, I’ve achieved my goal— I’ve demonstrated I can write a useful program just by invoking the LLVM backend functions, and I have definitely entertained myself doing so. Looking upon what I have wrought I am amused as heck.

But this program is a behemoth! That is 145 lines of code for a program that you could have fully expressed in Perl in 62 characters:

($v)=@ARGV;$d=2;$v%$d?$d++:(print"$d\n")&&($v/=$d)until$d>$v

The generate.lua file is actually longer (9568 bytes) than the OS X executable it generates (8888 bytes). And the code is not just long, it is hard to read. It is dense, and repetitive, and consists mostly of boilerplate. If you look at the LLVM assembly generated by this code, it is both shorter and easier to understand the Lua code itself.

There’s a reason for this. I made a decision to write my not-compiler in Lua. But I’m not using Lua. I’m not taking advantage of Lua’s dynamism— well, never mind that, I’m not even treating Lua as a Turing-complete programming language. Every thing I do here, I do in the most literal and naive way. This was an intentional decision! The goal of this particular program was to be a proof-of-concept and help me learn to use the LLVM-C functions. But, I think I’m done with that goal now.

In the next post, I’m going to start designing abstractions. Okay, constructing LLVM IR by individually calling an LLVM-C function for each line of assembly results in something which is not easy to read or write. What if I designed a Lua API specifically to make constructing programs in code pleasant? What would that look like?

— — — — —

POSTSCRIPT: SOME THINGS I WOULD APPRECIATE ANSWERS ON IF ANYONE READING THIS KNOWS

  • The documentation refers to Aggregates and Vectors as derived types. Is a Pointer a "derived type"?
  • With Vectors, is a Vector required to be the same size as an SIMD register on the target platform? What does LLVM do if it isn’t (for example, an <8 x float> vector on x86-64)?
  • This is a paragraph I cut from this blog post because it broke up the flow and also I’m not sure if it is accurate. Is the following accurate?
    • The documentation calls some LLVM-C functions "instructions". Fundamentally, what an "instruction" does is assign a value to an SSA name (except for non-value-returning functions like br). If I chain together function calls, like passing the output of LLVMBuildLoad to LLVMBuildExactSDiv, this looks to me like a tree of function applications but LLVM internally is rewriting it into a sequential series of SSA name assignments. What I think of as an LLVMValueRef is, fundamentally, a single SSA name.

Any code snippets in this post, or in the BitBucket revisions linked in this post, are available under the Creative Commons Zero license.

One Response to “No Compiler, part 2: Turing Completeness”

  1. Mason Bially Says:

    Hey I’ve really appreciated this series of posts, you hit a lot of the same issues I did using the LLVM C API.

    `CompositeType` seems to be a common superclass of pointers, arrays and struct type classes in the LLVM C++ code. So I would bet yes.

    If you run the right optimization passes and enable the right backend configuration options it’s my understanding the answer is that LLVM will attempt to do the right thing. I get this from the loop vectorizer documentation and my experience with clang, I haven;t gotten there yet (soon!).

    That last paragraph seems correct to me. It’s my understanding that each `LLVMValueRef` is (generally) an SSA register with a few exceptions (`LLVMGetParam` comes to mind), and that understanding of instructions seems generally correct (although `store`, `call` (optionally; which is super weird, you give it an empty – not null – name to get it as a return-less subroutine call instruction) and `ret` don’t assign their results to a new SSA register but are still “instructions” from what I can tell of the language spec).

Leave a Reply