No Compiler, part 2: Turing Completeness
Last week I made a blog post where I talked about making computer programs without a programming language, by writing Lua scripts that bind to the LLVM compiler backend library and call the instruction builder functions manually. I’m gonna keep playing with this, but before I keep going, people on Twitter pointed out some errors in the last post:
- In the last post, I went to weird bother to construct the path to libLLVM, claiming that LuaJIT can’t figure out things like the standard operating system file extensions while loading libraries. @dotstdy points out that actually LuaJIT can figure out what to do just fine from
ffi.load('LLVM')
, as long as the library/DLL is in the normal OS search path. (I had only ever used LuaJIT for games, where you want to load specific binaries you shipped instead of anything the OS knows about.) - @whitequark points out several issues, notably:
- In the last post I claimed
lld
, the LLVM project’s linker, is unique in being a linker that knows how to target multiple platforms. Actually GNUld
can do this also (although Darwin appears to be a notable hole in its platform support). -
Some context: What LLVM frontends create is "LLVM IR", or Intermediate Representation. The Build functions in the LLVM API, which I’m calling, construct an in-memory version of this IR. The IR also has a textual representation, known as "LLVM assembly", which is dumped by the debug functions and consumed by
llc
. There’s also bitcode, which is a binary representation of the IR designed to be forward-compatible with future versions of LLVM. The true IR, however, is the in-memory representation.In the last post, I did not use this terminology correctly. I was thinking of the phrase "IR" as referring to the textual version (the LLVM assembly language) and so I repeatedly used the phrase "abstract syntax tree" to refer to the in-memory IR. This wasn’t a terminology I saw anywhere but I assumed it to at least make sense because "abstract syntax tree" is a sufficiently vague term. Unfortunately it turns out it’s an incorrect term in this case because in-memory LLVM IR has cycles (phi nodes) and is not a tree.
- In the last post I claimed
With that out of the way…
Parameters
In the last post I generated an executable that just ran printf()
. Not really much of a "program". At that point I might as well have just been starting to write a compiler frontend— several LLVM frontend tutorials do start by doing exactly those steps.
If I’m going to be writing actual "programs" I need to learn to work with some more complicated things— at minimum variables, and loops/branches, and I need to be able to make and call functions. My goal for this post is going to be to write a small program that needs all these things— specifically, I’m going to write a program that factors numbers.
To get started, here was the code from the last post that emits the printf()
call:
local printString = "LET'S MAKE A JOURNEY TO THE CAVE OF MONSTERS!\n" local printfArgs = ffi.new("LLVMValueRef [1]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring") LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "tmpcall");
Let’s change this to add an integer argument to the printf()
:
local printString = "Number: %d\n" local printNumber = 31337 local printfArgs = ffi.new("LLVMValueRef [2]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, printString, "tmpstring") printfArgs[1] = LLVM.LLVMConstInt(LLVM.LLVMInt32Type(), printNumber, false) -- no sign extend LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall");
printf()
is a vararg function, so I don’t have to adjust its type definition to add the second argument. Running this prints (code so far):
Number: 31337
Let’s make the printed number an variable input by the user instead of a constant. I can get input from the command line if I adjust the prototype for main()
:
local int32Type = LLVM.LLVMInt32Type() local charPtrType = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0) -- address space 0 (arbitrary) -- Main function prototype local mainParams = ffi.new("LLVMTypeRef [2]") mainParams[0] = int32Type mainParams[1] = LLVM.LLVMPointerType(charPtrType, 0) local mainType = LLVM.LLVMFunctionType(int32Type, mainParams, 2, false) local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType)
All seven lines of that was just to express int main(int argc, char *argv);
. If I make this change I also ought to change my final return statement to return 0 instead of void:
LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 0, false))
In theory, my main function now has parameters, which the header file suggests I can extract with LLVMGetParam()
. I test this by changing one line of the printf invocation from earlier:
printfArgs[1] = LLVM.LLVMGetParam(mainFn, 0)
Running this I get (code so far):
$ ./main Number: 1 $ ./main 0 Number: 2 $ ./main 1 2 3 Number: 4 $ ./main a a a a a a a a Number: 9
Wow, something actually worked on the first try! By printing the first parameter to mainFn
I’m printing "argc", the number of command-line arguments. (The name of the program counts as an argument, which is why running without arguments prints 1.)
Aggregates
Printing the number of arguments is not actually what I want. What I actually want to do right now is print the first argument after the name. This means I somehow need to be able to index a value from the "argv" array. It turns out there is more than one way to do this. I look through the LLVM assembly reference, and here’s what I learn.
Before I go on, I should make something clear if it wasn’t already. Essentially all of the LLVM functions I’m calling, both the arguments and the return values are of one of two C types: Either LLVMTypeRef
or LLVMValueRef
. Each of these C types corresponds to a whole hierarchy of C++ types, which I assume users of the C++ interface might specifically care about, but I don’t (because the C interface collapses down to LLVMTypeRef
/LLVMValueRef
and then Lua’s dynamic typing hides even those coarse types). Each value object pointed to by a LLVMValueRef
has a type object (an LLVMTypeRef
) associated with it. Functions that consume LLVMValues usually have rules about which LLVMTypes the provided values can have— but in most cases LLVMType violations are not noticed by the library until much, much later at which point you get an unrecoverable assert failure, so you have to be really certain you’re following the LLVMType rules before you call functions.
With that out of the way, here’s what the assembly documentation tells me. There are three general categories of data type (categories an LLVMTypeRef
can fall into) we care about when thinking about arrays:
- Pointers: This means the associated value is literally a memory address, pointing to data of some known type.
- Aggregates: This refers to a block of data containing a series of values. Aggregates come in two types, Arrays and Structs; values in an Array are all of the same type, a Struct has an ordered series of known types for its members. In both cases the aggregate has a known length, and you look up values in the aggregate by index ("the fourth value in this array", or "the second field of this struct"). However arrays have the quirk that by default you’re allowed to look up values past the end of the array; in this case you just get… whatever was in memory after the end of the array. The docs in one place actually advocate creating a 0-length array and indexing past the end of it in order to represent a C-style unknown-length array; I guess in this case the 0-length array would functionally be the same as a pointer, except that LLVM exposes different operations for working with pointers and Arrays and maybe you’d prefer to be working with the Array toolset for some reason.
- Vectors: These look a lot like an Array, but refer to a situation where the members of the collection are not independent but rather are expected to be packed together in a SSE-style SIMD register. I will not be using Vectors in this blog post; I mention them here only because the docs have the operations for Arrays and Vectors all mixed in together, and it’s important to be aware they’re not what I’ll be using.
All of the concepts mentioned above— pointers, Arrays, Structs and Vectors— are parameterized by type. You can’t just have "a Pointer" or "a Vector". You’d have a pointer to an int32, or a vector of int8s with 8 members.
There are four noteworthy operations we can do on the data types in the above categories:
load
: Given aLLVMValueRef
representing a pointer, returns theLLVMValueRef
representing the value it points to.getelementptr
: This one is quirky enough the LLVM people wrote a whole documentation page explaining what it is and isn’t, but it’s not actually that complicated. The idea here is that you have a pointer to an Aggregate, and you want to get a pointer to some specific data item within the Aggregate. It doesn’t actually get you the data item— no memory is dereferenced here, the resultingLLVMValueRef
does represent a pointer.getelementptr
is just doing math with the information encoded in a LLVMType to figure out where in memory a specific value is located.extractvalue
: Given anLLVMValueRef
representing an Aggregate and anLLVMValueRef
representing an index, returns theLLVMValueRef
representing the indexed value (the actual value) within the array.extractelement
: Same asextractvalue
, but works on Vectors.
So! Armed with all of this information, it seems clear that there are two ways I could go about my original goal of indexing one item out of argv
:
- I could convert (cast?) the
argv
pointer to an array of length zero and useextractvalue
. - I could use
getelementptr
on the pointer to look up the address corresponding to my index, thenload
from that address.
So I guess I just took fourteen paragraphs to answer the question "how do I index an item from an array". But at least it was straightforward to look up! It’s so great to have documentation. By the way, did you know it took me two full weeks to figure out which LLVM-C function creates a getelementptr
instruction? I eventually worked out by accident that the C builder function calls it a "GEP" even though nothing else in the LLVM project uses this terminology. Documentation is so great.
Anyway, I go with the second option (the getelementptr
route) and it turns out to be pretty straightforward:
-- Read argv[1] local argvIndex = ffi.new("LLVMValueRef [1]") argvIndex[0] = LLVM.LLVMConstInt(int32Type, 1, false) local argvIndexAddr = LLVM.LLVMBuildGEP(builder.llvm.builder, LLVM.LLVMGetParam(mainFn, 1), argvIndex, 1, "ArgvIndex") local argvValue = LLVM.LLVMBuildLoad(builder.llvm.builder, argvIndexAddr, "ArgvDeref")
Again: Make an LLVMValueRef
for the index, use LLVMGetParam()
to get an LLVMValueRef
for argv, pass these into getelementptr
, pass the result of that into load
. One thing that might be a little nonobvious here is why the index is passed in as an array— LLVMBuildGEP()
takes a list of indexes because it assumes you might want to nest aggregates, and it wants to let you use a single getelementptr
instruction to look several nesting levels deep. So for example imagine you have a pointer to a two dimensional array of (x,y) points— an array containing arrays of struct {int x; int y; }
s. If you wanted to get the address of array[3][4].y
you could give getelementptr
the index list [3, 4, 1]. I’m not interested in nesting here, so my index array has only one element.
argvValue
now represents the string containing the first argument. I want it as a number, so I bring in the C atoi
function and pass argvValue
to that. This is no different from what I previously did to link in printf
:
-- Atoi prototype local atoiParams = ffi.new("LLVMTypeRef [1]") atoiParams[0] = charPtrType local atoiType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), atoiParams, 1, true) local atoiFn = LLVM.LLVMAddFunction(builder.llvm.module, "atoi", atoiType) -- Atoi call site local atoiArgs = ffi.new("LLVMValueRef [1]") atoiArgs[0] = argvValue local atoiResult = LLVM.LLVMBuildCall(builder.llvm.builder, atoiFn, atoiArgs, 1, "atoicall")
I can now give this result to our printf
function:
printfArgs[1] = atoiResult
And I now have a program that takes a command line argument, converts it to an int, then prints it. Running this I get (code so far):
$ ./main 4 Number: 4 $ ./main 149134 Number: 149134 $ ./main Segmentation fault
"Oops".
Branching
The program so far blindly reads the second element of argv
, so it of course crashes if argv is only one element long. I need an error case.
In the last post, I mentioned that LLVM has a concept of "basic blocks", which are branchless stretches of code. I’ve so far been putting the entire program in a single basic block, Entry
. Entry
is now going to potentially branch into one of two new basic blocks, ArgcValid
and ArgcInvalid
.
local argcValid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcValid") local argcInvalid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcInvalid")
Next I stick this up near the start of the program, immediately after positioning the builder in Entry
:
local argcCmp = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, -- argc == 2 LLVM.LLVMGetParam(mainFn, 0), LLVM.LLVMConstInt(int32Type, 2, false), "ArgcCompare") LLVM.LLVMBuildCondBr(builder.llvm.builder, argcCmp, argcValid, argcInvalid)
The br
(branch) instruction has a conditional form that takes a LLVMValueRef
and two basic blocks. It is implied but not specifically stated by the documentation that the LLVMValueRef
needs to be the result of an icmp
(integer compare) or fcmp
(floating point compare) instruction. I want to do an integer compare, so I icmp
parameter 0 (argc
) with a constant 2 and branch to the two new basic blocks as appropriate.
I then stick this immediately before the rest of the program code:
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcValid)
Everything I’ve written so far— the argv
index, the atoi
call, the printf
— happens the same as before, just now that code is written into the ArgcValid
block because that’s where the builder is positioned. Then at the end of the existing code, I stick in a new section:
-- argc != 2 case LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcInvalid) -- Print error local errorString = "Please enter exactly one number as argument.\n" local printfArgs = ffi.new("LLVMValueRef [1]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, errorString, "printferror") LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printferrorcall") -- Return failure LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 1, false))
The ArgcInvalid
block does nothing but print an error message and return 1. I try this out and get (code so far):
$ ./main 4 Number: 4 $ ./main Please enter exactly one number as argument. $ ./main 3 4 5 Please enter exactly one number as argument.
This is the second time today something has worked on the first try. This is starting to get a little spooky.
Variables
My goal was to write a program that takes a number as input and factors it. So far what I have is a program that takes input. Time to factor.
I’m going to put the factoring in its own function. I define a prototype for it, taking an int as argument and returning void:
-- PrintFactors prototype local printFactorsParams = ffi.new("LLVMTypeRef [1]") printFactorsParams[0] = int32Type local printFactorsType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), printFactorsParams, 1, false) local printFactorsFn = LLVM.LLVMAddFunction(builder.llvm.module, "printFactors", printFactorsType)
Inside my ArgcValid
block, I replace the printf
with a call to the function:
-- PrintFactors call site local printFactorsArgs = ffi.new("LLVMValueRef [1]") printFactorsArgs[0] = atoiResult LLVM.LLVMBuildCall(builder.llvm.builder, printFactorsFn, printFactorsArgs, 1, "") -- Printf call site (newline) local printfArgs = ffi.new("LLVMValueRef [1]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "\n", "printfnewline") LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printfnumbercall")
I do still have a printf
there, but all it’s printing is "\n"
. As an amusing fact, since printFactors
returns void, the name passed to LLVMBuildCall()
MUST be ""
, exactly the value ""
and no other, or the program crashes.
If I can implement the printFactors
function, my program is done. Before I start, I plan out some pseudocode:
value = param[0] // Number being factored divisor = 2 // Candidate factor goto loop loop: if (divisor > value) // Count divisor up, factor value down, goto done // stop when the two pass each other. else goto modulo modulo: if (value % divisor == 0) goto divide else goto increment divide: printf("%d ", divisor) // Factor found, print it value = value / divisor goto loop increment: divisor = divisor + 1 goto loop done: return
Actually, I say "pseudocode", but if you stuck semicolons on the ends of all the lines this would probably be valid C. Note in LLVM, every basic block has to end with a branch or return.
In this pseudocode, I repeatedly assign to the values value
and divisor
. This is a problem. LLVM doesn’t offer a way of modeling a "mutable" variable. LLVM offers load
and store
instructions to let you read and write to main memory, but the LLVMValues I’ve been working with— and you want to be working with LLVMValues, because they get to live in CPU registers— are based on something called the "SSA" model, where everything is assigned once and can never be changed unless you cheat using something called "phi nodes". This all sounds like a pain to learn and explain. So, I’m just not going to use it! The Kaleidoscope tutorial explains that LLVM ships with an optimization pass which promotes values read/written from memory to registers automatically, as long as that memory is allocated using an alloca in the entry block of a function. If I use this, I can avoid learning what "phi nodes" are at least one more day.
I start my printFactors
implementation with those allocas:
local value = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Value") local divisor = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Divisor") LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMGetParam(printFactorsFn, 0), value) LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 2, false), divisor)
Notice that value
and divisor
here are pointers. Writing values to them means writing to their memory locations.
I next define all the basic blocks I’m going to need in one go, and fall through into the loop:
local loop = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Loop") local modulo = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Modulo") local divide = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Divide") local increment = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Increment") local done = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Done") LLVM.LLVMBuildBr(builder.llvm.builder, loop)
Here’s the basic block under the loop:
label; all it does is calculate divisor > value and branch:
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, loop) local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref") local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- Test *divisor > *value and branch local compare = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntSGT, divisorDeref, valueDeref, "DivisorCompare") LLVM.LLVMBuildCondBr(builder.llvm.builder, compare, done, modulo)
Notice I have to start by dereferencing the value
and divisor
pointers so I can actually use their values.
Here’s the modulo:
block where I test whether the divisor divides evenly into the value:
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, modulo) local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref") local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- *value % *divisor local modResult = LLVM.LLVMBuildSRem(builder.llvm.builder, valueDeref, divisorDeref, "ModResult") -- Test *value % *divisor == 0 and branch local dividesEvenly = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, modResult, LLVM.LLVMConstInt(int32Type, 0, false), "DividesEvenly") LLVM.LLVMBuildCondBr(builder.llvm.builder, dividesEvenly, divide, increment)
Notice I start by dereferencing value
and divisor
again, even though from the perspective of the code I just did this and the previous loaded values are still valid. This is inefficient, but I don’t care, because the mem2reg optimization pass is going to erase all these loads and stores.
The divide:
block is the most complicated, becuase it has to both print out the factor and divide it out from the value:
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, divide) local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref") local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- printf divisor local printfArgs = ffi.new("LLVMValueRef [2]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring") printfArgs[1] = divisorDeref LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall"); -- *value = *value / *divisor local divideResult = LLVM.LLVMBuildExactSDiv(builder.llvm.builder, valueDeref, divisorDeref, "DivideResult") LLVM.LLVMBuildStore(builder.llvm.builder, divideResult, value) -- Restart loop LLVM.LLVMBuildBr(builder.llvm.builder, loop)
Here I use a store
instruction to mutate an already-stored value for the first time. Also a tiny, interesting thing: The sdiv
(signed divide) instruction has an "exact" form, where the backend gets to assume the divisor divides evenly. I know the divisor divides evenly because I just did a modulo check, so I use the exact form here. Does this actually make a difference to program performance? I have no idea!
If you followed all that, the increment:
block is just more of the same:
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, increment) local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- *divisor = *divisor + 1 local addResult = LLVM.LLVMBuildAdd(builder.llvm.builder, divisorDeref, LLVM.LLVMConstInt(int32Type, 1, false), "AddResult") LLVM.LLVMBuildStore(builder.llvm.builder, addResult, divisor) -- Restart loop LLVM.LLVMBuildBr(builder.llvm.builder, loop)
And the done:
block is basically nothing, noteworthy only because it demonstrates a conditional br
can branch to a basic block but not to a return.
LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, done) LLVM.LLVMBuildRetVoid(builder.llvm.builder)
This is all some pretty dense code, so once I get it all typed out I do need a couple rounds of debugging. The first version of this code I attempted had the tragic line:
local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisorDeref, "DivisorDeref")
IE, instead of dereferencing divisor I’m dereferencing another basic block’s divisorDeref. I had to copy/paste that line like five times and apparently one time I messed it up. This results in a type error, but type errors in LLVM come in the form of halt-the-process assertion errors, so I only found this error by commenting out every single line of the function and then commenting them back in one by one until the error appeared. Once I fixed this I suddenly had a segmentation fault, because while commenting stuff in line by line I accidentally left a return void in the middle of a basic block and apparently it didn’t like that for some reason. Debugging the segmentation fault was a bit obnoxious, and required stepping through the x86-64 disassembly and matching up each instruction to a line of my Lua until I could find out where the error was happening. Both of these debug techniques were feasible because the function was short and the program is short, but I’m really going to have to find better solutions for debugging if I write longer programs with this.
Anyway, once I get those out of the way, I run my program and:
$ ./main 4 2 2 $ ./main 13 13 $ ./main 65536 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 $ ./main 2310 2 3 5 7 11 $ ./main 2311 2311
It works! It does have some slightly odd behavior on edge cases:
$ ./main 1 $ ./main 0 $ ./main -324
The loop in printFactors
halts when the value is less than the divisor, and the divisor starts off at 2, so for inputs below 2 it just prints nothing. For 0 and 1 this is actually pretty good behavior: 0 and 1 have no prime factors, so printing nothing is correct (this is actually what the standard linux factor
program does on these numbers). For negative numbers… uh… I don’t know what I expect it to do. Print -1 and then all the prime factors? Print an error? It would be pretty easy to code in a special behavior for negative numbers, but I don’t know what the behavior should be so I just leave it.
With that out of the way:
Optimization
There’s one last thing that’s a bit odd. My program is set to dump the program’s assembly to the console before it generates the exe, so that if anything is horribly wrong it will be obvious immediately. Well, something is wrong:
Loop: ; preds = %Increment, %Divide, %PrintFactorsEntry %ValueDeref = load i32, i32* %Value %DivisorDeref = load i32, i32* %Divisor %DivisorCompare = icmp sgt i32 %DivisorDeref, %ValueDeref br i1 %DivisorCompare, label %Done, label %Modulo
Or, well, that’s correct, that’s what I entered, but wasn’t that mem2reg thing supposed to be erasing all the loads? I take a closer look at the Kaleidoscope sample code and discover it isn’t enough to just create the optimization pass manager like I did in the last post, I have to explicitly tell it to run. I give my builder object a new method:
function Builder:optimize(fn) LLVM.LLVMRunFunctionPassManager(self.llvm.passManager, fn) end
…and then drop at the very end of my program, right before emitting the executable:
builder:optimize(mainFn) builder:optimize(printFactorsFn)
What mimicking Kaleidoscope gets me is a "function pass manager", and you apparently have to invoke this individually on every single function in your program. There’s also a plain "pass manager" which operates on a whole module, and the header file implies it does different kinds of optimizations. I guess that’s another thing I’ll want to follow up on later. With the function pass manager being invoked, the loop:
assembly looks very different:
Loop: ; preds = %Increment, %Divide, %PrintFactorsEntry %Divisor.0 = phi i32 [ 2, %PrintFactorsEntry ], [ %Divisor.0, %Divide ], [ %AddResult, %Increment ] %Value.0 = phi i32 [ %0, %PrintFactorsEntry ], [ %DivideResult, %Divide ], [ %Value.0, %Increment ] %DivisorCompare = icmp sgt i32 %Divisor.0, %Value.0 br i1 %DivisorCompare, label %Done, label %Modulo
I have no idea what this means. And I don’t have to! 😀
Done
My factoring program (code so far) is done! Excluding the setup with the builder object, the code is— imagine me taking a DEEP breath right now—
local int32Type = LLVM.LLVMInt32Type() local charPtrType = LLVM.LLVMPointerType(LLVM.LLVMInt8Type(), 0) -- address space 0 (arbitrary) -- Main function prototype local mainParams = ffi.new("LLVMTypeRef [2]") mainParams[0] = int32Type mainParams[1] = LLVM.LLVMPointerType(charPtrType, 0) local mainType = LLVM.LLVMFunctionType(int32Type, mainParams, 2, false) local mainFn = LLVM.LLVMAddFunction(builder.llvm.module, "main", mainType) -- Printf prototype local printfParams = ffi.new("LLVMTypeRef [1]") printfParams[0] = charPtrType local printfType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), printfParams, 1, true) -- vararg local printfFn = LLVM.LLVMAddFunction(builder.llvm.module, "printf", printfType) -- PrintFactors prototype local printFactorsParams = ffi.new("LLVMTypeRef [1]") printFactorsParams[0] = int32Type local printFactorsType = LLVM.LLVMFunctionType(LLVM.LLVMVoidType(), printFactorsParams, 1, false) local printFactorsFn = LLVM.LLVMAddFunction(builder.llvm.module, "printFactors", printFactorsType) -- Main function implementation local mainEntry = LLVM.LLVMAppendBasicBlock(mainFn, "Entry") LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, mainEntry) -- Branch on argc local argcValid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcValid") local argcInvalid = LLVM.LLVMAppendBasicBlock(mainFn, "ArgcInvalid") local argcCmp = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, -- argc == 2 LLVM.LLVMGetParam(mainFn, 0), LLVM.LLVMConstInt(int32Type, 2, false), "ArgcCompare") LLVM.LLVMBuildCondBr(builder.llvm.builder, argcCmp, argcValid, argcInvalid) -- argc == 2 case LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcValid) -- Read argv[1] local argvIndex = ffi.new("LLVMValueRef [1]") argvIndex[0] = LLVM.LLVMConstInt(int32Type, 1, false) local argvIndexAddr = LLVM.LLVMBuildGEP(builder.llvm.builder, LLVM.LLVMGetParam(mainFn, 1), argvIndex, 1, "ArgvIndex") local argvValue = LLVM.LLVMBuildLoad(builder.llvm.builder, argvIndexAddr, "ArgvDeref") -- Atoi prototype local atoiParams = ffi.new("LLVMTypeRef [1]") atoiParams[0] = charPtrType local atoiType = LLVM.LLVMFunctionType(LLVM.LLVMInt32Type(), atoiParams, 1, true) local atoiFn = LLVM.LLVMAddFunction(builder.llvm.module, "atoi", atoiType) -- Atoi call site local atoiArgs = ffi.new("LLVMValueRef [1]") atoiArgs[0] = argvValue local atoiResult = LLVM.LLVMBuildCall(builder.llvm.builder, atoiFn, atoiArgs, 1, "atoicall") -- PrintFactors call site local printFactorsArgs = ffi.new("LLVMValueRef [1]") printFactorsArgs[0] = atoiResult LLVM.LLVMBuildCall(builder.llvm.builder, printFactorsFn, printFactorsArgs, 1, "") -- Printf call site (newline) local printfArgs = ffi.new("LLVMValueRef [1]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "\n", "printfnewline") LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printfnumbercall") -- Return success LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 0, false)) -- argc != 2 case LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, argcInvalid) -- Print error local errorString = "Please enter exactly one number as argument.\n" local printfArgs = ffi.new("LLVMValueRef [1]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, errorString, "printferror") LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 1, "printferrorcall") -- Return failure LLVM.LLVMBuildRet(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 1, false)) -- PrintFactors implementation local printFactorsEntry = LLVM.LLVMAppendBasicBlock(printFactorsFn, "PrintFactorsEntry") LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, printFactorsEntry) -- Variables local value = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Value") local divisor = LLVM.LLVMBuildAlloca(builder.llvm.builder, int32Type, "Divisor") LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMGetParam(printFactorsFn, 0), value) LLVM.LLVMBuildStore(builder.llvm.builder, LLVM.LLVMConstInt(int32Type, 2, false), divisor) local loop = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Loop") local modulo = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Modulo") local divide = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Divide") local increment = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Increment") local done = LLVM.LLVMAppendBasicBlock(printFactorsFn, "Done") LLVM.LLVMBuildBr(builder.llvm.builder, loop) LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, loop) local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref") local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- Test *divisor > *value and branch local compare = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntSGT, divisorDeref, valueDeref, "DivisorCompare") LLVM.LLVMBuildCondBr(builder.llvm.builder, compare, done, modulo) LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, modulo) local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref") local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- *value % *divisor local modResult = LLVM.LLVMBuildSRem(builder.llvm.builder, valueDeref, divisorDeref, "ModResult") -- Test *value % *divisor == 0 and branch local dividesEvenly = LLVM.LLVMBuildICmp(builder.llvm.builder, LLVM.LLVMIntEQ, modResult, LLVM.LLVMConstInt(int32Type, 0, false), "DividesEvenly") LLVM.LLVMBuildCondBr(builder.llvm.builder, dividesEvenly, divide, increment) LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, divide) local valueDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, value, "ValueDeref") local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- printf divisor local printfArgs = ffi.new("LLVMValueRef [2]") printfArgs[0] = LLVM.LLVMBuildGlobalStringPtr(builder.llvm.builder, "%d ", "tmpstring") printfArgs[1] = divisorDeref LLVM.LLVMBuildCall(builder.llvm.builder, printfFn, printfArgs, 2, "tmpcall"); -- *value = *value / *divisor local divideResult = LLVM.LLVMBuildExactSDiv(builder.llvm.builder, valueDeref, divisorDeref, "DivideResult") LLVM.LLVMBuildStore(builder.llvm.builder, divideResult, value) -- Restart loop LLVM.LLVMBuildBr(builder.llvm.builder, loop) LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, increment) local divisorDeref = LLVM.LLVMBuildLoad(builder.llvm.builder, divisor, "DivisorDeref") -- *divisor = *divisor + 1 local addResult = LLVM.LLVMBuildAdd(builder.llvm.builder, divisorDeref, LLVM.LLVMConstInt(int32Type, 1, false), "AddResult") LLVM.LLVMBuildStore(builder.llvm.builder, addResult, divisor) -- Restart loop LLVM.LLVMBuildBr(builder.llvm.builder, loop) LLVM.LLVMPositionBuilderAtEnd(builder.llvm.builder, done) LLVM.LLVMBuildRetVoid(builder.llvm.builder)
So, what have I got here?
Well, for starters, I’ve achieved my goal— I’ve demonstrated I can write a useful program just by invoking the LLVM backend functions, and I have definitely entertained myself doing so. Looking upon what I have wrought I am amused as heck.
But this program is a behemoth! That is 145 lines of code for a program that you could have fully expressed in Perl in 62 characters:
($v)=@ARGV;$d=2;$v%$d?$d++:(print"$d\n")&&($v/=$d)until$d>$v
The generate.lua
file is actually longer (9568 bytes) than the OS X executable it generates (8888 bytes). And the code is not just long, it is hard to read. It is dense, and repetitive, and consists mostly of boilerplate. If you look at the LLVM assembly generated by this code, it is both shorter and easier to understand the Lua code itself.
There’s a reason for this. I made a decision to write my not-compiler in Lua. But I’m not using Lua. I’m not taking advantage of Lua’s dynamism— well, never mind that, I’m not even treating Lua as a Turing-complete programming language. Every thing I do here, I do in the most literal and naive way. This was an intentional decision! The goal of this particular program was to be a proof-of-concept and help me learn to use the LLVM-C functions. But, I think I’m done with that goal now.
In the next post, I’m going to start designing abstractions. Okay, constructing LLVM IR by individually calling an LLVM-C function for each line of assembly results in something which is not easy to read or write. What if I designed a Lua API specifically to make constructing programs in code pleasant? What would that look like?
— — — — —
POSTSCRIPT: SOME THINGS I WOULD APPRECIATE ANSWERS ON IF ANYONE READING THIS KNOWS
- The documentation refers to Aggregates and Vectors as derived types. Is a Pointer a "derived type"?
- With Vectors, is a Vector required to be the same size as an SIMD register on the target platform? What does LLVM do if it isn’t (for example, an
<8 x float>
vector on x86-64)? - This is a paragraph I cut from this blog post because it broke up the flow and also I’m not sure if it is accurate. Is the following accurate?
- The documentation calls some LLVM-C functions "instructions". Fundamentally, what an "instruction" does is assign a value to an SSA name (except for non-value-returning functions like
br
). If I chain together function calls, like passing the output ofLLVMBuildLoad
toLLVMBuildExactSDiv
, this looks to me like a tree of function applications but LLVM internally is rewriting it into a sequential series of SSA name assignments. What I think of as anLLVMValueRef
is, fundamentally, a single SSA name.
- The documentation calls some LLVM-C functions "instructions". Fundamentally, what an "instruction" does is assign a value to an SSA name (except for non-value-returning functions like
Any code snippets in this post, or in the BitBucket revisions linked in this post, are available under the Creative Commons Zero license.
May 5th, 2016 at 7:48 pm
Hey I’ve really appreciated this series of posts, you hit a lot of the same issues I did using the LLVM C API.
`CompositeType` seems to be a common superclass of pointers, arrays and struct type classes in the LLVM C++ code. So I would bet yes.
If you run the right optimization passes and enable the right backend configuration options it’s my understanding the answer is that LLVM will attempt to do the right thing. I get this from the loop vectorizer documentation and my experience with clang, I haven;t gotten there yet (soon!).
That last paragraph seems correct to me. It’s my understanding that each `LLVMValueRef` is (generally) an SSA register with a few exceptions (`LLVMGetParam` comes to mind), and that understanding of instructions seems generally correct (although `store`, `call` (optionally; which is super weird, you give it an empty – not null – name to get it as a return-less subroutine call instruction) and `ret` don’t assign their results to a new SSA register but are still “instructions” from what I can tell of the language spec).