Writing a BF Compiler for .net (Part 4: . and ,)

In the last two parts we tackled 4 of the 8 possible BF Commands: >, <, +, -. Now we look at . and , for input and output.

When working with a Console Application, it only makes sense to use the built-in methods Console.Write and Console.Read.

Let's look at output (the . command) first. The C# code we're converting is a one-liner:

Console.Write((char)memory[pointer]);

As memory[pointer] is a byte, we have to cast it to char to write it to the console. In IL, the line looks like this:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  ldelem.u1
IL_000b:  call       void [mscorlib]System.Console::Write(char)

We already know what the two ldsfld commands do: they load a static field onto the evaluation stack. ldelem.u1 is a new command: it pops the array and the index off the stack and pushes the element at that index. To quote the documentation:

Loads the element with type unsigned int8 at a specified array index onto the top of the evaluation stack as an int32.

In other words, ldelem.u1 loads an unsigned 8-bit integer, which is a byte, and widens it to an int32 on the stack. You may wonder where the cast to char is, as char is a 16-bit unsigned integer (a UTF-16 code unit) and no conversion instruction appears in the IL. So how is that possible? Well, ECMA-335 contains the answer in Partition III, Section 1.6, Implicit argument coercion:

While the CLI operates only on 6 types (int32, native int, int64, F, O, and &) the metadata supplies a much richer model for parameters of methods. When about to call a method, the CLI performs implicit type conversions, detailed in the following table.

Translation: if the value on the stack is an int32 (which it is after ldelem.u1), then the CLI implicitly converts it to char when calling a method that expects a char.
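To see the coercion at work, here is a minimal sketch that emits the four instructions at runtime. It assumes System.Reflection.Emit and a DynamicMethod, which is only one possible way to produce IL and not necessarily the approach this series ends up with; the class OutputSketch and the method BuildOutput are made-up names for illustration.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class OutputSketch
{
    // Stand-ins for the static fields used in this series (hypothetical class name).
    public static byte[] memory = new byte[16];
    public static short pointer = 0;

    // Builds a delegate whose body is the four instructions shown above, plus ret.
    public static Action BuildOutput()
    {
        FieldInfo memoryField = typeof(OutputSketch).GetField("memory");
        FieldInfo pointerField = typeof(OutputSketch).GetField("pointer");
        MethodInfo write = typeof(Console).GetMethod("Write", new[] { typeof(char) });

        var method = new DynamicMethod("Output", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, memoryField);  // push the array reference
        il.Emit(OpCodes.Ldsfld, pointerField); // push the index
        il.Emit(OpCodes.Ldelem_U1);            // pop both, push memory[pointer] as int32
        il.Emit(OpCodes.Call, write);          // pop the int32; it is passed as char, no conv needed
        il.Emit(OpCodes.Ret);
        return (Action)method.CreateDelegate(typeof(Action));
    }
}
```

With OutputSketch.memory[0] = 72 and pointer = 0, invoking the delegate returned by BuildOutput() writes H to the console, even though no explicit conversion to char was emitted.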

The method call itself is then simply a call to the static method Write in class System.Console in assembly mscorlib, which returns void. The arguments to the method are taken from the evaluation stack. If a method takes multiple arguments, they have to be pushed in order: first argument first.
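The argument order is easier to see with a method that takes two parameters. The following sketch is an illustration that is not part of the BF compiler: it calls String.Concat(string, string) from emitted IL, again assuming Reflection.Emit, with ArgumentOrderSketch and BuildConcat as made-up names.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ArgumentOrderSketch
{
    // Builds a delegate that calls String.Concat(string, string) from IL.
    public static Func<string> BuildConcat()
    {
        MethodInfo concat = typeof(string).GetMethod(
            "Concat", new[] { typeof(string), typeof(string) });

        var method = new DynamicMethod("Concat", typeof(string), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldstr, "first ");  // pushed first, becomes the first argument
        il.Emit(OpCodes.Ldstr, "second");  // pushed second, becomes the second argument
        il.Emit(OpCodes.Call, concat);     // pops both arguments, pushes the return value
        il.Emit(OpCodes.Ret);              // returns the string left on the stack
        return (Func<string>)method.CreateDelegate(typeof(Func<string>));
    }
}
```

Invoking the delegate returns "first second". Swapping the two ldstr lines would swap the arguments as well.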

That's the . command: get the value of memory[pointer], call Console.Write with it. What about the , command to read a character?

In C#, this is again a one-liner:

memory[pointer] = (Byte)Console.Read();

while in IL this is a few lines more:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  call       int32 [mscorlib]System.Console::Read()
IL_000f:  conv.u1
IL_0010:  stelem.i1

Once again, we start by loading our array and the index into it onto the stack. However, we are not doing anything with them right now. Instead, we call Console.Read, which returns an int32. According to the documentation of the call command, the return value is pushed onto the stack.

So now our stack contains three values: the array, the current index and the return value of Console.Read (as Console.Read doesn't take parameters, our memory and pointer are still on the stack). conv.u1 pops the int32, converts it to an unsigned 8-bit value (that's the cast to byte in the C# code) and pushes the result onto the stack again.
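What conv.u1 does to the value can be reproduced in plain C#. This small sketch assumes the cast runs in an unchecked context, which is the C# default for non-constant expressions; ConvSketch and ToByte are made-up names.

```csharp
using System;

public static class ConvSketch
{
    // Mirrors conv.u1: keep the low 8 bits of the int32, without an overflow check.
    public static byte ToByte(int value)
    {
        return unchecked((byte)value);
    }
}
```

ToByte(65) gives 65, the character A. ToByte(321) also gives 65, because 321 is 256 + 65 and only the low 8 bits survive. ToByte(-1) gives 255, which is what the -1 that Console.Read returns at the end of the input would turn into.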

stelem.i1 is a new command:

Replaces the array element at a given index with the int8 value on the evaluation stack.

So this pops off the value, the index and the array and replaces the element. Despite the i1 in its name, the same instruction is used for byte arrays, as it simply stores the low 8 bits of the value. This is equivalent to calling ldelema, followed by the operation that pushes the new value onto the stack, followed by stobj, but it only takes one instruction if the correct values are on the stack.
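Both variants can be tried side by side. The sketch below assumes Reflection.Emit with a DynamicMethod, as one possible way to produce the IL; InputSketch, BuildInput and the useAddress flag are made-up names. With useAddress set to false it emits the five instructions from the listing, with true it emits the ldelema/stobj form.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class InputSketch
{
    // Stand-ins for the static fields used in this series (hypothetical class name).
    public static byte[] memory = new byte[16];
    public static short pointer = 0;

    // useAddress = false: stelem.i1 form. useAddress = true: ldelema + stobj form.
    public static Action BuildInput(bool useAddress)
    {
        FieldInfo memoryField = typeof(InputSketch).GetField("memory");
        FieldInfo pointerField = typeof(InputSketch).GetField("pointer");
        MethodInfo read = typeof(Console).GetMethod("Read", Type.EmptyTypes);

        var method = new DynamicMethod("Input", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, memoryField);        // stack: array
        il.Emit(OpCodes.Ldsfld, pointerField);       // stack: array, index
        if (useAddress)
        {
            il.Emit(OpCodes.Ldelema, typeof(byte));  // stack: address of memory[pointer]
        }
        il.Emit(OpCodes.Call, read);                 // pushes the int32 returned by Console.Read
        il.Emit(OpCodes.Conv_U1);                    // truncates it to an unsigned 8-bit value
        if (useAddress)
        {
            il.Emit(OpCodes.Stobj, typeof(byte));    // pops value and address, stores the byte
        }
        else
        {
            il.Emit(OpCodes.Stelem_I1);              // pops value, index and array, stores the byte
        }
        il.Emit(OpCodes.Ret);
        return (Action)method.CreateDelegate(typeof(Action));
    }
}
```

If the next character on standard input is A, invoking either delegate leaves 65 in memory[pointer].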

In Part 5, I'll finish the command introduction with the [ and ] commands (explaining how a while-loop works) and then we finally build our compiler!
