A Visual Studio Macro to insert a new Guid

I’ve been trying to create some SharePoint Content Types and List Definitions recently, and everyone who has done that before knows what you need for that: Guids, and quite a few of them. One for each Field, Feature, Solution… So instead of using GuidGen, I wanted something that inserts a new Guid at the cursor position in the Editor when I press a certain keyboard shortcut.

Luckily, this is rather easy with the Macro Editor. Just create a new Macro/Module and enter this code:

Sub InsertGuid()
    ' Format "B" wraps the Guid in braces
    Dim newId As String = Guid.NewGuid().ToString("B")
    Dim doc As Document = DTE.ActiveDocument
    Dim textDoc As TextDocument = CType(doc.Object("TextDocument"), TextDocument)
    ' Insert the Guid at the current cursor position
    textDoc.Selection.Insert(newId)
End Sub

You can then go to Tools / Options / Environment / Keyboard and look for the Macro you just created (Macros.MyMacros.SomeModule.InsertGuid) and assign a Keyboard shortcut to it.
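If you need the Guid in a different shape than the braced one, the format specifier is the only thing to change. A small C# sketch of the standard specifiers, using a fixed Guid so the results are predictable (the class and method names are made up for this example):

```csharp
using System;

static class GuidFormatSketch
{
    // Returns the same Guid in the four common string formats.
    public static string[] AllFormats(Guid id)
    {
        return new[]
        {
            id.ToString("B"), // {00112233-4455-6677-8899-aabbccddeeff}
            id.ToString("D"), // 00112233-4455-6677-8899-aabbccddeeff
            id.ToString("N"), // 00112233445566778899aabbccddeeff
            id.ToString("P")  // (00112233-4455-6677-8899-aabbccddeeff)
        };
    }
}
```

Calling GuidFormatSketch.AllFormats(new Guid("00112233-4455-6677-8899-aabbccddeeff")) yields the four strings shown in the comments.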

Writing a BF Compiler for .net (Part 5: [ and ] – while loops in IL)

The final two commands we’re looking at are [ and ]. Their description in the first article was a bit cryptic: [ was described as

Go to the next instruction if the byte at the memory pointer is not 0, otherwise move it past the matching ] instruction

while ] was described as

Go to the instruction after the matching [ if the byte at the memory pointer is not 0, else move it past the ]

In C# code, this is a lot simpler:

// BF Code for this: [-]
while (memory[pointer] > 0)
{
    // Instructions between [ and ]
    // The following instruction is only to have a body
    memory[pointer]--;
}

It’s a while-loop. It’s important to note that we have to use a pre-test loop, that is a loop that checks the condition before executing the loop (as opposed to a do-while loop which executes the code block at least once and checks afterwards).

So how does a while loop look in .net IL?

// See note below regarding .s suffix on br.s and bgt.s
IL_0000:  br.s       IL_001f
// This is the memory[pointer]-- instruction
IL_0002:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0007:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000c:  ldelema    [mscorlib]System.Byte
IL_0011:  dup
IL_0012:  ldobj      [mscorlib]System.Byte
IL_0017:  ldc.i4.1
IL_0018:  sub
IL_0019:  conv.u1
IL_001a:  stobj      [mscorlib]System.Byte
// This is the while loop
IL_001f:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0024:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_0029:  ldelem.u1
IL_002a:  ldc.i4.0
IL_002b:  bgt.s      IL_0002

GOTO considered harmful?
Okay, this looks complicated, but it is easy. To explain it, we have to open Pandora’s Box and look at the dirtiest secret there is in development: At Machine Level, GOTOs are essential.
Ha, take that Dijkstra!

Regardless of how much you abstract it away, control structures like while have to be translated into “GOTOs”, or more precisely into jumps to addresses from which execution continues. In .net, this is not called GOTO though, it’s called Branch.

Our code has three parts: A single GOTO/Branch instruction at the beginning, the body of the loop (in our case the single memory[pointer]-- instruction) and then the while check.

So we start with br.s, which is described as

Unconditionally transfers control to a target instruction (short form).

In other words, this is a GOTO and it goes to IL_001f. The code starting from here does the while-check: Load memory and pointer onto the stack. Then load the value of memory[pointer] onto the stack as Unsigned 8-Bit Int. Afterwards, push the number 0 to the stack.

Our evaluation stack now contains the value of memory[pointer] and the number 0. Then we have the new bgt.s command:

Transfers control to a target instruction (short form) if the first value is greater than the second value.

In other words and Pseudocode: if(memory[pointer] > 0) goto IL_0002;

The code starting from IL_0002 is our memory[pointer]-- instruction, which will be executed, and then we’ll do the while-check again.
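For readers who want to experiment with this loop right away, here is a minimal sketch that emits the same instruction sequence with System.Reflection.Emit. It is not the compiler of this series: as an assumption for the example, it builds a DynamicMethod that receives memory and pointer as arguments (ldarg.0 / ldarg.1) instead of reading static fields, and the names ClearCellEmitter and Build are made up. The ILGenerator labels compute the branch offsets for us.

```csharp
using System;
using System.Reflection.Emit;

static class ClearCellEmitter
{
    // Builds a delegate equivalent to: while (memory[pointer] > 0) memory[pointer]--;
    public static Action<byte[], int> Build()
    {
        var method = new DynamicMethod("ClearCell", typeof(void),
                                       new[] { typeof(byte[]), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        Label body = il.DefineLabel();
        Label check = il.DefineLabel();

        il.Emit(OpCodes.Br_S, check);           // pre-test loop: jump to the check first

        il.MarkLabel(body);                     // memory[pointer]--
        il.Emit(OpCodes.Ldarg_0);               // memory
        il.Emit(OpCodes.Ldarg_1);               // pointer
        il.Emit(OpCodes.Ldelema, typeof(byte));
        il.Emit(OpCodes.Dup);
        il.Emit(OpCodes.Ldobj, typeof(byte));
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Conv_U1);
        il.Emit(OpCodes.Stobj, typeof(byte));

        il.MarkLabel(check);                    // while (memory[pointer] > 0)
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldelem_U1);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Bgt_S, body);           // jump back while the cell is > 0

        il.Emit(OpCodes.Ret);
        return (Action<byte[], int>)method.CreateDelegate(typeof(Action<byte[], int>));
    }
}
```

Calling ClearCellEmitter.Build()(memory, 0) on an array whose first element is 5 leaves that element at 0 and does not touch the others.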

In Debug mode, the bgt instruction is not used. Instead, the check is done in a more roundabout way. Feel free to look it up using ILDASM, but in essence Debug Mode captures the result of the comparison in a local variable, as in this C# Pseudocode:

bool DoJump = memory[pointer] > 0;
if(DoJump) goto IL_0002;

This is useful for Debugging (who would’ve thought it, given that it’s a debug build?), but rather heavy compared to Release mode (8 instructions and a local variable compared to 5 instructions without).

Looking at that, you can easily imagine what the difference between a while and a do while loop is: The do while loop does not have the br.s instruction at the beginning. It therefore executes the loop body at least once before it enters the while-check.
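C# still has goto, so the two layouts can be written out by hand. The following sketch mirrors the structure of the IL (it is an illustration, not what the compiler literally produces); it uses an int counter instead of a byte so that decrementing 0 does not wrap around, and the class and method names are made up:

```csharp
static class GotoLoopSketch
{
    // Same shape as the IL above: jump to the check first, then loop.
    public static int WhileWithGoto(int cell)
    {
        int iterations = 0;
        goto Check;                  // corresponds to br.s
    Body:
        cell--;
        iterations++;
    Check:
        if (cell > 0) goto Body;     // corresponds to bgt.s
        return iterations;
    }

    // do-while: identical, but without the initial jump.
    public static int DoWhileWithGoto(int cell)
    {
        int iterations = 0;
    Body:
        cell--;
        iterations++;
        if (cell > 0) goto Body;
        return iterations;
    }
}
```

Starting with a cell of 0, WhileWithGoto never runs the body, whereas DoWhileWithGoto runs it exactly once.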

Before I end this post, I want to talk about short form commands.

What is “Short Form”?
If you look at the IL Commands, some say “Short Form”. What does this mean? Well, normally all addresses are 32 Bit, that is 4 Bytes. If you want an unconditional jump, you would use the br command with the target address. However, this means you’ll have 5 bytes in the target file – 1 for the Br Instruction and 4 for the target. As this instruction is so common, it would be a massive overhead to always have to write 5 bytes to the file.

Short Form commands only take 1 byte for the target address. The target here is described as

1-byte signed offset from the beginning of the instruction following the current instruction

So instead of a 4-byte offset, we give a 1-byte offset relative to the instruction following the branch (the long form is relative as well, it just has more room). This only works if the target is within -128 to +127 bytes of the next instruction (signed offset!) of course, so it’s a lot less flexible and your compiler needs to know the distance between the branch and its target. However, the savings are substantial, as the short form only requires 2 bytes, less than half of the 5 bytes of the full instruction.
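The offsets of the two branches in the loop above can be worked out by hand. A small sketch of that arithmetic (the helper names are made up for this example):

```csharp
static class ShortBranchSketch
{
    // Offset is measured from the start of the instruction FOLLOWING the branch.
    public static int RelativeOffset(int branchAddress, int branchSize, int targetAddress)
    {
        return targetAddress - (branchAddress + branchSize);
    }

    // The short form stores the offset in one signed byte.
    public static bool FitsShortForm(int offset)
    {
        return offset >= sbyte.MinValue && offset <= sbyte.MaxValue;
    }
}
```

For br.s at IL_0000 (2 bytes long) targeting IL_001f, RelativeOffset(0x0000, 2, 0x001f) gives 29. For bgt.s at IL_002b targeting IL_0002, RelativeOffset(0x002b, 2, 0x0002) gives -43. Both fit into a signed byte, which is why the short forms can be used here.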

This concludes the command overview. Part 6 will finally show how we will write our compiler.

Writing a BF Compiler for .net (Part 4: . and ,)

In the last two parts we tackled 4 of the 8 possible BF Commands: >, <, +, -. Now we look at . and , for input and output.

When working with a Console Application, it only makes sense to use the built-in commands Console.Write and Console.Read.

Let’s look at output (the . command) first. The C# code we’re converting is a one-liner:

Console.Write((char)memory[pointer]);

As memory[pointer] is a byte, we have to cast it to char to write it to the console. In IL, the line looks like this:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  ldelem.u1
IL_000b:  call       void [mscorlib]System.Console::Write(char)

We already know what the two ldsfld commands do: They load the static field onto the evaluation stack. Now, ldelem.u1 is a new command and this is our cast to char. To quote the Documentation:

Loads the element with type unsigned int8 at a specified array index onto the top of the evaluation stack as an int32.

In other words, ldelem.u1 expects to load an Unsigned 8-Bit Integer which is a byte. You may wonder where the cast is, as char is a 16-Bit Unsigned Integer (=UTF-16 Unicode Character). So how is that possible? Well, ECMA-335 contains the answer in Partition III, Section 1.6 Implicit argument coercion:

While the CLI operates only on 6 types (int32, native int, int64, F, O, and &) the metadata supplies a much richer model for parameters of methods. When about to call a method, the CLI performs implicit type conversions, detailed in the following table.

Translation: If the Parameter on the Stack is an int32 (which it is according to ldelem.u1), then the CLI will implicitly convert it to char if calling a method that wants a char.

The method call itself is then simply a call to static method Write in class System.Console in assembly mscorlib which returns void. The arguments to the method are taken from the evaluation stack. If a method takes multiple arguments, they have to be pushed in the correct order: First argument first.

That’s the . command: Get the value of memory[pointer], call Console.Write with it. What about the , command to read a character?

In C#, this is again a one-liner:

memory[pointer] = (Byte)Console.Read();

while in IL this is a few lines more:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  call       int32 [mscorlib]System.Console::Read()
IL_000f:  conv.u1
IL_0010:  stelem.i1

Once again, we start by loading our array and the index into it onto the stack. However, we are not doing anything with them right now. Instead, we call Console.Read which returns an int32. According to the documentation of the call command, The return value is pushed onto the stack.

So now our stack contains three values: The array, the current index and the return value of Console.Read (as Console.Read doesn’t take parameters our memory & pointer are still on the stack). conv.u1 takes the Int32, converts it to UInt8 (that’s the cast to byte in the method) and puts it on the stack again.
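What the conversion does to the value can be tried out in plain C#. A minimal sketch (ToCell is a made-up helper name) showing that only the lowest 8 bits survive. Note that Console.Read returns -1 when there are no more characters to read, which ends up as 255 in the memory cell:

```csharp
static class ConvU1Sketch
{
    // Same effect as the (Byte) cast in the one-liner above: keep the low 8 bits.
    public static byte ToCell(int readResult)
    {
        return unchecked((byte)readResult);
    }
}
```

ToCell(72) stays 72, ToCell(0x148) becomes 0x48, and ToCell(-1) becomes 255.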

stelem.i1 is a new command:

Replaces the array element at a given index with the int8 value on the evaluation stack.

So this pops off the value, the index and the array and replaces the element. This is equivalent to calling ldelema, followed by the operation that pushes the new value to the stack, followed by stobj, but it only takes one instruction if the correct values are on the stack.

In Part 5, I’ll finish the command introduction with the [ and ] command (explaining how a while-loop works) and then we finally build our compiler!

How the Async support in RestSharp can help with Report Generation

Note: This was written a long time ago for the then-current version of RestSharp that had experimental Async support. John and his contributors have updated RestSharp tremendously since then, but by now these samples are outdated and only here for illustrative purposes.

I’ve been using RestSharp in the past weeks for some backend tools, mainly because REST is easy to implement and I like the Deserialization support that comes with it. One of the reasons I wanted to add async support is because I wanted to write a monitoring application.

I am using Microsoft Chart Controls, and I have a server that accepts REST Requests in the form /reports/health/{entity} to return something like this:

<healthdata>
    <healthindex>73</healthindex>
</healthdata>

I don’t know which entities are being pinged at compile time. I can’t even put them in a config file, as someone might add or remove entities while the application is still running. So I decided to have an architecture with a Timer, a Request Dispatcher and a Response Handler.

The Timer executes every 5 seconds, gets the current list of entities and calls the Request Dispatcher:

private void timer1_Tick(object sender, EventArgs e)
{
    foreach(string entity in GetEntities()){
        DispatchRequest(entity);
    }
}

The Request Dispatcher creates a new RestRequest and a custom State Object. The state object will be important later; for now let’s just note that it contains the entity name and the time the request was sent.

private void DispatchRequest(string entityName)
{
    var rq = new RestRequest("health/{entity}", Method.GET);
    rq.AddParameter("entity", entityName, ParameterType.UrlSegment);

    var state = new ServiceState {Entity = entityName, RequestDateUtc = DateTime.UtcNow};
    _client.BeginExecute<healthdata>(rq, HandleResponse, state);
}

The Request Dispatcher doesn’t care what happens afterwards, it just happily fires new requests all the time and tells BeginExecute to call HandleResponse afterwards.

private void HandleResponse(IAsyncResult res)
{
    // Note: This assumes the response is always good. In reality, you want to
    // check response.ResponseStatus and act accordingly
    var result = (AsyncResult) res;
    var caller = (RequestExecuteCaller<healthdata>) result.AsyncDelegate;
    var state = res.AsyncState as ServiceState;
    var response = caller.EndInvoke(res);
    AddDataToChart(state.Entity, response.Data.healthindex, state.RequestDateUtc);
}

This is where our state object comes into play. If you look at the XML from the server above, you see that it doesn’t echo the Entity Name. As HandleResponse handles all responses, it has to know which response corresponds to which server. Also, async operations can come in a different order than the one executed – I may send requests for entities s1, s2, s3, s4 and receive them back in the order s3, s1, s4, s2. If the timer interval is too small and the server latency is too high, I might even receive a later request for s2 before an earlier one – it’s pure anarchy.

That’s why I’ve passed a state object to BeginExecute, as this allowed me to capture the entity name and the exact Date of the request. So all that HandleResponse has to do is to get the state and healthdata and call the function that adds the data to the chart with precise information about the Time, the Entity and its health.

All without ever locking the UI or without giving the dispatcher or the handler too much work that they shouldn’t be doing.


(Excuse the horrible colors, I’m only a developer :))

How Optional Parameters work, why they can be dangerous, and why they work in .net 2/3 as well

One of the changes I really like in Visual Studio 2010 is optional parameters for methods. Basically they allow you to specify a default value for each parameter and thus reduce the number of overloads of a method. Make sure to read to the end as there is a huge word of caution regarding optional parameters.

Instead of having this:

private static string SomeMethod(int value)
{
    return SomeMethod(value, "Was Empty");
}

private static string SomeMethod(int value, string data)
{
    return string.Format("{0}: {1}", value, data);
}

You can just have this:

// static because it's a console app, no extra magic
private static string OptionalMethod(int value, string data = "Was Empty")
{
    return string.Format("{0}: {1}", value, data);
}

The nice thing about this feature is that it also works with .net 2/3. Why? Because it is implemented like this:

private static string OptionalMethod(int value,
                          [Optional, DefaultParameterValue("Was Empty")] string data)
{
    return string.Format("{0}: {1}", value, data);
}

Looks like a normal method (the longest overload) with some attributes. Are those attributes the magic? No, it’s much simpler.
If this is your calling code:

Console.WriteLine(OptionalMethod(1));
Console.WriteLine(OptionalMethod(2,"Test"));

Then this is how it looks in the compiled assembly:

Console.WriteLine(OptionalMethod(1, "Was Empty"));
Console.WriteLine(OptionalMethod(2, "Test"));

As you see, the compiler changes the method call to put in all optional parameters into the caller. This explains why this works with pre-.net 4 applications. However, this also explains the huge warning I want to give:

Changing the optional parameter value will not change the behavior of your callers!

Imagine you want to change the value to “No value specified”. With the overload solution this is trivial:

private static string SomeMethod(int value)
{
    return SomeMethod(value, "No value specified");
}

Recompile your provider assembly and every consumer who uses that overload will get the new value. However, with optional parameters you have to re-compile each and every consumer assembly as well, so changing an optional parameter is a big, breaking change.
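The default value itself is stored in the metadata of the declaring assembly, and that is where the compiler of a calling assembly reads it from before copying it into the call site. A small sketch that reads it back via reflection (the class OptionalMetadataSketch and the helper ReadDefaultOfData are made up for this example, and the method is public here only to keep the reflection call short):

```csharp
using System;
using System.Reflection;

public static class OptionalMetadataSketch
{
    public static string OptionalMethod(int value, string data = "Was Empty")
    {
        return string.Format("{0}: {1}", value, data);
    }

    // Reads the default value of the "data" parameter from the method's metadata.
    public static object ReadDefaultOfData()
    {
        MethodInfo method = typeof(OptionalMetadataSketch).GetMethod("OptionalMethod");
        ParameterInfo data = method.GetParameters()[1];
        return data.IsOptional ? data.DefaultValue : null;
    }
}
```

ReadDefaultOfData() returns "Was Empty". A caller compiled against this version keeps passing that string until it is recompiled, no matter what a newer version of the assembly declares.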

Does that mean that Optional Parameters are evil and should be avoided at all costs? Some people might think so, but I think they are useful for variables that almost never change. Think about a function like SendMail(string server, int port = 25).

However, I do object that ReSharper 5 offers the hint to introduce Optional Parameters too lightly:

Edit: I gave the ReSharper guys bad credit. If the method is declared as public, it does not offer that hint. Sorry guys for not double checking that! This then makes complete sense, as Optional Parameters are fine and safe for private/internal members.

I know too many people who blindly follow any advice that ReSharper gives and that might see optional parameters as “Ohh, Shiny, no more Overloads!”. Of course, you can argue that developers should know their stuff and deserve to fall hard on their face once they have rolled it out as a big API and suddenly realize they have to recompile/redeploy dozens or hundreds of consumer assemblies because of a change. I am however not a fan of this type of developer elitism and would have preferred it if ReSharper did not show that hint by default.

Just my 2 cents though.

Writing a BF Compiler for .net (Part 3: pointer++ and pointer--)

The last article gave an introduction to the concepts and we started looking at the memory[pointer]++/-- functions. Today is a quick post to look at two other instructions, pointer++ and pointer-- or > and < in BF.

Unsurprisingly, these are extremely simple operations, requiring only 5 IL Commands:

  IL_0000:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
  IL_0005:  ldc.i4.1
  IL_0006:  add
  IL_0007:  conv.i2
  IL_0008:  stsfld     int16 BFHelloWorldCSharp.Program::pointer

ldsfld loads a static field onto the evaluation stack, ldc.i4.1 pushes the number 1 to the stack, add takes the two values from the stack and pushes the result back. conv.i2 converts the value on the stack (which is an Int32) to Int16 (2-Byte Int), pads it to be Int32 (as that is the smallest datatype possible on the stack) and pushes it back. stsfld then replaces the value in the static variable with the value from the stack.

As usual, pointer-- works the exact same way with the difference of using sub instead of add. One word of note: sub subtracts value2 from value1. While the order doesn’t matter for add, it does for sub. Also note that add and sub do not detect overflow, so you can happily add 1 to Int16.MaxValue. If you want overflow checking, there is sub.ovf and add.ovf which throw an OverflowException.
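The same behaviour can be observed from C#, where the unchecked and checked keywords decide whether the compiler emits overflow-checking IL for an operation. A minimal sketch (the names are made up, and the exact opcodes chosen by the compiler are not shown here):

```csharp
using System;

static class PointerOverflowSketch
{
    // No overflow detection: Int16.MaxValue + 1 silently wraps around.
    public static short IncrementUnchecked(short pointer)
    {
        unchecked { pointer++; }
        return pointer;
    }

    // With overflow detection: throws OverflowException instead of wrapping.
    public static short IncrementChecked(short pointer)
    {
        checked { pointer++; }
        return pointer;
    }
}
```

IncrementUnchecked(Int16.MaxValue) returns Int16.MinValue, while IncrementChecked(Int16.MaxValue) throws an OverflowException.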

With the 4 easy operations done, we will look at . and , next for input and output. Finally, we will look at [ and ]. After we have looked at the IL for each operation, we will write our compiler.

Writing a BF Compiler for .net (Part 2: Writing BF in C# and looking at the IL)

In the last posting, we looked at BF as a language and how to write an interpreter in C#. The point of doing that was to understand what each function actually does. As I am no IL Expert, my second step was to write the Hello World BF Application as C#, that is one extremely long function with each BF command spelled out. You can find the full function here. This is an Excerpt:

static void Main(string[] args)
{
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    while(memory[pointer] > 0)
    {
        pointer++;
        memory[pointer]++;
        memory[pointer]++;
        memory[pointer]++;
        // snipped
    }
    // snipped
}

As you see, we literally translated each instruction into a C# instruction. Compile this in Debug Mode, load it up in ildasm:

Whoa, now we’re talking! Note that the ildasm output will look different between Debug and Release modes due to optimizations. I prefer Debug to get an understanding of how this works, and then later on Release mode to see the differences. The point is really to understand how this stuff looks in IL as we have to write IL ourselves later on.

So the first C# instruction is memory[pointer]++, which is a short way of saying memory[pointer] = memory[pointer] + 1. It is important to note that many language constructs and short-hand commands simply do not exist in IL – they are convenience functions in the C# compiler. Looking at the IL, our first C# instruction is now 9 IL instructions, starting from IL_0001 and ending at IL_0019.

  IL_0001:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
  IL_0006:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
  IL_000b:  ldelema    [mscorlib]System.Byte
  IL_0010:  dup
  IL_0011:  ldobj      [mscorlib]System.Byte
  IL_0016:  ldc.i4.1
  IL_0017:  add
  IL_0018:  conv.u1
  IL_0019:  stobj      [mscorlib]System.Byte

Let’s quickly look at each command before I explain in Detail what is happening:

  1. Push the value of a static field ‘memory’ onto the evaluation stack.
  2. Push the value of a static field ‘pointer’ onto the evaluation stack.
  3. Load the address of the array element at a specified array index onto the top of the evaluation stack as type & (managed pointer).
  4. Copy the current topmost value on the evaluation stack, and then pushes the copy onto the evaluation stack.
  5. Copy the value type object pointed to by an address to the top of the evaluation stack.
  6. Push the integer value of 1 onto the evaluation stack as an int32.
  7. Add two values and pushes the result onto the evaluation stack.
  8. Convert the value on top of the evaluation stack to unsigned int8, and extends it to int32.
  9. Copy a value of a specified type from the evaluation stack into a supplied memory address.

That looks more complicated than it really is, especially if you don’t know how CPUs usually work or never worked with Assembler. .net/IL is a Stack Based language, and functions work against this stack. For example, the Add instruction at address IL_0017 does not take parameters. Instead, it removes 2 elements from the stack, adds them, and pushes the result back to the stack. Be aware that most functions assume that the objects on the stack have the correct type – trying to push a string and an Int32 and calling Add will crash the CLR. Not all functions require their parameters on the stack, but many do – you can find an Overview of each function in Partition III of the ECMA-335 standard or in the OpCodes Fields.

Anyway, let’s dissect the instruction and look at what’s happening.
1. Getting memory[pointer]
In order to increment the value at memory[pointer], we first need to get it. This is the job of the ldelema instruction at IL_000b. To quote from the documentation:
Loads the address of the array element at a specified array index onto the top of the evaluation stack as type & (managed pointer).

So this function requires an array and an array index, and it gives us a managed pointer to that element. It expects that we push the array and the index to the stack (in that order). It will then pop (remove) these two elements from the stack and push the pointer to the array element onto the stack.

The ldsfld function pushes the value of a static field onto the stack (pointer and memory were declared static). So we first push the memory array to the stack, and then the value of the pointer variable.

Here is a diagram:

2. Incrementing the value
Now we have a pointer to memory[pointer], but we need to increase the value it points to. That is what the next 4 instructions do. The important function is add at IL_0017, which takes two values from the stack and adds them. But at the moment, we only have the managed pointer on the stack. We first need to get the value of memory[pointer] to the stack. This is what ldobj at IL_0011 does: It gets the address of the value from the stack and pushes the actual value onto it. Note that this is now just a byte – it has no connection to the memory array, it is simply a number.

Before that, you might wonder what the dup instruction does: It simply creates a copy of the topmost value on the stack, effectively duplicating it. This might seem useless for the addition, but it is required later. For now, just ignore it.

Now that we have the current value of the memory[pointer], we need the second number to perform the addition. As ++ is just a shorthand for +1, we need to push the number 1 to the stack. To push an Int32 value to the stack, you can use the ldc.i4 instruction which pushes the supplied number to the stack. However, there are also shorthand commands for some numbers, and the command ldc.i4.1 pushes the number 1 to the stack. (I do not know if these commands exist for performance or for code size reasons, although I strongly believe the latter. ldc.i4.1 is a 1-byte command, whereas ldc.i4 requires 5 bytes: 1 for the command and 4 bytes for the Int32 value to push).

So now we have the current value of memory[pointer] and the number 1 on the stack, and the Add function pops them off, performs the addition and pushes the result.

A diagram for this part:

Just a slight note: The 1 at the end is not the same 1 as the one before. Add removes 0 and 1 from the stack, “generates a new 1” through performing 0 + 1 and pushes that to the stack.

3. Storing the value in the array again
We performed the addition, but as a result we simply have the number 1 on the stack now – memory[pointer] is still 0 though! The last two instructions take care of saving it back. Now, the number 1 on the stack is an Int32, but we have an array of Bytes. The first instruction, conv.u1 at IL_0018 gets a value off the stack, converts it to an unsigned 8-bit integer (which is exactly what a Byte in .net is), adds padding to Int32 and pushes it back to the stack. This might seem silly, but the evaluation stack only holds 4- or 8-byte integers (See Partition III, 1.1 Data Types of ECMA-335). The conv instruction makes sure that the actual data is stored in the “correct” bits (the ones later used to read the value), while the other 24 bits are merely padding to make it an Int32.

stobj copies the value to the desired memory location. You may now understand why we had the dup instruction at IL_0010: We need to know where to copy the value to, but the ldobj instruction already removed the address from the stack at IL_0011! So by duplicating the address, we can give it to both functions (otherwise we would have to do the lookup in the first three lines again).

And that is the memory[pointer]++ C# instruction in IL! I know it looks scary at first, but it’s rather simple and logical once you have understood how this stuff works internally. Remember, at some point each programming language needs to be translated into very specific CPU instructions. We are not at that level (this is what the CLR and its JITter do), but we are at a fairly low level. Remember, we have to write a compiler for this, so we need to understand a) what is happening and b) why it is happening. Making those diagrams helped me a lot, especially because I didn’t understand the use of the dup instruction at first.
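The role of the managed pointer and the dup instruction can also be approximated in C# itself. The following sketch uses a ref local, which requires C# 7.0 or later and therefore a newer compiler than the one used for this series. It looks up the element once and then uses that one reference for both the read and the write. The comments map each line to the IL discussed above, but this is only an illustration; the IL the compiler generates for it may differ. The names are made up for this example.

```csharp
static class IncrementCellSketch
{
    public static void Increment(byte[] memory, short pointer)
    {
        // ldelema: take the address of the element once (dup lets IL reuse it)
        ref byte cell = ref memory[pointer];

        // ldobj + ldc.i4.1 + add: the addition happens on Int32 values
        int sum = cell + 1;

        // conv.u1 + stobj: cut back to 8 bits and store through the same address
        cell = unchecked((byte)sum);
    }
}
```

Incrementing a cell that holds 0 yields 1, and incrementing a cell that holds 255 wraps around to 0, because only the lowest 8 bits are written back.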

Now that we know how this instruction works, we will look at the other 7 BF Instructions. I can already tell you how memory[pointer]-- works: It is exactly the same function, with the difference of using sub instead of add at IL_0017, so no need for an article of its own.

The two pointer operations (pointer++ and pointer--) will be described in the next article.

Writing a BF Compiler for .net (Part 1: Explanation of the language and interpreter in C#)

Today I’m starting a small series about how to write a Compiler to .net IL. I wanted to understand IL better and wanted to learn how .net really works. Even though I’m a Web Developer (which usually means working on a much higher level of abstraction), at some point there is this impossible issue that has no cause and no resolution and where Reflector and WinDbg have to come in.

My main focus is understanding IL, not implementing a language. So I picked an extremely simple but otherwise complete language – Brainf**k (abbreviated BF from here on). Yes, it is a very esoteric language, but as said, focus is on the IL Part, not the actual language.

BF only has 8 instructions, represented by a single character each:

> Moves the memory pointer forward
< Moves the memory pointer backwards
+ Increments the value at the current memory pointer
- Decrements the value at the current memory pointer
. Outputs the value at the current memory pointer
, Reads a single byte from some input, storing it at the memory pointer
[ Go to the next instruction if the byte at the memory pointer is not 0, otherwise move it past the matching ] instruction
] Go to the instruction after the matching [ if the byte at the memory pointer is not 0, else move it past the ]

So basically with BF you have three objects: Some Memory, a Pointer to a byte in the Memory and an instruction pointer. The memory can be implemented as a simple byte array and the memory pointer as an index into it.

Let’s say we have 4 bytes of memory. At the beginning, everything is initialized to 0 (P = Memory Pointer):
Memory: 00 00 00 00 P:0

Now we execute the following code:
++>+++>++++>+<-

The result is this:
Memory: 02 03 03 01 P:2

The [ and ] instructions are more complicated, but essentially they form a while loop – while(memory[pointer] != 0) { // instructions between the brackets }. One example is to increase the value of a register without having many + signs. Let’s say we want to increase memory[1] to 25. For that, we use memory[0] as a counter:
+++++[>+++++<-]

After running this, the memory will look like this:
Memory: 00 19 00 00 P:0
(Remember that this is hex, so 19h is 25 decimal)

What did we do? We increased memory[0] to 5. Then we entered a while-block. First, we move the memory pointer to memory[1] and increase it 5 times, to 5. Then we move back to memory[0] and decrease it to 4. The ] then checks if memory[0] is 0, which it isn't, so we return to the instruction after the [, which means we move again to memory[1], increase it 5 more times, move back to memory[0], decrease it to 3 and repeat until memory[0] is 0.
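Translated by hand into C#, one statement per BF command, the same program looks like the following sketch (the class name is made up, and the 4-byte memory matches the example above):

```csharp
static class LoopTranslationSketch
{
    // Hand translation of the BF program +++++[>+++++<-]
    public static byte[] Run()
    {
        byte[] memory = new byte[4];
        int pointer = 0;

        memory[pointer] += 5;            // +++++
        while (memory[pointer] != 0)     // [
        {
            pointer++;                   // >
            memory[pointer] += 5;        // +++++
            pointer--;                   // <
            memory[pointer]--;           // -
        }                                // ]
        return memory;
    }
}
```

The loop body runs 5 times and adds 5 to memory[1] each time, so System.BitConverter.ToString(LoopTranslationSketch.Run()) gives 00-19-00-00.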

As said, not terribly complicated once you understood how the brackets and the memory pointer work. Let's look at Hello World in BF:
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
(Note that this needs 5 bytes of memory instead of 4)

This looks scary, but is also quite simple if you are familiar with ASCII Encoding. The . command outputs the value at the memory pointer to the console and the console then needs to decide what to do with it. Again, everything in a PC is essentially just a stream of bytes, and a byte can mean different things to different applications. The Console tries to display the byte by looking up which character maps to a byte. In ASCII, the letter H is mapped to byte 48h or 72 decimal. So if you want to display the letter H, you need to increase the value 72 times.

The Hello World example uses the while loop to save a lot of +. Also instead of using 13 Bytes (The Letters H, e, l, l, o, [space], W, o, r, l, d, !, [line break]), the example only uses 5 bytes by reusing existing memory locations. Look up the values for each letter in the ASCII table and it should become clear how this works. The Wikipedia Article contains a lot more explanations.

Now, before I started building a real compiler, I first wanted to see how this would work and could be implemented, so I started with an interpreter in C# that reads text from one TextBox and outputs results into another one (no support for the , instruction here though). The code for that is somewhat naive, but it helps understanding how this could work:

byte[] memory = new byte[Int16.MaxValue];
Int16 memoryPointer = 0;

private void Execute(char[] code)
{
    int instructionPointer = 0;
    while (instructionPointer < code.Length)
    {
        var currentCommand = code[instructionPointer];
        instructionPointer++;
        switch (currentCommand)
        {
            case '>':
                memoryPointer++;
                break;
            case '<':
                memoryPointer--;
                break;
            case '+':
                memory[memoryPointer]++;
                break;
            case '-':
                memory[memoryPointer]--;
                break;
            case '.':
                // Weird casting because AppendText wants a string, but we have to
                // convert the byte to a char first.
                textBox2.AppendText(((char)memory[memoryPointer]).ToString());
                break;
            case ',':
                // read 1 byte of input - not implemented
                break;
            case '[':
                int currentIndex = instructionPointer;
                int bracketCounter = 1;
                while (bracketCounter > 0 &&
                       instructionPointer < code.Length &&
                       code[instructionPointer] > 0)
                {
                    if (code[instructionPointer] == '[') bracketCounter++;
                    if (code[instructionPointer] == ']') bracketCounter--;
                    instructionPointer++;
                }
                if (bracketCounter == 0)
                {
                    // Change previous ] to \0 so that the recursive call
                    // returns at the end of the block
                    code[instructionPointer - 1] = '\0';
                    while (memory[memoryPointer] > 0)
                    {
                        Execute(code.Skip(currentIndex).ToArray());
                    }
                    code[instructionPointer - 1] = ']';
                    break;
                }
                break;
            case ']':
                // The ] bracket is normally handled as part of the [ case above
                throw new InvalidOperationException("Unbalanced Brackets");
            case '\0':
                return;
        }
    }
}

First, we declare our memory. As this is .net and I didn't want to use unsafe code/pointers, I'm declaring an Array of Bytes. The total size is 32767 bytes. Then I declare a memoryPointer which is just an index into the memory array (it's not a pointer in the traditional sense, I just called it that).

Then we have the Execute method, which takes an array of chars - the BF code. We iterate through this array and perform the appropriate action for each instruction. This should be really straightforward, except for the [, ] and \0 cases. If we encounter a [, we scan forward to find the matching ], stopping early if the instruction pointer hits the end of the code or a 0-byte. The bracketCounter is there to make sure our brackets are balanced - you can nest brackets, but they have to be balanced. [++[++]++ is illegal code as the first [ does not have a matching ]. As soon as our bracketCounter hits 0 again, we stop this loop. The important side effect of this: the instructionPointer is now just past the matching ].

We replace the ] with a 0-byte and recursively call the Execute function with the portion of the code after the [ (that's the job of the currentIndex variable: remember where the [ was). The recursive call will return as soon as it hits the 0-byte. The while-loop then repeats this recursive call until the memory at the pointer is 0. After the loop, we put the ] back in place.
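As an aside, the recursion and the array copies can be avoided by working out the matching brackets once, up front. This is a different technique from the code above, not a cleanup of it: a minimal sketch that assumes the same 32767 bytes of memory and returns the output as a string instead of writing to textBox2. The names JumpTableInterpreter, BuildJumpTable and Run are invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class JumpTableInterpreter
{
    // For every [ stores the index of its matching ], and vice versa.
    static int[] BuildJumpTable(string code)
    {
        var jumps = new int[code.Length];
        var open = new Stack<int>();
        for (int i = 0; i < code.Length; i++)
        {
            if (code[i] == '[')
            {
                open.Push(i);
            }
            else if (code[i] == ']')
            {
                if (open.Count == 0) throw new InvalidOperationException("Unbalanced Brackets");
                int start = open.Pop();
                jumps[start] = i;
                jumps[i] = start;
            }
        }
        if (open.Count > 0) throw new InvalidOperationException("Unbalanced Brackets");
        return jumps;
    }

    public static string Run(string code)
    {
        var jumps = BuildJumpTable(code);
        var memory = new byte[Int16.MaxValue];
        int pointer = 0;
        var output = new StringBuilder();
        for (int ip = 0; ip < code.Length; ip++)
        {
            switch (code[ip])
            {
                case '>': pointer++; break;
                case '<': pointer--; break;
                case '+': memory[pointer]++; break;
                case '-': memory[pointer]--; break;
                case '.': output.Append((char)memory[pointer]); break;
                // Jump onto the matching bracket; the ip++ of the for loop
                // then moves on to the instruction after it.
                case '[': if (memory[pointer] == 0) ip = jumps[ip]; break;
                case ']': if (memory[pointer] != 0) ip = jumps[ip]; break;
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        const string hello = "++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.";
        Console.Write(Run(hello)); // Hello World! followed by a line break
    }
}
```

The two bracket cases map directly onto the descriptions of [ and ] from the language definition, which is why I find this version easier to compare with the spec. The recursive version above is closer to how a while loop nests, and that is the view we need for the compiler.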

If you execute this in a WinForms application, your textBox2 should display Hello World!.

Phew, I hope you could follow me all the way down here and got an understanding of how BF works. Tomorrow we will translate the BF code into C# code and look at the generated IL, and then I will post articles explaining what each of these instructions does, how IL OpCodes work, what the fact that .net is Stack based actually means, and then we will have a real BF -> .net Compiler.

Expect a fun little series 🙂

Do we need an open source alternative to Reflector?

I just started .net Reflector and got the update prompt:

Nothing special here, except that both options are bad. Clicking Yes gives me this window:

And clicking No prevents Reflector from starting. So basically I can’t use Reflector unless I download a new version manually, disconnect my PC from the internet or otherwise suppress the update check. This got me thinking about the Status of .net Reflector.

It’s an excellent tool, to the point that it’s almost essential for many developers. Lutz Roeder did a fantastic job with it, and Red Gate is continuing to improve it. But the tool is not open source, and the message above made me realize again that it can be shut down at any time. Of course, Red Gate has promised to keep a free version and I have no reason to think they are not truthful, but at the same time I’ve seen many companies shut down free products for whatever reason. This would be absolutely their right, but it got me thinking: Is .net Reflector “too big to fail”?

Should there be an Open Source Alternative? Or am I too heavily biased (being a SharePoint developer) and overestimating its importance?

Experimental Async support for RestSharp

This article is outdated and based on my very first, experimental patch. John Sheehan made some massive improvements, so I'm only leaving this article online for reference about the traditional ASyncPattern. Have a look at RestClient.ExecuteAsync in newer versions of RestSharp.

I have just committed a change to my ASync branch on my fork of RestSharp on GitHub. I want to do some more testing first before I ask John Sheehan to pull in the change. The new files are IAsyncRestClient.cs and AsyncRestClient.cs which derive from their normal (synchronous) implementations. Edit: This change was merged into the official RestSharp source code and is documented on the RestSharp Wiki. There was a major bug fix to the functionality on April 18, 2010.

Anyway, here are the examples on how to use it. I assume you know what the state parameter on an Async call does, if not just set it to null.

var client = new AsyncRestClient(serviceUrl);
var request = new RestRequest(Method.GET);

// Synchronous Execution still works - AsyncRestClient derives from RestClient
var syncResponse = client.Execute(request);
Console.WriteLine(syncResponse.Content);

// But Asynchronous Execution is much nicer!
// Method 1: Waiting for an Asynchronous Call with EndExecute
var asyncRes1 = client.BeginExecute(request, null, "some state, can be null");
var responseAsync1 = client.EndExecute(asyncRes1);
Console.WriteLine(responseAsync1.Content);

// Method 2: Polling for Asynchronous Call Completion
var asyncRes2 = client.BeginExecute(request, null, "some state");
while (!asyncRes2.IsCompleted)
{
    Console.Write(".");
    System.Threading.Thread.Sleep(100);
}
var responseAsync2 = client.EndExecute(asyncRes2);
Console.WriteLine(responseAsync2.Content);

// Method 3: Waiting for an Asynchronous Call with WaitHandle
var asyncRes3 = client.BeginExecute(request, null, "some state");
asyncRes3.AsyncWaitHandle.WaitOne();
var responseAsync3 = client.EndExecute(asyncRes3);
Console.WriteLine(responseAsync3.Content);

// Method 4: Using a Callback Method
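static void Main(string[] args)
{
    var asyncRes4 = client.BeginExecute(request, EndResponse, "some state");
    // Keep the process alive until the callback has run
    Console.ReadLine();
}

static void EndResponse(IAsyncResult res)
{
    // AsyncResult lives in System.Runtime.Remoting.Messaging
    var result = (AsyncResult)res;
    // AsyncDelegate is the delegate the asynchronous call was invoked on
    var caller = (RequestExecuteCaller)result.AsyncDelegate;
    var response = caller.EndInvoke(res);
    Console.WriteLine("AsyncResult: " + response.Content);
}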

// This also works for the Execute<T> method, without callback...
var asyncResT = client.BeginExecute<MyDTOClass>(request, null, "some state");
var responseAsyncT = client.EndExecute<MyDTOClass>(asyncResT);
Console.WriteLine(responseAsyncT.Data.SomePropertyInMyClass);

// ...and with Callback
static void Main(string[] args)
{
    var asyncResTC = client.BeginExecute<MyDTOClass>(request, EndResponse, "some state");
    Console.ReadLine();
}

static void EndResponse(IAsyncResult res)
{
    var result = (AsyncResult)res;
    var caller = (RequestExecuteCaller<MyDTOClass>)result.AsyncDelegate;
    var response = caller.EndInvoke(res);
    Console.WriteLine("AsyncResult: "+response.Data.SomePropertyInMyClass);
}
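If the state parameter is new to you: whatever object you pass as the last argument of a Begin... call simply comes back on the IAsyncResult as AsyncState. The sketch below shows that with Stream.BeginRead from the framework rather than with RestSharp, so it assumes nothing about the AsyncRestClient API; AsyncStateDemo and ReadWithState are made-up names:

```csharp
using System;
using System.IO;
using System.Text;

static class AsyncStateDemo
{
    // Reads the text back through the Begin/End pattern and reports
    // the state object that the IAsyncResult carried along.
    public static string ReadWithState(string text, object state, out object seenState)
    {
        var buffer = new byte[text.Length];
        using (var stream = new MemoryStream(Encoding.ASCII.GetBytes(text)))
        {
            IAsyncResult res = stream.BeginRead(buffer, 0, buffer.Length, null, state);
            seenState = res.AsyncState;      // the same object we passed in
            int read = stream.EndRead(res);  // blocks until the read is done
            return Encoding.ASCII.GetString(buffer, 0, read);
        }
    }

    public static void Main()
    {
        object seen;
        string content = ReadWithState("hello", "some state", out seen);
        Console.WriteLine(content); // hello
        Console.WriteLine(seen);    // some state
    }
}
```

In a callback you would read res.AsyncState the same way, which is handy for telling several outstanding requests apart.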