Careful with SPContext.Current…

...as it will be null within a Timer Job or Workflow. I have some shared Data Access classes that use SPContext.Current.Web all over the place, and now that I want to use them from within a Timer Job, I have to refactor them to take an SPWeb as a parameter...

Why doesn’t Windows offer a working help system anymore?

If you are developing Windows Desktop applications, you may want to offer context-sensitive help, triggered either by pressing F1 or by clicking on the question mark icon in the title bar and on an element. Back in the old days (starting in 1990 and de-facto ending in 2006), there was WinHelp.

Now, WinHelp wasn't exactly beautiful and in recent years (after 1996 that is), the "Maximize Database Size" dialog was downright stupid, but WinHelp had all the features a Help system needs: Articles are organized in Chapters and can contain Images, Links and basic formatting. And it allowed your application to open a specific page, providing contextual help.

But most importantly, WinHelp just WORKS. Really, you press F1 and maybe you have to "Maximize Database Size" once, but then it opens. I never ever had a problem with WinHelp.

But Microsoft decided it wasn't modern anymore. That we needed something new. Granted, WinHelp clearly showed its age, and creation of Help Files was somewhat complicated. So they introduced Compiled HTML Help, or CHM. It is a modern Help system, allowing you much more freedom with your layout and styling. It's a really good format, with one tiny little problem: CHM doesn't actually work:

Turns out that CHM is displayed through the MSHTML Control (which is essentially an embedded Internet Explorer) and thus it has some security limitations. The most important one is that CHM files on non-trusted (e.g., network) locations simply don't work.

Now, you may say that this can be resolved. The file can be unblocked, or the path can be set to trusted. An Application Installer could do that. I reply: Doesn't matter. It's a Help system. It has to work without configuration. Press F1, get help. If I'm in a situation where I need help using my application and my help system tells me that it wants some treats first, it's a failure. Besides, not every Application has an installer because not every application needs one. A large number of applications are just DLLs (like the one the above screenshot is from) or ZIPped application files.

So CHM is a complete and utter failure, and Microsoft at least acknowledged that by killing off Microsoft Help 2 and starting a new approach with MAML. However, MAML is not a Help System, it's a language that can be used as source to be converted into an output format like HTML, RTF or whatever. In other words, Microsoft has created DocBook again without actually solving the problem of displaying help.

The real successor to CHM seems to be the HelpPane introduced in Windows Vista and included in Windows 2008 and 7 as well. Those help files have the extension h1s and a nice little icon, so Windows knows what they are. There is our new Help system, right? Well, try to double click one of those h1s files...

Hmmm... So Microsoft didn't just register a file type handler for h1s files. Well, can't be that hard to do, can it?

AP Help - Guided Help - Technical FAQ

Can I launch Guided Help through other means besides the Help Pane?
Yes, but you must create and publish the Guided Help topic through Help. Once you have a Guided Help topic compiled into an H1S file and installed (at this stage only possible for Microsoft and OEM's), you can launch it directly through a command line if you wish.
The syntax is:

%systemroot%\system32\acw.exe -Extensions GuidedHelp.dll -taskID mshelp://windows/?id=id-of-your-help-topic -ExecutionMode DoIt | ShowMe

For a fast impression copy following text to your run dialog:

%SystemRoot%\System32\ACW.exe -Extensions GuidedHelp.dll -taskID mshelp://windows/?id=3726934c-1315-4c29-bd4d-e42c10225e5a -ExecutionMode ShowMe

Excuse me, but ARE YOU FRIGGIN' KIDDING ME? Oh, yes you are, let me just quote Microsoft:

Microsoft is committed to providing Help and Support technology in the Windows® platform and will continue to investigate new solutions for software developers.

Sorry, but if "committed" means "Killing off perfectly working solutions and replacing them with a plethora of broken solutions every two years" then you are absolutely right, because that's what you are doing. WinHelp survived 16 years, and if you still shipped it with Vista and 7, it would still be alive.

So you as an application developer, what can you do? WinHelp isn't part of Vista and Windows 7 anymore and you're not allowed to distribute it with your application. CHM/H1S doesn't work. What are your alternatives?

Some applications use PDF. PDFs offer rich layout and a Table of Contents, however there is no standard reader. Sure, there is Adobe Reader, but you can't easily control it (e.g., open a PDF on a given page) - if the user has a version that is too old or too new for your application, you may run into issues. And if the user doesn't have Adobe Reader (or any other PDF reader) installed, you have to explain why someone would download an additional program just because you're not competent enough to include help. So PDF is not an option.

What about HTML Files? Everyone has a browser, even the short-lived Windows 7 E Editions included MSHTML, allowing you to at least display HTML within an application. The major downside of HTML is that you can't control which browser displays it, so you have to stay conservative and make sure old Internet Explorer or Firefox browsers display it (say goodbye to transparent PNGs...). JavaScript may be tricky (also due to widespread Extensions like NoScript). And instead of one help file, you have a whole folder. Adding contextual help to your application is somewhat possible, but overall you simply lose the ability to control and test how the help looks and works.

This is possibly the moment where you expect me to say "But after researching all these non-working options, here is the one that works!". Sorry, can't do that. I don't know a single Help system that works on Windows Vista/7/2008. I asked on StackOverflow a long time ago and the consensus was the same.

It's really sad that a task that seems so simple and straightforward is too hard for Microsoft. Seriously, all that you need to do is to take a simple container format, some basic formatting options, the ability to link and embed images and an API to call Help from your application. If you want, include video support with a standard codec (keeping in mind Windows N/KN Editions).

Simple, easy, straightforward, hassle-free, or in other words: Exactly how a Help System should work. Exactly how WinHelp worked since 1990 before it was brutally murdered. Rest in Peace, WinHelp, we miss you dearly.

Dealing with Multiple Time Zones in SharePoint 2010

Organizations that deploy SharePoint farms often have employees in different countries, or at least in different Time Zones. While people in the US (which spans 4 time zones) are pretty comfortable with translating between time zones all the time, the same cannot be said for everyone. Trying to translate between Pacific Time and Central European Time is just painful, especially since daylight saving time starts and ends on different dates.

With SharePoint 2010 you get the tools to convert the time according to the user's time zone. There are two types of Regional Settings: Each Site (SPWeb) has RegionalSettings that specify the Time Zone (and Locale, Calendar etc.) for that site. This is useful if you have sites that are predominantly used by people in one time zone. The second type of Regional Settings is the one the user (SPUser) can set (My Settings - My Regional Settings). Those are the same settings as the ones on SPWeb, but each user can specify their own setting.

When storing Dates in code, you have two options:

  • Store the time in local time of the Web and use DatesInUtc = true on a SPQuery to get it back as Utc
  • Store the time in Utc and do not use DatesInUtc on SPQuery

What does that mean? As said, each SPWeb has its own Regional Settings. Let's assume you have a date of 2010-06-14 15:00:00.

If the TimeZone of the SPWeb is Pacific Time (UTC-8, or UTC-7 while daylight saving time is in effect, as it is in June) and you query the List using SPQuery, you get back this date. If you however set DatesInUtc = true on the SPQuery, you get back 2010-06-14 22:00:00. SharePoint doesn't know if 15:00:00 was already UTC, so using DatesInUtc may translate a date twice.

The caveat here is that when storing dates, you would normalize them either to UTC or to the Local Time of the Web. What would you do if some employee from Texas (which runs on Central Time, UTC-6, or UTC-5 during daylight saving time) enters 2010-06-14 15:00:00? You would need to store it either as Pacific Time (so the time becomes 13:00:00) or as UTC (20:00:00).
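The normalization step can be tried outside of SharePoint. The following is only a sketch using System.TimeZoneInfo instead of the SharePoint RegionalSettings API; the class name, the zone IDs and the fixed June offsets (UTC-7 for Pacific, UTC-5 for Central) are assumptions made for illustration, chosen so that the result does not depend on the time zone database of the machine.

```csharp
using System;

public static class TimeZoneSketch
{
    // Hypothetical fixed-offset zones: June (daylight saving) offsets only.
    public static readonly TimeZoneInfo Pacific = TimeZoneInfo.CreateCustomTimeZone(
        "Sketch/PacificJune", TimeSpan.FromHours(-7), "Pacific (June)", "Pacific (June)");

    public static readonly TimeZoneInfo Central = TimeZoneInfo.CreateCustomTimeZone(
        "Sketch/CentralJune", TimeSpan.FromHours(-5), "Central (June)", "Central (June)");

    // Normalize a wall-clock time entered in a given zone to UTC for storage.
    public static DateTime ToUtc(DateTime local, TimeZoneInfo zone)
    {
        // Kind must be Unspecified, otherwise ConvertTimeToUtc rejects the value.
        DateTime unspecified = DateTime.SpecifyKind(local, DateTimeKind.Unspecified);
        return TimeZoneInfo.ConvertTimeToUtc(unspecified, zone);
    }

    // Convert a stored UTC value to the wall-clock time of a zone for display.
    public static DateTime FromUtc(DateTime utc, TimeZoneInfo zone)
    {
        DateTime asUtc = DateTime.SpecifyKind(utc, DateTimeKind.Utc);
        return TimeZoneInfo.ConvertTimeFromUtc(asUtc, zone);
    }
}
```

With these assumed offsets, TimeZoneSketch.ToUtc applied to 2010-06-14 15:00:00 in the Pacific zone gives 22:00:00, and converting 15:00:00 entered in the Central zone to UTC and back to Pacific via TimeZoneSketch.FromUtc gives 13:00:00. Going through UTC exactly once in each direction is what avoids the double translation.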

Needless to say, I prefer to store all dates as UTC if the list isn't visible to the user directly. Then when querying the list through Code, I can just convert the time to whatever the user's timezone is:

var user = SPContext.Current.Web.CurrentUser;
// Always perform a Null-Check on SPUser.RegionalSettings
if (user.RegionalSettings != null)
{
    return user.RegionalSettings.TimeZone.UTCToLocalTime(listDateUtc);
}
else
{
    // User didn't set a time zone, so use the one from the Web
    return SPContext.Current.Web.RegionalSettings.TimeZone.UTCToLocalTime(listDateUtc);
}

Overall, the option for people to set their own timezones independently from the SPWeb is a fantastic and long needed addition. On the other hand, it does make dealing with times a bit more complex.

If the list is visible to the user, you may need to normalize the times differently (for example, use user.RegionalSettings.TimeZone.LocalTimeToUTC to convert a user time to UTC and then SPWeb.RegionalSettings.TimeZone.UTCToLocalTime to convert the time to the Web-Time).

If you do build custom pages that make use of the Microsoft.SharePoint.WebControls.DateTimeControl then you can just use UseTimeZoneAdjustment="true" on it to have it automatically convert to UTC and back (SelectedDate will be UTC when accessed through code, but the User's/Web's time when rendered).

A Visual Studio Macro to insert a new Guid

I've been trying to create some SharePoint Content Types and List Definitions recently, and everyone who has done that before knows what you need for that: Guids, and quite a few of them. One for each Field, Feature, Solution... So instead of using GuidGen, I wanted something that inserts a new Guid at the cursor position in the Editor when I press a certain keyboard shortcut.

Luckily, this is rather easy with the Macro Editor. Just create a new Macro/Module and enter this code:

Sub InsertGuid()
    ' "B" formats the Guid with surrounding braces
    Dim newId As String = Guid.NewGuid().ToString("B")
    Dim doc As Document = DTE.ActiveDocument
    Dim textDoc As TextDocument = CType(doc.Object("TextDocument"), TextDocument)
    ' Insert the Guid at the current cursor position
    textDoc.Selection.Insert(newId)
End Sub

You can then go to Tools / Options / Environment / Keyboard and look for the Macro you just created (Macros.MyMacros.SomeModule.InsertGuid) and assign a Keyboard shortcut to it.
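Depending on where the Guid ends up, you may want it with or without braces. As a small illustration in C# (the class and method names are made up for this sketch), these are the standard format specifiers of Guid.ToString that control this:

```csharp
using System;

public static class GuidFormatSketch
{
    // Returns the same Guid in three common textual forms.
    public static string[] Formats(Guid id)
    {
        return new[]
        {
            id.ToString("B"), // braces and hyphens, as used by the macro
            id.ToString("D"), // hyphens only
            id.ToString("N"), // 32 hex digits, no separators
        };
    }
}
```

Changing the "B" in the macro to "D" or "N" would switch the inserted text to one of the other forms.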

Writing a BF Compiler for .net (Part 5: [ and ] – while loops in IL)

The final two commands we're looking at are [ and ]. Their description in the first article was a bit cryptic, [ was described as

Go to the next instruction if the byte at the memory pointer is not 0, otherwise move it past the matching ] instruction

while ] was described as

Go to the instruction after the matching [ if the byte at the memory pointer is not 0, else move it past the ]

In C# code, this is a lot simpler:

// BF Code for this: [-]
while (memory[pointer] > 0)
{
    // Instructions between [ and ]
    // The following instruction is only to have a body
    memory[pointer]--;
}

It's a while-loop. It's important to note that we have to use a pre-test loop, that is a loop that checks the condition before executing the loop (as opposed to a do-while loop which executes the code block at least once and checks afterwards).

So how does a while loop look in .net IL?

// See note below regarding .s suffix on br.s and bgt.s
IL_0000:  br.s       IL_001f
// This is the memory[pointer]-- instruction
IL_0002:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0007:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000c:  ldelema    [mscorlib]System.Byte
IL_0011:  dup
IL_0012:  ldobj      [mscorlib]System.Byte
IL_0017:  ldc.i4.1
IL_0018:  sub
IL_0019:  conv.u1
IL_001a:  stobj      [mscorlib]System.Byte
// This is the while loop
IL_001f:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0024:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_0029:  ldelem.u1
IL_002a:  ldc.i4.0
IL_002b:  bgt.s      IL_0002

GOTO considered harmful?
Okay, this looks complicated, but it is easy. To explain it, we have to open Pandora's Box and look at the dirtiest secret there is in development: At Machine Level, GOTOs are essential.
Ha, take that Dijkstra!

Regardless of how much you abstract it away, control structures like while have to be translated into GOTOs, or more precisely into jumps to addresses from which execution continues. In .net, this is not called GOTO though, it's called Branch.

Our code has three parts: A single GOTO/Branch instruction at the beginning, the body of the loop (in our case the single memory[pointer]-- instruction) and then the while check.

So we start with br.s, which is described as

Unconditionally transfers control to a target instruction (short form).

In other words, this is a GOTO and it goes to IL_001f. The code starting from here does the while-check: Load memory and pointer onto the stack. Then load the value of memory[pointer] onto the stack as Unsigned 8-Bit Int. Afterwards, push the number 0 to the stack.

Our evaluation stack now contains the value of memory[pointer] and the number 0. Then we have the new bgt.s command:

Transfers control to a target instruction (short form) if the first value is greater than the second value.

In other words and Pseudocode: if(memory[pointer] > 0) goto IL_0002;

The code starting from IL_0002 is our memory[pointer]-- instruction which will be executed and then we'll do the while-check again.

In Debug mode, the bgt instruction is not used. Instead, the check is done in a much more complicated way. Feel free to look it up using ILDASM, but in essence a Debug build captures the result of the comparison in a local variable, as in this C# Pseudocode:

bool DoJump = memory[pointer] > 0;
if(DoJump) goto IL_0002;

This is useful for Debugging (who would've thought it, given that it's a debug build?), but rather heavy compared to Release mode (8 instructions and a local variable compared to 5 instructions without).

Looking at that, you can easily imagine what the difference between a while and a do while loop is: The do while loop does not have the br.s instruction at the beginning. It therefore executes the loop body at least once before it reaches the while-check.
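To see the three parts (initial branch, body, check) without ildasm, the same shape can be emitted at runtime. This is only a sketch using DynamicMethod from System.Reflection.Emit, not the compiler this series builds: memory and pointer are passed as arguments instead of being static fields, and the class and method names are made up for the example. It uses the long-form Br and Bgt opcodes with labels, so ILGenerator works out the offsets.

```csharp
using System;
using System.Reflection.Emit;

public static class WhileLoopSketch
{
    // Builds a delegate equivalent to: while (memory[pointer] > 0) memory[pointer]--;
    public static Action<byte[], int> BuildClearCell()
    {
        var method = new DynamicMethod(
            "ClearCell", typeof(void), new[] { typeof(byte[]), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        Label body = il.DefineLabel();
        Label check = il.DefineLabel();

        // Part 1: jump straight to the check (pre-test loop).
        il.Emit(OpCodes.Br, check);

        // Part 2: the body, memory[pointer]--
        il.MarkLabel(body);
        il.Emit(OpCodes.Ldarg_0);               // memory
        il.Emit(OpCodes.Ldarg_1);               // pointer
        il.Emit(OpCodes.Ldelema, typeof(byte)); // address of memory[pointer]
        il.Emit(OpCodes.Dup);                   // keep one copy for stobj
        il.Emit(OpCodes.Ldobj, typeof(byte));   // current value
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Conv_U1);
        il.Emit(OpCodes.Stobj, typeof(byte));   // write back

        // Part 3: the check, if (memory[pointer] > 0) goto body
        il.MarkLabel(check);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldelem_U1);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Bgt, body);

        il.Emit(OpCodes.Ret);
        return (Action<byte[], int>)method.CreateDelegate(typeof(Action<byte[], int>));
    }
}
```

Calling the returned delegate with a byte array and an index sets that one cell to 0 and leaves the others alone. Leaving out the first Br would turn this into the do while shape.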

Before I end this post, I want to talk about short form commands.

What is "Short Form"?
If you look at the IL Commands, some say "Short Form". What does this mean? Well, normally all addresses are 32 Bit, that is 4 Bytes. If you want an unconditional jump, you would use the br command with the target address. However, this means you'll have 5 bytes in the target file - 1 for the Br Instruction and 4 for the target. As this instruction is so common, it would be a massive overhead to always have to write 5 bytes to the file.

Short Form commands only take 1 byte for the target address. The target here is described as

1-byte signed offset from the beginning of the instruction following the current instruction

So the target is given as a relative offset rather than an absolute address (the long form is relative as well, it just uses a 4-byte offset). Of course, the short form only works if the target is within the range of a signed byte, that is -128 to +127 bytes from the following instruction, so it's a lot less flexible and your compiler needs to know the distance between the jump instruction and its target. However, the savings are huge, as the short form only requires 2 bytes, less than half of the full instruction.
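The offset arithmetic is easy to check against the listing above. A minimal sketch (the helper name is made up) that computes the 1-byte offset for a short-form branch at a given address, or reports that the target is out of range:

```csharp
using System;

public static class ShortBranchSketch
{
    // A short-form branch is 2 bytes: 1 for the opcode, 1 for the offset.
    private const int ShortBranchSize = 2;

    // The offset counts from the start of the instruction FOLLOWING the branch.
    public static bool TryGetShortOffset(int branchAddress, int targetAddress, out sbyte offset)
    {
        int relative = targetAddress - (branchAddress + ShortBranchSize);
        offset = 0;
        if (relative < sbyte.MinValue || relative > sbyte.MaxValue)
        {
            return false; // too far away, the long form is needed
        }
        offset = (sbyte)relative;
        return true;
    }
}
```

For the br.s at IL_0000 targeting IL_001f this gives 0x1f - 0x02 = 29, and for the bgt.s at IL_002b targeting IL_0002 it gives 0x02 - 0x2d = -43, a backwards jump.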

This concludes the command overview. Part 6 will finally show how we will write our compiler.

Writing a BF Compiler for .net (Part 4: . and ,)

In the last two parts we tackled 4 of the 8 possible BF Commands: >, <, +, -. Now we look at . and , for input and output.

When working with a Console Application, it only makes sense to use the built-in commands Console.Write and Console.Read.

Let's look at output (the . command) first. The C# code we're converting is a one-liner:

Console.Write((char)memory[pointer]);

As memory[pointer] is a byte, we have to cast it to char to write it to the console. In IL, the line looks like this:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  ldelem.u1
IL_000b:  call       void [mscorlib]System.Console::Write(char)

We already know what the two ldsfld commands do: They load the static field onto the evaluation stack. Now, ldelem.u1 is a new command, and this is where our cast to char happens. To quote the Documentation:

Loads the element with type unsigned int8 at a specified array index onto the top of the evaluation stack as an int32.

In other words, ldelem.u1 expects to load an Unsigned 8-Bit Integer which is a byte. You may wonder where the cast is, as char is a 16-Bit Unsigned Integer (=UTF-16 Unicode Character). So how is that possible? Well, ECMA-335 contains the answer in Partition III, Section 1.6 Implicit argument coercion:

While the CLI operates only on 6 types (int32, native int, int64, F, O, and &) the metadata supplies a much richer model for parameters of methods. When about to call a method, the CLI performs implicit type conversions, detailed in the following table.

Translation: If the Parameter on the Stack is an int32 (which it is according to ldelem.u1), then the CLI will implicitly convert it to char if calling a method that wants a char.

The method call itself is then simply a call to static method Write in class System.Console in assembly mscorlib which returns void. The arguments to the method are taken from the evaluation stack. If a method takes multiple arguments, they have to be pushed in the correct order: First argument first.
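The implicit coercion can be observed by emitting the same three-step sequence at runtime. As before, this is only a sketch with DynamicMethod: memory and pointer are arguments rather than static fields, and the names are invented for the example. Note that there is no conversion instruction between ldelem.u1 and the call.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class OutputSketch
{
    // Builds a delegate equivalent to: Console.Write((char)memory[pointer]);
    public static Action<byte[], int> BuildWriteCell()
    {
        MethodInfo write = typeof(Console).GetMethod("Write", new[] { typeof(char) });

        var method = new DynamicMethod(
            "WriteCell", typeof(void), new[] { typeof(byte[]), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);     // memory
        il.Emit(OpCodes.Ldarg_1);     // pointer
        il.Emit(OpCodes.Ldelem_U1);   // memory[pointer], on the stack as int32
        il.Emit(OpCodes.Call, write); // int32 is implicitly narrowed to char
        il.Emit(OpCodes.Ret);
        return (Action<byte[], int>)method.CreateDelegate(typeof(Action<byte[], int>));
    }
}
```

Invoking the delegate with a byte array containing 72 at the given index writes H to the console.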

That's the . command: Get the value of memory[pointer], call Console.Write with it. What about the , command to read a character?

In C#, this is again a one-liner:

memory[pointer] = (Byte)Console.Read();

while in IL this is a few lines more:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  call       int32 [mscorlib]System.Console::Read()
IL_000f:  conv.u1
IL_0010:  stelem.i1

Once again, we start by loading our array and the index into it onto the stack. However, we are not doing anything with them right now. Instead, we call Console.Read which returns an int32. According to the documentation of the call command, The return value is pushed onto the stack.

So now our stack contains three values: The array, the current index and the return value of Console.Read (as Console.Read doesn't take parameters our memory & pointer are still on the stack). conv.u1 takes the Int32, converts it to UInt8 (that's the cast to byte in the method) and puts it on the stack again.

stelem.i1 is a new command:

Replaces the array element at a given index with the int8 value on the evaluation stack.

So this pops off the value, the index and the array and replaces the element. This is equivalent to calling ldelema, followed by the operation that pushes the new value to the stack, followed by stobj, but it only takes one instruction if the correct values are on the stack.

In Part 5, I'll finish the command introduction with the [ and ] command (explaining how a while-loop works) and then we finally build our compiler!

How the Async support in RestSharp can help with Report Generation

Note: This was written a long time ago for the then-current version of RestSharp that had experimental Async support. John and his contributors have updated RestSharp tremendously since then, but by now these samples are outdated and only here for illustrative purposes.

I've been using RestSharp in the past weeks for some backend tools, mainly because REST is easy to implement and I like the Deserialization support that comes with it. One of the reasons I wanted to add async support is because I wanted to write a monitoring application.

I am using Microsoft Chart Controls, and I have a server that accepts REST Requests in the form /reports/health/{entity} to return something like this:

<healthdata>
    <healthindex>73</healthindex>
</healthdata>

I don't know which entities are being pinged at compile time. I can't even put them in a config file, as someone might add or remove entities while the application is still running. So I decided to have an architecture with a Timer, a Request Dispatcher and a Response Handler.

The Timer executes every 5 seconds, gets the current list of entities and calls the Request Dispatcher:

private void timer1_Tick(object sender, EventArgs e)
{
    foreach(string entity in GetEntities()){
        DispatchRequest(entity);
    }
}

The Request Dispatcher creates a new RestRequest and a custom State Object. The state object will be important later, for now let's just note that it contains the entity name and the time the request was sent.

private void DispatchRequest(string entityName)
{
    var rq = new RestRequest("health/{entity}", Method.GET);
    rq.AddParameter("entity", entityName, ParameterType.UrlSegment);

    var state = new ServiceState {Entity = entityName, RequestDateUtc = DateTime.UtcNow};
    _client.BeginExecute<healthdata>(rq, HandleResponse, state);
}

The Request Dispatcher doesn't care what happens afterwards, it just happily fires new requests all the time and tells BeginExecute to call HandleResponse afterwards.

private void HandleResponse(IAsyncResult res)
{
    // Note: This assumes the response is always good. In reality, you want to
    // check response.ResponseStatus and act accordingly
    var result = (AsyncResult) res;
    var caller = (RequestExecuteCaller<healthdata>) result.AsyncDelegate;
    var state = res.AsyncState as ServiceState;
    var response = caller.EndInvoke(res);
    AddDataToChart(state.Entity, response.Data.healthindex, state.RequestDateUtc);
}

Here our state object comes into play. If you look at the XML from the server above, you see that it doesn't echo the Entity Name. As HandleResponse handles all responses, it has to know which response corresponds to which server. Also, async operations can complete in a different order than the one in which they were started - I may send requests for entities s1, s2, s3, s4 and receive them back in the order s3, s1, s4, s2. If the timer interval is too small and the server latency is too high, I might even receive the response to a later request for s2 before the response to an earlier one - it's pure anarchy.

That's why I've passed a state object to BeginExecute, as this allowed me to capture the entity name and the exact Date of the request. So all that HandleResponse has to do is to get the state and healthdata and call the function that adds the data to the chart with precise information about the Time, the Entity and its health.

All without ever locking the UI or without giving the dispatcher or the handler too much work that they shouldn't be doing.


(Excuse the horrible colors, I'm only a developer :) )

How Optional Parameters work, why they can be dangerous, and why they work in .net 2/3 as well

One of the changes I really like in Visual Studio 2010 is optional parameters for methods. Basically they allow you to specify a default value for each parameter and thus reduce the number of overloads of a method. Make sure to read to the end as there is a huge word of caution regarding optional parameters.

Instead of having this:

private static string SomeMethod(int value)
{
    return SomeMethod(value, "Was Empty");
}

private static string SomeMethod(int value, string data)
{
    return string.Format("{0}: {1}", value, data);
}

You can just have this:

// static because it's a console app, no extra magic
private static string OptionalMethod(int value, string data = "Was Empty")
{
    return string.Format("{0}: {1}", value, data);
}

The nice thing about this feature is that it also works with .net 2/3. Why? Because it is implemented like this:

private static string OptionalMethod(int value,
                          [Optional, DefaultParameterValue("Was Empty")] string data)
{
    return string.Format("{0}: {1}", value, data);
}

Looks like a normal method (the longest overload) with some attributes. Are those attributes the magic? No, it's much simpler.
If this is your calling code:

Console.WriteLine(OptionalMethod(1));
Console.WriteLine(OptionalMethod(2,"Test"));

Then this is how it looks in the compiled assembly:

Console.WriteLine(OptionalMethod(1, "Was Empty"));
Console.WriteLine(OptionalMethod(2, "Test"));

As you see, the compiler changes the method call to put in all optional parameters into the caller. This explains why this works with pre-.net 4 applications. However, this also explains the huge warning I want to give:

Changing the optional parameter value will not change the behavior of your callers!

Imagine you want to change the value to "No value specified". With the overload solution this is trivial:

private static string SomeMethod(int value)
{
    return SomeMethod(value, "No value specified");
}

Recompile your provider assembly and every consumer who uses that overload will get the new value. However, with optional parameters you have to re-compile each and every consumer assembly as well, so changing an optional parameter is a big, breaking change.
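One possible way to soften this (a sketch, not something from the original comparison above) is to bake only a sentinel such as null into the callers and resolve the real default inside the method. The class name is made up; the method mirrors OptionalMethod from earlier.

```csharp
using System;

public static class OptionalSketch
{
    // Callers compiled against this method only get "null" copied into
    // their call sites. The actual fallback text lives in the method body,
    // so changing it does not require recompiling consumer assemblies.
    public static string OptionalMethod(int value, string data = null)
    {
        return string.Format("{0}: {1}", value, data ?? "Was Empty");
    }
}
```

The trade-off is that null can no longer be passed as a meaningful value of its own, and the default is not visible in the signature anymore.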

Does that mean that Optional Parameters are evil and should be avoided at all costs? Some people might think so, but I think they are useful for variables that almost never change. Think about a function like SendMail(string server, int port = 25).

However, I do object that ReSharper 5 offers the hint to introduce Optional Parameters too lightly:

Edit: I gave the ReSharper guys bad credit. If the method is declared as public, it does not offer that hint. Sorry guys for not double checking that! This then makes complete sense, as Optional Parameters are fine and safe for private/internal members.

I know too many people who blindly follow any advice that ReSharper gives and who might see optional parameters as "Ohh, Shiny, no more Overloads!". Of course, you can argue that developers should know their stuff and deserve to fall hard on their face once they have rolled it out as a big API and suddenly realize they have to recompile/redeploy dozens or hundreds of consumer assemblies because of a change. I am however not a fan of this type of developer elitism and would have preferred it if ReSharper did not show that hint by default.

Just my 2 cents though.

Writing a BF Compiler for .net (Part 3: pointer++ and pointer--)

The last article gave an introduction to the concepts and we started looking at the memory[pointer]++/-- functions. Today is a quick post to look at two other instructions, pointer++ and pointer-- or > and < in BF.

Unsurprisingly, these are extremely simple operations, requiring only 5 IL Commands:

  IL_0000:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
  IL_0005:  ldc.i4.1
  IL_0006:  add
  IL_0007:  conv.i2
  IL_0008:  stsfld     int16 BFHelloWorldCSharp.Program::pointer

ldsfld loads a static field onto the evaluation stack, ldc.i4.1 pushes the number 1 to the stack, add takes the two values from the stack and pushes the result back. conv.i2 converts the value on the stack (which is an Int32) to Int16 (2-Byte Int), pads it to be Int32 (as that is the smallest datatype possible on the stack) and pushes it back. stsfld then replaces the value in the static variable with the value from the stack.

As usual, pointer-- works the exact same way with the difference of using sub instead of add. One word of note: sub subtracts value2 from value1. While the order doesn't matter for add, it does for sub. Also note that add and sub do not detect overflow, so you can happily add 1 to Int16.MaxValue. If you want overflow checking, there is sub.ovf and add.ovf which throw an OverflowException.
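The difference is easy to reproduce from C#, where the unchecked and checked keywords select between the wrapping and the overflow-checking behavior. A minimal sketch with invented names:

```csharp
using System;

public static class OverflowSketch
{
    // Like add followed by conv.i2: silently wraps around.
    public static short IncrementWrapping(short pointer)
    {
        return unchecked((short)(pointer + 1));
    }

    // Overflow-checked variant: throws OverflowException when the
    // result no longer fits into an Int16.
    public static short IncrementChecked(short pointer)
    {
        return checked((short)(pointer + 1));
    }
}
```

IncrementWrapping(Int16.MaxValue) returns Int16.MinValue, while IncrementChecked(Int16.MaxValue) throws. For a BF pointer into a fixed-size memory array, either behavior still leaves you with an index that has to be valid for the array.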

With the 4 easy operations done, we will look at . and , next for input and output. Finally, we will look at [ and ]. After we have looked at the IL for each operation, we will write our compiler.

Writing a BF Compiler for .net (Part 2: Writing BF in C# and looking at the IL)

In the last posting, we looked at BF as a language and how to write an interpreter in C#. The point of doing that was to understand what each function actually does. As I am no IL Expert, my second step was to write the Hello World BF Application as C#, that is one extremely long function with each BF command spelled out. You can find the full function here. This is an Excerpt:

static void Main(string[] args)
{
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    memory[pointer]++;
    while(memory[pointer] > 0)
    {
        pointer++;
        memory[pointer]++;
        memory[pointer]++;
        memory[pointer]++;
        // snipped
    }
    // snipped
}

As you see, we literally translated each instruction into a C# instruction. Compile this in Debug Mode, load it up in ildasm:

Whoa, now we're talking! Note that the ildasm output will look different between Debug and Release modes due to optimizations. I prefer Debug to get an understanding of how this works, and then later on Release mode to see the differences. The point is really to understand how this stuff looks in IL as we have to write IL ourselves later on.

So the first C# instruction is memory[pointer]++, which is a short way of saying memory[pointer] = memory[pointer] + 1. It is important to note that many language constructs and short-hand commands simply do not exist in IL - they are convenience functions in the C# compiler. Looking at the IL, our first C# instruction is now 9 IL instructions, starting from IL_0001 and ending at IL_0019.

  IL_0001:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
  IL_0006:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
  IL_000b:  ldelema    [mscorlib]System.Byte
  IL_0010:  dup
  IL_0011:  ldobj      [mscorlib]System.Byte
  IL_0016:  ldc.i4.1
  IL_0017:  add
  IL_0018:  conv.u1
  IL_0019:  stobj      [mscorlib]System.Byte

Let's quickly look at each command before I explain in Detail what is happening:

  1. Push the value of a static field 'memory' onto the evaluation stack.
  2. Push the value of a static field 'pointer' onto the evaluation stack.
  3. Load the address of the array element at a specified array index onto the top of the evaluation stack as type & (managed pointer).
  4. Copy the current topmost value on the evaluation stack, and then push the copy onto the evaluation stack.
  5. Copy the value type object pointed to by an address to the top of the evaluation stack.
  6. Push the integer value of 1 onto the evaluation stack as an int32.
  7. Add two values and push the result onto the evaluation stack.
  8. Convert the value on top of the evaluation stack to unsigned int8, and extend it to int32.
  9. Copy a value of a specified type from the evaluation stack into a supplied memory address.

That looks more complicated than it really is, especially if you don't know how CPUs usually work or never worked with Assembler. .net/IL is a Stack Based language, and functions work against this stack. For example, the Add instruction at address IL_0017 does not take parameters. Instead, it removes 2 elements from the stack, adds them, and pushes the result back to the stack. Be aware that most functions assume that the objects on the stack have the correct type - trying to push a string and an Int32 and calling Add will crash the CLR. Not all functions require their parameters on the stack, but many do - you can find an Overview of each function in Partition III of the ECMA-335 standard or in the OpCodes Fields.
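If the stack model is new to you, it can help to play it through by hand. The following sketch simulates the value-handling part of memory[pointer]++ (ldobj, ldc.i4.1, add, conv.u1) with an ordinary Stack<int>. It is a toy model with invented names and ignores the address handling done by ldelema, dup and stobj.

```csharp
using System;
using System.Collections.Generic;

public static class StackSketch
{
    public static byte Increment(byte current)
    {
        var stack = new Stack<int>();

        stack.Push(current);             // ldobj: the byte arrives as an int32
        stack.Push(1);                   // ldc.i4.1

        int value2 = stack.Pop();        // add: pop both operands...
        int value1 = stack.Pop();
        stack.Push(value1 + value2);     // ...and push the result

        stack.Push(stack.Pop() & 0xFF);  // conv.u1: keep the low 8 bits

        return (byte)stack.Pop();        // stobj would store this value
    }
}
```

Increment(41) returns 42, and Increment(255) returns 0 because conv.u1 throws away everything above the lowest 8 bits.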

Anyway, let's dissect the instruction and look at what's happening.
1. Getting memory[pointer]
In order to increment the value at memory[pointer], we first need to get it. This is the job of the ldelema instruction at IL_000b. To quote from the documentation:
Loads the address of the array element at a specified array index onto the top of the evaluation stack as type & (managed pointer).

So this instruction requires an array and an array index, and it gives us a managed pointer to that element. It expects that we push the array and the index onto the stack (in that order). It then pops (removes) these two elements from the stack and pushes the pointer to the array element onto the stack.

The ldsfld function pushes the value of a static field onto the stack (pointer and memory were declared static). So we first push the memory array to the stack, and then the value of the pointer variable.
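If you want to see a managed pointer from the C# side, ref locals (C# 7.0 and later) are the closest equivalent. The sketch below is an assumption-laden illustration, not the code from this series: RefDemo and Increment are hypothetical names, and while taking `ref memory[pointer]` should make the compiler emit ldelema, the exact load and store instructions it picks afterwards may differ from the listing above.

```csharp
using System;

public static class RefDemo
{
    // Increments one array element through a managed pointer (ref local).
    public static void Increment(byte[] memory, short pointer)
    {
        // Taking a ref to an array element yields a managed pointer (type & in IL).
        ref byte cell = ref memory[pointer];

        // Read through the pointer, add 1, write back through the same pointer.
        cell = (byte)(cell + 1);
    }

    public static void Main()
    {
        var memory = new byte[4];
        Increment(memory, 2);
        Console.WriteLine(memory[2]); // prints 1
    }
}
```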

Here is a diagram:

2. Incrementing the value
Now we have a pointer to memory[pointer], but we need to increase the value it points to. That is what the next 4 instructions do. The important instruction is add at IL_0017, which takes two values from the stack and adds them. But at the moment, we only have the managed pointer on the stack. We first need to get the value of memory[pointer] onto the stack. This is what ldobj at IL_0011 does: It takes the address of the value from the stack and pushes the actual value onto it. Note that this is now just a byte - it has no connection to the memory array, it is simply a number.

Before that, you might wonder what the dup instruction does: It simply creates a copy of the topmost value on the stack, effectively duplicating it. This might seem useless for the addition, but it is required later. For now, just ignore it.

Now that we have the current value of memory[pointer], we need the second number to perform the addition. As ++ is just a shorthand for +1, we need to push the number 1 onto the stack. To push an Int32 value onto the stack, you can use the ldc.i4 instruction, which pushes the supplied number. However, there are also shorthand commands for some numbers, and the command ldc.i4.1 pushes the number 1 onto the stack. (I do not know if these commands exist for performance or for code size reasons, although I strongly believe the latter. ldc.i4.1 is a 1-byte command, whereas ldc.i4 requires 5 bytes: 1 for the command and 4 bytes for the Int32 value to push.)
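The size difference can be checked with the OpCodes class from System.Reflection.Emit. This is a small sketch: OpCodeSizes and EncodedSize are hypothetical helpers, and the switch only covers the three operand kinds needed for the ldc.i4 family.

```csharp
using System;
using System.Reflection.Emit;

public static class OpCodeSizes
{
    // Encoded size of an instruction = opcode bytes + inline operand bytes.
    public static int EncodedSize(OpCode op)
    {
        int operandBytes;
        switch (op.OperandType)
        {
            case OperandType.InlineNone:   operandBytes = 0; break; // no operand
            case OperandType.ShortInlineI: operandBytes = 1; break; // 8-bit integer
            case OperandType.InlineI:      operandBytes = 4; break; // 32-bit integer
            default: throw new NotSupportedException(op.Name);
        }
        return op.Size + operandBytes;
    }

    public static void Main()
    {
        Console.WriteLine(EncodedSize(OpCodes.Ldc_I4_1)); // prints 1
        Console.WriteLine(EncodedSize(OpCodes.Ldc_I4_S)); // prints 2
        Console.WriteLine(EncodedSize(OpCodes.Ldc_I4));   // prints 5
    }
}
```

ldc.i4.s sits in between: it takes a single-byte operand, so small constants that have no dedicated shorthand still cost only 2 bytes.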

So now we have the current value of memory[pointer] and the number 1 on the stack, and the Add function pops them off, performs the addition and pushes the result.

A diagram for this part:

Just a slight note: The 1 at the end is not the same 1 as the one before. Add removes 0 and 1 from the stack, "generates a new 1" through performing 0 + 1 and pushes that to the stack.

3. Storing the value in the array again
We performed the addition, but as a result we simply have the number 1 on the stack now - memory[pointer] is still 0 though! The last two instructions take care of saving it back. Now, the number 1 on the stack is an Int32, but we have an array of Bytes. The first instruction, conv.u1 at IL_0018, takes a value off the stack, converts it to an unsigned 8-bit integer (which is exactly what a Byte in .NET is), zero-extends it to an Int32 and pushes it back onto the stack. This might seem silly, but the evaluation stack only holds 4- or 8-byte integers (see Partition III, 1.1 Data Types of ECMA-335). The conv instruction makes sure that the actual data is stored in the "correct" bits (the ones later used to read the value), while the other 24 bits are merely padding to make it an Int32.
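The same widening and narrowing is visible in C# itself: adding a byte and an int yields an int, and the compiler insists on an explicit cast before the result can go back into a byte. The sketch below uses hypothetical names (ConvDemo, StoreBack); the unchecked cast plays the role that conv.u1 plays in the IL.

```csharp
using System;

public static class ConvDemo
{
    // Keeps only the lowest 8 bits of the value, like conv.u1 does.
    public static byte StoreBack(int value)
    {
        return unchecked((byte)value);
    }

    public static void Main()
    {
        byte current = 0;
        int sum = current + 1;               // byte + int is computed as an Int32
        Console.WriteLine(StoreBack(sum));   // prints 1

        byte max = 255;
        Console.WriteLine(StoreBack(max + 1)); // prints 0 (256 does not fit into 8 bits)
    }
}
```

The second call shows why the conversion matters: a cell that holds 255 wraps around to 0 when it is incremented.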

stobj copies the value to the desired memory location. You may now understand why we had the dup instruction at IL_0010: We need to know where to copy the value to, but the ldobj instruction at IL_0011 already removed the address from the stack! So by duplicating the address, we can give it to both instructions (otherwise we would have to do the lookup in the first three lines again).

And that is the memory[pointer]++ C# instruction in IL! I know it looks scary at first, but it's rather simple and logical once you understand how this stuff works internally. Remember, at some point each programming language needs to be translated into very specific CPU instructions. We are not at that level (this is what the CLR and its JITter do), but we are at a fairly low level. Remember, we have to write a compiler for this, so we need to understand a) what is happening and b) why it is happening. Making those diagrams helped me a lot, especially because I didn't understand the use of the dup instruction at first.

Now that we know how this instruction works, we will look at the other 7 BF instructions. I can already tell you how memory[pointer]-- works: It is exactly the same sequence, with the difference of using sub instead of add at IL_0017, so no need for an article of its own.
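As a rough preview of what emitting this sequence could look like, here is a sketch using DynamicMethod and ILGenerator from System.Reflection.Emit. It is not the compiler built in this series: the CellOps class, its fields and the Build method are hypothetical, and a ret is appended because a dynamic method has to return. The only thing that changes between ++ and -- is the arithmetic opcode passed in.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class CellOps
{
    // Stand-ins for the static fields of the original program.
    public static byte[] memory = new byte[8];
    public static short pointer = 0;

    // Emits the nine instructions discussed above, with add or sub in the middle.
    public static Action Build(OpCode arithmetic)
    {
        FieldInfo memoryField = typeof(CellOps).GetField("memory");
        FieldInfo pointerField = typeof(CellOps).GetField("pointer");

        var method = new DynamicMethod("CellOp", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldsfld, memoryField);   // push the array
        il.Emit(OpCodes.Ldsfld, pointerField);  // push the index
        il.Emit(OpCodes.Ldelema, typeof(byte)); // pop both, push element address
        il.Emit(OpCodes.Dup);                   // keep a second copy for stobj
        il.Emit(OpCodes.Ldobj, typeof(byte));   // pop address, push current value
        il.Emit(OpCodes.Ldc_I4_1);              // push 1
        il.Emit(arithmetic);                    // add or sub
        il.Emit(OpCodes.Conv_U1);               // truncate to 8 bits
        il.Emit(OpCodes.Stobj, typeof(byte));   // store to the remaining address
        il.Emit(OpCodes.Ret);

        return (Action)method.CreateDelegate(typeof(Action));
    }

    public static void Main()
    {
        Action increment = Build(OpCodes.Add);
        Action decrement = Build(OpCodes.Sub);

        increment();
        increment();
        decrement();
        Console.WriteLine(memory[pointer]); // prints 1
    }
}
```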

The two pointer operations (pointer++ and pointer--) will be described in the next article.