The one dynamic language I think Microsoft needs to embrace in .net…

So a few days ago, Jimmy Schementi announced the death of IronRuby. Oh, sorry, "IronRuby isn't dead, it's just back in the hands of the community" which is essentially the same.

Now, while there may be a chance for IronRuby to survive, I personally don't think it's something Microsoft should keep investing in, so it makes sense for them to kill it off internally. The same goes for IronPython. I do think the DLR is a great addition to .net, though, and I think there is a far better language already available.

JavaScript.

Many of us perceive JavaScript as a Browser-only language that is hampered by different implementations in different Browsers (textContent vs. innerText anyone?), something which is being addressed through the use of frameworks like jQuery.

But take it one step further: How many languages do you need to write a Web Application? At least two: JavaScript and whatever backend language you use, for example C#, Java, Ruby or PHP. How do you write Form Validation? You write separate Code for the Client Side and for the Server Side, so that users get instant feedback but can't compromise the system by turning JavaScript off. This is stupid and a severe violation of DRY. There are also subtle differences between the two implementations, for example because JavaScript's RegEx Engine works slightly differently from the .net one.

.net is, at its core, a language-independent technology that doesn't just support but encourages using multiple languages. Sure, there are only 3-4 really supported languages, which are rarely mixed (some people mix C++/CLI with C# or VB.net for some COM stuff, or F# with C# for financial/statistical stuff), and some languages that have limited support (Boo is my favorite in this category).

So why not do what's logical and embrace JavaScript as a Server-Side technology? Node.js arrived recently and showed it's possible. Microsoft already somewhat supports JavaScript in Visual Studio (although the experience is far from stellar), and they even had their own bastard child in the form of JScript.net.

There is a big discussion about whether or not C# is a good language to write Web Applications in. People point at Cucumber and talk about how well Ruby works for the Web, while C# feels like a chore because of its static typing, verbose Syntax and the need for IoC Containers to do Unit Testing properly. Other people point to the insecurity of dynamic languages, the lack of compiler errors, and the confusion created by not having to declare variables, which leads to subtle bugs like `$total = 0; foreach(item in items) $totel += item.price`.

I say: Use both. Use C# for your backend code, for your Business Classes. Get Compiler Errors when you screw up. Make sure everything has to be explicitly cast to whatever it needs to be and that all variables have to be declared before use. For the frontend, use JavaScript. Create your Views and View Models like you would create them in the browser, through DOM Manipulation. Write and maintain your validation code exactly once, in one language, for both browser and server side. Feel free to do whatever crazy manipulation you need without having to declare and cast tons of variables. If you screw up, it's "only" your Views, not your data structure, because that's C# code.

Utopia? Maybe. JavaScript is not without its faults, and it may not be as elegant as Ruby in places. People would have to learn two languages, and people may ask "Why should we switch to an insecure language if C# has served us well for years?" Well, the latter are usually the ones who don't think they need Unit Testing and believe that WebForms is a perfectly good technology, so let them continue to use it and let the rest of us move forward to face the challenges of 2010 and beyond.

Customers want better apps. They want AJAX, they want snappy, cool looking UI. The Browser market is not a "Make sure it runs in IE and doesn't suck in Netscape 4.78" market anymore. We need to create cool apps, and we need to run them on Internet Explorer, Firefox, Safari on OS X, Safari on iPhone and Android. Our Web Apps need to do more in less time.

And Microsoft is in a great position because they have the foundation already built. Make Visual Studio a kick-ass JavaScript Development tool and bring it to the server side. Give us great debugging because even with FireBug and the IE Development Tools, it still sucks. Look what Node.js did. Look what you are doing with Internet Explorer 9's JS Engine. Look what you have with Active Scripting and look what you did with JScript.net.

It's incredible to see that Microsoft has all this technology already lying around and that no one had the idea to just combine it all for ASP.net. Sure, Internet Explorer got a lot of (well deserved) crap for issues in its JavaScript implementation, but IE9's previews look really promising.

Because let's face it: Any .net Language not supported by Microsoft in Visual Studio is doomed.

64-Bit Bitfield Cheat Sheet

Just as a cheat sheet for me, a 64-Bit Bitfield in Dec and Hex.

Bit Int Hex
1 1 0x1
2 2 0x2
3 4 0x4
4 8 0x8
5 16 0x10
6 32 0x20
7 64 0x40
8 128 0x80
9 256 0x100
10 512 0x200
11 1024 0x400
12 2048 0x800
13 4096 0x1000
14 8192 0x2000
15 16384 0x4000
16 32768 0x8000
17 65536 0x10000
18 131072 0x20000
19 262144 0x40000
20 524288 0x80000
21 1048576 0x100000
22 2097152 0x200000
23 4194304 0x400000
24 8388608 0x800000
25 16777216 0x1000000
26 33554432 0x2000000
27 67108864 0x4000000
28 134217728 0x8000000
29 268435456 0x10000000
30 536870912 0x20000000
31 1073741824 0x40000000
32 2147483648 0x80000000
33 4294967296 0x100000000
34 8589934592 0x200000000
35 17179869184 0x400000000
36 34359738368 0x800000000
37 68719476736 0x1000000000
38 137438953472 0x2000000000
39 274877906944 0x4000000000
40 549755813888 0x8000000000
41 1099511627776 0x10000000000
42 2199023255552 0x20000000000
43 4398046511104 0x40000000000
44 8796093022208 0x80000000000
45 17592186044416 0x100000000000
46 35184372088832 0x200000000000
47 70368744177664 0x400000000000
48 140737488355328 0x800000000000
49 281474976710656 0x1000000000000
50 562949953421312 0x2000000000000
51 1125899906842624 0x4000000000000
52 2251799813685248 0x8000000000000
53 4503599627370496 0x10000000000000
54 9007199254740992 0x20000000000000
55 18014398509481984 0x40000000000000
56 36028797018963968 0x80000000000000
57 72057594037927936 0x100000000000000
58 144115188075855872 0x200000000000000
59 288230376151711744 0x400000000000000
60 576460752303423488 0x800000000000000
61 1152921504606846976 0x1000000000000000
62 2305843009213693952 0x2000000000000000
63 4611686018427387904 0x4000000000000000
64 9223372036854775808 0x8000000000000000
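The table follows a single rule: bit n has the value 2^(n-1), i.e. 1 shifted left by n-1. A few lines of Python can regenerate it, which is handy for double-checking an entry. Python is used here only as a quick calculator, and bit_value is a name I made up:

```python
def bit_value(bit):
    """Value of the given 1-based bit position (bit 1 is the lowest bit)."""
    if not 1 <= bit <= 64:
        raise ValueError("bit must be between 1 and 64")
    return 1 << (bit - 1)

# Print the cheat sheet: bit number, decimal value, hex value.
for bit in range(1, 65):
    value = bit_value(bit)
    print(f"{bit:>2} {value:>20} {value:#x}")
```

Note that bit 64's value, 9223372036854775808, is one more than Int64.MaxValue (9223372036854775807). In .net it therefore only fits in a ulong. In a signed long that bit is the sign bit, and the value reads as negative.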

Checking if a bit is set:

// AND: Only return 1 if both bits are 1
// 0011 & 0100 = 0000
// 0111 & 0100 = 0100
isSet = (value & 0x4) == 0x4;
isSet = (value & 0x4) != 0; // safer than > 0: with the top bit (bit 64) of a signed long the result is negative

Setting a bit:

// OR: If either operand is 1, return 1.
// 0011 | 0100 = 0111
newvalue = oldvalue | 0x4; // |=

Unsetting a bit:

// NOT: Invert the Bits
// ~0100 = 1011
// AND: Return 1 if both bits are 1
// 0011 & 1011 = 0011
// 0111 & 1011 = 0011
newvalue = oldvalue & (~0x4); // &= ~0x4

Toggling a bit:

// XOR: If both bits are equal, return 0
// 0111 ^ 0100 = 0011
// 0011 ^ 0100 = 0111
newvalue = oldvalue ^ 0x4; // ^=
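To check the four operations against the truth tables in the comments, here is a small sketch in Python. The operators &, |, ^ and ~ mean the same thing there as in C#. Python integers are unbounded, however, and ~ produces a negative number. For that reason the sketch masks every result to 64 bits, standing in for a C# ulong; that masking is my assumption. The check uses != 0 rather than > 0. In C#, if value is a signed long and the flag is the top bit, value & flag is negative, so > 0 would wrongly report the bit as unset.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a ulong

def is_set(value, flag):
    # AND keeps only the bits present in both operands
    return (value & flag) != 0

def set_bit(value, flag):
    # OR turns the flag's bits on and leaves all other bits alone
    return (value | flag) & MASK64

def clear_bit(value, flag):
    # NOT inverts the flag; ANDing with it turns off only the flag's bits
    return value & ~flag & MASK64

def toggle_bit(value, flag):
    # XOR flips the flag's bits
    return (value ^ flag) & MASK64
```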

Is the SharePoint Object Model too weak for excellent Applications?

I've been doing SharePoint for about 3 years now, starting with SharePoint 2007 and moving to 2010 in November when the Beta was released. While I can't say that I'm in love with the development experience, I do think it's a very capable product for the users. Over the years, I've learned many of the quirks and tricks of SharePoint, and despite its many little issues, I liked SharePoint 2007 development.

SharePoint 2010 added a ton of new Features, including a separation into Service Applications (replacing the SSP) and many new Social Features like tagging and commenting. Also, the Development tools radically improved. However, I think that the Object Model didn't scale well over the years. I think it's downright broken in SharePoint 2010 and that the next version of SharePoint needs a completely new Object Model with properly separated APIs/Modules.

Why? Here are some of the big and small issues I've encountered while I wanted to do some really simple things.

SPMetal doesn't generate all field Types
Create a new List in SharePoint, then enable the Managed Metadata and Enterprise Keywords option on the List, or add a new Managed Metadata column to the List. Run SPMetal against the list. Notice that the generated Proxy doesn't have the Keyword field.

It's bad enough that the generated proxy is unnecessarily fragile, but not supporting all of the *built-in* field types makes it unusable for all but the most simple queries. Granted, Managed Keywords is a separate Feature of Standard/Enterprise, but it's still an official Microsoft out of the box feature.

Querying Managed Keywords through CAML is only possible by Name, not by ID
Let's say you want to query the List and get all items that have two different keywords. As Keywords can have the same name (if they are in different paths of the Term Set), I thought it would be best to query with the Guid of the Term.

Well, it turns out you can't. You can query by their WssId, though (an int that seems to be assigned uniquely within each Site Collection), by adding the LookupId="TRUE" attribute. However, try to combine two of those conditions with an <And>, or add the value twice. What happens? You get all items that have either Keyword. It's an Or, not an And.

The only way is to Query by the Title and hope you never have duplicate titles or that you can enable the Full Path option on the Field. Now, for the standard Keywords this isn't that much of a problem as they don't have a hierarchy and therefore no duplicates, but you aren't always that lucky.

This is a deeper issue though, it's a problem with the LookupMulti field from which the Multi-Taxonomy Field inherits.

No way to query the User Profile store effectively
This is something that is easy in pretty much every CMS on the market: Give me a distinct list of departments in the company. If the user profile store has a field that holds the department, it's literally a SELECT DISTINCT(Department) from USERS ORDER BY Department.

In SharePoint, there is no way to do that. You could query the User Information list on the main site collection, but that may not contain all users. You can query Active Directory directly, but what's the point of the User Profile store in SharePoint then?

If you want to get a list of all departments, you get a UserProfileManager and loop through all profiles, then fill a List<string> or HashSet. This is slow and resource intensive.

No way to get some tag statistics efficiently
Another really simple scenario: I have a list that contains a lot of items. Users can use the Tags & Notes feature of SharePoint to tag items. We want to get statistics: Give me the top 10 tags that start with 'su' and how often they were used on this list.

The first problem is obviously that SharePoint stores social tags against an exact URL. And by exact URL, I mean:

  • Alternate Access Mappings are not supported. If one person uses http://internalportal and another person uses https://portal.internal.example.com, they won't see each other's tags and notes
  • If you have a Ribbon open and your URL has the "InitialTabID=...." QueryString, then your Tags and Notes will not be visible to other people who don't have it

So that's bad enough. But what about getting the statistics? Using SQL, such a feature is developed in 5 minutes since all you need is this query:

SELECT TOP 10 InputTermLabel, COUNT(InputTermLabel) AS Count
FROM dbo.SocialTags
WHERE InputTermLabel LIKE 'su%' AND UrlID IN (
SELECT UrlID
FROM dbo.Urls
WHERE Url like 'http://myportal/Lists/MyList/DispForm.aspx%')
GROUP BY InputTermLabel
ORDER BY Count DESC
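To make clear what that query computes, here it is spelled out as a hypothetical Python function over (label, url) pairs, one pair per tagging. top_tags and the sample data are made up for illustration. One difference: LIKE in SQL Server is usually case-insensitive (depending on collation), while this sketch compares case-sensitively.

```python
from collections import Counter

def top_tags(tag_rows, prefix, url_prefix, limit=10):
    """tag_rows: iterable of (label, url) pairs, one per tagging.

    Mirrors the SQL above: keep labels starting with prefix on URLs
    starting with url_prefix, group by label, count, most frequent first.
    """
    counts = Counter(
        label
        for label, url in tag_rows
        if label.startswith(prefix) and url.startswith(url_prefix)
    )
    return counts.most_common(limit)
```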

Using the Object Model, it is impossible to do this effectively. The closest I got is this:

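using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.SharePoint;
using Microsoft.Office.Server.SocialData;

// site is the SPSite of the portal
var result = new Dictionary<string, int>();
var baseUrl = "http://myportal/Lists/MyList/DispForm.aspx";
var stm = new SocialTagManager(SPServiceContext.GetContext(site));
// Returns ALL terms used below baseUrl, unfiltered
var terms = stm.GetAllTerms(new Uri(baseUrl),0);
foreach (var term in terms)
{
    var name = term.Term.Name;
    // The 'su' filter can only be applied on the client
    if (!name.StartsWith("su", StringComparison.InvariantCultureIgnoreCase)) continue;
    // One extra round trip per term to get the tagged URLs
    var tc = stm.GetUrls(term.Term);
    int usageCount = tc.Count(url => url.AbsoluteUri.StartsWith(baseUrl));
    if (usageCount > 0) result[name] = usageCount;
}
result = result.OrderByDescending(kvp => kvp.Value)
             .Take(10).ToDictionary(kvp => kvp.Key, kvp => kvp.Value);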

As you can see, this is incredibly inefficient. The SQL Server sends me ALL the terms (at least I can specify a base URL!), and I can only filter the ones starting with 'su' on the client. Then I have to make a separate request for every single Term to get its URLs. If you have a couple hundred tags, this is an incredible resource hog (tons of SQL Queries, a lot of network traffic, high memory usage). At that point, you would wrap it in a Timer Job, cache the result and give up on real-time statistics.

The last one made me break one of the golden rules with SharePoint
The golden rule is: Never ever talk directly to the SQL Database, always go through the SharePoint Object Model.

Today, I broke that rule, and I don't feel dirty or guilty. That last point was the final straw. The Object Model simply doesn't cut it anymore, it's weaker than what I can do with some low end PHP CMS Systems, weaker than stuff like XOOPS or PHP-Nuke. The whole social functionality is poorly integrated.

After trying to get Tag-Statistics for 6 hours, I spent 30 Minutes to write a small wrapper: Iterate through all the SPDatabases in the farm, find the SocialDB that belongs to the current Web Application, use Reflection to get the SqlSession property and use Reflection again to call ExecuteReader on it. As a result, I get a nice SqlDataReader back.

No fighting with the Object Model, just plain "I want this, give it to me without killing all my resources".

Obviously, I now need to be careful with Service Packs and Hotfixes. I've set up a little tool that can compare SQL Schemas. That way, whenever a Hotfix/Service Pack gets released I run it to see if any of the Databases/Tables I use changed and adapt my code accordingly. Even if it takes me 15 Minutes to make the changes, Microsoft would have to release 24 Schema-Changing Updates to break even with the 6 hours I wasted today on this.

Granted, I use it strictly for SELECTs and only as a last resort, and I wouldn't release code containing it, but at some point I have to get pragmatic about it. I can't spend a month developing something that should only take a week at most just because SharePoint doesn't like me.

Sorry for ranting so much, but SharePoint 2010 simply feels like a 10 year old application in places. So much of what was already bad back in SharePoint Portal Server 2001 aka. Tahoe (ONET.xml...) is still there, and it feels even worse now that we have nicer technologies.

Why are pretty much all of the collections non-generic? Why is SPListItemCollection still a non-generic Collection of Object and not a List<SPListItem>?
Why are there no standalone SharePoint MSBuild tasks, thus requiring Visual Studio 2010 to be installed on a build server?
Why does creating Content Types through XML Files require me to specify fields twice? Why does creating a List for that Content Type through XML require me to specify the fields a third time?
Why does a product that can cost hundreds of thousands of dollars in licensing still require me to depend on Reflector and on looking at the Stored Procedures just to understand simple things?
Why is there no easy way to check for the existence of certain items in some collections without iterating through it completely? Usually, there is only an indexer that throws an Exception if an item isn't found.
Why are so many useful functions and/or classes internal or sometimes even internal sealed? Many collections have internal "GetItem" functions that return null if the item wasn't found, which is great but as said, they are internal. The external functions usually wrap the GetItem call and throw an Exception if null - ARGH!
Why is there no MS Connect site to report issues and ask someone from MS to resolve them in a patch? Oh wait, there is a SharePoint Community Connect Site. It even has some bugs and the occasional Microsoft poster - except that all the real issues are ignored there.
Why are the simple things so hard and the hard things impossible to do?

The weird thing is, I'm not even that unhappy developing for it. As said, I still believe that SharePoint is a good product for the users to use and does so many things right there.
But I also believe that the Object Model needs a huge refresh.

I assume Microsoft wants to make upgrading as painless as possible, seeing how SharePoint is targeted at Enterprises. So very little changes, and the old stuff gets dragged along.
I just hope that the next SharePoint version will be based on the .net 4 CLR, and that the breaking changes there (unlike 3.0 and 3.5, .net 4.0 introduced a new CLR) will give some people at Microsoft the courage to revamp the whole OM and turn SharePoint from an ancient but modern-looking product into a truly modern product.

</rant>

A few more thoughts about SWiki

It's been some time since I wrote a post about me re-thinking SWiki. In the meantime, I have experimented a bit with several approaches, and the recent announcements of IIS Express and SQL CE 4 sparked some new interest in this project.

As I said earlier, my problem was that I can't display Images that don't have a URL in the hosted Internet Explorer, but that I wanted to keep HTML-compatible pages. The first approach is to have a local web server that delivers the pages. There is however a second approach, which involves actually having the images in the file system. I could either store them in the database and "extract" them to a temporary directory when SWiki starts, or I could keep them externally and only "register" them in the database.

I think I like that approach as well, because it solves any corporate networking concerns (IT usually isn't too happy with people running their own rogue web servers within a corporate network...).

I'm busy with some other projects and I have to find a new name for SWiki (someone else had the name before :) ), but I do now have a good idea on what I want to do with it and how to achieve that.

Careful with SPContext.Current…

...as it will be NULL within a Timer Job or Workflow. I have some shared Data Access classes that use SPContext.Current.Web all over the place, and now that I want to use them from within a Timer Job, I have to refactor them to take an SPWeb as a Parameter...

Why doesn’t Windows offer a working help system anymore?

If you are developing Windows Desktop applications, you may want to offer context-sensitive help, triggered either by pressing F1 or by clicking on the question mark icon in the title bar and on an element. Back in the old days (starting in 1990 and de-facto ending in 2006), there was WinHelp.

Now, WinHelp wasn't exactly beautiful and in recent years (after 1996 that is), the "Maximize Database Size" dialog was downright stupid, but WinHelp had all the features a Help system needs: Articles are organized in Chapters and can contain Images, Links and basic formatting. And it allowed your application to open a specific page, providing contextual help.

But most importantly, WinHelp just WORKS. Really, you press F1 and maybe you have to "Maximize Database Size" once, but then it opens. I never ever had a problem with WinHelp.

But Microsoft decided it wasn't modern anymore. That we needed something new. Granted, WinHelp clearly showed its age, and creating Help Files was somewhat complicated. So they introduced Compiled HTML Help, or CHM. It is a modern Help system that gives you much more freedom with your layout and styling. It's a really good format, with one tiny little problem: CHM doesn't actually work:

Turns out that CHM is displayed through the MSHTML Control (which is essentially an embedded Internet Explorer) and thus it has some security limitations. The most important one is that CHM files on non-trusted (e.g., network) locations simply don't work.

Now, you may say that this can be resolved. The file can be unblocked, or the path can be set to trusted. An Application Installer could do that. I reply: Doesn't matter. It's a Help system. It has to work without configuration. Press F1, get help. If I'm in a situation where I need help using my application and my help system tells me that it wants some treats first, it's a failure. Besides, not every Application has an installer, because not every application needs one. A large number of applications are just DLLs (like the one the above screenshot is from) or ZIPped application files.

So CHM is a complete and utter failure, and Microsoft at least acknowledged that by killing off Microsoft Help 2 and starting a new approach with MAML. However, MAML is not a Help System, it's a language that can be used as source to be converted into an output format like HTML, RTF or whatever. In other words, Microsoft has created DocBook again without actually solving the problem of displaying help.

The real successor to CHM seems to be the HelpPane introduced in Windows Vista and included in Windows 2008 and 7 as well. Those help files have the extension h1s and a nice little icon, so Windows knows what they are. There is our new Help system, right? Well, try to double click one of those h1s files...

Hmmm... So Microsoft didn't just register a file type handler for h1s files. Well, can't be that hard to do, can it?

AP Help - Guided Help - Technical FAQ

Can I launch Guided Help through other means besides the Help Pane?
Yes, but you must create and publish the Guided Help topic through Help. Once you have a Guided Help topic compiled into an H1S file and installed (at this stage only possible for Microsoft and OEM's), you can launch it directly through a command line if you wish.
The syntax is:

%systemroot%\system32\acw.exe -Extensions GuidedHelp.dll -taskID mshelp://windows/?id=id-of-your-help-topic -ExecutionMode DoIt | ShowMe

For a fast impression copy following text to your run dialog:

%SystemRoot%\System32\ACW.exe -Extensions GuidedHelp.dll -taskID mshelp://windows/?id=3726934c-1315-4c29-bd4d-e42c10225e5a -ExecutionMode ShowMe

Excuse me, but ARE YOU FRIGGIN' KIDDING ME? Oh, yes you are, let me just quote Microsoft:

Microsoft is committed to providing Help and Support technology in the Windows® platform and will continue to investigate new solutions for software developers.

Sorry, but if "committed" means "Killing off perfectly working solutions and replacing them with a plethora of broken solutions every two years", then you are absolutely right, because that's what you are doing. WinHelp survived 16 Years, and if you still shipped it with Vista and 7, it would still be alive. So what can you, as an application developer, do? WinHelp isn't part of Vista and Windows 7 anymore, and you're not allowed to distribute it with your application. CHM/H1S doesn't work. What are your alternatives?

Some applications use PDF. It offers rich layout and a Table of Contents, but there is no standard reader. Sure, there is Adobe Reader, but you can't easily control it (e.g., open a PDF on a given page), and if the user has a version that is too old or too new for your application, you may run into issues. And if the user doesn't have Adobe Reader (or any other PDF reader) installed, you have to explain why someone should download an additional program just because you're not competent enough to include help. So PDF is not an option.

What about HTML Files? Everyone has a browser, and even the short-lived Windows 7 E Editions included MSHTML, allowing you to at least display HTML within an application. The major downside of HTML is that you can't control which browser displays it, so you have to stay conservative and make sure old Internet Explorer or Firefox browsers display it (say goodbye to transparent PNGs...). JavaScript may be tricky (also due to widely used Extensions like NoScript). And instead of one help file, you have a whole folder. Adding contextual help to your application is somewhat possible, but overall you simply lose the ability to control and test how the help looks and works.

This is possibly the moment where you expect me to say "But after researching all these non-working options, here is the one that works!". Sorry, can't do that. I don't know a single Help system that works on Windows Vista/7/2008. I asked on StackOverflow a long time ago and the consensus was the same.

It's really sad that a task that seems so simple and straightforward is too hard for Microsoft. Seriously, all you need to do is take a simple container format, some basic formatting options, the ability to link and embed images, and an API to call Help from your application. If you want, include video support with a standard codec (keeping in mind the Windows N/KN Editions).

Simple, easy, straight-forward, hassle-free or in other words: Exactly how a Help System should work. Exactly how WinHelp worked since 1990 before it was brutally murdered. Rest in Piece WinHelp, we miss you dearly.

Dealing with Multiple Time Zones in SharePoint 2010

Organizations that deploy SharePoint farms often have employees in different countries, or at least in different Time Zones. While people in the US (which spans 4 time zones) are pretty comfortable with translating between time zones all the time, the same cannot be said for everyone. Trying to translate between Pacific Time and Central European Time is just painful, especially since daylight saving time starts and ends on different dates.

With SharePoint 2010, you get the tools to convert times according to the user's time zone. There are two types of Regional Settings: Each Site (SPWeb) has RegionalSettings that specify the Time Zone (and Locale, Calendar etc.) for that site. This is useful if you have sites that are predominantly used by people in one time zone. The second type of Regional Settings is the one the user (SPUser) can set (My Settings - My Regional Settings). Those are the same settings as the ones on SPWeb, but each user can specify their own.

When storing Dates in code, you have two options:

  • Store the time in the local time of the Web and use DatesInUtc = true on an SPQuery to get it back as UTC
  • Store the time in UTC and do not use DatesInUtc on the SPQuery

What does that mean? As said, each SPWeb has its own Regional Settings. Let's assume you have a date of 2010-06-14 15:00:00.

If the TimeZone of the SPWeb is Pacific Time (GMT-8, or GMT-7 while daylight saving time is in effect, as it is in June) and you query the List using SPQuery, you get back this date. If you however set DatesInUtc = true on the SPQuery, you get back 2010-06-14 22:00:00. SharePoint doesn't know whether 15:00:00 was already UTC, so using DatesInUtc may translate a date twice.

The caveat here is that when storing dates, you need to normalize them either to UTC or to the Local Time of the Web. What would you do if an employee from Texas (which runs on Central Time, GMT-6, or GMT-5 during daylight saving time) enters 2010-06-14 15:00:00? You would need to store it either in the Web's Pacific Time (so the time becomes 13:00:00) or as UTC (20:00:00).
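To make the Texas example concrete, here is a small Python sketch of the two conversions. It assumes fixed offsets that are only valid for a mid-June 2010 date, when daylight saving time is in effect in both zones (Central = UTC-5, Pacific = UTC-7). Real code should use a full time zone database, which is what SPTimeZone handles for you. local_to_utc and utc_to_local are hypothetical helpers, not SharePoint APIs:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, valid only for mid-June 2010 (daylight saving time in effect).
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time
CDT = timezone(timedelta(hours=-5), "CDT")  # Central Daylight Time

def local_to_utc(naive_local, tz):
    """Interpret a naive wall-clock time in tz and return it as UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)

def utc_to_local(aware_utc, tz):
    """Convert a UTC datetime to wall-clock time in tz."""
    return aware_utc.astimezone(tz)

entered = datetime(2010, 6, 14, 15, 0, 0)  # typed in by the user in Texas
as_utc = local_to_utc(entered, CDT)         # 2010-06-14 20:00 UTC
as_web_time = utc_to_local(as_utc, PDT)     # 2010-06-14 13:00 Pacific
```

This is the same normalization as converting a user time to UTC with the user's time zone and then to the Web's time with the Web's time zone.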

Needless to say, I prefer to store all dates as UTC if the list isn't visible to the user directly. Then when querying the list through Code, I can just convert the time to whatever the user's timezone is:

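// Converts a UTC date from the list into the current user's local time
DateTime ToUserLocalTime(DateTime listDateUtc)
{
    var web = SPContext.Current.Web;
    var user = web.CurrentUser;
    // Always perform a Null-Check on SPUser.RegionalSettings
    // (and on CurrentUser itself if anonymous access is possible)
    if (user != null && user.RegionalSettings != null)
    {
        return user.RegionalSettings.TimeZone.UTCToLocalTime(listDateUtc);
    }
    // User didn't set a time zone, so use the one from the Web
    return web.RegionalSettings.TimeZone.UTCToLocalTime(listDateUtc);
}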

Overall, the option for people to set their own timezones independently from the SPWeb is a fantastic and long needed addition. On the other hand, it does make dealing with times a bit more complex.

If the list is visible to the user, you may need to normalize the times differently (for example, use user.RegionalSettings.TimeZone.LocalTimeToUTC to convert a user time to UTC and then SPWeb.RegionalSettings.TimeZone.UTCToLocalTime to convert the time to the Web-Time).

If you do build custom pages that make use of the Microsoft.SharePoint.WebControls.DateTimeControl then you can just use UseTimeZoneAdjustment="true" on it to have it automatically convert to UTC and back (SelectedDate will be UTC when accessed through code, but the User's/Web's time when rendered).

A Visual Studio Macro to insert a new Guid

I've been trying to create some SharePoint Content Types and List Definitions recently, and everyone who has done that before knows what you need for that: Guids, and quite a few of them. One for each Field, Feature, Solution... So instead of using GuidGen, I wanted something that inserts a new Guid at the cursor position in the Editor when I press a certain keyboard shortcut.

Luckily, this is rather easy with the Macro Editor. Just create a new Macro/Module and enter this code:

Sub InsertGuid()
    ' Format "B" produces a Guid in braces: {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}
    Dim newId As String = Guid.NewGuid().ToString("B")
    Dim doc As Document = DTE.ActiveDocument
    Dim textDoc As TextDocument = CType(doc.Object("TextDocument"), TextDocument)
    ' Insert the Guid at the current cursor position
    textDoc.Selection.Insert(newId)
End Sub

You can then go to Tools / Options / Environment / Keyboard and look for the Macro you just created (Macros.MyMacros.SomeModule.InsertGuid) and assign a Keyboard shortcut to it.

Writing a BF Compiler for .net (Part 5: [ and ] – while loops in IL)

The final two commands we're looking at are [ and ]. Their description in the first article was a bit cryptic, [ was described as

Go to the next instruction if the byte at the memory pointer is not 0, otherwise move it past the matching ] instruction

while ] was described as

Go to the instruction after the matching [ if the byte at the memory pointer is not 0, else move it past the ]

In C# code, this is a lot simpler:

// BF Code for this: [-]
while (memory[pointer] > 0)
{
    // Instructions between [ and ]
    // The following instruction is only to have a body
    memory[pointer]--;
}

It's a while-loop. It's important to note that we have to use a pre-test loop, that is, a loop that checks the condition before executing the body (as opposed to a do-while loop, which executes the code block at least once and checks afterwards).
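To see the pre-test semantics in action outside of IL, here is a minimal BF interpreter sketch in Python. It only handles + - < > [ ] (no input/output) and assumes balanced brackets. Cells are bytes that wrap at 256, matching the uint8[] memory used in the IL. Matching brackets are precomputed so that [ can jump past its ] and ] can jump back behind its [, as in the descriptions quoted above:

```python
def run_bf(code, memory_size=30000):
    """Minimal BF interpreter for + - < > [ ] only; returns the memory."""
    # Pre-compute matching brackets so [ and ] can jump directly.
    jump = {}
    stack = []
    for pos, ch in enumerate(code):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jump[start] = pos
            jump[pos] = start
    memory = [0] * memory_size
    pointer = 0
    pc = 0
    while pc < len(code):
        ch = code[pc]
        if ch == "+":
            memory[pointer] = (memory[pointer] + 1) % 256
        elif ch == "-":
            memory[pointer] = (memory[pointer] - 1) % 256
        elif ch == ">":
            pointer += 1
        elif ch == "<":
            pointer -= 1
        elif ch == "[" and memory[pointer] == 0:
            pc = jump[pc]  # land on the matching ]; pc += 1 moves past it
        elif ch == "]" and memory[pointer] != 0:
            pc = jump[pc]  # land on the matching [; pc += 1 re-enters the body
        pc += 1
    return memory
```

Running "[+]" on fresh memory leaves the first cell at 0: the loop body never runs because the check happens first.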

So how does a while loop look in .net IL?

// See note below regarding .s suffix on br.s and bgt.s
IL_0000:  br.s       IL_001f
// This is the memory[pointer]-- instruction
IL_0002:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0007:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000c:  ldelema    [mscorlib]System.Byte
IL_0011:  dup
IL_0012:  ldobj      [mscorlib]System.Byte
IL_0017:  ldc.i4.1
IL_0018:  sub
IL_0019:  conv.u1
IL_001a:  stobj      [mscorlib]System.Byte
// This is the while loop
IL_001f:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0024:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_0029:  ldelem.u1
IL_002a:  ldc.i4.0
IL_002b:  bgt.s      IL_0002

GOTO considered harmful?
Okay, this looks complicated, but it is easy. To explain it, we have to open Pandora's Box and look at the dirtiest secret there is in development: At Machine Level, GOTOs are essential.
Ha, take that Dijkstra!

Regardless of how much you abstract it away, control structures like while have to be translated into "GOTOs", or more precisely into jumps to addresses where execution continues. In .net, this is not called GOTO though, it's called Branch.

Our code has three parts: A single GOTO/Branch instruction at the beginning, the body of the loop (in our case the single memory[pointer]-- instruction) and then the while check.

So we start with br.s, which is described as

Unconditionally transfers control to a target instruction (short form).

In other words, this is a GOTO, and it goes to IL_001f. The code starting there does the while check: Load memory and pointer onto the stack, then load the value of memory[pointer] onto the stack (ldelem.u1 reads it as an unsigned 8-bit integer and pushes it as an int32). Afterwards, push the number 0 onto the stack.

Our evaluation stack now contains the value of memory[pointer] and the number 0. Then we have the new bgt.s command:

Transfers control to a target instruction (short form) if the first value is greater than the second value.

In other words, in pseudocode: if(memory[pointer] > 0) goto IL_0002;

The code starting from IL_0002 is our memory[pointer]-- instruction which will be executed and then we'll do the while-check again.
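If you'd rather read the Release-mode IL above as C#, here is a sketch that spells it out with goto. The labels are named after the IL offsets they stand in for; the class name and the iteration counter are made up for the example, and compiling this C# will not necessarily give byte-for-byte the same IL.

```csharp
// Hypothetical class; memory/pointer mirror the static fields in the IL.
public static class WhileAsGoto
{
    public static byte[] memory = new byte[30000];
    public static short pointer = 0;

    // The [-] loop, written the way the IL executes it.
    public static int ClearCell()
    {
        int iterations = 0;
        goto IL_001f;               // br.s IL_001f: jump straight to the check

    IL_0002:                        // loop body: memory[pointer]--
        memory[pointer]--;
        iterations++;               // demo counter, not in the original IL

    IL_001f:                        // the while check
        if (memory[pointer] > 0)    // ldelem.u1, ldc.i4.0, bgt.s IL_0002
            goto IL_0002;

        return iterations;
    }
}
```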

In Debug mode, the bgt instruction is not used. Instead, the check is done in a much more complicated way. Feel free to look it up using ILDASM; in short, Debug mode captures the result of the comparison in a local variable, like this C# pseudocode:

bool DoJump = memory[pointer] > 0;
if(DoJump) goto IL_0002;

This is useful for Debugging (who would've thought it, given that it's a debug build?), but rather heavy compared to Release mode (8 instructions and a local variable compared to 5 instructions without).

Looking at that, you can easily imagine the difference between a while and a do-while loop: The do-while loop does not have the br.s instruction at the beginning. It therefore executes the loop body at least once before it reaches the while check.
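Dropping the initial br.s gives a do-while loop. The sketch below (class and method names are hypothetical) shows why [ and ] can't be compiled that way: a BF [-] on a cell that is already 0 should do nothing, but the post-test version decrements first, wraps around to 255 in C#'s default unchecked arithmetic, and only stops after 256 iterations.

```csharp
// Hypothetical class; same fields as in the IL listings.
public static class DoWhileAsGoto
{
    public static byte[] memory = new byte[30000];
    public static short pointer = 0;

    // [-] compiled as a post-test loop: no jump to the check first.
    public static int ClearCellDoWhile()
    {
        int iterations = 0;

    body:
        memory[pointer]--;          // unchecked byte arithmetic: 0 - 1 wraps to 255
        iterations++;               // demo counter

        if (memory[pointer] > 0)    // the check only happens after the body
            goto body;

        return iterations;
    }
}
```

Starting from 5 it behaves like the while loop and returns 5; starting from 0 it returns 256, which is exactly the behaviour BF's [ must avoid.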

Before I end this post, I want to talk about short form commands.

What is "Short Form"?
If you look at the IL commands, some say "Short Form". What does this mean? Well, the normal branch instructions take a 32-bit, that is 4-byte, target operand. If you want an unconditional jump, you would use the br command followed by that 4-byte target. This means you'll have 5 bytes in the target file: 1 for the br opcode and 4 for the target. As branches are so common, it would be a massive overhead to always have to write 5 bytes to the file.

Short form commands only take 1 byte for the target. The target here is described as

1-byte signed offset from the beginning of the instruction following the current instruction

Note that this is a relative offset, not an absolute address; the long form br uses the same kind of offset, just 4 bytes wide instead of 1. Because the offset is a signed byte, the short form only works if the target lies between 128 bytes before and 127 bytes after the instruction following the branch, so it's a lot less flexible, and your compiler needs to know the distance between the jump instruction and its target. However, the savings are huge, as a short form branch only requires 2 bytes, less than half of the 5 bytes the full instruction needs.
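To make the offset arithmetic concrete, here is a small sketch of how a compiler could check whether a branch fits the short form and encode it. The opcode bytes 0x2B (br.s) and 0x30 (bgt.s) are the ones ECMA-335 assigns; the class and method names are hypothetical. Plugging in the while loop listing: br.s at IL_0000 jumping to IL_001f gives 0x1f - (0x00 + 2) = 29, and bgt.s at IL_002b jumping back to IL_0002 gives 0x02 - (0x2b + 2) = -43, stored as the byte 0xD5.

```csharp
using System;

// Hypothetical helper for short form branches.
public static class ShortBranch
{
    // A short form branch is 1 opcode byte + 1 signed offset byte.
    public const int ShortBranchSize = 2;

    // The offset is measured from the start of the instruction
    // *following* the branch, i.e. branchStart + 2.
    public static int RelativeOffset(int branchStart, int target)
        => target - (branchStart + ShortBranchSize);

    // True if the offset fits into a signed byte (-128..127).
    public static bool FitsShortForm(int branchStart, int target)
    {
        int offset = RelativeOffset(branchStart, target);
        return offset >= sbyte.MinValue && offset <= sbyte.MaxValue;
    }

    // Encodes opcode + 1-byte signed offset, or throws if the target is too far.
    public static byte[] EncodeShort(byte opcode, int branchStart, int target)
    {
        if (!FitsShortForm(branchStart, target))
            throw new ArgumentOutOfRangeException(nameof(target),
                "Target is too far away for a short form branch.");

        sbyte offset = (sbyte)RelativeOffset(branchStart, target);
        return new byte[] { opcode, unchecked((byte)offset) };
    }
}
```

If you generate IL through Reflection.Emit instead of writing bytes yourself, ILGenerator computes the offsets from labels (DefineLabel/MarkLabel), but choosing between OpCodes.Br and OpCodes.Br_S is still up to you.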

This concludes the command overview. Part 6 will finally show how to write our compiler.

Writing a BF Compiler for .net (Part 4: . and ,)

In the last two parts we tackled 4 of the 8 possible BF commands: >, <, + and -. Now we look at . and , for output and input.

When working with a console application, it only makes sense to use the built-in methods Console.Write and Console.Read.

Let's look at output (the . command) first. The C# code we're converting is a one-liner:

Console.Write((char)memory[pointer]);

As memory[pointer] is a byte, we have to cast it to char to write it to the console. In IL, the line looks like this:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  ldelem.u1
IL_000b:  call       void [mscorlib]System.Console::Write(char)

We already know what the two ldsfld commands do: They load the static fields onto the evaluation stack. Now, ldelem.u1 is a new command, and it takes the place of our cast to char. To quote the documentation:

Loads the element with type unsigned int8 at a specified array index onto the top of the evaluation stack as an int32.

In other words, ldelem.u1 loads an unsigned 8-bit integer, which is a byte, and pushes it as an int32. You may wonder where the cast is, as char is a 16-bit unsigned integer (a UTF-16 code unit). So how is that possible? Well, ECMA-335 contains the answer in Partition III, Section 1.6, Implicit argument coercion:

While the CLI operates only on 6 types (int32, native int, int64, F, O, and &) the metadata supplies a much richer model for parameters of methods. When about to call a method, the CLI performs implicit type conversions, detailed in the following table.

Translation: If the argument on the stack is an int32 (which it is, thanks to ldelem.u1), then the CLI implicitly converts it to char when calling a method that expects a char.

The method call itself is then simply a call to the static method Write in class System.Console in assembly mscorlib, which returns void. The arguments to the method are taken from the evaluation stack. If a method takes multiple arguments, they have to be pushed in the correct order: First argument first.
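Here is one possible way to produce exactly these four instructions at runtime (plus the ret every method needs), using DynamicMethod and ILGenerator from System.Reflection.Emit. This is a sketch, not necessarily the approach the compiler in this series will take; the class name OutputCommandEmitter and the 30,000-cell tape are assumptions for the example.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical class holding the tape, mirroring the fields in the IL listing.
public static class OutputCommandEmitter
{
    public static byte[] memory = new byte[30000];
    public static short pointer = 0;

    // Builds a method equivalent to: Console.Write((char)memory[pointer]);
    public static Action BuildOutputCommand()
    {
        FieldInfo memoryField = typeof(OutputCommandEmitter).GetField(nameof(memory))
            ?? throw new InvalidOperationException("memory field not found");
        FieldInfo pointerField = typeof(OutputCommandEmitter).GetField(nameof(pointer))
            ?? throw new InvalidOperationException("pointer field not found");
        MethodInfo write = typeof(Console).GetMethod(nameof(Console.Write), new[] { typeof(char) })
            ?? throw new InvalidOperationException("Console.Write(char) not found");

        var method = new DynamicMethod("BfOutput", typeof(void), Type.EmptyTypes,
                                       typeof(OutputCommandEmitter));
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, memoryField);   // push the array
        il.Emit(OpCodes.Ldsfld, pointerField);  // push the index (int16, widened to int32)
        il.Emit(OpCodes.Ldelem_U1);             // pop both, push memory[pointer] as int32
        il.Emit(OpCodes.Call, write);           // int32 is implicitly coerced to char
        il.Emit(OpCodes.Ret);
        return (Action)method.CreateDelegate(typeof(Action));
    }
}
```

The emitted code reads memory and pointer every time it runs, so with memory[0] = 72 and memory[1] = 105, calling the delegate once, moving the pointer to 1 and calling it again writes "Hi".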

That's the . command: Get the value of memory[pointer] and call Console.Write with it. What about the , command to read a character?

In C#, this is again a one-liner:

memory[pointer] = (Byte)Console.Read();

while in IL this is a few lines more:

IL_0000:  ldsfld     uint8[] BFHelloWorldCSharp.Program::memory
IL_0005:  ldsfld     int16 BFHelloWorldCSharp.Program::pointer
IL_000a:  call       int32 [mscorlib]System.Console::Read()
IL_000f:  conv.u1
IL_0010:  stelem.i1

Once again, we start by loading our array and the index into it onto the stack. However, we are not doing anything with them right now. Instead, we call Console.Read, which returns an int32. According to the documentation of the call command, "the return value is pushed onto the stack".

So now our stack contains three values: The array, the current index and the return value of Console.Read (as Console.Read doesn't take parameters, memory and pointer are left untouched on the stack). conv.u1 takes the int32, converts it to an unsigned 8-bit integer (that's the cast to byte in the C# code) and pushes it back onto the stack.

stelem.i1 is a new command:

Replaces the array element at a given index with the int8 value on the evaluation stack.

So this pops off the value, the index and the array and replaces the element. It is equivalent to calling ldelema, followed by whatever pushes the new value onto the stack, followed by stobj, but it only takes one instruction if the correct values are already on the stack.
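The , command can be generated at runtime like any other IL, for example with DynamicMethod and ILGenerator from System.Reflection.Emit. The sketch below emits the five instructions from the listing, followed by ret; the class name InputCommandEmitter and the tape size are hypothetical. One detail worth knowing: Console.Read returns -1 at the end of input, which conv.u1 turns into 255.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical class holding the tape, mirroring the fields in the IL listing.
public static class InputCommandEmitter
{
    public static byte[] memory = new byte[30000];
    public static short pointer = 0;

    // Builds a method equivalent to: memory[pointer] = (byte)Console.Read();
    public static Action BuildInputCommand()
    {
        FieldInfo memoryField = typeof(InputCommandEmitter).GetField(nameof(memory))
            ?? throw new InvalidOperationException("memory field not found");
        FieldInfo pointerField = typeof(InputCommandEmitter).GetField(nameof(pointer))
            ?? throw new InvalidOperationException("pointer field not found");
        MethodInfo read = typeof(Console).GetMethod(nameof(Console.Read), Type.EmptyTypes)
            ?? throw new InvalidOperationException("Console.Read() not found");

        var method = new DynamicMethod("BfInput", typeof(void), Type.EmptyTypes,
                                       typeof(InputCommandEmitter));
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, memoryField);   // array: stays on the stack during the call
        il.Emit(OpCodes.Ldsfld, pointerField);  // index: also stays on the stack
        il.Emit(OpCodes.Call, read);            // pushes Console.Read()'s int32 result
        il.Emit(OpCodes.Conv_U1);               // the (byte) cast
        il.Emit(OpCodes.Stelem_I1);             // pops value, index, array; stores the byte
        il.Emit(OpCodes.Ret);
        return (Action)method.CreateDelegate(typeof(Action));
    }
}
```

Redirecting input with Console.SetIn(new StringReader("A")) and calling the delegate stores 65 in memory[pointer]; a second call hits the end of input and stores 255.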

In Part 5, I'll finish the command introduction with the [ and ] commands (explaining how a while loop works), and then we finally build our compiler!
