Missing XML comment for publicly visible type or member ‘considered harmful’

One of the nice things about .net is that you can automatically generate an .xml documentation file from the XML doc comments.

One of the worst things, however, is that by default this leads to compiler warnings (and, if “warnings as errors” is enabled – as it should be – to a failed compilation).

1>FooWrapper.cs(5,18,5,28): warning CS1591: Missing XML comment for publicly visible type or member 'FooWrapper'
1>FooWrapper.cs(7,21,7,24): warning CS1591: Missing XML comment for publicly visible type or member 'FooWrapper.Foo'
1>FooWrapper.cs(9,16,9,26): warning CS1591: Missing XML comment for publicly visible type or member 'FooWrapper.FooWrapper()'
1>FooWrapper.cs(14,16,14,26): warning CS1591: Missing XML comment for publicly visible type or member 'FooWrapper.FooWrapper(bool)'
1>FooWrapper.cs(19,32,19,39): warning CS1591: Missing XML comment for publicly visible type or member 'FooWrapper.Dispose(bool)'
1>FooWrapper.cs(23,21,23,28): warning CS1591: Missing XML comment for publicly visible type or member 'FooWrapper.Dispose()'

This often leads to the desire to add comments to everything, possibly even using automated tools, which results in a class like this:

/// <summary>
/// A Class to wrap a Foo value.
/// </summary>
public class FooWrapper: IDisposable
{
    /// <summary>
    /// The wrapped Foo value
    /// </summary>
    public bool Foo { get; }

    /// <summary>
    /// Initializes a new instance of the <see cref="FooWrapper"/> class.
    /// </summary>
    public FooWrapper()
    {
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="FooWrapper"/> class,
    /// with the given value for foo.
    /// </summary>
    public FooWrapper(bool foo)
    {
        Foo = foo;
    }

    /// <summary>
    /// Releases unmanaged and - optionally - managed resources.
    /// </summary>
    /// <param name="disposing">
    ///     <c>true</c> to release both managed and unmanaged resources;
    ///     <c>false</c> to release only unmanaged resources.
    /// </param>
    protected virtual void Dispose(bool disposing)
    {
    }

    /// <summary>
    /// Performs application-defined tasks associated with freeing,
    /// releasing, or resetting unmanaged resources.
    /// </summary>
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
}

What’s wrong with this class? The signal-to-noise ratio is atrocious, and I consider this downright harmful to understanding what the class does – and of course, the more comments there are, the quicker they get outdated. Let’s break it down into the useful and the useless:

FooWrapper: A Class to wrap a Foo value.

Potentially useful. This tells me what the class is meant for, but sane naming of the class already does that. It could be more useful to explain why Foo needs to be wrapped and when I should use this instead of just passing around the Foo value directly, and when to subclass it.

Foo: The wrapped Foo value

Useless. I know it’s a wrapped Foo value because it’s a property named Foo in a class named FooWrapper. What could make this useful is by explaining what this Foo value represents, and what I would use it for.

FooWrapper: Initializes a new instance of the <see cref="FooWrapper"/> class.

Useless. I know that it initializes a new instance of the FooWrapper class, because it’s a constructor of the FooWrapper class. That’s what constructors do, they initialize new instances of the class they are part of. There is no other information conveyed here – no information about potential side-effects, about valid input arguments, about potential Exceptions, nothing.

The overload that tells me that the bool foo argument will initialize Foo to the given foo is also useless, because – well, duh, what else is it going to do?

Dispose: Releases resources

Useless. IDisposable is a fundamental language feature, so both the reason for this method and the Dispose pattern are well known. What isn’t known is whether there’s anything noteworthy – does it dispose any values that were passed into the constructor? (Important e.g. when passing Streams around – whose job is it to close/dispose the stream in the end?) Are there negative side effects if NOT disposing in time?

Useful comments

Now, this class is arguably a very simplistic example. But that also makes it a very good example, because many applications and libraries contain tons of these simple classes. And many times, it feels like they are commented like this out of malicious compliance, in order to shut the compiler warnings up or fulfill some “All Code must be documented” rule.

The real solution is to suppress warning CS1591 and only add comments to code that does something non-obvious or critical to pay attention to. In the case of the above example class, the best I can come up with is below.

/// <summary>
/// This class wraps a Foo value, captured
/// from when the operation was started.
///
/// Operations that need to capture additional values
/// should derive from this to add their own additional
/// values.
/// </summary>
public class FooWrapper : IDisposable
{
    /// <summary>
    /// The Foo that was wrapped at the beginning of the operation.
    /// Changes to the Foo value in the holder class do not change this value.
    /// </summary>
    public bool Foo { get; }

    public FooWrapper()
    {
    }

    public FooWrapper(bool foo)
    {
        Foo = foo;
    }

    /// <summary>
    /// This class implements IDisposable to allow
    /// derived classes to capture values that need to be
    /// disposed when the operation is finished.
    /// </summary>
    protected virtual void Dispose(bool disposing)
    {
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }
}

Now, the comments convey useful information: We learn the intent of the class – that’s something not obvious from the code. Though arguably, this class should now be called InitialOperationState or something like that. It also explains why/when to create subclasses for it. The comment on the property now explains something about the purpose, rather than just reiterating the code in prose. And finally, the Dispose(bool) method explains why it’s there. The constructors and Dispose() methods do not need any comments – they don’t do anything worth commenting.

And because I suppressed 1591, the compiler is happy as well.
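How you actually suppress CS1591 is up to you – project-wide (in the project’s build settings, or via NoWarn in newer SDK-style project files), or per file with a pragma. A minimal per-file sketch:

```csharp
// Suppress "Missing XML comment for publicly visible type or member"
// for everything in this file:
#pragma warning disable 1591

public class FooWrapper : System.IDisposable
{
    public bool Foo { get; }
    public void Dispose() { }
}

#pragma warning restore 1591
```

The pragma variant is useful when only a few files (say, generated code) should be exempt while the rest of the project still warns.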

Accessing LDAP Directory Services in .NET Core

The .NET Framework has had support for LDAP through the System.DirectoryServices Namespaces since forever. This has been a P/Invoke into wldap32.dll, which limited the ability for developers to troubleshoot issues and wasn’t platform-independent. With the advent of .NET Core and the desire to run applications on Linux or macOS, the lack of LDAP Support has been an issue.

In the Java world, it’s normal to have fully managed libraries in lieu of platform-limited wrappers, and LDAP is no exception. These days, the Apache Directory LDAP API™ looks like the go-to, but way back in the day, Novell also had an LDAP client. This was eventually donated to the OpenLDAP project and lives in the JLDAP tree, although development has long since stopped. Back then, Novell owned Mono, and during that time they made a C# conversion of their LDAP client. The code was clearly run through an automated Java-to-C# converter, but it offered a fully managed way to access LDAP.

While that C# code had lain dormant since the initial release in 2006, .NET Core offered a new incentive to revisit it. dsbenghe made a conversion of the code to support .NET Standard 1.3/2.0, which lives at https://github.com/dsbenghe/Novell.Directory.Ldap.NETStandard and is available on Nuget as Novell.Directory.Ldap.NETStandard.

Over the past couple of weeks, I’ve made some contributions as well, mainly to add support for SASL Authentication, which is available since Version 3.0.0-beta4. At this time, only the CRAM-MD5, DIGEST-MD5 and PLAIN mechanisms are available, but this offers the foundation to connect to a wider range of directories in case Simple LDAP Bind isn’t an option.

An example of how to connect to an LDAP directory (in this case, Active Directory) using DIGEST-MD5:

var ADHost = "mydc.example.com";
var saslRequest = new SaslDigestMd5Request("Username", "Password", "Domain", ADHost);

using (var conn = new LdapConnection())
{
    try
    {
        conn.Connect(ADHost, 389);
        conn.StartTls();
        conn.Bind(saslRequest);
        Console.WriteLine($"[{conn.AuthenticationMethod}] {conn.AuthenticationDn}");
    }
    finally
    {
        if (conn.Tls)
        {
            conn.StopTls();
        }
    }
}

Now, whether this is preferable over simple bind is up for discussion – the fact that DIGEST-MD5 requires the domain controller to store the password with reversible encryption is certainly a potential issue. But on the other hand, if you cannot guarantee the security of the transport, DIGEST-MD5 at least means your password will never have to be sent over the wire.

Ultimately, support for the SASL EXTERNAL mechanism with Client Certificates and support for Kerberos will offer modern security/authentication mechanisms. But the bottom line is that there is now a 100% managed LDAP Client for .net that’s in active development. One that is supposed to support any LDAP Server instead of focusing mainly on Active Directory, but one that will offer first class Active Directory support as well. For Stack Overflow Enterprise, we made first class LDAP Authentication support a big goal for the future. We want to support as many real-world environments as possible, and we want everything to work on .NET Core as well. There’s still plenty of work to do, but I’m happy that this project exists.

PicSol – a .net Nonogram/Picross Solver Library

Nonograms – also known as Griddlers, Picture Crosswords, or Picross – are pretty cool puzzles, kind of like a more visual Crossword puzzle or Sudoku. Of all the games on my New 2DS XL, Mario’s Picross and the Picross e series are near the top of my Activity Log (beaten only by Smash Bros).

I got curious about algorithmic solutions to those Nonograms, which seems deceptively easy, but is actually NP-complete. When trying to solve a Nonogram, I can often only fill in one or a few cells of a group, which then leads to another cell that can be filled in (or X-ed out), and step by step, cell by cell, I solve the Nonogram. Now, that assumes that the Nonogram is properly designed – if it is, then there is always at least one cell that either must definitely be filled or must definitely be empty.

All of Jupiter’s games are well designed – even the most tricky ones (with a bunch of 1’s and 2’s and no big numbers) always follow the mantra of There’s always at least one cell that has a definitive solution. There are a lot of other games on the market (Steam returns about 15 games when searching for Picross or Nonogram), and some are not well designed and actually require guessing.

I ended up (after a bunch of googling approaches and other existing solvers) with a solution that’s mostly brute force – generate all possibilities for each row and column, then eliminate those that can’t be correct, rinse and repeat until there’s only one possibility left for each row and column, or until we’ve determined that the Nonogram is actually unsolvable. There are some shortcuts that we can take, e.g., when a row/column is empty, completely filled, or completely filled with gaps in-between the runs.
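To make the brute-force step concrete, here is a sketch of the candidate-generation part (not the actual PicSol code – names and structure are mine): enumerate every way the clue runs fit into a line, then the elimination loop repeatedly discards candidates that contradict what the crossing lines agree on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NonogramSketch
{
    // Enumerate every way to place the clue runs (e.g. [2, 1]) into a
    // row of the given length. A candidate is a bool[] where true means
    // "filled". Elimination then intersects rows against columns until
    // each line has exactly one candidate left - or none, in which case
    // the puzzle is unsolvable.
    public static IEnumerable<bool[]> RowCandidates(int[] clues, int length) =>
        Place(clues, 0, 0, length, new bool[length]);

    private static IEnumerable<bool[]> Place(int[] clues, int clueIndex, int start, int length, bool[] row)
    {
        if (clueIndex == clues.Length)
        {
            yield return (bool[])row.Clone();
            yield break;
        }

        int run = clues[clueIndex];
        // Space the remaining runs need, including one-cell gaps between them.
        int needed = clues.Skip(clueIndex).Sum() + (clues.Length - clueIndex - 1);

        for (int pos = start; pos + needed <= length; pos++)
        {
            for (int i = 0; i < run; i++) row[pos + i] = true;
            foreach (var candidate in Place(clues, clueIndex + 1, pos + run + 1, length, row))
                yield return candidate;
            for (int i = 0; i < run; i++) row[pos + i] = false;
        }
    }
}

// RowCandidates(new[] { 2, 1 }, 5) yields three candidates:
// ##.#. , ##..# , .##.#
```

The empty-row and completely-filled shortcuts fall out naturally here: they are simply the cases where only a single candidate exists.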

I’ve created PicSol, a library for .net Standard 2.0 and .net Framework 4.0 (or newer), available on Nuget, which offers a Solver for Nonograms.

Check out the README for information on how to use it, or look at the Console project in the GitHub repository.



Some Ruby concepts explained for .net developers

I’m normally a .net developer, it’s been my bread and butter for the past seven years and will be for several more. But it’s also important for me to keep in touch with other languages out there, including Ruby. Here’s my personal cheat sheet to remember naming conventions.

Method Names are lower case and use underscores, as do Method Arguments. The result of the last expression is automatically returned – there is no direct equivalent of void, although nil can serve that purpose.

def my_method(some_argument)
  1 + 1 # implicitly returns 2.
end

Local Variables are also lower case with underscores, and no special var keyword is required to declare them.

def some_method
  my_variable = 2
  1 + my_variable
end

Instance Variables – that is, a non-static field in a class – are prefixed with @. Somewhat surprisingly, they can be declared within a method.

class MyClass
  def do_stuff
    @test = 4
  end

  def testing
    2 + @test
  end
end

myc = MyClass.new
puts myc.do_stuff
puts myc.testing

This outputs 4 and 6. If I remove the puts myc.do_stuff line, this throws an error: test.rb:8:in '+': nil can't be coerced into Fixnum (TypeError).

Constructors are methods called initialize:

class MyClass
  def initialize(initial_value)
    @test = initial_value
  end

  def testing
    return 2 + @test
  end
end

myc = MyClass.new(3)
puts myc.testing

This outputs 5. Instance Variables are private by default, but Ruby has three special ways to declare a variable as public: attr_accessor, attr_reader and attr_writer. Changing the class to this:

class MyClass
  attr_reader :test

  # .. rest as above
end

myc = MyClass.new(3)
puts myc.test # outputs 3
myc.test = 4  # undefined method 'test='

So attr_reader is like public dynamic Test { get; private set; } in .net, while attr_writer is like { private get; set; } and attr_accessor is like { get; set; }.

To create property getters and setters, just create methods. In the end, that is what attr_reader etc. are doing, just like the .net auto-property syntax creates actual methods on compilation.

def test=(value)
  puts "I'm a property setter for @test!"
  @test = value
end

def test
  puts "I'm a property getter for @test!"
  return @test
end

Supposedly, attr_ methods are faster than manually implementing methods – not sure if it’s true, but they are definitely the recommended way if you don’t need actual logic in your getters and setters.

The syntax above used a Ruby symbol, as evidenced by the colon – :test. This is the concept that took me the longest to figure out. In a way, symbols are like interned strings in .net, since the same symbol will always mean the same thing, whereas instances of strings may not be reference-equal despite having the same content. Generally, symbols should be seen as constant identifiers (they are in fact immutable). I recommend this blog post for some more information, but interned string seems to be the best .net analogue I could come up with.
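The interned-string analogy can be demonstrated in C# directly:

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        // Two strings built at runtime with the same content are separate objects...
        string a = new string(new[] { 't', 'e', 's', 't' });
        string b = new string(new[] { 't', 'e', 's', 't' });
        Console.WriteLine(ReferenceEquals(a, b)); // False

        // ...but their interned representatives are the same object,
        // just like :test is always the same symbol in Ruby.
        Console.WriteLine(ReferenceEquals(string.Intern(a), string.Intern(b))); // True
    }
}
```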

Class Variables are the equivalent of static fields. Prefixing a method name with self. makes it the equivalent of a static method. There are some caveats with regard to class variables when inheriting.

class MyClass
  @@static_var = 8

  def initialize(my_value)
    @instance_var = my_value
  end

  def testing
    @instance_var + @@static_var
  end

  def self.static_var=(value)
    @@static_var = value
  end
end

myc = MyClass.new(3)
puts myc.testing  # 11

myc2 = MyClass.new(4)
puts myc2.testing # 12

MyClass.static_var = 10
puts myc.testing  # 13
puts myc2.testing # 14

Constants are not prefixed and use SCREAMING_CAPS syntax.

class MyClass
  MY_CONSTANT = 20

  def testing
    4 + MY_CONSTANT
  end
end

myc = MyClass.new
puts myc.testing # 24

Class Inheritance uses < BaseClass syntax. Like .net, Ruby does not support multiple inheritance, but unlike .net, there are no interfaces.

class MyClass
  def initialize
    @test = 4
  end
end

class MyDerivedClass < MyClass
  def testing
    2 + @test
  end
end

myc = MyDerivedClass.new
puts myc.testing # 6

Modules in Ruby are a bit like single-namespace assemblies in .net. Modules can contain Constants, methods, classes, etc. The include keyword is like using in .net.

module MyModule
  SOME_CONSTANT = 20

  def MyModule.say_hello
    puts "Hello!"
  end
end

class MyClass
  include MyModule

  def testing
    MyModule.say_hello
    4 + SOME_CONSTANT
  end
end

myc = MyClass.new
puts myc.testing # Hello!, followed by 24

Modules do not support inheritance, despite being class-like (in fact, Ruby’s Class class inherits from the Module class, which inherits from Object). What’s somewhat noteworthy is that constants do not need to be prefixed with the Module name, unless there is something “closer” in scope.

class MyClass
  include MyModule

  SOME_CONSTANT = 30

  def testing
    puts 4 + SOME_CONSTANT            # 34
    puts 4 + MyModule::SOME_CONSTANT  # 24
  end
end

The double colon (::) was described as the namespace resolution operator on Stack Overflow.

There is obviously a lot more to Ruby that doesn’t translate 1:1 to .net, but I hope that the above code samples make it a bit easier to understand Ruby as a .net developer.

Simplexcel 1.0.3

A while ago I created my own simple Excel .xlsx creation library for .net 4.0+. I did this because I was unhappy with the ones on the market – many of them try to replicate every feature Excel has to offer, but aren’t actually tested against the real Excel application and thus miss things like sheet name lengths/invalid characters (which are NOT part of the OpenXML Spreadsheet standard), or otherwise allow you to create invalid Excel files, prompting Excel to greet you with a super useless “Invalid data found, do you want to recover?”.

Simplexcel is my attempt at a library that may not offer all the features Excel can do, but tries to make sure that every feature actually works with every applicable version of Excel (2007, 2008 for Mac, 2010, 2011 for Mac and 2013).

Today I updated it:

Version 1.0.3 (2013-08-20)

  • Added support for external hyperlinks
  • Made Workbooks serializable using the .net DataContractSerializer

Full Documentation: http://mstum.github.io/Simplexcel/
Nuget.org Package Page: http://www.nuget.org/packages/simplexcel/
GitHub Page: https://github.com/mstum/Simplexcel

Are Data Annotations a sign of bad design?

The .net Framework comes with a validation system, called Data Annotations. At first glance, these are pretty awesome and I have used them a lot. In a nutshell, it allows you to do stuff like this:

[Required]
[StringLength(64)]
public string Title { get; set; }

It is also absolutely trivial to write your own validators and to validate objects using them. In fact, ASP.net MVC will automatically validate these as part of its model binding.
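A sketch of such a custom validator (the attribute and its rule are made up for illustration):

```csharp
using System.ComponentModel.DataAnnotations;

// A hypothetical custom validator: rejects strings that consist only of whitespace.
public class NotWhitespaceAttribute : ValidationAttribute
{
    public override bool IsValid(object value)
    {
        // null and non-strings are someone else's problem ([Required] handles null).
        var s = value as string;
        if (s == null) return true;
        return s.Length == 0 || s.Trim().Length > 0;
    }
}

public class Post
{
    [Required]
    [StringLength(64)]
    [NotWhitespace(ErrorMessage = "Title must not be only whitespace.")]
    public string Title { get; set; }
}
```

Outside of MVC, Validator.TryValidateObject runs these same attributes, which is what makes them look so appealingly reusable at first.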

However, they have their limits, and because of these limits I started questioning them a while ago. Since they are attributes, their arguments have to be constant – not a big deal, constraints should be constant anyway. There is some awkwardness around MVC model binding, for example this:

[Range(0,100)]
public byte Value { get; set; }

will actually blow up during model binding if a negative value is passed, since binding a negative value to a byte fails and we never even get to the range check – arguably, this is an ASP.net MVC issue and not directly related to Data Annotations.

The bigger problem with Data Annotations is that they can’t validate the object completely and thus make it hard to adhere to DRY and SRP rules. For example, let’s say I have a business rule that says that Titles must be unique.

Whose job is it to ensure this constraint? With Data Annotations, we put the responsibility in the hands of the business object, which turns it from a simple data structure into an object (Uncle Bob Martin has some good insight about the distinction in his book Clean Code) and thus may violate SRP.

Let’s say we wanted to create a [NoDuplicateTitle] Attribute. Inside the attribute, we could use Dependency Injection to get a database connection (since MVC 3, there is a built-in static DependencyResolver), then do the validation and bail out if required. Our Business Service and Repository layers could then just run validation on the business object itself to avoid duplicating their logic.

The obvious problem is that these attributes are not reusable outside of an ASP.net MVC application unless you bundle them with their own Dependency Injection mechanism (essentially just a Service Locator). The other problem is that you just added additional database calls to your application, in places that aren’t immediately obvious.

Another issue I see: what about other dependencies that aren’t directly visible from the database? Let’s say you have a rule that says you can only create 10 items an hour. That rule may change over time (e.g., next week management decides it should be 25/hour) or may be inconsistent across users (HR personnel can create an unlimited amount, while IT staff can only create 5). These rules are not constants; they can change from one second to the next. Also, this rule doesn’t describe the validity of an object but rather the validity of an action. The same object may be valid or invalid depending on the user and time of submission, and thus can only reliably be verified by the service in the moment you actually try to save it.

Is it the responsibility of the business object to know if it can be saved given the current user and time? That seems like too much responsibility for something that’s supposed to be a data structure. Is it the service’s responsibility? That sounds more like it, since ultimately the service enforces the business rule. But if the service enforces the "10 items/hour" rule, shouldn’t it also enforce the "64 character max length" rule? Because if the title is too long, the save should fail. Should the business object care why it fails?

Arguably, the C# type system is causing us some grief here since there is no good way to create a type with constraints like "a number between 0 and 100" or "a non-empty string that is 64 characters or less" (I think that Haskell or F# support this), but we still have to solve it somehow.

One way is to create an IEnumerable<ValidationResult> Validate(MyBusinessObject bo) method in the service and have it make the calls. This satisfies SRP and DRY, since it’s now only the service that handles validation. The downside is that we lose some cheap and easy ways to build HTML forms, e.g., an Html.TextBoxFor call can no longer check for a StringLengthAttribute to set the maxlength on the generated textbox. Given that, Data Annotations still make sense on View Models, but now we have violated DRY and have to deal with changes all over the place. Say someone decides that Titles can be 72 chars long: you now need to change that in all places, and since you’re likely dealing with a hardcoded literal 64, you need to make sure you’re not changing some other constraint that also happened to be 64 and shouldn’t change. If you want to use a public const int MaxTitleLength = 64, you need to ask yourself where to put it. On the service? Now your ViewModels have a compile-time dependency on the service. On the business object? Seems to violate SRP. In a Constants.cs class? Smells like a big ball of mud. It has to go somewhere, and the correct answer will vary by project and by the underlying tech stack, but it’s an interesting problem to think about nevertheless.
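A sketch of that service-side Validate method (PostService, IPostRepository and TitleExists are made-up names for illustration):

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public interface IPostRepository
{
    bool TitleExists(string title);
}

public class Post
{
    public string Title { get; set; }
}

public class PostService
{
    private readonly IPostRepository _repository;

    public PostService(IPostRepository repository)
    {
        _repository = repository;
    }

    public IEnumerable<ValidationResult> Validate(Post post)
    {
        if (string.IsNullOrEmpty(post.Title))
            yield return new ValidationResult("Title is required.", new[] { "Title" });
        else if (post.Title.Length > 64)
            yield return new ValidationResult("Title must be 64 characters or less.", new[] { "Title" });

        // Rules that need external state (uniqueness, rate limits, the
        // current user) naturally live here, next to the save they protect.
        if (_repository.TitleExists(post.Title))
            yield return new ValidationResult("Title must be unique.", new[] { "Title" });
    }
}
```

Reusing ValidationResult keeps the return type compatible with what Data Annotations consumers already expect.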

There is another option involving dynamic generation of attributes at compile time, but that seems like a lot of bandaid. I don’t know what the right answer is, but I’ll try a different approach without Data Annotations for my next project and see how that goes.

Debugging a .net 4.0 application when .net 4.5 is installed

I have a machine that runs .net 4.0 and where I took a memory dump of an application. I moved the dump to my machine, which has .net 4.5 installed and tried to debug it in WinDbg:

0:000> .loadby sos clr
0:000> !DumpHeap
Failed to load data access DLL, 0x80004005
Verify that 1) you have a recent build of the debugger (6.2.14 or newer)
            2) the file mscordacwks.dll that matches your version of clr.dll is 
                in the version directory or on the symbol path
            3) or, if you are debugging a dump file, verify that the file 
                mscordacwks_<arch>_<arch>_<version>.dll is on your symbol path.
            4) you are debugging on supported cross platform architecture as 
                the dump file. For example, an ARM dump file must be debugged
                on an X86 or an ARM machine; an AMD64 dump file must be
                debugged on an AMD64 machine.

You can also run the debugger command .cordll to control the debugger's
load of mscordacwks.dll.  .cordll -ve -u -l will do a verbose reload.
If that succeeds, the SOS command should work on retry.

If you are debugging a minidump, you need to make sure that your executable
path is pointing to clr.dll as well.

On the internet, there are a bunch of guides that focus on bringing in the right version of mscordacwks.dll, but that wasn’t my problem (I had set up the Symbol Server, and WinDbg was correctly downloading the right version). It turns out that .net 4.5 comes with a new SOS.dll that is incompatible with .net 4.0 memory dumps. The solution for me was to copy the SOS.dll from a .net 4.0 machine (from C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319) into the winext folder of my WinDbg installation (C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x86\winext) and then just load it with

.load sos

Yeah, this whole in-place upgrade of .net 4.0 to 4.5 was truly a great idea…

Simplexcel – simple .xlsx library for .net

As part of almost every application I work on, I need to create Excel sheets one way or the other, usually from an ASP.net MVC application. During the years, I’ve tried several approaches, and they all sucked one way or the other:

  • COM Interop: By far the worst option. Requires an installed Excel. Slow. Error prone. Resource-intensive. Not supported in a server environment.
  • CSV or HTML Tables: Only supports a single worksheet, not much formatting, prone to break Excel’s heuristics (e.g., the string “00123” is interpreted as a number, stripping the leading zeroes. For big numbers, Excel loves to use scientific notation, which sucks for Barcodes which should just be interpreted as strings) and hard to create (CSV is a pain if you need to escape quotes or have newlines)
  • Excel 2003 XML/SpreadsheetML: A crutch. Uncertain future, limited options, big files. But actually, not too bad.
  • One of the many Excel 2007+ .xlsx libraries: I tried about 4 or 5, and they all sucked in a different way. No offense, but some library authors try to cover the entirety of Excel’s capabilities, leading to an awkward API. Many don’t catch specific Excel limitations that aren’t part of the standard (e.g., Sheet name length or invalid characters), which means I’m creating sheets that cause Excel to tell me there was invalid data, leaving me puzzled how to fix that.
  • Going Low-Level with the OpenXML SDK 2.0. Believe me, you don’t want to go down that road. There is very little help creating the documents, and if you want certain features that seem obvious (e.g., setting the Author of a document, which requires adding the creator element to core.xml) you will find that there is actually no way to do it.

So, all solutions I tried sucked. Which means that I set out to create another solution that sucks slightly less. Armed with some spare time, the ECMA-376 standard and Excel 2007 and 2010 to actually test against, I created a library that has a limited set of features, but implements them well, handles errors properly and (hopefully) has a good API for you guys to work against.

Features

  • You can store numbers as numbers, so no more unwanted conversion to scientific notation on large numbers!
  • You can store text that looks like a number as text, so no more truncation of leading zeroes because Excel thinks it’s a number
  • You can have multiple Worksheets
  • You have basic formatting: Font Name/Size, Bold/Underline/Italic, Color, Border around a cell
  • You can specify the size of cells
  • Workbooks can be saved compressed or uncompressed (CPU Usage vs. Network Traffic)
  • You can specify repeating rows and columns (from the top and left respectively), useful when printing.
  • Fully supported in ASP.net and Windows Services (The documentation contains an example ActionResult for ASP.net MVC)

Usage

You can get the Simplexcel Nuget Package for .net 4.0 Client Profile or higher – https://nuget.org/packages/simplexcel

Documentation can be found at http://mstum.github.com/Simplexcel/

Source Code

Licensed under the MIT License, the code can be found at https://github.com/mstum/Simplexcel

I would like Tech Books to be like this

I just received my copy of Using the HTML5 Filesystem API by Eric Bidelman and wanted to use it as an example of how I would like Tech Books to be:

HTML5FS

This book is specialized and concise. As you might see from the image, it comes in at well under 100 pages and covers exactly one topic.

I have only very limited time compared to the amount of technology that’s out there. The Web is evolving so fast, it’s hard to keep up with new developments. Information is scattered in blog posts, and it’s often hard to just find a cohesive front-to-back tutorial. Also, even though I own both a Kindle and an iPad, I prefer my books to be on dead trees.

Tech Books are often massive, 500 to 1000 page compendiums, and there are cases when it’s necessary. If you read the .net Specification, the C# Language Specification or David Flanagan’s excellent JavaScript: The Definitive Guide, it’s hard to get these under 500 pages, and that’s okay because they are compendiums about an entire language.

However, way too often I see books that have a lot of text but very little information. One of my Pet Peeves is the Introduction and Installation section. Many Web Books start with a 50 page history of the Internet, from CERN to HTML5, how the internet has revolutionized the world and what great things lie in front of us. Sorry, if I really want to know about the history of the Internet, I hit up Wikipedia, just like the book author likely did when researching. The only interesting piece of History is the motivation behind a technology, answered by asking two questions: What Problem did the creator of a technology run into, and how is this technology attempting to solve it?

Same for the installation section. I don’t need a 20-page installation guide that’s outdated anyway by the time the book comes out. Give us the official project website and maybe a few gotchas, and we can take it from there.

Being able to read a book in an evening is a huge win because it already plants knowledge into our brain about what’s available. I might not have immediate need for a technology, but knowing what and how it does stuff means that the next time I run into a technical problem, I might think “Hey, I can use X to solve that!”.

The HTML5 Filesystem API book does just that. It has a small intro and dives right into code examples. The API is simple, yet does introduce new objects into JavaScript. There is no example application per se (one could think that building an Address Book with uploadable photos would be cool), presumably because it would take the focus away from the core. The code builds on top of each other though, from the creation of a file system to adding, deleting, uploading and getting remote files.

In fact, this book is good for another point: Tech Books usually lag behind the technologies they cover. When a technology comes out, the early books are usually garbage, based on Beta/Prerelease versions, with code examples that are slightly broken because RTM changed things. They don’t have many real-world usage scenarios and often cover too much, too shallowly. This is even worse when the technology is an update (e.g., ASP.net MVC 3) and the early books are just previously released books (e.g., for ASP.net MVC 2) updated with some new features, rather than redoing the entire book front to end with the new technology in mind.

The later books are usually too late – by the time they come out, the successor technology has already been announced, and people gained knowledge through blog posting, cursing at how bad the technology is documented.

Tech Books remind me a lot of Waterfall development, in that they simply take way too long to come to the market and don’t meet the needs of the customers. Writing a 500+ page tome simply takes time.

The agile version of that is what the above book demonstrates: The technology it’s discussing isn’t even out yet and may change, a fact that’s clearly stated in the book. However, because it’s only 80 or so pages, it’s quick to write and gives the reader enough knowledge about all aspects of the API, so that eventual changes should be fairly trivial to do (Eric also includes the link to the W3C Spec, so you can stay up to date). But even if they completely throw away and redo the API, it only wasted a small amount of my time – and my money. The book is $20 (cheaper on Amazon right now), so it’s not a massive investment at all, especially if I would calculate the time it would take me to scour through scattered blog postings as billable time.

Another great example is The Node Beginner Book. It’s cheap ($5), comes in at about 60 pages, is cohesive and available right now (compared to some books that are scheduled to come out in 4 months or more). CoffeeScript: Accelerated JavaScript Development is another excellent one. I heard that the Nuget folks are considering a book as well, and I would love to just get a book that covers how Nuget works internally, how the Server was written and how to interact with the Visual Studio plugin. Sure, I can look at the source code, but again, a concise and cohesive overview with samples and a reference part is all that’s really needed.

Some other books I would totally buy: Model Binding and Validation in ASP.net MVC 3/4, RESTFul WCF Applications, Ways to send data from the Server to a browser (From COMET and setInterval/JSONP crutches to Web Sockets and Server Side Events), Writing an ORM from Scratch.

Writing a BF Compiler for .net (Part 7: The ret instruction)

Yesterday I concluded the .net BF series with the explanation and code of the .net BF compiler. But there was one thing that is important and that was unclear in the post: the ret instruction.

Remember how the compiler emits a ret instruction at the end of the constructor and the Execute method? I said it’s optional and only required for verifiable code. Well, I’ve asked on Stack Overflow in the meantime and got the answer I was looking for:

Control is not permitted to simply “fall through” the end of a method. All paths shall terminate with one of these instructions: ret, throw, jmp, or (tail. followed by call, calli, or callvirt).

(ECMA-335; 12.4, 6)

So, there you have it: Make sure each method ends with one of the above instructions.
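You can see the rule in action without writing a full compiler by emitting a tiny method with System.Reflection.Emit (this is a standalone sketch, not the BF compiler's code):

```csharp
using System;
using System.Reflection.Emit;

class RetDemo
{
    static void Main()
    {
        // Build int AddOne(int x) => x + 1 at runtime.
        var method = new DynamicMethod("AddOne", typeof(int), new[] { typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        // Without this ret, control would "fall through" the end of the
        // method and invoking the delegate fails with an InvalidProgramException.
        il.Emit(OpCodes.Ret);

        var addOne = (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
        Console.WriteLine(addOne(41)); // 42
    }
}
```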