PicSol – a .net Nonogram/Picross Solver Library

Nonograms – also known as Griddlers, Picture Crosswords, or Picross – are pretty cool puzzles, kind of like a more visual Crossword puzzle or Sudoku. Of all the games on my New 2DS XL, Mario’s Picross and the Picross e series are near the top of my Activity Log (beaten only by Smash Bros).

I got curious about algorithmic solutions to those Nonograms, which seem deceptively easy to find but are actually an NP-complete problem. When trying to solve a Nonogram, often I can only fill in one or a few cells of a group, which then leads to another cell that can be filled in (or X-ed out), and step by step, cell by cell, I solve the Nonogram. Now, that assumes the Nonogram is properly designed – if it is, then there is always at least one cell that either must definitely be filled or must definitely be empty.

All of Jupiter’s games are well designed – even the trickiest ones (with a bunch of 1s and 2s and no big numbers) always follow the mantra of “there’s always at least one cell that has a definitive solution”. There are a lot of other games on the market (Steam returns about 15 games when searching for Picross or Nonogram), and some are not well designed and actually require guessing.

I ended up (after a bunch of googling approaches and other existing solvers) with a solution that’s mostly brute force – generate all possibilities for each row and column, then eliminate those that can’t be correct, rinse and repeat until there’s only one possibility left for each row and column, or until we determine that the Nonogram is actually unsolvable. There are some shortcuts we can take, e.g., when a row/column is empty, completely filled, or completely filled except for single gaps between the runs.
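The core of that elimination loop can be sketched in JavaScript (a simplified illustration of the general technique, not PicSol’s actual implementation – all names here are made up):

```javascript
// Generate every way the given runs (e.g. [3, 1]) can be placed in a row
// of `length` cells. Each arrangement is an array of booleans.
function arrangements(runs, length) {
  if (runs.length === 0) {
    return [new Array(length).fill(false)];
  }
  const [run, ...rest] = runs;
  // Minimum space the remaining runs need (each needs a gap before it).
  const needed = rest.reduce((sum, r) => sum + r + 1, 0);
  const results = [];
  for (let start = 0; start + run + needed <= length; start++) {
    const head = new Array(start).fill(false).concat(new Array(run).fill(true));
    // A single mandatory gap after this run if more runs follow.
    const gap = rest.length > 0 ? [false] : [];
    for (const tail of arrangements(rest, length - head.length - gap.length)) {
      results.push(head.concat(gap, tail));
    }
  }
  return results;
}

// Keep only the arrangements that are consistent with what we already know
// (true = filled, false = empty, null = unknown).
function eliminate(possible, known) {
  return possible.filter(arr =>
    arr.every((cell, i) => known[i] === null || known[i] === cell));
}

// Cells that have the same value in *every* remaining arrangement are forced.
function forcedCells(possible, length) {
  const known = [];
  for (let i = 0; i < length; i++) {
    const first = possible[0][i];
    known.push(possible.every(arr => arr[i] === first) ? first : null);
  }
  return known;
}
```

Running `eliminate` and `forcedCells` alternately over all rows and columns, feeding each newly forced cell back in as a known value, repeats until every line has exactly one arrangement left – or until some line has none, which means the puzzle is unsolvable.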

I’ve created PicSol, a library for .net Standard 2.0 and .net Framework 4.0 (or newer), available on NuGet, which offers a Solver for Nonograms.

Check out the README for information on how to use it, or look at the Console project in the GitHub repository.

Using .net Framework sources for better debugging

Over the last couple of weeks, we’ve been working on changes to our SAML 2.0 Authentication on Stack Overflow Enterprise. This was required to better support scenarios around signing and encrypting SAML Requests and Responses, and as such, a lot of the work was centered around XML Security, specifically the SignedXml and EncryptedXml classes.

Now, one thing I can say for sure about SAML 2.0 is that every Identity Provider implements it slightly differently, causing XML Validation errors or CryptographicExceptions somewhere deep in the code. So, how can we properly debug and fix this?

The first option is to Enable Symbol Server support in Visual Studio. This gives you .pdbs for .net Framework code, but because the Framework is compiled in Release mode, some code is inlined or otherwise rewritten to no longer match the exact source, which makes following variables and even call stacks really hard the deeper you go.

Another option is to check out the .net Framework Reference Source, also available at the easy-to-remember http://sourceof.net. This is the actual source code of the .net Framework, which at least allows reading through it to see what’s actually going on – at least until you hit native/external code. You can even download the sources, which not only lets you view them in Visual Studio, but also lets you compare implementations across Framework versions to see if anything changed. (The code on the website is always only for the latest Framework version, which is 4.7.1 at the time of writing. If I need to see how something was implemented in 4.6.2, I need to download the sources.)

Another thing that we can do with the reference sources is to put them into our project, change namespaces to avoid ambiguity, and then use our private copies of the implementation.

This is a huge help for several reasons:

  • We can step through real code – no mismatched symbols or weird abstractions; it’s the actual source.
  • Since we can compile in Debug mode, we don’t have to worry about optimizations making the debug experience much harder.
  • Breakpoints work properly, including conditional ones.
  • If we have a hypothesis about a cause of or fix for an issue, we can make source code changes to verify.

Now, there are two problems with Framework Sources. First off, because Framework code can refer to protected and internal methods, we might have to either copy a massive amount of supporting code, or implement workarounds to call those methods that are inaccessible to us (there are 79 files in my private copy of SignedXml/EncryptedXml). But the real showstopper is that the license doesn’t allow us to ship Framework code, as it’s only licensed for reference use. So if I found a way to fix an issue, I need to see how I can make this work on our side of the code, using the actual .net Framework classes.

Now, if we really can’t find a way to solve an issue without modifying Framework code, a possible option is the .NET Core Libraries (CoreFX) sources, because that code is MIT-licensed. It’s a subset of the .net Framework, but thanks to the license, anything that’s there can be used, modified, and shipped by us. This is a bit of a last resort, but can be preferable to worse workarounds. I cannot overstate how awesome it is that Microsoft releases so much code under such a permissive license. It not only makes our lives so much easier, but in turn benefits our customers by giving them a better product.

In the end, we were able to resolve all the issues we saw without having to modify Framework code, once we understood exactly where (and why!) something was failing – and our SAML 2.0 implementation got a whole lot better because of the availability of the source code.

Simplexcel 2.0.5

It’s been a few months since I released Simplexcel 2.0.0, which was a major change in that it added .net Standard support, so the library can be used on .net Core, including ASP.net Core.

Since then, there have been a few further feature updates:

  • Add a Worksheet.Populate<T> method to fill a sheet with data. Caveats: it does not look at inherited members, and doesn’t look at complex types.
  • Also add a static Worksheet.FromData<T> method to create and populate the sheet in one step.
  • Support for freezing panes. Right now, this is being kept simple: call either Worksheet.FreezeTopRow or Worksheet.FreezeLeftColumn to freeze either the first row (1) or the leftmost column (A).
  • If a Stream is not seekable (e.g., HttpContext.Response.OutputStream), Simplexcel automatically creates a temporary MemoryStream as an intermediate.
  • Add Cell.FromObject to make Cell creation easier by guessing the correct type.
  • Support DateTime cells.
  • Add support for manual page breaks. Call Worksheet.InsertManualPageBreakAfterRow or Worksheet.InsertManualPageBreakAfterColumn with either the zero-based index of the row/column after which to create the break, or with a cell address (e.g., B5) to create the break below or to the left of that cell.

Simplexcel is available on Nuget, and the source is on GitHub.

TresorLib – a deterministic password generator library for .net

Tresor is a .net library (.net 4/netstandard 1.3 or newer) that generates passwords in a deterministic way – that is, the same inputs will always yield the same output. From simple inputs (like the service name twitter and the passphrase I'm the best 17-year old ever.) you get a strong password like c<q_!^~,'.KTbPV=9^mU.

This avoids the bad habit of reusing the same passphrase across multiple sites, but it also doesn’t rely on a password vault that you don’t always have with you (e.g., because it’s a file on your computer and you’re on the road), nor does it require you to trust a cloud service to keep your data secure.

Internet Websites get compromised all the time – so frequently that haveibeenpwned.com has been created for people to get notified whenever their email address shows up in a data breach. Troy Hunt has an amazing blog about web security issues and breaches, and explains why you should use a Password Manager.

Tresor is a port of Vault by James Coglan to .net. When I first discovered Vault, it immediately checked the boxes I was looking for: it works everywhere as long as I have internet access, but it doesn’t store any data that can be compromised. All I need to do is remember the service name and settings, and voilà, there’s my password – without having to store it in yet another online database that can be compromised.

For more information about Vault, check the FAQ on https://getvau.lt/faq.html.

Please note that this port is not endorsed or in any way associated with James Coglan.

TresorLib has been published to Nuget, and the source code is available on GitHub. It is licensed under the GNU General Public License Version 3 (GPLv3).

Example usage:

var service = "twitter";
var phrase = "I'm the best 17-year old ever.";
var password = Tresor.GeneratePassword(service, phrase, TresorConfig.Default);

// password => c;q- q}+&,KTbPVn9]mh

For more usage information and documentation, check the Readme in the GitHub repository.

Simplexcel 2.0.0

A couple of years ago, I created a simple .net Library to create Excel .xlsx sheets without the need to do any COM Interop or similar nonsense, so that it can be used on a web server.

I just pushed a major new version, Simplexcel 2.0.0 to NuGet. This is now targeting both .net Framework 4.5+ and .NET Standard 1.3+, which means it can now also be used in Cross-Platform applications, or ASP.net Core.

There are a few breaking changes, most notably the new Simplexcel.Color struct that is used instead of System.Drawing.Color and the change of CompressionLevel from an enum to a bool, but in general, this should be a very simple upgrade. Unless you still need to target .net Framework 4 instead of 4.5+, stay on Version 1.0.5 for that.

RAM and CPU Cycles are there to be used.

What is wrong with this picture?

Answer: The white area at the top of the CPU and Memory graphs. It indicates that I spent money on something I’m not using.

One of the complaints often brought forward is that certain applications (especially browsers) are “memory hogs”. As I’m writing this, Chrome uses 238.1 MB of RAM, and a separate Opera uses 129.8 MB. Oh my – remember when 4 MB was enough to run an entire operating system?

Now, here’s the thing about RAM and CPU cycles: I spend my (or someone else’s) hard-earned cash on them to speed up my computer use. That’s literally why they exist – to make stuff go faster, always.

32 GB of RAM costs about $200. In the above screenshot, about $135 of those hard-earned dollars are doing nothing. It’s like hiring (and paying) a full-time employee and only giving them 3 hours of work every day. That CPU? It’s quad-core, eight-thread, with 2.6 billion cycles per second – that’s between 10 and 20 billion total cycles each second. And yet, it’s sitting at 16% in that screenshot. For a CPU that’s priced at $378, that’s $317 not being used.

There are priorities and trade-offs that everyone needs to make. In a Laptop, maybe you don’t want the CPU to be constantly close to 100%, because it drains the battery and the whirring cooling fan is annoying. But maybe you bought such a high end laptop specifically because you want to use that much CPU power and RAM. I certainly did.

My primary application is Visual Studio, which has made some really good strides in 2017 toward being faster. Find All References is pretty fast, Go To Types is pretty fast. Only “Find in Files” could be faster, because it still seems to hit the disk. The cost for that? Currently 520 MB of RAM usage. I’ll take that. In fact, if I could get more speed at the expense of more RAM, I’d take that as well. I would also love for Visual Studio to find a way to reduce the 45-second build time – as you can see in the graph, the CPU only briefly spikes. Why is it not at a constant 100% when I click the Build button? Is there a way to just keep everything needed to compile constantly in RAM? (And yes, I know about RAM disks, and I do have an SSD that does 3 GB/s – but the point is for applications to be more greedy.)

Two applications I run that really make great use of my computer are SQL Server and Hyper-V. SQL Server is currently sitting at 3.5 GB and will grow to whatever it needs, and Hyper-V will use whatever it needs as well. Both applications also respect my limits if I set them.

But they don’t prematurely limit themselves. Some people are complaining about Spotify’s memory usage. Is that too much for a media player? Depends. I’m not using Spotify, but I use iTunes. Sometimes I just want to play a specific song or Album, or just browse an Artist to find something I’m in the mood for. Have you ever had an application where you scroll a long list and halfway through it lags because it has to load more data? Or where you search/filter and it takes a bit to display the results? I find that infuriating. My music library is only about ~16000 tracks – can I please trade some RAM to make the times that I do interact with it as quick as possible? YMMV, but for me, spending 500 MB on a background app for it to be super-responsive every time I interact with it would be a great tradeoff. Maybe for you, it’s different, but for me, iTunes does stay fast at the expense of my computer resources.

Yeah, some apps may take that too far, or misbehave in ways like trashing your SSD. Some apps use too much RAM because they’re coded inefficiently, or because there is an actual bug. It should always be a goal to reduce resource usage as much as possible.

But that should just be one of the goals. Another goal should be to maximize performance and productivity. And when your application sees a machine with 8, 16 or even 32 GB RAM, it should maybe ask itself if it should just use some of that for productivity reasons. I’d certainly be willing to trade some of that white space in my task manager for productivity. And when I do need it for Hyper-V or SQL Server, then other apps can start treating RAM like it’s some sort of scarce resource. Or when I want to be in battery-saver mode, prioritizing 8 hours of slow work over 4 hours of fast work.

But right now, giving a couple hundred megs to my web browsers and productivity apps is a great investment.

git rebase on pull, and squash on merge

Here’s a few git settings that I prefer to set, now that I actually work on projects with other people and care for a useful history.

git config --global branch.master.mergeoptions "--squash"
This always squashes merges into master. I think work big enough to require multiple commits should be done in a branch and then squash-merged into master. That way, master becomes a great overview of individual features rather than the nitty-gritty of all the back and forth of each feature.

git config --global pull.rebase true
This always rebases your local commits, or, in plain English: it always puts your local, unpushed commits at the top of the history when you pull changes from the server. I find this useful because, if I’m working on something for a while, I can regularly pull in other people’s changes without fracturing my history. Yes, this is history rewriting, but I care more about a useful history than a “pure” one.

Combined with git repository hosting (GitLab, GitHub Enterprise, etc.), I’ve found that browsing the history is a really useful way to keep up with code changes (especially across time zones) – provided that the history is actually useful.

When nslookup works but you can’t ping it, NetBIOS may be missing

I have a custom DNS Server (running Linux) and I also have a server running Linux (“MyServer”). The DNS Server has an entry in /etc/hosts for MyServer.

On my Windows machines, I can nslookup MyServer and get the IP back, but when I try to access the machine through ping or any of the services it offers, the name doesn’t resolve. Access via the IP Address works fine though.

What’s interesting is that if I add a dot at the end (ping MyServer.) then it suddenly works. What’s happening?!

What’s happening is that Windows doesn’t use DNS but NetBIOS for simple name resolution. nslookup always talks to the DNS server, but anything else doesn’t use DNS for a single-label name. The trailing dot marks the name as a fully qualified DNS name, which is why ping MyServer. suddenly works.

The trick was to install Samba on MyServer, because it includes a NetBIOS server (nmbd). On Ubuntu 16.04, just running sudo apt-get install samba installs and auto-starts the service, and from that moment on my Windows machines could access it without issue.

There are ways to not use NetBIOS, but I didn’t want to make changes on every Windows client (since I’m using a Domain), so this was the simplest solution I could find. I still needed entries in my DNS Server so that Mac OS X can resolve it.

Why Feature Toggles enable (more) frequent deployments

As Nick Craver explained in his blog posting about our deployment process, we deploy Stack Overflow to production 5-10 times a day. Apart from the surrounding tooling (automated builds, one-click deploys, etc.) one of the reasons that is possible is because the master-branch rarely ever stays stale – we don’t feature branch a lot. That makes for few merge-nightmares or scenarios where suddenly a huge feature gets dropped into the codebase all at once.

The thing that made the whole “commit early, commit often” principle click for me was how easy it is to add new feature toggles to Stack Overflow. Feature Toggles (or Feature Flags), as described by Martin Fowler, have the application “[use] these toggles in order to decide whether or not to show the new feature”.

The Stack Overflow code base contains a Site Settings class with (as of right now) 1302 individual settings. Some of these are slight behavior changes for different sites (all 150+ Q&A sites run off the same code base), but a lot of them are feature toggles. When the new IME Editor was built, I added another feature toggle to make it only active on a few sites. That way, any huge issue would’ve been localized to a few sites rather than breaking all Stack Exchange sites.

Feature toggles allow a half-finished feature to live in master and be deployed to production – in fact, I can do that intentionally if I want to test it with a limited group of users, or have our community team try it before the feature gets released network-wide. (This is how the “cancel misclicked flags” feature was rolled out.) But most importantly, they allow changes to constantly go live. If there are any unintended side effects, we notice them faster and have an easier time locating them, as the relevant changeset is small. Compare that to some massive merge that might introduce a whole bunch of issues all at once.

For feature toggles to work, it must be easy to add new ones. When you start out with a new project and want to add your first feature toggle, it may be tempting to just add that one new toggle, but if the code base grows bigger, having an easy mechanism really pays off. Let me show you how I add a new feature toggle to Stack Overflow:

[SiteSettings]
public partial class SiteSettings
{
    // ... other properties ...

    [DefaultValue(false)]
    [Description("Enable to allow users to retract their flags on Questions, Answers and Teams.")]
    [AvailableInJavascript]
    public bool AllowRetractingFlags { get; private set; }
}

When I recompile and run the application, I can go to a developer page and view/edit the setting:

RetractFlags

Anywhere in my code, I can gate code behind an if (SiteSettings.AllowRetractingFlags) check, and I can even use the setting in JavaScript if I decorate it with the [AvailableInJavascript] attribute (Néstor Soriano added that feature recently, and I don’t want to miss it anymore).

Note what I did not have to do: I did not need to create any sort of admin UI, I did not need to write CRUD logic to persist the setting in the database, and I did not need to update some JavaScript somewhere. All I had to do was add a new property with some attributes to a class and recompile. What’s even better is that I can use datatypes other than bool – our code supports at least strings and ints as well, and it is possible to add custom logic to serialize/deserialize complex objects into a string. For example, a site setting can be a semicolon-separated list of ints that is entered as 1;4;63;543 on the site, but comes back as an int array of [1,4,63,543] in both C# and JavaScript.
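The round-trip for such a list-valued setting boils down to something like this (illustrative JavaScript, not the actual Stack Overflow serialization code):

```javascript
// Turn an int array into the "1;4;63;543" form entered on the settings page.
function serializeIntList(values) {
  return values.join(";");
}

// Parse the stored string back into a real int array for use in code.
function parseIntList(setting) {
  return setting.split(";").map(Number);
}
```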

I wasn’t around when that feature was built and don’t know how much effort it took, but it was totally worth building it. If I don’t want a feature to be available, I just put it behind a setting without having to dedicate a bunch of time to wire up the management of the new setting.

Feature Toggles. Use them liberally, by making it easy to add new ones.

Handling IME events in JavaScript

Stack Overflow has been expanding past the English-speaking community for a while, and with the launch of both a Japanese version of Stack Overflow and a Japanese Language Stack Exchange (for English speakers interested in learning Japanese) we now have people using IME input regularly.

For those unfamiliar with IME (like I was a week ago), it’s an input help where you compose words with the help of the operating system:
IME
In this clip, I’m using the cursor keys to go up/down through the suggestions list, and I can use the Enter key to select a suggestion.

The problem here is that navigating the suggestions actually sends keyup and keydown events, and so does pressing Enter. Interestingly enough, IME composition does not send keypress events. Since Enter also submits comments on Stack Overflow, selecting an IME suggestion would also submit the comment, which was hugely disruptive when writing Japanese.

Browsers these days emit events for IME composition, which allows us to handle this properly. There are three events: compositionstart, compositionupdate and compositionend.

Of course, different browsers handle these events slightly differently (especially compositionupdate), and also behave differently in how they treat keyboard events.

  • Internet Explorer 11, Firefox and Safari emit a keyup event after compositionend
  • Chrome and Edge do not emit a keyup event after compositionend
  • Safari additionally emits a keydown event (event.which is 229)

So the fix is conceptually simple: while you’re composing a word, Enter should not submit the form. The tricky part was finding out when composition is done, which requires swallowing the keyup event that follows compositionend on browsers that emit it, without requiring people on browsers that don’t emit it to press Enter an additional time.

The code that I ended up writing uses two boolean variables to keep track if we’re currently composing, and if composition just ended. In the latter case, we swallow the next keyup event unless there’s a keydown event first, and only if that keydown event is not Safari’s 229. That’s a lot of if’s, but so far it seems to work as expected.

submitFormOnEnterPress: function ($form) {
    var $txt = $form.find('textarea');
    var isComposing = false; // IME Composing going on
    var hasCompositionJustEnded = false; // Used to swallow keyup event related to compositionend

    $txt.keyup(function(event) {
        if (isComposing || hasCompositionJustEnded) {
            // IME composing fires keydown/keyup events
            hasCompositionJustEnded = false;
            return;
        }

        if (event.which === 13) {
            $form.submit();
        }
    });

    $txt.on("compositionstart",
            function(event) {
                isComposing = true;
            })
        .on("compositionend",
            function(event) {
                isComposing = false;
                // some browsers (IE, Firefox, Safari) send a keyup event after
                //  compositionend, some (Chrome, Edge) don't. This is to swallow
                // the next keyup event, unless a keydown event happens first
                hasCompositionJustEnded = true;
            })
        .on("keydown",
            function(event) {
                // Safari on OS X may send a keydown of 229 after compositionend
                if (event.which !== 229) {
                    hasCompositionJustEnded = false;
                }
            });
},

Here’s a jsfiddle to see the keyboard events that are emitted.