A failed attempt at using an ASUS Tinker Board or Raspberry Pi as a Backup Server

In my earlier post about my new file server, I talked about setting up a backup strategy. Now, my first attempt was to use an ASUS Tinker Board or a Raspberry Pi 3. This attempt failed, and I’d like to go over why, so that future readers don’t make the same mistake.

The Theory: Desired Setup

Here’s my desired backup strategy:

The plan was for the Tinker Board to be the local backup server – that is, the server that holds backups of all the main data – and from there, to upload to Backblaze’s B2 service.

Bottleneck 1: Multi-Core CPUs don’t really matter for SFTP

My initial plan was to use the SSH File Transfer Protocol (SFTP). After all, the Tinker Board with its 1.8 GHz quad-core CPU should do well, right? No, not really. The thing is that SFTP doesn’t seem to parallelize well – it maxes out one core and leaves the others idle.

Whether this is a limitation of the protocol or of OpenSSH, I don’t know, but I just couldn’t get good speeds over SFTP.

Bottleneck 2: USB 2.0 is just not very fast

Now, this one is a bit of a “well, duh!” issue, but I initially didn’t really consider that USB 2.0 is capped at a theoretical maximum of 480 MBit/s, which is 60 MB/s. So even after switching from SFTP to SMB or unencrypted FTP, I wasn’t reaching more than about 45 MB/s, even though the hard drive itself can do much more. This would mainly be a problem for the initial backup (~300 GB) and for restores, though.

Bottleneck 3 (Raspberry Pi only): Slow Ethernet

On the Raspberry Pi, the Ethernet controller is only 100 MBit/s and hangs off the internal USB bus, so it shares bandwidth with any attached USB drives. On the ASUS Tinker Board, the Ethernet is a dedicated Gigabit Ethernet controller and thus doesn’t share bandwidth.

A lot of boxes and cables, for a lot of money

This one is subjective, but my setup was an ASUS Tinker Board with its power supply, connected to a USB hard drive enclosure that also has its own power supply. It looked messy, and it also wasn’t really cheap – although at $102 ($60 Tinker Board, $14 case with fan, $8 power supply and $20 USB drive case), it’s cheaper than most anything else. For example, a $55 board with a 10W CPU, a $40 case, a $20 power supply and $23 of RAM would’ve been about $140, but likely much faster.

Going with a Mini-ITX instead

I’ll have a bigger blog post about the actual implementation later, but in the end I re-used my previous Mini-ITX box, which has proper SATA ports for full-speed hard drives and performs much better.

I do want to say that both the Raspberry Pi 3 and the ASUS Tinker Board are awesome little boards – especially the Tinker Board impressed me. But in this case, I tried to use them for something they are just not good at, and ended up wasting a bunch of time and money.

TresorLib – a deterministic password generator library for .net

Tresor is a .net library (.net 4/netstandard 1.3 or newer) to generate passwords in a deterministic way – that is, the same inputs will always yield the same output. From simple inputs (like the service name twitter and the passphrase I'm the best 17-year old ever.) you get a strong password like c<q_!^~,'.KTbPV=9^mU.

This addresses the risky habit of using the same passphrase for multiple sites (every service still gets its own unique password), but it also doesn’t rely on a password vault that you don’t always have with you (e.g., because it’s a file on your computer and you’re on the road), nor does it require you to trust a cloud service to keep your data safe.
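
Just to illustrate the general idea of deterministic derivation – this is not Tresor’s or Vault’s actual algorithm, only a rough sketch, and the iteration count and alphabet are arbitrary – you could derive bytes from the passphrase with the service name as salt and map them onto an allowed character set:

// Illustration only – NOT the actual Tresor/Vault algorithm.
// Same service + phrase always yields the same bytes, hence the same password.
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static string DerivePassword(string service, string phrase, int length = 20)
{
    // Iteration count and alphabet are arbitrary choices for this sketch
    const string alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!$%&*+-=?";
    using (var kdf = new Rfc2898DeriveBytes(phrase, Encoding.UTF8.GetBytes(service), 10000))
    {
        // Map each derived byte onto the allowed alphabet (slightly biased, fine for an illustration)
        return new string(kdf.GetBytes(length).Select(b => alphabet[b % alphabet.Length]).ToArray());
    }
}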

Websites get compromised all the time – so frequently that haveibeenpwned.com was created so that people can get notified whenever their email address shows up in a data breach. Troy Hunt has an amazing blog about web security issues and breaches, and explains why you should use a Password Manager.

Tresor is a port of Vault by James Coglan to .net. When I first discovered Vault, it immediately checked the boxes I was looking for: It works everywhere as long as I have internet access, but it doesn’t store any data that can be compromised. All I need to do is remember the service name and settings, and voila, there’s my password without having to store it in yet another online database that can be compromised.

For more information about Vault, check the FAQ at https://getvau.lt/faq.html.

Please note that this port is not endorsed or in any way associated with James Coglan.

TresorLib has been published to NuGet, and the source code is available on GitHub. It is licensed under the GNU General Public License Version 3 (GPLv3).

Example usage:

var service = "twitter";
var phrase = "I'm the best 17-year old ever.";
var password = Tresor.GeneratePassword(service, phrase, TresorConfig.Default);

// password => c;q- q}+&,KTbPVn9]mh

For more usage information and documentation, check the Readme in the GitHub repository.

Dell PowerEdge T30 – My 2017 File Server

It’s been about 18 months since I built a NAS for my own use, and over time my requirements have changed drastically. For one, I went with Windows instead of OpenBSD because I wanted to run some Windows-only software, and because NTFS works well for backing up other NTFS volumes, like those of my home PCs. I’ve been using Backup4All Pro to back up the server to external storage (a 3 TB Apple Time Capsule), with AES-256 encryption.

There were 3 major issues with the hardware in my existing NAS:

  1. The case only fits 3 Hard Drives (2x 2.5″ and 1x 3.5″), and I need some more space
  2. The CPU struggles with compression and encryption
  3. The RAM is not ECC

Number 1 would’ve been an easy fix – just get a different case – but, despite varying opinions on the matter, I want ECC RAM in my file server, and that required a new mainboard & CPU.

ECC RAM in the Intel world requires a workstation chipset (like the C232 or C236) and an ECC-supporting CPU – which, in this generation, means a Pentium, a Core i3, or a Xeon. The Core i5/i7 CPUs do not support ECC even though the lowly Pentium does, presumably so as not to cannibalize the Xeon E3 market. After all, the only real difference between a Xeon E3 and a Core i7 is ECC RAM support.

After looking around and pricing a CPU + Mainboard + ECC RAM Combo, I stumbled upon the Dell PowerEdge T30 tower server, which at least right now is on sale for $199 for the Pentium G4400/4GB ECC and $349 for the Xeon E3-1225 v5/8 GB non-ECC version, with coupon codes 199T30 and 349T30 (Update: These seem expired now). $199 definitely beats anything I can price together for a Case + Board + CPU + ECC RAM, so I ended up buying one as my new file server.

The Specs

  • Intel Pentium G4400 CPU (2x 3.3 GHz Skylake, 3 MB Cache, 54W TDP, No Hyperthreading)
  • 1x 4 GB DDR4-2133 ECC RAM – there are 4 slots, supporting a maximum of 64 GB. Dual-Channel, so with only 1 DIMM it’s a bit of a waste.
  • 1 TB 7200rpm SATA Hard Disk (mine is a Toshiba DT01ACA100, manufactured October 2016, 512e sectors, 32 MB Cache)
  • Intel C236 Chipset Main Board
  • Front: 2x USB 3.0, 2x USB 2.0, Sound
  • Back: 4x USB 3.0, Sound, HDMI, 2x DisplayPort, PS/2 Mouse and Keyboard, and Gigabit Intel I219-LM Ethernet
  • Slots: 1x PCI Express 3.0 x16, 2x PCI Express 3.0 x4 Slots, and 1 old-school 32-Bit 33 MHz PCI slot
  • Undocumented M.2 2280 SSD Slot – requires PCI Express M.2 SSD, can’t use SATA/AHCI.
  • 290 Watt Power Supply
  • No Remote Management, at least on the Pentium (no DRAC either), although the Xeon CPU would support Intel AMT 11.0
  • No Operating System

I ended up buying a few extra things:

  • Kingston KTD-PE421E/4G – 4 GB DDR4-2133 ECC RAM, intended for the PowerEdge T30 (Kingston’s compatibility list) – $45
  • IO Crest 2 Port SATA III PCI-Express x1 Card (SY-PEX40039) with ASM1061 chipset – to add 2 more SATA Ports. No fancy RAID Controller needed here. – $15
  • SATA Power Splitter Cable – power those two extra SATA drives since I don’t have the official Dell cable. – $5
  • Samsung PM951 128 GB NVMe SSD – after discovering the undocumented M.2 SSD Slot. – $75

So that’s an extra $140 for a $199 server, but I couldn’t have built anything better for $340.

Storage Options

The server supports up to 6 hard drives, with 4x 3.5″ and 2x 2.5″ slots. Instead of 2x 2.5″ drives, a slim laptop optical drive can be installed. However, there are only 4 SATA ports on the mainboard (Intel RST, supports RAID), and by default only power cables for 4 drives as well (2 for the bottom, 2 for the top). That’s why I ended up buying a cheap 2-Port SATA controller and a SATA power splitter cable.

If you want to run all 6 drives, consider getting the official Dell SATA power extension cable.

512-byte sectors

One interesting tidbit is that the hard drive uses 512e sectors, which means that even though the drive internally uses the 4K Advanced Format, the system sees a drive with old-school 512 byte sectors. This can help with compatibility, because some systems (e.g., VMware ESXi/vSphere) don’t support 4K-sector drives.

Since Dell doesn’t specify the exact type of drive you get, there might be other drives out there as well, but I think that 512e sectors make sense for a server that ships without an operating system.

Since I don’t run VMware and my other data drives use 4K sectors, the OS drive being 512e doesn’t matter to me.
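
If you want to check what your own drives report, one way (a quick sketch, assuming Windows 8/Server 2012 or newer and a reference to System.Management) is to query the MSFT_PhysicalDisk WMI class, which exposes both sector sizes:

// Sketch: list logical vs. physical sector sizes via WMI.
// A 512e drive reports LogicalSectorSize = 512 and PhysicalSectorSize = 4096.
using System;
using System.Management; // add a reference to System.Management.dll

var searcher = new ManagementObjectSearcher(
    @"root\Microsoft\Windows\Storage",
    "SELECT FriendlyName, LogicalSectorSize, PhysicalSectorSize FROM MSFT_PhysicalDisk");

foreach (ManagementObject disk in searcher.Get())
{
    Console.WriteLine("{0}: logical = {1} bytes, physical = {2} bytes",
        disk["FriendlyName"], disk["LogicalSectorSize"], disk["PhysicalSectorSize"]);
}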

The M.2 Slot

There is an undocumented M.2 slot on the mainboard. Actually, there are two: one in the rear of the case that’s not actually provisioned (no slot on the board, but the solder points are there) and one M.2 slot in the bottom right, under the RAM slots, that is actually provisioned and working – at least until BIOS 1.0.2; BIOS 1.0.3 (and 1.0.5 as well) disable it.

I get 900 MB/s reads from the 128 GB Samsung PM951, and it’s possible to boot off it – at least on BIOS 1.0.2, which is what I’m running. I reached out to Dell to ask them to reconsider disabling the slot – it’s undocumented, but it works. I don’t know if the comparatively low speed is because the PM951 is slow or because the M.2 slot is only connected to 2 PCIe lanes, but I’d take a 2-lane M.2 slot over no slot at all.

Drive Setup and Backup Strategy

  • 1 TB Toshiba DT01ACA100 – Intel RST – OS Drive, Windows Server 2012 R2 Standard
  • 2x 1 TB 2.5″ WD Red WD10JFCX – RAID 1 on Intel RST – Important Data like documents, photos and emails – stuff I can’t get back
  • 4 TB WD Red WD40EFRX – Intel RST – Big Stuff like Videos, Installers and just a giant grabbag of files that I can get back or afford to lose
  • 3 TB WD Red WD30EFRX – ASM1061 – Backups from my local machines
  • 128 GB Samsung PM951 – M.2 Slot – Not in use yet; I didn’t want to waste it on the OS, but I’ll keep it around in case I need super-fast storage (e.g., upgrade to a 10 GBit network and use it as an iSCSI target)

Backup strategy

I haven’t set it all up yet, but the idea is to have a multi-stage backup. First off, local machines on my network just store files on the server or regularly back up onto the 3 TB drive using Windows Backup. For the data on the server, I use Backup4All Pro to create an AES-encrypted backup on an external drive. For the longest time, this used to be an Apple Time Capsule, but that thing only supports SMB1, which is horribly broken and insecure, so that’s no longer an option. Since I still have my old server (it’s becoming a service box, since I want to keep this file server as clean as possible), I think I’m going to move the drive from the Time Capsule into it.

That gives me several levels of protection locally, but I’m also planning to archive to Amazon Glacier. Backup4All supports backing up to Amazon S3, and it’s possible to have S3 archive to Glacier, which makes this a good option. If you set the “Transition to Glacier” value to 0, it immediately goes to Glacier and no S3 storage fees apply – only transfer fees.
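
The same “transition after 0 days” rule can also be set up programmatically; here’s a rough sketch using the AWS SDK for .net (the AWSSDK.S3 package) – the bucket name is made up, and the exact property names may differ between SDK versions:

// Sketch: lifecycle rule that transitions all objects in a bucket to Glacier
// after 0 days. Assumes the AWSSDK.S3 NuGet package; names may vary by version.
using System.Collections.Generic;
using Amazon.S3;
using Amazon.S3.Model;

var client = new AmazonS3Client();
client.PutLifecycleConfiguration(new PutLifecycleConfigurationRequest
{
    BucketName = "my-backup-bucket", // hypothetical bucket name
    Configuration = new LifecycleConfiguration
    {
        Rules = new List<LifecycleRule>
        {
            new LifecycleRule
            {
                Id = "archive-to-glacier",
                Status = LifecycleRuleStatus.Enabled,
                Transitions = new List<LifecycleTransition>
                {
                    // Days = 0: objects move to Glacier right away, so there are no
                    // S3 Standard storage fees – only Glacier storage and request/transfer fees
                    new LifecycleTransition { Days = 0, StorageClass = S3StorageClass.Glacier }
                }
            }
        }
    }
});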

Once the entire Backup system is up and running, I’ll create a separate post.

Future Upgrade-ability

Now, there are some things to be aware of. First off, the 290W power supply has exactly two power cables, which both go into the mainboard. From the mainboard, there are exactly 4 SATA power connectors (unless you order the SATA power extension for an extra $20), which is just enough for the 4 SATA ports on the mainboard.

Extensibility is really limited: if you were thinking of adding a powerful GPU, you won’t have the 6-pin PCIe power connector to drive it. If you want to add more hard drives (the case holds 4x 3.5″ and 2x 2.5″ drives), you’ll need either Dell’s SATA power extension (which they don’t seem to sell separately) or a third-party power splitter that hopefully doesn’t catch fire.

The C236 chipset supports Kaby Lake, so it should be possible to upgrade to, e.g., a Pentium G4600 and get Hyper-Threading, or to a Skylake- or Kaby Lake-based Xeon E3, assuming Dell doesn’t have any weird BIOS limits in place.

Memory-wise, there are 4 DIMM slots in 2 channels, which currently means up to 64 GB of DDR4 RAM.

Verdict

It’s an entry-level server. It was reasonably priced, has ECC RAM, is really quiet, and does what I need. I like it.

Simplexcel 2.0.0

A couple of years ago, I created a simple .net Library to create Excel .xlsx sheets without the need to do any COM Interop or similar nonsense, so that it can be used on a web server.

I just pushed a major new version, Simplexcel 2.0.0, to NuGet. It now targets both .net Framework 4.5+ and .NET Standard 1.3+, which means it can also be used in cross-platform applications and in ASP.net Core.

There are a few breaking changes, most notably the new Simplexcel.Color struct that is used instead of System.Drawing.Color, and the change of CompressionLevel from an enum to a bool, but in general this should be a very simple upgrade. If you still need to target .net Framework 4 instead of 4.5+, stay on version 1.0.5.
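
For reference, basic usage looks roughly like this (a minimal sketch along the lines of the README – see the GitHub repository for the authoritative documentation):

// Minimal sketch: create a workbook with one sheet and save it as .xlsx.
using Simplexcel;

var sheet = new Worksheet("Hello World");
sheet.Cells[0, 0] = "Hello,";     // row 0, column 0 = A1
sheet.Cells["B1"] = "World!";     // cell reference syntax works as well

var workbook = new Workbook();
workbook.Add(sheet);
workbook.Save(@"C:\temp\hello.xlsx");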

RAM and CPU Cycles are there to be used.

What is wrong with this picture?

Answer: The white area at the top of the CPU and Memory graphs. It indicates that I spent money on something that I’m not using.

One of the complaints that is often brought forward is that certain applications (especially browsers) are “memory hogs”. As I’m writing this, Chrome uses 238.1 MB of RAM, and a separate Opera uses 129.8 MB. Oh my, remember when 4 MB was enough to run an entire operating system?

Now, here’s the thing about RAM and CPU cycles: I spend my (or someone else’s) hard-earned cash on them in order to speed up my computer use. That’s literally why they exist – to make stuff go faster, always.

My 32 GB of RAM cost about $200. In the above screenshot, about $135 of those hard-earned dollars are just doing nothing. It’s like hiring (and paying) an employee full-time and only giving them 3 hours of work every day. That CPU? It’s quad-core, eight-thread, with 2.6 billion cycles per second per core – that’s between 10 and 20 billion total cycles each second. And yet, it’s sitting at 16% in that screenshot. For a CPU that’s priced at $378, that’s $317 not being used.

There are priorities and trade-offs that everyone needs to make. In a Laptop, maybe you don’t want the CPU to be constantly close to 100%, because it drains the battery and the whirring cooling fan is annoying. But maybe you bought such a high end laptop specifically because you want to use that much CPU power and RAM. I certainly did.

My primary application is Visual Studio, which has been making some really good strides in 2017 to be faster. Find All References is pretty fast, Go To Types is pretty fast. Only “Find in Files” could be faster, because it still seems to hit the disk. The cost for that? Currently 520 MB of RAM usage. I’ll take that. In fact, if I could get more speed at the expense of more RAM, I’d take that as well. I would also love for Visual Studio to find a way to reduce the 45-second build time – as you see in the graph, the CPU only briefly spikes. Why is it not at a constant 100% when I click the Build button? Is there a way to just keep everything needed to compile in RAM all the time? (And yes, I know about RAM disks, and I do have an SSD that does 3 GB/s – but the point is for applications to be more greedy.)

Two applications that I run that really make great use of my computer are SQL Server and Hyper-V. SQL Server is currently sitting at 3.5 GB and will grow to whatever it needs, and Hyper-V will use whatever it needs as well. Both applications also respect my limits if I set them.

But they don’t prematurely limit themselves. Some people are complaining about Spotify’s memory usage. Is that too much for a media player? Depends. I’m not using Spotify, but I use iTunes. Sometimes I just want to play a specific song or album, or just browse an artist to find something I’m in the mood for. Have you ever had an application where you scroll a long list and halfway through it lags because it has to load more data? Or where you search/filter and it takes a while to display the results? I find that infuriating. My music library is only about 16,000 tracks – can I please trade some RAM to make the times that I do interact with it as quick as possible? YMMV, but for me, spending 500 MB on a background app for it to be super-responsive every time I interact with it would be a great tradeoff. Maybe for you it’s different, but for me, iTunes does stay fast at the expense of my computer’s resources.

Yeah, some apps may take that too far, or misbehave, e.g., by trashing your SSD. Some apps use too much RAM because they’re coded inefficiently, or because there is an actual bug. It should always be a goal to reduce resource usage as much as possible.

But that should just be one of the goals. Another goal should be to maximize performance and productivity. And when your application sees a machine with 8, 16 or even 32 GB RAM, it should maybe ask itself if it should just use some of that for productivity reasons. I’d certainly be willing to trade some of that white space in my task manager for productivity. And when I do need it for Hyper-V or SQL Server, then other apps can start treating RAM like it’s some sort of scarce resource. Or when I want to be in battery-saver mode, prioritizing 8 hours of slow work over 4 hours of fast work.

But right now, giving a couple hundred megabytes to my web browsers and productivity apps is a great investment.

Dell TB16 Thunderbolt Dock

It’s been a while since I blogged about my Dell XPS 15 (9550), and one of the remarks there was that the Thunderbolt port was useless because there was no working dock for it – Dell’s TB15 was a fiasco. (One of two issues with the XPS 15 9550 – the other being that the Wireless 1830 card is a piece of horse dung.)

Fast forward a few months and Dell has fixed it by releasing a new Dock, the TB16. I ordered the 240 Watt Version, Dell Part 452-BCNU.

The first observation is that the 240 Watt power supply is absolutely massive, but relatively flat. Not an issue since it’s stationary anyway, but if you were considering carrying it around, think again. Also, one important note on the difference between USB-C and Thunderbolt 3: they both use the same connector, but Thunderbolt 3 is four times as fast as USB 3.1 Gen2 – 40 GBit/s compared to 10 GBit/s. This matters when driving two high-resolution (4K @ 60 Hz) monitors and a bunch of other peripherals. There are plenty of USB-C docks out there, but not really many Thunderbolt 3 docks, which is why it’s so important that the TB16 finally arrived.

Update: On the Dell XPS 15 laptops (9550 and 9560), the Thunderbolt 3 port is only connected via two instead of four PCI Express 3.0 lanes, thus limiting the maximum bandwidth to 20 GBit/s. If you were thinking of attaching a potent graphics card (e.g., via a Razer Core) or a fast storage system, that port will be a bottleneck. It does seem to be usable for 8K video editing though, so YMMV.

The promise of Thunderbolt is simple: One single cable from the Notebook to the Dock for absolutely everything. And that’s exactly what it delivers. Let’s look at the ports.

In the front, there are two USB 3.0 Superspeed ports, and a Headset jack that supports TRRS connections (Stereo Headset and Microphone), along with an indicator light if the laptop is charging.

In the back, we have VGA, HDMI 1.4a, and Mini and regular DisplayPort 1.2 for up to 4 external monitors; a Gigabit Ethernet port (technically a USB Ethernet card); two regular USB 2.0 ports; one USB 3.0 Superspeed port; and a Thunderbolt 3 port for daisy chaining, also usable as a regular USB-C port. And last but not least, a regular stereo audio jack for speakers and the power input for the 240 Watt power supply.

Update: I played around with the monitor ports, and it seems that my XPS 15 9550 can only drive 3 monitors at any given time, so either the internal laptop screen + 2 DP monitors, or 2 DP + 1 HDMI, or 2 DP + 1 VGA monitor. The User’s Guide says that 3- or 4-monitor configurations should be possible (4 only without the laptop display). I haven’t tried too much though; it seems that 3 monitors would require going down to 30 Hz, which is obviously not an option in a real-world scenario. Maybe it’s possible to do, but for my use case, 2 DP + the internal laptop screen is good enough. For anything more, some USB DisplayLink graphics adapter will do the job.

My setup uses a lot of these ports: Two Monitors (2560×1440 @ 60 Hz resolution) via DisplayPort and mini DisplayPort, Gigabit Ethernet, USB 3.0 to a Hub, both Audio Outputs, and occasionally an external USB 3.0 Hard Drive.

It all connects to the laptop with a single cable, just as promised. And that includes charging, so no need for the laptop power supply.

Performance-wise, it’s pretty satisfactory. The Ethernet port gets about 70 MB/s (Update: I’ve since seen higher speeds, up to 100 MB/s, so I guess my home server is the bottleneck), and I can make full use of my 300 MBit/s internet connection (humblebrag), something I couldn’t do over wireless. My external hard drive gets 110 MB/s – since it’s a 2.5″ rotary drive, that’s probably the limit of the drive itself. I haven’t tried two 4K monitors, but my 2560×1440 screens show no flickering or other issues. Copying directly from a network share to the USB 3.0 drive also works fine at the same speed (~70 MB/s) without disturbing the monitors.



Make sure to go to Dell’s support site and download the latest BIOS, Intel Thunderbolt Controller Driver and Intel Thunderbolt 3 Firmware Update, along with the USB Ethernet, USB Audio and ASMedia USB 3.0 Controller drivers. It’s absolutely necessary to do this, since older firmware and driver versions have a lot of issues. FWIW, I’m running 64-Bit Windows 8.1 and everything works perfectly fine – I haven’t tested on Windows 10, but assume it’ll be fine there as well.

It took way too long and required a lot of patches – but now that it’s here and working, I can say that the TB16 is everything Thunderbolt promised. It’s expensive, but if Thunderbolt stops changing connectors with every version, I can see the dock lasting a long time, on many future laptops.

git rebase on pull, and squash on merge

Here are a few git settings that I prefer to set, now that I actually work on projects with other people and care about a useful history.

git config --global branch.master.mergeoptions "--squash"
This always squashes merges into master. I think that work big enough to require multiple commits should be done in a branch and then squash-merged into master. That way, master becomes a great overview of individual features rather than the nitty-gritty back and forth of each feature.

git config --global pull.rebase true
This always rebases your local commits or, in plain English: it always puts your local, unpushed commits on top of the history when you pull changes from the server. I find this useful because if I’m working on something for a while, I can regularly pull in other people’s changes without fracturing my history. Yes, this is history rewriting, but I care more about a useful history than a “pure” one.

Combined with git repository hosting (GitLab, GitHub Enterprise, etc.), I’ve found that browsing the history is a really useful tool to keep up with code changes (especially across timezones), provided that the history is actually useful.

IIS/ASP.net Troubleshooting Cheatsheet

Setting up IIS usually results in some error (403, 500, …) at first. Since I run into this a lot and always forget to write down the steps, here’s my cheatsheet, which I’ll update whenever I run into additional issues.

Folder Permissions

  • Give IIS_IUSRS Read permission to the folder containing your code.
  • If the StaticFile handler throws a 403 when accessing static files, also give IUSR Read permission
  • Make sure files aren’t encrypted (Properties => Advanced)

If your code modifies files in that folder (e.g., using the default identity database or logging, etc.), you might need write permissions as well.

Registering ASP.net with IIS

If you installed the .net Framework after IIS, you need to run one of these:

  • Windows 7/8/2012: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\aspnet_regiis.exe -i
  • Windows 10: dism /online /enable-feature /all /featurename:IIS-ASPNET45

Can we build a better console.log?

For a lot of Front End JavaScript work, our browser has become the de-facto IDE thanks to powerful built-in tools in Firefox, Chrome or Edge. However, one area that has seen little improvement over the years is the console.log function.

Nowadays, I might have 8 or 9 different modules in my webpage that all output debug logs when running non-minified versions, to help with debugging. Additionally, the console is also my REPL for some ad-hoc coding. This results in an avalanche of messages that are hard to differentiate. There are a few ways to make things stand out: console.info, console.warn and console.error are highlighted in different ways.

Additionally, there’s console.group/.groupEnd, but that requires you to wrap all calls between calls to .group(name) and .groupEnd().

The problem is that there is no concept of scope inherent to the logger. For example, it would be useful to create either a scoped logger or pass in a scope name:

console.logScoped("Foo", "My Log Message");
console.warnScoped("Foo", "A Warning, oh noes!");

var scoped = console.createScope("Foo");
scoped.log("My Log Message");
scoped.warn("A Warning, oh noes!");

This would allow browsers to create different sinks for their logs – for example, I could have a different tab for each scope.
From there, we can even think about stuff like “Write all log messages with this scope to a file”: I could browse the site, test out functionality and then check the logfile afterwards.

Of course, there are a few issues to solve (Do we support child scopes? Would we expose sinks/output redirection via some sort of API?), but I think that it’s time to have a look at how to turn the console.log mechanism from printf-style debugging into something a lot closer to a dev logging facility.

Thoughts on ORMs, 2016 Edition

This is a bit of a follow-up to my 3-year-old post about LINQ 2 SQL and my even older 2011 Thoughts on ORM post. I’ve changed my position several times over the last three years as I learned about issues with different approaches. TL;DR: for reads, I prefer handwritten SQL, with Dapper as a mapper on .net so I don’t have to deal with DataReaders. For inserts and updates, I’m a bit torn.

I still think that L2S is a vastly better ORM than Entity Framework if you’re OK with solely targeting MS SQL Server, but once you go into even slightly more complex scenarios, you’ll run into issues like the dreaded SELECT N+1 far too easily. This is especially true if you pass around entities through layers of code, because now virtually any part of your code (incl. your View Models or serialization infrastructure) might make a ton of additional SQL calls.

The main problem here isn’t so much detecting the issue (Tools like MiniProfiler or L2S Prof make that easy) – it’s that fixing the issue can result in a massive code refactor. You’d have to break up your EntitySets and potentially create new business objects, which then require a bunch of further refactorings.

My strategy has been this for the past two years:

  1. All Database Code lives in a Repository Class
  2. The Repository exposes Business-objects
  3. After the Repository returns a value, no further DB Queries happen without the repository being called again

All Database Code lives in a Repository Class

I have one or more classes that end in Repository and it’s these classes that implement the database logic. If I need to call the database from my MVC Controller, I need to call a repository. That way, any database queries live in one place, and I can optimize the heck out of calls as long as the inputs and outputs stay the same. I can also add a caching layer right there.

The Repository exposes Business-objects

If I have a specialized business type (say AccountWithBalances), then that’s what the repository exposes. I can write a complex SQL Query that joins a bunch of tables (to get the account balances) and optimize it as much as I want. There are scenarios where I might have multiple repositories (e.g., AccountRepository and TransactionsRepository), in which case I need to make a judgement call: Should I add a “Cross-Entity” method in one of the repositories, or should I go one level higher into a service-layer (which could be the MVC Controller) to orchestrate?

After the Repository returns a value, no further DB Queries happen without the repository being called again

But regardless of how I decide where the AccountWithBalances getter method should live, the one thing I’m not going to do is expose a hot database object. Sure, it sounds convenient to get an Account and then just do var balance = acc.Transactions.Sum(t => t.Amount); but that will lead to 1am debugging sessions, because your application breaks down once more than two people hit it at the same time.

Otherwise, long-term maintainability suffers greatly. The ORM way seems more productive (and it is, for simple apps), but once you get a grip on T-SQL you’re writing your SQL queries anyway, and now you don’t have to worry about inefficient SQL because you can look at the query execution plan and tweak. You’re also not blocked from using optimized features like MERGE, hierarchyid or windowing functions. It’s like going from MS SQL Server Lite to the real thing.

So, raw ADO.net? Nah. The problem there is that after you’ve written an awesome query, you have to deal with SqlDataReaders, which is not pleasant. Now, LINQ 2 SQL would map the results to business objects for you, but it’s slow – and I mean prohibitively so. I’ve had an app that queried the database in 200ms, but then took 18 seconds to map the results to .net objects. That’s why I started using Dapper (disclaimer: I work for Stack Overflow, but I used Dapper before I joined). It doesn’t generate SQL, but it handles parameters for me and it does the mapping – pretty fast, actually.
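
To make that concrete, here’s a rough sketch of what such a repository method looks like with Dapper – the table and column names are made up for the example:

// Sketch of a repository that exposes a business object via handwritten SQL + Dapper.
// Table and column names are illustrative only.
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

public class AccountWithBalances
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
}

public class AccountRepository
{
    private readonly string _connectionString;

    public AccountRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IEnumerable<AccountWithBalances> GetAccountsWithBalances()
    {
        const string sql = @"
SELECT a.Id, a.Name, COALESCE(SUM(t.Amount), 0) AS Balance
FROM dbo.Accounts a
LEFT JOIN dbo.Transactions t ON t.AccountId = a.Id
GROUP BY a.Id, a.Name";

        using (var conn = new SqlConnection(_connectionString))
        {
            // Dapper maps each row onto AccountWithBalances by column name
            return conn.Query<AccountWithBalances>(sql);
        }
    }
}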

If you know T-SQL, this is the way I’d recommend going because the long-term maintainability is worth it. And if you don’t know T-SQL, I recommend taking a week or so to learn the basics, because long-term it’s in your best interest to know SQL.

But what about Insert, Update and Delete?

Another thing that requires you to use raw SQL is deletes. E.g., “delete all accounts that haven’t visited the site in 30 days” can be expressed in a single SQL statement, but with an ORM you can fall into the trap of first fetching all those rows and then deleting them one by one, which is just nasty.

But where it gets really complicated is when it comes to foreign key relationships. This is actually a discussion we had internally at Stack Overflow, with good points on either side. Let’s say you have a Bank Account, and you add a new Transaction that also has a new Payee (the person charging you money, e.g., your Mortgage company). The schema would be designed to have a Payees table (Id, Name) and a Transactions table (Id, Date, PayeeId, Amount) with a Foreign Key between Transactions.PayeeId and Payees.Id. In our example, we would have to insert the new Payee first, get their Id and then create a Transaction with that Id. In SQL, I would probably write a Sproc for this so that I can use a Merge statement:

CREATE PROCEDURE dbo.InsertTransaction
	@PayeeName nvarchar(100),
	@Amount decimal(19,4)
AS
BEGIN
    SET NOCOUNT ON;

    -- Table variable to capture the Payee Id from the MERGE's OUTPUT clause
    DECLARE @payeeIdTable TABLE (Id INTEGER)
    DECLARE @payeeId INT = NULL

    -- Upsert the Payee: insert it if it doesn't exist yet,
    -- otherwise grab the existing row's Id
    MERGE dbo.Payees AS TGT
        USING (VALUES (@PayeeName)) AS SRC (Name)
        ON (TGT.Name = SRC.Name)
    WHEN NOT MATCHED BY TARGET THEN
        INSERT(Name) VALUES (SRC.Name)
    WHEN MATCHED THEN
        UPDATE SET @payeeId = TGT.Id
    OUTPUT Inserted.Id INTO @payeeIdTable
    ;

    -- If the Payee was newly inserted, its Id comes from the OUTPUT clause
    IF @payeeId IS NULL
        SET @payeeId = (SELECT TOP 1 Id FROM @payeeIdTable)

    -- Now insert the Transaction referencing that Payee
    INSERT INTO dbo.Transactions (Date, PayeeId, Amount)
    VALUES (GETDATE(), @payeeId, @Amount)
END
GO
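
Calling a sproc like this from .net stays simple as well – here’s a quick sketch with Dapper (connectionString is assumed to be defined elsewhere):

// Sketch: calling the InsertTransaction sproc above via Dapper.
using System.Data;
using System.Data.SqlClient;
using Dapper;

using (var conn = new SqlConnection(connectionString)) // connectionString assumed to exist
{
    conn.Execute(
        "dbo.InsertTransaction",
        new { PayeeName = "My Mortgage Company", Amount = 1234.56m },
        commandType: CommandType.StoredProcedure);
}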

The problem is that once you have several foreign key relationships, possibly even nested ones, this can quickly become really complex and hard to get right. In that case, there might be a good reason to use an ORM, because its built-in object tracking makes this a lot simpler. But this is the perspective of someone who never had performance problems doing INSERTs or UPDATEs, but plenty of problems doing SELECTs. If your application is INSERT-heavy, hand-rolled SQL is the way to go – if only because all those dynamic SQL queries generated by an ORM don’t play well with SQL Server’s query plan cache (which is another reason to consider stored procedures for write-heavy applications).

To conclude: there are good reasons to use ORMs for developer productivity (especially if you don’t know T-SQL well), but for me, I’ll never touch an ORM for SELECTs again if I can avoid it.