Lazy-loading in an ORM considered harmful?

One of the features that looks insanely cool in many ORMs is lazy loading of properties. Say you have an "Order" table that stores a customer's orders. Inside your Order business object, you have a list of line items, which is linked to the Products table, and the customer record, which is linked to the Customer table.

If you want to display a list of orders, you don’t touch the line items, so only entries from the Orders table are fetched. But on the details page of the order, you access the line items, so the products are fetched as well. The cool thing: your repository only needs a generic Get/GetAll method to cover both cases.

How awesome is this? Lazy loading is amazing; it makes the code so much shorter! Except… it’s a ticking time bomb. If you return a live database object from the repository into your services and views, someone at some point will think "Hey, we should display the order total as well!" and put an order.LineItems.Sum(li => li.Total) into the list view. Well, congratulations, you just introduced a SELECT N+1 into your code: one query for the list of orders, plus one more query per order to lazy-load its line items.
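The ORM in this post is .NET, but the query pattern is easy to reproduce. The sketch below uses the sqlite3 command-line tool to mimic what a lazy-loading ORM sends to the database when a list view sums each order's line items; the orders/line_items schema and the sample rows are hypothetical, and each sqlite3 call stands in for one database round trip.

```shell
#!/bin/sh
# Sketch: mimic the queries a lazy-loading ORM issues when a list view sums
# each order's line items. Uses the sqlite3 CLI; the schema is hypothetical.
set -e
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE line_items (order_id INTEGER, total INTEGER);
INSERT INTO orders (id) VALUES (1), (2), (3);
INSERT INTO line_items VALUES (1, 10), (1, 5), (2, 7), (3, 1), (3, 2);
SQL

queries=0
# The "1": load the order list.
ids=$(sqlite3 "$db" 'SELECT id FROM orders ORDER BY id;')
queries=$((queries + 1))

# The "N": each access to order.LineItems triggers its own query.
for id in $ids; do
  total=$(sqlite3 "$db" "SELECT SUM(total) FROM line_items WHERE order_id = $id;")
  queries=$((queries + 1))
  echo "order $id: total $total"
done

echo "queries issued: $queries"   # 1 + N, where N is the number of orders
rm -f "$db"
```

With three orders that is four queries; with a thousand orders on the page, it would be 1,001.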

If you are not careful and just chain lazy collections (e.g., a Product may have a List that contains all orders the product is part of), you may end up fetching the full database. Several times over. For every line in your list. Of course, you can tell your front-end developers to be careful, and of course you should have monitoring (like MVC Mini Profiler) from the very start so that you can catch it early. However, I prefer the "pit of success" philosophy: the safe behavior should be the default, and you should have to deliberately climb out of it.

I’m no longer using generic repositories with generic Get/GetAll methods. Instead, I have a specific OrderRepository with a method IList<Order> GetOrderDetails(IEnumerable<int> orderIds) that populates only the fields it needs and either throws an exception when you access something it didn’t load (like Products) or returns a special type like OrderDetails. A separate GetOrder method returns the full object. Note that I’m returning an IList, not an IQueryable or IEnumerable: by the time it leaves the repository, it is fully populated and disconnected from the database.
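In the same sqlite3 setting, a specific repository method in the spirit of GetOrderDetails(IEnumerable<int> orderIds) could take every ID up front and answer them in a single query. The function name get_order_totals, the schema, and the space-separated output are assumptions for this sketch; capturing the output into a variable plays the role of returning a fully populated IList rather than a lazy IQueryable.

```shell
#!/bin/sh
# Sketch: a hypothetical "specific repository method" that answers a batch of
# order IDs with one query and hands back plain, already-materialized rows.
set -e
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE line_items (order_id INTEGER, total INTEGER);
INSERT INTO line_items VALUES (1, 10), (1, 5), (2, 7), (3, 1), (3, 2);
SQL

# get_order_totals ID... : one round trip no matter how many IDs are passed.
# IDs are assumed to be integers; a real repository would bind parameters
# instead of splicing values into the SQL text.
get_order_totals() {
  id_list=$(printf '%s,' "$@")   # "1,2,3,"
  id_list=${id_list%,}           # drop the trailing comma: "1,2,3"
  sqlite3 -separator ' ' "$db" \
    "SELECT order_id, SUM(total) FROM line_items
     WHERE order_id IN ($id_list) GROUP BY order_id ORDER BY order_id;"
}

# Fully fetched before any caller touches it; no live connection leaks out.
result=$(get_order_totals 1 2 3)
echo "$result"
rm -f "$db"
```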

Sure, sometimes you end up with a Service method that is literally just a call to the repository, but you still keep business logic in the service layer and persistence logic in the repository layer.

Jekyll and GitHub

A while ago I wanted to set up a second blog for some fiction. I could have set up another blog on WordPress or hosted one myself, but I didn’t want all those dependencies. Instead, I went with Jekyll, which generates static HTML files from a directory structure. So no database, no heavy server backend, just something that serves static files.

When it came to hosting, I noticed that GitHub now offers Pages, which is pretty amazing since they even support custom domains through a CNAME file, which means I can take my domain and just create an A record in DNS that points at GitHub. Unfortunately, the documentation on GitHub isn’t too great, so here’s the recap:

  • If you don’t use their page builder tool but prefer manual pages, create a new branch called gh-pages in your git repository
  • Push your Jekyll hierarchy to this branch and GitHub builds it for you automatically. The first time, it may take a while to show up: GitHub says 10 minutes, but it took an hour for me. After the initial push is up, though, further pushes are published instantly
  • Even if your repository is private, published pages will be public
  • The downside of pushing Jekyll sources is that GitHub doesn’t run plugins
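Creating the gh-pages branch by hand can be sketched in a throwaway repository. `git checkout --orphan` starts a branch with no history, which suits a branch that only ever holds the site. The temp directory, file contents, and the Demo commit identity are placeholders; in a real clone you would follow up with `git push origin gh-pages`.

```shell
#!/bin/sh
# Sketch: add an empty gh-pages branch next to master in a throwaway repo.
# The temp directory and the Demo commit identity are placeholders.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
echo "site source" > README.md
git add README.md
git commit -q -m "source"

# --orphan starts a branch with no parent commit; master's files are still in
# the index and working tree, so remove them before the first commit.
git checkout -q --orphan gh-pages
git rm -rf -q .
echo "<h1>Hello</h1>" > index.html
git add index.html
git commit -q -m "first page"
```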

The last point was an issue for me: I have plugins to build Tag/Category pages and to minify CSS, and GitHub won’t run them. Okay, no biggie, I’ll just build the site locally. Just check out the gh-pages branch, run Jekyll into it, and commit the changes, right?

Well, Jekyll cleans out the target directory when it builds. That actually makes sense, since we don’t want any old garbage in there, but unfortunately it also deletes the .git folder.

After searching a bit, I found that git has an amazing feature: you can pass --work-tree to tell git that the working directory is in a different folder than the .git folder. I don’t know why they put in that feature, as it seems so niche, but I would like to express my sincere gratitude; it saved me here.
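A minimal demonstration of --work-tree, assuming a throwaway directory in place of ~/MySite: the repository lives in ./.git, the files come from ./_site, and wiping and regenerating _site never touches the repository. Paths are recorded relative to the work tree, so the commit contains index.html at its root rather than _site/index.html, which is the shape a gh-pages branch needs.

```shell
#!/bin/sh
# Sketch: keep .git in one folder and commit files from another via
# --work-tree. "$site" plays ~/MySite and "$site/_site" the Jekyll output;
# both are placeholders for this demo.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
site=$(mktemp -d)
cd "$site"
git init -q                      # the repository lives in $site/.git

# A build that wipes and regenerates _site cannot touch $site/.git.
rm -rf _site && mkdir _site
echo "<h1>Built</h1>" > _site/index.html

# Same repository, but the working directory is ./_site.
git --work-tree=./_site add --all
git --work-tree=./_site commit -q -m "built site"
git ls-tree --name-only HEAD     # lists index.html, not _site/index.html
```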

I structured my git repository like this:

  • master branch contains the Jekyll site and is checked out to ~/MySiteSource
  • gh-pages branch starts out empty and contains the built site. It is checked out to ~/MySite/_site, but I moved the .git folder to ~/MySite
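One way to arrive at this layout, sketched with a local bare repository standing in for GitHub and a temp directory standing in for ~. The _config.yml line relies on Jekyll's `destination` setting to point the build at ../MySite/_site; Jekyll itself is not run here. The empty gh-pages commit is made with plumbing (`git hash-object`, `git commit-tree`) instead of an orphan checkout, only to keep the sketch short.

```shell
#!/bin/sh
# Sketch of the two-checkout layout. A local bare repository stands in for
# GitHub, and $base stands in for ~ ($base/MySiteSource, $base/MySite).
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
base=$(mktemp -d)
git init -q --bare "$base/origin.git"

# master: the Jekyll source, checked out to MySiteSource.
git clone -q "$base/origin.git" "$base/MySiteSource"
cd "$base/MySiteSource"
git symbolic-ref HEAD refs/heads/master
printf 'destination: ../MySite/_site\n' > _config.yml   # Jekyll's output folder
git add _config.yml
git commit -q -m "site source"
git push -q origin master

# gh-pages: starts out empty, i.e. one commit whose tree has no files.
empty_tree=$(git hash-object -w -t tree /dev/null)
empty_commit=$(git commit-tree "$empty_tree" -m "empty gh-pages")
git push -q origin "$empty_commit:refs/heads/gh-pages"

# Check gh-pages out to MySite/_site, then move its .git folder up one level
# so that a Jekyll build can wipe _site without touching the repository.
mkdir -p "$base/MySite"
git clone -q -b gh-pages "$base/origin.git" "$base/MySite/_site"
mv "$base/MySite/_site/.git" "$base/MySite/.git"

cd "$base/MySite"
git rev-parse --abbrev-ref HEAD                 # the branch checked out here
git --work-tree=./_site status --short          # empty: _site matches gh-pages
```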

Inside the ~/MySite directory (which is NOT part of the git repository, but contains the .git folder for gh-pages), I have a little shell script (or batch file if you’re on Windows):

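#!/bin/sh
set -e                                # stop at the first failing step
cd ../MySiteSource
jekyll --no-auto                      # pre-1.0 Jekyll syntax; Jekyll 1.0+ uses: jekyll build
cd ../MySite
git --work-tree=./_site add --all     # --all also stages pages Jekyll no longer generates
git --work-tree=./_site commit -m "built site"
git --work-tree=./_site push origin gh-pages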

Simple yet effective. So we go into MySiteSource, tell Jekyll to build the site (_config.yml sets the destination to ../MySite/_site), then go back to MySite and use the --work-tree option to commit and push the gh-pages branch.