Is the SharePoint Object Model too weak for excellent Applications?

I've been doing SharePoint development for about 3 years now, starting with SharePoint 2007 and moving to 2010 in November when the Beta was released. While I can't say that I'm in love with the development experience, I do think it's a very capable product for the users. Over the years, I've learned many of the quirks and tricks of SharePoint, and despite its many little issues, I liked SharePoint 2007 development.

SharePoint 2010 added a ton of new Features, including a separation into Service Applications (replacing the SSP) and many new Social Features like tagging and commenting. Also, the Development tools radically improved. However, I think that the Object Model didn't scale well over the years. I think it's downright broken in SharePoint 2010 and that the next version of SharePoint needs a completely new Object Model with properly separated APIs/Modules.

Why? Here are some of the big and small issues I've encountered while I wanted to do some really simple things.

SPMetal doesn't generate all field Types
Create a new List in SharePoint, then either enable the Managed Metadata and Enterprise Keywords option on the List or add a new Managed Metadata column to it. Run SPMetal against the list and notice that the generated proxy doesn't have the Keyword field.

It's bad enough that the generated proxy is unnecessarily fragile, but not supporting all of the *built-in* field types makes it unusable for all but the simplest queries. Granted, Managed Keywords is a separate Feature of Standard/Enterprise, but it's still an official Microsoft out-of-the-box feature.

Querying Managed Keywords through CAML is only possible by Name, not by ID
Let's say you want to query the List and get all items that have two different keywords. As Keywords can have the same name (if they are in different paths of the Term Set), I thought it would be best to query with the Guid of the Term.

Well, it turns out you can't. You can query by their WssId though (an int that seems to be assigned uniquely per Site Collection) by adding the LookupId="TRUE" attribute. However, try to chain two conditions together with an <And>, or add the value twice. What happens? You get all items that have either Keyword. It's an Or, not an And.

The only way is to Query by the Title and hope you never have duplicate titles or that you can enable the Full Path option on the Field. Now, for the standard Keywords this isn't that much of a problem as they don't have a hierarchy and therefore no duplicates, but you aren't always that lucky.

This is a deeper issue though: it's a problem with the LookupMulti field, from which the Multi-Taxonomy Field inherits.
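
For reference, here is a minimal sketch of the kind of query described above: two LookupId conditions chained with <And>. It only builds the CAML string, so it runs without a SharePoint farm; in real code the result would be assigned to SPQuery.Query. The class and method names are made up for illustration, and the field name "TaxKeyword", the Value Type="Lookup" and the ids 12 and 34 are assumptions, not values taken from a real site.

```csharp
using System;
using System.Xml.Linq;

public static class KeywordQuery
{
    // Builds one <Eq> clause that matches a lookup-style field by its integer id
    // (the WssId) instead of by its display text.
    static XElement EqByLookupId(string fieldName, int wssId)
    {
        return new XElement("Eq",
            new XElement("FieldRef",
                new XAttribute("Name", fieldName),
                new XAttribute("LookupId", "TRUE")),
            // Type="Lookup" is an assumption; adjust to whatever your field accepts.
            new XElement("Value", new XAttribute("Type", "Lookup"), wssId));
    }

    // Chains two id conditions on the same field with <And>.
    public static string BuildAndQuery(string fieldName, int firstId, int secondId)
    {
        var where = new XElement("Where",
            new XElement("And",
                EqByLookupId(fieldName, firstId),
                EqByLookupId(fieldName, secondId)));
        return where.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling KeywordQuery.BuildAndQuery("TaxKeyword", 12, 34) yields a well-formed <Where> element. The point of the section stands regardless of how the XML is produced: against a multi-value field, this query did not behave like an And for me.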

No way to query the User Profile store effectively
This is something that is easy in pretty much every CMS on the market: Give me a distinct list of departments in the company. If the user profile store has a field that holds the department, it's literally a SELECT DISTINCT(Department) from USERS ORDER BY Department.

In SharePoint, there is no way to do that. You could query the User Information list on the main site collection, but that may not contain all users. You can query Active Directory directly, but what's the point of the User Profile store in SharePoint then?

If you want to get a list of all departments, you get a UserProfileManager and loop through all profiles, then fill a List<string> or HashSet. This is slow and resource intensive.
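
The client-side half of that loop can be sketched as follows. This is only the "fill a HashSet" part, written against plain strings so it runs without a farm; in SharePoint, the input would be one department value per profile, read while enumerating the UserProfileManager. The class and method names are hypothetical, and it sticks to .NET 3.5 APIs since that is what SharePoint 2010 runs on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DepartmentIndex
{
    // Collapses one raw department value per profile into a sorted, distinct list.
    // Null and blank values are skipped; duplicates are detected ignoring case.
    public static List<string> DistinctDepartments(IEnumerable<string> valuesPerProfile)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var value in valuesPerProfile)
        {
            var trimmed = (value ?? string.Empty).Trim();
            if (trimmed.Length == 0) continue;
            seen.Add(trimmed); // keeps the first spelling it encounters
        }
        return seen.OrderBy(d => d, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

The helper itself is cheap. The cost is in producing its input, because every single profile has to be loaded just to read one property.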

No way to get some tag statistics efficiently
Another really simple scenario: I have a list that contains a lot of items. Users can use the Tags & Notes feature of SharePoint to tag items. We want to get statistics: Give me the top 10 tags that start with 'su' and how often they were used on this list.

The first problem is obviously that SharePoint stores social tags against an exact URL. And when I say exact URL, I mean:

  • Alternate Access Mappings are not supported. If one person uses http://internalportal and another person uses https://portal.internal.example.com, they won't see each other's tags and notes
  • If you have a Ribbon open and your URL has the "InitialTabID=...." QueryString, then your Tags and Notes will not be visible to other people who don't have it

So that's bad enough. But what about getting the statistics? Using SQL, such a feature is developed in 5 minutes since all you need is this query:

SELECT TOP 10 InputTermLabel, COUNT(InputTermLabel) AS Count
FROM dbo.SocialTags
WHERE InputTermLabel LIKE 'su%'
  AND UrlID IN (
      SELECT UrlID
      FROM dbo.Urls
      WHERE Url LIKE 'http://myportal/Lists/MyList/DispForm.aspx%')
GROUP BY InputTermLabel
ORDER BY Count DESC

Using the Object Model, it is impossible to do this efficiently. The closest I got is this:

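// Needs: System, System.Collections.Generic, System.Linq,
//        Microsoft.SharePoint, Microsoft.Office.Server.SocialData
// 'site' is an existing SPSite for the portal.
var result = new Dictionary<string, int>();
var baseUrl = "http://myportal/Lists/MyList/DispForm.aspx";
var stm = new SocialTagManager(SPServiceContext.GetContext(site));
// Fetch all terms used below the base URL; the prefix filter can only run client-side.
var terms = stm.GetAllTerms(new Uri(baseUrl),0);
foreach (var term in terms)
{
    var name = term.Term.Name;
    if (!name.StartsWith("su", StringComparison.InvariantCultureIgnoreCase)) continue;
    // One extra round trip per matching term to fetch every URL it was used on.
    var tc = stm.GetUrls(term.Term);
    int usageCount = tc.Count(url => url.AbsoluteUri.StartsWith(baseUrl));
    result[name] = usageCount;
}
// Keep the 10 most used tags. Note that Dictionary does not guarantee
// enumeration order, so re-sort before displaying.
result = result.OrderByDescending(kvp => kvp.Value)
             .Take(10).ToDictionary(kvp => kvp.Key, kvp => kvp.Value);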

As you can see, this is incredibly inefficient. The SQL Server sends me ALL the terms (at least I can specify a base URL!) and I can only filter out the ones starting with 'su' on the client. Then I have to ask for the URLs of every single Term. If you have a couple hundred tags, this is an incredible resource hog (tons of SQL Queries, a lot of network traffic, high memory usage). That would be the moment to wrap it into a Timer Job, cache the result and give up on having those statistics in real time.

The last one made me break one of the golden rules with SharePoint
The golden rule is: Never ever talk directly to the SQL Database, always go through the SharePoint Object Model.

Today, I broke that rule, and I don't feel dirty or guilty. That last point was the final straw. The Object Model simply doesn't cut it anymore. It's weaker than what I can do with some low-end PHP CMS systems, weaker than stuff like XOOPS or PHP-Nuke. The whole social functionality is poorly integrated.

After trying to get Tag-Statistics for 6 hours, I spent 30 minutes writing a small wrapper: Iterate through all the SPDatabases in the farm, find the SocialDB that belongs to the current Web Application, use Reflection to get the SqlSession property and use Reflection again to call ExecuteReader on it. As a result, I get a nice SqlDataReader back.

No fighting with the Object Model, just plain "I want this, give it to me without killing all my resources".

Obviously, I now need to be careful with Service Packs and Hotfixes. I've set up a little tool that can compare SQL Schemas. That way, whenever a Hotfix/Service Pack gets released I run it to see if any of the Databases/Tables I use changed and adapt my code accordingly. Even if it takes me 15 Minutes to make the changes, Microsoft would have to release 24 Schema-Changing Updates to break even with the 6 hours I wasted today on this.

Granted, I use it strictly for SELECTs and only as a last resort, and I wouldn't release code containing it, but at some point I have to get pragmatic about it. I can't spend a month developing something that should only take a week at maximum just because SharePoint doesn't like me.

Sorry for ranting so much, but SharePoint 2010 simply feels like a 10-year-old application in places. There is so much stuff that was bad when it was still SharePoint Portal Server 2001, a.k.a. Tahoe (ONET.xml...), and that's worse now that we have nicer technologies.

Why are pretty much all of the collections non-generic? Why is SPListItemCollection still a non-generic Collection of Object and not a List<SPListItem>?
Why are there no standalone SharePoint MSBuild tasks, thus requiring Visual Studio 2010 to be installed on a build server?
Why does creating Content Types through XML Files require me to specify fields twice? Why does creating a List for that Content Type through XML require me to specify the fields a third time?
Why does a product that can cost hundreds of thousands of dollars in licensing still require me to depend on Reflector and on reading the Stored Procedures just to understand simple things?
Why is there no easy way to check for the existence of certain items in some collections without iterating through it completely? Usually, there is only an indexer that throws an Exception if an item isn't found.
Why are so many useful functions and/or classes internal or sometimes even internal sealed? Many collections have internal "GetItem" functions that return null if the item wasn't found, which is great but as said, they are internal. The external functions usually wrap the GetItem call and throw an Exception if null - ARGH!
Why is there no MS Connect site to report issues and ask someone from MS to resolve them in a patch? Oh wait, there is a SharePoint Community Connect Site. It even has some bugs and the occasional Microsoft poster - except that all the real issues are ignored there.
Why are the simple things so hard and the hard things impossible to do?
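
On the point about indexers that throw: the workaround I would reach for is a small wrapper that turns "throws if missing" into "returns null". The sketch below is generic and does not touch SharePoint at all; the class and method names are hypothetical, and which exception type a given SharePoint collection throws is something you have to check per collection.

```csharp
using System;

public static class SafeLookup
{
    // Runs a lookup that signals "not found" by throwing TException
    // and converts exactly that case into null. Other exceptions propagate.
    public static TItem OrNull<TItem, TException>(Func<TItem> lookup)
        where TItem : class
        where TException : Exception
    {
        try
        {
            return lookup();
        }
        catch (TException)
        {
            return null;
        }
    }
}
```

With a plain Dictionary<string, string> named d, SafeLookup.OrNull<string, KeyNotFoundException>(() => d["missing"]) returns null instead of throwing. This only hides the clutter: the exception is still thrown and caught underneath, which is exactly the overhead a public GetItem-style method would avoid.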

The weird thing is, I'm not even that unhappy developing for it. As said, I still believe that SharePoint is a good product for the users to use and does so many things right there.
But I also believe that the Object Model needs a huge refresh.

I assume Microsoft wants to make upgrading as painless as possible, seeing how SharePoint is targeted at Enterprises. So very little changes, dragging the old stuff along.
I just hope that the next SharePoint version will be based on the .NET 4 CLR and that the breaking changes there (unlike 3.0 and 3.5, .NET 4.0 introduced a new CLR) will allow some people at Microsoft to show some bravery by revamping the whole OM and turning SharePoint from an ancient but modern-looking product into a modern product.

</rant>

Comments (5)

Eric Fang, September 6th, 2010 at 07:43

Can not agree more.

I think most of the problems are caused by historical reasons. And not all developers in the SharePoint development team are so brilliant.

[...] ranted about the weak SharePoint Object Model before, but today I think I found the proof that the people working on the API for the Social [...]

Shiras, February 3rd, 2011 at 11:05

How do I get all the user profiles in one shot using the Silverlight client object model? I tried using UserProfileService.asmx, but I could not find a method that returns all the profiles. The web service only lets me retrieve profiles one after another, either by index or by name, which is not what I want since the list is huge. My requirement is to get the emails of all the user profiles.

Rasmus, June 28th, 2011 at 12:25

Totally agree...

The OM is pretty much just the methods SP needs in order to build their own UI and not really what developers need to extend it.

This is usual MS policy... everything should be backward compliant. But the OM is such a big beast that I don't think they're gonna change much.

And for the "social features" in 2010. That's just too little and too late so when SP is being sold and the customer asks if SP2010 has web 2.0 features, the salesrep can say "sure, we even got tagging" ;).

Markus, August 10th, 2011 at 09:57

Yes, yes and yes again!