Data Access Layer: To LINQ or not to LINQ

Posted by adamjh on Jul 21st, 2007

Over the last several days, our progress meter has leveled off a bit as we've begun to take a step back from rapid prototyping efforts to spend some time thinking through the ways in which components of our system will communicate with each other.

One specific technology, LINQ to SQL (formerly "DLink" and almost certainly to be renamed again before it is released early next year), has particular potential as a drop-in replacement for the traditional data access layer.  DotNetSlackers has a pretty decent introduction, and Kris Vandermotten shares some thoughts on LINQ's implications on the DAL here and here.  He writes:

There is no doubt that Linq to SQL will have an enormous impact on the way we write data access layers. I wouldn’t be surprised to find out that the impact is so profound, that we might even have to reconsider the very nature of a data access layer. In fact, what is a data access layer (DAL) anyway?

Kris's second post concludes that the LINQ/SqlMetal-generated classes might serve as a DAL in and of themselves.  After reading it, I started wondering if/how this might apply to the data access layer we had begun to hack together earlier in the morning.  Jeff and I spent several hours reading, whiteboarding, and discussing whether or not LINQ was ready for "prime time", and if/how we would leverage it in what we're building.

While the idea of not having to design, implement, and maintain a data access layer designed around the needs of our specific system is tempting, a few questions come to mind:

  1. Can LINQ replace the object/relational mapping functionality performed by, and abstraction provided by, the traditional data access layer?
  2. Can instances of the LINQ/SqlMetal-generated easily be passed between components at different tiers?
  3. How does LINQ fit with scaling mechanisms that involve indirection such as data caching and partitioning?
  4. Is there sufficient support for all of the SQL features we're likely to use?
  5. Is it sufficiently performant? (promising early results)

Let's look at question #1.  Traditionally, all of the code that handles queries might have been pushed down into a set of data access methods, libraries, or even services that return data through a set of shared classes or interfaces.  In a simple example, to retrieve information about a user given her email address, I might call a GetUser() method that takes in an email address and returns a User object.  As time progresses, I might realize there is a better way to design my data access layer than exposing hundreds of GetUser()-like methods, and consolidate them by adding more flexible parameters or even by designing a mini-query language myself.  All this data access infrastructure, serving primarily to translate between relational data returned by SQL queries and object data modeled as a set of classes, is time-consuming, and is where I believe LINQ brings significant value.

For example, suppose the component I am working on needs to retrieve information about currently logged-in users.  Using LINQ as a data access mechanism, I could simply code:

        Table<User> users = db.GetTable<User>();

        var q = from u in users
                where u.LoggedIn == true
                select u;
        
        List<User> loggedInUsers = new List<User>(q);

Now, suppose I decide I want to display all the photos of all the currently logged-in users.  Again, using LINQ, with no design/implementation/management overhead around this specific retrieval which joins user and photo data that happens to reside in different tables, I could simply tack-on the following to the above code:

        foreach (var user in loggedInUsers)
        {
            foreach (var photo in user.Photos)
            {
                DisplayPhoto(photo.Data);
            }
        }

In cases where certain specific queries, even as expressed in C#/.Net syntax, are complex and might be reused across the codebase, it still might make sense to extract them out (in their LINQ form).  But the majority overhead of designing, implementing, and maintaining a traditional data access layer is still effectively replaced by the value of LINQ.  So, to sum up my thoughts around question #1, I'd at least say that I'd be willing to explore moving away from the traditional notion of abstracting away queries into a data access "layer", given the value that the LINQ constructs bring to the language/framework.

While question #1 was mostly about buying into vision/value, questions #2-5 are where the rubber hits the road, and are where limitations begin to arise that make us question whether or not LINQ version 1 will meet our needs, as follows:

First, if we are no longer constructing our own set of classes to model data and data retrieval methods to instantiate/populate them, we need to be able to pass LINQ and SqlMetal-generated objects around our n-tier application.  The DataContext is not serializable nor intended to be passed between tiers, and while there is an attach mechanism, support in general for these n-tier scenarios seems to be limited.  Showstopper.

Second, I haven't been able to dig up any strategies or support for introducing indirection into LINQ queries.  If our data is partitioned across many databases, or we need to introduce a cache, I'm unsure how to do this other than wrapping LINQ with well, a data access layer!

Third, I can't seem to find any answers to whether or not LINQ will support some of the newer SQL features like Spatial types/queries, again implying the need for some sort of parallel data access mechanism.

So, to wrap this post up... after roughly a day of investigation, it's safe to say that we've come to the conclusion that while LINQ is a promising technology, and could potentially replace much of the overhead in designing, implementing, and maintaining a data access layer in the future, it's no silver bullet today.  We'll likely incorporate it into our system in some capacity (i.e. as syntactic sugar for querying SQL), but it will likely remain abstracted by a set of data access mechanisms that will still be required for any data that is likely to be cached, partitioned, or accessed by multiple components/tiers now or in the future.