
LINQ Challenges and SQL Server Compact Edition

25 Mar 2008
Overcoming challenges with LINQ to SQL and using LINQ with SQL Server Compact Edition.



Introduction

Developers meet challenges all the time. It is a part of what we do. One of the most common challenges a developer faces is managing data. There are times when the data can be stored via a document format. However, a lot of us deal with data that has individuality, character, relationships…. You get the personified pun picture, I'm sure. Of course, I'm referring to relational data.

Relational data is so common in applications that the variety of choices we have to store it, organize it, query it, manipulate it, and so on, is astounding. Microsoft's own SQL Server 2005 product comes in six editions! Each provider has a slight variation on the SQL ANSI standard too. The point is, with so many options to choose from, it's no wonder that many developers put the majority of their effort into the "data access layer". By this, I am referring to the libraries, services, and/or code fragments written to communicate with the storage engine used for an application's data, regardless of how developmentally incorrect the scheme may be. That data is pointless just sitting in storage. We have to exchange it in and out with our applications for it to have any real use.

LINQ to SQL

Enter LINQ to SQL (originally DLINQ). Now we can query all sorts of data with a language we are already familiar with (currently C# or VB). I was very excited when I heard about the release of LINQ (Language-Integrated Query), especially with regards to SQL. The potential to reduce code and create a system of consistency is invaluable to all developers. The beauty of LINQ is amplified by the latest enhancements in Visual Studio 2008.

LINQ to SQL also features an object-relational mapping (ORM) tool to make life even easier. We can point the tool at a database and enjoy cleanly generated code bliss. The code that it generates for us is extremely easy to extend, allowing us to write business logic, validation, helper methods, complex transaction scenarios, and more. I have been using ORM tools for some time now, and have struggled with several of the free implementations available. There are certainly some very well written ORM tools out there, but having one at your fingertips, inside the Framework, is next to greatness.

Couple that with all the other new features added to Visual Studio 2008, and LINQ is a welcome addition to the family. There are some new challenges with LINQ, but that should be expected. We're developers after all.

Grounds for Determination (The Sample)

To begin discussing the challenges we can expect with LINQ to SQL, we need to establish a sample to demonstrate with. As developers, we have a need to quantify our output thoroughly. Some of us are independent contractors, some own consulting/development companies, others work for those companies. Regardless, there is a need to report the time we spend working on projects in order to accurately bill "the client". Therefore, what better sample for us to experiment with LINQ to SQL than an application that tracks our time spent working?

This application will be different from the many out there already that allow you to input your time for projects. Our sample application is meant to actually track hours in real time. I'd like it to be usable by non-developers as well. Any professional who is onsite or in the field with a laptop could benefit from clicking a couple of menus in order to get detailed time tracking. I even want the option to tell the program to detect when I am away from the computer, so it can pause and deduct that "away time" from the duration. All that said, I think it best we create a desktop application.

I also feel that this is a perfect opportunity to use SQL Server Compact Edition as our data storage engine. We could certainly use XML or even a flat text file, but I want the ability to quickly query sets of data for aggregate information and more. Why not use SQL Server Express Edition instead? Well, it would be really nice if the application could install without any system dependencies other than the obvious .NET Framework 3.5. The Express Edition cannot be embedded into our application, and it requires more privileges to install than our user may have. This is especially true in a work environment.

For more information on choosing between SQL Server Compact and Express Editions, see the article and whitepaper published by Microsoft.

Also, please note that the sample application is, by no means, complete. It is only meant as an example scenario that leads us to a decision to use LINQ to SQL with SQL Server Compact Edition 3.5. The code available for download is a bare-bones application that showcases the topics discussed in the remainder of this article.

LINQ and SQL Server Compact Edition

One of the challenges encountered by developers looking to get acquainted with LINQ to SQL is how to use it with SQL Server Compact Edition (SSCE), version 3.5. We were told that LINQ to SQL would work with SQL Server for now, with more providers on the way (possibly not until LINQ to Entities is released). However, if you try to drag some tables from a Compact Edition data connection onto a new LINQ to SQL designer canvas, you'll witness a nasty error dialog stating that the provider is not supported!

Unsupported Data Provider

Figure 1: The dreaded "unsupported data provider" error in the LINQ to SQL designer.

More accurately, the SSCE provider is not supported by the LINQ to SQL designer. You can still use the command-line tool, SQLMetal, to generate your data entities, data access, and other ORM code. If you prefer command-line tools, you're all set. If you like the idea of a visual representation of your entities, there is still hope.

I recommend creating a batch file or PowerShell script to generate your file(s) via SQLMetal. This allows you the benefit of a quick execution when you inevitably need to re-generate your data access layer due to schema changes. I've included one for download, as an example.

For our sample application, the command needed is fairly simple.

SqlMetal.exe TimeApp.sdf /dbml:TimeApp.dbml /namespace:TimeApp.DataAccess /pluralize

Note: The SQLMetal tool is located, by default, on your primary drive at: Program Files\Microsoft SDKs\Windows\V6.0A\Bin\SqlMetal.exe.

Notice that I've specified a few options: a namespace for the generated code, pluralization of the entity class names, and generation of a DBML file. The generated DBML file is extremely important for those of you who want the visual designer support. With it, you can apply minor schema changes via the designer, or you can choose to edit the DBML file itself. It's just XML, so feel free to dive in. Once you add the file to your project, Visual Studio 2008 will automatically generate the corresponding code for your data access layer.

DBML Designer Diagram

Figure 2: The time tracking application's SSCE DBML file in the LINQ to SQL designer.

You can see the very simple layout of tables in the LINQ to SQL designer in Figure 2. The sample time tracking application will allow the user to select a project, and optionally a task, in order to begin tracking time. One note: the project and task tables are practically identical, and could have been combined into a single table with a parent field to allow for hierarchy; however, I chose to separate them for this example, so you can see some of the challenges you may encounter with similar designs.
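With the generated code added to the project, pointing the data context at the Compact Edition database is straightforward. The sketch below (C# only) is illustrative: the context class name, TimeApp, is an assumption based on the database file name from the SqlMetal command above, so adjust it to whatever name was generated for you.

C#:
// Hypothetical usage of the generated context against the .sdf file.
// The "TimeApp" context name is assumed; yours may differ.
using (TimeApp DataContext = new TimeApp("Data Source=TimeApp.sdf"))
{
    // A simple query to confirm the plumbing works end-to-end.
    var ProjectNames = from P in DataContext.Projects
                       orderby P.Name
                       select P.Name;

    foreach (string Name in ProjectNames)
        Console.WriteLine(Name);
}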

Working with Enumerations in LINQ to SQL

There are plenty of times when a simple numeric column with a constraint, or business rule, will do instead of a foreign key to a lookup table. Lookup tables are great if the values will change often or if the values need to be changed by the user. In other cases, we, the developers, determine the exact domain of such a field. We still want to offer the user the ability to select one of the possible values, but we control those possibilities entirely. This is a perfect opportunity to use an enumeration in the application to represent those field values.

So, what's the problem? When the ORM generates the entities that represent your database's tables, it represents these numeric fields with numeric properties. When writing the user interface, we could translate the user's selections to the appropriate numbers. If we want to use an enumeration instead, we still have to translate the enumeration values to the corresponding numeric values. Your first instinct may be to use the power provided by the ORM to create a new code file with a partial class that includes a new property that handles these translations.

C#:
public enum EstimationComparison : byte {
    Overall = 1,
    Repeating = 2
}

public partial class Project {

    public EstimationComparison EstimationComparison {
        get { return (EstimationComparison)this.EstimationComparisonType; }
        set {
            if (Enum.IsDefined(typeof(EstimationComparison), value))
                this.EstimationComparisonType = (byte)value;
        }
    }
}
VB:
Public Enum EstimationComparison As Byte
    Overall = 1
    Repeating = 2
End Enum

Partial Public Class Project

    Public Property EstimationComparison() As EstimationComparison
        Get
            Return CType(Me.EstimationComparisonType, EstimationComparison)
        End Get
        Set(ByVal value As EstimationComparison)
            If [Enum].IsDefined(GetType(EstimationComparison), value) Then
                Me.EstimationComparisonType = CType(value, Byte)
            End If
        End Set
    End Property

End Class
Listing 1: This code extends the Project entity to include a new property. The new property translates the existing numeric property (EstimationComparisonType) value to and from an enumeration (EstimationComparison).

This is good, and makes use of the tools we're given by the ORM for LINQ. However, we can do better than that. We can actually tell the ORM to use a different type for the property, and it will handle the conversions for us. As long as the type we inform it to use is compatible with the data type used to store the values in the database, there will be no problems. This can be accomplished in a couple of ways. One is to edit the DBML file manually and use SQLMetal to regenerate the code from the edited file. The other way is to open the DBML in the designer and change the type of the entity's property via the property page. Since we also want this field to be Nullable, we have to make sure to set the property type accordingly.

Setting the Type of a Property to an Enumeration

Figure 3: Change the Type of the property for an entity to anything compatible with the Server Data Type. For this example, I've changed the Type to a Nullable enumeration. If you are using C#, you can set this to EstimationComparisonType? as a shortcut.
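For reference, after this change the regenerated property ends up looking roughly like the snippet below. This is a simplified sketch of my own: the real generated code includes change-notification plumbing, and the attribute values shown (Storage, DbType) are assumptions rather than copied output.

C#:
// Simplified sketch of the regenerated property; notification code omitted.
private System.Nullable<EstimationComparisonType> _EstimationComparisonType;

[Column(Storage = "_EstimationComparisonType", DbType = "TinyInt", CanBeNull = true)]
public System.Nullable<EstimationComparisonType> EstimationComparisonType
{
    get { return this._EstimationComparisonType; }
    set { this._EstimationComparisonType = value; }
}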

Once this change is made, we can make a direct assignment to the EstimationComparisonType property of the Project entity with one of the enumeration values. The Framework takes care of the conversion.

C#:
Project Project = DataContext.Projects.Single(P => P.ProjectId == 1);
Project.EstimationComparisonType = EstimationComparisonType.Overall;
VB:
Dim Project As Project = DataContext.Projects.Single(Function(P) P.ProjectId = 1)
Project.EstimationComparisonType = EstimationComparisonType.Overall
Listing 2: We can make direct enumeration assignments to the property changed in Figure 3.

Field Subset Challenges with LINQ to SQL

One thing we should have learned early in our database development careers is to restrict the set of data we retrieve to just what we need and nothing more. That means we should limit our rows via a well-thought-out "where" clause and our columns via an actual column list in the "select" clause of our SQL statements.

SQL
-- No restrictions at all.
select *
from Project
order by [Name]

-- Limit the columns to just those needed.
-- Limit the rows as needed.
select ProjectId, [Name]
from Project
where IsActive = 1
order by [Name]
Listing 3: The first query is quick to write, but is not very efficient. The second query forces us to think about what is actually needed. For a list of projects in a simple list control, the second query is much better for our needs.

There are two different queries of the Project table in Listing 3. If our intent for this data is to fill a simple list control, we do not need all the fields in the table. All we really care about for our list is the primary key and the textual representation of each record. The second query is certainly the better choice.

In beta versions of LINQ to SQL, we were able to force which fields were loaded into the entity object we were retrieving. It turns out, this was a bug. To avoid confusion, this so-called "feature" was never meant to be. The thinking is, if a developer calls on data from a library, layer, or other code written by another developer, there would be mass confusion if that retrieved entity has missing data in the properties.

For example, if I call a method from a library that another developer wrote that retrieves our project list and only gets ProjectId and Name, I could become very lost when I try to access the EstimatedDuration property. Things would be bad enough if a field were not nullable and held uninitialized data, but imagine the EstimatedDuration property, which is nullable. I may presume that there is no such value in the database, when in fact, there may be. I have no way of knowing based on the Project entity I received.

Therefore, now that LINQ to SQL has been released to manufacturing (out of beta), this kind of thing is no longer allowed. I still need the ability to retrieve just some fields from a table, though, and I can. I can use anonymous types locally, but if I want to pass the results around, I need to create a new class to hold just those values I desire and then use a new instance of the class when retrieving the data.
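For purely local consumption, an anonymous type is all that is required. The quick sketch below is my own and never leaves the method in which it is written; the moment the result has to cross a method or layer boundary, a named class becomes necessary.

C#:
// Anonymous types are fine while the results stay inside one method.
var ProjectList = from P in DataContext.Projects
                  orderby P.Name
                  select new { P.ProjectId, P.Name };

foreach (var Item in ProjectList)
    Console.WriteLine("{0}: {1}", Item.ProjectId, Item.Name);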

C#:
public class ProjectSummary {
    public int ProjectId { get; set; }
    public string Name { get; set; }
    public List<TaskSummary> Tasks { get; set; }
}
VB:
Public Class ProjectSummary
    Private _projectId As Integer
    Private _name As String
    Private _tasks As List(Of TaskSummary)

    Public Property ProjectId As Integer
        Get
            Return _projectId
        End Get
        Set(ByVal value As Integer)
            _projectId = value
        End Set
    End Property

    Public Property Name As String
        Get
            Return _name
        End Get
        Set(ByVal value As String)
            _name = value
        End Set
    End Property

    Public Property Tasks As List(Of TaskSummary)
        Get
            Return _tasks
        End Get
        Set(ByVal value As List(Of TaskSummary))
            _tasks = value
        End Set
    End Property
End Class
Listing 4: A ProjectSummary class is created to hold a subset of the field values from the table.
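Note that ProjectSummary carries a List<TaskSummary>, which comes into play later when pre-fetching tasks (Listing 6). The TaskSummary class itself is not shown in the article's listings; a minimal version would simply mirror ProjectSummary.

C#:
// Assumed companion class; it mirrors ProjectSummary for the Task table.
public class TaskSummary {
    public int TaskId { get; set; }
    public string Name { get; set; }
}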
C#:
var ProjectList = from P in DataContext.Projects
                  orderby P.Name
                  select new ProjectSummary {
                      ProjectId = P.ProjectId,
                      Name = P.Name
                  };
VB:
Dim ProjectList = From P In DataContext.Projects _
                  Order By P.Name _
                  Select New ProjectSummary With { _
                      .ProjectId = P.ProjectId, _
                      .Name = P.Name _
                  }
Listing 5: The LINQ query that uses our new ProjectSummary class to hold the fields we want.

When we create a new class to hold just the fields we want, it is fairly easy to adjust our LINQ syntax to get the data into a new instance of our class. In our example, Listing 4 declares a new ProjectSummary class with just the ProjectId and Name properties, and Listing 5 shows the query to fill a new instance of ProjectSummary with the appropriate data.

Change Tracking Challenges with LINQ to SQL

Change tracking with LINQ to SQL can be quite challenging in certain cases; especially, if you are just starting out with the new technology. Let's take a look at such a case with the sample time tracking application.

Sample Projects Window

Figure 4: The "Projects" window loads the projects from the database and binds them to the list box.

The "Projects" window retrieves a summary list of the current projects from the database via the generated data context. The fetched project list is bound directly to the list box seen in Figure 4. When the user presses the "Edit" button, the entire Project is retrieved and passed to the "Edit Project" window.

Sample Edit Project Window

Figure 5: The "Edit Project" window uses an entity object instance passed to it to bind directly to the controls, with the exception of Tasks.

The "Edit Project" window receives an instance of a Project entity object and binds all its properties to the controls on the form, except for the Tasks child list. If we bind the Project.Tasks list to the list box, we will have no way to undo changes made to that list if the user chooses to "Cancel" the dialog box.

Despite the fact that the data context can track object changes in order to determine what has been modified, there is no public method we can call to "undo changes" made thus far. There are a few solutions to this drawback.

Firstly, you can instruct the ORM to inherit the entity classes from a base class of your choice. It is within your power to create a base class that supports deep cloning. With this ability, you could clone the original state of the object when any property is changed. Furthermore, you could add a method to undo the changes made to the object and revert it back to the original values. However, implementing deep cloning is fairly intensive and would certainly cost some performance, a cost that compounds with each additional level of child objects. Of course, you could implement a maximum depth, and allow it to be specified on a case-by-case basis. Good luck with this one!

Alternatively, you can take the approach of flagging objects for modification and/or deletion. Our example application is most concerned with undoing deleted tasks. We could initialize a separate list of the project's tasks. That way, when the user presses the "Delete" button for a task, the Task object is "marked" for deletion and removed from our separate list (which is bound to the list box). This will leave the object in the original Project.Tasks list, but remove it from the list used to display the tasks in the list box. If the user clicks the "OK" button of the "Edit Project" dialog, we can delete those tasks that have been marked for deletion before submitting changes to the project.
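A rough sketch of that approach is shown below. The window, method, and control names are hypothetical; the point is the general pattern of binding a copy of the task list and deferring the actual delete until the user confirms.

C#:
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Windows.Forms;

// Sketch only: member names and the TimeApp context name are assumptions.
public partial class EditProjectWindow
{
    private BindingList<Task> DisplayTasks;                  // bound to the list box
    private readonly List<Task> TasksToDelete = new List<Task>();

    private void BindTasks(Project CurrentProject, ListBox TasksListBox)
    {
        // Bind a copy of the task list so removals do not touch Project.Tasks.
        DisplayTasks = new BindingList<Task>(CurrentProject.Tasks.ToList());
        TasksListBox.DataSource = DisplayTasks;
    }

    private void MarkTaskForDeletion(Task Selected)
    {
        TasksToDelete.Add(Selected);     // flag it for deletion...
        DisplayTasks.Remove(Selected);   // ...and hide it from the UI only
    }

    private void CommitChanges(TimeApp DataContext)
    {
        // Only when the user presses "OK" are the flagged tasks actually deleted.
        DataContext.Tasks.DeleteAllOnSubmit(TasksToDelete);
        DataContext.SubmitChanges();
    }
}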

Disconnected Challenges with LINQ to SQL

Another issue some developers encounter involves disconnected scenarios. Suppose you retrieve an entity, or a collection of entities, via a LINQ to SQL data context in a separate physical layer. In that case, the original data context used to retrieve the data will no longer exist by the time the data travels to the user interface layer and comes back for updates. There is some support for this very common architecture.

If you prefer to use the generated entity classes that the ORM built, you can pass instances of those between your layers and get results. Beware of the limitations though. You can re-attach the objects to a new data context for updates. However, the Attach method of a context's Table instance does not support a deep object graph. In fact, it only supports itself. That is, when you attach an object, only that object is attached. If your object has child objects related to it via foreign keys, those objects will not be attached. You must attach the child objects individually in order for them to participate in an update.

LINQ to SQL usually uses the original state of an object to maintain optimistic concurrency control during updates. Since object tracking is severed during the disconnected scenario, you will have to provide that original object, or an alternative, when re-attaching the object for updates. The re-attachment must receive the original object in addition to the modified one, or your entity must have a version member. A version member is typically defined as a timestamp field in your database, and you will definitely want to flag the generated property as read only.

These requirements are necessary for LINQ to SQL to determine if the data you wish to update has already been modified by another user. Keep all of this in mind if you wish to pass the generated entity objects between your layers as opposed to messages.
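To make those requirements concrete, here is a minimal sketch of re-attaching a disconnected Project for an update. The TimeApp context name is an assumption, and the commented-out overload corresponds to the version-member option described above.

C#:
// Sketch: updating a Project that was modified away from its original DataContext.
public void UpdateProject(Project Modified, Project Original)
{
    using (TimeApp DataContext = new TimeApp("Data Source=TimeApp.sdf"))
    {
        // Attach re-attaches only this object; any child Tasks would have to be
        // attached individually to participate in the update.
        DataContext.Projects.Attach(Modified, Original);

        // Alternatively, if the entity carries a timestamp version member:
        // DataContext.Projects.Attach(Modified, true);

        DataContext.SubmitChanges();
    }
}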

Pre-fetching Challenges with LINQ to SQL

LINQ to SQL has many great performance tweaks in-place. A lot of thought went into the way data is retrieved. If you take a moment to examine the SQL that is generated while in debug mode or via the data context's Log property, you will see evidence of the care put into the statements.

Another performance feature is deferred loading, or "lazy loading". Deferred loading prevents certain data from being retrieved before it is actually needed. There is an option available per property on your entities for "delay loading" the property value. However, all child objects default to this behavior. In our example application, the Project entity has a property for the Tasks defined under it. When a project is retrieved via LINQ to SQL, only the data from the corresponding table is retrieved. If, however, I access the Tasks property, another query is issued to the database to retrieve those tasks. This is by design to prevent unnecessary load on the database for "just in case" scenarios. However, there are times when we know that we want the child objects to be loaded.
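You can watch the deferred behavior happen with a quick test like the one below (a sketch of my own; sending the Log output to the console makes the second query obvious).

C#:
// Sketch: deferred loading issues a second query only when Tasks is touched.
using (TimeApp DataContext = new TimeApp("Data Source=TimeApp.sdf"))
{
    DataContext.Log = Console.Out;                    // echo the generated SQL

    Project Project = DataContext.Projects.Single(P => P.ProjectId == 1);   // query #1: the project only
    int TaskCount = Project.Tasks.Count;              // query #2: its tasks, on demand

    Console.WriteLine("{0} has {1} task(s).", Project.Name, TaskCount);
}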

The time tracking application displays a menu on its notification icon (system tray icon) containing all the projects in the database. Each project menu item also has a list of child menu items for each associated task. Knowing that we need to build these menus ahead of time affords us the vision to pre-fetch all necessary tasks at the same time we get all the projects. If we can instruct LINQ to do all of this in one query, we save the cost of multiple round trips (one query instead of one for the projects plus one more per project for its tasks).

We will use a feature of LINQ to SQL to achieve this "eager loading". The data context has a property named LoadOptions that accepts an instance of DataLoadOptions. This is how we specify that we want to go ahead and load one or more related child entities in one query.

C#:
// Pre-fetch the tasks for each project.
DataLoadOptions Options = new DataLoadOptions();
Options.LoadWith<Project>(P => P.Tasks);
DataContext.LoadOptions = Options;

var ProjectList = from P in DataContext.Projects
                  orderby P.Name
                  select new ProjectSummary {
                      ProjectId = P.ProjectId,
                      Name = P.Name,
                      Tasks = (
                          from T in P.Tasks
                          orderby T.Name
                          select new TaskSummary {
                              TaskId = T.TaskId,
                              Name = T.Name
                          }
                      ).ToList()
                  };
VB:
' Pre-fetch the tasks for each project.
Dim Options As New DataLoadOptions()
Options.LoadWith(Of Project)(Function(P) P.Tasks)
DataContext.LoadOptions = Options

Dim ProjectList = From P In DataContext.Projects _
                  Order By P.Name _
                  Select New ProjectSummary With { _
                      .ProjectId = P.ProjectId, _
                      .Name = P.Name, _
                      .Tasks = ( _
                          From T In P.Tasks _
                          Order By T.Name _
                          Select New TaskSummary With { _
                              .TaskId = T.TaskId, _
                              .Name = T.Name _
                          } _
                      ).ToList() _
                  }
Listing 6: Deferred loading can be turned off for select child entities via the LoadOptions property of the data context. In this case, we choose to pre-fetch the child Tasks of each Project retrieved in the query.

This is a truly great feature! However, when we examine the generated SQL after making this setting, there is a potential performance problem. Notice in Figure 6 how the SQL that LINQ generates uses a join to get the data from both the Project and Task tables.

SQL Server Query Visualizer

Figure 6: Inspecting the generated SQL via the SQL Server Query Visualizer, we can see the statement uses a left outer join to retrieve both Projects and Tasks at once.

My initial response to seeing the generated SQL was a bit of surprise, and the reason is the shape of the data retrieved. Each field selected from the Project table will be duplicated for each Task row retrieved.

Duplicated Project Data

Figure 7: When multiple Tasks exist for a Project, all the fields selected from the Project are duplicated for each Task also retrieved.

I expected to see two queries: one to retrieve the necessary projects and another to retrieve the matching tasks for those projects (via a correlated subquery if a "where" clause existed for the project query). I expected the results of the two queries would then be used to map tasks to projects when building the object graph.

Not being one to just accept such results, I e-mailed Scott Guthrie, a general manager at Microsoft who has written several blog posts about LINQ. I asked him why this behavior is found in LINQ for pre-fetching related entities. Scott deferred the question to Dinesh Kulkarni, a program manager on the LINQ project. Dinesh works specifically on LINQ to SQL. His response summarized the many options his team considered with regards to performance and eager loading, or pre-fetching. The team apparently ran several tests on different sets of data of varying sizes. Some solutions worked well on some data sets and not on others. After much thought, they opted for the current solution, even though he expressed their unhappiness with the performance.

Dinesh also referenced a detailed blog post by the senior performance architect, Rico Mariani. I highly recommend reading the blog post when you have some free time. The post does touch on some of these very issues of performance.

After reading Dinesh's reply and the blog post, I came to the conclusion that there is no "silver bullet". I can absolutely see times where multiple queries would not yield good performance due to correlated subqueries. Imagine a parent query that has some complicated filters in the "where" clause. Each child query that is pre-fetched will have to have a subquery that correlates it with the parent query with the same complicated filters. That kind of behavior would compound the complexity greatly in those situations.

Other Points of Interest

It is great that you can respond to certain events inside your entities. While exploring some of these options, I experimented with the idea of setting some default values for a new object when setting one of its properties to a related entity instance. This worked surprisingly well.

C#:
public partial class Task {

    partial void OnCreated() {
        // Subscribe to the PropertyChanged event when created.
        this.PropertyChanged += new 
          System.ComponentModel.PropertyChangedEventHandler(Task_PropertyChanged);
    }

    void Task_PropertyChanged(object sender, 
              System.ComponentModel.PropertyChangedEventArgs e) {
        if (e.PropertyName == "Project") {
            if (Project != null && TaskId <= 0) {
                // Set the defaults of this new Task based on the assigned Project.
                EstimationComparisonType = Project.EstimationComparisonType;
                DetectAway = Project.DetectAway;
                IsBillable = Project.IsBillable;
            }
        }
    }
}
VB:
Partial Public Class Task

    Private Sub OnCreated()
        ' Subscribe to the PropertyChanged event when created.
        AddHandler Me.PropertyChanged, AddressOf Task_PropertyChanged
    End Sub

    Sub Task_PropertyChanged(sender As Object, 
             e As System.ComponentModel.PropertyChangedEventArgs)
        If e.PropertyName = "Project" Then
            If Not Project Is Nothing AndAlso TaskId <= 0 Then
                ' Set the defaults of this new Task based on the assigned Project.
                EstimationComparisonType = Project.EstimationComparisonType
                DetectAway = Project.DetectAway
                IsBillable = Project.IsBillable
            End If
        End If
    End Sub

End Class
Listing 7: Setting default properties of the Task entity instance when it is assigned a parent Project value.

The code in Listing 7 will be executed if a new instance of a Task object is assigned an existing Project value via its related property. In such a case, the Task will have some of its properties set to defaults from the parent Project object.
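To see the effect, here is a quick usage sketch; the Name value and the ExistingProject variable are placeholders of my own.

C#:
// Assigning the parent Project triggers the handler from Listing 7.
Task NewTask = new Task { Name = "Write documentation" };
NewTask.Project = ExistingProject;   // PropertyChanged fires for "Project"

// NewTask.EstimationComparisonType, DetectAway, and IsBillable now mirror
// the values set on ExistingProject.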

One other thing to look out for when working with LINQ to SQL is a very obscure error that I received a couple of times. This error is more prone to occur when you edit the schema of your database without updating the DBML file. I received a ChangeConflictException with the message, "Row not found or changed", several times while running the sample application. It turned out to be something very simple. I changed the sample schema to allow nulls for a couple of fields and forgot to update the DBML file for my LINQ entities. All I had to do to fix this was to indicate that nulls were allowed for the corresponding entity property. In the case of my custom enumeration property, I also had to use the System.Nullable generic type when specifying the enumeration type. I expected the ORM, or DBML designer, to imply the Nullable type when I set the property's Nullable flag to "True". However, it did not, so I had to set the property's Type as indicated earlier, "System.Nullable<EstimationComparisonType>".
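In my case the real fix was bringing the DBML back in sync with the schema, but genuine concurrency conflicts can also be caught and resolved explicitly. The following is a hedged sketch of that standard pattern (it assumes a using directive for System.Data.Linq).

C#:
// Sketch: resolving genuine concurrency conflicts instead of letting them throw.
try
{
    DataContext.SubmitChanges(ConflictMode.ContinueOnConflict);
}
catch (ChangeConflictException)
{
    foreach (ObjectChangeConflict Conflict in DataContext.ChangeConflicts)
    {
        // Keep this user's changes; other RefreshMode values favor the
        // database values instead.
        Conflict.Resolve(RefreshMode.KeepChanges);
    }

    DataContext.SubmitChanges();
}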

Summing Up

In a field filled with acronyms, one of the newest to the .NET Framework is extremely powerful. LINQ offers us a simple interface to our data. Despite the challenges it creates, it can be a very valuable tool in a developer's "kit". Using LINQ to SQL with SQL Server Compact Edition should certainly speed up the development process for your next application requiring embedded data. All you need is a little knowledge on making it work best and some free time to give it a shot.

Good luck and have fun!


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)