Introduction
Developers meet challenges all the time; it is part of what we do. One of the most common challenges a developer faces is managing data. Sometimes the data can be stored in a document format. However, many of us deal with data that has individuality, character, relationships… you get the personified pun picture, I'm sure. Of course, I'm referring to relational data.
Relational data is so common in applications that the variety of choices we have to store it, organize it, query it, and manipulate it is astounding. Microsoft's own SQL Server 2005 product comes in six editions! Each provider has a slight variation on the ANSI SQL standard, too. The point is, with so many options to choose from, it's no wonder that many developers put the majority of their effort into the "data access layer": the libraries, services, and code fragments written to communicate with the storage engine used for an application's data, however developmentally incorrect the scheme may be. That data is pointless just sitting in storage. We have to move it in and out of our applications for it to have any real use.
LINQ to SQL
Enter LINQ to SQL (originally DLINQ). Now we can query all sorts of data with a language we are already familiar with (currently C# or VB). I was very excited when I heard about the release of LINQ (Language-Integrated Query), especially with regards to SQL. The potential to reduce code and create a system of consistency is invaluable to all developers. The beauty of LINQ is amplified by the latest enhancements in Visual Studio 2008.
LINQ also features an object-relational mapping (ORM) tool to make life even easier. We can point the tool at a database and enjoy cleanly generated code bliss. The code that it generates for us is extremely easy to extend, allowing us to write business logic, validation, helper methods, complex transaction scenarios, and more. I have been using object-relational mapping (ORM) for some time now, and have struggled with several of the free implementations available. There are certainly some very well written ORM tools out there, but having one at your fingertips, inside the Framework, is next to greatness.
Couple that with all the other new features added to Visual Studio 2008, and LINQ is a welcome addition to the family. There are some new challenges with LINQ, but that should be expected. We're developers after all.
Grounds for Determination (The Sample)
To begin discussing the challenges we can expect with LINQ to SQL, we need to establish a sample to demonstrate with. As developers, we have a need to quantify our output thoroughly. Some of us are independent contractors, some own consulting/development companies, and others work for those companies. Regardless, there is a need to report the time we spend working on projects in order to accurately bill "the client". What better sample for experimenting with LINQ to SQL, then, than an application that tracks our time spent working?
This application will be different from the many already out there that let you input your time for projects. Our sample application is meant to actually track hours in real-time. I'd like it to be usable by non-developers as well. Any professional who is onsite or in the field with a laptop could benefit from clicking a couple of menus in order to get detailed time tracking. I even want the option to tell the program to detect when I am away from the computer, so it can pause and deduct that "away time" from the duration. All that said, I think it best we create a desktop application.
I also feel that this is a perfect opportunity to use SQL Server Compact Edition as our data storage engine. We could certainly use XML or even a flat text file, but I want the ability to quickly query sets of data for aggregate information and more. Why not use SQL Server Express Edition instead? Well, it would be really nice if the application could install without any system dependencies other than the obvious .NET Framework 3.5. The Express Edition cannot be embedded into our application, and it requires more privileges to install than our user may have. This is especially true in a work environment.
For more information on choosing between SQL Server Compact and Express Editions, see the article and whitepaper published by Microsoft.
Also, please note that the sample application is, by no means, complete. It is only meant as an example scenario that leads us to a decision to use LINQ to SQL with SQL Server Compact Edition 3.5. The code available for download is a bare-bones application that showcases the topics discussed in the remainder of this article.
LINQ and SQL Server Compact Edition
One of the challenges encountered by developers looking to get acquainted with LINQ to SQL is how to use it with SQL Server Compact Edition (SSCE), version 3.5. We were told LINQ to SQL will work with SQL Server for now, with more providers on the way (possibly not until LINQ to Entities is released). However, if you try to drag some tables from a Compact Edition data connection onto a new LINQ to SQL designer canvas, you'll witness a nasty error dialog stating that the provider is not supported!
Figure 1: The dreaded "unsupported data provider" error in the LINQ to SQL designer.
More accurately, the SSCE provider is not supported by the LINQ to SQL designer. You can still use the command-line tool, SQLMetal, to generate your data entities, data access, and other ORM code. If you prefer command-line tools, you're all set. If you like the idea of a visual representation of your entities, there is still hope.
I recommend creating a batch file or PowerShell script to generate your file(s) via SQLMetal. This allows you the benefit of a quick execution when you inevitably need to re-generate your data access layer due to schema changes. I've included one for download, as an example.
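As a sketch of what such a script might contain (the SDK path and file names here are assumptions based on this sample; adjust them to your own layout):

```shell
@echo off
rem Regenerate the LINQ to SQL layer for the sample database.
rem The SqlMetal.exe path below is the default SDK location -- adjust as needed.
set SQLMETAL="%ProgramFiles%\Microsoft SDKs\Windows\v6.0A\Bin\SqlMetal.exe"

%SQLMETAL% TimeApp.sdf /dbml:TimeApp.dbml /namespace:TimeApp.DataAccess /pluralize
if errorlevel 1 echo SqlMetal failed -- check the schema and the paths above.
```

Re-running this after every schema change keeps the DBML file (and therefore the generated code) in sync with the database.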
For our sample application, the command needed is fairly simple.
SqlMetal.exe TimeApp.sdf /dbml:TimeApp.dbml /namespace:TimeApp.DataAccess /pluralize
Note: The SQLMetal tool is located, by default, on your primary drive at: Program Files\Microsoft SDKs\Windows\v6.0A\Bin\SqlMetal.exe.
Notice that I've specified a few options for the namespace of the generated code, to pluralize the entity class names and to generate a DBML file. The generated DBML file is extremely important for those of you who want the visual designer support. With it, you can make minor changes that occur in your schema via the designer, or you can choose to edit the DBML file itself. It's just XML, so feel free to dive in. Once you add the file to your project, Visual Studio 2008 will automatically generate the corresponding code for your data access layer.
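To give you an idea of what you would be editing, here is a hand-trimmed sketch of a DBML table definition (the table and column names mirror this sample; the exact attributes SQLMetal emits may vary):

```xml
<Database Name="TimeApp" Class="TimeAppDataContext"
          xmlns="http://schemas.microsoft.com/linqtosql/dbml/2007">
  <Table Name="Project" Member="Projects">
    <Type Name="Project">
      <Column Name="ProjectId" Type="System.Int32" DbType="Int NOT NULL IDENTITY"
              IsPrimaryKey="true" IsDbGenerated="true" CanBeNull="false" />
      <Column Name="Name" Type="System.String" DbType="NVarChar(100) NOT NULL"
              CanBeNull="false" />
    </Type>
  </Table>
</Database>
```

Each `Column` element maps a database field to a generated entity property, which is exactly the mapping we will be tweaking later in this article.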
Figure 2: The time tracking application's SSCE DBML file in the LINQ to SQL designer.
You can see the very simple layout of tables in the LINQ to SQL designer in Figure 2. The sample time tracking application will allow the user to select a project, and optionally a task, in order to begin tracking time. One note: the project and task tables are practically identical, and could have been combined into a single table with a parent field to allow for hierarchy; however, I chose to separate them for this example, so you can see some of the challenges you may encounter with similar designs.
Working with Enumerations in LINQ to SQL
There are plenty of times when a simple numeric column with a constraint, or business rule, will do instead of a foreign key to a lookup table. Lookup tables are great if the values will change often or if the values need to be changed by the user. In other cases, we, the developers, determine the exact domain of such a field. We still want to offer the user the ability to select one of the possible values, but we control those possibilities entirely. This is a perfect opportunity to use an enumeration in the application to represent those field values.
So, what's the problem? When the ORM generates the entities that represent your database's tables, it represents these numeric fields with numeric properties. When writing the user interface, we could translate the user's selections to the appropriate numbers. If we want to use an enumeration instead, we still have to translate the enumeration values to the corresponding numeric values. Your first instinct may be to use the power provided by the ORM to create a new code file with a partial class that includes a new property that handles these translations.
C#:
public enum EstimationComparison : byte {
    Overall = 1,
    Repeating = 2
}

public partial class Project {
    public EstimationComparison EstimationComparison {
        get { return (EstimationComparison)this.EstimationComparisonType; }
        set {
            if (Enum.IsDefined(typeof(EstimationComparison), value))
                this.EstimationComparisonType = (byte)value;
        }
    }
}
VB:
Public Enum EstimationComparison As Byte
    Overall = 1
    Repeating = 2
End Enum

Partial Public Class Project
    Public Property EstimationComparison() As EstimationComparison
        Get
            Return CType(Me.EstimationComparisonType, EstimationComparison)
        End Get
        Set(ByVal value As EstimationComparison)
            If [Enum].IsDefined(GetType(EstimationComparison), value) Then
                Me.EstimationComparisonType = CType(value, Byte)
            End If
        End Set
    End Property
End Class
Listing 1: This code extends the Project entity to include a new property. The new property translates the existing numeric property (EstimationComparisonType) value to and from an enumeration (EstimationComparison).
This is good, and makes use of the tools we're given by the ORM for LINQ. However, we can do better than that. We can actually tell the ORM to use a different type for the property, and it will handle the conversions for us. As long as the type we inform it to use is compatible with the data type used to store the values in the database, there will be no problems. This can be accomplished in a couple of ways. One is to edit the DBML file manually and use SQLMetal to regenerate the code from the edited file. The other way is to open the DBML in the designer and change the type of the entity's property via the property page. Since we also want this field to be Nullable, we have to make sure to set the property type accordingly.
Figure 3: Change the Type of the property for an entity to anything compatible with the Server Data Type. For this example, I've changed the Type to a Nullable enumeration. If you are using C#, you can set this to EstimationComparisonType? as a shortcut.
Once this change is made, we can make a direct assignment to the EstimationComparisonType property of the Project entity with one of the enumeration values. The Framework takes care of the conversion.
C#:
Project Project = DataContext.Projects.Single(P => P.ProjectId == 1);
Project.EstimationComparisonType = EstimationComparisonType.Overall;
VB:
Dim Project As Project = DataContext.Projects.Single(Function(P) P.ProjectId = 1)
Project.EstimationComparisonType = EstimationComparisonType.Overall
Listing 2: We can make direct enumeration assignments to the property changed in Figure 3.
Field Subset Challenges with LINQ to SQL
One thing we should have learned early in our database development careers is to restrict the set of data we retrieve to just what we need and nothing more. That means we should limit our rows via a well-thought-out "where" clause and our columns via an actual column list in the "select" clause of our SQL statements.
select *
from Project
order by [Name]
select ProjectId, [Name]
from Project
where IsActive = 1
order by [Name]
Listing 3: The first query is quick to write, but is not very efficient. The second query forces us to think about what is actually needed. For a list of projects in a simple list control, the second query is much better for our needs.
There are two different queries of the Project table in Listing 3. If our intent for this data is to fill a simple list control, we do not need all the fields in the table. All we really care about for our list is the primary key and the textual representation of each record. The second query is certainly more ideal.
In beta versions of LINQ to SQL, we were able to force which fields were loaded into the entity object we were retrieving. It turns out, this was a bug. To avoid confusion, this so-called "feature" was never meant to be. The thinking is, if a developer calls on data from a library, layer, or other code written by another developer, there would be mass confusion if that retrieved entity has missing data in the properties.
For example, if I call a method from a library that another developer wrote that retrieves our project list and only gets ProjectId and Name, I could become very lost when I try to access the EstimatedDuration property. Things would be bad enough if a field were not nullable and had uninitialized data, but imagine the EstimatedDuration property, which is nullable. I might presume that there is no such value in the database when, in fact, there may be. I have no way of knowing based on the Project entity I received.
Therefore, when LINQ to SQL was released to manufacturing (out of beta to release), this kind of thing was no longer allowed. I still need the ability to retrieve just some fields from a table, though, and I can. I can use anonymous types locally, but if I want to pass the results around, I need to create a new class to hold just those values I desire and then use a new instance of that class when retrieving the data.
C#:
public class ProjectSummary {
    public int ProjectId { get; set; }
    public string Name { get; set; }
    public List<TaskSummary> Tasks { get; set; }
}
VB:
Public Class ProjectSummary
    Private _projectId As Integer
    Private _name As String
    Private _tasks As List(Of TaskSummary)

    Public Property ProjectId() As Integer
        Get
            Return _projectId
        End Get
        Set(ByVal value As Integer)
            _projectId = value
        End Set
    End Property

    Public Property Name() As String
        Get
            Return _name
        End Get
        Set(ByVal value As String)
            _name = value
        End Set
    End Property

    Public Property Tasks() As List(Of TaskSummary)
        Get
            Return _tasks
        End Get
        Set(ByVal value As List(Of TaskSummary))
            _tasks = value
        End Set
    End Property
End Class
Listing 4: A ProjectSummary class is created to hold a subset of the field values from the table.
C#:
var ProjectList = from P in DataContext.Projects
                  orderby P.Name
                  select new ProjectSummary {
                      ProjectId = P.ProjectId,
                      Name = P.Name
                  };
VB:
Dim ProjectList = From P In DataContext.Projects _
                  Order By P.Name _
                  Select New ProjectSummary With { _
                      .ProjectId = P.ProjectId, _
                      .Name = P.Name _
                  }
Listing 5: The LINQ query that uses our new ProjectSummary class to hold the fields we want.
When we create a new class to hold just the fields we want, it is fairly easy to adjust our LINQ syntax to get the data into a new instance of our class. In our example, Listing 4 declares a new ProjectSummary class with just the ProjectId and Name properties, and Listing 5 shows the query to fill a new instance of ProjectSummary with the appropriate data.
Change Tracking Challenges with LINQ to SQL
Change tracking with LINQ to SQL can be quite challenging in certain cases; especially, if you are just starting out with the new technology. Let's take a look at such a case with the sample time tracking application.
Figure 4: The "Projects" window loads the projects from the database and binds them to the list box.
The "Projects" window retrieves a summary list of the current projects from the database via the generated data context. The fetched project list is bound directly to the list box seen in Figure 4. When the user presses the "Edit" button, the entire Project is retrieved and passed to the "Edit Project" window.
Figure 5: The "Edit Project" window uses an entity object instance passed to it to bind directly to the controls, with the exception of Tasks.
The "Edit Project" window receives an instance of a Project entity object and binds all its properties to the controls on the form, except for the Tasks child list. If we bind the Project.Tasks list to the list box, we will have no way to undo changes made to that list if the user chooses to "Cancel" the dialog box.
Despite the fact that the data context can track object changes in order to determine what has been modified, there is no public method we can call to "undo changes" made thus far. There are a few solutions to this drawback.
First, you can instruct the ORM to inherit the entity classes from a base class of your choice. It is within your power to create a base class that supports deep cloning. With that ability, you could clone the original state of the object when any property is changed, and add a method that reverts the object back to its original values. However, implementing deep cloning is a bit intensive, and would certainly cost some performance; the cost compounds with each level of child-object depth. Of course, you could implement a maximum depth, and allow it to be specified on a case-by-case basis. Good luck with this one!
Alternatively, you can take the approach of flagging objects for modification and/or deletion. Our example application is most concerned with undoing deleted tasks. We could initialize a separate list of the project's tasks. That way, when the user presses the "Delete" button for a task, the Task object is "marked" for deletion and removed from our separate list (which is bound to the list box). This will leave the object in the original Project.Tasks list, but remove it from the list used to display the tasks in the list box. If the user clicks the "OK" button of the "Edit Project" dialog, we can delete those tasks that have been marked for deletion before submitting changes to the project.
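A sketch of that flagging approach might look like the following (the window, control, and field names here are assumptions for illustration, not part of the generated code):

```csharp
public partial class EditProjectWindow {
    // Separate, bindable copy of the project's tasks, plus a "marked" list.
    private List<Task> displayTasks;
    private readonly List<Task> tasksMarkedForDeletion = new List<Task>();

    private void LoadProject(Project project) {
        // Bind a copy, not Project.Tasks itself, so "Cancel" costs us nothing.
        displayTasks = project.Tasks.ToList();
        taskListBox.DataSource = displayTasks;
    }

    private void DeleteTaskButton_Click(object sender, EventArgs e) {
        Task selected = (Task)taskListBox.SelectedItem;
        tasksMarkedForDeletion.Add(selected);    // flag it...
        displayTasks.Remove(selected);           // ...and hide it from the UI
    }

    private void OkButton_Click(object sender, EventArgs e) {
        // Only now do the flagged deletions touch the data context.
        foreach (Task task in tasksMarkedForDeletion)
            dataContext.Tasks.DeleteOnSubmit(task);
        dataContext.SubmitChanges();
    }
}
```

If the user cancels instead, the marked list is simply discarded and the original Project.Tasks collection is untouched.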
Disconnected Challenges with LINQ to SQL
Another issue some developers encounter involves disconnected scenarios. Suppose you retrieve an entity or a collection of entities via a LINQ to SQL data context in a separate physical layer. In a case like that, the original data context used to retrieve the data will cease to exist by the time the data travels to the user interface layer and back again for updates. Fortunately, there is some support for this very common architecture.
If you prefer to use the generated entity classes that the ORM built, you can pass instances of those between your layers and get results. Beware of the limitations, though. You can re-attach the objects to a new data context for updates. However, the Attach method of a context's Table instance does not support a deep object graph. In fact, it only supports itself. That is, when you attach an object, only that object is attached. If your object has child objects related to it via foreign keys, those objects will not be attached. You must attach the child objects individually in order for them to participate in an update.
LINQ to SQL usually uses the original state of an object to maintain optimistic concurrency control during updates. Since object tracking is severed during the disconnected scenario, you will have to provide that original object, or an alternative, when re-attaching the object for updates. The re-attachment must receive the original object in addition to the modified one, or your entity must have a version member. A version member is typically defined as a timestamp field in your database, and you will definitely want to flag the generated property as read only.
These requirements are necessary for LINQ to SQL to determine if the data you wish to update has already been modified by another user. Keep all of this in mind if you wish to pass the generated entity objects between your layers as opposed to messages.
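For example, re-attaching a modified Project and its Tasks to a fresh data context might be sketched like this (the TimeAppDataContext name is an assumption, and this assumes the entities carry a timestamp version member so the original objects are not required):

```csharp
// A sketch of an update method in a disconnected data layer.
public void UpdateProject(Project modifiedProject) {
    using (var context = new TimeAppDataContext()) {
        // Attach is shallow: only the project itself is attached...
        context.Projects.Attach(modifiedProject, true /* as modified */);

        // ...so each child Task must be attached individually
        // for it to participate in the update.
        foreach (Task task in modifiedProject.Tasks)
            context.Tasks.Attach(task, true);

        context.SubmitChanges();
    }
}
```

Without a version member, you would instead use the Attach overload that takes both the modified and the original instance, so LINQ to SQL can perform its optimistic concurrency check.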
Pre-fetching Challenges with LINQ to SQL
LINQ to SQL has many great performance tweaks in place. A lot of thought went into the way data is retrieved. If you take a moment to examine the SQL that is generated, either in debug mode or via the data context's Log property, you will see evidence of the care put into the statements.
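For instance, you can route every generated statement to the console (or any TextWriter) with a single assignment; the TimeAppDataContext name below is an assumption based on this sample:

```csharp
using (var context = new TimeAppDataContext()) {
    // Echo each generated SQL statement as it is executed.
    context.Log = Console.Out;

    // Any query run now will print its SQL before returning results.
    var activeProjects = context.Projects
                                .Where(p => p.IsActive)
                                .ToList();
}
```

This is often the quickest way to sanity-check what a LINQ query actually sends to the database.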
Another performance feature is deferred loading, or "lazy loading". Deferred loading prevents certain data from being retrieved before it is actually needed. There is an option available per property on your entities for "delay loading" the property value. However, all child objects default to this behavior. In our example application, the Project entity has a property for the Tasks defined under it. When a project is retrieved via LINQ to SQL, only the data from the corresponding table is retrieved. If, however, I access the Tasks property, another query is issued to the database to retrieve those tasks. This is by design, to prevent unnecessary load on the database for "just in case" scenarios. However, there are times when we know we want the child objects to be loaded.
The time tracking application displays a menu on its notification icon (system tray icon) containing all the projects in the database. Each project menu item also has a list of child menu items for each associated task. Knowing that we need to build these menus ahead of time affords us the vision to pre-fetch all the necessary tasks at the same time we get the projects. If we can instruct LINQ to do all of this in one query, we save ourselves multiple round trips (one query instead of one for the projects plus one more per project for its tasks).
We will use a feature of LINQ to SQL to achieve this "eager loading". The data context has a property named LoadOptions that accepts an instance of DataLoadOptions. This is how we specify that we want to go ahead and load one or more related child entities in one query.
C#:
DataLoadOptions Options = new DataLoadOptions();
Options.LoadWith<Project>(P => P.Tasks);
DataContext.LoadOptions = Options;

var ProjectList = from P in DataContext.Projects
                  orderby P.Name
                  select new ProjectSummary {
                      ProjectId = P.ProjectId,
                      Name = P.Name,
                      Tasks = (
                          from T in P.Tasks
                          orderby T.Name
                          select new TaskSummary {
                              TaskId = T.TaskId,
                              Name = T.Name
                          }
                      ).ToList()
                  };
VB:
Dim Options As New DataLoadOptions()
Options.LoadWith(Of Project)(Function(P) P.Tasks)
DataContext.LoadOptions = Options

Dim ProjectList = From P In DataContext.Projects _
                  Order By P.Name _
                  Select New ProjectSummary With { _
                      .ProjectId = P.ProjectId, _
                      .Name = P.Name, _
                      .Tasks = ( _
                          From T In P.Tasks _
                          Order By T.Name _
                          Select New TaskSummary With { _
                              .TaskId = T.TaskId, _
                              .Name = T.Name _
                          } _
                      ).ToList() _
                  }
Listing 6: Deferred loading can be turned off for select child entities via the LoadOptions property of the data context. In this case, we choose to pre-fetch the child Tasks of each Project retrieved in the query.
This is a truly great feature! However, when we examine the generated SQL after making this setting, there is a potential performance problem. Notice in Figure 6 how the SQL that LINQ generates uses a join to get the data from both the Project and Task tables.
Figure 6: Inspecting the generated SQL via the SQL Server Query Visualizer, we can see the statement uses a left outer join to retrieve both Projects and Tasks at once.
My initial response to seeing the above generated SQL was a bit of surprise, due to the shape of the data retrieved: each field selected from the Project table will be duplicated for each Task row retrieved.
Figure 7: When multiple Tasks exist for a Project, all the fields selected from the Project are duplicated for each Task retrieved.
I expected to see two queries: one to retrieve the necessary projects and another to retrieve the matching tasks for those projects (via a correlated subquery, if a "where" clause existed for the project query). I expected the results of the two queries would then be used to map tasks to projects when building the object graph.
Not being one to just accept such results, I e-mailed Scott Guthrie, a general manager at Microsoft who has written several blog posts about LINQ. I asked him why this behavior is found in LINQ for pre-fetching related entities. Scott deferred the question to Dinesh Kulkarni, a program manager on the LINQ project who works specifically on LINQ to SQL. His response summarized the many options his team considered with regards to performance and eager loading, or pre-fetching. The team apparently ran several tests on different sets of data of varying sizes. Some solutions worked well on some data sets and not on others. After much thought, they opted for the current solution, even though he acknowledged the team was not entirely happy with its performance in every scenario.
Dinesh also referenced a detailed blog post by the senior performance architect, Rico Mariani. I highly recommend reading the blog post when you have some free time. The post does touch on some of these very issues of performance.
After reading Dinesh's reply and the blog post, I came to the conclusion that there is no "silver bullet". I can absolutely see times where multiple queries would not yield good performance due to correlated subqueries. Imagine a parent query that has some complicated filters in the "where" clause. Each pre-fetched child query would need a subquery that correlates it with the parent query using those same complicated filters. That kind of behavior would compound the complexity greatly in those situations.
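To picture that trade-off, here is a rough sketch of the two-query alternative for a filtered parent query (hypothetical SQL for this sample's schema, not what LINQ to SQL actually emits; the EstimatedDuration filter is just an example):

```sql
-- Query 1: the filtered parent rows.
select ProjectId, [Name]
from Project
where IsActive = 1 and EstimatedDuration > 40

-- Query 2: the matching children, forced to repeat the parent filter
-- via a correlated subquery.
select T.TaskId, T.ProjectId, T.[Name]
from Task T
where T.ProjectId in (
    select P.ProjectId
    from Project P
    where P.IsActive = 1 and P.EstimatedDuration > 40
)
```

With complicated parent filters, every child query pays that filtering cost again, which is exactly the situation where the single-join approach can win.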
Other Points of Interest
It is great that you can respond to certain events inside your entities. While exploring some of these options, I experimented with the idea of setting some default values for a new object when setting one of its properties to a related entity instance. This worked surprisingly well.
C#:
public partial class Task {
    partial void OnCreated() {
        this.PropertyChanged +=
            new System.ComponentModel.PropertyChangedEventHandler(Task_PropertyChanged);
    }

    void Task_PropertyChanged(object sender,
                              System.ComponentModel.PropertyChangedEventArgs e) {
        if (e.PropertyName == "Project") {
            if (Project != null && TaskId <= 0) {
                EstimationComparisonType = Project.EstimationComparisonType;
                DetectAway = Project.DetectAway;
                IsBillable = Project.IsBillable;
            }
        }
    }
}
VB:
Partial Public Class Task
    Private Sub OnCreated()
        AddHandler Me.PropertyChanged, AddressOf Task_PropertyChanged
    End Sub

    Private Sub Task_PropertyChanged(ByVal sender As Object, _
            ByVal e As System.ComponentModel.PropertyChangedEventArgs)
        If e.PropertyName = "Project" Then
            If Not Project Is Nothing AndAlso TaskId <= 0 Then
                EstimationComparisonType = Project.EstimationComparisonType
                DetectAway = Project.DetectAway
                IsBillable = Project.IsBillable
            End If
        End If
    End Sub
End Class
Listing 7: Setting default properties of the Task entity instance when it is assigned a parent Project value.
The code in Listing 7 will be executed when a new instance of a Task object is assigned an existing Project value via its related property. In such a case, the Task will have some of its properties set to defaults from the parent Project object.
One other thing to look out for when working with LINQ to SQL is a very obscure error that I received a couple of times. This error is more prone to occur when you edit the schema of your database without updating the DBML file. I received a ChangeConflictException with the message, "Row not found or changed", several times while running the sample application. It turned out to be something very simple: I had changed the sample schema to allow nulls for a couple of fields and forgot to update the DBML file for my LINQ entities. All I had to do to fix it was indicate that nulls were allowed for the corresponding entity properties. In the case of my custom enumeration property, I also had to use the System.Nullable generic type when specifying the enumeration type. I expected the ORM, or the DBML designer, to imply the Nullable type when I set the property's Nullable flag to "True". However, it did not, so I had to set the property's Type as indicated earlier: "System.Nullable<EstimationComparisonType>".
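In DBML terms, the fix amounts to something like the following on the affected column (a sketch only; the attribute order and DbType depend on your schema, and note that the generic angle brackets must be XML-escaped inside the attribute):

```xml
<Column Name="EstimationComparisonType"
        Type="System.Nullable&lt;EstimationComparisonType&gt;"
        DbType="TinyInt" CanBeNull="true" />
```

Once both CanBeNull and the Nullable property type agree with the database schema, the "Row not found or changed" conflict goes away.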
Summing Up
In a field filled with acronyms, one of the newest additions to the .NET Framework is extremely powerful. LINQ offers us a simple interface to our data. Despite the challenges it creates, it can be a very valuable tool in a developer's "kit". Using LINQ to SQL with SQL Server Compact Edition should certainly speed up the development process for your next application requiring embedded data. All you need is a little knowledge of how to make it work best, and some free time to give it a shot.
Good luck and have fun!
References and Resources