NOTE: This article was originally part of the VistaDB.Net blog and has been moved here.
ADO.NET is a set of libraries included in the .NET Framework to facilitate the communication of applications with various data storage mechanisms. These libraries form the basis for all third parties to provide data access services to users of .NET applications.
Visual Studio 2005 and 2008 did not change the data access model. In fact, ADO.NET 2 is the longest running Microsoft data access technology without a major revision. I don’t know if this is going to change in .NET 4, but the stability of ADO.NET is a major reason for its adoption in VistaDB.
In this article, I going to give a high level overview of the ADO.NET object model and how VistaDB supports that model. I will then follow up with more articles discussing specifics for this model.
The ADO.NET Object Model
ADO.NET at its most basic level was designed to help developers work efficiently with multi tier databases, across intranet or Internet scenarios. This is a tall order for a single framework, but I think they managed to pull it off quite well.
I often view the object model as being split into two parts – a connected and a disconnected side of the model. Even though there is no distinction between the two “sides”, it helps me to visualize how the data object was intended to be used.
Disconnected ADO.NET Classes
The objects located in the disconnected half of the model do not communicate directly with the connected objects. This is a big deal because it means each side is isolated through interfaces to communicate. I have found this is often the most confusing part of the ADO.NET spec because users think each side should be able to directly communicate.
The ADO.NET DataSet
for example does not ever communicate directly with the database. To fetch data from the database into a DataSet
, you must pass the DataSet
into the Fill
method of a connected object (The DataAdapter
).
DataSet Class
DataSet
s are a container class for a number of DataTable
objects (stored in the Tables
collection). You can think of this as an in memory representation of multiple tables, their constraints, relationships, and the actual rows.
The data stored in the DataSet
is a disconnected copy of the actual database. Any changes you make to a DataSet
only exist in RAM. You can quickly get a list of just the changes by calling the GetChanges
method to return only modified DataRows
. This allows you to submit smaller change sets back to the database.
The Merge
method also allows to combine multiple change sets into a single DataSet
. This is very useful in multi tier scenarios where you need to receive partial updates from lots of different systems and merge them into a single database change.
The DataSet
is also unique in that you can read and write this disconnected data cache to disk. A common use is to put the DataSet
into an XML document for transmission to multiple systems to merge into their local database.
DataTable Class
The DataTable
allows you to examine the actual rows of a DataSet
through rows and columns collections. You can store the results of a query in a DataTable
by calling the Adapters Fill
method. Once the DataTable
is filled, the database connection is released and operates disconnected only. You can then continue to examine the data without any further communication between the DataTable
and the database.
This is important for scenarios where you want to work across a network share. Caching the data on the local machine is vital to performance of the application. Whenever possible, you should not be using connected objects against a network shared database, the traffic is very expensive.
DataColumn Class
Each DataTable
has a collection of DataColumn
s associated with it. This represents a single Column
in the database. But the object doesn’t actually contain any of the data stored in the DataTable
. This is just a metadata storage class about the column, its constraints, types, etc. It is quite useful for finding when a single column allows null
s, has a unique constraint, etc. The autoincrement property is also implemented at the DataColumn
level since you want to ensure each row has a unique value.
One interesting usage for this class is the Expression
property. This allows you to define how the data for this column is calculated. This would enable you to compute the value of an items total price based upon units sold and price per unit at the DataColumn
level rather than through a SQL command. The added benefit to using this scenario is that updates to the Price
and Quantity
columns will recalculate the total price column in RAM without having to return to the database to rerun the SQL.
DataRow Class
Each row in a DataTable
is represented in the Rows
collection as a DataRow
. To examine the data in a specific column, use the Item
property of the appropriate DataRow
object. You can lookup the columns in a row through an integer index, or by the name of the Column
.
The DataRow
class is also where you perform updates to a DataSet
. When you are preparing to edit a row, the BeginEdit
method should be called on the DataRow
object. The EndEdit
method then can be used to save the changes back to the DataSet
, CancelEdit
provides an undo facility. This is key to how the DataGrid
and other grid controls allow users to edit, commit, and cancel their changes in memory.
A key concept often missed by developers is that changes to the DataRow
even when they are “AcceptChanges
” calls are only committing them to the RAM copy of the database (the DataSet
). You must still commit the changes back to the database in order to store them permanently.
Constraints Class
Each DataColumn
may have multiple Constraints. Conditions such as unique are applied through this class. Constraint objects are maintained through the DataTables
Constraints collection.
DataRelation Class
Most tables in a single DataSet
will be related in some way. The DataSet
can handle mapping things like Orders
to OrderDetails
with a little help from the DataRelation
class. This is a way to enforce referential integrity from the database, without having to make a complete round trip to the database.
DataView Class
Although excluded in the diagram above, this is an important class. You can use multiple DataView
objects to example the same DataTable
in a DataSet
. This can result in massive memory savings since only one copy of the data needs to be maintained in the DataSet
. You can do a lot of view type operations with the in memory DataView
object.
Connected ADO.NET Classes
Connected classes in ADO.NET are designed to communicate directly with the data source. Most of these classes map closely to basic data access concepts such as the Connection to the database, a Query
, and QueryResults
.
ProviderFactory Class
New in ADO.NET 2, this is an object factory that allows all .NET applications to generically load a provider without knowing much about it. Each Provider Factory includes a way to create Connections
, ConnectionStringBuilders
, etc.
It is this ProviderFactory
object that allows VistaDB to be swapped with SQL Server at runtime through your app.config or web.config files. It is not a perfect match for communicating directly against the strongly typed classes, but it is very close.
Connection Class
A Connection
is a representation of the actual physical connection to the database. This may be through a server, or locally. You use this class to connect and disconnect from the actual database.
A connection also acts as the handle to the database for other objects like the DataAdapter
and Command
objects. They do not communicate directly to the database, they must go through a Connection
.
ConnectionStringBuilder Class
This is another new class in ADO.NET 2 that simplified the process of building connection strings and remembering the options for a specific provider, it is a great utility class to save you time. Once you have built up this object, you can assign the Connection
class ConnectionString
property to this object.
DataAdapter
The DataAdapter
is a new concept, there are no matching concepts in ADO or DAO. The DataAdapter
is the bridge between your database and the disconnected ADO.NET objects. The Fill
method provides an efficient mechanism to fetch the results of a query into DataSet
or DataTable
so you can work with the data offline (disconnected). You also use the DataAdapter
to submit pending changes from the disconnected objects back to the database.
This is a complex class with lots of internal jobs to track. We will cover it in more detail in another blog post.
Command Class
This class represents a question or query against a database, a call to a stored procedure, or a direct request to return the contents of a specific table. This is probably the hardest class to port your usage between database vendors as most of the calls at this level are vendor specific syntax.
Using a Command
object with a database is pretty easy. You set the Connection
property to an existing opened Connection
object, and then specify the command you want to execute in the CommandText
. You can supply SQL commands, or just the name of a stored procedure or table (although you have to then also change the CommandType
property to what you wish to accomplish).
Commands have many ways to execute them, but all of them basically call ExecuteQuery
under the hood. If you don’t care about the results, or only want the first row and column, there are short cuts you can take (ExecuteNonQuery
and ExecuteScalar
).
Note that VistaDB does not support the ExecuteXML
syntax at this time.
Parameter Class
To simplify the process of putting parameters into a Command
object, the Parameter
class allows you to quickly put parameters into a query without string
concatenation or worry about SQL Injection attacks. Most developers rely heavily on parameterized queries because of the added benefit of SQL Injection protection; string
s are automatically quoted to the specs of the underlying database.
There are many ways to create Parameters for Command
objects, but they are mostly just different ways of doing the same thing. Some people prefer one syntax over another, but they all end up as Parameter
objects on a Command
object at execution in the database.
DataReader Class
The DataReader
is built as a way to retrieve and examine the rows returned in response to your query as quickly as possible. Not all of the rows have to be ready before you can start working with the data. Only a single row is exposed at a time, so the database engine can get you the first record quickly and then continue to work in the background to find the rest of the answers over time.
The data returned by a DataReader
is always read only. This class was built to be a lightweight forward only, read only, way to run through data quickly (this was called a firehose cursor in ADO).
Transaction Class
Transactions are used to ensure that multiple changes to database rows occur as a single unit of work. The Connection
class has a BeginTransaction
method that can be used to create a Transaction
.
A definite best practice is to ensure that Transactions are placed in Using
statements for rapid cleanup if they are not committed. Otherwise, the objects (and any internal locks that may be needed) will remain active until the GC gets around to cleaning it up.
What About Cursors?
You will notice that I never mentioned database cursors in the above discussion. That is because they do not exist within the ADO.NET framework. ADO.NET does not support server side cursors by design. The DataSet
and DataTable
classes most closely resemble a cursor, and the DataReader
class is a close match for a client side forward only, read only, cursor, but it is not the same thing.