In the life of every software engineer, there should be at least (I’d love to say “at most”, actually) one project where business logic is implemented on the database side. My current project is one of that kind. We have all the logic implemented as stored procedures, then there’s a middleware where we use LINQ2SQL for database API bindings, and then there are an iPhone application and a website.
Well, it works. But it’s hell.
First of all, T-SQL itself is a very primitive low-level language. There are just no features to make components reusable. So, as long as your procedures are just a couple of select statements, everything’s fine. But the more execution flows you have to implement, the more unreadable your procedure becomes. I do firmly believe that being extremely pedantic can make things better, but it doesn’t solve the problem completely. It’s just fine to make mistakes when you write a program. In most cases you have an easy way to cover your code with automated tests; at least it’s pretty straightforward in high-level languages. With T-SQL, you just can’t.
In our project, I had to take testing tasks as one of my concerns. I finally came up with the idea that the technically easiest approach is just to test the middleware and the database together. The test suite is just another client, like the iPhone application or the website. This approach proved adequate, but the problem is that when TeamCity says things like Test "CanTransferLotsOfMoneys" failed, the database guys have no idea what it means.
So, for a long time I’ve been trying to find a way to write tests for stored procedures in pure T-SQL. That way, I would no longer be the only person able to interpret a test failure. The major obstacle was that in T-SQL it’s just impossible to access multiple result sets if a stored procedure returns them. In our project, there are 2 or maybe 3 procedures that return only 1 result set; all the others return several. That’s the issue. These are the workarounds I considered:
Always return only one result set
Sure, this will never work. A simple example: I need to get the data to render a blog post page. I’ll need:
- Current user’s details
- Post details
- Comments and their details
There are at least 3 result sets.
Use temporary tables
This will probably work, but what do you do if some procedures call themselves recursively?
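For reference, the temporary-table variant could look like the sketch below. FillUsers and the sample row are hypothetical names made up for illustration; the idea is that the caller creates #Users and the procedure writes into it instead of returning a result set.

```sql
-- Hypothetical procedure: writes its "result set" into a temp table
-- that the caller is expected to have created beforehand.
create procedure FillUsers as
begin
    insert into #Users(UserId, UserName)
    values (1, N'alice')
end
go

-- Caller (for example, a test script): create the table, run, inspect.
create table #Users(UserId int, UserName nvarchar(256))
exec FillUsers
select UserId, UserName from #Users
drop table #Users
```

The procedure and its caller are coupled through the table name. In this sketch, every level of a recursive call would write into the same #Users rows unless extra bookkeeping is added.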
Table variables
Table variables are a great idea:
declare @Users table(
    UserId int,
    UserName nvarchar(256)
)
exec DoSomething @Users
But unfortunately stored procedures can’t return them.
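The closest thing T-SQL offers is a table-valued parameter (SQL Server 2008 and later). A minimal sketch, using the hypothetical names UserTable and CountUsers, shows that data can flow into a procedure this way but not back out:

```sql
-- Hypothetical table type; table-valued parameters require one.
create type UserTable as table(
    UserId int,
    UserName nvarchar(256)
)
go

-- The parameter must be declared readonly, so the procedure can read
-- @Users but cannot insert into it on behalf of the caller.
create procedure CountUsers(@Users UserTable readonly) as
begin
    select count(*) as UserCount from @Users
end
go

declare @Users UserTable
insert into @Users(UserId, UserName) values (1, N'alice')
exec CountUsers @Users
```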
I finally discovered that T-SQL has features to work with XML. The solution consists of a few simple ideas.
- There’s a type xml in T-SQL. This type allows storing XML documents and accessing their nodes.
- Objects of this type can be either converted to nvarchar() or constructed from it.
- Objects of this type can be passed to stored procedures as output parameters.
- You can easily generate XML from select queries.
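Each of these ideas fits in a line or two. The following is a minimal sketch with made-up sample data (the User element and its attributes are assumptions for illustration only); passing xml as an output parameter is covered by the procedure further down.

```sql
-- 1. The xml type stores a document and lets you query its nodes.
declare @doc xml = N'<User UserId="1" UserName="alice" />'
select @doc.value('(/User/@UserName)[1]', 'nvarchar(256)') as UserName

-- 2. Convert to nvarchar and back.
declare @text nvarchar(max) = cast(@doc as nvarchar(max))
declare @roundTrip xml = cast(@text as xml)

-- 3. Generate XML from a select query.
declare @generated xml = (
    select 1 as UserId, N'alice' as UserName
    for xml raw('User'), type)

select @roundTrip as RoundTrip, @generated as Generated
```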
Let’s say we have 2 tables:
create table Blog(
    BlogId int identity(1,1) not null,
    BlogName nvarchar(256)
)

create table Post(
    PostId int identity(1,1) not null,
    BlogId int not null, -- yes, this should be FK
    PostName nvarchar(256)
)
The task is:
- Create a testable stored procedure GetBlogWithDetails @blogId that will return BlogId, BlogName and all its posts.
- Definition of “testable” is: I can access all the data returned and, based on this data, make some conclusions.
Normally, this procedure would look like this: (no error handling - sorry)
create procedure GetBlogWithDetails(@BlogId int) as
begin
    select BlogId, BlogName
    from Blog
    where BlogId = @BlogId

    select PostId, BlogId, PostName
    from Post
    where BlogId = @BlogId
end
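To see the limitation concretely, here is the usual way of capturing a procedure’s output in T-SQL, insert ... exec, applied to GetBlogWithDetails. This is only a sketch; the exact error text depends on the server version, so it is not reproduced here.

```sql
-- Table variable shaped like the first result set only.
declare @blogRows table(BlogId int, BlogName nvarchar(256))

-- insert ... exec feeds every result set of the procedure into the same
-- target table. The second result set (PostId, BlogId, PostName) has a
-- different shape, so this statement is expected to fail at run time.
insert into @blogRows
exec GetBlogWithDetails 1
```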
From T-SQL, you can’t access its second result set. The solution is to render the same data as XML and return this XML as an output parameter.
create procedure GetBlogWithDetailsXml(
    @blogId int,
    @xml xml output) as
begin
    -- first, create 2 table variables for blogs and for posts
    declare @blogRows table(BlogId int, BlogName nvarchar(256))
    declare @postRows table(PostId int, BlogId int, PostName nvarchar(256))

    -- (#1) this is where you implement your useful logic
    insert into @blogRows
    select BlogId, BlogName
    from Blog
    where BlogId = @blogId

    insert into @postRows
    select PostId, BlogId, PostName
    from Post
    where BlogId = @blogId

    -- here you render your 2 tables containing useful data as XML
    declare @blogRowsXml xml = (
        select BlogId, BlogName
        from @blogRows as Blog
        for xml auto)

    declare @postRowsXml xml = (
        select PostId, BlogId, PostName
        from @postRows as Post
        for xml auto)

    -- here you build a single XML with all the data required;
    -- isnull keeps the result non-null when one of the tables is empty
    set @xml =
        isnull(cast(@blogRowsXml as nvarchar(max)), N'') +
        isnull(cast(@postRowsXml as nvarchar(max)), N'')

    -- (#2) here you return the data as "raw" result sets.
    select * from @blogRows
    select * from @postRows
end
As you see, the “useful” code is only at #1 and #2. The rest is a bunch of stuff for XML. You can now play with GetBlogWithDetailsXml like this:
For my test data,
declare @xml xml
exec GetBlogWithDetailsXml 1, @xml output
select @xml
returns XML like this:
<Blog BlogId="1" BlogName="Blog #1" />
<Post PostId="1" BlogId="1" PostName="Post #1 in Blog #1" />
<Post PostId="2" BlogId="1" PostName="Post #2 in Blog #1" />
It’s now pretty easy to extract all the data returned:
select
    Col.value('@BlogId', 'int') as BlogId,
    Col.value('@BlogName', 'nvarchar(256)') as BlogName
from @xml.nodes('/Blog') as Data(Col)

select
    Col.value('@PostId', 'int') as PostId,
    Col.value('@BlogId', 'int') as BlogId,
    Col.value('@PostName', 'nvarchar(256)') as PostName
from @xml.nodes('/Post') as Data(Col)
With my test data, it returns:
BlogId | BlogName |
---|---|
1 | Blog #1 |

PostId | BlogId | PostName |
---|---|---|
1 | 1 | Post #1 in Blog #1 |
2 | 1 | Post #2 in Blog #1 |
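With the data accessible as XML, a test in pure T-SQL becomes possible. The sketch below assumes the test data shown above (one blog named Blog #1 with two posts) and reports failures with raiserror; the messages and the choice of raiserror are illustrative, not taken from the project.

```sql
declare @xml xml
exec GetBlogWithDetailsXml 1, @xml output

-- Pull the values under test out of the XML.
declare @blogName nvarchar(256) =
    @xml.value('(/Blog/@BlogName)[1]', 'nvarchar(256)')
declare @postCount int = (
    select count(*) from @xml.nodes('/Post') as Data(Col))

-- Fail loudly with a message a database developer can read.
if @blogName is null or @blogName <> N'Blog #1'
    raiserror('GetBlogWithDetailsXml: unexpected blog name', 16, 1)

if @postCount <> 2
    raiserror('GetBlogWithDetailsXml: expected 2 posts, got %d', 16, 1, @postCount)
```

A script like this can be run by anyone with a query window, so a failure no longer needs to be interpreted through the middleware test suite.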
I’m not really sure whether returning “raw” result sets even makes sense with this approach. From the viewpoint of LINQ2SQL, yes, without them you won’t be able to use its mapping feature, but deserializing an object from XML is a primitive task, so it shouldn’t be an issue.