So speaking to any smart database people, they will generally tell you that the SELECT * FROM ...
is a bad idea and should not be used. Today (well would have been if I didn't procrastinate) I spent many hours debugging something from the wrong places because I used SELECT *
. Below is what I found out =).
Rennie Moodley helped me understand what was happening after I figured out where the problem was, helped me create the sample code below and then the 2 of us and Christopher Swart poked at and improved the sample and found some more weirdness which you'll also see below.
Sample Setup
For the setup of the sample, you can use any database, the sample code contains everything to create a SampleDb
and then drop it in the end.
USE [master]
GO
CREATE DATABASE [SampleDb]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'SampleDb', FILENAME = N'C:\SQL\Database Engine\Data\SampleDb.mdf' , _
SIZE = 4096KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N'SampleDb_log', FILENAME = N'C:\SQL\Database Engine\Log\SampleDb_log.ldf' , _
SIZE = 1024KB , FILEGROWTH = 10%)
GO
Initial Weirdness
So for the initial bit of weirdness that cost me a couple hours, I basically had a table and 2 views (but the scenario can play out with a table and 1 view as we'll do in this post). It looked something like below.
Table
CREATE TABLE dbo.tb_View_Test
(A INT
,B INT)
GO
INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)
GO
View
CREATE VIEW dbo.vw_Test
AS
SELECT *
FROM dbo.tb_View_Test
GO
With this in place, we are able to do a select
and notice that we do have data.
I then somewhere along the line added another column to the view (in this case table
) like so:
ALTER TABLE dbo.tb_View_Test
ADD C INT NULL
GO
added a reference to this column in my code, got side tracked and then after lunch and a meeting, came back to the code where I developed in some other areas of the app for something that I remembered about while away from my PC and then eventually came back to this area of code which by now I lost complete context of what I was doing. I ran the code and received a weird error saying that the column didn't exist. I double checked the view with sp_helptext
to make sure it was in the database (only part that doesn't work with this sample because sp_helptext
doesn't support tables). I thought at this stage maybe the view had the column specified (which was not like me because I'm a lazy developer), so I did a sp_helptext
on the view.
sys.sp_helptext @objname = N'vw_Test',
@columnname = NULL
GO
which showed me that the view still had a star:
but why then did the results show otherwise?
After as mentioned many hours of looking at this from checking that the app was still configured to hit this database and not a random other environment to restarting my PC, I reinstalled all the objects and by doing that lost all my sample data. Which we would simulate by dropping the table and view and then recreating the table with the extra column and then creating the view as we previously did.
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
DROP VIEW dbo.vw_Test
END
GO
IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
DROP TABLE dbo.tb_View_Test
END
GO
CREATE TABLE dbo.tb_View_Test
(A INT
,B INT)
GO
INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)
GO
CREATE VIEW dbo.vw_Test
AS
SELECT *
FROM dbo.tb_View_Test
GO
ALTER TABLE dbo.tb_View_Test
ADD C INT NULL
GO
This now returned the data I was expecting.
So what was different now? The code looked the same so what could it be. At the point, the non lazy developer on a lot of energy drinks kicked in and I started digging which made me revert the table changes and reinstall everything again and then made the single column add as above which I confirmed made the weirdness return which for completeness looks like this:
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
DROP VIEW dbo.vw_Test
END
GO
IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
DROP TABLE dbo.tb_View_Test
END
GO
CREATE TABLE dbo.tb_View_Test
(A INT
,B INT)
GO
INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)
GO
CREATE VIEW dbo.vw_Test
AS
SELECT *
FROM dbo.tb_View_Test
GO
and then running that select
:
SELECT *
FROM dbo.vw_Test
GO
It was at this point that I realized that reinstalling the view fixed my problem probably. To try this, you can either drop and create the view or simple ALTER
it which is the most likely way you would do this in production for most people.
ALTER VIEW dbo.vw_Test
AS
SELECT *
FROM dbo.tb_View_Test
GO
and then once again we run the select
:
SELECT *
FROM dbo.vw_Test
GO
and we now have that extra column.
Enter Rennie
At this point, I called Rennie over and showed him the weirdness. After going for some coffee I found in my inbox a mail containing a sample script giving me some insight. The key pieces to note was that querying the information schema before running the ALTER
.
SELECT *
FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN INFORMATION_SCHEMA.COLUMNS AS c
ON c.TABLE_SCHEMA = cu.TABLE_SCHEMA
AND c.TABLE_CATALOG = cu.TABLE_CATALOG
AND c.TABLE_NAME = cu.TABLE_NAME
AND c.COLUMN_NAME = cu.COLUMN_NAME
WHERE cu.VIEW_NAME = 'vw_Test'
AND cu.VIEW_SCHEMA = 'dbo'
GO
revealed that the column wasn't there:
whereas running that ALTER
statement again:
So this again illustrated that what the ALTER
achieved, he also mentioned that we can use sp_refreshview
to refresh the view:
sp_refreshview 'dbo.vw_Test'
GO
Enter Chris
At this point, we did what is the best feeling in finding a bug like this in our code, we poked to see what else we (I) could learn from this which also revealed some almost more bazar then the views caching the columns it returned.
Column Rename
What happens if you rename one of the columns:
sp_rename 'dbo.tb_View_Test.A', 'D', 'COLUMN';
GO
we are given a warning saying that we could be breaking stuff:
and so we ran the select
and I was thinking that it would throw an error:
and we noticed that the sp_rename
didn't seem to work or at least that's what I would have thought before our earlier discovery. So just to double check, we looked at the information schema again:
SELECT *
FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN INFORMATION_SCHEMA.COLUMNS AS c
ON c.TABLE_SCHEMA = cu.TABLE_SCHEMA
AND c.TABLE_CATALOG = cu.TABLE_CATALOG
AND c.TABLE_NAME = cu.TABLE_NAME
AND c.COLUMN_NAME = cu.COLUMN_NAME
WHERE cu.VIEW_NAME = 'vw_Test'
AND cu.VIEW_SCHEMA = 'dbo'
GO
and thought it would show us the same result as earlier with a row for A, B and C but instead noted that D was showing here.
Now I was thinking where could it be keeping the information for the views columns then and more importantly how do we get a A column back from the result when it doesn't exist. We looked next turned to sys.views
and sys.columns
.
SELECT c.*
FROM sys.views AS v
INNER JOIN sys.columns AS c ON v.object_id = c.object_id
WHERE v.name = 'vw_Test'
GO
This showed us the A column still being there as part of what SQL knew to be in the view.
So how does the view get the data which was in column A and is now in column D? Well, for this, we turn back to sp_refreshview
, information schema, sys.views
and sys.columns
.
sp_refreshview 'dbo.vw_Test'
GO
SELECT ORDINAL_POSITION, *
FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN INFORMATION_SCHEMA.COLUMNS AS c
ON c.TABLE_SCHEMA = cu.TABLE_SCHEMA
AND c.TABLE_CATALOG = cu.TABLE_CATALOG
AND c.TABLE_NAME = cu.TABLE_NAME
AND c.COLUMN_NAME = cu.COLUMN_NAME
WHERE cu.VIEW_NAME = 'vw_Test'
AND cu.VIEW_SCHEMA = 'dbo'
GO
SELECT column_id, c.*
FROM sys.views AS v
INNER JOIN sys.columns AS c ON v.object_id = c.object_id
WHERE v.name = 'vw_Test'
GO
which shows us:
Now this is the part I looked at by myself so I can only assume that SQL is smart here and uses the column id/pos when it queries and labels the result as what it currently believes to be the column name.
Dropping and Adding Column
So we knew how the view would handle column renames but what about dropping and creating columns. For this, we dropped the column B
and added the column E
.
ALTER TABLE dbo.tb_View_Test
DROP COLUMN B
GO
ALTER TABLE dbo.tb_View_Test
Add E INT NULL
GO
Run the view select
again:
SELECT *
FROM dbo.vw_Test
GO
And as we could almost expect from this post column E
didn't exist but what about column B
? It should exist right because we haven't refreshed the view cache but how could it because it was now dropped from the database. This most weirdly for me returns the column (which I guess could be expected) but now we have no data for column B
.
We run the magic again to see what SQL thinks is going on:
SELECT ORDINAL_POSITION, *
FROM INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN INFORMATION_SCHEMA.COLUMNS AS c
ON c.TABLE_SCHEMA = cu.TABLE_SCHEMA
AND c.TABLE_CATALOG = cu.TABLE_CATALOG
AND c.TABLE_NAME = cu.TABLE_NAME
AND c.COLUMN_NAME = cu.COLUMN_NAME
WHERE cu.VIEW_NAME = 'vw_Test'
AND cu.VIEW_SCHEMA = 'dbo'
GO
SELECT column_id, c.*
FROM sys.views AS v
INNER JOIN sys.columns AS c ON v.object_id = c.object_id
WHERE v.name = 'vw_Test'
GO
and notice now that information schema says we have no column B
and also no column E
and the sys.views
and sys.columns
query shows D, B and C rows.
a sp_refreshview
and select
:
sp_refreshview 'dbo.vw_Test'
GO
SELECT *
FROM dbo.vw_Test
GO
shows the data as we would now expect it:
Now again, here I am assuming what SQL does and here I think it must obviously be using sys.columns
for what columns to show and then going to information schema for where to find the data, naturally not find a matching column would return null
data. However, looking at the result above showing the queries off information schema, sys.views
and sys.columns
disproves my previous idea about how it matches the columns. Maybe it checks if the same amount of columns exist, then use order matching otherwise the name matching or something silly like that but then that wouldn't work for a rename and drop and create without refreshing the view.
Column Drop, Add and Rename
Before we see if anything weird happens here will quickly top up our table with data to see for NULL
's, for that we'll just TRUNCATE
and then INSERT
rows as before.
TRUNCATE TABLE dbo.tb_View_Test
GO
INSERT INTO dbo.tb_View_Test VALUES (1,2,9)
INSERT INTO dbo.tb_View_Test VALUES (3,4,10)
INSERT INTO dbo.tb_View_Test VALUES (5,6,11)
INSERT INTO dbo.tb_View_Test VALUES (7,8,12)
GO
SELECT *
FROM dbo.vw_Test
GO
and we now have the data:
and this is where things get very weird for me (could also be because it's late and my brain is tired) but if we drop, add and rename a column:
ALTER TABLE dbo.tb_View_Test
DROP COLUMN C
GO
ALTER TABLE dbo.tb_View_Test
Add F INT NULL
GO
sp_rename 'dbo.tb_View_Test.D', 'G', 'COLUMN';
GO
and run our select
:
we expect that column C
would contain NULL
data as before but instead column E
has the NULL
data. Looking back to our new favorite place:
we notice that column C
has been removed as expected. If we run our refresh and select again:
We see that now column E
has data and the old column D
has been renamed to G
and has data. From this, my new assumption is like my old one and that is that SQL gets the column pos it needs data for from information schema and then gets the label for those columns from that same position in sys.columns
.
Dropping Columns Without Adding Any
So what happens when we drop a column but don't add another? Let's test that, we are going to drop the column E
because it feels like it won't make any difference:
ALTER TABLE dbo.tb_View_Test
DROP COLUMN E
GO
SELECT *
FROM dbo.vw_Test
GO
we run the magic:
Now based on our last theory, we were saying that it would get columns to get data from Information Schema and labels from sys.columns
so you would almost expect a similar result as Information Schema has 2 results and sys.columns
has 3 to before with the last column being NULL
. Let's run that select
.
And instead of getting a NULL
value at the end, we get an error:
Msg 4502, Level 16, State 1, Line 396
View or function 'dbo.vw_Test' has more column names specified than columns defined.
Now I haven't figured how exactly why in the previous example SQL just puts a column with NULL
data and in this one, it throws an error. I know it's obvious that it's because there isn't a third column but it would be nice to find the place SQL looks to know this which I will continue looking for =) and of course if we run the select
now.
We have a working view again.
If I Were Not To Use a SELECT *
A little clean up is needed here again. Let's drop the table and view and create them as they existed at the start of this post.
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
DROP VIEW dbo.vw_Test
END
GO
IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
DROP TABLE dbo.tb_View_Test
END
GO
CREATE TABLE dbo.tb_View_Test
(A INT
,B INT)
GO
INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)
GO
CREATE VIEW dbo.vw_Test
AS
SELECT *
FROM dbo.tb_View_Test
GO
Now alter
the view to use column names instead of *
.
ALTER VIEW dbo.vw_Test
AS
SELECT A, B
FROM dbo.tb_View_Test
GO
And the select
just to verify data:
Looks good.
Add a Column
Now let's do the add a column:
ALTER TABLE dbo.tb_View_Test
Add E INT NULL
GO
The command completes as expected, run the select
:
We are given just our 2 columns as expected. Although this didn't magically give us the 3rd column if we do a sp_helptext
on the view we will see that we are only asking for column A and B. Now if we ALTER
the view to include column E
.
ALTER VIEW dbo.vw_Test
AS
SELECT A, B, E
FROM dbo.tb_View_Test
GO
and run the select
:
We are given the extra column and nothing has surprised us yet.
Dropping and Adding a Column
So now that we have added a column and see that nothing weird happens, let's try the dropping and adding a column example. We'll drop column B
and add a column C
.
ALTER TABLE dbo.tb_View_Test
DROP COLUMN B
GO
ALTER TABLE dbo.tb_View_Test
Add C INT NULL
GO
Now, if you remember when we did this earlier, we got funny data with column names that didn't match the data. Let's run the select
.
We get the error:
Msg 207, Level 16, State 1, Procedure vw_Test, Line 472
Invalid column name 'B'.
Msg 4413, Level 16, State 1, Line 470
Could not use view or function 'dbo.vw_Test' because of binding errors.
We now get what you'd expect because that column doesn't exist. No chance of data getting "corrupt" by showing us data under the wrong heading and of course we can fix this by altering the view to drop the column B and add column C...
ALTER VIEW dbo.vw_Test
AS
SELECT A, E, C
FROM dbo.tb_View_Test
GO
...and then running the select
would show us data again.
No surprises here either.
Sample Teardown
To clear out everything we created, we can just drop the database or if you used an existing database, you can just drop the table and view.
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
DROP VIEW dbo.vw_Test
END
GO
IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
DROP TABLE dbo.tb_View_Test
END
GO
USE [master]
GO
DROP DATABASE [SampleDb]
GO
Conclusion
Think back to some of the weird SQL errors that you've had in the past, I can think of a couple of applications where I've come across this error and never dug to find out why it was working. Probably too busy chasing deadlines to learn.
This is also unfortunately one of the times where "Have you turned if off and on again" will not help as after restarting SQL, you will still get the "wrong" data.
From all of this, you can see that using SELECT *
is a bad idea, in most cases if you change a view you wouldn't go and refresh, alter or drop and create every other view in your database as that makes no sense at all.
Found this tweet a while ago about #SelectStar
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
A sample script with everything above including all the selects that weren't printed out but mentioned and all the "clean up" data can be found in my Git Hub Gists https://binary-stuff.com/gist/aa8e5cf3dfee66f21bcb.