Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

They Said Select * was Bad But...

4.00/5 (1 vote)
9 Apr 2015CPOL11 min read 12.7K  
They said Select * was bad but...

So speaking to any smart database people, they will generally tell you that the SELECT * FROM ... is a bad idea and should not be used. Today (well would have been if I didn't procrastinate) I spent many hours debugging something from the wrong places because I used SELECT *. Below is what I found out =).

Rennie Moodley helped me understand what was happening after I figured out where the problem was, helped me create the sample code below and then the 2 of us and Christopher Swart poked at and improved the sample and found some more weirdness which you'll also see below.

Sample Setup

For the setup of the sample, you can use any database, the sample code contains everything to create a SampleDb and then drop it in the end.

SQL
USE [master]
GO

--create sample database
CREATE DATABASE [SampleDb]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'SampleDb', FILENAME = N'C:\SQL\Database Engine\Data\SampleDb.mdf' , _
	SIZE = 4096KB , FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'SampleDb_log', FILENAME = N'C:\SQL\Database Engine\Log\SampleDb_log.ldf' , _
	SIZE = 1024KB , FILEGROWTH = 10%)
GO

Initial Weirdness

So for the initial bit of weirdness that cost me a couple hours, I basically had a table and 2 views (but the scenario can play out with a table and 1 view as we'll do in this post). It looked something like below.

Table

SQL
-- create helper table with 2 columns
CREATE TABLE dbo.tb_View_Test
  (A INT
  ,B INT)
GO


INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)

GO

View

SQL
-- create view against the table
CREATE VIEW dbo.vw_Test
AS
  SELECT *
    FROM dbo.tb_View_Test
GO

With this in place, we are able to do a select and notice that we do have data.

image

I then somewhere along the line added another column to the view (in this case table) like so:

SQL
-- alter underlying table and add a 3rd column
ALTER TABLE dbo.tb_View_Test
  ADD C INT NULL
GO

added a reference to this column in my code, got side tracked and then after lunch and a meeting, came back to the code where I developed in some other areas of the app for something that I remembered about while away from my PC and then eventually came back to this area of code which by now I lost complete context of what I was doing. I ran the code and received a weird error saying that the column didn't exist. I double checked the view with sp_helptext to make sure it was in the database (only part that doesn't work with this sample because sp_helptext doesn't support tables). I thought at this stage maybe the view had the column specified (which was not like me because I'm a lazy developer), so I did a sp_helptext on the view.

SQL
--sp_helptext not supported by tables
--sys.sp_helptext @objname = N'dbo.tb_View_Test', -- nvarchar(776)
--    @columnname = NULL -- sysname
--GO


sys.sp_helptext @objname = N'vw_Test', -- nvarchar(776)
    @columnname = NULL -- sysname
GO

which showed me that the view still had a star:

image

but why then did the results show otherwise?

image

After as mentioned many hours of looking at this from checking that the app was still configured to hit this database and not a random other environment to restarting my PC, I reinstalled all the objects and by doing that lost all my sample data. Which we would simulate by dropping the table and view and then recreating the table with the extra column and then creating the view as we previously did.

SQL
-- drop helper objects if they exist
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
      DROP VIEW dbo.vw_Test
END
GO


IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
      DROP TABLE dbo.tb_View_Test
END
GO


-- create helper table with 2 columns
CREATE TABLE dbo.tb_View_Test
  (A INT
  ,B INT)
GO


INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)

GO


-- create view against the table
CREATE VIEW dbo.vw_Test
AS
  SELECT *
    FROM dbo.tb_View_Test
GO


-- alter underlying table and add a 3rd column
ALTER TABLE dbo.tb_View_Test
  ADD C INT NULL
GO

This now returned the data I was expecting.

image

So what was different now? The code looked the same so what could it be. At the point, the non lazy developer on a lot of energy drinks kicked in and I started digging which made me revert the table changes and reinstall everything again and then made the single column add as above which I confirmed made the weirdness return which for completeness looks like this:

SQL
-- drop helper objects if they exist
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
      DROP VIEW dbo.vw_Test
END
GO


IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
      DROP TABLE dbo.tb_View_Test
END
GO


-- create helper table with 3 columns
CREATE TABLE dbo.tb_View_Test
  (A INT
  ,B INT)
GO


INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)

GO


-- create view against the table
CREATE VIEW dbo.vw_Test
AS
  SELECT *
    FROM dbo.tb_View_Test
GO

and then running that select:

SQL
-- check that it returns just 2 columns
SELECT * 
  FROM dbo.vw_Test
GO

image

It was at this point that I realized that reinstalling the view fixed my problem probably. To try this, you can either drop and create the view or simple ALTER it which is the most likely way you would do this in production for most people.

SQL
-- alter view 
ALTER VIEW dbo.vw_Test
AS
  SELECT *
    FROM dbo.tb_View_Test
GO

and then once again we run the select:

SQL
-- check that it returns 3 columns
SELECT * 
  FROM dbo.vw_Test
GO

image

and we now have that extra column.

Enter Rennie

At this point, I called Rennie over and showed him the weirdness. After going for some coffee I found in my inbox a mail containing a sample script giving me some insight. The key pieces to note was that querying the information schema before running the ALTER.

SQL
-- double confirm that the metadata for the view does not reflect the 3rd column
SELECT * 
FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN    INFORMATION_SCHEMA.COLUMNS AS c
ON      c.TABLE_SCHEMA  = cu.TABLE_SCHEMA
AND     c.TABLE_CATALOG = cu.TABLE_CATALOG
AND     c.TABLE_NAME    = cu.TABLE_NAME
AND     c.COLUMN_NAME   = cu.COLUMN_NAME
WHERE   cu.VIEW_NAME    = 'vw_Test'
AND     cu.VIEW_SCHEMA  = 'dbo'
GO

revealed that the column wasn't there:

image

whereas running that ALTER statement again:

image

So this again illustrated that what the ALTER achieved, he also mentioned that we can use sp_refreshview to refresh the view:

SQL
-- call sp_refreshview to refresh the metadata
sp_refreshview 'dbo.vw_Test'
GO

Enter Chris

At this point, we did what is the best feeling in finding a bug like this in our code, we poked to see what else we (I) could learn from this which also revealed some almost more bazar then the views caching the columns it returned.

Column Rename

What happens if you rename one of the columns:

SQL
--rename column
sp_rename 'dbo.tb_View_Test.A', 'D', 'COLUMN';
GO

we are given a warning saying that we could be breaking stuff:

image

and so we ran the select and I was thinking that it would throw an error:

image

and we noticed that the sp_rename didn't seem to work or at least that's what I would have thought before our earlier discovery. So just to double check, we looked at the information schema again:

SQL
-- note that the information schema shows the columne as D
SELECT * 
FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN    INFORMATION_SCHEMA.COLUMNS AS c
ON      c.TABLE_SCHEMA  = cu.TABLE_SCHEMA
AND     c.TABLE_CATALOG = cu.TABLE_CATALOG
AND     c.TABLE_NAME    = cu.TABLE_NAME
AND     c.COLUMN_NAME   = cu.COLUMN_NAME
WHERE   cu.VIEW_NAME    = 'vw_Test'
AND     cu.VIEW_SCHEMA  = 'dbo'
GO

and thought it would show us the same result as earlier with a row for A, B and C but instead noted that D was showing here.

image

Now I was thinking where could it be keeping the information for the views columns then and more importantly how do we get a A column back from the result when it doesn't exist. We looked next turned to sys.views and sys.columns.

SQL
SELECT c.* 
FROM sys.views AS v 
INNER JOIN sys.columns AS c ON v.object_id = c.object_id
WHERE v.name = 'vw_Test'
GO

This showed us the A column still being there as part of what SQL knew to be in the view.

image

So how does the view get the data which was in column A and is now in column D? Well, for this, we turn back to sp_refreshview, information schema, sys.views and sys.columns.

SQL
sp_refreshview 'dbo.vw_Test'
GO


SELECT ORDINAL_POSITION, * 
FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN    INFORMATION_SCHEMA.COLUMNS AS c
ON      c.TABLE_SCHEMA  = cu.TABLE_SCHEMA
AND     c.TABLE_CATALOG = cu.TABLE_CATALOG
AND     c.TABLE_NAME    = cu.TABLE_NAME
AND     c.COLUMN_NAME   = cu.COLUMN_NAME
WHERE   cu.VIEW_NAME    = 'vw_Test'
AND     cu.VIEW_SCHEMA  = 'dbo'
GO


SELECT column_id, c.* 
FROM sys.views AS v 
INNER JOIN sys.columns AS c ON v.object_id = c.object_id
WHERE v.name = 'vw_Test'
GO

which shows us:

image

Now this is the part I looked at by myself so I can only assume that SQL is smart here and uses the column id/pos when it queries and labels the result as what it currently believes to be the column name.

Dropping and Adding Column

So we knew how the view would handle column renames but what about dropping and creating columns. For this, we dropped the column B and added the column E.

SQL
ALTER TABLE dbo.tb_View_Test
  DROP COLUMN B 
GO


ALTER TABLE dbo.tb_View_Test
  Add E INT NULL
GO

Run the view select again:

SQL
SELECT * 
  FROM dbo.vw_Test
GO

And as we could almost expect from this post column E didn't exist but what about column B? It should exist right because we haven't refreshed the view cache but how could it because it was now dropped from the database. This most weirdly for me returns the column (which I guess could be expected) but now we have no data for column B.

image

We run the magic again to see what SQL thinks is going on:

SQL
SELECT ORDINAL_POSITION, * 
FROM    INFORMATION_SCHEMA.VIEW_COLUMN_USAGE AS cu
JOIN    INFORMATION_SCHEMA.COLUMNS AS c
ON      c.TABLE_SCHEMA  = cu.TABLE_SCHEMA
AND     c.TABLE_CATALOG = cu.TABLE_CATALOG
AND     c.TABLE_NAME    = cu.TABLE_NAME
AND     c.COLUMN_NAME   = cu.COLUMN_NAME
WHERE   cu.VIEW_NAME    = 'vw_Test'
AND     cu.VIEW_SCHEMA  = 'dbo'
GO


SELECT column_id, c.* 
FROM sys.views AS v 
INNER JOIN sys.columns AS c ON v.object_id = c.object_id
WHERE v.name = 'vw_Test'
GO

and notice now that information schema says we have no column B and also no column E and the sys.views and sys.columns query shows D, B and C rows.

image

a sp_refreshview and select:

SQL
sp_refreshview 'dbo.vw_Test'
GO


SELECT * 
  FROM dbo.vw_Test
GO

shows the data as we would now expect it:

image

Now again, here I am assuming what SQL does and here I think it must obviously be using sys.columns for what columns to show and then going to information schema for where to find the data, naturally not find a matching column would return null data. However, looking at the result above showing the queries off information schema, sys.views and sys.columns disproves my previous idea about how it matches the columns. Maybe it checks if the same amount of columns exist, then use order matching otherwise the name matching or something silly like that but then that wouldn't work for a rename and drop and create without refreshing the view.

Column Drop, Add and Rename

Before we see if anything weird happens here will quickly top up our table with data to see for NULL's, for that we'll just TRUNCATE and then INSERT rows as before.

SQL
TRUNCATE TABLE dbo.tb_View_Test
GO


INSERT INTO dbo.tb_View_Test VALUES (1,2,9)
INSERT INTO dbo.tb_View_Test VALUES (3,4,10)
INSERT INTO dbo.tb_View_Test VALUES (5,6,11)
INSERT INTO dbo.tb_View_Test VALUES (7,8,12)
GO

SELECT * 
  FROM dbo.vw_Test
GO

and we now have the data:

image

and this is where things get very weird for me (could also be because it's late and my brain is tired) but if we drop, add and rename a column:

SQL
ALTER TABLE dbo.tb_View_Test
  DROP COLUMN C 
GO


ALTER TABLE dbo.tb_View_Test
  Add F INT NULL
GO



sp_rename 'dbo.tb_View_Test.D', 'G', 'COLUMN';
GO

and run our select:

image

we expect that column C would contain NULL data as before but instead column E has the NULL data. Looking back to our new favorite place:

image

we notice that column C has been removed as expected. If we run our refresh and select again:

image

We see that now column E has data and the old column D has been renamed to G and has data. From this, my new assumption is like my old one and that is that SQL gets the column pos it needs data for from information schema and then gets the label for those columns from that same position in sys.columns.

Dropping Columns Without Adding Any

So what happens when we drop a column but don't add another? Let's test that, we are going to drop the column E because it feels like it won't make any difference:

SQL
ALTER TABLE dbo.tb_View_Test
  DROP COLUMN E 
GO


SELECT * 
  FROM dbo.vw_Test
GO

we run the magic:

image

Now based on our last theory, we were saying that it would get columns to get data from Information Schema and labels from sys.columns so you would almost expect a similar result as Information Schema has 2 results and sys.columns has 3 to before with the last column being NULL. Let's run that select.

image

And instead of getting a NULL value at the end, we get an error:

Msg 4502, Level 16, State 1, Line 396
View or function 'dbo.vw_Test' has more column names specified than columns defined.

Now I haven't figured how exactly why in the previous example SQL just puts a column with NULL data and in this one, it throws an error. I know it's obvious that it's because there isn't a third column but it would be nice to find the place SQL looks to know this which I will continue looking for =) and of course if we run the select now.

image

We have a working view again.

If I Were Not To Use a SELECT *

A little clean up is needed here again. Let's drop the table and view and create them as they existed at the start of this post.

SQL
-- drop helper objects if they exist
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
      DROP VIEW dbo.vw_Test
END
GO


IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
      DROP TABLE dbo.tb_View_Test
END
GO


-- create helper table with 2 columns
CREATE TABLE dbo.tb_View_Test
  (A INT
  ,B INT)
GO


INSERT INTO dbo.tb_View_Test VALUES (1,2)
INSERT INTO dbo.tb_View_Test VALUES (3,4)
INSERT INTO dbo.tb_View_Test VALUES (5,6)
INSERT INTO dbo.tb_View_Test VALUES (7,8)
GO


-- create view against the table
CREATE VIEW dbo.vw_Test
AS
  SELECT *
    FROM dbo.tb_View_Test
GO

Now alter the view to use column names instead of *.

SQL
-- create view against the table
ALTER VIEW dbo.vw_Test
AS
  SELECT A, B
    FROM dbo.tb_View_Test
GO

And the select just to verify data:

image

Looks good.

Add a Column

Now let's do the add a column:

SQL
ALTER TABLE dbo.tb_View_Test
  Add E INT NULL
GO

The command completes as expected, run the select:

image

We are given just our 2 columns as expected. Although this didn't magically give us the 3rd column if we do a sp_helptext on the view we will see that we are only asking for column A and B. Now if we ALTER the view to include column E.

SQL
ALTER VIEW dbo.vw_Test
AS
  SELECT A, B, E
    FROM dbo.tb_View_Test
GO

and run the select:

image

We are given the extra column and nothing has surprised us yet.

Dropping and Adding a Column

So now that we have added a column and see that nothing weird happens, let's try the dropping and adding a column example. We'll drop column B and add a column C.

SQL
ALTER TABLE dbo.tb_View_Test
  DROP COLUMN B 
GO


ALTER TABLE dbo.tb_View_Test
  Add C INT NULL
GO

Now, if you remember when we did this earlier, we got funny data with column names that didn't match the data. Let's run the select.

image

We get the error:

Msg 207, Level 16, State 1, Procedure vw_Test, Line 472
Invalid column name 'B'.
Msg 4413, Level 16, State 1, Line 470
Could not use view or function 'dbo.vw_Test' because of binding errors.

We now get what you'd expect because that column doesn't exist. No chance of data getting "corrupt" by showing us data under the wrong heading and of course we can fix this by altering the view to drop the column B  and add column C...

SQL
ALTER VIEW dbo.vw_Test
AS
  SELECT A, E, C
    FROM dbo.tb_View_Test
GO

...and then running the select would show us data again.

image

No surprises here either.

Sample Teardown

To clear out everything we created, we can just drop the database or if you used an existing database, you can just drop the table and view.

SQL
-- drop helper objects if they exist
IF OBJECT_ID('dbo.vw_Test') IS NOT NULL
BEGIN
      DROP VIEW dbo.vw_Test
END
GO


IF OBJECT_ID('dbo.tb_View_Test') IS NOT NULL
BEGIN
      DROP TABLE dbo.tb_View_Test
END
GO


USE [master]
GO


--drop sample database
DROP DATABASE [SampleDb]
GO

Conclusion

Think back to some of the weird SQL errors that you've had in the past, I can think of a couple of applications where I've come across this error and never dug to find out why it was working. Probably too busy chasing deadlines to learn.

This is also unfortunately one of the times where "Have you turned if off and on again" will not help as after restarting SQL, you will still get the "wrong" data.

Have you turned if off and on again

From all of this, you can see that using SELECT * is a bad idea, in most cases if you change a view you wouldn't go and refresh, alter or drop and create every other view in your database as that makes no sense at all.

Found this tweet a while ago about #SelectStar

<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

A sample script with everything above including all the selects that weren't printed out but mentioned and all the "clean up" data can be found in my Git Hub Gists https://binary-stuff.com/gist/aa8e5cf3dfee66f21bcb.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)