How to Calculate the Length of the Largest Text Value in Every Table Column

Daniel Miller

5.00/5 (4 votes)

22 Nov 2015CPOL2 min read

19.9K

This tip shows how to display the length of the longest string value in every table and column in your database.

Introduction

Recently, I inherited a SQL Server database from another developer. It was a big database, with about 8 million rows in 22 tables. The data was entirely plain text (English and French only) and the number of columns per table was relatively small, so despite the number of rows, the overall size of the database seemed unusually large to me (over 6 GB).

One of the steps I took to decrease the size of the database was convert the data type for all NVARCHAR columns to VARCHAR.

VARCHAR is a non-Unicode variable length character data type, which takes 1 byte per character.

NVARCHAR is a Unicode variable length character data type, which takes 2 bytes per character.

This decreased the size of every table by more than 50 percent, taking indexes into account.

At the same time, I took this opportunity to adjust the size of every VARCHAR column so its maximum length was a closer match to the size of the largest text value it contained. (This doesn't contribute to decreasing the size of a database, but it does bring the schema into closer alignment with the actual data.)

In order to resize all of the table columns, I needed a T-SQL statement to list the tables and calculate MAX(LEN()) for every VARCHAR column. I couldn't find anything to do exactly what I wanted, so I wrote a SQL script that turned out to do the job nicely.

Using the Code

The SQL looks like this:

SQL

DECLARE @TableName NVARCHAR(100);

DECLARE X CURSOR
FOR
    SELECT  TABLE_SCHEMA + '.[' + TABLE_NAME + ']'
    FROM    INFORMATION_SCHEMA.TABLES
    WHERE   TABLE_TYPE = 'BASE TABLE'
    ORDER BY TABLE_SCHEMA, TABLE_NAME;

OPEN X;
FETCH FROM X INTO @TableName;
 
WHILE @@FETCH_STATUS = 0
    BEGIN

        -- Create and execute a query to retrieve MAX(LEN()) for every varchar column.

        DECLARE @Query NVARCHAR(MAX);
        SELECT  @Query = 'SELECT ''' + @TableName + ''' as TableName, '
                + STUFF(( SELECT    ', MAX(LEN([' + COLUMN_NAME + '])) as ['
                                    + COLUMN_NAME + ' ('
                                    + CASE WHEN CHARACTER_MAXIMUM_LENGTH = -1
                                           THEN 'max'
                                           ELSE CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10))
                                      END + ')]'
                          FROM      INFORMATION_SCHEMA.COLUMNS
                          WHERE     DATA_TYPE IN ( 'nvarchar', 'varchar' )
                                    AND TABLE_SCHEMA + '.[' + TABLE_NAME + ']' = @TableName
                        FOR
                          XML PATH('')
                        ), 1, 1, '') + ' FROM ' + @TableName
        FROM    INFORMATION_SCHEMA.COLUMNS
        WHERE   DATA_TYPE IN ( 'nvarchar', 'varchar' );
        
        PRINT @Query;
        EXEC sp_executesql @Query;
 
        FETCH NEXT FROM X INTO @TableName;
    END;

CLOSE X;
DEALLOCATE X;

Executing the query returns a result that looks like this:

It isn't beautiful, but it does the trick. Now at one glance, I can see the maximum length of the actual values in every column in every table throughout the database. I can also see the maximum character length already defined for each column; this appears in brackets beside the column name.

Then, I can adjust the size of each column as (if) needed:

SQL

-- Resize the column, leaving some extra space for future outliers.
ALTER TABLE a.SaleOrder ALTER COLUMN CustomerOrderNumber VARCHAR(20)

Points of Interest

I don't see the STUFF function very frequently, so it didn't occur to me right away. This function inserts a string into another string by deleting a specified length of characters in the first string at the start position, and then inserting the second string into the first string at the start position. This can be handy for string-manipulation - especially here, where we generate a dynamic SQL statement that hits the database only once per table.

Some of the other steps I took to reduce the size of the database included rebuilding the indexes on all tables, and shrinking the database with DBCC. At the end of the day, the size of the database was less than 1.5 GB, for an overall reduction of about 75 percent.

History

October 23^rd, 2015 - First draft
November 21^st, 2015 - Added improvements to the script, based on suggestions from readers

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)