Making the case for code generation
Mark Williamson
As any programmer of even modest abilities can attest, writing code to implement a system can be a redundant and laborious task. Often we wonder, why must we type essentially the same exact code over and over again, from project to project. We wonder this even as we are writing the program code. We purchase expensive Integrated Development Environments which effectively are nothing more than oversized Notepads. These wonderful IDEs allow us to develop and test our code, maybe even design it visually, but seldom do they actually make our jobs easier. In order to make my job easier, a tool basically has to do my job for me, at my command.
This is the premise of Code Generation tools. To take an 80/20 approach to producing programs. Unfamiliar with the 80/20 rule? | |
Pareto's Principle - The 80-20 Rule
In 1906, Italian economist Vilfredo Pareto created a mathematical formula to describe the unequal distribution of wealth in his country, observing that twenty percent of the people owned eighty percent of the wealth. In the late 1940s, Dr. Joseph M. Juran inaccurately attributed the 80/20 Rule to Pareto, calling it Pareto's Principle. While it may be misnamed, Pareto's Principle or Pareto's Law as it is sometimes called, can be a very effective tool to help you manage effectively.
Where It Came From
After Pareto made his observation and created his formula, many others observed similar phenomena in their own areas of expertise. Quality Management pioneer, Dr. Joseph Juran, working in the US in the 1930s and 40s recognized a universal principle he called the "vital few and trivial many" and reduced it to writing. In an early work, a lack of precision on Juran's part made it appear that he was applying Pareto's observations about economics to a broader body of work. The name Pareto's Principle stuck, probably because it sounded better than Juran's Principle.
As a result, Dr. Juran's observation of the "vital few and trivial many", the principle that 20 percent of something always are responsible for 80 percent of the results, became known as Pareto's Principle or the 80/20 Rule. You can read his own description of the events in the Juran Institute article titled Juran's Non-Pareto Principle.
What It Means
The 80/20 Rule means that in anything a few (20 percent) are vital and many(80 percent) are trivial. In Pareto's case it meant 20 percent of the people owned 80 percent of the wealth. In Juran's initial work he identified 20 percent of the defects causing 80 percent of the problems. Project Managers know that 20 percent of the work (the first 10 percent and the last 10 percent) consume 80 percent of your time and resources. You can apply the 80/20 Rule to almost anything, from the science of management to the physical world.
Applying the 80/20 rule to programming
An 80 percent example
Let’s say we have a task to create a website. In today’s environment, almost no one simply wants a static website, meaning purely HTML. That would be too easy. When programmers are involved, we’ve now gone beyond the web designer’s world and have entered the domain of dynamic content. This term suggests a database driven, interactive, living, breathing, software application. Whatever your programming and project development methodology, they have the following things in common:
- Requirements gathering
- Database design
- User interface implementation
- Workflow development
- User testing
- Implementation
- Post mortem (maintenance, 1.1 release etc)
There are tools at every step of the way to help us deliver projects. The areas I am most interested, and most involved in, are database design, workflow and user interface development. It is no coincidence that these, at least for me, account for 80 percent of my project time.
Exploring our example problem domain a bit further, we will start with a database design. Even the most modest system may contain dozens of database tables. A database would be utterly worthless if there were no way to enter or retrieve the data. So, we must also program views into our data. These views may be of any imaginable method or style, and typically encompass listing our data, and allowing us to edit our data. Therefore, we might assume that for the most part, we may have a list and an edit screen for each table in our system.
Now, getting data in and out of our database system requires some other program code, and this is called a data access layer. Some people call these Business objects, or Value objects, data access objects, etc. Whatever the name, the purpose is the same: Represent our database in code. Usually we’ll have to write four queries minimum per table. These are called CRUD, and this acronym stands for Create, Read, Update, Delete.
Ok, let’s recap.
- Four queries (or stored procedures) per table, minimum
- Two views (minimum) per table
- Business objects to represent our data in code
Now, if we had four tables in our database, this equates to:
- 16 stored procedures
- 8 views
- 8 (minimum) business objects
We haven’t even begun to speak about our project’s “Business rules”, the how’s and why’s of our system. You see, that’s the lovely 20 percent.
As a Microsoft .NET developer, I can tell you that there are several attempts on Microsoft’s part to make this easier for you. However, as any other .NET developer will tell you, implementing these is still a pain, with substantial (sometimes idiotic) code to write. The above example can easily yield more than 5,000 lines of code. That’s not bad, if it’s the only project you’ll ever do in your life. Doubtful, I know. Let’s say we did one of these tiny little projects a month for a year. That’s 12 projects times 5,000 lines of code. 60,000 lines of code in a year. I’m being very modest here…
Regardless of your programming methodology, you’ll eventually have to create some database tables and write some code. No tool in the universe is going to save you from that. However, there are numerous free and commercial tools out there that will, most of the time, help you. Also, there are many that just purely suck and get in your way.
Taking the example above into the real world, I personally have never written an application involving only four tables. I worked on a contact management system that involved 92 database tables, 9 of which were join tables (and didn’t have views associated). Now, let’s do the math.
I’d safely say that for every one table of average size, there exists at least of 2,000 lines of associated code and stored procedures.
92 x 2,000 = 184,000 lines of code for that CMS system (and that’s bare minimum)
What if we had a tool that could inspect our database schema, allow us to make create some additional meta data about fields, tables, foreign keys, etc, and the apply this data model to a variety of templates, which would in turn output any type of file we could dream up?
There are several commercial applications available that perform this goal. Some of them are quite good. However, I get my kicks out of writing software, not buying software.
No Magic Bullet
Code generation is not a magic bullet. It’s not going to make your coffee for you. There may be times that you spend hours perfecting templates to yield your code. Then, you have to go back into the generated code and edit it to provide the final business rules or page level customizations that may not apply to the project at large. There are essentially two opposing opinions on code generation:
- The tool that generated the code, owns it (Generated code should not be edited) <li'>Generated code is a starting point for the final application
I am in the second category. And this is where DBInspector helps me. I generate data grids, edit forms, queries, stored procedures and such that is highly redundant coding that would be mind-numbingly boring if done by hand all day long. Then, I take this code, and apply the final touches.
Be advised, however, that getting your templates just the way you like them is an effort unto itself. But, if you work in an environment where you have standards in place for the overall user interface or coding standards, then generating code will be a real time saver. If you work in an environment where no two projects are alike, you will still save time, but, don’t expect the code generating tool to work miracles for you.
Enter: The DBInspector
At the time that I was writing the contact management system, I really really didn’t want to write 92 x 4 stored procedures, let alone about 40 editable datagrids and edit forms with validation etc. I mean, I REALLY REALLY didn’t want to. So I thought to myself, what if I used XSL to generate these for me. I figured I could write a tool that read the database schema from Microsoft SQL Server and created a model of that schema in XML. Then I could write some XSLT style sheets to transform this into code.
This solution worked pretty good for a while, but fighting with XSL is almost as frustrating and boring as writing stored procedures. I revisited the DBInspector program recently when I came across NVelocity. If you’re a java programmer, you may have heard of Velocity. Velocity is a templating engine that can introspect some objects and apply their properties (fields) to a template, spitting out whatever interesting details you could come up with.
Well, some kind fellow ported Velocity to .NET and I integrated this package into my DBInspector tool. Velocity templates are very simple, but their utility is certainly worthy of admiration. With simple if..else, foreach, and macro constructs, I have created some very potent code generation templates, and I’m not especially adept at Velocity templating.
The 80/20 rule continues to apply even to code generating tools. The widely accepted assertion is that a code generating tool will get you 80 percent of the way to having workable code, but you still have to provide 20 percent. In some cases, that 20 percent is still a lot of work, but at least we can focus on the business rules of the project, and not being merely code monkeys spewing out thousands of lines of code.
The remainder of this document will show you how I use DBInspector to reverse engineer a database, generating literally thousands of lines of code for merely an hour or two of my time (and that’s being generous).
The first step to starting a project with the DBInspector is to create your database tables. Bone up on third normal form and relational database design, it will help you a lot!
It’s difficult to represent this, however, the above is the schema diagram for my portal project which was derived from the Microsoft ASP.NET IBuySpy portal starter kit. I have a few more tables than the original, but you can see there are almost two dozen or more tables. Not too bad. I first used DBInspector to generate the stored procedures for this database. The nice thing about DBInspector, and other tools like it, your code will be consistent. That means if you made a mistake in your template, at least your mistake is consistent!
Before we generate some code, let’s talk about how to design your database to maximize code generation success.
I can uniformly state that not all database platforms are created equal. Each has its place in the universe, and they must all peacefully coexist. With that said, I have a preference to Microsoft SQL Server. I find the tools easier to use than Oracle and mySql, and it has all the features I could conceive that I would need. Buried under the hood are some really great stored procedures that tell you a lot about the database. Namely, they are:
sp_tables
sp_columns
sp_keys
Each gives you a plethora of information about the structure of your database. DBInspector utilizes these procedures in modeling your database in an internal metadata form, and eventually will write this out as XML. The XML is only used for maintaining state of some settings that you’ll make within the DBInspector program. These settings cannot be stored in the database itself, so we opt for XML to help us out here.
There are fundamentally only two classes of data that we are interested in:
Tables and fields
Sure, we use other information about the database like primary and foreign keys, but the bulk of what we want is in two classes: TableInfo and Field.
So, we have our database and now we want to write some stored procedures to input data and get data out of our db. I won’t discuss the reason here for using stored procedures, I will merely suggest that if you’re still cluttering your code with SQL statements, bad bad bad for you!
Here is a screen shot of what the sp_tables procedure outputs:
You can see it simply lists the tables and their type. Oh my, how boring. Yes, well, its an important start. The next step is to get some field information. We do that with sp_columns. I’ll spare you the boring stuff. Now on to the DBInspector.
Upon loading the program, we are asked for a config file to load. (Details on this config file in another readme.)
Once we are loaded, DBInspector automatically interrogates the database and fetches the schema. This includes all tables, fields, primary and foreign keys. It even introspects views and stored procedures for those curious enough to dive in.
The user interface is simple, yet affords sufficient functionality. The list of checkboxes on the left is used to select and configure fields for inclusion in your data access layer. In very few cases will we want all of our tables included here. Some concepts will need explaining (later) so try to follow along.
As we click on our fields, we see our field properties will change. Most of this is for informational purposes, however, three items worth discussing are:
- Search criteria field
- We can use this to create a flag, which later can be used to tell our Velocity template whether the current field was flagged to be used as part of a search criteria. Depending on your code, and your implementation, you could potentially create a business object which is used to specify search criteria to some stored procedure. More on this later.
- Data grid field
- This was added because I needed a way to tell my ASP.NET datagrid generating templates that I wanted this field visible or invisible on the grid. The template I wrote still creates all the code to display the field, but its visibility is set to false. Presumably, all you need to do to make the field visible in the grid later, is to change the value in the generated ASP.NET code to True
- Foreign key to display
- This is a very powerful and handy piece that warrants an in-depth explanation.
Using Foreign Keys
Lets say we have two tables called called Customers and CustomerType:
It is convenient sometimes to be able to retrieve fields from Customers as well as the CustomerType.Name field in the same query. Loading DBInspector, and inspecting our sample database, we can set each of the fields according to our application needs. Shown below is our CustomerTypeID foreign key field:
By selecting Name in the Foreignkey Select Field, we are telling DBInspector to pull the value from the CustomerType.Name field for our associated Customers record. The generated business objects and stored procedures will accommodate a new field called CustomerTypeIDName which contains this value. Note that this field isn’t part of the underlying Customers record. Thus, our generated class file looks like:
The field is populated by the stored procedures (which are also generated). There is a caveat to this scheme: If you have a nullable foreign key relationship, where some records in your primary table will not have matching records in the foreign key table, no results would be returned. This is actually easily overcome by adjusting the JOIN operator in the procedure code. With some work, we could probably handle this elegantly (database platform dependent of course).
Some samples of generated code:
As you can see, the code is consistent. This is one of the nice things about code generators, the code is always consistent. However, if you make a bug in your template, the bug is also consistent :)
Our templates for generating stored procedures can get pretty complicated, but once you are used to the syntax (and the help of the syntax highlighting editor from SharpDevelop).
Shout outs
DBInspector implements a few very cool Open Source projects, without which it would just be an empty shell. This is the reason DBInspector is also opensource. These other initiatives are listed below and you are encouraged to check them out:
Here are some good reference links for you to enjoy: