Git - How to Validate Commit Messages?

26 May 2019 | CPOL | 8 min read
Making sure that every Git commit message is validated is not as easy as it sounds. Let me show why.

Introduction

Last time, I wrote an introduction to Git for beginners. This time, I would like to present a solution to a slightly more advanced Git problem. I had to solve the following issue: all commit messages need to follow some specific rules (maximum line length, etc.), and it must not be possible to push a commit whose message breaks these rules.

I thought there would be an easy solution and that a lot of people had already solved this, since many projects need to validate commit messages. In fact, it was not so easy. Let me describe why!

Git Commit Hooks

In Git, you can define so-called hooks. These are scripts that are called when specific actions happen. Some hooks run on the “client” side and others run on the “server” side. I’m using quotes because Git has no fixed server and client roles: each repo can behave as both. You can find all these hooks under .git/hooks in your Git repo.

There is a hook called commit-msg, which is exactly what we need for this purpose. When you call the git commit command, this script is invoked. It receives the path of a file containing the commit message as its input parameter, and if it does not return 0, the commit is aborted. It sounds really good. The only problem is that this script runs on the “client” repo. If the client, the one who cloned the server repo, removes this hook from the .git/hooks folder of the repo, they can commit whatever they want. So it is good as a first check, but it does not solve the problem in a secure way. Furthermore, the hooks are not version controlled, so you need another way (maybe some additional scripts) to copy them to the client repos. Alternatively, you can create a symlink from the hooks folder to a version-controlled folder.
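As an illustration, a client-side commit-msg hook could look roughly like the sketch below. The 72-character limit, the non-empty subject rule, and the names validate_message and main are assumptions made for this example, not rules from the project described here.

```python
import sys

MAX_LINE_LENGTH = 72  # hypothetical limit; replace with your project's rule


def validate_message(text):
    """Return a list of problems found in a commit message (empty if valid)."""
    problems = []
    # Lines starting with '#' are editor comments that Git normally strips later.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    if not lines or not lines[0].strip():
        problems.append("subject line is empty")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(
                f"line {number} is longer than {MAX_LINE_LENGTH} characters"
            )
    return problems


def main(argv):
    """argv[1] is the path of the file that holds the commit message."""
    with open(argv[1], encoding="utf-8") as handle:
        problems = validate_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # a non-zero exit status aborts the commit


# The hook script itself would end with:
#     sys.exit(main(sys.argv))
```

Saved as an executable .git/hooks/commit-msg file (with a Python shebang line and the final sys.exit call enabled), this would reject a commit whose message breaks one of the assumed rules.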

If we want to be a hundred percent sure that no invalid commit message appears in the server repo, we need to check it on the server side.

For checking on the server side, one possibility is the pre-receive hook. This hook is invoked whenever someone pushes something to the server, and if it returns a non-zero value, the push is refused.

The only problem is that this hook gets no direct information about exactly which commits have been pushed. It reads its input from standard input, which contains only the changes to the Git references, one per line, in the following format: old value, new value, reference name.
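A minimal sketch of reading that input could look like this. RefUpdate and parse_ref_updates are names invented for the example; the only thing taken from Git is the line format, three fields separated by single spaces.

```python
from typing import Iterable, List, NamedTuple


class RefUpdate(NamedTuple):
    old: str  # hash the ref pointed to before the push
    new: str  # hash the ref should point to after the push
    ref: str  # full reference name, e.g. refs/heads/master


def parse_ref_updates(lines: Iterable[str]) -> List[RefUpdate]:
    """Parse the lines a pre-receive hook receives on standard input."""
    updates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        old, new, ref = line.split(" ", 2)
        updates.append(RefUpdate(old, new, ref))
    return updates


# Inside the hook one would call: parse_ref_updates(sys.stdin)
```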

So how do we figure out which commits are new?

First, let’s go a bit deeper into Git and learn about references.

Git Branches and References

When you push commits in Git, you always push to a branch. By default, you are on the master branch, but at any time you can create new branches that branch out from an existing one. When you run git push, Git pushes your branch to its upstream branch if one is set. If you checked out the branch from the server, its upstream branch was set up automatically. The upstream branch is normally a branch on the server. If no upstream branch exists, or if you are in detached HEAD mode (you are not on any branch and HEAD points directly at a commit), Git asks you to specify where you are pushing to (for example, git push origin master).

Let’s go one step back now. What is a Git ref? A Git ref is a named pointer to a specific commit in your repository. You can find the references under the .git/refs directory. Branches are nothing more than special references. A branch is also just a pointer to a commit, but when you commit something new on the branch, the branch automatically moves to point to that latest commit. It is still just a named pointer, nothing else.
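To make the "named pointer" idea concrete, the sketch below reads a branch directly from the refs directory. The helper name read_loose_ref is made up for this example. It only handles refs stored as individual files: Git may also keep refs in a packed-refs file, and some refs (such as HEAD) are symbolic and contain a "ref: ..." line instead of a hash, so this is an illustration rather than a replacement for Git's own commands.

```python
from pathlib import Path


def read_loose_ref(git_dir, ref_name):
    """Return the hash stored in a loose ref file, or None if there is no such file."""
    ref_file = Path(git_dir) / ref_name
    if not ref_file.is_file():
        return None  # the ref may be packed, or it may not exist at all
    return ref_file.read_text(encoding="ascii").strip()


# Example call in a working copy: read_loose_ref(".git", "refs/heads/master")
```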

What Happens at Git Push?

Commits don’t know much. They know their own content and their parent commit. In case of merge commits, the commit has multiple parents, otherwise only one. The very first “root” commit in the repo has no parent at all.

So when you call git push, you are always pushing one or more (with git push --all) branches. First, you let the server know which new commit the branch should point to. This is the value that the pre-receive hook gets as input. Git also checks whether the branch already exists on the server and, if so, passes its previous value to the pre-receive hook.

Then the server checks whether it already has that commit (commits are stored under .git/objects). If not, it gets the commit from the client and checks what its parent is. If the parent is not on the server either, the parent commit is transferred as well. This continues until the first parent commit that is already on the server.

How to figure out in pre-receive hook which commits are new?

The biggest difficulty is that the pre-receive hook only tells us which references have changed and to what, and nothing else. Our goal is to validate the messages of all newly pushed commits, and only those.

The first and easiest case is when someone pushes commits to a branch that already existed. In this case, we get the old and the new value of the reference, and git log old_hash..new_hash shows the commits between them.

There is one corner case where this method shows more commits than necessary: for a merge commit, it shows the whole content of the merged branch, even though that branch may already have been pushed, at least partially.

I also need to mention the case where the reference (or branch) has been deleted. In this case, the new hash consists of 40 zeros, which also means that no commit messages need to be validated.
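The cases can be told apart from the two hashes alone, as in the sketch below. The names classify_update and commits_to_check are made up for this example, and the 40-zero constant assumes SHA-1 object names as described in this article. A newly created ref, which shows up as an all-zero old value, is only flagged here because there is no old hash to build a range from.

```python
import subprocess

ZERO_HASH = "0" * 40  # assumes 40-character SHA-1 hashes


def classify_update(old, new):
    """Return 'delete', 'create' or 'update' for one changed reference."""
    if new == ZERO_HASH:
        return "delete"
    if old == ZERO_HASH:
        return "create"
    return "update"


def commits_to_check(old, new):
    """List the commit hashes in old..new for an update of an existing ref."""
    kind = classify_update(old, new)
    if kind == "delete":
        return []  # nothing was added, so there is nothing to validate
    if kind == "create":
        raise ValueError("new ref: there is no old hash to build a range from")
    # git rev-list prints one hash per line for the range old..new
    result = subprocess.run(
        ["git", "rev-list", f"{old}..{new}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()
```

git rev-list is used here instead of git log because it prints only the hashes, which are easier to process in a script. The range it walks is the same.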

The last case to cover is when a new branch has been pushed. In this case, the old hash of the reference consists of 40 zeros and we have the new hash of the reference. That means we only have the hash of the latest commit on the branch. What now? After some investigation, my idea was to do the same as the push does: check the latest commit, then jump to its parent commit (for merge commits, do the same with all parents), and stop once we reach a commit that was already pushed before.

This idea sounds good, but how do we figure out whether a commit was already on the server? There are certainly multiple solutions, but it took me some time to find one that works.

My solution is git branch --contains. This command returns a list of the branches that contain a specific commit in their history. But pay attention! Since Git stores only a reference to the latest commit on the branch, all commits that are ancestors of this commit are on this branch. So if I branch out from the master branch at some point, all commits on the master branch before that point are also part of my branch. There’s one more thing to notice: the branches on the client and the branches on the server are not the same, and this is the key to solving our task.

Based on my experience, all commits on the server belong to at least one branch, since it is not possible to push a detached commit. The pre-receive commit hook is called before changing the references. That means all commits which were not pushed before are not part of any currently existing branch, but all commits which were already there are part of at least one branch. And this is the fact we can use here.
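A rough sketch of that check is shown below. The function names are invented for this example; the Git command is the git branch --contains call discussed above, and the parsing assumes its usual output of one branch per line with a two-character prefix ("* " for the current branch, two spaces otherwise).

```python
import subprocess


def parse_branch_list(output):
    """Turn the text printed by `git branch` into a list of branch names."""
    names = []
    for line in output.splitlines():
        name = line[2:].strip()  # drop the two-character prefix
        if name:
            names.append(name)
    return names


def branches_containing(sha):
    """Ask Git which branches of the current repo contain the given commit."""
    result = subprocess.run(
        ["git", "branch", "--contains", sha],
        check=True, capture_output=True, text=True,
    )
    return parse_branch_list(result.stdout)


def already_on_server(sha):
    """Inside pre-receive: True if some existing branch already has the commit."""
    return bool(branches_containing(sha))
```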

Summary

Let me summarize the solution for checking git commit messages on server side commit hooks.

Start with the latest commit of the branch and go parent by parent, checking whether git branch --contains returns an empty list for the commit. If it does, validate the commit message and continue with its parent. If it does not, the commit has already been pushed before and there is nothing else to do on this path. Pay special attention to merge commits: every parent has to be checked.

I hope that this solution is correct. So far it has passed all my test cases, and I also hope that it helps you to solve your task.
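The walk could be sketched as follows. This is only one possible reading of the steps above, not the exact script used for this article. find_new_commits takes the parent lookup and the "already known" check as parameters so that it can be tried on a toy history without a repository. git_parents and git_is_known are hypothetical helpers that wire it to Git.

```python
import subprocess


def find_new_commits(tip, parents_of, is_known):
    """Collect the commits reachable from tip that are not yet on any branch."""
    new_commits = []
    seen = set()
    stack = [tip]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue  # already handled through another parent of a merge
        seen.add(sha)
        if is_known(sha):
            continue  # pushed earlier, so its ancestors are on the server too
        new_commits.append(sha)
        stack.extend(parents_of(sha))  # a merge commit contributes all its parents
    return new_commits


def git_parents(sha):
    """`git rev-list --parents -n 1 <sha>` prints '<sha> <parent> <parent> ...'."""
    result = subprocess.run(
        ["git", "rev-list", "--parents", "-n", "1", sha],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()[1:]


def git_is_known(sha):
    """True if `git branch --contains <sha>` lists at least one branch."""
    result = subprocess.run(
        ["git", "branch", "--contains", sha],
        check=True, capture_output=True, text=True,
    )
    return bool(result.stdout.strip())


# In the hook: find_new_commits(new_hash, git_parents, git_is_known)
```

Each returned hash would then have its message validated, and the hook would exit with a non-zero status if any message fails.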

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)