Commit History: Your Project's Only Real Documentation

In today’s fast-paced world of software development, more and more teams work in a lean and agile way. Under those methodologies, maintaining low-level design documentation is most often considered waste. But even if your team does not follow a strict development process, keeping your design documents up to date with your code base is hard and tedious work. It is hard to make a team of 50 people update a wiki page every time they change the code. And what is the value of that? Hardly anyone reads those documents, even when they exist. There are much more pleasant ways to onboard new people or learn a new system: we can pair program, ask questions in Slack channels, chat with a co-worker over a beer, or read the source code. Writing some new tests can also be a much more enjoyable experience than reading through tons of outdated design documents.

A design document is outdated even before it is implemented.

In short: the only artefacts that stand the test of time are your Git commit history and your test suite. If crafted well, these can serve as the most extensive, up-to-date documentation of your project’s design decisions and business use cases. In this blog post I will focus on the first of the two: the importance of a clean Git commit history.

As this is not a Git tutorial, I assume the reader has some basic knowledge of working with Git. If techniques such as cherry-picking, rebasing and interactive rebasing, which are used to build the commit history we want, do not ring any bells, you will get more out of this post by getting familiar with Git first and then revisiting it.

Craft your Git history just like you craft your code.

Git commit messages are a means of communication within the team. Even if you work alone on a project, they are still a means of communication between your current and your future self. Write them with the utmost care. You write readable code because code is written once but read many times afterwards. The same holds true for Git commit messages: they are written once but read many times later on. You can easily see what the whole dev team is doing simply by browsing through the Git commit history.

What is a good Git commit message?

A commit should ALWAYS have a message body

Always, Always, Always! Even for small commits you can elaborate on why you are making that change. Do not try to explain what the code does; that should be clear from the code itself. If it is not, rework the code. Do not focus on WHAT, focus on WHY. For example, saying “Fix a minor style issue” is not helpful at all. Explain what the issue was and why you fixed it in that particular way. Why now? Never consider writing a longer commit message a waste. Most likely you will be the one trying to remember things months later while debugging, and you will thank your past self for taking the time to write a good explanatory message. Git blame combined with good messages gives you all the context you need to truly understand a problem and solve it quickly and cleanly. So if you are in the habit of using git commit -m, now may be the right time to troll your own shell and alias that into something that slaps you :)
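If you would rather have Git do the slapping, one option is a commit-msg hook. Git runs .git/hooks/commit-msg with the path of the file holding the proposed message, and it aborts the commit when the hook exits with a non-zero status. The following is a minimal Bash sketch, not a standard tool: the function name has_body is made up for this example, and the check is deliberately simple (it skips lines starting with # and treats any non-blank line after the title as a body).

```bash
#!/usr/bin/env bash
# Hypothetical commit-msg hook: save as .git/hooks/commit-msg and make it
# executable (chmod +x). Git passes the path of the message file as $1.

# Succeeds if anything other than blank lines follows the title line.
# Lines starting with '#' are skipped because Git strips them as comments.
has_body() {
  grep -v '^#' "$1" | tail -n +2 | grep -q '[^[:space:]]'
}

# Only enforce the rule when called with a message file (i.e. by Git).
if [ "$#" -ge 1 ] && ! has_body "$1"; then
  echo "commit-msg: add a message body that explains WHY this change is needed." >&2
  exit 1
fi
```

With the hook in place, git commit -m "Fix typo" is rejected, while a title followed by a blank line and an explanation passes. Hooks live in each local clone and can be skipped with git commit --no-verify, so this is a personal nudge rather than team-wide enforcement.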

A commit message should reveal the intent.

Keep the title short (50 characters or fewer). That helps a lot when someone runs git log --oneline, as they can scan short, focused lines of text and quickly spot what they need. Save your own and your colleagues’ time when searching and reading through the commit history.
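The length rule is easy to check mechanically. Below is a small Bash sketch with two hypothetical helpers, assuming a 50-character limit: subject_ok could be called from a commit-msg hook on the message file, and long_subjects lists existing commits in the current repository that break the rule.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "short title" rule (limit assumed to be 50).

# Succeeds if the first non-comment line of a commit message file is
# non-empty and at most 50 characters long.
subject_ok() {
  local subject
  subject=$(grep -v '^#' "$1" | head -n 1)
  [ -n "$subject" ] && [ "${#subject}" -le 50 ]
}

# Lists commits whose subject exceeds 50 characters,
# printed as "<abbreviated hash><TAB><subject>".
long_subjects() {
  git log --format='%h%x09%s' | awk -F '\t' 'length($2) > 50'
}
```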

A commit should be small and focused on one change only

Again, the same principles apply as with writing good code: keep things small. Small is good, small fits in your head, small is simple. Just as you want small objects and functions that do one thing only, you should do the same with commits. Keep them small and focused: one commit should describe one change only. Long commits are hard to manage (rebase, merge, resolve conflicts). Long commits also fail to give you context when doing git blame, as you have to read through a whole lot and filter out only what you need. It takes much longer to get into the context of a change if that commit contains numerous other changes as well. Long commits result in large pull requests, and those are a pain to review as well as a pain to merge.
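Small is hard to measure, but the number of staged lines is a rough signal. This Bash sketch sums the added and deleted counts that git diff --cached --numstat prints. The function names and the 200-line threshold are assumptions for illustration, not a rule from this post, and a large count is only a hint that the change might be worth splitting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total lines added + deleted, read from the output of
# `git diff --cached --numstat` ("added<TAB>deleted<TAB>path" per file).
# Binary files are reported as "-" and are skipped.
staged_line_count() {
  awk -F '\t' '$1 != "-" { total += $1 + $2 } END { print total + 0 }'
}

# Warns (but does not block) when the staged change looks too big to review.
# The 200-line default is an arbitrary assumption; tune it for your team.
warn_if_large() {
  local limit="${1:-200}" count
  count=$(git diff --cached --numstat | staged_line_count)
  if [ "$count" -gt "$limit" ]; then
    echo "warning: ${count} changed lines staged; consider splitting this commit" >&2
  fi
}
```

You could call warn_if_large from a pre-commit hook, or simply run it before git commit.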

A commit should include a link to a ticket (Jira card)

You want to be as descriptive as possible in your commit messages, but if your commit messages are several pages long, nobody will read them. And even if they do, it will be a waste of their time to read through tons of text only to filter out the valuable information. Save your time. Save your teammates’ time. Do not copy and paste scoping documents or Jira card descriptions into commit messages; simply refer to them, so that anyone who needs more context than the commit message provides can go to that card or document and get it from there. Keep the commit message related to the small change it introduces. You don’t need to tell the whole story here: you can link to it or tell it in the pull request description. A long commit message is a smell. It suggests that maybe too much is being done in a single commit.
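Here is what such a message might look like, together with a loose check for the ticket reference. This Bash sketch assumes Jira-style issue keys (an uppercase project key, a hyphen and a number, such as the made-up SHOP-123) and a "Refs:" line at the end of the message. Both are conventions you would adapt to your tracker. The pattern is deliberately permissive: it would also accept something like UTF-8.

```bash
#!/usr/bin/env bash
# Hypothetical check: does a commit message reference a Jira-style issue key
# (e.g. SHOP-123)? The key format is an assumption; adapt it to your tracker.
has_ticket_ref() {
  # Ignore comment lines, then look for KEY-123 not glued to a preceding word.
  grep -v '^#' "$1" | grep -Eq '(^|[^A-Za-z0-9])[A-Z][A-Z0-9]+-[0-9]+'
}

# Example message: short title, the WHY in the body, a link for the rest.
example=$(mktemp)
cat > "$example" <<'EOF'
Validate coupon codes before applying discounts

Empty or whitespace-only codes reached the pricing service and
triggered a full cart recalculation for nothing. Rejecting them in
the form keeps checkout fast.

Refs: SHOP-123
EOF

if has_ticket_ref "$example"; then
  echo "ticket reference found"
fi
rm -f "$example"
```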

Stay concise when writing a commit message.

How to achieve a clean Git commit history?

First and foremost: learn Git. Pro Git is an excellent book on Git, and there are many videos and tutorials out there. Even if you know the basics, go beyond them. Make sure interactive rebasing is something you do naturally and without effort. Always review your pull request’s commit messages before merging into master. Rewrite your commits with interactive rebasing and force push your feature branch as often as needed until you are completely happy with the story your commits tell about your changes. They should read like a story. It will help reviewers, and it will help your team. The time taken is never wasted. It is a very good way of communicating and transferring knowledge within your team. Even team members who are not required as reviewers will understand what you are doing and, most importantly, why you are doing it.
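The review loop above might look like this in practice. This is a Bash sketch with hypothetical function names. It assumes the feature branch is based on origin/master and the remote is called origin. branch_story prints the branch’s commits oldest first so you can read them as a story, and rewrite_and_publish opens the interactive rebase and then force pushes.

```bash
#!/usr/bin/env bash
# Sketch of a pre-merge review loop. The base branch (origin/master) and the
# remote name (origin) are assumptions; pass your own base as $1.

# Shows the commits the branch adds on top of the base, oldest first,
# so they can be read like a story.
branch_story() {
  local base="${1:-origin/master}"
  git log --reverse --format='%h %s' "${base}..HEAD"
}

# Rewrites the branch interactively (reword, squash, fixup, reorder, edit),
# then publishes the rewritten history to the branch of the same name.
rewrite_and_publish() {
  local base="${1:-origin/master}"
  git rebase -i "$base" && git push --force-with-lease origin HEAD
}
```

--force-with-lease is used rather than a plain --force because it refuses to overwrite commits on the remote branch that your local remote-tracking ref does not know about, which matters if someone else has pushed to the branch.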

An example: git add --patch

I spent a lot of time thinking about what would make a good example of crafting clean history, as one could easily write a book full of examples. Writing example commit messages may not be that useful or interesting, and yet another interactive rebase tutorial is not needed either. So instead I decided to focus on something that I see more and more developers getting wrong: splitting changes.

Splitting changes that reside in a single file

Sometimes, while absorbed in their work, people introduce two separate changes in a single file. Later, while reviewing their commit messages and making sure their commit history is clean, they notice that one commit describes two distinct changes. So they decide to split it up. So far so good. The issue arises when the changes to be split are in the same file. That feels unnatural and hard to fit in one’s head, as it would mean that one file has to be both staged and unstaged at the same time. How could that even be possible? Let’s see.

Once we are done with our changes, we stage and then commit them. However, we may need more control over staging, as plain git add can be coarser than we need. git add --patch to the rescue. It presents an interactive prompt that asks you how to deal with each change: your changes are split into hunks, and for each hunk Git asks whether to stage it or leave it behind. Type ? at the prompt to see all available commands with their descriptions.

The most common ones, which you will want to remember and use regularly, are:

  • y => stage hunk
  • s => split hunk
  • n => leave unstaged

So far so good. You use those three commands to split your changes like a pro. But what if the lines you want to split are too close together and Git does not know how to split them? For example, you have:

this line goes into commit #1
this line goes into commit #2

Git does not know how to split this “hunk”, as the lines are adjacent and to Git they look like a single hunk. At this point you need something like a secret editor where you can tell Git manually, line by line, what goes into staging. Luckily, Git provides such an editor. Simply hit e and you enter edit mode, where you can edit what is staged and what stays unstaged line by line. In short: replacing the leading “-” of a line with a space keeps that line, i.e. its deletion won’t be staged, while deleting a “+” line means that addition won’t be staged. The help is inlined at the bottom of the editor, so you don’t need to remember this; I don’t, and could easily have messed it up here :)

# To remove '-' lines, make them ' ' lines (context).
# To remove '+' lines, delete them.

To check whether you have staged the right changes, run git diff --cached. If everything looks good, make the first commit from what you have staged, then proceed in the same way with the next set of changes. And please don’t freak out if you see the same file as both staged and unstaged. That’s normal :)
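A quick way to see the “staged and unstaged at the same time” state for yourself: in this throwaway Bash demo, the first edit is staged and the second is made afterwards. git status --short then reports MM for the file (modified in the index and in the working tree), and git diff --cached and git diff show the two halves separately. The file name is made up.

```bash
#!/usr/bin/env bash
# Throwaway demo: one file, staged and unstaged at the same time.
set -eu

cd "$(mktemp -d)"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "existing line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "this line goes into commit #1" >> notes.txt
git add notes.txt                                   # stage the first change
echo "this line goes into commit #2" >> notes.txt   # second change stays unstaged

git status --short   # MM notes.txt: modified in the index and in the working tree
git diff --cached    # what the next commit will contain
git diff             # what is left for the commit after that
```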

Takeaway

Craft your commits as you craft your code. Keep them small, focused and intention-revealing.

References: