Git will save your life, exhibit number 3847568


Today I had to revert some research data to an older version for the revision of a paper that is currently under review. Unfortunately, we have since updated the data and extracted more, so the original dataset is no longer directly available. This is a big issue, right?

Wrong: thankfully, the whole data repository is under version control (with Git), with the entire data cleaning history tracked and stored.

I have been preaching for a long time now that version control will (eventually) save your life, research-wise, and I am glad to report exhibit number 3847568 in support of the idea that it actually does!

Now, let’s see how I restored the old version I needed for this revision. First, note that I did not want to reset the history (e.g., with the git reset command), as we need to keep the new data: I just needed to access the data as it was at an earlier point in time.

Here is a visual representation of the current status of the repository:

Screenshot 1

Note that this is not the actual repository; it’s just an example created ad hoc. We want to access the data at the point in time corresponding to the highlighted commit (identified by the abbreviated hash 589c3aa):

Screenshot 2

We do this by using Git branches, which are cheap to create and maintain (and flexible!). Specifically, we use the git checkout command, which is generally used to switch between branches; here, we use it to create a new branch pointing to the 589c3aa commit and switch to it in one step.

The command we need to use is:

git checkout -b old-data-for-revision 589c3aa

Note that -b is followed by the name of the new branch to be created and then by the hash of the commit the branch should start from. After running this command, the status of the repository will be:

Screenshot 3
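If you would like to try the command without touching real data, here is a minimal sketch that rebuilds a similar situation in a throwaway repository. The file name data.csv, its contents, and the commit messages are made up for illustration, and the variable old_commit stands in for the 589c3aa hash.

```shell
set -eu

# Throwaway repository, so the sketch never touches real data.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"

# First commit: the dataset as it was originally (data.csv is hypothetical).
printf 'id,value\n1,10\n' > data.csv
git add data.csv
git commit -q -m "Original dataset"
old_commit=$(git rev-parse --short HEAD)   # plays the role of 589c3aa

# Second commit: the dataset after more data were extracted.
printf '2,20\n' >> data.csv
git commit -q -a -m "Add newly extracted data"

# Create a branch at the old commit and switch to it in one step.
git checkout -q -b old-data-for-revision "$old_commit"

# The working tree now holds the original two-line file again.
cat data.csv
```

On Git 2.23 or later, git switch -c old-data-for-revision followed by the commit hash should do the same job as git checkout -b.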

…and there you go! What is nice about this setup is that if we want to switch back to the updated dataset, all we need to do is run git checkout main: nice and easy. And if we had more data versions to keep track of, we could repeat the above for each of them, creating one branch per version that we could check out whenever needed.
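The idea of keeping one branch per data version can be sketched as follows, again in a throwaway repository. The branch names data-v1 and data-v2 and the file data.txt are hypothetical; the sketch uses git branch with a start point, which creates a branch without switching to it.

```shell
set -eu

# Throwaway repository with three successive versions of a data file.
repo_dir=$(mktemp -d)
cd "$repo_dir"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Demo"
git config user.email "demo@example.com"

for version in v1 v2 v3; do
    echo "$version" > data.txt
    git add data.txt
    git commit -q -m "Data $version"
done

# One branch per older version, created without leaving main.
git branch data-v1 HEAD~2
git branch data-v2 HEAD~1

# Visit an old version, then come back to the latest data.
git checkout -q data-v1
seen_on_v1=$(cat data.txt)
git checkout -q main
seen_on_main=$(cat data.txt)

echo "data-v1: $seen_on_v1, main: $seen_on_main"
git branch --list
```

Because a branch is only a pointer to a commit, keeping several of them around costs next to nothing, and main is never modified along the way.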

That’s it from me for today, but before we wrap up, repeat after me, kids:

You never know when (or if) you will need Git, but when you do, you’ll be glad you invested the time!

I mean, I can only imagine the headache and the waste of time it would have been if I had had to reconstruct the old data manually, not to mention all the possible issues with reproducibility!