Thursday, 16 February 2012

Fork First or Just in Time?

I’ve been following the progress of Code 52 a bit over recent weeks. It is an audacious attempt to create a new open source project every week for a year. They also seem willing to accept code contributions, so I was tempted to download a few of their projects and make some minor improvements.

I’ve been using Mercurial for my projects at CodePlex for some time now, but I hadn’t used git in anger. Since Code52 store all their projects on the very impressive GitHub, this gave me a good excuse to learn.

I read a few tutorials on how to fork a repository on GitHub. The official guide is good, and Scott Hanselman wrote a great post on how to contribute to Code 52 projects. But one thing they all have in common is a workflow of Fork, Clone, Commit, [optional: Pull & Merge], Push, Pull Request. This has always struck me as being the wrong way round. (CodePlex recommends essentially the same workflow for Mercurial.)

Fork First

The reason I don’t like this workflow is that it assumes the first thing I want to do is create a fork. But that’s not how I typically interact with an open source project. My workflow goes like this:

  1. I come across a new open source project and maybe I find it interesting
  2. Often I will just want to download compiled binaries, but maybe I want to explore the code to see how it was implemented
  3. I clone it (git/hg clone) and maybe I will get round to playing with it later
  4. I attempt to build it locally and maybe it succeeds on my machine (surprising how often it doesn’t)
  5. I attempt to use it and maybe I find a bug or I wish it had a new feature
  6. I report the bug or feature to the developers, and maybe I think I could fix it myself
  7. I explore the source code, and maybe I understand it well enough to make a change
  8. I begin coding a fix/feature, and maybe I get it working
  9. I realise my code needs cleaning up before I issue a pull request, and maybe I get round to doing so
  10. If I have made it this far, now is the time I am ready to push to a public fork and issue a pull request. I estimate I get to this step on less than 1 percent of open source projects I come across.

As you can see, it is only at step 10 that I need a fork, but the tutorials all want me to make one at step 3. This results in lots of projects having multiple forks that have never been pushed to, or that have been pushed to without a pull request ever being submitted, leaving you wondering what the status of the changes is.

Just in Time Fork

In my opinion, forks (which are really just publicly visible clones) should be made just in time. Currently, Code 52’s Pretzel project has 47 forks, and as far as I can tell, many (most?) of them have had no changes pushed to them at all. (In fact, a nice GitHub feature would be to hide forks that have not been pushed to yet, and to highlight forks that have pull requests outstanding.)
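If you want to check for yourself whether a fork contains anything new, one rough way is to add it as a remote and count the commits that are reachable from its branches but not from the main repository's. The sketch below is only an illustration: it builds throwaway local repositories in a temporary directory to stand in for the main repository and two forks, and the names idle, busy, some-fix and unique_commits are all made up for the demo.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# Stand-in for the project's main repository (hypothetical, local only).
git init -q "$work/upstream"
echo "v1" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "Initial commit"

# Two stand-in forks: bare copies of the main repository.
git clone -q --bare "$work/upstream" "$work/idle-fork.git"
git clone -q --bare "$work/upstream" "$work/busy-fork.git"

# A private clone, with both forks registered as extra remotes.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
git remote add idle "$work/idle-fork.git"
git remote add busy "$work/busy-fork.git"

# Give the busy fork one commit of its own.
git checkout -q -b some-fix
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "A fix"
git push -q busy some-fix

git fetch -q --all

# Count commits reachable from a fork's branches but not from origin's.
unique_commits() {
    git rev-list --count --remotes="$1" --not --remotes=origin
}
idle_count=$(unique_commits idle)
busy_count=$(unique_commits busy)
echo "idle fork: $idle_count commit(s) of its own"
echo "busy fork: $busy_count commit(s) of its own"
```

A count of zero means the fork has nothing that the main repository doesn't already have.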

The just-in-time fork workflow isn’t difficult. First, clone from the main repository. Think of this clone as your private fork if that helps.

git clone

Once you decide to make some changes to your repo, you can make a branch to work on (not strictly necessary, but recommended).

git checkout -b my-new-feature

Now work away on your feature. You can pull in changes from the master repository, and optionally merge them into your working branch whenever you like.
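Here is a rough sketch of what pulling in upstream changes can look like. So that it runs anywhere, it builds a throwaway stand-in for the main repository in a temporary directory; the paths, file names and commit messages are invented for the demo, and it assumes the main branch is called master.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# Stand-in for the project's main repository (hypothetical, local only).
git init -q "$work/upstream"
# Name the initial branch master regardless of local git defaults.
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "v1" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "Initial commit"

# Private clone with a feature branch, as described above.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
git checkout -q -b my-new-feature
echo "my change" > feature.txt
git add feature.txt
git commit -q -m "Start my feature"

# Meanwhile, the main repository moves on.
echo "upstream change" > "$work/upstream/NEWS"
git -C "$work/upstream" add NEWS
git -C "$work/upstream" commit -q -m "Upstream work"

# Pull in the new upstream commits, then (optionally) merge them
# into the working branch.
git fetch -q origin
git merge -q --no-edit origin/master
```

The fetch on its own is harmless: it only updates origin/master, so you can look at what has changed before deciding whether to merge.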

Once you are sure that you want to contribute to the project, create your public fork on GitHub, then add it as a remote:

git remote add myfork

You can now easily push to your GitHub fork. I think it is probably best to push to a feature branch on the fork too. That way, if you later want to contribute another, unrelated feature, you can do it in a separate branch and have two outstanding pull requests that don’t depend on each other.

git push myfork my-new-feature

The GitHub GUI makes it very easy to issue a pull request from a branch.
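To try the last few steps without touching GitHub, here is a minimal sketch that uses a local bare repository as a stand-in for the public fork. The paths, the second branch name another-feature and the file contents are made up for the demo, and it again assumes the main branch is called master. A real fork would be added by its GitHub URL rather than a local path.

```shell
#!/bin/sh
set -e
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
work=$(mktemp -d)

# Stand-in for the project's main repository (hypothetical, local only).
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "v1" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "Initial commit"

# Private clone, made long before any fork exists.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# First feature, branched from master.
git checkout -q -b my-new-feature master
echo "feature one" > one.txt
git add one.txt
git commit -q -m "Add feature one"

# Unrelated second feature, also branched from master.
git checkout -q -b another-feature master
echo "feature two" > two.txt
git add two.txt
git commit -q -m "Add feature two"

# Only now create the "fork" (a local bare copy standing in for a
# GitHub fork) and register it as a remote.
git clone -q --bare "$work/upstream" "$work/myfork.git"
git remote add myfork "$work/myfork.git"

# Push each feature to its own branch on the fork.
git push -q myfork my-new-feature
git push -q myfork another-feature

# List the branches now present on the fork.
git ls-remote --heads myfork
```

Because both branches start from master, each one carries only its own commit, so a pull request from one does not drag in the other.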


Why create dozens of unused forks when it is straightforward to create them at the point they are needed? Am I missing some important reason why you shouldn’t work like this? Let me know in the comments.
