Developer Drain Brain

November 27, 2009

Gated Checkins Revisited

Filed under: Development — rcomian @ 4:31 pm

So I’ve had a play with gated checkins in 2010 and I have to say I’m reasonably impressed. Not massively impressed, but reasonably.

I can finally see how the implementation may actually help. It pushes a lot of the problems back to the developer in a way that’s offline, so that the problems aren’t sorted out by mashing trunk, you just have to mash your workspace instead. And mash it you may.

First and foremost, the thing that wasn’t clear to me is that the checkin process is properly separated into two stages. Two things normally happen on checkin:

  1. Your changes are collated and sent to the server.
  2. Your workspace is updated to mark your files as no longer changed by updating the ‘base’ revisions of the files in your workspace.

This second part is basically like a get operation on your changes and is what is delayed when gated checkins are used. With gated checkins, the process is much more complex:

  1. Your changes are collated and shelved.
  2. A build job is queued against the shelveset.
  3. The shelveset is merged onto trunk.
  4. The merged result is built.
  5. The shelveset is merged with trunk again (it could have moved on whilst the build was going on) and committed.
  6. An agent on the client machine notices the commit and runs around your working copy marking files as being at the new revision. No changes appear to be made to your code during this step.

It was step 6 that was the missing part of the story for me. The obvious way to do it is just to hold open the commit, but 2010 makes it a separate job with its own monitoring process. After commit, whilst the build is going on, by default your workspace still shows all your files as having been changed. You can continue to work, and it looks like you’re coding on top of the changes you just sent up. When the commit completes, you get the option to reconcile; all the relevant changes disappear and diffs suddenly show the changes since the commit, instead of the changes including the commit.
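The steps above can be sketched as a toy pipeline. This is purely an illustrative model — the function and names are invented, and plain dictionaries of path → content stand in for shelvesets and trunk; it is not the real TFS API:

```python
# Toy model of the gated checkin pipeline. build_ok stands in for
# whatever build and checks the team has defined.

def gated_checkin(pending, trunk, build_ok):
    shelveset = dict(pending)            # 1. collate and shelve the changes
    # 2. (a build job would be queued against the shelveset here)
    candidate = {**trunk, **shelveset}   # 3. merge the shelveset onto trunk
    if not build_ok(candidate):          # 4. build the merged result
        return None                      #    failure leaves the shelveset intact
    committed = {**trunk, **shelveset}   # 5. re-merge (trunk may have moved on)
    return committed                     # 6. commit; the client reconciles later

trunk = {"a.cs": "v1"}
result = gated_checkin({"b.cs": "v1"}, trunk, build_ok=lambda files: True)
print(result)  # {'a.cs': 'v1', 'b.cs': 'v1'}
```

The important property the sketch captures is that a failed build returns without touching trunk, leaving the shelveset for the developer to deal with.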

Obviously, since there are more steps, there are more opportunities for failure. Failure at any point leaves the shelveset intact, and you’re notified via the daemon that watches the build on your workstation. Failures include anything other than a truly trivial merge in steps 3 or 5, and of course, any failures you define in the build.

Things can get a little messy on failure. Your first option is to simply ignore it. If you’ve carried on working, your workspace still shows the pending changes from the commit plus the changes you’ve made since, so you can just push up a larger commit next time. Your other option is to try to fix the actual problem. To do this, it feels like you’d typically shelve your current changes and revert them, then unshelve the failed commit. Fix it as you normally would and commit again, at which point you can switch back to the shelveset you originally made and carry on where you left off. Do this, however, and reconciliation will fail, leaving you to sort out the mess manually.
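Assuming the tf.exe command line, the fix-it-properly route might look roughly like this. The shelveset names are invented and the option spellings are worth double-checking against `tf help` — this is a sketch of the sequence, not a recipe:

```shell
# Hedged sketch of recovering from a rejected gated checkin with tf.exe.
# Run from inside a mapped workspace; shelveset names are illustrative.
TF=${TF:-tf}   # the tf client; overridable so the sequence can be dry-run

recover_failed_gate() {
  failed="$1"    # the shelveset TFS kept when the gated build failed
  current="$2"   # a name to park your in-progress work under
  "$TF" shelve "$current" /replace   # park what you're working on now
  "$TF" undo /recursive .            # clean the workspace back to base
  "$TF" unshelve "$failed"           # bring the rejected changes back
  # ... fix the build break here, then:
  "$TF" checkin /comment:"Fix gated build break"
  "$TF" unshelve "$current"          # carry on where you left off
}
```

Note the last step is exactly where the article’s warning applies: unshelving your parked work on top of the fixed commit is the point at which reconciliation can no longer help you.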

The second option gets quite messy, since the code you committed has diverged from the code you’re working on. It can’t be resolved by the reconciliation process, and you need to sort it out yourself with a get latest. This leaves you to resolve the merges in the normal way. Now you’re merging fixed code with unfixed code, so it’s up to you to remember where the fixes were and ensure that they get merged in properly. Good luck.

By default the commits are queued sequentially, but I can see it would work just as well by building everything in parallel, since any merge conflict at any point will cause a commit failure (and the related heart-strain). I still maintain that this will be a major issue if anything makes the queue length build up, but I’m generally more optimistic that it could work. I did get into a bit of a mess trying to fix a shelveset in some scenarios, especially when the reconciliation failed.

Essentially, you really need to exercise the third option: stop work and wait for the build to complete before you carry on. In reality, that’s what tends to happen with CI anyway. Most CI systems work in what TFS calls Rolling Builds – meaning there’s no queue, so you have to wait a maximum of 2x the build time to see if your commit is good, unlike gated checkin. Also remember that even though you’ve resolved conflicts before commit, if something was queued up before you, that may still conflict and your otherwise perfectly good change will get rejected anyway.
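The 2x figure comes straight from the worst case: a rolling build starts just before your commit lands, so you wait for it to finish and then for the next build, which does include you. A quick back-of-envelope check (the numbers are illustrative):

```python
# Worst-case wait under rolling builds: the in-flight build (which just
# missed your commit) has to finish, then the next build picks you up.

def max_wait_minutes(build_minutes, in_flight_elapsed=0):
    remaining = build_minutes - in_flight_elapsed  # finish the current build
    return remaining + build_minutes               # then build with your change

print(max_wait_minutes(15))      # 30 -- 2x the build time if one just started
print(max_wait_minutes(15, 10))  # 20 -- less if the current build is nearly done
```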

So keep your builds quick and your queues short and gated checkin works quite nicely.


November 23, 2009

Gated Checkins

Filed under: Development — rcomian @ 2:21 pm

Back when I was working with subversion, I was at one point writing a hook script to check that a given commit was good. You know the kind of stuff: Have you added a comment, are you trying to change a tag, are you trying to hack the server, have you brushed your teeth, and so on.
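A pre-commit hook along those lines boils down to a script that inspects the in-flight transaction and exits non-zero to reject it. Here’s a sketch — the policy and messages are illustrative, but the mechanism (Subversion passes the repository path and transaction id as arguments, and the log message comes from `svnlook`) is the standard one:

```shell
# Sketch of an svn pre-commit check: reject empty commit comments.

check_log_message() {
  msg="$1"
  # Reject comments that are empty or whitespace-only.
  [ -n "$(printf '%s' "$msg" | tr -d '[:space:]')" ]
}

# In the real hook script, the message comes from svnlook:
#   REPOS="$1"; TXN="$2"
#   MSG="$(svnlook log -t "$TXN" "$REPOS")"
#   check_log_message "$MSG" || { echo "A commit comment is required" >&2; exit 1; }
```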

It’s a fairly obvious thought when you’re in that environment to think … is this going to build? After all, here’s the perfect opportunity to find out, and if it doesn’t we can just reject the commit and it’ll be as if it never happened – no-one else gets affected, nothing bad polluting the trunk, a better chance of having a 1:1 workitem/changeset ratio. It’s all good.

It turns out that this idea is called gated checkin, and it’s available as a checkbox in TFS 2010. But despite my glowing report, I’m yet to be convinced of gated checkin as something that isn’t just a nuisance, compared with CI. I rejected it on the subversion side of things because our builds always took forever, and whilst a build was being checked, no-one else could commit. Not only that, but because I used a single repository for everything, it meant that no-one else could commit even if they were on a different project entirely. Thinking about this differently, this could be the best argument I’ve found for using individual repositories per project with subversion.

It’s much the same argument with TFS – the checkin is held open until the build and any other checks you use have completed, preventing anyone else from doing anything commit related. I mean they can’t even resolve any commit conflicts reliably, because they may or may not need to incorporate the changes that are being tested. This means your builds must be quick. This means they must be small. This means the projects must be small. Allegedly gated checkin becomes a requirement when you have >~60 people on a branch, as just by the laws of statistics, the branch is more often broken than healthy. Having 60 people on a single branch doesn’t sound small to me. If you’ve got 60 developers checking in every 2 hours, that’s a commit every 2 minutes. That’s your timeslot before you start impinging on developer productivity.
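The arithmetic behind that timeslot is simple enough to check:

```python
# 60 developers each checking in every 2 hours works out to one commit
# every 2 minutes across the branch.

developers = 60
checkin_interval_minutes = 120          # each developer commits every 2 hours

commits_per_hour = developers * 60 / checkin_interval_minutes
minutes_between_commits = 60 / commits_per_hour
print(commits_per_hour, minutes_between_commits)  # 30.0 2.0
```

So any gated build that takes longer than two minutes means the queue grows faster than it drains.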

I’m not sure what the solution to this is. Apparently gated checkin is the solution to world hunger. Personally I prefer the idea of having fewer people on a single branch and merging more. Split the project into components and release them separately. Anything, really. But I guess I’m going to have to try these before I can make a decision for sure. After all, they’re easy to set up and take out. If the builds are fast enough, they’ll certainly make the branch history cleaner, and surely that’s a good thing.

Change tracking in TFS

Filed under: Development — rcomian @ 2:16 pm

One of my main bugbears with source control is knowing where my changes are. I don’t mean just having a good idea, I mean actual empirical evidence that I can look at and easily convince someone else that changes have been moved from the project branch to trunk, and from trunk to release.

Now this is always possible by looking at the code, but that just doesn’t scale – it can easily take days to work out what’s in a particular release, and by the time that’s finished the world will have changed anyway. We need something high level, something that says this branch contains these changes from these locations. TFS goes some way to addressing this with a nifty visualisation in the new 2010 suite. Given a changeset, you can follow the merge history of that changeset across the branches of your project.

This is excellent if you’re looking at a particular change and want to know where it is. You can easily see that it has flowed in the right direction, and can see how much of a mess you’re getting yourself into. This visualisation also handles partial merges by showing them in yellow for the time that they’re partial (hopefully they’ll get fully merged after a while, letting them go green).

If you want to know where a particular change has made it to, this view is perfect. In my experience, however, we rarely want to know where a single changeset has gone; we’re much more interested in where a workitem has gone. Although we’d all love to think that there’s a 1:1 correspondence between workitems and changesets, it just ain’t gonna be so, regardless of how you work. MS are working on this, of course, although I’m not holding my breath for the initial 2010 release.

The other problem is again one of scale. Whilst this view is very useful, one thing that we need for a release is a report on all the items which have been merged into this release from trunk (and conversely, what’s missing). Ideally this should be at a workitem level as well, but a changeset level report is a good start. Subversion has this with its mergeinfo command. TortoiseSVN shows each revision and greys out the ones that have already been merged. Git has an incredibly detailed view of all the changes that were made on parallel branches. I’m not sure if that’s any use in telling you what’s missing, but it’s wonderful for knowing what’s in. These all work at the changeset level, however.

I haven’t found anything equivalent in TFS yet. The information is available in TFS, though, so it should be possible to build such a report manually.
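At the changeset level, the report itself is just set arithmetic over merge history. Here’s a toy model – the names are invented and it has nothing to do with the actual TFS or svn data schemas, but it shows the shape of the report:

```python
# A mergeinfo-style report as set arithmetic: which source-branch
# changesets have made it into the target branch, and which are missing.

def merge_report(source_changesets, merged_into_target):
    source = set(source_changesets)
    merged = sorted(source & set(merged_into_target))
    missing = sorted(source - set(merged_into_target))
    return merged, missing

trunk_changes = [101, 102, 103, 104]
in_release = [101, 103]
merged, missing = merge_report(trunk_changes, in_release)
print(merged, missing)  # [101, 103] [102, 104]
```

The hard part in practice is populating the two lists reliably, which is exactly what the merge-tracking metadata in each tool is for.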

All of this tracking is of no use, however, if your changesets mean nothing more than “it was Friday night and we were preparing for the weekend”. Also, a commit comment of “some changes” isn’t going to be useful for working out which changesets you’re looking for. If you’ve never felt the need to be disciplined in making a changeset a single unit of functionality before, perhaps this can inspire you as to why it might be a good idea.

Branching in TFS

Filed under: Development — rcomian @ 1:51 pm

Some of you may know that I’m a strong advocate for using some of the more flexible branching strategies available to us developers, given the advances in source control over the last 20 years. TFS 2010 is adding in some extra branching support that should make this more compelling.

In short, TFS 2010 knows about branches explicitly. That is, you tell it “this folder is a branch” and it can track things around in a nice visual form. You start out with marking your trunk/mainline branch, then branching from that. Visual studio tracks what gets branched where and can show you the relationships between branches in a nice, simple diagram.

This shows the logical structure of the branches. Here we can see that 3 branches came from Dev and an additional branch was made from “Brian”. The logical structure shows where branches came from and, conversely, where changes need to be merged back to. The actual meaning is branch dependent, of course – release branches would have changes merged into them from trunk, development branches would have their changes merged into trunk.

The physical structure is where the branches are located in Team Explorer, and can be completely different from the logical structure. The recommendation is that folders are made to hold branches of a common type, so that the physical structure tells you the reason for a branch’s existence – private branches, feature work, stable release branches, etc. Each branch can have a description, which is useful for finding out detail about an individual branch, but grouping them in folders makes sense.

This screenshot shows the branch structure in physical form. Notice that branches look different to folders. Also notice that here the Dev and QA branches are siblings in the physical structure, whereas in the logical diagram QA is a child of Dev.

Branches can be made quite easily by dragging and dropping from the logical diagram or right-clicking a branch on the physical diagram and choosing “Create new branch”. The whole branch process has been streamlined as well. Apparently old TFS checked out all the files onto your machine and waited for you to check them in as a pending change – a HUGE waste of time, effort and bandwidth. Now it just makes the branch on the server and you can check it out when you see fit.

