Developer Drain Brain

November 17, 2010

A Journey Through Source Control. Part 2. Jumping to the 3rd Generation

Filed under: Development — rcomian @ 1:27 pm

Ok, I admit it, most people just don’t care about source control, and why should they? It’s a tool that they use to get things done, and so long as they don’t lose their work, who cares about anything else?

I think it’s because of this disinterested mindset that a lot of people completely missed what the 3rd generation of tools were about. And it’s not too suprising, advertising like “ACID compliant transactions – never partially commit a set of changes again!” aren’t exactly hinting at the revolutionary upheaval that had happened on the server side, it just sounds like CVS in a database.

But what happened was that the history in the repositories was completely pivoted around. CVS, VSS, et al would keep a single tree of files, and each file would have it’s own history. SVN, TFS, etc, don’t keep a set of files, they keep a set of trees. Each revision in the repository represents the state of every file in the entire source control tree at that point in time. Think of a stack of playing cards, each with the entire source control tree drawn on it. Each time you commit a change to any file in the repository, an new card is drawn up with the previous tree + your changes and put on the top of the pile. This means that the entire source control in the repository can be reliably retrieved by giving just one number – just tell the system what card you want and you can get it.

No longer do we need hacks like ‘labels’ to make sure that everything is kosher and retrievable, everything is labelled with each and every change as an automatic by-product of how the repository works. Of course, we don’t lose the ability to view the changes that were made to a particular file: given a path, we can diff the contents of the file in one revision with the contents in the next and get the changes just like we could before. But we also get a brand new ability – by getting the difference between one tree and the previous tree, we can see all the files that were changed at the same time – and it’s this set of diffs that makes up a ‘changeset’.

It’s this natural creation and handling of changesets that gives us a much better ability to migrate changes around the repository. Say I have two identical copies of a file in two places in the repository. If I make a change to file A, I can retrieve the changeset at any time I like, this results in a diff, and I can apply this diff to file B whenever I feel like it. Expand this concept to directories and work with changesets instead of individual diffs, and we’ve got the basis for a very flexible and simple branching and merging system. Branching is simply making a copy of a directory in another location, merging is simply getting the changesets that were applied to one directory, and applying them to the other. This branching and merging can all be tracked within the repository itself and voila, we have a modern 3rd generation system with full branching support.

So now we know what the differences between the 2nd & 3rd generation are, we’ll address in the next post some of the pitfalls that we fall into when we try to think of a 3rd generation system in terms of a 2nd generation system.

And don’t worry, I’ve not forgotten about the 4th generation, that’s coming, but since it builds on what the 3rd generation gives us, I think it’s important to square that away first.


Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at

%d bloggers like this: