Developer Drain Brain

December 21, 2009

How to reference shared code

Filed under: Development — Tags: — rcomian @ 12:42 pm

One of the things that isn’t entirely obvious with source control is how to go about managing shared code and libraries.

The point is to keep your build stable and clean throughout its lifecycle whilst making life as easy as possible when releasing, branching and merging.

Keep in mind that the main tenet of good source control is:

The ability to get any version of a project from a single location and build it with one command.

If we can do this, it’s an indication that we’ve got good control over our source. It’s kind of a litmus test – not a goal in itself, but if we can do this, then we will have automatically enabled a whole host of strong abilities.

In some respects, this goal will never be achievable. We’ll always be dependent on some aspects of the developer environment. At the very minimum, we’ll need the source control client and the base OS to be available through some other mechanism before we can do this. We’ll probably also need the compiler and some of the more common libraries and header files that go with it. But if this environment is small, well documented, stable and backwards compatible, then it really won’t be too difficult to manage.

But once the environment is sorted, we then need to find a way to manage the softer dependencies of our projects. The main ways of handling these are:

  1. Have everything in 1 huge project.
  2. Branch shared projects into each project as required.
  3. Reference shared projects through some mechanism like svn:externals.
  4. Install dependencies separately into the environment.

Each method has its pros and cons. The numbers below match the options above:

Pros:

  1. Simple to understand. Everything is always available.
  2. Have strong control over the version of the dependencies being used. Can make controlled local customizations to the dependencies.
  3. Lightweight. Strong documentation in all cases over what uses what. Can switch to unreleased versions to make library changes in parallel to project changes.
  4. Easy to understand. No reliance on source control mechanisms. Integrates well with platform abilities (i.e. registered COM objects, plug-ins, registry settings, etc.).

Cons:

  1. Breaking changes immediately affect all users of a library. History for a project does not include useful references to the changes in its libraries. Tendency for projects to become tightly coupled. Hard to know what uses what. Stability means getting and building everything with one checkout. Difficult to reliably branch just a single project. No real definition of project boundaries.
  2. Updating to new versions of libraries can become laborious. Can come to rely on customizations when not appropriate. Usually reliant on a single location to merge changes from (i.e. a “latest version” branch), which can make having parallel versions interesting. It can get confusing to tell exactly what library version is being used when merges come from different places.
  3. Difficult to customize libraries. Needs source control support. Need to be aware of the stability of what you’re referencing. Tendency to recurse dependencies and cause conflicts and circular references.
  4. Installing dependencies is a manual process. Need to be told when dependencies have changed. Installation/Upgrade/Removal is a heavyweight activity. Complicates release of dependencies. Installations are machine wide, which makes working on parallel projects/versions difficult. Locating source code can be awkward.

As with most things, you’ll probably use a mixture of all the techniques in real life. For example, having a compiler or IDE installed is an example of using option 4. Merging two or more projects together into a larger release unit is an example of option 1. Use of vendor branches as defined in the Subversion book is option 2. Almost every use of externals is option 3. Of course, the trick is to find the appropriate times to use each technique and to know the technical difficulties involved with each.

When faced with the issue of shared libraries, I tend to favour option 2 for its tight control over exactly what gets released. However, it’s a fairly heavy-weight solution and only really comes into its own when customizations have to be made on a per-project basis. Option 3 provides a lighter weight solution that provides the same guarantees for the stability of builds. The underlying idea with option 3 is that there is some setting that’s kept alongside the source for a project which indicates what dependencies are required and where to put them.

The biggest problem is the requirement for source control support to make use of this setting. Subversion has this built in with the svn:externals property. TFS has the ability to check out multiple locations into a defined tree structure, but provides no built-in mechanism to store this information in a relative form alongside the source code. It isn’t too hard to provide tooling to do this, however, and a well defined XML file will serve exactly the same purpose as the svn:externals property. How to do this reliably with distributed systems like Mercurial and Git is one of the things that I just don’t understand about those systems; possibly option 2 works better in those environments.
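To make the XML file idea a bit more concrete, here is a minimal sketch in Python of what reading such a file might look like. The manifest layout, the element and attribute names, and the parse_manifest helper are all invented for illustration; they are not part of TFS or any other tool. The tooling that actually performs the checkouts is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical manifest kept alongside the project source.
# Every element and attribute name here is an assumption made for this sketch.
MANIFEST = """\
<dependencies>
  <dependency name="logging" source="shared/logging" version="tag:1.2.0" target="libs/logging"/>
  <dependency name="netutils" source="shared/netutils" version="changeset:4512" target="libs/netutils"/>
</dependencies>
"""


@dataclass(frozen=True)
class Dependency:
    name: str     # human-readable name of the shared project
    source: str   # where the shared project lives in source control
    version: str  # the stable reference to fetch (tag, label, changeset...)
    target: str   # where to put it, relative to the project root


def parse_manifest(xml_text):
    """Read the dependency manifest and return a list of Dependency entries."""
    root = ET.fromstring(xml_text)
    deps = []
    for node in root.findall("dependency"):
        target = PurePosixPath(node.attrib["target"])
        # Targets must stay relative so the manifest works from any checkout location.
        if target.is_absolute() or ".." in target.parts:
            raise ValueError(f"target must be a relative path inside the project: {target}")
        deps.append(Dependency(
            name=node.attrib["name"],
            source=node.attrib["source"],
            version=node.attrib["version"],
            target=str(target),
        ))
    return deps


for dep in parse_manifest(MANIFEST):
    # A real tool would ask the source control client to fetch each entry here.
    print(f"{dep.source} @ {dep.version} -> {dep.target}")
```

Because the file is checked in with the project, it gets tagged, branched and merged along with everything else, which is the property that matters.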

With option 2, it’s pretty obvious that when you tag, branch or merge a project, the required changes to the dependencies are also tagged, branched and merged. There’s no need to branch and then stabilise the references; the code is already the right version. Option 3 is slightly more confusing, but if you think of option 3 as a shortcut for option 2, it becomes pretty clear. To get control over your builds, the references must always point to a stable target: a tag, a label, a specific changeset, or whatever stable reference the source control provides. If you do this, then you can be sure that whenever a version of a project is retrieved, the correct version of the dependencies will be retrieved as well and the right things will get built. This should be something you can guarantee for every branch you care about from a release perspective, which at a minimum will be trunk and all maintenance branches. Then, when you label up a branch as a release, you can be sure, with no further work, that you’ve labelled everything that’s required for that build.
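The stable target rule is easy to check mechanically before a release. The sketch below, in Python, looks at externals definitions written in the Subversion 1.5+ form ("[-r REV] URL[@PEG] PATH") and flags any that are not pinned. It assumes that anything under a /tags/ path is never modified, which is a repository convention rather than something Subversion enforces. The function names and the sample definitions are made up for the example, and the older "PATH URL" externals form is not handled.

```python
import re

# A peg revision such as "^/libs/netutils/trunk@4512".
PEG_REVISION = re.compile(r"@\d+$")


def is_stable(definition):
    """Return True if one externals definition points at a stable target."""
    tokens = definition.split()
    pinned = False
    if tokens and tokens[0].startswith("-r"):
        # Accept both "-r 4512" and "-r4512"; "-r HEAD" does not count as pinned.
        inline = tokens[0][2:]
        revision = inline or (tokens[1] if len(tokens) > 1 else "")
        pinned = revision.isdigit()
        tokens = tokens[1:] if inline else tokens[2:]
    if len(tokens) != 2:
        raise ValueError(f"expected 'URL PATH' in externals definition: {definition!r}")
    url = tokens[0]
    if pinned or PEG_REVISION.search(url):
        return True
    # Assumption: tags are treated as immutable in this repository.
    return "/tags/" in url


def unstable_externals(property_text):
    """Return the definitions in an svn:externals value that are not pinned."""
    unstable = []
    for line in property_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not is_stable(line):
            unstable.append(line)
    return unstable


EXTERNALS = """\
^/libs/logging/tags/1.2.0 libs/logging
-r 4512 ^/libs/netutils/trunk libs/netutils
^/libs/parser/trunk libs/parser
"""

for line in unstable_externals(EXTERNALS):
    print("not pinned:", line)
```

A check like this could run as part of the release process on trunk and the maintenance branches, while being left off for early development branches where an unstable reference is a deliberate choice.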

One advantage option 3 has over option 2 is that you can relax the stable version requirement when it’s appropriate. For example, you can tell source control to get the trunk for a particular dependency. For your working copy, this is no more taxing than using your source control to get the appropriate version: you don’t need to change and check in your dependency file/property. For unstable or early development branches, this unstable reference can be checked in, with the understanding that you are intrinsically breaking the build guarantee for that period on that branch.

Conflicts are also highlighted in a more useful way than with option 2. Say a branch is taken to upgrade from version 1.1 to 1.2 of a library, but in the meantime trunk starts making use of version 1.1.1. This will be highlighted as a conflict in your dependencies when you come to merge the branch in. At that point you can determine the correct version of the library that should be used (maybe 1.2.1) to include all features and bug-fixes that are relied on. With option 2, you get a potential mish-mash of versions which may or may not work, you may have to re-do conflicts which were already resolved when making version 1.2.1, and you no longer have the ability to definitively say what version of the library you’re using (it’s 1.1.1 with some 1.2 thrown in). It may not even be obvious to the person doing the merging that the conflicts are happening in a shared library, depending on the structure of your build environment.
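The 1.1 / 1.1.1 / 1.2 scenario is really just a three-way merge over a small table of version references. The Python sketch below shows the idea under the assumption that each side's references have been read into a plain dictionary of library name to version; the merge_pins function and the library names are hypothetical, and a real source control tool does the equivalent on the text of the property or file.

```python
def merge_pins(base, ours, theirs):
    """Three-way merge of dependency references.

    base   -- references at the point the branch was taken
    ours   -- references on the branch being merged into (e.g. trunk)
    theirs -- references on the branch being merged in
    Returns (merged, conflicts); conflicts maps a name to (ours, theirs).
    """
    merged, conflicts = {}, {}
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            chosen = o            # both sides agree
        elif o == b:
            chosen = t            # only their side changed it
        elif t == b:
            chosen = o            # only our side changed it
        else:
            conflicts[name] = (o, t)  # both changed it differently: a person must decide
            continue
        if chosen is not None:    # None means the dependency was removed
            merged[name] = chosen
    return merged, conflicts


base = {"netlib": "1.1", "logging": "2.0"}
trunk = {"netlib": "1.1.1", "logging": "2.0"}   # trunk picked up a bug-fix release
branch = {"netlib": "1.2", "logging": "2.1"}    # branch upgraded both libraries

merged, conflicts = merge_pins(base, trunk, branch)
print("merged:", merged)
print("needs a decision:", conflicts)
```

Here the logging upgrade merges cleanly, while netlib is reported as a conflict between 1.1.1 and 1.2. That is the point at which someone picks a single version that covers both sides, rather than ending up with a blend of the two.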

All in all, option 3 appears to be by far the most practical method of sharing code whilst keeping a sane track of your development environment. I strongly advocate this method wherever I can. But do be aware of the other options; they are very good and valid responses to other kinds of problems that we all face when doing development. But default to option 3.
