Thursday, April 30, 2009

How symbolic: on removing symlinks in Bazaar VCS

the weakest link

I have a strong affinity for distributed revision control systems, and my favorite has been Bazaar VCS (a.k.a., bzr). Like any piece of software, bzr has its quirks and shortcomings. Tonight, I encountered its rather tricky behavior when it comes to symbolic links (symlinks).

I keep my configurations under revision control, which lets me roll back changes when I inevitably break things, and set up home on a new system, even a remote one, quickly and easily. All was well until I discovered that when I naively placed my .vim/ directory under revision control, I added a ton of symlinks to files in /usr/share/vim/addons. These symlinks were present because I used Ubuntu's vim-scripts and vim-addon-manager packages to install these addons to my Vim profile; vim-addon-manager essentially just sets up symlinks to the addons, which are stored in /usr/share. It's a pretty reasonable system, actually, but it doesn't make sense to have these symbolic links stored in my branch. I can't guarantee that each system I work on will have the files the symlinks point to, so I thought it best to remove them. Therein I encountered a sticky issue with bzr: you really can't remove symlinks from its revision tracking easily.

I thought I could be clever and write a simple one-liner in Bash to remove all the symlinks presently tracked by bzr from further tracking, but still leave them on the file system (I still need the symlinks there, after all, or my Vim goodness will break).

for file in `bzr ls -V`; do          # use bzr ls -VR in later versions
    if [ -h "$file" ]; then          # see if the file is a symlink
        echo "Removing $file";
        bzr rm --keep "$file";       # remove from tracking, not the FS
    fi;
done

Okay, so I reformatted it for annotation, but trust me, it fits on one line. Anyway, I immediately encountered problems, getting this as output:

.vim/compiler/tex.vim
bzr: ERROR: Not a branch: "/usr/share/vim/addons/compiler/tex.vim/".
.vim/doc/NERD_commenter.txt
bzr: ERROR: Not a branch: "/usr/share/vim-scripts/doc/NERD_commenter.txt/".
.vim/doc/bufexplorer.txt
bzr: ERROR: Not a branch: "/usr/share/vim-scripts/doc/bufexplorer.txt/".
.vim/doc/imaps.txt.gz
bzr: ERROR: Not a branch: "/usr/share/vim/addons/doc/imaps.txt.gz/".
.vim/doc/latex-suite-quickstart.txt.gz
...

WTF? "Not a branch!?"

Okay, so, what happens here is that Bazaar de-references the symlink before attempting to remove it, which is not at all what I had in mind. Poking around Launchpad, you can find several bug reports regarding the way Bazaar deals with symlinks. The workaround solutions proposed in those—remove the symlink using rm—wouldn't work for me, because I needed to retain the actual symlinks on the filesystem.
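To make the distinction concrete, here is a minimal sketch of what "de-referencing" means at the filesystem level. It illustrates the general difference between looking at a link and looking through it; it is not Bazaar's code, and the names kind_of and demo are made up for this example.

```python
import os
import stat
import tempfile


def kind_of(path, follow_symlinks):
    """Classify path as 'symlink', 'directory', 'file', or 'other'."""
    # os.stat() looks through a symlink at its target;
    # os.lstat() looks at the symlink itself.
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return 'symlink'
    if stat.S_ISDIR(st.st_mode):
        return 'directory'
    if stat.S_ISREG(st.st_mode):
        return 'file'
    return 'other'


def demo():
    """Build a file plus a symlink to it, then inspect the symlink both ways."""
    base = os.path.realpath(tempfile.mkdtemp())
    target = os.path.join(base, 'target.txt')
    link = os.path.join(base, 'link.txt')
    with open(target, 'w') as handle:
        handle.write('addon\n')
    os.symlink(target, link)
    return (kind_of(link, False),
            kind_of(link, True),
            os.path.realpath(link) == target)
```

A tool that resolves the path first ends up operating on the target, which in my case lived under /usr/share and was certainly not part of any branch.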

At this point I had solicited the attention of Robert Collins, a.k.a. lifeless in #bzr on Freenode. When I told him the workaround wouldn't work for me, and that I'd need to write a script, he suggested I use WorkingTree.unversion() from bzrlib. Despite being a Python fanatic [understatement] and bzr's codebase being in Python, when I said "script", I meant "Bash script". It never occurred to me to actually write a Python script until he mentioned that. By the completion of the thought, though, I was digging into the codebase of bzrlib to figure out what to do.

My initial plan was to use os.walk() to move through the filesystem, os.path.islink() to identify the symbolic links, and then WorkingTree.unversion() to mark the files for removal from tracking. I ran into a problem, however, in that unversion() only accepts a list of file IDs, as specified by the bzr metadata. Robert pointed me towards a method called path2ids(), but I had trouble figuring out how I was going to give it the proper paths. os.walk() would let me construct absolute paths to files, but I really needed paths relative to the root of the working tree (e.g., .vim/compiler/tex.vim instead of /home/chris/shell-configs/.vim/compiler/tex.vim). I could see it was getting a little hairy, so I decided to dig a little further into the WorkingTree code and see if there was anything else I could use.
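For the record, the path half of that first plan can be sketched with the standard library alone. This is a rough sketch rather than what I ended up using: the function name relative_symlinks is made up for illustration, and skipping the .bzr control directory is my own assumption about what one would want. It only produces tree-relative paths; mapping those paths to file IDs would still be a separate step.

```python
import os


def relative_symlinks(root):
    """Return the sorted paths of all symlinks under root, relative to root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: the control directory is never of interest, so prune
        # it in place to stop os.walk() from descending into it.
        dirnames[:] = [name for name in dirnames if name != '.bzr']
        # os.walk() reports symlinks to directories in dirnames and symlinks
        # to files (or to nothing) in filenames, so check both lists.
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.islink(full_path):
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

Calling relative_symlinks('/home/chris/shell-configs') would give paths of the form .vim/compiler/tex.vim rather than absolute ones.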

What I discovered was the jackpot in the form of WorkingTree.walkdirs(), a method written precisely for what I needed: traversing the filesystem, identifying the file types (especially symlinks), and providing file IDs. Within a few minutes, I banged out a script that did exactly what I needed it to do, presented below.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# Copyright (c) 2009 Chris Lasher
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

"""
A simple script to go through a Bazaar repository and ruthlessly
remove all symbolic links (symlinks) from further tracking.

It's important to note that this will not actually remove the symlinks
from the physical filesystem. This is left to the user, if so desired.

"""

__author__ = 'Chris Lasher'
__email__ = 'chris DOT lasher AT gmail DOT com'

import bzrlib.workingtree
import os

tree = bzrlib.workingtree.WorkingTree.open(os.getcwd())
# use protection -- one-on-one action only
# (take the lock before entering try, so that the finally clause never
# tries to unlock a tree that failed to lock)
tree.lock_write()
try:
    symlink_ids = []
    for dir_info, file_list in tree.walkdirs():
        # dir_info[1] (the file_id) will be None if it's not under revision
        # control, so this will skip it if it's not
        if dir_info[1]:
            for file_data in file_list:
                # file_data[2] is the file type, and file_data[4] is the
                # file_id, the necessary specifier for removing the file
                # from revision tracking
                if file_data[2] == 'symlink' and file_data[4]:
                    print "Removing %s" % file_data[0]
                    symlink_ids.append(file_data[4])

    tree.unversion(symlink_ids)

finally:
    # okay, all yours
    tree.unlock()
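If you want to check the selection logic without touching a real working tree, it can be pulled out into a plain function. This is only a sketch: the function name symlink_ids_from_walk is made up, and it assumes the tuple layout that the script above relies on, namely a (path, file_id) pair for each directory and entries whose third field is the kind and whose fifth field is the file_id.

```python
def symlink_ids_from_walk(walk):
    """Collect the file IDs of versioned symlinks from walkdirs()-shaped data."""
    ids = []
    for dir_info, entries in walk:
        # Skip directories that are not under revision control.
        if not dir_info[1]:
            continue
        for entry in entries:
            kind, file_id = entry[2], entry[4]
            # Only symlinks that actually have a file ID are tracked.
            if kind == 'symlink' and file_id:
                ids.append(file_id)
    return ids
```

Fed a hand-built list of tuples, it returns the IDs that the script would pass to tree.unversion(), which makes a dry run possible before taking the write lock.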

Hopefully someone else will find this little script useful. It's under the Apache version 2 license; make whatever use of it you can for your particular predicament.

So, what were the lessons learned here?

  1. Exercise a little restraint and consideration about what you put under revision control in the first place.
  2. It's awesome to be able to have direct contact with developers of your tools.
  3. It's even more awesome to be able to dig right into their code and help yourself.
  4. Just like in Murali's brutal Theory of Algorithms course, in real life, when facing difficulty solving a problem one way, don't be afraid to step back and try an approach from another (the opposite) direction. Trust your gut—if it feels like the hard way of doing something, it probably is; find the lazy (smart) way.

A special thanks to Robert for his guidance and help.