I had thought at this point in my graduate career that I would not only have a firm grasp of the Biopython library, but also have become a primary contributor to the project. Like many aspirations I've had throughout my life, these proved naïve, but I felt they warranted reflection.
Most who of you who know me online or professionally also know me as a Python advocate. Only some of you know of my darker days programming in Perl. Through patient proselytizing by Rob Waldo (now Ph.D.), a friend of a lab mate at the University of Georgia Dept. of Microbiology, I rose up from my Perl-ish ways. Most of the reasons I love Python stem from the language's foundational philosophy, which I find wholly agreeable; indeed, my fanaticism for Python goes so far as for me to make a claim my friends often rib me about: Python is "arguably superior".
As a general computing language, I stand firmly by that claim. As a language for bioinformatics, however, I can not. By far, Perl has a nearly immovable foothold as the de facto language in the field I hope to have a career in. In the beginning of a field dominated by text manipulation, Perl
seemed the perfect choice to trailblazers. As more people moved into the field, more used Perl, because that's what the gal next door said she used. This positive feedback loop created a critical mass community that gave rise to the Bioperl library. Other language communities spawned their own Bio* projects, though none of them approach the size, completeness, and popularity of Bioperl, in large part due to the Catch-22 of not having the size, completeness, and popularity of Bioperl.
Making matters more challenging in the Python sector, a separate effort, Biology in Python (BIP), arose last August during the SciPy 2007 Conference, led by Titus Brown. The first BIP Birds of a Feather meeting at SciPy 2007 held that very few bioinformatician types use Biopython, instead preferring to roll their own parsers and libraries, out of ignorance or preference. (Read Titus's account of the BoF for more detail.) The BoF founded BIP with the intention of providing a unified community of bioinformatics and computational biology efforts in Python larger than a single project like Biopython. The meeting had talk over incorporating Biopython into its effort, but to me it still felt fractious: Biopython was a "them" and not an "us". Today, the two communities remain largely segregated and little communication exists between them.
While the Python efforts remain fragmented, those in the Perl community continue to snowball: more and more useful modules are added to Bioperl, which brings more users into the community, who write more and more useful modules, and so on. Jonathan Rockway recently wrote a pertinent blog post where the punchline emphasizes the importance of language libraries which prevent wheel reinvention and allow developers to really get things done.
So while Perl's popularity in bioinformatics irks me, I recognize it helps researchers get things done. I also recognize proselytizing won't bring greater part of that mind share to Python. A Biopython as good or better than Bioperl might. "You must be the change you wish to see in world," said Mahatma Gandhi. In this vein, I had hoped to become an expert on Biopython, and to actually get down to writing the "missing code". Well, to date, I have yet to be the change I wish to see.
I have made one contribution to Biopython's code base over the past three years, and that being a one-line change in a testing module. I have filed and helped with only a handful of bug reports. Originally, I led an effort to migrate Biopython from CVS to Subversion, though this has remained stalled for a year and a half. As far as I can tell, only two steadfast programmers, Peter Cock, and another developer who wishes to remain unmentioned, have kept this project going. To confirm my suspicions, I ran a CVS log on Biopython and perfunctorily scanned commits made in 2008. These two programmers made nearly all the commits, with Frank Kauff and Tiago Antão providing a handful of the remaining commits.
I subscribe to both the Biopython users mailing list and the developers list, but these days I check neither list's emails. Instead, my Gmail filters label and archive them, and there they sit until I get so intimidated by the number of unread threads that I simply "mark all as read". At this point, my involvement with the Biopython project remains minimal. I do not recall the last time I looked at the code base. I last pulled updates from the CVS repository five months ago, prior to pulling them during the course of writing this post. My lab mate, Andrew Warren, might even know more about Biopython than I do, at this point, and I turned him on to the package.
Ultimately, I have to ask, why have I not sat down and gone through the Biopython library as I intended to? Why have I not written the modules that seem missing from the library? The most simple answer is that I did not want to. Really, deep down, if I wanted to do it, I would have. After all, if there's anything I learned from Prof. Randy Pausch, it's that if you really want to make something happen, you can make it happen. This doesn't really address the question, though, since I've established that at some level, I certainly intended to get my hands dirty with Biopython. I think I have not worked with Biopython because I am not encouraged to do so, and am actually discouraged, because of research, and the current culture of academia.
My research project only uses the SeqIO class to parse through Genbank and FASTA boilerplate and get to the sequence information I need to use. I make far more use out of the NetworkX graph library for the bulk of what my code does, and even then, it's in a limited manner. This puts efforts towards Biopython as orthogonal—antagonistic, even—to what will earn me praise by my adviser and committee (namely publications). Therein lies a terrible problem, because ultimately, a more user-friendly or powerful (or both) Biopython could benefit more people than my own research, but academia seems to offer little to no reward for such tool building.
So, I imagine I'll not make much headway into Biopython for the remainder (probably [hopefully?] three more years) of my graduate career, and that will be that. I can find no immediate resolution to this conundrum of feeling compelled to help a cause that most PI's would find a fool's pursuit. I'd love to hear from contributors and users of any of the Bio* projects on how they work in time to attend to and learn these libraries. Perhaps some research projects out there really do leverage these libraries, or perhaps the field of bioinformatics is just too vast and too devoid of overlap, rendering code reuse a pointless pursuit. Maybe it's really just a roll-your-own kind of world.