Sunday, January 25, 2009

FriendFeed PyAPI, or "What I did over winter break"

In mid-December 2008 I made the decision to go dark and execute on a solid coding project I could sink my teeth into. On January 13, 2009, I emerged with the fruits of a lot of labor of the fingertips: a fully fledged Python interface library to the FriendFeed API, suitably named FriendFeed PyAPI.

This library began from the original Python code available from the FriendFeed Google Code repository. That code provided a great basis: it showed me how to implement method calls to the FriendFeed API, and it contained the authentication code, which I couldn't possibly have known how to write. The original library returned native Python structures (dictionaries, lists, strings, integers, and the like), parsed by whichever of several JSON parsing libraries happened to be available on the system. While this worked well enough, I saw a chance to really improve the library by having it work with and return full-fledged data structures representing the various FriendFeed entities, including users, rooms, entries, comments, and media. Calvin Spealman, a.k.a. ironfroggy, asked me two shrewd questions: 1) Wasn't I just creating an ORM [object-relational mapping]? 2) Why would I do that? The answer to 1) was "Yes". My answer to 2) was, essentially, "Because I want to." Calvin understood what I now know from undertaking the process: it takes a lot of grunt-work coding to create an ORM. I had used the object-oriented Python interface to the Twitter API in another project for Bertalan Meskó, and I really enjoyed the "feel" of that library, so I made it my goal to bring the same kind of feel to the FriendFeed library. The result was an expansion and refactoring of the original library from 812 lines of code to nearly 4,000 lines, with 45 unit tests, 8 entity classes, about a dozen exceptions, and support for nearly all the available API calls.

I think the real joy for me came from creating methods to parse the JSON structures recursively and instantiate appropriate objects at each depth. These objects are then appropriately set as attributes of their parent objects (that is, the objects they "belong to"). All of this is done quite simply with a mapping scheme of entity names to methods (e.g. mapping the key 'users' to the method _parse_users), and it feels quite elegant having it all work together, calling the appropriate parsing method for each structure, and returning beautiful little self-documented class instances. Witnessing it work in concert for the first time was definitely a "blinking LED moment," as my friend Ian Firkin would say.
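The recursive, mapping-driven parsing described above can be sketched roughly as follows. This is not FriendFeed PyAPI's actual code. The entity classes (Feed, Entry, User, Comment), the PARSERS table, parse_feed, and every parsing method other than the _parse_users example are hypothetical. They exist only to illustrate a dispatch table that maps JSON keys to parsing methods, written here in Python 3.

```python
import json


class Entity(object):
    """Hypothetical base class: turns a dict of parsed values into attributes."""

    def __init__(self, **attrs):
        for name, value in attrs.items():
            setattr(self, name, value)


class Feed(Entity): pass
class Entry(Entity): pass
class User(Entity): pass
class Comment(Entity): pass


class Parser(object):
    # Hypothetical mapping of JSON keys to the methods that parse their values.
    PARSERS = {
        'entries': '_parse_entries',
        'comments': '_parse_comments',
        'users': '_parse_users',
        'user': '_parse_user',
    }

    def _parse_children(self, data):
        """Return a copy of data with every known key parsed into entities."""
        parsed = {}
        for key, value in data.items():
            method_name = self.PARSERS.get(key)
            if method_name is not None:
                # Recurse: each parsing method calls _parse_children in turn.
                value = getattr(self, method_name)(value)
            parsed[key] = value
        return parsed

    def parse_feed(self, data):
        return Feed(**self._parse_children(data))

    def _parse_entries(self, items):
        return [Entry(**self._parse_children(item)) for item in items]

    def _parse_comments(self, items):
        return [Comment(**self._parse_children(item)) for item in items]

    def _parse_user(self, item):
        return User(**self._parse_children(item))

    def _parse_users(self, items):
        return [self._parse_user(item) for item in items]


raw = '''{"entries": [{"id": "e1", "title": "Hello",
                      "user": {"nickname": "alice"},
                      "comments": [{"body": "Nice!",
                                    "user": {"nickname": "bob"}}]}]}'''
feed = Parser().parse_feed(json.loads(raw))
print(feed.entries[0].user.nickname)              # alice
print(feed.entries[0].comments[0].user.nickname)  # bob
```

Every parsing method funnels its input back through _parse_children, so nesting depth doesn't matter. A User buried inside a Comment inside an Entry is built exactly the same way as one at the top level.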

Perhaps the most important lesson came not from the specific technical hurdles I made my way through, but from the personal insight that I absolutely love programming. I love writing code; I love to talk about writing code; and I really love interacting with other developers. Over the course of those couple of weeks, I consulted Stack Overflow, hit up #python on IRC, and had direct email exchanges with Ben Golub at FriendFeed (who, by the way, is an absolutely stand-up developer and a fantastic representative for the young service). I have a genuine sense of satisfaction from the code and documentation I produced for the project, and that feeling does more for a happy life than any other currency (except, possibly, beer).

So what now? Well, I released FriendFeed PyAPI under the same Apache License (Version 2) that FriendFeed released the original library under. This means you may fork it, play with it, and modify it to your heart's content, and if you care to, let me know what improvements you've made so I can merge them back into the trunk branch. (Of course, you may also keep any and all modifications to yourself in your quest for world domination, though you'll still have to attribute FriendFeed and me as having had a part in your doomsday device.) [Edit: On second thought, please don't attribute me in those events.] I also have a list of future directions and a few ideas of my own, including the one that actually spurred this spurt of code-writing, that I look forward to releasing upon FriendFeeders. So go out and use it! Ask questions about it! Most importantly, please report bugs!

Class attributes and scoping in Python, Part 1


This latest post comes courtesy of Hari Jayaram, one of those people who's on my "I'd like to meet" list. Hari asks, [paraphrased] "Does Python treat class variables as having an instance scope, while at the same time treat class lists and class dictionaries as having class scope?"

Let's use a simple example to illustrate some gotchas with regard to class and instance attributes. We'll begin by coding up a simple class with a couple of reporter methods to help us out later on. Right now I'd just like to draw your attention to class Foo's two attributes of interest: class_attr, which represents our class attribute, and instance_attr, which, you guessed it, represents our instance attribute.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import pprint

class Foo(object):
    class_attr = 0

    def __init__(self, item):
        self.instance_attr = item


    def report(self):
        print "My 'class_attr' is: %s" % self.class_attr
        print "My '__class__.class_attr' is: %s" % \
            self.__class__.class_attr
        print "My 'instance_attr' is: %s" % self.instance_attr


    def print_self_dict(self):
        pprint.pprint(self.__dict__)


    def change_class_attr(self, item):
        self.__class__.class_attr = item

Alright, let's save this as foo.py, throw this puppy into the interactive interpreter, and play with it a bit. We'll start off by creating two instances and checking them out.

>>> from foo import Foo
>>> a = Foo('a')
>>> b = Foo('b')
>>> a.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: b

So right now we see that a and b share the same class attribute value of 0 but have different instance attribute values of 'a' and 'b', respectively. Now, let's attempt to change the class variable:

>>> a.class_attr = 1
>>> print a.class_attr
1
>>> print b.class_attr
0

Wait, a has our expected value for the class attribute, but instance b doesn't. That doesn't make sense; it's a class attribute after all! Let's take a closer look at the internals, though:

>>> a.report()
My 'class_attr' is: 1
My '__class__.class_attr' is: 0
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: b

Notice the discrepancy between the reported values of self.class_attr and self.__class__.class_attr for a? Huh. It looks as if Python actually assigned the new value to an instance variable named class_attr rather than to the Foo class's class_attr. We can take a look at the instance and class dictionaries to help untangle this. First, let's compare the internal dictionaries of a and b.

>>> a.print_self_dict()
{'class_attr': 1, 'instance_attr': 'a'}
>>> b.print_self_dict()
{'instance_attr': 'b'}

Ha! Python, we've found you out! We can now see that Python did indeed assign the new value to a brand-new instance variable in a, named (deceptively, in our case) class_attr.
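Here is a minimal, self-contained sketch of the same experiment, written for Python 3 (the session above is Python 2). It shows why this happens. Assignment through an instance always writes to the instance's own __dict__, while attribute lookup checks that dictionary before falling back to the class. As a bonus, deleting the instance entry un-shadows the class value.

```python
class Foo(object):
    class_attr = 0

    def __init__(self, item):
        self.instance_attr = item


a = Foo('a')
b = Foo('b')

# Assigning through the instance creates an entry in a.__dict__ ...
a.class_attr = 1
print(vars(a))            # {'instance_attr': 'a', 'class_attr': 1}
print(vars(b))            # {'instance_attr': 'b'}
print(Foo.class_attr)     # 0: the class attribute itself is untouched

# ... and lookup on a finds that entry before reaching Foo, so it shadows
# Foo.class_attr. Deleting the instance entry un-shadows the class value.
del a.class_attr
print(a.class_attr)       # 0
print(vars(a))            # {'instance_attr': 'a'}
```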

Now let's explore how to actually convince Python to do what we meant to do: reassign the class variable. Let's get a clean slate.

>>> a = Foo('a')
>>> b = Foo('b')
>>> a.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: b

One generic way to reassign the class variable is to assign it directly via the class, rather than via an instance of the class.

>>> Foo.class_attr = 1
>>> a.report()
My 'class_attr' is: 1
My '__class__.class_attr' is: 1
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 1
My '__class__.class_attr' is: 1
My 'instance_attr' is: b
>>> a.print_self_dict()
{'instance_attr': 'a'}
>>> b.print_self_dict()
{'instance_attr': 'b'}

That worked a treat. But often in production code, we don't want to tie the fate of a class variable assignment to a hard-coded class name somewhere in some file, soon to break when we refactor our code and give the class a new name. This is where the special attribute __class__ comes in handy. Take another look at the method change_class_attr().

    def change_class_attr(self, item):
        self.__class__.class_attr = item

This uses the instance's inherent knowledge of what class it belongs to (accessed via __class__) to make the necessary assignment to the class variable. So, we see, this also works:

>>> a.change_class_attr(2)
>>> a.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: b
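Two equivalent spellings may also be worth knowing, although neither appears in the original code. For an instance of an ordinary (new-style) class, type(self) returns the same class object as self.__class__, and a classmethod receives that class as cls. The stripped-down Foo below is a Python 3 sketch of both; set_class_attr is a hypothetical name. Like __class__, both pick up whatever sub-class they're called on, so the sub-class caveat applies to them too.

```python
class Foo(object):
    class_attr = 0

    def change_class_attr(self, item):
        # type(self) is the same object as self.__class__ here.
        type(self).class_attr = item

    @classmethod
    def set_class_attr(cls, item):
        # cls is the class the method was called on (Foo or a subclass).
        cls.class_attr = item


a = Foo()
a.change_class_attr(2)
print(Foo.class_attr)   # 2
a.set_class_attr(5)     # also works when called through an instance
print(Foo.class_attr)   # 5
print(vars(a))          # {}
```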

There's an important caveat here: this method, too, is fragile for sub-classes. For example, let's create a sub-class of Foo called Bar, and an instance c.

>>> class Bar(Foo):
...     pass
...
>>> c = Bar('c')
>>> c.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: c

Now let's observe what happens when we assign a new value to the class variable via c's change_class_attr().

>>> c.change_class_attr(3)
>>> c.report()
My 'class_attr' is: 3
My '__class__.class_attr' is: 3
My 'instance_attr' is: c

All's well, but notice this only affected the Bar class's class_attr, not the Foo class's:

>>> a.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: a
>>> print Foo.class_attr
2
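What actually happened is the class-level twin of the instance shadowing we saw earlier. Assigning through self.__class__ on c wrote a new class_attr into Bar's own class dictionary, and that entry now shadows Foo's. A small Python 3 sketch makes this visible:

```python
class Foo(object):
    class_attr = 0

    def change_class_attr(self, item):
        self.__class__.class_attr = item


class Bar(Foo):
    pass


print('class_attr' in vars(Bar))        # False: Bar inherits Foo's value
Bar().change_class_attr(3)
print('class_attr' in vars(Bar))        # True: Bar now has its own entry
print(Bar.class_attr, Foo.class_attr)   # 3 0

# Removing Bar's entry makes Foo's value visible through Bar again.
del Bar.class_attr
print(Bar.class_attr)                   # 0
```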

Failing to make note of this can come back to bite Python programmers in the tail. For example, you may use a class attribute to keep track of the number of instances of that class. If you would like to keep track of compatible sub-class instances, too, however, the __class__ trick will prove insufficient; a hard-coded class name would prove more suitable. Use this knowledge to make the right decision for your particular scenario.
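Here's a sketch of that instance-counting scenario in Python 3, using two hypothetical counter classes. With the hard-coded name, every instance increments the single counter on the base class, whether it is a sub-class instance or not. With self.__class__, the first sub-class instance copies the inherited count into a new attribute on the sub-class. As a result, the base-class counter misses sub-class instances, and the sub-class counter starts from the base's value.

```python
class CountedHardCoded(object):
    count = 0

    def __init__(self):
        # Always updates the base class's counter.
        CountedHardCoded.count += 1


class SubHardCoded(CountedHardCoded):
    pass


class CountedViaClass(object):
    count = 0

    def __init__(self):
        # Reads the (possibly inherited) count, then writes it onto
        # whatever class this instance actually belongs to.
        self.__class__.count += 1


class SubViaClass(CountedViaClass):
    pass


CountedHardCoded(); SubHardCoded(); SubHardCoded()
CountedViaClass(); SubViaClass(); SubViaClass()

print(CountedHardCoded.count, SubHardCoded.count)  # 3 3
print(CountedViaClass.count, SubViaClass.count)    # 1 3
```

Note that SubViaClass.count reports 3 even though only two SubViaClass instances exist, because it started counting from the inherited value of 1.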

In the next part, I'll be covering an even more interesting scoping question dealing with lists and other mutables as class variables.

Friday, January 23, 2009

Tab-completion and history in the Python interpreter


I usually use IPython as my interactive Python interpreter, but it has problems with Unicode decoding, which can cause trouble whenever I need to deal with Unicode (such as when I'm working with FriendFeed PyAPI). When I complained about this on #python, one of the users told me I should use the standard Python interpreter anyway. When I told him I didn't use the standard interpreter because I loved the convenience of tab-completion in the IPython shell, he informed me that the standard interactive interpreter can, indeed, do tab-completion.

After some Googling, I came upon this blog post. I wound up using a modified solution posted in the comments. Here's my .pythonrc file:

import atexit
import os.path

try:
    import readline
except ImportError:
    # No readline available (e.g. on some Windows builds): skip
    # tab-completion and history entirely.
    pass
else:
    import rlcompleter

    class IrlCompleter(rlcompleter.Completer):
        """
        This class enables a "tab" insertion if there's no text for
        completion.

        The default "tab" is four spaces. You can initialize with '\t' as
        the tab if you wish to use a genuine tab.

        """

        def __init__(self, tab='    '):
            self.tab = tab
            rlcompleter.Completer.__init__(self)


        def complete(self, text, state):
            if text == '':
                # Nothing to complete, so insert the "tab" instead.
                readline.insert_text(self.tab)
                return None
            else:
                return rlcompleter.Completer.complete(self, text, state)


    # You could change this line to bind another key instead of tab.
    readline.parse_and_bind('tab: complete')
    readline.set_completer(IrlCompleter().complete)

    # Restore our command-line history, and save it when Python exits.
    # This stays inside the else block because it needs readline.
    history_path = os.path.expanduser('~/.pyhistory')
    if os.path.isfile(history_path):
        readline.read_history_file(history_path)
    atexit.register(readline.write_history_file, history_path)

I then added the following line to my .bashrc:

export PYTHONSTARTUP="$HOME/.pythonrc"

Now I can remain a happy camper using the native interactive interpreter.
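If you're curious what the Tab key actually triggers: readline calls the completer's complete(text, state) method with state = 0, 1, 2, ... and collects the results until it gets None. The loop below imitates that protocol by hand. It is an illustration, not readline's own code. It runs against a plain rlcompleter.Completer with an explicit namespace, so it works even where readline isn't installed. The all_completions helper and the sample namespace are made up for the example.

```python
import os
import rlcompleter


def all_completions(completer, text):
    """Hypothetical helper: ask for matches the way readline does."""
    matches = []
    state = 0
    while True:
        match = completer.complete(text, state)
        if match is None:
            return matches
        matches.append(match)
        state += 1


# An explicit namespace stands in for the interpreter's __main__ globals.
completer = rlcompleter.Completer({'os': os, 'history_path': '~/.pyhistory'})
print(all_completions(completer, 'history_'))   # ['history_path']
print(all_completions(completer, 'os.pat')[0])  # os.path
```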

Update (2009-1-25): Thanks to Bob Erb's comments, I corrected some poor indentation (whoops!) and also added the final lines to remove the atexit and os.path modules from the main namespace.

Update (2009-4-18): I removed the deletion of atexit and os.path from the main namespace. That seemed to wreck any script that needed either of those; quite a few scripts rely on os.path, in particular.