Sunday, January 25, 2009

Class attributes and scoping in Python, Part 1

Object and Attribute

This latest post comes courtesy of Hari Jayaram, one of those people who's on my "I'd like to meet" list. Hari asks, [paraphrased] "Does Python treat class variables as having an instance scope, while at the same time treat class lists and class dictionaries as having class scope?"

Let's use a simple example to illustrate some gotchas with regard to class and instance attributes. We'll begin by coding up a simple class with a few helper methods to help us out later on; right now I'd just like to draw your attention to class Foo's two attributes of interest: class_attr, which shall represent our class attribute, and instance_attr, which (you guessed it) represents our instance attribute.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import pprint

class Foo(object):
    class_attr = 0

    def __init__(self, item):
        self.instance_attr = item


    def report(self):
        print "My 'class_attr' is: %s" % self.class_attr
        print "My '__class__.class_attr' is: %s" % \
            self.__class__.class_attr
        print "My 'instance_attr' is: %s" % self.instance_attr


    def print_self_dict(self):
        pprint.pprint(self.__dict__)


    def change_class_attr(self, item):
        self.__class__.class_attr = item

Alright, let's save this as foo.py, throw this puppy into the interactive interpreter, and play with it a bit. We'll start off by creating two instances and checking them out.

>>> from foo import Foo
>>> a = Foo('a')
>>> b = Foo('b')
>>> a.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: b

So right now we see that a and b share the same class attribute value of 0 but have different instance attribute values of 'a' and 'b', respectively. Now, let's attempt to change the class variable:

>>> a.class_attr = 1
>>> print a.class_attr
1
>>> print b.class_attr
0

Wait, a has our expected value for the class attribute, but instance b doesn't. That doesn't make sense; it's a class attribute after all! Let's take a closer look at the internals, though:

>>> a.report()
My 'class_attr' is: 1
My '__class__.class_attr' is: 0
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: b

Notice the discrepancy between the reported values for self.class_attr and self.__class__.class_attr for a? Huh. It looks as if Python actually assigned the new value to an instance variable named class_attr rather than to the Foo class's class_attr. We can take a look at the instance and class dictionaries to help untangle this. First, let's compare the internal dictionaries of a and b.

>>> a.print_self_dict()
{'class_attr': 1, 'instance_attr': 'a'}
>>> b.print_self_dict()
{'instance_attr': 'b'}

Ha! Python, we've found you out! We now can see, indeed, Python made the new value assignment to a brand new instance variable in a called (deceptively, in our deceptive case) class_attr.
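
To make the masking concrete, here's a small sketch. It is written for Python 3 (so print is a function, unlike the listings in this post), the trimmed-down Foo is only a stand-in for the class above, and the variable names are made up for illustration. It shows the assignment landing in a's own dictionary and, assuming we want the class value visible again, that deleting the instance entry un-masks it.

```python
# Python 3 sketch; this Foo is a trimmed-down stand-in for the post's class.
class Foo(object):
    class_attr = 0

    def __init__(self, item):
        self.instance_attr = item


a = Foo('a')
b = Foo('b')

a.class_attr = 1                          # adds an entry to a.__dict__ only
shadowed = a.class_attr                   # read from the instance dictionary
in_instance = 'class_attr' in vars(a)     # the instance now has its own entry
in_other = 'class_attr' in vars(b)        # b was never touched
class_value = Foo.class_attr              # the class attribute is unchanged

del a.class_attr                          # remove the instance-level entry
restored = a.class_attr                   # lookup falls back to the class

print(shadowed, in_instance, in_other, class_value, restored)
# prints: 1 True False 0 0
```

Nothing was ever overwritten on the class; the instance entry just sat in front of it during lookup.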

Now let's explore how to actually convince Python to do what we meant to do: reassign the class variable. Let's get a clean slate.

>>> a = Foo('a')
>>> b = Foo('b')
>>> a.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 0
My '__class__.class_attr' is: 0
My 'instance_attr' is: b

One generic way to reassign the class variable is to assign it directly via the class, rather than via an instance of the class.

>>> Foo.class_attr = 1
>>> a.report()
My 'class_attr' is: 1
My '__class__.class_attr' is: 1
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 1
My '__class__.class_attr' is: 1
My 'instance_attr' is: b
>>> a.print_self_dict()
{'instance_attr': 'a'}
>>> b.print_self_dict()
{'instance_attr': 'b'}

That worked a treat. But often in production code, we don't want to tie the fate of a class variable assignment to a hard-coded class name somewhere in some file, soon to break when we refactor our code and give the class a new name. This is where using the special variable __class__ comes in handy. Take another look at the method change_class_attr().

    def change_class_attr(self, item):
        self.__class__.class_attr = item

This uses the instance's inherent knowledge of what class it belongs to (accessed via __class__) to make the necessary assignment to the class variable. So, we see, this also works:

>>> a.change_class_attr(2)
>>> a.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: a
>>> b.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: b
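
As a side note, the same idea can be spelled a couple of other ways. The sketch below is Python 3, and the method names change_via_type and change_via_classmethod are made up for illustration; they are not part of the Foo class above. It assumes a new-style class like Foo, where type(self) gives the same class as self.__class__, and it relies on a classmethod receiving the class as its first argument.

```python
# Python 3 sketch; the two method names are hypothetical.
class Foo(object):
    class_attr = 0

    def __init__(self, item):
        self.instance_attr = item

    def change_via_type(self, item):
        # type(self) is the same object as self.__class__ here
        type(self).class_attr = item

    @classmethod
    def change_via_classmethod(cls, item):
        # cls is the class the method was looked up through
        cls.class_attr = item


a = Foo('a')
b = Foo('b')

a.change_via_type(2)
after_type = (a.class_attr, b.class_attr, Foo.class_attr)

a.change_via_classmethod(5)
after_classmethod = (a.class_attr, b.class_attr, Foo.class_attr)

print(after_type)         # (2, 2, 2)
print(after_classmethod)  # (5, 5, 5)
print(vars(a))            # {'instance_attr': 'a'}
```

Either way, no entry named class_attr shows up in the instance dictionary, which is exactly what we were after.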

There's an important caveat here: this method, too, is fragile for sub-classes. For example, let's create a sub-class of Foo called Bar, and an instance c.

>>> class Bar(Foo):
...     pass
...
>>> c = Bar('c')
>>> c.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: c

Now let's observe what happens when we assign a new value to the class variable via c's change_class_attr():

>>> c.change_class_attr(3)
>>> c.report()
My 'class_attr' is: 3
My '__class__.class_attr' is: 3
My 'instance_attr' is: c

All's well, but notice this only affected the Bar class's class_attr, not the Foo class's:

>>> a.report()
My 'class_attr' is: 2
My '__class__.class_attr' is: 2
My 'instance_attr' is: a
>>> print Foo.class_attr
2
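
We can confirm where the value landed by peeking at the class dictionaries, much as we did with the instance dictionaries earlier. This is a Python 3 sketch with a trimmed-down Foo (no __init__, and class_attr starting at 2 to match where the walkthrough stands), so treat it as an illustration rather than the exact session.

```python
# Python 3 sketch; Foo is trimmed down and starts at 2 to mirror the walkthrough.
class Foo(object):
    class_attr = 2

    def change_class_attr(self, item):
        self.__class__.class_attr = item


class Bar(Foo):
    pass


c = Bar()
before = 'class_attr' in vars(Bar)   # Bar has no entry of its own yet
c.change_class_attr(3)               # self.__class__ is Bar, not Foo
after = 'class_attr' in vars(Bar)    # the assignment created one on Bar

print(before, after)                     # False True
print(Foo.class_attr, Bar.class_attr)    # 2 3
```

It's the same masking we saw with instances, one level up: Bar's new entry sits in front of Foo's during lookup.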

Failing to make note of this can come back to bite Python programmers in the tail. For example, you may use a class attribute to keep track of the number of instances of that class. If you would like to keep track of compatible sub-class instances, too, however, the __class__ trick will prove insufficient; a hard-coded class name would prove more suitable. Use this knowledge to make the right decision for your particular scenario.
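
Here is a rough sketch of that instance-counting scenario, again in Python 3. All four class names are hypothetical, invented only to put the two approaches side by side: one counter is bumped through a hard-coded class name, the other through type(self).

```python
# Python 3 sketch; every class name here is made up for illustration.
class NameCounted(object):
    count = 0

    def __init__(self):
        NameCounted.count += 1      # always updates the base class


class NameCountedChild(NameCounted):
    pass


class TypeCounted(object):
    count = 0

    def __init__(self):
        type(self).count += 1       # updates whichever class was instantiated


class TypeCountedChild(TypeCounted):
    pass


# Two sub-class instances followed by one base-class instance, for each style.
by_name = [NameCountedChild(), NameCountedChild(), NameCounted()]
by_type = [TypeCountedChild(), TypeCountedChild(), TypeCounted()]

print(NameCounted.count)         # 3: sub-class instances are included
print(TypeCounted.count)         # 1: only the direct instance was counted
print(TypeCountedChild.count)    # 2: the sub-class grew its own counter
```

Which result is "right" depends on whether you want one shared tally or a separate tally per class.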

In the next part, I'll be covering an even more interesting scoping question dealing with lists and other mutables as class variables.

8 comments:

  1. Thanks for this awesome post. I think I am finally close to getting the hang of class and instance scope.

    Also I did not consider the caveat that you spoke about if you hardcode something with self.__class__.class_attr vs just using the class name Foo.class_attr.

    Personally I prefer the class name (Foo.class_attr) approach, because I always find code with too many of the double underscored variables and methods very difficult to read.

    I am very glad for this post and am looking forward to Part II

  2. thanks for the detailed comments. a clear and lucid description of class scoping in python.

  3. The easiest way to understand it is that Python scoping is explicit.

    If you assign to self.something then you are creating an instance variable.

    Any assignments directly in the class declaration (or made explicitly on the class) will be class variables.

    The one confusing point is that you can *look up* class variables on an instance (using self) - this is because when you ask an instance for an attribute it checks the instance first and then looks on the class (leaving out descriptors which make the situation more complex but thankfully rarely do you need to actually be aware of them).

    You can create an instance variable with the same name as a class variable - and then when you ask the instance (using self again) for that member it will return the instance variable. In this sense instance variables 'mask' class variables.

  4. You should use the "Class.classattr" notation. It is easier to read and more succinct. The argument that you shouldn't use an explicit class name reference because it will be broken by later refactoring is a red herring. If you are doing self.__class__.attr, then you are in the class definition itself, and clearly you would catch that in a refactor.

    There is more of an argument to be made if you are doing foo.__class__.attr, but it's almost better (and more correct) to do an if isinstance(foo, SomeClass): SomeClass.attr = 3.

    Bottom line: there is a difference between "improper hard coding" and "referencing an identifier".

  5. Hmm... I wouldn't be so dogmatic about the syntax for accessing class attributes.

    I have a *mild* preference for "Class.classattr" notation but it isn't a big deal.

    Using "self.__class__.class_attr" may be useful when working with subclasses...

  6. ...or it might be problematic. As the OP wrote, you would be setting a class attribute on subclasses, which may or may not be the desired behavior. If that's the desired behavior, i.e. you want the base class's method to be "virtual" (for lack of a better term) or to set a "virtual" version of the attribute on each subclass, then self.__class__ is the way to go.

    I am just pointing out that the OP's rationale for using self.__class__ in this case is flawed. I'm not saying there's never a use case for it.

  7. Thank you for the post. Very clear and good points. As a little thank you, if you like gypsy swing check out this one, http://youtu.be/yJSpeb8YWQU (Pure Django Reinhardt Awesomeness!)

  8. Your second code snippet looks buggy - foo.Foo(args) should throw a NameError because you haven't imported the foo module into your namespace.
