Thursday, December 18, 2008

Robust imports in Python, guaranteed fresh: how to import code for testing

UPDATE 2010-01-19: As captnswing pointed out an alternative, and I should say more commonly used method, is to simply put the following before your import statement for your packages or modules, assuming you keep your tests in a subdirectory of your code.

import os.path
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), os.path.pardir))

Beer TastingAnyone who knows me knows I like unit tests. I mean, I really like unit tests. Like, if Mr. Software Engineering were to offer to betroth me to one of his daughters, I would ask him to betroth me to Miss Unit Test.

One thing that comes up when preparing tests in Python is, "Where the hell do I put them?" To this, my first answer is, "If you're willing and diligent enough to write them, you can put them damn well anywhere you please!" If that answer doesn't satisfy you, though, that's good, you're not alone. Python programmers have raised this topic on several forums, including recently on the Testing in Python mailing list and on Stack Overflow.

I'm a fan of the following method, which seems to have taken dominance in the Python community. It's based around the following directory structure:

rootdir/
rootdir/mymodule.py
rootdir/tests/
rootdir/tests/mymodule_tests.py

We have a directory containing our module of interest, mymodule.py, and we have a module, mymodule_tests.py, containing our unit tests for mymodule.py. We create a sudbirectory, tests/, under the root directory, rootdir/, of the project, and we place our mymodule_tests.py under this directory so that its path is rootdir/tests/mymodule_tests.py.

We've got to import the module we want to test into the module containing the tests for it. The import statement works for all packages/modules currently in the import path, found in the list sys.path. Since the current directory, '.', is in sys.path by default, we can easily import any packages/modules on the same level as our importing module. This would be in the form of a simple import statement of

import mymodule

For the typical testing layout, though, this won't suffice. We'll get a big fat ImportError. This is because the path of mymodule.py is in rootdir/, above our testing module's rootdir/tests/ path. The next logical step, then, is to place rootdir/ in sys.path for mymodule_tests.py to access mymodule.py. The initial thought for doing this is to add the directory above to the sys.path using relative path.

#!/usr/bin/env python
import os
import sys

sys.path.insert(0, os.pardir)

Unfortunately, this is fragile. If we run mymodule_tests.py from outside its own directory, this will break the path. Take the following script as an example:

#!/usr/bin/env python
# parpath.py: print the parent path
import os

print "parent directory:", os.path.abspath(os.pardir)

I place this script in the path of /home/chris/development/playground/, and then run it from this directory

[chris]─[@feathers]─[2495]─[15:35]──[~/development/playground]
$ python parpath.py
parent directory: /home/chris/development

When I run the script from the parent directory, however, my results differ.

[chris]─[@feathers]─[2496]─[15:36]──[~/development/playground]
$ cd ..

[chris]─[@feathers]─[2497]─[15:36]──[~/development]
$ python python/parpath.py
parent directory: /home/chris

In the words of Austin Powers, "That's not right." Now instead of getting the directory I wanted (/home/chris/development/playground), I get the one above it (/home/chris/development). This is because relative paths is sys.path are relative to where you executed the script, not relative to where the script exists. Phooey!

I used to just ignore this fragility and be very careful about running tests from within the same directory as the test modules. However, last night I came across a robust solution by way of some Google Code Search Fu—specifically, while browsing test code for MoinMoin. It turns out the solution is to use a method of the following:

path_of_exec = os.path.dirname(sys.argv[0])
parpath = os.path.join(path_of_exec, os.pardir)
sys.path.insert(0, os.path.abspath(parpath))

If we take a look at the first line, we see that it's capturing the first argument to the command line, and using that to construct a robust path that understands where the actual module is. The very first argument in sys.argv is always what immediately follows python in the command line (or if executing directly by ./) In our examples, these would by path.py and playground/path.py, respectively. Then, running os.path.dirname on these, we get the results of '' and 'playground', respectively. By joining these to the parent directory, we get the desired effect.

#!/usr/bin/env python
# parpath.py

import os
import sys

print "parent path:", os.path.abspath(os.pardir)

path_of_exec = os.path.dirname(sys.argv[0])
print "execution path:", path_of_exec
parpath = os.path.abspath(os.path.join(path_of_exec, os.pardir))
print "true parent path:", parpath
This gives us the following results:
[chris]─[@feathers]─[2467]─[16:32]──[~/development/playground]
$ python parpath.py
parent path: /home/chris/development
execution path:
true parent path: /home/chris/development

[chris]─[@feathers]─[2467]─[16:32]──[~/development/playground]
$ cd ..

[chris]─[@feathers]─[2467]─[16:33]──[~/development]
$ python playground/parpath.py
parent path: /home/chris
execution path: playground
true parent path: /home/chris/development

Now we're cooking with the good sauce! Ultimately, you can create a shortened version which looks similar to the one from MoinMoin:

#!/usr/bin/env python
import os
import sys

parpath = os.path.join(os.path.dirname(sys.argv[0]), os.pardir)
sys.path.insert(0, os.path.abspath(parpath))

So now you, too, can enjoy a fine import from the comfort of your own ~, or anywhere else.

1 comment:

  1. hm,
    what about

    import os
    MODULE_LOCATION = os.path.abspath(os.path.dirname(__file__))

    you can get to the parent dir from there easily. works no matter from where the module is invoked.

    ReplyDelete