
Archive for the ‘python’ Category


Pydocmd: Generate Markdown from python source files

I've created pydocmd. It generates Python module/script documentation in the Markdown (md) format. It was written to automatically generate documentation that can be put on GitHub or Bitbucket.

It is as yet not very complete and is more of a proof of concept than a fully-fledged tool. Markdown is also a very restricted format, and every implementation works subtly (or completely) differently, which means output may differ between converters. The Bitbucket version doesn't look as nice as the GitHub version.

Get it from Bitbucket.

Example outputs are available:

Scripting a Cisco switch with Python and Expect

In the spirit of "Automate Everything" I was tasked with scripting some oft-needed tasks on Cisco switches. It's been a while since I've had to do anything even remotely related to switches, so I thought I'd start by googling for some ways to automate tasks on switches. What I found:

  • sw_script
  • Trigger

Both seemed to be able to get the job done quite well. Unfortunately, it turns out that the source for sw_script is nowhere to be found, and Trigger wouldn't even install properly, giving me a whole plethora of compiler errors. Since I was rather time-constrained, I decided to fall back to good old Expect.

Expect

Expect is a framework to automate interactive applications. Basically, it lets the user insert text into a program's input and then watches the program's output for specific occurrences of text, hence the name "Expect". For example, consider a program that requires the user to enter a username and password. It lets the user know this by giving us prompts:

$ ftp host.local
Username: 
Password:

We can use Expect to scan the output of the program and respond with the username and password when appropriate:

spawn ftp host.local
expect "Username:"
send "fboender\r"
expect "password:"
send "sUp3rs3creT\r"

It's a wonderful tool, but error handling can be somewhat tricky, as you'll see later on in this article.

Scripting a Cisco switch

There is an excellent Expect library for Python called Pexpect. Installation on Debian-derived systems is as easy as "aptitude install python-pexpect".

Here's an example session on a Cisco switch we'll automate with Expect in a bit:

$ ssh user@10.0.0.1
Password:
Switch>enable
Password:
Switch#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
Switch(config)#interface Gi2/0/2 
Switch(config-if)#switchport access vlan 300
Switch(config-if)#no shutdown
Switch(config-if)#end
Switch#wr mem
Building configuration...
[OK]
Switch#quit

This is a simple manual session that changes the VLAN of switch port "Gi2/0/2" to VLAN 300. So how do we go about automating this with Pexpect?

Logging in

The first step is to log in. This is fairly easy:

import sys
import pexpect

switch_ip = "10.0.0.1"
switch_un = "user"
switch_pw = "s3cr3t"
switch_port = "Gi2/0/2"
switch_vlan = 300

child = pexpect.spawn('ssh %s@%s' % (switch_un, switch_ip))
child.logfile = sys.stdout
child.timeout = 4
child.expect('Password:')
child.sendline(switch_pw)
child.expect('>')

First we import the sys and pexpect modules. Then we spawn a new process: "ssh user@10.0.0.1". We set the process' logfile to sys.stdout. This is merely for debugging purposes: it tells Pexpect to show all the output it receives on our terminal. The default timeout is set to 4 seconds.

Then comes the first juicy bit. We let Expect know that we expect to see a 'Password:' prompt. If something goes wrong, for instance the switch at 10.0.0.1 is down, Pexpect will wait for 4 seconds, looking for the text 'Password:' in SSH's output. Of course, it won't get that prompt since the switch is down, so it will raise a pexpect.TIMEOUT exception after 4 seconds. If it does detect the 'Password:' prompt, we then send the switch password and wait until we detect the '>' prompt.

Catching errors

If we want to catch errors and show the user somewhat helpful error messages, we can use try/except clauses:

try:
  child.expect('Password:')
except pexpect.TIMEOUT:
  raise OurException("Login prompt not received")

After the password prompt, we send the password. If all goes well, we'll receive the '>' prompt. Otherwise the switch will ask for the password again. We don't "expect" this, so Pexpect will time out once again while waiting for the '>' prompt.

try:
  child.sendline(switch_pw)
  child.expect('>')
except pexpect.TIMEOUT:
  raise OurException("Login failed")

Let's jump ahead a bit and look at the next interesting problem. What if we supply the wrong port? The switch will respond like so:

Switch(config)#interface Gi2/0/2 
                         ^
% Invalid input detected at '^' marker.

If, on the other hand, our port is correct, we'll simply get a prompt:

Switch(config-if)#

So here we have two possible scenarios: either something goes wrong, or it goes right. How do we detect which one happened? We can tell Expect that we expect two different scenarios:

o = child.expect(['\(config-if\)#', '% Invalid'])
if o != 0:
  raise OurException("Unknown switch port '%s'" % (port))

The first scenario '\(config-if\)#' is our successful one. The second is when an error occurred. We then simply check that we got the successful one, and otherwise raise an error.

The rest of the script is just straightforward expect()s and sendline()s.

The full script

Here's the full script:

import sys
import pexpect

switch_ip = "10.0.0.1"
switch_un = "user"
switch_pw = "s3cr3t"
switch_enable_pw = "m0r3s3cr3t"
port = "Gi2/0/2"
vlan = 300
verbose = True

class OurException(Exception):
  pass

def error(msg):
  sys.stderr.write('%s\n' % (msg))

try:
  try:
    child = pexpect.spawn('ssh %s@%s' % (switch_un, switch_ip))
    if verbose:
      child.logfile = sys.stdout
    child.timeout = 4
    child.expect('Password:')
  except pexpect.TIMEOUT:
    raise OurException("Couldn't log on to the switch")

  child.sendline(switch_pw)
  child.expect('>')
  child.sendline('terminal length 0')
  child.expect('>')
  child.sendline('enable')
  child.expect('Password:')
  child.sendline(switch_enable_pw)
  child.expect('#')
  child.sendline('conf t')
  child.expect('\(config\)#')
  child.sendline('interface %s' % (port))
  o = child.expect(['\(config-if\)#', '% Invalid'])
  if o != 0:
    raise OurException("Unknown switch port '%s'" % (port))
  child.sendline('switchport access vlan %s' % (vlan))
  child.expect('\(config-if\)#')
  child.sendline('no shutdown')
  child.expect('#')
  child.sendline('end')
  child.expect('#')
  child.sendline('wr mem')
  child.expect('\[OK\]')  # escape the brackets; '[OK]' would be a character class
  child.expect('#')
  child.sendline('quit')
except (pexpect.EOF, pexpect.TIMEOUT):
  error("Error while trying to move the vlan on the switch.")
  raise

Conclusion

It's too bad that I couldn't use any of the existing frameworks. I could have tried getting Trigger to compile, but I was time-constrained, so I didn't bother. There are other ways of configuring switches too. SNMP is one way, but it is complex and error-prone. I believe it's also possible to retrieve the entire configuration from a switch, modify it, and put it back. This is partly what Rancid does. However, that would require even more time.

Expect was a good fit in this case. Although it too is rather error prone, it's fairly easy to catch errors as long as you're expecting (no pun intended) them. I strongly suggest you give Trigger a try before falling back to Expect. It seems like a very decent tool.

bbcloner v1.3 released. BitBucket backup tool.

I've just released bbcloner v1.3. bbcloner creates mirrors of your public and private Bitbucket Git repositories and wikis. It also synchronizes already existing mirrors. Initial mirror setup requires that you manually enter your username/password. Subsequent synchronization of mirrors is done using Deployment Keys.

This release features the following changes:

  • Support for updating over https (patch by msboom)


my_indexr: Script to drop and recreate MySQL indexes

As can be read in this article, I was in need of a method to quickly drop all indexes (except primary keys) from a MySQL database. After googling around a bit and being astonished that apparently no-one had written such a thing yet, I wrote the script that can be seen in that article.

Unfortunately, that script wasn't very good, so I decided to do a cleaner, better implementation of it. The result is my_indexr, which spits out SQL commands to drop and recreate indexes on a database. Other features include:

  • Process only certain tables
  • Process either only non-primary indexes, or both normal and primary indexes
  • Correctly handles:
    • Primary key indexes
    • Compound key / multi-column indexes
    • Index types (BTREE, etc)
    • Prefix lengths
    • Auto_increment columns, which MUST be a key (my_indexr skips indexes with these columns in them)

There may still be some edge cases that are not properly handled by my_indexr. If you encounter one, please let me know.

You can download my_indexr from its Bitbucket page.

This is what its output looks like:

$ ./indexr.py -u root -p mydb
DROP INDEX `location` ON `idx_tst_innodb_basic`;
DROP INDEX `name_age` ON `idx_tst_innodb_basic`;
DROP INDEX `email` ON `idx_tst_innodb_basic`;
DROP INDEX `PRIMARY` ON `idx_tst_innodb_compkey`;
CREATE  INDEX `location` USING BTREE ON `idx_tst_innodb_basic` (`location_id`);
CREATE  INDEX `name_age` USING BTREE ON `idx_tst_innodb_basic` (`name`(40),`age`);
CREATE UNIQUE INDEX `email` USING BTREE ON `idx_tst_innodb_basic` (`email`);
ALTER TABLE `idx_tst_innodb_compkey` ADD PRIMARY KEY (`last_name`,`first_name`);

Think Bayes: Bayesian Statistics in Python

There's a free download available of "Think Bayes – Bayesian Statistics in Python" over at it-ebooks.info. A small excerpt from the book:

The premise of this book, and the other books in the Think X series, is that if you know
how to program, you can use that skill to learn other topics.

Most books on Bayesian statistics use mathematical notation and present ideas in terms
of mathematical concepts like calculus. This book uses Python code instead of math,
and discrete approximations instead of continuous mathematics. As a result, what
would be an integral in a math book becomes a summation, and most operations on
probability distributions are simple loops.

Increasing performance of bulk updates of large tables in MySQL

I recently had to perform some bulk updates on semi-large tables (3 to 7 million rows) in MySQL. I ran into various problems that negatively affected the performance on these updates. In this blog post I will outline the problems I ran into and how I solved them. Eventually I managed to increase the performance from about 30 rows/sec to around 7000 rows/sec. The final performance was mostly CPU bound. Since this was on a VPS with only limited CPU power, I expect you can get better performance on some decently outfitted machine/VPS.

The situation I was dealing with was as follows:

  • About 20 tables, 7 of which were between 3 and 7 million rows.

  • Both MyISAM and InnoDB tables.

  • Updates required on values on every row of those tables.

  • The updates were too complicated to do in SQL, so they required a script.

  • All updates were done on rows that were selected on just their primary key. I.e. WHERE id = …

Here are some of the problems I ran into.

Python’s MySQLdb is slow

I implemented the script in Python, and the first problem I ran into is that the MySQLdb module is slow. It’s especially slow if you’re going to use the cursors. MySQL natively doesn’t support cursors, so these are emulated in Python code. One of the trickiest things is that a simple SELECT * FROM tbl will retrieve all the results and put them in memory on the client. For 7 million rows, this quickly exhausts your memory. Real cursors would fetch the result one-by-one from the database so that you don’t exhaust memory.

The solution here is to not use MySQLdb's cursors, but the native client bindings that ship with it, available via import _mysql.
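
For reference, here's roughly what querying through those low-level bindings looks like; the same pattern shows up in the index-dropping script further down. This is just a minimal sketch, with made-up connection details and table:

# Minimal sketch of querying through the low-level _mysql bindings that ship
# with MySQLdb. Connection details and table are made up for illustration.
import _mysql

db = _mysql.connect(host='127.0.0.1', user='root', passwd='passwd', db='mydb')
db.query('SELECT id, name FROM tbl LIMIT 10')
res = db.store_result()
for row in res.fetch_row(maxrows=0):  # maxrows=0 fetches all rows of this result
    row_id, name = row                # values come back as strings
    print row_id, name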

LIMIT n,m is slow

Since MySQL doesn’t support cursors, we can’t mix SELECT and UPDATE queries in a loop. Thus we need to read in a bunch of rows into memory and update in a loop afterwards. Since we can’t keep all the rows in memory, we must read in batches. An obvious solution for this would be a loop such as (pseudo-code):

offset = 0
size = 1000
while True:
    rows = query('SELECT * FROM tbl LIMIT :offset, :size')
    for row in rows:
        # do some updates
    if len(rows) < size:
        break
    offset += size

This would use the LIMIT to read the first 1000 rows on the first iteration of the loop, the next 1000 on the second iteration of the loop. The problem is: in MySQL this becomes linearly slower for higher values of the offset. I was already aware of this, but somehow it slipped my mind.

The problem is that the database has to advance an internal pointer forward in the record set, and the further in the table you get, the longer that takes. I saw performance drop from about 5000 rows/sec to about 100 rows/sec, just for selecting the data. I aborted the script after noticing this, but we can assume performance would have crawled to a halt if we kept going.

The solution is to order by the primary key and then select everything we haven't processed yet:

size = 1000
last_id = 0
while True:
    rows = query('SELECT * FROM tbl WHERE id > :last_id ORDER BY id LIMIT :size')
    if not rows:
        break
    for row in rows:
        # do some updates
        last_id = row['id']

This requires that you have an index on the id field, or performance will greatly suffer again. More on that later.

At this point, SELECTs were pretty speedy. Including my row data manipulation, I was getting about 40,000 rows/sec. Not bad. But I was not updating the rows yet.

Connection settings

The next things I did is some standard tricks to speed up bulk updates/inserts by disabling some foreign key checks and running batches in a transaction. Since I was working with both MyISAM and InnoDB tables, I just mixed optimizations for both table types:

db.query('SET autocommit=0;')
db.query('SET unique_checks=0; ')
db.query('SET foreign_key_checks=0;')
db.query('LOCK TABLES %s WRITE;' % (tablename))
db.query('START TRANSACTION;')
# SELECT and UPDATE in batches
db.query('COMMIT;')
db.query('UNLOCK TABLES')
db.query('SET foreign_key_checks=1;')
db.query('SET unique_checks=1; ')
db.query('SET autocommit=1;')

I must admit that I’m not sure if this actually increased performance at all. It is entirely possible that this actually hurts performance instead. Please test this for yourselves if you’re going to use it. You should also be aware that some of these options bypass MySQL’s data integrity checks. You may end up with invalid data such as invalid foreign key references, etc.
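
To make the "# SELECT and UPDATE in batches" placeholder above a little more concrete, here's a rough sketch of how the batched loop might look between the START TRANSACTION and COMMIT, using the keyset trick from the previous section. The table, the value column and the transform() function are made up for illustration:

def update_table(db, tablename, batch_size=1000):
    # Rough sketch; runs between START TRANSACTION and COMMIT for one table.
    last_id = 0
    while True:
        db.query('SELECT id, value FROM %s WHERE id > %d ORDER BY id LIMIT %d' %
                 (tablename, last_id, batch_size))
        rows = db.store_result().fetch_row(maxrows=0)
        if not rows:
            break
        for row_id, value in rows:
            new_value = transform(value)  # hypothetical per-row manipulation
            db.query("UPDATE %s SET value = '%s' WHERE id = %s" %
                     (tablename, db.escape_string(new_value), row_id))
            last_id = int(row_id)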

One mistake I did make was that I accidentally included the following in an early version of the script:

db.query('ALTER TABLE %s DISABLE KEYS;' % (tablename))

Such is the deviousness of copy-paste. This option will disable the updating of non-unique indexes while it’s active. This is an optimization for MyISAM tables to massively improve performance of mass INSERTs, since the database won’t have to update the index on each inserted row (which is very slow). The problem is that this also disables the use of indexes for data retrieving, as noted in the MySQL manual:

While the nonunique indexes are disabled, they are ignored for statements such as SELECT and EXPLAIN that otherwise would use them.

That means update queries such as UPDATE tbl SET key=value WHERE id=1020033 will become incredibly slow, since they can no longer use indexes.

MySQL server tuning

I was running this on a stock Ubuntu 12.04 installation. MySQL is basically completely unconfigured out of the box on Debian and Ubuntu. This means that those 16 GBs of memory in your machine will go completely unused unless you tune some parameters. I modified /etc/mysql/my.cnf and added the following settings to improve the speed of queries:

[mysqld]
key_buffer         = 128M
innodb_buffer_pool_size = 3G

The key_buffer setting is a setting for MyISAM tables that determines how much memory may be used to keep indexes in memory. The equivalent setting for InnoDB is innodb_buffer_pool_size, except that the InnoDB setting also includes table data.

In my case the machine had 4 GB of memory. You can read more about the settings on these pages:

Don’t forget to restart your MySQL server.

Dropping all indexes except primary keys

One of the biggest performance boosts was to drop all indexes from all the tables that needed to be updated, except for the primary key indexes (those on the id fields). It is much faster to just drop the indexes and recreate them when you're done. This is basically the manual way to accomplish what we hoped the ALTER TABLE %s DISABLE KEYS would do, but didn't.

UPDATE: I wrote a better script which is available here.

Here’s a script that dumps SQL commands to drop and recreate indexes for all tables:

#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# DANGER WILL ROBINSON, READ THE important notes BELOW
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
import _mysql
import sys

mysql_username = 'root'
mysql_passwd = 'passwd'
mysql_host = '127.0.0.1'
dbname = 'mydb'

tables = sys.argv[1:]
indexes = []

db = _mysql.connect(host=mysql_host, user=mysql_username, passwd=mysql_passwd, db=dbname)
db.query('SHOW TABLES')
res = db.store_result()
for row in res.fetch_row(maxrows=0):
    tablename = row[0]
    if not tables or tablename in tables:
        db.query('SHOW INDEXES FROM %s WHERE Key_name != "PRIMARY"' % (tablename))
        res = db.store_result()
        for row_index in res.fetch_row(maxrows=0):
            table, non_unique, key_name, seq_in_index, column_name, \
            collation, cardinality, sub_part, packed, null, index_type, \
            comment, index_comment = row_index
            indexes.append( (key_name, table, column_name) )

for index in indexes:
    key_name, table, column_name = index
    print "DROP INDEX %s ON %s;" % (key_name, table)

for index in indexes:
    key_name, table, column_name = index
    print "CREATE INDEX %s ON %s (%s);" % (key_name, table, column_name)

Output looks like this:

$ ./drop_indexes.py
DROP INDEX idx_username ON users;
DROP INDEX idx_rights ON rights;
CREATE INDEX idx_username ON users (username);
CREATE INDEX idx_perm ON rights (perm);

Some important notes about the above script:

  • The script is not foolproof! If you have non-BTREE indexes, if you have indexes spanning multiple columns, if you have any kind of index that goes beyond a BTREE single column index, please be careful about using this script.

  • You must manually copy-paste the statements into the MySQL client.

  • It does NOT drop the PRIMARY KEY indexes.

Conclusions

In the end, I went from about 30 rows per second to around 8000 rows per second. The key to getting decent performance is to start simple, and slowly expand your script while keeping a close eye on performance. If you see a dip, investigate immediately to mitigate the problem.

A useful way of investigating slow performance is to use tools that unearth evidence of the root of the problem:

  • top can tell you if a process is mostly CPU bound. If you’re seeing high amounts of CPU, check if your queries are using indexes to get the results they need.

  • iostat can tell you if a process is mostly IO bound. If you’re seeing high amounts of I/O on your disk, tune MySQL to make better use of memory to buffer indexes and table data.

  • Use MySQL's EXPLAIN to see if, and which, indexes are being used. If not, create new indexes. (See the sketch after this list for a quick way to check this from Python.)

  • Avoid doing useless work such as updating indexes after every update. This is mostly a matter of knowing what to avoid and what not, but that’s what this post was about in the first place.

  • Baby steps! It took me entirely too long to figure out that I was initially seeing bad performance because my SELECT LIMIT n,m was so slow. I was completely convinced my UPDATE statements were the cause of the initial slowdowns I saw. Only when I started commenting out major parts of the code did I see that it was actually the simple SELECT query that was causing problems initially.
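
As promised above, here's a quick sketch of checking index usage from Python by EXPLAINing the SELECT equivalent of an UPDATE's WHERE clause. The table and id value are made up:

# Sketch: check whether the WHERE clause can use an index by EXPLAINing the
# equivalent SELECT. Table and id value are made up for illustration.
import _mysql

db = _mysql.connect(host='127.0.0.1', user='root', passwd='passwd', db='mydb')
db.query('EXPLAIN SELECT * FROM tbl WHERE id = 1020033')
res = db.store_result()
for row in res.fetch_row(maxrows=0, how=1):  # how=1 returns rows as dicts
    # The 'key' column shows which index MySQL picked; None means no index is used.
    print row['key'], row['rows']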

That’s it! I hope this was helpful in some way!

Quick-n-dirty HAR (HTTP Archive) viewer

HAR, HTTP Archive, is a JSON-encoded dump of a list of requests and their associated headers, bodies, etc. Here's a partial example containing a single request:

{
  "startedDateTime": "2013-09-16T18:02:04.741Z",
  "time": 51,
  "request": {
    "method": "GET",
    "url": "http://electricmonk.nl/",
    "httpVersion": "HTTP/1.1",
    "headers": [],
    "queryString": [],
    "cookies": [],
    "headersSize": 38,
    "bodySize": 0
  },
  "response": {
    "status": 301,
    "statusText": "Moved Permanently",
    "httpVersion": "HTTP/1.1",
    "headers": [],
    "cookies": [],
    "content": {
      "size": 0,
      "mimeType": "text/html"
    },
    "redirectURL": "",
    "headersSize": 32,
    "bodySize": 0
  },
  "cache": {},
  "timings": {
    "blocked": 0,
  }
},

HAR files can be exported from Chrome's Network analyser developer tool (Ctrl-Shift-I → Network tab → capture some requests → right-click and select "Save as HAR with contents"). Additional tip: check the "Preserve Log on Navigation" option – which looks like a recording button – to capture multi-level redirects and such.

As human-readable as JSON is, it's still difficult to get a good overview of the requests. So I wrote a quick Python script that turns the JSON into something that's a little easier on our poor sysadmin's eyes:

(Screenshot: harview output)

It supports colored output, dumping request and response headers, and dumping the bodies of POSTs and responses (although this will be very slow). You can filter out uninteresting requests such as images or CSS/JS with the --filter-X options.
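
The core idea takes very little code. Here's a minimal sketch (not the actual script): load the JSON and print one summary line per request. The example.har filename is just for illustration:

# Minimal sketch of the idea behind the HAR viewer (not the actual script):
# parse the JSON and print one summary line per request.
import json
import sys

har = json.load(open(sys.argv[1]))  # e.g. ./harview.py example.har
for entry in har['log']['entries']:
    req = entry['request']
    resp = entry['response']
    print '%6sms  %3s  %-4s %s' % (entry['time'], resp['status'],
                                   req['method'], req['url'])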

You can get it by cloning the Git repository from its Bitbucket page.

Cheers!

bbcloner: create mirrors of your public and private Bitbucket Git repositories


I wrote a small tool that assists in creating mirrors of your public and private Bitbucket Git repositories and wikis. It also synchronizes already existing mirrors. Initial mirror setup requires that you manually enter your username/password. Subsequent synchronization of mirrors is done using Deployment Keys.

You can download a tar.gz, a Debian/Ubuntu package or clone it from the Bitbucket page.

Features

  • Clone / mirror / backup public and private repositories and wikis.
  • No need to store your username and password to update clones.
  • Exclude repositories.
  • No need to run an SSH agent. Uses passwordless private Deployment Keys (and thus has no write access to your repositories).

Usage

Here's how it works in short. Generate a passwordless SSH key:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key: /home/fboender/.ssh/bbcloner_rsa<ENTER>
Enter passphrase (empty for no passphrase):<ENTER>
Enter same passphrase again: <ENTER>

You should add the generated public key to your repositories as a Deployment Key. The first time you use bbcloner, or whenever you've added new public or private repositories, you have to specify your username/password. BBcloner will retrieve a list of your repositories and create mirrors for any new repositories not yet mirrored:

$ bbcloner -n -u fboender /home/fboender/gitclones/
Password: 
Cloning new repositories
Cloning project_a
Cloning project_a wiki
Cloning project_b

Now you can update the mirrors without using a username/password:

$ bbcloner /home/fboender/gitclones/
Updating existing mirrors
Updating /home/fboender/gitclones/project_a.git
Updating /home/fboender/gitclones/project_a-wiki.git
Updating /home/fboender/gitclones/project_b.git

You can run the above from a cronjob. Specify the -s argument to prevent bbcloner from showing normal output.

The mirrors are full remote git repositories, which means you can clone them:

$ git clone /home/fboender/gitclones/project_a.git/
Cloning into project_a...
done.

Don't push changes to it, or the mirror won't be able to sync. Instead, point the remote origin to your Bitbucket repository:

$ git remote rm origin
$ git remote add origin git@bitbucket.org:fboender/project_a.git
$ git push
remote: bb/acl: fboender is allowed. accepted payload.

Get it

You can get bbcloner as a tar.gz, as a Debian/Ubuntu package, or by cloning it from the Bitbucket page.

More information

For more information, please see the Bitbucket repository.

python-libtorrent program doesn't exit

(TL;DR: To the solution)

I was mucking about with the Python bindings for libtorrent, and made something like this:

import time

import libtorrent

fname = 'test.torrent'
session_dir = '.'  # directory to save downloaded data in

ses = libtorrent.session()
ses.listen_on(6881, 6891)

info = libtorrent.torrent_info(fname)
h = ses.add_torrent({'ti': info, 'save_path': session_dir})
prev_progress = -1
while (not h.is_seed()):
    status = h.status()
    progress = int(round(status.progress * 100))
    if progress != prev_progress:
        print 'Torrenting %s: %i%% done' % (h.name(), progress)
        prev_progress = progress
    time.sleep(1)
 
print "Done torrenting %s" % (h.name())
# ... more code

After running it a few times, I noticed the program would not always terminate. You'd immediately suspect a problem in the while loop condition, but in all cases "Done torrenting Foo" would be printed and then the program would hang.

In celebration of one of the rare occasions that I don't spot a hanging problem in such a simple piece of code right away, I fired up PDB, the Python debugger, which told me:

$ pdb ./tvt 
> /home/fboender/Development/tvtgrab/trunk/src/tvt(9)<module>()
-> import sys
(Pdb) cont
Torrenting Example Torrent v1.0: 100% done
Done torrenting Example Torrent v1.0
The program finished and will be restarted

after which it promptly hung. That last line, "The program finished and will be restarted", that's PDB telling us execution of the program finished. Yet it still hung.

At this point, I was suspecting threads. Since libtorrent is a C++ program, and as the main loop in my code doesn't actually really do anything, it seems libtorrent is doing its thing in the background, and not properly shutting down every now and then. (Although it's more likely I just don't understand what it's doing) It's quite normal for torrent clients to take a while before closing down, especially if there are still peers connected. Most of the time, if I waited long enough, the program would terminate normally. However, sometimes it wouldn't terminate even after an hour, even if no peers were at any point connected to any torrents (the original code does not always load torrents into a session).

Digging through the documentation, I couldn't easily find a method of shutting down the session. I did notice the following:

~session()

The destructor of session will notify all trackers that our torrents have been shut down. If some trackers are down, they will time out. All this before the destructor of session returns. So, it's advised that any kind of interface (such as windows) are closed before destructing the session object. Because it can take a few second for it to finish. The timeout can be set with set_settings().


Seems like libtorrent uses destructors to shut down the session. Adding the following to the end of the code fixed the problem of the script not exiting:

del ses

The del statement in Python calls any destructors (if you're lucky) on that class. Having nearly zero C++ knowledge, I suspect C++ calls destructors automatically at program exit. Python doesn't do that though, so we have to call it manually.
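
If you go this route, one way to make sure the teardown always happens, even when the download loop raises an exception, is to put the del in a finally block. A minimal sketch:

# Sketch: ensure the session destructor runs even if something goes wrong
# halfway through, by putting the teardown in a finally block.
import libtorrent

ses = libtorrent.session()
try:
    ses.listen_on(6881, 6891)
    # ... add torrents and wait for them to finish, as above ...
finally:
    del ses  # triggers ~session(), which notifies trackers and shuts down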

Update: Calling the destructor does not definitively solve the problem. I am still experiencing problems with hangs when calling the session destructor. I will investigate further and update when a solution has been found.

Update II: Well, I've not been able to solve the problem any other way than upgrading to the latest version of libtorrent. So I guess that'll have to do.

Python UnitTest: AssertRaises pitfall

I ran into a little pitfall with Python's unittest module. I was trying to unit test some failure cases where the code I called should raise an exception.

Here's what I did:

def test_file_error(self):
    self.assertRaises(IOError, file('/foo', 'r'))

I mistakenly thought this would work, in that assertRaises would notice the IOError exception and mark the test as passed. Naturally, it doesn't:

ERROR: test_file_error (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 10, in test_file_error
    self.assertRaises(IOError, file('/foo', 'r'))
IOError: [Errno 2] No such file or directory: '/foo'

The problem is that I'm a dumbass and I didn't read the documentation carefully enough:


assertRaises(exception, callable, *args, **kwds)
Test that an exception is raised when callable is called with any positional or keyword arguments that are also passed to assertRaises().

If you look carefully, you'll notice that I did not pass in a callable. Instead, I passed in the result of a callable! Here's the correct code:

def test_file_error(self):
    self.assertRaises(IOError, file, '/foo', 'r')

The difference is that this time I pass a callable (file) and the arguments ('/foo' and 'r') that the test case should pass to that callable. self.assertRaises will then call it for me with the specified arguments and catch the IOError. In the first scenario (the wrong code), the call is made before the unit test is actually watching for it.
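
As an aside: on Python 2.7 and newer, assertRaises can also be used as a context manager, which sidesteps this pitfall entirely, because the call is made inside the with block:

def test_file_error(self):
    # The exception is raised inside the with block, where unittest is watching.
    with self.assertRaises(IOError):
        file('/foo', 'r')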