
Archive for the ‘python’ Category


Simple function caching in Python

Tuesday, September 15th, 2009

Python's dynamic nature continues to astound me. I was working on a small library when I noticed it made a lot of redundant IO calls. Now I'm not one for premature optimization, but a while ago I had been thinking about writing a decorator that would wrap functions and methods and cache their return values. It turns out that's way easier than I thought, and I whipped this up in a couple of minutes:

#!/usr/bin/python
#
# Public Domain.
 
__cache = {} # Global cache
 
def fncache(fn):
   """
   Function caching decorator. Keeps a cache of the return 
   value of a function and serves from cache on consecutive
   calls to the function. 
 
   Cache keys are computed from a hash of the function 
   name and the parameters (this differentiates between 
   instances through the 'self' param). Only works if 
   parameters have a unique repr() (almost everything).
 
   Example:
 
   >>> @fncache
   ... def greenham(a, b=2, c=3):
   ...   print 'CACHE MISS'
   ...   return('I like turtles')
   ... 
   >>> print greenham(1)           # Cache miss
   CACHE MISS
   I like turtles
   >>> print greenham(1)           # Cache hit
   I like turtles
   >>> print greenham(1, 2, 3)     # Cache miss (even though default params)
   CACHE MISS
   I like turtles
   >>> print greenham(2, 2, ['a']) # Cache miss
   CACHE MISS
   I like turtles
   >>> print greenham(2, 2, ['b']) # Cache miss
   CACHE MISS
   I like turtles
   >>> print greenham(2, 2, ['a']) # Cache hit
   I like turtles
   """
   def new(*args, **kwargs):
      h = hash(repr(fn) + repr(args) + repr(kwargs))
      if not h in __cache:
         __cache[h] = fn(*args, **kwargs)
      return(__cache[h])
   new.__doc__ = "%s %s" % (fn.__doc__, "(cached)")
   return(new)
 
if __name__ == '__main__':
   import doctest
   doctest.testmod()

Save this to a file named 'fncache.py' and import it into your program. Then decorate your functions and methods with it, and their return values will be cached. Python rocks. Remember to use it only for functions that do heavy calculations or file or network IO: building the cache key with repr() on every call is rather expensive.
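For readers on Python 3, here is a minimal sketch of the same idea. It is a variation, not the code above: it keeps one cache per decorated function instead of a global one, uses the repr() strings themselves as the key rather than a hash of them (so two different calls can never collide), and uses functools.wraps to preserve the function's name and docstring. The names wrapper and calls are only there for illustration.

```python
import functools


def fncache(fn):
    """Cache return values, keyed on the repr() of the arguments."""
    cache = {}  # one cache per decorated function (an assumption of this sketch)

    @functools.wraps(fn)  # keep fn's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        # Sort keyword arguments so f(a=1, b=2) and f(b=2, a=1) share a key.
        key = (repr(args), repr(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = fn(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []  # records every real (uncached) call


@fncache
def greenham(a, b=2, c=3):
    calls.append((a, b, c))
    return "I like turtles"


greenham(1)              # cache miss
greenham(1)              # cache hit
greenham(2, 2, ["a"])    # cache miss, unhashable arguments are fine
greenham(2, 2, ["a"])    # cache hit
print(len(calls))        # 2
```

If all arguments are hashable, the standard library's functools.lru_cache does the same job without the repr() overhead.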

HIrcd – minimal IRC server in Python

Monday, September 14th, 2009

I wrote a little IRC server in Python:

HIrcd is a minimal, hacky implementation of an IRC server daemon written in Python in about 400 lines of code, including comments, etc.

It is mostly useful as a testing tool or perhaps for building something like a private proxy on. Do NOT use it in any kind of production code or anything that will ever be connected to by the public.

Direct link to the source code, for those interested.

Encodings in Python

Friday, May 22nd, 2009

There's a whole slew of information regarding all kinds of encoding issues out there on the big bad Internet. Some of it deals with how Unicode works, some with what UTF-8 is and how it relates to other encodings, and some with how to transform from one encoding to another. All that theory is nice, but I've found a rather worrying lack of practical, understandable and contextual information on dealing with encodings in Python, leading me to think I'd never be able to properly deal with encodings in Python.

So I took the plunge, and tried to find some stuff out. Here's what I came up with. All of this might be terribly wrong though. Encodings are a complicated subject if you ask me, so feel free to correct me if I'm wrong.

NOTICE: This article uses special HTML entities in various places to show output. Depending on your browser, the encodings it supports, and your font's ability to display these characters, you may or may not see them properly. In those cases a description of the character is given in parentheses right after the character.

(more…)

Apache, FastCGI and Python

Thursday, February 12th, 2009

FastCGI is a hybrid solution to serving web applications written in a wide variety of programming languages. It sits somewhere between CGI, which spawns a new instance of the web application for each request, and the various web server modules (such as mod_php, mod_python and mod_wsgi) which take care of pre-spawning a pool of interpreters and web applications from within the web server, which will then handle requests.

FastCGI, too, can take care of spawning a pool of web application instances to handle requests. It can also facilitate communications between a web server and an external, already running web application. Unlike CGI, FastCGI does not spawn a new process for each request, and unlike the various web server modules it is not completely embedded within the web server. Instead it uses TCP/IP or Unix sockets to communicate with the web application. This makes it possible to create advanced setups such as spreading out requests over multiple servers, limiting web applications' rights and system resources using the standard Unix tool-set, etc.

FastCGI is available for a large range of web servers, is fast and is powerful and versatile in its capabilities. It is however not well documented, nor easy to set up. This document covers the basic idea behind FastCGI, setting up FastCGI for Apache (v2) and hooking it up to a simple Python web application. Such a light-weight setup, which requires nothing more than a FastCGI Python library, can provide an alternative Python web development environment for people who feel that the Python web development frameworks (Django, Pylons, TurboGears and even Web.py) are too bloated.

Jump to this document.

Dependency resolving algorithm

Thursday, August 7th, 2008

Here's a little explanation of a possible dependency resolution algorithm. I've made this as basic as possible, so it's easy to understand. If you're at home in (graph) algorithms, this post is probably not of much interest to you.

Premise

Suppose we have five objects which all depend on some of the other objects. The objects could be anything, but in this case let's say they're very simple software packages (no minimal versions, etc) that depend on other packages which must be installed first. How does one find the right order of installing the packages?
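To make the question concrete, here is one possible way to compute an install order, written as a small Python 3 sketch. It is a plain depth-first walk and not necessarily the algorithm the full post develops. The package names and their dependencies are made up for illustration.

```python
def install_order(dependencies):
    """Return the packages in an order where dependencies come first."""
    order = []           # final install order
    done = set()         # packages already placed in the order
    in_progress = set()  # packages currently being walked (cycle detection)

    def visit(package):
        if package in done:
            return
        if package in in_progress:
            raise ValueError("circular dependency involving %r" % package)
        in_progress.add(package)
        for dep in dependencies.get(package, ()):
            visit(dep)   # install everything this package needs first
        in_progress.discard(package)
        done.add(package)
        order.append(package)

    for package in sorted(dependencies):
        visit(package)
    return order


# Hypothetical packages: each key depends on the packages in its list.
packages = {
    "a": ["b", "d"],
    "b": ["c", "e"],
    "c": ["d", "e"],
    "d": [],
    "e": [],
}
print(install_order(packages))  # ['d', 'e', 'c', 'b', 'a']
```

A package is only appended after everything it depends on has been appended, and a package that is reached again while it is still being walked means the dependencies are circular.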

(more…)

Rant: Dicts that are not dicts

Sunday, August 3rd, 2008

There's something I have to get off my chest. I HATE it when things in Python pretend to be something they are not. I often use IPython to introspect things so I know how to use them. Lately, I've been running into more and more libraries which return things that pretend to be one thing, but which really are something else. Observe:

import optparse
 
parser = optparse.OptionParser()
parser.add_option("-p", dest="path", action="store", type="string")
(options, args) = parser.parse_args()
 
print options

Output:

{'path': None}

"Ah", I think, "a dictionary. Swell!", and go ahead and do:

print options.get('path', 'default')

And then it responds:

Traceback (most recent call last):
  File "/home/todsah/foo.py", line 7, in <module>
    print options.get('path', 'default')
AttributeError: Values instance has no attribute 'get'

What the hell? A dictionary with no get attribute? Then, when you print the type, it turns out it's not a dictionary at all: "<type 'instance'>". It's a plain old object instance, and you have to use getattr(), etc. instead of foo[key] and foo.get(). Fortunately, I've fallen into this trap so many times now that the first thing I do is check the type of the thing I'm having problems with, but occasionally it still bites me in the ass.
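For what it's worth, here is a small sketch (Python 3 syntax) of how to get at the values anyway. vars() returns the instance's attribute dictionary, which is a real dict, and getattr() takes a default. The "verbose" attribute is a hypothetical name that was never defined as an option, and the empty list passed to parse_args() just keeps the example independent of the real command line.

```python
import optparse

parser = optparse.OptionParser()
parser.add_option("-p", dest="path", action="store", type="string")
options, args = parser.parse_args([])  # empty argv, so -p is not given

print(type(options))          # an optparse.Values instance, not a dict

as_dict = vars(options)       # the instance's attributes as a real dict
# The key exists with value None, so .get('path', 'default') would return
# None; fall back with 'or' instead.
path = as_dict.get("path") or "default"

# getattr() with a default works for attributes that may not exist at all.
verbose = getattr(options, "verbose", False)
print(path, verbose)
```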

I am seeing this more and more often, and it annoys me to no end. Python is supposed to be a rapid application development language, and things like this are extremely annoying. Don't pretend to be something you're not. If an object is going to behave dictionary-like, why not just extend the real dict object? Wouldn't that be so much easier?

So, developers: Please don't make things appear like things they're not. It causes confusion, and possibly even bugs.

Why Python Rocks III: Parameter expansion

Friday, July 25th, 2008

Okay. So what's cool about Python? I can't count the number of times I've had to show skeptics why Python is cool, what Python can do that their favorite language can't do. So I'm writing a bunch of articles showing off Python's Awesomeness.

Previous articles in this series:

(more…)

Python destructor and garbage collection notes

Monday, July 7th, 2008

I hardly ever use destructors in Python objects. I guess Python's dynamic nature often negates the need for destructors. Today though, I needed to write some data to disk when an object was destroyed, or more accurately, when the program exited. So I defined a destructor in the main controller object using the __del__ magic method. To my surprise, the destructor was never called. Not only was it never called upon program exit, but also not when I deleted it manually (using del). The code I needed this in was written a while ago, so I wasn't intimately familiar with it anymore, leading me to think it was some strange bug in my program. I eventually traced the problem to some code which basically did this:

class Foo:
	def __init__(self, x):
		print "Foo: Hi"
		self.x = x
	def __del__(self):
		print "Foo: bye"
 
class Bar:
	def __init__(self):
		print "Bar: Hi"
		self.foo = Foo(self) # x = this instance
 
	def __del__(self):
		print "Bar: Bye"
 
bar = Bar()
# del bar # This doesn't work either.

In the above code, the Foo instance keeps a reference to its creator, which is an instance of Bar. The output is:

Bar: Hi
Foo: Hi

As you can see, the destructors are never called, not even when we add a del bar at the end of the program. Removing the self.x = x line solves the problem (well, makes it disappear).

Garbage collection

The reason that __del__ is never called becomes obvious when looking at the above code. It's a 'problem' with certain garbage collectors, namely circular references. Python uses a reference counting garbage collection algorithm. Such an algorithm increases a counter on each data instance for every reference that exists to it, and decreases the counter when a reference is removed. When the counter reaches zero, the data instance is garbage collected because nothing points to it anymore. Reference counting has a problem with circular links. In the above code foo.x points to bar, and bar.foo points to foo. This means that the reference counters never reach zero, and the objects never get garbage collected. Because of this, the destructors never get called.

The reason del bar doesn't work is also simple to explain. I initially confused myself by thinking that del bar would call the destructor, but it only removes the name from the local scope and decreases the reference counter on the object. The Bar instance in the code above starts with a count of 2 (one for the name in the main program, one for foo.x), so del bar only brings it down to 1. The Foo instance is kept alive in turn by bar.foo.
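The effect is easy to observe. The following is a small sketch in Python 3 syntax, assuming CPython. The classes have no __del__ methods, so the cycle detector is able to free them once it runs. The weak reference named probe is only a way to check whether the object is still alive without adding a reference of its own.

```python
import gc
import weakref


class Foo:
    def __init__(self, x):
        self.x = x            # back-reference to the creator: closes the cycle


class Bar:
    def __init__(self):
        self.foo = Foo(self)


gc.disable()                  # keep the cycle detector out of the way for now
bar = Bar()
probe = weakref.ref(bar)      # does not count as a reference to the Bar

del bar                       # removes the name; foo.x still points at the Bar
alive_after_del = probe() is not None

gc.collect()                  # run the cycle detector by hand
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_del, alive_after_collect)  # True False
```

So del alone leaves the pair alive, and only the cycle detector gets rid of it.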

Further info

After figuring this out, somebody (thanks Cris) pointed me to the documentation on __del__. Had I bothered to read it earlier I would have noticed the note there:

"del x" doesn't directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x's reference count reaches zero. Some common situations that may prevent the reference count of an object from going to zero include: circular references between objects

Something else I noticed in the documentation for __del__:

Circular references which are garbage are detected when the option cycle detector is enabled (it's on by default), but can only be cleaned up if there are no Python-level __del__() methods involved.

Further reading reveals:

A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects). By default, this list contains only objects with __del__() methods. Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn't collect such cycles automatically because, in general, it isn't possible for Python to guess a safe order in which to run the __del__() methods. [...] It's generally better to avoid the issue by not creating cycles containing objects with __del__() methods

This means that objects with cyclic references and __del__ methods will generate memory leaks in your Python program, unless the cyclic references are manually broken before the object is going to be deleted. Something to keep under consideration.

Program exit

You may wonder why Python doesn't simply set all reference counts to 0 when the program exits. The BDFL addresses this in a post on __del__:

One final thing to ponder: if we have a __del__ method, should the interpreter guarantee that it is called when the program exits? (Like C++, which guarantees that destructors of global variables are called.) The only way to guarantee this is to go running around all modules and delete all their variables. But this means that __del__ method cannot trust that any global variables it might want to use still exist, since there is no way to know in what order variables are to be deleted.

As the manual puts it: It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
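For the original goal of writing data to disk when the program exits, the standard library's atexit module is an alternative that doesn't depend on __del__ at all. Here is a minimal sketch in Python 3 syntax. The Controller class, its state dictionary and the file name are all made up for illustration.

```python
import atexit
import json
import os
import tempfile


class Controller:
    """Hypothetical main controller that must persist its state on exit."""

    def __init__(self, path):
        self.path = path
        self.state = {"runs": 1}
        # Ask the interpreter to call save() during normal shutdown.
        atexit.register(self.save)

    def save(self):
        with open(self.path, "w") as fh:
            json.dump(self.state, fh)


# A throwaway location so the sketch does not touch real files.
state_path = os.path.join(tempfile.mkdtemp(), "state.json")
controller = Controller(state_path)
controller.state["runs"] += 1
# Nothing else to do: save() runs when the interpreter exits normally.
```

Note that the registered bound method keeps the controller alive until exit, and that atexit handlers are not run when the process is killed by a signal or leaves through os._exit().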

Exceptions inside destructors

On a side-note, as mentioned in the post by Van Rossum, exceptions raised in destructors are ignored:

	def __del__(self):
		raise Exception("Oopsy")
		print "Bar: Bye"

Output:

Exception exceptions.Exception: Exception('Oopsy',) in <bound method Bar.__del__ of <__main__.Bar instance at 0x...>> ignored
Bar: Hi
Foo: Hi
Foo: bye

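(The warning is written to standard error at run-time, when the destructor runs. It is not a compile-time message. That it shows up before the other lines here is probably just standard output being buffered.)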
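The same behaviour can be checked from inside a program. This is a small sketch in Python 3 syntax, assuming CPython, where del on the last reference runs the destructor immediately. The class name Noisy is made up.

```python
class Noisy:
    def __del__(self):
        raise RuntimeError("Oopsy")


caught = False
try:
    obj = Noisy()
    del obj          # the destructor runs here and raises...
except RuntimeError:
    caught = True    # ...but the exception never reaches this handler

print(caught)        # False; the error is only reported on standard error
```

The program carries on as if nothing happened, which is exactly why a failing destructor is so easy to miss.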

More side-notes

On another side-note: This is why it pays to teach software engineering students basic stuff like C programming and how garbage collectors and various algorithms work, even though some educators seem to think such information is not required anymore with today's high-level languages.

Also, don't rely on your IDE's ability to display code-completions with a short description of the method and its parameters. Even though I probably wouldn't have read the full documentation on __del__ and del anyway, I've often found that important notes and limitations are missed when not reading the full documentation of methods (deprecation notices, security concerns and nasty side effects in particular). If you want to have some fun with that, try to find a way to securely generate temporary files on a Unix system using Python or C using the manuals.

Update: A good way to prevent circular references seems to be the weakref module: weakref — Weak references. A quick introduction: Mindtrove: Python Weak References. (Thanks to zzzeek @ reddit for pointing out weakrefs)
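Here is a minimal sketch of the Foo and Bar example with the back-reference turned into a weak reference (Python 3 syntax, assuming CPython's immediate reference counting). The events list is only there to record what happened.

```python
import weakref

events = []  # records destructor calls


class Foo:
    def __init__(self, x):
        self.x = weakref.ref(x)   # weak: does not keep the creator alive

    def __del__(self):
        events.append("Foo: bye")


class Bar:
    def __init__(self):
        self.foo = Foo(self)

    def __del__(self):
        events.append("Bar: bye")


bar = Bar()
same = bar.foo.x() is bar   # call the weak reference to get the object back
del bar                     # last strong reference gone: destructors run
print(same, events)
```

Because foo.x no longer counts as a reference, there is no cycle, and del bar is enough to trigger both destructors. Calling a weak reference whose target is gone returns None, so code using foo.x() has to be prepared for that.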

pyBrainfuck v0.2 released

Monday, July 7th, 2008

I just released v0.2 of pyBrainfuck.

PyBrainfuck is a speed-optimized Brainfuck interpreter written in Python.

Some other Python interpreters already exist for Brainfuck, but they are either obfuscated or awfully slow. PyBrainfuck has been optimized for speed by doing various preprocessing on the code such as pre-caching loop instructions, removing non-instructions, etc. PyBrainfuck also has configurable memory size, infinite loop protection and a somewhat spartan debugger.

PyBrainfuck can be used either as a stand-alone Brainfuck interpreter or as a Python library. It can read from standard input or from a string (in library mode) and write to standard out or to a string buffer (in library mode).

Changes in this release:

  • Improved exception throwing. Exceptions now include an error number.
  • A bug was fixed in the jump instruction pre-processor where it would sometimes scan beyond the end-of-line of the code.
  • A bug was fixed where a brainfuck program could increase the memory value beyond the byte boundary. It now wraps to 0 at 256 and to 255 at -1.
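To illustrate the techniques mentioned above (removing non-instructions, pre-computing jumps and wrapping cell values at the byte boundary), here is a tiny sketch in Python 3 syntax. It is not pyBrainfuck's actual code: the function names are made up, the input instruction ',' is left out, and there is no bounds checking on the memory pointer.

```python
def build_jump_table(code):
    """Map the position of every '[' to its matching ']' and vice versa."""
    jumps, stack = {}, []
    for pos, op in enumerate(code):
        if op == "[":
            stack.append(pos)
        elif op == "]":
            if not stack:
                raise SyntaxError("unmatched ']' at %d" % pos)
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise SyntaxError("unmatched '[' at %d" % stack.pop())
    return jumps


def run(source, memory_size=30000):
    # Preprocessing: keep only the supported instructions (no ',' here).
    code = [ch for ch in source if ch in "+-<>[]."]
    jumps = build_jump_table(code)
    memory = [0] * memory_size
    out = []
    ptr = pc = 0
    while pc < len(code):
        op = code[pc]
        if op == "+":
            memory[ptr] = (memory[ptr] + 1) % 256   # 256 wraps to 0
        elif op == "-":
            memory[ptr] = (memory[ptr] - 1) % 256   # -1 wraps to 255
        elif op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == ".":
            out.append(chr(memory[ptr]))
        elif op == "[" and memory[ptr] == 0:
            pc = jumps[pc]        # skip the loop body
        elif op == "]" and memory[ptr] != 0:
            pc = jumps[pc]        # go back to the start of the loop
        pc += 1
    return "".join(out)


print(run("++++++++[>++++++++<-]>+."))  # A
```

Python's % operator always returns a non-negative result for a positive modulus, so a single % 256 handles both directions of the wrap.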

Update: Direct link to the interpreter code for those who are interested.

Why Python Rocks II: Data structures

Saturday, June 28th, 2008

Okay. So what's cool about Python? I can't count the number of times I've had to show skeptics why Python is cool, what Python can do that their favorite language can't do. So I'm writing a bunch of articles showing off Python's Awesomeness.

Previous articles in this series:

(more…)