Archive for the ‘programming’ Category

PyWebkitGTK: 'module' object has no attribute 'WebView'

If you're working with PyWebkitGTK, and you get the following error:

Traceback (most recent call last):
  File "./webkit.py", line 7, in <module>
    import webkit
  File "/home/todsah/webkit.py", line 18, in <module>
    class BrowserPage(webkit.WebView):
AttributeError: 'module' object has no attribute 'WebView'

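… make sure you haven't named your script 'webkit.py', and that there is no other script with the same name in that directory. When you run a script, Python searches the script's own directory before the installed packages, so import webkit picks up your own script instead of the PyWebkitGTK module. Also delete any stale webkit.pyc files. D'oh!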
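To check whether this is what's biting you, print webkit.__file__ right after the import; it shows which file Python actually loaded. The helper below is a small hypothetical sketch (find_shadowing_files is not part of PyWebkitGTK) that lists files in a directory that would be imported instead of an installed module with the same name. It relies on the fact that a running script's directory is sys.path[0], which is searched first.

```python
import os

def find_shadowing_files(module_name, directory):
    """Return files in `directory` that `import module_name` would pick up
    before an installed package of the same name (e.g. a script called
    webkit.py, or a leftover webkit.pyc)."""
    candidates = [module_name + '.py', module_name + '.pyc']
    return [os.path.join(directory, name) for name in candidates
            if os.path.isfile(os.path.join(directory, name))]
```

For example, find_shadowing_files('webkit', os.path.dirname(os.path.abspath(__file__))) returns an empty list when nothing in the script's directory shadows the real module.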

gCountDown: Systray countdown timer for Linux

I needed an easy way to set timers on my desktop PC. All I really want is to set a countdown in hours, minutes and seconds, and have it alert me when that time has elapsed. I couldn't find anything simple; the few exceptions wouldn't compile (anymore) due to missing libraries that weren't available in Xubuntu. So I whipped up my own.
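The core of such a timer is just a little arithmetic. The sketch below is not gCountDown's code; countdown_seconds, seconds_left and format_remaining are hypothetical helpers that turn hours, minutes and seconds into a deadline and format the time remaining.

```python
import time

def countdown_seconds(hours, minutes, seconds):
    """Total length of a countdown entered as hours, minutes and seconds."""
    return hours * 3600 + minutes * 60 + seconds

def seconds_left(deadline, now=None):
    """Seconds until `deadline` (a time.time() timestamp), never negative."""
    if now is None:
        now = time.time()
    return max(0, deadline - now)

def format_remaining(total_seconds):
    """Format a number of seconds as H:MM:SS, e.g. for a tooltip."""
    hours, rest = divmod(int(total_seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return '%d:%02d:%02d' % (hours, minutes, secs)

# Start a 1h02m03s countdown; once seconds_left() reaches 0, it's time to alert.
deadline = time.time() + countdown_seconds(1, 2, 3)
print(format_remaining(seconds_left(deadline)))
```

A GUI version would call seconds_left() periodically (for instance from a timer callback) and pop up the alert when it returns 0.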

You can download it from its home page, and here's a screenshot of the thing:

Additionally, I was quite amazed at how easy it is to write GUI applications using just GTK in combination with Glade. Writing this tool took me only about an hour, with no previous knowledge. All it really required was creating a GTK status icon with a handler for its activate signal. The handler pops up an interface put together in Glade by loading the gcountdown.glade file using gtk.glade.XML(). Connecting signals to the widgets is also super easy with signal_autoconnect().
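To illustrate the idea behind signal_autoconnect(), here is a pure-Python sketch that does not use PyGTK at all. It reads the signal elements from a made-up, libglade-style XML snippet and looks each handler name up either in a dictionary or as a method on an object. GLADE_XML, autoconnect and CountdownHandlers are hypothetical; the real gtk.glade.XML object also hooks the handlers up to live widgets, which this sketch doesn't do.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal libglade-style interface description.
GLADE_XML = """
<glade-interface>
  <widget class="GtkWindow" id="window1">
    <signal name="destroy" handler="on_window_destroy"/>
    <child>
      <widget class="GtkButton" id="start_button">
        <signal name="clicked" handler="on_start_clicked"/>
      </widget>
    </child>
  </widget>
</glade-interface>
"""

def declared_signals(glade_xml):
    """Yield (widget_id, signal_name, handler_name) for every signal element."""
    root = ET.fromstring(glade_xml.strip())
    for widget in root.iter('widget'):
        for signal in widget.findall('signal'):
            yield widget.get('id'), signal.get('name'), signal.get('handler')

def autoconnect(glade_xml, handlers):
    """Map (widget_id, signal_name) to a callable, looking each handler name
    up in a dict or as an attribute of an object."""
    connections = {}
    for widget_id, signal_name, handler_name in declared_signals(glade_xml):
        if isinstance(handlers, dict):
            callback = handlers.get(handler_name)
        else:
            callback = getattr(handlers, handler_name, None)
        if callback is not None:  # handler names with no match are skipped
            connections[(widget_id, signal_name)] = callback
    return connections

class CountdownHandlers(object):
    def on_start_clicked(self, *args):
        return 'countdown started'

    def on_window_destroy(self, *args):
        return 'window closed'

connections = autoconnect(GLADE_XML, CountdownHandlers())
print(connections[('start_button', 'clicked')]())  # countdown started
```

The point of the name-based lookup is that the interface file and the code only have to agree on handler names; adding a button with a new handler means adding one method, with no manual connect() call.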

Take a look at the source code. It's only a measly 136 lines.

Redirect stdout and stderr to a logger in Python

I'm writing a daemon and needed a method of redirecting anything that gets sent to the standard out and standard error file descriptors (stdout and stderr) to a logging facility. I googled around a bit, but couldn't find a satisfactory solution, so I came up with this.

import logging
import sys
 
class StreamToLogger(object):
   """
   Fake file-like stream object that redirects writes to a logger instance.
   """
   def __init__(self, logger, log_level=logging.INFO):
      self.logger = logger
      self.log_level = log_level
 
   def write(self, buf):
      # Log each line of the written text as a separate record. Trailing
      # whitespace (including the final newline) is stripped first.
      for line in buf.rstrip().splitlines():
         self.logger.log(self.log_level, line.rstrip())
 
   def flush(self):
      # Nothing is buffered, but file-like objects are expected to have a
      # flush() method; other code (and, in Python 3, the interpreter at
      # shutdown) may call it on sys.stdout and sys.stderr.
      pass
 
logging.basicConfig(
   level=logging.DEBUG,
   format='%(asctime)s:%(levelname)s:%(name)s:%(message)s',
   filename="out.log",
   filemode='a'
)
 
stdout_logger = logging.getLogger('STDOUT')
sl = StreamToLogger(stdout_logger, logging.INFO)
sys.stdout = sl
 
stderr_logger = logging.getLogger('STDERR')
sl = StreamToLogger(stderr_logger, logging.ERROR)
sys.stderr = sl
 
print("Test to standard out")
raise Exception('Test to standard error')

We define a custom file-like object called StreamToLogger, which sends anything written to it to a logger instead. We then create two instances of that object and replace sys.stdout and sys.stderr with these fake file-like instances.

The output logfile looks like this:

2011-08-14 14:46:20,573:INFO:STDOUT:Test to standard out
2011-08-14 14:46:20,573:ERROR:STDERR:Traceback (most recent call last):
2011-08-14 14:46:20,574:ERROR:STDERR:  File "redirect.py", line 33, in <module>
2011-08-14 14:46:20,574:ERROR:STDERR:raise Exception('Test to standard error')
2011-08-14 14:46:20,574:ERROR:STDERR:Exception
2011-08-14 14:46:20,574:ERROR:STDERR::
2011-08-14 14:46:20,574:ERROR:STDERR:Test to standard error
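Notice that the last line of the traceback ends up split over three log records ('Exception', ':' and the message). The text evidently arrives in several separate write() calls, and StreamToLogger logs whatever it receives in each call. A possible refinement, sketched below (the class name LineBufferedStreamToLogger is made up), buffers partial lines and only logs complete ones, emitting any leftover text on flush().

```python
import logging

class LineBufferedStreamToLogger(object):
    """File-like object that logs only complete lines, one record per line."""

    def __init__(self, logger, log_level=logging.INFO):
        self.logger = logger
        self.log_level = log_level
        self.linebuf = ''  # text received since the last newline

    def write(self, buf):
        self.linebuf += buf
        # Log every complete line; keep a trailing partial line for later.
        while '\n' in self.linebuf:
            line, self.linebuf = self.linebuf.split('\n', 1)
            self.logger.log(self.log_level, line.rstrip())

    def flush(self):
        # Log whatever partial line is left over, e.g. at shutdown.
        if self.linebuf:
            self.logger.log(self.log_level, self.linebuf.rstrip())
            self.linebuf = ''
```

It is installed the same way as the original, e.g. sys.stderr = LineBufferedStreamToLogger(logging.getLogger('STDERR'), logging.ERROR).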

(Finite-) State Machines in practice

(The latest version of this article is always available from this location.)

1. Introduction

A (Finite-) State Machine is a method of determining output by reading input and switching the state of the machine (computer program). Depending on the type of State Machine (more on this later), the state of the machine is changed by looking at the current state, sometimes in combination with looking at the input.
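To make that concrete, here is a small table-driven state machine in Python. It's a hypothetical coin-operated turnstile, not an example taken from the article: each (current state, input) pair maps to a next state and an output, so the output depends on both the state and the input (a so-called Mealy machine).

```python
# Transition table: (current state, input) -> (next state, output).
TURNSTILE = {
    ('locked', 'coin'): ('unlocked', 'unlock'),
    ('locked', 'push'): ('locked', 'stay locked'),
    ('unlocked', 'coin'): ('unlocked', 'return coin'),
    ('unlocked', 'push'): ('locked', 'lock'),
}

def run_fsm(table, start_state, inputs):
    """Feed `inputs` to the machine one at a time; return the final state
    and the list of outputs produced along the way."""
    state = start_state
    outputs = []
    for symbol in inputs:
        # Both the next state and the output depend on the current state and the input.
        state, output = table[(state, symbol)]
        outputs.append(output)
    return state, outputs

final_state, outputs = run_fsm(TURNSTILE, 'locked', ['push', 'coin', 'push'])
print(final_state)  # locked
print(outputs)      # ['stay locked', 'unlock', 'lock']
```

Keeping the transitions in a table rather than in nested if statements makes it easy to see (and test) every state/input combination; an input with no entry raises KeyError.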

Filesystem Latency

There's an interesting ongoing series of articles on file system latency over at Brendan's Blog. Usually when we system administrators look into I/O performance, we look at the I/O of the disks. This is fine for a rough estimate of raw disk performance, but there's a lot more going on between the actual application and the disk: buffers, caches, the file system, etc. Brendan goes into detail on these matters by examining the I/O performance of a MySQL database at both the disk and the file system level.
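As a crude illustration of looking at latency from the application's side rather than the disk's, the sketch below times each os.read() call on a file. read_latencies and summarize are hypothetical helpers, not Brendan's method; the numbers they report include whatever the page cache and file system add or save, which is the layer that disk-level statistics don't show.

```python
import os
import time

def read_latencies(path, block_size=4096):
    """Return the duration in seconds of each os.read() call needed to read `path`."""
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            start = time.perf_counter()
            chunk = os.read(fd, block_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break  # end of file; the final empty read isn't counted
            latencies.append(elapsed)
    finally:
        os.close(fd)
    return latencies

def summarize(latencies):
    """Average and worst-case latency, in microseconds."""
    if not latencies:
        return 0.0, 0.0
    average = sum(latencies) / len(latencies)
    return average * 1e6, max(latencies) * 1e6
```

Running it twice on the same large file is instructive: the second run is typically served from the page cache, so the application sees much lower latency even though nothing changed on the disk.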

Closures, and when they're useful.

When is a closure useful?

Before we start with why a closure is useful, we might first need to understand what exactly a closure is.

First-class functions

In order to understand what a closure is, we must realize that in many, if not most, languages we can not only call functions, but also pass references to a function around in a variable. If a language supports that, it is said to have first-class functions. This can be used, amongst other things, to implement callbacks: you pass a reference to a function to some part of the program, which can later call the function and obtain the results.

A common example of something that uses callback functions is a sorting routine that takes a comparison function. A function that takes another function as an argument, like such a sorting routine, is called a higher-order function. For instance, Python's sorted function:

sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list

The cmp parameter is a callback function. If we have a list of custom objects:

class MyPerson(object):
   def __init__(self, name, age):
      self.name = name
      self.age = age

people = [
   MyPerson('john', 24),
   MyPerson('santa', 100),
   MyPerson('pete', 30),
]

and we want to sort people by age, we can do so by defining our own custom comparison function and passing it to sorted:

def my_cmp(a, b):
   return(cmp(a.age, b.age))

sorted(people, my_cmp)

The sorted function will now go through the items in people and call the callback function my_cmp on two items at a time. my_cmp returns a negative number, zero or a positive number depending on whether the first person is younger than, the same age as, or older than the second, and sorted uses that result to put people in order. Note that we are not calling my_cmp ourselves! We're simply passing a reference to the function to sorted.
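Python 3 dropped the cmp parameter of sorted() (and the built-in cmp() function). To run this example there, one option is to wrap the comparison function with functools.cmp_to_key; the more idiomatic option is to pass a key function instead. A sketch of both:

```python
from functools import cmp_to_key
from operator import attrgetter

class MyPerson(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

def my_cmp(a, b):
    # Python 3 has no cmp(); this returns -1, 0 or 1 just like cmp() did.
    return (a.age > b.age) - (a.age < b.age)

people = [
    MyPerson('john', 24),
    MyPerson('santa', 100),
    MyPerson('pete', 30),
]

# Still passing a reference to my_cmp, now wrapped so sorted() can use it.
by_age = sorted(people, key=cmp_to_key(my_cmp))
print([p.name for p in by_age])  # ['john', 'pete', 'santa']

# The idiomatic Python 3 way: a key function that extracts the sort value.
by_age_too = sorted(people, key=attrgetter('age'))
```

In both cases sorted() is still a higher-order function receiving a reference to another function; only the shape of that function differs.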

Nested functions

Okay, so that covers first-class functions. Many languages also support nested functions. Example:

def get_cmp_func(key='age'):

   def my_cmp_name(a, b):
      return(cmp(a.name, b.name))

   def my_cmp_age(a, b):
      return(cmp(a.age, b.age))
      
   if key == 'name':
      return my_cmp_name
   elif key == 'age':
      return my_cmp_age

get_cmp_func returns a function that can be used to compare things, depending on what you pass as the key parameter. get_cmp_func is also a higher-order function, because it returns a reference to a function. Of course, in this use case there are better ways of sorting the list, but it's just an example.
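Here is the same factory in runnable Python 3 form. Two small changes are choices of this sketch rather than part of the original: the comparisons avoid the removed cmp() built-in, and an unknown key raises ValueError instead of silently returning None. Person is a namedtuple stand-in for MyPerson.

```python
from collections import namedtuple
from functools import cmp_to_key

Person = namedtuple('Person', ['name', 'age'])  # stand-in for MyPerson

def get_cmp_func(key='age'):

    def my_cmp_name(a, b):
        return (a.name > b.name) - (a.name < b.name)

    def my_cmp_age(a, b):
        return (a.age > b.age) - (a.age < b.age)

    # Return a reference to one of the nested functions; neither is called here.
    if key == 'name':
        return my_cmp_name
    elif key == 'age':
        return my_cmp_age
    raise ValueError('unknown key: %r' % (key,))

people = [Person('john', 24), Person('santa', 100), Person('alice', 50)]
by_name = sorted(people, key=cmp_to_key(get_cmp_func('name')))
by_age = sorted(people, key=cmp_to_key(get_cmp_func('age')))
print([p.name for p in by_name])  # ['alice', 'john', 'santa']
print([p.name for p in by_age])   # ['john', 'alice', 'santa']
```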

Anonymous functions

Anonymous functions are not a requirement for closures, but it may be a good idea to explain what they are nonetheless, as there's a lot of confusion over when exactly something is an anonymous function.

Anonymous functions, sometimes also called lambdas, are simply that: anonymous. They have no name. Looking at previous examples in this post, we see function names such as my_cmp and get_cmp_func, and even nested functions with names, such as my_cmp_age. Anonymous functions have no name. That doesn't mean they can't be passed around as a reference, though! Example:

sorted(people, lambda a, b: cmp(a.age, b.age))

The anonymous function here is: lambda a, b: cmp(a.age, b.age). As you can see, it looks a lot like our first my_cmp function, except it has no name and doesn't seem to return anything. That's because the body of a Python lambda is a single expression, and the value of that expression is returned implicitly. In fact, a lambda in Python can't contain statements such as return or for at all. (Other languages allow for more advanced anonymous functions; Python likes to keep it simple.)

Okay, so why exactly would you need anonymous functions? Well, if your language already supports first-class functions (passing around references to a function), there really isn't a need for anonymous functions, except that they save some typing. In Python, lambdas are syntactic sugar: anything you can write as a lambda, you can also write as a named function with def and pass around by name.
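To see that a lambda really is just an unnamed function, compare it with an equivalent def in Python 3. The tuple-based people list is a stand-in for the MyPerson objects above. Both versions produce the same sort order; the only visible difference is the name Python records on each function object.

```python
people = [('john', 24), ('santa', 100), ('pete', 30)]

def age_of(person):
    # A named function: takes one (name, age) tuple and returns the age.
    return person[1]

# Passing the named function by reference...
named = sorted(people, key=age_of)
# ...or writing the same logic inline as an anonymous function.
anonymous = sorted(people, key=lambda person: person[1])

print(named == anonymous)                   # True
print(age_of.__name__)                      # age_of
print((lambda person: person[1]).__name__)  # <lambda>
```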

Scope

So... a closure, what is it? Again, before we can understand closures, we need to understand scope. Scope determines which variables and functions we can access at a given location in our code. When a function is called, the programming language sets aside a piece of memory where the function's parameters and local variables are stored. This piece of memory (a stack frame, in many languages) is automatically cleared when the function returns. The variables stored there make up the function's local scope.

Functions can usually also reference variables in the parent scope. For example:

a = 10

def print_a():
   print a

print_a() # output: 10

The print_a function has access to the a variable in the parent (here, global) scope. But if we define a in another function's local scope instead, we'll get an error:

def define_a():
   a = 10

def print_a():
   print a

define_a()
print_a() # NameError: global name 'a' is not defined

We get a NameError when we try to print a's value, because a is defined in define_a's local scope, which is destroyed as soon as define_a stops running. This is called going out of scope. (Even while define_a is running, print_a could not see a, because print_a is not defined inside define_a.) Anything a piece of code can access (local scope, parent scope) is said to be in scope.

Closures

Now, finally, closures!

A closure is a special way in which scopes are handled. Normally, when a function returns, its local variables go out of scope and are destroyed. With a closure, a nested function keeps the variables it uses from its enclosing scopes (parent, grandparent, etc.) around for later use, even after those enclosing functions have returned. Let's look at an example:

def define_a():
   a = 10

   def print_a():
      print a

   return(print_a)

var_print_a = define_a()
var_print_a() # output: 10

This outputs 10. Let's take a look at what's happening. We define a function define_a and set a = 10 in its local scope. We then define a nested function that prints a from the parent scope. The define_a function then returns a reference to that function.

Next, we call define_a, which returns a reference to print_a, and assign it to the variable var_print_a. Then we call var_print_a as a function. You might expect this not to work, because define_a has already stopped running: it has gone out of scope, and its local scope (containing a) should have been destroyed. But it hasn't been, because Python kept a around for print_a. This is a closure. The variables from the enclosing scope that the nested function uses remain accessible to it; from print_a's point of view, a is a free variable, used in the function but not defined there.
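You can actually see the kept-around variable, because Python 3 function objects expose it: __code__.co_freevars lists a function's free variables, and __closure__ holds one cell per free variable, whose cell_contents is the captured value.

```python
def define_a():
    a = 10

    def print_a():
        print(a)

    return print_a

var_print_a = define_a()
var_print_a()                                    # prints 10

# define_a has returned, yet `a` survives in a cell attached to print_a.
print(var_print_a.__code__.co_freevars)          # ('a',)
print(var_print_a.__closure__[0].cell_contents)  # 10

# A function without free variables has no closure cells.
print(define_a.__closure__)                      # None
```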

The use-case

So, when are closures useful? Why not just use an object and store the value in it, along with a method that uses that value?

Let's say we have a multithreaded program that handles requests. Data is stored in a database. The request handlers need to access the data in the database, but each thread has to have its own connection to the database, or the threads might interfere with each other. So our multithreaded program lets us register a callback function, which is called whenever a new thread starts. The callback function should return a new database connection for use in that thread.

def make_db_connection():
   return(db.conn(host='localhost', username='john', passwd='f00b4r'))

app = MyMultiThreadedApp(on_new_thread_cb = make_db_connection)
app.serve()

MyMultiThreadedApp will call make_db_connection for each new thread it starts, and the thread can then use the database connection returned by make_db_connection. But there is a problem! The database connection information (host, username, passwd) is hard-coded, but we want to get it from a configuration file instead!

So? We just pass some parameters to make_db_connection, right? Wrong!

def make_db_connection(host, username, passwd):
   return(db.conn(host=host, username=username, passwd=passwd))

app = MyMultiThreadedApp(on_new_thread_cb = make_db_connection)
app.serve()

This example won't work! Why not? Because MyMultiThreadedApp has absolutely no idea it should pass parameters to make_db_connection. Remember that we're not calling the function ourselves; we're just passing a reference to MyMultiThreadedApp, which will call it eventually, without arguments. There's no way for it to know which parameters it should pass, because that depends on how your database needs to be set up. SQLite only needs a path parameter, but MySQL also needs a username, password, and host.

This is where closures step in:

def gen_db_connector(host, username, passwd):
   def make_db_connection():
      return(db.conn(host=host, username=username, passwd=passwd))
   return(make_db_connection)

callback_func = gen_db_connector('localhost', 'john', 'f00b4r')
app = MyMultiThreadedApp(on_new_thread_cb = callback_func)
app.serve()

The gen_db_connector function generates a closure (make_db_connection) which has access to host, username and passwd. We then get a reference to the closure, put it in callback_func and pass that to MyMultiThreadedApp. Now when a new thread is created, and the callback function is called, it will have access to the host, username and passwd information, without MyMultiThreadedApp needing to know which params it should pass on.
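To try the pattern without a real framework, here is a runnable sketch. MyMultiThreadedApp is a hypothetical stand-in that starts a few threads and calls the callback, with no arguments, in each of them, and the standard library's sqlite3 plays the role of db, so the closure only needs to capture a path. With ':memory:' every call opens a separate, empty in-memory database, which is enough to show that each thread gets its own connection.

```python
import sqlite3
import threading

class MyMultiThreadedApp(object):
    """Hypothetical stand-in for a framework: starts a few worker threads and
    calls on_new_thread_cb() once per thread, with no arguments."""

    def __init__(self, on_new_thread_cb, num_threads=3):
        self.on_new_thread_cb = on_new_thread_cb
        self.num_threads = num_threads
        self.results = []
        self._lock = threading.Lock()

    def _worker(self):
        # The app knows nothing about paths, hosts or passwords.
        conn = self.on_new_thread_cb()
        try:
            value = conn.execute('SELECT 1').fetchone()[0]
        finally:
            conn.close()
        with self._lock:
            self.results.append(value)

    def serve(self):
        threads = [threading.Thread(target=self._worker)
                   for _ in range(self.num_threads)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

def gen_db_connector(path):
    def make_db_connection():
        # `path` is a free variable, kept alive after gen_db_connector returns.
        return sqlite3.connect(path)
    return make_db_connection

callback_func = gen_db_connector(':memory:')
app = MyMultiThreadedApp(on_new_thread_cb=callback_func)
app.serve()
print(app.results)  # [1, 1, 1]
```

Each connection is created and used inside the same thread, which also satisfies sqlite3's default rule that a connection may only be used by the thread that created it.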

An alternative to closures

There's a different way of accomplishing this though. By using objects:

class DBConnector():
   def __init__(self, host, username, passwd):
      self.host = host
      self.username = username
      self.passwd = passwd

   def connect(self):
      return(db.conn(
         host=self.host, 
         username=self.username,
         passwd=self.passwd)
      )

db_conn = DBConnector('localhost', 'john', 'f00b4r')
app = MyMultiThreadedApp(on_new_thread_cb = db_conn.connect)
app.serve()
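The object version can be exercised the same way. In this sketch, sqlite3 again stands in for db and plain threads replace the hypothetical MyMultiThreadedApp. db_conn.connect is a bound method: it remembers db_conn, so whoever calls it doesn't need to pass anything, just like with the closure.

```python
import sqlite3
import threading

class DBConnector(object):
    def __init__(self, path):
        # The connection settings live on the object instead of in a closure.
        self.path = path

    def connect(self):
        return sqlite3.connect(self.path)

db_conn = DBConnector(':memory:')
callback = db_conn.connect  # bound method: `self` is already attached
results = []
results_lock = threading.Lock()

def worker():
    conn = callback()  # called with no arguments, just like the closure
    try:
        value = conn.execute('SELECT 1').fetchone()[0]
    finally:
        conn.close()
    with results_lock:
        results.append(value)

threads = [threading.Thread(target=worker) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(results)  # [1, 1, 1]
```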

However, this takes a lot more lines, and whether it works depends on whether your programming language supports first-class methods: passing around references to methods on an object that remember the object they belong to, so that calling them works like an instance method call (instead of just a static method call).

I'd personally argue for the object way. Closures are a concept that is very hard to understand for less experienced programmers. It is a matter of debate whether closures hide state in an unpredictable way. I tend to think they do, and I'm not much of a fan of free variables, since it is hard to guess where they came from. At any rate, objects are easier to understand than closures, so if at all possible, go for the object way.

Regular expression Denial of Service (ReDoS)

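It's only logical, but I hadn't really thought about it much. Turns out regular expressions can be vulnerable to Denial of Service attacks through external input: with a backtracking regex engine, certain patterns (typically ones with nested quantifiers) can take exponential time on specially crafted strings.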
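A small Python sketch of the effect: the pattern ^(a+)+$ is a textbook example of catastrophic backtracking. On a string of a's followed by a character that can't match, Python's backtracking re engine tries an exponential number of ways to split the a's between the inner and outer +, before giving up. The equivalent pattern ^a+$ accepts exactly the same strings and fails quickly. The sizes are kept small so this finishes fast; the time for the evil pattern grows roughly exponentially with n.

```python
import re
import time

EVIL = re.compile(r'^(a+)+$')  # nested quantifiers: exponential backtracking
SAFE = re.compile(r'^a+$')     # matches exactly the same strings

def time_match(pattern, n):
    """Return (match result, seconds taken) for n a's followed by '!'."""
    subject = 'a' * n + '!'  # the '!' guarantees the match fails
    start = time.perf_counter()
    result = pattern.match(subject)
    return result, time.perf_counter() - start

for n in (12, 14, 16, 18):
    _, evil_time = time_match(EVIL, n)
    _, safe_time = time_match(SAFE, n)
    print('n=%d  evil: %.4fs  safe: %.6fs' % (n, evil_time, safe_time))
```

If user input can reach a pattern like EVIL, an attacker only needs a few dozen characters to tie up a CPU for a very long time.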

Lessons on development of 64-bit C/C++ applications

Lessons on development of 64-bit C/C++ applications:

The course is devoted to creation of 64-bit applications in C/C++ language and is intended for the Windows developers who use Visual Studio 2005/2008/2010 environment. Developers working with other 64-bit operating systems will learn much interesting as well. The course will consider all the steps of creating a new safe 64-bit application or migrating the existing 32-bit code to a 64-bit system.

Comment your MySQL schema

Many people may not know, but you can comment your MySQL schema:

SQL: good comments conventions

Maatkit: Tools for MySQL

Maatkit is a suite of command-line tools for MySQL. It contains some rather nifty things for query analyses, replication, and other stuff. Some of the more interesting highlights:

Found via databasejournal.com, which has two articles on Maatkit:

The Wonders of Maatkit for MySQL and
Even more Maatkit for MySQL.