<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Electricmonk.nl weblog &#187; python</title>
	<atom:link href="http://www.electricmonk.nl/log/category/programming/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.electricmonk.nl/log</link>
	<description>Ferry Boender&#039;s ramblings</description>
	<lastBuildDate>Mon, 16 Jan 2012 15:23:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Python UnitTest: AssertRaises pitfall</title>
		<link>http://www.electricmonk.nl/log/2012/01/14/python-unittest-assertraises-pitfall/</link>
		<comments>http://www.electricmonk.nl/log/2012/01/14/python-unittest-assertraises-pitfall/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 20:25:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4821</guid>
		<description><![CDATA[I ran into a little pitfall with Python&#039;s UnitTest module. I was trying to unit test some failure cases where the code I called should raise an exception. Here&#039;s what I did: def test_file_error(self): self.assertRaises(IOError, file('/foo', 'r')) I mistakenly thought this would work, in that assertRaises would notice the IOError exception and mark the test [...]]]></description>
			<content:encoded><![CDATA[<p>I ran into a little pitfall with Python&#039;s <tt>unittest</tt> module. I was trying to unit test some failure cases where the code I called should raise an exception.</p>
<p>Here&#039;s what I did:</p>
<pre>def test_file_error(self):
    self.assertRaises(IOError, file('/foo', 'r'))</pre>
<p>I mistakenly thought this would work, in that <tt>assertRaises</tt> would notice the <tt>IOError</tt> exception and mark the test as passed. Naturally, it doesn&#039;t:</p>
<pre>ERROR: test_file_error (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 10, in test_file_error
    self.assertRaises(IOError, file('/foo', 'r'))
IOError: [Errno 2] No such file or directory: '/foo'</pre>
<p>The problem is that I&#039;m a dumbass and I didn&#039;t read the documentation carefully enough:</p>
<p><code><br />
<b>assertRaises</b>(<i>exception</i>, <i>callable</i>, *<i>args</i>, **<i>kwds</i>)<br />
  Test that an exception is raised when callable is called with any positional or keyword arguments that are also passed to assertRaises().<br />
</code></p>
<p>If you look carefully, you&#039;ll notice that I did not pass in a <i>callable</i>. Instead, I passed in the result of a callable! Here&#039;s the correct code:</p>
<pre>def test_file_error(self):
    self.assertRaises(IOError, file, '/foo', 'r')</pre>
<p>The difference is that this time I pass a callable (<tt>file</tt>) and the arguments (<tt>'/foo'</tt> and <tt>'r'</tt>) that the test case should pass to that callable. <tt>self.assertRaises</tt> will then call it for me with the specified arguments and catch the <tt>IOError</tt>. In the first scenario (the wrong code), the call is made before the unit test is actually watching out for it.</p>
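<p>For what it&#039;s worth, here is a minimal sketch of both working forms. It assumes Python 3, where <tt>file</tt> no longer exists (so I use <tt>open</tt>) and <tt>IOError</tt> is an alias of <tt>OSError</tt>. The path, the test names and the <tt>run_suite</tt> helper are made up for this illustration:</p>
<pre>import unittest

# Hypothetical path that is assumed not to exist on the machine running the test
MISSING = '/nonexistent-dir-for-this-example/foo'

class SomeTest(unittest.TestCase):
    def test_callable_form(self):
        # Pass the callable and its arguments separately;
        # assertRaises makes the call itself and catches the exception
        self.assertRaises(IOError, open, MISSING, 'r')

    def test_context_manager_form(self):
        # The call happens inside the with-block, so it is also being watched
        with self.assertRaises(IOError):
            open(MISSING, 'r')

def run_suite():
    # Load and run just this test case, returning the TestResult
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SomeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)</pre>
<p>The context-manager form (available since Python 2.7) reads a bit more naturally when the call needs several arguments, and it makes it harder to repeat the mistake above.</p>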
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2012/01/14/python-unittest-assertraises-pitfall/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Evolutionary Algorithm: Evolving &quot;Hello, World!&quot;</title>
		<link>http://www.electricmonk.nl/log/2011/09/28/evolutionary-algorithm-evolving-hello-world/</link>
		<comments>http://www.electricmonk.nl/log/2011/09/28/evolutionary-algorithm-evolving-hello-world/#comments</comments>
		<pubDate>Tue, 27 Sep 2011 22:47:04 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[math]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4746</guid>
		<description><![CDATA[Note: The latest version of this article is always available from the Writings page in HTML, PDF, ePub and AsciiDoc (source) format. My interest in Evolutionary Algorithms started when I read On the Origin of Circuits over at DamnInteresting.com. I always wanted to try something like that out for myself, but never really found the [...]]]></description>
			<content:encoded><![CDATA[<p><b>Note:</b> The latest version of this article is always available from <a href="http://www.electricmonk.nl/Writings/HomePage">the Writings page</a> in HTML, PDF, ePub and AsciiDoc (source) format.</p>
<p>My interest in Evolutionary Algorithms started when I read <a href="http://www.damninteresting.com/on-the-origin-of-circuits/">On the Origin of Circuits</a> over at DamnInteresting.com. I always wanted to try something like that out for myself, but never really found the time. Now I have, and I think I&#039;ve found some interesting results.</p>
<p><b>Disclaimer</b>: I know next to nothing about Evolutionary Algorithms. Everything you read in here is the product of my own imagination and tests. I may use the wrong algorithms, nomenclature, methodology and might just be getting very bad results. They are, however, interesting to me, and I do know something about evolution, so here it is anyway.</p>
<h2>How Evolution Works</h2>
<p>So, how does an Evolutionary Algorithm work? Why, the same as normal biological evolution, mostly! Very (very) simply said, organisms consist of DNA, which determines their characteristics. When organisms reproduce, there is a chance their offspring&#039;s DNA contains a mutation, which can lead to differences in characteristics. Sufficiently negative changes in offspring make that offspring less fit to survive, causing it, and the mutation, to die out eventually. Positive changes are passed on to future offspring. So through evolution a set of DNA naturally tends to grow towards its &#034;goal&#034;, which is ultimate fitness for its environment. Now this is not an entirely correct description, but for our purposes it is good enough.</p>
<h2>A simple evolutionary algorithm</h2>
<p>There is nothing stopping us from using the same technique to evolve things towards goals set by a programmer. As can be seen from the Antenna example in the DamnInteresting article, this can sometimes even produce better things than engineers can come up with. In this post, I&#039;m going to evolve the string &#034;Hello, World!&#034; from random garbage. The first example won&#039;t be very interesting, but it demonstrates the concept rather well.</p>
<p>First, let&#039;s define our starting point and end goal:</p>
<pre>source = "jiKnp4bqpmAbp"
target = "Hello, World!"</pre>
<p>Our evolutionary algorithm will start with &#034;jiKnp4bqpmAbp&#034;, which we can view as the DNA of our &#034;organism&#034;. It will then randomly mutate some of the DNA, and judge the new mutated string&#039;s fitness. But how do we determine fitness? This is probably the most difficult part of any evolutionary algorithm.</p>
<p>Lucky for us, there&#039;s an easy way to do this with strings. All we have to do is take the value of each character in the mutated string, and see how much it differs from the same character in the target string. This is called the distance between two characters. We then add all those differences, which leads us to a single value which is the fitness of that string. A fitness of 0 is perfect, and means that both strings are exactly the same. A fitness of 1 means one of the characters is off by one. For instance, the strings &#034;Hfllo&#034; and &#034;Hdllo&#034; both have a fitness of one. The higher the fitness number, the less fit it actually is!</p>
<p>Here&#039;s the fitness function. </p>
<pre>
def fitness(source, target):
   fitval = 0
   for i in range(0, len(source)):
      fitval += (ord(target[i]) - ord(source[i])) ** 2
   return(fitval)</pre>
<p>If you look closely, you&#039;ll notice that for each character, I square the difference. This is to convert any negative numbers to positive ones, and to put extra emphasis on larger differences. If we don&#039;t do this, the string &#034;Hanno&#034; would have a fitness of 0. You see, the difference between &#039;e&#039; and &#039;a&#039; is -4 and between &#039;l&#039; and &#039;n&#039; is +2 (which we have twice). Adding these up yields a fitness of 0, but it&#039;s not the string we want at all. If we square the differences, they become 16, 4 and 4, which yields a fitness of 24. Effectively, we square each difference so that they can&#039;t cancel each other out.</p>
<p><b>Edit:</b> In the mutation algorithm below, I only mutate one character by one value at a time. It has been pointed out that, unless I actually allow for larger mutations, squaring the distance is largely pointless, since new mutations will always only differ by one value. At the time I wrote this fitness function, I had no idea what the rest of the algorithm would look like. It seemed like a good idea.</p>
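<p>The cancelling effect is easy to check with a few lines of Python. This is only a sketch: the two helper names and the sample string &#034;Hanno&#034; are my own picks for the illustration, not part of the programs linked at the end of this post.</p>
<pre>def signed_sum(source, target):
    # Plain sum of the differences: positive and negative offsets cancel out
    return sum(ord(s) - ord(t) for s, t in zip(source, target))

def squared_sum(source, target):
    # Same idea as the fitness function: square each difference first
    return sum((ord(t) - ord(s)) ** 2 for s, t in zip(source, target))

print(signed_sum("Hanno", "Hello"))   # 0: looks perfect, but isn't
print(squared_sum("Hanno", "Hello"))  # 24</pre>
<p>Taking the absolute value of each difference would also stop the cancelling; squaring additionally punishes large differences more than small ones.</p>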
<p>Now we need to introduce mutations into our string. This is rather easy. We simply pick a random character in the string, and either increment or decrement it by one, or leave it alone:</p>
<pre>import random

def mutate(source):
   # Pick a random position and shift that character by -1, 0 or +1
   charpos = random.randint(0, len(source) - 1)
   parts = list(source)
   parts[charpos] = chr(ord(parts[charpos]) + random.randint(-1,1))
   return(''.join(parts))</pre>
<p>Time to tie the whole shebang together!</p>
<pre>fitval = fitness(source, target)
i = 0
while True:
   i += 1
   m = mutate(source)
   fitval_m = fitness(m, target)
   if fitval_m < fitval:
      fitval = fitval_m
      source = m
      print("%5i %5i %14s" % (i, fitval_m, m))
   if fitval == 0:
      break
</pre>
<p>This should be easy enough to understand. For each iteration of the <tt>while</tt> loop, we mutate the string and then calculate its fitness. If it is fitter than the original string (the parent), we make the child the new string. Otherwise, we throw it away. If the fitness is 0, we're done!</p>
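<p>For anyone who wants to run this on a current interpreter, here is a rough Python 3 port of the snippets so far, glued into one file. Wrapping the loop in an <tt>evolve</tt> function and returning the generation count are my own additions for convenience; the logic is meant to be the same.</p>
<pre>import random

def fitness(source, target):
    # Sum of squared character distances; 0 means the strings are equal
    return sum((ord(t) - ord(s)) ** 2 for s, t in zip(source, target))

def mutate(source):
    # Shift one randomly chosen character by -1, 0 or +1
    charpos = random.randint(0, len(source) - 1)
    parts = list(source)
    parts[charpos] = chr(ord(parts[charpos]) + random.randint(-1, 1))
    return ''.join(parts)

def evolve(source, target):
    # Keep a mutation only if it is strictly fitter than its parent
    fitval = fitness(source, target)
    generations = 0
    while fitval != 0:
        generations += 1
        candidate = mutate(source)
        candidate_fit = fitness(candidate, target)
        if candidate_fit < fitval:
            source, fitval = candidate, candidate_fit
    return source, generations

result, generations = evolve("jiKnp4bqpmAbp", "Hello, World!")
print(generations, result)</pre>
<p>The number of generations differs from run to run, since the mutations are random.</p>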
<p>Let's look at some output. I'm snipping out some intermediary output because it's not terribly interesting.</p>
<p>At generation 1, we have a fitness of 15491, and the string looks nothing like "Hello, World!". The same for generation 20, 40, 60, etc.</p>
<pre>    1 15491  jjKnp4bqpmAbp
   20 15400  jiKnp3bppoAbp
   40 15377  jiKlo2bpooAdp
   60 15130  iiKlo2aoooAdp</pre>
<p>Not much progress so far. At generation 500 it's still a load of nonsense:</p>
<pre>  500  9986  \eTlo,YaorNdf</pre>
<p>Generation 1200, we start to see something that looks like "Hello, World!":</p>
<pre> 1200  4186  Heglo,LWorhdP</pre>
<p>Generation 1500, we're getting very close!</p>
<pre> 1500  3370  Hello,GWorldL</pre>
<p>It still takes a good 1500 generations more before we're finally there:</p>
<pre>
 3078     2  Hello, Vorld"
 3079     2  Hfllo, World"
 3080     2  Hfllo, World"
 3081     0  Hello, World!
</pre>
<p>There it is!</p>
<h2>A better, more interesting, algorithm</h2>
<p>Okay, so that worked. But... it was kinda lame. Nothing interesting to see, really, was there? That's because our algorithm was a little <i>too</i> simplistic. Only one "organism" in the gene pool, only one character mutated at any time. We can do better than that, so let's modify the program to make it more interesting.</p>
<p>We're not going to touch our <tt>fitness</tt> function, since that works rather well. Instead, let's introduce a gene pool. Instead of having only <i>one</i> string, why not have a whole bunch of randomly generated strings and let them duke it out among themselves? That sounds a bit more real-life, doesn't it?</p>
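<pre>import random
import string

GENSIZE = 20
genepool = []
for i in range(0, GENSIZE):
   # Random printable characters; [:-5] drops tab, newline and friends
   dna = [random.choice(string.printable[:-5]) for j in range(0, len(target))]
   # calc_fitness: the fitness() function from above, renamed (assumed) so
   # it doesn't clash with the 'fitness' variable below
   fitness = calc_fitness(dna, target)
   candidate = {'dna': dna, 'fitness': fitness }
   genepool.append(candidate)</pre>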
<p>This little snippet generates a gene pool with 20 random strings and their fitnesses. In an official implementation, the gene pool would be called the <b>population</b>. (Thanks, reddit!)</p>
<p>Now, let's modify our <tt>mutate</tt> function. Instead of mutating one single character, we feed it two parents, picked at random from the genepool, and it will mix their DNA together a bit. This is called <b>crossover</b>. It will also randomly mutate one character in the resulting DNA. It then returns the newly fabricated child, including its fitness.</p>
<pre>def mutate(parent1, parent2):
   child_dna = parent1['dna'][:]

   # Mix both DNAs
   start = random.randint(0, len(parent2['dna']) - 1)
   stop = random.randint(0, len(parent2['dna']) - 1)
   if start > stop:
      stop, start = start, stop
   child_dna[start:stop] = parent2['dna'][start:stop]

   # Mutate one position
   charpos = random.randint(0, len(child_dna) - 1)
   child_dna[charpos] = chr(ord(child_dna[charpos]) + random.randint(-1,1))
   child_fitness = calc_fitness(child_dna, target)
   return({'dna': child_dna, 'fitness': child_fitness})
</pre>
<p>We also need a routine to pick two random parents from the genepool. Now, we could just pick them completely at random, but what you really want is for parents with a good fitness to have a better chance of offspring. This is called <b>elitism</b>. If we sort the genepool list by fitness, we can use a uniform product distribution to make sure that parents with better fitness get chosen more often.</p>
<p>Now you might ask, what the hell is a uniform product distribution? When you randomly pick a number between, say, one and ten, each number has the same chance of being picked. This is called a "uniform distribution". But when you pick two random numbers, and you multiply them, the result is much more likely to end up in the lower part of the possible range than in the upper part. Hence the name "uniform <i>product</i> distribution". Here's how that looks:</p>
<p><a href="http://www.electricmonk.nl/log/wp-content/uploads/2011/09/uniform_prod.png"><img src="http://www.electricmonk.nl/log/wp-content/uploads/2011/09/uniform_prod.png" alt="" title="uniform_prod" width="320" height="200" class="alignnone size-full wp-image-4750" /></a></p>
<p>So our random parent picker will do just that. We select two random real numbers between 0 and 1, multiply those two random numbers and then scale the result up to our poolsize by multiplying the result with the size of the pool. Because the pool is sorted with the fittest candidates first, the low positions that come up most often belong to the fittest parents. We return that parent from the pool.</p>
<pre>def random_parent(genepool):
   wRndNr = random.random() * random.random() * (GENSIZE - 1)
   wRndNr = int(wRndNr)
   return(genepool[wRndNr])</pre>
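<p>To get a feel for how lopsided this selection is, the following sketch counts how often each pool position gets picked over a large number of draws. It is written for Python 3; the function name <tt>random_parent_index</tt>, the fixed seed and the sample size are assumptions made only for this illustration.</p>
<pre>import random
from collections import Counter

GENSIZE = 20

def random_parent_index(rng):
    # Product of two uniform numbers in [0, 1), scaled to a pool position
    return int(rng.random() * rng.random() * (GENSIZE - 1))

rng = random.Random(42)  # fixed seed, only to make the run repeatable
counts = Counter(random_parent_index(rng) for _ in range(100000))
for index in range(GENSIZE):
    # Position in the sorted pool, and how many times it was drawn
    print("%2i %6i" % (index, counts[index]))</pre>
<p>The counts drop off quickly towards the end of the pool. One side effect worth knowing about: since the product is always below 1 and we scale by <tt>GENSIZE - 1</tt>, the very last position in the pool never gets picked as a parent.</p>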
<p>There! Now it's time for our main loop:</p>
<pre>while True:
   genepool.sort(key=lambda candidate: candidate['fitness'])

   if genepool[0]['fitness'] == 0:
      # Target reached
      break

   parent1 = random_parent(genepool)
   parent2 = random_parent(genepool)

   child = mutate(parent1, parent2)
   if child['fitness'] < genepool[-1]['fitness']:
      genepool[-1] = child</pre>
<p>For each iteration of the While True loop, we first sort the genepool by fitness so that the most fit parents are at the top. We check to see if the fittest happens to be the target string we're looking for. If so, we stop the loop. </p>
<p>Then we select two parents from the genepool using the uniform product distribution so that fitter parents are chosen more often. We create a bastard mutated child that will mix both parents' DNA together and introduce a little mutation. If the new child is more fit than the worst in the genepool, it will replace that degenerate one in the genepool. In the next iteration, the pool is sorted again on fitness so that the new child takes its rightful place.</p>
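<p>Put together, the whole thing could look roughly like the sketch below. This is a Python 3 port of the snippets above and not the exact program linked at the end of this post. The <tt>make_candidate</tt> helper, the <tt>evolve</tt> wrapper and the clamp at 0 in the mutation (so <tt>chr()</tt> never gets a negative number) are my own additions.</p>
<pre>import random
import string

GENSIZE = 20
TARGET = "Hello, World!"

def calc_fitness(dna, target):
    # Sum of squared character distances; 0 is a perfect match
    return sum((ord(t) - ord(s)) ** 2 for s, t in zip(dna, target))

def make_candidate(dna):
    return {'dna': dna, 'fitness': calc_fitness(dna, TARGET)}

def mutate(parent1, parent2):
    child_dna = parent1['dna'][:]
    # Crossover: overlay a random slice of the second parent
    start, stop = sorted(random.randint(0, len(child_dna) - 1) for _ in range(2))
    child_dna[start:stop] = parent2['dna'][start:stop]
    # Mutation: shift one character by -1, 0 or +1 (clamped at 0)
    charpos = random.randint(0, len(child_dna) - 1)
    child_dna[charpos] = chr(max(0, ord(child_dna[charpos]) + random.randint(-1, 1)))
    return make_candidate(child_dna)

def random_parent(genepool):
    # Uniform product distribution: favours the front of the sorted pool
    return genepool[int(random.random() * random.random() * (GENSIZE - 1))]

def evolve():
    genepool = [make_candidate([random.choice(string.printable[:-5]) for _ in TARGET])
                for _ in range(GENSIZE)]
    generations = 0
    while True:
        generations += 1
        genepool.sort(key=lambda candidate: candidate['fitness'])
        if genepool[0]['fitness'] == 0:
            # Target reached
            return ''.join(genepool[0]['dna']), generations
        child = mutate(random_parent(genepool), random_parent(genepool))
        # The child only gets in if it beats the worst candidate
        if child['fitness'] < genepool[-1]['fitness']:
            genepool[-1] = child

best, generations = evolve()
print(generations, best)</pre>
<p>Since everything here is random, the number of generations will be different on every run.</p>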
<h2>Results</h2>
<p>Now it's time to run this puppy and see what it does. Again, I snip out some of the less interesting stuff.</p>
<p>Here's the genepool in the beginning. The first number is the generation (the number of times the While-loop has run), the second number the fitness and the third column is the DNA for that entry in the genepool.</p>
<pre>     1   7617   'iSx{$,K`u~(B
<b>     1   9284   SQf`1N#UdrPlT</b>
     1  12837   sYIu&lt;E"Fq'^_.
     1  15531   DC8Dg1I$*mUs-
     1  16064   L~*}JBVdF7bu2
     1  16533   1,XU%)5$q[YuO
<b>     1  16588   ff],ceW&lt;0fud&#038;</b>
     1  17316   [V3@2'VgY\{KV
     1  17356   kWw#v/P&gt;#apG9
     1  17581   &lt;Lrh(1hN_Bd)3
     1  18777   TM]_]TbtxFY:q
     1  19656   $zS+EI?BS&gt;%z(
     1  19841   =S;B~((W8 D,6
     1  20398   P_A$D|NPJPio/
     1  21957   J&#038;f=O:g\8'{S2
     1  22543   5*T2c"pMZ80L'
     1  24954   A&amp;lZ#A_}MxI"P
     1  25186   &amp;9MrI|0&amp;x)q,N
     1  28110   OlXT/Q{y3{"LR
     1  29656   8WB99hx%0]}h[</pre>
<p>One big random jumbled mess. Note the ones I've emphasized. These are the parents that were selected for the new child in the next generation. Let's see how it looks after one generation:</p>
<pre>     2   7617   'iSx{$,K`u~(B
     <b>2   8742   SQf`1N#UdfumT</b>
     2   9284   SQf`1N#UdrPlT
     2  12837   sYIu&lt;E"Fq'^_.
     2  15531   DC8Dg1I$*mUs-
     2  16064   L~*}JBVdF7bu2
     2  16533   1,XU%)5$q[YuO
     2  16588   ff],ceW&lt;0fud&#038;
     2  17316   [V3@2'VgY\{KV
     2  17356   kWw#v/P&gt;#apG9
     2  17581   &lt;Lrh(1hN_Bd)3
     2  18777   TM]_]TbtxFY:q
     2  19656   $zS+EI?BS&gt;%z(
     2  19841   =S;B~((W8 D,6
     2  20398   P_A$D|NPJPio/
     2  21957   J&#038;f=O:g\8'{S2
     2  22543   5*T2c"pMZ80L'
     2  24954   A&#038;lZ#A_}MxI"P
     2  25186   &#038;9MrI|0&#038;x)q,N
     2  28110   OlXT/Q{y3{"LR</pre>
<p>Two random parents from the previous generation have their DNA mixed, and have generated an offspring (the bold one) which is better than both of them. It comes in second with a fitness of 8742, while its parents only had fitnesses of 9284 and 16588. Let's skip ahead a bit and look at the 6th generation:</p>
<pre>     6   7617   'iSx{$,K`u~(B
<b>     6   8742   SQf`1N#UdfumT
     6   9284   SQf`1N#UdrPlT
     6  10198   SQfD1N#UdfumT</b>
     6  12837   sYIu&lt;E"Fq'^_.
     6  15531   DC8Dg1I$*mUs-
     6  16064   L~*}JBVdF7bu2
<b>     6  16387   SQf`1N"MZ80LT</b>
     6  16533   1,XU%)5$q[YuO
     6  16588   ff],ceW&lt;0fud&#038;
     6  17316   [V3@2'VgY\{KV
<b>     6  17356   kWw#v/P&gt;#apG9
     6  17356   kWw#v/P&gt;#apG9</b>
     6  17581   &lt;Lrh(1hN_Bd)3
     6  18777   TM]_]TbtxFY:q
     6  19656   $zS+EI?BS&gt;%z(
     6  19841   =S;B~((W8 D,6
     6  20287   fe],1eW&lt;0fud&#038;
     6  20398   P_A$D|NPJPio/
     6  21957   J&#038;f=O:g\8'{S2</pre>
<p>As you can see, the "SQf" has reproduced again with success, and there are now four variants of it in the genepool. We also note the "kWw#", of which there are two identical copies. This can happen when the entire DNA of one parent is copied and no mutation occurs. In our <tt>mutate</tt> function, we use the first parent's DNA as a base and then randomly overlay some of the second parent's DNA. This can be anything from the entire second parent's DNA to nothing at all. But generally, the chance is higher that the first parent's DNA survives largely intact.</p>
<p>The next interesting generation is 13:</p>
<pre>    13   4204   RQf`{$,KdfumT
    13   7617   'iSx{$,K`u~(B
    13   7617   'iSx{$,K`u~(B
    13   8742   SQf`1N#UdfumT
    13   8742   SQf`1N#UdfumT
    13   9284   SQf`1N#UdrPlT
    13   9284   SQf`1N#UdrPlT
    13  10198   SQfD1N#UdfumT
    13  12837   sYIu&lt;E"Fq'^_.
    13  15531   DC8Dg1I$*mUs-
    13  15838   L~*xJBVdG7bu2
    13  15856   $zS+&lt;E"Fq(^_(
    13  15883   L~*xJCVdG7bu2
    13  16064   L~*}JBVdF7bu2
    13  16387   SQf`1N"MZ80LT
    13  16533   1,XU%)5$q[YuO
    13  16588   ff],ceW&lt;0fud&#038;
    13  17316   [V3@2'VgY\{KV
<b>    13  17356   kWw#v/P&gt;#apG9
    13  17356   kWw#v/P&gt;#apG9</b></pre>
<p>Wow! "SQf" has been really busy and now almost rules the genepool. "iSx" is second and third, but has lost its number one position to the "RQf" variant of "SQf". "RQf" was introduced in the 12th generation as a child of an "iSx" and "SQf" variant. We see that "kWw" has been knocked almost to the end of the list by more fit candidates. It is very obvious that this pool is no longer random. Patterns are starting to emerge all over it.</p>
<p>By the time we reach generation 40:</p>
<pre>    40   3306   RQSw{$-KcfumB
    40   4204   RQf`{<b>$,KdfumT</b>
    40   4229   RQf`|<b>$,KdfumT</b>
    40   4242   RQe`|$,KdfumT
    40   4795   RQSw{$-KdfumT
    40   4971   <b>RQSwz$</b>*K`uSnT
    40   4973   <b>RQSwz$</b>+K`uSmT
    40   4992   <b>RQSwz$</b>+K`uSnT
    40   5017   SQSxz$+K`uSmT
    40   5017   SQSxz$+K`uSmT
    40   5951   (QSxz$+KdfSmT
    40   5985   'QSxz$+K`uSmT
    40   6421   SQfx{$+K`u~(B
    40   6444   TQf`{$+K`u~(B
    40   6489   SQfx{$+KdfS(B
    40   6492   TQf`{$-K`u~(B
    40   7034   SQSxy$+KdfS(B
    40   7617   'iSx{$,K`u~(B
    40   7617   'iSx{$,K`u~(B
    40   7625   'iS`{$,Kdg~(B</pre>
<p>The genepool is now almost entirely dominated by the "RQf" variants. Forms of its original parents "SQf" and "iSx" can still be found here and there, although "iSx" is almost entirely gone from the pool. An interesting thing is that we can see combinations of letters (bold) that keep reappearing. These are almost like actual genes! Combinations of DNA that work well together and therefore stay in the genepool in that combination. It takes lots of generations to make variants of these genes that are more fit than previous versions.</p>
<p>The next milestone is found in the 67th generation:</p>
<pre>    67   3138   RQSw{$+KdfukA
    67   3161   RQSw{$+KcfukA
    67   3176   RQSw{$,KdfulA
    67   3176   RQSw{$+KcfulA
    67   3218   RQSw{$-LcfumA
    67   3222   RQSw{%,KefumB
    67   3237   RQSw{$-LcfvmA
    67   3241   RQSw{$-KcfumA
    67   3241   RQSw{$-KcfumA
    67   3266   RQSw{$-KceumA
    67   3266   RQSw{$-KceumA
    67   3267   RRSw{$-KcfumB
    67   3289   RQSw{%,KefumC
    67   3306   RQSw{$-KcfumB
    67   3306   RQSw{$-KcfumB
    67   3323   RQSw{#-KcfumB
    67   3324   RPSw{$-KdfumB
    67   3331   RQSw{$-KbfumB
    67   3348   RQSw{#-KbfumB
    67   3489   RQSw{$+KdfumA</pre>
<p>This marks the first generation where there are no other variations than the RQS one. But immediately, we see the next generation in which a new number one is found:</p>
<pre>    68   3119   QQSw{$+KdfukA
    68   3138   RQSw{$+KdfukA
    68   3161   RQSw{$+KcfukA
</pre>
<p>By the 96th generation, QQS has taken over the top:</p>
<pre>    96   3060   QQSw{%+KdhukA
    96   3065   QRSw{%+KdfukA
    96   3081   QQSw{%+KdgukA
    96   3081   QQSw{%+KdgukA
    96   3081   QQSw{%+KdgukA
    96   3096   QQSw{$+KdgukA
    96   3104   QQSw{%+KdfukA
    96   3119   QQSw{$+KdfukA
    96   3119   QQSw{$+KdfukA
    96   3119   QQSw{$+KdfukA
    96   3137   RRSw{$,KdfulA
    96   3137   RRSw{$,KdfulA
    96   3138   RQSw{$+KdfukA
    96   3138   RQSw{$+KdfukA
    96   3138   RQSw{$+KdfukA
    96   3138   RQSw{$+KdfukA
    96   3138   RQSw{$+KdfukA
    96   3142   QQSw{$,KdfukA
    96   3142   QQSw{$+KcfukA
    96   3144   QQSw|$+KdfukA</pre>
<p>This is where the race gets boring. Every now and then a new, better, mutation will arise and take over the genepool. Change is slow though, and no big surprises are left. The candidates slowly but surely mutate until they reach something resembling the "Hello, World!" we are looking for in generation 1600:</p>
<pre>  1600     19   Hdllo+ Worle%
  1600     20   Hdklo+ Worle%
  1600     20   Hdklo+ Worle%
  1600     20   Hdklo+ Worle%
  1600     20   Hdklo+ Worle%
  1600     20   Hdklo+ Workd%</pre>
<p>It takes roughly another 300 generations to get to the final target:</p>
<pre>  1904      0   Hello, World!
  1904      1   Hello, World"
  1904      1   Hello, World"
  1904      2   Hello, Wprld"
  1904      2   Helmo, World"
  1904      2   Helmo, World"
  1904      2   Hdllo, World"
  1904      2   Hello, Worle"
</pre>
<p>Here are the programs so you can download them and play with them a bit (ignore the SSL warning; it's a self-signed certificate):</p>
<ul>
<li><a href="https://svn.electricmonk.nl/svn/documents/trunk/src/evolutionary_algorithm/evo_simple.py">Naive Algorithm</a></li>
<li><a href="https://svn.electricmonk.nl/svn/documents/trunk/src/evolutionary_algorithm/evo_better.py">Traditional Algorithm</a></li>
</ul>
<p>Interesting (if you're boring like me and you like this kind of stuff) facts:</p>
<ul>
<li>It usually takes anywhere between 2500 and 4000 generations to evolve the target.</li>
<li>On average, it takes approximately 3100 generations to evolve the target.</li>
<li>If we remove the parent DNA mixing and rely solely on mutations, it takes on average 3650 generations to evolve the target.</li>
<li>The parent DNA mixing is only really useful in the beginning. In the first generations, it can quickly propel a new mix of DNA to the top of the list, but later on random mutation, rather than DNA mixing, becomes the main driving force behind the evolution. (This doesn't have to be the case in real-life evolution, naturally.)</li>
<li>Sometimes "beneficial" mutations disappear. For instance, the word "World" already appeared in mutation 1469, but was overtaken by other mutations quickly. It was pushed out of the genepool at generation 1486, only to reappear in generation 1659. From then on, it quickly rose to the top and dominated the top 5 positions of the genepool within 10 generations.</li>
</ul>
<p><b>Update:</b> It has rightly been pointed out that there are much more efficient variants of this algorithm. Please keep in mind that I had absolutely no idea what I was doing. :-D I'm surprised I got so close to how one would properly implement an Evolutionary Algorithm.</p>
<p>Also, here are some more interesting statistics. I modified the mutation function a number of times, and these are the results:</p>
<ul>
<li>One char, -1, 0 or +1 ascii-value: 3100 generations</li>
<li>Two chars, -1, 0 or +1 ascii-value: 1924 generations</li>
<li>Three chars, -1, 0 or +1 ascii-value: 1734 generations</li>
<li>Four chars, -1, 0 or +1 ascii-value: 1706 generations</li>
<li>One char, between -4 and +4 ascii-value: 1459 generations</li>
<li>Two chars, between -4 and +4 ascii-value: 2122 generations</li>
<li>Three chars, between -4 and +4 ascii-value: 4490 generations</li>
</ul>
<p>You can also read the <a href="http://www.reddit.com/r/programming/comments/ktg7o/evolutionary_algorithm_evolving_hello_world/">Reddit discussion</a> and the <a href="http://news.ycombinator.com/item?id=3047046">Hacker News</a> discussion for some nice insights. One of the most interesting comments mentions:</p>
<blockquote><p>FWIW, for this problem, at least the way the OP set it up, the "naive" algorithm is actually a very good way to go - when I increase the population size to 20, and set the mutation/selection/crossover policies OP used, I find that the average number of fitness checks required to hit "Hello, World" (about 3510) is actually higher than the number in the naive version (in the neighborhood of 3k, usually a bit under). Also, the real time taken is larger. Which means that adding "genetic" to the algorithm has actually hurt us...<br />
In fact, even with my full GA codebase in hand (not a substantial one, I wrote it in response to this post, but it's more flexible than the OP's), I couldn't find any situation where having a population size more than a few members helped - single member mutation (which is accepted/rejected if better/worse) always won. This is a good indication that this type of problem is vastly better suited to gradient descent than it is to a genetic algorithm.</p></blockquote>
<p>Cool stuff.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2011/09/28/evolutionary-algorithm-evolving-hello-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PyWebkitGTK: &#039;module&#039; object has no attribute &#039;WebView&#039;</title>
		<link>http://www.electricmonk.nl/log/2011/09/08/pywebkitgtk-module-object-has-no-attribute-webview/</link>
		<comments>http://www.electricmonk.nl/log/2011/09/08/pywebkitgtk-module-object-has-no-attribute-webview/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 09:38:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4723</guid>
		<description><![CDATA[If you&#039;re working with PyWebkitGTK, and you get the following error: Traceback (most recent call last): File "./webkit.py", line 7, in import webkit File "/home/todsah/webkit.py", line 18, in class BrowserPage(webkit.WebView): AttributeError: 'module' object has no attribute 'WebView' &#8230; make sure you haven&#039;t named your script &#039;webkit.py&#039;, and there is no other script with the same [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#039;re working with PyWebkitGTK, and you get the following error:</p>
<pre>Traceback (most recent call last):
  File "./webkit.py", line 7, in <module>
    import webkit
  File "/home/todsah/webkit.py", line 18, in <module>
    class BrowserPage(webkit.WebView):
AttributeError: 'module' object has no attribute 'WebView'</pre>
<p>&#8230; make sure you haven&#039;t named your script &#039;<tt>webkit.py</tt>&#039;, and that there is no other script with the same name in that directory. Otherwise <tt>import webkit</tt> picks up your own file instead of the real module. Also delete any stale <tt>webkit.pyc</tt> files. D&#039;oh!</p>
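<p>If you&#039;re not sure whether something is shadowing a module, you can ask Python where it would load the module from. A small sketch, assuming Python 3.4 or later; the <tt>module_origin</tt> helper is a name I made up, and <tt>json</tt> only stands in for whichever module you&#039;re chasing (<tt>webkit</tt> in my case):</p>
<pre>import importlib.util

def module_origin(name):
    # Return the file a top-level module would be loaded from,
    # or None if Python can't find it at all
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None

print(module_origin('json'))</pre>
<p>If the path that comes back points into your own project directory instead of the Python installation, you&#039;ve found the culprit.</p>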
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2011/09/08/pywebkitgtk-module-object-has-no-attribute-webview/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>gCountDown: Systray countdown timer for Linux</title>
		<link>http://www.electricmonk.nl/log/2011/08/26/gcountdown-systray-countdown-timer-for-linux/</link>
		<comments>http://www.electricmonk.nl/log/2011/08/26/gcountdown-systray-countdown-timer-for-linux/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 15:30:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[website]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4699</guid>
		<description><![CDATA[I needed an easy way to set timers on my desktop PC. All I really want is to set a countdown in hours, minutes and seconds, and have it alert me when that time has elapsed. I couldn&#039;t find anything simple with some exceptions that wouldn&#039;t compile (anymore) due to missing libs (which weren&#039;t available [...]]]></description>
			<content:encoded><![CDATA[<p>I needed an easy way to set timers on my desktop PC. All I really want is to set a countdown in hours, minutes and seconds, and have it alert me when that time has elapsed. I couldn&#039;t find anything simple, apart from a few tools that wouldn&#039;t compile (anymore) due to missing libs (which weren&#039;t available in Xubuntu). So I whipped up my own.</p>
<p>You can download it <a href="http://www.electricmonk.nl/Programmings/GCountDown">from its home page</a>, and here&#039;s a screenshot of the thing:</p>
<p><a href="http://www.electricmonk.nl/log/wp-content/uploads/2011/08/gcountdown.p.png"><img src="http://www.electricmonk.nl/log/wp-content/uploads/2011/08/gcountdown.p.png" alt="" title="gcountdown.p" width="345" height="227" class="alignnone size-full wp-image-4700" /></a></p>
<p>Additionally, I was quite amazed at how easy it is to write GUI applications using just GTK in combination with Glade. Writing this tool took me only about an hour, with no previous knowledge. All it really required was creating a <a href="http://www.pygtk.org/docs/pygtk/class-gtkstatusicon.html">GTK Status Icon</a> with an <tt>activate</tt> signal handler. The handler pops up an interface put together in <a href="http://glade.gnome.org/">Glade</a> by loading the <tt>gcountdown.glade</tt> file using <tt>gtk.glade.XML()</tt>. Connecting signals to the widgets is also super easy with <tt>signal_autoconnect()</tt>.</p>
<p>Take a look at the <a href="https://svn.electricmonk.nl/svn/gcountdown/trunk/src/gcountdown">source code</a>. It&#039;s only a measly 136 lines.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2011/08/26/gcountdown-systray-countdown-timer-for-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Redirect stdout and stderr to a logger in Python</title>
		<link>http://www.electricmonk.nl/log/2011/08/14/redirect-stdout-and-stderr-to-a-logger-in-python/</link>
		<comments>http://www.electricmonk.nl/log/2011/08/14/redirect-stdout-and-stderr-to-a-logger-in-python/#comments</comments>
		<pubDate>Sun, 14 Aug 2011 12:51:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4684</guid>
		<description><![CDATA[I&#039;m writing a daemon and needed a method of redirecting anything that gets sent to the standard out and standard error file descriptors (stdout and stderr) to a logging facility. I googled around a bit, but couldn&#039;t find a satisfactory solution, so I came up with this. import logging import sys &#160; class StreamToLogger&#40;object&#41;: &#34;&#34;&#34; [...]]]></description>
			<content:encoded><![CDATA[<p>I&#039;m writing a daemon and needed a method of redirecting anything that gets sent to the standard out and standard error file descriptors (stdout and stderr) to a logging facility. I googled around a bit, but couldn&#039;t find a satisfactory solution, so I came up with this.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">logging</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> StreamToLogger<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
   <span style="color: #483d8b;">&quot;&quot;&quot;
   Fake file-like stream object that redirects writes to a logger instance.
   &quot;&quot;&quot;</span>
   <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, logger, log_level=<span style="color: #dc143c;">logging</span>.<span style="color: black;">INFO</span><span style="color: black;">&#41;</span>:
      <span style="color: #008000;">self</span>.<span style="color: black;">logger</span> = logger
      <span style="color: #008000;">self</span>.<span style="color: black;">log_level</span> = log_level
      <span style="color: #008000;">self</span>.<span style="color: black;">linebuf</span> = <span style="color: #483d8b;">''</span>
&nbsp;
   <span style="color: #ff7700;font-weight:bold;">def</span> write<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, buf<span style="color: black;">&#41;</span>:
      <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> buf.<span style="color: black;">rstrip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">splitlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
         <span style="color: #008000;">self</span>.<span style="color: black;">logger</span>.<span style="color: black;">log</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">log_level</span>, line.<span style="color: black;">rstrip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #dc143c;">logging</span>.<span style="color: black;">basicConfig</span><span style="color: black;">&#40;</span>
   level=<span style="color: #dc143c;">logging</span>.<span style="color: black;">DEBUG</span>,
   format=<span style="color: #483d8b;">'%(asctime)s:%(levelname)s:%(name)s:%(message)s'</span>,
   filename=<span style="color: #483d8b;">&quot;out.log&quot;</span>,
   filemode=<span style="color: #483d8b;">'a'</span>
<span style="color: black;">&#41;</span>
&nbsp;
stdout_logger = <span style="color: #dc143c;">logging</span>.<span style="color: black;">getLogger</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'STDOUT'</span><span style="color: black;">&#41;</span>
sl = StreamToLogger<span style="color: black;">&#40;</span>stdout_logger, <span style="color: #dc143c;">logging</span>.<span style="color: black;">INFO</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">sys</span>.<span style="color: black;">stdout</span> = sl
&nbsp;
stderr_logger = <span style="color: #dc143c;">logging</span>.<span style="color: black;">getLogger</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'STDERR'</span><span style="color: black;">&#41;</span>
sl = StreamToLogger<span style="color: black;">&#40;</span>stderr_logger, <span style="color: #dc143c;">logging</span>.<span style="color: black;">ERROR</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span> = sl
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Test to standard out&quot;</span>
<span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">Exception</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'Test to standard error'</span><span style="color: black;">&#41;</span></pre></div></div>

<p>We define a custom file-like object called <tt>StreamToLogger</tt> which sends anything written to it to a logger instead. We then create two instances of that object and replace <tt>sys.stdout</tt> and <tt>sys.stderr</tt> with our fake file-like instances.</p>
<p>The output logfile looks like this:</p>
<pre>2011-08-14 14:46:20,573:INFO:STDOUT:Test to standard out
2011-08-14 14:46:20,573:ERROR:STDERR:Traceback (most recent call last):
2011-08-14 14:46:20,574:ERROR:STDERR:  File "redirect.py", line 33, in &lt;module&gt;
2011-08-14 14:46:20,574:ERROR:STDERR:raise Exception('Test to standard error')
2011-08-14 14:46:20,574:ERROR:STDERR:Exception
2011-08-14 14:46:20,574:ERROR:STDERR::
2011-08-14 14:46:20,574:ERROR:STDERR:Test to standard error</pre>
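<p>The class above stores a <tt>linebuf</tt> attribute but never uses it, and it has no <tt>flush()</tt> method, which some callers expect from a file-like object. Below is a minimal sketch of a variation that addresses both, assuming Python 3. The line buffering is an addition of this sketch and not part of the original code:</p>

```python
import logging


class StreamToLogger(object):
    """File-like object that forwards complete lines to a logger."""

    def __init__(self, logger, log_level=logging.INFO):
        self.logger = logger
        self.log_level = log_level
        self.linebuf = ''

    def write(self, buf):
        # Keep a trailing partial line until its newline arrives.
        self.linebuf += buf
        lines = self.linebuf.split('\n')
        self.linebuf = lines.pop()
        for line in lines:
            self.logger.log(self.log_level, line.rstrip())
        return len(buf)

    def flush(self):
        # Emit whatever is left in the buffer as a final line.
        if self.linebuf:
            self.logger.log(self.log_level, self.linebuf.rstrip())
            self.linebuf = ''
```

<p>It is used the same way as before, for example <tt>sys.stdout = StreamToLogger(logging.getLogger(&#039;STDOUT&#039;), logging.INFO)</tt>.</p>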
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2011/08/14/redirect-stdout-and-stderr-to-a-logger-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>(Finite-) State Machines in practice</title>
		<link>http://www.electricmonk.nl/log/2011/08/13/finite-state-machines-in-practice/</link>
		<comments>http://www.electricmonk.nl/log/2011/08/13/finite-state-machines-in-practice/#comments</comments>
		<pubDate>Sat, 13 Aug 2011 12:01:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4686</guid>
		<description><![CDATA[(The latest version of this article is always available from this location). 1. Introduction A (Finite-) State Machine is a method of determining output by reading input and switching the state of the machine (computer program). Depending on the type of State Machine (more on this later), the state of the machine is changed [...]]]></description>
			<content:encoded><![CDATA[<p>(The latest version of this article is always available <a href="http://www.electricmonk.nl/Writings/HomePage">from this location</a>).</p>
<h2 id="_introduction">1. Introduction</h2>
<div class="sectionbody">
<div class="paragraph">
<p>A (Finite-) State Machine is a method of determining output by reading input and switching the state of the machine (computer program). Depending on the type of State Machine (more on this later), the state of the machine is changed by looking at the current state, sometimes in combination with looking at the input. </p>
<p><span id="more-4686"></span></p>
<p>If you&#039;re a web-developer, you may (in a dark past) have seen something like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre><tt>if (logged_in()) {
    if ($action == "logout") {
        act_logout();
    } else {
        # Show a form with a logout button
        display_logout();
    }
} else {
    if ($username != NULL &amp;&amp; $password != NULL) {
        act_login();
    } else {
        display_login();
    }
}</tt></pre>
</div>
</div>
<div class="paragraph">
<p>This could be seen as a State Machine, albeit a crude one. It has two states: You are either logged in, or logged out. Depending on the current state and the input (<tt>$action</tt>), the machine switches between these two states, or remains in the same state and displays information.</p>
</div>
<div class="paragraph">
<p>The example above would not normally be called a State Machine, especially since the example works over HTTP, which is a stateless protocol. However, it does explain the theory of State Machines rather well. Later on, we&#039;ll look at some of that theory, and more formal State Machine implementations. Let&#039;s first look at what State Machines are useful for.</p>
</div>
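<div class="paragraph">
<p>To make the idea of states and input a bit more concrete, the login example can be reduced to a small transition table. This is only a sketch in Python; the state names and the <tt>next_state</tt> helper are made up for illustration and do not appear in the PHP snippet:</p>
</div>

```python
# Hypothetical states and actions modelled on the login example.
TRANSITIONS = {
    ('logged_out', 'login'): 'logged_in',
    ('logged_in', 'logout'): 'logged_out',
}


def next_state(state, action):
    """Return the new state, or stay in the current one if no rule matches."""
    return TRANSITIONS.get((state, action), state)
```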
</div>
<h2 id="_uses">2. Uses</h2>
<div class="sectionbody">
<div class="paragraph">
<p>State Machines are a rather abstract concept if you&#039;re not used to dealing with them. They have their roots in mathematics (where we also find Non-Finite State Machines, but we won&#039;t be going into those here).</p>
</div>
<div class="paragraph">
<p>So what are State Machines used for? In the programming world, generally speaking, State Machines are useful when you read input and what you have to do with that input depends on things previously encountered in the input.</p>
</div>
<div class="paragraph">
<p>In practice, State Machines are often used for:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>
Design purposes (modeling the different actions in a program)
</p>
</li>
<li>
<p>
Natural language (grammar) parsers
</p>
</li>
<li>
<p>
String parsing
</p>
</li>
<li>
<p>
Algorithms
</p>
</li>
<li>
<p>
And many other things
</p>
</li>
</ul>
</div>
</div>
<h2 id="_theory">3. Theory</h2>
<div class="sectionbody">
<h3 id="_basic_properties">3.1. Basic properties</h3>
<div style="clear:left"></div>
<div class="paragraph">
<p>State Machines exhibit some basic properties. Not every type of State Machine has all of them, but they always have at least one of them:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>
<strong>Input</strong>. State Machines have input. These are usually called symbols.
</p>
</li>
<li>
<p>
<strong>States</strong>. For example: lightswitch=on, heatsensor=off, etc.
</p>
</li>
<li>
<p>
<strong>Transitions</strong>. When the State Machine changes its state, this is called a transition. A transition usually requires a <em>condition</em>. The condition is determined by the input, the current state or a combination of both.
</p>
</li>
<li>
<p>
<strong>Actions</strong>. An Action is a rather generic description for anything that can happen in a State Machine. Actions may be performed when entering or exiting a state, when input is read, when a state transition occurs, etc.
</p>
</li>
</ul>
</div>
<h3 id="_types_of_state_machines">3.2. Types of State Machines</h3>
<div style="clear:left"></div>
<div class="paragraph">
<p>There are two types of Finite-State Machines which are typically used in computer programs:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>
Acceptors / Recognisers
</p>
</li>
<li>
<p>
Transducers
</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>The <strong>Acceptors/Recognisers</strong> read bits of input and in the end tell you only if the input was accepted or not. One example would be a State Machine that scans a string to see if it has the right syntax. Dutch ZIP codes for instance are formatted as &#034;1234 AB&#034;. The first part may only contain numbers, the second only letters. A State Machine could be written that keeps track of whether it&#039;s in the NUMBER state or in the LETTER state and if it encounters wrong input, reject it.</p>
</div>
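<div class="paragraph">
<p>A rough sketch of such an acceptor in Python could look like the following. It assumes the format is exactly four digits, one space and two letters; the function name and the character counting are choices made for this sketch:</p>
</div>

```python
def accepts_zip(s):
    """Return True if s looks like '1234 AB': four digits, a space, two letters."""
    state = 'NUMBER'
    count = 0  # characters accepted so far in the current state
    for char in s:
        if state == 'NUMBER':
            if char.isdigit() and count != 4:
                count += 1
            elif char == ' ' and count == 4:
                # The space is the condition for the NUMBER to LETTER transition.
                state = 'LETTER'
                count = 0
            else:
                return False  # wrong input: reject
        elif state == 'LETTER':
            if char.isalpha() and count != 2:
                count += 1
            else:
                return False
    # Accept only if the input ended after exactly two letters.
    return state == 'LETTER' and count == 2
```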
<div class="paragraph">
<p><strong>Transducers</strong> on the other hand continuously read pieces of input and for each piece of input produce either some output or nothing. What they produce can depend on the input and the current state of the machine. A good example of this is a string parser that allows the user to enclose a part of the string in quotes so that it is treated as a single item. On the Unix shell this is used to refer to file names with a space in them. The State Machine reads one character at a time. When a space is read, it assumes a new element is starting. If it encounters a quote, it enters the QUOTED state. Any characters read while in the QUOTED state are assumed to be part of the same element. When another quote is encountered, the State Machine transitions to the UNQUOTED state.</p>
</div>
</div>
<h2 id="_a_simple_transducer_example">4. A simple transducer example</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Let&#039;s look at an example of a simple state machine. I&#039;ve already mentioned the quote-example, so here it is in the flesh:</p>
</div>
<div class="listingblock">
<div class="content">
<pre><tt>#!/usr/bin/python

s = "ls -la 'My Documents' /home /etc"

STATE_UNQUOTED = 1
STATE_QUOTED = 2

CHAR_QUOTE = "'"
CHAR_SPACE = " "

words = []
cur_state = STATE_UNQUOTED
cur_word = ''

# Break s up in words. Words are delimited by
# spaces, unless we're between quotes.
for char in s:
    if cur_state == STATE_QUOTED:
        if char == CHAR_QUOTE:
            words.append(cur_word)
            cur_word = ''
            cur_state = STATE_UNQUOTED
        else:
            cur_word += char
    elif cur_state == STATE_UNQUOTED:
        if char == CHAR_QUOTE:
            cur_state = STATE_QUOTED
        elif char == CHAR_SPACE:
            if cur_word:
                words.append(cur_word)
            cur_word = ''
        else:
            cur_word += char
# Flush the last word, unless the input ended right after a closing quote.
if cur_word:
    words.append(cur_word)

print words</tt></pre>
</div>
</div>
<div class="paragraph">
<p>The output of the above State Machine is:</p>
</div>
<div class="literalblock">
<div class="content">
<pre><tt>['ls', '-la', 'My Documents', '/home', '/etc']</tt></pre>
</div>
</div>
<div class="paragraph">
<p>As you can see, &#034;My Documents&#034; has been successfully parsed as a single argument.</p>
</div>
<div class="paragraph">
<p>Let&#039;s examine the State Machine up close. There are two possible <strong>states</strong>: <tt>UNQUOTED</tt> and <tt>QUOTED</tt>. We start in the <tt>UNQUOTED</tt> state and then loop over each of the characters in the string. These characters are the <strong>input</strong> into our state machine, and thus are our <strong>symbols</strong>. If we&#039;re currently in the <tt>UNQUOTED</tt> state, and the character we read is a quote, we <strong>transition</strong> to the <tt>QUOTED</tt> state. Otherwise, if the character is a space, we add the current word to the list of words. This is an <strong>action</strong>. If the character is anything else, we append it to the current word (also an <strong>action</strong>).</p>
</div>
<div class="paragraph">
<p>If the State Machine is currently in the <tt>QUOTED</tt> state, we do basically the same thing as in the <tt>UNQUOTED</tt> state, except that the space-character is not treated as a word-delimiter. Instead, it is just appended to the current word. Encountering another quote while already in the <tt>QUOTED</tt> state means we&#039;ve reached the end of the quoted word, so we append it to the list of words and <strong>transition</strong> back to the <tt>UNQUOTED</tt> state.</p>
</div>
<h3 id="_state_diagrams">4.1. State Diagrams</h3>
<div style="clear:left"></div>
<div class="paragraph">
<p>A State Diagram is a visual method of describing a State Machine. There are many varieties of State Diagrams, each with different rules. The basic form though can be seen in figure 1, which describes the state machine from the previous example.</p>
</div>
<div class="imageblock">
<div class="content">
<a href="http://www.electricmonk.nl/log/wp-content/uploads/2011/08/ex_simple_diagram.png"><img src="http://www.electricmonk.nl/log/wp-content/uploads/2011/08/ex_simple_diagram.png" alt="" title="ex_simple_diagram" width="318" height="400" class="alignnone size-full wp-image-4688" /></a>
</div>
<div class="image-title">Figure 1: Quoted string parser State Diagram</div>
</div>
<div class="paragraph">
<p>In this example diagram the start state (<tt>UNQUOTED</tt>) is indicated by the arrow coming out of nothing going into the <tt>UNQUOTED</tt> state. The &#034;<tt>quote char</tt>&#034; arrows are transition conditions. Whenever a quote character is encountered, the State Machine changes state. It is debatable whether we should include the &#034;non-quote char&#034; arrows, since they do not indicate a state transition. However, since we want to model all the input we can receive, we will include them. This particular State Diagram does not model the action of adding words to the list when a white space occurs. This is because actions that are internal to a state are not modelled in classical State Diagrams. When modeling State Machines as implemented in computer programs, you may therefore want to make use of <a href="http://en.wikipedia.org/wiki/State_diagram_(UML)">UML State Diagrams</a>.</p>
</div>
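<div class="paragraph">
<p>The arrows of such a diagram can also be written down as a transition table. The sketch below does this for the quoted string parser in Python. The symbol classes <tt>quote</tt> and <tt>other</tt> and the helper names are invented for this illustration. Like the diagram, it models only states and transitions, not the actions:</p>
</div>

```python
# One entry per arrow in the diagram: (current state, symbol class) maps to next state.
TRANSITIONS = {
    ('UNQUOTED', 'quote'): 'QUOTED',
    ('UNQUOTED', 'other'): 'UNQUOTED',
    ('QUOTED', 'quote'): 'UNQUOTED',
    ('QUOTED', 'other'): 'QUOTED',
}


def classify(char):
    """Reduce a character to the symbol class used in the table."""
    return 'quote' if char == "'" else 'other'


def trace(s, state='UNQUOTED'):
    """Return the list of states visited while reading s, start state first."""
    states = [state]
    for char in s:
        state = TRANSITIONS[(state, classify(char))]
        states.append(state)
    return states
```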
<h3 id="_abstracting_the_example">4.2. Abstracting the example</h3>
<div style="clear:left"></div>
<div class="paragraph">
<p>Our simple example above is a completely hand-written custom implementation of a State Machine. The state transitions and actions are all handled within the <tt>for</tt> loop. This can quickly become messy when we&#039;re dealing with bigger state machines, so we want to create an Abstract State Machine handler. There are many ways to implement Abstract State Machines, and here is an example of one:</p>
</div>
<div class="listingblock">
<div class="content">
<pre><tt>class TransducerError(Exception):
    pass

class Transducer(object):
    def __init__(self, input, start_state):
        self.input = input
        self.output = []
        self.cur_state = start_state

    def run(self):
        for symbol in self.input:
            method = getattr(self, 'state_%s' % (self.cur_state), None)
            if not method:
                raise TransducerError('No method handler found for state \'%s\'' % (self.cur_state))
            method(symbol)
        return(self.output)

    def transition(self, new_state):
        handler = getattr(self, 'action_%s_exit' % (self.cur_state), None)
        if handler:
            handler()
        handler = getattr(self, 'action_transition', None)
        if handler:
            handler(self.cur_state, new_state)
        handler = getattr(self, 'action_%s_enter' % (new_state), None)
        if handler:
            handler()
        self.cur_state = new_state</tt></pre>
</div>
</div>
<div class="paragraph">
<p>This Abstract State Machine makes use of Python&#039;s powerful meta-programming capabilities to handle states and state transitions. The <tt>run()</tt> method reads tokens from the input. For each token it determines the current state the state machine is in and tries to find a method called <tt>state_CURRENT_STATE</tt> on the current object instance. We can then extend the <tt>Transducer</tt> class and define methods for each different state. When we transition into a different state, the Transducer class automatically tries to find entry, transition and exit methods. The exit methods (for example: <tt>action_quoted_exit</tt>) are called when we exit that particular state. The <tt>action_transition</tt> method, if found, will be called whenever we transition state, regardless of the state we came from and are going to. It is called with the previous and the new state as parameters. Finally, the entry action (for example: <tt>action_quoted_enter</tt>) is called when we transition to that particular state.</p>
</div>
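<div class="paragraph">
<p>To illustrate the order in which the exit, transition and entry actions fire, here is a small sketch. It repeats a condensed version of the <tt>Transducer</tt> class so that it runs on its own. The <tt>Toggle</tt> subclass and its states are hypothetical and exist only to record which hooks get called:</p>
</div>

```python
class Transducer(object):
    """Condensed copy of the class above, so this sketch is self-contained."""

    def __init__(self, input, start_state):
        self.input = input
        self.output = []
        self.cur_state = start_state

    def run(self):
        for symbol in self.input:
            getattr(self, 'state_%s' % self.cur_state)(symbol)
        return self.output

    def transition(self, new_state):
        # Same order as the original: exit, transition, enter.
        for name, args in (
            ('action_%s_exit' % self.cur_state, ()),
            ('action_transition', (self.cur_state, new_state)),
            ('action_%s_enter' % new_state, ()),
        ):
            handler = getattr(self, name, None)
            if handler:
                handler(*args)
        self.cur_state = new_state


class Toggle(Transducer):
    """Hypothetical two-state machine that logs every hook it triggers."""

    def state_off(self, symbol):
        if symbol == 'press':
            self.transition('on')

    def state_on(self, symbol):
        if symbol == 'press':
            self.transition('off')

    def action_off_exit(self):
        self.output.append('exit off')

    def action_transition(self, old, new):
        self.output.append('%s to %s' % (old, new))

    def action_on_enter(self):
        self.output.append('enter on')


log = Toggle(['press', 'wait', 'press'], 'off').run()
# log == ['exit off', 'off to on', 'enter on', 'on to off']
```

<div class="paragraph">
<p>Hooks that are not defined (here <tt>action_on_exit</tt> and <tt>action_off_enter</tt>) are simply skipped.</p>
</div>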
<div class="paragraph">
<p>Here&#039;s how we would implement our string parser example using the abstract Transducer State Machine class.</p>
</div>
<div class="listingblock">
<div class="content">
<pre><tt>s = "ls -la 'My Documents' /home /etc"

CHAR_QUOTE = "'"
CHAR_SPACE = " "

class Splitwords(Transducer):
    def __init__(self, s):
        Transducer.__init__(self, s, 'unquoted')
        self.output.append('')

    def state_unquoted(self, c):
        if c == CHAR_QUOTE:
            self.transition('quoted')
        elif c == CHAR_SPACE:
            self.append_word()
        else:
            self.append_char(c)

    def state_quoted(self, c):
        if c == CHAR_QUOTE:
            self.transition('unquoted')
        else:
            self.append_char(c)

    def append_word(self):
        if self.output[-1]:
            self.output.append('')

    def append_char(self, c):
        self.output[-1] += c

sw = Splitwords(s)
print sw.run()</tt></pre>
</div>
</div>
<div class="paragraph">
<p>The output:</p>
</div>
<div class="literalblock">
<div class="content">
<pre><tt>['ls', '-la', 'My Documents', '/home', '/etc']</tt></pre>
</div>
</div>
<div class="paragraph">
<p>You can get <a href="ex_abstracted.py">the full example</a> to try it out.</p>
</div>
<div class="paragraph">
<p>As you can see, the abstracted implementation of the State Machine is much clearer. The current state is automatically handled by calling the correct methods. Actions and state transitions are clearly visible in each state.</p>
</div>
<h3 id="_another_example_sql">4.3. Another example: SQL</h3>
<div style="clear:left"></div>
<div class="paragraph">
<p>Here&#039;s another example which shows how you can parse structured statements. In this case, we parse an SQL statement.</p>
</div>
<div class="listingblock">
<div class="content">
<pre><tt>s = "SELECT a, b FROM table WHERE a &gt; 5 ORDER BY b"

class SQL(Transducer):
    def __init__(self, s):
        Transducer.__init__(self, s, 'select')
        self.output = {
            'select': [],
            'from': [],
            'where': [],
            'order': [],
        }

    def state_select(self, token):
        if token == "FROM":
            self.transition('from')
        elif token == "SELECT":
            pass
        else:
            self.output['select'].append(token)

    def state_from(self, token):
        if token == "ORDER":
            self.transition('order')
        elif token == "WHERE":
            self.transition('where')
        else:
            self.output['from'].append(token)

    def state_where(self, token):
        if token == "ORDER":
            self.transition('order')
        else:
            self.output['where'].append(token)

    def state_order(self, token):
        if token == 'BY':
            pass
        else:
            self.output['order'].append(token)

sw = SQL(s.split())

print sw.run()</tt></pre>
</div>
</div>
<div class="paragraph">
<p>This example produces the following output:</p>
</div>
<div class="literalblock">
<div class="content">
<pre><tt>{'where': ['a', '&gt;', '5'], 'from': ['table'], 'order': ['b'], 'select': ['a,', 'b']}</tt></pre>
</div>
</div>
<div class="paragraph">
<p>Of course this example is far from complete. It lacks proper syntax checking, is case-sensitive and doesn&#039;t properly sanitize various values. It does, however, demonstrate how one could create a very clear parser for structured statements.</p>
</div>
</div>
<h2 id="_conclusion">5. Conclusion</h2>
<div class="sectionbody">
<div class="paragraph">
<p>As we&#039;ve discovered, State Machines are a powerful method of programming context-sensitive input-handling routines. While it is often possible to write such routines in different ways, State Machines provide a simple, elegant, easy to extend and clear method of implementing such routines. They have a wide variety of uses, from simple input validators to full-blown parsers.</p>
</div>
<div class="paragraph">
<p>We can mix and match both Acceptors and Transducers, we can chain multiple State Machines to each other and we can easily model their behaviour using diagrams.</p>
</div>
<div class="paragraph">
<p>Abstract implementations of State Machines are available for many, if not all, programming languages, ranging from simple implementations to completely extendable toolkits complete with graphical design software.</p>
</div>
<h3 id="_further_reading">5.1. Further Reading</h3>
<div style="clear:left"></div>
<div class="ulist">
<ul>
<li>
<p>
<a href="http://en.wikipedia.org/wiki/Finite-state_machine">Finite-state Machines</a> (Wikipedia)
</p>
</li>
<li>
<p>
<a href="http://www.objectmentor.com/resources/articles/umlfsm.pdf">UML Tutorial: Finite State Machines</a> (PDF)
</p>
</li>
<li>
<p>
<a href="http://www.ibm.com/developerworks/library/l-python-state/index.html">Charming Python &#8211; Using state machines</a> (IBM)
</p>
</li>
</ul>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2011/08/13/finite-state-machines-in-practice/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Closures, and when they&#039;re useful.</title>
		<link>http://www.electricmonk.nl/log/2011/05/20/closures-and-when-theyre-useful/</link>
		<comments>http://www.electricmonk.nl/log/2011/05/20/closures-and-when-theyre-useful/#comments</comments>
		<pubDate>Fri, 20 May 2011 13:05:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4631</guid>
		<description><![CDATA[When is a closure useful? Before we start with why a closure is useful, we might first need to understand what exactly a closure is. First-class functions In order to understand what a closure is, we must realize that in many, if not most, languages we can not just call functions, but we can also [...]]]></description>
			<content:encoded><![CDATA[<p>When is a closure useful?</p>
<p>Before we start with why a closure is useful, we might first need to understand what exactly a closure is.</p>
<h2>First-class functions</h2>
<p>In order to understand what a closure is, we must realize that in many, if not most, languages we can not just <i>call</i> functions, but we can also <i>pass references</i> to a function around in a variable. If a language supports that, it is said to have <b>first-class functions</b>. This can be used, amongst other things, to implement callbacks: you pass a reference to a function to a part of the program, which can then later call the function and obtain the results.</p>
<p>A common example of something that uses callback functions is a sorting routine that takes a comparison function. Such a function is called a <b>higher-order function</b>. For instance, Python&#039;s <tt>sorted</tt> function:</p>
<pre>
sorted(iterable, cmp=None, key=None, reverse=False) --&gt; new sorted list
</pre>
<p>The <tt>cmp</tt> parameter is a callback function. If we have a list of custom objects:</p>
<pre>
class MyPerson(object):
   def __init__(self, name, age):
      self.name = name
      self.age = age

people = [
   MyPerson('john', 24),
   MyPerson('santa', 100),
   MyPerson('pete', 30),
]
</pre>
<p>and we want to sort <tt>people</tt> by <tt>age</tt>, we can do so by defining our own custom comparison function and pass it to <tt>sorted</tt>:</p>
<pre>
def my_cmp(a, b):
   return(cmp(a.age, b.age))

sorted(people, my_cmp)
</pre>
<p>The <tt>sorted</tt> function will now loop through the items in <tt>people</tt> and call the callback function <tt>my_cmp</tt> for two items in the list at a time. If one is bigger/smaller than the other, it swaps them in order to sort <tt>people</tt>. Note that we are <i>not calling <tt>my_cmp</tt></i>! We&#039;re simply passing a reference to the function to <tt>sorted</tt>.</p>
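<p>The examples in this post are written for Python 2. In Python 3 the <tt>cmp</tt> parameter and the built-in <tt>cmp()</tt> function no longer exist, but a comparison callback can still be passed by wrapping it with <tt>functools.cmp_to_key</tt>. A minimal sketch, assuming Python 3 and numeric ages:</p>

```python
from functools import cmp_to_key


class MyPerson(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age


def my_cmp(a, b):
    # Negative, zero or positive, like the result of Python 2's cmp().
    return a.age - b.age


people = [
    MyPerson('john', 24),
    MyPerson('santa', 100),
    MyPerson('pete', 30),
]

# We still pass a reference to my_cmp; sorted() decides when to call it.
by_age = sorted(people, key=cmp_to_key(my_cmp))
names = [p.name for p in by_age]
```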
<h2>Nested functions</h2>
<p>Okay, so that covers first-class functions. Many languages also support nested functions. Example:</p>
<pre>
def get_cmp_func(key='age'):

   def my_cmp_name(a, b):
      return(cmp(a.name, b.name))

   def my_cmp_age(a, b):
      return(cmp(a.age, b.age))

   if key == 'name':
      return my_cmp_name
   elif key == 'age':
      return my_cmp_age
</pre>
<p>The <tt>get_cmp_func</tt> function returns a function that can be used to compare things depending on what you pass as the <tt>key</tt> parameter. <tt>get_cmp_func</tt> is also a <b>higher-order function</b> because it returns a reference to a function. Of course in this use-case there are better ways of sorting the list, but it&#039;s just an example.</p>
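<p>As an aside, one of those better ways is probably the <tt>key</tt> parameter of <tt>sorted</tt> combined with <tt>operator.attrgetter</tt>, which builds the lookup function for us. A short sketch, in which the <tt>Person</tt> namedtuple is only a stand-in for <tt>MyPerson</tt>:</p>

```python
from collections import namedtuple
from operator import attrgetter

# Stand-in for the MyPerson class; any object with these attributes works.
Person = namedtuple('Person', ['name', 'age'])

people = [Person('john', 24), Person('santa', 100), Person('pete', 30)]

# attrgetter('name') returns a function, so it is itself a higher-order function.
by_name = sorted(people, key=attrgetter('name'))
oldest_first = sorted(people, key=attrgetter('age'), reverse=True)
```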
<h2>Anonymous functions</h2>
<p>Anonymous functions are not a requirement for closures, but it may be a good idea to explain what they are nonetheless, as there&#039;s a lot of confusion over when exactly something is an anonymous function.</p>
<p>Anonymous functions, sometimes also called lambdas, are simply that: anonymous. They have no name. Looking at previous examples in this post, we see function names such as <tt>my_cmp</tt>, <tt>get_cmp_func</tt> and even nested functions with names: <tt>my_cmp_age</tt>. Anonymous functions have no name. That doesn&#039;t mean they can&#039;t be passed around as a reference though! Example:</p>
<pre>
sorted(people, lambda a, b: cmp(a.age, b.age))
</pre>
<p>The anonymous function here is: <tt>lambda a, b: cmp(a.age, b.age)</tt>. As you can see, it looks a lot like our first <tt>my_cmp</tt> function, except it has no name and doesn&#039;t seem to return anything. That&#039;s because an anonymous (lambda) function in Python always implicitly returns the value of its body, which must be a single expression. In fact, a lambda in Python cannot contain statements at all. (Other languages allow for more advanced anonymous functions; Python likes to keep it simple).</p>
<p>Okay, so why exactly would you need anonymous functions? Well, if your language already supports first-class functions (passing around references to a function), there really isn&#039;t a need for anonymous functions, except that it saves some typing. Lambda functions are <b>syntactic sugar</b> for first-class functions.</p>
<h2>Scope</h2>
<p>So... a closure, what is it? Again, before we can understand closures, we need to understand scope. Scope determines which variables and functions we can access at a certain location in our code. When a function is called, the programming language allocates a piece of memory where parameters to the function are stored and local variables can be stored by the function. This piece of memory (a frame on <b>the stack</b>) is automatically cleared when the function returns. This is called the <b>local scope</b>.</p>
<p>Functions usually can also reference variables of the parent scope. For example:</p>
<pre>
a = 10

def print_a():
   print a

print_a() # output: 10
</pre>
<p>The <tt>print_a</tt> function has access to the <tt>a</tt> variable in the parent scope. But if we define <tt>a</tt> in a function&#039;s <b>local scope</b>, we&#039;ll get an error:</p>
<pre>
def define_a():
   a = 10

def print_a():
   print a

print_a() # NameError: global name 'a' is not defined
</pre>
<p>We get a NameError when we try to print a&#039;s value, because it is defined in <tt>define_a</tt>&#039;s local scope, which will be destroyed as soon as <tt>define_a</tt> stops running. This is called <b>going out of scope</b>. Anything a piece of code can access (local scope, parent scope) is defined as <b>being within scope</b>. </p>
<h2>Closures</h2>
<p>Now, finally, closures!</p>
<p>A closure is a special way in which scopes are handled. Instead of a function going out of scope and all the variables/functions in its scope (the local scope, the parent scope, the grandparent scope, etc.) being destroyed, the scope is kept around for later usage. Let&#039;s look at an example:</p>
<pre>
def define_a():
   a = 10

   def print_a():
      print a

   return(print_a)

var_print_a = define_a()
var_print_a() # output: 10
</pre>
<p>This outputs <tt>10</tt>. Let&#039;s take a look at what&#039;s happening. We define a function <tt>define_a</tt> and set <tt>a = 10</tt> in its local scope. We then define a nested function that prints <tt>a</tt> from the parent scope. The <tt>define_a</tt> function then returns a reference to that function.</p>
<p>Next, we call <tt>define_a</tt> and assign the reference to <tt>print_a</tt> that it returns to the variable <tt>var_print_a</tt>. Then we call <tt>var_print_a</tt> as a function (this is called <b>dereferencing</b>). By all accounts it shouldn&#039;t work, because <tt>define_a</tt> has already stopped running. It has gone out of scope and its scope (containing <tt>a</tt>) should have been destroyed. But it&#039;s not, because Python kept its scope around. This is a closure. The variables that were in scope at the time the closure was generated are still accessible for the function, and are now known as <b>free variables</b>. </p>
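<p>If you want to see the free variables for yourself, Python exposes them on the function object. The following is a small sketch, assuming Python 3; the inner function returns <tt>a</tt> instead of printing it so the result is easy to inspect.</p>

```python
def define_a():
    a = 10

    def print_a():
        # `a` is a free variable here: used, but not defined, in print_a.
        return a

    return print_a

var_print_a = define_a()

# Names of the free variables the inner function refers to.
print(var_print_a.__code__.co_freevars)
# The closure cells hold the values that were kept alive for it.
print(var_print_a.__closure__[0].cell_contents)
# Calling it still works, even though define_a has returned.
print(var_print_a())
```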
<h2>The use-case</h2>
<p>So, when are closures useful? Why not just use an Object and store the value in the object, along with a method that uses the object? </p>
<p>Let&#039;s say we have a multithreaded program that handles requests. Data is stored in a database. The request handlers need to access the data in the database, but each thread has to have its own handler to the database, or they might accidentally overwrite each other&#039;s data. So our multithreaded program allows us to register a callback function which will be called when a new thread starts. The callback function should return a new database connection for use in the thread.</p>
<pre>
def make_db_connection():
   return(db.conn(host='localhost', username='john', passwd='f00b4r'))

app = MyMultiThreadedApp(on_new_thread_cb = make_db_connection)
app.serve()
</pre>
<p><tt>MyMultiThreadedApp</tt> will call <tt>make_db_connection</tt> for each new thread it starts, and the thread can then use the database connection returned by <tt>make_db_connection</tt>. But there is a problem! The database connection information (host, username, passwd) is hard-coded, but we want to get it from a configuration file instead!</p>
<p>So? We just pass some parameters to <tt>make_db_connection</tt>, right? Wrong!</p>
<pre>
def make_db_connection(host, username, passwd):
   return(db.conn(host=host, username=username, passwd=passwd))

app = MyMultiThreadedApp(on_new_thread_cb = make_db_connection)
app.serve()
</pre>
<p>This example <i>won&#039;t work</i>! Why not? Because <tt>MyMultiThreadedApp</tt> has <i>absolutely no idea</i> it should pass parameters to <tt>make_db_connection</tt>. Remember that we&#039;re not calling the function ourselves, we&#039;re just passing a reference to <tt>MyMultiThreadedApp</tt>, which will call it eventually. There&#039;s no way for it to know which parameters it should pass, because that depends on how your database needs to be set up. SQLite only needs a <tt>path</tt> parameter, but MySQL also needs username, password, and a host.</p>
<p>This is where closures step in:</p>
<pre>
def gen_db_connector(host, username, passwd):
   def make_db_connection():
      return(db.conn(host=host, username=username, passwd=passwd))
   return(make_db_connection)

callback_func = gen_db_connector('localhost', 'john', 'f00b4r')
app = MyMultiThreadedApp(on_new_thread_cb = callback_func)
app.serve()
</pre>
<p>The <tt>gen_db_connector</tt> function generates a closure (<tt>make_db_connection</tt>) which has access to host, username and passwd. We then get a reference to the closure, put it in <tt>callback_func</tt> and pass <i>that</i> to <tt>MyMultiThreadedApp</tt>. Now when a new thread is created, and the callback function is called, it will have access to the host, username and passwd information, without <tt>MyMultiThreadedApp</tt> needing to know which params it should pass on.</p>
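<p>Since <tt>db</tt> and <tt>MyMultiThreadedApp</tt> are not real libraries, here is a runnable sketch of the same idea with hypothetical stand-ins for both (<tt>connect</tt> and <tt>FakeThreadedApp</tt> are made up for this illustration). It assumes Python 3.</p>

```python
import threading

def connect(host, username, passwd):
    # Hypothetical stand-in for db.conn(): returns a description
    # instead of a real database connection.
    return "conn(%s@%s)" % (username, host)

class FakeThreadedApp:
    # Hypothetical stand-in for MyMultiThreadedApp. All it knows is that
    # it must call the callback, without arguments, in each new thread.
    def __init__(self, on_new_thread_cb):
        self.on_new_thread_cb = on_new_thread_cb
        self.connections = []

    def _worker(self):
        self.connections.append(self.on_new_thread_cb())

    def serve(self, num_threads=3):
        threads = [threading.Thread(target=self._worker) for _ in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

def gen_db_connector(host, username, passwd):
    def make_db_connection():
        # host, username and passwd are free variables kept by the closure.
        return connect(host=host, username=username, passwd=passwd)
    return make_db_connection

app = FakeThreadedApp(on_new_thread_cb=gen_db_connector("localhost", "john", "f00b4r"))
app.serve()
print(app.connections)
```

<p>The app never sees the connection parameters; it only ever calls a function that takes no arguments.</p>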
<h2>An alternative to closures</h2>
<p>There&#039;s a different way of accomplishing this though. By using objects:</p>
<pre>
class DBConnector():
   def __init__(self, host, username, passwd):
      self.host = host
      self.username = username
      self.passwd = passwd

   def connect(self):
      return(db.conn(
         host=self.host,
         username=self.username,
         passwd=self.passwd)
      )

db_conn = DBConnector('localhost', 'john', 'f00b4r')
app = MyMultiThreadedApp(on_new_thread_cb = db_conn.connect)
app.serve()
</pre>
<p>However, this is a lot more lines, and whether it works depends on whether your programming language allows first-class methods. That is, passing references around to methods on an object, while also allowing you to call them as an instance method (instead of just as a static method).</p>
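<p>Python does allow this: a reference such as <tt>db_conn.connect</tt> is a <i>bound method</i> that remembers its instance. A minimal sketch, assuming Python 3 and a cut-down, hypothetical <tt>DBConnector</tt> that returns a string rather than a real connection:</p>

```python
class DBConnector:
    def __init__(self, host):
        self.host = host

    def connect(self):
        # Hypothetical: returns a string rather than a real connection.
        return "connected to %s" % self.host

db_conn = DBConnector("localhost")
callback = db_conn.connect   # a bound method: the function plus its instance

# The bound method carries a reference to the object it came from...
print(callback.__self__ is db_conn)
# ...so it can be called later without passing the instance explicitly.
print(callback())
```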
<p>I&#039;d personally argue for the object way. Closures are a concept which is very hard to understand for less experienced programmers. It is a matter of debate whether closures hide state in an unpredictable way. I tend to think they do, and I&#039;m not much of a fan of free variables since it is hard to guess where they came from. At any rate, objects are easier to understand than closures, so if at all possible, go for the object way.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2011/05/20/closures-and-when-theyre-useful/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Simple function caching in Python</title>
		<link>http://www.electricmonk.nl/log/2009/09/15/simple-function-caching-in-python/</link>
		<comments>http://www.electricmonk.nl/log/2009/09/15/simple-function-caching-in-python/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 18:45:49 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4422</guid>
		<description><![CDATA[Python&#039;s dynamic nature continues to astound me. I was working on a small library when I noted it did a lot of redundant IO calls. Now I&#039;m not one for premature optimization, but a while ago I was thinking about writing a decorator or something that would wrap around functions and methods, and cache the [...]]]></description>
			<content:encoded><![CDATA[<p>Python&#039;s dynamic nature continues to astound me. I was working on a small library when I noted it did a lot of redundant IO calls. Now I&#039;m not one for premature optimization, but a while ago I was thinking about writing a decorator or something that would wrap around functions and methods, and cache the return values. Turns out it&#039;s way easier than I thought, and I whipped this up in a couple of minutes:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#!/usr/bin/python</span>
<span style="color: #808080; font-style: italic;">#</span>
<span style="color: #808080; font-style: italic;"># Public Domain.</span>
&nbsp;
__cache = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span> <span style="color: #808080; font-style: italic;"># Global cache</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> fncache<span style="color: black;">&#40;</span>fn<span style="color: black;">&#41;</span>:
   <span style="color: #483d8b;">&quot;&quot;&quot;
   Function caching decorator. Keeps a cache of the return 
   value of a function and serves from cache on consecutive
   calls to the function. 
&nbsp;
   Cache keys are computed from a hash of the function 
   name and the parameters (this differentiates between 
   instances through the 'self' param). Only works if 
   parameters have a unique repr() (almost everything).
&nbsp;
   Example:
&nbsp;
   &gt;&gt;&gt; @fncache
   ... def greenham(a, b=2, c=3):
   ...   print 'CACHE MISS'
   ...   return('I like turtles')
   ... 
   &gt;&gt;&gt; print greenham(1)           # Cache miss
   CACHE MISS
   I like turtles
   &gt;&gt;&gt; print greenham(1)           # Cache hit
   I like turtles
   &gt;&gt;&gt; print greenham(1, 2, 3)     # Cache miss (even though default params)
   CACHE MISS
   I like turtles
   &gt;&gt;&gt; print greenham(2, 2, ['a']) # Cache miss
   CACHE MISS
   I like turtles
   &gt;&gt;&gt; print greenham(2, 2, ['b']) # Cache miss
   CACHE MISS
   I like turtles
   &gt;&gt;&gt; print greenham(2, 2, ['a']) # Cache hit
   I like turtles
   &quot;&quot;&quot;</span>
   <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #dc143c;">new</span><span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kwargs<span style="color: black;">&#41;</span>:
      h = <span style="color: #008000;">hash</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">repr</span><span style="color: black;">&#40;</span>fn<span style="color: black;">&#41;</span> + <span style="color: #dc143c;">repr</span><span style="color: black;">&#40;</span>args<span style="color: black;">&#41;</span> + <span style="color: #dc143c;">repr</span><span style="color: black;">&#40;</span>kwargs<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
      <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> h <span style="color: #ff7700;font-weight:bold;">in</span> __cache:
         __cache<span style="color: black;">&#91;</span>h<span style="color: black;">&#93;</span> = fn<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>args, <span style="color: #66cc66;">**</span>kwargs<span style="color: black;">&#41;</span>
      <span style="color: #ff7700;font-weight:bold;">return</span><span style="color: black;">&#40;</span>__cache<span style="color: black;">&#91;</span>h<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
   <span style="color: #dc143c;">new</span>.__doc__ = <span style="color: #483d8b;">&quot;%s %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>fn.__doc__, <span style="color: #483d8b;">&quot;(cached)&quot;</span><span style="color: black;">&#41;</span>
   <span style="color: #ff7700;font-weight:bold;">return</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">new</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
   <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">doctest</span>
   <span style="color: #dc143c;">doctest</span>.<span style="color: black;">testmod</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Save to a file named &#039;fncache.py&#039; and import it into your program. Then decorate your functions and methods with it, and their output will be cached. Python rocks. Remember, only use for functions that do heavy calculations, file or network IO. Determining the uniqueness of the function call is rather expensive.</p>
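<p>For readers on Python 3, the same idea might look roughly like the sketch below. This is not the original code: it uses the <tt>repr()</tt> string itself as the cache key instead of its hash, and it records real calls in a list (<tt>calls</tt>, a name made up for this example) instead of printing.</p>

```python
import functools

_cache = {}  # Global cache, keyed on function repr and argument reprs

def fncache(fn):
    """Cache fn's return value per distinct repr() of its arguments."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Use the repr string itself as the key rather than its hash, so
        # two different calls can never collide on the same hash value.
        key = repr(fn) + repr(args) + repr(sorted(kwargs.items()))
        if key not in _cache:
            _cache[key] = fn(*args, **kwargs)
        return _cache[key]
    return wrapper

calls = []

@fncache
def greenham(a, b=2, c=3):
    calls.append((a, b, c))   # record every real (uncached) call
    return "I like turtles"

greenham(1)
greenham(1)          # served from the cache
greenham(1, 2, 3)    # a miss: same values, but a different argument tuple
print(len(calls))
```

<p>If all your arguments are hashable, the standard library&#039;s <tt>functools.lru_cache</tt> decorator (Python 3.2 and later) does the same job without the <tt>repr()</tt> trick.</p>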
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2009/09/15/simple-function-caching-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HIrcd &#8211; minimal IRC server in Python</title>
		<link>http://www.electricmonk.nl/log/2009/09/14/hircd-minimal-irc-server-in-python/</link>
		<comments>http://www.electricmonk.nl/log/2009/09/14/hircd-minimal-irc-server-in-python/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 08:39:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[libre software]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4419</guid>
		<description><![CDATA[I wrote a little IRC server in Python: HIrcd is a minimal, hacky implementation of an IRC server daemon written in Python in about 400 lines of code, including comments, etc. It is mostly useful as a testing tool or perhaps for building something like a private proxy on. Do NOT use it in any [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote a little IRC server in Python:</p>
<p><a href="http://www.electricmonk.nl/Programmings/HIrcd">HIrcd</a> is a minimal, hacky implementation of an IRC server daemon written in Python in about 400 lines of code, including comments, etc.</p>
<p>It is mostly useful as a testing tool or perhaps for building something like a private proxy on. Do NOT use it in any kind of production code or anything that will ever be connected to by the public. </p>
<p><a href="https://svn.electricmonk.nl/svn/hircd/trunk/hircd.py">Direct link to the source code</a>, for those interested.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2009/09/14/hircd-minimal-irc-server-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Encodings in Python</title>
		<link>http://www.electricmonk.nl/log/2009/05/22/encodings-in-python/</link>
		<comments>http://www.electricmonk.nl/log/2009/05/22/encodings-in-python/#comments</comments>
		<pubDate>Fri, 22 May 2009 09:48:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.electricmonk.nl/log/?p=4370</guid>
		<description><![CDATA[There&#039;s a whole slew of information regarding all kinds of encoding issues out there on the big bad Internet. Some deal with how unicode works, some with what UTF-8 is and how it relates to other encodings and some with how to transform from one encoding to another. All that theory is nice, but I&#039;ve [...]]]></description>
			<content:encoded><![CDATA[<p>There&#039;s a whole slew of information regarding all kinds of encoding issues out there on the big bad Internet. Some deal with how unicode works, some with what UTF-8 is and how it relates to other encodings and some with how to transform from one encoding to another. All that theory is nice, but I&#039;ve found a rather worrying lack of practical, understandable and contextual information on dealing with encodings in Python, leading me to think I&#039;d never be able to properly deal with encodings in Python.</p>
<p>So I took the plunge, and tried to find some stuff out. Here&#039;s what I came up with. All of this might be terribly wrong though. Encodings are a complicated subject if you ask me, so feel free to correct me if I&#039;m wrong.</p>
<p><strong>NOTICE</strong>: This article uses special HTML entities in various places to show output. Depending on your browser, the encodings it supports and the font you are using and its capabilities of showing UTF-8 characters, you may or may not be able to properly see these characters. In these cases a description of the character is given in parentheses right after the character.</p>
<p><span id="more-4370"></span></p>
<h2>Basics</h2>
<p>When we&#039;re talking about text, we&#039;re really talking about two things:</p>
<ul>
<li>Byte representation of text.</li>
<li>Encoded representation of text.</li>
</ul>
<p>Byte representation of text doesn&#039;t give a crap about what language or encoding something is. A byte is a byte: 8 bits, 256 different values. </p>
<p>The text&#039;s encoding doesn&#039;t exist. That is, a text is encoded in what we <em>say</em> it is encoded in. If I take a piece of UTF-8 text, and say &#034;this is encoded as latin-1&#034;, that&#039;s just fine. There is nothing inherently UTF-8y about the text. The encoding merely lets us know what little symbol we should show when we encounter a certain byte (or range of bytes, for that matter). But pretending that a piece of text is in an encoding it wasn&#039;t written in can of course cause problems when the text contains bytes (or byte sequences) which the pretended encoding doesn&#039;t define, such as latin-1 bytes above 127 read as ASCII or UTF-8. This is what causes encoding/decoding errors in Python.</p>
<p>A piece of text consisting of the byte with decimal value 163 does not exist in ASCII (as it is higher than 127), represents &#039;&pound;&#039; (Pound sterling) in latin-1, &#039;&#1027;&#039; (Cyrillic capital letter Gje) in Cyrillic (ISO-8859-5) and is not valid on its own in UTF-8. As you can see, it all depends on how you interpret it, and encodings are names for ways to interpret.</p>
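<p>You can check how one and the same byte is interpreted under different encodings yourself. A small sketch, assuming Python 3 (where <tt>bytes</tt> and text are separate types); the codec names are the ones Python&#039;s standard library uses:</p>

```python
raw = bytes([163])   # a single byte with decimal value 163 (hex A3)

for encoding in ("ascii", "latin-1", "iso8859_5", "utf-8"):
    try:
        # Same byte, different interpretation per encoding.
        print(encoding, repr(raw.decode(encoding)))
    except UnicodeDecodeError as exc:
        # The byte has no meaning (on its own) in this encoding.
        print(encoding, "cannot decode:", exc.reason)
```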
<p>There are an almost infinite number of encodings out there. If you wanted to, you could make your own. Some of the ones I&#039;ll be using as an example in this little adventure are: </p>
<ul>
<li>
<p><code>ASCII</code></p>
<p>A 7-bit encoding (the 8th bit is unused) that doesn&#039;t know jack about swishy characters with accents and stuff.</p>
</li>
<li>
<p><code>Latin-1</code>, <code>ISO 8859-1</code>, <code>Windows-1252</code></p>
<p>An 8-bit encoding that does know about accents and stuff. Backwards compatible with ASCII in that byte-values lower than 128 map to the same characters as ASCII does. Higher byte-values map to all kinds of stuff like characters with accents on them, pound signs, etc. (Windows-1252 is almost, but not quite, the same thing: it differs from Latin-1 for byte-values 128 through 159.)</p>
</li>
<li>
<p><code>UTF-8</code></p>
<p>A variable-length encoding. A single character can take 1 byte or more, up to 4 bytes. Also backwards compatible with ASCII in the same way that latin-1 is.</p>
</li>
</ul>
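<p>The variable length of UTF-8 and the ASCII compatibility mentioned in the list above are easy to see in practice. A minimal sketch, assuming Python 3; the sample characters are just picked to cover the 1, 2, 3 and 4 byte cases:</p>

```python
# One character from each UTF-8 length class: 'a', e-acute, euro sign, an emoji.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    encoded = ch.encode("utf-8")
    print("U+%04X" % ord(ch), len(encoded), "byte(s)")

# Plain ASCII text is byte-for-byte identical in ASCII, latin-1 and UTF-8.
print("Andre".encode("ascii") == "Andre".encode("latin-1") == "Andre".encode("utf-8"))
```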
<h2>When to deal with encodings</h2>
<p>In any given software system, we have to deal with a lot of different potential encodings. Here are some general ones we have to worry about:</p>
<ul>
<li>The default system encoding of the Operating System.</li>
<li>The encoding of any input into the program.</li>
<li>The default encoding of our programming language.</li>
<li>The encodings our software libraries can work with.</li>
<li>The supported encoding of the tools we use to work with data in our program (debuggers, etc).</li>
<li>The encoding of the destination when we output data to it.</li>
</ul>
<p>As we can see, there are a <em>lot</em> of things that can go wrong. All this sounds rather discouraging, doesn&#039;t it? The truth of the matter is that most of the time, encodings don&#039;t matter at all. In reality, we only need to take encoding into account when:</p>
<ul>
<li>We want to operate on the actual <em>meanings</em> of the text, instead of just transporting bytes around.</li>
<li>We output text to a system which needs to operate on the actual <em>meanings</em> of the text (or which cares about the encoding its input comes in).</li>
</ul>
<p>Suppose we&#039;re reading a file and counting the number of bytes in that file. Does the encoding matter? No, we care about bytes, not characters. If, however, we want to count the frequency of <em>characters</em>, we&#039;ll need to deal with encodings, since an &#039;e&#039; without an accent is not the same as an &#039;&eacute;&#039; (&#039;e&#039; <em>with</em> an accent). When we&#039;re outputting text, it depends on the system we&#039;re outputting to whether we need to deal with encodings. If we&#039;re inserting data into a database, and that database expects UTF-8, we&#039;ll need to make sure we also output UTF-8. If we&#039;re printing to the console, and the system default encoding is ASCII, we&#039;ll need to make sure we&#039;re outputting ASCII.</p>
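<p>The difference between counting bytes and counting characters can be sketched as follows, assuming Python 3. The sample text is made up and stands in for the contents of a UTF-8 file:</p>

```python
from collections import Counter

# Bytes as they might come from a UTF-8 encoded file (hypothetical content).
data = "caf\u00e9 \u00e9t\u00e9".encode("utf-8")

# Counting bytes: the encoding is irrelevant.
print(len(data))

# Counting characters: we have to decode first.
text = data.decode("utf-8")
print(len(text))

# Character frequency: e-acute is counted separately from a plain 'e'.
print(Counter(text).most_common(1))
```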
<p>As a software developer, you&#039;ll mostly have to deal with the encodings of input and output of your program. This presents us with two major problems:</p>
<ul>
<li>We need to know the encoding of our input.</li>
<li>We need to know the encoding we need to output.</li>
</ul>
<p>Sometimes, however, we simply don&#039;t <em>know</em> the encodings. When we read input from a text file that contains bytes with values higher than 127, we might simply not know what encoding it is. It could be in the system&#039;s default encoding, but it does not have to be. These can be tricky problems. For instance, my system default encoding is ASCII at the moment. Yet running the command <code>apt-cache dumpavail</code> produces output with bytes with a value higher than 127. Which encoding is it in? It&#039;s in ASCII encoding of course! Except that ASCII doesn&#039;t support bytes with values larger than 127, so it is actually output with invalid characters.</p>
<p>Usually these problems are solved by making industry standards. For instance, some things always have to be in a certain encoding. Other things may let us know up-front what encoding something is in. An XML or XHTML document, for instance, can mention its encoding on the first line of the file, and an HTML document can do so in a <code>&lt;meta&gt;</code> tag near the top (both of which should be in just ASCII characters so everything can read them).</p>
<h2>How to deal with encodings</h2>
<p><strong>WARNING</strong>: You may encounter what is sometimes called a &#039;heisenbug&#039;. A heisenbug is a bug which normally does not appear when your program is running, but only appears when you try to look at the data. In the case of encodings, suppose you try to print a unicode string containing non-ASCII characters, purely for debugging reasons. The console you&#039;re printing on may only be able to deal with ASCII, and as such Python tries to encode the string to ASCII and fails. This bug would not occur had you not tried to print the data to the console. </p>
<p>Onto the meat of the article! How do we deal with encodings?</p>
<p>First, some basics. Normal strings in Python don&#039;t care about encoding:</p>
<pre><code>&gt;&gt;&gt; s = 'Andr\xE9' # \xE9 == hex for 233 == latin-1 for e-acute.
&gt;&gt;&gt; type(s)
&lt;type 'str'&gt;
&gt;&gt;&gt; print s
Andr&#65533;
</code></pre>
<p>Note that I&#039;m outputting this to an ASCII terminal, yet Python does not complain about the unsupported character in &#039;s&#039;. It simply shows some garbled text &#039;Andr&#65533;&#039; (&#039;i&#039; with trema, reversed question mark and a 1/2 character).</p>
<p>Python also has support for unicode strings. These unicode strings can contain just about any character in the entire world. </p>
<pre><code>&gt;&gt;&gt; s = u'Andr\xE9'
&gt;&gt;&gt; type(s)
&lt;type 'unicode'&gt;
</code></pre>
<p>In this case, we define a unicode string &#039;Andr&eacute;&#039; (&#039;e&#039; acute) by specifying the &#039;&eacute;&#039; as hexadecimal value <code>E9</code>. Since unicode strings can contain just about everything, they are ideal as intermediate storage for strings. It is therefore always a good idea to <em>decode</em> any input you receive from its encoding to unicode, and to <em>encode</em> it to the proper encoding when you output it again.</p>
<p>Now let&#039;s see what happens when we print the unicode string we just created:</p>
<pre><code>&gt;&gt;&gt; print s
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)
</code></pre>
<p>As you can see in the previous examples, when we use normal python strings, we don&#039;t get an error. But when we use the Unicode string, we get an encoding error. Like I said: normal strings in Python don&#039;t care about encoding, but unicode strings do. So when we try to print it, Python will notice our terminal is in ASCII, and tries to convert the string from Unicode to ASCII. This, naturally, fails as ASCII doesn&#039;t know about character \xE9. We&#039;ll learn how to deal with that in a moment. First, let&#039;s look at how to handle input.</p>
<p>So what do we do when reading in an ASCII string when it contains invalid characters? We&#039;ve got three options:</p>
<ul>
<li>Ignore the entire encoding (if we can).</li>
<li>Ignore any invalid characters.</li>
<li>Replace any invalid characters with a placeholder character. </li>
</ul>
<p>Ignoring invalid characters simply removes them from the text as the text is decoded from the source encoding to the target encoding. Replacement will replace the unknown character with a placeholder character. When decoding to unicode, it will be replaced with the character U+FFFD (the Unicode replacement character). It will be rendered (if your font supports it) as a square or diamond with a question mark in it. </p>
<p>So, let&#039;s decode our input (containing invalid characters) from ascii to unicode:</p>
<pre><code>&gt;&gt;&gt; s = 'Andr\xE9'               # ASCII with invalid char \xE9.
&gt;&gt;&gt; s.decode('ascii', 'ignore')
u'Andr'
</code></pre>
<p>The &#039;&eacute;&#039; is dropped from the output string as it does not exist in ASCII. We can also replace characters:</p>
<pre><code>&gt;&gt;&gt; s.decode('ascii', 'replace')
u'Andr\ufffd'
</code></pre>
<p>This works great. The &#039;&eacute;&#039; is replaced with <code>\uFFFD</code>: the Unicode replacement character &#65533; (Black diamond with a question mark) representing an unsupported character. The <code>decode()</code> method on normal strings decodes <em>from</em> the encoding you specify to a unicode string. But look at what happens when we do the same thing, but print the result:</p>
<pre><code>&gt;&gt;&gt; print s.decode('ascii', 'replace')
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 4: ordinal not in range(128)
</code></pre>
<p>How is this possible? Did we do something wrong? No, everything is alright. This is the heisenbug. What happened here is that the &#039;replace&#039; option will replace any unknown characters in the ASCII string with the \uFFFD unicode character. When we try to print the resulting unicode string to the terminal, Python will try to convert the string to our default system encoding (ASCII) and print it. This will fail, because \uFFFD isn&#039;t a character in ASCII either. If we want to print it (for debugging purposes or something) we have to encode it back to ASCII before we can:</p>
<pre><code>&gt;&gt;&gt; print s.decode('ascii', 'replace').encode('ascii', 'replace')
Andr?
</code></pre>
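<p>Besides &#039;replace&#039; and &#039;ignore&#039;, the encode step accepts a few other standard error handlers that can be handy for debugging or for HTML output. A short sketch, assuming Python 3, that simply prints what each handler produces for the same text:</p>

```python
text = "Andr\u00e9"   # 'Andr' followed by e-acute

# Each handler decides what to do with characters ASCII cannot represent.
for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace"):
    print(handler, text.encode("ascii", handler))
```

<p>The <tt>xmlcharrefreplace</tt> handler in particular keeps the information intact as a numeric character reference, which is useful when the output ends up in HTML or XML.</p>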
<p>The Python manual mentions: &#034;When a Unicode string is printed, written to a file, or converted with str(), conversion takes place using this default [ASCII] encoding&#034;. Since Python converts without using any of the replacement options, UnicodeEncodeErrors can occur. We have to encode it ourselves if our data contains invalid characters. The <code>encode()</code> method of a unicode string encodes the string <em>from</em> unicode <em>to</em> the encoding you specify. If we were to output the string to an HTML file that is in the UTF-8 encoding, we would do this instead:</p>
<pre><code>s = 'Andr\xE9'

line = s.decode('ascii', 'replace')
f_out = file('foo.html', 'w')
f_out.write('''&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "DTD/xhtml1-transitional.dtd"&gt;
&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
&lt;body&gt;
    %s
&lt;/body&gt;
&lt;/html&gt;''' % (line.encode('utf-8')))

f_out.close()
</code></pre>
<p>This will output an HTML document with the string &#039;Andr&#65533;&#039; (where the last character is a diamond with a question mark in it, depending on how your browser displays the Unicode replacement character). We first decode the line from ASCII to Unicode and when we output it, we encode it from Unicode to UTF-8. Remember: UTF-8 isn&#039;t the same as Unicode! There&#039;s also UTF-16, UTF-32, etc.</p>
<p>If we know that the string &#039;s&#039; is actually in the latin-1 encoding, we can replace the line:</p>
<pre><code>line = s.decode('ascii', 'replace')
</code></pre>
<p>with:</p>
<pre><code>line = s.decode('latin-1', 'replace')
</code></pre>
<p>and the output in UTF-8 will become &#039;Andr&eacute;&#039;.</p>
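<p>The difference between the two decode lines shows up directly in the bytes that get written. A small sketch, assuming Python 3, where the input is an explicit <tt>bytes</tt> value:</p>

```python
raw = b"Andr\xe9"   # latin-1 bytes; \xe9 is e-acute in latin-1

as_ascii = raw.decode("ascii", "replace")   # \xe9 becomes U+FFFD
as_latin1 = raw.decode("latin-1")           # \xe9 becomes e-acute

# Encoding to UTF-8 preserves whatever character we ended up with.
print(as_ascii.encode("utf-8"))
print(as_latin1.encode("utf-8"))
```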
<p>Another problem that can occur is when you try to run the <tt>decode()</tt> method on strings which are already unicode:</p>
<pre><code>&gt;&gt;&gt; s = 'Andr\xE9'
&gt;&gt;&gt; u = s.decode('ascii', 'replace')
&gt;&gt;&gt; u
u'Andr\ufffd'
&gt;&gt;&gt; u.decode('ascii', 'replace')
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 4: ordinal not in range(128)
&gt;&gt;&gt; u.decode('utf-8', 'replace')
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 4: ordinal not in range(128)
</code></pre>
<p>As you can see, calling the <tt>decode()</tt> method on a Unicode string that contains non-ASCII characters fails, and with an <em>encode</em> error at that. That&#039;s because decoding only makes sense on bytes: Python 2 first implicitly encodes the Unicode string to a normal string using the default (ASCII) encoding, and only then decodes the result. It is that hidden encode step that blows up. The <tt>decode()</tt> method has been removed from Unicode strings in Python 3.</p>
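<p>For comparison, Python 3 makes this mistake impossible by splitting the two roles into separate types: text (<tt>str</tt>) can only be encoded, and <tt>bytes</tt> can only be decoded. A minimal sketch, assuming Python 3:</p>

```python
text = "Andr\ufffd"          # text (what Python 2 calls a unicode string)
raw = text.encode("utf-8")   # bytes

# Text has no decode() method, and bytes have no encode() method.
print(hasattr(text, "decode"))
print(hasattr(raw, "encode"))

# Decoding the bytes again gives back the original text.
print(raw.decode("utf-8") == text)
```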
<h2>Conclusion</h2>
<p>It turns out that dealing with encodings in Python is tricky since we have to deal with a large number of potential problem areas. Once you&#039;re familiar with the ins and outs though, dealing with encodings becomes rather easy. In general:</p>
<ul>
<li>If input is just &#039;passing through&#039; your program, just treat it as binary instead of text. Don&#039;t decode or encode at all. But be careful about the target output (including the terminal when debugging with print or something) as it may require a specific encoding.</li>
</ul>
<p>For input:</p>
<ul>
<li>First decode input from the encoding it is in to Unicode.</li>
<li>If you know the input encoding: <code>s = input.decode(input_encoding)</code>.</li>
<li>If you do not know the input encoding, take the safe route and decode from ASCII with the &#039;replace&#039; option: <code>s = input.decode('ascii', 'replace')</code></li>
<li>If the input might contain invalid characters for the encoding it is in, use the <code>replace</code> option. (always a good idea)</li>
</ul>
<p>For output:</p>
<ul>
<li>If you know the output encoding: <code>s.encode(output_encoding)</code>.</li>
<li>If you do not know the output encoding, and there is an agreement about the default encoding, use that. Otherwise, take the safe route and use ASCII.</li>
<li>If the target encoding might not support all the characters that are in the internal representation, use the &#039;replace&#039; option.</li>
</ul>
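<p>The input and output rules above can be wrapped in two small helpers. This is only a sketch, assuming Python 3; the function names <tt>read_text</tt> and <tt>write_bytes</tt> are made up for this example:</p>

```python
def read_text(raw, input_encoding=None):
    """Decode incoming bytes to text as early as possible."""
    if input_encoding is None:
        # Unknown encoding: take the safe route and decode from ASCII.
        return raw.decode("ascii", "replace")
    return raw.decode(input_encoding, "replace")

def write_bytes(text, output_encoding="ascii"):
    """Encode text to bytes as late as possible, just before output."""
    return text.encode(output_encoding, "replace")

incoming = b"Andr\xe9"

# Known input encoding (latin-1), known output encoding (UTF-8).
print(write_bytes(read_text(incoming, "latin-1"), "utf-8"))
# Nothing known: ASCII with replacement on both sides.
print(write_bytes(read_text(incoming)))
```

<p>Everything in between these two calls can then work with text and never has to think about bytes.</p>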
<p>I hope I got all this right, and that it makes dealing with encodings in Python a little clearer.</p>
<h2>Further reading</h2>
<p><a href="http://docs.python.org/tutorial/introduction.html#unicode-strings">An Informal Introduction to Python: Unicode-strings</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.electricmonk.nl/log/2009/05/22/encodings-in-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

