Tuesday, September 13th, 2005
I am, at times, surprised by people’s inability or unwillingness to find more efficient ways of performing tedious tasks.
Sys admins: Learn to use the Unix toolset (sed, grep, etc.).
Windows admins: Install Cygwin so you can use the Unix toolset on Windows.
Windows software engineers: Get yourself a decent editor.
Documentation writers: Get DocBook, learn LaTeX, or at least learn how to use MS Word efficiently. One does not vertically align text using the spacebar! (Nor does one make headings using font sizes and bold.)
Tuesday, September 13th, 2005
Those who are familiar with Unix command-line tools like grep, sed and cut will know about the enormous power they provide. They make it a breeze to mangle, transform and retrieve information in and from text files. Unfortunately, they’re mostly dependent on row- and column-based information. That is, they expect each line in a file to contain one row, with the columns separated by a certain character (usually a space or a tab). Take, for instance, some lines from a simple Apache logfile:
220.75.180.165 - - [26/Jun/2005:09:57:55 +0200] "GET /st..
192.168.1.7 - - [26/Jun/2005:10:03:56 +0200] "GET / HTTP..
68.44.13.141 - - [26/Jun/2005:10:14:27 +0200] "GET /imag..
192.168.1.7 - - [26/Jun/2005:10:21:36 +0200] "GET / HTTP..
82.160.36.154 - - [26/Jun/2005:10:23:53 +0200] "GET /ima..
If I wanted to list every unique IP in that logfile, I’d simply issue the following command at the shell:
[todsah@jib]~$ cut -d" " -f1 access.log | sort | uniq
192.168.1.7
220.75.180.165
68.44.13.141
82.160.36.154
‘Cut’ strips away every column except for the first. ‘Sort’ sorts the list of IPs so that all duplicates appear under each other. ‘Uniq’ then removes all the duplicate IPs, and I’m left with a list of all unique IPs in the log. Writing this small ‘script’ took about 15 seconds. That’s a pretty powerful way to do quick statistical analysis.
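For comparison, here is a rough sketch of the same three steps in Python. The function name `unique_first_fields` and the `LOG_LINES` list are made up for this illustration; the lines are simply the excerpt from above. It assumes a plain string sort is good enough, which is what `sort` does in the C locale.

```python
# Sample lines copied from the log excerpt above (truncated the same way).
LOG_LINES = [
    '220.75.180.165 - - [26/Jun/2005:09:57:55 +0200] "GET /st..',
    '192.168.1.7 - - [26/Jun/2005:10:03:56 +0200] "GET / HTTP..',
    '68.44.13.141 - - [26/Jun/2005:10:14:27 +0200] "GET /imag..',
    '192.168.1.7 - - [26/Jun/2005:10:21:36 +0200] "GET / HTTP..',
    '82.160.36.154 - - [26/Jun/2005:10:23:53 +0200] "GET /ima..',
]


def unique_first_fields(lines, delimiter=" "):
    """Rough equivalent of: cut -d" " -f1 | sort | uniq"""
    # 'cut': keep only the first delimiter-separated column of each line.
    first_fields = (line.split(delimiter, 1)[0] for line in lines if line.strip())
    # 'sort | uniq': drop duplicates, then sort the remaining values as strings.
    return sorted(set(first_fields))


if __name__ == "__main__":
    for ip in unique_first_fields(LOG_LINES):
        print(ip)
```

It is a lot more typing than the one-liner, which is rather the point of the shell tools.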
Unfortunately, XML took that power completely away. It doesn’t work on a row/column basis, its syntax is loose (for example, you can spread a single element with attributes over multiple lines) and you can nest elements inside other elements.
There is hope, however. XMLStarlet is a powerful XML command-line toolset which can do XPath selects, transformations and more.
Take the following example XML file:
<?xml version='1.0' encoding='UTF-8'?>
<dataq port="50000" daemon="false" verbose="true">
<access>
<host>127.0.0.1</host>
</access>
<access>
<username>john</username>
<password>johnspw</password>
</access>
<access>
<host>192.168.1.5</host>
<username>pete</username>
<password>petespw</password>
</access>
<queue name='backup' />
<queue name='mp3' type='fifo' size='1' overflow='pop' />
<queue name='restricted' type='fifo' size='5' overflow='deny'>
<access sense="deny">
<username>john</username>
</access>
</queue>
</dataq>
Suppose I wanted to get all the usernames in this XML file. Using the traditional Unix command-line utilities, I’d have to do this:
[todsah@jib]~$ grep "<username>" dataq.xml | cut -d'>' -f2 | cut -d'<' -f1
john
pete
john
As you can see, this works. But what if we put the last queue element completely on one line, like this?
<queue name='restricted' type='fifo' size='5' overflow='deny'><access sense="deny"><username>john</username></access></queue>
It’s the exact same, valid XML and should yield the same results, but we get this instead:
[todsah@jib]~$ grep "<username>" dataq.xml | cut -d'>' -f2 | cut -d'<' -f1
john
pete
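On the combined line, the second field that cut sees is no longer the username, so the third john is lost. The problem is that you can’t assume the layout to be the same from one XML file to the next. The XML specification simply doesn’t prescribe how elements are spread over lines.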
Using the XMLStarlet commandline tool, we can work around these problems. For instance, selecting all usernames from the XML file works like this:
[todsah@jib]~$ xmlstarlet sel -t -m "//username" -v 'node()' -n dataq.xml
john
pete
john
This command line basically says: use Select mode (sel) with a command-line template (-t), match all <username> tags (-m "//username"), show the Value (-v) of each match and follow each value with a newline (-n).
Its use is quite simple, but you do have to know XML and XPath. Some XSLT will also come in handy because underneath, every option to XMLStarlet is translated to an XSLT stylesheet.
XMLStarlet can also transform XML files (using XSLT), and validate, format and edit them. You can, for instance, use XMLStarlet to delete or insert certain parts of an XML file that match an XPath expression. It can also convert XML to the PYX format, which can then be more easily used with traditional Unix command-line tools.
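If XMLStarlet isn’t at hand, the same layout-independent selection can be sketched with Python’s standard xml.etree.ElementTree module. This is only an illustration, not what XMLStarlet does internally: the function name `usernames` and the `DATAQ_XML` string are made up for the example, and the XML declaration is left out to keep the string simple. The sample deliberately uses the one-line version of the last queue element that tripped up grep and cut.

```python
import xml.etree.ElementTree as ET

# The example configuration from above, with the last <queue> element
# squeezed onto a single line (the layout that broke the grep/cut pipeline).
DATAQ_XML = """
<dataq port="50000" daemon="false" verbose="true">
  <access>
    <host>127.0.0.1</host>
  </access>
  <access>
    <username>john</username>
    <password>johnspw</password>
  </access>
  <access>
    <host>192.168.1.5</host>
    <username>pete</username>
    <password>petespw</password>
  </access>
  <queue name='backup' />
  <queue name='mp3' type='fifo' size='1' overflow='pop' />
  <queue name='restricted' type='fifo' size='5' overflow='deny'><access sense="deny"><username>john</username></access></queue>
</dataq>
"""


def usernames(xml_text):
    """Return the text of every <username> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth and line layout don't matter.
    return [element.text for element in root.iter("username")]


if __name__ == "__main__":
    for name in usernames(DATAQ_XML):
        print(name)
```

Because the parser works on elements rather than lines, it finds all three usernames no matter how the file is laid out.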
Tuesday, September 13th, 2005
Through this article on Newsforge, I found out about IPython, an enhanced interactive Python shell. It sure is more comfortable to work with than the default interactive Python interpreter. ONLamp.com has a somewhat more in-depth article on IPython.
Sunday, September 4th, 2005
Some people don’t seem to get the fact that the de facto standard format for music is MP3. Stop putting crap like WMA, MP4 and OGG/Vorbis in torrents please. Oh, and also stop putting Zip files in a RAR file in a torrent. It’s already compressed enough as it is and nobody cares about saving two bytes on a three-gigabyte torrent.
Tuesday, August 30th, 2005
The King and the Toaster
Once upon a time, in a kingdom not far from here, the king summoned two of his advisors for a test. He showed them both a shiny metal box with two slots in the top, a control knob, and a lever. “What do you think this is?” he asked. One advisor, an engineer, answered first. “It is a toaster,” he said. The king asked, “How would you design an embedded computer for it?”
The engineer replied, “Using a four-bit microcontroller, I would write a simple program that reads the darkness knob and quantifies its position to one of 16 shades of darkness, from snow white to coal black. The program would use that darkness level as the index to a 16-element table of initial timer values. Then it would turn on the heating elements and start the timer with the initial value selected from the table. At the end of the time delay, it would turn off the heat and pop up the toast. Come back next week, and I’ll show you a working prototype.”
The second advisor, a computer scientist, immediately recognized the danger of such short-sighted thinking. He said, “Toasters don’t just turn bread into toast, they are also used to warm frozen waffles. What you see before you is really a breakfast food cooker. As the subjects of your kingdom become more sophisticated, they will demand more capabilities. They will need a breakfast food cooker that can also cook sausage, fry bacon, and make scrambled eggs. A toaster that only makes toast will soon be obsolete. If we don’t look to the future, we will have to completely redesign the toaster in just a few years.”
“With this in mind, we can formulate a more intelligent solution to the problem. First, create a class of breakfast foods and specialize this class into subclasses: grains, pork, and poultry. The specialization process should be repeated with grains divided into toast, muffins, pancakes, and waffles; pork divided into sausage, links, and bacon; and poultry divided into scrambled eggs, hard-boiled eggs, poached eggs, fried eggs, and various omelet classes.”
“The ham and cheese omelet class is worth special attention because it must inherit characteristics from the pork, dairy, and poultry classes. Thus, we see that the problem cannot be properly solved without multiple inheritance. At run time, the program must create the proper object and send a message to the object that says, Cook yourself. The semantics of this message depend, of course, on the kind of object, so they have a different meaning to a piece of toast than to scrambled eggs.”
“Reviewing the process so far, we see that the analysis phase has revealed that the primary requirement is to cook any kind of breakfast food. In the design phase, we have discovered some derived requirements. Specifically, we need an object-oriented language with multiple inheritance. Of course, users don’t want the eggs to get cold while the bacon is frying, so concurrent processing is required, too.”
“We must not forget the user interface. The lever that lowers the food lacks versatility, and the darkness knob is confusing. Users won’t buy the product unless it has a user-friendly, graphical interface. When the breakfast cooker is plugged in, users should see a cowboy boot on the screen. Users click on it, and the message Booting UNIX v. 8.3 appears on the screen. (UNIX 8.3 should be out by the time the product gets to the market.) Users can pull down a menu and click on the foods they want to cook.”
“Having made the wise decision of specifying the software first in the design phase, all that remains is to pick an adequate hardware platform for the implementation phase. A Pentium-90 with 32MB of memory, a 1G hard disk, and a Super-VGA monitor should be sufficient. If you select a multitasking, object-oriented language that supports multiple inheritance and has a built-in GUI, writing the program will be a snap. (Imagine the difficulty we would have had if we had foolishly allowed a hardware-first design strategy to lock us into a four-bit microcontroller!).”
The king wisely had the computer scientist beheaded, and they all lived happily ever after.
From: this comment at this story on slashdot
Thursday, August 25th, 2005
Something so simple, yet so hard to figure out. I’ve been struggling on and off with something for over a couple of weeks. Here’s the deal:
I’m using the Python SAX module in combination with XPath to find my way through an XML-based configuration file. When an incorrect option or value is given, I raise an exception about the problem. However, I’d also like to let the user know the exact line number at which the problem occurred.
I’ve already given up on displaying the exact line number. Due to the fact that a single XML element can span multiple lines and the way XML is parsed, it’s probably impossible to get the real line number. Instead, I’ve settled for the line at which the element began. So when the element tag is opened on line 5 and the attributes span multiple lines, I’ll settle for reporting the problem as being on line 5.
But I can’t even get that to work in Python. I’ve gone through the API and googled my ass off, but all I can find is some vague reference to locators.
For something they make out to be so simple and easy to use, XML (or rather the parsers and the gazillion different parsing libraries/methods) sure is a bitch to use.
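For what it’s worth, here is a minimal sketch of how those locators seem to be meant to be used with the standard xml.sax module. The parser hands the handler a locator through setDocumentLocator(), and the handler can ask it for getLineNumber() while an event is being processed. The class name `LineTrackingHandler`, the helper `element_lines` and the `CONFIG` sample are made up for this illustration. It also assumes the expat-based parser that ships with Python, which as far as I can tell reports the position where the current event started; other parsers may behave differently.

```python
import xml.sax

# Hypothetical sample: the first <queue> spreads its attributes over lines 2-4.
CONFIG = """<dataq port="50000">
  <queue name='mp3'
         type='fifo'
         size='1' />
  <queue name='restricted' />
</dataq>
"""


class LineTrackingHandler(xml.sax.ContentHandler):
    """Records the line on which each element's opening tag starts."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.element_lines = []

    def setDocumentLocator(self, locator):
        # Called by the parser before parsing starts; keep the locator around.
        self.locator = locator

    def startElement(self, name, attrs):
        # Ask the locator where the parser is for the current event.
        self.element_lines.append((name, self.locator.getLineNumber()))


def element_lines(xml_text):
    """Return (element name, line number) pairs in document order."""
    handler = LineTrackingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.element_lines


if __name__ == "__main__":
    for name, line in element_lines(CONFIG):
        print(f"<{name}> starts on line {line}")
```

That gives the line on which the element began, which is the compromise described above. A validation error could then include that number in its exception message.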
Monday, August 22nd, 2005
Four days of booze, pot and great music: the Lowlands 2005 music festival. I just got back and it was awesome. It took me only half an hour of shower-time to get all the gunk out of my ears and the dust out of my nose. I’m not sure how many days it’ll take to get the alcohol and THC out of my bloodstream though.
I’ll post a little write-up when I find the time. Right now I’m unpacking my bags and catching some hard earned Z’s.
(Snoeiharde Pantera!)
BTW: How come I never get any e-mail when I’m at home but when I’m gone suddenly everybody needs to write me? *sigh*
Thursday, August 4th, 2005
Because I have a bad memory:
CodeByExample
Thursday, August 4th, 2005
“In this interview, Erich Gamma, co-author of the landmark book, Design Patterns, talks with Bill Venners about how design patterns are problem solution pairs, how design patterns help you understand intent and tradeoffs, and how to become a better designer through practice.”
Some pieces that caught my eye:
Bill Venners: In the GoF book you and your coauthors say, “Knowing these design patterns can make you a better designer.” How? Don’t I still need to know when to apply them?
Erich Gamma: Yes, do not turn off the brain; your creativity is still required. It isn’t always clear when to apply a design pattern. In addition you also need to know which variation of a pattern to apply, and how to tweak the pattern. You always need to adapt a pattern to your particular problem.
Removing a pattern can simplify a system and a simple solution should almost always win. Coming up with simple solutions is the real challenge.
Erich Gamma: In addition to reading books, you need to read and understand lots of code, see how existing systems solve a particular problem and what experienced designers did. Basically what design patterns do is to tell you what these developers have done. But, just reading about it isn’t enough. You become a master by mimicking the work of excellent developers.
I’m not saying I’m an experienced designer or anything, but I’ve always frowned upon the use of patterns. I really don’t see the use in getting a prefabricated solution from a book when it’s just as easy to come up with the solution yourself. The problem isn’t with using patterns per se but with the underlying idea: prefabricated solutions. An electrical engineer need not understand the internals of the integrated chipset he uses. Likewise, the software engineer needn’t understand the internals of the libraries he uses. BUT, he should always understand the structure of what he’s building and, more importantly, understand why he’s building it like that.
Basically, I see design patterns as something for inexperienced programmers with little or no imagination or creativity. So far, the writers of the book seem to agree with me. But I’ll take it one step further and say: all it does is stifle those people’s creativity and keep them from ever learning to find solutions on their own.
When I was younger, I used to spend nights and nights lying awake thinking about the best possible solution to this or that programming problem. Nowadays, when I see some of these design patterns, I find that they’re usually the same as what I figured out for myself back then. The big advantage of having thought out the solution for myself is that I can easily determine the cases in which the ‘pattern’ is not the best way to go (usually because I already dismissed it for several other similar problems when coming up with a new solution for something).
Actually thinking about ways to perform a certain task for yourself, instead of letting some book do the thinking for you, teaches you so much more than following some prefabricated solution. It’s somewhat similar to finding an answer to a question yourself by consulting API documentation, books, other people’s source code, etc., instead of simply asking a guru. During the ‘researching’ you’ll encounter tons of information that might not be of any use in your current situation but which may prove to be indispensable later on.
Even though the author, Erich Gamma, does try to convince everybody that you don’t become a good developer/designer by just reading books on patterns or applying patterns, he fails to mention that you can only become a good developer by understanding patterns. Perhaps the real problem shines through when he says: “You become a master by mimicking the work of excellent developers”. That’s what Design Patterns seem like to me: mimicking. But anybody can mimic someone else.
He should have said: “You become a master by understanding the work of excellent developers”.
Monday, August 1st, 2005
I use my Debian GNU/Linux laptop both at home and at work. Sometimes I may take it over to a friend’s network so I can show him some stuff.
Unfortunately, being on a bunch of different networks does pose some problems. When I’m at home for instance, I’d like my firewall to block incoming FTP and SSH connections, since there’s no reason to enable them (I use NFS at home). When I’m on the network at work, I’d like FTP and SSH to be enabled. At my friend’s network, I can’t use DHCP but get a static address instead.
Fortunately, there’s the laptop-net Debian package:
laptop-net is a tool for Debian (it may work in other GNU/Linux distributions) that makes moving from one network configuration to another very easy.
Tipically you may want to use it for your laptop, so that you can plug it at home, at the university or to the Matrix, and it will take care of detecting which configuration you need –from a list of preinstalled ones–, and then run scripts, set up the interface, copy files to the root file system, etc.
Rock on! And to think I was almost going to implement some hack thing myself! This’ll save me a lot of time.
The text of all posts on this blog, unless specifically mentioned otherwise, is licensed under this license.