Archive for August, 2008

Blocking VoilaBot

Tuesday, August 19th, 2008

There's a web crawler out there called VoilaBot, which is hammering my site with needless crawls and appears to ignore robots.txt files completely. Apparently it's a crawler for a French portal/search engine. If you need to block this bot from your site, there are two things you can do:

Firewall

If you've got a firewall on your box, you can deny access to the two IP ranges 81.52.143.0/24 and 193.252.149.0/24. That'll get them off your back permanently. For Linux machines with an iptables firewall, the following will do the trick:

iptables -A INPUT --source 193.252.149.0/24 -j DROP
iptables -A INPUT --source 81.52.143.0/24 -j DROP
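Before dropping traffic, it can help to check that the addresses showing up in your access log really fall inside those two ranges. Here is a minimal Python 3 sketch using the standard ipaddress module. The function name is_blocked and the sample addresses are made up for illustration; only the two networks come from this post.

```python
import ipaddress

# The two ranges named above, in CIDR notation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("81.52.143.0/24"),
    ipaddress.ip_network("193.252.149.0/24"),
]


def is_blocked(address):
    """Return True if the address falls inside one of the blocked ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


# Sample addresses are illustrative, not taken from a real log.
for candidate in ("81.52.143.17", "193.252.149.255", "81.52.144.1"):
    print(candidate, is_blocked(candidate))
```

A /24 covers the 256 addresses that share the first three octets, so 81.52.144.1 is outside the first range even though it looks close.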

htaccess

If you don't want to firewall the bot, you can deny them access to your website by putting a .htaccess file in your web root directory with the following contents:

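# With "order allow,deny", anything not explicitly allowed is denied,
# so allow everyone first and then deny the two ranges.
order allow,deny
allow from all
deny from 81.52.143.
deny from 193.252.149.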

Don't trust VoilaBot to honour your robots.txt file; it won't.

Dependency resolving algorithm

Thursday, August 7th, 2008

Here's a little explanation of a possible dependency resolution algorithm. I've made this as basic as possible, so it's easy to understand. If you're already at home with (graph) algorithms, this post is probably not of much interest to you.

Premise

Suppose we have five objects which all depend on some of the other objects. The objects could be anything, but in this case let's say they're very simple software packages (no minimum versions, etc.) that depend on other packages which must be installed first. How does one find the right order in which to install the packages?
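To make the question concrete, here is one possible approach sketched in Python 3: a depth-first walk that installs a package only after everything it depends on. This is not necessarily the algorithm the full post describes. The package names a to e, their dependencies, and the function name resolve are all invented for illustration.

```python
def resolve(dependencies):
    """Return an install order in which every package follows its dependencies."""
    order = []        # packages in the order they can be installed
    visiting = set()  # packages currently being resolved, used to spot cycles

    def visit(package):
        if package in order:
            return  # already scheduled for installation
        if package in visiting:
            raise ValueError("circular dependency involving %r" % package)
        visiting.add(package)
        for dependency in dependencies.get(package, []):
            visit(dependency)
        visiting.discard(package)
        order.append(package)

    for package in dependencies:
        visit(package)
    return order


# Hypothetical example: five packages and the packages each one needs first.
packages = {
    "a": ["b", "d"],
    "b": ["c", "e"],
    "c": ["d", "e"],
    "d": [],
    "e": [],
}
install_order = resolve(packages)
print(install_order)
```

With this made-up input the result is d, e, c, b, a: the packages without dependencies come first, and a, which needs everything else, comes last. A package that ends up depending on itself raises a ValueError instead of looping forever.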

(more…)

Rant: Dicts that are not dicts

Sunday, August 3rd, 2008

There's something I have to get off my chest. I HATE it when things in Python pretend to be something they are not. I often use IPython to introspect things so I know how to use them. Lately, I've been running into more and more libraries which return things that pretend to be one thing, but which really are something else. Observe:

import optparse
 
parser = optparse.OptionParser()
parser.add_option("-p", dest="path", action="store", type="string")
(options, args) = parser.parse_args()
 
print options

Output:

{'path': None}

"Ah", I think, "a dictionary. Swell!", and go ahead and do:

print options.get('path', 'default')

And then it responds:

Traceback (most recent call last):
  File "/home/todsah/foo.py", line 7, in <module>
    print options.get('path', 'default')
AttributeError: Values instance has no attribute 'get'

What the hell? A dictionary with no get attribute? Then, when you print the type, it turns out it's not a dictionary at all: "<type 'instance'>". It's a plain old object instance, and you have to use getattr(), etc. instead of foo[key] and foo.get(). Fortunately, I've fallen into this trap so many times by now that the first thing I do is check the type of the thing I'm having problems with, but occasionally it still bites me in the ass.
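For what it's worth, here is a small sketch of two ways around it, written in Python 3 syntax (the session above is Python 2). It assumes the same parser as above and passes an empty argument list so it doesn't depend on the real command line. The built-in vars() returns the instance's attribute dictionary, which is a real dict.

```python
import optparse

parser = optparse.OptionParser()
parser.add_option("-p", dest="path", action="store", type="string")
# Parse an explicit empty argument list instead of sys.argv.
options, args = parser.parse_args([])

# Attribute-style access. The attribute exists and is None when -p is not
# given, so the fallback has to be applied with "or", not via getattr's default.
path = getattr(options, "path", None) or "default"

# vars() hands back the underlying attribute dictionary, a real dict.
options_dict = vars(options)
path_from_dict = options_dict.get("path") or "default"

print(type(options_dict), path, path_from_dict)
```

Note that options_dict.get('path', 'default') would still return None here, because the key is present with the value None.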

I am seeing this more and more often, and it annoys me to no end. Python is supposed to be a rapid application development language, and things like this are extremely annoying. Don't pretend to be something you're not. If an object is going to behave dictionary-like, why not just extend the real dict object? Wouldn't that be so much easier?
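As a rough illustration of what I mean, here is a minimal Python 3 sketch of an object that really is a dict but also allows attribute-style access. The class name AttrDict is hypothetical, and this is not how optparse is implemented; it only shows that the two styles can coexist.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that also supports attribute-style access."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, so dict methods like get()
        # keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as dictionary items.
        self[name] = value


options = AttrDict(path=None)
options.verbose = True

print(options)                            # prints like a dict, and is one
print(options.path)                       # attribute access
print(options.get("missing", "default"))  # dict access
```

One trade-off of this sketch: keys that clash with dict method names, such as "get" or "items", can't be read as attributes, only as options["get"].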

So, developers: Please don't make things appear like things they're not. It causes confusion, and possibly even bugs.