Home Page

bexec v0.6 released: Execute Vim buffers and show output in new buffer.

I've just released version 0.6 of my Vim bexec plugin. 

Bexec is a Vim plugin that allows the user to execute the current buffer if it contains a script with a shebang (#!/path/to/interpreter) on the first line or if the default interpreter for the script's type is known by Bexec. The output of the script will be grabbed and displayed in a separate buffer.

New in this version:

You can get the new version (as a Vimball) from:

bbcloner v1.3 released. BitBucket backup tool.

I've just released bbcloner v1.3. bbcloner creates mirrors of your public and private Bitbucket Git repositories and wikis. It also synchronizes already existing mirrors. Initial mirror setup requires you manually enter your username/password. Subsequent synchronization of mirrors is done using Deployment Keys.

This release features the following changes:


my_indexr: Script to drop and recreate MySQL indexes

As can be read in this article, I was in need of a method to quickly drop all indexes (except primary keys) from a MySQL database. After googling around a bit and being astonished that apparently no-one had written such a thing yet, I wrote the script that can be seen in that article.

Unfortunately, that script wasn't very good, so I decided to do a cleaner better implementation of it. The result is my_indexr, which spits out SQL commands to drop and recreate indexes on a database. Other features include:

There may still be some edge-cases that are not properly handled by my_indexr. If you encouter one, please let me know.

You can download my_indexr from its Bitbucket page.

This is what its output looks like:

$ ./indexr.py -u root -p mydb
DROP INDEX `location` ON `idx_tst_innodb_basic`;
DROP INDEX `name_age` ON `idx_tst_innodb_basic`;
DROP INDEX `email` ON `idx_tst_innodb_basic`;
DROP INDEX `PRIMARY` ON `idx_tst_innodb_compkey`;
CREATE  INDEX `location` USING BTREE ON `idx_tst_innodb_basic` (`location_id`);
CREATE  INDEX `name_age` USING BTREE ON `idx_tst_innodb_basic` (`name`(40),`age`);
CREATE UNIQUE INDEX `email` USING BTREE ON `idx_tst_innodb_basic` (`email`);
ALTER TABLE `idx_tst_innodb_compkey` ADD PRIMARY KEY (`last_name`,`first_name`);

Think Bayes: Bayesian Statistics in Python

There's a free download available of "Think Bayes – Bayesian Statistics in Python" over at it-ebooks.info. A small excerpt from the book:

The premise of this book, and the other books in the Think X series, is that if you know
how to program, you can use that skill to learn other topics.

Most books on Bayesian statistics use mathematical notation and present ideas in terms
of mathematical concepts like calculus. This book uses Python code instead of math,
and discrete approximations instead of continuous mathematics. As a result, what
would be an integral in a math book becomes a summation, and most operations on
probability distributions are simple loops.

Increasing performance of bulk updates of large tables in MySQL

I recently had to perform some bulk updates on semi-large tables (3 to 7 million rows) in MySQL. I ran into various problems that negatively affected the performance on these updates. In this blog post I will outline the problems I ran into and how I solved them. Eventually I managed to increase the performance from about 30 rows/sec to around 7000 rows/sec. The final performance was mostly CPU bound. Since this was on a VPS with only limited CPU power, I expect you can get better performance on some decently outfitted machine/VPS.

The situation I was dealing with was as follows:

  • About 20 tables, 7 of which were between 3 and 7 million rows.

  • Both MyISAM and InnoDB tables.

  • Updates required on values on every row of those tables.

  • The updates where too complicated to do in SQL, so they required a script.

  • All updates where done on rows that were selected on just their primary key. I.e. WHERE id = …

Here are some of the problems I ran into.

Python’s MySQLdb is slow

I implemented the script in Python, and the first problem I ran into is that the MySQLdb module is slow. It’s especially slow if you’re going to use the cursors. MySQL natively doesn’t support cursors, so these are emulated in Python code. One of the trickiest things is that a simple SELECT * FROM tbl will retrieve all the results and put them in memory on the client. For 7 million rows, this quickly exhausts your memory. Real cursors would fetch the result one-by-one from the database so that you don’t exhaust memory.

The solution here is to not use MySQLdb, but use the native client bindings available with import mysql.

LIMIT n,m is slow

Since MySQL doesn’t support cursors, we can’t mix SELECT and UPDATE queries in a loop. Thus we need to read in a bunch of rows into memory and update in a loop afterwards. Since we can’t keep all the rows in memory, we must read in batches. An obvious solution for this would be a loop such as (pseudo-code):

offset = 0
size = 1000
while True:
    rows = query('SELECT * FROM tbl LIMIT :offset, :size"
    for row in rows:
        # do some updates
    if len(rows) < size:
    offset += size

This would use the LIMIT to read the first 1000 rows on the first iteration of the loop, the next 1000 on the second iteration of the loop. The problem is: in MySQL this becomes linearly slower for higher values of the offset. I was already aware of this, but somehow it slipped my mind.

The problem is that the database has to advance an internal pointer forward in the record set, and the further in the table you get, the longer that takes. I saw performance drop from about 5000 rows/sec to about 100 rows/sec, just for selecting the data. I aborted the script after noticing this, but we can assume performance would have crawled to a halt if we kept going.

The solution is to use the order by the primary key and then select everything we haven’t processed yet:

size = 1000
last_id = 0
while True:
    rows = query('SELECT * FROM tbl WHERE id > :last_id ORDER BY id LIMIT :size')
    if not rows:
    for row in rows:
        # do some updates
        last_id = row['id']

This requires that you have an index on the id field, or performance will greatly suffer again. More on that later.

At this point, +SELECT+s were pretty speedy. Including my row data manipulation, I was getting about 40.000 rows/sec. Not bad. But I was not updating the rows yet.

Connection settings

The next things I did is some standard tricks to speed up bulk updates/inserts by disabling some foreign key checks and running batches in a transaction. Since I was working with both MyISAM and InnoDB tables, I just mixed optimizations for both table types:

db.query('SET autocommit=0;')
db.query('SET unique_checks=0; ')
db.query('SET foreign_key_checks=0;')
db.query('LOCK TABLES %s WRITE;' % (tablename))
# SELECT and UPDATE in batches
db.query('UNLOCK TABLES')
db.query('SET foreign_key_checks=1;')
db.query('SET unique_checks=1; ')
db.query('SET autocommit=1;')

I must admit that I’m not sure if this actually increased performance at all. It is entirely possible that this actually hurts performance instead. Please test this for yourselves if you’re going to use it. You should also be aware that some of these options bypass MySQL’s data integrity checks. You may end up with invalid data such as invalid foreign key references, etc.

One mistake I did make was that I accidentally included the following in an early version of the script:

db.query('ALTER TABLE %s DISABLE KEYS;' % (tablename))

Such is the deviousness of copy-paste. This option will disable the updating of non-unique indexes while it’s active. This is an optimization for MyISAM tables to massively improve performance of mass INSERTs, since the database won’t have to update the index on each inserted row (which is very slow). The problem is that this also disables the use of indexes for data retrieving, as noted in the MySQL manual:

While the nonunique indexes are disabled, they are ignored for statements such as SELECT and EXPLAIN that otherwise would use them.

That means update queries such as UPDATE tbl SET key=value WHERE id=1020033 will become incredibly slow, since they can no longer use indexes.

MySQL server tuning

I was running this on a stock Ubuntu 12.04 installation. MySQL is basically completely unconfigured out of the box on Debian and Ubuntu. This means that those 16 GBs of memory in your machine will go completely unused unless you tune some parameters. I modified /etc/mysql/my.cnf and added the following settings to improve the speed of queries:

key_buffer         = 128M
innodb_buffer_pool_size = 3G

The key_buffer setting is a setting for MyISAM tables that determines how much memory may be used to keep indexes in memory. The equivalent setting for InnoDB is innodb_buffer_pool_size, except that the InnoDB setting also includes table data.

In my case the machine had 4 GB of memory. You can read more about the settings on these pages:

Don’t forget to restart your MySQL server.

Dropping all indexes except primary keys

One of the biggest performance boosts was to drop all indexes from all the tables that needed to be updates, except for the primary key indexes (those on the id fields). It is much faster to just drop the indexes and recreate them when you’re done. This is basically the manual way to accomplish what we hoped the ALTER TABLE %s DISABLE KEYS would do, but didn’t.

UPDATE: I wrote a better script which is available here.

Here’s a script that dumps SQL commands to drop and recreate indexes for all tables:

import _mysql
import sys

mysql_username = 'root'
mysql_passwd = 'passwd'
mysql_host = ''
dbname = 'mydb'

tables = sys.argv[1:]
indexes = []

db = _mysql.connect(user=mysql_username, passwd=mysql_passwd, db=dbname)
db.query('SHOW TABLES')
res = db.store_result()
for row in res.fetch_row(maxrows=0):
    tablename = row[0]
    if not tables or tablename in tables:
        db.query('SHOW INDEXES FROM %s WHERE Key_name != "PRIMARY"' % (tablename))
        res = db.store_result()
        for row_index in res.fetch_row(maxrows=0):
            table, non_unique, key_name, seq_in_index, column_name, \
            collation, cardinality, sub_part, packed, null, index_type, \
            comment, index_comment = row_index
            indexes.append( (key_name, table, column_name) )

for index in indexes:
    key_name, table, column_name = index
    print "DROP INDEX %s ON %s;" % (key_name, table)

for index in indexes:
    key_name, table, column_name = index
    print "CREATE INDEX %s ON %s (%s);" % (key_name, table, column_name)

Output looks like this:

$ ./drop_indexes.py
DROP INDEX idx_username ON users;
DROP INDEX idx_rights ON rights;
CREATE INDEX idx_username ON users (username);
CREATE INDEX idx_perm ON rights (perm);

Some important notes about the above script:

  • The script is not foolproof! If you have non-BTREE indexes, if you have indexes spanning multiple columns, if you have any kind of index that goes beyond a BTREE single column index, please be careful about using this script.

  • You must manually copy-paste the statements into the MySQL client.

  • It does NOT drop the PRIMARY KEY indexes.


In the end, I went from about 30 rows per second around 8000 rows per second. The key to getting decent performance is too start simple, and slowly expand your script while keeping a close eye on performance. If you see a dip, investigate immediately to mitigate the problem.

Useful ways of investigation slow performance is by using tools to unearth evidence of the root of the problem.

  • top can tell you if a process is mostly CPU bound. If you’re seeing high amounts of CPU, check if your queries are using indexes to get the results they need.

  • iostat can tell you if a process is mostly IO bound. If you’re seeing high amounts of I/O on your disk, tune MySQL to make better use of memory to buffer indexes and table data.

  • Use the EXPLAIN function of MySQL to see if, and which, indexes are being used. If not, create new indexes.

  • Avoid doing useless work such as updating indexes after every update. This is mostly a matter of knowing what to avoid and what not, but that’s what this post was about in the first place.

  • Baby steps! It took me entirely too long to figure out that I was initially seeing bad performance because my SELECT LIMIT n,m was being so slow. I was completely convinced my UPDATE statements were the cause of the initial slowdowns I saw. Only when I started commenting out the major parts of the code did I see that it was actually the simple SELECT query that was causing problems initially.

That’s it! I hope this was helpful in some way!

HP Lights-Out 100i (LO100i) "Invalid username / password" when trying to connect to KVM

If you're trying to connect to the Virtual KVM (console) on a HP Lights-Out 100i (LO100i) using the Remote Console Client Java applet, you might be getting an error in the order of

Username / Password invalid


com.serverengines.r.rdr.EndOfStream: EndOfStream

This is a known problem with firmware version 4.24 (or Earlier):

The Virtual Keyboard/Video/Mouse (KVM )will not be accessible
on HP ProLiant 100-series servers with Lights-Out 100 Base 
Management Card Firmware Version 4.24 (or earlier), if the server
has been running without interruption for 248 days (or more). When
this occurs, when attempting to access Virtual KVM/Media as shown
below, the browser will generate the following message[...]

As a solution, HP recommends:

As a workaround, shut down the server and unplug the power cable.
After a few seconds, reconnect the power cable and restart the server.

I've found that it isn't required to actually unplug the powercable. For me, remotely cold-restarting the iLoM card got rid of the problem. You can remotely cold-start the iLoM with ipmitool:

$ ipmitool -H <ILOM_IP> -U <USERNAME> mc
MC Commands:
  reset <warm|cold>
  watchdog <get|reset|off>
$ ipmitool -H <ILOM_IP> -U <USERNAME> mc reset cold
Sent cold reset command to MC

Now we wait until the iLoM comes back up and we can succefully connect to the console via the KVM Java applet.

Getting started with Juju: "no public ssh keys found" error.

I'm trying out Juju with the 'local' environment, and ran into the following error:

$ sudo juju bootstrap
error: error parsing environment "local": no public ssh keys found

The Getting Started Guide mentions nothing of this error, and I couldn't find a solution on the web. After a bit of reading, it seems Juju requires a passwordless SSH key be available in your ~/.ssh dir. So to get rid of this error, just generate a new key with no password:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/fboender/.ssh/id_rsa):
Enter passphrase (empty for no passphrase): <EMPTY>
Enter same passphrase again: <EMPTY>

Now you bootstrap Juju:

$ sudo juju bootstrap -e local

Quick-n-dirty HAR (HTTP Archive) viewer

HAR, HTTP Archive, is a JSON-encoded dump of a list of requests and their associated headers, bodies, etc. Here's a partial example containing a single request:

  "startedDateTime": "2013-09-16T18:02:04.741Z",
  "time": 51,
  "request": {
    "method": "GET",
    "url": "http://electricmonk.nl/",
    "httpVersion": "HTTP/1.1",
    "headers": [],
    "queryString": [],
    "cookies": [],
    "headersSize": 38,
    "bodySize": 0
  "response": {
    "status": 301,
    "statusText": "Moved Permanently",
    "httpVersion": "HTTP/1.1",
    "headers": [],
    "cookies": [],
    "content": {
      "size": 0,
      "mimeType": "text/html"
    "redirectURL": "",
    "headersSize": 32,
    "bodySize": 0
  "cache": {},
  "timings": {
    "blocked": 0,

HAR files can be exported from Chrome's Network analyser developer tool (ctrl-shift-i → Network tab → capture some requests → Right-click and select Save as HAR with contents. (Additional tip: Check the "Preserve Log on Navigation option – which looks like a recording button – to capture multi-level redirects and such)

As human-readable JSON is, it's still difficult to get a good overview of the requests. So I wrote a quick Python script that turns the JSON into something that's a little easier on our poor sysadmin's eyes:


It supports colored output, dumping request headers and response headers and the body of POSTs and responses (although this will be very slow). You can filter out uninteresting requests such as images or CSS/JSS with the --filter-X options.

You can get it by cloning the Git repository from the Bitbucket repository.


16 things you should absolutely configure on any new server

It seems even professional sysadmins occasionally forgets the bare minimum configuration that should be done on a new machine. As a developer and part-time system administrator, I can't count the number of times I've had to waste significantly more time Here's a, by no means exhaustive, list of things you should configure on any new machine you deploy.

1. Pick a good hostname

Set a sane hostname on your machine. Something that describes what the machine is or does. Something that uniquely identifies it from any other machines on, at least, the same network. For instance, machine for a client called Megacorp might be called "mc-tst-www-1" to identify the first test WWW server for Megacorp. The primary production loadbalancer might be called "mc-prod-lb-1". Never have your junior sysadmin bring down the master database backend because he thought he was on a different machine.

2. Put all hostnames in /etc/hosts

Put all hostnames your machine uses in the /etc/hosts file to avoid annoying DNS lookup delays and other problems.

3. Install ntpd

Running into problems related to clock drift on your server is not a matter of "if", but a matter of "when". And with clock drift it will be sooner rather than later, depending on which direction your clock is drifting in. Install NTPd, and synchronize it to the same servers as all your other machines. Don't use a default pool if you can avoid it, because they might use Round Robin DNS and give you different servers. Theoretically this shouldn't pose a problem. Theoretically…

If you're running virtual machines, turn off Virtualbox/VMWare/whatever's time synchronization. They've historically been proven to be very unreliable[1]. Install ntpd anyway. And I swear, as a developer, I will kick you in the face if I ever have to diagnose another problem caused by a lack of ntpd.

4. Make sure email can be delivered

This one is simple. Make sure email can be delivered to the outside world. Many programs and scripts will need to be able to send email. Make sure they can. Ideally, you should have a dedicated SMTP server set up on your network that hosts can relay email through. A gateway firewall should prevent all other outgoing traffic for port 25, unless you want your server to be turned into a zombified spam node (which will happen).

5. Cron email

Configure Cron such that output is emailed to an actual person. You want to know about that "No space left on device" error that crashed your cobbled-together backups script. You can specify the email address with the MAILTO directive in the crontab file. Don't forget about user crontabs! Since it's hard to ensure every user crontab has a MAILTO setting, you may want to configure your SMTP server to automatically forward all email to a special email address.

6. Protect the SSH port

Unauthorized probing of the SSH port will happen, unless you prevent it. Weak passwords can be easily guessed in a few hundred tries. Timing attacks can be used to guess which accounts live on the system, even if the attacker can't guess the password. There are several options for securing your SSH port

7. Configure a firewall

This should go without saying.. install and configure a firewall. Firewall everything. Incoming traffic, outgoing traffic, all of it. Only open up what you need to open. Don't rely on your gateway's firewall to do its job; you will regret it when other machines on your network get compromised.

8. Monitor your system

Monitor your system, even if it's just a simple shell script that emails you about problems. Disks will fill up, services will mysteriously shut down and your CPU load will go to 300. I highly recommend also monitoring important services from a remote location.

9. Configure resource usage

Running Apache, a database or some Java stack? Configure it properly so it utilizes the resources your system has, but doesn't overload it. Configure the minimum and maximum connections Apache will accept, tune the memory your database is allowed to use, etc.

10. Keep your software up-to-date

Install something like apt-dater to keep your software up-to-date. Many server compromises are directly linked to outdated software. Don't trust yourself to keep a machine up to date. You will forget. If you're running third-party software not installed from your package repository, subscribe to their security announcement mailing list and keep a list of all third-party software installed on every server. A tool such as Puppet, Chef or Ansible can help keep your system not only up to date, but uniform.

11. Log rotation

Make sure all logs are automatically rotated, or your disks will fill up. Take a look at /etc/logrotate.d/ to see how. For instance, for Apache vhosts that each have their own log directory, you can add an entry such as:

/var/www/*/logs/*.log {
        rotate 52
        # create 640 root adm # Disabled so old logfile's properties are used.
                if [ -f /var/run/apache2.pid ]; then
                        /etc/init.d/apache2 restart > /dev/null

12. Prevent users from adding SSH keys

Remove the ability for users to add new authorized keys to their account. Which keys are allowed to connect should be in the admin's hand, not the users. Having the Authorized Keys files scattered all over your system also makes maintenance harder. To do this, change the AuthorizedKeysFile setting in /etc/ssh/sshd_config:

#AuthorizedKeysFile     %h/.ssh/authorized_keys
AuthorizedKeysFile      /etc/ssh/authorized_keys/%u

13. Limit user crontabs

Limit which users can create personal crontab entries by placing only allowed usernames in /etc/cron.allow. This prevents users from creating CPU/IO heavy cronjobs that interfere with your nightly backups.

14. Backups, backups and more backups

Make backups! Keep local backups for easy restoring of corrupt files, databases and other disasters. Databases should be backed up locally each night, if at all possible. Rotate backups on a daily, weekly and monthly cycle. Keep off-site backups too. For small servers I can highly recommend Boxbackup. It keeps remote encrypted backups, does full and incremental backups, keeps a history and does snapshotting as well as continues syncing. Only delta's (changes in files) are transferred and stored, so it is light on resources. wrote an article on setting it up which might prove useful.

15. Install basic tools

Make sure basic tools for daily admin tasks are pre-installed. There's nothing more annoying than having to track down problems and not having the means to do so, especially when your network refuses to come up. Some essential tools:

16. Install fail2ban

I've already mentioned this in the "Protect your SSH port", but it bears mentioning again: install Fail2ban to automatically block offending IPs.


That's it. These are the things I would consider the bare minimum that should be properly configured when you deploy a new machine. It will take a little bit more time up front to configure machines properly, but it will save you time in the end. I can highly recommend using Puppet, Chef or Ansible to help you automate these tasks.


[1]This was the case a few years ago. I'm not sure it still is for VMWare. For VirtualBox, it most certainly is, but you wouldn't run that in a production environment probably. At the very least, install NTPd on your host.

bbcloner: create mirrors of your public and private Bitbucket Git repositories


bbclonerI wrote a small tool that assists in creating mirrors of your public and private Bitbucket Git repositories and wikis. It also synchronizes already existing mirrors. Initial mirror setup requires that you manually enter your username/password. Subsequent synchronization of mirrors is done using Deployment Keys.

You can download a tar.gz, a Debian/Ubuntu package or clone it from the Bitbucket page.



Here's how it works in short. Generate a passwordless SSH key:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key: /home/fboender/.ssh/bbcloner_rsa<ENTER>
Enter passphrase (empty for no passphrase):<ENTER>
Enter same passphrase again: <ENTER>

You should add the generated public key to your repositories as a Deployment Key. The first time you use bbcloner, or whenever you've added new public or private repositories, you have to specify your username/password. BBcloner will retrieve a list of your repositories and create mirrors for any new repositories not yet mirrored:

$ bbcloner -n -u fboender /home/fboender/gitclones/
Cloning new repositories
Cloning project_a
Cloning project_a wiki
Cloning project_b

Now you can update the mirrors without using a username/password:

$ bbcloner /home/fboender/gitclones/
Updating existing mirrors
Updating /home/fboender/gitclones/project_a.git
Updating /home/fboender/gitclones/project_a-wiki.git
Updating /home/fboender/gitclones/project_b.git

You can run the above from a cronjob. Specify the -s argument to prevent bbcloner from showing normal output.

The mirrors are full remote git repositories, which means you can clone them:

$ git clone /home/fboender/gitclones/project_a.git/
Cloning into project_a...

Don't push changes to it, or the mirror won't be able to sync. Instead, point the remote origin to your Bitbucket repository:

$ git remote rm origin
$ git remote add origin git@bitbucket.org:fboender/project_a.git
$ git push
remote: bb/acl: fboender is allowed. accepted payload.

Get it

Here are ways of getting bbcloner:

More information

Fore more information, please see the Bitbucket repository.