Electricmonk

Ferry Boender

Programmer, DevOpper, Open Source enthusiast.

Blog

Why you shouldn’t be using S3 or Google App Engine

Tuesday, June 24th, 2008

Recently a new ‘hype’ has been popping up: Amazon’s S3 and Google App Engine. For those who don’t know, S3 and App Engine are basically hosting facilities: your web application runs, and your data is stored, remotely on Amazon’s or Google’s infrastructure. This lets you benefit from the huge, highly scalable architectures that these companies have built.

While this may seem nice, I’ve had my doubts about the usefulness of such services. For one, it fosters vendor lock-in. Your application with an Amazon S3 database back-end won’t be able to run anywhere else. Neither will your Google App Engine application. Sure, some of the software is released as Open Source, but the software is just icing on the cake. It’s the architecture that really counts, and it won’t be easy to reproduce Google’s or Amazon’s architecture. And when you build your application against those architectures, it’s bound to become tied to them, in the same way that SQL queries you write for one database will perform badly on a different database (and probably even more so).

And I am now seeing the first proof of these problems, as well as an entirely different problem: debugging. You see, beyond the tools provided by the service, you are out of options when it comes to debugging. When hosting your own data storage or application platform, you can sink your teeth into it when problems arise. Even when most of the stuff you’re using isn’t Open Source, you’ll still be able to sic a whole range of diagnostic tools on the problem. No such luck with remote application hosting services. They’re black boxes beyond the most common debugging situations.

That’s what the people on a thread at Amazon’s Web Services Forum are experiencing right now, if you ask me. I’ll quote the gist of the conversation:

all data we store on S3 has gone through the same code path for months. starting a couple days ago a small percentage of the objects we are retrieving are not checksumming to the correct values. we hash and store objects by checksum and rehash the objects when we retrieve to ensure there is no data corruption. all the objects we’re having issues with were uploaded at approximately the same time period a few days ago.

we’ve stored 10’s of millions of objects in S3 and never encountered such problems. please let me know ASAP if you have any idea what could be going on here. thanks.

I’m having similiar problems. […] I’ve been investigating our end to find the problem, and it was just suggested that I should check the forums to see if anyone else was having problems. […] This is super-high priority for us (both corporately and personally, since lack of sleep dealing with this is killing me

The first post was made on June 22, 2008 5:05 PM. An Amazon support engineer (I assume) is working on it, but as of June 23, 2008 11:12 AM there is still no answer.
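The integrity check described in the first quoted post (store objects under their checksum, rehash them on retrieval) can be sketched roughly as follows. This is only an illustration: the `ChecksummedStore` and `CorruptObjectError` names are made up, a plain dict stands in for the remote store, and SHA-256 is an assumption, since the poster doesn’t say which hash they use.

```python
import hashlib


class CorruptObjectError(Exception):
    """Raised when a retrieved object no longer matches its checksum."""


class ChecksummedStore:
    """Content-addressed store: the key of an object is the hash of its data."""

    def __init__(self, backend=None):
        # Hypothetical stand-in for the remote service: any dict-like object.
        self.backend = {} if backend is None else backend

    def put(self, data: bytes) -> str:
        # Hash the object and store it under that hash.
        key = hashlib.sha256(data).hexdigest()
        self.backend[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Rehash on retrieval; a mismatch means the stored bytes changed.
        data = self.backend[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise CorruptObjectError(key)
        return data
```

A check like this tells you that the data changed somewhere between upload and download. It cannot tell you where, which is exactly the part you can’t investigate on a remote service.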

And this shows exactly my objection to such remotely hosted application services: they’re out of your control. That’s something I couldn’t live with, and I think most companies shouldn’t want to either. As one user comments: “This is super-high priority for us (both corporately and personally”. Staking your entire business on some black-box remote service seems like a silly idea to me. Most of the service-providing software that I (and the company I work at) use is Free or otherwise Open Source software, which means we’ll always have the source to dive into when we run into problems. And even if the source isn’t there, at least you’ll get to look at the problem from both sides. When it comes to database problems, you’ll be able to view the logs, turn on debugging, and inspect the entire environment the service is running in, from software to hardware. That’s something you can’t do with a remote service.

Yes, there will always be problems for which a fix is hard to find or for which there simply isn’t a fix. If you’re not willing to run that risk, you can pay a company a lot of money for support, and let them handle the hard problems (or for some even the easy problems). Naturally, Amazon and Google both offer that support too, but that’s still different. You see, when you run into a really unfixable problem in your own architecture, you can always swap any part of the software or hardware and try again. Or at least, that’s what most developers try to achieve anyway: interchangeable software (databases, application servers, programming libraries) and hardware. But with a remote service such as S3 or App Engine, you’ve already committed your entire application to one, huge, non-interchangeable component from which you can never escape. Hence the vendor lock-in.
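To make the “interchangeable software” idea a bit more concrete, here is a minimal sketch of application code written against a small storage interface instead of one specific back-end. All names (`BlobStore`, `MemoryStore`, `DiskStore`, `save_report`) are hypothetical and only serve as illustration.

```python
import os
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """The only storage operations the application is allowed to rely on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryStore(BlobStore):
    """Keeps everything in a dict; handy for tests."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class DiskStore(BlobStore):
    """Keeps every blob as a file in one directory."""

    def __init__(self, directory):
        self._directory = directory

    def put(self, key, data):
        with open(os.path.join(self._directory, key), "wb") as handle:
            handle.write(data)

    def get(self, key):
        with open(os.path.join(self._directory, key), "rb") as handle:
            return handle.read()


def save_report(store: BlobStore, text: str) -> None:
    # Application code only sees the interface, not the back-end behind it.
    store.put("report.txt", text.encode("utf-8"))
```

Swapping the storage back-end then means writing one new class. That only works as long as the application sticks to the narrow interface and doesn’t start depending on the behaviour of one particular back-end.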

But who knows what the future brings? This may all turn out to be a non-issue, and companies and developers may all flock to such general service providers in the future without any difficulties whatsoever. I guess I’m a conservative person when it comes to these kinds of changes. I’d rather wait and see.

Free Speech

Monday, June 23rd, 2008

Free Speech. Why is it important? Because it’s an extension of Free Thought. Should we be able to think whatever the hell we want? Yes we should. Controlling Free Speech is about nothing more than controlling Free Thought. “You’re not allowed to say this, because somebody might not agree with it. You’re not allowed to say that, because somebody might feel hurt by it”. What they’re really trying to do is control what you can think. Trying to generate a “mindset”, a “zeitgeist”. Brainwashing is more like it. Well, fuck that. I’ll think about whatever the hell I want and as long as I’m thinking it, I’ll be saying it.

So fuck the Dutch government for trying to outlaw Free Thought, and keep on publishing cartoons showing Mohammed, wearing t-shirts implying cops are corrupt (which they are), making Death-Threat Raps and telling the public about how the politicians are the real terrorists. Remember that little rhyme you used to use when you were a kid? “Sticks and stones may break my bones, but words will never hurt me”? Guess what? Kids are smarter than our police, politicians, religious fanatics and the whole government. Grow the fuck up.

This country is going to shit. Time to move to Cuba, where you’re allowed more freedoms these days.

Why Python Rocks I: Inline documentation

Sunday, June 22nd, 2008

Okay. So what’s cool about Python? I can’t count the number of times I’ve had to show skeptics why Python is cool, what Python can do that their favorite language can’t do. So I’m writing a bunch of articles showing off Python’s Awesome.

First up: Documentation. I’m talking about inline documentation here: annotating modules, classes, methods, etc. Most languages have third-party tools that parse the source code and extract documentation from comments. This is nice, of course, but the comments get out of date and you have to regenerate the documentation each time. Different people use different documentation generators (Doxygen vs. PHPDoc, JSDoc vs. ScriptDoc, etc.) which, in turn, use different documentation standards, causing untold documentation chaos even within the same language space. You may have heard some code monkeys say “The code IS the documentation”. In Python, that’s actually not far from the truth. Let’s look at some of the things you can do with inline Python documentation.
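As a small taste, here is a minimal sketch of a docstring and two ways to read it back at run-time. The `area` function is just a made-up example.

```python
import inspect


def area(width, height):
    """Return the area of a rectangle.

    Both arguments are expected to be numbers.
    """
    return width * height


# The docstring is stored on the function object itself.
print(area.__doc__)

# inspect.getdoc() returns the same text with the indentation cleaned up.
print(inspect.getdoc(area))
```

Calling `help(area)` in the interactive interpreter shows the same text, without any external documentation generator.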

Links

Wednesday, June 18th, 2008

Here are some random links to interesting stuff:

FirePHP
FirePHP is a PHP debugging library and a Firefox plugin which together allow you to output debugging information to the Firebug debugging panel. Since it doesn’t intermingle debugging information with your page output, but writes it to a special HTTP header instead, it’s especially useful for AJAX debugging. It can also come in handy when you’re trying to debug a server-side script which generates something other than an HTML page, such as a PDF or PNG file.

OpenProj
OpenProj is a project management application written in Java and therefore platform independent. It has a lot of the features Microsoft Project has (according to the webpage; I have never used MS Project before, so I wouldn’t know) such as Resources, Gantt Charts, Network Diagrams (PERT Charts), WBS and RBS charts, etc. There are also various representations of tasks for resources. It doesn’t really outshine Gnome Planner, but at least it’s platform independent.

Typechecking Python module
Typecheck provides powerful run-time typechecking facilities for Python functions, methods and generators. Without requiring a custom preprocessor or alterations to the language, the typecheck package allows programmers and quality assurance engineers to make precise assertions about the input to, and output from, their code.

Here’s a little code example:

@accepts(String, [Number], {str: Number})
def my_func(a, *vargs, **kwargs):
    pass

@accepts(String, Number, Number)
def my_func(a, *vargs, **kwargs):
    pass
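To give an idea of how such run-time checking can work, here is a heavily simplified sketch of a decorator in the same spirit. It is not the typecheck package’s implementation: the name `simple_accepts` is made up, and it only handles positional arguments and plain `isinstance()` checks.

```python
import functools


def simple_accepts(*expected_types):
    """Check positional arguments against the given types on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            if len(args) != len(expected_types):
                raise TypeError(
                    "%s() expects %d arguments, got %d"
                    % (func.__name__, len(expected_types), len(args))
                )
            for position, (value, expected) in enumerate(zip(args, expected_types)):
                if not isinstance(value, expected):
                    raise TypeError(
                        "argument %d of %s() should be %s, got %s"
                        % (position, func.__name__, expected.__name__,
                           type(value).__name__)
                    )
            return func(*args)
        return wrapper
    return decorator


@simple_accepts(str, int)
def repeat(text, count):
    return text * count
```

Calling `repeat("ab", 2)` works as usual, while `repeat("ab", "2")` raises a `TypeError` before the function body runs.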

It’s Alive! Aliive!!

Tuesday, June 10th, 2008

My personal website, Subversion, the projects website and most of the other stuff are finally back online. They disappeared somewhere in April, after another hard disk crash. This time, three of my machines decided to go belly-up all at the same time. All three were in different locations, spread across the country.

The worst thing was that I use those three servers as online backups for each other. Imagine my surprise when all three went down with defective hard disks at the same time. After that, I couldn’t really bring myself to restore everything, so I put the hard disks in a cupboard somewhere and decided to go without a website and assorted other junk for a while. But I’ve managed to recover almost all of my stuff from one hard disk or another, and now most of it is back online.

Let’s hope it stays that way for a while.

Homepage: Photos added

Saturday, March 22nd, 2008

I’ve added some photos I took to my homepage.

pyBrainfuck

Saturday, March 22nd, 2008

For fun, I wrote a Brainfuck interpreter in Python. Brainfuck is an esoteric (joke) programming language which is Turing-complete (given enough memory) with only 8 op-codes (instructions). It was designed to allow for the smallest possible compiler.

There are already some other Brainfuck implementations in Python, but they are either obfuscated or extremely slow. pyBrainfuck is optimized for speed by pre-caching loops and removing non-Brainfuck characters from the program.

pyBrainfuck can be used either as a stand-alone Brainfuck interpreter or as a Python library. It can read from standard input or from a string (in library mode) and write to standard output or to a string buffer (in library mode).

pyBrainfuck is released under the MIT license. You can directly view the code for the interpreter at BitBucket.
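The two optimizations mentioned above can be sketched roughly like this. Note that this is not the actual pyBrainfuck code: it’s a minimal illustration with a made-up `run_brainfuck` function, a fixed tape of 30000 cells that wrap at 256, and end-of-input treated as 0, all of which are assumptions.

```python
def run_brainfuck(source, input_text=""):
    """Run a Brainfuck program and return its output as a string."""
    # Optimization 1: drop everything that isn't one of the 8 instructions.
    program = [char for char in source if char in "+-<>[].,"]

    # Optimization 2: find the matching bracket for every loop once, up
    # front, instead of scanning for it each time the loop is executed.
    jumps = {}
    open_brackets = []
    for position, op in enumerate(program):
        if op == "[":
            open_brackets.append(position)
        elif op == "]":
            if not open_brackets:
                raise SyntaxError("unmatched ']'")
            start = open_brackets.pop()
            jumps[start] = position
            jumps[position] = start
    if open_brackets:
        raise SyntaxError("unmatched '['")

    tape = [0] * 30000
    pointer = 0
    pc = 0  # index of the current instruction
    chars_in = iter(input_text)
    output = []
    while pc < len(program):
        op = program[pc]
        if op == "+":
            tape[pointer] = (tape[pointer] + 1) % 256
        elif op == "-":
            tape[pointer] = (tape[pointer] - 1) % 256
        elif op == ">":
            pointer += 1
        elif op == "<":
            pointer -= 1
        elif op == ".":
            output.append(chr(tape[pointer]))
        elif op == ",":
            # End of input is stored as 0 (an assumption of this sketch).
            tape[pointer] = ord(next(chars_in, "\0")) % 256
        elif op == "[" and tape[pointer] == 0:
            pc = jumps[pc]  # skip the loop body
        elif op == "]" and tape[pointer] != 0:
            pc = jumps[pc]  # jump back to the start of the loop
        pc += 1
    return "".join(output)
```

For example, `run_brainfuck("++++++++[>++++++++<-]>+.")` sets the second cell to 8 × 8 + 1 = 65 and returns `"A"`.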

Electricmonk.nl (zoltar) downtime

Tuesday, March 11th, 2008

The Zoltar machine (hosting electricmonk.nl and a host of other domains and stuff) has been down for about two weeks. Everything appears to work again now. The CPU fan was dead, and then I made a stupid mistake with the network configuration, so Zoltar couldn’t be reached when he was racked back in. That, in combination with various logistical problems, caused the long downtime. My apologies to everybody who was affected by it.

Disable Ubuntu command not found

Sunday, February 17th, 2008

Ubuntu 7.10 has this feature where, if you mistype a command on the shell, it’ll bother you with useless information about how you can install that application:

tiagoboldt@Niath:~$ gedit
The program 'gedit' is currently not installed. You can install it by typing:
sudo apt-get install gedit
bash: gedit: command not found
tiagoboldt@Niath:~$

It’s pretty annoying if you do a lot of work on the command line: each time you mistype a command, you have to wait for up to a full second before the command fails. I keep thinking I accidentally ran some command which is now wiping my disk. I’m sure it’s handy for noobs, but not for command-line junkies.

To get rid of it, type:

sudo aptitude purge command-not-found

And it won’t bother you anymore.

Transmission 1.x on Ubuntu 7.10

Sunday, February 17th, 2008

For some reason Ubuntu 7.10 ships an ancient version of Transmission: version 0.74 or somesuch. Unfortunately, that version contains some bugs, so it’s blocked by certain BitTorrent trackers. A more recent version is available from the gutsy backports package pool. To install it:

  • Uninstall transmission:
    sudo aptitude purge transmission transmission-gtk
  • Download transmission-common 1.04
  • Download transmission-gtk 1.04
  • Install the packages:
    sudo dpkg -i "transmission-common_1.04-0ubuntu1~gutsy1_all.deb"
    sudo dpkg -i "transmission-gtk_1.04-0ubuntu1~gutsy1_i386.deb"

And you’ll have a more recent version of Transmission.

The text of all posts on this blog, unless specifically mentioned otherwise, is licensed under this license.