Why Python Rocks II: Data structures

Saturday, June 28th, 2008

Okay. So what’s cool about Python? I can’t count the number of times I’ve had to show skeptics why Python is cool, what Python can do that their favorite language can’t do. So I’m writing a bunch of articles showing off Python’s Awesome.

All articles in this series:

Why Python Rocks I: Inline documentation
Why Python Rocks II: Data structures
Why Python Rocks III: Parameter expansion

A note for people new to Python

If you’re new to Python, I’ve got a couple of remarks. First of all, this is not a tutorial. It’s just meant to show you some of the cool stuff Python can do. In this article, you’ll see a lot of examples which look like:

>>> x = '10'
>>> print x
10

This is the Python interactive interpreter. It’s basically a command-line version of Python. You can type in stuff at the prompt (‘>>>’) and the interpreter will execute it as if it’s executing a normal program. If you see stuff like ‘...’, this is a continuation prompt. It means the previous line wasn’t a complete Python statement yet, so the interpreter is waiting for the rest. Lines without any prompt in front of them are the output of previous commands.

Now that you know this, it’s time to go on to the juicy stuff.

Data structures

In the previous article, I showed off inline Python documentation strings. Another thing that makes Python cool is its built-in data structures. All high-level languages have good, easy-to-use data structures, but Python’s just (I don’t know) feel a little better. They’re more balanced, better worked out. Let’s look at a couple and see what we can do with them. This is a long article, sorry for that, but there’s a lot to show you.

Basics

The title of this article mentions it’s about data structures, but really I’m talking about the data types in Python. Some of the basic data types are common to most high-level languages. Strings, Booleans, Ints, Floats, etc. I’m not going to discuss them in detail; they should mostly be obvious. I’ll just show you some stuff which is nice about Python’s data types.

First of all, Python has default built-in Unicode support. We can easily create a unicode string like this:

>>> s = u'Andr\xe9'    # \xe9 is the code point for e-acute

I also wanted to show the result of printing the above ‘s’, but the blogging software I’m using is written in PHP and can’t handle Unicode, so it messed up the post completely. I think this proves even better why Python is pretty neat, so I’m going to leave it at that as far as Unicode is concerned.

Strings have various nice methods to operate on them:

>>> s = 'a\nb\r\nc' 
>>> s.splitlines()     # DOS and Unix line-endings both recognized
['a', 'b', 'c']
>>> s.startswith('a')
True
>>> s = 'aabdefef'
>>> s.count('ef')      # Count occurrences of substrings.
2
>>> s.isalpha()
True
>>> s.isdigit()
False
>>> '15'.isdigit()     # Methods also work directly on string literals.
True
>>> '--abcdcba--'.strip('-')   # Strip the given characters from both ends.
'abcdcba'

These are all simple operations on strings, but they make life incredibly easy for the programmer. Take for instance this example, where we want to join a list (a list is an array; more on that later) of strings together:

>>> l = ['one', 'two', 'three']
>>> ', '.join(l)
'one, two, three'
>>> 'one, two, three'.split(', ')
['one', 'two', 'three']

It may seem strange to see something like ', '.join(l). Why not just l.join(', ')? I’m not sure what the original reason was for implementing it like this, but one reason might have been that joining would otherwise have to be implemented separately for everything it works on (lists, tuples, sets, dictionaries, etc). As a string method, it only has to be implemented once, and it works on any iterable of strings.
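
To make that guess a bit more concrete, here’s a small sketch of my own showing join() happily accepting things other than lists. The variable names are made up for the example, and I’ve written print as a function call with a single argument so that it also runs on Python 3 (the interpreter sessions in this article use Python 2’s print statement):

```python
# join() is a string method, so it works with any iterable that yields strings.
words_list = ['one', 'two', 'three']
words_tuple = ('one', 'two', 'three')

from_list = ', '.join(words_list)
from_tuple = ', '.join(words_tuple)

# A generator expression works too. The numbers are converted to strings
# first, because join() refuses non-string elements.
from_numbers = '-'.join(str(n) for n in [1, 2, 3])

print(from_list)     # one, two, three
print(from_tuple)    # one, two, three
print(from_numbers)  # 1-2-3
```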

String formatting

Another cool thing Python’s strings have is string formatting. Python gives us lots of different ways of formatting strings. The basic format is: STRING % (PARAM1, PARAM2). Inside STRING, we mark where each parameter goes using the percent sign. For instance, to map two string parameters: '%s and %s' % ('one', 'two'), which results in the string ‘one and two’. Some more examples:

>>> '%s %i' % ('hello', 5)
'hello 5'

>>> '%10i' % (5)             # Left padding
'         5'
>>> '%-10i' % (5)            # Right padding
'5         '

>>> '%010i' % (5)            # Left padding with 0s
'0000000005'

>>> '%1.2f' % (5.14431)      # Floating point
'5.14'

Mappings are processed from left to right, just like C’s printf function. If you’ve got a lot of mappings, you can assign them names and use a dictionary (more on that later) to handle the mappings:

>>> 'Hello %(name)s. You are %(age)i years old' % {
...   'name': 'Ferry',
...   'age': 28 
... }
'Hello Ferry. You are 28 years old'

An occurrence of %(NAME)s is automatically replaced by the dictionary value with key ‘NAME’.
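
Padding and named mappings can be combined, which is handy for lining up columns. The following is just a sketch: the second record and the names people, row_format and rows are invented for the example, and print is written as a function call so it also runs on Python 3.

```python
# Sample records; 'Alice' and her age are made up for this example.
people = [
    {'name': 'Ferry', 'age': 28},
    {'name': 'Alice', 'age': 5},
]

# %(name)-10s left-aligns the name in 10 columns,
# %(age)3i right-aligns the age in 3 columns.
row_format = '%(name)-10s|%(age)3i'

rows = [row_format % person for person in people]
for row in rows:
    print(row)
# Ferry     | 28
# Alice     |  5
```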

Lists, tuples, sets and dicts

Python has built-in support for lists, tuples and sets. Lists are just arrays. They are mutable (you can add, modify and remove stuff to/in/from them). Tuples are like lists, except they can’t be modified. Sets are mutable like lists, but they are unordered and the values in a set are unique (there’s also an immutable variant, frozenset). We can store anything we want in lists and tuples: strings, integers, but also other lists, object instances, etc. Sets are a bit pickier: their elements have to be hashable, so a set can’t contain a list.

>>> l = ['one', 2, ['three', 'four']] # A list with a string, integer and another list

We can do neat stuff with list methods:

>>> l = [1, 5, 2, 6, 4, 5]
>>> l.sort()          # Sort the list in-place.
>>> l
[1, 2, 4, 5, 5, 6]
>>> l.count(5)        # Count occurrences of elements
2
>>> l.reverse()       # Reverse the list in-place.
>>> l
[6, 5, 5, 4, 2, 1]
>>> l.index(4)        # Get the place of an element in the list
3
>>> l.pop(l.index(4)) # and remove it.
4
>>> l
[6, 5, 5, 2, 1]

>>> l1 = ['one', 'two', 'three']
>>> l2 = [4, 5]
>>> l1 + l2       # Create a new list with the elements of l1 and l2
['one', 'two', 'three', 4, 5]
>>> l1.extend(l2) # Or add the elements of l2 to l1
>>> l1
['one', 'two', 'three', 4, 5]

Tuples are like lists, except that they are immutable. They’re mostly useful for writing fixed lists inline in your code. For example: t = (1, 2, 3, 3). Take a look at the string formatting section above for some examples of tuples.
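
Here’s a small sketch of what that immutability means in practice. The grid dictionary and its labels are made up for the example, and print is written as a function call so the snippet also runs on Python 3:

```python
t = (1, 2, 3, 3)

# Reading works just like with a list.
first = t[0]
threes = t.count(3)

# Writing does not: item assignment on a tuple raises a TypeError.
try:
    t[0] = 99
    modified = True
except TypeError:
    modified = False

# Tuples can be unpacked into separate names in one go.
x, y = (10, 20)

# Because they can't change, tuples can be used as dictionary keys.
grid = {(0, 0): 'origin', (1, 0): 'east'}

print(first)         # 1
print(threes)        # 2
print(modified)      # False
print(x + y)         # 30
print(grid[(1, 0)])  # east
```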

Then there are sets, of which the elements are unique. We can use this to filter duplicates out of a list, for instance:

>>> l = [6, 5, 5, 4, 2, 1]
>>> s = set(l)
>>> print s
set([1, 2, 4, 5, 6])        # The other 5 was removed.
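
One thing to keep in mind is that a set doesn’t promise to keep the order of the original list. The sketch below (my own variable names, print as a function call so it also runs on Python 3) shows two ways around that, plus the set arithmetic you’d expect from something called a set:

```python
numbers = [6, 5, 5, 4, 2, 1]

# set() drops duplicates, but the order is not guaranteed,
# so sort the result if the order matters.
unique_sorted = sorted(set(numbers))

# If the original order should be kept, a plain loop does the job.
unique_in_order = []
for n in numbers:
    if n not in unique_in_order:
        unique_in_order.append(n)

# Sets also support the usual set arithmetic.
a = set([1, 2, 3, 4])
b = set([3, 4, 5])
common = a & b       # intersection
either = a | b       # union
only_in_a = a - b    # difference

print(unique_sorted)      # [1, 2, 4, 5, 6]
print(unique_in_order)    # [6, 5, 4, 2, 1]
print(sorted(common))     # [3, 4]
print(sorted(either))     # [1, 2, 3, 4, 5]
print(sorted(only_in_a))  # [1, 2]
```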

Dicts are dictionaries, also known as associative arrays or hash maps. Basically, they map keys to values. Unlike PHP’s arrays, Python dicts are only hash maps and not also normal arrays. This prevents you from mixing normal array and associative array elements (which, in my honest opinion, is a bad idea anyway).

Defining a dictionary is easy:

>>> d = {'a': 1, 'b': 2, 'c':3}
>>> print d['c']
3
>>> d = dict( [('a', 1), ('b', 2), ('c', 3)] ) # Convert a list of two-value tuples to a dictionary
>>> print d
{'a': 1, 'b': 2, 'c': 3}

Python dictionaries offer various methods for retrieving values:

>>> d.get('c')
3
>>> d.get('d', 4) # Return 4 if 'd' not in dict
4

dict.get() lets you specify a default value to return if the key wasn’t found in the dictionary. This makes things easier than in, say, PHP, where you have to do something like:

$a = array('a' => 1, 'b' => 2, 'c' => 3);
if (array_key_exists('d', $a)) {
  print $a['d'];
} else {
  print 4;
}

(Note: Many people seem to use isset() in PHP to test for array keys. This is wrong, because isset() returns false for a key whose value is NULL. I.e.: $a = array('a' => NULL); isset($a['a']) == False. Oh, and the same goes for normal variables. You see, isset() doesn’t test whether a variable is actually set; it tests whether the variable’s value isn’t NULL. One of the many, many reasons I stopped using PHP.)
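
For comparison, here’s a sketch of how the same situation plays out in Python, where None plays the role of NULL. The name missing is just a marker object I made up for the example, and print is written as a function call so the snippet also runs on Python 3:

```python
d = {'a': None}

# 'in' checks whether the key exists, whatever its value is.
has_a = 'a' in d
has_b = 'b' in d

# get() alone can't tell the two cases apart: both give None here...
get_a = d.get('a')
get_b = d.get('b')

# ...unless you pass a default that can't be a real value.
missing = object()
really_missing = d.get('b', missing) is missing

print(has_a)           # True
print(has_b)           # False
print(get_a is None)   # True
print(get_b is None)   # True
print(really_missing)  # True
```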

In Python, we can also easily set a dictionary key’s value if it doesn’t exist yet:

>>> print d
{'a': 1, 'b': 2, 'c': 3}
>>> d.setdefault('d', 4)
4
>>> print d
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
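
Where setdefault() really pays off is when the default is a container you want to fill. As a sketch (the word list and the names words and groups are invented for the example, and print is written as a function call so it also runs on Python 3), here’s one way to group words by their first letter:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = {}
for word in words:
    # setdefault() returns the existing list for this letter, or stores
    # and returns a new empty list if the key isn't there yet.
    groups.setdefault(word[0], []).append(word)

for letter in sorted(groups):
    print('%s: %s' % (letter, ', '.join(groups[letter])))
# a: apple, avocado
# b: banana, blueberry
# c: cherry
```

Without setdefault() you’d need an explicit check for the key before every append.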

Other neat stuff we can do with dicts:

>>> dict.fromkeys(('a', 'b', 'c'), '') # Quickly construct a 'template' empty dictionary
{'a': '', 'b': '', 'c': ''}
>>> d.keys()                     # Get the keys
['a', 'c', 'b', 'd']
>>> d.values()                   # Get the values
[1, 3, 2, 4]
>>> d.items()                    # Get keys and values
[('a', 1), ('c', 3), ('b', 2), ('d', 4)]
>>> for key, value in d.items():
...   print "Key '%s' has value: '%s'" % (key, value)
...
Key 'a' has value: '1'
Key 'c' has value: '3'
Key 'b' has value: '2'
Key 'd' has value: '4'
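
Since items() hands out (key, value) pairs and dict() accepts (key, value) pairs, the two fit together nicely. A small sketch of my own (the name inverted is made up, and print is written as a function call so it also runs on Python 3) that flips a dictionary around:

```python
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Swap each (key, value) pair on the way through to get a
# value -> key mapping. This only works cleanly when the
# values are unique and hashable.
inverted = dict((value, key) for key, value in d.items())

print(inverted[3])  # c
```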

Built-in functions that operate on iterables

We can combine the stuff above to do things such as printing out a sorted version of the dictionary. (Dictionary keys are not ordered in any predictable way, so you can’t, let’s say, sort a dictionary in-place.)

>>> for key in sorted(d.keys()):
...   print d[key]
...
1
2
3
4

In the example above, we retrieve the list of keys in ‘d’, sort it and then walk through it. The built-in sorted() function takes any iterable (in this case the list returned by d.keys()) and returns a new, sorted list, without touching the original object. There are a couple of other built-in functions we can use on most Python data structures:

>>> len([1, 2, 3, 4])     # Length of a list
4
>>> len({'a': 1, 'b': 2}) # Number of keys in a dictionary
2
>>> len('Hello')          # Length of a string
5
>>> min([1, 2, 3, 4])     # Smallest value in list
1
>>> max({'a': 1, 'b': 2}) # Largest key in dict
'b'
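
sorted() and max() (and min()) also take a key function, which tells them what to compare on. That makes it easy to rank a dictionary by its values instead of its keys. This is a sketch with made-up scores and a helper I’ve called by_score, with print written as a function call so it also runs on Python 3:

```python
# Hypothetical scores, made up for the example.
scores = {'alice': 82, 'bob': 95, 'carol': 78}

def by_score(item):
    # item is a (name, score) pair; compare on the score.
    return item[1]

# sorted() returns a new list and leaves the dictionary alone.
ranking = sorted(scores.items(), key=by_score, reverse=True)

# max() walks over the keys and compares them by their score.
best_name = max(scores, key=scores.get)

for name, score in ranking:
    print('%-6s %3i' % (name, score))
print(best_name)  # bob
```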

The ‘in’ construct is also very powerful. We’ve already seen it used for iterating, as in for key in sorted(d.keys()) and for key, value in d.items(). We can also use ‘in’ to test for occurrence in data structures:

>>> print 'a' in 'dcbabcd'
True
>>> print 'bab' in 'dcbabcd'
True
>>> print 5 in [3, 4, 5]
True
>>> print (1, 2) in [(3, 4), (1, 2), (5, 6)]
True
>>> print 'key2' in {'key1': 1, 'key2': 2, 'key3': 3}
True

As long as you can compare two data types, you can use ‘in’ to check just about any data structure for the occurrence of a value.
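
That goes for your own classes too. As a sketch (the Point class is invented for the example, and print is written as a function call so it also runs on Python 3), all a class needs is an __eq__ method to take part in ‘in’ tests on lists, plus a matching __hash__ if you want it to work in sets and dictionaries as well:

```python
class Point(object):
    """A tiny 2D point, invented for this example."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Two points are equal when their coordinates match.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must hash the same, so points also work
        # in sets and as dictionary keys.
        return hash((self.x, self.y))

points = [Point(3, 4), Point(1, 2), Point(5, 6)]

found = Point(1, 2) in points        # A different object, but an equal one.
not_found = Point(9, 9) in points
in_set = Point(5, 6) in set(points)

print(found)      # True
print(not_found)  # False
print(in_set)     # True
```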

Slicing

The final thing I’m going to show you is slicing. Slicing is very powerful, and also works on most data structures. Slicing means extracting a piece of a data structure. The syntax is s[from:to:step], which means ‘Return a slice from FROM to TO (not inclusive), with step STEP’. The best thing is that we can also specify negative FROM, TO and STEP values, which will do things such as counting from the right side of the data structure instead of the left side, or walking through it backwards. Some examples:

>>> s = 'abc123def'
>>> s[3]
'1'
>>> s[0:3] 
'abc'
>>> s[:3]   # Same as s[0:3]
'abc'
>>> s[-3] 
'd'
>>> s[:-3]
'abc123'
>>> s[-3:]  # Omitting TO means till the end
'def'
>>> s[::2]  # From begin to end, taking every second character
'ac2df'
>>> s[::-1] # Reverse a string
'fed321cba'

Slicing also works on lists and tuples. It doesn’t work on sets or dictionaries, because they don’t have indexes:

>>> [1, 2, 3, 4][-2:]
[3, 4]
>>> {'a': 1, 'b': 2, 'c': 3}[2:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type
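
To give an idea of what slicing is good for in day-to-day code, here’s a sketch with a helper I’ve called chunks (it’s hypothetical, not a built-in) that cuts any sliceable sequence into pieces. It also shows the common trick of copying a list with a full slice. print is written as a function call so the snippet also runs on Python 3:

```python
def chunks(sequence, size):
    """Split a sequence into pieces of at most `size` items.

    A hypothetical helper written for this example; it is not a built-in.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]

letters = chunks('abc123def', 3)
numbers = chunks([1, 2, 3, 4, 5], 2)

# A full slice makes a shallow copy, so changing the copy
# leaves the original alone.
original = [1, 2, 3]
copy = original[:]
copy.append(4)

print(letters)   # ['abc', '123', 'def']
print(numbers)   # [[1, 2], [3, 4], [5]]
print(original)  # [1, 2, 3]
print(copy)      # [1, 2, 3, 4]
```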

Conclusion

You’ve seen a lot of the things we can do with the built-in data structures in Python. The possibilities aren’t limited to these, but I can only show you so much in a single article. In one of the future articles I’ll discuss Python’s functional programming constructs, which are an excellent way of manipulating data in Python.

As you can see, the built-in Python data structures are very powerful and easy to use. Even though other languages have most or all of the same data structures Python has, the ones in Python just seem a little more advanced and easy to use (especially compared to PHP for instance). Personally, I think this is one of the things that make Python such an attractive language.

Stay tuned for more Python coolness in future articles!
