Published Jan 09, 2016
This article will cover Python’s sequence
, mapping
, and set
types.
I will refer to these collectively as collection
types.
Collections are defined by the special methods __len__
, __getitem__
,
__setitem__
, __delitem__
, and __iter__
.
__len__
is used by the built-in function len
to get the length/size of a
collection.
__getitem__
, __setitem__
, and __delitem__
are used to define the []
(item access) operator.
__iter__
is used by the built-in iter
function to get an iterator
over the
collection’s elements.
Iterators will be covered in a later article.
Note that a str
is actually a sequence of characters, and follows the same
conventions as other sequence types, such as list
and tuple
.
Python’s list
type is analogous to C++’s vector<void*>
or Java’s
List<Object>
.
It can automatically resize itself and the items can be of any type
(including other list
s).
list
literals are created using square brackets:
As with parentheses, Python ignores whitespace between square brackets:
Also notice that the final item can have a trailing comma.
This allows for easy modification of list
literals.
Like str
, list
supports indexing, slices, and len
.
You can modify the items of a list by assignment.
list
supports modification using append
, insert
, and pop
.
Note that inserting or popping anywhere except the end of the list takes linear time; inserting or popping at the end of the list takes constant time.
Since all variables are pointers, assigning a list does pointer assignment. That is, both names will refer to the same object in memory.
To make a copy of a list, use slicing.
Note that slice copying performs a shallow copy, where both lists share references to the same objects.
(see Sequence Types and Time Complexity )
tuple
is like list
, except it is immutable (cannot be modified).
tuple
supports all operations of list
except ones that modify the tuple
.
tuple
is often used like a C/C++ struct
.
The parentheses are only required to prevent ambiguity, and can be omitted.
Items in a tuple
(or any sequence type) can be unpacked into variables by
specifying a list of variables on the left-hand side of the assignment operator.
The number of variables must match the length of the sequence (and therefore must be known when the script is written).
Packing and unpacking can be combined in one statement.
This can be used for swapping variables, possibly modifying them in the process.
This can also be used for returning multiple values.
set
is an unordered collection that contains unique values.
An empty set
can be created using set()
.
Non-empty set
literals can be created using curly braces.
Testing for membership in a set
is much faster than for list
or tuple
.
set
also supports set arithmetic operations
union
, intersection
, and difference
.
(see Set Types)
Dictionaries (dict
) are Python’s associative array.
Like list
, dict
contains several values that are accessed by index.
Unlike list
, dict
’s indices can be any hashable object,
and are referred to as keys
.
An empty dict
can be created using dict()
or {}
.
Non-empty dict
literals can be created using key : value
pairs separated by
commas and enclosed in curly braces.
Like parentheses and square brackets, Python ignores whitespace between curly
braces, so dict
literals can span multiple lines for readability.
You can test if a key is in the dict
using in
.
The dict()
constructor builds dictionaries directly from sequences of
key-value pairs:
When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:
You can get a list of the keys in the dict
using dict.keys()
.
The return value is an iterator
, which will be covered later.
For now, just know that you can convert it to a list
or tuple
by passing it
to the appropriate constructor.
The dict
itself is also an iterator over its keys.
The keys are returned in arbitrary order. To list the keys in sorted order,
use sorted
.
You can get the values in the dictionary using the values
method.
You can get the key-value pairs using the items
method.
(see Mapping Types)
Python’s set
and dict
only work for hashable types.
A hash code is a number computed from some data, such that the same data
always has the same hash code, and different data should have a different
hash code.
An object is hashable if that object can compute a hash code of its data.
A hashable type implements the method __hash__
.
An object’s hash code can be retrieved using the built-in function hash
.
Because list
is mutable, it is not hashable.
Since tuple
is immutable, tuples are hashable as long as all of the items
are hashable.
This means that tuple
can be used as items in a set
or keys in a dict
.
In order for a type to be hashable, it must implement the __hash__
method.
A general requirement is that two objects which compare equal must have the same
hash code.
You must first implement the __eq__
method so you know which fields must be
incorporated in the __hash__
method.
Consider a 2d point class:
Since the __eq__
method compares the x
and y
attributes, the __hash__
method must also use those attributes.
An easy way to do this is to exclusive-or the hash codes of x
and y
:
Another option is to pack x
and y
into a tuple
and use the tuple
’s hash code:
(see
Python’s requirements on __hash__
,
Java’s requirements on hashing,
Java’s List.hashCode,
and my notes on hashing in Java and C++
)