Objects in Python
Overview
Teaching: 20 min
Exercises: 15 min
Questions
What are objects in Python?
What is a class or type?
Can objects belong to more than one class?
How can objects be created from a class?
Objectives
Be able to distinguish between class and object
Be able to construct objects via a class’s constructor
Be able to distinguish between equality and identity of objects
You may recall that we calculated the mean of a Numpy array by using numpy.mean, as:
import numpy
numbers = numpy.arange(10)
print(numpy.mean(numbers))
4.5
However, we can also calculate the mean of a Numpy array as:
print(numbers.mean())
4.5
Let’s see if we can do this with a normal list:
more_numbers = [1, 2, 3, 4]
print(more_numbers.mean())
In this case Python will complain with an AttributeError, telling us that a 'list' object has no attribute 'mean'. How does Python know that it can do this for numbers but not for more_numbers?
What type is it?
Let’s investigate this further by using type to identify the data type of numbers:
type(numbers)
<class 'numpy.ndarray'>
What about the type of the variable more_numbers?
type(more_numbers)
<class 'list'>
We can see here that numbers is an object of the type numpy.ndarray. In Python, anything which can be stored in a variable or passed to a function is called an object. Objects are classified by their type, or their class.
Class or Type?
Note that in the literature you’ll find a subtle distinction between class and type. However, since in Python 3 we can’t have one without the other, we will use both terms interchangeably.
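Because classes and types are the same thing in Python 3, a class is itself an object, and its type is type. A minimal check using only built-in names:

```python
# The type of a value is its class...
print(type(42))          # <class 'int'>
# ...and a class is itself an object, whose type is `type`
print(type(int))         # <class 'type'>
print(type(42) == int)   # True
```
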
Let’s find some types
Can you find anything that you can store in a variable which does not have a class? What is the type of the number 1, or the string "hello"? Does the class change if they are passed directly to type, or if they are stored in a variable?
Solution
type(1)
<class 'int'>
type("hello")
<class 'str'>
Does everything have a class?
Try to find words that Python recognises that do not have classes. What about numpy.mean or numpy? What about if or for? Can you think of others?
Solution
type(numpy.mean)
<class 'function'>
type(numpy)
<class 'module'>
The objects numpy.mean and numpy are things that we typically wouldn’t store in variables or pass around. However, they could in principle be stored in variables, and therefore are objects with a class.
type(if)
  File "<stdin>", line 1
    type(if)
         ^
SyntaxError: invalid syntax
type(for)
  File "<stdin>", line 1
    type(for)
         ^
SyntaxError: invalid syntax
The words if and for are keywords, part of the Python language itself; they can’t be stored in variables. Only things which can be stored in variables can have a class.
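To make the contrast concrete, here is a small sketch: a built-in function can be stored under a new name (add_up is a hypothetical name chosen for illustration), while the standard library keyword module confirms that if and for are language keywords rather than objects.

```python
import keyword

# A built-in function is an ordinary object, so it can be stored in a variable
add_up = sum
print(type(add_up))       # <class 'builtin_function_or_method'>
print(add_up([1, 2, 3]))  # 6

# Keywords such as if and for belong to the language's grammar, not to any class
print(keyword.iskeyword("if"), keyword.iskeyword("for"))  # True True
print(keyword.iskeyword("sum"))                           # False
```
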
Changing things
In Python, there are two ways in which objects can behave. The most intuitive case is when objects are created with a value and keep that value forever. Many objects we’re familiar with, such as integers or strings, are objects which hold a value.
Let’s store a string in a variable:
message = "Hello"
The variable message now refers to an object, which has the value "Hello". We can point another variable at the same object with:
second_message = message
But we can never change the value of the string object itself. The string “Hello” will always be the string “Hello”. We can set the variable second_message to a new object, with
second_message = message + ", world"
But the original object is still there, unchanged. We can still get to it by typing
print(message)
Hello
This may not seem surprising, but not all objects in Python behave this way. Consider the following list of strings:
messages = ["Hello", "world!"]
Let’s point a new variable, duplicate_messages, at the list named messages:
duplicate_messages = messages
Think of this as pointing duplicate_messages at the same underlying object that messages refers to. Now let’s change a part of duplicate_messages:
duplicate_messages[1] = "there!"
What is the value of messages now?
print(messages)
['Hello', 'there!']
Note how we changed messages through the variable duplicate_messages. We can do this because both messages and duplicate_messages refer to the same underlying object, and that underlying object can be changed.
We say that objects which can’t be changed, like numbers and strings, are immutable. Numbers are an intuitive example of immutable objects: the number 1000 will always be the number 1000. Objects that can be changed are mutable; they can be “mutated” after they’ve been created.
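A short sketch of the difference, using id() (which returns a number that is unique to each object while it exists): an operation on a string builds a new object, while += on a list changes the existing list in place.

```python
message = "Hello"
same_message = message         # a second name for the same string object
message += ", world"           # strings are immutable: this builds a NEW string
print(same_message)            # Hello (the original object is unchanged)
print(id(message) == id(same_message))    # False: two different objects now

messages = ["Hello"]
same_messages = messages       # a second name for the same list object
messages += ["world"]          # lists are mutable: += extends the list in place
print(same_messages)           # ['Hello', 'world']
print(id(messages) == id(same_messages))  # True: still one shared object
```
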
Immutable lists
Python has a class similar to a list called a tuple. Is a tuple mutable or immutable? Check if you can change a tuple by setting:
messages = ("Hello", "world!")
and trying to modify the second element with:
messages[1] = "there!"
Solution
messages = ("Hello", "world!")
messages[1] = "there!"
You should see an error containing the text:
TypeError: 'tuple' object does not support item assignment
This is telling you that you can’t modify the tuple object, because tuples are immutable.
What kind of objects?
List some objects that you think are mutable and immutable. Verify this by trying to find ways to change the objects.
Note: Be careful that you’re not “cheating” by using = to point a variable to a new object.
Instances and Methods
We say that an object of a particular class is an instance of that class. To use a real world example, we could have the type or class Chair, which describes all the chairs in the world. The chair that you are sitting on right now is a specific instance of the Chair class.
We can check if an object is an instance of a particular class with the isinstance function.
isinstance(numbers, numpy.ndarray)
True
Every object is created with a single class, which can’t be changed. The class of an object can also provide behaviour that the object might have, by providing functions to the objects in that class. These functions can be called by using a dot after the variable name, for example:
numbers.mean()
The functions which are associated with an object are provided by the class of the object. When a class provides a function to an object we call that function a method of the class.
We say that the numpy.ndarray class provides the mean method. Since numbers belongs to the class numpy.ndarray, we can use the mean method on the object referred to by numbers by calling numbers.mean(). This allows the numpy.ndarray class to give its objects functionality that is specific to arrays.
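The idea that methods are provided by the class can be checked directly with a built-in class. This sketch uses str rather than numpy.ndarray, so it needs no extra libraries:

```python
word = "hello"
# Calling the method through an object...
print(word.upper())     # HELLO
# ...does the same as calling the function provided by the class,
# with the object passed in explicitly as the first argument
print(str.upper(word))  # HELLO
# Lists belong to a different class, which provides no upper method
print(hasattr(word, "upper"), hasattr([1, 2], "upper"))  # True False
```
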
It’s worth noting that both mutable and immutable objects can have methods. Methods of immutable objects, however, can’t change the underlying object. If needed, they return a brand new object holding the changed value. To keep this result, you will need to store it in a variable, for example:
hello = "hello, world"
capital_hello = hello.capitalize()
print(capital_hello)
Hello, world
Methods of mutable objects can, and often do, change the object.
grades = [84, 78, 91]
grades.append(66)
print(grades)
[84, 78, 91, 66]
In this case, we don’t need an extra = to store the result in a variable, because the list itself has been changed.
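A common slip is to treat both kinds of method the same way. This sketch shows what each one returns (capitalised and append_result are hypothetical names used only for illustration):

```python
hello = "hello, world"
capitalised = hello.capitalize()  # returns a new string
print(hello)          # hello, world (the original string is unchanged)
print(capitalised)    # Hello, world

grades = [84, 78, 91]
append_result = grades.append(66)  # changes the list in place
print(grades)         # [84, 78, 91, 66]
print(append_result)  # None: so writing grades = grades.append(66) would lose the list
```
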
Finding out what things are
Use type() to find the type of students, defined as
students = ['Petra', 'Aalia', 'Faizan', 'Shona']
and check this with isinstance.
Solution
type(students)
<class 'list'>
isinstance(students, list)
True
Other common classes
What other classes have you encountered previously when using Python? What methods did they provide?
Making an object
A class can be called as a function, in which case it constructs new instances of itself. While this is not the only way to make objects, it is one that all classes offer. For example, a new list can be created as:
students = list()
print(students)
print(type(students))
[]
<class 'list'>
Making a Numpy array
While all classes can be constructed by calling their name, some classes don’t recommend this route. For example, numpy.ndarray is used internally by Numpy to initialise its arrays, but Numpy recommends using one of the higher-level functions like numpy.zeros, numpy.ones, numpy.empty, or numpy.asarray to construct an array (of zeros, of ones, without initialising the data, and from an existing data structure like a list, respectively).
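Many built-in classes also accept arguments when called, using them to build the new object from existing data. A few examples using only built-in classes:

```python
letters = list("abc")                  # a list built from a string
answer = int("42") + 1                 # an int parsed from a string
text = str(3.5)                        # a string built from a float
pair = tuple([1, 2])                   # a tuple built from a list
grades = dict(Petra=84, Aalia=78)      # a dict built from keyword arguments

print(letters)  # ['a', 'b', 'c']
print(answer)   # 43
print(text)     # 3.5
print(pair)     # (1, 2)
print(grades)   # {'Petra': 84, 'Aalia': 78}
```
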
Make a dict
Given the following list of students and their grades, how would you construct a dict with students as keys, and grades as values?
students = ['Petra', 'Aalia', 'Faizan', 'Shona']
grades = [84, 78, 91, 66]
You can check the type of the object you’ve created with isinstance.
Hint: zip() can be used to turn two lists into tuples of corresponding pairs of elements.
Solution
student_grades = dict(zip(students, grades))
isinstance(student_grades, dict)
True
Equality and identity
Python has two ways of testing whether two objects are the “same”. The first is equality, or whether the associated values or contents of the object are the same.
The second is identity, or whether the objects are in fact the same instance, with names referring to the same underlying object.
Equality is tested with ==, which you have probably used before. We can test for identity with the is keyword:
students = ['Petra', 'Aalia', 'Faizan', 'Shona']
old_students = students
new_students = ['Petra', 'Aalia', 'Faizan', 'Shona']
if old_students == students:
    print("old_students is equal to the students list")
if new_students == students:
    print("new_students is equal to the students list")
if old_students is students:
    print("old_students is identical to the students list")
if new_students is students:
    print("new_students is identical to the students list")
old_students is equal to the students list
new_students is equal to the students list
old_students is identical to the students list
Constructing a new list that has the same elements as an existing list gives a list that is equal, but not identical, to the existing one. The same generally holds for other classes: constructing a new object with the same value as an existing one usually gives a result that is equal, but not identical, to the existing one. (Python may reuse some immutable objects, such as small integers, so use == rather than is when you want to compare values.)
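Equality without identity has a practical consequence: changing a copy leaves the original alone. A minimal sketch (the extra name 'Yusuf' is made up for illustration):

```python
students = ['Petra', 'Aalia', 'Faizan', 'Shona']
copied_students = list(students)   # construct a new list from the existing one

print(copied_students == students)  # True: same contents
print(copied_students is students)  # False: a different object

copied_students.append('Yusuf')     # change only the copy
print(students)                     # ['Petra', 'Aalia', 'Faizan', 'Shona']
```
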
Inheritance
Object-oriented programming allows relationships to be defined between classes or types. One class may be considered to be a specialisation or subclass of another. For a real world example, a car could be considered a specialisation or subclass of the class of all vehicles.
This is very frequently seen in the way Python handles exceptions. For example, if we check what type a ValueError is, we see that it is of class 'ValueError':
an_error = ValueError("A value must be provided")
print(type(an_error))
if isinstance(an_error, ValueError):
    print("an_error is a ValueError")
<class 'ValueError'>
an_error is a ValueError
However, we can also check if it is an Exception:
if isinstance(an_error, Exception):
    print("an_error is an Exception")
an_error is an Exception
This is because ValueError is a subclass of Exception: value errors are a specific type of exception that can occur, and so should have all the same logic that is common to all exceptions.
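The built-in issubclass function, and a class's __bases__ attribute, let us inspect this relationship directly:

```python
an_error = ValueError("A value must be provided")

print(issubclass(ValueError, Exception))  # True
print(isinstance(an_error, Exception))    # True
print(isinstance(an_error, TypeError))    # False: TypeError is a sibling, not a parent
print(ValueError.__bases__)               # (<class 'Exception'>,)
```
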
One place this can be used is to structure exception handling; for example:
numerator = 5
denominator = 0
try:
    print(numerator, "divided by", denominator, "is", numerator / denominator)
except ZeroDivisionError:
    print("You can't divide by zero!")
except Exception:
    print("Something else went wrong.")
You can't divide by zero!
ZeroDivisionError is another subclass of Exception. On encountering an exception, Python checks each except clause in turn to see whether the exception matches the class being tested for. The more specific ZeroDivisionError clause catches the specific case of dividing by zero, but is skipped for all other issues, which are then handled by the more general Exception clause.
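Because Python uses the first except clause that matches, the order of the clauses matters. This sketch (describe_division is a hypothetical helper) puts the general clause first, so the specific one is never reached; it also prints ZeroDivisionError's chain of parent classes, which shows it is a subclass of Exception by way of ArithmeticError.

```python
def describe_division(numerator, denominator):
    try:
        return numerator / denominator
    except Exception:
        # The general clause comes first, so it also matches ZeroDivisionError
        return "Something else went wrong."
    except ZeroDivisionError:
        # Never reached: the `except Exception` clause above has already matched
        return "You can't divide by zero!"

print(describe_division(5, 0))  # Something else went wrong.

# __mro__ lists a class followed by its ancestors, most specific first
print([cls.__name__ for cls in ZeroDivisionError.__mro__])
# ['ZeroDivisionError', 'ArithmeticError', 'Exception', 'BaseException', 'object']
```
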
Key Points
Anything that we can store in a variable in Python is an object
Every object in Python has a class (or type)
list and numpy.ndarray are commonly-used classes; lists and arrays are corresponding objects
Calling the class as a function constructs new objects of that class
Classes can inherit from other classes; objects of the subclass are automatically also of the parent class
Writing classes
Overview
Teaching: 20 min
Exercises: 25 min
Questions
How are classes written in Python?
What do methods look like?
How can a class customise how its instances are constructed?
Objectives
Write classes from scratch
Write methods for classes
Write custom __init__ methods
In the previous section, we’ve seen how objects can have different behaviour, provided by methods, which in turn are provided by the class of an object.
But what if we want to make our own classes and objects?
If we wanted to plot a variety of quadratic functions, with a consistent set of styles, we could define a class that does this:
from matplotlib.pyplot import show, subplots
from numpy import linspace
class QuadraticPlotter:
    color = 'red'
    linewidth = 1
    def plot(self, a, b, c):
        """Plot the line a * x ** 2 + b * x + c and output to the screen.
        x runs between -10 and 10, with 1000 intermediary points.
        The line is plotted in the colour specified by color, and with width
        linewidth."""
        fig, ax = subplots()
        x = linspace(-10, 10, 1000)
        ax.plot(
            x,
            a * x ** 2 + b * x + c,
            color=self.color,
            linewidth=self.linewidth,
        )
Similarly to how def is used to define a function, the class keyword is used to define a new class. Both functions and variables can be created inside the class block, and these will be accessible on any objects of the class that are created.
When functions are defined within a class, they become methods of instances of the class. So that each method knows which object it is working on, methods are always given the instance as their first argument. By convention, the first argument of methods is always called self, so that the object can be referred to consistently whenever it is needed.
Note that variables within methods are local to that method. For example, fig and ax will be deleted once the method finishes running. To access variables attached to the object, their names must be prefixed with self., as in self.color.
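The same rules can be seen in a class that needs no plotting libraries. Greeter is a hypothetical class used only for illustration; calling the method through the class shows the instance being passed explicitly as self.

```python
class Greeter:
    greeting = "Hello"  # a variable defined in the class block

    def greet(self, name):
        # `message` is local to this method and disappears when it returns;
        # `self.greeting` looks the variable up on the object
        message = self.greeting + ", " + name
        return message

greeter = Greeter()
print(greeter.greet("Petra"))           # Hello, Petra
print(Greeter.greet(greeter, "Aalia"))  # Hello, Aalia (self passed explicitly)
print(hasattr(greeter, "message"))      # False: local variables are not attached to the object
```
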
Other names than self
While it is possible to use any variable name for the first argument of a method, and Python will not complain, other programmers will. Since one aim when programming is to be as clear as possible to others who may read the program later, we strongly recommend following the convention of calling the first argument to methods self.
Naming classes
Another convention in Python is that class names start with a capital letter, and instead of underscores, initial letters of subsequent words are also capitalised. This makes it easier to distinguish classes from objects and other variables at a glance.
So far this code hasn’t visibly done anything; while we have defined a class, we have yet to use it. Let’s do that now.
plotter = QuadraticPlotter()
plotter.plot(1, 2, 3)
plotter.plot(1, 0, -1)
show()
Notice that we only supply the arguments a, b, and c to plotter.plot(); Python automatically adds the object to become the self parameter.
So far, this hasn’t done anything that we couldn’t have done with a function to perform the setup and then do the plot, perhaps something like:
def quadratic_plot(a, b, c, color="red", linewidth=1):
    """Plot the line a * x ** 2 + b * x + c and output to the screen.
    x runs between -10 and 10, with 1000 intermediary points.
    The line is plotted in the colour specified by color, and with width
    linewidth."""
    fig, ax = subplots()
    x = linspace(-10, 10, 1000)
    ax.plot(x, a * x ** 2 + b * x + c, color=color, linewidth=linewidth)
However, what if we wanted to plot some of the curves in a thick blue line? With this function, we could set the color and linewidth on every call, but that would create a lot of repetition, and hence opportunities for the code to become inconsistent.
We could also use a dict to hold the common options:
thick_blue = {"color": "blue", "linewidth": 5}
quadratic_plot(3, -5, 5)
quadratic_plot(-3, 1, 0, **thick_blue)
quadratic_plot(2, 10, 2)
quadratic_plot(-2, 13, 4, **thick_blue)
show()
**
The ** syntax here tells Python to take the thick_blue dict and use its keys and values as the names and values of keyword arguments. We’ll look at this operator later in the lesson where we talk about decorators.
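The effect of ** can be seen without any plotting. In this sketch describe_line is a hypothetical stand-in for quadratic_plot that only reports the options it receives:

```python
def describe_line(color="red", linewidth=1):
    """Hypothetical stand-in for quadratic_plot that just reports its options."""
    return f"{color} line, width {linewidth}"

thick_blue = {"color": "blue", "linewidth": 5}

# These two calls are equivalent: ** turns the dict into keyword arguments
print(describe_line(**thick_blue))               # blue line, width 5
print(describe_line(color="blue", linewidth=5))  # blue line, width 5
print(describe_line())                           # red line, width 1
```
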
Using objects on the other hand gives a neat alternative way of achieving this result:
blue_plotter = QuadraticPlotter()
blue_plotter.color = "blue"
blue_plotter.linewidth = 5
plotter.plot(3, -5, 5)
blue_plotter.plot(-3, 1, 0)
plotter.plot(2, 10, 2)
blue_plotter.plot(-2, 13, 4)
show()
The two objects plotter and blue_plotter can store the different states needed to set up the two styles of plot, whilst keeping the plotting functionality common, so it doesn’t need to be written separately for red and blue versions. We no longer have to specify the colour every time we want to plot with a non-default colour; instead, we can use the QuadraticPlotter instance that has the colour we want set.
If we need to, we can check the values of the variables we defined:
print("Line width of red plotter is", plotter.linewidth)
print("Line width of blue plotter is", blue_plotter.linewidth)
Line width of red plotter is 1
Line width of blue plotter is 5
Mutation revisited
Note that the classes we create ourselves in this way produce mutable objects. This means that we can change attributes such as plotter.linewidth after the object has been created, and Python allows us to do that without raising an error.
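Setting a variable on an instance stores it on that object only; the value defined in the class block stays as the default for everything else. A minimal sketch with a hypothetical LineStyle class, using the built-in vars() to show what is stored on each object:

```python
class LineStyle:
    color = "red"  # class variable: the default seen by every instance

default_style = LineStyle()
blue_style = LineStyle()
blue_style.color = "blue"  # stores a new value on blue_style only

print(default_style.color, blue_style.color, LineStyle.color)  # red blue red
# vars() shows the variables stored on the instance itself
print(vars(default_style), vars(blue_style))  # {} {'color': 'blue'}
```
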
Zoom in
Currently QuadraticPlotter is hardcoded to plot between -10 and 10. Try changing it so that the plotting range can be adjusted in the same way as color and linewidth can, while keeping the current defaults.
Use the new class to plot the curve with a = 3, b = 2, c = 1 both between -10 and 10, and between -5 and 50. Do this without changing the arguments to the plot method.
Solution
class QuadraticPlotter:
    color = "red"
    linewidth = 1
    x_min = -10
    x_max = 10
    def plot(self, a, b, c):
        """Plot the line a * x ** 2 + b * x + c and output to the screen.
        x runs between x_min and x_max, with 1000 intermediary points.
        The line is plotted in the colour specified by color, and with width
        linewidth."""
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(
            x,
            a * x ** 2 + b * x + c,
            color=self.color,
            linewidth=self.linewidth,
        )
narrow_plot = QuadraticPlotter()
wide_plot = QuadraticPlotter()
wide_plot.x_min = -5
wide_plot.x_max = 50
narrow_plot.plot(3, 2, 1)
wide_plot.plot(3, 2, 1)
show()
Plots of fits
The following function performs an Orthogonal Distance Regression fit of some data, and plots the resulting fit line along with the data.
from matplotlib.pyplot import show, subplots
from numpy import linspace
from scipy.odr import ODR, Model, RealData
def linear(params, x):
    return params[0] * x + params[1]
def odr_fit(f, x, y, xerr=None, yerr=None, p0=None, num_params=None):
    if not p0 and not num_params:
        raise ValueError("p0 or num_params must be specified")
    if p0 and (num_params is not None):
        assert len(p0) == num_params
    data_to_fit = RealData(x, y, xerr, yerr)
    model_to_fit_with = Model(f)
    if not p0:
        p0 = tuple(1 for _ in range(num_params))
    odr_analysis = ODR(data_to_fit, model_to_fit_with, p0)
    odr_analysis.set_job(fit_type=0)
    return odr_analysis.run()
def plot_results(
    f,
    fitobj,
    x,
    y,
    xmin=None,
    xmax=None,
    xerr=None,
    yerr=None,
    filename=None,
):
    fig, ax = subplots()
    if xmin is None:
        xmin = min(x)
    if xmax is None:
        xmax = max(x)
    x_range = linspace(xmin, xmax, 1000)
    ax.plot(x_range, f(fitobj.beta, x_range), label="Fit")
    ax.errorbar(x, y, xerr=xerr, yerr=yerr, fmt=".", label="Data")
    ax.set_xlabel(r"$x$")
    ax.set_ylabel(r"$y$")
    fig.suptitle(
        f"Data: $A={fitobj.beta[0]:.02}"
        f"\\pm{fitobj.cov_beta[0][0]**0.5:.02}, "
        f"B={fitobj.beta[1]:.02}\\pm{fitobj.cov_beta[1][1]**0.5:.02}$"
    )
    ax.legend(loc=0, frameon=False)
    if filename is not None:
        fig.savefig(filename)
x_data = [0, 1, 2, 3, 4, 5]
y_data = [1, 3, 2, 4, 5, 5]
x_err = [0.2, 0.1, 0.3, 0.2, 0.5, 0.3]
y_err = [0.4, 0.4, 0.1, 0.2, 0.1, 0.4]
result = odr_fit(linear, x_data, y_data, x_err, y_err, num_params=2)
plot_results(linear, result, x_data, y_data, xerr=x_err, yerr=y_err)
show()
This code has a lot of repeated terms, and would have even more if we wanted to set custom formatting each time.
Try rewriting this as a class, turning most function arguments into variables attached to the object, and functions into methods. Some of these you won’t be able to set in the class definition, but will need to be set before the functions will work.
Solution
class FitterPlotter:
    x_data = None
    y_data = None
    x_err = None
    y_err = None
    fit_result = None
    fit_form = None
    num_fit_params = None
    xmin = None
    xmax = None
    def odr_fit(self, p0=None):
        # Test each value with `is None`, which also works for Numpy arrays
        if any(value is None for value in (self.x_data, self.y_data, self.fit_form)):
            raise ValueError("x_data, y_data, and fit_form must be specified")
        if not p0 and not self.num_fit_params:
            raise ValueError("p0 or num_fit_params must be specified")
        if p0 and (self.num_fit_params is not None):
            assert len(p0) == self.num_fit_params
        data_to_fit = RealData(self.x_data, self.y_data, self.x_err, self.y_err)
        model_to_fit_with = Model(self.fit_form)
        if not p0:
            p0 = tuple(1 for _ in range(self.num_fit_params))
        odr_analysis = ODR(data_to_fit, model_to_fit_with, p0)
        odr_analysis.set_job(fit_type=0)
        self.fit_result = odr_analysis.run()
        return self.fit_result
    def plot_results(self, filename=None):
        if self.x_data is None or self.y_data is None:
            raise ValueError("x_data and y_data must be specified")
        fig, ax = subplots()
        xmin, xmax = self.xmin, self.xmax
        if xmin is None:
            xmin = min(self.x_data)
        if xmax is None:
            xmax = max(self.x_data)
        if self.fit_result is not None:
            x_range = linspace(xmin, xmax, 1000)
            ax.plot(
                x_range,
                self.fit_form(self.fit_result.beta, x_range),
                label="Fit",
            )
            fig.suptitle(
                f"Data: $A={self.fit_result.beta[0]:.02}"
                f"\\pm{self.fit_result.cov_beta[0][0]**0.5:.02}, "
                f"B={self.fit_result.beta[1]:.02}"
                f"\\pm{self.fit_result.cov_beta[1][1]**0.5:.02}$"
            )
        ax.errorbar(
            self.x_data,
            self.y_data,
            xerr=self.x_err,
            yerr=self.y_err,
            fmt=".",
            label="Data",
        )
        ax.set_xlabel(r"$x$")
        ax.set_ylabel(r"$y$")
        ax.legend(loc=0, frameon=False)
        if filename is not None:
            fig.savefig(filename)
fitterplotter = FitterPlotter()
fitterplotter.x_data = [0, 1, 2, 3, 4, 5]
fitterplotter.y_data = [1, 3, 2, 4, 5, 5]
fitterplotter.x_err = [0.2, 0.1, 0.3, 0.2, 0.5, 0.3]
fitterplotter.y_err = [0.4, 0.4, 0.1, 0.2, 0.1, 0.4]
fitterplotter.fit_form = linear
fitterplotter.num_fit_params = 2
fitterplotter.odr_fit()
fitterplotter.plot_results()
show()
Initialising instances
So far we can create an object with the defaults that we set in the class definition, and then customise it afterwards. But wouldn’t it be nice to be able to create an object with the attributes that we want straight out of the box?
To do this, we can define an initialiser for the class. When Python creates an instance of a class, it looks for a method called __init__. If it finds one, then it calls it, passing along all the arguments that were given to the class.
For example, for the QuadraticPlotter, the variables color and linewidth could be passed as arguments to __init__ and set on initialisation, rather than being defined as part of the class definition:
from matplotlib.colors import is_color_like
from matplotlib.pyplot import show, subplots
from numpy import linspace
class QuadraticPlotter:
    def __init__(self, color='red', linewidth=1):
        '''Set the initial attributes of this plotter.'''
        assert is_color_like(color)
        self.color = color
        self.linewidth = linewidth
    def plot(self, a, b, c):
        '''Plot the line a * x ** 2 + b * x + c and output to the screen.
        x runs between -10 and 10, with 1000 intermediary points.
        The line is plotted in the colour specified by color, and with width
        linewidth.'''
        fig, ax = subplots()
        x = linspace(-10, 10, 1000)
        ax.plot(
            x,
            a * x ** 2 + b * x + c,
            color=self.color,
            linewidth=self.linewidth,
        )
pink_plotter = QuadraticPlotter(color='magenta', linewidth=3)
pink_plotter.plot(0, 1, 0)
show()
This also lets us do some validation that the values we are given are usable, rather than deferring these errors to a long way down the line.
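The same pattern works without any plotting. Rectangle is a hypothetical class for illustration: its initialiser rejects bad values as soon as the object is constructed, rather than letting them cause confusing errors later.

```python
class Rectangle:
    def __init__(self, width, height):
        # Validate at construction time, so bad values fail immediately
        if width <= 0 or height <= 0:
            raise ValueError("width and height must be positive")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

print(Rectangle(3, 4).area())  # 12
try:
    Rectangle(-1, 4)
except ValueError as error:
    print("Rejected:", error)  # Rejected: width and height must be positive
```
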
Pronouncing __init__
The method name __init__ is most often pronounced “dunder init”, where “dunder” is short for “double underscore”, since the name starts and ends with two underscores. We’ll encounter more methods with “dunder” in the name in a later episode.
Zoom in again
Try rewriting the “Zoom in” example above to set the bounds of the plot, as well as the color and linewidth, using arguments to the constructor.
Solution
class QuadraticPlotter:
    def __init__(self, color='red', linewidth=1, x_min=-10, x_max=10):
        '''Set the initial attributes of this plotter.'''
        assert is_color_like(color)
        self.color = color
        self.linewidth = linewidth
        self.x_min = x_min
        self.x_max = x_max
    def plot(self, a, b, c):
        '''Plot the line a * x ** 2 + b * x + c and output to the screen.
        x runs between x_min and x_max, with 1000 intermediary points.
        The line is plotted in the colour specified by color, and with width
        linewidth.'''
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(
            x,
            a * x ** 2 + b * x + c,
            color=self.color,
            linewidth=self.linewidth,
        )
narrow_plot = QuadraticPlotter()
wide_plot = QuadraticPlotter(x_min=-5, x_max=50)
narrow_plot.plot(3, 2, 1)
wide_plot.plot(3, 2, 1)
show()
Initialising fitting
Adjust your solution to the Plots of fits challenge above so that it has an initialiser which checks that the needed parameters are given before initialising the object.
Solution
class FitterPlotter:
    fit_result = None
    def __init__(
        self,
        x_data,
        y_data,
        x_err=None,
        y_err=None,
        fit_form=None,
        num_fit_params=None,
        xmin=None,
        xmax=None,
    ):
        self.x_data = x_data
        self.y_data = y_data
        self.x_err = x_err
        self.y_err = y_err
        self.fit_form = fit_form
        self.num_fit_params = num_fit_params
        self.xmin = xmin
        self.xmax = xmax
    def odr_fit(self, p0=None):
        if self.fit_form is None:
            raise ValueError("fit_form must be specified")
        if not p0 and not self.num_fit_params:
            raise ValueError("p0 or num_fit_params must be specified")
        if p0 and (self.num_fit_params is not None):
            assert len(p0) == self.num_fit_params
        data_to_fit = RealData(self.x_data, self.y_data, self.x_err, self.y_err)
        model_to_fit_with = Model(self.fit_form)
        if not p0:
            p0 = tuple(1 for _ in range(self.num_fit_params))
        odr_analysis = ODR(data_to_fit, model_to_fit_with, p0)
        odr_analysis.set_job(fit_type=0)
        self.fit_result = odr_analysis.run()
        return self.fit_result
    def plot_results(self, filename=None):
        fig, ax = subplots()
        xmin, xmax = self.xmin, self.xmax
        if xmin is None:
            xmin = min(self.x_data)
        if xmax is None:
            xmax = max(self.x_data)
        if self.fit_result is not None:
            x_range = linspace(xmin, xmax, 1000)
            ax.plot(
                x_range,
                self.fit_form(self.fit_result.beta, x_range),
                label='Fit',
            )
            fig.suptitle(
                f'Data: $A={self.fit_result.beta[0]:.02}'
                f'\\pm{self.fit_result.cov_beta[0][0]**0.5:.02}, '
                f'B={self.fit_result.beta[1]:.02}'
                f'\\pm{self.fit_result.cov_beta[1][1]**0.5:.02}$'
            )
        ax.errorbar(
            self.x_data,
            self.y_data,
            xerr=self.x_err,
            yerr=self.y_err,
            fmt='.',
            label='Data',
        )
        ax.set_xlabel(r'$x$')
        ax.set_ylabel(r'$y$')
        ax.legend(loc=0, frameon=False)
        if filename is not None:
            fig.savefig(filename)
fitterplotter = FitterPlotter(
    x_data=[0, 1, 2, 3, 4, 5],
    y_data=[1, 3, 2, 4, 5, 5],
    x_err=[0.2, 0.1, 0.3, 0.2, 0.5, 0.3],
    y_err=[0.4, 0.4, 0.1, 0.2, 0.1, 0.4],
    fit_form=linear,
    num_fit_params=2,
)
fitterplotter.odr_fit()
fitterplotter.plot_results()
show()
Key Points
Classes in Python are blocks started with the class keyword
Method definitions look like functions, but must take a self argument
The __init__ method is called when instances are constructed
Inheritance
Overview
Teaching: 20 min
Exercises: 20 min
Questions
How can we represent a relationship between classes where one class is a specific subset of another?
How can functionality on one class be overridden or extended by its children?
Objectives
Be able to use inheritance to construct parent-child relationships between classes
Be able to override methods on child classes, and refer back to the parent class’s implementations
We have talked about using classes as a way to reduce repetition in the software we write. However, what happens if we want to write two classes that do similar but distinct things? For example, if we wanted to write a CubicPlotter as well as our QuadraticPlotter, would we need to repeat all of the code common to both of them? What if we wanted a QuarticPlotter and a QuinticPlotter as well? This repetitive code would quickly start to build up.
Thankfully, Python (like most other languages that have classes) gives us a mechanism to avoid this in the form of inheritance. A class that inherits from a second class automatically gains all of the second’s attributes and methods. The class that is being inherited from is called the parent class, superclass, or base class, while the new class inheriting from it is called the child class, subclass, or derived class.
We saw earlier that ValueError is a subclass of Exception, and that this can be used to handle both specific and more general exceptions in a hierarchy. We can also use this to define our own exceptions. Say, for example, we have a function to convert temperatures from degrees Celsius to degrees Fahrenheit,
\(\theta_{\mathrm{F}}(\theta_{\mathrm{C}})=\frac{9}{5}\theta_{\mathrm{C}} + 32\).
We know that temperatures below absolute zero are not valid, so if we encounter those in our code we would like to raise the alarm as soon as possible. We could do this with an assert, but another way of expressing this is to define our own exception to flag it. A temperature below \(-273.15^\circ\mathrm{C}\) is an example of a bad value, so this exception should inherit from ValueError.
class InvalidTemperatureError(ValueError):
    pass
def celsius_to_fahrenheit(temperature_c):
    if temperature_c < -273.15:
        raise InvalidTemperatureError
    return temperature_c * 9 / 5 + 32
The pass keyword here tells Python that while we have started an indented block, we don’t actually have anything to put in it. (If we were to omit it, Python would complain that it expected an indented block and didn’t get one.) So we have constructed a new class called InvalidTemperatureError, which inherits everything from ValueError and behaves just like it, except that it has its own name and knows that ValueError is its parent. Let’s test this.
for temperature_c in 0, 100, -300:
    print(temperature_c, "degrees Celsius is",
          celsius_to_fahrenheit(temperature_c), "degrees Fahrenheit")
0 degrees Celsius is 32.0 degrees Fahrenheit
100 degrees Celsius is 212.0 degrees Fahrenheit
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "<stdin>", line 3, in celsius_to_fahrenheit
__main__.InvalidTemperatureError
If we wanted to, we could catch this exception with except InvalidTemperatureError or with except ValueError (or even except Exception).
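A minimal sketch of catching the custom exception through its parent class. Unlike the version above, this one passes a message to the exception (an assumption made for readability), and it collects the outcome of each conversion in a list rather than stopping at the first error:

```python
class InvalidTemperatureError(ValueError):
    pass

def celsius_to_fahrenheit(temperature_c):
    if temperature_c < -273.15:
        # This variant attaches a message to the exception
        raise InvalidTemperatureError(f"{temperature_c} is below absolute zero")
    return temperature_c * 9 / 5 + 32

results = []
for temperature_c in 0, -300:
    try:
        results.append(celsius_to_fahrenheit(temperature_c))
    except ValueError as error:  # also matches the InvalidTemperatureError subclass
        results.append(type(error).__name__)
print(results)  # [32.0, 'InvalidTemperatureError']
```
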
What if we want to add functionality? Let’s consider an example of a Polygon class, which can calculate its perimeter.
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths
    def perimeter(self):
        """Returns the perimeter of the polygon."""
        return sum(self.side_lengths)
some_shape = Polygon([1, 2, 3, 4, 5])
print(some_shape.perimeter())
15
Now, we know more about triangles than we do about generic polygons, so we can create a specialised subclass of Polygon called Triangle. For example, for a triangle of sides \(a\), \(b\), and \(c\), Heron’s formula states that the area of the triangle is given by \(\sqrt{p(p-a)(p-b)(p-c)}\), where \(p=\frac{1}{2}(a+b+c)\) is half the perimeter.
class Triangle(Polygon):
    def __init__(self, side_lengths):
        # Triangles have three sides
        assert len(side_lengths) == 3
        self.side_lengths = side_lengths
    def area(self):
        """Returns the area of the triangle."""
        a, b, c = self.side_lengths
        p = self.perimeter() / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5
a_triangle = Triangle([3, 4, 5])
print("Perimeter:", a_triangle.perimeter())
print("Area:", a_triangle.area())
Perimeter: 12
Area: 6.0
We’ve done a few new things here. Firstly, we’ve overridden the __init__ method of the Polygon parent class, since we now need to check that the shape is being given exactly three sides, as a triangle should be, and not some other number. This means that only the __init__ method from the Triangle class is called, and not the one in the Polygon class. Next, we’ve defined a new method, area, which is only available on the Triangle class. We’ve also called the perimeter method, which is defined on the Polygon parent class; we don’t have to recreate this, since we can use it as-is.
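Inheritance only works in one direction: every Triangle is a Polygon, but not every Polygon is a Triangle. The classes from above are repeated here so the sketch runs on its own:

```python
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths

    def perimeter(self):
        """Returns the perimeter of the polygon."""
        return sum(self.side_lengths)

class Triangle(Polygon):
    def __init__(self, side_lengths):
        assert len(side_lengths) == 3  # triangles have three sides
        self.side_lengths = side_lengths

    def area(self):
        """Returns the area of the triangle, using Heron's formula."""
        a, b, c = self.side_lengths
        p = self.perimeter() / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5

a_triangle = Triangle([3, 4, 5])
some_shape = Polygon([1, 2, 3, 4, 5])

print(isinstance(a_triangle, Polygon))   # True: every Triangle is also a Polygon
print(isinstance(some_shape, Triangle))  # False: not every Polygon is a Triangle
print(hasattr(a_triangle, "area"), hasattr(some_shape, "area"))  # True False
```
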
One niggling issue is that we are still repeating ourselves a little here. The line self.side_lengths = side_lengths appears in the __init__ method of both classes. If we can, we’d like to remove this by using the equivalent method from the Polygon class. In principle we could call Polygon.__init__, but this still has some repetition, since we have to specify the name of the Polygon class more than once, even though the class already knows what its parent class is.
What we can do instead is make use of the super() function. This gives us access to the superclass (and any superclasses further up the chain), without having to refer to any one of them by name. When we call a method of the super() object, Python automatically works its way up the tree until it finds the first class that has a method of the correct name, and calls that. The Triangle class would then become:
class Triangle(Polygon):
    def __init__(self, side_lengths):
        # Triangles have three sides
        assert len(side_lengths) == 3
        super().__init__(side_lengths)
    def area(self):
        """Returns the area of the triangle."""
        a, b, c = self.side_lengths
        p = (a + b + c) / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5
(You can see that super() has also taken care of passing the self argument for us, which calling Polygon.__init__ directly wouldn’t do; we would have had to write Polygon.__init__(self, side_lengths).)
While in this case we have only saved a single line of repetition, making use of super() becomes essential as methods become increasingly complex and build up functionality in layers.
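The “works its way up the tree” behaviour can be inspected directly. The sketch below uses trimmed-down copies of the classes above and plain Python 3 only. It prints the method resolution order (MRO) that Python stores on each class as __mro__. Both ordinary attribute lookup and super() follow this order, with super() starting its search at the class after the current one.

```python
# Trimmed-down versions of the lesson's classes, just enough to
# inspect how Python searches for methods.
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths

    def perimeter(self):
        return sum(self.side_lengths)


class Triangle(Polygon):
    def __init__(self, side_lengths):
        assert len(side_lengths) == 3
        # super() searches the MRO starting *after* Triangle
        super().__init__(side_lengths)


# The method resolution order: the sequence of classes Python searches
mro_names = [cls.__name__ for cls in Triangle.__mro__]
print(mro_names)  # ['Triangle', 'Polygon', 'object']

t = Triangle([3, 4, 5])
# perimeter is not defined on Triangle, so it is found on Polygon
print(t.perimeter())  # 12
```

Every Python 3 class ends its MRO with object, which is where default behaviour such as __repr__ ultimately comes from.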
Not implemented
If we anticipate that many subclasses will provide a particular method, but we can’t or don’t want to provide it on the superclass, we can add a stub method that raises NotImplementedError instead, so that it becomes clear if an implementation has been forgotten. For example, the area method of Polygon could be:
def area(self):
    raise NotImplementedError
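A small sketch of how such a stub behaves. Square and Pentagon here are hypothetical subclasses invented for illustration: one overrides area, the other forgets to, and the forgotten one fails loudly rather than silently returning something wrong.

```python
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths

    def area(self):
        # Stub: subclasses are expected to override this
        raise NotImplementedError


class Square(Polygon):  # hypothetical subclass that overrides area
    def area(self):
        return self.side_lengths[0] ** 2


class Pentagon(Polygon):  # hypothetical subclass that forgets to
    pass


print(Square([2, 2, 2, 2]).area())  # 4
try:
    Pentagon([1] * 5).area()
except NotImplementedError:
    print("Pentagon.area has not been implemented")
```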
Inheriting from object
Sometimes in older Python code you will see classes inherit from object. This is a holdover from Python 2, where this was needed to create a “new-style” class instead of an “old-style” class. Old-style classes were removed in Python 3, where all classes are new-style and inherit from object automatically, so you don’t need to (and shouldn’t) do this any more.
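You can confirm this in any Python 3 interpreter. In the sketch below (class names are illustrative), a class with no explicit base and one written in the old Python 2 style end up with exactly the same base class.

```python
class Polygon:  # no explicit base class
    pass


class OldStylePolygon(object):  # the Python 2 idiom; redundant in Python 3
    pass


print(Polygon.__bases__)            # (<class 'object'>,)
print(OldStylePolygon.__bases__)    # (<class 'object'>,)
print(issubclass(Polygon, object))  # True
```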
super() placement
A four-sided shape where one of the side lengths is zero is a triangle. We can adjust the __init__ method of the Polygon class to reflect this by removing any zero-length sides before storing the list of side lengths. The method then becomes:
def __init__(self, side_lengths):
    filtered_side_lengths = []
    for side_length in side_lengths:
        assert side_length >= 0
        if side_length > 0:
            filtered_side_lengths.append(side_length)
    self.side_lengths = filtered_side_lengths
How does this affect our implementation of Triangle.__init__? Adjust it so that Triangle([3, 4, 0, 5]) works, and Triangle([3, 4, 0]) does not.
Solution
We now need to call super().__init__ before checking the lengths, and check the resulting instance variable rather than the side_lengths argument.
class Polygon:
    def __init__(self, side_lengths):
        filtered_side_lengths = []
        for side_length in side_lengths:
            assert side_length >= 0
            if side_length > 0:
                filtered_side_lengths.append(side_length)
        self.side_lengths = filtered_side_lengths
    def perimeter(self):
        """Returns the perimeter of the polygon."""
        return sum(self.side_lengths)
class Triangle(Polygon):
    def __init__(self, side_lengths):
        # Triangles have three sides
        super().__init__(side_lengths)
        assert len(self.side_lengths) == 3
    def area(self):
        """Returns the area of the triangle."""
        a, b, c = self.side_lengths
        p = (a + b + c) / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5
a_triangle = Triangle([3, 4, 0, 5])
print("Perimeter:", a_triangle.perimeter())
print("Area:", a_triangle.area())
b_triangle = Triangle([3, 4, 0])
Perimeter: 12
Area: 6.0
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-751f0372a229> in <module>()
     27 print("Perimeter:", a_triangle.perimeter())
     28 print("Area:", a_triangle.area())
---> 29 b_triangle = Triangle([3, 4, 0])
<ipython-input-17-751f0372a229> in __init__(self, side_lengths)
     16         # Triangles have three sides
     17         super().__init__(side_lengths)
---> 18         assert len(self.side_lengths) == 3
     19 
     20     def area(self):
AssertionError:
Where to place your call to super() is an important thing to consider when writing subclasses!
Rectangles
Write another subclass of Polygon to represent rectangles, and add a method to calculate their area.
Solution
class Rectangle(Polygon):
    def __init__(self, side_lengths):
        super().__init__(side_lengths)
        num_sides = len(self.side_lengths)
        assert num_sides == 2 or num_sides == 4
        if num_sides == 2:
            # Unpack the filtered lengths, so zero-length sides are ignored
            width, height = self.side_lengths
            self.side_lengths = [width, height, width, height]
        else:
            assert self.side_lengths[0] == self.side_lengths[2]
            assert self.side_lengths[1] == self.side_lengths[3]
    def area(self):
        return self.side_lengths[0] * self.side_lengths[1]
Polynomial plotters
In the previous episode, we wrote a QuadraticPlotter class for plotting quadratic functions. We know, however, that quadratics are not the only type of polynomial in the world.
Write a PolynomialPlotter class similar to QuadraticPlotter, and rewrite QuadraticPlotter to be a subclass of it.
Solution
from numpy import linspace
from matplotlib.pyplot import subplots
from matplotlib.colors import is_color_like
class PolynomialPlotter:
    def __init__(self, color="red", linewidth=1, x_min=-10, x_max=10):
        assert is_color_like(color)
        self.color = color
        self.linewidth = linewidth
        self.x_min = x_min
        self.x_max = x_max
    def polynomial(self, x, coefficients):
        """For a given x and list of n+1 coefficients [a, b, c, d, ...],
        returns the polynomial f(x) = ax^n + bx^(n-1) + cx^(n-2) + ..."""
        # Horner's method: work from the highest power downwards
        result = 0
        for coefficient in coefficients:
            result = result * x + coefficient
        return result
    def plot(self, coefficients):
        """Given the list of coefficients [a, b, c, d, ...], plot the
        polynomial f(x) = ax^n + bx^(n-1) + cx^(n-2) + ... .
        The line is plotted in the colour specified by color, and with
        width linewidth."""
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(
            x,
            self.polynomial(x, coefficients),
            color=self.color,
            linewidth=self.linewidth,
        )
class QuadraticPlotter(PolynomialPlotter):
    def plot(self, a, b, c):
        super().plot([a, b, c])
More general function plotters
Taking this a step further, write a more general FunctionPlotter class, and adjust PolynomialPlotter to be a subclass of it.
Solution
from numpy import linspace
from matplotlib.pyplot import subplots
from matplotlib.colors import is_color_like
class FunctionPlotter:
    def __init__(self, color="red", linewidth=1, x_min=-10, x_max=10):
        assert is_color_like(color)
        self.color = color
        self.linewidth = linewidth
        self.x_min = x_min
        self.x_max = x_max
    def plot(self, function):
        """Plot a function of a single argument.
        The line is plotted in the colour specified by color, and with width
        linewidth."""
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(x, function(x), color=self.color, linewidth=self.linewidth)
class PolynomialPlotter(FunctionPlotter):
    def plot(self, coefficients):
        """Given the list of coefficients [a, b, c, d, ...], plot the
        polynomial f(x) = ax^n + bx^(n-1) + cx^(n-2) + ... .
        The line is plotted in the colour specified by color, and with
        width linewidth."""
        def polynomial(x):
            """For a given x and list of n+1 coefficients [a, b, c, d, ...],
            returns the polynomial f(x) = ax^n + bx^(n-1) + cx^(n-2) + ..."""
            result = 0
            for coefficient in coefficients:
                result = result * x + coefficient
            return result
        super().plot(polynomial)
class QuadraticPlotter(PolynomialPlotter):
    def plot(self, a, b, c):
        """Plot the line a * x ** 2 + b * x + c and output to the screen.
        x runs between x_min and x_max, with 1000 intermediary points.
        The line is plotted in the colour specified by color, and with
        width linewidth."""
        super().plot([a, b, c])
Defining a function within another function, as we do in PolynomialPlotter, is a useful way of parametrising functions without having to pass arguments every time.
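The same idea works outside a class. The sketch below uses a hypothetical helper, make_polynomial, which returns an inner function that remembers the coefficients list from the call that created it (a closure). The caller then only has to supply x.

```python
def make_polynomial(coefficients):
    """Hypothetical helper: return a one-argument function evaluating the
    polynomial with coefficients [a, b, c, ...], highest power first."""
    def polynomial(x):
        # coefficients is captured from the enclosing call
        result = 0
        for coefficient in coefficients:
            result = result * x + coefficient
        return result
    return polynomial


square_minus_one = make_polynomial([1, 0, -1])  # x^2 - 1
cubic = make_polynomial([2, 0, 0, 5])           # 2x^3 + 5
print(square_minus_one(3))  # 8
print(cubic(2))             # 21
```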
Key Points
Adding a class in parentheses after the class name in a class definition indicates that the new class is a subclass of the bracketed class (the parent class).
The subclass inherits all of that parent class’s attributes and methods.
Defining a method with the same name as one of the parent class’s methods overrides it.
Use super() to access parent classes and their methods.
Decorators, class methods, and properties
Overview
Teaching: 20 min
Exercises: 25 min
Questions
What is a decorator?
How do I tag methods as being applicable to a class rather than an instance?
How can I add logic to process changes to instance variables?
Objectives
Understand the purpose of decorators and how they are implemented
Be able to use @classmethod and @property
Sometimes when we are writing software we would like to be able to attach additional functionality to a variety of functions (or classes) without writing the functionality directly into the function. Python gives us some extra syntax to make this easier.
For example, say that we want to track which functions are being called in our program. We can write a function that takes the function we want to track as an argument, and returns a new function that prints a message before and after calling the function we are interested in.
def track_this(function):
    def new_function():
        print("Entering", function)
        function()
        print("Leaving", function)
    return new_function
To test this out:
def say_hello():
    print("Hello, world.")
say_hello = track_this(say_hello)
def say_goodbye():
    print("See you later.")
say_goodbye = track_this(say_goodbye)
def conversation():
    say_hello()
    say_goodbye()
conversation = track_this(conversation)
conversation()
Entering <function conversation at 0x1115c07b8>
Entering <function say_hello at 0x11143be18>
Hello, world.
Leaving <function say_hello at 0x11143be18>
Entering <function say_goodbye at 0x1115c0e18>
See you later.
Leaving <function say_goodbye at 0x1115c0e18>
Leaving <function conversation at 0x1115c07b8>
So we can now see in more detail what’s going on as we move through this program. However, having to set each function to the result of calling track_this on it is laborious; it would be nicer if there were an easier way to do this, and thankfully Python gives us one.
@track_this
def say_hello():
    print("Hello, world.")
@track_this
def say_goodbye():
    print("See you later.")
@track_this
def conversation():
    say_hello()
    say_goodbye()
conversation()
By placing @ followed by the name of the altering function (track_this in this case) on the line before the function definition, we tell Python to pass the newly defined function to the altering function, and to bind the function’s name to whatever that call returns.
This syntax is called a “decorator”; the functions say_hello, say_goodbye, and conversation have been decorated with the track_this decorator.
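To make the equivalence concrete, here is a minimal sketch with a hypothetical decorator, shout, that upper-cases whatever the wrapped function returns. The @ form and the manual reassignment produce the same behaviour.

```python
def shout(function):
    """Hypothetical decorator: upper-case whatever the function returns."""
    def new_function():
        return function().upper()
    return new_function


# Decorator syntax...
@shout
def greet():
    return "hello"


# ...is shorthand for defining the function, then rebinding its name
def farewell():
    return "goodbye"
farewell = shout(farewell)

print(greet())     # HELLO
print(farewell())  # GOODBYE
```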
Our @track_this decorator is currently not very general. We can see a problem when we try to decorate a function that takes arguments:
Decorators and arguments
@track_this
def say_something(thing_to_say):
    print(thing_to_say)
say_something("Hello there")
TypeError                                 Traceback (most recent call last)
<ipython-input-29-000c7283eed1> in <module>()
      3     print(thing_to_say)
      4 
----> 5 say_something("Hello there")
TypeError: new_function() takes 0 positional arguments but 1 was given
To make this more flexible, we can rewrite the track_this decorator as:
def track_this(function):
    def new_function(*args, **kwargs):
        print("Entering", function)
        # Keep the return value, so decorated functions still return it
        result = function(*args, **kwargs)
        print("Leaving", function)
        return result
    return new_function
The * and ** here carry two meanings. In the definition def new_function(*args, **kwargs), they mean “take any positional arguments and put them into a tuple called args, and take any keyword arguments and put them into a dict called kwargs”. In the function call function(*args, **kwargs), they mean “pass each element of args as a separate positional argument, and pass each key-value pair of the dict kwargs as a keyword argument”.
You can also write and use decorators that themselves accept arguments by using a nested function definition, but we won’t go into detail about this today.
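A short sketch of both meanings, using hypothetical helper functions. show_args packs whatever it receives, and describe receives the packed values again once they are unpacked with * and **.

```python
def show_args(*args, **kwargs):
    """Hypothetical helper: return exactly what was packed."""
    return args, kwargs


packed_args, packed_kwargs = show_args(1, 2, colour="red")
print(packed_args)    # (1, 2)   (a tuple, not a list)
print(packed_kwargs)  # {'colour': 'red'}


def describe(a, b, colour):
    return f"{a}, {b}, {colour}"


# Unpacking sends the same values back out as separate arguments
print(describe(*packed_args, **packed_kwargs))  # 1, 2, red
```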
Double checking
Try writing a decorator that checks the result of a computation is consistent by running it twice and checking that the outputs are equal. This should return the result if it is consistent, and raise an exception otherwise. Test it by decorating the area and perimeter methods of the Polygon and Triangle classes from the previous episode.
Solution
class InconsistentResultsError(AssertionError):
    pass
def check_consistency(function):
    def consistent_function(*args, **kwargs):
        # Run the computation twice and compare the two results
        results = [function(*args, **kwargs) for _ in range(2)]
        if results[0] != results[1]:
            raise InconsistentResultsError
        return results[0]
    return consistent_function
class Polygon:
    def __init__(self, side_lengths):
        filtered_side_lengths = []
        for side_length in side_lengths:
            assert side_length >= 0
            if side_length > 0:
                filtered_side_lengths.append(side_length)
        self.side_lengths = filtered_side_lengths
    @check_consistency
    def perimeter(self):
        """Returns the perimeter of the polygon."""
        return sum(self.side_lengths)
class Triangle(Polygon):
    def __init__(self, side_lengths):
        # Triangles have three sides
        super().__init__(side_lengths)
        assert len(self.side_lengths) == 3
    @check_consistency
    def area(self):
        """Returns the area of the triangle."""
        a, b, c = self.side_lengths
        p = (a + b + c) / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5
a_triangle = Triangle([3, 4, 5])
print("Perimeter:", a_triangle.perimeter())
print("Area:", a_triangle.area())
Class methods
Sometimes we want to write functions associated with classes that are relevant to the class as a whole, rather than to one specific instance. We can do this by adding the @classmethod decorator to a method.
Class methods are most frequently used as specialised constructors, to create instances of the class without having to supply every argument to __init__.
For example, revisiting the Triangle class from earlier, we may want to be able to define an equilateral triangle by giving a single side length.
class Triangle(Polygon):
    def __init__(self, side_lengths):
        # Triangles have three sides
        super().__init__(side_lengths)
        assert len(self.side_lengths) == 3
    @classmethod
    def equilateral(cls, side_length):
        return cls([side_length] * 3)
    def area(self):
        """Returns the area of the triangle."""
        a, b, c = self.side_lengths
        p = (a + b + c) / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5
Notice that in addition to adding the @classmethod decorator, we have replaced the first argument, which is usually self, with cls. Since class methods aren’t specific to a particular instance, there is no need for a self argument referring to one. Instead, Python passes the class itself as cls. It is useful to be able to refer to the class without naming it, since in general we would like class methods to work, and to return an instance of the correct class, for subclasses as well.
Let’s test this now.
e_triangle = Triangle.equilateral(1.5)
print("Perimeter:", e_triangle.perimeter())
print("Area:", e_triangle.area())
Perimeter: 4.5
Area: 0.9742785792574935
Now we only need to supply a single number, the length of the side, and the equilateral class method constructs the list of three equal side lengths from this, returning a Triangle with three equal sides.
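Because the method builds its result with cls rather than the name Triangle, it also does the right thing on subclasses. A minimal sketch, where ColouredTriangle is a hypothetical subclass invented for illustration:

```python
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = list(side_lengths)


class Triangle(Polygon):
    def __init__(self, side_lengths):
        super().__init__(side_lengths)
        assert len(self.side_lengths) == 3

    @classmethod
    def equilateral(cls, side_length):
        # cls is whichever class the method was called on
        return cls([side_length] * 3)


class ColouredTriangle(Triangle):  # hypothetical subclass
    colour = "red"


plain = Triangle.equilateral(2)
coloured = ColouredTriangle.equilateral(2)
print(type(plain).__name__)     # Triangle
print(type(coloured).__name__)  # ColouredTriangle
print(coloured.side_lengths)    # [2, 2, 2]
```

Had equilateral been written as return Triangle([side_length] * 3), ColouredTriangle.equilateral would quietly return a plain Triangle.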
Squares
Add a class method to the Rectangle class, which you wrote in the previous episode, to create a square given the length of its side.
Solution
class Rectangle(Polygon):
    def __init__(self, side_lengths):
        super().__init__(side_lengths)
        num_sides = len(self.side_lengths)
        assert num_sides == 2 or num_sides == 4
        if num_sides == 2:
            # Unpack the filtered lengths, so zero-length sides are ignored
            width, height = self.side_lengths
            self.side_lengths = [width, height, width, height]
        else:
            assert self.side_lengths[0] == self.side_lengths[2]
            assert self.side_lengths[1] == self.side_lengths[3]
    def area(self):
        return self.side_lengths[0] * self.side_lengths[1]
    @classmethod
    def square(cls, side_length):
        return cls([side_length] * 4)
Properties
In general, when working with classes, there is an assumption that instance variables can be modified, unless something is done to prevent this. In some languages, variables can be defined as read-only, or as private so that they cannot be seen from outside of the class. Python has neither of these: any instance variable can be modified by any piece of code using the class. There is, however, a convention that variables and methods whose names begin with _ are private to the implementation. While they can be accessed from outside the class, they are not guaranteed to remain stable between versions, and the class doesn’t guarantee to behave well if they are changed.
To look at a specific example, what happens if we take the Triangle class and change side_lengths?
a_triangle = Triangle([3, 4, 5])
a_triangle.side_lengths = [3, 4, 5, 6]
print(a_triangle.area())
     11     def area(self):
     12         """Returns the area of the triangle."""
---> 13         a, b, c = self.side_lengths
     14         p = (a + b + c) / 2
     15         return (p * (p - a) * (p - b) * (p - c)) ** 0.5
ValueError: too many values to unpack (expected 3)
Our implementation of area assumes that side_lengths was validated by __init__, and so has exactly three elements, all positive. Adding a fourth breaks the implementation, since a list of four elements can’t be unpacked into three variables. Similarly,
a_polygon = Polygon([1, 2, 3, 4, 5])
a_polygon.side_lengths = "spam and eggs"
a_polygon.perimeter()
TypeError                                 Traceback (most recent call last)
<ipython-input-50-9b8ffd74de36> in <module>()
      1 a_polygon = Polygon([1, 2, 3, 4, 5])
      2 a_polygon.side_lengths = "spam and eggs"
----> 3 a_polygon.perimeter()
<ipython-input-49-5ae60040e7be> in perimeter(self)
     10     def perimeter(self):
     11         """Returns the perimeter of the polygon."""
---> 12         return sum(self.side_lengths)
     13 
     14 
TypeError: unsupported operand type(s) for +: 'int' and 'str'
It doesn’t make sense to take the sum of a string (or more precisely, to add the individual characters together), so this also fails.
One way to signal that this shouldn’t happen is to mark side_lengths as private by renaming it to _side_lengths. However, this removes some potentially useful functionality: it would definitely be useful for a user of the class to be able to read the side lengths, just not write them directly. Python provides us with a @property decorator that lets us do this.
class Polygon:
    def __init__(self, side_lengths):
        filtered_side_lengths = []
        for side_length in side_lengths:
            assert side_length >= 0
            if side_length > 0:
                filtered_side_lengths.append(side_length)
        self._side_lengths = filtered_side_lengths
    def perimeter(self):
        """Returns the perimeter of the polygon."""
        return sum(self._side_lengths)
    @property
    def side_lengths(self):
        return self._side_lengths
We have done two things here: self.side_lengths has been renamed to self._side_lengths, indicating that it is intended to be considered private to the class. We have also added a new method, side_lengths, and decorated it with the @property decorator. This allows the result of calling this method to be accessed as though it were an instance variable:
a_polygon = Polygon([1, 2, 3, 4, 5])
print(a_polygon.side_lengths)
[1, 2, 3, 4, 5]
However, we can’t assign to it without referring to the private _side_lengths:
a_polygon.side_lengths = "spam and eggs"
AttributeError Traceback (most recent call last)
<ipython-input-57-9b8ffd74de36> in <module>()
1 a_polygon = Polygon([1, 2, 3, 4, 5])
----> 2 a_polygon.side_lengths = "spam and eggs"
3 a_polygon.perimeter()
AttributeError: can't set attribute
So we have now successfully “protected” our class from changes that would break it, by signalling to its users what is internal to the implementation, and what is designed for them to use. However, we have still removed a little functionality in the process. Previously, a user could change side_lengths without breaking things, provided that they were careful (since the consistency checks of __init__ were being bypassed), whereas now this is not supported behaviour (even if it is possible).
What we would like is to offer the ability to set the value of the property, but to run some kind of validation function when doing so, rather than allowing it to be assigned directly. This kind of function is called a setter, and the @property decorator in fact allows us to create one. (The first function, which gets the value, is referred to as a getter.)
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths
    def perimeter(self):
        """Returns the perimeter of the polygon."""
        return sum(self._side_lengths)
    @property
    def side_lengths(self):
        return self._side_lengths
    @side_lengths.setter
    def side_lengths(self, side_lengths):
        filtered_side_lengths = []
        for side_length in side_lengths:
            assert side_length >= 0
            if side_length > 0:
                filtered_side_lengths.append(side_length)
        self._side_lengths = filtered_side_lengths
We’ve moved the validation logic into the side_lengths method decorated with @side_lengths.setter, and the __init__ method uses this (by assigning to self.side_lengths) to do its initial setup. Testing this:
a_polygon = Polygon([1, 2, 3, 4, 5])
print("Original perimeter:", a_polygon.perimeter())
a_polygon.side_lengths = [1, 2, 3, 4, 5, 6]
print("Modified perimeter:", a_polygon.perimeter())
Original perimeter: 15
Modified perimeter: 21
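Because every assignment now goes through the setter, invalid values are rejected at the moment they are assigned, rather than when area or perimeter is later called. A self-contained sketch repeating the Polygon above:

```python
class Polygon:
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths  # goes through the setter below

    def perimeter(self):
        return sum(self._side_lengths)

    @property
    def side_lengths(self):
        return self._side_lengths

    @side_lengths.setter
    def side_lengths(self, side_lengths):
        filtered_side_lengths = []
        for side_length in side_lengths:
            assert side_length >= 0
            if side_length > 0:
                filtered_side_lengths.append(side_length)
        self._side_lengths = filtered_side_lengths


a_polygon = Polygon([1, 2, 3])
a_polygon.side_lengths = [1, 0, 2]  # zero-length side is filtered out
print(a_polygon.side_lengths)       # [1, 2]

rejected = False
try:
    a_polygon.side_lengths = [1, -2, 3]  # negative side fails validation
except AssertionError:
    rejected = True
print(rejected)                 # True
print(a_polygon.side_lengths)   # [1, 2]: the failed assignment changed nothing
```

The stored value stays unchanged after a failed assignment because _side_lengths is only written once the whole new list has been checked.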
More robust plotters
Adjust the FunctionPlotter, PolynomialPlotter, or QuadraticPlotter example from earlier to make color a property, with a getter and a setter, with the setter checking that the color is a valid matplotlib color.
Solution
from numpy import linspace
from matplotlib.pyplot import subplots
from matplotlib.colors import is_color_like
class FunctionPlotter:
    def __init__(self, color="red", linewidth=1, x_min=-10, x_max=10):
        self.color = color
        self.linewidth = linewidth
        self.x_min = x_min
        self.x_max = x_max
    @property
    def color(self):
        return self._color
    @color.setter
    def color(self, color):
        assert is_color_like(color)
        self._color = color
    def plot(self, function):
        """Plot a function of a single argument.
        The line is plotted in the colour specified by color, and with width
        linewidth."""
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(x, function(x), color=self._color, linewidth=self.linewidth)
Key Points
A decorator adds functionality to a class or function. To use the decoratorname decorator, add @decoratorname on the line before the class or function definition.
Use the @classmethod decorator to indicate methods to be called from the class rather than from an instance.
Use the @property decorator to control access to instance variables.
Special methods
Overview
Teaching: 25 min
Exercises: 15 min
Questions
How can classes allow their instances to work with standard Python operators?
How can classes allow their instances to behave like iterables or collections?
How can classes allow their instances to be called like functions?
Objectives
Be able to implement methods like __add__, __eq__, and __gt__.
Be able to implement methods like __len__, __iter__, and __reversed__.
Be able to implement the __call__ method.
In the previous episodes, we built a Triangle class that could represent a triangle by storing the lengths of its sides. Now, mathematically speaking, two triangles with identical sides are the same triangle. Let’s see if Python agrees with this.
a_triangle = Triangle([3, 4, 5])
the_same_triangle = Triangle([3, 4, 5])
if a_triangle == the_same_triangle:
    print("Python thinks that these triangles are the same.")
else:
    print("Python thinks that these are different triangles.")
Python thinks that these are different triangles.
So, despite these triangles having been constructed with exactly the same side lengths, Python distinguishes between them. By default, == only considers two objects to be equal if they are identical, that is, if they are the very same object:
a_triangle = Triangle([3, 4, 5])
duplicate_triangle = a_triangle
if a_triangle == duplicate_triangle:
    print("Python thinks that these triangles are the same.")
else:
    print("Python thinks that these are different triangles.")
Python thinks that these triangles are the same.
This isn’t great for our triangle example; we would much prefer to be able to compare triangles for equality without having to compare the side_lengths property by hand. Fortunately, Python gives us a way of doing this: if we implement the __eq__ method of Triangle, then Python learns how to compare triangles.
class Triangle(Polygon):
    def __init__(self, side_lengths):
        # Triangles have three sides
        super().__init__(side_lengths)
        assert len(self.side_lengths) == 3
    @classmethod
    def equilateral(cls, side_length):
        return cls([side_length] * 3)
    def area(self):
        """Returns the area of the triangle."""
        a, b, c = self.side_lengths
        p = (a + b + c) / 2
        return (p * (p - a) * (p - b) * (p - c)) ** 0.5
    def __eq__(self, other):
        """Returns True if the triangle self and the triangle other
        are the same triangle"""
        if not isinstance(other, Triangle):
            return False
        else:
            # Check each cyclic rotation of the side lengths
            # (mirror-image orderings such as [3, 5, 4] are not checked)
            if self.side_lengths == other.side_lengths:
                return True
            elif (
                self.side_lengths[1:] + [self.side_lengths[0]] ==
                other.side_lengths
            ):
                return True
            elif (
                [self.side_lengths[2]] + self.side_lengths[:2] ==
                other.side_lengths
            ):
                return True
        return False
a_triangle = Triangle([3, 4, 5])
the_same_triangle = Triangle([3, 4, 5])
if a_triangle == the_same_triangle:
    print("Python thinks that these triangles are the same.")
else:
    print("Python thinks that these are different triangles.")
Python thinks that these triangles are the same.
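Equality and identity are now separate questions: == asks our __eq__ method, while is still asks whether two names refer to one object. The sketch below uses a cut-down Triangle with an __eq__ equivalent to the one above (rotations only). It also shows a side effect worth knowing: a class that defines __eq__ without also defining __hash__ gets __hash__ set to None, so its instances can no longer be put in a set or used as dict keys.

```python
class Triangle:
    """Cut-down Triangle with only an __eq__ method (rotations only,
    as in the version above)."""
    def __init__(self, side_lengths):
        self.side_lengths = list(side_lengths)

    def __eq__(self, other):
        if not isinstance(other, Triangle):
            return False
        s = self.side_lengths
        rotations = [s, s[1:] + s[:1], s[2:] + s[:2]]
        return other.side_lengths in rotations


a_triangle = Triangle([3, 4, 5])
the_same_triangle = Triangle([3, 4, 5])
print(a_triangle == the_same_triangle)     # True: __eq__ compares sides
print(a_triangle is the_same_triangle)     # False: still two distinct objects
print(a_triangle == Triangle([4, 5, 3]))   # True: a rotation of the same sides
print(Triangle.__hash__ is None)           # True: defining __eq__ removed the default hash
```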
Great! We can compare equality. __eq__ is the second example we’ve seen of a so-called “special”, “magic”, or “dunder” (short for “double underscore”) method. These are methods that Python ascribes a special meaning to; their names are guarded with a double underscore __ on each side, so they are unlikely to collide with a name you might want to use for a method of your own. These methods let instances of our classes behave more like the Python objects you’re used to dealing with, using the typical set of operators, rather than needing method calls for everything.
Let’s look at some more examples of these. Firstly, wouldn’t it be nice if we got something more descriptive when Python referred to our Triangles?
a_triangle
<__main__.Triangle at 0x1235e7b00>
We can do this by implementing the __repr__ method (short for “representation”). This is designed to return something that looks like Python code: ideally, something that, if you pasted it back in, would give you the same (or at least a similar) object. For the Triangle, this could look like:
def __repr__(self):
    return f"Triangle({self.side_lengths})"
Testing this now gives:
a_triangle = Triangle([3, 4, 5])
a_triangle
Triangle([3, 4, 5])
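__repr__ is used in more places than the interactive prompt. In this self-contained sketch with a cut-down Triangle, str() (and therefore print) falls back to __repr__ when no __str__ is defined. Containers display their elements using repr(), and because the representation is valid Python, eval can rebuild an equivalent object.

```python
class Triangle:
    def __init__(self, side_lengths):
        self.side_lengths = list(side_lengths)

    def __repr__(self):
        return f"Triangle({self.side_lengths})"


a_triangle = Triangle([3, 4, 5])
print(repr(a_triangle))  # Triangle([3, 4, 5])
print(str(a_triangle))   # Triangle([3, 4, 5]): no __str__, so str() uses __repr__
print([a_triangle])      # [Triangle([3, 4, 5])]: lists show elements via repr()

# The representation is valid Python, so it can be evaluated back
rebuilt = eval(repr(a_triangle))
print(rebuilt.side_lengths)  # [3, 4, 5]
```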
Other comparisons
What if we want to know how two objects compare to each other? We can do this by implementing __lt__, __gt__, __le__, __ge__, and __ne__, representing <, >, <=, >=, and != respectively. For example, an implementation of __lt__ might look like:
def sides_with_max_first(self):
    """Returns the side lengths, rotated so that the longest is first."""
    max_index = self.side_lengths.index(max(self.side_lengths))
    if max_index == 0:
        return self.side_lengths
    elif max_index == 1:
        return self.side_lengths[1:] + [self.side_lengths[0]]
    else:
        return [self.side_lengths[2]] + self.side_lengths[:2]
def __lt__(self, other):
    # Order by area first, then by perimeter, then by side lengths
    if self.area() != other.area():
        return self.area() < other.area()
    elif self.perimeter() != other.perimeter():
        return self.perimeter() < other.perimeter()
    elif self == other:
        return False
    else:
        return self.sides_with_max_first() < other.sides_with_max_first()
Testing this:
a_triangle = Triangle([3, 4, 5])
b_triangle = Triangle([5, 12, 13])
a_triangle < b_triangle
True
Python then does two very nice things for us. Firstly, since we have defined __lt__, it can now sort lists of Triangles for us. Secondly, while we could leave only the < operator defined, this could be confusing for those using the class; fortunately, given implementations of __eq__ and __lt__, Python can automatically generate the other relational operators using the functools.total_ordering decorator.
from functools import total_ordering
@total_ordering
class Triangle(Polygon):
...
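To see what total_ordering fills in, here is a sketch on a deliberately tiny class. Version is hypothetical and stands in for Triangle so that the example is self-contained. Only __eq__ and __lt__ are written by hand; the decorator adds __gt__, __le__, and __ge__ to the class.

```python
from functools import total_ordering


@total_ordering
class Version:  # hypothetical example class
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number


older, newer = Version(1), Version(2)
print(older < newer)   # True (our own __lt__)
print(older <= newer)  # True (__le__ supplied by total_ordering)
print(newer >= older)  # True (__ge__ supplied by total_ordering)

supplied = [name for name in ("__gt__", "__le__", "__ge__") if name in vars(Version)]
print(supplied)        # ['__gt__', '__le__', '__ge__']
print(sorted([newer, older])[0].number)  # 1: sorted() only needs __lt__
```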
Sorting random triangles
Add a class method that generates a triangle with three random edge lengths (for example, using random.random()). Use this to construct and sort a list of 10 random triangles.
Solution
Add an import at the top of the file:
from random import random
Also add a new class method:
@classmethod
def random(cls):
    """Returns a triangle with three random length sides in the range
    [0, 1). If the sum of the two short sides isn't longer than the long
    side (and so the triangle doesn't close), then try again.
    There is an infinitesimal probability that this method will never
    return, as randomness keeps delivering invalid triangles."""
    random_triangle = cls([random(), random(), random()])
    # For sides that don't close, Heron's formula takes the square root of
    # a negative number, which in Python 3 gives a complex result
    while isinstance(random_triangle.area(), complex):
        random_triangle = cls([random(), random(), random()])
    return random_triangle
Testing this:
random_triangles = [Triangle.random() for _ in range(10)]
[triangle.area() for triangle in sorted(random_triangles)]
Arithmetic
In the same way that __lt__ and friends correspond to relational operators, arithmetic operations like +, -, *, etc. can be defined with methods like __add__, __sub__, and __mul__.
Define a new class ErrorBar to represent a number with an associated error in Gaussian statistics. Add __init__, __repr__, __add__, __sub__, __mul__, and __truediv__ methods, making the (very unreasonable) assumption that all errors are uncorrelated.
Solution
class ErrorBar:
    def __init__(self, centre, error):
        self.centre = centre
        self.error = error
    def __repr__(self):
        return f"{self.centre} ± {self.error}"
    def __add__(self, other):
        # Absolute errors add in quadrature
        centre = self.centre + other.centre
        error = (self.error ** 2 + other.error ** 2) ** 0.5
        return ErrorBar(centre, error)
    def __sub__(self, other):
        centre = self.centre - other.centre
        error = (self.error ** 2 + other.error ** 2) ** 0.5
        return ErrorBar(centre, error)
    def __mul__(self, other):
        # Relative errors add in quadrature; abs() keeps the error positive
        centre = self.centre * other.centre
        error = abs(centre) * (
            (self.error / self.centre) ** 2 + (other.error / other.centre) ** 2
        ) ** 0.5
        return ErrorBar(centre, error)
    def __truediv__(self, other):
        centre = self.centre / other.centre
        error = abs(centre) * (
            (self.error / self.centre) ** 2 + (other.error / other.centre) ** 2
        ) ** 0.5
        return ErrorBar(centre, error)
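A quick usage check of the solution, repeating just the methods it needs. The numbers are chosen so that the propagated errors can be verified by hand. For the sum, \(\sqrt{3^2 + 4^2} = 5\). For the product, the relative errors are 3/8 and 1/2, so the relative error of the result is \(\sqrt{0.375^2 + 0.5^2} = 0.625\), and \(16 \times 0.625 = 10\).

```python
class ErrorBar:
    def __init__(self, centre, error):
        self.centre = centre
        self.error = error

    def __repr__(self):
        return f"{self.centre} ± {self.error}"

    def __add__(self, other):
        # Absolute errors add in quadrature
        centre = self.centre + other.centre
        error = (self.error ** 2 + other.error ** 2) ** 0.5
        return ErrorBar(centre, error)

    def __mul__(self, other):
        # Relative errors add in quadrature
        centre = self.centre * other.centre
        error = abs(centre) * (
            (self.error / self.centre) ** 2 + (other.error / other.centre) ** 2
        ) ** 0.5
        return ErrorBar(centre, error)


total = ErrorBar(10, 3) + ErrorBar(5, 4)
print(total)  # 15 ± 5.0
product = ErrorBar(8, 3) * ErrorBar(2, 1)
print(product.centre, round(product.error, 6))
```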
Callable objects
By implementing the __call__ method, we can allow instances of a class to be called like functions. For example, returning to the FunctionPlotter example:
from numpy import linspace, sin
from matplotlib.colors import is_color_like
from matplotlib.pyplot import show, subplots
class FunctionPlotter:
    def __init__(self, color="red", linewidth=1, x_min=-10, x_max=10):
        self.color = color
        self.linewidth = linewidth
        self.x_min = x_min
        self.x_max = x_max
    @property
    def color(self):
        return self._color
    @color.setter
    def color(self, color):
        assert is_color_like(color)
        self._color = color
    def plot(self, function):
        """Plot a function of a single argument.
        The line is plotted in the colour specified by color, and with width
        linewidth."""
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(x, function(x), color=self._color, linewidth=self.linewidth)
    def __call__(self, *args, **kwargs):
        # Calling the instance forwards all arguments to its plot method
        return self.plot(*args, **kwargs)
plotter = FunctionPlotter()
plotter(sin)
show()
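Stripped of the plotting, the mechanism is small. In this sketch, Multiplier is a hypothetical class that stores a factor when created and applies it whenever the instance is called. Because instances are callable, they can be passed anywhere a function is expected, such as to map.

```python
class Multiplier:  # hypothetical example class
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Runs whenever an instance is used like a function: instance(x)
        return self.factor * x


triple = Multiplier(3)
print(triple(7))                     # 21
print(callable(triple))              # True
print(list(map(triple, [1, 2, 3])))  # [3, 6, 9]
```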
Subclassing with __call__
Do we need to redefine __call__ on each subclass of FunctionPlotter to get the correct version of the plot() function? Why/why not?
Solution
No; self refers to the current instance, so the call to self.plot() will pick up the correct version of plot() for whichever class the instance belongs to.
Collections and iterables
Python also gives us the power to make our objects behave like iterable or collection types (for example tuples, lists, dicts, and generators). For example, to let instances of the class work with the len() function, we implement __len__(). Adding this to the Polygon class:
def __len__(self):
    return len(self.side_lengths)
will define the length of the object as the number of edges that the Polygon has. (Note that we shouldn't make this the perimeter, since Python expects len() to return a non-negative integer.) Testing this,
a_polygon = Polygon([1, 2, 3, 4, 5])
print(len(a_polygon))
5
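Python enforces the requirement that `len()` returns a non-negative integer. The sketch below uses a hypothetical `BadLength` class whose `__len__` returns whatever value it was given. A negative value raises `ValueError`, and a float raises `TypeError`. It also shows a related behaviour: when a class defines `__len__` but not `__bool__`, Python uses the length to decide truthiness.

```python
class BadLength:
    """Hypothetical class whose __len__ returns an arbitrary value."""
    def __init__(self, value):
        self.value = value

    def __len__(self):
        return self.value


for value in (-1, 2.5):
    try:
        len(BadLength(value))
    except (ValueError, TypeError) as error:
        # -1 gives a ValueError, 2.5 gives a TypeError
        print(type(error).__name__, error)

# Without __bool__, truthiness falls back to __len__: length 0 is False
print(bool(BadLength(0)), bool(BadLength(3)))  # False True
```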
We can also let our code loop over elements of our objects by implementing the __iter__() method, which should return an iterator; this is a particular type of object in Python that makes things like for loops work. We can get one of these from any iterable via the iter() function.
def __iter__(self):
    return iter(self.side_lengths)
We can now iterate through the sides of our Polygons without having to access the side_lengths attribute each time.
a_polygon = Polygon([1, 2, 3, 4, 5])
for side_length in a_polygon:
print(side_length)
1
2
3
4
5
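The sketch below is a simplified illustration, not the interpreter's exact implementation, of what a `for` loop does with an iterable. It asks for an iterator with `iter()`, then calls `next()` until `StopIteration` is raised. A plain list stands in for `a_polygon` here:

```python
side_lengths = [1, 2, 3]

# Roughly what `for side_length in side_lengths: ...` does behind the scenes
iterator = iter(side_lengths)  # calls side_lengths.__iter__()
collected = []
while True:
    try:
        side_length = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    collected.append(side_length)

print(collected)  # [1, 2, 3]
```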
In reverse
The reversed() function returns an iterator that goes backwards over the elements of a sequence or other reversible collection. This is implemented for classes via the __reversed__ method. Implement this for the Polygon class, and test your implementation.
Solution
Method:
def __reversed__(self):
    return reversed(self.side_lengths)
Test:
a_polygon = Polygon([1, 2, 3, 4, 5])
for side_length in reversed(a_polygon):
    print(side_length)
5
4
3
2
1
Getting specific elements
You can also allow your code to access elements via square brackets, just like with lists. The __getitem__() method does this, taking the index (or key) being sought as its argument. For a non-dict-like collection, __getitem__() can work for both integer indices and slices.
Implement __getitem__() for the Polygon class. Since in our current implementation of Polygon it doesn't make sense to take a subset of the sides, requesting a slice should raise IndexError; only requesting a single element with an integer index should work.
Solution
Implementation:
def __getitem__(self, key):
    if type(key) is int:
        return self.side_lengths[key]
    else:
        raise IndexError
Test:
a_polygon = Polygon([1, 2, 3, 4, 5])
print(a_polygon[2])
3
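Square brackets containing a colon pass a `slice` object to `__getitem__()` rather than an integer. That is how the solution above can tell the two cases apart, and `isinstance(key, slice)` is another way to test for it. A hypothetical `ShowKey` class that simply returns its key makes this visible:

```python
class ShowKey:
    """Hypothetical helper that returns whatever key it is indexed with."""
    def __getitem__(self, key):
        return key


show = ShowKey()
print(show[2])                       # 2
print(show[1:4])                     # slice(1, 4, None)
print(show[::2])                     # slice(None, None, 2)
print(isinstance(show[1:4], slice))  # True
```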
for loops with __getitem__()
Once a class has __getitem__() defined, Python will automatically work out how to loop over it, even in the absence of __iter__(): it calls __getitem__() with 0, 1, 2, and so on until an IndexError is raised (although defining __iter__() is usually more efficient). Even better, when __len__() is also implemented, reversed() works on the class as well.
Test this by removing the implementations of __iter__() and __reversed__() from Polygon and testing the loops forwards and backwards again.
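The fallback can also be seen in a self-contained sketch. It uses a hypothetical `SequenceOnly` class rather than `Polygon`, and the class defines only `__len__()` and `__getitem__()`:

```python
class SequenceOnly:
    """Hypothetical class with __len__ and __getitem__, but no __iter__
    or __reversed__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # Iteration calls this with 0, 1, 2, ... and stops at IndexError
        return self.items[index]


sequence = SequenceOnly([1, 2, 3])
print(list(sequence))            # [1, 2, 3]
print(list(reversed(sequence)))  # [3, 2, 1], which needs __len__ as well
```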
More dunder methods
Python offers many more dunder methods than could possibly be covered in this episode. A full listing, categorised by the functions that they serve, can be found in the Python documentation.
Key Points
Implement methods like __eq__, __add__, and __gt__ to allow operations such as arithmetic and comparisons.
Implement __repr__ to get more meaningful printouts when you output an object.
Implement methods like __len__, __iter__, and __reversed__ to make instances of a class behave like a collection or iterable.
Implement the __call__ method to make instances of a class callable like functions.
Duck typing and interfaces
Overview
Teaching: 10 min
Exercises: 15 min
Questions
How does Python decide what you can and can’t do with an object?
When is inheritance not appropriate?
What alternatives are there to inheritance?
Objectives
Understand how duck typing works, and how interfaces assist with understanding this.
Understand the circumstances where inheritance can be a hindrance rather than a help.
Be aware of concepts such as composition which can help where inheritance fails.
There is a principle that if something “looks like a duck, and swims like a duck, and quacks like a duck, then it is probably a duck”.
Python’s type system adopts a similar philosophy: if something looks, swims, and quacks like a duck, and those are the only duck-like aspects that we need at a particular time, then as far as Python is concerned, it is a duck.
For example, the Newton–Raphson method solves equations of the form \(f(x)=0\) iteratively from a starting point \(x_0\) as \[x_{n+1}=x_n - \frac{f(x_n)}{f'(x_n)}\;.\] We could implement this in Python as:
def newton(function, derivative, initial_estimate, num_iters=10):
    """Solves the equation `function`(x) == 0 using the Newton–Raphson
    method with `num_iters` iterations, starting from `initial_estimate`.
    `derivative` is the derivative of `function` with respect to x."""
    current_estimate = initial_estimate
    for _ in range(num_iters):
        current_estimate = (
            current_estimate
            - function(current_estimate) / derivative(current_estimate)
        )
    return current_estimate
This clearly works with functions that operate on and return real numbers.
from math import sin, cos
print(newton(sin, cos, 1))
print(newton(sin, cos, 2))
print(newton(sin, cos, 1.5))
0.0
3.141592653589793
-12.566370614359172
If you only planned for this to work with real numbers, you might think of adding a check at the start of the function that the initial_estimate given is a real number, or that each successive current_estimate is real. However, if we think about this in a duck-typed way, we don't really need to care about this: provided that the values can be subtracted and divided, and that function and derivative can operate on them, the algorithm will work.
This means that we can apply this function to cases we may not have considered. For example, when \(f(z)\) is a polynomial, then plotting the solution \(z_n\) (which is now a complex number) obtained as a function of the initial estimate \(z_0\) gives us Newton’s fractal.
%matplotlib inline
from numpy import angle, linspace, newaxis, pi
from matplotlib.pyplot import colorbar, show, subplots
def complex_linspace(lower, upper, num_real, num_imag):
    real_space = linspace(lower.real, upper.real, num_real)
    imag_space = linspace(lower.imag, upper.imag, num_imag) * 1J
    return real_space + imag_space[:, newaxis]
def test_polynomial(x):
    return x ** 3 - 1
def test_derivative(x):
    return 3 * x ** 2
z_min = -1 - 1J
z_max = 1 + 1J
initial_z = complex_linspace(z_min, z_max, 1000, 1000)
results = newton(test_polynomial, test_derivative, initial_z, 20)
fig, ax = subplots()
image = ax.imshow(
    angle(results),
    vmin=-3,
    vmax=3,
    # Row 0 holds the most negative imaginary part, so draw it at the bottom
    origin="lower",
    extent=(z_min.real, z_max.real, z_min.imag, z_max.imag),
)
cbar = colorbar(image, ax=ax, ticks=(-2*pi/3, 0, 2*pi/3))
cbar.set_label(r"$\arg(z_n)$")
cbar.ax.set_yticklabels((r"$-\frac{2\pi}{3}$", "0", r"$\frac{2\pi}{3}$"))
ax.set_xlabel(r"$\operatorname{Re}(z_0)$")
ax.set_ylabel(r"$\operatorname{Im}(z_0)$")
show()
Because our Newton–Raphson function was duck typed, it automatically worked for this problem, despite this problem requiring Numpy arrays of complex numbers rather than the real numbers we thought we were writing for.
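The same duck typing lets `newton` work with exact rational arithmetic from the standard library's `fractions` module. In the sketch below, the test function \(f(x)=x^2-2\), whose positive root is \(\sqrt{2}\), is chosen purely for illustration. Starting from `Fraction(1)`, three iterations give the exact fractions 3/2, 17/12 and 577/408.

```python
from fractions import Fraction


def newton(function, derivative, initial_estimate, num_iters=10):
    """Same Newton–Raphson function as above, repeated so this runs alone."""
    current_estimate = initial_estimate
    for _ in range(num_iters):
        current_estimate = (
            current_estimate
            - function(current_estimate) / derivative(current_estimate)
        )
    return current_estimate


def f(x):
    return x ** 2 - 2  # illustrative choice: positive root is sqrt(2)


def f_prime(x):
    return 2 * x


estimate = newton(f, f_prime, Fraction(1), num_iters=3)
print(estimate)         # 577/408
print(float(estimate))  # close to sqrt(2)
```

Nothing in `newton` mentions `Fraction`. Fractions support subtraction and division, and that is all the function needs.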
Protocols
It is frequently useful to codify exactly what requirements are placed on an object (or duck) so that we can design classes to match. In Python, when these requirements are documented, the specification is called a protocol; you may also hear the term (informal) interface used to describe this.
An example of a well-known protocol in Python is the iterator protocol, which should be obeyed by objects returned by the __iter__() method. For a class to support the iterator protocol, it must have two methods:
- __iter__(), which returns the object itself. (This is so that an iterator can be given to a for loop directly, which is sometimes desirable rather than relying on it being returned by the __iter__() method of a collection-type object.)
- __next__(), which returns the next item in the sequence. If there are no more items, then this should raise the StopIteration exception, and successive calls should keep raising this exception.
For instance, an iterator that returns the Fibonacci numbers up to some upper bound may look something like:
class FibonacciIterator:
    def __init__(self, max_value):
        self.max_value = max_value
        self.last_two_numbers = (1, 0)
    def __iter__(self):
        return self
    def __next__(self):
        next_number = sum(self.last_two_numbers)
        if self.max_value < next_number:
            raise StopIteration
        else:
            self.last_two_numbers = (self.last_two_numbers[1], next_number)
            return next_number
This is an example of where we want to use the iterator directly with the for loop, since we want to initialise it with a max_value. Testing this:
for number in FibonacciIterator(100):
    print(number)
1
1
2
3
5
8
13
21
34
55
89
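Both halves of the protocol can also be exercised by hand with `iter()` and `next()`. The sketch below repeats the iterator above so that it runs on its own. It shows that `__iter__()` returns the same object, and that `StopIteration` keeps being raised once the iterator is exhausted:

```python
class FibonacciIterator:
    """Copy of the iterator above, so that this snippet runs on its own."""
    def __init__(self, max_value):
        self.max_value = max_value
        self.last_two_numbers = (1, 0)

    def __iter__(self):
        return self

    def __next__(self):
        next_number = sum(self.last_two_numbers)
        if self.max_value < next_number:
            raise StopIteration
        self.last_two_numbers = (self.last_two_numbers[1], next_number)
        return next_number


fib = FibonacciIterator(3)
print(iter(fib) is fib)                            # True
print(next(fib), next(fib), next(fib), next(fib))  # 1 1 2 3
for attempt in range(2):
    try:
        next(fib)
    except StopIteration:
        print("StopIteration")  # raised on every call once exhausted
```

An exhausted iterator stays exhausted. A second `for` loop over the same `fib` object would run zero times, so create a new `FibonacciIterator` to start again.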
Triangular numbers
Write a class that implements the iterator protocol and that returns the first \(n\) triangular numbers. These are defined such that the \(n\)th triangular number is the sum of the first \(n\) positive integers, so the first five are 1, 3, 6, 10, and 15.
What would happen if you removed the upper bound (and so never raised StopIteration) and used the iterator in a for loop? When might this behaviour be useful?
Solution
class TriangularIterator:
    def __init__(self, n):
        self.n = n
        self.total = 0
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= self.n:
            raise StopIteration
        self.index += 1
        self.total += self.index
        return self.total
A loop over an iterator that can't raise StopIteration will run forever. This could be useful if you're using zip() to iterate over another, bounded, iterable at the same time; then each element will get a corresponding triangular number, no matter how many elements there are.
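Here is a sketch of the unbounded case. `UnboundedTriangularIterator` is a hypothetical variant of the solution with the bound removed. `zip()` stops as soon as its shortest input runs out, and `itertools.islice()` is another standard way to take a fixed number of items from an endless iterator.

```python
from itertools import islice


class UnboundedTriangularIterator:
    """Hypothetical variant of TriangularIterator with no upper bound:
    it never raises StopIteration."""
    def __init__(self):
        self.total = 0
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.index += 1
        self.total += self.index
        return self.total


labels = ["a", "b", "c", "d"]
# zip stops when labels runs out, so this loop ends after four items
for label, triangular in zip(labels, UnboundedTriangularIterator()):
    print(label, triangular)

print(list(islice(UnboundedTriangularIterator(), 5)))  # [1, 3, 6, 10, 15]
```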
Spot the problem
Look back at the solutions for the QuadraticPlotter, PolynomialPlotter, and FunctionPlotter. What problems do you see with the plot method of these classes?
Solution
The arguments to FunctionPlotter.plot(), PolynomialPlotter.plot(), and QuadraticPlotter.plot() are all different: one expects a callable, one expects a list of coefficients as one argument, and one expects three coefficients as separate arguments. In general, specialisations of a class should keep the same interface to their methods, so that the parent class is interchangeable with its specialisations.
Over to you
Thinking about your own research software, what kind of places might an interface be useful to better codify how different parts of the software interact?
Abstract base classes
Python also allows us to go a step further than a protocol, and formalise the requirements we place on our interfaces in code. An abstract base class is a class that must be inherited from: you can't create instances of it directly. Python provides these for many of its protocols in the collections.abc module. For example, the Fibonacci iterator above could inherit from collections.abc.Iterator. This would allow other code to check in advance that it supports the protocol, and would also guard against us forgetting to implement some part of the protocol. For example, if we forgot the __next__() method:
from collections.abc import Iterator
class FibonacciIterator(Iterator):
    def __init__(self, max_value):
        self.max_value = max_value
        self.last_two_numbers = (1, 0)
    def __iter__(self):
        return self
for number in FibonacciIterator(100):
    print(number)
In this case Python gives us an error:
TypeError Traceback (most recent call last)
<ipython-input-3-a96ac2788df3> in <module>
5 self.last_two_numbers = (1, 0)
6
----> 7 for number in FibonacciIterator(100):
8 print(number)
TypeError: Can't instantiate abstract class FibonacciIterator with abstract methods __next__
This can be useful when working with more complex interfaces. (On the other hand, removing the __iter__() method works fine, because collections.abc.Iterator helpfully defines __iter__() for us, so we can inherit it.)
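Other code can ask whether an object supports a protocol by using `isinstance()` with the abstract base class. For `collections.abc.Iterator`, this check also recognises classes that simply define `__iter__()` and `__next__()` without inheriting, because that ABC implements a subclass hook. Not every ABC does this (for example, `collections.abc.Sequence` needs explicit inheritance or registration), so inheriting remains the clearer signal. `CountDown` below is a hypothetical example class.

```python
from collections.abc import Iterable, Iterator


class CountDown:
    """Hypothetical iterator that does NOT inherit from Iterator."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


countdown = CountDown(3)
print(isinstance(countdown, Iterator))  # True: both methods are present
print(isinstance([1, 2, 3], Iterator))  # False: a list is iterable, not an iterator
print(isinstance([1, 2, 3], Iterable))  # True
print(list(countdown))                  # [3, 2, 1]
```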
Implementing multiple interfaces
You may find yourself wanting to implement multiple interfaces in a single class. This is possible by making use of multiple inheritance, where a class inherits from more than one base class. This is not supported in all programming languages, and in many programming languages it is considered to be problematic. It is more common in Python, but we don’t have space to go into detail about it in this lesson.
Hashable Polygons
The hashable protocol allows classes to be used as dictionary keys and as members of sets. Look up the hashable protocol and adjust the Polygon class so that it follows this.
Test this by using a Triangle instance as a dict key:
triangle_descriptions = {
    Triangle([3, 4, 5]): "The basic Pythagorean triangle"
}
Solution
The hashable protocol requires implementing one method, __hash__(), which should return an integer hash of the aspects of the instance that make it unique; objects that compare equal must have equal hashes. Lists can't be hashed, so we also need to turn the list of side_lengths into a tuple.
def __hash__(self):
    return hash(tuple(self.side_lengths))
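The sketch below assumes that `__eq__` compares `side_lengths` directly, which the lesson's own `Polygon.__eq__` may not do. It uses a cut-down, hypothetical `Polygon` with only the methods needed here. Note that a class which defines `__eq__` without `__hash__` gets `__hash__` set to `None`, so its instances are unhashable until you add one.

```python
class Polygon:
    """Cut-down, hypothetical Polygon with only the methods needed here."""
    def __init__(self, side_lengths):
        self.side_lengths = side_lengths

    def __eq__(self, other):
        if not isinstance(other, Polygon):
            return NotImplemented
        return self.side_lengths == other.side_lengths

    def __hash__(self):
        # Hash exactly what __eq__ compares, converted to a hashable tuple
        return hash(tuple(self.side_lengths))


class Triangle(Polygon):
    pass


triangle_descriptions = {
    Triangle([3, 4, 5]): "The basic Pythagorean triangle"
}
# A different but equal instance finds the same entry
print(triangle_descriptions[Triangle([3, 4, 5])])
```

Because `side_lengths` is a mutable list, changing it after an instance has been used as a key would change its hash, and the dictionary would no longer find the entry.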
Composition
Composition is a technique where rather than adding more and more functionality to a single class (either explicitly, or via inheritance), functionality is added by adding instances of other classes that group together the related functionality.
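Here is a minimal sketch of the idea, using two hypothetical classes that are unrelated to the lesson's code. `Experiment` does not inherit from `Thermometer`. Instead, it holds a `Thermometer` instance and delegates the conversion work to it.

```python
class Thermometer:
    """Hypothetical component: knows only how to convert raw readings."""
    def __init__(self, offset=0.0):
        self.offset = offset

    def to_celsius(self, raw_reading):
        return raw_reading + self.offset


class Experiment:
    """Hypothetical class that has a Thermometer rather than being one."""
    def __init__(self, thermometer):
        self.thermometer = thermometer
        self.readings = []

    def record(self, raw_reading):
        # Delegate the conversion to the composed object
        self.readings.append(self.thermometer.to_celsius(raw_reading))


experiment = Experiment(Thermometer(offset=-0.5))
experiment.record(20.0)
experiment.record(21.5)
print(experiment.readings)  # [19.5, 21.0]
```

Swapping in a differently calibrated thermometer only means passing a different object to `Experiment`. No new subclass is needed.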
An example of a library that makes heavy use of composition is the Matplotlib object-oriented API. While Matplotlib makes its pyplot API available for basic plotting, it is built on top of a very intricate hierarchy of classes and objects. Those who want more control over their plots are encouraged to use this interface instead of the simplified pyplot version.
To get a feel for how Matplotlib uses composition to separate its concerns while having a large amount of functionality, we can write a small test function that recursively walks through the member variables of an object, printing those that are themselves instances of non-built-in classes.
from matplotlib.pyplot import subplots
def traverse_objects(base_object, level=0, max_level=5):
    """Recursively walk through the member variables of base_object,
    and print out information about each that is an instance of a
    non-built-in class. max_level controls the depth that the
    recursion may continue to, to avoid infinite loops."""
    if hasattr(base_object, "__dict__") and level < max_level:
        for child_name, child_object in vars(base_object).items():
            if child_object.__class__.__module__ != "builtins":
                print(" " * level, child_name, ":", type(child_object))
                # Pass max_level on so that the depth limit applies at every level
                traverse_objects(child_object, level=level+1, max_level=max_level)
# Create a simple plot
fig, ax = subplots()
ax.scatter([1, 2, 3], [1, 4, 9])
ax.scatter([1, 1.5, 2, 2.5, 3], [1, 1, 2, 3, 5])
# Inspect the object hierarchy of this figure object
traverse_objects(fig)
This gives a lot of output (72 lines when this lesson was written; the exact count depends on the Matplotlib version), so in principle 72 different classes are combining here. In practice this number is not accurate. There is some duplication in this list since, for example, both canvas and patch have a figure member variable so that they can refer back to the Figure that they work with. Conversely, this simple traversal ignores some additional composition; for example, fig._axstack._elements is a list of tuples, but within some of those tuples are more objects of type matplotlib.gridspec.SubplotSpec and matplotlib.axes._subplots.AxesSubplot.
This is also why tracebacks from some libraries can be quite long when there are errors in your code. Having lots of small methods in classes that are each dedicated to one very specific aspect makes it easier to reason about what each one is doing in isolation, but can make it more complicated to get a view of the big picture.
Composing plotters
How could the FunctionPlotter, PolynomialPlotter, and QuadraticPlotter be refactored to make use of composition instead of inheritance?
Solution
One way of doing this is to define a “plottable function” interface. An object respecting this interface would:
- be callable
- accept one argument
- return \(f(x)\)
Then, with the FunctionPlotter as defined previously, there is no need to subclass to create QuadraticPlotters and PolynomialPlotters; instead, we can define a Quadratic class as:
class Quadratic:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __call__(self, x):
        return self.a * x ** 2 + self.b * x + self.c
This can then be passed to a FunctionPlotter:
plotter = FunctionPlotter()
plotter.plot(Quadratic(1, -1, 1))
Alternatively, we can encapsulate the function to be plotted as part of the class.
from numpy import linspace
from matplotlib.colors import is_color_like
from matplotlib.pyplot import subplots
class FunctionPlotter:
    def __init__(self, function, color="red", linewidth=1, x_min=-10, x_max=10):
        assert is_color_like(color)
        self.color = color
        self.linewidth = linewidth
        self.x_min = x_min
        self.x_max = x_max
        self.function = function
    def plot(self):
        """Plot the stored function of a single argument.
        The line is plotted in the colour specified by color, and with width
        linewidth."""
        fig, ax = subplots()
        x = linspace(self.x_min, self.x_max, 1000)
        ax.plot(x, self.function(x), color=self.color, linewidth=self.linewidth)
This could then be used as:
from numpy import sin
from matplotlib.pyplot import show
sin_plotter = FunctionPlotter(sin)
quadratic_plotter = FunctionPlotter(Quadratic(1, -1, 1), color="blue")
sin_plotter.plot()
quadratic_plotter.plot()
show()
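The same "plottable function" interface can replace `PolynomialPlotter` as well. Below is a minimal sketch of a hypothetical `Polynomial` class. Its coefficients are listed from the highest power down, matching the order of `a`, `b`, `c` in `Quadratic`, and it evaluates them with Horner's method. The arithmetic is plain `*` and `+`, so the same code also works element-wise on Numpy arrays such as the one `linspace` produces inside `FunctionPlotter.plot()`.

```python
class Polynomial:
    """Hypothetical "plottable function": callable with one argument.
    Coefficients are listed from the highest power down."""
    def __init__(self, coefficients):
        self.coefficients = coefficients

    def __call__(self, x):
        # Horner's method: ((c0 * x + c1) * x + c2) ...
        result = 0
        for coefficient in self.coefficients:
            result = result * x + coefficient
        return result


quadratic = Polynomial([1, -1, 1])        # x**2 - x + 1, like Quadratic(1, -1, 1)
print(quadratic(2))                       # 3
print([quadratic(x) for x in [0, 1, 2]])  # [1, 1, 3]
```

With the composed `FunctionPlotter` above, `FunctionPlotter(Polynomial([1, 0, -2])).plot()` would then plot \(x^2-2\) without any plotter subclass.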
Key Points
Provided a class exposes all required functionality for an operation to work, Python allows it.
Only use inheritance to express relationships where the subclass is the same kind of thing as the superclass.
Implementing interfaces and adding functionality with composition can be better alternatives to inheritance in some cases.