---
title: 'Exercise with Fractals'
teaching: 10
exercises: 50
---
- Can we tackle a real problem now?
- Create a strategy to parallelize existing code.
- Apply previous lessons.
The Mandelbrot and Julia fractals
This exercise uses NumPy and Matplotlib.
```python
from matplotlib import pyplot as plt
import numpy as np
```
We will be computing the famous Mandelbrot fractal.
Complex numbers
Complex numbers are a special representation of rotations and scalings in the two-dimensional plane. Multiplying two complex numbers is the same as taking a point, rotating it by an angle \(\phi\), and scaling it by the absolute value. Multiplying a number \(z \in \mathbb{C}\) by 1 preserves \(z\). Multiplying a point by \(i = (0, 1)\) (which has a positive angle of 90 degrees and absolute value 1) rotates it anti-clockwise by 90 degrees. From this you can see that \(i^2 = (-1, 0)\). The funny thing is that we can treat \(i\) as any ordinary number, and all our algebra still works out. This is actually nothing short of a miracle! We can write a complex number
\[z = x + iy,\]
remember that \(i^2 = -1\), and act as if everything is normal!
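Python has complex numbers built in, so we can check these statements directly. A small illustrative snippet (the values are chosen arbitrarily):

```python
z = 3 + 4j              # a complex literal: 1j is the imaginary unit
print(abs(z))           # 5.0 -- the absolute value, i.e. the scaling factor
print(1j * 1j)          # (-1+0j) -- i squared is -1
print((1 + 0j) * 1j)    # 1j -- multiplying by i rotates 90 degrees anti-clockwise
```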
The Mandelbrot set is the set of complex numbers \(c \in \mathbb{C}\) for which the iteration
\[z_{n+1} = z_n^2 + c,\]
remains bounded when starting from \(z_0 = 0\). We can visualize the Mandelbrot set by plotting the number of iterations needed for the absolute value \(|z_n|\) to exceed 2 (once that happens, it can be shown that the iteration always diverges).

We may compute the Mandelbrot set as follows:
```python
max_iter = 256
width = 256
height = 256
center = -0.8 + 0.0j
extent = 3.0 + 3.0j
scale = max((extent / width).real, (extent / height).imag)

result = np.zeros((height, width), int)
for j in range(height):
    for i in range(width):
        c = center + (i - width // 2 + 1j * (j - height // 2)) * scale
        z = 0
        for k in range(max_iter):
            z = z**2 + c
            if (z * z.conjugate()).real > 4.0:
                break
        result[j, i] = k
```
Then we can plot with the following code:
```python
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
plot_extent = (width + 1j * height) * scale
z1 = center - plot_extent / 2
z2 = z1 + plot_extent
# the cube root of the iteration counts compresses the dynamic range for nicer contrast
ax.imshow(result**(1 / 3), origin='lower', extent=(z1.real, z2.real, z1.imag, z2.imag))
ax.set_xlabel("$\Re(c)$")
ax.set_ylabel("$\Im(c)$")
```
Things become really fun when we start to zoom in. We can play around with the `center` and `extent` values (and, if necessary, `max_iter`) to control our window:
```python
max_iter = 1024
center = -1.1195 + 0.2718j
extent = 0.005 + 0.005j
```
When we zoom in on the Mandelbrot fractal, we get smaller copies of the larger set!

Exercise
Turn this into an efficient parallel program. What kind of speed-ups do you get?
Solution
Create a `BoundingBox` class
We start with a naive implementation. It may be convenient to define a `BoundingBox` class in a separate module `bounding_box.py`. We add methods to this class later on.
```python
from dataclasses import dataclass
from typing import Optional
import numpy as np
import dask.array as da


@dataclass
class BoundingBox:
    width: int
    height: int
    center: complex
    extent: complex
    _scale: Optional[float] = None

    @property
    def scale(self):
        if self._scale is None:
            self._scale = max(self.extent.real / self.width,
                              self.extent.imag / self.height)
        return self._scale

    <<bounding-box-methods>>


test_case = BoundingBox(1024, 1024, -1.1195+0.2718j, 0.005+0.005j)
```
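As a quick sanity check (values chosen for illustration), the `scale` property gives the size of one pixel in the complex plane:

```python
from bounding_box import BoundingBox

box = BoundingBox(1024, 1024, -0.8 + 0.0j, 3.0 + 3.0j)
print(box.scale)   # max(3.0 / 1024, 3.0 / 1024), roughly 0.0029
```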
Solution
Plotting function
```python
import matplotlib  # type:ignore
matplotlib.use(backend="Agg")
from matplotlib import pyplot as plt
import numpy as np

from bounding_box import BoundingBox


def plot_fractal(box: BoundingBox, values: np.ndarray, ax=None):
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(10, 10))
    else:
        fig = None
    plot_extent = (box.width + 1j * box.height) * box.scale
    z1 = box.center - plot_extent / 2
    z2 = z1 + plot_extent
    ax.imshow(values, origin='lower', extent=(z1.real, z2.real, z1.imag, z2.imag),
              cmap=matplotlib.colormaps["viridis"])
    ax.set_xlabel("$\Re(c)$")
    ax.set_ylabel("$\Im(c)$")
    return fig, ax
```
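A possible way to use it end-to-end (a sketch: the module name `plot_fractal.py` is my assumption, and `numba_serial` is the serial solution shown below):

```python
from bounding_box import test_case
from plot_fractal import plot_fractal        # hypothetical module name for the function above
from numba_serial import compute_mandelbrot  # the Numba (serial) solution below

result = compute_mandelbrot(test_case, max_iter=1024)
fig, ax = plot_fractal(test_case, result)
fig.savefig("mandelbrot.png")
```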
Solution
Some solutions
The natural approach with Python is to speed this up with Numba. Then, there are three ways to parallelize: first, letting Numba parallelize the function; second, doing a manual domain decomposition and using one of the many Python ways to run multi-threaded things; third, creating a vectorized function and parallelizing it using `dask.array`. This last option is almost always slower than `@njit(parallel=True)` and domain decomposition.
Solution
Numba (serial)
When we port the core Mandelbrot function to Numba, we need to keep some best practices in mind:
- Don’t pass composite objects other than Numpy arrays.
- Avoid acquiring memory inside a Numba function; rather, create an array in Python and then pass it to the Numba function.
- Write a Pythonic wrapper around the Numba function for easy use.
```python
from typing import Any, Optional
import numba  # type:ignore
import numpy as np

from bounding_box import BoundingBox


@numba.njit(nogil=True)
def compute_mandelbrot_numba(
        result, width: int, height: int, center: complex,
        scale: complex, max_iter: int):
    for j in range(height):
        for i in range(width):
            c = center + (i - width // 2 + 1j * (j - height // 2)) * scale
            z = 0.0 + 0.0j
            for k in range(max_iter):
                z = z**2 + c
                if (z * z.conjugate()).real >= 4.0:
                    break
            result[j, i] = k
    return result


def compute_mandelbrot(
        box: BoundingBox, max_iter: int,
        result: Optional[np.ndarray[np.int64]] = None,
        throttle: Any = None):
    result = result if result is not None \
        else np.zeros((box.height, box.width), np.int64)
    return compute_mandelbrot_numba(
        result, box.width, box.height, box.center, box.scale,
        max_iter=max_iter)
```
Numba `parallel=True`
We can parallelize loops directly with Numba. Pass the flag `parallel=True` and use `prange` to create the parallel loop. Here, it is even more important to allocate the result array outside the context of Numba, otherwise the result will be slower than the serial version.
```python
from typing import Optional
import numba  # type:ignore
from numba import prange  # type:ignore
import numpy as np

from .bounding_box import BoundingBox


@numba.njit(nogil=True, parallel=True)
def compute_mandelbrot_numba(
        result, width: int, height: int, center: complex, scale: complex,
        max_iter: int):
    for j in prange(height):
        for i in prange(width):
            c = center + (i - width // 2 + (j - height // 2) * 1j) * scale
            z = 0.0+0.0j
            for k in range(max_iter):
                z = z**2 + c
                if (z*z.conjugate()).real >= 4.0:
                    break
            result[j, i] = k
    return result


def compute_mandelbrot(box: BoundingBox, max_iter: int,
                       throttle: Optional[int] = None):
    if throttle is not None:
        numba.set_num_threads(throttle)
    result = np.zeros((box.height, box.width), np.int64)
    return compute_mandelbrot_numba(
        result, box.width, box.height, box.center, box.scale,
        max_iter=max_iter)
```
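A short way to try both versions side by side (a sketch; the package and module names follow the benchmark script further below, and the first call to each function also pays the JIT compilation cost):

```python
from mandelbrot.bounding_box import test_case
from mandelbrot import numba_serial, numba_parallel

serial = numba_serial.compute_mandelbrot(test_case, 1024)
parallel = numba_parallel.compute_mandelbrot(test_case, 1024, throttle=4)  # limit to 4 threads
assert (serial == parallel).all()   # same iteration counts, now computed in parallel
```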
Solution
Domain splitting
We split the computation into a set of sub-domains. The `BoundingBox.split()` method is designed so that, if we deep-map the resulting list-of-lists, we can recombine the results using `numpy.block()`.
```python
def split(self, n):
    """Split the domain in nxn subdomains, and return a grid of BoundingBoxes."""
    w = self.width // n
    h = self.height // n
    e = self.scale * w + self.scale * h * 1j
    x0 = self.center - e * (n / 2 - 0.5)
    return [[BoundingBox(w, h, x0 + i * e.real + j * e.imag * 1j, e)
             for i in range(n)]
            for j in range(n)]
```
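To see how the deep-map and `numpy.block()` recombination fit together, here is a small check (it maps a dummy computation over the tiles just to keep it short):

```python
import numpy as np
from bounding_box import test_case

sub = test_case.split(4)       # a 4x4 grid (list of lists) of smaller BoundingBoxes
tiles = [[np.zeros((box.height, box.width)) for box in row]   # deep-map over the tiles
         for row in sub]
full = np.block(tiles)         # recombine the tiles into one array
print(full.shape)              # (1024, 1024): the shape of the original box
```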
To perform the computation in parallel, let's go ahead and choose the most difficult path: `asyncio`. There are other ways to do this, like setting up a number of threads or using Dask. However, `asyncio` is available in Python natively. In the end, the result is very similar to what we would get using `dask.delayed`.
This may seem like a lot of code, but remember: we only use Numba to compile the core part and then asyncio to parallelize. The semaphore is only there to throttle the computation to fewer cores. Even then, this solution is by far the most extensive, but also the fastest.
```python
from typing import Optional
import numpy as np
import asyncio
from psutil import cpu_count  # type:ignore
from contextlib import nullcontext

from .bounding_box import BoundingBox
from .numba_serial import compute_mandelbrot as mandelbrot_serial


async def a_compute_mandelbrot(
        box: BoundingBox,
        max_iter: int,
        semaphore: Optional[asyncio.Semaphore]):
    async with semaphore or nullcontext():
        result = np.zeros((box.height, box.width), np.int64)
        await asyncio.to_thread(
            mandelbrot_serial, box, max_iter, result=result)
    return result


async def a_domain_split(box: BoundingBox, max_iter: int,
                         sem: Optional[asyncio.Semaphore]):
    n_cpus = cpu_count(logical=True)
    split = box.split(n_cpus)
    split_result = await asyncio.gather(
        *(asyncio.gather(
            *(a_compute_mandelbrot(b, max_iter, sem)
              for b in row))
          for row in split))
    return np.block(split_result)


def compute_mandelbrot(box: BoundingBox, max_iter: int,
                       throttle: Optional[int] = None):
    sem = asyncio.Semaphore(throttle) if throttle is not None else None
    return asyncio.run(a_domain_split(box, max_iter, sem))
```
Solution
Numba vectorize
Another solution is to use Numba's `@guvectorize` decorator. The speed-up (on my machine) is not as dramatic as with the domain decomposition, though. This approach needs the complex value at every grid point as input, so we first add two more methods to the `BoundingBox` class:
```python
def grid(self):
    """Return the complex values on the grid in a 2d array."""
    x0 = self.center - self.extent / 2
    x1 = self.center + self.extent / 2
    g = np.mgrid[x0.imag:x1.imag:self.height*1j,
                 x0.real:x1.real:self.width*1j]
    return g[1] + g[0]*1j

def da_grid(self):
    """Return the complex values on the grid in a 2d array."""
    x0 = self.center - self.extent / 2
    x1 = self.center + self.extent / 2
    x = np.linspace(x0.real, x1.real, self.width, endpoint=False)
    y = np.linspace(x0.imag, x1.imag, self.height, endpoint=False)
    # meshgrid returns (X, Y), so the real part is g[0] and the imaginary part g[1]
    g = da.meshgrid(x, y)
    return g[0] + g[1]*1j
```
```python
from typing import Any
from numba import guvectorize, int64, complex128  # type:ignore
import numpy as np

from .bounding_box import BoundingBox


@guvectorize([(complex128[:, :], int64, int64[:, :])],
             "(n,m),()->(n,m)",
             nopython=True)
def compute_mandelbrot_numba(inp, max_iter: int, result):
    for j in range(inp.shape[0]):
        for i in range(inp.shape[1]):
            c = inp[j, i]
            z = 0.0+0.0j
            for k in range(max_iter):
                z = z**2 + c
                if (z*z.conjugate()).real >= 4.0:
                    break
            result[j, i] = k


def compute_mandelbrot(box: BoundingBox, max_iter: int, throttle: Any = None):
    result = np.zeros((box.height, box.width), np.int64)
    c = box.grid()
    compute_mandelbrot_numba(c, max_iter, result)
    return result
```
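The `da_grid()` method lets us chunk the grid with `dask.array` and map the vectorized kernel over the chunks. This is only a hedged sketch of that third option (the chunk size and scheduler are arbitrary choices, and as noted above this route tends to be slower than the alternatives):

```python
import numpy as np
import dask.array as da

from mandelbrot.bounding_box import test_case
from mandelbrot.vectorized import compute_mandelbrot_numba   # the @guvectorize kernel above

c = test_case.da_grid().rechunk(256)        # the complex grid as a chunked dask array
result = da.map_blocks(compute_mandelbrot_numba, c, 1024,    # kernel allocates its own output
                       dtype=np.int64)
image = result.compute(scheduler="threads")
```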
Solution
Benchmarks
```python
from typing import Optional
import timeit
from . import numba_serial, numba_parallel, vectorized, domain_splitting
from .bounding_box import BoundingBox, test_case

compile_box = BoundingBox(16, 16, 0.0+0.0j, 1.0+1.0j)
timing_box = test_case


def compile_run(m):
    m.compute_mandelbrot(compile_box, 1)


def timing_run(m, throttle: Optional[int] = None):
    m.compute_mandelbrot(timing_box, 1024, throttle=throttle)


modules = ["numba_serial:1", "vectorized:1"] \
    + [f"domain_splitting:{n}" for n in range(1, 9)] \
    + [f"numba_parallel:{n}" for n in range(1, 9)]

if __name__ == "__main__":
    with open("timings.txt", "w") as out:
        headings = ["name", "n", "min", "mean", "max"]
        print(f"{headings[0]:<20}" \
              f"{headings[1]:>10}" \
              f"{headings[2]:>10}" \
              f"{headings[3]:>10}" \
              f"{headings[4]:>10}",
              file=out)
        for mn in modules:
            m, n = mn.split(":")
            n_cpus = int(n)
            setup = f"from mandelbrot.bench_all import timing_run, compile_run\n" \
                    f"from mandelbrot import {m}\n" \
                    f"compile_run({m})"
            times = timeit.repeat(
                stmt=f"timing_run({m}, {n_cpus})",
                setup=setup,
                number=1,
                repeat=50)
            print(f"{m:20}" \
                  f"{n_cpus:>10}" \
                  f"{min(times):10.5g}" \
                  f"{sum(times)/len(times):10.5g}" \
                  f"{max(times):10.5g}",
                  file=out)
```
We can plot the collected timings with plotnine (a ggplot-style plotting library for Python):

```python
import pandas as pd
from plotnine import ggplot, geom_point, geom_ribbon, geom_line, aes

timings = pd.read_table("timings.txt", delimiter=" +", engine="python")
plot = ggplot(timings, aes(x="n", y="mean", ymin="min", ymax="max",
                           color="name", fill="name")) \
    + geom_ribbon(alpha=0.3, color="none") \
    + geom_point() + geom_line()
plot.save("mandelbrot-timings.svg")
```
Extra: Julia sets
For each value \(c\) we can compute the Julia set, namely the set of starting values \(z_1\) for which the iteration \(z_{n+1} = z_n^2 + c\) remains bounded. Every location on the Mandelbrot image corresponds to its own unique Julia set.
```python
max_iter = 256
center = 0.0+0.0j
extent = 4.0+3.0j
scale = max((extent / width).real, (extent / height).imag)   # width and height as before

result = np.zeros((height, width), int)
c = -1.1193+0.2718j

for j in range(height):
    for i in range(width):
        z = center + (i - width // 2 + (j - height // 2)*1j) * scale
        for k in range(max_iter):
            z = z**2 + c
            if (z * z.conjugate()).real > 4.0:
                break
        result[j, i] = k
```
If we take the centre of the last image, we get the following rendering of the Julia set:

Generalize
Can you generalize your Mandelbrot code to compute both the Mandelbrot and the Julia sets efficiently, while reusing as much code as possible?
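One possible direction, as a hedged sketch (the function name `escape_counts` and the keyword `julia_c` are my own choices): both fractals iterate \(z_{n+1} = z_n^2 + c\) and only differ in whether the grid supplies \(c\) (Mandelbrot, \(z_0 = 0\)) or \(z_0\) (Julia, fixed \(c\)), so a single kernel can serve both and can then be accelerated with any of the strategies above.

```python
from typing import Optional
import numpy as np
from bounding_box import BoundingBox


def escape_counts(box: BoundingBox, max_iter: int,
                  julia_c: Optional[complex] = None):
    """Mandelbrot set when julia_c is None, otherwise the Julia set for that c."""
    grid = box.grid()                           # complex value at every pixel
    result = np.zeros(grid.shape, np.int64)
    for j in range(grid.shape[0]):
        for i in range(grid.shape[1]):
            if julia_c is None:
                c, z = grid[j, i], 0.0 + 0.0j   # Mandelbrot: c varies over the grid, z0 = 0
            else:
                c, z = julia_c, grid[j, i]      # Julia: c is fixed, z0 varies over the grid
            for k in range(max_iter):
                z = z**2 + c
                if (z * z.conjugate()).real > 4.0:
                    break
            result[j, i] = k
    return result
```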
- Actually making code faster is not always straightforward.
- Easy one-liners can get you 80% of the way.
- Writing clean and modular code often makes parallelization easier later on.