Memoization using Closures
Published 06 Jan 2011. Tags: computer-science, python.
I wrote a little article about how to use closures in Python a couple weeks ago, but I’m not convinced that I really gave a good motivation for why we might want to use them. The example I gave was a little bit simplistic—here’s a better one.
First, we need to talk about memoization (if you’re familiar with memoization already, feel free to skip along a bit). Suppose we want to write a function to calculate the nth Fibonacci number. We might first write a naïve recursive function:
def fib(n): if n < 2: return n else: return fib(n - 1) + fib(n - 2)
This does the job (assuming we give it appropriate input), but it’s actually
terribly inefficient. Check out the call tree for
We’ve got a few redundant calls here (notably
fib(3)), and for
higher values of
n they really start adding up. As it turns out, this
Fibonacci implementation is O(2n). Our elegantly simple algorithm is
awful, just awful.
But we can salvage it! If we were to record our calculated values (by closing over our already-calculated values, say) we could make this thing phenomenally more efficient.
def memoized_fib(n): values = [0, 1] def fib_helper(k): if len(values) > k: result = values[k] else: result = fib_helper(k - 1) + fib_helper(k - 2) values.append(result) return result return fib_helper(n)
We’re creating a list called
values, for which each entry contains the
Fibonacci number associated with its index, starting with the first two. After
we’ve recursed all the way down the first time, we build up the list as we pop
functions off the stack, so subsequent calls to
fib_helper don’t really need
to do any recursion at all—they can just look up the appropriate value in
values. The list
values is hidden inside a closure, so it’s protected from
the outside world. We certainly could have implemented memoization using a
global variable, but why would we if we we don’t have to? Our closure provides
an equally efficient and more elegant solution to the problem.
Just to drive home the efficiency gain,
memoized_fib(500) takes 0.014s to run
on my machine;
fib(500) is still calculating an hour later. Which makes sense,
since we’ve replaced an exponential algorithm with a linear one. Feels good.
And, yes, it’s totally true that we could have just calculated the Fibonacci numbers iteratively and avoided the whole fiasco. But what would we have learned?