Memoization using Closures
I wrote a little article about how to use closures in Python a couple weeks ago, but I’m not convinced that I really gave a good motivation for why we might want to use them. The example I gave was a little bit simplistic—here’s a better one.
First, we need to talk about memoization (if you’re familiar with memoization already, feel free to skip along a bit). Suppose we want to write a function to calculate the nth Fibonacci number. We might first write a naïve recursive function:
def fib(n):
    if n < 2:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
This does the job (assuming we give it appropriate input), but it’s actually terribly inefficient. Check out the call tree for fib(5):
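fib(5)
├── fib(4)
│   ├── fib(3)
│   │   ├── fib(2)
│   │   │   ├── fib(1)
│   │   │   └── fib(0)
│   │   └── fib(1)
│   └── fib(2)
│       ├── fib(1)
│       └── fib(0)
└── fib(3)
    ├── fib(2)
    │   ├── fib(1)
    │   └── fib(0)
    └── fib(1)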
We’ve got a few redundant calls here (notably fib(2) and fib(3)), and for higher values of n they really start adding up. As it turns out, this Fibonacci implementation is O(2^n). Our elegantly simple algorithm is awful, just awful.
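If you want to see the blow-up for yourself, here’s a quick sketch that counts the calls (call_count and counted_fib are just names invented for this illustration):

# Count how many times the naive function gets called.
call_count = 0

def counted_fib(n):
    global call_count
    call_count += 1
    if n < 2:
        return n
    return counted_fib(n - 1) + counted_fib(n - 2)

counted_fib(20)
print(call_count)  # 21891 calls just to compute fib(20)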
But we can salvage it! If we were to record our calculated values (by closing over our already-calculated values, say) we could make this thing phenomenally more efficient.
def memoized_fib(n):
    values = [0, 1]
    def fib_helper(k):
        if len(values) > k:
            result = values[k]
        else:
            result = fib_helper(k - 1) + fib_helper(k - 2)
            values.append(result)
        return result
    return fib_helper(n)
We’re creating a list called values, in which each entry contains the Fibonacci number associated with its index, starting with the first two. After we’ve recursed all the way down the first time, we build up the list as we pop functions off the stack, so subsequent calls to fib_helper don’t really need to do any recursion at all: they can just look up the appropriate value in values. The list values is hidden inside a closure, so it’s protected from the outside world. We certainly could have implemented memoization using a global variable, but why would we if we don’t have to? Our closure provides an equally efficient and more elegant solution to the problem.
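For comparison, here’s a rough sketch of what that global-variable version might look like (fib_cache and global_fib are hypothetical names for the sake of the example):

# A module-level cache: anything in the module can read or clobber it.
fib_cache = {0: 0, 1: 1}

def global_fib(n):
    if n not in fib_cache:
        fib_cache[n] = global_fib(n - 1) + global_fib(n - 2)
    return fib_cache[n]

It runs just as fast, but fib_cache is now part of the module’s namespace, exposed to every other piece of code that imports it.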
Just to drive home the efficiency gain, memoized_fib(500) takes 0.014s to run on my machine; fib(500) is still calculating an hour later. Which makes sense, since we’ve replaced an exponential algorithm with a linear one. Feels good.
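If you want to reproduce the measurement, something like this should do the trick (your numbers will differ):

import timeit

# Time a single call; one run is plenty, since the memoized version is fast.
print(timeit.timeit("memoized_fib(500)", globals=globals(), number=1))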
And, yes, it’s totally true that we could have just calculated the Fibonacci numbers iteratively and avoided the whole fiasco. But what would we have learned?
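For the record, that iterative version would look something like this (iterative_fib is just a name for this sketch):

def iterative_fib(n):
    # Walk the sequence from the bottom up, keeping only the last two values.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a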