Thursday, April 30, 2009

Go West Young Man!

I have had to find out the hard way, but Blogger sucks when it comes to math blogging. It wouldn't be all that bad, were it not that on the other side of the road things are soooo very much easier. And while I will miss Google Analytics, the native support for LaTeX and syntax highlighting more than makes up for it. So I'm moving this blog to WordPress...

You can find the new blog here, and subscribe to the RSS feed here.

For starters I have redone the much improvable first post of the series on the FFT, which I have split in two: one on the DFT, the other starting the discussion of the FFT proper, plus an unconnected post on computing the GCD...

I will be moving all the content in this blog over to the new one, but have no intention of deleting anything from this one.

Thursday, April 16, 2009

The Cooley-Tukey FFT Algorithm

I'm currently a little fed up with number theory, so it's time to change topics completely. Especially since the post on basic integer factorization completes what I believe is a sufficient toolkit to tackle a very cool subject: the fast Fourier transform (FFT).

I have some mixed feelings about how the Fourier transform qualifies for the "uncomplicated complexity" rule I imposed on myself when starting this blog. There certainly is a lot of very elaborate math behind Fourier analysis, but I think that if you skip all the mathematical subtleties and dive head first into a "the Fourier transform gives you the frequencies present in the signal" type of intuitive description, the journey from the naive implementation of the discrete Fourier transform (DFT) to the many flavors of the FFT is a very enjoyable one, requiring only a limited amount of complication, but gentle doses of complexity. Which is exactly what I'm after...

So I will take the definition of the DFT for granted, and concentrate on the algorithms to compute it efficiently. There will have to be a few more leaps of faith, but I will try to mark them clearly, so you can whisper "amen" and keep going without a second thought. Although things may unfold differently, I think there is material for about half a dozen posts. Towards the end I may devote some time to optimizing the code, with things such as avoiding recursion, or doing the calculations in place, but most of my efforts will be devoted to optimizing the math behind the code, not the code itself. So the resulting code will probably be suboptimal: if you spot a potential improvement your comments are welcome and encouraged.

The Discrete Fourier Transform
Without further explanation, we will begin by writing down the analytical expression of the DFT,
tex: \displaystyle X_k = \sum_{n=0}^{N-1}{x_n e^{-2\pi i k n / N}},
and of its corresponding inverse transform,
tex: \displaystyle x_n = \frac{1}{N} \sum_{k=0}^{N-1}{X_k e^{2\pi i k n / N}}.
With Python's built-in support for complex arithmetic, there really isn't much mystery in turning these two formulas into two Python functions, or, as I have chosen, one with an inverse switch:


from __future__ import division
import math
import time

def dft(x, inverse = False, verbose = False) :
    t = time.clock()
    N = len(x)
    inv = -1 if not inverse else 1 # sign of the exponent
    X = [0] * N
    for k in xrange(N) :
        for n in xrange(N) :
            X[k] += x[n] * math.e**(inv * 2j * math.pi * k * n / N)
        if inverse : # the inverse transform carries a 1/N factor
            X[k] /= N
    t = time.clock() - t
    if verbose :
        print "Computed","an inverse" if inverse else "a","DFT of size",N,
        print "in",t,"sec."
    return X

Of course, having two nested loops over the size of the input means that computation time will grow as N^2, which is extremely inefficient.
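As a quick sanity check, a transform followed by its inverse should recover the original signal up to rounding; the sample signal and tolerance below are arbitrary choices for illustration:

x = [1.0, 2.0, 3.0, 4.0]
X = dft(x)
x2 = dft(X, inverse = True)
# the round-trip error should be at the level of floating point noise
print max(abs(a - b) for a, b in zip(x, x2)) < 1e-9 # True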

The Radix-2 Cooley-Tukey Algorithm with Decimation in Time
How can that be improved? If the size of the input is even, we can write N = 2·M, and it is possible to split the N-element summation in the previous formulas into two M-element ones, one over n = 2·m, the other over n = 2·m + 1. By doing so, we arrive at
tex: \displaystyle X_k = \sum_{m=0}^{M-1}{x_{2m} e^{-2\pi i \frac{m k}{M} }} + e^{-2\pi i\frac{k}{N}} \sum_{m=0}^{M-1}{x_{2m+1} e^{-2\pi i\frac{m k}{M}}}
for the direct transform, and
tex: \displaystyle x_n = \frac{1}{2} \left(\frac{1}{M} \sum_{m=0}^{M-1}{X_{2m} e^{2\pi i \frac{m n}{M} }} + e^{2\pi i\frac{n}{N}}\frac{1}{M} \sum_{m=0}^{M-1}{X_{2m+1} e^{2\pi i\frac{m n}{M}}}\right)
for the inverse.

If you look carefully at these formulas, you'll notice that we have just split our size-N transform, be it direct or inverse, into two others of half the size, one over the even indexes, the other over the odd ones. The factor multiplying the second partial transform is known as a twiddle factor, and introduces a little overhead, but overall we have just managed to cut the total time in half: the N^2 operations become 2·(N/2)^2 = N^2/2, plus N/2 twiddle multiplications. And if the input size happens to be a power of two, we need not stop here: as long as the size is divisible by two, we can repeat the process recursively, and the resulting recurrence, T(N) = 2·T(N/2) + O(N), leads to a total time proportional to N log N.

To implement this recursive approach we need to take care of a few more details, though. First, if you look at the previous formulas, the variable k in the direct transform, or n in the inverse, runs all the way to N = 2·M, rather than to M, as one would expect in the standard DFT. Taking advantage of the fact that the complex exponential is periodic with period 2πi, it is quite easy to show that, for k (or n) of M and above, each half-size transform repeats the value it took at k-M (or n-M), while the twiddle factor changes sign, since e^{-πi} = -1.
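Written out for the direct transform, with 0 ≤ k < M, the second half of the output reads
tex: \displaystyle X_{k+M} = \sum_{m=0}^{M-1}{x_{2m} e^{-2\pi i \frac{m k}{M} }} - e^{-2\pi i\frac{k}{N}} \sum_{m=0}^{M-1}{x_{2m+1} e^{-2\pi i\frac{m k}{M}}},
which is why the second loop in the code below subtracts the odd-indexed contribution rather than adding it.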

The other open question is at what point to stop the recursion. The simplest approach is to keep going until the input size is odd, and then call the naive DFT function. If the size happens to be a power of two, this call will not happen until the input size comes down to one. Since the DFT of a single-sample signal is the same signal unchanged, that is a wasteful way of coding, but since it clarifies the logic behind the algorithm, I will leave those optimizations for a later stage.
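The single-sample claim is easy to verify with the dft function from the previous section: the only term in the sum is x_0·e^0 = x_0.

# the DFT (or inverse DFT) of a length-1 signal is the signal itself
print dft([3.14]) # prints [(3.14+0j)]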

With these caveats in mind, this FFT algorithm can be coded in Python as follows:


from __future__ import division
import math
import time

def fft_CT(x, inverse = False, verbose = False) :
    t = time.clock()
    N = len(x)
    inv = -1 if not inverse else 1 # sign of the exponent
    if N % 2 : # fall back to the naive DFT for odd sizes
        return dft(x, inverse, False)
    x_e = x[::2] # even-indexed samples
    x_o = x[1::2] # odd-indexed samples
    X_e = fft_CT(x_e, inverse, False)
    X_o = fft_CT(x_o, inverse, False)
    X = []
    M = N // 2
    for k in range(M) :
        X += [X_e[k] + X_o[k] * math.e ** (inv * 2j * math.pi * k / N)]
    for k in range(M, N) : # the twiddle factor changes sign for k >= M
        X += [X_e[k-M] - X_o[k-M] * math.e ** (inv * 2j * math.pi * (k-M) / N)]
    if inverse : # each level of the recursion contributes a factor 1/2
        X = [j/2 for j in X]
    t = time.clock() - t
    if verbose :
        print "Computed","an inverse" if inverse else "a","CT FFT of size",N,
        print "in",t,"sec."
    return X

The speed gains with this approach, restricted for now to power-of-two sizes, are tremendous even for moderately large inputs. For instance, a 2^20 (~10^6) item input can be processed with the FFT approach in a couple of minutes, while the projected duration using the naive DFT would be more like a couple of months...
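As a small-scale illustration of both functions side by side (the input size and random seed here are arbitrary choices), one could run something like:

import random
random.seed(1234)
x = [random.random() for j in xrange(2**10)]
X1 = dft(x, verbose = True)
X2 = fft_CT(x, verbose = True)
# both transforms should agree up to floating point noise
print max(abs(a - b) for a, b in zip(X1, X2))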

Thursday, April 9, 2009

Naive Integer Factorization

After three posts (1, 2, 3) on calculating prime numbers, it is probably worth putting that knowledge to a more useful task. As we will see in the near future, integer factorization, i.e. breaking down a (composite) number into its prime factors, is one such task. Strictly speaking, factoring a number n is simply decomposing it as the product of two smaller non-trivial divisors, i.e. divisors different from 1 and n itself. But by repeatedly factoring the divisors one will eventually come up with a unique set of primes which, when multiplied together, render the original number, e.g. 60 = 2·2·3·5, or so says the fundamental theorem of arithmetic... The point is, we will consider factorization a synonym of prime decomposition, be it formally correct or not.

There are some very sophisticated methods to factor very large numbers, but they use a lot of extremely complex math, so I doubt they will ever find their way onto this blog. So we are left with the naive, straightforward approach as our only option, although I will try to give it an efficiency boost. What is this naive approach? Trial division, of course: given a number n, we know that its smallest prime factor will be no larger than the square root of n (unless n is itself prime), so we can simply try and see if any of those numbers divides it. No, I will not try to code that yet... If you have read the entries on determining prime numbers, it should come as no surprise that we really do not need to do trial division by all numbers smaller than the square root of n, but only by the primes among them. This is a consequence of the fact that, if a composite number divides n, then each of the prime factors of that composite number also divides n. According to the prime number theorem, the number of primes below x is asymptotic to x / log x, so by limiting our trials to prime numbers we can reduce the number of tests from n^(1/2) to something around 2·n^(1/2) / log n. If we rescue the primeListSofE function from the post on the sieve of Eratosthenes, a Python implementation of naive factorization could look something like this...

from time import clock

def factor(n, verbose = False) :
    """Returns all prime factors of n, using trial division by prime
    numbers only. Returns a list of (possibly repeating) prime factors
    """
    t = clock()
    ret = []
    nn = n
    maxFactor = int(n**0.5)
    primes = primeListSofE(maxFactor, verbose)
    for p in primes :
        while nn % p == 0 : # divide out each prime factor as often as possible
            nn //= p
            ret += [p]
        if nn == 1 :
            break
    if nn != 1 : # whatever is left is itself a prime factor
        ret += [nn]
    t = clock() - t
    if verbose :
        print "Calculated factors of",n,"in",t,"sec."
    return ret
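The factor function relies on primeListSofE from the sieve post. For the sake of a self-contained example, a minimal stand-in with the same interface could look like this; only the name and signature are taken from the earlier post, the body below is an illustrative sketch, not the original:

def primeListSofE(maxN, verbose = False) :
    """Minimal sieve of Eratosthenes: returns the list of primes <= maxN.
    A stand-in sketch for the earlier post's function, not the original.
    """
    if maxN < 2 :
        return []
    sieve = [True] * (maxN + 1)
    sieve[0] = sieve[1] = False
    for p in xrange(2, int(maxN**0.5) + 1) :
        if sieve[p] : # p is prime, mark its multiples as composite
            for k in xrange(p*p, maxN + 1, p) :
                sieve[k] = False
    return [p for p in xrange(2, maxN + 1) if sieve[p]]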

While this function will be about as good as we can make it for numbers which are the product of two large prime factors, it will be terribly inefficient for most numbers. Consider, as an extreme example, that we are trying to factor 2^55 ≈ 3.6·10^16. We would first calculate all primes up to 1.9·10^8, a challenging feat in itself with our available tools, only to find out that we only needed the first of those primes, i.e. 2. Taking into account that 50% of all numbers are divisible by 2, 33% are divisible by 3, 20% are divisible by 5... it doesn't seem wise to disregard the potential time savings. What we can do to profit from this is to do the trial division checks at the same time as we determine the prime numbers, updating the largest prime to test on the fly. This has to be done in two stages: the first while we sieve up to n^(1/4), the second while we search the rest of the sieve, up to n^(1/2), for more primes. The following Franken-code has been written mostly by cut-and-paste from primeListSofE and factor, which hopefully hasn't affected its readability much:

from time import clock

def factorAndSieve(n, verbose = False) :
    """Returns all prime factors of n, using trial division while sieving
    for primes. Returns a list of (possibly repeating) prime factors
    """
    t = clock()
    ret = []
    nn = n
    while nn % 2 == 0 : # remove 2's first, as 2 is not in sieve
        nn //= 2
        ret += [2]
    maxFactor = int(nn**0.5)
    maxI = (maxFactor-3) // 2 # sieve index of the largest factor candidate
    maxP = int(maxFactor**0.5)
    sieve = [True for j in xrange(maxI+1)] # sieve of the odd numbers 3, 5, 7...
    i = 0
    for p in xrange(3, maxP+1, 2) : # we then sieve as far as needed
        if p > maxP : # maxP may have shrunk below the loop's fixed bound
            break
        i = (p-3) // 2
        if sieve[i] :
            while nn % p == 0 : # trial divide as we sieve...
                nn //= p
                ret += [p]
                maxFactor = int(nn**0.5) # ...shrinking the limits on the fly
                maxI = (maxFactor-3) // 2
                maxP = int(maxFactor**0.5)
            if nn == 1 :
                break
            else :
                i2 = (p*p - 3) // 2
                for k in xrange(i2, maxI+1, p) :
                    sieve[k] = False
    index = i
    for i in xrange(index, maxI+1) : # and inspect the rest of the sieve
        if i > maxI : # maxI may have shrunk below the loop's fixed bound
            break
        if sieve[i] :
            p = 2*i + 3
            while nn % p == 0 :
                nn //= p
                ret += [p]
                maxFactor = int(nn**0.5)
                maxI = (maxFactor-3) // 2
                maxP = int(maxFactor**0.5)
            if nn == 1 :
                break
    if nn != 1 : # whatever is left is itself a prime factor
        ret += [nn]
    t = clock() - t
    if verbose :
        print "Calculated factors of",n,"in",t,"sec."
        print "Stopped trial division at",2*i+3,"instead of",int(n**0.5)
    return ret

This new code will very often be much faster than the previous one, but at times it will be just about as slow, or even slower, since the mixing of both codes introduces some inefficiencies. The most extreme examples of such cases would be a prime number, or the square of a prime, on one side, and a power of 2 on the other.



The graph above plots times to calculate the factors of the numbers between 10^6 and 10^6 + 100. Prime numbers in this interval stick out as the red dots among the blue ones: 10^6+3, +33, the twin primes +37 and +39, +81 and +99. And numbers with many small prime factors populate the bottom of the red cloud.

If the above graph is not enough to convince you of the benefits of the second approach, maybe these timings for very large numbers will:

>>> factor(10**15+37,True)
Calculated primes to 31622776 in 6.760 sec.
Calculated factors of 1000000000000037 in 8.466 sec.
[1000000000000037L]
>>> factorAndSieve(10**15+37,True)
Calculated factors of 1000000000000037 in 8.666 sec.
Stopped trial division at 31622775 instead of 31622776
[1000000000000037L]

>>> factor(2**55,True)
Calculated primes to 189812531 in 42.811 sec.
Calculated factors of 36028797018963968 in 43.261 sec.
[2, ..., 2]
>>> factorAndSieve(2**55,True)
Calculated factors of 36028797018963968 in 8.632e-05 sec.
Stopped trial division at 3 instead of 189812531
[2, ..., 2]