There is no dearth of explanations of the Fourier transform on the internet. Being a believer in understanding a concept from as many perspectives as possible, I welcome all of them. Here I offer yet another perspective that is “intuitive” in its own way, and yet is not something I’ve seen presented often – and when I have seen it, it was presented in more mysterious ways.
As Jeff Raskin put it, “Intuitive equals Familiar” and so when I say this is “intuitive in its own way”, what I’m saying is that if you’re familiar with some specific other stuff, the FT would seem obvious to you when seen through that path.
Signals
So, we’re going to be dealing with the “discrete Fourier transform”, not the continuous version. For that, we have to first describe what a “signal” is.
For the purpose of this post, we’ll treat a “real signal” as a function from integers to real numbers and a “complex signal” as a function from integers to complex numbers. We notate a signal \(\mathbf s\) and its value at a discrete time instant \(n\) by \(s(n)\). We won’t really distinguish between real and complex signals. If you like to be concrete, you can just assume complex signals everywhere.
Delay by one step
Note that the signal \({\mathbf s'}\) defined by \(s'(n) = s(n-1)\) is essentially the same as the signal \(\mathbf s\) except that its values are “shifted by 1” – i.e. \(s'(0) = s(-1)\), \(s'(1) = s(0)\), \(s'(2) = s(1)\) and so on.
We’ll represent this idea of “delay by one step” by the operator \(\mathcal D\) where an operator turns one function into another, and so we write \({\mathbf s'} = {\mathcal D}{\mathbf s}\).
This notion of “delay by one step” has some interesting compositional properties. In particular, it is linear – meaning, if you take two signals and make a new signal by a linear combination of them – like \({\mathbf w} = \alpha {\mathbf u} + \beta {\mathbf v}\), then the combination signal delayed by one step is the same as the linear combination of the corresponding “delayed by one step” signals – i.e. \({\mathcal D}{\mathbf w} = \alpha {\mathcal D}{\mathbf u} + \beta {\mathcal D}{\mathbf v}\) and by that we just mean \(w(n-1) = \alpha u(n-1) + \beta v(n-1)\).
We can extend this notation to multiple applications of \(\mathcal D\), which then produce deeper delays. For example \(({\mathcal D}{\mathcal D}{\mathcal D}{\mathbf s})(n) = s(n-3)\). Since the order of application of these “Ds” doesn’t matter, we simply write that as \(({\mathcal D}^3{\mathbf s})(n) = s(n-3)\).
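As a minimal sketch in Python (the names `D` and `s` here are my own choices, not anything standard), a signal can be modelled as a function from `int` to numbers, and \(\mathcal D\) as a higher-order function on signals:

```python
# A "signal" is a Python function from int to number; the delay
# operator D turns one signal into another.
def D(s):
    """Delay by one step: (D s)(n) = s(n - 1)."""
    return lambda n: s(n - 1)

s = lambda n: n * n          # an arbitrary example signal, s(n) = n^2
s3 = D(D(D(s)))              # three applications: D^3 s
assert s3(5) == s(2)         # (D^3 s)(n) = s(n - 3)
```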
Polynomials
Now, consider the simple signal \({\mathbf \delta}\) defined as \(\delta(0) = 1\) and \(\delta(n) = 0\) for all other \(n\).^{1}
Considering only signals that are nonzero for \(n \geq 0\), and using the “linear” property we saw above, we can now write any signal \({\mathbf s}\) as –
\[{\mathbf s} = (s(0) + s(1){\mathcal D} + s(2){\mathcal D}^2 + s(3){\mathcal D}^3 + \ldots){\mathbf \delta} = (\sum_k{s(k){\mathcal D}^k}){\mathbf \delta}\]
When we look at that sum that way, we can also see that it can be extended to negative \(n\) by defining \({\mathcal D}^{-1}\) to be the “shift left by one” operator – i.e. \(({\mathcal D}^{-1}{\mathbf s})(n) = s(n+1)\).
We’ve now created a mapping between the numeric values at various indices as a sequence, and the abstract “signal” itself, through the \(\mathcal D\) operator. So \({\mathbf s}\) is mapped to a corresponding polynomial \(\sum_k{s(k){\mathcal D}^k}\). Given this polynomial, the signal \({\mathbf s}\) is completely defined. So the fact that the polynomial is an operator that is to be applied to \(\delta\) can be omitted without loss of clarity and we can just write \({\mathbf s} = \sum_k{s(k){\mathcal D}^k}\).
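This decomposition can be played out quite literally in code. Below is a sketch (all helper names are mine) that builds the signal \((\sum_k{s(k){\mathcal D}^k}){\mathbf \delta}\) from a coefficient list:

```python
def D(s):
    """Delay by one step: (D s)(n) = s(n - 1)."""
    return lambda n: s(n - 1)

def delta(n):
    """The unit impulse: 1 at n = 0, zero elsewhere."""
    return 1 if n == 0 else 0

def scale(a, s):
    return lambda n: a * s(n)

def add(u, v):
    return lambda n: u(n) + v(n)

def from_coeffs(coeffs):
    """Build (sum_k coeffs[k] D^k) delta as an actual signal."""
    out = lambda n: 0
    term = delta
    for c in coeffs:
        out = add(out, scale(c, term))
        term = D(term)       # next power of D applied to delta
    return out

sig = from_coeffs([2, 0, 5])             # the signal 2 + 5 D^2
assert [sig(n) for n in range(4)] == [2, 0, 5, 0]
```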
Now, we know how to do many more things with polynomials. We of course know to add and subtract and scale them with numerical coefficients in obvious ways.
We also know how to multiply polynomials. For example, if we have \({\mathbf u} = \sum_k{u(k){\mathcal D}^k}\) and \({\mathbf v} = \sum_k{v(k){\mathcal D}^k}\), we can multiply these two to get –
\[ \begin{array}{rcccl} {\mathbf u}{\mathbf v} & = & {\mathbf v}{\mathbf u} & = & \sum_j{\sum_k{u(j)v(k){\mathcal D}^{j+k}}} \end{array} \]
When we then collect all like terms together – i.e. all terms with \(j+k=m\) together, we get –
\[ \begin{array}{rcl} {\mathbf u}{\mathbf v} & = & \sum_m{(\sum_k{u(k)v(m-k)}){\mathcal D}^m} \end{array} \]
That sum that’s sitting inside – \(\sum_k{u(k)v(m-k)}\) – is what is usually called the “discrete convolution” of the two signals \({\mathbf u}\) and \({\mathbf v}\).
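Written out as code, multiplying the two polynomials coefficient by coefficient is exactly this convolution sum. A sketch (the helper name is mine):

```python
def convolve(u, v):
    """Coefficients of the product polynomial: (uv)(m) = sum_k u(k) v(m-k)."""
    out = [0] * (len(u) + len(v) - 1)
    for j, uj in enumerate(u):
        for k, vk in enumerate(v):
            out[j + k] += uj * vk    # each pair contributes to the D^(j+k) term
    return out

# (1 + 2D)(3 + 4D) = 3 + 10D + 8D^2
assert convolve([1, 2], [3, 4]) == [3, 10, 8]
```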
Linear, time-invariant filters
An expression of the form \({\mathbf F} = \sum_k{f(k){\mathcal D}^k}\) is, according to our original notion, a “signal transformer” – as it takes a signal function and produces another signal function. Since \(\mathcal D\) is linear, such a signal transforming function is also linear and it also commutes with \({\mathcal D}\).
Such an \({\mathbf F}\) is what we call a “linear, time-invariant filter”. “Linear” because the filter of a linear combination of signals is the same as the corresponding linear combination of the filtered signals. “Time-invariant” because it commutes with \(\mathcal D\), which implies that whether you delay a signal and then apply the filter, or first apply the filter and then delay the signal, you get the same result.
Such a filter, therefore, is also expressible as a polynomial in \({\mathcal D}\), just the same as a signal. We call the signal corresponding to the filter’s polynomial form the “impulse response” of the filter, since applying the filter to the \(\mathbf \delta\) signal yields exactly this sequence as the result. The \(\mathbf \delta\) here therefore behaves like a “unit” for “filter application”.
Doing more with the polynomials
If we now take a finite signal, i.e. where the sum \(\sum_{k\geq 0}{u(k){\mathcal D}^k}\) runs over a finite number of values of \(k\), it is easy to see that to completely specify such a signal, we’ll need \(n+1\) numbers, where \(n\) is the highest power of \({\mathcal D}\) in the polynomial. We’re now starting to use the polynomial like any ordinary polynomial!
So another way we can define such a finite signal in its entirety is to consider the polynomial over \({\mathcal D}\) and ask “what if we evaluate this polynomial at \(n+1\) real values of \({\mathcal D}\)?” Obviously, having these \(n+1\) numbers also completely defines the polynomial, as long as we evaluate at a fixed set of distinct values. Let’s call this set of numbers \({\mathcal E}(\mathbf s)\) where “\({\mathcal E}\)” stands for “evaluations”.
So what would have been the convolution of the two sequences \(F(n)\) and \(s(n)\) becomes simple element-by-element multiplication if we have the \(\mathcal E\) representations of them both.
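A quick numerical check of this claim, using arbitrarily chosen (but distinct) evaluation points:

```python
def poly_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] x^k at the value x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

u, v = [1, 2], [3, 4]
uv = [3, 10, 8]                  # convolution of u and v: (1+2D)(3+4D)
points = [0.0, 1.0, 2.0]         # any 3 distinct values "substituted for D"
for x in points:
    # evaluations of the product = products of the evaluations
    assert poly_eval(u, x) * poly_eval(v, x) == poly_eval(uv, x)
```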
A special set of numbers
For a signal \(\mathbf s\) such that \(s(n) \neq 0\) only for \(n \in [0,N]\), if we choose the \(N+1\) complex numbers \(e^{i\frac{2\pi k}{N+1}}\) for \(k = 0, 1, \ldots, N\), we get a representation we’ll denote by \({\mathcal F}(\mathbf s)\).
This representation is called the “Discrete Fourier Transform” of the signal \(\mathbf s\).^{2}
Note that the complex numbers \(e^{i\frac{2\pi k}{N+1}}\) rotate around the unit circle at different rates depending on \(k\) when you raise them to an integer power \(j\), which is what happens when they are substituted for the various powers of \(\mathcal D\) in the polynomial. So these capture answers to “what is the periodicity of the sequence?” kind of questions, probing a different periodicity for each value of \(k\).
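We can verify the “convolution becomes multiplication” property with NumPy’s DFT. Note that `np.fft.fft` uses the \(-i\) sign convention; as the footnote says, the sign is just a convention. The coefficient lists are zero-padded so the product polynomial fits in the transform length:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0, 0.0])   # 1 + 2D, zero-padded to length 4
v = np.array([3.0, 4.0, 0.0, 0.0])   # 3 + 4D, zero-padded to length 4

# Pointwise-multiply the DFTs (the evaluations), then transform back:
prod = np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)).real

# The result is the convolution: (1+2D)(3+4D) = 3 + 10D + 8D^2
assert np.allclose(prod, [3.0, 10.0, 8.0, 0.0])
```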
What’s so “intuitive” about this stuff?
How do you represent numbers? A number \(n\) in base \(b>1\) is simply \(\sum_{k\geq 0}{n_kb^k}\) where \(n_k\) are “digits” in the base \(b\), meaning \(0 \leq n_k \lt b\).
When we multiply two numbers \(m = \sum_k{m_kb^k}\) and \(n = \sum_k{n_kb^k}\), we get \(mn = nm = \sum_k{(\sum_j{m_jn_{k-j}})b^k}\). So multiplication of two numbers is the convolution of their digit sequences – followed by carrying, since the inner sums can exceed \(b-1\) and so aren’t yet proper “digits”.
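A sketch of this in Python (the helper names are mine): convolve the digit sequences, then propagate carries to recover proper digits.

```python
def digits(n, b=10):
    """Digits of n in base b, least-significant first."""
    out = []
    while n:
        out.append(n % b)
        n //= b
    return out or [0]

def conv_multiply(m, n, b=10):
    dm, dn = digits(m, b), digits(n, b)
    prod = [0] * (len(dm) + len(dn))
    for j, mj in enumerate(dm):          # convolution of the digit sequences
        for k, nk in enumerate(dn):
            prod[j + k] += mj * nk
    carry = 0                            # carrying restores 0 <= digit < b
    for i in range(len(prod)):
        carry, prod[i] = divmod(prod[i] + carry, b)
    return sum(d * b**i for i, d in enumerate(prod))

assert conv_multiply(123, 45) == 123 * 45
```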
So it is in that sense that I titled this post “The Fourier Transform as a number”. Since I expect you’re familiar with numbers as digit sequences and how to multiply them, I’m expecting the above approach to be “intuitive” in Raskin’s sense. I hope this added to your understanding of the Fourier Transform.
Note that in most presentations, you’d first see a definition of the FT, with properties like “convolution becomes multiplication” “proved” after the definition. Here, we saw all that even before defining the FT – and I think that is worth something.
Have fun with all this!

Known as the “Dirac delta” in the discrete world. There is also a similar continuous version of the Dirac delta, but I’m not getting into that here. ↩︎

Here I’ve used \(i\) instead of the conventional \(-i\), but that’s almost irrelevant as the choice of sign of \(i\) is only a matter of … convention. As long as we stick to the same convention throughout, we’re good. ↩︎