On the Pythagorean "theorem"

Nov 11, 2023   #Pythagorean theorem  #Vector algebra  #Geometry 

(Status: draft)

I’ve long held a view in my mind that the very well known Pythagorean “theorem” is not a “theorem” – in the sense of being able to be derived from a set of axioms through logical deduction. I want to jot down this thinking here because I think it is important for pedagogical reasons … but mostly I just want to get it off my chest even if the mathematical community considers this blasphemy. So I shall refer to this using “theorem” in quotes throughout this writeup.

To start with, let me pick a statement of the “theorem” from Wikipedia:

“It states that the area of the square whose side is the hypotenuse (the side opposite the right angle) is equal to the sum of the areas of the squares on the other two sides.”

Books are chock full of remarkable and entertaining “proofs” of this “theorem” and I don’t intend to reproduce them here. Suffice to say that I admire the intellectual work on all of them.

So what exactly is my objection to calling this a “theorem”? Without a definition of what a “right angle” is that is independent of the conclusion, it is simply a tautology.

So let’s dive in. Try this as an exercise. What is a “right angle”? You may say, as the Wikipedia notes, an “angle of 90 degrees”. But then, what is an “angle” and how are you supposed to measure it? In fact, how do you define a measure for the length of a line segment?

Down that rabbit hole, it becomes clear that without the \(c = \sqrt{a^2+b^2}\) result, there is no way to define the length of a line segment.

In other words, given the result, all of the properties of so-called “Euclidean geometry” can be derived. In that sense, the result is more fundamental than “parallel lines don’t meet” and other such tautologies taught as self-evident truths. This is also the basis of the formulation of the general geometry of curved spaces that is used in Einstein’s general theory of relativity.

One way to see this is to start with vector spaces. Here, you have objects called “vectors” along with a commutative and closed operation \(+\) and a scalar multiplication operation, such that given two vectors \(u\) and \(v\), and two scalars \(\alpha\) and \(\beta\), \(\alpha u+\beta v\) is also a vector in the same space. Let’s denote this vector space using \(\mathcal{V}\).

Now let’s look at linear functions from vectors to the scalar field, i.e. functions \(f\) with the property that \(f(\alpha u+\beta v) = \alpha f(u) + \beta f(v)\) for all scalars \(\alpha\) and \(\beta\) and vectors \(u\) and \(v\).

Now, the set of such functions also forms a vector space that’s dual to our starting vector space, and whose members are called “one forms”. So we have one forms, which are linear functions of our vectors to scalars. We’ll use \(\mathcal{\Omega}\) to refer to this dual space.

\(u, v :: \mathcal{V}\)

\(\omega, \nu :: \mathcal{\Omega}\)

Measuring the lengths of vectors is done by making a linear map from the vector space to its dual vector space.

\( \mathbf{g} :: \mathcal{V} \rightarrow \mathcal{\Omega}\)

This map is called a “metric” because fixing this map lets us get a “squared length” like this –

\( |u|^2 \stackrel{{\mathsf{def}}}{=} \mathbf{g} u u \)

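… and when we take \(\mathbf{g}\) to be symmetric in its arguments (i.e. \(\mathbf{g} u v = \mathbf{g} v u\)) and positive definite (i.e. \(\mathbf{g} u u > 0\) for every nonzero \(u\)), the length \(|u| = \sqrt{\mathbf{g} u u}\) satisfies the triangle inequality required of a metric. Symmetry alone is not enough for that.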

A convenient notation used for \(\mathbf{g}uv\) is the “dot product” notation – \(\langle u,v\rangle\). So we write \(|u|^2 = \langle u, u \rangle\).

Note that \( \mathbf{g} u \) is a one form, i.e. a linear function from vectors to scalars.
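To make the "metric as a map from vectors to one forms" idea concrete, here is a minimal Python sketch. It assumes a 2-dimensional real vector space with vectors written as lists of components, and it represents \(\mathbf{g}\) by a matrix `G` whose entries are arbitrary values I picked for illustration (symmetric and positive definite). The names `make_metric`, `G`, `omega` and `w` are not from anything above; they are just for this example.

```python
import math

def make_metric(G):
    """Return g such that g(u) is a one form and g(u)(v) = sum_ij u_i G_ij v_j."""
    def g(u):
        def one_form(v):
            return sum(u[i] * G[i][j] * v[j]
                       for i in range(len(u))
                       for j in range(len(v)))
        return one_form
    return g

# Assumed example metric: symmetric and positive definite, but not the identity.
G = [[2.0, 1.0],
     [1.0, 3.0]]
g = make_metric(G)

u = [1.0, 2.0]
v = [3.0, -1.0]
w = [0.5, 4.0]

omega = g(u)  # a one form: a linear function from vectors to scalars

# Linearity of the one form: omega(2v + 3w) equals 2*omega(v) + 3*omega(w).
combo = [2 * vi + 3 * wi for vi, wi in zip(v, w)]
print(math.isclose(omega(combo), 2 * omega(v) + 3 * omega(w)))

# Symmetry of the metric, and the squared length of u.
print(math.isclose(g(u)(v), g(v)(u)))
print(g(u)(u))
```

Writing `g(u)(v)` rather than `g(u, v)` mirrors the \(\mathbf{g} u v\) notation: applying \(\mathbf{g}\) to one vector gives back a function that is still waiting for the second vector.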

Using \(\mathbf{g}\), we can derive an orthonormal basis for our vector space, i.e. a collection of vectors \(\hat{e}_i\), with \(\langle \hat{e}_i, \hat{e}_j \rangle\) equal to 1 when \(i = j\) and 0 otherwise, such that any vector \(v\) can be expressed as a linear combination of the \(\hat{e}_i\):

\[ v = \sum_{i}\langle v, \hat{e}_i \rangle\hat{e}_i \]

Two vectors are said to be orthogonal if \(\langle u, v \rangle = 0\).

To see how this is possible, you can consider the projection operator \(P :: \mathcal{V} \rightarrow \mathcal{V} \rightarrow \mathcal{V}\) defined by

\[ P u v = v - \frac{\langle u, v \rangle}{\langle u, u \rangle}u \]

Note that \(\langle u, Puv\rangle = 0\). Given a number of linearly independent vectors (i.e. a set of vectors where none of the members can be expressed as a linear weighted sum of the other vectors in the set), we can always compute an orthonormal basis set from them by repeatedly applying projections.
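The "repeatedly applying projections" procedure can be sketched in a few lines of Python. This is only an illustration under some assumptions: vectors are lists of real components, the dot product is given by an example matrix `G` of my choosing (symmetric and positive definite), and the input vectors are linearly independent. The helper names `dot`, `project_out` and `orthonormalize` are made up for this sketch.

```python
import math

def dot(u, v, G):
    """<u, v> = sum_ij u_i G_ij v_j for a metric given by the matrix G."""
    return sum(u[i] * G[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))

def project_out(u, v, G):
    """P u v = v - (<u,v> / <u,u>) u, the part of v that is orthogonal to u."""
    c = dot(u, v, G) / dot(u, u, G)
    return [vi - c * ui for ui, vi in zip(u, v)]

def orthonormalize(vectors, G):
    """Build an orthonormal set by projecting out each earlier basis vector."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            w = project_out(e, w, G)
        norm = math.sqrt(dot(w, w, G))  # nonzero if the inputs are independent
        basis.append([wi / norm for wi in w])
    return basis

# Assumed example metric; any symmetric positive definite matrix would do.
G = [[2.0, 1.0],
     [1.0, 3.0]]
basis = orthonormalize([[1.0, 0.0], [0.0, 1.0]], G)

# <e_i, e_j> should come out as 1 when i == j and 0 otherwise, up to rounding.
for i, ei in enumerate(basis):
    for j, ej in enumerate(basis):
        print(i, j, round(dot(ei, ej, G), 12))
```

Note that the resulting basis is orthonormal with respect to `G`, not with respect to the ordinary component-wise dot product. What counts as "perpendicular" is decided entirely by the choice of metric.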

Recall that, with this definition of a “dot product” available, we say two vectors are orthogonal if \(\langle u,v \rangle = 0\).

Given that, we can easily see \(\langle u + v, u + v \rangle = |u|^2 + |v|^2 + 2\langle u,v \rangle\). (We’re only considering vector spaces over the field of reals, where we have \(\langle u,v\rangle = \langle v,u\rangle\).)
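Spelled out step by step, the expansion uses nothing more than linearity of the dot product in each argument, followed by symmetry in the last step:

\[ \begin{aligned} \langle u + v, u + v \rangle &= \langle u, u + v \rangle + \langle v, u + v \rangle \\ &= \langle u, u \rangle + \langle u, v \rangle + \langle v, u \rangle + \langle v, v \rangle \\ &= |u|^2 + |v|^2 + 2\langle u, v \rangle \end{aligned} \]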

So for two orthogonal vectors \(u\) and \(v\), we have \(|u + v|^2 = |u|^2 + |v|^2\). Well, that’s the “Pythagorean theorem”, once we establish by observation (i.e. empirically) that these notions correspond to how we conceive of angles and perpendicularity in ordinary planar geometry. That correspondence holds for the metric \(\mathbf{g} \hat{e}_i \hat{e}_j = \delta_{ij}\), where \(\delta_{ij} = 0\) for \(i \neq j\) and \(\delta_{ij} = 1\) for \(i = j\). By observation, \(u+v\) corresponds to the “hypotenuse” of the triangle whose sides are represented by \(u\) and \(v\).

To see the point I’m trying to make, it is worth stating what we’ve arrived at explicitly –

If two vectors \(u\) and \(v\) are orthogonal according to a metric \(\mathbf{g}\), then \(|u+v|^2 = |u|^2 + |v|^2\). (Note “orthogonal” means \(\langle u, v \rangle = 0\).)
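A quick numerical check shows that this statement does not depend on the metric being the familiar one. The sketch below assumes a 2-dimensional space and an example metric matrix `G` that I picked arbitrarily (symmetric, positive definite, not the identity). The vectors `u` and `v` are chosen so that \(\langle u, v \rangle = 0\) under `G`, even though they are not perpendicular in the ordinary sense. The names `dot`, `G` and `I` are made up for this example.

```python
def dot(u, v, G):
    """<u, v> = sum_ij u_i G_ij v_j for a metric given by the matrix G."""
    return sum(u[i] * G[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))

G = [[2.0, 1.0],
     [1.0, 3.0]]   # assumed example metric
I = [[1.0, 0.0],
     [0.0, 1.0]]   # the ordinary metric, delta_ij

u = [1.0, 0.0]
v = [1.0, -2.0]
s = [ui + vi for ui, vi in zip(u, v)]  # u + v

# Under G: u and v are orthogonal, and the squared lengths add up.
print(dot(u, v, G))                        # 0.0
print(dot(u, u, G) + dot(v, v, G))         # 12.0
print(dot(s, s, G))                        # 12.0

# Under the ordinary metric the same u and v are not orthogonal,
# and the squared lengths do not add up.
print(dot(u, v, I))                        # 1.0
print(dot(u, u, I) + dot(v, v, I))         # 6.0
print(dot(s, s, I))                        # 8.0
```

The relation \(|u+v|^2 = |u|^2 + |v|^2\) holds exactly when \(\langle u, v \rangle = 0\) for whichever metric is in use, which is why it reads more like a restatement of the definition than a separate discovery.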

At this point, given the definition of the dot product, \(|u+v|^2 = |u|^2 + |v|^2 + 2\langle u,v\rangle\) follows by ordinary algebra, so the statement that \(\langle u,v\rangle = 0\) implies \(|u+v|^2 = |u|^2 + |v|^2\) is just like … duh!

In conclusion, we need the definition of a dot product to define the notion of a “right angle”. By the time we have that though, we’ve already assumed the form of the metric to be \(\langle u,v\rangle = \sum_{i}u_iv_i\) given an orthonormal set of basis vectors \(\{\hat{e}_i\}\) and there is no more “theorem” left to be “proven”. That’s the gist of my critique.

Furthermore, we can also see that taking the length definition according to this calculation, we can “derive” all the basic properties of “Euclidean geometry”.

Btw, Prof. C. K. Raju also rejects calling this a “theorem”, preferring to call it “Baudhayana’s calculation” instead. But the reason for his dismissal is that, according to him, “reasoning from axioms” constitutes a hegemony in mathematics and is inferior to “reasoning from facts”. The dismissal is the same, though the rationale behind it is perhaps differently motivated. I do agree with his view that this has a strong bearing on math education, where the western approach has largely gone with “reasoning from facts” for pedagogy, but we’re still stuck in India with the “Greek gods”.