# Automatic Differentiation: Dual numbers & Taylor numbers

In the earlier post, we saw how to compute the derivative of a function as a function in its own right. Often we don't need that symbolic form; we just want the value of a function’s derivative to be calculated while the function expression itself is being evaluated.

Given a normal function expression (in Haskell, for the moment), it is possible to reuse the function definition to calculate its derivative using the idea of dual numbers.

For example, you may have a simple squaring function defined like this -

```haskell
square :: (Num x) => x -> x
square x = x * x
```


Since dual numbers obey the normal algebra of numbers, it should be possible to use this same definition of square but calculate its derivative also, by passing a dual number instead.

The definition of dual numbers is simple - it is just a pair of values.

```haskell
data Dual x = Dual x x deriving (Show)

-- f adn = the function part
-- df adn = the derivative part
f (Dual f0 _) = f0
df (Dual _ df0) = df0
```
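
One standard way to see why a pair is enough (a motivation, not something the module itself needs): read Dual u u' as the number $$u + u'\varepsilon$$, where $$\varepsilon$$ is a formal symbol satisfying $$\varepsilon^2 = 0$$. Multiplying two such numbers then forces the product rule onto the second component:

$$(u + u'\varepsilon)(v + v'\varepsilon) = uv + (u'v + uv')\varepsilon + u'v'\varepsilon^2 = uv + (u'v + uv')\varepsilon$$

which is exactly the rule the * operator implements in the Num instance for Dual.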


You can create input values as dual numbers using a simple value creator like this -

```haskell
x :: (Num a) => a -> Dual a
x a = Dual a 1
```


Now, if you want to calculate square 3 along with its derivative, you can call square (Dual.x 3). The (Dual.x 3) sub-expression produces a dual number, which square accepts just as happily as an ordinary number. The * operator within square then invokes the multiplication rule for dual numbers and produces a dual number as the result.
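As a quick check of that claim, here is a minimal, self-contained sketch, repeating just enough of the pieces above (the Num instance for Dual, the x constructor, and square) so that it runs on its own:

```haskell
-- Minimal sketch: just enough of the Dual Num instance to run square.
data Dual a = Dual a a deriving (Show, Eq)

instance (Num a) => Num (Dual a) where
  Dual u u' + Dual v v' = Dual (u + v) (u' + v')
  Dual u u' * Dual v v' = Dual (u * v) (u' * v + u * v')
  negate (Dual u u')    = Dual (negate u) (negate u')
  abs    (Dual u u')    = Dual (abs u) (u' * signum u)
  signum (Dual u _)     = Dual (signum u) 0
  fromInteger n         = Dual (fromInteger n) 0

x :: (Num a) => a -> Dual a
x a = Dual a 1

square :: (Num x) => x -> x
square x = x * x
```

Here square (x 3) evaluates to Dual 9 6: the function value 3² = 9 in the first slot and the derivative 2·3 = 6 in the second.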

If you’ve already read the first post, the following definitions for dual numbers will look familiar, since they are very close to what we did there. The code is available on github as the Dual module.

```haskell
-- Basic arithmetic operations
instance (Num n) => Num (Dual n) where
    a + b = Dual (f a + f b) (df a + df b)
    a * b = Dual (f a * f b) (df a * f b + f a * df b)
    negate a = Dual (- f a) (- df a)
    abs a = Dual (abs (f a)) (df a * signum (f a))
    signum a = Dual (signum (f a)) 0
    fromInteger x = Dual (fromInteger x) 0

instance (Eq n) => Eq (Dual n) where
    a == b = f a == f b

instance (Eq n, Ord n) => Ord (Dual n) where
    a <= b = f a <= f b

-- Reciprocal
instance (Fractional n) => Fractional (Dual n) where
    fromRational x = Dual (fromRational x) 0
    recip a = Dual (1 / f a) (- df a / (f a * f a))

-- Scientific functions
instance (Floating n) => Floating (Dual n) where
    pi = Dual pi 0
    exp a = Dual (exp (f a)) (df a * exp (f a))
    log a = Dual (log (f a)) (df a / f a)
    sin a = Dual (sin (f a)) (df a * cos (f a))
    cos a = Dual (cos (f a)) (- df a * sin (f a))
    asin a = Dual (asin (f a)) (df a / sqrt (1 - f a * f a))
    acos a = Dual (acos (f a)) (- df a / sqrt (1 - f a * f a))
    atan a = Dual (atan (f a)) (df a / (1 + f a * f a))
    sinh a = Dual (sinh (f a)) (df a * cosh (f a))
    cosh a = Dual (cosh (f a)) (df a * sinh (f a))
    asinh a = Dual (asinh (f a)) (df a / sqrt (1 + f a * f a))
    acosh a = Dual (acosh (f a)) (df a / sqrt (f a * f a - 1))
    atanh a = Dual (atanh (f a)) (df a / (1 - f a * f a))
```
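To see the chain rule emerge from these definitions, here is a self-contained sketch (only the Floating methods actually exercised are spelled out; the rest of the instance is as above) that differentiates $$e^{\sin x}$$ at $$x = 0$$, where the true derivative is $$\cos 0 \cdot e^{\sin 0} = 1$$:

```haskell
-- Sketch: the chain rule falls out of composing the per-operation rules.
data Dual a = Dual a a deriving (Show)

f :: Dual a -> a
f (Dual f0 _) = f0

df :: Dual a -> a
df (Dual _ df0) = df0

instance (Num a) => Num (Dual a) where
  a + b         = Dual (f a + f b) (df a + df b)
  a * b         = Dual (f a * f b) (df a * f b + f a * df b)
  negate a      = Dual (negate (f a)) (negate (df a))
  abs a         = Dual (abs (f a)) (df a * signum (f a))
  signum a      = Dual (signum (f a)) 0
  fromInteger n = Dual (fromInteger n) 0

instance (Fractional a) => Fractional (Dual a) where
  fromRational r = Dual (fromRational r) 0
  recip a        = Dual (recip (f a)) (negate (df a) / (f a * f a))

instance (Floating a) => Floating (Dual a) where
  pi    = Dual pi 0
  exp a = Dual (exp (f a)) (df a * exp (f a))
  log a = Dual (log (f a)) (df a / f a)
  sin a = Dual (sin (f a)) (df a * cos (f a))
  cos a = Dual (cos (f a)) (negate (df a) * sin (f a))
  -- the remaining Floating methods are omitted in this sketch

x :: (Num a) => a -> Dual a
x a = Dual a 1
```

Evaluating df (exp (sin (x 0))) gives 1.0: sin contributes cos 0 to the derivative part, exp multiplies in exp (sin 0), and no explicit chain-rule code is needed anywhere.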


## Taylor numbers

With dual numbers, we can only calculate the first derivative. What if we also want the second or higher order derivatives? Can we arrange that?

We can exploit the fact that the n-th derivative is the first derivative of the (n-1)-th derivative to define a recursive type that calculates all the derivatives needed for the Taylor expansion of a function around a point, using only the function definition itself. You get a supercharged version of dual numbers that can produce the full Taylor expansion, so we call them “Taylor numbers” here.

```haskell
data Taylor x = Taylor x (Taylor x)
```


Notice how this mimics the recursive definition we did in dmath.hs. It basically stands for the infinite list of derivatives you can calculate through a function definition.

Since we can now calculate n-th derivatives, we can define the corresponding utility function dfn for that.

```haskell
-- f adn = the function part
-- df adn = the derivative part
f (Taylor f0 _) = f0
df (Taylor _ df0) = df0

-- Recursive calculation of the nth derivative.
dfn 0 = f
dfn n = dfn (n-1) . df
```


Similar to dual numbers, we need a mechanism to construct our “Taylor numbers”.

```haskell
x :: (Num a) => a -> Taylor a
x a = Taylor a 1
```


... and again, as with dual numbers, we can define operations for Taylor numbers recursively pretty easily.

```haskell
-- Basic arithmetic operations
instance (Num n) => Num (Taylor n) where
    a + b = Taylor (f a + f b) (df a + df b)
    a * b = Taylor (f a * f b) (df a * b + a * df b)
    negate a = Taylor (- f a) (- df a)
    abs a = Taylor (abs (f a)) (df a * signum a)
    signum a = Taylor (signum (f a)) 0
    fromInteger x = Taylor (fromInteger x) 0

instance (Eq n) => Eq (Taylor n) where
    a == b = f a == f b

instance (Eq n, Ord n) => Ord (Taylor n) where
    a <= b = f a <= f b

-- Reciprocal
instance (Fractional n) => Fractional (Taylor n) where
    fromRational x = Taylor (fromRational x) 0
    recip a = Taylor (1 / f a) (- df a / (a * a))

-- Scientific functions
instance (Floating n) => Floating (Taylor n) where
    pi = Taylor pi 0
    exp a = Taylor (exp (f a)) (df a * exp a)
    log a = Taylor (log (f a)) (df a / a)
    sin a = Taylor (sin (f a)) (df a * cos a)
    cos a = Taylor (cos (f a)) (- df a * sin a)
    asin a = Taylor (asin (f a)) (df a / sqrt (1 - a * a))
    acos a = Taylor (acos (f a)) (- df a / sqrt (1 - a * a))
    atan a = Taylor (atan (f a)) (df a / (1 + a * a))
    sinh a = Taylor (sinh (f a)) (df a * cosh a)
    cosh a = Taylor (cosh (f a)) (df a * sinh a)
    asinh a = Taylor (asinh (f a)) (df a / sqrt (1 + a * a))
    acosh a = Taylor (acosh (f a)) (df a / sqrt (a * a - 1))
    atanh a = Taylor (atanh (f a)) (df a / (1 - a * a))
```
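To see the recursion earn its keep on a transcendental function, here is a self-contained sketch (again, only the Floating methods actually used are spelled out) that reads off the higher derivatives of $$\sin$$ at $$x = 0$$, which should cycle through 0, 1, 0, −1:

```haskell
-- Sketch: repeated derivatives of sin through the recursive Taylor type.
data Taylor a = Taylor a (Taylor a)

f :: Taylor a -> a
f (Taylor f0 _) = f0

df :: Taylor a -> Taylor a
df (Taylor _ df0) = df0

dfn :: (Integral n) => n -> Taylor a -> a
dfn 0 = f
dfn n = dfn (n - 1) . df

instance (Num a) => Num (Taylor a) where
  a + b         = Taylor (f a + f b) (df a + df b)
  a * b         = Taylor (f a * f b) (df a * b + a * df b)
  negate a      = Taylor (negate (f a)) (negate (df a))
  abs a         = Taylor (abs (f a)) (df a * signum a)
  signum a      = Taylor (signum (f a)) 0
  fromInteger n = Taylor (fromInteger n) 0

instance (Fractional a) => Fractional (Taylor a) where
  fromRational r = Taylor (fromRational r) 0
  recip a        = Taylor (recip (f a)) (negate (df a) / (a * a))

instance (Floating a) => Floating (Taylor a) where
  pi    = Taylor pi 0
  sin a = Taylor (sin (f a)) (df a * cos a)
  cos a = Taylor (cos (f a)) (negate (df a) * sin a)
  -- the remaining Floating methods are omitted in this sketch

x :: (Num a) => a -> Taylor a
x a = Taylor a 1
```

With this, map (\k -> dfn k (sin (x 0))) [0..3] produces the familiar cycle, with the sin and cos rules calling each other recursively through the Taylor-valued derivative parts.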


The code is, again, available on github as the Taylor module.

The main difference from dual numbers to note here is that the second argument of the Taylor constructor can no longer be a straight numeric calculation, but must recursively construct a Taylor number itself. Apart from that, the code reads very similarly.

So, given the same square function, we can pass it a Taylor number and calculate the first, second, and higher derivatives like this -

```
> let s3 = square (Taylor.x 3)
> f s3
9
> f (df s3)
6
> f (df (df s3))
2
> f (df (df (df s3)))
0
```


Note: The complexity of calculating the higher order derivatives goes up pretty quickly even if it ends up being 0 at some point due to the way we’ve defined the expressions. To get around this, the TaylorZ module augments the definition with a “zero” so there are also contracting constructs which reduce complexity to something more manageable.

## Calculus in coinductive form

(25 July 2022) Recently, I came across this reference on Calculus in coinductive form (download pdf) which presents a logical view of what’s going on with this kind of formulation of “Taylor numbers”. The paper uses coinduction to represent the Taylor series of a function around a point as a stream of progressively higher order derivatives … which is exactly what the Taylor type is all about.

The only additional piece we need in order to use Taylor numbers to calculate values of a function around a given point is an evaluator that sums up to k terms of the series –

```haskell
teval 0 t x = f t
teval k t x = (teval (k-1) t x) + (dfn k t) * (x ^ k) / fromIntegral (product [1..k])
```


Furthermore, if we construct Taylor numbers directly using the constructor, rather than passing values through a known function (which is what we used them for in automatic differentiation), we can express the solutions of simple differential equations quite directly. For example, $$e^x$$ around $$x = 0$$ can be represented using its Taylor series, which captures the fact that $$e^x$$ is the solution of the differential equation $$dy/dx = y$$ with $$y(0) = 1$$ as the initial condition. We capture that as –

```haskell
e = Taylor 1 e
```


We can now calculate $$e^x$$ to, say, 10 terms of its Taylor expansion using approx_exp = teval 10 e. If we do that for x = 0.5, we get approx_exp 0.5 = 1.6487212706873655.

We can do the same for, say, $$\sin(x)$$ like this -

```haskell
s = Taylor 0 (Taylor 1 (-s))
```


where we take $$\sin$$ to be the function meeting the differential equation $$d^2y/dx^2 = -y$$ with $$y(0) = 0$$ and $$y'(0) = 1$$ as initial conditions. With that, we get teval 10 s 0.5 == 0.4794255386164159 and teval 20 s 0.5 == sin 0.5.
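Putting these pieces together, the coinductive style can be sketched end to end as a self-contained program. The explicit Taylor Double type signatures on e and s are additions here, to pin down the numeric type that the post leaves implicit:

```haskell
-- Sketch: Taylor numbers built directly from differential equations,
-- plus the k-term series evaluator.
data Taylor a = Taylor a (Taylor a)

f :: Taylor a -> a
f (Taylor f0 _) = f0

df :: Taylor a -> Taylor a
df (Taylor _ df0) = df0

dfn :: (Integral n) => n -> Taylor a -> a
dfn 0 = f
dfn n = dfn (n - 1) . df

instance (Num a) => Num (Taylor a) where
  a + b         = Taylor (f a + f b) (df a + df b)
  a * b         = Taylor (f a * f b) (df a * b + a * df b)
  negate a      = Taylor (negate (f a)) (negate (df a))
  abs a         = Taylor (abs (f a)) (df a * signum a)
  signum a      = Taylor (signum (f a)) 0
  fromInteger n = Taylor (fromInteger n) 0

-- Sum the first k+1 terms of the Taylor series at offset x.
teval :: (Fractional a) => Int -> Taylor a -> a -> a
teval 0 t _ = f t
teval k t x = teval (k - 1) t x + dfn k t * (x ^ k) / fromIntegral (product [1 .. k])

-- dy/dx = y with y(0) = 1, i.e. exp
e :: Taylor Double
e = Taylor 1 e

-- d²y/dx² = -y with y(0) = 0, y'(0) = 1, i.e. sin
s :: Taylor Double
s = Taylor 0 (Taylor 1 (negate s))
```

Here teval 10 e 0.5 reproduces the value quoted above, and teval 20 s 0.5 agrees with sin 0.5.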

Note though that we’re forced to evaluate the approximate versions only around $$x = 0$$ here, since that is where we constructed the series. More generally, if you build a Taylor number at some other point a, the x argument to teval is an offset relative to a.

This is pretty darn cool and the article is an elegant and enlightening read.