Automatic Differentiation: Dual numbers & Taylor numbers

Mar 12, 2019   #FP  #Haskell 

In the earlier post, we saw how we can calculate the derivative of a function as a function itself. We may not need to do that, but may want the value of a function’s derivative to be calculated while the function expression is being evaluated.

Given a normal function expression (in Haskell, for the moment), it is possible to reuse the function definition to calculate its derivative using the idea of dual numbers.

For example, you may have a simple squaring function defined like this -

square :: (Num x) => x -> x
square x = x * x

Since dual numbers obey the normal algebra of numbers, it should be possible to use this same definition of square but calculate its derivative also, by passing a dual number instead.

The definition of dual numbers is simple - it is just a pair of values.

data Dual x = Dual x x deriving (Show)

-- f adn = the function part
-- df adn = the derivative part
f (Dual f0 _) = f0
df (Dual _ df0) = df0

You can create input values as dual numbers using a simple value creator like this -

x :: (Num a) => a -> Dual a
x a = Dual a 1

Now, if you want to calculate square 3 and its derivative, you can call square (Dual.x 3). The (Dual.x 3) sub-expression produces a dual number, which square knows how to calculate its expression with. The * operator within the square function then invokes the multiplication rule for dual numbers and produces a dual number as the result.

If you’ve already read the first post, the following definitions for dual numbers will make sense since it is very close to that. The code is available on github as the Dual module.

-- Basic arithmetic operations
instance (Num n) => Num (Dual n) where
    a + b = Dual (f a + f b) (df a + df b)
    a * b = Dual (f a * f b) (df a * f b + f a * df b)
    negate a = Dual (- f a) (- df a)
    abs a = Dual (abs (f a)) (df a * signum (f a))
    signum a = Dual (signum (f a)) 0
    fromInteger x = Dual (fromInteger x) 0

instance (Eq n) => Eq (Dual n) where
    a == b = f a == f b

instance (Eq n, Ord n) => Ord (Dual n) where
    a <= b = f a <= f b

-- Reciprocal
instance (Fractional n) => Fractional (Dual n) where
    fromRational x = Dual (fromRational x) 0
    recip a = Dual (1 / f a) (- df a / (f a * f a))

-- Scientific functions
instance (Floating n) => Floating (Dual n) where
    pi = Dual pi 0
    exp a = Dual (exp (f a)) (df a * exp (f a))
    log a = Dual (log (f a)) (df a / f a)
    sin a = Dual (sin (f a)) (df a * cos (f a))
    cos a = Dual (cos (f a)) (- df a * sin (f a))
    asin a = Dual (asin (f a)) (df a / sqrt (1 - f a * f a))
    acos a = Dual (acos (f a)) (- df a / sqrt (1 - f a * f a))
    atan a = Dual (atan (f a)) (df a / (1 + f a * f a))
    sinh a = Dual (sinh (f a)) (df a * cosh (f a))
    cosh a = Dual (cosh (f a)) (df a * sinh (f a))
    asinh a = Dual (asinh (f a)) (df a / sqrt (1 + f a * f a))
    acosh a = Dual (acosh (f a)) (df a / sqrt (f a * f a - 1))
    atanh a = Dual (atanh (f a)) (df a / (1 + f a * f a))

Taylor numbers

With dual numbers, we can only calculate the first derivative. What if we also want the second or higher order derivatives? Can we arrange that?

We can exploit the fact that the n-th derivative is the first derivative of the n-1 th derivative, to define a recursive type that can calculate all the derivates necessary for the Taylor expansion of a function around a point only using the function definition itself. You get a supercharged version of dual numbers that can produce the full Taylor expansion, so we call them “Taylor numbers” here.

data Taylor x = Taylor x (Taylor x)

Notice how this mimics the recursive definition we did in dmath.hs. It basically stands for the infinite list of derivatives you can calculate through a function definition.

Since we can now calculate n-th derivates, we can define the corresponding utility function dfn for that.

-- f adn = the function part
-- df adn = the derivative part
f (Taylor f0 _) = f0
df (Taylor _ df0) = df0

-- Recursive calculation of the nth derivative.
dfn 0 = f 
dfn n = dfn (n-1) . df

Similar to dual numbers, we need a mechanism to construct our “Taylor numbers”.

x :: (Num a) => a -> Taylor a
x a = Taylor a 1

.. and again as with dual numbers, we can define operations for Taylor numbers recursively pretty easily.

-- Basic arithmetic operations
instance (Num n) => Num (Taylor n) where
    a + b = Taylor (f a + f b) (df a + df b)
    a * b = Taylor (f a * f b) (df a * b + a * df b)
    negate a = Taylor (- f a) (- df a)
    abs a = Taylor (abs (f a)) (df a * signum a)
    signum a = Taylor (signum (f a)) 0
    fromInteger x = Taylor (fromInteger x) 0

instance (Eq n) => Eq (Taylor n) where
    a == b = f a == f b

instance (Eq n, Ord n) => Ord (Taylor n) where
    a <= b = f a <= f b

-- Reciprocal
instance (Fractional n) => Fractional (Taylor n) where
    fromRational x = Taylor (fromRational x) 0
    recip a = Taylor (1 / f a) (- df a / (a * a))

-- Scientific functions
instance (Floating n) => Floating (Taylor n) where
    pi = Taylor pi 0
    exp a = Taylor (exp (f a)) (df a * exp a)
    log a = Taylor (log (f a)) (df a / a)
    sin a = Taylor (sin (f a)) (df a * cos a)
    cos a = Taylor (cos (f a)) (- df a * sin a)
    asin a = Taylor (asin (f a)) (df a / sqrt (1 - a * a))
    acos a = Taylor (acos (f a)) (- df a / sqrt (1 - a * a))
    atan a = Taylor (atan (f a)) (df a / (1 + a * a))
    sinh a = Taylor (sinh (f a)) (df a * cosh a)
    cosh a = Taylor (cosh (f a)) (df a * sinh a)
    asinh a = Taylor (asinh (f a)) (df a / sqrt (1 + a * a))
    acosh a = Taylor (acosh (f a)) (df a / sqrt (a * a - 1))
    atanh a = Taylor (atanh (f a)) (df a / (1 + a * a))

The code is, again, available on github as the Taylor module.

The main difference with dual numbers to note here is that the second part of the Taylor constructor can no longer be straight numeric calculation, but must recursively construct Taylor numbers themselves. Apart from that, the code reads very similar.

So, given the same square function, we can pass it a Taylor number so we can calculate the first and the second derivative like this -

> let s3 = square (Taylor.x 3) 
> f s3
9
> f (df s3)
6
> f (df (df s3))
2
> f (df (df (df s3)))
0

Note: The complexity of calculating the higher order derivatives goes up pretty quickly even if it ends up being 0 at some point due to the way we’ve defined the expressions.