How ForwardDiff Works¶
ForwardDiff is an implementation of forward mode automatic differentiation (AD) in
Julia. There are two key components of this implementation: the Dual
type, and the API.
Dual Number Implementation¶
Partial derivatives are stored in the Partials{N,T}
type:
immutable Partials{N,T}
values::NTuple{N,T}
end
Overtop of this container type, ForwardDiff implements the Dual{N,T}
type:
immutable Dual{N,T<:Real} <: Real
value::T
partials::Partials{N,T}
end
This type represents an N
-dimensional dual number with the following mathematical
behavior:
where the \(a\) component is stored in the value
field and the \(b\)
components are stored in the partials
field. This property of dual numbers is the
central feature that allows ForwardDiff to take derivatives.
In order to implement the above property, elementary numerical functions on a Dual
number are overloaded to evaluate both the original function, and evaluate the derivative
of the function, propogating the derivative via multiplication. For example, Base.sin
can be overloaded on Dual
like so:
Base.sin(d::Dual) = Dual(sin(value(d)), cos(value(d)) * partials(d))
If we assume that a general function f
is composed of entirely of these elementary
functions, then the chain rule enables our derivatives to compose as well. Thus, by
overloading a plethora of elementary functions, we can differentiate generic functions
composed of them by passing in a Dual
number and looking at the output.
We won’t dicuss higher-order differentiation in detail, but the reader is encouraged to
learn about hyper-dual numbers, which extend dual numbers to higher orders by introducing
extra \(\epsilon\) terms that can cross-multiply. ForwardDiff’s Dual
number
implementation naturally supports hyper-dual numbers without additional code by allowing
instances of the Dual
type to nest within each other. For example, a second-order
hyper-dual number has the type Dual{N,Dual{N,T}}
, a third-order hyper-dual number has
the type Dual{N,Dual{N,Dual{N,T}}}
, and so on.
ForwardDiff’s API¶
The second component provided by this package is the API, which abstracts away the number
types and makes it easy to execute familiar calculations like gradients and Hessians. This
way, users don’t have to understand Dual
numbers in order to make use of the package.
The job of the API functions is to performantly seed input values with Dual
numbers,
pass the seeded value into the target function, and extract the derivative information from
the result. For example, to calculate the partial derivatives for the gradient of a function
\(f\) at an input vector \(\vec{x}\), we would do the following:
In reality, ForwardDiff does this calculation in chunks of the input vector. To provide a simple example of this, let’s examine the case where the input vector size is 4 and the chunk size is 2. It then takes two calls to \(f\) to evaluate the gradient:
This seeding process is similar for Jacobians, so we won’t rehash it here.