Tutorial

There are two ways to compute gradients with Mooncake.jl:

- through the DifferentiationInterface.jl API
- through Mooncake.jl's native API

We recommend the former to start with, especially if you want to experiment with other automatic differentiation packages.

import DifferentiationInterface as DI
import Mooncake

DifferentiationInterface.jl API

DifferentiationInterface.jl (or DI for short) provides a common entry point for every automatic differentiation package in Julia. To specify that you want to use Mooncake.jl, just create the right "backend" object (with an optional Mooncake.Config):

backend = DI.AutoMooncake(; config=nothing)
ADTypes.AutoMooncake{Nothing}(nothing)

This object is actually defined by a third package called ADTypes.jl, but re-exported by DI.

Single argument

Suppose you want to differentiate the following function

f(x) = sum(abs2, x)
f (generic function with 1 method)

on the following input

x = float.(1:3)
3-element Vector{Float64}:
 1.0
 2.0
 3.0

The naive way is to simply call DI.gradient:

DI.gradient(f, backend, x)  # slow, do not do this
3-element Vector{Float64}:
 2.0
 4.0
 6.0

This returns the correct gradient, but it is very slow because it includes the time taken by Mooncake.jl to compute a differentiation rule for f (see Mooncake.jl's Rule System). If you anticipate you will need more than one gradient, it is better to call DI.prepare_gradient on a typical (e.g. random) input first:

typical_x = rand(3)
prep = DI.prepare_gradient(f, backend, typical_x)
DifferentiationInterfaceMooncakeExt.MooncakeGradientPrep{Mooncake.Cache{Mooncake.DerivedRule{Tuple{typeof(Main.f), Vector{Float64}}, Tuple{Mooncake.CoDual{typeof(Main.f), Mooncake.NoFData}, Mooncake.CoDual{Vector{Float64}, Vector{Float64}}}, Mooncake.CoDual{Float64, Mooncake.NoFData}, Tuple{Float64}, Tuple{Mooncake.NoRData, Mooncake.NoRData}, false, Val{2}}, Float64, Tuple{Mooncake.NoTangent, Vector{Float64}}}}(Mooncake.Cache{Mooncake.DerivedRule{Tuple{typeof(Main.f), Vector{Float64}}, Tuple{Mooncake.CoDual{typeof(Main.f), Mooncake.NoFData}, Mooncake.CoDual{Vector{Float64}, Vector{Float64}}}, Mooncake.CoDual{Float64, Mooncake.NoFData}, Tuple{Float64}, Tuple{Mooncake.NoRData, Mooncake.NoRData}, false, Val{2}}, Float64, Tuple{Mooncake.NoTangent, Vector{Float64}}}(Mooncake.DerivedRule{Tuple{typeof(Main.f), Vector{Float64}}, Tuple{Mooncake.CoDual{typeof(Main.f), Mooncake.NoFData}, Mooncake.CoDual{Vector{Float64}, Vector{Float64}}}, Mooncake.CoDual{Float64, Mooncake.NoFData}, Tuple{Float64}, Tuple{Mooncake.NoRData, Mooncake.NoRData}, false, Val{2}}(MistyClosure (::Mooncake.CoDual{typeof(Main.f), Mooncake.NoFData}, ::Mooncake.CoDual{Vector{Float64}, Vector{Float64}})::Mooncake.CoDual{Float64, Mooncake.NoFData}->◌, Base.RefValue{MistyClosures.MistyClosure{Core.OpaqueClosure{Tuple{Float64}, Tuple{Mooncake.NoRData, Mooncake.NoRData}}}}(MistyClosure (::Float64)::Tuple{Mooncake.NoRData, Mooncake.NoRData}->◌), Val{2}()), 1.6925921249549871, (Mooncake.NoTangent(), [0.11751177502384214, 1.993376240821874, 1.668235788213869])))

The typical input should have the same size and type as the actual inputs you will provide later on. The contents of the preparation result themselves do not matter; what matters is that it captures everything DI.gradient needs to be fast:

DI.gradient(f, prep, backend, x)  # fast
3-element Vector{Float64}:
 2.0
 4.0
 6.0

For optimal speed, you can provide storage space for the gradient and call DI.gradient! instead:

grad = similar(x)
DI.gradient!(f, grad, prep, backend, x)  # very fast
3-element Vector{Float64}:
 2.0
 4.0
 6.0

If you also need the value of the function, check out DI.value_and_gradient or DI.value_and_gradient!:

DI.value_and_gradient(f, prep, backend, x)
(14.0, [2.0, 4.0, 6.0])
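
Since we already allocated storage for the gradient above, the in-place variant can reuse it; a brief sketch:

DI.value_and_gradient!(f, grad, prep, backend, x)  # overwrites grad, returns (14.0, [2.0, 4.0, 6.0])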

Multiple arguments

What should you do if your function takes more than one input argument? Well, DI can still handle it, assuming that you only want the derivative with respect to one of them (the first one, by convention). For instance, consider the function

g(x, a, b) = a * f(x) + b
g (generic function with 1 method)

You can easily compute the gradient with respect to x, while keeping a and b fixed. To do that, just wrap these two arguments inside DI.Constant, like so:

typical_a, typical_b = 1.0, 1.0
prep = DI.prepare_gradient(g, backend, typical_x, DI.Constant(typical_a), DI.Constant(typical_b))

a, b = 42.0, 3.14
DI.value_and_gradient(g, prep, backend, x, DI.Constant(a), DI.Constant(b))
(591.14, [84.0, 168.0, 252.0])

Note that this works even when you change the value of a or b (those are not baked into the preparation result).
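
For instance, the same prep can be reused with fresh constants (the expected values follow directly from g(x, a, b) = a * f(x) + b):

a2, b2 = -1.0, 0.0
DI.value_and_gradient(g, prep, backend, x, DI.Constant(a2), DI.Constant(b2))  # expected: (-14.0, [-2.0, -4.0, -6.0])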

If one of your additional arguments behaves like a scratch space in memory (instead of a meaningful constant), you can use DI.Cache instead.
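
As a minimal sketch (the function g_buf and its buffer below are made up for illustration):

g_buf(x, buffer) = (buffer .= abs2.(x); sum(buffer))  # buffer is scratch space, overwritten on every call
buffer = similar(typical_x)
prep_buf = DI.prepare_gradient(g_buf, backend, typical_x, DI.Cache(buffer))
DI.gradient(g_buf, prep_buf, backend, x, DI.Cache(buffer))  # expected: [2.0, 4.0, 6.0]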

Now what if you care about the derivatives with respect to every argument? You can always go back to the single-argument case by putting everything inside a tuple:

g_tup(xab) = xab[2] * f(xab[1]) + xab[3]
prep = DI.prepare_gradient(g_tup, backend, (typical_x, typical_a, typical_b))
DI.value_and_gradient(g_tup, prep, backend, (x, a, b))
(591.14, ([84.0, 168.0, 252.0], 14.0, 1.0))

You can also use the native API of Mooncake.jl, discussed below.

Beyond gradients

Going through DI allows you to compute other kinds of derivatives, like (reverse-mode) Jacobian matrices. The syntax is very similar:

h(x) = cos.(x) .* sin.(reverse(x))
prep = DI.prepare_jacobian(h, backend, x)
DI.jacobian(h, prep, backend, x)
3×3 Matrix{Float64}:
 -0.118748   0.0       -0.534895
  0.0       -0.653644   0.0
 -0.534895   0.0       -0.118748

Mooncake.jl API

Warning

Work in progress.
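
In the meantime, here is a minimal sketch of the native gradient entry points, assuming Mooncake.prepare_gradient_cache and Mooncake.value_and_gradient!! are available in your version of Mooncake.jl (their exact form may change while this section is in progress):

cache = Mooncake.prepare_gradient_cache(f, x)       # derive and cache a reverse rule for f(x)
Mooncake.value_and_gradient!!(cache, f, x)          # roughly (14.0, (Mooncake.NoTangent(), [2.0, 4.0, 6.0]))

The returned gradient is a tuple with one entry per argument, including f itself (which has no tangent here).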