Known Limitations

Mooncake.jl has a number of known qualitative limitations, which we document here.

Mutation of Global Variables

While great care is taken in this package to prevent silent errors, this is one edge case that we have yet to provide a satisfactory solution for. Consider a function of the form:

julia> const x = Ref(1.0);

julia> function foo(y::Float64)
           x[] = y
           return x[]
       end
foo (generic function with 1 method)

x is a global variable (if you refer to it in your code, it appears as a GlobalRef in the AST or lowered code). For technical reasons beyond the scope of this section, this package cannot propagate gradient information through x. foo is the identity function, so its gradient should be 1.0. However, if you differentiate this example, you'll see:

julia> rule = Mooncake.build_rrule(foo, 2.0);

julia> Mooncake.value_and_gradient!!(rule, foo, 2.0)
(2.0, (NoTangent(), 0.0))

Observe that while it has correctly computed the identity function, the gradient is zero.

The takehome: do not attempt to differentiate functions which modify global state. Uses of globals which do not involve mutating them are fine, though.
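One way to follow this advice is to pass the mutable state in as an argument rather than reaching for a global. The following is a minimal sketch of that refactoring; foo_arg and x_local are hypothetical names chosen for illustration and are not part of Mooncake.jl.

```julia
# Hypothetical variant of `foo`: the mutable state is an argument, not a global,
# so no GlobalRef is involved in the function body.
function foo_arg(x::Base.RefValue{Float64}, y::Float64)
    x[] = y        # mutates the argument rather than global state
    return x[]
end

x_local = Ref(1.0)
result = foo_arg(x_local, 2.0)   # still behaves as the identity in `y`
```

Since the Ref is now an ordinary argument, it would be passed to Mooncake.build_rrule and Mooncake.value_and_gradient!! alongside y, in the same way as the arguments in the example above. We have not reproduced the resulting output here.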

Circular References

To a large extent, Mooncake.jl does not presently support circular references in an automatic fashion. It is generally possible to hand-write solutions, so we explain some of the problems here, and the general approach to resolving them.

Tangent Types

The Problem

Suppose that you have a type such as:

mutable struct A
    x::Float64
    a::A
    function A(x::Float64)
        a = new(x)
        a.a = a
        return a
    end
end

This is a fairly canonical example of a self-referential type. There are a couple of things which will not work with it out-of-the-box. tangent_type(A) will produce a stack overflow error. To see this, note that it will in effect try to produce a tangent of type Tangent{Tuple{tangent_type(A)}} – the circular dependency on the tangent_type function causes real problems here.
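To make the circular dependency concrete, here is a toy sketch of a recursive derivation that fails in the same way. naive_tangent_type is a hypothetical stand-in written for this illustration; it is not Mooncake's implementation of tangent_type, and it only mimics the "recurse over the field types" structure described above.

```julia
mutable struct A
    x::Float64
    a::A
    function A(x::Float64)
        a = new(x)
        a.a = a
        return a
    end
end

# Hypothetical stand-in for a recursive tangent-type derivation (NOT Mooncake's code).
naive_tangent_type(::Type{Float64}) = Float64
function naive_tangent_type(::Type{T}) where {T}
    # Recurse over the field types. For `A`, one of the field types is `A` itself,
    # so this call never terminates.
    return Tuple{map(naive_tangent_type, fieldtypes(T))...}
end

overflowed = try
    naive_tangent_type(A)
    false
catch err
    err isa StackOverflowError
end
```

Deriving the type for A requires the type for A, which is the loop that a hand-written tangent type breaks.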

The Solution

In order to resolve this, you need to produce a tangent type by hand. You might go with something like

mutable struct TangentForA
    x::Float64 # tangent type for Float64 is Float64
    a::TangentForA
    function TangentForA(x::Float64)
        a = new(x)
        a.a = a
        return a
    end
end

The point here is that you can manually resolve the circular dependency using a data structure which mimics the primal type. You will, however, need to implement similar methods for zero_tangent, randn_tangent, etc, and presumably need to implement additional getfield and setfield rules which are specific to this type.
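As a rough illustration of what such hand-written functionality might look like for TangentForA, the sketch below defines two plain helper functions. The names zero_tangent_for_A and randn_tangent_for_A are hypothetical; hooking them up to Mooncake's actual zero_tangent and randn_tangent interface (whose exact method signatures are not shown here) is left out.

```julia
using Random

mutable struct TangentForA
    x::Float64 # tangent type for Float64 is Float64
    a::TangentForA
    function TangentForA(x::Float64)
        a = new(x)
        a.a = a
        return a
    end
end

# Hypothetical helpers (not Mooncake API). The constructor closes the cycle,
# so no recursion over the `a` field is needed.
zero_tangent_for_A() = TangentForA(0.0)
randn_tangent_for_A(rng::AbstractRNG) = TangentForA(randn(rng))

t0 = zero_tangent_for_A()
t1 = randn_tangent_for_A(MersenneTwister(123))
```

The design choice here is that the cycle is created once, inside the constructor, so none of the helpers ever has to traverse it.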

Circular References in General

The Problem

Consider a type of the form

mutable struct Foo
    x
    Foo() = new()
end

In this instance, tangent_type will work fine because Foo does not directly reference itself in its definition. Moreover, general uses of Foo will be fine.

However, it's possible to construct an instance of Foo with a circular reference:

f = Foo()
f.x = f

This is actually fine provided we never attempt to call zero_tangent / randn_tangent / similar functionality on f once we've set its x field to itself. If we attempt to call such a function, we'll find ourselves with a stack overflow.

The Solution

This is a little trickier to handle. You could specialise zero_tangent etc. for Foo, but this is something of a pain. Fortunately, it seems to be incredibly rare that this is ever a problem in practice. If we gain evidence that this is often a problem in practice, we'll look into supporting zero_tangent etc. automatically for this case.
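For readers who do hit this case, one general technique for traversing a cyclic structure without overflowing the stack is to record every object already visited in an IdDict. The sketch below applies that idea to Foo. FooZero and zero_like are hypothetical names invented for this example; they are not Mooncake types or functions, and this is not a drop-in zero_tangent method.

```julia
mutable struct Foo
    x
    Foo() = new()
end

# Hypothetical stand-in for a tangent container (not a Mooncake type).
mutable struct FooZero
    x
    FooZero() = new()
end

# Hypothetical cycle-aware "zero" construction.
zero_like(v::Float64, seen::IdDict) = 0.0
function zero_like(f::Foo, seen::IdDict)
    haskey(seen, f) && return seen[f]   # already visited: reuse it, closing the cycle
    z = FooZero()
    seen[f] = z                          # register before recursing into fields
    if isdefined(f, :x)
        z.x = zero_like(f.x, seen)
    end
    return z
end
zero_like(v) = zero_like(v, IdDict())

f = Foo()
f.x = f
z = zero_like(f)   # terminates, and the result mirrors the cycle in `f`
```

Registering the new object in the dictionary before recursing is what prevents the infinite descent: the second visit to f returns the cached result rather than starting over.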

Tangent Generation and Pointers

The Problem

In many use cases, a pointer provides the address of the start of a block of memory which has been allocated to e.g. store an array. However, we cannot get any of this context from the pointer itself – by just looking at a pointer, I cannot know whether its purpose is to refer to the start of a large block of memory, some proportion of the way through a block of memory, or even to keep track of a single address.

Recall that the tangent to a pointer is another pointer:

julia> Mooncake.tangent_type(Ptr{Float64})
Ptr{Float64}

Plainly I cannot implement a method of zero_tangent for Ptr{Float64} because I don't know how much memory to allocate.

This is, however, fine if a pointer appears halfway through a function, having been derived from another data structure. For example:

function foo(x::Vector{Float64})
    p = pointer(x, 2)
    return unsafe_load(p)
end

rule = Mooncake.build_rrule(Tuple{typeof(foo), Vector{Float64}})
Mooncake.value_and_gradient!!(rule, foo, [5.0, 4.0])

# output
(4.0, (NoTangent(), [0.0, 1.0]))

The Solution

This is only really a problem for tangent / fdata / rdata generation functionality, such as zero_tangent. As a workaround, AD testing functionality permits users to pass in CoDuals. So if you are testing something involving a pointer, you will need to construct its tangent yourself, and pass a CoDual to e.g. Mooncake.TestUtils.test_rule.
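Constructing the tangent yourself amounts to allocating tangent storage whose size you know from context, and taking a pointer into it at the same offset as the primal pointer. The sketch below shows only that step, in plain Julia; the variable names are illustrative, and wrapping the resulting pair in a CoDual and passing it to the testing functionality is not shown.

```julia
x = [5.0, 4.0]
dx = zeros(length(x))   # tangent storage: we know the length, the pointer does not

loaded, dloaded = GC.@preserve x dx begin
    p = pointer(x, 2)     # primal pointer to the second element
    dp = pointer(dx, 2)   # hand-built tangent pointer at the same offset
    # `p` and `dp` are only valid while `x` and `dx` are preserved.
    (unsafe_load(p), unsafe_load(dp))
end
```

The pair (p, dp) is the primal and tangent that would be bundled together. Note that both pointers are only meaningful while the arrays they point into are kept alive.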

While pointers tend to be a low-level implementation detail in Julia code, you could in principle actually be interested in differentiating a function of a pointer. In this case, you will not be able to use Mooncake.value_and_gradient!! as this requires the use of zero_tangent. Instead, you will need to use lower-level (internal) functionality, such as Mooncake.__value_and_gradient!!, or use the rule interface directly.

Honestly, your best bet is just to avoid differentiating functions whose arguments are pointers if you can.