Architecture¶

sparsegrad performs forward mode automatic differentiation on vector valued functions. A sparsegrad forward value is a pair \((y, \partial y / \partial x)\), where \(y\) is referred to as value and the Jacobian \(\partial y / \partial x\) is referred to as dvalue.
During the evaluation, sparsegrad values are propagated in place of standard numpy arrays. This simple approach gives good results when only the first order derivative is required. Most importantly, it does not involve solving a graph coloring problem on the whole computation graph. For large problems, storing the complete computation graph is very expensive.
When a mathematical function \(f\) is evaluated, the derivative is propagated using the chain rule:

\[ \frac{\partial f(y)}{\partial x} = f'(y) \, \frac{\partial y}{\partial x} \]
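The propagation of (value, dvalue) pairs can be sketched as follows. This is an illustrative stand-in, not sparsegrad's internal implementation: the `Forward` class and the `seed` and `sqrt` helpers are made-up names, and numpy and scipy are assumed to be available.

```python
import numpy as np
import scipy.sparse as sp

class Forward:
    """A pair (value, dvalue): the value y and its sparse Jacobian dy/dx."""
    def __init__(self, value, dvalue):
        self.value = np.asarray(value)
        self.dvalue = dvalue  # scipy sparse matrix

def seed(x):
    """Mark x as the independent variable: dx/dx is the identity."""
    x = np.asarray(x, dtype=float)
    return Forward(x, sp.identity(x.size, format='csr'))

def sqrt(f):
    """Propagate through elementwise sqrt via the chain rule:
    d sqrt(y)/dx = diag(1 / (2 sqrt(y))) @ dy/dx."""
    y = np.sqrt(f.value)
    return Forward(y, sp.diags(0.5 / y) @ f.dvalue)

x = seed(np.array([1.0, 4.0, 9.0]))
y = sqrt(x)
print(y.value)             # [1. 2. 3.]
print(y.dvalue.toarray())  # diagonal matrix with 1/(2*sqrt(x)) on the diagonal
```

Because each elementwise operation only multiplies the running Jacobian by a diagonal matrix, the sparsity of dvalue is preserved through the whole evaluation.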
Support for numpy¶

To support numpy functions, broadcasting must be included. In the discussion below, scalars are treated as one element vectors.
Application of a numpy function \(f\) involves implicit broadcasting from vector \(x\), with its proper shape, to vector \(\hat{x}\), with the shape of \(y\), if both shapes are not the same. This can be denoted by multiplication by a broadcast matrix \(B\). The result of function evaluation is then

\[ y = f(B x) \]

and the derivative is

\[ \frac{\partial y}{\partial x} = \frac{\partial f}{\partial \hat{x}} \, B. \]

The Jacobian of numpy function application is a diagonal matrix: if \(f'\) is the numpy elementwise derivative of \(f\) with respect to \(\hat{x}\), then

\[ \frac{\partial f}{\partial \hat{x}} = \operatorname{diag}\!\left( f'(B x) \right). \]
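The broadcast-as-matrix view can be made concrete with a small sketch, assuming numpy and scipy are available. The matrix \(B\), the function \(f\) and its derivative are chosen for this example only; a one-element \(x\) broadcast against a length-3 result corresponds to \(B\) being a 3x1 matrix of ones.

```python
import numpy as np
import scipy.sparse as sp

# x has one element; it broadcasts against a length-3 result,
# so B is the 3x1 matrix of ones: B @ x repeats x three times.
x = np.array([2.0])
B = sp.csr_matrix(np.ones((3, 1)))

f = np.square               # elementwise f(u) = u**2
fprime = lambda u: 2.0 * u  # elementwise derivative f'(u)

y = f(B @ x)                     # y = f(Bx)
J = sp.diags(fprime(B @ x)) @ B  # dy/dx = diag(f'(Bx)) B

print(y)            # [4. 4. 4.]
print(J.toarray())  # 3x1 Jacobian: each output depends on the single input
```

The diagonal factor captures the elementwise derivative, while \(B\) records how inputs were replicated; their product is the full Jacobian of the broadcast application.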
Optimizations¶

As a major optimization, the Jacobian matrices are stored as a product of a scalar \(s\), a diagonal matrix \(D\) and a general sparse matrix \(M\):

\[ J = s \, D \, M \]

The general parts \(M\) are constant shared objects. If not specified, the diagonal parts and the general parts are assumed to be identity.
The factored matrix is referred to as sdcsr in sparsegrad code. It allows the most common operations to be performed without rebuilding the general matrix part:

\[ c \, (s \, D \, M) = (c s) \, D \, M \]

\[ \operatorname{diag}(v) \, (s \, D \, M) = s \, (v \circ D) \, M \]

with \(\circ\) denoting elementwise multiplication.
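A minimal sketch of this factorization, assuming numpy and scipy; the class and method names (`Sdcsr`, `scale`, `diag_mul`, `tocsr`) are illustrative, not sparsegrad's API. Scalar multiplication touches only \(s\), and diagonal multiplication touches only \(D\), so the shared sparse part \(M\) is never copied or rebuilt.

```python
import numpy as np
import scipy.sparse as sp

class Sdcsr:
    """Jacobian stored in factored form J = s * diag(d) * M."""
    def __init__(self, s, d, M):
        self.s = s  # scalar factor
        self.d = d  # diagonal factor, stored as a 1-D array
        self.M = M  # general sparse part (constant, shared)

    def scale(self, c):
        """c * (s D M) = (c s) D M -- only the scalar changes."""
        return Sdcsr(c * self.s, self.d, self.M)

    def diag_mul(self, v):
        """diag(v) * (s D M) = s (v o D) M -- elementwise update of D."""
        return Sdcsr(self.s, v * self.d, self.M)

    def tocsr(self):
        """Materialize the full Jacobian s * diag(d) * M."""
        return self.s * sp.diags(self.d) @ self.M

M = sp.identity(3, format='csr')
J = Sdcsr(1.0, np.ones(3), M)
J2 = J.scale(2.0).diag_mul(np.array([1.0, 2.0, 3.0]))
print(J2.tocsr().toarray())  # diag(2, 4, 6); the same M object is shared throughout
```

Both updates are O(n) on the diagonal (or O(1) for the scalar), compared with rebuilding a sparse matrix product, which is why these cases are special-cased in the factored representation.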
Backward mode¶

Backward mode is currently not implemented because of prohibitive memory requirements for large calculations. In backward mode, each intermediate value has to be accessed twice: during the forward evaluation of the function value, and during the backward evaluation of the derivative. The memory required to store intermediate values is prohibitive for functions with a large number of outputs, and grows linearly with the number of steps in the computation.