dO: A Differentiable Engine for Deep Lens Design of Computational Imaging Systems

Computational imaging systems algorithmically post-process acquired images, either to reveal physical quantities of interest or to increase image quality, e.g., by deblurring. Designing a computational imaging system requires the co-design of optics and algorithms, and recently Deep Lens systems have been proposed in which both components are designed jointly using data-driven end-to-end training. However, progress on this exciting concept has so far been hampered by the lack of differentiable forward simulations for complex optical design spaces. Here, we introduce dO (DiffOptics), which provides derivative insights into the design pipeline by chaining variable parameters and their gradients to an error metric through differentiable ray tracing. However, straightforward back-propagation of many millions of rays requires unaffordable device memory, an issue not resolved by prior works. dO alleviates this issue using two customized memory-efficient techniques: differentiable ray-surface intersection and adjoint back-propagation. Broad application examples demonstrate the versatility and flexibility of dO, including classical aspherical, double-Gauss, and freeform lens designs, reverse engineering for metrology, and joint optics-network designs in computational imaging applications. We believe dO enables a radically new approach to computational imaging system design and relevant research domains.

imaging modalities, and acquisition images are regarded as optically encoded information about the physical world. Computational methods, e.g., model-based numerical optimization or data-driven machine learning, are applied to the raw images to decode this information. Examples are wavefront coding [3] and coded aperture variants [4], [5], [6] to extend depth-of-field or to perceive depth, cameras that capture individual rays (i.e., light fields) [7], [8], and cameras capable of high dynamic range [9].
The design of computational imaging systems is particularly challenging in that the design tradeoff spans both hardware and software. As such, the final design typically occupies a very high-dimensional design space, which complicates easy and intuitive solutions. One way to explore the design space and evolve the co-design is stochastic optimization by gradient descent, as shown in prior end-to-end trained Deep Lens works [10], [11], [12], [13], [14]. Such an approach requires the derivative of the final image with respect to individual optical design parameters. This derivative-aware modeling is non-trivial for complex lens groups, and hence fully differentiable optical models have so far only been available for simplistic design spaces such as a diffractive optical element or a lensless mask, as demonstrated by various application-oriented deep optics designs, for image classification [15], depth estimation [16], [17], [18], lensless imaging [19], HDR acquisition [20], [21], extended depth-of-field [22], computational microscopy [23], [24], and for versatile purposes [25], [26]. See also [27], [28] for perspectives. These are a long way from the expressive power of ray-tracing based systems like ZEMAX [1] and Code V [2].
To partially tackle these challenges, derivative-aware lens design engines [29], [30], [31] were inspired by automatic differentiation (AD) [32], a fundamental technique in deep learning. The main distinction of a derivative-aware engine compared to conventional ones is differentiability: the availability of derivative information relating design parameters to the error metric. Via differential ray tracing [30], design parameters and their gradients are chained to the error metric through a so-called computational graph, on which back-propagation yields how each design parameter should quantitatively change to reduce the error metric. Together with gradient-based optimization, the obtained derivatives provide a search direction in the hyper-parameter space to locally guide the evolution of the current design, improving performance in terms of the error metric. Combined with deep neural networks, such a differentiable engine can be employed for generating lens designs [33], [34] or for image restoration [35], [36]. Recent works [13], [14] rely on differentiable ray tracing for end-to-end designs in computational imaging. Similar trends also appear in other research domains that solve inverse problems using this additional derivative information, in computer graphics [37], 3D reconstruction [38], ptychography [39], [40], head-mounted display calibration [41], lens metrology [42], and phase microscopy [43].
However, one main challenge of directly applying back-propagation to gradient computation is memory hunger: the underlying computational graph grows large when many millions of Monte Carlo rays are sampled, as required to render a megapixel image for a design algorithm to process. As such, intermediate variables and computations can easily fill up device memory, limiting the scope for scaling up and hence reducing the overall performance of a derivative-aware pipeline. Compared to model-based (or physics-based) learning scenarios [44], our applications require the image formation to be computed in a Monte Carlo fashion and cannot be stated explicitly, and thus memory-efficient techniques that rely on known image formation models like [44] are not directly applicable. A similar memory issue also exists in differentiable rendering for end-to-end learning [37], [45], and solutions have been proposed [46]. However, an optical design engine differs from a general-purpose graphics renderer in that optical surface geometries are well-parameterized surfaces (e.g., aspheric or freeform splines) rather than discrete meshes. For non-spherical representations, the parameterization is so complicated that there is no analytical solution for the ray-surface intersection. As a result, ray-surface intersections dominate the computation and become the memory bottleneck in rendering the sensor image and its derivatives with respect to the design parameters. It is for such problems, specific to a differentiable optical design engine, that this paper seeks crafted solutions.
This memory issue has not been fully addressed in previous derivative-aware ray tracing works [13], [29], [30], [33], owing to their different application-oriented goals. In [13], the implementation of a differentiable ray tracing scheme enables image rendering, and thus rays are traced backwards, starting from the sensor plane, through the lenses, landing at the scene. Gradients are naively accumulated by unrolling the iterations of a ray-surface intersection root finder, i.e., straightforward back-propagation. This most frequent yet time-critical operation consumes a large amount of memory and compute resources, which limits the number of rays permissible per render batch. Further, the gradient aggregation at the sensor plane additionally suffers from the large number of rays sampled per pixel, and [13] sidesteps this issue by splitting the sensor image into blocks and rendering them independently. However, this issue can be addressed in a more principled fashion, where gradients are computed and back-propagated in a way that is efficient in both compute time and memory. This is made possible by computing the gradients directly instead of back-propagating through the full computational graph, as explained below.
In this work we propose dO (DiffOptics), a derivative-aware lens design optimization pipeline using differentiable ray tracing. To back-propagate from the error metric to the variable design parameters in a memory-efficient way, we analyze the gradients of the ray-surface intersection and introduce an adjoint back-propagation scheme, ensuring both validity and efficiency of the gradient computation. Based on the obtained derivatives, advanced algorithms can be developed for specific design applications.
Various applications are demonstrated using the proposed derivative-aware pipeline, ranging from classical usage such as spherical and aspherical lens group design optimization, local sensitivity analysis, freeform surface optimization, and setup misalignment estimation, to advanced computational imaging applications. We believe these examples highlight the potential application domains opened up by dO. This diverse application range differentiates dO from previous works as a general-purpose derivative-aware pipeline based on geometric ray tracing.
In contrast to earlier works that rely on external ray tracing engines such as [13], dO is fully implemented in PyTorch, which makes it easily portable and simplifies integration with existing machine learning image reconstruction methods, while at the same time utilizing GPU compute resources in an efficient fashion. Source code will be available at [47].

A. Overview
We use geometric ray tracing to model the light transport in a sequential-mode lens design. A lens system is parameterized by a vector variable θ ∈ R^n, a collection of n design parameters. Starting from one end of the lens system, sampled rays are sequentially traced through a set of parameterized optical surfaces, intersecting each surface exactly once, while traveling towards the other end of the lens system. Rays can be traced starting from the object plane towards the image plane, which is the preferable way in lens design for aberration analysis. Alternatively, rays can be traced in reverse, starting from the image plane towards the object plane, which is preferable in graphics for image rendering. For the i-th ray, the intersection on the image (object) plane is determined by ray tracing, as a "black-box" function of θ, denoted p_i(θ) ∈ R^2. Given m sampled rays, the collection p_{1,...,m} is known as the spot diagram in lens design, denoted p ∈ R^{2m}, a concatenation of vectors.
To optimize a design, a merit function ℓ(·) : R^{2m} → R is applied to p, producing a scalar error ℓ(p(θ)) ∈ R as an indicator of design performance. Design optimization aims to solve for an optimal θ* by minimizing the error:

θ* = arg min_θ ℓ(p(θ)). (1)

For example, a common merit function is the RMS spot error, which compares the current spot diagram against a target one p_target in a least-squares sense:

ℓ(p) = ||p − p_target||^2. (2)

In dO, the merit function can also operate at the irradiance level, when optimizing in image space rather than in intersection space, e.g., as in Refs. [13], [48]. This requires integrating a sensor image I(p) from the intersections p, given a pre-defined pixel size. The error metric can then be defined such that the produced image is close to a target image I_target in a least-squares sense, using a pixel-by-pixel comparison:

ℓ(I) = ||I(p(θ)) − I_target||^2. (3)

Fig. 1. To be derivative-aware, all modules must be differentiable so that gradients can be back-propagated from the error metric ℓ(p(θ)) to the variable parameters θ. This is achieved by the two stages of reverse-mode AD: the forward and the backward passes. To ensure differentiability and efficiency, a custom ray-surface intersection solver is introduced in Section II-C. Instead of unrolling iterations for forward/backward, only the forward pass (no AD) is computed to obtain solutions at surfaces f_i = 0, and gradients are amended afterwards as in (7).

Crucially, such an irradiance-based merit function allows software reconstruction modules, such as deep neural networks, to be considered in the design process. This enables end-to-end designs in which the optics and the image processing software are jointly optimized, yielding design solutions in which the optical design maximizes the ability of the software module to restore quality images. Given a merit function ℓ(·), our goal is to evolve the variable parameters θ iteratively towards an optimum θ*, using gradient-based optimization. This requires computing ∂ℓ/∂θ ∈ R^n, the partial derivatives that indicate how the design parameters affect the error metric locally. Assuming ℓ(p(θ)) is a continuous function of θ, the chain rule expands the partial derivatives in (1) as, without or with I:

∂ℓ/∂θ = (∂ℓ/∂p)(∂p/∂θ), or ∂ℓ/∂θ = (∂ℓ/∂I)(∂I/∂p)(∂p/∂θ). (4)

In the following, we rely on AD for computing (4). In AD, full Jacobian matrices are rarely evaluated explicitly; instead, vector-Jacobian products are computed for a given perturbation, in our case denoted Δℓ and Δθ. With J^T_(·) denoting the Jacobian transposes, (4) rewrites as:

Δθ = J_p^T J_ℓ^T Δℓ, (5a)
Δθ = J_p^T J_I^T J_ℓ^T Δℓ. (5b)

Given a desired decrease Δℓ in the error metric, our goal in this paper is to compute the corresponding variable parameter changes Δθ (i.e., the derivatives) by evaluating (5), preferably in a memory-efficient way that scales to large design problems.

B. Ray Tracing With Back-Propagation
A straightforward way to compute the derivatives in (5) is to use AD to perform all the computations in ray tracing. This requires a derivative-aware numerical library, e.g., PyTorch [49], to track and compute both the primal value and its derivatives for every elementary arithmetic operation. For our multiple-input, single-output case, i.e., many variable parameters and one scalar error metric, it is numerically preferable to employ reverse-mode AD, or back-propagation in the context of machine learning. Briefly, performing back-propagation on (1) to calculate the derivatives requires two stages, the forward pass and the backward pass. Fig. 1 illustrates this process by applying reverse-mode AD to ray tracing. In the forward pass, the values of the differentiable parameters (θ in our case) are propagated from the starting plane to the ending plane by ray tracing, to obtain the intersections p(θ). With a user-defined merit function, an error metric ℓ(p(θ)) is calculated. The forward pass progressively defines a computational graph that relates all differentiable variables; non-differentiable variables are not needed for the forward pass with AD (i.e., forward with no AD in Fig. 1). In the backward pass, this computational graph is back-propagated to compute gradients following the chain rule, starting from the previously obtained error metric ℓ(p(θ)), to compute the variable derivatives Δθ as in (5). Fig. 2 illustrates a derivative-aware example where, for illustrative purposes, the partial derivatives are shown for interpretation. For example, the derivatives of the spot diagram form a "flow" diagram ∂p/∂θ revealing the motion of the spot diagram when changing θ, where the arrows indicate moving directions and their lengths indicate magnitudes. In this case, increasing θ will focus the rays, affecting the peripheral spots more than the central ones.
Similarly, the derivative of the rendered image is a "derivative" image ∂I/∂θ that tells how the pixel values would change when changing θ. The "derivative image" indicates that the peripheral pixels are more sensitive than the central ones, as expected.
To ensure differentiability, all modules must be differentiable, i.e., all modular computations are derivative-aware so that both the primals and their derivatives are computed and tracked in a computational graph. This requires both the optical system and the merit function to be fully differentiable, so that θ, p, ℓ(p(θ)), and the desired derivatives Δθ given Δℓ can all be related through ray tracing.

Fig. 3. Equation (6) is solved by Newton's method. Once the optimal t* is obtained, the AD-version t is computed as in (7).

C. Differentiable Ray-Surface Intersection
To perform ray tracing from surface to surface, the crucial step in a differentiable optical system is the ray-surface intersection, where an intersection point x ∈ R^3 is computed. The intersection lies on an optical surface described by an implicit function f(x; θ) = 0, and lies along the ray. Given a ray {o, d} with origin o ∈ R^3 and unit-length direction d ∈ R^3, x = o + t d for a positive marching distance t ∈ R^+. Fig. 3 illustrates this problem, which finds t > 0 such that:

f(o + t d; θ) = 0. (6)

Equation (6) can be solved using iterative root finders (e.g., Newton's method) implemented in AD, unrolling the iterations for gradient evaluation so that the solution t and its gradients are related to the lens parameters θ, as in Ref. [13]. However, this straightforward approach is memory-consuming because the intermediate iteration variables must be stored. This issue can be avoided by exploiting the fact that the solution of (6) is independent of the initialization of t: the optimal solution t* can first be solved with no AD, without storing intermediate states, and the AD-version t is then obtained by a single Newton iteration:

t = t* − f(o + t* d; θ) / (∇f(o + t* d; θ) · d), (7)

where · denotes the inner product, and ∇f(x; θ) denotes the spatial derivative of the implicit function with respect to x, parameterized by θ. Refer to the Supplemental Document for specific forms of f(·) and ∇f(·); example surfaces are aspheres, XY polynomials, and B-splines. Note that our approach does not depend on the specific iterative solver chosen for (6), yet it keeps the solution differentiable. Our approach is simple yet effective, and to our best knowledge it is the first to address this memory issue in differentiable optics research. We employ Newton's method to obtain t*, initialized by a non-singular estimate (Fig. 3), and the iteration stops when the residual falls below the tolerance, typically within a few iterations k. For spherical lenses, Newton's method converges in theory within exactly one iteration.

Fig. 5. Compared with [13], our approach produces a cleaner gradient image, owing to the memory headroom for sampling more rays that results from the low memory consumption of computing the intersections. The finite difference result suffers from slight blurriness due to numerical rounding errors, because the finite difference values θ± = {10^−6, −10^−6} approach single floating point precision.

Fig. 4 shows the superiority of the proposed approach in comparison against the unrolled approach [13]. The unrolled approach needs to store intermediate states for AD and consumes more memory than a single GPU affords, hence limiting the total number of freeform coefficients in the optimization. In contrast, our proposed differentiable solver drastically alleviates this issue; note how frequently ray-surface intersections occur when ray tracing a design. This improvement allows dO to perform memory-efficient back-propagation when evaluating J_p^T in (5), no matter which merit function is chosen.
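The two-stage intersection solver can be sketched in PyTorch as follows. The paraboloid surface `f` and its gradient `grad_f` are illustrative assumptions standing in for the aspheric, XY-polynomial, and B-spline forms of the Supplemental Document:

```python
import torch

def f(x, theta):
    # Illustrative implicit surface (a paraboloid):
    # f(x; theta) = z - theta * (x^2 + y^2).
    return x[2] - theta * (x[0] ** 2 + x[1] ** 2)

def grad_f(x, theta):
    # Spatial gradient of f with respect to x.
    return torch.stack([-2 * theta * x[0], -2 * theta * x[1],
                        torch.ones_like(x[2])])

def intersect(o, d, theta, tol=1e-9, max_iter=50):
    # Stage 1: solve f(o + t d; theta) = 0 by Newton's method with NO
    # autograd graph, so no intermediate iterates are stored.
    with torch.no_grad():
        t = torch.zeros(())                   # non-singular initial estimate
        for _ in range(max_iter):
            x = o + t * d
            residual = f(x, theta)
            t = t - residual / torch.dot(grad_f(x, theta), d)
            if residual.abs() < tol:
                break
    # Stage 2: a single differentiable Newton step, as in (7), attaches
    # the gradients of t (and hence x) with respect to theta.
    x = o + t * d
    t = t - f(x, theta) / torch.dot(grad_f(x, theta), d)
    return o + t * d
```

Only the final step enters the computational graph, so the memory cost is independent of the number of Newton iterations.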
Our proposed method improves gradient estimates for Deep Lens designs by allowing more rays to be sampled per pixel. In Fig. 5, a doublet aspheric lens (see Table III in the Supplementary Document for the prescription) is under investigation, with the 4th aspheric coefficient θ = 0 of its first surface being differentiated. Given a planar texture image object, dO renders the corresponding sensor image I and its gradient ∂I/∂θ. Fig. 5 shows the same gradient image computed using the method in [13] and ours, as well as a finite difference version for reference, which is inaccurate due to limited numerical precision. The method in [13] is limited to fewer rays per pixel because of the high memory consumption of computing ray-surface intersections, producing a noisy gradient image. In contrast, our method allows more rays to be sampled per pixel, yielding a cleaner gradient image. This example demonstrates the superiority of our approach over [13], and the benefit of applying it for robust gradient estimation, and hence for Deep Lens training.

Fig. 6. Adjoint back-propagation. In end-to-end learning for computational imaging applications, the merit function contains a rendered image, a neural network, and its weights as variables. Rendering the image can require many millions of rays, and the resulting memory cost limits the total number of ray samples in back-propagation (a), compromising performance. Adjoint back-propagation (b) alleviates this issue by splitting the computation into multiple stages (i), (ii), and (iii).

D. Adjoint Back-Propagation
Evaluating (5) also depends on the specific merit function chosen. For simple metrics such as the RMS spot error in conventional lens designs, evaluating (5a) does not involve many rays, and the memory issue is not urgent. However, when the merit function is in the image space, e.g., (3), and involves rendering images, a megapixel-resolution image I(p) may require intensive sampling of many millions of rays. In this scenario, directly back-propagating (5b) can be problematic and may require impractical amounts of memory in the backward pass, making derivative evaluations prohibitive in practice.
This issue can be partially alleviated by exploiting a key insight from (5b): the target Δθ contains two parts, (i) ray-tracing relevant derivatives J_p^T J_I^T determined by the current design, and (ii) error-metric relevant derivatives J_ℓ^T determined by the chosen merit function ℓ(·). This separability enables us to split the computation into multiple passes, eventually alleviating the back-propagation memory issue. A similar memory-efficient approach was proposed in Ref. [46]. The separability is reflected by introducing an intermediate derivative image ΔI into (5b):

ΔI = J_ℓ^T Δℓ, (8a)
Δθ = J_p^T J_I^T ΔI. (8b)

By first evaluating (8a) and then (8b), the original (5b) is eventually evaluated. The computation is hence split into three sequential stages:
i) Forward computation (no AD). This obtains a rendered image I(θ) only, with no derivative information on θ. Since no computational graph is created or stored, this stage has low memory requirements.
ii) Forward and backward on the metric function ℓ(I). This treats the rendered image I as a differentiable variable, and obtains ΔI by back-propagating (8a).
iii) Forward and backward on the optical system. This gives the final desired variable derivatives Δθ by back-propagating (8b).
The above procedure, termed adjoint back-propagation, confines the backward pass to stages (ii) and (iii), and hence alleviates the memory issue as a whole.
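The three stages can be sketched in PyTorch as below. Here `render` is a toy stand-in for the Monte Carlo renderer, and a least-squares metric as in (3) plays the role of the merit function (a neural network could equally be substituted in stage (ii)):

```python
import torch

# Toy stand-in for Monte Carlo rendering: image I as a differentiable
# function of the design parameter theta (a real renderer traces
# millions of rays; here 8 "rays" feed 4 pixels).
def render(theta, rays):
    return torch.stack([(theta * chunk).sin().sum()
                        for chunk in rays.split(2)])

theta = torch.tensor(0.7, requires_grad=True)
rays = torch.linspace(0.0, 1.0, 8)
I_target = torch.zeros(4)

# (i) forward, no AD: render without building a computational graph
with torch.no_grad():
    I0 = render(theta, rays)

# (ii) forward + backward on the metric only, treating I as a leaf
I_leaf = I0.clone().requires_grad_(True)
loss = torch.sum((I_leaf - I_target) ** 2)   # merit as in (3)
loss.backward()
delta_I = I_leaf.grad                        # intermediate adjoint, (8a)

# (iii) forward + backward on the optical system, seeded with delta_I;
# in practice this pass can be chunked ray-batch by ray-batch
I1 = render(theta, rays)
I1.backward(delta_I)                         # (8b)
delta_theta = theta.grad
```

The result matches direct back-propagation of the full graph, but the graph of stage (iii) can be built and freed per ray batch, bounding peak memory.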
This computational scheme is depicted in Fig. 6, where the merit function is deliberately chosen to be the one used in end-to-end learning [10], [11], [12], in which the rendered image I(p(θ)) is fed to a neural network whose own weights θ_net are adjustable during training, as a set of additional differentiable parameters in the design optimization process. In end-to-end learning, the goal is to optimize both the optical system parameters θ and the network weights θ_net such that the hardware-software combination produces the optimal image quality after the neural network sharpens the raw sensor images. See Figs. 15 and 16 for examples. Fig. 7 shows a memory consumption comparison between the two back-propagation methods, applied to the setup of Fig. 2, assuming a dummy merit function of the form (3). Adjoint back-propagation takes more time, but its memory consumption remains at a low level, allowing for aggregation and sampling scale-ups. This memory efficiency enables a large number of sampled rays for faithful image rendering in practical end-to-end computational imaging designs.

Fig. 8. Entrance pupil calculation. By tracing a dense grid at the very first optical surface, whose aperture is the dark circle, we obtain the overlaid entrance pupil area (bright region) for subsequent spatial ray sampling, at different viewing angles.

Fig. 9. Spot diagrams and RMS spot errors produced by dO closely resemble those of Zemax.

Fig. 10. Spherical aberration minimization. The differentiable parameters θ are the surface curvature, conic, and 4th asphere coefficient. Our solver converges in 1 s.

Fig. 11. Nikon lens group total aberration minimization. Starting from an all-spherical design, with the differentiable parameters θ being all surface curvatures and the aspheric coefficients of two surfaces, our optimized version achieves similar performance to the original design.

A. Optimization
Given the derivatives Δθ from back-propagating (5), dO performs optimization and iteratively changes the variable parameters θ. When there are constraints in the design, e.g., positive air spacing, minimum glass thickness or back focal length, or maximum overall system size, (1) can be modified by adding a vector constraint function.

Fig. 12. Tolerancing a nominal lens system. The sensitivity matrices are readily obtained as the Jacobians in dO. Here, ∂ℓ/∂θ represents the sensitivity matrices at three different fields of view.

Fig. 13. Overview of advanced applications. (a) Caustic engineering aims to design a freeform surface that produces a target irradiance pattern at a certain distance (Fig. 14). (b) Real setup misalignment can be estimated by using dO in reverse as a black-box solver (Fig. 17). (c) End-to-end designs that jointly optimize lenses and image post-processing algorithms (here, neural networks) for computational imaging applications (Figs. 15, 16).
The specific optimization method depends on the number of variables. When the number of variables is small (for example θ ∈ R^n, n < 20), classical damped least squares [50] is employed to efficiently optimize (1), where the required Jacobians can be constructed from the derivative vectors. In this case, dO's derivative-aware property is not fully exploited. This is in contrast to the case when n is large (e.g., in freeform surface optimization) and the Jacobians are too large to be constructed explicitly; there, popular gradient descent methods such as Adam [51] can be employed. If desirable, additional regularization terms are possible, for example when solving (15). This optimization flexibility differentiates dO from existing optical design software.
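For the small-n case, one damped least-squares update might look as follows, with the Jacobian assembled by AD; the step function, the damping value, and the toy residuals are illustrative assumptions, not the paper's implementation:

```python
import torch
from torch.autograd.functional import jacobian

def damped_least_squares_step(residual_fn, theta, damping=1e-2):
    # One damped least-squares (Levenberg-Marquardt style) update:
    #   theta <- theta - (J^T J + damping I)^(-1) J^T r,
    # with the Jacobian J assembled by AD; feasible when n is small.
    r = residual_fn(theta)
    J = jacobian(residual_fn, theta)          # shape (num_residuals, n)
    A = J.T @ J + damping * torch.eye(theta.numel())
    return theta - torch.linalg.solve(A, J.T @ r)

# Toy usage: drive two linear residuals to zero.
def residuals(theta):
    return torch.stack([theta[0] - 1.0, 2.0 * (theta[1] + 0.5)])

theta = torch.tensor([0.0, 0.0])
for _ in range(30):
    theta = damped_least_squares_step(residuals, theta, damping=1e-3)
```

For large n, each such step would require an n x n solve, which is exactly why gradient descent methods like Adam take over in the freeform regime.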

B. Lens System
We follow the standard lens design pipeline [52], [53] to model a lens system. We focus on the sequential mode: starting from one end of the lens system, rays are sequentially traced through a sequence of parameterized optical surfaces (including the image plane, i.e., the sensor plane), intersecting each surface exactly once, while traveling towards the other end of the lens system. In the sequential mode, the exact visibility ordering of the surfaces is known a priori, and thus there is no need to find the closest surface intersection when performing ray tracing.
Ray propagation through a lens system involves two major steps: finding the ray-surface intersection point by solving (6), and refracting the ray at material interfaces with chromatic effects. Only valid rays are traced further; rays become invalid when their intersections fall outside the lens geometry or when total internal reflection takes place.
At a material interface, the transmitted direction d_t is determined from the surface normal n = ∇f/||∇f|| and the incident direction d_i by Snell's law [54]:

d_t = η d_i + (cos ψ_t − η cos ψ_i) n, with cos ψ_t = √(1 − η²(1 − cos²ψ_i)), (9)

where cos ψ_i = d_i · n and η = n_i/n_t is the ratio of the refractive indices of the two materials. The refractive index follows Cauchy's equation n(λ) = A + B/λ², with A and B determined from the central refractive index and the Abbe number.
Though in this work we focus on the sequential mode, where optical surfaces are arranged in a fixed, known order, a non-sequential mode should also be possible with proper extensions and modifications to the current ray tracing engine.

C. Two Tracing Modes: Forward and Backward
Depending on the needs, rays can be traced through a lens system in two different modes: forward or backward. In the forward mode, rays are traced from the object plane towards the image plane. This is the preferable way in lens design for aberration analysis, e.g., the generation of spot diagrams. In the backward mode, rays are traced in reverse, from the image plane towards the object plane. This is a sampling-efficient way to render the sensor image, and thus the preferable mode in computer graphics for rendering realistic images. We use these two modes interchangeably depending on specific needs.

D. Differentiable Image Rendering From Intersections
Once tracing is finished, a synthetic image I(p) can be generated from the intersection points p, provided a proper image integrator. This process of pixelization (termed rendering), from continuous light signals to discretized pixel values, is handled differently in the two tracing modes.
Reconstruction filter in the forward mode: In the forward tracing mode, performance analysis is conducted to understand the optical properties of the current design, where rays are purposely generated according to analysis-specific criteria. This process may involve gradient computations, and differentiability is desired in the sense that the generated image pixel values are differentiable with respect to intersection point movements.
To ensure differentiability, the derivatives of this mapping from intersections to pixel values need to be continuous. dO employs a linear interpolation scheme that distributes each fractional intersection point p over its four nearest neighboring pixels, with the corresponding four pixel values weighted linearly.
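A minimal sketch of this bilinear splatting is given below; point coordinates are assumed to be expressed in pixel units and to fall strictly inside the image (the real engine of course also handles boundaries):

```python
import torch

def splat_bilinear(points, H, W):
    # Differentiable rasterization: each continuous 2-D intersection
    # point deposits unit energy into its four neighboring pixels with
    # bilinear weights, so pixel values vary smoothly (and hence
    # differentiably) with the point positions.
    img = torch.zeros(H * W, dtype=points.dtype)
    x, y = points[:, 0], points[:, 1]
    x0f, y0f = x.floor(), y.floor()
    fx, fy = x - x0f, y - y0f
    x0, y0 = x0f.long(), y0f.long()
    for dx, dy, w in [(0, 0, (1 - fx) * (1 - fy)),
                      (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy),
                      (1, 1, fx * fy)]:
        idx = (y0 + dy) * W + (x0 + dx)   # assumes in-bounds points
        img = img.scatter_add(0, idx, w)  # out-of-place: keeps autograd
    return img.reshape(H, W)
```

Because the weights are piecewise-linear in the coordinates, the gradient of any pixel with respect to an intersection position is well defined (piecewise constant), which is exactly what back-propagation needs.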
Monte Carlo integrator in the backward mode: In the backward tracing mode, rays are initialized from the pixelated sensor plane and traced outwards, and the major goal is to render a physically correct image for the current design given a specific scene. This is achieved by integrating the rendering equation [55], which renderers evaluate using a Monte Carlo integrator that discretely samples the integral. This backward tracing mode is utilized in end-to-end system designs; see Section V-B.

E. Entrance/Exit Pupil Computation for all Fields
To perform ray tracing for a lens system, the entrance pupil has to be determined first; it is the area over the front lens element through which rays from a given viewing angle will finally reach the sensor plane. This is easily determined for paraxial angles, but not for larger ones. Fig. 8 demonstrates this vignetting effect for a double-Gauss design [56], with the calculated entrance pupils shown for different views. Our engine determines the entrance pupil by ray tracing a dense grid (1025 × 1025) at specific viewing angles: a grid point belongs to the entrance pupil if its sampled ray propagates through all the optical elements successfully. Exit pupils can be determined in a similar manner, as described in [57].
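This grid-based pupil estimation can be sketched as follows; `trace_ok`, a predicate reporting whether a grid point's ray survives the full sequential trace at the given viewing angle, is a hypothetical placeholder for the engine's tracer:

```python
import torch

def entrance_pupil_mask(trace_ok, radius, n=257):
    # Sample a dense grid over the first surface's circular aperture and
    # keep the points whose rays survive the full sequential trace.
    u = torch.linspace(-radius, radius, n)
    X, Y = torch.meshgrid(u, u, indexing="ij")
    inside = X ** 2 + Y ** 2 <= radius ** 2
    return trace_ok(X, Y) & inside   # bright region = entrance pupil
```

With the mask in hand, subsequent spatial ray sampling can be restricted to the bright region, avoiding wasted samples on vignetted rays.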

F. Derivative-Aware Implementation
dO is implemented from scratch in PyTorch [49]. In comparison to [13], which is based on Mitsuba2 [37], dO is therefore more portable and easy to combine with existing deep neural network code for image reconstruction. This makes it easier to integrate dO into various end-to-end design problems, while at the same time offering more degrees of freedom and flexibility for additional applications. Owing to the PyTorch backend, dO uses GPU compute resources effectively, and owing to the algorithmic improvements outlined in the previous section, it is significantly more memory-efficient than previous end-to-end differentiable ray tracing systems. dO produces nearly identical results to modern lens design software. To verify this, a double-Gauss design [56] is checked against Zemax [1], using a single wavelength λ = 587.56 nm at four fields of view, as in Fig. 9. The spot diagrams and RMS errors are almost identical to the Zemax results, despite a slight variation due to different aperture sampling strategies.

IV. CLASSICAL APPLICATIONS
dO can handle classical design problems, as demonstrated in this section. The RMS spot error at different viewing angles is chosen as the merit function ℓ(·) for design optimization. See the Supplemental Document for lens prescriptions and full details.

A. Design Optimization
Spherical aberration minimization: The first example optimizes the aspherical coefficients of an asphere lens to minimize the axial spherical aberration. In Fig. 10, the parameters of a revised asphere lens design (Thorlabs ACL5040U) are optimized to reduce the axial RMS spot. Compared to the initial design, our differentiable engine arrives at a nearly six times smaller RMS spot.
Photographic camera design optimization: The engine can also optimize complicated lens groups. Fig. 11 shows the second example, which re-parameterizes the curvatures and aspheric coefficients of a Nikon patent design to minimize the total RMS spot error at four fields of view (0°, 10°, 20°, 32.45°) and three wavelengths (656.27 nm, 587.56 nm, 486.13 nm). The design is initialized by removing all aspheric coefficients, which introduces large aberrations. After optimization, our design shows a mean RMS error comparable to that of the original design. This example demonstrates the capability of dO to perform multi-lens and aspheric-surface optimization.
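The resulting optimization looks like an ordinary PyTorch training loop. The snippet below is a minimal sketch under a stated simplification: `toy_spot_rms_sq` is a thin-lens stand-in of our own naming for dO's differentiable tracer, and Adam drives the lens power until the axial ray fan focuses on the sensor.

```python
import torch

def toy_spot_rms_sq(power, heights, sensor_dist=100.0):
    """Thin-lens stand-in for the differentiable tracer: a ray at height h
    leaves the lens with slope -power*h and lands at h*(1 - power*sensor_dist)
    on the sensor. We return the squared RMS spot size (same minimizer as
    the RMS, with smoother gradients near the optimum)."""
    hits = heights * (1.0 - power * sensor_dist)
    return (hits ** 2).mean()

heights = torch.linspace(-5.0, 5.0, 21)            # ray fan over the aperture
power = torch.tensor(0.005, requires_grad=True)    # initial (defocused) design
opt = torch.optim.Adam([power], lr=5e-4)
loss0 = toy_spot_rms_sq(power, heights).item()
for _ in range(1000):                              # gradient-descent design loop
    opt.zero_grad()
    loss = toy_spot_rms_sq(power, heights)
    loss.backward()                                # d(spot size)/d(lens power)
    opt.step()
# rays focus on the sensor as power approaches 1/sensor_dist = 0.01
```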

B. Tolerance Analysis
In tolerancing a lens system, also known as sensitivity analysis, a presumed small, linear parameter perturbation Δθ is applied, and the total effect of the perturbation, Δε, is propagated through the nominal system to determine its potential effects. To avoid confusion between perturbations and the gradients considered before, we use the Δ(·) notation. Mathematically, Δε is related to Δθ through the derivatives, known as the Jacobian-vector product in AD terminology:

$$\Delta \epsilon = \frac{\partial \epsilon}{\partial \boldsymbol{\theta}}\, \Delta\boldsymbol{\theta}. \tag{10}$$

Notice there is a reciprocal relationship between (10) and (5). In the forward analysis, Δθ is given to compute Δε, whereas in the inverse analysis, Δε is given to compute Δθ, assuming proper prior probabilities or regularization on Δθ. In contrast to alternative methods such as finite differences, analytical gradients [58], or the wavefront differential method [59], dO can evaluate (10) intrinsically, without additional implementation effort.
In Fig. 12, a Cooke triplet is under sensitivity analysis, with all optical element positional misalignment parameters θ being toleranced. dO obtains the Jacobian matrix, known as the sensitivity matrix, for further analysis. See Supplemental Document for misalignment parameter definitions.
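In PyTorch terms, the forward analysis of (10) is a single call to `torch.autograd.functional.jvp`, and the full sensitivity matrix of Fig. 12 is one call to `jacobian`. The `spot_error` model below is a hypothetical stand-in for the nominal system ε(θ); dO would trace rays instead.

```python
import torch
from torch.autograd.functional import jvp, jacobian

def spot_error(theta):
    """Hypothetical stand-in for the nominal system epsilon(theta): maps
    three misalignment parameters to two per-field spot errors."""
    A = torch.tensor([[2.0, 0.5, 0.0],
                      [0.1, 1.5, 0.3]])
    return A @ theta + 0.1 * theta.pow(2).sum()

theta0 = torch.zeros(3)                  # nominal (aligned) system
dtheta = torch.tensor([1e-3, 0.0, 0.0])  # perturb the first parameter

# forward tolerance analysis, eq. (10): Delta eps = (d eps / d theta) Delta theta
_, deps = jvp(spot_error, (theta0,), (dtheta,))

# the full sensitivity (Jacobian) matrix, as used in Fig. 12
S = jacobian(spot_error, theta0)
```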

V. ADVANCED APPLICATIONS
Thanks to its differentiability and versatile optimizer, dO can be combined with advanced computational post-processing algorithms for complex designs or setup reverse-engineering, beyond the usual application range of lens design software. Such domain-specific applications are not easily configurable with existing design software, as the three examples in Fig. 13 show.

A. Caustic Engineering
Caustic engineering aims to produce a target image by changing, pixel-wise, the directions of a directional light source through an optimized freeform optical surface [60], [61]. Here, we demonstrate caustic engineering as one application of dO. We parameterize the desired surface as a spline freeform; this representation ensures smoothness and higher-order continuity of the surface, in contrast to the mesh-based representations in alternative differentiable solvers [37]. The freeform knot positions are fixed, and the spline coefficients are optimized, consequently changing the surface geometry. This flexibility of geometry representation allows dO to perform freeform surface optimization. We employ the forward mode to render caustic images. To enable a satisfactory reconstruction, the desired freeform surface is assumed to be smooth and hence is represented by B-splines with a number of degrees of freedom approximately equal to the pixel count of the target image. The error metric is the standard mean-square error (MSE), and we optimize the B-spline coefficients θ that characterize the freeform surface:

$$\min_{\boldsymbol{\theta}} \; \mathrm{MSE}\left(I(\boldsymbol{\theta}),\, I_{\text{target}}\right). \tag{11}$$

Equation (11) is optimized using Adam [51], because constructing the Jacobian is computationally prohibitive given the large number of variables. Fig. 14 shows the results along with intermediate optimization states, where the optimized freeform surface is shown as height maps. The reconstructed image is contrast-preserving, and the optimized freeform surface is smooth. Although in this particular example the contrast of our result is lower than that of specialized solvers, e.g., those using optimal transport [60] or iterative warping [61], it shows that freeform designs become more approachable when a plug-and-play differentiable engine such as dO is available.
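To illustrate the spline parameterization, the sketch below evaluates a 1D uniform cubic B-spline differentiably in its coefficients (knots fixed, as for our freeform surfaces). `cubic_bspline_1d` is an illustrative helper of our own, not dO's surface code; a freeform surface would use the tensor product of two such bases.

```python
import torch

def cubic_bspline_1d(coeffs, x):
    """Evaluate a uniform cubic B-spline at positions x, differentiably in
    the coefficients. Knots sit on the integers; coefficient i influences
    only x in [i-2, i+2], and the basis functions sum to one on the fully
    supported interval [2, n-3], where n = len(coeffs)."""
    def basis(t):
        # standard uniform cubic B-spline basis, piecewise cubic on [0, 4)
        b = torch.where((0 <= t) & (t < 1), t**3,
            torch.where((1 <= t) & (t < 2), -3*t**3 + 12*t**2 - 12*t + 4,
            torch.where((2 <= t) & (t < 3), 3*t**3 - 24*t**2 + 60*t - 44,
            torch.where((3 <= t) & (t < 4), (4 - t)**3,
                        torch.zeros_like(t)))))
        return b / 6.0
    y = torch.zeros_like(x)
    for i in range(coeffs.shape[0]):       # sum of shifted basis functions
        y = y + coeffs[i] * basis(x - i + 2.0)
    return y
```

Because the output is a smooth polynomial in x with C2 continuity, optimizing the coefficients by back-propagation never breaks surface smoothness, unlike per-vertex mesh optimization.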

B. End-to-end Computational Imaging Designs
Computational imaging designs [10], [62] require joint optimization of hardware and software as a whole. For example, a blurry image I can be deblurred by a designed algorithm or a neural network N(·), becoming a potentially sharper image N(I). In end-to-end designs, this process involves a co-evolution of both the lens design (parameterized by θ) and the post-processing algorithm N(·) (parameterized by θ_net, e.g., an untrained neural network of weights θ_net). End-to-end design optimizations are usually performed in a supervised setting, using a training set of target images I_target to educate and evolve the hardware/software parameters. This design pipeline and the back-propagation procedure are depicted in Fig. 6. An end-to-end design error metric involves both θ and θ_net, and is re-phrased as:

$$\min_{\boldsymbol{\theta}, \boldsymbol{\theta}_{\text{net}}} \; L\left(N(I(\boldsymbol{\theta}); \boldsymbol{\theta}_{\text{net}}),\, I_{\text{target}}\right), \tag{12}$$

where L(·, ·) is a general loss function quantifying the total difference between the post-processed image N(I(θ); θ_net) and the ground-truth target image I_target. The goal in the following is to minimize (12) for two end-to-end computational imaging designs using dO.

End-to-end design for extended-depth-of-field imaging: In wavefront coding, a pupil-plane phase modulator is introduced to deliberately distort the input light for point-spread-function (PSF) engineering, e.g., a cubic phase plate [3] that produces depth-invariant PSFs for extended depth of field, or the coded aperture [4] and lattice-focal [63] designs that produce depth-sensitive PSFs for depth retrieval. This is a joint optical-algorithmic problem in that the final image is the output of the sequential application of the encoding optics and the decoding algorithm. Prior end-to-end approaches rely on the paraxial approximation [10], ignoring the spatially variant nature of PSFs. Here, dO provides an initial solution to overcome this limitation, in that the optical system is faithfully reproduced by ray tracing.
We parameterize the phase plate using only third-order XY polynomials, and end-to-end jointly optimize the polynomial coefficients θ_XY and U-Net [64] parameters θ_net for extended-depth-of-field applications, as in Fig. 15. Specifically, the total error metric in (12) consists of an MSE loss L_mse and a channel feature loss L_vgg16 on a pre-trained VGG16 [65]:

$$L = L_{\text{mse}} + L_{\text{vgg16}}. \tag{13}$$

See Supplemental Document for full details on the network architecture, the definitions of the loss terms L, and training-set generation. Initialized from a null zero phase, the optimized phase in Fig. 15(a) exhibits a similar structure to that in [3], verified by the central PSFs in Fig. 15(b). Given the blurry raw input images, the post-processed images in Fig. 15(c) reveal sharp features.

End-to-end design for large field-of-view (FOV) imaging: Large-FOV imaging is challenging for cemented doublets due to severe aberrations that cannot be fully compensated because of the lack of design degrees of freedom. We argue that these large aberrations can be corrected not through additional optical components, but through a computational post-processing algorithm applied to the perceived aberrated image. Using end-to-end learning, this optical aberration limitation can be addressed with a post-processing neural network; here we use [66]. The training follows (12), optimizing:

$$\min_{\boldsymbol{\theta}_{\text{asphere}}, \boldsymbol{\theta}_{\text{net}}} \; L\left(N(I(\boldsymbol{\theta}_{\text{asphere}}); \boldsymbol{\theta}_{\text{net}}),\, I_{\text{target}}\right). \tag{14}$$

Fig. 17. dO can be employed as a general-purpose solver to back-engineer the misalignment parameters of a real setup so that simulation and reality match, demonstrated by the high similarity between optimized images and the target (real) ones. (a) The experimental setup includes an LED light source, a lens, and an image sensor. (b) LA1131 was in focus but tilted. (c) LA1986 was out of focus and tilted. The optimized images share high visual similarity with the real target ones, revealing the success of our approach. See Visualization 2 for the optimization process.
A cemented doublet parameterized by its aspheric coefficients θ_asphere and the network parameters θ_net are jointly optimized following a metric similar to (13). Fig. 16 shows how the originally blurry images can be post-processed and deblurred after end-to-end learning, demonstrating the applicability of dO.
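The joint update of θ and θ_net in (12) amounts to a single optimizer stepping over both parameter groups. Below is a minimal runnable sketch with stand-ins of our own naming: a Gaussian PSF whose width plays the role of the lens parameters, and a one-layer convolution in place of the reconstruction network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def optics(img, log_sigma, ksize=11):
    """Render the sensor image I(theta): blur by a Gaussian PSF whose width
    exp(log_sigma) stands in for the lens parameters theta. A real end-to-end
    design would obtain the PSF from differentiable ray tracing instead."""
    r = torch.arange(ksize, dtype=torch.float32) - ksize // 2
    g = torch.exp(-r**2 / (2 * torch.exp(log_sigma) ** 2))
    g = g / g.sum()
    psf = (g[:, None] * g[None, :])[None, None]     # (1, 1, k, k) kernel
    return F.conv2d(img, psf, padding=ksize // 2)

net = nn.Conv2d(1, 1, 5, padding=2)                 # stand-in for the U-Net
log_sigma = torch.tensor(0.5, requires_grad=True)   # "lens" parameter theta

# one optimizer over both hardware and software parameters, as in eq. (12)
opt = torch.optim.Adam([log_sigma, *net.parameters()], lr=1e-2)

target = torch.rand(1, 1, 32, 32)                   # I_target (training data)
loss0 = F.mse_loss(net(optics(target, log_sigma)), target).item()
for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(net(optics(target, log_sigma)), target)
    loss.backward()                                 # gradients reach both groups
    opt.step()
```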

C. Misalignment Estimation
Finally, we show real experimental results that leverage the potential of the proposed differentiable pipeline, dO, by using it in reverse to estimate the misalignment of an experimental setup. This misalignment estimation task is not possible with previous differentiable renderers, e.g., Mitsuba2 [37] or Ref. [13], because of geometric representation incompatibilities and missing implementations for optical element perturbations; most importantly, the memory efficiency of dO allows a large number of rays to be sampled per pixel for gradient estimation. In Fig. 17(a), a pinpoint light source, consisting of an LED (central wavelength 622 nm) and an iris of radius 0.4 mm, is placed far in front of a misaligned plano-convex lens. The lens was able to rotate freely, but the exact angular values were unknown, and the imperfect mounting leads to slight yet noticeable tilting.
The setup is imaged by a monochromatic CMOS sensor (FLIR, GS3-U3-51S5M-C, pixel size 3.45 µm). Without knowing the exact position and angle parameters θ, e.g., the light source position, the sensor-to-lens distance, and the lens yaw/pitch angles, it is very challenging to reproduce experimental measurements by manually tuning a simulation setup. In Fig. 17, we show the success of dO in estimating such misalignment parameters by minimizing the MSE between the simulated image I(θ) and the real captured target image I_real. We employ the forward mode to render I(θ). To escape local stationary points and to regularize the gradient computation, we enforce centroid alignment between the two images. Denoting by C(·) an operator that computes the image centroid, we minimize

$$\min_{\boldsymbol{\theta}} \; \mathrm{MSE}\left(I(\boldsymbol{\theta}),\, I_{\text{real}}\right) + \mu \left\| C(I(\boldsymbol{\theta})) - C(I_{\text{real}}) \right\|_2^2, \tag{15}$$

where μ is a tradeoff parameter that balances the MSE and alignment errors. Empirically, (15) is optimized by Adam and damped least squares in alternation. The optimization usually takes < 0.5 min for a megapixel image resolution on a GPU (Nvidia, GeForce RTX 2080 Ti). Two lenses of different focal lengths (Thorlabs, LA1131, focal length 50 mm; LA1986, focal length 125 mm) were investigated, as shown in Fig. 17(b) and Fig. 17(c). In Fig. 17(b), the lens was in focus but tilted, and the goal is to re-parameterize the simulation to fit the real measurements. In Fig. 17(c), the lens was slightly defocused, introducing a blurry bright disk on the sensor plane. With increasing angular misalignment, the image smears and elongates in the horizontal direction. Initialized from a coarse setup estimation, the final optimized images match the real images visually well. Refer to Supplemental Document for full results. This example validates the differentiability of dO and demonstrates the possibility of using dO for setup calibration.
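The centroid operator C(·) and the regularized loss are simple to express differentiably. The sketch below assumes single-channel images, and the helper names (`centroid`, `misalignment_loss`) are our own; μ is the tradeoff weight from the text.

```python
import torch

def centroid(img):
    """C(.): intensity-weighted image centroid, differentiable in the pixels."""
    h, w = img.shape
    ys = torch.arange(h, dtype=img.dtype)
    xs = torch.arange(w, dtype=img.dtype)
    total = img.sum()
    cy = (img.sum(dim=1) * ys).sum() / total   # weight rows by their sums
    cx = (img.sum(dim=0) * xs).sum() / total   # weight columns by their sums
    return torch.stack([cy, cx])

def misalignment_loss(img_sim, img_real, mu=0.1):
    """MSE plus a centroid-alignment regularizer with tradeoff weight mu."""
    mse = ((img_sim - img_real) ** 2).mean()
    align = ((centroid(img_sim) - centroid(img_real)) ** 2).sum()
    return mse + mu * align
```

Because the centroid term has a broad, smooth basin, it supplies a useful gradient even when the simulated and real spots barely overlap, which is what helps the optimizer escape flat regions of the plain MSE.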
The simplicity of using the proposed engine for misalignment estimation allows self-calibration to be integrated into existing numerical modeling pipelines. From a broader perspective, we demonstrate the possibility of using a differentiable ray tracer such as dO as a general inverse solver for metrology problems.

VI. DISCUSSION AND CONCLUSION
In principle, ray optics limits the application range of dO in that the wave nature of light is ignored. Resolution-demanding imaging applications targeting diffraction-limited performance, such as telescope or microscope designs, are hence not possible at this point. In methodology, we rely on gradient descent and damped least squares as the optimization techniques, and thus a number of iteration steps are required for convergence. Due to its local-optimization nature, the solver suffers from local minima and initialization sensitivity. In software, dO relies mostly on reverse-mode AD, which is known to be memory-consuming, especially for large numbers of Monte Carlo samples; though this is partially alleviated by our techniques, it can still limit the number of rays used when optimizing a design. More memory-efficient approaches could be explored. We also expect faster performance after careful code improvements.
The current ray tracing engine could be further enhanced. From a design perspective, more surface representations could be implemented, e.g., Laguerre, Hermite, and Zernike polynomials. New tracing methods are possible, for example paraxial tracing [67] and Gaussian beam tracing [68]. New features could also be implemented, e.g., diffraction and gratings, the Fresnel equations and polarization ray tracing, coatings, stray light and ghost analysis, and non-sequential tracing. From an application perspective, further applications could be explored, e.g., lens metrology [69], realistic lens rendering [70], and wavefront sensor designs [71], [72]. Hardware-software end-to-end optimization for domain-specific applications [10], [11], [12], [62], and semi-supervised training for automatic lens design [33], [34], are also target topics. We believe dO serves as a starting point towards these applications.
To conclude, we have proposed dO, a differentiable ray tracing engine for lens design. Broad applications are demonstrated, ranging from classical design optimization and sensitivity analysis, to computationally intensive freeform design and end-to-end learning for computational imaging applications, to challenging experimental misalignment estimation. We envision the potential of dO, as it opens up an exciting avenue to bring first-order gradient insights into lens design and related problems.