Sampling-Noise Modeling & Removal in Shape From Focus Systems Through Kalman Filter

Shape from Focus (SFF) is one of the passive techniques to recover the shape of an object under consideration. It utilizes the focus cue present in the stack of images, obtained by a single camera. In SFF when the images are acquired, the inter-frame distance, also known as the sampling step size, is assumed to be constant. However, in practice, due to mechanical constraints, sampling step size cannot remain constant. The inconsistency in the sampling step size causes the problem of jitter, and produces Jitter noise in focus curves. This Jitter noise is not visible in images, because each pixel in an image (of the stack) will be subjected to the same error in focus. Thus, traditional image denoising techniques will not work. This paper formulates a model of the Jitter noise, followed by the design of system and measurement models for Kalman filter. Then, the jittering problem for SFF systems is solved using the proposed filtering technique. Experiments are performed on simulated and real objects. Ten noise levels are considered for simulated, and four for real objects. RMSE and Correlation are used to measure the reconstructed shape. The results show the effectiveness of the proposed scheme.


I. INTRODUCTION
Three-dimensional shape recovery using two-dimensional images is a well-established research problem in computer vision applications, robot and machine vision, bio-informatics, medical imaging, consumer cameras, microscopy, and so forth [1]- [6]. In recent years, many techniques have been proposed to recover depth maps from acquired images, as natural scenes under different conditions produce different cues [7]. These cues are distinguished from each other, and can be measured, depending on various factors. One of these cues is focusing that is measured by determining the blur-degree of the image. The method by which object shapes within scenes are estimated, by accommodating focus cues by means of fixed-axis-multiple-images, is referred to as shape from focus (SFF).
This area of research has progressed extensively in recent years. All SFF methods broadly consist of three main steps: image acquisition, focus measure (FM) application, and shape improvement techniques, which are discussed briefly below.
(The associate editor coordinating the review of this manuscript and approving it for publication was Liang Hu.)
Shape from Focus systems are modeled by the simple lens equation, given as:
$$\frac{1}{f} = \frac{1}{u} + \frac{1}{v} \tag{1}$$
where $f$ is the focal length of the imaging device, $u$ is the distance of the object point from the imaging device, and $v$ is the position of the object point where it is best focused by the lens. Fig. 1 illustrates this. In SFF, the image stack is acquired by manipulating one of the factors of (1), while keeping the other two factors constant. Conventionally, the images are acquired by changing $u$; an optical microscope is an example of such a system. In other systems, the focus setting (focal length $f$) is changed to capture images; examples of these systems can be found in [8]. Either $f$ or $u$ is changed in small steps, and at each step an image (of dimensions $l \times m$) is obtained and stored in the image stack, giving $n$ images in total. Changing $v$ for image acquisition is quite challenging, or mostly not feasible [9]. Whichever factor is manipulated, the magnification of the imaging system should remain constant, while the depth of field should be as shallow as possible [10].
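As a small worked example of the lens equation, the sketch below solves it for the image-side distance v given f and u. The function name `image_distance` and the numeric values are hypothetical, chosen only for illustration; they do not come from the experimental setup.

```python
def image_distance(f: float, u: float) -> float:
    """Solve the thin-lens equation 1/f = 1/u + 1/v for v."""
    if u == f:
        raise ValueError("object at the focal plane: the image forms at infinity")
    # Rearranged form of 1/v = 1/f - 1/u.
    return f * u / (u - f)


# Hypothetical values in millimetres, for illustration only.
focal_length_mm = 50.0
for object_distance_mm in (100.0, 150.0, 200.0):
    v = image_distance(focal_length_mm, object_distance_mm)
    print(f"u = {object_distance_mm} mm -> v = {v} mm")
```

Stepping u in small increments, as in the loop, mimics how a translation stage changes which object depth is in best focus.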
When all the images are acquired, the result is an image stack $I$ of dimensions $l \times m \times n$, and each pixel in the stack is represented by $P_{i,j}(k)$, where $1 \le i \le l$, $1 \le j \le m$, and $1 \le k \le n$ are the indices in the $l$, $m$, and $n$ directions. $P_{i,j}(k)$ also represents the pixel curve along the optical axis. This is shown in Fig. 2. The number of images ($n$) is given by (2), where $\Delta$ is the sampling step size. The step size expression for change in $u$ is provided in [9].
The main idea of SFF is to estimate the shape of the object under consideration using the focus cues present in the image stack. The sharpness of focus in an image is measured by a sharpness criterion, called the focus measure (FM) operator. After the image stack is obtained, the FM is applied to each pixel to measure the amount of focus it possesses:
$$\Phi_{i,j}(k) = F\left(P_{i,j}(k)\right) \tag{3}$$
where $F$ is the FM transformation of pixel $P_{i,j}(k)$ that yields the focus value $\Phi_{i,j}$ in the $k$th image; $\Phi_{i,j}(k)$ represents the focus behavior, or in other words the focus curve, of the pixel.
Numerous FMs have been proposed in the literature, summarized by [8], each designed to suppress the out-of-focus regions and enhance the in-focus points in each image. For example, Sum of Modified Laplacian (SML) utilizes squares of the second derivative of the images, Tenengrad or Tenenbaum (TENG) utilizes the first derivative of the image, Gray Level Variance (GLVA) uses a statistical method that computes variance as the focus measurement, and Image Curvature (CURV) calculates image surface curvature. There are other FMs, such as Image Contrast (CONT) and 3D Laplacian, among others. The key role of the FM in SFF systems is to provide a sharp focus curve (parallel to the optical axis) for every object point in the image stack.
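To make the FM step concrete, the following is a minimal sketch of a Laplacian-based focus measure applied to an image stack. It assumes the absolute-value form of the modified Laplacian summed over a small window, which may differ from the exact SML variant used in the experiments; the function names, the `(n, l, m)` array layout, and the window radius are illustrative assumptions.

```python
import numpy as np


def modified_laplacian(img: np.ndarray) -> np.ndarray:
    """|d2I/dx2| + |d2I/dy2| from central differences (edges replicated)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    centre = p[1:-1, 1:-1]
    d2x = np.abs(2 * centre - p[1:-1, :-2] - p[1:-1, 2:])
    d2y = np.abs(2 * centre - p[:-2, 1:-1] - p[2:, 1:-1])
    return d2x + d2y


def window_sum(a: np.ndarray, radius: int) -> np.ndarray:
    """Sum of `a` over a (2*radius+1)^2 neighbourhood, zero padded."""
    p = np.pad(a, radius, mode="constant")
    out = np.zeros_like(a, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out


def focus_volume(stack: np.ndarray, radius: int = 1) -> np.ndarray:
    """Apply the focus measure to every frame of a stack shaped (n, l, m)."""
    return np.stack([window_sum(modified_laplacian(f), radius) for f in stack])


# Synthetic stack: a checkerboard whose contrast peaks in frame 3
# (a crude stand-in for focus; not real defocus blur).
texture = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
contrast = [0.2, 0.5, 0.8, 1.0, 0.7, 0.3]
stack = np.stack([0.5 + c * (texture - 0.5) for c in contrast])
focus = focus_volume(stack)        # focus[:, i, j] is the focus curve of pixel (i, j)
```

Each column `focus[:, i, j]` plays the role of the focus curve of one pixel along the optical axis.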
Conventionally, the initial depth map $D_{i,j}$ is obtained by maximizing the focus curve along the optical axis, i.e., by finding the value of $k$ where $\Phi_{i,j}(k)$ is maximum:
$$D_{i,j} = \arg\max_{k} \Phi_{i,j}(k) \tag{4}$$
When the images are acquired, the shape of the object is discretized into image frames, causing loss of information between two consecutive frames. To address this issue, many techniques have been proposed in the literature. The traditional techniques used SML as the focus measure and applied Gaussian interpolation to compute intra-frame values for better focus [11], [12]. The concept of focused-image-surface and curved-focused-image-surface [13], [14] utilizes piecewise curved surface approximation. Alternatively, Neural Networks and Deep Neural Networks have also been employed [2], [15], [16]. Kim et al. in [17] proposed a method to improve the efficiency of neural networks by introducing a weight passing method. Muhammad & Choi in [10] proposed a method based on Bezier surface approximation. Ali et al. in [18] increased the accuracy of 3D shapes by applying 3D weighted least squares to enhance the image focus volume. Ali et al. in [19] also used the wavelet transform to improve shape reconstruction. Guided image filtering for depth enhancement in SFF is proposed by [20]. Also, Ali et al. in [21] recovered several 3D shapes by applying different FMs and combining their results into one final shape. Fan et al. in [22] used a combination of 3D steerable filters to treat texture-less regions. Since the reconstructed 3D shape quality in SFF depends on the applied FM, another FM based on the analysis of the 3D structure tensor of the image sequence is proposed by [23]. Ma et al. in [24] proposed a method for depth reconstruction that utilized a non-local matting Laplacian along with a Markov Random Field. Yan et al. [25] utilized a pulse-coupled neural network to aggregate shape using focus. Jang et al. in [26] proposed shape optimization through non-parametric regression.
The paper is structured as follows. Section II discusses focus measurement and focus curve models, and Section III presents the motivation of the work. Section IV addresses Jitter noise modeling. Section V presents the proposed methodology, while Section VI presents the results and discussion. Section VII concludes the work.

II. FOCUS MEASUREMENTS & FOCUS CURVE MODELS
The focus curves or the focus behavior (of every individual pixel) depend on the FM used, the nature of cue the FM utilizes, the camera (imaging device) parameters, and most importantly, the image texture around that object point [8], [9]. If the images are acquired properly, then these focus curves are bell-shaped [12].
The Gaussian model is given by (5), the Lorentzian-Cauchy model by (6), and the Quadratic model by (7), where the As, Bs, and Cs are the parameters of each model. The unification of these models into the Quadratic model has been provided by [27]. If the logarithmic transformation is applied to (5) and the result is simplified, it transforms to (7) [27], as shown in (8). Similarly, the reciprocal transformation, when applied to (6), results in (7) after simplification, as shown in (9). This paper utilizes the Quadratic model (given in (7)) to model the Jitter noise in SFF systems in Section IV. The Quadratic model provides a computational advantage over the Gaussian and Lorentzian-Cauchy models, due to its simplicity and robustness [27].
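The two transformations can be illustrated with a short derivation. The parameterizations below are common textbook forms assumed purely for illustration; the symbols A, B, and C here are hypothetical and need not match the parameter names or scaling used in (5) to (9).

```latex
% Assumed parameterizations (illustrative only, not necessarily those of (5)-(7)).
\begin{align*}
\Phi_G(k) &= A \exp\!\left(-\frac{(k-B)^2}{2C^2}\right)
  &\Longrightarrow\quad
  \ln \Phi_G(k) &= -\frac{1}{2C^2}\,k^2 + \frac{B}{C^2}\,k
                   + \left(\ln A - \frac{B^2}{2C^2}\right), \\[4pt]
\Phi_L(k) &= \frac{A}{1 + \left(\dfrac{k-B}{C}\right)^2}
  &\Longrightarrow\quad
  \frac{1}{\Phi_L(k)} &= \frac{1}{AC^2}\,k^2 - \frac{2B}{AC^2}\,k
                         + \frac{B^2 + C^2}{AC^2}.
\end{align*}
```

In both cases the transformed curve is a second-degree polynomial in k, which is why a single quadratic form can stand in for either bell-shaped model.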

III. MOTIVATION
In SFF, when the shape is discretized into image frames by sampling the object in scene, the step size for sampling is presumed constant [9]. Although shape from focus has been thoroughly investigated in recent years, there still exist several insufficiently solved problems that impact the performance of the system. One of these problems is the unstable or non-constant sampling step size. This can be due to the mechanical structure of the imaging device and lens-focusing methods. The resultant variation in the amplitude of the signal due to instability in sampling step size is referred to as Jittering or Jitter noise.
Jang et al. in [28] proposed the removal of Jitter noise using a Kalman filter. Since then, many variants of their method have been proposed [29]-[33]. However, all of these methods used scalar models for the Kalman filter (i.e., the system matrix was taken as 1) and ignored the dynamic nature of focus cues. For each step, multiple images were acquired to eliminate the jitter: if there were n images (of dimensions l × m) in the stack, they required 100 samples for each step, and therefore n × 100 samples for each focus curve. This increases the complexity of the system, incurs a high computational cost, and limits the practical use of these methods. In addition, the methods in [28]-[33] considered only symmetric bell-shaped distributions for vibrational noise in the translational stage, and their measurement model measures only a constant (each step position k). In such a case, however, taking the mean of the measurement values at every step position k can provide similar results. The authors also considered the Jitter noise to have Normal and Levy distributions [28], [31]; in practice, however, the Jitter noise resulting from the vibrational noise does not follow a symmetric bell-shaped (Normal or Levy) distribution.
In this paper, the nature of Jitter noise is studied, and necessary conditions for approximating this noise are proposed and discussed. Jitter noise in SFF systems is position dependent and varies according to the focus position. The manuscript models Jitter noise and concludes that it follows a gamma (Γ) distribution (a non-symmetric distribution) with a constant mean and position-dependent variance (discussed in Section IV). A Kalman filter is then designed for removing this noise from SFF systems.
The system matrix is formulated using the Taylor series, followed by the explanation and design of the measurement model. The shape recovery expression is then provided. In the proposed scheme, a single measurement is taken for each step. Thus, for n images (of dimensions l × m) in the image stack, the proposed method requires only n samples for each focus curve and uses 100 times fewer images than previous methods, while providing better shape recovery results in terms of correlation and RMSE (provided in Section VI).

IV. JITTER MODELING IN SHAPE FROM FOCUS SYSTEMS
In SFF systems, jitter occurs when there is uncertainty or unevenness in the step size Δu or Δf. In this section, we discuss the step size in both situations of image acquisition, i.e., change in object distance from the lens (Δu), and change in focal length of the imaging device (Δf).
After this, we discuss types of Jitter, followed by the proposed model for Jitter noise. The next section utilizes this proposed Jitter noise model and Kalman filter to remove the effects of jittering on focus curves.

A. STEP SIZE IN SFF IMAGE ACQUISITION
The step size expression for Δu is provided by [9]. In their system, the object is moved towards (or away from) the imaging device in small constant steps of Δu, keeping the focal length and magnification constant and the depth of field as shallow as possible. Their simplified expression for Δu (step size) is given in (10), where DoF is the depth of field of the system, and ρ = 2.9957, as provided by [9]. Equation (10) provides the maximum limit for Δu. The ideal example of such a system is an optical microscope. The images can also be acquired by changing the focal length of the system in small, constant increments of Δf [34]. In this type of image acquisition for SFF, the object is held static in front of the imaging device, and the device focal length is changed. Auto-focusing algorithms mostly use this technique to search for the best focal lens position for a single point. It can also be used for depth and shape estimation of the object under consideration [35].
In both the above cases, an image is stored at every step to obtain a stack of images, as discussed in Section I.

B. MODELING JITTER IN SFF
To model the jitter, consider again the Quadratic function:
$$g(k) = a_2 k^2 + a_1 k + a_0 \tag{11}$$
where $a_2$, $a_1$, and $a_0$ are the equation parameters, $g(k)$ is the quadratic function, and $1 \le k \le n$ represents the sample points of this function. The step size is $\Delta k = k - (k - 1)$, and is considered as 1.
To model Jitter noise, consider the uncertainty in step size as $\epsilon \sim N(0, \sigma^2)$; then (11) can be written as:
$$g(k+\epsilon) = a_2 (k+\epsilon)^2 + a_1 (k+\epsilon) + a_0 \tag{12}$$
Rewriting (12) by expanding the squared terms and simplifying using the Taylor series, the following equation is derived:
$$g(k+\epsilon) = a_2 k^2 + a_1 k + a_0 + (2 a_2 k + a_1)\,\epsilon + a_2\,\epsilon^2 \tag{13}$$
Using (11) and (13), the following is obtained:
$$g(k+\epsilon) = g(k) + g'(k)\,\epsilon + \tfrac{1}{2}\, g''(k)\,\epsilon^2 \tag{14}$$
Equation (14) shows that the noise on the RHS is multiplied by the first and second derivatives of the function, so the Jitter noise in SFF systems depends on the slope and concavity of the focus curves. If $\epsilon$ is Normal ($N(0, \sigma^2)$), $\epsilon^2$ follows a (scaled) chi-square distribution. Equation (14) can be rewritten as:
$$g(k+\epsilon) = g(k) + \eta_{kN} + \eta_{k\chi} \tag{15}$$
where $\eta_{kN}$ and $\eta_{k\chi}$ are given by:
$$\eta_{kN} = g'(k)\,\epsilon = (2 a_2 k + a_1)\,\epsilon \tag{16}$$
and
$$\eta_{k\chi} = \tfrac{1}{2}\, g''(k)\,\epsilon^2 = a_2\,\epsilon^2 \tag{17}$$
Here $\eta_{kN}$ is normally distributed with mean $\mu_N = 0$ and variance $\sigma_N^2 = (2 a_2 k + a_1)^2 \sigma^2$. Meanwhile, $\eta_{k\chi}$ follows a gamma ($\Gamma$) distribution, with mean $\mu_\chi = a_2 \sigma^2$ and variance $\sigma_\chi^2 = 2 a_2^2 \sigma^4$. Therefore, the total resultant noise $\eta$ has mean $\mu_\eta = a_2 \sigma^2$ and variance $\sigma_\eta^2 = (2 a_2 k + a_1)^2 \sigma^2 + 2 a_2^2 \sigma^4$. The derivation of the mean ($\mu_\eta$) and variance ($\sigma_\eta^2$) is based on standard manipulations of probability theory formulas and properties of expectations.
The variance of $\eta_{kN}$ ($\sigma_N^2$) is different at every $k$th step, and becomes zero at:
$$k = -\frac{a_1}{2 a_2} \tag{18}$$
The direction of $\eta_{k\chi}$ is always towards the concavity of the function.
The range ($R_\chi$) of $\eta_{k\chi}$ depends on $g''(k)$. If $g''(k) > 0$, then $0 < R_\chi < \infty$; similarly, if $g''(k) < 0$, then $-\infty < R_\chi < 0$; and if $g''(k) = 0$, the effect of this noise is zero. However, the physical limitations and restrictions of the imaging device restrict $R_\chi < \Delta k$. The noise factor $\eta_{k\chi}$ remains present throughout the function $g(k+\epsilon)$; however, the sign of $\eta_\chi$ depends on the sign of $g''(k)$, and is always towards the concavity of the function.
For values of $k$ away from the value given in (18), $\eta_{kN}$ is more significant than $\eta_{k\chi}$. As $k$ approaches the value given in (18), the effect of $\eta_{kN}$ diminishes, while the contribution of $\eta_{k\chi}$ becomes significant. However, if the variance of $\epsilon$ is chosen according to the condition in (19), the effect of $\eta_\chi$ can be ignored, making $\eta_N$ the only contribution to the noise, and the resultant noise will be Normal. If the condition in (19) is violated, $\eta_{k\chi}$ can play a significant role and cannot be ignored. The combination of both $\eta_{kN}$ and $\eta_{k\chi}$ results in a non-Normal noise.
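The stated mean and variance of the total Jitter noise can be checked numerically. The sketch below draws Gaussian step-size errors (written `eps` here), perturbs a quadratic focus curve, and returns the samples for comparison with the closed-form moments. The coefficient values, sample count, and function names are hypothetical choices for illustration.

```python
import numpy as np


def jitter_noise_moments(a2, a1, k, sigma):
    """Closed-form mean and variance of eta = g(k + eps) - g(k), eps ~ N(0, sigma^2)."""
    mean = a2 * sigma**2
    var = (2 * a2 * k + a1) ** 2 * sigma**2 + 2 * a2**2 * sigma**4
    return mean, var


def simulate_jitter_noise(a2, a1, a0, k, sigma, n_samples=200_000, seed=0):
    """Monte Carlo samples of eta for a quadratic g(k) = a2*k^2 + a1*k + a0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n_samples)

    def g(x):
        return a2 * x**2 + a1 * x + a0

    return g(k + eps) - g(k)


# Hypothetical concave focus curve with its peak at k = -a1 / (2 * a2) = 10.
a2, a1, a0, sigma = -0.5, 10.0, 0.0, 0.5

eta_slope = simulate_jitter_noise(a2, a1, a0, k=4.0, sigma=sigma)   # on the slope
eta_peak = simulate_jitter_noise(a2, a1, a0, k=10.0, sigma=sigma)   # at the peak

print(jitter_noise_moments(a2, a1, 4.0, sigma), eta_slope.mean(), eta_slope.var())
print(jitter_noise_moments(a2, a1, 10.0, sigma), eta_peak.mean(), eta_peak.var())
```

At the peak the Normal component vanishes and every sample has the sign of a2, which is the one-sided, gamma-like behavior described above; on the slope the Normal component dominates.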

V. THE PROPOSED METHODOLOGY
In the previous section, Jitter noise was modeled and explained in detail. In this section, the proposed method is presented. The proposed scheme can be applied in two ways: before the FM (as pre-FM application, i.e., on pixel curves $P_{i,j}(k)$), or after the FM (as post-FM application, i.e., on focus curves $\Phi_{i,j}(k)$).
To fully remove the Jitter noise from the pixel/focus curves, the Kalman Filter is designed as follows:

A. KALMAN FILTER DESIGN
To model and design the Kalman filter, the proposed method in this manuscript utilizes the cubic equation to design the system, and measurement equations for the filter to remove the Jitter noise from the pixel/focus curves. Although quadratic equation can also be utilized, cubic equation gives the system model more robustness and flexibility, since the cubic equation is of higher degree than quadratic equation.
The system and measurement models are derived in the following sections.

1) SYSTEM MODEL
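In order to derive the system model for the Kalman filter, the cubic equation is considered:
$$h_k = a_3 k^3 + a_2 k^2 + a_1 k + a_0 \tag{20}$$
where $a_0$, $a_1$, $a_2$, and $a_3$ are the coefficients of the equation, and $h_k$ is the cubic function. The first, second, and third derivatives of (20) are:
$$h'_k = 3 a_3 k^2 + 2 a_2 k + a_1, \qquad h''_k = 6 a_3 k + 2 a_2, \qquad h'''_k = 6 a_3 \tag{21}$$
Using the Taylor series again, the equation for $h_{k+1}$ is written as:
$$h_{k+1} = a_3 (k+1)^3 + a_2 (k+1)^2 + a_1 (k+1) + a_0 \tag{22}$$
By expanding the powers in (22) and rearranging, the following is obtained:
$$h_{k+1} = \left(a_3 k^3 + a_2 k^2 + a_1 k + a_0\right) + \left(3 a_3 k^2 + 2 a_2 k + a_1\right) + \left(3 a_3 k + a_2\right) + a_3 \tag{23}$$
Utilizing (20) and the set of equations in (21), (23) can be rewritten as:
$$h_{k+1} = h_k + h'_k + \tfrac{1}{2} h''_k + \tfrac{1}{6} h'''_k \tag{24}$$
Using (24), and repeating the process for $h'_{k+1}$, $h''_{k+1}$, and $h'''_{k+1}$, a similar set of equations is obtained for $h_k$ and its derivatives:
$$\begin{aligned}
h_{k+1} &= h_k + h'_k + \tfrac{1}{2} h''_k + \tfrac{1}{6} h'''_k \\
h'_{k+1} &= h'_k + h''_k + \tfrac{1}{2} h'''_k \\
h''_{k+1} &= h''_k + h'''_k \\
h'''_{k+1} &= h'''_k
\end{aligned} \tag{25}$$
Thus, utilizing the set of equations in (25), the system equation for the Kalman filter can be written as:
$$X_k = A X_{k-1} + \omega_k$$
where $A$ is the system state matrix and $X_k$ is the state vector. The predicted state noise at $k$ is given by $\omega_k \sim N(0, Q)$, and $h$ represents the focus curve ($\Phi_{i,j}(k)$) or pixel curve ($P_{i,j}(k)$) values. The manuscript assumes that there is no system noise and that the only noise present is due to jitter in the measurements; therefore, $Q$ (the process covariance matrix) is assumed to have a very small value, but not 0. The state vector $X_k$ and system matrix $A$ are given by:
$$X_k = \begin{bmatrix} h_k \\ h'_k \\ h''_k \\ h'''_k \end{bmatrix}, \qquad
A = \begin{bmatrix} 1 & 1 & \tfrac{1}{2} & \tfrac{1}{6} \\ 0 & 1 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
The system covariance equation is provided by:
$$\Sigma^{-}_k = A\, \Sigma_{k-1} A^{T} + Q$$
where $\Sigma^{-}_k$ is the predicted (estimated) covariance matrix at $k$, $\Sigma_{k-1}$ is the covariance matrix at $k-1$, and $Q$ is the process covariance matrix.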
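A quick numerical check of the system model: because a cubic has no derivatives beyond the third, the Taylor-series transition matrix should advance the state (the function value and its three derivatives) exactly by one unit step. The sketch below verifies this for an arbitrary cubic; the coefficient values and the helper name `cubic_state` are hypothetical.

```python
import numpy as np

# Transition matrix for the state [h, h', h'', h'''] with a unit step.
A = np.array([[1.0, 1.0, 0.5, 1.0 / 6.0],
              [0.0, 1.0, 1.0, 0.5],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])


def cubic_state(coeffs, k):
    """State vector of h(k) = a3*k^3 + a2*k^2 + a1*k + a0 at position k."""
    a3, a2, a1, a0 = coeffs
    return np.array([
        a3 * k**3 + a2 * k**2 + a1 * k + a0,   # h
        3 * a3 * k**2 + 2 * a2 * k + a1,       # h'
        6 * a3 * k + 2 * a2,                   # h''
        6 * a3,                                # h'''
    ])


coeffs = (0.01, -0.5, 3.0, 2.0)   # illustrative values only
for k in range(5):
    predicted = A @ cubic_state(coeffs, k)
    exact = cubic_state(coeffs, k + 1)
    print(k, np.allclose(predicted, exact))
```

The upper-triangular structure means each derivative is updated only from higher-order ones, which is what lets the filter track the changing slope of a focus curve instead of treating it as a constant.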

2) MEASUREMENT MODEL
The next step in the proposed methodology is to design the measurement model for the Kalman filter. For this purpose, (20) is rewritten using the Taylor series and the steps explained in Section IV, as follows:
$$h(k+\epsilon) = a_3 (k+\epsilon)^3 + a_2 (k+\epsilon)^2 + a_1 (k+\epsilon) + a_0$$
By rearranging and utilizing (21), the following is obtained:
$$h(k+\epsilon) = h_k + h'_k\,\epsilon + \tfrac{1}{2} h''_k\,\epsilon^2 + \tfrac{1}{6} h'''_k\,\epsilon^3$$
Utilizing the condition explained in (19), the $\epsilon^2$ and $\epsilon^3$ factors can be ignored, resulting in a simplified measurement model:
$$Y_k = C X_k + \eta_k$$
where $Y_k$ represents the measurement of the pixel curve, or of the focus curve of pixel $P_{i,j}(k)$ after FM application, at the $k$th image, $C$ is the state measurement matrix (given as $C = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$), and $\eta_k$ is the noise in the measurement due to jitter, as modeled in Section IV.
The filter can be applied in two ways:
• when applied before the FM (as pre-FM application), i.e., on pixel curves, $Y_k = P_{i,j}(k)$;
• when applied after the FM (as post-FM application), i.e., on focus curves, $Y_k = \Phi_{i,j}(k)$.

3) UPDATED STATES AND KALMAN GAIN
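The Kalman gain is computed on every step using (31) as:
$$K_k = \Sigma^{-}_k C^{T} \left( C\, \Sigma^{-}_k C^{T} + R \right)^{-1}$$
where $K_k$ is the Kalman gain at $k$, and $R$ is the measurement covariance matrix. The optimal state estimate is computed using the following:
$$\hat{X}_k = \hat{X}^{-}_k + K_k \left( Y_k - C \hat{X}^{-}_k \right)$$
where $\hat{X}^{-}_k = A \hat{X}_{k-1}$.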
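Putting the system model, measurement model, gain, and state update together, one pass of the filter over a single pixel or focus curve might look like the sketch below. It is a simplified illustration, not the authors' implementation: it assumes a constant measurement variance `r` (whereas the jitter variance modeled in Section IV is position dependent), uses the standard covariance update (I − KC)Σ, and takes an initial state with zero derivatives. The function name, default values, and synthetic curve are all hypothetical.

```python
import numpy as np

# State transition for X = [h, h', h'', h'''] with a unit step (Taylor series).
A = np.array([[1.0, 1.0, 0.5, 1.0 / 6.0],
              [0.0, 1.0, 1.0, 0.5],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
C = np.array([[1.0, 0.0, 0.0, 0.0]])   # only the curve value itself is measured


def kalman_filter_curve(y, x0, q=1e-6, r=1.0, p0=1.0):
    """Filter one curve y[0..n-1]; returns the estimated curve values."""
    x = np.asarray(x0, dtype=float).copy()
    P = p0 * np.eye(4)                  # state covariance
    Q = q * np.eye(4)                   # small but non-zero process covariance
    R = np.array([[r]])                 # measurement covariance (assumed constant)
    filtered = np.empty(len(y))
    for k, y_k in enumerate(y):
        if k > 0:                       # predict from the previous estimate
            x = A @ x
            P = A @ P @ A.T + Q
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)   # Kalman gain, shape (4, 1)
        x = x + K @ (np.array([y_k]) - C @ x)          # state update
        P = (np.eye(4) - K @ C) @ P                    # standard covariance update
        filtered[k] = x[0]
    return filtered


# Synthetic example: a quadratic focus curve sampled at jittered positions.
rng = np.random.default_rng(0)
k = np.arange(30)
eps = rng.normal(0.0, 0.3, k.size)                 # step-size uncertainty
noisy = -0.05 * (k + eps - 15.0) ** 2 + 10.0       # one measurement per step
smoothed = kalman_filter_curve(noisy, x0=[noisy[0], 0.0, 0.0, 0.0], r=0.25)
```

Note that only one measurement per step is consumed, so a stack of n frames yields n filter iterations per curve.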

B. SHAPE RECOVERY
After the $n$th step iteration for the focus curve is completed, the depth for every pixel is recovered to obtain the shape of the object under consideration. As presented in the previous section, the filter can be applied in two ways: pre- or post-FM application.
If the filter is applied as pre-FM application, then before recovering the depth map, the FM is applied on $\hat{P}_{i,j}(k)$ to obtain $\hat{\Phi}_{i,j}(k)$ using (3). However, if the filter application is post-FM, then the depth map can be recovered directly using $\hat{\Phi}_{i,j}(k)$.
For every object point $(i, j)$, the coefficients of (20) around $k^*$ are estimated using (33), where $k^*$ is the position at which $\hat{\Phi}_{i,j}(k)$ is maximum, obtained by:
$$k^* = \arg\max_{k} \hat{\Phi}_{i,j}(k)$$
In (33), the vector $\hat{M}_{i,j}$ represents the collection of parameters of $h_k$, the data vector holds the values of $\hat{\Phi}_{i,j}(k)$ around $k^*$, and the remaining term is the coefficient matrix evaluated around $k^*$; all are defined for the object point $(i, j)$. The refined and filtered depth ($KD_{i,j}$) for every $P_{i,j}$ is then recovered from the estimated coefficients.
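One way to realize this refinement step is sketched below: locate the discrete maximum k*, fit a cubic to the filtered focus values in a small window around it, and take the position where the fitted cubic peaks. The least-squares fit via `numpy.polyfit`, the window size, and the dense-grid search for the maximum are assumptions made for illustration; the paper's own expressions for the coefficient estimate and the final depth may differ.

```python
import numpy as np


def refine_depth(curve, half_window=2, grid_points=2001):
    """Sub-frame depth estimate for one filtered focus curve."""
    curve = np.asarray(curve, dtype=float)
    k_star = int(np.argmax(curve))                 # discrete peak position
    lo = max(k_star - half_window, 0)
    hi = min(k_star + half_window, len(curve) - 1)
    x = np.arange(lo, hi + 1) - k_star             # positions centred on k*
    if x.size < 4:                                 # too few samples for a cubic
        return float(k_star)
    coeffs = np.polyfit(x, curve[lo:hi + 1], 3)    # a3, a2, a1, a0 (least squares)
    # Evaluate the fitted cubic on a fine grid inside the window and take
    # its maximum; this avoids solving h'(k) = 0 when a3 is close to zero.
    xs = np.linspace(x[0], x[-1], grid_points)
    return k_star + float(xs[np.argmax(np.polyval(coeffs, xs))])


# Example: a curve whose true peak lies between frames 10 and 11.
k = np.arange(21)
curve = -(k - 10.3) ** 2
print(refine_depth(curve))
```

Centring the positions on k* keeps the polynomial fit well conditioned even for long stacks.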

VI. RESULTS AND DISCUSSION
This section analyzes the experimental results and discusses them in detail. The section is divided into three parts. First, details of the experimental setup are provided, followed by the depth map and shape assessment criteria and the metric measures used. Finally, a detailed analysis of the effects of Jitter noise on SFF is provided at the end of the section.

A. EXPERIMENTAL SETUP
Experiments for shape reconstruction analysis are performed on seven objects. Table 2 provides a summary of the objects used in the 3D shape analysis. Ten datasets of a simulated cone are generated with different lens positions and Jitter noise levels using camera simulation software (AVS). The details of AVS are provided in [8], [36], [37]. The Matlab code used can be downloaded from [8]. All the datasets consist of 97 images of 360 × 360 pixels. The AVS software is provided with the depth map, texture image, and camera parameters. The texture map consists of concentric circles of alternating black and white stripes. The depth maps and the texture images used for image generation via AVS are the same for all sequences of Simulated Cone. The difference between the datasets is the uncertainty in step size Δu used to generate the sequences, in order to study the effect of jitter on shape reconstruction. The values of variance σ² in Δu are 0, 0.1, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, and 2.0.
The real datasets contain six real objects: Real Cone, Real Plane, LCD-TFT Filter, Groove, Coin, and Image-I. These image sequences were originally in gray-scale. Fig. 4 provides the ground truths of the Simulated and Real cones. Fig. 5 shows the 10th frame of each image sequence. These image sequences have been widely used by many researchers, including [19], [38]-[42].
Images of Real Cone were taken using the CCD camera system, [13], with dimensions of 200 × 200 × 97. The Real Plane image sequence is also obtained in a similar way, and contains 87 image frames each with a size of 200×200 pixels. Sixty images of the LCD-TFT filter were taken by the microscopic control system (MCS), with each image consisting of 300 × 300 pixels.
The coin sequence consists of magnified images of Lincoln's head from the back of a US penny. The LCD-TFT filter images were microscopic images of an LCD color filter. These images were also obtained by means of MCS. The system consisted of a personal computer integrated with a frame grabber board (Matrox Meteor-II) and a CCD camera (SAMSUNG CAMERA SCC-341) mounted on a microscope (NIKON OPTIPHOT-100S). Computer software acquired the images by controlling the lens position through a stepper motor driver (MAC 5,000), possessing a 2.5 nm minimum step length. All the images being stored in sequence at every step were captured by varying the object plane.
The sequence of Image-I is the letter I engraved in a metallic surface. This sequence consists of 60 images, and was also obtained via the same system under magnification of 10×. The dimensions of this image sequence are 300×300 pixels.
The Groove image sequence is of a V-groove engraved in a metallic surface. The dimensions of this image sequence are 300 × 300 pixels, with 60 images.

B. METRIC MEASURES
The shape reconstruction quality is the characteristic that measures the perceived difference between the reconstructed shape and the ideal shape. As the difference increases, the quality of shape reconstruction reduces. In this article, the quality of the depth map obtained by using different focus measures under various levels of Jitter noise is analyzed. In the ideal case, the obtained depth map is indistinguishable from the original map, and the difference is zero, hence, the quality of the map is at its maximum. Several quality metrics have been provided in the previous literature [43]. In this manuscript, RMSE and correlation are used to compare the proposed method combined with various FM operators under different levels of Jitter noise.
Root Mean Square Error (RMSE) is the square root of the variance of the residuals of the data under observation. It indicates how close the perceived shape is to the original shape, and is written as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{l \times m} \sum_{i=1}^{l} \sum_{j=1}^{m} \left( G_{true}(i,j) - D_{obtained}(i,j) \right)^2 }$$
where $G_{true}$ is the ground truth, $D_{obtained}$ is the obtained depth map, and $l \times m$ are the dimensions of the depth maps. The higher the value of RMSE, the larger the error in shape reconstruction; for better results, the value should be close to zero. Correlation, or Pearson correlation, is a measure of linear relationship or similarity between two shapes [44], given by:
$$\mathrm{Corr} = \frac{\mathrm{cov}\left(G_{true}, D_{obtained}\right)}{\sqrt{\sigma_G^2\, \sigma_D^2}}$$
where cov is the covariance, and $\sigma_G^2$ and $\sigma_D^2$ are the variances of $G_{true}$ and $D_{obtained}$, respectively.
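Both metrics are straightforward to compute for a pair of depth maps. The following is a minimal sketch with hypothetical function names; it assumes the two maps have the same shape and are already expressed in the same depth units.

```python
import numpy as np


def rmse(g_true, d_obtained):
    """Root mean square error between two depth maps of equal shape."""
    g = np.asarray(g_true, dtype=float)
    d = np.asarray(d_obtained, dtype=float)
    return float(np.sqrt(np.mean((g - d) ** 2)))


def pearson_correlation(g_true, d_obtained):
    """Pearson correlation between two depth maps, computed over all pixels."""
    g = np.asarray(g_true, dtype=float).ravel()
    d = np.asarray(d_obtained, dtype=float).ravel()
    gc = g - g.mean()
    dc = d - d.mean()
    return float((gc @ dc) / np.sqrt((gc @ gc) * (dc @ dc)))


# Toy example with a small synthetic "ground truth" map.
ground_truth = np.arange(12.0).reshape(3, 4)
estimate = ground_truth + 2.0          # constant offset of two depth units
print(rmse(ground_truth, estimate), pearson_correlation(ground_truth, estimate))
```

The toy example shows why the two metrics are reported together: a constant depth offset leaves the correlation at its maximum while the RMSE still registers the error.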
In Table 3, the proposed method is compared with the previous methods provided by Jang et al. in [28]-[33]. The results are provided for σ² = 0.1. The first method is a scalar Kalman filter method, followed by a scalar version of modified [...] the correlation results of the proposed scheme are better than or comparable to those of the other methods. Only the correlation values of the IMCCKF method are better than those of the proposed method, as it uses more data, but its RMSE results are poor in comparison to the proposed scheme. This is because the proposed scheme utilizes only the basic Kalman filter model. The results can be further improved if advanced variants of the Kalman filter are used with the proposed scheme.
For further experimentation, for each level of noise and each FM, three scenarios are considered: FM only (i.e., without the proposed filter), proposed filtering as a pre-processing step (i.e., applied before the FM), and proposed filtering as a post-processing step (i.e., applied after the FM). Each table has seven columns, with σ² in the first column, followed by one column each for correlation and RMSE for each scenario.
Table 4 along with Fig. 6 shows the results of shape reconstruction for Simulated Cone under various levels of Jitter noise. The solid lines with markers show the correlation and RMSE values, whereas the dotted lines show the general linear trend of the data. From the table and the figure, it is clear that when SML is used without the proposed scheme, the results become poorer as the noise level increases. At lower noise levels, the values of correlation are similar for all three scenarios, whereas, when the noise levels increase, the correlation values without the proposed filter decrease sharply, by 11.6%. The decrease in correlation values with the use of the filter as a pre- or post-step is merely 0.22% and 0.23%, respectively. The RMSE values of shape reconstruction when using SML start at 7.339 and increase at a high rate of 17.95%, whereas the RMSE values when using the proposed scheme start at the lower values of 6.952 and 7.194, and decrease by 13.94% and 7.69% with the increase in noise levels. The graphs and tables clearly show that when the proposed scheme is applied as a pre- or post-step, the results are better than when using SML alone. Fig. 8 represents the shape reconstruction of Simulated Cone using SML under various noise levels in all three scenarios, along with the cross-sections of these shapes. The blue line in the cross-section figures represents the ground truth for the simulated cone.
Table 5 and Fig. 7 provide the results of shape reconstruction when TENG is used as the FM. A similar trend is observed: the values of correlation at lower noise levels are similar for all three scenarios, whereas, when the noise levels increase, the correlation values without the filter decrease by 0.83%. The decrease in correlation values with the use of the filter as a pre- or post-step is merely 0.18% and 0.19%. The RMSE values of shape reconstruction when using TENG start at 7.315 and increase by 9.43%, whereas the RMSE values when using the proposed scheme start at the lower values of 6.829 and 6.912, and decrease by 1.27% and 7.25% with the increase in noise levels. The graphs and tables clearly show that when the proposed scheme is applied as a pre- or post-step, the results are better than when using TENG alone. Fig. 9 represents the shape reconstruction of Simulated Cone using TENG under various noise levels in all three scenarios, along with the cross-sections of these shapes.
Table 6 and Fig. 10 give the results of shape reconstruction when GLVA is used as the FM. When the noise level increases and GLVA is used without filtering, the correlation levels decrease by 0.83%. The correlations with the filter in both pre- and post-application remain similar, with a decrease of 0.015% as the noise increases from 0 to 2.0. When no filter is used, the RMSE values show a similar trend, with a 10.86% increase; when the proposed technique is used, they decrease by about 3% and 10%.
Table 7 and Fig. 11 provide the results of WAVS as the FM under all three scenarios. As noise increases, the correlation of WAVS decreases at a rate of 1%, but when combined with the proposed technique, it increases by 0.7% with the increase in noise. However, at lower values of noise, WAVS with or without filtering behaves similarly. The trend in RMSE values also suggests that WAVS combined with the proposed technique offers better performance. The shape reconstruction results are provided in Fig. 12 and Fig. 13, respectively.
Table 8 and Fig. 14 show the RMSE and correlation results when using CONT as the FM. In all three scenarios, it behaves poorly. Table 9 and Fig. 15 provide the results of the shape reconstruction of the simulated cone when GRA3 is used as the [...] RMSE values, as compared to using FMs only. The RMSE and correlation graphs are shown in Fig. 16 and Fig. 17, respectively.
Table 14, with Fig. 18 and Fig. 19, shows a similar comparison for Real Cone with and without the proposed scheme, for Jitter noise variance $\sigma_\eta$ affected by $0 \le \sigma^2 \le 0.75$. The table and graphs clearly suggest the effectiveness of the proposed scheme. The correlation values across noisy conditions for the proposed scheme are higher than when using FM(s) only, whereas the RMSE values are lower. Figures 20 to 22 represent the shape reconstruction of Real Cone with and without the proposed scheme. The shapes reconstructed using the proposed scheme are smoother than the ones reconstructed using FM(s) only, as the latter have surface roughness produced by jitter, which the FMs alone cannot remove.
[...] only and the proposed filter used as post-FM application. In the Real Plane reconstructed shape, a smoothness phenomenon similar to that of Real Cone can be observed in Fig. 23. The roughness in shape due to jittering when using only the FM is smoothed by the filter, i.e., the jitter effect is removed. In the LCD-TFT filter, the cylindrical shape of the filter is preserved, and the surface around it is smoothed by the filtering process. In the case of the Image-I dataset, not much difference can be observed visually, as the jitter in this sequence is quite low. In the case of the Coin sequence, a depth abnormality can also be observed in Fig. 23, near the value of 175 on the vertical axis, in the shape reconstruction using all FMs; when the proposed filter is applied, it is removed.
The Groove image sequence is a challenging problem in shape reconstruction [2]. The sides and center of this image sequence are over-exposed, resulting in texture degradation, which is critical in SFF systems. The slopes in the middle are the only ones that exhibit the change in focus levels. Fig. 25 shows the shape reconstruction of Groove using different FMs, along with the proposed filter applied as pre-and post-FM.
The results presented in the tables, graphs and reconstructed shapes in the manuscript clearly show that the Jitter noise affects the overall accuracy of the SFF systems, and can be removed by applying the proposed filtering technique using Kalman filter. The proposed scheme shows promising results.

VII. CONCLUSION
In SFF, when the shape of the object is discretized into image frames, a constant inter-frame distance is assumed. However, in practice, this inter-frame distance is prone to errors, caused by mechanical errors in the gear assembly of the translational stage or the lens-focusing assembly of the imaging device. The resulting errors are referred to as Jitter noise in the literature. Jitter noise is not visible in images, because each pixel in an image is subjected to the same error in focus. Thus, traditional image denoising techniques will not work. In this paper, Jitter noise is first modeled, and the mean and variance of this noise are formulated. It is also shown that this Jitter noise depends on the first and second derivatives of the focus/pixel curves, and follows a gamma distribution. The design of the system and measurement models for the proposed Kalman filter scheme is presented and applied to the focus curves to remove this noise. The proposed scheme can be applied in two ways, as pre-FM application or post-FM application. Unlike previously proposed techniques for Jitter noise removal in SFF systems, the proposed scheme utilizes a single measurement for each step and a dynamic approach with the Kalman filter. Thus, it is faster and more accurate than its predecessors.
The experiments are performed on seven objects: one simulated and six real. Ten noise levels are tested on the simulated object, and four levels on the real objects. Both pre-and post-applications are tested and the results are presented. The RMSE and correlation are used as metric measures. The experiments show the effectiveness of the proposed scheme.