Selective Subtraction for Handheld Cameras

Background subtraction techniques model the background of the scene using the stationarity property and classify the scene into two classes, namely foreground and background. In doing so, most moving objects become foreground indiscriminately, except in dynamic scenes (such as those with waving tree leaves, water ripples, or a water fountain), where such motions are typically “learned” as part of the background using a large training set of video data. We introduce a novel concept of background as the objects other than the foreground, which may include moving objects in the scene that cannot be learned from a training set because they occur only irregularly and sporadically, e.g., a walking person. We propose a “selective subtraction” method as an alternative to standard background subtraction, and show that a reference plane in a scene viewed by two cameras can be used as the decision boundary between foreground and background. In our definition, the foreground may actually occur behind a moving object. Furthermore, the reference plane can be selected in a very flexible manner, using, for example, the actual moving objects in the scene, if needed. We extend this idea to allow multiple reference planes, resulting in multiple foregrounds or backgrounds. We present a diverse set of examples to show that: 1) the technique performs better than standard background subtraction techniques without the need for training, camera calibration, disparity map estimation, or special camera configurations; and 2) it is potentially more powerful than standard methods because of the flexibility it offers in selecting, in real time, what to filter out as background, regardless of whether the object is moving or not, or whether it is a rare event or a frequent one. Furthermore, we show that this technique is relatively immune to camera motion and performs well for handheld cameras.


I. INTRODUCTION
Background subtraction is a fundamental step in many applications including object detection, tracking, action recognition, and activity recognition. Background subtraction techniques traditionally use one or more views to classify the objects (or image pixels) as either foreground or background. However, standard methods have a rigid definition of what constitutes a background, which often leads to classifying almost all moving objects as foreground, except for small persisting motions that can be learned from a training set. This strict binary classification, and the resulting loss of 'intra-class separability', makes it impossible to model a partial background or a partial foreground, and thus to capture the notion of a background object being in front of a foreground object, or of a moving object belonging to the background. If scene modeling is to be made more effective, background subtraction techniques need to ensure that the statistical models can learn partial backgrounds and thus preserve an intra-class taxonomy, which can prove very useful in many real-world applications such as video surveillance and detection and tracking in crowds.
Existing background subtraction techniques can be classified into two main categories: techniques using monocular sequences and those using stereo sequences. Our method relies on two views but does not require configuring the cameras rigidly as a stereo pair. Most of the existing literature focuses on different aspects such as the statistical approach used to model the background, the type of scene used (dynamic or static), the learning method applied to the training set, and the model used for the background or foreground. The background of a scene is generally defined as being motionless for static scenes (e.g., a video conference) and almost motionless for dynamic scenes (e.g., scenes which include changes such as illumination, shadows, waving tree leaves, water ripples, or fountains). Most single-view background subtraction techniques try to model the background (and the dynamic changes) by modeling each pixel or different regions statistically, and then use those statistical models to detect the moving objects, known as foreground. This type of modeling requires a large amount of training data for learning the statistical properties of the background. Another limitation of traditional techniques is that an object (or its pixels) can either be classified as foreground or background, not both. Alternatively, stereo-based techniques rely on estimating disparity maps by rectifying the views and using similarity measures in order to estimate the background. Such disparity maps are in practice difficult to estimate in real time, susceptible to noise, and highly error prone. Also, these techniques require a special camera setup and are computationally expensive. Recent algorithms have used sensors that can measure the depth of the objects accurately, but these are typically part of custom-designed hardware that is often very expensive. Furthermore, all background subtraction techniques (whether based on a single view or two views) classify moving objects as foreground indiscriminately. Consider a case where a camera is looking at a street with multiple objects moving across the camera in both directions. The object closer to the camera occludes the object crossing behind it which, in turn, occludes another object crossing behind it, and so on. Any standard background subtraction technique will consider all of the moving objects as foreground and thus will not be able to selectively distinguish which moving objects should be kept as foreground and which ones should be discarded. What if you are only interested in the first two objects closest to the camera, or only one object at the back, and all other objects are irrelevant? The foreground-of-interest is then a partial foreground, while the background-of-interest is a combination of the traditional background and the remaining partial foreground. In this context, the standard definition of background is insufficient. Current background subtraction techniques fail to model such backgrounds or foregrounds.
Our technique has five novel contributions. Firstly, most background subtraction techniques require extensive training or learning of the background model using data consisting of different examples of the background alone. Even when such data is available, these techniques cannot learn the partial background as defined above. We challenge the requirement of training and propose the use of a reference plane inducing a base homography, estimated using only two frames. This base homography can be used for background subtraction when traditional techniques fail because they cannot classify a frequently occurring moving object as background. Secondly, we propose to use the actual moving objects in the scene to estimate the base homography and show how a simple walk (or an object in motion) can be used to define a reference plane. Thirdly, most background subtraction techniques need a large amount of data to model the background (usually ranging from several hundred to several thousand examples). We propose and show that the base homography can be estimated using an object in motion viewed in only two frames. Thus a large amount of training data is no longer required in our method.

FIGURE 1. Reference plane: the reference plane is defined by a moving object or human walk, and the projective depth (τ) is defined as the distance between the reference plane and the objects in the scene.
Fourthly, standard background subtraction techniques cannot change the background model once it is learned; only minor dynamic changes are incorporated when updating the background model. In our proposed technique, the base homography can be modified using a different moving object or a plane in the scene in real time, and can be replaced altogether with a new base homography, thus providing flexibility in the background subtraction. This enables us to select multiple reference planes, where one object can be classified as foreground with reference to one base homography while the same object can be classified as background with reference to a different base homography. Lastly, we avoid the explicit use of a depth map, the requirement of rectifying two views for calculating depth as in stereo-based methods, and the use of special sensors to measure depth, and propose a solution based entirely on projective depth calculated from traditional cameras. Our technique also does not require a rigid camera setup and works for handheld cameras.
The rest of the paper is organized as follows. A summary of related work is presented in Section II, with the theoretical formulation and the description of the proposed approach in Section III. Experimental results on real-world sequences and a brief discussion are presented in Section IV, followed by the conclusion in Section V.

II. RELATED WORK
Background subtraction has been an active area of research over the past several decades. It is beyond the scope of this work to review all the methods and techniques, hence we refer the reader to [1], [2] for a good review of the related work in this area. Background subtraction techniques have generally used one or more views to model the pixels or regions. The idea of defining the foreground as the moving objects in an otherwise static scene has been used in background subtraction and object tracking for a very long time [3], [4]. In order to improve the results in real-world scenarios, dynamic background subtraction techniques use a single three-dimensional Gaussian distribution to model each pixel in the scene [5], a Mixture of Gaussians (MoG) [6], or non-parametric kernel density estimation (KDE) [7]. Region-based techniques have also been proposed to improve background subtraction, which use a covariance matrix from a region around a pixel [8] or auto-regression models [9], or propose the use of temporal persistence with a single probability density in a Maximum A Posteriori Markov Random Field (MAP-MRF) framework [10] to model the spatial and appearance attributes [8]. See [11]–[20] for a review of other single-view methods.
Deep learning, and convolutional neural networks (CNNs) in particular, have made their impact on background subtraction as well. Based on deep learning networks, [21] performs semantic segmentation on the images. This pixel-level information is leveraged for motion detection in the video sequence; pixels with low semantic probability are deemed background. In order to reduce false negatives, a semantic background model is also maintained at each pixel. In case of ambiguity, any background subtraction method can be used as the final step of their method. Dividing the input image into patches, [22] first applies the SuBSENSE algorithm [23], combined with the Flux Tensor algorithm [24], to create a background image; CNNs are then fed with matching pairs of patches from the background and the input image. For applications in agriculture, [25] combines a standard background subtraction method with features learned from CNNs, with the expectation that these features are robust to camera motion and view changes, yet sensitive to any new elements in the area. A pixel-wise segmentation map is computed by [26], who propose an encoder-decoder framework where the input image is temporally aligned to the reference image. An atrous convolution is introduced by [27] to expand the receptive field of the network and, mimicking ResNet, shortcut connections are added to reduce training complexity; Conditional Random Fields (CRFs) are added in the last layer for refinement. A triplet convolutional neural network is proposed by [28], which uses an encoder-decoder network built on a pre-trained VGG-16 Net. Each branch of the triplet network operates at a different scale to perform feature encoding, and decoding is performed by a transposed convolutional network. Their method works on one image at a time, not utilizing any temporal information. In order to utilize temporal information, [29] proposes a deep end-to-end framework where pixel-wise semantic features are extracted using an encoder-decoder network; a Long Short-Term Memory (LSTM) network is then used to model pixel-wise changes over time and, to reduce sensitivity to camera motion, conditional random fields (CRFs) are used in the last layer. In order to fully capture the temporal information of a scene, a 3D CNN is proposed by [30]. Their 3D CNN consists of 6 convolutional layers and its input is a window of 10 consecutive frames. These 10 frames are divided into groups of 4 frames and fed to 4 convolutional layers. Up-sampling is performed using kernels of various strides to retain the fine details from the input images, and these layers are then concatenated to produce the final prediction layer.
An alternate approach, and the one most related to the technique presented in this paper, is based on stereo, which attempts to recover dense disparity maps in real time for segmenting the scene. Reference [31] used stereo cameras and their disparity maps to perform background subtraction by checking the color intensity values of corresponding pixels. Each pixel was warped to the corresponding pixel in the reference image, and the color and luminance values were used to decide if the pixel belongs to the foreground or background; this method suffers from false and missed detections. [32] proposed the use of a stereo configuration, in which the cameras are vertically aligned, to improve the background subtraction. A multi-view approach is proposed by [33] to remove static background. They propose two methods, one with rough camera localization and the other with accurate camera localization. In the first method, they use a scene-specific pre-trained background model (using SVMs) to perform foreground extraction. In the second, a multi-view stereo approach is employed to perform a dense matching (using a Structure from Motion technique) of the scene against a dataset of existing images to remove the static background. However, a scene-specific labeled training dataset is very expensive to acquire, and SfM is known to be noise prone. Out-of-plane objects are detected by [34]: a stereo image pair is first used to compute the planar homography between the views, which is done off-line. During the test phase, one image is super-imposed on the other using the pre-computed homography, and a similarity map is created to detect out-of-plane objects: pixels corresponding to the background plane have values close to 1, while out-of-plane pixels have low values in the similarity map. A two-view hierarchical algorithm is proposed by [35], where stereo images are decomposed using the Discrete Wavelet Transform (DWT). Adaptive models are built over sub-bands at each level, and a depth-based model is also created, which is applied to pixels that do not conform to the adaptive model. However, DWT is an expensive process and is known to be affected by noise. There are two major limitations of these techniques: color and luminance are not sufficient to decide whether pixels belong to foreground or background, especially when objects are roughly similar in color; furthermore, the cameras need to be in strict configurations to achieve sufficient accuracy. Recently, advances in 3D depth cameras such as the Microsoft Kinect have improved the accuracy of depth information, but these devices are typically designed and configured with multiple sensors specifically to measure depth, and are often expensive to purchase and difficult to set up. Our technique does not use such devices and can work with traditional cameras, including handheld cameras and cellphones.

III. SELECTIVE SUBTRACTION APPROACH
In this section, we first define selective subtraction and provide the theoretical formulation for implementing it.
A. REFERENCE PLANE π AND BASE HOMOGRAPHY

Consider a sequence of images $\{I_t\}_{t=1\ldots n}$, where multiple objects are moving across the scene as shown in Figure 3. A simple change detection algorithm can be used to detect the moving objects (or blobs), and their head and feet positions can be obtained using the approach described in [36]. Let $P_1$ and $P_2$ be the two $3 \times 4$ camera projection matrices of two arbitrary cameras observing the scene. Since we do not require any calibration or a specific configuration, without loss of generality, we model the two cameras as canonic cameras, i.e. $P_1 = [I \mid 0]$ and $P_2 = [[e']_\times F \mid e']$, where $F$ is the fundamental matrix, $e'$ is the epipole in the second camera view, and for any vector $v = (a, b, c)^\top$ the notation $[v]_\times$ denotes the skew-symmetric matrix defined as:

$$[v]_\times = \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}$$

Next, define the head and feet positions of a person viewed by these two cameras at a given instant in time as $p_1^t$ (top), $p_1^b$ (bottom) in the first view and $p_2^t$ (top), $p_2^b$ (bottom) in the second view, respectively. These corresponding pairs of points define two 3D points, and the pencil of planes containing the line joining them forms a one-parameter family given by

$$\pi_\alpha = \begin{pmatrix} v_\alpha \\ 1 \end{pmatrix}, \qquad v_\alpha = v_0 + \alpha\, v_1,$$

where α is a scalar parameter and $v_0$, $v_1$ are fixed 3-vectors determined by the head and feet correspondences. The homography induced by this family of planes is then given by

$$H_\alpha = [e']_\times F - e'\, v_\alpha^\top \qquad (6)$$

Now, let $m$ and $m'$ be two corresponding images of a 3D point $M$ viewed by the two cameras. The homography $H_\alpha$ maps $m$ from the left image to the right image as

$$m' = \beta \left( H_\alpha\, m + \tau\, e' \right) \qquad (10)$$

where β is an unknown projective scale factor. Here the scalar parameter τ may be interpreted as the projective depth of the point $M$ from the plane $\pi_\alpha$, because we can readily verify that if $M \in \pi_\alpha$, then $\tau = 0$. Otherwise, τ will be either positive or negative depending on which side of the plane $M$ lies.

Rearranging (10), we can determine τ from either the x or y coordinates of the points $m$, $m'$, and $e'$. For instance, using the x coordinates we have:

$$\tau = \frac{(H_\alpha m)_x\,(m')_z - (m')_x\,(H_\alpha m)_z}{(m')_x\,(e')_z - (e')_x\,(m')_z} \qquad (11)$$

where $(\cdot)_x$ denotes the x coordinate of the vector and $(\cdot)_z$ its third (homogeneous) coordinate. One last issue remains before we describe how (11) is used for selective subtraction: the base homography $H_\alpha$ as derived above is parameterized in terms of the scalar α. There are several ways to determine α. One simple way is to use a pair of corresponding points between the two camera views to solve for α using (6). For instance, either the head or feet point correspondences of the person in the two cameras in a later frame can be used to determine α. In this way, a walking person establishes a reference plane, as depicted in Figure 1.
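To make the formulation concrete, the following is a minimal Python/numpy sketch of (6) and (11). It assumes $F$, $e'$, and $v_\alpha$ have already been estimated; the function names are ours, for illustration only.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x, so that skew(v) @ u == np.cross(v, u)."""
    a, b, c = v
    return np.array([[0., -c,  b],
                     [c,  0., -a],
                     [-b, a,  0.]])

def base_homography(F, e2, v_alpha):
    """Homography induced by the plane (v_alpha^T, 1)^T for the canonic
    cameras P1 = [I|0], P2 = [[e']x F | e'], as in Eq. (6)."""
    return skew(e2) @ F - np.outer(e2, v_alpha)

def projective_depth(H, e2, m1, m2):
    """Projective depth tau of a correspondence m1 <-> m2 (Eq. (11)).
    m1, m2 are homogeneous 3-vectors; tau = 0 on the reference plane,
    and its sign tells on which side of the plane the 3D point lies."""
    mh = H @ m1                               # m1 mapped by the homography
    num = mh[0] * m2[2] - m2[0] * mh[2]
    den = m2[0] * e2[2] - e2[0] * m2[2]
    return num / den
```

Since $H_\alpha$ depends linearly on α, relation (10) with $\tau = 0$ can likewise be solved for α from one correspondence known to lie on the desired plane.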

B. SELECTIVE SUBTRACTION
We use the reference plane as the decision boundary between foreground and background objects. Any plane in the scene can be chosen as the reference plane, which gives us the flexibility of selectively keeping or subtracting the objects on either side of that plane. For instance, if the chosen reference plane is the farthest plane in the scene, then all moving objects fall in front of the reference plane, and the approach behaves as a traditional background subtraction technique. The projective depth (τ) for any moving object in the scene can be estimated and, based on the sign of τ, the object can be classified as foreground or background. Moreover, the rate of change of τ over time may be interpreted as a 'projective speed' of the object relative to the reference plane. For instance, in Figure 2, when an object moves, the rate of change of τ can be estimated and used in several applications, including vehicle navigation and detecting anomalies in pedestrian paths. Furthermore, the idea of a single reference plane and the estimation of projective depth can be extended to multiple reference planes, which allows us to classify a scene as layers of foreground or background [37], where different objects can belong to different foreground layers. The ability to define multiple reference planes enables multiple foregrounds or multiple backgrounds, and hence a notion of being in between two layers. Moreover, it is important to highlight that the proposed technique can be used even when an object is fully or partially occluded (full occlusion can be detected as the object disappearing from the foreground).
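As a sketch of the layered classification and the 'projective speed' just described, assuming the per-plane projective depths have already been computed with (11); the sign convention below is our assumption and must match the convention of (11).

```python
import numpy as np

def layer_labels(taus):
    """Classify one object against several reference planes: the sign of
    tau relative to each plane decides foreground vs. background, so the
    same object can be foreground for one plane and background for
    another (sketch; the sign convention is assumed)."""
    return ["foreground" if tau > 0 else "background" for tau in taus]

def projective_speed(tau_track, dt=1.0):
    """Finite-difference rate of change of tau over a per-frame track of
    projective depths, interpretable as a 'projective speed' of the
    object relative to the reference plane."""
    return np.diff(np.asarray(tau_track, dtype=float)) / dt
```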

IV. RESULTS AND DISCUSSION
The algorithm was tested on five sets of challenging sequences with multiple moving objects, significant occlusions, and illumination changes. These datasets are named: outdoor, indoor, and three cellphone datasets. Comparative results with the Mixture of Gaussians method [6] are also presented. A simple threshold-based frame difference algorithm along with connected component analysis was used to detect the changes (or blobs) in the scene. We use a state-of-the-art feature matching algorithm, the Scale Invariant Feature Transform (SIFT) [38], to find point correspondences. Table 1 summarizes the datasets used for testing the proposed method.
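The detection and matching steps just described can be sketched with standard OpenCV calls as follows. The threshold and minimum-area values are illustrative rather than the ones used in our experiments, and `cv2.SIFT_create` requires OpenCV 4.4 or later.

```python
import cv2
import numpy as np

def detect_blobs(frame, background, thresh=30, min_area=500):
    """Threshold-based frame differencing followed by connected-component
    analysis; returns bounding boxes (x, y, w, h) of detected blobs.
    Threshold and area values are illustrative."""
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(cv2.absdiff(g1, g2), thresh, 255, cv2.THRESH_BINARY)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    return [tuple(stats[i, :4]) for i in range(1, n)   # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= min_area]

def sift_correspondences(img1, img2, ratio=0.75):
    """SIFT matching with Lowe's ratio test; returns homogeneous point pairs."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    pairs = []
    for match in cv2.BFMatcher().knnMatch(d1, d2, k=2):
        if len(match) < 2:
            continue
        m, nn = match
        if m.distance < ratio * nn.distance:           # Lowe's ratio test
            pairs.append((np.array([*k1[m.queryIdx].pt, 1.0]),
                          np.array([*k2[m.trainIdx].pt, 1.0])))
    return pairs
```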
Outdoor Dataset: The first sequence contains an outdoor scene with several moving objects, possibly casting shadows. It contains 1000 frames from each camera view at a resolution of 720 × 480. The scene also contains dynamic background motion, such as swaying tree leaves. A reference walk from a moving object was selected to define the reference plane, and the base homography was estimated using the head and feet positions in the first and the last frames, as shown in Figure 2. It should be highlighted that only four point correspondences are used to calculate the base homography, and we do not require any additional training data. An alternative would be to track the head and feet positions throughout the reference walk and use curve fitting techniques to improve the precision of the head and feet positions [40]. Moreover, numerous complex algorithms can be used to detect the blobs with varying degrees of success; a discussion of these algorithms is outside the scope of this paper.
Once the blobs are detected, we use the proposed algorithm to estimate the projective depth (τ) as described in Section III-A. In our experiments, we first performed blob detection followed by feature matching for point correspondences using SIFT. Notice that these two steps can be reversed, i.e., finding point correspondences on the entire image followed by eliminating the ones outside the blobs. Figure 4 shows two views of the input images used for blob detection and feature matching for point correspondences.

FIGURE 9. (a) Sensitivity (average values: [6] 14%, [39] 72.7%), and (b) specificity (average values: ours 99.8%, [6] 99.3%, [39] 98.5%). The average detection sensitivity of the proposed method is consistently better than [6] and [39], and its specificity is comparable to these techniques.
For each corresponding point, we calculate τ using (11) and use a majority voting scheme to classify the blob as foreground or background (i.e., as being on one side of the reference plane or the other). The results are depicted in Figure 5, showing that the proposed algorithm can correctly separate the foreground from background.
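A minimal sketch of this majority-voting step follows, reusing the hypothetical `projective_depth()` from the earlier listing; the sign convention is again an assumption.

```python
def classify_blob(box, pairs, H, e2):
    """Majority vote over the correspondences falling inside a blob: each
    pair votes with the sign of its projective depth (Eq. (11)), and the
    blob takes the majority label (sketch; reuses projective_depth())."""
    x, y, w, h = box
    taus = [projective_depth(H, e2, m1, m2) for m1, m2 in pairs
            if x <= m1[0] < x + w and y <= m1[1] < y + h]
    if not taus:
        return None                      # no correspondences in this blob
    in_front = sum(1 for t in taus if t > 0)
    return "foreground" if in_front > len(taus) / 2 else "background"
```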
One of the most unique aspects of our proposed technique is the flexibility it provides in selecting the reference plane of choice. Figure 5 shows how the foreground detection changes when different reference planes are selected for selective subtraction. Figure 5(a) shows the results when the reference plane is the far wall, and hence all moving objects are considered foreground, as in a traditional background subtraction technique. When the reference plane is changed to a moving object, the foreground changes accordingly, as seen in Figure 5(b). Figure 5(c) shows the results when the selected reference plane is in the middle of the pathway, thus detecting the objects in front of it as foreground. We also selected as our reference plane the object walking closest to the camera and found that all moving objects were detected as background. Figure 6 depicts the qualitative results, showing that the proposed technique performs better than the Mixture of Gaussians method [6].
Indoor Dataset: Our second dataset contains an indoor scene with significant illumination changes; the results are shown in Figure 7. The dataset contains 867 frames from each camera view at a resolution of 720 × 480. The scene contains a table with some objects lying on it and a bookshelf at the back of the room. People walk in front of the camera from the left of the room to the right and vice versa.

FIGURE 10. Selective subtraction results for the cellphone-A sequence (the scene and the column layout are described in the text below).

FIGURE 11. Reference planes used in cellphone-B: the first row shows images from the left camera and the second row the corresponding images from the right camera. The first column shows the two frames used for SIFT matches; the remaining columns show the selected reference planes, from left to right: farthest from the camera, two middle planes (one farther from the camera and one closer), and closest to the camera.
Cellphone Dataset: This dataset comprises three separate recordings, which we denote as cellphone-A, cellphone-B, and cellphone-C. These datasets were captured with two handheld Samsung cellphones (a Galaxy S7 and a Note 4) at an image resolution of 1080 × 1920. Cellphone-A was captured inside a cafe, where baristas are seen brewing coffee and taking orders for the customers, and some customers pass in front of the staff from the left and move to the right of the scene. This is shown in Figure 10. Each row shows the images captured from both cellphone cameras along with the foreground and background points detected by our algorithm when different reference planes are chosen. The first column shows the images captured from one cellphone camera and the second column shows the images captured from the second camera. The remaining columns show the results obtained from our method when different reference planes are chosen: the third column shows the results when the farthest wall or plane is used as the reference plane, the fourth column when the middle plane is used, and the fifth column when the foremost area is chosen as the reference plane. In most results, objects are correctly classified as foreground objects. Average accuracy scores of 84%, 71.8%, and 82.2% were observed for correct classification of the points shown in the last three columns. Similarly, Figure 12 shows some of the frames in the cellphone-B dataset. This sequence captures a food court in a shopping mall, where people are seen moving in the background and helping themselves to food. Each row of the figure shows results obtained from the proposed method. The first and second columns show the two views captured from the cellphone cameras. The remaining four plots in each row show the results obtained from our method when different reference planes are chosen: the top-left plot shows the results when the farthest wall or plane is used as the reference plane, the bottom-right plot when the closest plane (i.e., closest to the camera) is used, and the bottom-left and top-right plots when different middle planes are used. In most results, objects are correctly classified as foreground and background objects. Average accuracy scores of 94%, 94%, 85.3%, and 76% were observed for correct classification of the points shown in these four plots. Finally, Figure 13 shows some of the frames in the cellphone-C dataset. This sequence captures the most challenging scene, which includes dynamically moving objects (e.g., bushes moving due to strong wind) as well as shadows, with people moving in both directions. The top two rows show some of the frames from the two views of the cellphone cameras. The third row shows the results from our proposed algorithm when the chosen reference plane is in the middle: the girl in the green shirt walking from the left is selectively subtracted because she is in the background. The fourth row shows the results when the chosen reference plane is the farthest plane in the scene, in which case the approach reduces to traditional background subtraction and all moving objects are correctly classified as foreground. The remaining rows show results from other approaches. These results indicate that the selective subtraction approach is effective and provides flexibility in selectively subtracting the objects of choice from the scene.
The results are qualitatively demonstrated and compared to other methods, as shown in Figure 7. The qualitative analysis of these results clearly shows that our proposed technique performs very well in challenging environments even when used with datasets captured with handheld cameras.

A. QUANTITATIVE ANALYSIS
We also performed a quantitative analysis of the pixel-level detection accuracy. The per-frame detection rates are calculated in terms of sensitivity and specificity. Figure 9 shows the sensitivity and specificity of the proposed technique as compared to [6] and [39]. Clearly, the detection accuracy in terms of sensitivity is consistently higher than [6] and [39], while the specificity is comparable to both techniques. One of the major advantages of the proposed technique is that it does not require any special camera setup, configuration, or depth-sensing device, as needed in other two-view background subtraction techniques. We also do not use a disparity map, and thus the proposed algorithm is fast and computationally efficient. The average computation time per frame (480 × 720 pixels) is 0.0029 seconds on an Intel Core2 Extreme CPU with 4GB RAM (excluding the time needed for blob detection and feature matching). It should be noted that we have not performed any shadow removal or other post-processing, such as graph cuts [10], to improve the boundaries of foreground objects. Table 2 shows the results obtained from our method, compared to the standard methods of [6] and [39]. The first column shows the different datasets tested in this paper. The second column shows the specificity and sensitivity measurements obtained from the proposed method, whereas the third and fourth columns list the measurements obtained from [6] and [39], respectively. As can be seen from the table, the results obtained from our approach are consistently better. For the outdoor dataset, we obtained 79% and 95% for specificity and sensitivity, respectively. Similarly, for the cellphone-C dataset, we obtain 74% and 99.8%, where the best competing results are 73% from [39] and 98.5% from [6] for specificity and sensitivity, respectively. These results show that the proposed method is robust and applicable; moreover, it is fast and computationally efficient. These encouraging results demonstrate the practicality and viability of the proposed method.
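For reference, the two per-frame scores can be computed from boolean foreground masks as in the following sketch; the mask names are ours, for illustration.

```python
import numpy as np

def frame_scores(pred, gt):
    """Pixel-level sensitivity and specificity for one frame, given boolean
    foreground masks: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    tp = np.count_nonzero(pred & gt)
    fn = np.count_nonzero(~pred & gt)
    tn = np.count_nonzero(~pred & ~gt)
    fp = np.count_nonzero(pred & ~gt)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```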

B. IMPROVEMENTS
The proposed algorithm relies heavily on the change detection and feature matching algorithms for point correspondences and uses their results to estimate the projective depth. For improved results, the following recommendations should be followed:
• An important constraint in the estimation of the base homography is its consistency with the fundamental matrix, and thus all point correspondences should satisfy this constraint.
• The base homography can be estimated using two instances of the walk (i.e., only two frames). Any error in selecting appropriate instances can result in a wrong reference plane and thus introduce errors in the estimation of the base homography. A more robust approach is to track the head and feet positions over a period of time and then use curve fitting techniques to select the best candidates for head and feet positions (see the sketch after this list).
• The accuracy of the point correspondences used is critical for the selective subtraction approach. A reliable feature matching algorithm such as SIFT or Triangle Constraint Measurements (TCM) [41] is recommended to minimize the probability of false matches. The consistency of these matches with the fundamental matrix can also be used as a check.
• The use of an effective blob detection algorithm is also important for selective subtraction. Numerous complex change detection algorithms are available for blob detection, such as those using statistical properties of the pixels or color features. Moreover, selective subtraction can be used within the framework of any object detection algorithm as a refinement step.
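As suggested in the second recommendation above, head and feet tracks can be smoothed by low-order curve fitting before selecting the two walk instances. A minimal sketch follows, where the polynomial degree is an illustrative choice.

```python
import numpy as np

def smooth_endpoints(track, degree=2):
    """Fit low-order polynomials to a tracked head (or feet) trajectory and
    return the smoothed first and last positions, which are more reliable
    candidates for the two walk instances than raw detections."""
    pts = np.asarray(track, dtype=float)          # (N, 2) image positions
    t = np.arange(len(pts))
    fitted = np.stack([np.polyval(np.polyfit(t, pts[:, i], degree), t)
                       for i in range(2)], axis=1)
    return fitted[0], fitted[-1]
```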

V. CONCLUSION
This work presents a number of fundamental innovations in the context of background subtraction. We present a novel concept of background as the objects other than the foreground, which may include moving objects in the scene that cannot be learned from a training set because they occur only irregularly and sporadically. Our proposed method, “Selective Subtraction”, is an alternative to standard background subtraction, and we show that a reference plane in a scene is sufficient as the decision boundary between foreground and background. Furthermore, the flexibility in selecting the reference plane, using an actual moving object in the scene or an arbitrary plane, is unique to this method and is not available in existing background subtraction techniques. We also show that the proposed technique enables us to select multiple reference planes, thus relaxing the strict binary classification paradigm. We present promising results on a challenging set of image sequences to show that the selective subtraction approach performs effectively and has applications in background subtraction, vehicle navigation, path anomaly detection, and detecting objects in crowds.
We also present results on image sequences from handheld cameras to show that the proposed technique is relatively immune to camera motion and is robust. Furthermore, we provide recommendations for improving the results of the selective subtraction approach.