Pseudo-Label-Free Weakly Supervised Semantic Segmentation Using Image Masking

Weakly-supervised semantic segmentation (WSSS) aims to train a semantic segmentation network using weak labels. Recent approaches generate a pseudo-label from the image-level label and then exploit it as pixel-level supervision in the segmentation network training. A potential drawback of conventional WSSS approaches is that the pseudo-label cannot accurately express the object regions and their classes, causing a degradation of the segmentation performance. In this paper, we propose a new WSSS technique that trains the segmentation network without relying on the pseudo-label. The key idea of the proposed approach is to train the segmentation network such that an object erased by the segmentation map is not detected by the classification network. From extensive experiments on the PASCAL VOC 2012 benchmark dataset, we demonstrate that our approach is effective in WSSS.


I. INTRODUCTION
Image semantic segmentation, a task to classify each pixel into one of the classes of interest, is an important problem with a wide range of applications such as autonomous driving, medical diagnosis, industrial automation, and aerial imaging [1], [2]. Recently, deep neural network (DNN)-based semantic segmentation has received special attention due to its excellent segmentation performance [3], [4]. A potential drawback of the DNN-based approach is that a large number of fully-annotated data are needed to train the networks. Since the generation of a fully-annotated dataset is laborious, alternative approaches such as unlabeled or weakly-labeled learning have been suggested in recent years [5]-[7]. There are various forms of weak labels such as image-level labels [8], points [9], scribbles [10], and bounding boxes [11]. Among these, the image-level label is popularly used for its simplicity [12]-[14]. In essence, an image-level label indicates whether the foreground objects appear in an image or not (e.g., bird is in an image and cat is not). We henceforth refer to DNN-based semantic segmentation using image-level labels as weakly-supervised semantic segmentation (WSSS).

The associate editor coordinating the review of this manuscript and approving it for publication was Turgay Celik.
A central challenge of WSSS is that the image-level labels do not provide the information on object regions required to train semantic segmentation networks. A simple way to localize object regions is to use class activation mapping [15]. Basically, this approach figures out which regions in the image are relevant to the semantic classes. The localization map obtained from this technique, called class activation map (CAM), indicates the discriminative object regions. In recent WSSS approaches, the CAM is used to generate a pseudo-label for the training of the semantic segmentation network [5], [16]. While the pseudo-label can express the object region of interest well, it can cause problems that hinder accurate image segmentation. First, the object extent in the non-discriminative regions is not accurately expressed (see Fig. 1-(a)). This is because the classification network focuses only on the existence of the objects, so the network tends to ignore the non-discriminative regions which are also parts of the objects. Second, the class assigned to each pixel of the pseudo-label might not be correct when an image contains multiple objects of distinct classes (see Fig. 1-(b)), since the CAMs spread to unwanted regions outside the foreground objects. For these reasons, an approach that trains the semantic segmentation network using the pseudo-label might not achieve satisfactory performance in many practical scenarios.
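To make the class activation mapping idea concrete, the following is a minimal pure-Python sketch of how a CAM is formed as a weighted sum of feature maps, with the classifier weights for a class acting as the weighting (a toy illustration; all names and shapes here are hypothetical, and real implementations operate on GPU tensors):

```python
def class_activation_map(features, weights, cls):
    """Compute a CAM for one class as the weighted sum of feature maps.

    features: list of K feature maps, each an HxW list of lists
    weights:  classifier weights; weights[cls][k] connects feature k to class cls
    cls:      index of the target class
    """
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for k, fmap in enumerate(features):
        wk = weights[cls][k]
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam
```

High values in the resulting map mark the discriminative regions for the class; as discussed above, such maps tend to miss the non-discriminative parts of the object.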
The aim of this paper is to propose a novel WSSS technique that can train the semantic segmentation network without relying on the pseudo-label. Basically, our approach is inspired by the visual attention mechanism of the human visual system (HVS) [17]. When the HVS perceives visual information, it focuses on the desired object without interference from other objects. In order to mimic this behavior and thereby reduce the interference from irrelevant regions, we mask an input image using an attention map that guides which pixels to attend to or ignore. Specifically, a segmentation network generates the segmentation map describing the discovered object regions. Then, the attention map is generated by collecting the discovered regions in the segmentation maps of the classes of interest. We exploit the attention map to erase the discovered regions and therefore focus on the remaining regions in the masked image.
In order to check whether the objects are erased properly, we employ a classification network trained for multi-class multi-label classification. In summary, in the training process, the segmentation network tries to generate the segmentation map covering the object regions. Then, for the image masked by the segmentation map, the classification network tries to find out the interesting objects. For example, when an image contains bird and car, a segmentation network is guided to generate an accurate map of bird or car. If the generated segmentation map contains the bird, then the bird is erased in the masked image, helping the detection of a car in the classification network.
To train the segmentation network in the absence of the pseudo-label, we adopt a novel combination of two complementary loss functions: the attention loss and the saliency loss. The attention loss penalizes the segmentation network if the segmentation map does not completely cover the objects of a target class. The saliency loss encourages the segmentation network to learn the accurate object extent, which cannot be identified by the classification network. By learning the object classes using the attention loss and the object extent using the saliency loss, the segmentation network can segment the image without obtaining the class-specific knowledge from pixel-level supervision.
As a means to enhance the segmentation performance, we propose a training strategy for the classification network and a refining technique for the saliency map. First, for the training of the classification network, we exploit the dilated convolutional blocks (see Fig. 3). The dilated convolutional blocks are used to find out the object regions outside the most discriminative regions. The regions discovered by dilated convolutional blocks are then used as an additional supervision for the classification network in finding out complete object regions. Second, we refine the saliency map using the CAM obtained by the classification network (see Fig. 4). Note that the value in each pixel of the CAM indicates the probability of an object being contained in that pixel.
Using these values, we can find out the missing objects and also remove the unwanted objects in the saliency map.
The main contributions of this paper are as follows:
• We propose a novel segmentation technique for weakly-supervised semantic segmentation. In our work, instead of learning the class-specific knowledge from the pseudo-label, the segmentation network learns the class-specific knowledge directly from the classification network by exploiting the image masking technique.
• We propose a training strategy for the semantic segmentation network. In the proposed approach, the segmentation network is trained using the combination of the attention loss and the saliency loss to accomplish the semantic segmentation task (see Section III-C).
• From numerical experiments on the val and test sets of the PASCAL VOC 2012 semantic segmentation benchmark [18], we show that our approach achieves mean intersection-over-union (mIOU) of 66.5% and 66.9% using the VGG16-based network and 69.0% and 69.2% using the ResNet101-based network, respectively, which are competitive with state-of-the-art methods.

II. RELATED WORK

A. WEAKLY-SUPERVISED SEMANTIC SEGMENTATION
Image-level labels have been used in many WSSS approaches due to their simplicity. Early works include multiple-instance learning [12], constrained optimization [19], and expectation-maximization techniques [20]. Recently, the class activation mapping technique, which finds the most discriminative object regions, has been used to generate a pixel-level pseudo-label from the image-level label [15]. The generated pseudo-label depicting the reliable object regions is used as a supervision for the semantic segmentation network. The segmentation performance of this approach depends strongly on the accuracy of the generated pseudo-labels. Hence, it is important to find accurate object regions for the proper training of the semantic segmentation network. In order to obtain a reliable pseudo-label, various segmentation techniques have been proposed. In [5], a fully-connected conditional random field (CRF) is applied to the predicted segmentation maps to refine the object boundaries. In [14], a seeded region growing technique is used to assign classes to unlabeled pixels. Recently, approaches generating reliable pseudo-labels without relying on segmentation algorithms have been proposed. In [21], for example, a large number of localization maps are generated and then aggregated into a single localization map. In [22], the localization maps are accumulated through the training process to collect the discriminative regions of different parts of the objects. In [23], multiple dilated convolutional blocks are used to enlarge the receptive fields and transfer the discriminative information to the non-discriminative regions. In [24], an adversarial manipulation technique is used to expand the discriminative object regions.
In a nutshell, the proposed approach is similar to CAM-based approaches in the sense that we find the object regions from the CAM. However, the key distinction of the proposed approach is that the segmentation network learns the classes of pixels by directly utilizing the classification network, meaning that we can train the segmentation network without relying on the pseudo-label.

B. VISUAL ATTENTION
Visual attention, an approach to select the search regions and analyze their effects, has been applied to various computer vision tasks such as image classification [25], object detection [26], and image caption generation [27]. In semantic segmentation, visual attention is often implemented using image masking, a technique to erase part of an image. In many approaches, discovered object regions are erased to help the discovery of new object regions [28]-[31]. For example, in [28], [29], discovered object regions are repetitively erased to find new object regions. In [30], a two-phase learning strategy has been proposed to obtain a complete region of the foreground objects from the attention maps of two networks. The drawback of these approaches is that it is difficult to figure out whether the masked image still contains part of the foreground objects or not. As a consequence, one might simultaneously find unwanted background objects along with the main foreground objects (e.g., water with boat, rail with train). In [31], discriminative object regions are erased to guide the network to find new object regions. In [32], discriminative object regions are suppressed to spread the attention of the network to adjacent non-discriminative object regions.
In [16], [33], the visual attention mechanism is applied to adversarial learning. In these approaches, an attention map obtained from the main network is used to mask an input image, and the masked image is then delivered to the adversarial network. By training the network using the adversarial loss function, the main network is encouraged to generate an attention map which makes the adversarial network output consistent with the image-level label. In [33], an adversarial network is used to discriminate whether the input map is the ground truth or generated by the segmentation network. In [16], an input image is masked by the self-attention map. The masked image is passed to the adversarial network to check if the attention map covers the regions contributing to the classification output.

C. SALIENCY DETECTION
The main goal of salient object detection is to identify the visually distinctive objects (or regions) in an image and then segment them out from the background. Since the image-level label does not contain any information on the background regions in WSSS systems, one cannot directly find the confident background regions using the classification network. To overcome this limitation, the saliency map has been widely used in many WSSS approaches [5], [14], [16], [21], [22]. The key idea of these schemes is to identify the background regions using the pixels with low salient probabilities.
In [34]-[38], the saliency map is directly used in the training process of the segmentation networks. For example, in [34], the segmentation network is trained using the saliency maps of simple images to generate the pseudo-labels for complex images. In [35], saliency maps are used to supplement non-discriminative object regions. In [36], saliency maps are exploited to guide the seeded region growing method. In [37], a saliency-guided self-attention module is used to capture rich contextual information for discovering the integral extent of objects and retrieving high-quality pseudo-labels. In [38], an approach that trains the network using pixel-level feedback from the combination of saliency maps and image-level labels has been proposed.

III. PROPOSED WEAKLY-SUPERVISED SEMANTIC SEGMENTATION NETWORK
In this section, we discuss the proposed WSSS framework. We first discuss the classification network training using dilated convolutional blocks and then discuss the refinement of the saliency maps using CAMs obtained from the classification network. We also explain how to train the semantic segmentation network using the image masking technique. The overall network architecture is illustrated in Fig. 2.

A. TRAINING OF CLASSIFICATION NETWORK
A classification network is a key ingredient in our approach. Basically, the classification network is trained using the multi-class multi-label classification loss. One well-known problem of the conventional classification network is that it cannot detect the non-discriminative object regions. To address this issue, we use extra supervision on the non-discriminative object regions in the training of the classification network. To find the non-discriminative object regions, we use dilated convolution, which enlarges the receptive field without changing the computational cost [39]. With the increased receptive field, the information in the discriminative object regions can be transferred to distant regions, helping the detection of the non-discriminative object regions.

In the training of the classification network, we append dilated convolutional blocks to the classification network (see Fig. 3). The dilated convolutional blocks are similar to the standard convolutional block except that their first convolutional layers have distinct dilation rates d. Let M^0 be the CAM obtained from the standard convolutional block and M^1, ..., M^D be the CAMs obtained from the D dilated convolutional blocks. Then, the object regions found by the dilated convolutional blocks are added to M^0 using max-fusion to supplement the non-discriminative object regions. A dense CAM M, covering both the discriminative and non-discriminative object regions, is obtained as

M_{u,c} = \max_{0 \le i \le D} M^{i}_{u,c}.

The classification network is trained using the multi-class multi-label classification loss and the CAM loss. First, the multi-class multi-label classification loss \ell_{sig} is

\ell_{sig} = -\frac{1}{|\mathcal{D}||\mathcal{C}|} \sum_{i \in \mathcal{D}} \sum_{c \in \mathcal{C}} \left[ t_c \log \sigma(z_{ic}) + (1 - t_c) \log\left(1 - \sigma(z_{ic})\right) \right], (1)

where C is the number of foreground classes, \mathcal{D} is the set of indices of convolutional blocks, \mathcal{C} is the set of indices of foreground classes, t_c is the image-level label for class c, z_{ic} = GAP(M^i_c) is the predicted class score for class c (GAP is the global average pooling operation), and \sigma(x) is the sigmoid function.
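The max-fusion of the CAMs and the pooled class scores described above can be sketched in pure Python as follows (a toy illustration under simplified shapes; real implementations operate on batched GPU tensors, and the loss here is a single-branch version of the multi-label classification loss):

```python
import math

def max_fusion(cams):
    """Fuse CAMs from the standard and dilated blocks by pixel-wise max."""
    h, w = len(cams[0]), len(cams[0][0])
    return [[max(cam[i][j] for cam in cams) for j in range(w)] for i in range(h)]

def gap(cam):
    """Global average pooling: mean over all spatial positions."""
    h, w = len(cam), len(cam[0])
    return sum(sum(row) for row in cam) / (h * w)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_loss(scores, labels):
    """Sigmoid cross-entropy averaged over classes (one CAM branch)."""
    loss = 0.0
    for z, t in zip(scores, labels):
        p = sigmoid(z)
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss / len(scores)
```

In the full loss, this per-branch term is additionally averaged over the standard and dilated convolutional branches.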
Second, the CAM loss \ell_{cam}, used to match the CAM M^0 to the dense CAM M, is the mean square error (MSE) between M and M^0:

\ell_{cam} = \frac{1}{|\mathcal{S}||\mathcal{C}_p|} \sum_{u \in \mathcal{S}} \sum_{c \in \mathcal{C}_p} \left( \phi(M_{u,c}) - \phi(M^{0}_{u,c}) \right)^2, (2)

where \mathcal{S} is the set of all positions, \mathcal{C}_p is the set of indices of the present classes, and \phi(x) = \max(0, x) is the ReLU activation function. Also, M_{u,c} is the class score of class c at position u of the class activation map M. The overall loss \ell_{cls} for training the classification network is

\ell_{cls} = \ell_{sig} + \lambda_{cls} \ell_{cam}, (3)

where \lambda_{cls} is the weighting factor balancing the two losses.
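The CAM loss above can be sketched in pure Python as follows (a minimal illustration; per-class CAMs are represented as nested lists, and the ReLU clipping matches the \phi in the formula):

```python
def relu(x):
    return max(0.0, x)

def cam_loss(m0, m_dense, present):
    """MSE between the standard-block CAM and the dense CAM over present classes.

    m0, m_dense: per-class CAMs, indexed as [c][i][j]
    present:     indices of classes present in the image-level label
    """
    total, count = 0.0, 0
    for c in present:
        for row0, row in zip(m0[c], m_dense[c]):
            for a, b in zip(row0, row):
                total += (relu(a) - relu(b)) ** 2
                count += 1
    return total / count
```

Minimizing this loss pushes the standard-block CAM toward the dense CAM, so the standard block alone learns to cover the non-discriminative regions.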

B. SALIENCY MAP REFINEMENT
In the segmentation network training, the saliency map is used to learn which pixels belong to either background or foreground regions. While the saliency detector (SD) can find out the detailed shape of the objects, it might also find out unwanted background objects or miss interesting foreground objects since SD is trained without the semantic classes.
To overcome this potential drawback, we correct the pixels in the saliency map based on the CAM score. The score at each pixel indicates the probability of an object being contained in that pixel. Since the object detection in the classification network is fairly accurate, we can readily find the missing foreground regions from the high-scored pixels in the CAM. Note that this does not necessarily mean that the low-scored pixels belong to the background regions, since these pixels might belong to the non-discriminative object regions. From our extensive experiments, we observe that correcting these pixels to background pixels causes a degradation of the segmentation performance. In our work, we therefore set pixels with low scores to unlabeled pixels.

In Fig. 4, we illustrate the overall procedure of refining the saliency map. We first obtain the CAM of an input image from the classification network. To improve the reliability of the CAM, we merge the CAMs of multiple scaled input images. Let M^0(s_i) be the CAM of an input image scaled by a factor s_i (s_i ∈ {s_0, ..., s_n}); then the reliable CAM M^* is obtained as

M^* = \frac{1}{n+1} \sum_{i=0}^{n} \mathrm{scale}\left(M^0(s_i)\right), (4)

where scale is the scaling operator that resizes a map to the size of the input image. To obtain a map expressing the foreground object regions, we merge the CAMs of the present classes, generating a class-agnostic activation map B whose value at pixel u indicates the probability of an object being contained in that pixel:

B_u = \max_{c \in \mathcal{C}_p} M^{*}_{u,c}. (5)

If B_u is larger than a pre-defined threshold \tau_1 and the pixel u belongs to the background regions in the saliency map O, we consider this pixel a foreground pixel. On the other hand, if B_u is smaller than a pre-defined threshold \tau_2 and the pixel u belongs to the foreground regions, we consider this pixel an unlabeled pixel. That is, the refined saliency map R is obtained as

R_u = \begin{cases} \text{foreground} & \text{if } B_u > \tau_1 \text{ and } O_u = \text{background}, \\ \text{unlabeled} & \text{if } B_u < \tau_2 \text{ and } O_u = \text{foreground}, \\ O_u & \text{otherwise}. \end{cases} (6)
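The two-threshold refinement rule above can be sketched in a few lines of Python (a simplified illustration treating the saliency map as a flat list; the thresholds match the values used later in the experiments):

```python
def refine_saliency(saliency, agg_cam, tau1=0.8, tau2=0.3):
    """Refine a binary saliency map with a class-agnostic CAM score B.

    saliency[u]: 1 (foreground) or 0 (background) from the saliency detector
    agg_cam[u]:  max CAM score over present classes, assumed in [0, 1]
    Returns a map with 1 = foreground, 0 = background, -1 = unlabeled.
    """
    refined = []
    for s, b in zip(saliency, agg_cam):
        if s == 0 and b > tau1:
            refined.append(1)    # missed object: promote to foreground
        elif s == 1 and b < tau2:
            refined.append(-1)   # uncertain foreground: mark as unlabeled
        else:
            refined.append(s)    # keep the saliency detector's decision
    return refined
```

Pixels marked unlabeled are simply skipped when the saliency loss is computed, so the network receives no (possibly wrong) supervision there.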

C. TRAINING OF SEGMENTATION NETWORK
For the training of the segmentation network, we use the saliency loss that encourages the segmentation network to learn the object regions from the saliency map. While the segmentation map has C + 1 classes, the saliency map has only two classes (background and foreground). To connect the segmentation map to the saliency map, we design the saliency loss \ell_{sal} as

\ell_{sal} = -\frac{1}{|\mathcal{S}_b|} \sum_{u \in \mathcal{S}_b} \log H_{u,0} - \frac{1}{|\mathcal{S}_f|} \sum_{u \in \mathcal{S}_f} \log\left(1 - H_{u,0}\right), (7)

where u is the position of a pixel, H_{u,0} is the predicted background probability at u, and \mathcal{S}_b and \mathcal{S}_f are the sets of background and foreground pixels in the refined saliency map. The weights for background and foreground pixels are set to 1/|\mathcal{S}_b| and 1/|\mathcal{S}_f|, respectively, to balance the losses for background and foreground pixels. The first and second terms in (7) correspond to the loss for the background and foreground classes, respectively. Note that the losses on unlabeled pixels in the saliency map are not computed during the training process.
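The class-balanced saliency loss can be sketched in pure Python as follows (a minimal illustration over a flat pixel list; `bg_prob` stands for the background channel of the segmentation output, and unlabeled pixels are skipped as described above):

```python
import math

def saliency_loss(bg_prob, refined):
    """Class-balanced cross-entropy against a refined saliency map.

    bg_prob[u]: predicted background probability at pixel u, in (0, 1)
    refined[u]: 1 = foreground, 0 = background, -1 = unlabeled (skipped)
    """
    bg = [u for u, r in enumerate(refined) if r == 0]
    fg = [u for u, r in enumerate(refined) if r == 1]
    loss = 0.0
    for u in bg:
        loss -= math.log(bg_prob[u]) / len(bg)        # background term
    for u in fg:
        loss -= math.log(1.0 - bg_prob[u]) / len(fg)  # foreground term
    return loss
```

The 1/|S_b| and 1/|S_f| normalizations keep a large background region from dominating the gradient.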
To improve the reliability of the network across scales, we feed multiple scaled input images to the network and compute the losses individually. Thus, the resulting saliency loss is the sum of cross-entropy losses for the |S| scaled outputs (S is the set of input scales). One potential weakness of using the saliency loss alone is that the segmentation network might predict the class of a pixel incorrectly, since the class of each pixel is unspecified in the saliency map. To make sure that the segmentation network predicts the correct class for each pixel, we exploit the image masking technique in the training of the segmentation network. During the training process, an input image is masked using an attention map F that designates which regions are erased. The attention map is obtained from the predicted regions in the segmentation map:

F_u = 1 - \max_{c \in \mathcal{C}_p} b_c H_{u,c}, (8)

where b_c is a binary random number that decides whether the segmentation map H_c is erased in the attention map or not. Using F, the masked image I' can be expressed as the element-wise product of the input image I and the attention map F:

I' = I \odot F + \mu(1 - F), (9)

where \odot is the element-wise multiplication and \mu is the RGB mean of the training images. For a given class c, when b_c = 1, we expect that the objects of class c are erased in I'. Whereas, when b_c = 0, we expect that the objects of class c remain in I'. Hence, it is natural to choose t' = t(1 - b) as the modified label corresponding to I'. We illustrate the attention maps and masked images corresponding to b in Fig. 5. The segmentation network is trained to predict the correct object regions so that the class score corresponding to I' matches t'. The associated attention loss \ell_{attn} is defined as the cross-entropy between the class score z' of I' and the target label t':

\ell_{attn} = -\frac{1}{|\mathcal{C}_p|} \sum_{c \in \mathcal{C}_p} \left[ t'_c \log \sigma(z'_c) + (1 - t'_c) \log\left(1 - \sigma(z'_c)\right) \right]. (10)

In contrast to the classification loss \ell_{sig}, the attention loss only considers the present classes.
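The masking step can be sketched in pure Python as follows (a simplified illustration with single-channel pixels in a flat list; the attention-map form `F_u = 1 - max_c b_c * H_{c,u}` is our reading of the construction above):

```python
def mask_image(image, seg_maps, b, mean):
    """Erase the regions selected by b from the image, filling with the mean.

    image[u]:       pixel value (a scalar here, for simplicity)
    seg_maps[c][u]: soft segmentation score for class c at pixel u, in [0, 1]
    b[c]:           1 if the objects of class c should be erased
    mean:           mean pixel value of the training images
    """
    masked = []
    for u in range(len(image)):
        erased = max((b[c] * seg_maps[c][u] for c in range(len(b))), default=0.0)
        f = 1.0 - erased                       # attention map value F_u
        masked.append(image[u] * f + mean * (1.0 - f))
    return masked
```

If the segmentation map covers the object well, the erased object disappears from the masked image, so the classification network should no longer detect its class.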
By generating multiple masked images using different attention maps, we can investigate the effects of different combinations of the segmentation maps. As illustrated in Fig. 6, we can add additional classification paths for other masked images generated using different attention maps. In each path, the same classification network is employed to compute the class score and the attention loss individually. The total attention loss is computed as the average of the attention losses:

\ell_{total\_attn} = \frac{1}{N} \sum_{n=1}^{N} \ell^{(n)}_{attn}, (11)

where N is the number of classification paths. In summary, the overall loss for training the semantic segmentation network is

\ell_{seg} = \ell_{sal} + \lambda_{seg} \ell_{total\_attn}, (12)

where \lambda_{seg} is the weighting factor balancing the two losses in the segmentation loss. Note that during the training process of the segmentation network, we fix the parameters of the classification network to preserve its learned knowledge.
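Sampling the binary erase vectors and the corresponding modified labels t' = t(1 - b) for the N classification paths can be sketched as follows (a hypothetical helper for illustration; classes absent from the image-level label are never selected for erasing):

```python
import random

def sample_paths(t, n_paths, rng=None):
    """Sample N binary erase vectors b and the modified labels t' = t * (1 - b).

    t: image-level label vector (1 if the class is present, else 0)
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    paths = []
    for _ in range(n_paths):
        # only present classes may be erased; absent classes keep b_c = 0
        b = [rng.randint(0, 1) if tc == 1 else 0 for tc in t]
        t_mod = [tc * (1 - bc) for tc, bc in zip(t, b)]
        paths.append((b, t_mod))
    return paths
```

Each sampled pair (b, t') defines one classification path; the attention losses of the paths are then averaged as in (11).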

IV. EXPERIMENTS

A. DATASET AND EXPERIMENT SETTINGS
We evaluate the proposed approach on the PASCAL VOC 2012 segmentation benchmark dataset [18], which has 20 foreground classes and one background class. This dataset has 1,464 training images, 1,449 validation images, and 1,456 test images. Following common practice [4], [28], we use the augmented training dataset consisting of 10,582 images [40].
In our experiments, we only utilize image-level annotations for the network training. We employ the saliency detector [41] to obtain a saliency map that expresses class-agnostic pixel-wise object scores. As a performance measure, we use mean intersection-over-union (mIOU), the average of IOUs over the 21 categories. We obtain the result on the test set by submitting the predicted results to the official PASCAL VOC evaluation server. For the classification network, we employ VGG16 [42] pre-trained on the ImageNet classification dataset [43]. As illustrated in Fig. 3, we replace the last three fully-connected (fc) layers in VGG16 with a standard convolutional block consisting of three convolutional layers. The convolutional blocks consist of two 3 × 3 convolutional layers (fc6 and fc7, both with 1024 outputs) and one 1 × 1 convolutional layer (fc8). We append three dilated convolutional blocks to the classification network (see Fig. 3). The dilation rates in the dilated convolutional blocks are set to d = {3, 6, 9, 12, 15, 18, 21, 24}. The parameters of the standard and dilated convolutional blocks are initialized from the normal distribution. We apply the GAP layer after fc8 for the training of the classification network.
For the segmentation network, we employ DeepLab-ASPP [4] whose backbone architecture is either VGG16 [42] or ResNet101 [44]. We initialize the parameters of the VGG16- and ResNet101-based DeepLab using the convolutionalized VGG16 and ResNet101 pre-trained on MS-COCO [45], respectively. For the last layer, the parameters are initialized from the normal distribution. When training the segmentation network, we use the classification network with only the standard convolutional block (i.e., the dilated convolutional blocks are removed). In the training of the ResNet101-based DeepLab, we only update the parameters of the convolutional layers while fixing the parameters of the batch normalization layers. The softmax output of the segmentation network is post-processed by CRF with default parameters [46].
To improve the robustness of the classification network and the segmentation network, we apply data augmentation techniques. We randomly flip and scale (from 0.5 to 1.5) input images. The resulting images are cropped to 321 × 321 at a random location. We also apply color augmentation by randomly changing brightness, contrast, saturation, and hue. We use multi-scale inputs with scales S = {1, 0.75, 0.5} in both the training and test phases [4], [47]. We use the stochastic gradient descent optimizer with momentum 0.9. We set the weight decay to 0.0005 and the batch size to 20. We employ the polynomial learning rate policy [48] with initial learning rate 10^{-3} and power 0.9, i.e., learning rate = 10^{-3} × (1 − iter/max_iter)^{0.9}. The learning rate for the last layers is multiplied by 10. We set the two thresholds τ_1 and τ_2 in (6) used to refine the saliency map to 0.8 and 0.3, respectively, which are found by grid search. The weighting factors λ_cls in (3) and λ_seg in (12) are set to 0.1 and 2, respectively. The entries of the binary random vectors b are drawn uniformly. We train the classification network and the segmentation network for 50 and 30 epochs, respectively. Our approach is implemented based on Tensorflow [49]. The classification network and the segmentation network are trained on a single NVIDIA GeForce Titan Xp.
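The polynomial learning rate policy stated above is a one-liner; as a minimal sketch (using the paper's base learning rate and power):

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - step / max_steps) ** power
```

The rate starts at `base_lr`, decays slightly faster than linearly, and reaches zero at `max_steps`.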

B. COMPARISONS WITH STATE-OF-THE-ARTS
We compare the performance of the proposed method with that of state-of-the-art WSSS methods. In Tables 1 and 2, we summarize the mIOU obtained by VGG16- and ResNet-based WSSS approaches. From the results, we observe that our approach performs competitively with the conventional WSSS approaches. Specifically, our approach achieves mIOU of 66.5% and 69.0% on the val set of the PASCAL VOC 2012 segmentation dataset with the VGG16- and ResNet101-based DeepLab-ASPP, respectively. Using our approach, we can train the segmentation network such that it learns the class-specific knowledge directly from the classification network. Our results demonstrate that the generation of pseudo-labels is unnecessary for WSSS.
We compare the proposed approach with a few notable WSSS approaches. In GAIN [16], since the main network and the adversarial network share parameters and are trained simultaneously, the network might be confused when the object regions are poorly discovered. Our approach avoids this by training the adversarial network in advance and fixing its parameters. MDC uses a classification network with multiple convolutional blocks to generate the pseudo-label [23]. In our approach, the classification network trained to predict the dense CAM is used directly for the training of the segmentation network. Similarly to the proposed approach, MCOF trains the segmentation network using the classification network [35]. In MCOF, the classification network is used to classify the superpixels of an input image. In contrast, in our approach, the pre-trained classification network is used to classify the regions of the input image after applying the image masking technique.

C. ABLATION STUDIES
In order to show the effectiveness of each component, we conduct ablation experiments with different settings of the proposed work. In Table 3, we summarize the segmentation performance of the proposed approach in different settings. By the 'standard' classification network, we mean the network trained using only the multi-class multi-label classification loss function. The 'dilation' classification network means the network trained using both the classification loss and the CAM loss described in Section III-A.
Our baselines are the VGG16- and ResNet101-based segmentation networks trained using only the saliency loss associated with the original saliency map obtained by SD [41] (see A1 and B1). From the results, we observe that the segmentation performance can be improved by refining the saliency map. Specifically, the models using the refined saliency map (A2 and B2) achieve about 3% improvement in mIOU over the baseline models. By comparing the performance of A1 and A3, we also observe that the segmentation performance can be improved by exploiting the classification network in the training of the segmentation network. Moreover, we observe that the segmentation performance can be further improved by exploiting the dilated convolution-based classification network (see A3 and A4). We also observe that the performance can be enhanced by employing multiple classification paths (see A5 to A7 and B3 to B5).
To investigate the efficacy of refining the saliency map, we conduct experiments using different saliency maps: 1) the original saliency map obtained from the saliency detector, 2) a refined saliency map in which low-scored foreground pixels are corrected to background pixels, and 3) a refined saliency map in which low-scored foreground pixels are considered unlabeled pixels. From the results in Table 4, we observe that the segmentation performance is degraded when the low-scored foreground pixels are corrected to background pixels. We also observe that the segmentation performance is significantly improved when the low-scored foreground pixels are considered unlabeled pixels.
To observe the effect of combination of input scale used for refining saliency maps, we conduct experiments by varying the number of the input scales. The input scales are chosen among the scales used in data augmentation {0.5, 0.75, 1, 1.25, 1.5}. From the results, we see that the best segmentation performance is obtained when three input scales are used (see S4 and S7 in Table 5).
We also test the performances for various number of dilated convolutional blocks D. From the results shown in Table 6, we observe that the segmentation performance slightly improves with the number of dilated convolutional blocks at the expense of the additional computations and training time.

D. QUALITATIVE RESULTS
In Fig. 7, we provide qualitative results obtained from the ResNet101-based DeepLab-ASPP. From the results, we can observe that our saliency map refining strategy can find the objects which might not be detected by SD and remove the falsely activated background objects. Also, we can observe that our image masking-based training strategy can help the segmentation network learn the object classes precisely even when the objects are very small. We would also like to mention some failure cases. One of the most frequent failure scenarios occurs when an object covers a large portion of the image. For example, sofa or table can be confused with background.
In Fig. 8, we provide qualitative results for the proposed approach and conventional approaches. From the results, we observe that the proposed approach predicts the detailed object region (see the first three columns in Fig. 8) while the conventional approaches make false activations (see the last two columns in Fig. 8).

V. CONCLUSION
In this paper, we proposed a new WSSS technique that can train the segmentation network without pixel-level pseudo-labels. To prevent the performance degradation caused by inaccurate pseudo-labels in conventional WSSS approaches, we exploited the image masking technique in the training of the segmentation network. We also introduced an approach to refine the saliency map, which significantly improves the segmentation performance. Extensive experiments demonstrate that our approach is effective in solving the WSSS problem.
LUONG TRUNG NGUYEN received the B.S. and M.S. degrees from the Ho Chi Minh City University of Technology, Vietnam, in 2010 and 2012, respectively, and the Ph.D. degree in electrical and computer engineering from Seoul National University, South Korea, in 2020. He is currently a Postdoctoral Researcher at the Institute of New Media and Communications, Seoul National University. His research interests include matrix completion, federated learning, and machine learning.
KYUHONG SHIM received the B.S. degree in electrical and computer engineering from Seoul National University, South Korea, in 2015, where he is currently pursuing the Ph.D. degree. His research interests include speech and language processing, efficient deep learning algorithms and implementations, neural network compression, compressed sensing, low-rank matrix completion, big data analysis, and machine learning.
JUNHAN KIM received the B.S. degree in electrical and computer engineering from Seoul National University, South Korea, in 2017, where he is currently pursuing the Ph.D. degree. His research interests include compressed sensing, low-rank matrix completion, big data analysis, and machine learning.