A New Backbone Network for Instance Segmentation: Application on a Semiconductor Process Inspection

In this paper, we propose the Instance Segmentation Detector (ISD) to extract enhanced feature-maps in situations where the training dataset is limited, as in specific industrial domains such as semiconductor photolithography inspection. ISD is used as a new backbone network of the state-of-the-art Mask R-CNN framework for instance segmentation. ISD consists of four dense blocks and four transition layers. Each dense block in ISD has a shortcut connection and concatenates the feature-maps produced in its layers with a dynamic growth rate. ISD is trained from scratch, without the transfer learning approach adopted in recent work. Additionally, ISD is trained on an image dataset pre-processed with a specifically designed image filter, so that the Convolutional Neural Network (CNN) extracts better enhanced feature-maps. One of the key design principles of ISD is compactness, which plays a critical role in addressing real-time constraints and in deployment on resource-bounded devices. To validate the model, this paper uses real images collected from the computer vision system embedded in currently operating semiconductor manufacturing equipment. ISD achieves consistently better results than state-of-the-art methods at the standard mean average precision. Specifically, our ISD outperforms the baseline DenseNet while requiring only 1/4 of the parameters. We also observe that ISD achieves results comparable to or better than ResNet with only 1/268 of the parameters, using no extra data or pre-trained models.


I. INTRODUCTION
Semiconductor photolithography is the process of drawing semiconductor circuits on wafers: the wafer is thinly coated with a photosensitive polymer material that responds to light, a mask carrying the desired pattern is placed on top, and light is projected through it to form the pattern. In this process, spin coating is used to spread the photoresist uniformly on the wafer at the required thickness, which makes it an important step. If inspection faults occur in this process, a defective product results no matter how well the subsequent processes are performed, so this step greatly affects the defect rate of wafer-based processing. As illustrated in Fig. 1, a computer vision system is used to prevent defects in semiconductor products by monitoring these processes and predicting defects in the photo process in advance.
Generally, computer vision systems use digital image processing [1]-[10] to emulate human vision. The computer vision system used in the spin coating process also finds defects through digital image processing algorithms. However, many detection errors occur due to external environmental factors such as the variety of wafers and photoresists, motor rotation speed, and diffuse reflection of light. Fig. 2 illustrates an example of an image distorted by such factors. Digital image processing algorithms perform well on images that are little affected by the external environment, but their performance degrades severely when image distortion occurs. Therefore, whenever the characteristics of the image change or distortion appears, a new or modified digital image processing technique and a specialized signal processing method must be applied to the computer vision system, which is a serious disadvantage. To overcome the influence of such varied image distortions, we adopt deep learning, which is robust even under changing external conditions.
As illustrated in Fig. 3, there are three inspection types for detecting defects in the spin coating process of semiconductor photolithography: first, the suck-back state of the nozzle that sprays the photoresist; second, the contamination state of the nozzle; and third, the timing of spraying the photoresist. In this paper, we propose a method for detecting defects by monitoring the first inspection type, the suck-back state of the nozzle. To do so, it is necessary to find a specific area in an image and extract features within that area to determine whether a defect is present. Deep learning techniques [11] that can detect specific areas in an image include object detection, semantic segmentation, and instance segmentation. Among them, instance segmentation can be applied to inspect not only the suck-back state of the nozzle but also its contamination.
Image segmentation is a computer vision process designed to simplify image analysis by splitting the input into segments that represent objects or parts of objects and form collections of pixels. Instance segmentation is a subtype of image segmentation that identifies each instance of each object within the image at the pixel level. It can also be thought of as object detection where the output is a mask instead of just a bounding box. Agarwal et al. [12] presented recent advances in object detection in the age of deep convolutional neural networks. The objective of instance segmentation is to detect specific objects in an image and create a mask around each object of interest.
In computer vision, transfer learning is usually realized through pre-trained models. To achieve the desired performance, the common practice in advanced instance segmentation systems is to fine-tune models pre-trained on ImageNet [13]. This fine-tuning process can be viewed as transfer learning [14]-[19]. Researchers usually train CNN models on large-scale classification datasets like ImageNet [13] first, then fine-tune the models on target tasks such as object detection [20]-[35], image segmentation [36]-[39], etc. In contrast, we train our model directly, without involving any additional data or extra fine-tuning. Numerous state-of-the-art pre-trained CNN models are available, and fine-tuning them converges quickly to a final state and requires less instance-level annotated training data. As is well known, fine-tuning can mitigate the gap between different target category distributions. However, it remains a severe problem when the source domain (e.g., ImageNet) has a huge mismatch with the target domain, as with industrial images, medical images, etc. As illustrated in Fig. 3, the images used for inspection are completely different from those in the source domain (e.g., ImageNet). Without a sufficiently large dataset, deep artificial neural networks cannot be trained well, and it is difficult to collect enough data in a specific industrial domain.
In this work, we investigate three questions. First, is it possible to train instance segmentation networks from scratch, directly on a small dataset, without pre-trained models? Second, are there principles for designing a resource-efficient network structure for instance segmentation while keeping high detection accuracy? Third, is there a methodology to improve inspection performance beyond network design? To meet these goals, we propose the Instance Segmentation Detector (ISD) together with a pre-processing step that applies an image filter before training.

II. RELATED WORK
A. INSPECTION METHOD
Computer vision systems [40]-[46] are widely used for online inspection and quality control to improve finished product quality and lower costs in various industries. The computer vision system used in current semiconductor manufacturing performs specialized digital image processing and signal processing to extract the features needed for defect detection, and determines defects by means of a neural network classifier. The specialized digital image processing removes noise from the input image of the specific domain, improves brightness or contrast, emphasizes edges, and sharpens the image to ease feature extraction. Features are then obtained by a signal processing method that computes the sums of the vertical (column) and horizontal (row) pixel components of the pre-processed image and applies an adaptive threshold. A neural network recognizes the extracted features and determines whether a defect exists. Fig. 4 (c) illustrates an example of automatically detecting the contamination state of the nozzle by digital image processing. Fig. 5 likewise illustrates an example of automatically detecting the suck-back state of the nozzle by signal processing during the spin coating process of semiconductor photolithography. A minimal sketch of this projection-profile step is shown below.
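To make the legacy signal processing step concrete, the sketch below sums pixel intensities along columns and rows and flags bins exceeding an adaptive threshold. It is a minimal illustration under our own assumptions; the `mean + k·std` rule and the function name are ours, not the deployed system's actual implementation.

```python
import numpy as np

def projection_profile_features(image, k=1.5):
    """Sum pixel intensities along columns and rows of a pre-processed
    grayscale image, then flag profile bins exceeding an adaptive
    threshold (mean + k * std). Illustrative only."""
    img = image.astype(np.float64)
    col_profile = img.sum(axis=0)  # sums of vertical components
    row_profile = img.sum(axis=1)  # sums of horizontal components

    def adaptive_mask(profile):
        return profile > profile.mean() + k * profile.std()

    return adaptive_mask(col_profile), adaptive_mask(row_profile)

# Example on a dummy 495 x 640 grayscale image:
cols, rows = projection_profile_features(
    np.random.randint(0, 256, (495, 640)))
```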
In the spin coating process of semiconductor photolithography, various types of nozzle are used to spray the photoresist, depending on the kind of photoresist and the characteristics of the wafer. Fig. 6 illustrates examples of nozzle types. Consequently, the digital image processing and signal processing methods used in the computer vision system must be specialized for external conditions such as nozzle type, wafer characteristics, diffuse reflection of light, etc. Whenever a new nozzle or a new wafer is introduced, the defect detection accuracy of the computer vision system inevitably drops.
Considering these problems, we propose an instance segmentation method based on generalized deep learning, which is more robust to the external environment and further improves performance compared with the specialized digital image processing and signal processing methods currently used for semiconductor photolithography inspection.

B. ENHANCED FEATURE MAP
A discriminative feature is a very important factor in image classification: in general, the smaller the variance within a class and the larger the variance between classes, the easier the classification problem becomes. The CNN feature-maps used to detect nozzle type are clearly distinguishable between nozzle types. However, since inspection in semiconductor photolithography is performed on the same nozzle type, it is difficult to extract discriminative CNN feature-maps, and hence hard to extract discriminative features from the regions proposed by the Region Proposal Network (RPN). As illustrated in Fig. 7, the mask area cannot be obtained without discriminative CNN feature-maps in the proposed regions.
The reason discriminative features cannot be extracted in the proposed regions is that the original pixel information of the corresponding area, as a grayscale image, is not sufficient on its own. To enhance the CNN feature-map using only the original pixel information, one could train much deeper networks on a large number of varied training images. Deep convolutional neural networks require a large corpus of training data to avoid over-fitting, the phenomenon in which a network learns a function with very high variance that perfectly models the training data. Unfortunately, many application domains, such as industrial and medical image analysis, do not have access to big data, and collecting such training data is often expensive and laborious.
Data augmentation addresses this issue by artificially inflating the training set with label-preserving transformations, and generic augmentation has recently been used extensively to improve CNN task performance. It encompasses a suite of techniques that increase the size and quality of training datasets so that better deep learning models can be built from them: geometric and photometric transformations (flipping, cropping, rotation, translation, color space transformations), noise injection, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning [47]-[52]. A small sketch of a few basic manipulations follows.
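For illustration only, since we do not adopt augmentation in this work (see below), a minimal TensorFlow sketch of a few of the label-preserving manipulations listed above might look as follows; the parameter values are arbitrary:

```python
import tensorflow as tf

def augment(image):
    """Flip, brightness/contrast jitter, and noise injection on an
    image with float values in [0, 1]. Illustrative values only."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    noise = tf.random.normal(tf.shape(image), stddev=0.02)
    return tf.clip_by_value(image + noise, 0.0, 1.0)

# e.g. dataset = dataset.map(lambda x, y: (augment(x), y))
```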
However, instead of data augmentation, we propose a pre-processing method that reduces the amount of training images required and decreases the number of network layers in the CNN. A specialized image filter for semiconductor photolithography inspection is applied in this pre-processing step to enhance the CNN feature-map.

C. BACKBONE NETWORK FOR INSTANCE SEGMENTATION
Many deep convolutional neural networks (CNNs) [53] originally designed for classification have been adopted for detection as well, with numerous modifications to handle the additional difficulties encountered. Object detection is a natural extension of the classification problem: the constant challenge is to correctly detect the presence of, and accurately locate, object instances in an image. It is a supervised learning problem in which, given a set of training images, one must design an algorithm that locates and classifies as many object instances as possible with rectangular boxes while avoiding false detections of background and multiple detections of the same instance. Instance segmentation detection can be split into three parts: extracting feature-maps, proposing regions, and classifying and regressing binary masks. Among them, the backbone network that extracts feature-maps plays a major role in instance segmentation models. Huang et al. [54] partially confirmed the common observation that, as the classification performance of the backbone on the ImageNet [13] classification task increases, so does the performance of object detectors built on it. This holds at least for popular detectors like Fast R-CNN [21], Faster R-CNN [22], Mask R-CNN [55], and R-FCN [23], although for SSD [24] detection performance remains about the same. Because significant effort has been devoted to designing network architectures for image classification, many diverse and powerful networks have emerged, such as VGGNet [56], GoogLeNet [57], ResNet [58], DenseNet [59], DPN [60], etc. In practice, most detection methods [20]-[22], [24], [55] directly use these structures, pre-trained on ImageNet [13], as the backbone network for the detection task. Some other works design specific backbone structures for object detection but still require pre-training on the ImageNet [13] classification dataset. Kim et al. [61] proposed PVANet for fast object detection, which consists of simplified ''Inception'' blocks from GoogLeNet [57]. Huang et al. [54] investigated various combinations of network structures and detection frameworks and found that Faster R-CNN [22] with Inception-ResNet-v2 [62] achieved very promising accuracy. Nakazawa et al. [63] proposed a CNN architecture for wafer map pattern generation in semiconductor manufacturing.
We therefore propose a backbone structure suited to extracting enhanced feature-maps for instance segmentation in the industrial domain: the proposed ISD, which replaces ResNet [58], the backbone network of the state-of-the-art Mask R-CNN framework.

D. LEARNING NETWORK MODEL FROM SCRATCH
There is no previous work that trains deep CNN-based instance segmentation in the industrial domain from scratch. In generic object detection, Shen et al. [64] proposed Deeply Supervised Object Detectors (DSOD), an object detection framework that can be trained from scratch. In semantic segmentation, Jégou et al. [65] demonstrated that a well-designed network structure can outperform state-of-the-art solutions without pre-trained models; their work extends DenseNet [59] to fully convolutional networks by adding an upsampling path to recover the full input resolution.
Thus, our proposed approach has the very appealing advantage of learning the network model from scratch, without a model pre-trained on ImageNet [13], for instance segmentation.

III. OUR APPROACH
We first introduce the whole framework of our ISD architecture, followed by the pre-processing used to extract enhanced feature-maps. We then describe the training process and training objective in detail.

A. ISD ARCHITECTURE
The whole framework for semiconductor photolithography inspection is based on the Mask R-CNN framework, which has two stages. First, it generates proposals for regions where an object might exist in the input image. Second, based on those proposals, it predicts the class of the object, refines the bounding box, and generates a pixel-level mask of the object. Both stages are connected to the backbone network.
Many approaches to instance segmentation are based on segment proposals. Our approach instead focuses on the backbone network, which extracts the enhanced feature-maps for the object mask. The state-of-the-art Mask R-CNN framework uses ResNet [58] or ResNeXt [66] as the backbone; as illustrated in Fig. 8, we use the compact ISD instead of ResNet [58] so as to address real-time constraints and allow learning from scratch. ISD, based on the state-of-the-art DenseNet [59], is motivated by combining the advantages of shortcut connections and of concatenating the feature-maps produced in each layer, with a dynamic growth rate. To improve instance segmentation performance with better parameter efficiency, we investigated the state-of-the-art CNN-based instance segmentation methods. The design principle of ISD is a compact model, suitable for real-time embedded systems such as computer vision systems and easy to train with reduced over-fitting on small training sets.
ISD comprises layers, each of which implements a composite function of operations such as Batch Normalization (BN) [67], rectified linear units (ReLU) [68], Pooling [69], and Convolution (Conv). ISD concatenates the feature-maps produced in its layers to strengthen feature propagation and encourage feature reuse, and uses shortcut connections to counteract vanishing and exploding gradients. ISD is composed of four dense blocks and four transition layers, similar to DenseNet [59]; see Table 1.
Crucially, in contrast to DenseNet [59], ISD combines features through summation before they are passed into a dense block, and within the block combines features by concatenation, using post-activation. Fig. 9 illustrates this layout schematically. Santhanam et al. [70] reported that pre-activation ResNets consistently outperform the original post-activation variant only at very high network depths (≥ 152); ISD, at 38 depths, is a low-depth network, and post-activation ISD outperformed pre-activation in our experiments. Thus, in our approach, ISD has a post-activation structure, as shown in Fig. 9. Moreover, as illustrated in Fig. 10, unlike DenseNet [59], ISD uses a dynamic growth rate, applying a different growth rate in each layer to optimize the model. The growth rate regulates the amount of information contributed by each layer and determines the number of feature-maps. The dynamic growth rate substantially reduces the number of parameters, making the model more compact and improving performance.
ISD has three main hyper-parameters: n, the number of layers in each dense block; k, the growth rate of the network; and bw, the bottleneck width. We optimized these hyper-parameters through experiments.
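To make the block structure concrete, the following Keras sketch builds one ISD-style dense block with per-layer (dynamic) growth rates, a post-activation composite function, and an additive shortcut around the block. It is a minimal sketch under our stated assumptions: the exact ordering of operations, the bottleneck placement, and the 1 × 1 shortcut projection are illustrative, not the exact published layer configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def composite_layer(x, growth, bw=4):
    """Post-activation composite function: Conv -> BN -> ReLU,
    with a 1x1 bottleneck ahead of the 3x3 convolution."""
    y = layers.Conv2D(bw * growth, 1, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(y)

def isd_dense_block(x, growth_rates, bw=4, shortcut=True):
    """Dense block with dynamic (per-layer) growth rates plus an
    additive shortcut; concatenation drives feature reuse."""
    inputs = x
    for k_i in growth_rates:
        y = composite_layer(x, k_i, bw)
        x = layers.Concatenate()([x, y])      # feature reuse
    if shortcut:
        # project the block input so channel counts match, then sum
        proj = layers.Conv2D(x.shape[-1], 1, padding='same')(inputs)
        x = layers.Add()([x, proj])           # shortcut connection
    return x

inp = layers.Input(shape=(120, 120, 1))
out = isd_dense_block(inp, growth_rates=[1, 2, 3, 4])  # increasing rates
model = tf.keras.Model(inp, out)
```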

B. PRE-PROCESSING FOR ENHANCED FEATURE MAP
Edge detection is one of the most significant parts of image processing, with many applications such as image morphing, pattern recognition, image segmentation, and image extraction. Since edges are among the major information carriers in any image, edge detection is an important step in many image processing algorithms: it represents the contour of the image, which helps recognize the image as an object via its detected edges. Kabade et al. [71] proposed a block-level Canny edge detection algorithm, a specialized algorithm that performs edge detection while reducing time and memory consumption. For the suck-back state among the inspection types shown in Fig. 3, it is hard to extract features from an image in which the nozzle and the photoresist overlap; moreover, the appearance of the photoresist varies with the type of nozzle, and the appearance of the nozzle varies with the kind of photoresist. We therefore adopt a specific image filter, modified from the Sobel edge detector [72], which is composed of a pair of 3 × 3 convolution masks, one estimating the gradient in the horizontal x-direction and the other in the vertical y-direction, to identify points in an image at which the brightness changes sharply or, more formally, has discontinuities. Pre-processing is performed by convolving the image with this specific filter.
An edge occurs where there is a discontinuity in the intensity function or a very steep intensity gradient in the image, so edges can be located where the derivative is maximal. The gradient is a vector whose components measure how rapidly pixel values change with distance in the x and y directions. Thus, the components of the gradient may be found using the approximations

∂f(x, y)/∂x ≈ [f(x + dx, y) − f(x, y)] / dx,  (1)
∂f(x, y)/∂y ≈ [f(x, y + dy) − f(x, y)] / dy,  (2)

where dx and dy measure distance along the x and y directions, respectively. In discrete images, dx and dy can be taken as numbers of pixels between two points; with dx = dy = 1,

∂f/∂x ≈ f(x + 1, y) − f(x, y),  (3)
∂f/∂y ≈ f(x, y + 1) − f(x, y).  (4)

The difference operations in (3) and (4) correspond to convolving the image with the following image filter masks.
In the 3 × 3 case these are the Sobel masks, modified here so that the fixed center weight 2 of the standard Sobel operator is replaced by an adaptive gain g:

G_x =
[ −1  0  +1 ]
[ −g  0  +g ]
[ −1  0  +1 ],  (5)

G_y =
[ +1  +g  +1 ]
[  0   0   0 ]
[ −1  −g  −1 ],  (6)

where g in (5) and (6) is adaptively applied according to the image intensity. An image pre-processed with this specific filter is shown in Fig. 11. The pre-processed image used as the input of ISD is significant for extracting the enhanced feature-map for inspection.
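The following sketch applies masks (5) and (6) with SciPy. The adaptive rule for g shown here is a hypothetical placeholder, not the rule used in the deployed system; the function name is ours.

```python
import numpy as np
from scipy.ndimage import convolve

def modified_sobel(image, g):
    """Convolve with the Sobel-style masks (5) and (6), whose center
    weight g is supplied by an adaptive rule; returns the gradient
    magnitude used as the enhanced input image."""
    img = image.astype(np.float64)
    gx = np.array([[-1, 0, 1],
                   [-g, 0, g],
                   [-1, 0, 1]], dtype=np.float64)
    gy = gx.T
    return np.hypot(convolve(img, gx), convolve(img, gy))

img = np.random.randint(0, 256, (495, 640)).astype(np.float64)
g = 1.0 + img.mean() / 255.0  # hypothetical adaptive gain in [1, 2]
edges = modified_sobel(img, g)
```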

C. MODEL TRAINING
In our approach, we focus on the instance segmentation task without pre-trained models: we train on the target dataset directly, without using the ImageNet dataset, as shown in Fig. 12. ISD is trained with images of various nozzles, as shown in Fig. 12, to classify the nozzle type. The image dataset used to train Mask R-CNN is prepared with an image annotation tool (the VGG Image Annotator), which produces the labeled segmentations of each image; a sketch of loading such annotations follows. In addition, the input dataset is filtered as the pre-processing step of model training. Fig. 13 illustrates the training process.
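As a sketch of how VGG Image Annotator (VIA) polygon annotations can be converted into the binary masks Mask R-CNN expects: the key names below follow VIA's common JSON export, which varies between VIA versions, and this is not the paper's exact loader.

```python
import json
import numpy as np
from skimage.draw import polygon

def load_via_masks(annotation_file, height, width):
    """Rasterize VIA polygon regions into per-instance boolean masks,
    keyed by image filename."""
    data = json.load(open(annotation_file))
    masks = {}
    for entry in data.values():
        regions = entry.get('regions', [])
        if isinstance(regions, dict):  # older VIA exports use a dict
            regions = list(regions.values())
        mask = np.zeros((height, width, len(regions)), dtype=bool)
        for i, region in enumerate(regions):
            attrs = region['shape_attributes']
            rr, cc = polygon(attrs['all_points_y'], attrs['all_points_x'],
                             shape=(height, width))
            mask[rr, cc, i] = True
        masks[entry['filename']] = mask
    return masks
```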

D. TRAINING OBJECTIVE
The training objective comprises the losses used to make the huge number of weights converge, together with the hyper-parameters that must be conducive to this convergence.
For the model that classifies nozzle type, the categorical cross-entropy loss generally used for image classification is adopted as the loss of ISD (i.e., L_ISD): a softmax activation followed by a cross-entropy loss.
L_ISD = −log( e^{s_p} / Σ_{j=1}^{C} e^{s_j} ),

where s_p is the CNN score for the positive class, C is the number of classes, and s_j is the score inferred by the network for each class j in C.

For the model that detects the suck-back state of the nozzle, the training loss is adopted from Faster R-CNN and Mask R-CNN: a weighted sum of the classification loss (cls), the localization loss (box), and the segmentation mask loss (mask),

L_total = L_total_cls + L_total_box + L_total_mask,

where L_total_cls and L_total_box are the same as in Faster R-CNN [22],

L_total_cls = (1/N_cls) Σ_i L_cls(p_i, p*_i),
L_total_box = α · (1/N_box) Σ_i p*_i · L1^smooth(t_i − t*_i),

with
p_i = predicted probability of anchor i being an object,
p*_i = ground-truth label of whether anchor i is an object,
N_cls = normalization term, set to the batch size,
t_i = predicted four parameterized coordinates,
t*_i = ground-truth coordinates,
N_box = normalization term, set to the number of anchor locations,
α = balancing parameter;

and L_total_mask is the same as in Mask R-CNN [55],

L_total_mask = −(1/m²) Σ_{1≤i,j≤m} [ y_ij log ŷ^k_ij + (1 − y_ij) log(1 − ŷ^k_ij) ],

with
y_ij = label of cell (i, j) in the true mask for the region of size m × m,
ŷ^k_ij = predicted value of the same cell for the ground-truth class k.
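A minimal TensorFlow sketch of the two loss terms defined explicitly above, L_ISD and L_total_mask (the cls/box terms follow Faster R-CNN and are omitted; the function names are ours):

```python
import tensorflow as tf

def isd_loss(scores, labels):
    """L_ISD: softmax plus categorical cross-entropy over class
    scores; labels are integer class indices."""
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=scores))

def mask_loss(y_true, y_pred_k):
    """L_total_mask: average per-pixel binary cross-entropy over the
    m x m mask predicted for the ground-truth class k."""
    eps = 1e-7
    y_pred_k = tf.clip_by_value(y_pred_k, eps, 1.0 - eps)
    return -tf.reduce_mean(y_true * tf.math.log(y_pred_k) +
                           (1.0 - y_true) * tf.math.log(1.0 - y_pred_k))
```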

IV. EXPERIMENT
We implement ISD on the TensorFlow platform [73]. The hardware platform is a notebook with two GPUs, as listed in Table 2. Since images related to semiconductor processes are not available in open deep learning datasets such as ImageNet, MS COCO, Pascal VOC, etc., the experimental dataset was acquired from the computer vision system embedded in currently operating semiconductor manufacturing equipment for photolithography inspection. The images are 640 × 495 pixels in grayscale. Intuitively, larger input images yield better instance segmentation performance; however, real-world applications like computer vision systems demand inspection in real time, where the fastest detectors are often preferable to the best-performing ones. We therefore reduced the input of ISD to 120 × 120 pixels (see the sketch below). We evaluate ISD with different depths and growth rates for compactness, and verify the effectiveness of the method through comparison experiments. A consistent setting is imposed on all experiments unless specific components or structures are being examined. We adopt the standard mean Average Precision (mAP) to measure instance segmentation performance.
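A minimal input pipeline reflecting this setup might look as follows; the PNG format and file layout are assumptions for illustration:

```python
import tensorflow as tf

def load_input(path):
    """Read a 640 x 495 grayscale inspection image and resize it to
    the 120 x 120 network input, scaled to [0, 1]."""
    raw = tf.io.read_file(path)
    img = tf.io.decode_png(raw, channels=1)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return tf.image.resize(img, [120, 120])
```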

A. CLASSIFICATION RESULTS ON ISD
To classify nozzle type, 18,304 images already correctly classified into 8 nozzle types were collected from operating semiconductor manufacturing equipment. We split these images randomly into 13,728 training images and 4,576 validation images. After only 10 epochs, the classification training accuracy is 99.8% and the validation accuracy is 99.9%. The training and validation accuracy per epoch is illustrated in Fig. 14; the average processing time per epoch is 37 seconds. In addition to classifying nozzle type, we also test instance segmentation of nozzle type, using 385 training images and 138 validation images. Fig. 15 illustrates the instance segmentation results for each nozzle type.

B. COMPARISON WITH PRE-PROCESSING
To detect the suck-back state of the nozzle, we used 266 training images and 144 validation images for instance segmentation on each nozzle type. The average processing time per epoch is 208 seconds. We evaluate the contribution of pre-processing to the instance segmentation task using the standard mean average precision.
Regarding the mask, the mask of the nozzle type was detected well even without pre-processing by the image filter. However, the mask of the suck-back state used for inspection was either not detected or incorrectly recognized when pre-processing was not performed. Fig. 16 compares the masks with and without pre-processing. We observe that pre-processing with the image filter achieves higher accuracy, consistent with our conjecture that the enhanced feature-map is extracted by pre-processing.
Regarding the standard mean average precision, the comparison with and without pre-processing is given in Table 3. With pre-processing, mAP@0.50 in validation improves by 4.27% and, interestingly, mAP@0.75 in validation improves by a large margin (18.19%). We observe that the greatest task performance improvement is yielded by pre-processing.

C. INSTANCE SEGMENTATION RESULTS ON ISD
Model optimization and performance are an important trade-off when applying deep neural networks to actual instance segmentation tasks in real time. To optimize ISD, we conduct experiments on three factors: network depth, the shortcut connection, and the dynamic growth rate.

1) NETWORK DEPTH
We experimented with various depths of ISD. As illustrated in Table 4, we empirically confirm the well-known trend that deeper networks perform better. However, 42 depths are sufficient to deliver good performance and are preferable in terms of resource effectiveness. Our compact model with only 85K parameters achieves 95.49% at mAP@0.50 in validation, which shows great potential for real-time application in computer vision systems.

2) SHORTCUT CONNECTION
We experimented with and without the shortcut connection. ISD with 62 depths using the shortcut connection significantly improves performance from 42.55% to 57.87% at mAP@0.75 in validation. We found experimentally that the shortcut connection improves performance by alleviating vanishing and exploding gradients and encouraging feature reuse.

3) GROWTH RATE
As mentioned above, ISD uses a dynamic growth rate, applying a different growth rate in each layer. In Table 5, we compare three options: (A) uniform growth rates (k, k, · · · , k); (B) increasing growth rates (1, 2, 3, · · · , k); and (C) decreasing growth rates (k, k−1, k−2, · · · , 2, 1). As illustrated in Table 5, ISD with 54 depths using increasing growth rates improves performance from 91.32% to 95.83% at mAP@0.50 in validation while requiring only half the parameters. We found experimentally that a dynamic growth rate outperforms a uniform one and substantially reduces the number of parameters. A small sketch of the three schedules follows.
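To make the three options concrete, one way to generate the schedules for a block of n layers is shown below. The helper is our own; when n ≠ k the intermediate values must be interpolated, which the paper does not specify, so the rounding rule here is an assumption.

```python
def growth_schedule(option, n, k):
    """Growth rates for the n layers of a dense block:
    (A) uniform k, (B) increasing 1..k, (C) decreasing k..1."""
    if option == 'A':
        return [k] * n
    if option == 'B':
        return [max(1, round(k * (i + 1) / n)) for i in range(n)]
    if option == 'C':
        return [max(1, round(k * (n - i) / n)) for i in range(n)]
    raise ValueError(f"unknown option: {option}")

# e.g. growth_schedule('B', n=8, k=8) -> [1, 2, 3, 4, 5, 6, 7, 8]
```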

D. COMPARISON WITH STATE-OF-THE-ART METHODS
We compare our results with state-of-the-art backbone networks for the Mask R-CNN framework; results are summarized in Table 6. ISD achieves consistently better results than state-of-the-art methods with a much more compact structure. Specifically, our ISD-38 achieves 95.24% at mAP@0.50 in validation, outperforming the baseline DenseNet-38 by a large margin (16.97% at mAP@0.50) while requiring only 1/4 of the parameters. We also observe that ISD-38 achieves comparable or better results at mAP@0.75 than ResNet-38, which requires a huge memory space to store its massive parameters, with only 1/268 of the parameters, which shows great potential for application on resource-bounded devices.
As the size of a network increases, inference and training become slower and require more data; there is generally a trade-off between performance and speed, and real-time detectors, as needed for computer vision systems, give up some precision. In Table 6, the highest result, 96.59% at mAP@0.50 in validation, is obtained with ResNet-38. Our ISD-42 achieves 95.49% at mAP@0.50 in validation, 1.1% lower, but improves speed by a factor of 217. Interestingly, our ISD-42 is 3.45% higher than ResNet-38 at mAP@0.75 in validation.

V. CONCLUSION
This paper presents a novel backbone network, ISD, to address the problem that a training dataset limited to a specific industrial domain can cause over-fitting during training and quality mismatch at inference, while also targeting real-time operation and resource-bounded devices. Our model is simple to construct and can be trained directly on full images. With our method, including pre-processing, enhanced feature-maps can be obtained for instance segmentation. We demonstrate that our ISD-42 significantly outperforms the state-of-the-art DenseNet-42 in both accuracy (9.73% more accurate) and speed (3 times faster) at mAP@0.50 in validation. Our ISD-42 is also 217 times faster and 3.45% more accurate than the state-of-the-art ResNet-38 at mAP@0.75 in validation.
In addition to serving as the backbone network of the Mask R-CNN framework, ISD is applicable to many instance segmentation architectures. We believe it can be useful to many future instance segmentation research efforts in diverse industrial domains that require real-time operation and good performance with only a small training dataset.