A Deep Transfer Learning Model for Packaged Integrated Circuit Failure Detection by Terahertz Imaging

Terahertz time-domain spectroscopy (THz-TDS) imaging is becoming a promising tool for packaged integrated circuit (IC) failure detection due to its nonmetal penetrability and low radiation. However, two major obstacles are hindering the industrial application of THz-TDS based IC detection: 1) the low resolution of THz images may affect the detection accuracy; 2) the failure detection tasks are usually carried out manually, which is inefficient and inaccurate. Thus, in this paper, we first enhanced the quality of IC THz images with a deconvolution algorithm and a mathematically simulated point spread function (PSF), and then proposed a deep convolutional neural network (CNN) based failure detection framework to achieve end-to-end IC inspection automatically. In addition, we introduced transfer learning to overcome the limited size of the IC dataset. The results demonstrate that our proposed method achieves excellent performance in terms of both accuracy and efficiency.


I. INTRODUCTION
Detection of hidden failures in integrated circuit (IC) packages is vital as there is an increasing demand for quality and reliability in ICs. There are many causes of IC failures. One of the most common types is electrical overstress (EOS), which induces thermal shock, and therefore, generates hidden internal defects, such as cracks, delamination, and voids of packaging materials [1,2].
Accordingly, a variety of failure analysis systems have been investigated to detect hidden defects in IC packages. Conventional failure detection methods include X-ray [3] and scanning acoustic microscopy (SAM) [4]. These approaches are practical but still have shortcomings [5]. To be more specific, X-ray transmission imaging may be harmful to both samples and operators due to its ionizing photon energy. Besides, X-ray cannot detect defects caused by thermal fatigue or EOS, as its energy does not change when passing through air in the defect locations [1,2]. SAM, on the other hand, requires immersion of samples in a liquid environment, which may lead to the oxidation of these electronic devices [6].
Terahertz time-domain spectroscopy (THz-TDS) is an emerging non-destructive inspection technology, which can penetrate nonmetallic materials and offer information about composite damages [7]. Terahertz waves pose no known safety issues to either samples or human bodies because of their low-energy radiation (0.1-10 THz). In addition, the technique does not require samples to be exposed to liquid during testing, which enables easy and safe inspection. For these reasons, several studies have investigated the use of terahertz imaging for packaged IC failure detection. According to the distance from the aperture to the detected objects, the imaging systems used in these studies can be classified into two major categories: near-field and far-field THz imaging systems. Near-field imaging systems can overcome the distortion effects of diffraction and provide better spatial resolution. For instance, the laser terahertz-emission microscope (LTEM) system achieves a resolution of 3 µm, and it has been successfully applied to inspect cut interconnection lines in damaged integrated circuits [8,9]. Besides LTEM, Electro Optical Terahertz Pulse Reflectometry (EOTPR) was also developed to locate and clarify faults in 3D packaged ICs [10,11], and it has been used in the semiconductor failure analysis laboratories of AMD, IBM, Intel, and TSMC. However, in near-field imaging systems, the objects should be placed at a subwavelength distance from the lens. Thus, objects thicker than approximately 100 µm cannot be imaged using this technique [12,13]. Far-field THz imaging is a more feasible solution for some applications; for example, it can be used to detect failures of ICs caused by EOS and ESD [1]. It can also measure the depth of hidden voids in IC packages caused by overheating [2]. In addition, it can distinguish counterfeit ICs from authenticated ones by detecting unexpected materials in IC packages [13].
Since the depth of cracks and voids caused by EOS may be more than 100 µm, we utilized the far-field THz-TDS imaging system in our study. However, due to the intrinsically long wavelength of THz waves, the spatial resolution of far-field THz imaging systems is restricted by the spot size, and thus the obtained images often suffer from severe blurring, low signal-to-noise ratio, and low resolution [13][14][15]. Thus, poor image quality is the first obstacle that may affect detection accuracy.
The second obstacle hindering the industrial application of THz based IC detection is that the detection tasks are still carried out manually in most previous studies, which is inefficient and inaccurate. Thus, it is essential to develop an accurate and automated detection method. In the last few years, researchers have applied deep learning to automated failure detection [14][15][16][17]. Deep learning methods can learn discriminative features from training samples automatically and then use them for anomaly and failure detection; they have already been applied in many areas, such as face recognition [18], optical character recognition [19], object tracking [20], speech recognition [21] and natural language processing [22]. Although deep learning models have achieved outstanding performance in different fields, one problem still needs to be addressed. Training an outstanding deep neural network always requires large quantities of labeled samples, which are not easily accessible. An alternative way to deal with this problem is to use transfer learning (TL) instead of training a deep learning model from scratch. TL refers to using prior knowledge learned from one problem to solve a different problem [23]. The idea behind transfer learning is that the first few convolutional layers of convolutional neural networks (CNNs) pre-trained on a large source dataset learn generic features, such as lines, curves, or shapes, and these layers can be transferred when we retrain the model on the target dataset [24]. More specifically, we can freeze the weights of the first few convolutional layers and fine-tune the later convolutional and fully connected (FC) layers using a relevant dataset. TL has been successfully used in many applications, such as text classification [25], object recognition [26], medical diagnosis [27], art classification [28] and fault diagnosis [24].
In this study, we introduce a CNN based packaged IC detection method. To strengthen its performance, we first enhanced the quality of the IC THz images. To be specific, we simulated a point spread function (PSF) that can accurately describe the degradation process of THz images and then deconvolved the THz images with the mathematically modeled PSF. Then, we introduced transfer learning to overcome the limited size of the IC dataset. In the training process, the weights of the first few convolutional layers are pre-trained on ImageNet, and the later convolutional and FC layers are fine-tuned on the IC dataset. In this way, TL dramatically reduced the number of parameters that needed to be trained and made it feasible to train a deeper network with a limited amount of labeled data. In addition, to further reduce the computational complexity and memory requirements, we adopted MobileNet V2 [29] as our base model, since MobileNet V2 is a lightweight CNN that needs comparatively few computing resources while maintaining good accuracy. Comparative experiments with VGG-16 [30] and ResNet-50 [31] demonstrated that our proposed method achieves excellent performance in terms of both accuracy and efficiency. The main contributions of this paper can be summarized as follows: 1) We applied a THz image enhancement method based on a deconvolution algorithm and a mathematically simulated PSF. The method significantly enhances the resolution of IC THz images. 2) We proposed a CNN based automated IC failure detection method and introduced TL to overcome the limited size of the IC dataset, yielding better inspection accuracy. 3) We show that our proposed framework leads to state-of-the-art results for packaged IC failure detection. The rest of the paper is organized as follows. Section 2 discusses related work.
Section 3 presents the methodologies of the proposed approach, including terahertz imaging and image enhancement, convolutional neural networks, and transfer learning. Section 4 presents the case studies. The conclusion and future research directions are presented in Section 5.

II. RELATED WORK

A. TERAHERTZ IMAGING FOR IC DETECTION
The terahertz spectrum has been applied as a non-destructive testing method for packaged ICs due to its unique strengths. The first IC THz image was obtained by Hu and Nuss in 1995. They provided a feasible way for IC package inspection, as THz radiation can transmit through plastic housing and reveal metal interconnects [7]. Years later, Keenan et al., using the reflection mode of the THz-TDS imaging system, successfully measured the thickness and depth of the voids in IC packages caused by overheating [2]. THz-TDS has also been applied to authenticate ICs. Ahi detected unexpected materials such as blacktopping layers and sanded and contaminated components with the THz spectrum, in order to distinguish counterfeit ICs from authenticated ones [13]. The systems used in the above-mentioned studies are all far-field THz imaging systems. In order to achieve micron-level detection, near-field THz imaging has also been applied to IC failure detection. For instance, Kiwa et al. proposed the laser terahertz-emission microscope (LTEM) system, which clearly observed the cut interconnection lines in damaged chips [8]. Soon after, they redesigned their optical setup and improved the spatial resolution of the system to below 3 µm [9]. Yamashita et al. also successfully utilized an LTEM to distinguish damaged circuits with disconnected wires from normal ones [32]. Besides LTEM, Electro Optical Terahertz Pulse Reflectometry (EOTPR) was also developed to locate and clarify the faults of 3D packaged ICs [10,11]. It has been used in the semiconductor failure analysis laboratories of AMD, IBM, Intel, and TSMC.

B. TERAHERTZ IMAGE ENHANCEMENT
In general terms, there are two approaches to enhancing the quality of THz images. The first is multi-image super-resolution (MISR). For instance, Xu et al. [33] proved the effectiveness of the projection onto convex sets (POCS) and iterative back projection (IBP) algorithms for THz image enhancement. However, it is time-consuming to obtain multiple images with a THz raster-scanning imaging system. Thus, the second approach, single image super-resolution (SISR), is more accessible for practical applications. Some optical imaging algorithms were first investigated to improve the quality of THz images; for example, Liang et al. [34] used histogram equalization, wavelet denoising, and Canny algorithms to enhance the contrast of the image, denoise it, and extract its edges, respectively. But these methods did not consider the blur caused by the convolution of the object and the PSF. To resolve this issue, the Lucy-Richardson deconvolution algorithm was applied to THz image restoration. Depending on whether the PSF of the imaging system is known or not, THz image enhancement studies can be classified into two categories: non-blind and blind methods. The blind methods only utilize the prior information of the blurred image itself, and then use a regularization term (for example, Wong et al. applied the total variation [35] and Krishnan et al. utilized the normalized sparsity measure [36]) to estimate the PSF and subsequently improve the spatial resolution. Apart from the blurred images, the non-blind methods also apply a measured (Ning et al. [37]) or simulated (Ahi [38,39]) PSF to restore the THz images.

C. CONVOLUTIONAL NEURAL NETWORKS AND TRANSFER LEARNING
Convolutional neural networks (CNNs) were first proposed by LeCun et al. [40] in 1989. However, they did not draw much attention until AlexNet won the ImageNet competition in 2012 [41]. Since then, CNNs have been successfully applied in many areas due to their outstanding feature extraction abilities, such as image classification, face recognition, and object detection [42]. Besides optical images, they have also been used to process terahertz images. Zhang et al. [43] adopted a faster region-based convolutional neural network (Faster R-CNN) to detect concealed weapons carried on personnel and achieved encouraging results. Yang et al. [44] improved the effectiveness and efficiency of concealed object detection by taking two measures: firstly, they employed a deeper CNN model (VGG-16) for THz image classification; secondly, they utilized spatio-temporal information via a sparse and low-rank decomposition model. Motivated by these works, this paper proposes a CNN based automated IC failure detection method using THz images. Generally, the volume of labeled samples is relatively small, and it is almost impossible to train a deep CNN model without a large amount of labeled data. A practical solution is to pre-train deep CNNs on ImageNet and then fine-tune their parameters on the target dataset. CNNs pre-trained on ImageNet can also perform well on a small dataset in a different domain [23]. This has been proved in many fields. Donahue et al. [26] evaluated the transferability of deep convolutional activation features extracted from a large labeled dataset and concluded that DeCAF has strong representational and generalization ability for visual tasks. Tajbakhsh et al. [27] compared the performance of pre-trained CNNs with fully trained CNNs on four distinct medical imaging applications and demonstrated that fine-tuned CNNs yield better results for medical imaging applications. Cetinic et al. [28] conducted numerous CNN fine-tuning experiments for five different art-related classification tasks and achieved state-of-the-art classification results. Shao et al. [24] developed a deep learning framework for fault diagnosis. They used a pre-trained VGG-16 network and fine-tuned the last two convolution blocks and fully connected layers with three different testing datasets. Comparative analysis showed that their proposed approach achieves state-of-the-art results.
Though TL has greatly improved the performance of CNNs, some conventional networks with complicated structures and numerous parameters, such as VGG and ResNet, still require a great amount of computing power and time, which in turn limits their application on mobile or embedded devices [45]. To address this issue, Baheti et al. [46] modified the architecture of VGG-16 by replacing its FC layers with convolution layers. Experimental results showed that this modification not only reduced the number of parameters from 140 million to 15 million, but also enhanced its performance. Pan et al. [47] proposed a welding defect detection model based on transfer learning and MobileNet V2, which has only 2.5 million parameters; the experimental results showed that their method outperforms other traditional neural networks. Srinivasu et al. [48] proposed a computerized process for classifying skin diseases based on deep learning with MobileNet V2 and Long Short-Term Memory (LSTM); the proposed system can help general practitioners diagnose skin conditions efficiently and effectively.

III. METHODOLOGY

A. TERAHERTZ IMAGING AND IMAGE ENHANCEMENT
THz time-domain spectroscopy (THz-TDS) is a powerful tool for detecting defects in packaged objects. When obtaining a time-domain THz image, the sample is raster-scanned point by point in a 2-D plane, and the transmitted or reflected THz beam is recorded simultaneously. In this way, we get a 3-D data cube with the third dimension being time [49]. Then a fast Fourier transform (FFT) is usually performed along the time dimension to get frequency-domain THz images, which can present information about absorption, reflection, or scattering losses and usually give better imaging results. The frequency-domain THz images can also be viewed as a 3-D data cube, with the third dimension being frequency. After the FFT, a high pass filter (HPF) is usually applied to filter out the low-frequency spectrum, since the distortion effects of diffraction are stronger at lower frequencies (longer wavelengths). Then a low pass filter (LPF) is applied to suppress noise, as the higher-frequency spectrum usually has a lower signal-to-noise ratio (SNR).
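The pre-processing pipeline described above (FFT along the time axis, then HPF and LPF) can be sketched as follows. This is a minimal illustration: the array sizes, sampling step, and the 0.6-1.0 THz pass-band are assumptions for the example, not values prescribed by this section.

```python
import numpy as np

# Turn a raster-scanned time-domain cube into a band-limited frequency-domain
# image (illustrative sizes; real data would replace the random stand-in).
ny, nx, nt = 65, 220, 1024          # scan grid and number of time samples
dt = 0.1e-12                        # assumed 0.1 ps sampling step
cube = np.random.rand(ny, nx, nt)   # stand-in for the measured THz waveforms

spec = np.fft.rfft(cube, axis=2)    # FFT along the time axis
freqs = np.fft.rfftfreq(nt, d=dt)   # frequency axis in Hz

# Band-pass: HPF removes the diffraction-prone low band,
# LPF removes the noisy high band (cut-offs are assumptions here).
band = (freqs >= 0.6e12) & (freqs <= 1.0e12)
image = np.abs(spec[:, :, band]).mean(axis=2)   # 2-D frequency-domain image

print(image.shape)  # (65, 220)
```

Averaging the magnitude over the pass-band is one simple way to collapse the frequency axis; a single-frequency slice could be used instead.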
The scanning process can be expressed as follows:

i (x, y, z) = o (x, y, z) ⨂ PSF (x, y, z) + n

where i (x, y, z) is the THz image, o (x, y, z) is the object function, the x-y plane is the raster-scanned coordinate (focal plane), the z-axis is the traveling path of the THz beam and z = 0 is the focal plane, PSF is the point spread function, ⨂ denotes the convolution operation, which is a mathematical model of the raster-scanning operation, and n is the additive system noise.
Previous studies have already revealed that the key to THz image restoration is the construction of the PSF. In this study, we applied the mathematical model of the PSF proposed by Ahi [39], which has been proved technically feasible in many studies [13,38,50]. It is illustrated in Fig. 1 and, following the Gaussian-beam model, can be expressed as:

PSF (x, y, z, f) = exp(−2 (x² + y²) / ω(z, f)²)

where (x, y) is the position from the center of the beam, f is the frequency of the THz beam, and ω (z, f) is the spot radius of the beam at distance z from the focal plane, which can be calculated by:

ω (z, f) = ω (0, f) √(1 + (c z / (π f ω(0, f)²))²)

and ω (0, f) is the spot radius at the focal plane:

ω (0, f) = c / (π f NA)

where c is the speed of light, and NA is the numerical aperture of the imaging system.
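As an illustration of this restoration step, the sketch below builds a Gaussian-beam PSF and applies a hand-rolled Richardson-Lucy deconvolution to a toy pattern. The frequency (0.8 THz), NA (0.125), grid size, and the toy object are assumptions for the example; this is not the paper's implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_psf(size, px, z, f, na=0.125, c=3e8):
    """Gaussian-beam PSF at frequency f (Hz), distance z (m) from focus."""
    w0 = c / (np.pi * f * na)                           # spot radius at focus
    w = w0 * np.sqrt(1 + (c * z / (np.pi * f * w0**2))**2)
    ax = (np.arange(size) - size // 2) * px
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-2 * (xx**2 + yy**2) / w**2)
    return psf / psf.sum()                              # unit energy

def richardson_lucy(blurred, psf, iters=20):
    """Minimal Richardson-Lucy deconvolution (a sketch, not an optimized solver)."""
    est = np.full_like(blurred, 0.5)
    psf_m = psf[::-1, ::-1]                             # mirrored PSF
    for _ in range(iters):
        ratio = blurred / (fftconvolve(est, psf, mode='same') + 1e-12)
        est *= fftconvolve(ratio, psf_m, mode='same')
    return est

# Toy "bond-wire" pattern, blurred by the PSF, then restored.
psf = gaussian_psf(size=15, px=0.25e-3, z=0.0, f=0.8e12)
sharp = np.zeros((65, 65)); sharp[30:35, 20:45] = 1.0
blurred = fftconvolve(sharp, psf, mode='same')
restored = richardson_lucy(blurred, psf)
```

On noiseless data the iteration sharpens the blurred pattern back toward the original; with real noisy images the iteration count has to be limited to avoid noise amplification.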

B. CONVOLUTIONAL NEURAL NETWORKS
CNNs are designed to simulate the way that biological neural networks handle images. Unlike traditional artificial neural networks (ANNs), CNNs can extract hierarchical features automatically, which are beneficial to image processing tasks.
A CNN generally consists of three elements: convolution layers, pooling layers, and fully connected (FC) layers. The convolution layers, pooling layers are integrated to form feature extractors, and the FC layers are used as image classifiers. Fig. 2 illustrates a typical CNN architecture. CNNs are so named because the key element in the CNN structure is the convolutional layer. The convolutional layers are designed based on two principles: local features and shared weights. As our brains always understand an image through combinations of local features without taking their spatial distribution into account.
To achieve local feature detection, the neural nodes of the convolutional layer are only connected to a small adjacent sub-area of the input images; to detect the same features in different locations, the weights and biases are shared between different nodes in the convolutional layer. The shared sets of weights and biases are called kernels. The kernels are convolved across the input images, and the process can be described as follows:

C_i = W_i ⨂ X + b_i

where C_i is the value of the i-th feature map; W_i and b_i denote the weights and biases of the i-th convolutional kernel; X represents the pixel values of the input image and ⨂ denotes the convolution operation, i.e., the sliding dot product of the kernel and the input image.
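The sliding dot product can be written out explicitly in a few lines; the sketch below (illustrative names and sizes) computes one feature map for a single 2-D kernel in valid mode, as deep-learning frameworks do (strictly a cross-correlation, the convention used in CNNs).

```python
import numpy as np

def conv2d(X, W, b):
    """Valid-mode 2-D 'convolution': slide kernel W over image X, add bias b."""
    kh, kw = W.shape
    H, Wd = X.shape
    out = np.empty((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product of the kernel with the local image patch
            out[i, j] = np.sum(X[i:i+kh, j:j+kw] * W) + b
    return out

X = np.arange(25, dtype=float).reshape(5, 5)
W = np.ones((3, 3)) / 9.0          # simple averaging kernel
C = conv2d(X, W, b=0.0)
print(C.shape)  # (3, 3)
```

With the averaging kernel, each output value is just the mean of the 3×3 patch under it, which makes the weight sharing easy to verify by hand.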
After the convolution layer, it is common to add an activation function to enable the network to acquire the nonlinear relationship between input and output, which can be defined as:

A_i = α(C_i)

where A_i is the activation value of C_i, and α(·) denotes the activation function.
A convolution layer is always followed by a pooling layer, which is used to down-sample the extracted features and thus reduce the number of parameters of the network. The pooling layer can be described as:

P_i = pool(A_{m,n,i})

where P_i is the new value of the i-th feature map after the pooling operation, pool() represents the pooling rule, m and n denote the width and length of the pooling area, and A_{m,n,i} is the value in the (m, n) region of the i-th feature map.
Convolution layers and pooling layers are usually combined to form convolution blocks. After several convolution blocks, FC layers and a softmax function are appended to perform classification. This process can be represented as follows:

Y = softmax(w_f X + b_f)

where Y is the vector of predicted labels, w_f and b_f represent the weights and biases between the last FC layer and the output layer, and X is the vector of node values of the last FC layer.
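The generic architecture just described (convolution blocks as feature extractor, then FC layers with a softmax classifier) can be sketched in Keras as follows. The layer sizes and the two-class head are illustrative assumptions, not the model used in this paper.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A minimal CNN in the spirit of Fig. 2: two conv+pool blocks, then FC + softmax.
model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, 3, activation='relu'),   # convolution block 1
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation='relu'),   # convolution block 2
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),       # FC layer
    layers.Dense(2, activation='softmax'),     # e.g. qualified vs. defective
])

probs = model.predict(np.zeros((1, 224, 224, 3)), verbose=0)
print(probs.shape)  # (1, 2)
```

The softmax output is a probability vector over the two classes, so each row sums to one.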

C. TRANSFER LEARNING AND FINE-TUNING
Transfer learning is implemented by duplicating the parameters of a pre-trained model to the target model [23]. Typically, the weights of a CNN are initialized randomly and then updated iteratively based on the loss function. However, a deep CNN usually has millions of weights and biases to be trained, which consequently requires a large amount of labeled training data. Otherwise, the network may overfit or fail to converge.
Transfer learning is a promising method that can assist the training of CNNs. Earlier convolutional layers learn low-level features, such as curves and edges, which are applicable to most image classification tasks, whereas later convolutional layers learn high-level features, which are more specific to particular applications [23]. Thus, the weights of the earlier convolutional layers, which represent low-level feature extractors, can be transferred; only the weights of the later convolutional and FC layers need to be learned from the new training dataset. The operation of updating the weights of the last few convolutional and FC layers of a CNN is called fine-tuning. Fine-tuning a CNN generally begins with copying the weights of all the layers from a pre-trained model to the network we would like to train. Thus, the feature extraction and classification abilities are transferred from the pre-trained model to the new one. Then, the last FC layer (or last few FC layers) is replaced by a new FC layer with the same number of neurons as classes in the new application. After the weights of the new FC layer are initialized, the network can be fine-tuned, ranging from tuning only the last FC layers to tuning all the layers in the CNN. The number of layers that need to be fine-tuned differs according to the distance between the source data and the target data. For a similar dataset, only the FC layers need to be fine-tuned, whereas, for a significantly different dataset, earlier convolutional blocks need to be fine-tuned as well. Compared with training from scratch, fine-tuning a CNN is more accurate and less time-consuming, as it remarkably reduces the number of parameters that need to be trained.
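The copy-freeze-replace-head procedure can be sketched in Keras as follows. This is an illustration, not the paper's exact code: `weights=None` is used here only to avoid downloading the pretrained weights, whereas in practice `weights='imagenet'` would transfer the ImageNet feature extractor; the two-class head mirrors the qualified/defective setting.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Step 1: take a pretrained backbone and freeze its (transferred) weights.
base = keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                      include_top=False, weights=None)
base.trainable = False                       # freeze the feature extractor

# Step 2: replace the classifier head with one sized for the new task.
inputs = keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)             # run frozen extractor in inference mode
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(2, activation='softmax')(x)   # new 2-class head
model = keras.Model(inputs, outputs)

n_trainable = sum(int(np.prod(w.shape)) for w in model.trainable_weights)
print(n_trainable)  # 2562: only the new head (1280*2 + 2) is trained
```

After the head converges, selected later blocks of the backbone can be unfrozen and trained with a small learning rate, which is the fine-tuning step described above.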

IV. EXPERIMENTS AND RESULTS
To test the performance of the proposed detection method for packaged ICs and verify its effectiveness, we first obtained and enhanced the THz images of the tested ICs. Then, we constructed the THz IC detection dataset. Finally, we fed these images into the proposed models for failure detection. The procedure of TL based IC detection is given in Fig. 3. The THz time-domain spectroscopy system used in this study is the T-Ray 5000 produced by Advanced Photonix, Inc. The setup of this system in transmission mode is shown in Fig. 4 (a), and Fig. 4 (b) is a photograph of the actual experimental setup. The spectral bandwidth of our TDS system ranges from 0.01 THz to 5 THz theoretically. However, the usable range is from 0.01 THz to 1.0 THz, since the SNR of the signal above 1.0 THz is significantly low. The numerical aperture (NA) of this system is 0.125, determined by the diameter of the lens (38 mm) and the focal length (150 mm). Packaged ICs, as shown in Fig. 5 (a) and 5 (f), were scanned in this study. When obtaining the THz time-domain image, we first fixed the IC on the XY 2D moving stage, and the THz image was constructed point by point by physically moving the sample on the stage. The raster-scanning step size was 0.25 mm, and the imaging range was 220 × 65 pixels (with a pixel interval of 0.25 mm). Then, we performed an FFT to get the frequency-domain THz images, which usually give better imaging results, as shown in Fig. 5 (c) and 5 (h). Afterwards, an HPF with a cut-off of 0.6 THz and an LPF with a cut-off of 1.0 THz were applied to minimize diffraction effects and suppress noise, respectively, as shown in Fig. 5 (d) and 5 (i). The wavelength of the derived THz beam thus ranges from 0.3 mm to 0.5 mm. Finally, after deconvolution with the PSF modeled by equation 2, the restored THz images are shown in Fig. 5 (e) and 5 (j). As can be seen, comparing the optical and X-ray images of the qualified and defective ICs, we cannot observe any defect. Though we can notice some differences between the original THz images of qualified and defective ICs, their resolution is too low to identify key information, such as the actual shape of the die frame, the bond-wires, and the defect region, which are shown much more clearly in the restored images.

1) TERAHERTZ IMAGING AND ENHANCEMENT RESULTS
We also analyzed the waveforms of the qualified and defective ICs to clarify their image differences, as shown in Fig. 6. Fig. 6 (a) illustrates that the THz pulses traversing the defective IC show more time delay and attenuation, and Fig. 6 (b) shows that the defective IC has smaller amplitudes due to the energy attenuation of the THz wave passing through the defective region. Thus, the defective region has lower brightness in the frequency-domain THz images.

2) TERAHERTZ IC DATASET
We collected 94 packaged ICs, and half of these IC samples were subjected to 30 V DC to simulate failure caused by EOS. We then divided all the collected samples into two classes: the samples subjected to 30 V DC are defective ones, whereas the others are qualified ones. Afterwards, these 47 IC pairs (47 qualified ICs and 47 defective ones) were randomly divided into a training set with 33 sample pairs and a testing set with the remaining 14 pairs. Due to the difficulty and expense of collecting IC THz images, the training samples are insufficient, which may affect the generalization of the trained CNN model and lead to overfitting [51]. Thus, random zooms, width shifts, and height shifts of up to 15%, together with rotations of up to 15°, were employed to augment the training set. After augmentation, there were 660 pairs of training images, which were further divided into 520 pairs of training images and 140 pairs of validation images. The CNN used in this study is MobileNet V2 [29], a lightweight CNN proposed by the Google team. The key improvement in MobileNet is the replacement of regular convolutional layers with depthwise separable convolutional blocks [45]. Each separable convolutional block consists of a 3×3 depthwise convolutional layer to filter the input, followed by a 1×1 pointwise convolutional layer to combine the filtered values. The separable convolutional block has far fewer parameters, and thus it is much faster than a conventional convolutional layer with nearly the same accuracy. Besides separable convolution, MobileNet V2 also adopts two other approaches to further improve its performance. The first is the expansion layer, which enables the model to extract features in higher-dimensional spaces to avoid information loss. The second is the residual connection, which helps the flow of gradients through the network.
The basic unit of MobileNet V2 is the bottleneck residual block, which mainly consists of three convolutional layers: a 1×1 expansion convolutional layer, a depthwise convolutional layer, and a 1×1 pointwise convolutional layer. The architecture of MobileNet V2 is shown in Fig. 7. It is composed of 17 bottleneck residual blocks in a row, followed by a regular 1×1 convolution, a global average pooling layer, and a classification layer with class number k.
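The parameter saving behind the depthwise separable design can be checked with a quick count (the channel numbers below are illustrative, not taken from MobileNet V2 itself):

```python
# Regular 3x3 convolution vs. depthwise 3x3 + pointwise 1x1 pair
# (biases omitted; c_in/c_out are illustrative channel counts).
c_in, c_out, k = 64, 128, 3

regular = k * k * c_in * c_out     # 3x3 conv mixing all channels at once
depthwise = k * k * c_in           # one 3x3 filter per input channel
pointwise = 1 * 1 * c_in * c_out   # 1x1 conv to combine the channels
separable = depthwise + pointwise

print(regular, separable)  # 73728 8768
```

For these sizes the separable version uses roughly 8× fewer parameters, which is where MobileNet's speed advantage over conventional convolutions comes from.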

2) TRANSFER LEARNING SCENARIOS AND TRAINING SETTINGS
Generally, two approaches can be applied to achieve TL. In the first approach, the pre-trained CNN is used as a feature extractor, that is, during the training process, all the convolutional layers of the pre-trained model are frozen, only the weights of the FC layers are updated with the new dataset. The second approach which could further increase the performance of the model is to fine-tune a few selected convolutional layers of the pre-trained CNN alongside the FC layers [52].
In this study, we trained MobileNet V2 with the second method, which has achieved better results in many studies [24,27,28,52,53], especially when the target dataset (the collected dataset) differs from the source dataset (ImageNet [54]). The number of convolutional layers that need to be fine-tuned differs with regard to the target dataset size and the distance between the source and target data [27]. Therefore, we designed three fine-tuning schemes to find the best solution: one-block fine-tuning (1FT), 2FT and 3FT. 1) 1FT: only the weights of the last bottleneck residual block and the following layers are updated, while the rest of the layers are frozen. 2) 2FT: the weights of the last two bottleneck residual blocks and the following layers are trainable. 3) 3FT: the weights of the last three bottleneck residual blocks and the following layers are trainable. For comparison, we also fine-tuned VGG-16 and ResNet-50 with the same scenarios, and then trained all three CNN models from scratch on the same training data.
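A scheme like 2FT could be realized in Keras by freezing everything except the last two bottleneck blocks and the layers after them. The layer-name prefixes (`block_15`, `block_16`, `Conv_1`, `out_relu`) follow the Keras MobileNetV2 implementation and are an assumption of this sketch; `weights=None` avoids a download here, while fine-tuning in practice would start from `weights='imagenet'`.

```python
from tensorflow import keras

base = keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                      include_top=False, weights=None)
base.trainable = True

# 2FT-style selection: unfreeze the last two bottleneck blocks and the
# layers that follow them; keep every earlier layer frozen.
unfreeze = ('block_15', 'block_16', 'Conv_1', 'out_relu')
for layer in base.layers:
    layer.trainable = layer.name.startswith(unfreeze)
```

The same loop with `('block_16', ...)` or `('block_14', ...)` prefixes would give the 1FT and 3FT variants.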
We tested the proposed fine-tuning scenarios on the constructed THz IC image dataset. Since the original THz IC images are gray-scale images with a size of 220 × 65, during the training process we first resized the THz images to 224 × 224 and then duplicated them into three channels, as the input size of all three fine-tuned models is 224 × 224 × 3. Then, we trained them with the SGD optimizer, with a learning rate of 0.001 and a batch size of 32. To prevent overfitting, training was stopped when the validation accuracy did not show any improvement over ten epochs. All experiments were carried out on a GeForce GTX 1080 Ti GPU, using the Keras library [55] for training and fine-tuning the CNN models.
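The input pre-processing just described (resizing the 220 × 65 gray-scale images to 224 × 224 and duplicating them into three channels) can be sketched as follows; using bilinear `scipy.ndimage.zoom` for the resize is an implementation assumption of this example.

```python
import numpy as np
from scipy.ndimage import zoom

def prepare(img):
    """Resize a 65x220 gray THz image to 224x224 and replicate it to 3
    channels, matching the pretrained models' expected input (a sketch)."""
    h, w = img.shape
    resized = zoom(img, (224 / h, 224 / w), order=1)   # bilinear resize
    resized = resized[:224, :224]                      # guard against rounding
    return np.repeat(resized[..., None], 3, axis=2)

x = prepare(np.random.rand(65, 220))
print(x.shape)  # (224, 224, 3)
```

Because the three channels are identical copies, the pretrained RGB filters see the gray-scale content in every channel rather than zeros in two of them.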

C. RESULTS
The classification accuracy for the different fine-tuning schemes is given in Table I, with the best indexes shown in bold. We can see that CNNs trained from scratch are outperformed by all the fine-tuning scenarios when we only have a limited number of training samples. Among the fine-tuning scenarios, 2FT achieved the best performance for all three CNN models, so it is the optimal fine-tuning scenario. Table II shows the classification accuracy, training time and number of parameters when the number of fine-tuned blocks is set to 2. As seen, though both VGG-16 and MobileNet V2 achieved a 100 percent classification accuracy, considering the training time and parameter numbers together, we can conclude that the lightweight MobileNet V2 gains the best performance in the least time.

V. CONCLUSIONS
In this paper, we aim to present a practical solution for terahertz-based IC failure detection. We first enhanced the quality of the THz images to improve detection accuracy, and then proposed a CNN based failure detection framework to achieve end-to-end IC inspection. Besides, we investigated whether TL can promote the performance of the IC detection networks. Comparison results between fine-tuned and fully-trained CNNs showed that knowledge transfer from natural images to IC THz images is possible, even though their distance is relatively large. Thus, TL provides a feasible solution for CNN training when large quantities of labeled training data cannot be collected. The comparison between the three fine-tuning scenarios showed that fine-tuning the last two bottleneck residual blocks (or convolutional blocks) achieves the best performance. Our assumption is that though the last convolutional layers are able to learn high-level features, which is helpful for the specific application, fine-tuning too many layers may lead to overfitting due to the complex model structure and limited training data. Moreover, we can conclude that the lightweight MobileNet V2 is an ideal CNN model for IC failure detection, since compared to VGG-16 and ResNet-50, it gains the best performance in the least time. However, our proposed packaged IC detection method has some limitations. First, due to the difficulty and expense of collecting IC THz images, we could not collect a large number of training samples. Though we adopted a data augmentation algorithm, the limited training data size may still affect the testing accuracy, especially for the trained-from-scratch scenario. Second, our CNN can only detect defects caused by EOS; we will collect ICs of more failure types to train our model, so that it can recognize more failure types. Finally, we only classified the tested ICs into qualified and defective ones.
In future work, we will train our proposed networks to determine the exact location of the cracks and voids hidden in the IC packages.