Segmentation and Measurement of Superalloy Microstructure Based on Improved Nonlocal Block

The microstructure of a superalloy has a decisive impact on its service performance. When the material is prepared and its microstructure is photographed, metallographic features at different depths perpendicular to the cut plane appear in the microstructure image. These features are a major cause of inaccurate segmentation. Traditional image processing methods suffer from heavy noise and poorly robust edge extraction, so a deep learning method is introduced. However, the receptive field of a traditional convolutional neural network is too local to capture remote dependencies, and dense feature extraction also introduces excessive noise. To address the problem that non-tangential metallographic information in microstructure images is not easily distinguishable, we consider how to combine and use the intermediate feature maps as effectively as possible. We find that reusing affinity matrices from different angles and stacking them improves the overall result and helps solve the inaccurate segmentation of metallographic features at different depths. We improve and optimize the non-local attention module and combine it with the UNet network to form a new improved SNL-UNet image segmentation structure, which significantly raises the accuracy and efficiency of image segmentation. Additionally, we measure characteristic parameters such as volume fraction, average thickness, and degree of rafting. The code for this paper will be available at github.com/ustbjdl1021/improved-snl-unet.


I. INTRODUCTION
THE microstructure is the key to determining the properties of superalloy materials. Microstructure analysis is mainly based on the shape, distribution, and other characteristics of different structures in microscopic images. Manual analysis suffers from subjective bias, excessive time consumption, high cost, limited local statistics, and low accuracy. Therefore, finding more advanced microstructure image analysis methods for superalloys and improving the speed and accuracy with which microstructure features are calculated have become key to metallographic analysis.
When measuring and calculating characterization parameters such as volume fractions, more accurate information on the metallic phases in the same cutting plane is required.
However, when materials are prepared and microstructures are photographed, the limitations of on-site process conditions and errors in manual slicing cause metallographic features at different depths perpendicular to the cutting plane to appear in the microstructure image. Image information that does not lie in the same cutting plane is outside the scope of the characterization parameter measurement, so it needs to be accurately rejected during image segmentation.
Therefore, in order to employ the information in microstructure images more efficiently, it is necessary to improve the network's ability to utilize image features and cope with the problem of identifying depth information.
We propose an improved nonlocal block, which segments deep and shallow information more effectively. Further, we combine the improved non-local attention module with the UNet network to form an improved SNL-UNet network structure, which significantly improves the accuracy and efficiency of image segmentation.
Experimentally, the segmentation result of this method is better than that of other network structures, and the ability to recognize the depth information of the image is also significantly improved. Additionally, based on the more accurate segmentation results, microstructure characterization parameters such as volume fraction, rafting degree of the γ′ phase, and thickness of the γ′ phase are measured.

A. RELATED WORK
Some studies [1], [2], [3] show that the distribution of the γ′ phase has an important influence on the mechanical properties and hardness of superalloys. In recent years, more and more attention has been paid to the study of the γ phase [4], [5]. Hou et al. [6] find that the yield strength and deformation mechanism are temperature-dependent. Sun et al. [7] find that the morphology and size of secondary γ′ are sensitive to temperature and strain rate.
Some feature representation methods combine imaging techniques with algorithms. Payton et al. [8] show that backscatter electron imaging of specimens in which the γ′ phase has been selectively etched yields images that can be segmented more readily with image processing algorithms than those from other imaging techniques. Tiley et al. [9] measure γ′ precipitates in a nickel-based superalloy using energy-filtered transmission electron microscopy coupled with automated segmentation techniques.
Traditional image processing methods mainly segment on single features such as image texture or gray-level difference. Examples include Otsu [10] threshold segmentation, the Canny [11] edge detection operator, and the Sobel [12] edge operator. When extracting edges from material microstructure images, these methods produce blurred edges, low precision, and considerable noise.
Some methods based on traditional algorithms perform well in segmenting microstructure images. Cao et al. [13] develop a new Multichannel Edge-Weighted Centroidal Voronoi Tessellation (MCEWCVT) algorithm to automatically segment all the 3D grains from microscopic images of a superalloy sample. Chuang et al. [14] study the combination of a region merging segmentation method called the stabilized inverse diffusion equation (SIDE) and a stochastic segmentation method, the expectation-maximization/maximization of the posterior marginals (EM/MPM) algorithm. Ewees et al. [15] present a hybrid meta-heuristic approach for multi-level thresholding image segmentation that integrates the artificial bee colony (ABC) algorithm and the sine-cosine algorithm (SCA). Alwerfali et al. [16] develop an alternative MTI segmentation method using a modified version of the salp swarm algorithm (SSA).
In recent years, the continuous improvement in the computing power of computer hardware has made it possible to apply deep learning in many fields. Deep learning methods can learn from large amounts of material data to quickly find new materials with target properties [17], [18], [19].
Convolutional neural networks (CNNs), a classic deep learning architecture, can automatically extract features from massive data and generalize well. Long et al. [20] proposed the fully convolutional network (FCN) for semantic segmentation, replacing the fully connected layers of a CNN with convolutional layers. Building on FCN, Ronneberger et al. [21] proposed the symmetrical U-Net, an encoder-decoder structure that makes flexible use of skip connections between deep and shallow layers. This overcomes FCN's inability to retain part of the pixel spatial position information and its loss of context information, which lead to the loss of local and global features. Zhou et al. [22] proposed the U-Net++ network, which rethinks the number of down-sampling and up-sampling steps in U-Net, redesigns the multi-scale connection nodes of the original network, and adds deep supervision so that the network can be pruned to suit different segmentation tasks, achieving good segmentation results. Mehta and Sivaswamy [23] proposed M-Net, based on U-Net, for brain magnetic resonance image segmentation; it is three times faster than random forest and 2D CNN methods in volume segmentation. At present, U-Net and its derivative network structures [24] are among the most popular image semantic segmentation methods.
Although UNet and its derivative network structures perform well in image segmentation [25], [26], they rely on a large number of densely stacked convolution operators. As a result, capturing the interdependence between distant positions is inefficient, and the receptive field of feature extraction is excessively local. Stacking more convolution layers does not always increase the effective receptive field, so convolution operators still lack a mechanism for modeling remote dependencies.
There are many works on fusing multi-scale features. Hu et al. [27] propose a joint feature pyramid (JFP) module, build a spatial detail extraction (SDE) module, and design a bilateral feature fusion (BFF) module, making full use of the correspondence between high-level and low-level features. Benčević et al. [28] propose training a neural network on polar transformations of the original dataset, such that the polar origin of the transformation is the center point of the object. Mohamed et al. [29] propose a hybrid SI-based approach that combines the features of two SI methods, the marine predators algorithm (MPA) and moth flame optimization (MFO). Liu et al. [30] propose an improved Itti model and an improved GrabCut image segmentation algorithm for PET images, which are low-resolution grayscale images, to solve the problems of the original algorithm on grayscale images.
The Nonlocal block [31] can capture remote dependencies more robustly and flexibly, helping the deep network better integrate nonlocal information. Nonlocal network modules proposed so far include NL [31], A2 [32], NS [33], CC [34], CGNL [35], and SNL [36]. Chen et al. [32] propose the Double Attention block, which first collects the features in the entire space and then distributes them back to each location. Yue et al. [35] propose a compact generalized Nonlocal block to capture cross-channel cues, which inevitably increases the noise of the attention map. Huang et al. [34] propose a lightweight Nonlocal block, called an interleaved attention block, which decomposes the positional attention of NL into consecutive column-wise and row-wise attention. To improve the stability of the NL block, Tao et al. [33] propose using the Laplacian of the incidence matrix as the attention map, so that the Nonlocal stage (NS) module follows diffusion characteristics. Zhu et al. [36] propose the Spectral Nonlocal (SNL) block, which processes the attention feature block symmetrically. From the new perspective of Chebyshev approximation and graph filtering, it gives a unified theoretical explanation of the five Nonlocal modules above.
The Nonlocal operator in the SNL block is equivalent to filtering the signal with a set of graph filters. This paper takes that property as its starting point. The way the SNL module uses the incidence matrix is equivalent to filtering in the form of a row transformation, and the corresponding column-transformation filtering is missing. We therefore propose an improved SNL block and combine it with the UNet network structure, which performs excellently in image segmentation. Experimental results show state-of-the-art performance, with a significant improvement in mIoU over other segmentation networks such as UNet.
In a nutshell, our contributions are threefold:
• This paper proposes an improved nonlocal block, which segments deep and shallow information more effectively.
• This paper combines the improved non-local attention module with the UNet network, which significantly improves the accuracy and efficiency of image segmentation.
• Based on the more accurate segmentation results, microstructure characterization parameters such as volume fraction, rafting degree of the γ′ phase, and thickness of the γ′ phase are measured.

II. METHOD
This paper proposes an improved SNL block and combines it with the UNet network to form an improved SNL-UNet network structure. The network inserts the improved SNL module after multiple down-sampling steps to perform attention feature transformation. This significantly improves the ability to recognize depth information in the image and, compared with other network structures, distinguishes the target γ′ phase from the background (the non-γ′ phase area) of the superalloy microstructure more accurately. Note that in Figure 3 the γ phase ideally corresponds to the white part of the image and the γ′ phase to the black part.

A. NETWORK STRUCTURE
The proposed network has an overall encoder-decoder structure, as shown in Fig. 1. It consists of an encoder, a decoder, skip connections, and the improved SNL block.
The input image enters the encoder for feature extraction, and repeated maxpool down-sampling reduces the network parameters and enlarges the receptive field. The encoder consists of four repeated convolution and down-sampling modules, each comprising convolution, a ReLU activation function, a BN layer, and maxpool down-sampling.
The image features collected by the encoder are sent to the improved SNL block. The feature block it outputs is up-sampled in the decoder by linear interpolation and concatenated along the channel dimension with the features of the corresponding encoder layer, and the result passes through a conventional convolution module. After these steps are repeated several times, the feature size equals that of the original image. A final convolution then performs pixel-level classification and outputs a feature map whose number of channels equals the number of pixel classes, which is used for the loss function calculation and network parameter training. In this way the microstructure image of the superalloy is segmented.
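To make the data flow concrete, the following minimal sketch traces feature-map shapes through an encoder-decoder of the kind described above. The function name `snl_unet_shapes`, the base width of 64 channels, and the channel-doubling rule are illustrative assumptions; the text specifies only four down-sampling modules, an attention block at the bottleneck, and up-sampling with skip concatenation.

```python
def snl_unet_shapes(size=512, in_channels=1, base_channels=64, depth=4):
    """Trace (name, channels, side length) through the described encoder-decoder.

    base_channels and the channel-doubling rule are assumptions for illustration.
    """
    shapes = [("input", in_channels, size)]
    skips = []
    channels, side = base_channels, size
    for stage in range(1, depth + 1):
        # Conv + ReLU + BN keep the spatial size; remember it for the skip connection.
        skips.append((channels, side))
        side //= 2  # 2x2 max pooling halves height and width
        shapes.append((f"encoder{stage}", channels, side))
        channels *= 2
    channels //= 2  # undo the last doubling: the bottleneck keeps the encoder output
    # The attention block sits at the bottleneck and preserves the shape.
    shapes.append(("improved_snl", channels, side))
    for stage in range(depth, 0, -1):
        skip_channels, skip_side = skips.pop()
        side *= 2  # interpolation doubles height and width
        assert side == skip_side  # skip features must match before concatenation
        channels = skip_channels  # convolutions after concatenation reduce the channels
        shapes.append((f"decoder{stage}", channels, side))
    return shapes


for name, channels, side in snl_unet_shapes():
    print(f"{name:13s} channels={channels:4d} size={side}x{side}")
```

With a 512×512 input, the attention block in this sketch operates on a 32×32 feature map, which keeps the [WH,WH] affinity matrix at a manageable 1024×1024.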

B. NONLOCAL BLOCK
The Nonlocal block computes, for each position, a weighted sum over the features of all positions. The nonlocal operator is defined as
$$F(X_{i,:}) = \frac{\sum_{j=1}^{N} f(X_{i,:}, X_{j,:})\, g(X_{j,:})}{\sum_{j=1}^{N} f(X_{i,:}, X_{j,:})},$$
where $X \in \mathbb{R}^{N \times C_1}$ is the input feature map, $i, j$ are position indexes in the feature map, $f(\cdot)$ is the affinity kernel with a finite Frobenius norm, and $g(\cdot)$ is a linear embedding defined as $g(X_{j,:}) = X_{j,:} W_Z$ with $W_Z \in \mathbb{R}^{C_1 \times C_S}$. Here $N$ is the total number of positions in each feature map, and $C_1$ and $C_S$ are the numbers of channels of the input and the transferred features.
When the NL block is inserted into a network, a linear transformation with weight matrix $W \in \mathbb{R}^{C_S \times C_1}$ and a residual connection are added:
$$Y_{i,:} = X_{i,:} + F(X_{i,:})\, W.$$
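The operator can be illustrated with a small pure-Python sketch. It assumes the usual normalised form of the nonlocal operator, in which the output at position i is the affinity-weighted average of the embedded features g(X_j), followed by the linear map W and the residual connection. The dot-product kernel, the helper names, and the embedding matrix `w_z` are assumptions chosen for illustration; the sketch also assumes the affinities of each position do not sum to zero.

```python
def dot(u, v):
    """Dot-product affinity kernel f(u, v); one possible choice of kernel."""
    return sum(a * b for a, b in zip(u, v))


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def nonlocal_block(x, w_z, w, kernel=dot):
    """Return Y with Y[i] = X[i] + F(X[i]) W for an N x C1 feature map x."""
    g = matmul(x, w_z)  # linear embedding g(X_j) = X_j W_Z, shape N x Cs
    f_out = []
    for xi in x:
        weights = [kernel(xi, xj) for xj in x]  # f(X_i, X_j) for every position j
        total = sum(weights)                    # normalisation term (assumed non-zero)
        f_out.append([sum(wt * gj[c] for wt, gj in zip(weights, g)) / total
                      for c in range(len(g[0]))])
    proj = matmul(f_out, w)  # map the Cs channels back to C1
    return [[a + b for a, b in zip(x_row, p_row)] for x_row, p_row in zip(x, proj)]


eye = [[1.0, 0.0], [0.0, 1.0]]
print(nonlocal_block([[1.0, 0.0], [0.0, 1.0]], eye, eye))
```

Setting `w` to a zero matrix reduces the block to the identity, which is why such blocks can be inserted into a pretrained network without disturbing it at initialisation.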

C. IMPROVED SNL BLOCK
As shown in Fig. 2, this paper proposes an improved SNL block based on the SNL block. This block is essentially a non-local attention block. In view of the fact that the receptive field of the convolutional network is too local, this block can capture remote dependencies more robustly and flexibly to help the deep network better integrate non-local information.
The input image features of shape [C,W,H] pass through a 1 × 1 Conv on each of two branches and are reshaped into a feature map T of shape [WH,C] and a feature map P of the transposed shape [C,WH]. The two feature maps are multiplied as matrices and, after symmetric normalization, the affinity matrix A is obtained:
$$A = D_{\hat{M}}^{-1/2}\, \hat{M}\, D_{\hat{M}}^{-1/2}, \qquad \hat{M} = \frac{M + M^{\top}}{2},$$
where $M = TP$ and $D_{\hat{M}}$ is a diagonal matrix containing the degree of each vertex of $\hat{M}$. The symmetrized $\hat{M}$ is composed of pairwise similarities between pixels. The affinity matrix A normalized from $\hat{M}$ is represented in Figure 2 by the feature block att of shape [WH,WH]. It is the attention block of the image features and contains the pairwise similarity relationships between image pixels as well as the non-local attention information.
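The construction of the affinity matrix can be sketched as follows. The sketch assumes the symmetrization used by the SNL block, namely averaging M = TP with its transpose, followed by degree normalisation on both sides. The function name is hypothetical, and the code assumes every vertex degree is positive, as is the case for non-negative similarities.

```python
import math


def affinity_matrix(t, p):
    """Symmetrically normalised affinity from T ([WH, C]) and P ([C, WH])."""
    n = len(t)
    # M = T P holds the raw pairwise similarities, shape [WH, WH].
    m = [[sum(t[i][k] * p[k][j] for k in range(len(p))) for j in range(n)]
         for i in range(n)]
    # Symmetrise: M_hat = (M + M^T) / 2 (assumed form, as in the SNL block).
    m_hat = [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]
    # Degree of each vertex is the row sum of M_hat (assumed positive).
    degree = [sum(row) for row in m_hat]
    # A = D^{-1/2} M_hat D^{-1/2}, written element-wise.
    return [[m_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


a = affinity_matrix([[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
for row in a:
    print([round(v, 4) for v in row])
```

Because the result is symmetric, multiplying a feature map by it from the left or from the right applies the same set of pairwise weights, once across rows and once across columns.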
On the other branch, the input image features of shape [C,W,H] pass through a 1 × 1 Conv and are reshaped into a feature map K of shape [WH,C] and a feature map G of the transposed shape [C,WH]. K is left-multiplied by the affinity matrix A, and G is right-multiplied by A. The two results pass through the matrices $W_1$ and $W_2$, implemented as convolutions, are added element-wise, and then pass through the BN layer. The output feature block is then added to the identity of the input feature block to form a residual skip connection, so that the input feature map X passes through the entire module and yields the output feature map Y. Such a module is called a stage, and it can be repeated n times, depending on the data set and task requirements, to achieve the best attention feature extraction.
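One possible reading of this stage is sketched below. Several details are assumptions rather than the authors' stated implementation: the product G A of shape [C,WH] is transposed back to [WH,C] before mixing, `w1` and `w2` are plain channel-mixing matrices standing in for the 1 × 1 convolutions, and batch normalisation is omitted. Only the two products A K and G A follow directly from the shapes given in the text.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def improved_snl_stage(x, a, k, g, w1, w2):
    """Hypothetical single stage: Y = X + (A K) W1 + (G A)^T W2, BN omitted.

    x, k: [WH, C]; g: [C, WH]; a: [WH, WH]; w1, w2: [C, C].
    """
    row_branch = matmul(matmul(a, k), w1)             # A left-multiplies K
    col_branch = matmul(transpose(matmul(g, a)), w2)  # A right-multiplies G (assumed transpose back)
    # Batch normalisation of the summed branches would go here.
    return [[xv + r + c for xv, r, c in zip(x_row, r_row, c_row)]
            for x_row, r_row, c_row in zip(x, row_branch, col_branch)]


x = [[1.0], [2.0]]                # WH = 2 positions, C = 1 channel
identity = [[1.0, 0.0], [0.0, 1.0]]
print(improved_snl_stage(x, identity, x, transpose(x), [[1.0]], [[1.0]]))
```

Repeating the stage n times would amount to feeding the output Y back in as the next input X, with a fresh affinity matrix computed from it.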

A. DATASET
The image dataset for this task consists of microstructure images of the DZ125 directionally solidified superalloy blade material, obtained with a ZEISS SUPRA 55 field emission scanning electron microscope. The microstructure of DZ125 contains features such as carbides, grains, grain boundaries, and dendritic core (DC) and interdendritic (ID) regions. A grain boundary is the interface between two grains, or crystallites, in a polycrystalline material. A dendrite in metallurgy is a characteristic tree-like crystal structure that grows as molten metal solidifies, and the interdendritic region is the alloy region that lies between the dendrite cores. The γ′ phase of the dendrites is an important strengthening phase of superalloys, and the volume of the γ′ phase particles directly affects the strengthening ability of the alloy. Its size and shape are uniform and sensitively reflect differences in the service temperature and stress of blade materials. Therefore, this article focuses on the γ′ phase of the DZ125 microstructure.
Because electron microscope images of this material are difficult to obtain, take a long time to acquire, and are costly, the data sets are all small. The training set contains 200 images of 512×512 pixels. During training, 10% of the images are randomly sampled as the validation set through cross-validation. We selected about ten images with different metallic morphology as the test set. The data set is categorized according to the morphological characteristics exhibited by the metallography and the direction of the cutting plane used for image acquisition. Images of dendritic core (DC) and interdendritic (ID) regions are mixed together to form the training set. On this basis, data augmentation is performed online while the data set is loaded: the CPU generates transformed images through horizontal and vertical flipping, random mirroring, and other augmentation methods, which expands the amount of data and reduces the risk of overfitting. Nevertheless, compared with other image segmentation tasks, this dataset remains fairly small. The proposed method still achieves semantic segmentation on small datasets of superalloy microstructure images. Learning from small datasets is difficult in deep learning, where most tasks require massive data, so the method is of practical significance for small-dataset image segmentation.
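The online augmentation and the 10% validation split can be sketched as follows. This is a minimal illustration under stated assumptions: images and masks are nested lists, each flip is applied with probability 0.5, and the function names and the fixed seed are hypothetical. The key point it shows is that an image and its label mask must receive the same transformation.

```python
import random


def hflip(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror an image (list of rows) top to bottom."""
    return [row[:] for row in image[::-1]]


def augment(image, mask, rng):
    """Apply the same random flips to an image and its label mask."""
    if rng.random() < 0.5:  # flip probability is an assumption
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:
        image, mask = vflip(image), vflip(mask)
    return image, mask


def split_train_val(n_images, val_fraction=0.1, seed=0):
    """Randomly hold out a fraction of image indices for validation."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_images * val_fraction))
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_train_val(200)
print(len(train_idx), len(val_idx))
```

Because the flips are generated when a sample is loaded, no augmented copies need to be stored on disk.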

B. EXPERIMENT
The network structure in this article is used to segment microstructure images of superalloys, so the task is not only to classify each pixel accurately but also to separate the phase structure of the image as a whole. A combination of cross-entropy loss and Dice loss is therefore adopted.
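A combined loss of this kind can be sketched for the two-class case as below. The equal weighting of the two terms, the smoothing constant `eps`, and the function name are assumptions, since the text does not state how the terms are balanced. The cross-entropy term scores each pixel independently, while the Dice term scores the overlap of the predicted and true regions as a whole.

```python
import math


def ce_dice_loss(probs, targets, dice_weight=1.0, eps=1e-7):
    """Binary cross-entropy plus Dice loss.

    probs: flat list of predicted foreground probabilities.
    targets: flat list of 0/1 labels.
    dice_weight and eps are illustrative assumptions.
    """
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]  # avoid log(0)
    ce = -sum(t * math.log(p) + (1 - t) * math.log(1.0 - p)
              for p, t in zip(clipped, targets)) / len(targets)
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    return ce + dice_weight * (1.0 - dice)


good = ce_dice_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])
bad = ce_dice_loss([0.1, 0.9, 0.2, 0.8], [1, 0, 1, 0])
print(good < bad)
```

The Dice term is less sensitive to class imbalance than cross-entropy alone, which is one common reason for pairing the two in segmentation.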
The experiment is based on the PyTorch 1.9.0 open-source deep learning framework, and the training environment is Anaconda 2020.11, CUDA 10.2, and cuDNN 7.6.5. Training is performed on a server with an NVIDIA Tesla V100-SXM2 GPU. The training parameters are listed in Table 1.

C. SEGMENTATION RESULTS AND ANALYSIS
To verify the performance of the proposed network structure on the image segmentation task, it is compared with other semantic segmentation network structures on the same superalloy microstructure image data set. The comparison experiments use the same hardware, the same hyperparameter configuration, and the same number of images. A variety of evaluation indicators are compared. The pixel-based indicators are pixel accuracy and mean accuracy; the indicators based on intra-class overlap are mean intersection over union (MIoU), frequency weighted intersection over union (FWIoU), and the Dice score [37]; and the cluster-based indicators are the Rand index (RI) [38], [39] and the adjusted Rand index (ARI) [40]. Table 2 compares the evaluation indicators of each segmentation method and the time consumed per image prediction, in seconds.
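The pixel-based and overlap-based indicators can all be derived from a confusion matrix, as the following sketch shows using their standard definitions. The function names are hypothetical, and the code assumes every class occurs at least once in the ground truth and prediction so that no denominator is zero. The Dice score and the cluster-based indices are left out for brevity.

```python
def confusion_matrix(pred, truth, n_classes=2):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm


def segmentation_scores(cm):
    """Pixel accuracy, mean accuracy, MIoU and FWIoU from a confusion matrix."""
    n = len(cm)
    total = sum(map(sum, cm))
    tp = [cm[c][c] for c in range(n)]
    truth_count = [sum(cm[c]) for c in range(n)]                      # pixels per true class
    pred_count = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # pixels per predicted class
    iou = [tp[c] / (truth_count[c] + pred_count[c] - tp[c]) for c in range(n)]
    return {
        "pixel_accuracy": sum(tp) / total,
        "mean_accuracy": sum(tp[c] / truth_count[c] for c in range(n)) / n,
        "miou": sum(iou) / n,
        "fwiou": sum(truth_count[c] / total * iou[c] for c in range(n)),
    }


scores = segmentation_scores(confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1]))
print(scores)
```

FWIoU weights each class IoU by how often the class occurs, so it is dominated by the larger phase, whereas MIoU treats both phases equally.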
The segmentation results show that, in microscopic images of superalloys, the grain boundaries are not obvious and can even be difficult to distinguish with the naked eye because of the particular nature of the metallography. Traditional threshold segmentation methods are therefore not ideal. Deep learning segmentation methods such as UNet, ResUnet [41], and the recently popular ViT [42] perform well, but they still produce apparent segmentation errors and unnecessary noise. As shown in Fig. 3, the area in the red box is a non-target area. Non-target areas are metallic phases that do not lie in the cutting plane of the image acquisition but are perpendicular to it. The segmentation result of the proposed method is significantly better than that of the other network structures, and its ability to recognize the depth information of the image is also significantly improved. Compared with the other methods, it separates the target γ′ phase from the background (the non-γ′ phase area) of the superalloy microstructure more accurately, approaches the ideal segmentation, and yields more accurate grain boundaries. The mIoU and pixel accuracy are also significantly improved.
Based on the more accurate segmentation results, characteristic parameters such as volume fraction, γ′ phase thickness, and rafting degree are measured and calculated. The phase volume fraction is defined as the ratio $P / P_T$, where $P$ and $P_T$ are the phase volume and the total volume, respectively. The γ′ phase rafting degree Ω, which measures the degree of deformation of a material under the influence of temperature, stress, and similar factors, is defined in terms of $P_L$, the number of crossings and interruptions of the γ′ raft-shaped structure per unit length in a specified direction. $P_L^{\perp}$ and $P_L^{\parallel}$ are two mutually perpendicular $P_L$ values, one of which is measured along the direction of metallurgical growth. Ω takes values between 0 and 1: Ω = 0 corresponds to an equiaxed structure and Ω = 1 to an ideal raft structure, so the larger Ω is, the more perfect the raft shape. The γ′ phase thickness D is the average thickness of the γ′ phase in pixels at 20k magnification. As shown in Figure 4, the measurement results indicate that the degree of rafting gradually increases with service time. The statistical results of the γ phase volume fraction distribution within the same service time are shown in Figure 5. The measured data and statistical results show that trends such as the increase of rafting and of thickness with time are consistent with the known behavior of the material.
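The quantities that feed these measurements can be sketched from a binary segmentation mask. The sketch rests on two assumptions: the pixel area fraction of the mask stands in for the volume fraction P / P_T, and P_L is counted as the number of label changes along scan lines divided by the scanned length in pixels. The function names are hypothetical, and the sketch stops at the two perpendicular P_L values without computing Ω, since the exact expression for Ω is not reproduced here.

```python
def volume_fraction(mask):
    """Fraction of pixels labelled as the phase (1) in a binary mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def crossings_per_unit_length(mask, along_rows=True):
    """Phase-boundary crossings along scan lines, per pixel of scanned length."""
    lines = mask if along_rows else [list(col) for col in zip(*mask)]
    crossings = sum(sum(1 for a, b in zip(line, line[1:]) if a != b) for line in lines)
    length = sum(len(line) for line in lines)
    return crossings / length


# Toy mask with horizontal rafts: boundaries are crossed only by vertical scan lines.
mask = [[1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
print(volume_fraction(mask))
print(crossings_per_unit_length(mask, along_rows=True))
print(crossings_per_unit_length(mask, along_rows=False))
```

For a perfectly rafted structure such as the toy mask, one of the two directional counts vanishes while the other stays large, which is the contrast that the rafting degree is designed to capture.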

IV. CONCLUSION
In this paper, an improved module is proposed to solve the problem that metallographic features from different cutting planes are difficult to segment accurately in microstructure images. The module is combined with the UNet network to form a new improved SNL-UNet image segmentation structure, which is verified experimentally on microstructure images of the DZ125 alloy. The comparison with other methods shows that this method distinguishes the non-tangential metallographic phase, which is difficult to segment, more accurately. It not only meets the needs of complex image segmentation of superalloys but also further improves the segmentation accuracy and robustness for superalloy microstructure images. In particular, the extraction of depth information is more accurate, which ensures the accuracy of feature extraction and calculation for structures in the same plane, and the method is significantly better under various evaluation indicators. Based on the more accurate segmentation results, microstructure characterization parameters such as volume fraction, rafting degree of the γ′ phase, and thickness of the γ′ phase are measured. The measured data and statistical results are consistent with the evolution of material properties with time and under stress, that is, with phenomena such as the increase in rafting and in thickness with time. However, many problems remain for follow-up research. For example, the data set is specific and restrictive, so the method is not yet general enough. Follow-up work will extend the image analysis to the microstructures of other types of superalloys and, in combination with material properties, further study the relationship between microstructure and performance.