Automatic Recognition of Palaeobios Images Under Microscope Based on Machine Learning

Paleontological research is an essential part of contemporary earth science. However, manual identification of fossils is time-consuming and cumbersome, and conventional algorithms have limited efficiency in processing images of complicated paleontological fossils. In this work, a combined machine learning method, which comprises appropriate image preprocessing, scale-invariant feature transform (SIFT), K-means clustering (K-means), and support vector machine (SVM), is applied to realize automatic recognition of paleontological images under the microscope. It is demonstrated that this combined algorithm has superior performance in morphological feature extraction in the case of complex mineral textures. With this technique, the overall average accuracy of image recognition is 84.5%, which significantly improves the efficiency of sample analysis in the field of paleontology.


I. INTRODUCTION
Through the analysis of palaeobios, researchers can explore the origin, evolution and development of life [1]. Paleontological specimens provide direct information about the ecological environment of the earth in ancient times [2], enabling us to understand the earth that humans live on [3]. By studying the biological remains and fossils preserved in the strata, researchers can determine the ages of strata [4], understand crustal development [5], and infer climate change [6], mineral sedimentation, and oil and gas distribution in geological history [7].
Fossils are acknowledged to be vital in paleontological research, and most of them are permanently buried in the strata as a result of geological movements. With the exploration and development of oil and minerals, drilling and coring technologies [8] have made it possible to observe this large number of buried paleontological objects. The most commonly used technique in paleontological research is to make thin slices from drilled and sampled cores [9], so that tiny palaeobios can show their unique overall characteristics and detailed features under an optical microscope [10]. By observing the paleontological characteristics of different paleontological species or their combinations under the microscope, the category of palaeobios can be determined, and the ages and environments in which they existed can be elucidated. This is an important indicator of the history of biological evolution, and can be used for the exploration of oil, gas and mineral resources [11]. However, the identification and classification of palaeobios under the microscope is a complicated and time-consuming task. Professionals face difficult operations, slow processing and a large workload. Therefore, an advanced and user-friendly processing technique is in demand.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongming Li.
Machine learning has shown remarkable superiority in efficient automatic image recognition [12]-[14]. In this work, a combination of machine learning based image recognition technologies was applied to paleontological microscope images. The experimental results demonstrate that this method offers a novel application prospect for the automatic identification of paleontological fossils.

II. BACKGROUNDS AND REQUIREMENTS
The difficulties in recognizing paleontological images under the microscope are summarized as follows:
(1) Palaeobios are sparsely distributed in the strata, so it is difficult to obtain fossil samples with complete paleontological remains. Therefore, the commonly used convolutional neural network methods, which require a huge number of training images, appear to be inapplicable here.
(2) The microscopic images are complex, and the samples are filled with various minerals inside and outside. In addition, paleontological deformation caused by metasomatism and compaction during rock diagenesis strongly interferes with image recognition. As a result, it is difficult to identify the texture of palaeobios, and choosing a suitable texture classification method becomes necessary [15].
(3) Rock specimens can only display a two-dimensional cross-section of the three-dimensional palaeobios. Different angles and positions selected from the same sample when making rock slices may exhibit very different characteristics in two-dimensional images.
As discussed above, some popular algorithms in computer vision, clustering and classification were investigated, but each shows drawbacks in some respects. The first to discuss are the popular convolutional neural network feature extraction algorithms such as VGG [16] and ResNet [17]. These algorithms usually show good feature extraction ability and robustness against noise and interference. However, training and fine-tuning of the neural network model cannot be conducted in this scenario because of the lack of training samples. Accordingly, the popular neural network methods without fine-tuning exhibit very poor performance in recognizing paleontological images with noisy backgrounds. Meanwhile, traditional feature extraction algorithms such as the Histogram of Oriented Gradients (HOG) [18] have advantages in extracting features of rigid objects [19], but they cannot handle feature extraction under object occlusion and rotation [20]. Hence, they are not suitable for our research, where the rock thin slices often contain incomplete palaeobios with deformed, rotated and occluded views. Regarding the choice of classification algorithm, the k-nearest neighbors (KNN) algorithm [21] has poor classification performance when the samples are unbalanced and scarce, and is therefore not suitable for comprehensive analysis [22]. Decision tree algorithms [23], designed for discrete data processing, classify discrete conceptual data well [24], but perform unsatisfactorily on continuous data flows [25].
Based on the algorithm investigation, an image preprocessing step is first applied to the original microscopic images in order to strengthen the texture characteristics of palaeobios, followed by a combination of the computer vision algorithms SIFT, K-means and SVM to form an effective palaeobios recognition procedure. SIFT is chosen because of advantages such as scale invariance [26], anti-occlusion [27] and stable feature extraction [28] against changes in viewing angle and noise. SVM is chosen for its high accuracy and good performance in classifying small data sets [29]. This algorithm set is expected to achieve good results in palaeobios recognition on complicated and deformed sample images.
In our preliminary attempt, the newly developed combined algorithm displays much more accurate recognition capability than previous methods. It is therefore reasonable to expect that this combination of preprocessing and machine learning algorithms provides unprecedented convenience and accuracy in image analysis.
The main innovation of this research is the exploration of how to combine algorithms and choose parameters in this challenging and practical palaeobios recognition scenario. The effective recognition we achieved does not rely simply on selecting and connecting algorithms. Rather, the use of a suitable method in each step, parameter adjustment matched to the actual scenario, and the essential preprocessing together give a smooth and strong feature extraction and pattern recognition flow. This result shows how engineers could use well-established computer vision and pattern recognition methods in their own specific situations to promote real industrial progress.

III. ALGORITHMS
Based on the analysis of the paleontological image scenes under the microscope and the investigation of different methods, several image processing methods and algorithms are combined to realize the palaeobios recognition, which will be introduced in this part.

A. PREPROCESSING
Due to the complex background of rock and mineral patterns in the microscopic images of paleontology, it is difficult to automatically identify palaeobios with a single computer vision method. Machine learning feature extraction is severely affected by the complex interference in the captured images, so a special image preprocessing procedure was first designed to strengthen and highlight the contour features of the paleontological images. Through a series of grayscale conversion, downsampling, contrast and brightness adjustment, and sharpening steps, the mineral pattern is suppressed as much as possible, and the more obvious paleontological pattern characteristics are retained (FIGURE 1).

B. SCALE-INVARIANT FEATURE TRANSFORM ALGORITHM
Scale-invariant feature transform (SIFT) is an algorithm for detecting and describing local features of images in computer vision. It was proposed by David Lowe in 1999. The SIFT algorithm has outstanding anti-interference performance with respect to factors such as the state of the target, the environment in which the scene is located, and the imaging characteristics of the equipment. The essence of the SIFT algorithm is to find key points (feature points) in different scale spaces and calculate the direction and intensity of the key points. That is to say, the extracted appearance feature points are invariant to scaling [26], translation and rotation of the image. The algorithm also shows a remarkable anti-interference ability [28] against all kinds of noise and changes in light brightness [27].
VOLUME 8, 2020
FIGURE 1. Schematic diagram of the preprocessing process for all types of paleontological pictures.
The image signal of the edges of palaeobios lies in the low-frequency domain, whereas the interference from the surrounding mineral patterns is of high frequency. The cutting of rock thin slices also introduces effects such as target occlusion and light transmission. The SIFT algorithm has a good feature extraction capability against this kind of interference. Therefore, SIFT offers promising generalization performance and robustness for microscopic image recognition of palaeobios.
In this experiment, the SIFT algorithm is applied in four steps: constructing a differential scale space; performing extreme point detection; searching for paleontological feature locations in scale space; and using Gaussian differential functions to identify appropriate key feature points.
Human vision has a concept of scale. Within a certain range, the size of an object can be perceived by the human eye, but a computer cannot perceive the scale of an object. Therefore, SIFT directly constructs a Gaussian pyramid, provides image features at different scales to the computer [30], and then lets the computer recognize the features of the same image at different scales. As the degree of blur of an image gradually increases with scale in the scale space, the computer can simulate how the image of a target forms on the retina as the viewing distance changes from near to far. The larger the scale, the more blurred the image.
Convolving the image with a Gaussian function blurs the image, and Gaussian kernels of different scales yield blurred images of different degrees. The Gaussian scale space of an image is obtained by convolving the image with Gaussian kernels of different scales:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
where $I(x, y)$ is the input image, $*$ denotes convolution, and $G$ is the Gaussian function:
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)$$
Here, σ is the scale space factor, which is the standard deviation of the Gaussian normal distribution and reflects the degree to which the image is blurred. The larger the value, the blurrier the image and the larger the corresponding scale. L(x, y, σ) corresponds to the Gaussian scale space.
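As a rough illustration of the scale space defined above, the following pure-Python sketch blurs a tiny image with Gaussian kernels of two different σ values. The function names, the truncation of the kernel at about 3σ, the replicated borders and the discrete normalisation (used in place of the analytic factor 1/(2πσ²)) are assumptions made for this example; they are not details reported in the paper.

```python
import math


def gaussian_kernel(sigma):
    """Sample G(x, y, sigma) on a square grid and normalise it to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))  # truncation at ~3 sigma is an assumption
    coords = range(-radius, radius + 1)
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in coords]
              for y in coords]
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]


def blur(image, sigma):
    """Approximate L(x, y, sigma) = G(x, y, sigma) * I(x, y) with replicated borders.

    The kernel is symmetric, so this weighted sum equals a convolution.
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    r = min(max(row + dy, 0), height - 1)
                    c = min(max(col + dx, 0), width - 1)
                    acc += kernel[dy + radius][dx + radius] * image[r][c]
            result[row][col] = acc
    return result


def gaussian_scale_space(image, sigmas):
    """Map each sigma to the correspondingly blurred copy of the image."""
    return {sigma: blur(image, sigma) for sigma in sigmas}


# A single bright pixel: its peak drops as sigma grows, i.e. fine detail fades.
impulse = [[0.0] * 15 for _ in range(15)]
impulse[7][7] = 1.0
space = gaussian_scale_space(impulse, [1.0, 2.0])
peaks = {sigma: layer[7][7] for sigma, layer in space.items()}
```

In this toy case the isolated bright pixel plays the role of a fine mineral pattern: the larger σ spreads it over more pixels, which is the behaviour the text relies on when it uses a large scale to favour the coarse fossil outline.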
This method uses the characteristics of the Gaussian scale space to let a computer simulate how the human eye recognizes palaeobios: when a sample is examined under a microscope, the human eye automatically filters out the small rock pattern features and retains the large paleontological forms. In this way, a larger σ value can be used to obtain image features at a larger scale, and the image features at this scale are mostly paleontological features.
Key point screening: For each candidate key point, the location of the paleontological feature is determined by precisely fitting the pixel data, and the key points with higher stability are retained. The gradients of the pixels near each key point are then calculated to obtain the local gradient direction of the paleontological feature, and one or more directions are assigned to the location of the key point.
Describe key points: Calculate the gradient information of the paleontological image in the neighborhood of each obtained key point and merge the gradient features of each area to obtain the feature vector of the feature point.
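The gradient computations mentioned in the two steps above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses central differences and a magnitude-weighted orientation histogram with 36 bins, as in Lowe's description of SIFT, but omits the Gaussian weighting and peak interpolation of the full algorithm. The function names and the neighbourhood radius are hypothetical.

```python
import math


def gradient_at(image, row, col):
    """Central-difference gradient magnitude and orientation (degrees) at a pixel.

    Angles are measured in image coordinates, where rows increase downwards.
    """
    dx = image[row][col + 1] - image[row][col - 1]
    dy = image[row + 1][col] - image[row - 1][col]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def dominant_orientation(image, row, col, radius=2, bins=36):
    """Start angle of the strongest bin in a magnitude-weighted histogram."""
    histogram = [0.0] * bins
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            magnitude, angle = gradient_at(image, r, c)
            histogram[int(angle / (360.0 / bins)) % bins] += magnitude
    peak = max(range(bins), key=lambda index: histogram[index])
    return peak * 360.0 / bins


# Brightness increases from left to right, so the gradient points along +x.
horizontal_ramp = [[c for c in range(7)] for _ in range(7)]
vertical_ramp = [[r for _ in range(7)] for r in range(7)]
```

Because the dominant direction is stored with the key point, the descriptor can later be expressed relative to it, which is what makes the feature insensitive to rotation of the thin-section image.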
The computational complexity of SIFT has already been analyzed in another paper [31]. We briefly introduce the result here. The SIFT algorithm consists of two steps: 1) feature detection, which identifies image regions presenting high gradients; 2) feature descriptor construction, which gathers invariant information about a feature. For step 1, the computational complexity is O(mn), where m and n are the width and height of an image, respectively. For step 2, the computational complexity is O(k), where k is the number of extrema found in the previous stage. Thus, the total complexity is O(mn + k).

C. K-MEANS CLUSTERING ALGORITHM
The K-means algorithm is one of the most common clustering algorithms [32], and has the advantages of fast convergence and excellent clustering effect [33]. Only two steps are required in this algorithm, and they are repeated until the cluster centers no longer change:
1) Calculate the distance between each feature vector (here, the descriptor of an extracted SIFT feature point) and every cluster center, and assign the vector to its nearest center.
2) Recalculate each cluster center based on the distribution of objects in the cluster.
Here, the Euclidean distance (L2-norm) is chosen as the definition of distance in the feature space.
The computational complexity of the K-means clustering algorithm is O(nmkT). Here, n is the data set size, m is the feature dimension of the data object, k is the number of specified clusters, and T is the total number of iterations.
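The two alternating steps and the Euclidean distance described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the centers are initialised with the first k points purely to keep the example deterministic (practical implementations usually use random or k-means++ seeding), and all function names are hypothetical. The nested loops over points, centers and dimensions, repeated for each iteration, mirror the O(nmkT) cost.

```python
def squared_distance(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centres):
    """Step 1: index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: squared_distance(p, centres[c]))
            for p in points]


def update(points, labels, centres):
    """Step 2: move each centre to the mean of its members (empty clusters stay put)."""
    new_centres = []
    for index, old in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == index]
        if members:
            new_centres.append([sum(axis) / len(members) for axis in zip(*members)])
        else:
            new_centres.append(list(old))
    return new_centres


def kmeans(points, k, max_iter=100):
    """Alternate the two steps until the centres stop moving."""
    centres = [list(p) for p in points[:k]]  # deterministic seeding (assumption)
    for _ in range(max_iter):
        new_centres = update(points, assign(points, centres), centres)
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(points, centres)


# Two well-separated toy groups standing in for descriptor vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, labels = kmeans(points, k=2)
```

Even though both initial centers start inside the first group here, the alternation pulls one of them over to the second group within a few iterations.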

D. SUPPORT VECTOR MACHINE ALGORITHM
The original idea of the Support Vector Machine (SVM) was first proposed by two former Soviet mathematicians, Vapnik and Chervonenkis, in 1963. In 1992, Boser, Guyon, and Vapnik, from the University of California, Berkeley and Bell Laboratories, proposed a training algorithm that maximizes the margin between the training data and the separating hyperplane [34], and used the kernel function technique to achieve a nonlinear classifier.
The predecessor of the currently widely accepted and used SVM algorithm was proposed by Cortes and Vapnik of Bell Labs [29]. Building on the earlier work, they introduced soft-margin classification, which allows some training data to fall on the wrong side of the separating hyperplane, thereby reducing the possibility of overfitting and making tasks easier to classify after the kernel function is applied. The kernel function method and the soft margin laid the foundation of the modern SVM algorithm.
In general, the support vector machine can be understood as a binary classifier that divides the feature space into two parts by learning from the training data. The purpose of training is to find the optimal division of the feature space. To this end, the goal is to find a classification hyperplane that separates the points with different labels onto its two sides.
To determine the most suitable hyperplane, intuitively, the classification hyperplane is expected to separate the two types of points as widely as possible; that is, the point closest to the hyperplane should be as far away from it as possible. Therefore, a hyperplane with this ''maximum separation'' should be found. The geometric margin measures the distance between the training set and the hyperplane, and the training goal is to find the hyperplane with the largest geometric margin from the training set. Among all the sample data points, the points closest to the hyperplane, that is, the points that actually affect the position of the hyperplane, are called support vectors.
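The maximum-margin idea described above is commonly written as the following optimization problem. The notation is the standard textbook one and is an assumption here, since the paper does not give the formula: $\mathbf{x}_i$ are the training vectors, $y_i \in \{-1, +1\}$ their labels, $\mathbf{w}$ and $b$ the normal vector and offset of the hyperplane, and $l$ the number of training points.

```latex
\gamma = \min_{i} \frac{y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right)}{\lVert \mathbf{w} \rVert},
\qquad
\max_{\mathbf{w}, b} \gamma
\;\Longleftrightarrow\;
\min_{\mathbf{w}, b} \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{s.t.} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1, \; i = 1, \dots, l .
```

The training points for which the constraint holds with equality are the support vectors; only they determine the position of the hyperplane.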
In this research, the RBF kernel is used in the SVM. Thus, the computational complexities of the SVM training and prediction phases are $O(d_L l^2)$ and $O(d_L N_S)$, respectively. Here, $d_L$ is the dimension of the input vector, $l$ is the number of training sample points, and $N_S$ is the number of support vectors.

IV. EXPERIMENT PROCESS
The SIFT, K-Means algorithm and image preprocessing used in this article are all implemented using Python language and Python's OpenCV library. The SVM algorithm is implemented using Python language and Scikit-learn library. The CNN feature extraction algorithm used in the comparative experiment is provided by the PyTorch framework and is pretrained with ImageNet dataset.
First, the collected images are divided into a training set and a test set. In a single experiment, 80% of the images in each category are randomly selected as the training set, and the remaining 20% are used as the test set. The training and testing operations were carried out multiple times throughout the experiment, and the average accuracy was computed to evaluate the classification performance. The process is briefly introduced below, using one image as an example.
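The repeated per-category 80/20 split described in this section can be sketched as follows. The helper names, the rounding rule for the split point, the seed and the `evaluate` callback (a stand-in for training and testing the full pipeline) are assumptions introduced for illustration; the paper does not specify them. The class sizes are the ones reported in the dataset section.

```python
import random


def stratified_split(images_by_class, train_fraction=0.8, rng=None):
    """Randomly split every category separately into training and test items."""
    if rng is None:
        rng = random.Random()
    train, test = [], []
    for label, images in images_by_class.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))  # rounding rule is an assumption
        train += [(image, label) for image in shuffled[:cut]]
        test += [(image, label) for image in shuffled[cut:]]
    return train, test


def mean_accuracy(images_by_class, evaluate, repeats, seed=0):
    """Average the score returned by evaluate(train, test) over repeated splits."""
    rng = random.Random(seed)
    scores = [evaluate(*stratified_split(images_by_class, rng=rng))
              for _ in range(repeats)]
    return sum(scores) / len(scores)


# Placeholder file names; only the class sizes come from the paper.
dataset = {
    "foraminifera": [f"foraminifera_{i}" for i in range(45)],
    "anthozoa": [f"anthozoa_{i}" for i in range(85)],
    "rock": [f"rock_{i}" for i in range(63)],
}
train, test = stratified_split(dataset, rng=random.Random(0))
```

Splitting each category on its own keeps the class proportions of the small dataset roughly the same in the training and test sets of every repetition.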

A. PREPROCESSING
Because of diagenesis, various compaction effects, and serious metasomatism, the paleontological morphology is altered and the image patterns are complex. The paleontological patterns are greatly affected by mineral lines, and irregular mineral lines are one of the main factors that hinder the recognizability of paleontological features. Preprocessing the image first is therefore a very important step.
The preprocessing consists of grayscale conversion, down-sampling, brightness unification, contrast improvement and sharpening. These processes are intended to suppress interference patterns and strengthen the palaeobios contours in the image. One of the most important steps is the adjustment of brightness and contrast. The specific method is:
1) Convert images to grayscale.
2) Down-sample: If the minimum side length of the image is less than 600 pixels, no processing is performed; if the minimum side length is greater than 600 pixels, the image is scaled so that its minimum side length is 600 pixels.
3) Unify brightness: Move the average brightness of the image to the median.
4) Improve contrast: The brightness of each pixel is scaled with a standard deviation to 63% of the maximum level centered on the mean.
5) Sharpen: The image is convolved with a sharpening kernel.
In the down-sampling process, the value 600 is set based on the image resolutions in the dataset, where many images have a resolution near 600 pixels. Down-sampling images to this level unifies the scale of all images and retains sufficient detail. The down-sampling method is resampling using pixel area relation.
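The down-sampling rule in step 2 can be sketched as follows. The helper names are hypothetical. The use of `cv2.INTER_AREA` is an assumption based on the phrase "resampling using pixel area relation", which matches OpenCV's description of that interpolation flag. The brightness, contrast and sharpening steps are not sketched here, because the text does not fully specify them.

```python
def target_size(width, height, min_side=600):
    """Return the (width, height) after the down-sampling rule of step 2."""
    shortest = min(width, height)
    if shortest <= min_side:
        return width, height  # small images are left untouched
    # Scale both sides by the same factor so the shorter side becomes min_side.
    return round(width * min_side / shortest), round(height * min_side / shortest)


def downsample(gray_image, min_side=600):
    """Resize a grayscale NumPy image with area interpolation (needs OpenCV)."""
    import cv2  # imported lazily so target_size works without OpenCV installed

    height, width = gray_image.shape[:2]
    new_width, new_height = target_size(width, height, min_side)
    if (new_width, new_height) == (width, height):
        return gray_image
    return cv2.resize(gray_image, (new_width, new_height),
                      interpolation=cv2.INTER_AREA)
```

For example, a 1200 x 900 image would be reduced to 800 x 600, whereas a 500 x 400 image would pass through unchanged.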
In the experiment, due to the influence of diagenesis, the texture contrast of paleontological fossils is generally low. This preprocessing is adopted to make the paleontological patterns more prominent among the various complex mineral patterns, and to eliminate the complex mineral patterns as much as possible, leaving the pattern characteristics of palaeobios (Figure 2a).

B. SCALE-INVARIANT FEATURE TRANSFORM, K-MEANS CLUSTERING AND SUPPORT VECTOR MACHINE COMBINED ALGORITHM TRAINING
The following part introduces the machine learning algorithm flow used in our combined method:

1) SCALE-INVARIANT FEATURE TRANSFORM ALGORITHM PROCESS
In the SIFT algorithm, the number of Gaussian pyramid layers is set to 6; the feature contrast threshold is set to 0.01; the Gaussian standard deviation of the initial layer is set to 8 (σ = 8); and the number of feature points is set to 55. The remaining parameters are left at their default values.
First, the preprocessed images are input into the SIFT model of the 8-layer Gaussian pyramid, and 55 feature points are extracted, which are marked on the image by the computer with an icon similar to an ''alarm clock''. It can be seen that the feature points basically correspond to the patterns or outlines of palaeobios (Figure 2b). The size of the ''alarm clock'' icon represents the scale of the feature point, the range of the circle is the area with characteristic meaning, and the pointer direction is the characteristic direction. The Gaussian standard deviation of the initial layer is set to 8, which is 5 times the value (1.6) suggested in the original SIFT work. Since the standard deviation of the Gaussian kernel in SIFT represents the spatial scale at which the image is observed, setting the initial standard deviation to a larger value means that SIFT observes the processed image starting from a large spatial scale, which corresponds to the large-scale textures of the paleontological fossil in the image. Starting from a larger spatial scale reduces the influence of tiny mineral patterns on the computed feature points. When obtaining SIFT features, the first 55 feature points are extracted according to the local contrast score of the feature points in the SIFT algorithm. This ensures that as many paleontological features as possible are extracted while not too many features containing rock patterns are included. With these two important adjusted parameters, the algorithm is more likely to extract the characteristic points of the outline patterns of palaeobios, while excluding the influence of the fine and noisy mineral patterns.
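One way to express the settings above with OpenCV's Python bindings is sketched below. The mapping of the paper's parameters onto the keyword arguments of `cv2.SIFT_create` is an assumption: in particular, "Gaussian pyramid layers" is taken to mean `nOctaveLayers`, using the value 6 stated in the parameter list. The function names are hypothetical, and the OpenCV import is deferred so that the parameter table can be inspected without OpenCV installed.

```python
# Assumed mapping of the reported settings onto cv2.SIFT_create keyword arguments.
SIFT_PARAMS = {
    "nfeatures": 55,            # keep the 55 best-ranked feature points
    "nOctaveLayers": 6,         # "Gaussian pyramid layers" (assumed meaning)
    "contrastThreshold": 0.01,  # feature contrast threshold
    "sigma": 8.0,               # initial Gaussian standard deviation
}


def extract_sift(gray_image, params=SIFT_PARAMS):
    """Detect key points and compute their 128-dimensional descriptors."""
    import cv2  # requires an OpenCV build that provides SIFT_create

    sift = cv2.SIFT_create(**params)
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    return keypoints, descriptors


def draw_keypoints(gray_image, keypoints):
    """Draw each key point as a circle with its scale and orientation."""
    import cv2

    return cv2.drawKeypoints(
        gray_image, keypoints, None,
        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS,
    )
```

The rich key point drawing mode renders a circle whose radius reflects the feature scale plus a line for its orientation, which resembles the ''alarm clock'' icons described in the text.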

2) K-MEANS CLUSTERING ALGORITHM FLOW
The feature vector of each SIFT feature point contains the gradient direction information in its neighborhood. In this research, our purpose is to identify the paleontological category, and the information contained in any specific feature point is not of concern. Therefore, the K-means algorithm was used to cluster all the feature points extracted from the training set images. 70 cluster centers were selected, that is, all SIFT feature points were divided into 70 categories. K-means clustering is performed on the feature vectors obtained by the SIFT algorithm for all training images. Then, for each image, the number of feature points belonging to each cluster is counted to form a K-means histogram (Figure 2c). This histogram describes the distribution of the types and numbers of feature points in each image and forms the feature vector of that image. These feature vectors are used as input feature vectors for SVM classification in the subsequent step.
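The histogram construction described above can be sketched as follows: every descriptor of an image is assigned to its nearest cluster center, and the assignments are counted. The function names and the tiny two-dimensional toy data are assumptions for illustration; in the paper's setting there would be 70 centers and 128-dimensional SIFT descriptors.

```python
def nearest_centre(descriptor, centres):
    """Index of the cluster centre closest to the descriptor (Euclidean distance)."""
    return min(
        range(len(centres)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, centres[k])),
    )


def cluster_histogram(descriptors, centres):
    """Count how many descriptors of one image fall into each cluster."""
    histogram = [0] * len(centres)
    for descriptor in descriptors:
        histogram[nearest_centre(descriptor, centres)] += 1
    return histogram


# Toy example with 3 centres instead of 70 and 2-D instead of 128-D descriptors.
toy_centres = [[0, 0], [10, 0], [0, 10]]
toy_descriptors = [[1, 1], [9, 1], [0, 9], [1, 8], [-1, 0]]
toy_histogram = cluster_histogram(toy_descriptors, toy_centres)
```

The resulting vector always has one entry per cluster, so images yield feature vectors of the same length regardless of where their feature points lie.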

3) SUPPORT VECTOR MACHINE ALGORITHM FLOW
Here are some settings of the SVM parameters:
• The kernel function is set to the Gaussian kernel function (RBF).
• The soft-margin parameter C is set to 1.0.
• The standard deviation of the Gaussian kernel is σ = 1/70 ≈ 0.0143.
During training, the feature vectors obtained from the K-means histograms are used to train the SVM, which maps all the feature vectors into a higher-dimensional space to realize nonlinear classification (Figure 2d, a schematic diagram of SVM nonlinear classification of two-dimensional vectors).
After all the images in the training set have been processed, the whole trained model is output.
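The settings listed above can be sketched as follows. An important assumption is made here: the reported value 1/70 is treated as the coefficient `gamma` in the scikit-learn form of the RBF kernel, exp(-gamma * ||u - v||^2), where `gamma="auto"` corresponds to 1 divided by the number of features (70 histogram bins). The paper itself calls this value a standard deviation, so this reading is an interpretation, and the function names are hypothetical.

```python
import math

N_CLUSTERS = 70           # length of each histogram feature vector
GAMMA = 1.0 / N_CLUSTERS  # assumed to act as scikit-learn's gamma coefficient


def rbf_kernel(u, v, gamma=GAMMA):
    """RBF kernel value exp(-gamma * ||u - v||^2) for two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def make_classifier():
    """Build an SVM with the reported settings (requires scikit-learn)."""
    from sklearn.svm import SVC  # imported lazily; not needed for rbf_kernel

    return SVC(kernel="rbf", C=1.0, gamma=GAMMA)


# Two histograms that differ by one count in a single bin.
first = [0] * N_CLUSTERS
second = [1] + [0] * (N_CLUSTERS - 1)
similarity = rbf_kernel(first, second)
```

Under this reading, two histograms that differ by a single feature point remain highly similar, so the classifier responds to broad differences in the distribution of feature types rather than to individual points.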

C. TEST
The test images are inputted into the trained models. First the feature points of the images are extracted by SIFT, and then K-means with clustering center points obtained during training are used to cluster the SIFT features into a histogram to get the feature vector of each image. Finally, feature vectors are fed to the SVM model to get the predicted classification results and the statistical classification accuracy is calculated.
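The accuracy statistics computed at the end of the test stage can be sketched as follows. The function names and the toy labels are assumptions for illustration; the predicted labels would come from the trained SVM.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified test images within each true category."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified test images over all categories."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy labels only; they are not results from the paper.
truth = ["foraminifera", "foraminifera", "anthozoa", "anthozoa", "anthozoa", "rock"]
guess = ["foraminifera", "anthozoa", "anthozoa", "anthozoa", "anthozoa", "rock"]
by_class = per_class_accuracy(truth, guess)
```

Reporting the per-category values alongside the overall figure makes an unbalanced classifier visible, for example one that assigns most images to the largest category.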

V. RESULTS
The image set we collected and used, as well as the experiment results of the palaeobios recognition and some comparisons are shown below.

A. DATASET
In this paper, palaeobios and rock samples were collected from the School of Earth Science and Technology, Southwest Petroleum University. Sample slices were made from core samples taken during oil and gas exploration and development. All the palaeobios and rock slice images were taken under a polarized microscope.
In this experiment, foraminifera and anthozoa samples with well-preserved texture were collected. 63 abiotic rock images, a number of the same order of magnitude, were selected as the control group (Figure 3). The foraminifera include four main types, namely Palaeofusulina, Reichelina Erk, Nankinella Lee and Geinitzina Spandel, with 45 images in total (Figure 4). The anthozoa contain two types, namely Favosites and Kucichowphyllum, with 85 images in total (Figure 5).

B. OUTPUT RESULTS
Here, the accuracy averaged over 100 repetitions is reported.
The experimental group that fully carried out the above process shows the following results: the average recognition accuracy of foraminifera palaeobios is 77.1%; the average recognition accuracy of anthozoa is 86.4%; the average recognition accuracy of non-paleontology rock images is 87.1%; and the overall average recognition accuracy is 84.5% (Table 1).
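As a quick arithmetic check of the figures above, the sketch below combines the three per-category accuracies in two ways. Weighting by the class sizes from the dataset section reproduces the reported overall value, which suggests (although the paper does not state it) that the overall accuracy is computed over all test images rather than as a plain mean of the three categories. The variable names are hypothetical.

```python
# Class sizes from the dataset section and accuracies (in percent) from the text.
class_sizes = {"foraminifera": 45, "anthozoa": 85, "rock": 63}
class_accuracy = {"foraminifera": 77.1, "anthozoa": 86.4, "rock": 87.1}

total_images = sum(class_sizes.values())

# Average weighted by the number of images in each category.
weighted_average = sum(
    class_accuracy[name] * class_sizes[name] for name in class_sizes
) / total_images

# Plain mean of the three category accuracies, for comparison.
unweighted_average = sum(class_accuracy.values()) / len(class_accuracy)

print(f"weighted: {weighted_average:.1f}%, unweighted: {unweighted_average:.1f}%")
```

The weighted value rounds to 84.5%, whereas the plain mean rounds to 83.5%.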

C. OTHER POPULAR FEATURE EXTRACTION ALGORITHMS
In this experiment, the outputs of the VGG-16, ResNet-18, and ResNet-50 networks pretrained on the ImageNet dataset (with the last fully connected layer removed so that the extracted image features are output directly) were used as features for the SVM classification algorithm in four independent, identical training tests (Table 2).
The results show that all the tested images are classified as anthozoa. The accuracy of anthozoa recognition is 100%, and that of the other categories is 0%. This indicates that popular neural network algorithms are not suitable in this scenario: these typical algorithms have poor recognition capability for small sample sizes with complex patterns. The commonly used convolutional neural network features are thus inapplicable to the recognition of paleontological images and are strongly interfered with by mineral patterns.

D. NECESSITY OF PREPROCESSING
In this paper, all the experimental groups have adopted grayscale and resampled resolution. These two steps are for more uniform and efficient image processing. In order to prove the effect of our preprocessing on the accuracy, the image recognition accuracy test on the three control groups was conducted. The three groups are the group without uniform brightness and contrast, the group without sharpening, and the group without preprocessing. The results are shown in Table 1.
As can be seen from Table 1, the overall average recognition accuracy reaches 84.5% when all preprocessing methods are adopted. Without adjusting the brightness and contrast, that is, with sharpening only, the average accuracy is 81.3%. Although the average accuracy of anthozoa recognition is 94.7%, the accuracy for foraminifera is only 50%, which shows that the classification is unbalanced and not ideal. For the group without sharpening, the average accuracy is affected only slightly and is 83.4%. Moreover, if the model is trained directly without these two preprocessing steps, the overall average accuracy is only 80.2%, and the recognition accuracy for foraminifera is even worse, at only 46.7%. The above results show that appropriately adjusting brightness and contrast significantly improves the recognition of foraminifera and the overall recognition accuracy. The sharpening process also contributes to the recognition performance. This shows that the preprocessing procedure we adopted is necessary and effective.

VI. DISCUSSION
Paleontological fossils in rock cores are of great value to research on geological and paleoenvironmental changes. They also provide important information for the exploration of resources such as oil and minerals. Improving the capability to analyze and process these rare data is an important step towards automation and intelligence in this field. Machine learning and pattern recognition have developed rapidly in the past decades. With the support of massive amounts of data, the breakthrough and popularity of neural network algorithms are obvious. On the other hand, the development of traditional computer vision and pattern recognition methods has been relatively slow in recent years. Not only the geological sciences but also many other disciplines face the problem of processing scarce data, and until now, practical applications of computer technology in these fields have rarely been seen. Beyond the rapidly developing neural network technology, we hope to contribute to comprehensive improvements in the application of computer technology in all walks of life, where it can assist or even replace traditional expert experience and manual analysis. To achieve this goal, there is still a long way to go.
In this research, only the classification of 3 types of images is verified. In future research and practical applications, it is often necessary to classify more types of fossils and deal with more complicated interference. In this case, our algorithm needs to be further improved in classification capabilities and anti-interference capabilities.
In our algorithm flow, the features extracted from images by SIFT are fed directly to K-means for clustering. However, it is unavoidable that some extracted features actually belong to the background rather than the fossil patterns; these interfere with the following steps and may limit the classification ability. In this research, the first 55 features are chosen from the SIFT output, ranked by the local contrast of the feature points [26]. This number balances the features belonging to fossil patterns against those belonging to the background to ensure high classification accuracy, at the cost of dropping a number of informative features that are ranked low. In modern data mining, feature selection plays an essential role in trimming meaningless and redundant features to maintain the quality of the feature set. Inserting an efficient feature selection method such as [35] after the SIFT feature extraction may improve the robustness of our fossil classification flow and should be considered and examined in the future [36].
In addition, the positional relationship of the feature points is not used: K-means does not take into account the spatial relationship between feature points, which also carries information about the fossil pattern. Thus, in addition to clustering methods such as K-means, algorithms that process the location information of feature points should also be considered, and both the feature clustering results and the structured feature location information should be combined to conduct the classification.

VII. CONCLUSION
In this research, a novel algorithm that performs well in recognizing microscopic palaeobios images with complicated deformation and interference is developed. By using preprocessing methods to enhance the paleontological characteristics in the images, the combined algorithm based on SIFT, K-means, and SVM shows promising performance and practicability for the recognition of thin-slice paleontological images. The overall accuracy of the proposed method reaches 84.5%. In contrast, conventional convolutional neural network-based methods failed to achieve any meaningful recognition in this scenario. This method does not need manual analysis of images; instead, the computer automatically obtains a description of the features of paleontological images through learning and classifies them automatically. Therefore, labor costs and learning costs are significantly reduced, experts' knowledge becomes transferable, and the speed of paleontological identification is greatly improved. This shows that artificial intelligence-related algorithms have broadened the spectrum of ideas and directions for the future development of paleontology.