Automatic Deep Vector Learning Model Applied for Oil-Well-Testing Feature Mining, Purification and Classification

Well-testing stage analysis is an important basis for oilfield operation decision-making and reservoir management. However, due to the variability and nonlinearity of the downhole data caused by complex exploration activities and differences in petroleum type and geological conditions, the classical methods are ineffective in feature extraction, learning network construction, and classifier optimization. In this work, we propose a new well-testing stage classification method based on a deep vector learning model (DVLM). The novelty of this study lies in the combination of multi-feature extraction, deep learning, and feature vector mapping. The proposed method can overcome the problems of poor feature representation ability and poor classification model generalization ability in existing machine learning methods, which are mainly caused by non-optimized training network structures and unreasonable classifier design. Firstly, the initial features are obtained by four classical methods. Then, a five-layer deep belief network embedded with the maximum information coefficient method is implemented for further feature extraction and purification. Finally, the optimized learning vector quantization classifier outputs the predicted tags. For model training and testing, 4004 data streams collected from 572 field wells are used. By considering the classification error and accuracy metrics, the numbers of neurons in the deep learning network and the classifier are tuned, and an optimal and stable framework is obtained. Comparative experiments with several classical integration models show that the proposed model achieves the highest classification accuracy of 98.065% with the fewest features (nine). The results demonstrate that the proposed model has excellent performance in improving classification accuracy and compressing features.
Moreover, the proposed model has important practical significance for guiding the automatic analysis and processing of oil and gas data.


I. INTRODUCTION
During oil well exploitation, the accurate classification of the well-testing stage plays an important role in the real-time early warning of the operating platform, the rational revision of the production flow, and the scientific management of the
reservoirs [1]. Off-line well-testing stage classification can improve the well-testing operation process, provide qualitative explanations of data without an assigned exploration category and working stage, and complete the accurate evaluation of complex oil reservoirs. Online well-testing stage classification can determine whether an operation stage is finished or terminated in advance, reduce production accidents, and provide guidance for real-time surface operations. However, the high nonlinearity of geological parameters and the high complexity of exploration processes, caused by the continuous development of oil exploration in depth and breadth, make it difficult for the existing classical mathematical methods to extract a large amount of effective information from the exploration data.
Moreover, the testing data contain many interference signals, singular signals, and information about ground operations and staff activities, due to temporary operation flow adjustments or the occurrence of uncertain events at strata and borehole locations [2]. Even when the data come from the same operation stage but different testing wells, there are large differences in length and curve shape. These uncertainties exacerbate the difficulty of analyzing and automatically processing the massive data [3], pose significant challenges to data analysis, and directly lead to an inaccurate matching relationship between testing data and operation stages. Therefore, the problem of ''rich sample data but poor effective information'' often occurs. At present, the unsatisfactory data classification results are mainly attributed to the poor characterizing information ability of data features and the low generalization ability of classification models. In order to solve this problem, it is urgent to mine a small number of highly representative features of the well-testing data that would facilitate the effective classification of non-unique pressure signals.
As for improving the characterizing information ability of data features, the time-domain approach has long been popular because of its simplicity and ease of implementation. Numerous methods have been proposed for feature extraction and classification in the field of petroleum operation decision-making and system diagnosis [4], such as coastal oil tank detection [5], oil spill detection [6], pipeline leakage detection [7], tool wear prediction [8], and working status inspection [9]. Li and Misra [10] used nuclear magnetic resonance to investigate the internal structure of geomaterials filled with fluid. Firoozabadi et al. [11] built a well-prediction model based on a pressure drop. Meckel et al. [12] conducted wellbore leakage model optimization by analyzing reservoir pressure transients. Ghaffarian et al. [13] illustrated that transient pressure data could be used to identify the correct reservoir models with high accuracy. It should be noted that the pressure derivative has always been an important time-domain analysis method used in modern well test analysis to obtain the properties and conditions of the flow system [14]. Miao and Di [15] took the pressure derivative as a feature to track and evaluate five kinds of fracturing methods commonly used in oil production plants. However, the pressure derivative represents the rate of pressure change and can only reflect local features rather than global features. Besides, some studies have also tried nonlinear methods. Ahmadi et al. [16] proposed a well-testing identification model for pressure transient test data using the concept of shapelets. Li and Misra [10] extracted features with a three-level wavelet packet transform (WPT) from the resultant force converted from thrust and torque. Zheng et al. [17] analyzed real-time mud signals and designed a feature extraction method based on the intrinsic mode functions obtained through ensemble empirical mode decomposition (EMD).
Yang and Kong [18] used the combination of the wavelet transform and EMD to detect the mutation point of the positive pressure wave signal. Although the above-mentioned studies have achieved a certain interpretation effect, their feature extraction algorithms are single and shallow, and the simple features they extract cannot characterize the trend of the data to be processed. Fortunately, the multi-feature extraction method can resolve the above problems and provide guidance for the follow-up study [19].
We notice that intelligent technologies have been widely used in many fields. As one of these methods, deep learning can not only extract the effective features of data but also be used for prediction and classification [20]. It shows good performance in text recognition [21], manufacturing diagnosis [22], fault detection [23], electrocardiogram (ECG) detection [24], and speech recognition [25]. Hammad et al. [26] employed pre-trained deep CNN models and selected valuable layers to obtain a good representation of ECG and fingerprint data. Kim et al. [27] created a particle swarm optimization-deep belief network (PSO-DBN) based classifier for rare class prediction. Zou et al. [28] constructed a feature-selection model to improve remote sensing scene classification performance, in which a DBN was used for mass-data processing and feature reconstruction. The most recent work using deep learning in well-testing applications was presented by Feng et al. [19], who created an integrated model based on a deep belief network for feature extraction. That model was almost the first to introduce Boltzmann machine based deep learning methodology into well-testing data analysis. Although the paper provides the optimal structure of the DBN network suitable for the division of oil testing data, the step size for the number of neurons in the highest DBN layer was 10, which is rather coarse. At the same time, the support vector machine (SVM) classifier used in that model requires normalized inputs, which imposes stricter format requirements on the input features.
The above-mentioned studies illustrate the advantages of the DBN method in processing data and enhancing the classification ability of models under different application backgrounds, but these methods still have many drawbacks. First, they lack consideration of the impact of the DBN structure on data analysis; a complex learning network may require too many training sessions, resulting in low learning efficiency and slow convergence. Second, there is no general theoretical guidance on how to determine the number of hidden layers and the number of nodes in each hidden layer of the learning network; it mainly draws on practical experience. Third, restricted Boltzmann machines (RBM) and the back propagation (BP) network, as the important components of the DBN, are still prone to falling into local optima and cannot guarantee globally optimal solutions. Fourth, feature redundancy still exists. Hence, further improvements in feature purification and model optimization are required.
As for improving the generalization ability of classification models, scholars have been studying this problem using classic methods such as Naive Bayes (NB) [29], RUSBoost [30], the probabilistic neural network (PNN) [31], and k-nearest-neighbor (KNN) [32], but so far there is no universal solution. Among the latest methods, Shahriari et al. [33] illustrated the performance of a deep neural network (DNN) for inverting logging-while-drilling measurements acquired in high-angle wells. In our previous research [19], the SVM with PSO-optimized parameters was used only for data classification. Learning vector quantization (LVQ) [34] is a classic vector quantization framework [35] with both feature learning and classification functions. It has been successfully applied to many classification fields, such as financial management [36], industrial production [37], system safety [38], and wearable physiological parameter detection [39]. It can be used to build a codebook from various code vectors and accepts a wider range of input features, which makes it possible to improve the model generalization ability. Furthermore, an LVQ network structure suitable for well-testing analysis has rarely been studied. Therefore, it is of great practical significance to introduce LVQ into oil and gas exploration alongside the classical mathematical and intelligent methods. The only drawback of LVQ is that its optimal parameters need to be set according to an analysis of the processed data.
The main tasks of this article are extracting the effective features of well-testing data and optimizing the structure of the integration model. When classifying data obtained under complicated working conditions, the classification accuracy tends to be poor when using a single feature extraction method or the above-described classification methods [40]. Therefore, in order to improve the stage classification performance (that is, the characterizing information ability of data features and the generalization ability of classification models), we propose the deep vector learning model (DVLM). Firstly, four classic feature extraction methods are used to extract the initial multiple features. Next, the DBN is employed to learn, mine, and reconstruct the features in depth. Then, the maximum information coefficient (MIC) [41] is applied to remove data redundancy. Finally, the LVQ is used to classify the features. The main contributions of this article are summarized as follows: 1. We are the first to integrate multi-feature extraction, deep learning, and feature vector mapping and use their integration for well-testing working stage classification. 2. We analyze the priority of the number of neurons in the highest level of the DBN from 20 to 60 with a step size of 1.
Although this is a follow-up study, it supplements the priority rule missed in our previous study, where the step size was too large (step size = 10).
3. We are the first to apply the cross-validation method, based on classification error and accuracy, to well-testing data to optimize the structure and parameters of the LVQ classifier. 4. We are the first to use the MIC method for feature purification and redundancy elimination based on the LVQ classification rate.

II. WELL-TESTING DATA ANALYSIS
Well testing is a seepage-mechanics-based method to acquire downhole geological information and production capacity by analyzing the dynamic changes of pressure with time in the production process of oil and gas wells. As the theoretical test curve in Fig. 1(a) shows, the complete well-testing operation process consists of five stages in total, and the operation curve of each stage is a data stream composed of the pressure values of the corresponding time period. In this article, we do not directly classify the geological parameters, but classify the data streams composed of the geological parameters.
In order to keep the originality of the well-testing data, the preprocessing of the data streams is only normalization. However, the signal processing circuit of the electronic pressure gauge has rectified and filtered the signal of the sensor when acquiring the pressure data.
Since well opening and well closing are the two most important stages to be identified, in most field operations these two stages are performed at least twice. After merging the same stages, the whole process consists of a total of five stages. A tag value is set for each stage and used as the standard for judgment. The surface staff identify the stage name based on the predicted tag. The five working stages and the corresponding stage tags are shown in Table 1. Conventional well-testing work, such as curve analysis and operation stage classification, is mostly done manually by experts with production experience and contains a lot of subjectivity. Take the data in the lowering-the-oil-string stage as an example. As shown in Fig. 1(b), if denoising is required, the pressure noise points that deviate significantly from the ''average'' trajectory of the other testing points can be quickly identified and filtered manually due to their obvious fluctuation characteristics. However, manual methods cannot deal with cases where the point fluctuation is small and the visual identification of noise points is difficult. The noise signal may be misinterpreted as a normal signal, which reflects the imperceptibility of data features. From another perspective, if stage classification is required, the noise should not simply be filtered out as interference; it can be used as a judgment feature of the lowering-the-oil-string stage, which reflects the flexibility of using the extracted features to solve practical problems. Therefore, to obtain good classification efficiency, it is particularly important to extract the latent features that can represent the data characteristics of each operation stage.
Currently, the difficulty of extracting the features with strong characterizing information ability is mainly attributed to the stochasticity of data and the differences in algorithms.
During oil and gas exploitation, the procedure of well testing is scheduled, planned, and repeatable, which ensures that the operation stage sequences are not random. However, the interference of external and artificial factors on well-testing data is inevitable. As shown in Fig. 1(b), pressure and time do not have a simple linear relationship. The environmental factors that affect the relationship between the two include the geological conditions of the downhole operation area, the surface operation environment, and the measured downhole depth. These factors lead to different underground energy and pressure recovery capacities among test samples and uncertain running times of each stage, which make the pressure curves of each operation stage differ greatly in amplitude, shape, and time width.
Besides, during the well-testing signal acquisition process, various interference signals are inevitably produced due to mechanical vibration, faulty or delayed operation, and negligent operation by oilfield workers. These noises and uncertain interferences are mixed with the real downhole geological pressure signals, which leads to distortion of the pressure signals and ambiguity of the pressure waveforms, and thus to inaccurate identification of signal feature points and indistinct distinguishing features between the data of each stage. These noises show a rapid rise and fall in the time domain.
As shown in Fig. 1(b), the position, duration, and occurrence of the peaks and inflection points on the timeline vary significantly across well-testing sample curves. The density of a peak group can be represented in the frequency domain. There exist some similar but difficult-to-extract features across samples and stages, such as continuous peaks and stepped pressure rises/drops mixed with various noises and interferences. Among them, the continuous peaks are mostly generated by screen plugging, packer loss of seal, pressure-relieving operations during well closing, and mechanical vibrations. The stepped rise often occurs when equipment fails, when the sampling frequency is too low, and during lowering of the oil string. The stepped drop tends to form in the variable-rate drawdown test and the pulling-up-the-oil-string stage. The step width and the curve rise/fall slope well reflect the time-frequency characteristics of these stepped rises/falls.
The above similarities and uncertainties show that, due to the influence of the workflow and the geological environment, the data features are difficult to calculate by using simple empirical formulas or mathematical statistical methods. However, the good news is that the time domain methods, frequency domain methods, and time-frequency domain methods can all be used to abstractly characterize the data.
However, the above difficulty in extracting the data features cannot be solved by simple time-frequency feature extraction. On the one hand, the existing classical feature extraction methods cannot cover all the feature types. On the other hand, extracting only one specific feature type using a single method is not enough to fully characterize the information of the data. Therefore, to distinguish changes in production operations such as noise interference and formation blockage, it is necessary to conduct an overall and regional treatment of the well-testing data. The multi-feature integration method should be utilized to represent all possible types of well-testing features. Meanwhile, the following new problems arise and need to be further solved: 1. Whether the initial features obtained by the multi-feature extraction method are the features that best characterize the data stream. 2. Whether the feature representation ability needs to be improved by re-learning the initial features. 3. More features also mean more redundancy; whether they need to be purified. 4. Whether there exists a mapping method that imposes few formatting requirements on the input vector and easily converges various complex and abstract input features to the categories they belong to. Fortunately, we find solutions to all of these problems and believe it is necessary to use time- and/or frequency-domain methods and the deep learning method to characterize the inherent laws of the reservoir parameters, and to use the LVQ to classify the heterogeneous characteristics of the well-testing data.

III. METHODOLOGY
To overcome the challenges brought by the above problems, we propose a deep vector learning model, whose framework is shown in Fig. 2. We focus on using the classical methods to classify the complex and nonlinear downhole geological parameters, which are mainly reflected in two parts: mining the data characteristics and improving the generalization ability of the classifier.
In this section, we follow the process of model formulation to explain the working principles of the different methods used in the DVLM. The data processing steps can be summarized as follows: Step 1: Perform multi-feature extraction (MFE) using four classical time-frequency methods and obtain five types of initial features.
Step 2: Reconstruct and purify the initial features by using the deep belief network-maximum information coefficient integration methods.
Step 3: Apply the LVQ classification accuracy to the analysis of the effect of various influence factors on model structure optimization and feature classification, so as to ensure the optimal performance of the model.

A. THE MFE METHOD
Practice confirms that, due to the nonlinear time-varying characteristics of the downhole signal, applying pure time-domain or frequency-domain features to data classification usually encounters the problem of poor feature characterization ability. Therefore, learning from the MFE method [17], we integrate wavelet packet decomposition-approximate entropy (WPD-AE) [42], empirical mode decomposition-approximate entropy (EMD-AE) [43], fast Fourier transformation (FFT), and linear regression (LR) to process and distinguish the similarities and uncertainties between different well-testing data.
Define the n_0-th group data s(n_0, N, x, y) with data length N, where y is the amplitude of the data and the index number x ∈ [0, N − 1]. The energy of the well-testing signal is mostly distributed in the low-frequency band. Therefore, we use WPD to perform a three-layer decomposition of the well-testing data and obtain the reconstruction coefficients {d_il^N(n_0)}, where C_L is the WPD decomposition level. Considering that Symlets have good local characteristics in the frequency domain and relatively high accuracy in signal phase reconstruction, we select Symlets as the WPD wavelet basis.
EMD is suitable for non-linear and non-stationary data analysis; the decomposition contains multiple IMF components {c_nl^N(n_0)} and a residual r(n_0), nl ∈ Z. To make the eigenmode functions have good linearity and stability, CS is set to 0.2 in the EMD. To avoid abnormal processing caused by too-large data or unequal data lengths between two samples, the data length of each sample is resampled to 1000.
Further, we use AE [33] to indirectly describe the complex changes inside {d_il^N(n_0)} and {c_nl^N(n_0)}. Thus, the WPD-AE α(n_0) = ApEn_{m,r}^N(n_0, d_il) and the EMD-AE β(n_0) = ApEn_{m,r}^N(n_0, c_nl) are obtained. For a data sequence of specified length, the magnitude of the AE depends on the dimension m and the tolerance threshold r. To make AE less dependent on N, we set m = 2 and r = 0.2 SD, where SD denotes the standard deviation of s(n_0, N).
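As an illustration of the AE computation with m = 2 and r = 0.2·SD, the following minimal NumPy sketch can be used (the function name and the vectorized Chebyshev-distance computation are our own, not taken from the paper):

```python
import numpy as np

def approximate_entropy(x, m=2, r_factor=0.2):
    """ApEn of a 1-D sequence; r = r_factor * std(x), as set in the text."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)

    def phi(m):
        # all overlapping templates of length m
        templates = np.array([x[i:i + m] for i in range(N - m + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # fraction of templates within tolerance r (self-matches included)
        C = np.mean(dist <= r, axis=1)
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```

A regular sequence yields a low ApEn value, while an irregular one yields a larger value, which is why AE can summarize the internal complexity of the WPD and EMD components.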
FFT is used to extract the peak distribution and sampling frequency of the data. The characteristics X (k 0 , n 0 ) are obtained by calculating the top k 0 FFT coefficients.
Furthermore, an oil well interpretation system typically uses pressure derivatives for reservoir evaluation and geological research, but seldom for well-testing stage classification. Faced with stage data that require global analysis to complete classification, the pressure derivative, as a local transient-analysis method, is not functional. Production experience shows that gradient features can quickly and accurately predict the motion trend of the detected data; their working principle is similar to that of pressure derivatives. Thus, we use LR to extract the regression parameters, including the three-interval regression (TIR) reg_3(n_0) and the nd-point interval regression Reg_nd(n_0) = [ra_kd(n_0)], where each ra is the regression slope of one interval, x̄ and ȳ represent the mean index number and the mean magnitude of the id-th interval data, respectively, and kd = 0, 1, ..., (N/nd) − 1. Then, the final multi-feature vector can be expressed as: attr(n_0) = [α(n_0), β(n_0), X(k_0, n_0), reg_3(n_0), Reg_nd(n_0)] (2). We set k_0 = 20 and nd = 10 and 100. As shown in Fig. 3, the MFE method extracts a vector containing 38 feature elements from the n_0-th sample. For one sample, the process of acquiring MFE features is shown in the appendix at the end of the paper (see Fig. 13).
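The FFT and interval-regression parts of the multi-feature vector can be sketched roughly as follows (a hedged NumPy illustration; the function names, the use of np.polyfit, and the choice of magnitude-sorted rfft coefficients are our own assumptions, not the paper's exact implementation):

```python
import numpy as np

def fft_features(y, k0=20):
    """Magnitudes of the top-k0 FFT coefficients (peak distribution in frequency domain)."""
    spectrum = np.abs(np.fft.rfft(y))
    return np.sort(spectrum)[::-1][:k0]

def interval_slopes(y, nd):
    """Least-squares slope of each length-nd interval, in the spirit of Reg_nd(n0)."""
    N = len(y)
    slopes = []
    for kd in range(N // nd):             # kd = 0, 1, ..., (N/nd) - 1
        seg = y[kd * nd:(kd + 1) * nd]
        slope, _ = np.polyfit(np.arange(nd), seg, 1)  # first-order linear regression
        slopes.append(slope)
    return np.array(slopes)
```

For a resampled stream of length N = 1000 with k_0 = 20 and nd = 10 and 100, these calls would contribute the frequency-domain and gradient-trend elements of the 38-element feature vector.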

B. DBN-MIC FEATURE RECONSTRUCTION
The combination of DBN and MIC is to realize the non-linear learning of abstract data and obtain features with high characterizing information ability.

1) DBN TRAINING
The DBN network is made up of multilayer restricted Boltzmann machines (RBM) and a BP network. Denote θ = {a, b, ω} as the parameters of the RBM model, where a and b are bias vectors and ω is the weight matrix. The learning process can be summarized in the following three steps: Step 1: Normalize the amplitude y of s(n_0, N, x, y) to the range [0, 1]. Set the learning rate to 0.1 and the training period to 50. To obtain the optimal θ, the CD method [44] is used and the training samples are divided into 80 parts.
Step 2: Input the MFE features into the DBN network and train each layer of the RBM network for 100 iterations in an unsupervised manner.
Step 3: Add the fine-tuning BP network after DBN and optimize the DBN network weights 100 times to decrease the difference between the highest layer output and the tagged data.
In Step 3, we use the gradient descent based BP optimization network to obtain the global parameters θ_DBN. The Polack-Ribiere flavour of conjugate gradients [45] and the Wolfe-Powell stopping criteria [46] are used for computing search directions and controlling the range of the optimal function value.
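The layer-wise RBM training in Steps 1-2 can be illustrated with a single contrastive-divergence (CD-1) update for one Bernoulli RBM layer (a minimal NumPy sketch under our own naming; the actual model trains each layer for 100 iterations over 80 sample batches):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, a, b, lr=0.1):
    """One CD-1 update for a Bernoulli RBM.
    v0: batch of visible vectors (batch, n_vis); a, b: visible/hidden biases."""
    # positive phase: hidden probabilities and a sampled hidden state
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one Gibbs step back to the visible layer and up again
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # gradient estimates (positive minus negative statistics) and updates
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
    a += lr * np.mean(v0 - pv1, axis=0)
    b += lr * np.mean(ph0 - ph1, axis=0)
    return W, a, b
```

Stacking such layers and greedily training each one on the activations of the layer below is the standard DBN pre-training scheme that the BP fine-tuning of Step 3 then refines.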

2) FEATURE EXTRACTION AND PURIFICATION
Construct a new network structure with the same weights θ_DBN and complete the input feature mapping once again; thus, the output features can be expressed as y_h-level = f(ψ; θ_DBN), where ψ is the input feature and y_h-level is the output feature.
Considering that feature redundancy still exists in y_h-level, we prioritize and purify y_h-level based on the MIC method. The main steps are as follows.
Step 1: Calculate the MIC value M(y_h-level, I_h-level) using the Minepy package, where I_h-level is the element index in y_h-level. Step 2: Sort the indices by MIC value from large to small to obtain the priority ranking.
Step 3: Calculate the LVQ classification accuracy of the above rank, and denote the top τ features that can obtain the highest accuracy as y τ MIC (n 0 ) and name it as purified feature.
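Since the Minepy package may not be available everywhere, the ranking idea of Steps 1-2 can be approximated with a simple histogram-based mutual-information score (a stand-in sketch; binned_mi is our own substitute for the MIC statistic and the function names are hypothetical):

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Histogram-based mutual information, a rough stand-in for the MIC score."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def rank_features(features, tags):
    """Column indices of `features` sorted by relevance to `tags`, largest first."""
    scores = [binned_mi(features[:, j], tags) for j in range(features.shape[1])]
    return np.argsort(scores)[::-1]
```

The purified feature set then simply keeps the top-τ columns of this ranking, with τ chosen by the downstream LVQ accuracy as in Step 3.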

C. CROSS-VALIDATION-OPTIMIZED LVQ CLASSIFIER
We use LVQ1 to design the classifier. The LVQ has no normalization requirements for input features and needs only a few parameters to be optimized. As shown in Fig. 4, the LVQ network is composed of an input layer (IL), a competition layer (CL), and a linear output layer (LOL). By finding the CL neuron closest to the input vector, the LOL neuron connected with that CL neuron is determined. Define the weight between the j-th neuron of the IL and the i-th neuron of the CL as w_ij, and the number of neurons in the CL as N_C. The steps of LVQ1 can be summarized as follows.
Step 1: Initialize the w ij and learning rates η(η > 0) between IL and CL.
Step 2: Send the input vector X = (x_1, x_2, ..., x_R)^T to the IL and calculate the distance d_i between each CL neuron and X, where d_i = sqrt(Σ_{j=1}^{R} (x_j − w_ij)^2). Step 3: Define the CL neuron with the smallest d_i from X as Min_c, and mark the class tag of the LOL neuron connected to Min_c as C_i. Step 4: Define the class tag of X as C_X; the w_ij are updated as follows: if C_i = C_X, the winning weight vector is moved toward X, w_i ← w_i + η(X − w_i); otherwise, it is moved away, w_i ← w_i − η(X − w_i). Besides, the cross-validation (CV) method [47] plays an important role in avoiding over-fitting and improving the generalization ability [48]. To optimize the LVQ structure, we used 10-fold CV to partition the datasets into complementary subsets, ensuring that each example has the same chance of appearing in the training and testing sets.
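The LVQ1 steps above can be sketched as a minimal training loop (a NumPy illustration; initializing prototypes near the class means and the function names are our own simplifications):

```python
import numpy as np

def train_lvq1(X, tags, n_proto_per_class=2, eta=0.05, epochs=30, seed=0):
    """Minimal LVQ1: move the winning prototype toward same-class inputs,
    away from different-class inputs."""
    rng = np.random.default_rng(seed)
    classes = np.unique(tags)
    protos, proto_tags = [], []
    for c in classes:
        Xc = X[tags == c]
        for _ in range(n_proto_per_class):
            # initialize prototypes near the class mean
            protos.append(Xc.mean(axis=0) + 0.01 * rng.standard_normal(X.shape[1]))
            proto_tags.append(c)
    protos, proto_tags = np.array(protos), np.array(proto_tags)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = np.linalg.norm(protos - X[i], axis=1)   # distances d_i
            w = np.argmin(d)                            # winning CL neuron Min_c
            sign = 1.0 if proto_tags[w] == tags[i] else -1.0
            protos[w] += sign * eta * (X[i] - protos[w])  # LVQ1 update rule
    return protos, proto_tags

def predict_lvq(protos, proto_tags, X):
    """Assign each input the tag of its nearest prototype."""
    d = np.linalg.norm(protos[None, :, :] - X[:, None, :], axis=2)
    return proto_tags[np.argmin(d, axis=1)]
```

Here n_proto_per_class plays the role of R_CL per class; tuning it by cross-validation is exactly the optimization described in the next section.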

D. DATA SOURCE
The dataset was collected from the well-testing platform of Huabei Oilfield, China. The reservoirs in this oilfield feature special lithology, structural fractures, and strong edge water. From 2009 to 2018, based on the field data collected from 572 wells, a total of 4004 samples of well-testing data and the corresponding operation stages were obtained. Each oil well contained samples belonging to all five operation stages, and each sample was composed of the data from one working stage. The actual operation stage of each sample is specified by matching the theoretical test curve in Fig. 1(a). All pressure data are acquired by the downhole pressure storage gauge. Considering that the data obtained from well-testing sensors at different depths have different data sizes N and pressure amplitudes y, we pre-normalized N of each sample to [0, 1000] and y to [0, 100]. This article does not explore the impact of the DBN hidden layer setting on the system, which allows the verification set and testing set to have the same network training effect. We therefore assign 2402 samples (60%) to the training set and 1602 to the verification/testing set [49], in which the proportion of the number of samples in each stage is 1:1:2:2:1. Fig. 5 shows the workflow from data acquisition to data playback and the self-made equipment used for obtaining the well-testing data. First, the assembly is lowered to the depth to be measured. Then, the surface staff perform operations according to the established well-testing procedure. Finally, the staff take the gauge out of the well, download the memory data by a wired (blue dotted line) or wireless (red dotted line) method, and classify the data.
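The length and amplitude pre-normalization described above might be implemented as follows (a sketch; linear-interpolation resampling via np.interp and the function name are our assumptions about how the standardization was done):

```python
import numpy as np

def normalize_sample(y, target_len=1000, amp_max=100.0):
    """Resample a pressure stream to a fixed length and min-max scale its
    amplitude to [0, amp_max]."""
    y = np.asarray(y, dtype=float)
    # map the original index axis onto a common [0, 1] axis and resample
    x_old = np.linspace(0.0, 1.0, len(y))
    x_new = np.linspace(0.0, 1.0, target_len)
    y_res = np.interp(x_new, x_old, y)
    # min-max scale the amplitude
    span = y_res.max() - y_res.min()
    if span == 0:
        return np.zeros(target_len)
    return amp_max * (y_res - y_res.min()) / span
```

After this step every sample, regardless of gauge depth or stage duration, occupies the same index range [0, 1000] and pressure range [0, 100].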

E. PROPOSED MODEL
This article aims at automatically and accurately outputting the tags of data from an unknown classification stage by training on oilfield datasets with class tags. Generally, the combined classifications of an ensemble are more accurate than those of the individual methods making it up. Therefore, we established the topology of the stage classification model based on MFE and DBN and put forward the DVLM, as shown in Fig. 6. The detailed work we have done is as follows: 1. Combine the classical feature extraction algorithms and the deep learning network into a deep learning system. We advocate extracting as much feature information as possible through the analysis of well-testing data, which contains the essential information representing the whole operation process.
2. The optimization of DBN network structure.
Aiming at the problems of poor learning ability of classical three-layer network and difficulty in determining the number of neurons in the middle layer, we created a self-configuring DBN network and determined the optimal network structure suitable for well-testing data processing.
3. The optimization of classifier parameters. The CV algorithm was introduced to avoid local minimum values while seeking the optimal number of neurons in the CL of the LVQ classifier, in order to obtain globally optimal parameters. 4. Large collection of well-testing samples. Acquisition of samples requires a long operation time due to the fluidity of oil and the slowness of pressure recovery, which brings great challenges to models that rely on massive learning datasets to complete data classification. We collected samples from single wells to well groups and used the data obtained over the past 10 years to enhance the learning and classification capabilities of the model.
The DBN network in the proposed model is a multi-layer forward network with a BP feedback mechanism and has a five-layer structure, including one input layer, three RBM hidden layers, and one output layer. Adjacent layers are fully connected, and there is no connection between the neurons within each layer. The three hidden layers are used for feature learning, reconstruction, and dimension reduction. The hidden layer nodes are used to extract and store the inherent laws of the data. To give the DBN a strong learning ability, we set the number of first-hidden-layer neurons to nearly 10 times the number of input features. Moreover, the number of hidden neurons should be less than twice the size of the input layer [50]. Based on this, the numbers of neurons in the three hidden layers R_hidd decrease layer by layer with a ratio of 2, so the numbers of neurons in the three layers are 400, 200, and 100, respectively. The previous analysis showed that the MFE method obtains a vector with 38 features for each sample. Therefore, in Fig. 6, the number of neurons in the first visible layer R_first was set to 38. The number of neurons in the highest layer R_high and in the LVQ competition layer R_CL were optimized by analyzing the LVQ classification accuracy under different R_high and R_CL (shown in the simulation results). Here, the features of the training and testing sets are expressed as Attr_train (2402 × 38) and Attr_test (1602 × 38), respectively. For the n_0-th sample, the main definitions are listed in the appendix at the end of the paper (see Table 4).
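The 38-400-200-100-R_high stack (with R_high = 30, as optimized later) can be sketched as a forward mapping through stacked sigmoid layers (untrained random weights here, purely to illustrate the layer configuration; all names are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def build_dbn(layer_sizes, seed=0):
    """Random (untrained) parameters for the given layer-size stack."""
    rng = np.random.default_rng(seed)
    return [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def dbn_forward(params, x):
    """Propagate one feature vector through the stacked sigmoid layers."""
    for W, b in params:
        x = sigmoid(x @ W + b)
    return x

# 38 MFE inputs -> 400 -> 200 -> 100 hidden units -> 30 top-level features
params = build_dbn([38, 400, 200, 100, 30])
```

In the real model the weights come from the CD pre-training and BP fine-tuning described in Section III-B; the sketch only shows how a 38-element MFE vector is compressed into a 30-element deep feature vector.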
The concrete process and the pseudocode of the DVLM are described in Algorithm 1.

IV. EXPERIMENTAL RESULTS
The proposed model is obtained by learning and training on the sample set with given stage tags, and the data classification experiments are conducted using MATLAB 2017a. The classification accuracy Acc is obtained by comparing the predicted stage tags with the actual stage tags of the corresponding samples; the expression for accuracy is given in Eq. (6).
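Eq. (6) itself is not reproduced in this excerpt, but the accuracy it defines is the standard ratio of correctly predicted stage tags to total samples. A minimal sketch (the function name is ours, and Python stands in for the MATLAB used in the experiments):

```python
def classification_accuracy(predicted, actual):
    """Acc = (number of correctly predicted stage tags) / (total samples)."""
    if len(predicted) != len(actual):
        raise ValueError("tag sequences must have equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Consistency check against the paper's headline figure:
# 1571 of 1602 testing samples correctly classified.
acc = classification_accuracy([1] * 1571 + [0] * 31, [1] * 1602)
print(round(100 * acc, 3))  # 98.065
```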
All the work that verifies the classification ability is accomplished by the R_CL-optimized LVQ. Define the cumulative error e = Σ_{n_0=1}^{n_test} i_{n_0} and the LVQ training goal as G_Param. The most challenging aspect of the LVQ is the difficulty of selecting an appropriate R_CL and G_Param. Since CV is suitable for estimating appropriate parameter values, we randomly selected 60% of the samples for training and the remainder for testing to obtain the optimal LVQ parameters. Given the range of R_CL values and the configuration parameters (epoch and G_Param), we first obtain the locally optimal number of neurons using 10-fold CV (10-CV); each 10-CV estimation yields 10 errors. After trial and error, the ranges of the globally optimal parameters were determined as R_CL ∈ [10, 50] and G_Param ∈ [0.24, 0.33]. Let Err in Eq. (7) be the criterion for parameter optimization, where err_1 reflects the dispersion and err_2 the average distribution of the CV errors. The combination of G_Param and R_CL with the smallest Err over all parameter combinations is then selected as optimal. After successive training, G_Param = 0.28 and R_CL = 30 are the optimal choice.
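The parameter search described above can be sketched as follows. The toy error surface and the unweighted combination of err1 and err2 are assumptions for illustration only (the exact form of Eq. (7) is not reproduced in this excerpt), and Python stands in for the MATLAB used in the experiments:

```python
import random
import statistics

def cross_val_errors(r_cl, g_param, k=10):
    """Stand-in for one 10-fold CV run of the LVQ classifier. A real run
    would train an LVQ with r_cl competition-layer neurons and training
    goal g_param; here a toy error surface keeps the selection runnable."""
    rng = random.Random(r_cl * 1000 + int(round(g_param * 100)))
    base = abs(r_cl - 30) / 100 + abs(g_param - 0.28)  # hypothetical surface
    return [base + rng.uniform(0, 0.005) for _ in range(k)]

def err_criterion(errors):
    """Err combining err1 (dispersion of the k CV errors) and err2 (their
    average distribution); an unweighted sum is assumed here."""
    err1 = statistics.pstdev(errors)  # dispersion
    err2 = statistics.mean(errors)    # average
    return err1 + err2

# Grid search over the trial-and-error ranges R_CL in [10, 50] and
# G_Param in [0.24, 0.33], keeping the combination with the smallest Err.
grid = [(r, g) for r in range(10, 51, 5)
        for g in (0.24, 0.26, 0.28, 0.30, 0.33)]
best_r_cl, best_g = min(grid, key=lambda p: err_criterion(cross_val_errors(*p)))
print(best_r_cl, best_g)  # 30 0.28
```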
We select the DBN configuration with the least MSE and the highest LVQ accuracy. From 20 to 60, with a step size of 10, five trial designs are obtained. Fig. 8 shows the training-set classification accuracy of the DBN for different numbers of highest-layer neurons. When R_high = 30, the MSE is lowest at 0.0053289, while the LVQ accuracy is as high as 0.972534; thus, the optimal number of highest-layer neurons yielding satisfactory performance is determined. At present, there is no scientific and universal method for determining the structure of deep learning networks. Fig. 9 analyzes the impact of different R_high values on the classification accuracy and gives their priority ranking, where the abscissa represents the number of neurons in the highest layer of the DBN. When R_high is greater than 41, the accuracy stabilizes in the range 97.07%-97.24%, which ensures sufficiently high network performance. Considering that too many neuron nodes increase the computational burden and cause overfitting during parameter optimization [51], the basic principle for choosing the optimal R_high in this article is: the network structure should be as compact as possible, reducing the number of hidden-layer nodes while keeping a high classification accuracy. In summary, R_high = 30 is chosen. Table 2 shows the classification accuracy before and after MIC processing for different numbers of highest-layer neurons. The self-validation accuracy and the MIC classification accuracy are obtained using y_h-level and y_MIC^τ, respectively. The difference is that in self-validation both the training and verification samples come from y_h-level, whereas in MIC classification 60% of y_MIC^τ is used for training and the remaining 40% for verification. The self-verification accuracy is higher than that of other models of the same type.
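The compactness-first selection rule above can be sketched as below. Only the R_high = 30 row uses values quoted in the text (MSE 0.0053289, accuracy 0.972534); the other rows are hypothetical placeholders standing in for Fig. 8's remaining measurements:

```python
# (MSE, LVQ training accuracy) per candidate highest-layer size, 20..60.
trials = {
    20: (0.0071, 0.9655),       # placeholder
    30: (0.0053289, 0.972534),  # values quoted in the text
    40: (0.0060, 0.9710),       # placeholder
    50: (0.0064, 0.9708),       # placeholder
    60: (0.0068, 0.9701),       # placeholder
}

# Prefer the lowest MSE, then the highest accuracy, then the most
# compact network (smallest R_high), per the selection principle.
best_r_high = min(trials, key=lambda r: (trials[r][0], -trials[r][1], r))
print(best_r_high)  # 30
```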
What is more, under the MSE criterion, the feature-extraction network with 30 highest-layer neurons not only obtains the highest self-verification rate but also characterizes the information in the corresponding feature y_MIC^τ well. For these reasons, the value of τ is determined to be 9. Fig. 10 shows the optimal feature ranking under MIC processing. The horizontal axis represents the feature priority ranking obtained by the MIC method as R_high varies from 20 to 60. The vertical axis represents the self-verification classification accuracy obtained by the LVQ classifier after the current feature is combined with all higher-priority features. It can be seen that features appearing after the accuracy has stabilized have little influence on the classification accuracy; these features are the redundant information to be eliminated by the MIC operation. Together with Table 2, this shows that when R_high is 30, the number of MIC features is the smallest and the model achieves the best redundancy-elimination effect and the highest MIC classification accuracy. Fig. 11 shows the stage tags of the predicted output and the actual standard output under the four methods. All training samples in these methods are randomly selected. The rows correspond to the predicted class (Output Class) and the columns to the true class (Target Class); the diagonal cells correspond to correctly classified observations. In MFE, a large number of predicted outputs do not fall on the main diagonal, which reflects the low information-characterizing ability of the features. In comparison, the latter three methods show obvious improvements, and the number of misclassifications decreases. A detailed comparison between the methods is given in Table 3.
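The plateau reading of Fig. 10 — features ranked after the accuracy curve flattens are redundant — can be sketched as follows; the accuracy curve below is a hypothetical stand-in shaped like the figure's description, and the function name is ours:

```python
def plateau_cutoff(prefix_accuracies, tol=1e-3):
    """Given the accuracy after adding features 1..k in MIC priority order,
    return the smallest prefix length tau beyond which adding further
    features improves accuracy by no more than tol."""
    best = prefix_accuracies[0]
    tau = 1
    for k, acc in enumerate(prefix_accuracies[1:], start=2):
        if acc > best + tol:
            best, tau = acc, k
    return tau

# Toy curve in the spirit of Fig. 10: accuracy climbs while the first nine
# ranked features are added, then flattens (redundant features follow).
curve = [0.62, 0.71, 0.80, 0.86, 0.90, 0.93, 0.95, 0.965, 0.9807,
         0.9805, 0.9806, 0.9807, 0.9806]
print(plateau_cutoff(curve))  # 9
```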
Part (1) of Table 3 shows the LVQ classification accuracies under the different sub-feature-extraction methods that make up the DVLM; the values vary from method to method. The total rate is the ratio of the number of correctly classified samples to the total number of samples. The numbers of testing samples in the five stages are 208, 224, 468, 468, and 234. WPD-AE, MIE(10), and MIE(100) better characterize S1 and S5. EMD-AE and TIR classify S2 and S4 well. FFT and TIR obtain relatively high accuracy for S3. But even the maximum classification accuracy of a single feature is only 50.437%, which is too low to satisfy the requirements of practical production.
Part (2) of Table 3 summarizes the performance of different classifiers and integration methods. The number of testing samples in each stage is randomly assigned. The MFE features achieve a total rate of 69.788%, but the overall classification error is still large. The DBN30 network with R_high = 30 deeply extracts the MFE features; although it does not eliminate the redundancy, it is still effective. The DBN-MIC method (integrating DBN and MIC) with different R_high reduces the number of features and obtains better results. The DVLM model with R_high = 30 and τ = 9 correctly classifies up to 1571 testing samples, i.e., 98.065% of the 1602 samples. The experimental results also show that the DVLM outperforms BP, NB, RUSBoost, PNN, and KNN. We also compare with two recent methods, DNN and MFE-DMIC. In the DNN, the number of neurons in each of the three hidden layers is 50; to obtain the optimal DNN weights, we use the MATLAB function ''fminunc'' to minimize the cost function. The input training features of SVM and BP are y_MIC^9; these models are renamed E-BP, E-ANN, and E-SVM, where E stands for evolution. The benchmark methods used for comparison are configured with MATLAB default parameters, which basically ensure a good working state of the corresponding models. The number of neurons in the BP hidden layer is 20; the spread in PNN is 0.8; the SVM penalty parameter is c = 2 and the kernel function parameter is g = 0.05.
A well-designed method follows the principle of minimum in-cluster distance and maximum between-cluster distance. We therefore introduce the t-SNE method to visualize all 4004 samples by minimizing the standardized Euclidean distance between the original and the reconstructed data. Fig. 12 shows the distributions of the different-dimensional features obtained by the different methods with perplexity = 200. Note that the distances between well-separated clusters in a t-SNE plot only reflect relative position and have no physical meaning. The feature dimension for each method is given in the caption of the corresponding subfigure.
The features in Fig. 12(a) are obtained by the MFE method; owing to the lack of deep feature extraction, there are many mutual inclusions and overlaps. For the features in Fig. 12(b), some points belonging to Stage 3 still suffer from sample dispersion. In Fig. 12(c), there is no interruption or isolation and the feature points of the same stage are relatively aggregated, but the feature-space dimension remains too high. Compared with the others, the DVLM obtains a more proper in-cluster distance and a larger between-cluster distance.
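The minimum-in-cluster / maximum-between-cluster criterion used to read Fig. 12 can be made concrete with a small sketch (toy 2-D points; the helper names are ours):

```python
import math
from itertools import combinations

def mean_dist(points):
    """Mean pairwise Euclidean distance of a point set."""
    if len(points) < 2:
        return 0.0
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def cluster_quality(clusters):
    """Smaller mean in-cluster distance and larger mean between-centroid
    distance indicate a better feature space."""
    in_cluster = sum(mean_dist(c) for c in clusters) / len(clusters)
    centroids = [tuple(sum(x) / len(c) for x in zip(*c)) for c in clusters]
    between = mean_dist(centroids)
    return in_cluster, between

# Two toy 2-D "stages": tight clusters placed far apart score well.
s1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
s2 = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
inner, between = cluster_quality([s1, s2])
print(f"in-cluster {inner:.3f}, between-cluster {between:.3f}")
```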

V. DISCUSSION
Faced with the difficulties in data feature extraction and the low classification efficiency caused by the uncertainty and non-uniqueness of the pressure response, an accumulation of 30 million pressure sampling points is used to develop a well-testing data classification model. The powerful database and the numerous simulation results ensure the reliability and feasibility of the proposed model.
Based on the well-testing data analysis, we find that the data streams at different stages show different time-domain characteristics under the influence of the well-test workflow and the geological environment; these characteristics can be abstractly expressed by features obtained with time-domain, frequency-domain, and time-frequency-domain methods. In this way, not only are the types of extracted features significantly enriched, but the misjudgment caused by artificial classification is also avoided. Therefore, the MFE scheme is put forward.
As for improving the information-characterizing ability, we introduce four classical methods to expand the types of extracted features and compare the classification accuracy of the features obtained by the sub-methods that make up the proposed method. However, a single sub-method is not efficient: Table 3, Part (1), shows that the maximum classification accuracy is only 39.513%. Although TIR achieves the highest accuracy of 50.437% among all MFE sub-methods, this low accuracy indicates that regression methods are only suitable for the low-dimensional, linearly separable variables of traditional well-testing data analysis, not for the nonlinear variables and multi-dimensional features of well-testing stage classification. It can therefore be concluded that the classical methods have significant limitations in classifying well-testing data and in working-stage decision-making. MFE enriches the types of data features, extracts 38 feature elements, and obtains a better accuracy of 69.788%, but it lacks the selection of major and minor elements, so further feature mining is still needed. As for the pressure-derivative method recommended by current experts, its working principle is similar to linear regression. Although we do not use the pressure derivative for classification accuracy analysis, the comparison with linear-regression-based features such as TIR and MIE (see Table 3) already shows the superiority of the MFE method.
For a deeper expression of the data, we use the DBN to learn the MFE features. Compared with the latest model MFE-DMIC, this article is more detailed in selecting the number of neurons in the DBN highest layer, and its experimental results are more convincing. We first analyze the impact of different R_high values and then assign a priority ranking. As shown in Fig. 9, different types of features have different information-carrying abilities, so for features ranked before the 34th, the classification accuracy steps up as the priority increases. When the number of neurons is kept within a certain optimal range (red dots in Fig. 9), the classification accuracy remains stable. There is no clear relationship between the classification accuracy and the number of neurons; bigger is not always better. Different numbers of neurons may have the same effect, yet layers that differ by only one or two neurons may differ in effect. The results in Fig. 8 and the best t-SNE distribution in Fig. 12 demonstrate that it is feasible to select a fixed number of neurons from the optimal range, within which the classification accuracy fluctuates only slightly (97.07%-97.24%), and that it is correct to determine the size according to the network scale and the priority of the highest-layer neuron number.
From Fig. 8 and Table 2, we find that a small R_high is insufficient to summarize and reflect the sample laws in the training set, while a large R_high may firmly learn and store non-relevant information that should only be loosely remembered. The classification results of DBN30 (97.690%) and the other DBN integration methods in Table 3 show that the nonlinear, high-dimensional learning mode of deep learning is more conducive to distinguishing and extracting the effective features.

FIGURE 13. The process of acquiring MFE features, illustrated on one randomly selected sample. Fig. 13(a) shows a complete well-test curve; the stage curves in Fig. 1(b) are taken from it. Fig. 13(b) shows the obtained FFT coefficients, three IMF components of the EMD, and the gradient direction of the linear regression. Fig. 13(c) shows the WPD reconstruction coefficients of stage 1 with three decomposition layers; the other stages likewise have 8 decomposition coefficients and use the same decomposition method. The EMD components and WPD coefficients shown here are only intermediate parameters; the final features are obtained after the approximate-entropy calculation. For illustrative purposes, the axis labels are removed. For the stage curves and LR: x, time; y, pressure. For the FFT: x, frequency; y, amplitude. The IMFs represent the component of each frequency in the original signal and have no unit. For the WPD, the reconstructed signal has the same size as the original signal, so: x, time; y, pressure.

Considering the feature redundancy and the correlation between the feature elements, we use the MIC to prioritize the feature elements and select representative ones. This reduces the number of feature elements without losing useful information.
In Table 2, the MIC self-verification accuracy is higher than the MIC classification accuracy, mainly because the former network, trained and tested on the same samples, produces parameters better adapted to the training samples. The comparison between the two verifies the effectiveness of the designed DBN network. Fig. 10 supports the choice of R_high = 30 by giving the ranking of the number of effective feature elements after MIC processing: the feature output by the DBN30 network has the fewest elements and the highest classification efficiency. By analyzing the correlation between the feature elements of DBN30, MIC purifies the redundancy and obtains fewer (the dimension is reduced from 30 to 9) but more representative features (the classification accuracy improves from 97.690% to 98.065%, see Table 3), which provide an efficient feature input for the LVQ classifier.
As for improving the generalization ability, the LVQ classifier proves effective in finding a nonlinear mapping between oil-test data and operation stage. Usually, geological data can bias the interpretation of accuracies and errors. To reduce the systematic error and improve the classification efficiency, we use the average CV error to complete the parameter optimization and obtain the optimal number of neurons in the LVQ competition layer. By introducing the cumulative error e and the LVQ parameters, we provide a calculation formula for the evaluation error Err. As an application, Fig. 7 analyzes the impact of different LVQ parameters, provides the distributions of the average error and the minimum error, and determines R_CL = 30 and G_Param = 0.28. For any input of unknown stage, the predicted tags can then be output automatically and accurately. As can be seen from Fig. 4, the LVQ structure can not only classify the features but also further achieve feature reconstruction and deep training. The comparison in Table 3 with different classifiers, a deep neural network, and the latest MFE-DMIC method verifies the superiority of the LVQ.
All in all, building an ensemble data classifier from a set of classical data-processing methods proves promising for enhancing the classification performance, since the ensemble can exploit the capabilities of the individual classical methods. The proposed DVLM model is a suitable option in well-testing analysis for determining the correct well-testing working stage, providing a reliable reference for oil-production management.
The proposed model also has some limitations and shortcomings. First, the feature types are not sufficiently rich and comprehensive. Second, the sample size for deep learning and network training is not large. Third, the proposed model, which integrates DBN and LVQ, achieves good classification performance but only considers the optimization of the DBN highest layer. In future work, three ''mores'' should therefore be studied in depth: more feasible methods should be used to maximize data-information mining; more field samples, including more complex operations, should be collected to better train the learning network; and more influencing factors should be considered in adjusting and optimizing the model structure.

VI. CONCLUSION
Aiming at improving the oil well-testing stage classification accuracy, we first explain the processing principle and construction process of the proposed model in light of the complexity, randomness, and regional characteristics of the data. Second, we emphasize the importance of the diversity of features, the DBN neuron sizes, the feature priority, and the LVQ parameters. The features extracted by the MFE method are then used as the input of the structure-optimized DBN network and, combined with the MIC and the improved LVQ classifier, high-precision stage classification is realized. The experimental results show that, benefiting from the improved information-characterizing ability of the features and the generalized classification ability of the classifier, the DVLM achieves a higher accuracy (98.065%) than the classical mathematical methods, the various integration methods, and the latest deep learning methods. The proposed model reduces human intervention in the stage-classification process and has broad application prospects in stochastic and non-scheduled data processing.

XINGWEI HOU was born in Shijiazhuang, Hebei, China. He is currently pursuing the Ph.D. degree in instrument science and technology with Tianjin University, Tianjin, China. His research interests include near infrared spectroscopy, embedded system design, and signal processing.
MENGQIU ZHANG was born in Tai'an, Shandong, China. She is currently pursuing the Ph.D. degree in biomedical engineering with Tianjin University, Tianjin, China. Her research interests include signal detecting and processing, sensor engineering, neural networks, and signal analysis.
SHUGUI LIU received the M.S. and Ph.D. degrees in production machinery from the Tokyo Institute of Technology, Tokyo, Japan, in 1985 and 1988, respectively. Since 1995, he has been a Professor with the School of Precision Instruments and Opto-Electronics Engineering, Tianjin University. His major research interests include automatic measurement and control, sensing and information processing, intelligent measurement, and big data analytics. He has presided and finished one NSFC project and six provincial and ministerial projects. VOLUME 8, 2020