Confidence-Constrained Support Vector Regression for Geological Surface Uncertainty Modeling

Reconstruction of complex geological surfaces is widely used in oil and gas exploration, geological modeling, geological structure analysis, and other fields. It is an important basis for data visualization and visual analysis in these fields. The complexity of geological structures, the inaccuracy and sparsity of seismic interpretation data, and the lack of tectonic morphological information can lead to uncertainty in geological surface reconstruction. Existing methods for characterizing geological surface uncertainty and for uncertain reconstruction fall short in balancing the interpolation error of high-confidence samples against model structural risk. Based on support vector regression (SVR), a method with confidence constraints for the uncertainty characterization and modeling of geological surfaces is proposed in this article. The proposed method minimizes the structural risk by adding a regularization term representing the model complexity, integrates high-confidence samples, such as drilling data, based on confidence constraints, and utilizes well path points by assigning appropriate inequality constraints to the corresponding prediction points. The results based on a real-world fault data set show that the uncertainty envelopes and fault realizations generated by the proposed method are constrained by well observations and well paths, effectively reducing the uncertainty.


I. INTRODUCTION
In the field of petroleum exploration, geological surfaces are reconstructed based on drilling data, seismic interpretation data, and various constraints representing regional geological knowledge [1]. Such reconstructions are the basis for establishing sequence and reservoir models. A three-dimensional structural model based on geological surface reconstruction can reflect the spatial distribution and shape of geological interfaces/objects, such as horizons and faults, and plays a vital role in understanding underground structures, performing reservoir prediction and planning drilling processes.
However, due to the complexity of geological structures, the inaccuracy and sparsity of seismic interpretation data, the lack of tectonic and morphological information, and the improper selection of reconstruction algorithms, there is often significant uncertainty in geological surface reconstruction. This uncertainty makes it extremely difficult to construct the real geological surface from seismic interpretation data and well observations [2]. Additionally, uncertainty can have a negative impact on structural analysis, reserve calculations and drilling strategies [3]. Therefore, effectively characterizing the uncertainty of geological surfaces, strengthening the research on the uncertainty of geological surface reconstruction and minimizing this uncertainty are of great significance in reducing the risks associated with petroleum exploration and development.
The associate editor coordinating the review of this manuscript and approving it for publication was Jon Atli Benediktsson.
To date, many methods have been proposed to characterize and model the uncertainty of geological surfaces. References [4]-[6] generated multiple potential realizations to evaluate the uncertainty by randomly adjusting the parameters that characterize a geological surface. Although methods that perturb the parameters can be effectively applied to explore the uncertainty of a geological surface, they lose some characteristics of the corresponding seismic interpretation data and well data; thus, greater uncertainty may be introduced. References [2], [3], [7]-[9] treated the locations of sample points as random variables and generated multiple geological surface realizations by varying the initial values of these variables or by randomly sampling the distributions of these random variables. Most of these uncertainty modeling methods are based on the kriging technique [10] and discrete smooth interpolation (DSI) [11]. Neither of these two interpolation methods uses the structural risk minimization principle. Recently, a geological surface uncertainty modeling method combining prior information and a Bayesian reasoning framework has emerged [12], [13]. This approach integrates additional information into the modeling steps and effectively reduces the uncertainty. However, it is difficult to constrain a geological surface with dense hard data in the Bayesian framework. This article uses SVR [14], [15] for geological surface uncertainty characterization and reconstruction. Compared to other regression models, SVR adopts the structural risk minimization principle, thus providing better generalization ability. Reference [16] noted that when applied to complex nonlinear systems, SVR performs better than the kriging technique, radial basis function interpolation and other methods.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, references [17]-[20] incorporated the concept of support vector interval regression into uncertainty analysis. In addition, multioutput SVR [21]-[23] is a common uncertainty analysis method. The above uncertainty analysis methods based on SVR regard sample points with high confidence as points in the insensitive band or as outliers; thus, they cannot effectively utilize the information associated with these sample points. In fact, the effective integration of high-confidence sample points is an important issue in uncertainty characterization/modeling and can effectively reduce uncertainty.
In this context, this article proposes a method for the uncertainty representation and reconstruction of geological surfaces based on SVR with confidence constraints. This method uses two nonparallel functions to identify the lower and upper envelopes that characterize the uncertainty of the studied geological surface and restrict the location and shape of the subsequent simulated geological surface realizations. By adding fuzzy contact constraints [9], which represent the confidence of the sample point outputs, to the SVR model, high-confidence sample points, such as drilling data, can be integrated. The well path points can be integrated by specifying an appropriate inequality constraint for the corresponding prediction points. In addition, based on classical $\varepsilon$-SVR, this article generates multiple geological surface realizations to explore the uncertainty of the data by treating the generated envelopes as the boundary constraints and randomly sampling the prior distribution of the sample point output.

II. BACKGROUND
A concise description of $\varepsilon$-SVR is given in this section. Suppose that the training set is denoted by $(A, Y)$, where $A \in \mathbb{R}^{n \times l}$ denotes the input sample matrix with row vectors $A_i = (A_{i1}, A_{i2}, \cdots, A_{il})$, $i = 1, 2, \cdots, n$, representing the input of the $i$th training sample, $n$ is the number of sample points, and $l$ is the dimension of the input. $Y = (y_1; y_2; \cdots; y_n)$ denotes the output sampling value vector of the training samples, where $y_i \in \mathbb{R}$, $i = 1, 2, \cdots, n$.
SVR first maps the data $x \in \mathbb{R}^l$ into a feature space $H$ via a nonlinear function $\phi : \mathbb{R}^l \to H$. In the feature space $H$, SVR seeks to estimate a regression function
$$f(x) = \phi^T(x) w + b,$$
where $w \in H$ and $b \in \mathbb{R}$. To tolerate a small error in fitting the given data, the $\varepsilon$-insensitive loss function that measures empirical risk is used, where the following relation holds:
$$|y - f(x)|_{\varepsilon} = \max\{0, |y - f(x)| - \varepsilon\}.$$
The $\varepsilon$-insensitive loss function sets an insensitive band around the sample data, within which errors are ignored. Because SVR implements the structural risk minimization principle by introducing a regularization term $\frac{1}{2}\|w\|^2$ that characterizes the complexity of the model, SVR yields excellent generalization performance. Moreover, $\frac{1}{2}\|w\|^2$ maintains the flatness of the regression function. SVR can be expressed as the following constrained optimization problem:
$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$$
$$\text{s.t.} \quad y_i - (\phi^T(A_i) w + b) \le \varepsilon + \xi_i, \quad (\phi^T(A_i) w + b) - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad i = 1, 2, \cdots, n, \qquad (4)$$
where $\xi_i, \xi_i^*$, $i = 1, 2, \cdots, n$, are slack variables that measure the error, and $C > 0$ is a parameter determining the trade-off between the closeness of the solution to the training points and its smoothness. A large $C$ generates a model with high complexity and low training error, and a small $C$ results in a simple SVR structure with higher training error.
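The trade-off controlled by $C$ and $\varepsilon$ can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the article: it assumes the identity feature map (so that $f(x) = \langle w, x \rangle + b$), and the function names and toy data are hypothetical. It evaluates the $\varepsilon$-insensitive loss and the primal objective of Eq. (4) with the slack variables set to their smallest feasible values.

```python
def eps_insensitive_loss(y, f, eps):
    """Return max(0, |y - f| - eps): errors inside the band cost nothing."""
    return max(0.0, abs(y - f) - eps)


def primal_objective(w, b, X, y, C, eps):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i + xi_i^*) for a linear model.

    Assumption: identity feature map, so f(x) = <w, x> + b. For fixed (w, b)
    the smallest feasible slacks satisfy
    xi_i + xi_i^* = eps_insensitive_loss(y_i, f(x_i), eps).
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    total_slack = 0.0
    for x_row, y_i in zip(X, y):
        f_x = sum(wj * xj for wj, xj in zip(w, x_row)) + b
        total_slack += eps_insensitive_loss(y_i, f_x, eps)
    return regularizer + C * total_slack


# Toy data (hypothetical): only the third point lies outside the band.
X = [[0.0], [1.0], [2.0]]
y = [0.1, 1.0, 2.6]
obj = primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.2)
```

In this toy case the first two residuals (0.1 and 0) fall inside the band of half-width 0.2 and contribute nothing, so only the third point is penalized; increasing `C` raises the price of that single violation relative to the regularization term.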
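Introducing the Lagrange multipliers $\lambda_i$ and $\lambda_i^*$ on the constraints, the dual problem of Eq. (4) is as follows:
$$\max_{\lambda, \lambda^*} \ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\lambda_i - \lambda_i^*)(\lambda_j - \lambda_j^*)\, \phi^T(A_i) \phi(A_j) - \varepsilon \sum_{i=1}^{n} (\lambda_i + \lambda_i^*) + \sum_{i=1}^{n} y_i (\lambda_i - \lambda_i^*)$$
$$\text{s.t.} \quad \sum_{i=1}^{n} (\lambda_i - \lambda_i^*) = 0, \quad 0 \le \lambda_i, \lambda_i^* \le C, \quad i = 1, 2, \cdots, n.$$
By solving the above problem, the appropriate regression function can be obtained. According to the Mercer theorem [24], the inner product in $H$ can be represented by a kernel function, that is, $K(x, x') = \phi^T(x) \phi(x')$, instead of defining $\phi$ explicitly. Common kernel functions for regression include the linear, polynomial, and Gaussian kernels.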
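As an illustration of the kernel substitution, the following sketch evaluates a regression function of the form $f(x) = \sum_i (\lambda_i - \lambda_i^*) K(A_i, x) + b$ with a Gaussian kernel. It is a minimal sketch under stated assumptions: the parameterization $K(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$ is one common convention and is assumed here, since the exact kernel formula is not given in this text; the function names and the toy coefficients are hypothetical, and no dual problem is actually solved.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma^2)) (assumed convention)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x, support_inputs, coefficients, b, sigma):
    """Evaluate f(x) = sum_i coef_i * K(A_i, x) + b.

    coefficients[i] stands for (lambda_i - lambda_i^*) taken from a solved
    dual problem; here they are supplied by hand for illustration.
    """
    return sum(
        c * gaussian_kernel(a, x, sigma)
        for a, c in zip(support_inputs, coefficients)
    ) + b


# Hypothetical two-point expansion in a 2-D input space.
support = [[0.0, 0.0], [1.0, 0.0]]
coef = [1.0, -1.0]
pred = svr_predict([0.0, 0.0], support, coef, b=0.5, sigma=1.0)
```

Only kernel evaluations between the query point and the training inputs are needed, which is why $\phi$ never has to be defined explicitly.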

III. METHODS
Due to the characteristics of SVR, the current uncertainty modeling methods based on SVR regard sample points with high confidence as points in the insensitive band or as outliers; thus, the information associated with those points cannot be effectively utilized. Seismic interpretation data and well data are often used in geological surface uncertainty characterization and reconstruction. Although well data are rare, they play a vital role in reducing the uncertainty of the geological surface.
Here, two types of well data are used: well paths and drilling data. Well path points fall on a well path that does not intersect the studied geological surface, and drilling data are observations of the geological surface in a well. To effectively integrate well data, this article proposes a confidence-constrained support vector interval regression (CCSVIR) model that is used to generate envelopes representing the uncertainty of a geological surface. Based on this framework, the uncertain reconstruction of geological surfaces can be performed.

A. CCSVIR FOR CHARACTERIZING GEOLOGICAL SURFACE UNCERTAINTY
The uncertainty in the location of a geological surface is represented by uncertainty envelopes. The envelope defined by two surfaces is used to specify the region around the studied geological surface, and it restricts the positions of the geological surface realizations [3], [25]. Therefore, in CCSVIR, the objective is to find two functions $\phi^T(x) w + b$ and $\phi^T(x) u + d$, such that the uncertainty envelopes can be expressed as follows:
$$f_{up}(x) = (\phi^T(x) w + b) + (\phi^T(x) u + d), \qquad f_{bl}(x) = (\phi^T(x) w + b) - (\phi^T(x) u + d).$$
Here, $\phi^T(x) w + b$ and $\phi^T(x) u + d$ are called the center and radius functions, respectively.
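The relationship between the center function, the radius function and the two envelopes can be sketched in a few lines. The example below is purely illustrative: the helper name `make_envelopes` and the toy center and radius functions are hypothetical stand-ins, not the fitted CCSVIR functions.

```python
def make_envelopes(center, radius):
    """Build the upper and lower envelopes from center and radius functions."""

    def f_up(x):
        return center(x) + radius(x)

    def f_bl(x):
        return center(x) - radius(x)

    return f_up, f_bl


# Hypothetical stand-ins for phi^T(x) w + b and phi^T(x) u + d.
def toy_center(x):
    return 2.0 * x[0] + 100.0


def toy_radius(x):
    return 10.0 + x[1]


f_up, f_bl = make_envelopes(toy_center, toy_radius)
point = (3.0, 4.0)
gap = f_up(point) - f_bl(point)              # equals twice the radius
midline = 0.5 * (f_up(point) + f_bl(point))  # equals the center function
```

Two consequences follow directly from this construction: the gap between the envelopes is twice the radius function, so a negative radius would place the upper envelope below the lower one, and averaging the two envelopes recovers the center function.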
Suppose that $\{(A_i, y_i) \mid i = 1, 2, \cdots, n\}$ is the training sample set of seismic interpretation data and drilling data, where $A_i \in \mathbb{R}^2$ is the input, and $y_i \in \mathbb{R}$ is the output sampling value of $A_i$. Typically, the output of a sample point is inaccurate and follows a certain distribution. The fuzzy information, which suggests that each sample output is approximately equal to the sampling value, can be expressed as the chance constraint of Eq. (8), where $\delta$ is a pregiven positive number and $0 \le \alpha_i \le 1$ is a predetermined confidence. $\delta$ and $\alpha_i$ depend on the reliability of the output sampling value of the training sample: the higher the reliability, the smaller $\delta$ and the greater the confidence $\alpha_i$. If the output sampling value of the training sample is accurate, then $\delta = 0$ and $\alpha_i = 1$. Reference [9] noted that if the output of a training sample follows a symmetrical distribution, then Eq. (8) can be converted to a deterministic interval constraint that confines the output to $[y_{2i}, y_{1i}]$, where $y_{1i} = 2 y_i - y_{2i}$ and $\beta$ is a probability value that is very close to 1. $[y_{2i}, y_{1i}]$ is called the uncertainty interval of the sample output, and $(\delta, \alpha_i, \beta)$ are the parameters of the uncertainty interval. The closer $\beta$ is to 1, the less likely it is for the output of a sample point to fall outside the uncertainty interval. In geological surface uncertainty characterization, the uncertainty interval of the sample output should lie within the envelopes to the greatest extent possible, and the slack variables $\xi_i$ and $\xi_i^*$ are used to capture the error of the uncertainty interval outside the envelopes. If $y_{1i} = y_{2i} = y_i$, the two resulting constraints represent the parameter-insensitive loss function in the $\nu$-SVIRN model proposed by [19].
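To give a feel for how the parameters $(\delta, \alpha_i, \beta)$ could translate into an uncertainty interval, the sketch below works through one possible reading for a normally distributed output. This is an assumption made for illustration and is not the formula from the article or from [9]: the spread is calibrated so that the output lies within $\delta$ of $y_i$ with probability $\alpha_i$, and the interval is then taken as the central region holding probability $\beta$. Only the symmetry relation $y_{1i} = 2 y_i - y_{2i}$ comes from the text; the function name is hypothetical.

```python
from statistics import NormalDist


def uncertainty_interval(y, delta, alpha, beta):
    """Return (y2, y1), a symmetric interval around the sampling value y.

    Assumption (not from the article): the output is normal with mean y and
    a standard deviation chosen so that P(|Y - y| <= delta) = alpha; the
    interval is the central region with probability beta.
    """
    if delta == 0.0 or alpha >= 1.0:
        return (y, y)  # accurate sample: the interval collapses to a point
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    sigma = delta / standard_normal.inv_cdf((1.0 + alpha) / 2.0)
    half_width = sigma * standard_normal.inv_cdf((1.0 + beta) / 2.0)
    y2 = y - half_width
    y1 = 2.0 * y - y2  # symmetry relation stated in the text
    return (y2, y1)


# Hypothetical sampling value of 1000 m with a low-confidence parameter set.
y2, y1 = uncertainty_interval(1000.0, delta=85.0, alpha=0.8, beta=0.999)
```

Under this reading, a smaller $\delta$ or a larger $\alpha_i$ narrows the interval, which matches the qualitative behavior described above for high-confidence samples.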
When performing uncertainty characterization and reconstructing a geological surface, although the locations of geological surfaces are uncertain, such processes are usually subject to certain restrictions. For example, the position of a horizon is limited by the positions of the horizons below and above it. Therefore, the output of a sample point has a lower bound $lb_i$ and an upper bound $ub_i$. $[lb_i, ub_i]$ is called the bound interval of a sample point output. The bound interval of a high-confidence sample point is usually very small.
In addition, a geological surface is usually constrained along well paths. These constraints make the simulated surface more reasonable and realistic. The well path points represent locations known to be on a particular side of the studied geological surface [3]. Suppose that the inputs of $m$ well path points are given and that the outputs of the well path points are $y_i^{path}$, $i = 1, \cdots, m$. An inequality constraint is then added to the output of the predicted points corresponding to the well path points to ensure that the position of the geological surface is correctly predicted. The direction of the inequality depends on whether the well path points are located at the footwall or at the hanging wall of a fault.
In summary, the following requirements should be applied when determining the uncertainty envelopes $f_{up}(x)$ and $f_{bl}(x)$ based on CCSVIR:
1) The envelopes should contain the points $(B_i, y_{1i})$, $i = 1, \cdots, n$, and $(B_i, y_{2i})$, $i = 1, \cdots, n$, to the greatest extent possible (Eq. (15)).
2) The envelopes should be within the bound interval of the training sample point outputs and on the correct side of the well path points. Separate inequality constraints apply depending on whether the well path point is on the upper or right side of the geological surface or on its lower or left side.
3) The upper envelope should be above or to the right of the lower envelope, that is, $f_{up}(x) \ge f_{bl}(x)$, which requires the radius function to be nonnegative:
$$\phi^T(B_i) u + d \ge 0. \qquad (20)$$
Near a sample point with high confidence, $\phi^T(B_i) u + d$ may be a negative number very close to 0 due to the accuracy of the computation. In such cases, an unreasonable situation may occur in which the upper envelope is below or to the left of the lower envelope, that is, the radius function is negative (as shown in Fig. 3). To avoid this situation, Eq. (20) can be replaced with
$$\phi^T(B_i) u + d \ge \gamma, \qquad (21)$$
where $\gamma$ is a small positive number.
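Requirements 2) and 3) can be read as simple pointwise checks on a pair of candidate envelopes. The sketch below is a hedged illustration rather than part of the CCSVIR solver: the function name, the tuple layouts and the side labels "above" and "below" are hypothetical, and the well path rule assumes that a point above (or right of) the surface must not lie below the upper envelope, while a point below (or left of) the surface must not lie above the lower envelope.

```python
def check_envelopes(f_up, f_bl, samples, well_points, gamma=0.0):
    """Return a list of violated requirements (empty if all checks pass).

    samples:     iterable of (x, lb, ub), the bound interval of each output.
    well_points: iterable of (x, y_path, side); side is "above" when the well
                 path point is on the upper/right side of the surface and
                 "below" when it is on the lower/left side (assumed labels).
    gamma:       minimum allowed value of the radius function.
    """
    violations = []
    for x, lb, ub in samples:
        upper, lower = f_up(x), f_bl(x)
        if not (lb <= lower and upper <= ub):
            violations.append(f"bound interval violated at {x}")
        if (upper - lower) / 2.0 < gamma:
            violations.append(f"radius below gamma at {x}")
    for x, y_path, side in well_points:
        if side == "above" and f_up(x) > y_path:
            violations.append(f"upper envelope crosses well path at {x}")
        elif side == "below" and f_bl(x) < y_path:
            violations.append(f"lower envelope crosses well path at {x}")
    return violations


# Toy envelopes with a constant radius of 10 around the line y = x[0].
def toy_up(x):
    return x[0] + 10.0


def toy_bl(x):
    return x[0] - 10.0


samples = [((0.0,), -20.0, 20.0)]
ok = check_envelopes(toy_up, toy_bl, samples, [((0.0,), 15.0, "above")])
bad = check_envelopes(toy_up, toy_bl, samples, [((0.0,), 5.0, "above")], gamma=11.0)
```

In the optimization itself these conditions enter as hard inequality constraints; a checker of this kind is only useful for validating a computed solution.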
Therefore, the CCSVIR model used to characterize the uncertainty of a geological surface can be expressed as a quadratic programming problem (QPP), Eq. (22), in which $\frac{1}{2}\|w\|^2 + \frac{1}{2}\|u\|^2$ describes the CCSVIR complexity. CCSVIR adopts the structural risk minimization principle, which states that to obtain a small risk, the trade-off between the model complexity and training error should be controlled by the parameter $C > 0$, which is selected in advance. To solve Eq. (22), the dual QPP of CCSVIR must be obtained.
Theorem 1: The dual problem of CCSVIR is the QPP given in Eq. (23). The proof of Theorem 1 is given in the Appendix.
The dual problem (23) is a convex optimization problem, and it has a global optimal solution. In general, however, the optimal objective function value of the dual problem is only a lower bound on that of the primal problem. The proof of the following theorem shows that the optimal objective function value of (22) is the same as that of (23). Therefore, under the strong duality theorem [26], the $w$ and $u$ corresponding to the optimal solution of the dual (23) are the optimal solutions of the primal (22).
Theorem 2: If $\bar{\lambda}_1 = (\bar{\lambda}_{11}, \bar{\lambda}_{12}, \cdots, \bar{\lambda}_{1n})^T$, $\bar{\lambda}_1^* = (\bar{\lambda}_{11}^*, \bar{\lambda}_{12}^*, \cdots, \bar{\lambda}_{1n}^*)^T$, $\bar{\lambda}_2 = (\bar{\lambda}_{21}, \bar{\lambda}_{22}, \cdots, \bar{\lambda}_{2,n+m})^T$, $\bar{\lambda}_2^* = (\bar{\lambda}_{21}^*, \bar{\lambda}_{22}^*, \cdots, \bar{\lambda}_{2,n+m}^*)^T$, and $\bar{\lambda}_3 = (\bar{\lambda}_{31}, \bar{\lambda}_{32}, \cdots, \bar{\lambda}_{3,n+m})^T$ are optimal solutions of (23), then the optimal solution of the primal problem (22) with respect to $w$ and $u$ can be expressed in terms of these multipliers, and the optimal solution of CCSVIR (22) with respect to $b$ and $d$ can be computed from them. The proof of Theorem 2 is given in Appendix A.
Theorem 2 gives the upper and lower envelopes constructed by the proposed CCSVIR method in terms of the optimal dual solution. The envelopes generated by CCSVIR can effectively characterize the uncertainty of the geological surface and divide the simulation space of the geological surface into two subspaces, thus restricting the location and shape of the subsequent simulated geological surface realizations.

B. UNCERTAIN RECONSTRUCTION OF GEOLOGICAL SURFACES
A model is proposed for the uncertain reconstruction of geological surfaces based on classical $\varepsilon$-SVR. This model treats the envelopes generated by CCSVIR as the boundary constraints, which keeps the multiple realizations of the geological surface within the envelopes and constrained by well paths and drilling data.
For seismic interpretation data and drilling data points, $y_i^*$ is randomly selected from the distribution of the training sample output. $(B_i, y_i^*)$, $i = 1, \cdots, n$, represents the training samples used in the uncertain reconstruction of a geological surface. The goal of uncertain reconstruction is to find a surface $g(x) = \phi^T(x) v + e$ that is fitted to the training sample points $(B_i, y_i^*)$, $i = 1, \cdots, n$, and lies within the envelopes generated by CCSVIR. Therefore, the model for the uncertain reconstruction of geological surfaces is given by Eq. (30), where $f_{up}$ and $f_{bl}$ are the upper and lower envelopes introduced in Section III-A, respectively. The model complexity is characterized by $\frac{1}{2}\|v\|^2$. The positive slack variables $\xi_i$ and $\xi_i^*$ are responsible for penalizing errors greater than $\varepsilon$. Eq. (30) employs the structural risk minimization principle. By introducing the Lagrangian multiplier technique, the dual problem (31) of (30) can be obtained. This dual problem is a convex optimization problem with a global optimal solution. The optimal solution of (30) can be obtained from the optimal solution of (31) according to Theorem 3.
Theorem 3: Suppose that $\bar{\lambda}_1 = (\bar{\lambda}_{11}, \bar{\lambda}_{12}, \cdots, \bar{\lambda}_{1n})^T$, $\bar{\lambda}_1^* = (\bar{\lambda}_{11}^*, \bar{\lambda}_{12}^*, \cdots, \bar{\lambda}_{1n}^*)^T$, $\bar{\lambda}_2 = (\bar{\lambda}_{21}, \bar{\lambda}_{22}, \cdots, \bar{\lambda}_{2,n+m})^T$, and $\bar{\lambda}_2^* = (\bar{\lambda}_{21}^*, \bar{\lambda}_{22}^*, \cdots, \bar{\lambda}_{2,n+m}^*)^T$ are optimal solutions of the dual problem (31). In this case, the optimal solution of (30) with respect to $v$ and $e$ can be expressed in terms of these multipliers. The proof of Theorem 3 is similar to the proof of Theorem 2.
Theorem 3 gives the geological surface realization fitted to the random sample points $(B_i, y_i^*)$, $i = 1, \cdots, n$, in terms of the optimal dual solution. By randomly selecting different $y_i^*$ values from the distribution of the sample outputs, multiple geological surface realizations can be obtained, thus effectively exploring the uncertainty space of the data.
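The random selection of $y_i^*$ is the only stochastic step in this procedure, and it can be sketched independently of the QPP solver. The code below assumes, for illustration, normally distributed outputs with a per-sample standard deviation; the function names, the standard deviations and the seed handling are hypothetical. Each returned list would serve as the outputs of one training set $(B_i, y_i^*)$ for the reconstruction model, which is not solved here.

```python
import random


def sample_outputs(y, sigma, rng):
    """Draw one y_i^* per sample from a normal distribution centered on y_i."""
    return [rng.gauss(y_i, s_i) for y_i, s_i in zip(y, sigma)]


def draw_training_sets(y, sigma, n_realizations, seed=0):
    """Return n_realizations lists of sampled outputs, one per realization.

    A seeded generator keeps the set of realizations reproducible.
    """
    rng = random.Random(seed)
    return [sample_outputs(y, sigma, rng) for _ in range(n_realizations)]


# Hypothetical sampling values (in metres) and standard deviations;
# the small third value mimics a high-confidence drilling point.
y = [1000.0, 1020.0, 1050.0]
sigma = [60.0, 60.0, 2.0]
training_sets = draw_training_sets(y, sigma, n_realizations=5, seed=42)
```

Because high-confidence samples have a narrow distribution, their sampled outputs barely move between realizations, so every realization stays close to the drilling observations.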

IV. EXPERIMENTS AND RESULTS
To investigate the effectiveness of the algorithms proposed in this article, they are applied to a real fault data set, which consists of 15 fault sticks (containing 799 discrete points) obtained by seismic interpretation, 4 well path points and 2 drilling data points. Because the x-coordinates of the 799 fault points are uncertain, the x-coordinates $x_i$, $i = 1, \cdots, 801$, of these fault points and the drilling data points are taken as the output sampling values. It is assumed that the output of each sample point follows a normal distribution with mean $x_i$, $i = 1, \cdots, 801$. The bound interval length of the fault points is set to 460 m, and the parameters $(\delta, \alpha_i, \beta)$ of the fault points are (85, 0.8, 0.999). Here, the given length of the bound interval and the parameter $\delta$ of the uncertainty interval of the fault points are several times larger than the actual values to provide clear results. In addition, a Gaussian kernel is used in the following experiments.
Fig. 1 shows the initial fault envelopes generated by CCSVIR from only the 799 fault points. The parameters $(C, \sigma, \gamma)$ of CCSVIR are chosen to be (200, 1200, 0). Because there are no high-confidence training data, such as drilling data points, $\phi^T(B_i) u + d$ is not a negative number close to zero, and $\gamma$ is set to zero. The initial envelopes generated by CCSVIR are shown in purple. The fault realization, which is shown in orange, is obtained from the average of the upper and lower envelopes (the fault realizations in both Fig. 2 and Fig. 4 are obtained in this way).
Then, four well path points are added to the model. The parameters $(C, \sigma, \gamma)$ are the same as before. Fig. 2 shows the updated envelopes after the four well path points, shown as red dots, are added. The updated model shows that the well path points can partly reduce the uncertainty.
Finally, two drilling data points are added to study the influence of drilling data on the uncertainty of the geological surface. The parameters of the uncertainty interval of the drilling data points are set to (4.5, 0.98, 0.999). As discussed in Section III-A, the addition of drilling data points may result in an unreasonable situation in which the upper envelope is below or to the left of the lower envelope. To avoid this situation, the parameter $\gamma$ must be set to a small positive number. Fig. 4 shows the updated envelopes after the well path points (red dots) and the drilling data points (black dots) are added, with the parameters $(C, \sigma, \gamma)$ set to (200, 1200, 5.5) and the bound interval length of the drilling data points set to 5.5 m. The results show that the drilling data points are important for reducing the uncertainty. Moreover, the proposed CCSVIR method can effectively characterize the uncertainty of a geological surface and integrate various modeling data.
Fig. 5 shows the multiple fault realizations generated by the proposed uncertain reconstruction model (Eq. (30)) with the envelopes generated in Fig. 4 as the boundary constraints. The parameters $(C, \sigma, \varepsilon)$ of Eq. (30) are chosen to be (200, 1200, 30). As illustrated by the results, the proposed uncertain reconstruction method yields plausible fault realizations that honor all the modeling data and show no bullseye effects near the drilling data points.

V. CONCLUSION
Uncertainty characterization and the uncertain reconstruction of geological surfaces play increasingly important roles in oil and gas exploration because they provide context for risk analysis. Most existing methods are based on probability field simulations and traditional interpolation methods, and their generalization ability is poor. The method proposed in this article employs the structural risk minimization principle, and it surpasses existing methods in terms of generalization capability. The proposed method effectively integrates seismic interpretation data and well data by introducing confidence and boundary constraints. Thus, the generated envelopes and geological surface realizations are constrained by drilling data points and along well paths. Moreover, because of the characteristics of the SVR approach, the resulting uncertainty envelopes and geological surface realizations of the developed methods are not affected by outliers. In the future, more constraints that characterize geological rules, such as length and curvature, can be added to the model to minimize uncertainty and obtain more realistic and reasonable geological surfaces.
The methods proposed in this article provide an effective way of applying machine learning algorithms to the uncertainty characterization and modeling of geological surfaces. However, the proposed model contains multiple types of constraints, which results in a long training time. In the future, CCSVIR and the uncertain reconstruction model can each be decomposed into two smaller QPPs to reduce the training time. In addition, the resulting envelopes and geological surface realizations are affected by the selection of hyperparameters. This article mainly introduces a new method for characterizing and modeling the uncertainty of geological surfaces; thus, the treatment of parameter selection is basic and will be explored further in the future.