A Comprehensive Review of Soft Computing Models for Permeability Prediction

Crude oil is a vital and valuable commodity in the energy industry. To maintain continuous, stable, and reasonably priced supplies, oil producers need cheaper exploration and extraction techniques. Permeability is a formation parameter of key interest to petroleum engineers in determining the economic worth and yield of crude deposits, yet permeability prediction remains a difficult problem. Many approaches have been applied to this problem, and soft computing is among them. In this paper, we present an extensive review of the existing research on applications of soft computing for permeability prediction. The review finds that traditional approaches for permeability prediction are still relevant in the oil and gas industry, and that soft computing methods are a worthwhile addition to them. This review is intended to be an entry point for further exploration of other approaches that have received little or no attention from researchers.


I. INTRODUCTION
Crude oil is central to the world economy: it meets much of the energy demand and is the engine room of most economies. Players in the oil market need cheaper extraction methods and techniques to maintain reasonable prices and the stability of the world economy. Permeability is one of the formation parameters that is pivotal to petroleum engineers because it is essential in determining the economic worth and yield of crude deposits. Permeability is a measure of the ability of a fluid to flow through a material [1]. It is fundamental in reservoir management, in the choice of optimal drainage port, and in perforation. Formation permeability is usually measured from core samples in the laboratory and from well logging, which is often tedious and expensive. Researchers agree that permeability prediction is a complex problem in the oil industry [1]-[3]. The difficult and dynamic nature of the problem has aroused the interest of researchers over the years, and the cost of direct measurement has led them to look for other ways of predicting this vital parameter. One of the methods that has been harnessed is soft computing. The success of soft computing techniques in other application areas has led to their popularity in oil and gas exploration.
The associate editor coordinating the review of this manuscript and approving it for publication was Tawfik Al-Hadhrami.
We classify all the methods for predicting permeability as shown in Fig. 1. In this paper, we are concentrating on soft computing techniques. This study is intended to be the starting point for any researcher that is delving into the application of artificial intelligence techniques in permeability prediction. We conduct a systematic and in-depth analysis of the methods available in the literature by exposing their strengths and weaknesses. The remaining part of this paper is organized as follows. Section II presents an overview of soft computing models. In Section III, the formation studied in literature and data availability are discussed. Section IV presents statistical quality measures that are employed in the literature for the performance evaluation of proposed models. Section V classifies all the works in the literature on the applications of computational intelligence into single and hybrid models while Section VI concludes the paper.

II. OVERVIEW OF SOFT COMPUTING MODELS
Soft computing, artificial intelligence, computational intelligence, and machine learning are used interchangeably in the literature. Although there are subtle differences between some of them, we leave it at that for the sake of readers from diverse research disciplines. For consistency, we use the term soft computing techniques throughout this paper. Soft computing is an aspect of computing that attempts to learn from data and adjust itself with experience where necessary. Several soft computing techniques are available in the literature. Some are nature-inspired, such as the genetic algorithm, cuckoo search, ant colony optimization, and particle swarm optimization; others, such as artificial neural networks, mimic the human neural system; and others are purely mathematical models, such as extreme learning machines and support vector machines. The learning process of these algorithms is usually classified as supervised, unsupervised, semi-supervised, or reinforcement learning. In this section, we provide an overview of the soft computing models and techniques that are most often applied to permeability prediction in the literature.

A. ARTIFICIAL NEURAL NETWORK
An Artificial Neural Network (ANN) is a mathematical model designed to emulate the human brain. It attempts to mimic the way the biological neural system uses neurons and nodes to process information [4]. A neural network is a black-box universal approximator that can approximate difficult and complex functions. Its detailed workings resemble human reasoning, with the ability to learn and adapt from the data provided to it. An ANN has four major components: information is processed by neurons; links connect the neurons; every link between neurons is associated with an adjustable weight; and an activation function is applied to each neuron's input to regulate its output. Structurally, the basic ANN is made up of an input layer, an output layer, and one or more hidden layers between them, as shown in Fig. 2. There are no strict rules on the number of hidden layers; it is usually chosen by trial and error [5]. The operation of an ANN involves training, validation, and testing. Training the network with data means finding an appropriate relationship between the inputs and outputs by adjusting the weights and biases. Whether the relationship is ''reasonable'' depends on a predefined error threshold.
There are many training algorithms for ANN [6]. Backpropagation (BP) trains an ANN by adjusting its weights so that it learns from the data. It is a supervised learning algorithm in which a set of input data is provided together with the corresponding outputs. In every training iteration, it tries to minimize the error by adjusting the weights and biases. There are three training termination criteria: (a) the performance gradient falls below a predefined threshold, (b) a predefined error limit is reached, or (c) the maximum number of iterations is reached [7]. ANN can handle many problems and can process nonlinear problems independently of initial assumptions. However, it has drawbacks: it is restrictive and can get stuck in local minima. The basic neural network is the Feedforward Neural Network (FFNN), which gives the multi-layer perceptron trained with the backpropagation learning algorithm. To address some of the challenges faced by this basic network, many modifications and extensions to ANN have been proposed, such as Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Deep Learning Network (DLN), Radial Basis Neural Network (RBNN), Higher-order Networks, Probabilistic Networks, Fuzzy Neural Networks (FNN), Wavelet Neural Networks (WNN), and Generalized Regression Neural Network (GRNN) [8]. RBNN is similar in structure to FFNN; that is, it has input, hidden, and output layers. However, the hidden layer of RBNN is made up of radial basis functions. Its input bypasses the weight computation at the input layer and goes directly to the hidden layer, where radially symmetric basis functions serve as activations. RBNN has a high learning speed and good local approximation [9]. RNN is structurally more complex than RBNN and FFNN. In RNN, there are intermediate inputs connected directly to the input at the hidden layer, as can be seen in Fig. 3. With GRNN, investigating the structure and iterative training are not necessary, unlike in backpropagation [8].
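The three termination criteria can be illustrated on the smallest possible case. The following is a minimal sketch, not the procedure of any cited work: it trains a single linear neuron by plain gradient descent on the mean squared error rather than a full multi-layer backpropagation network, and the function name, learning rate, and thresholds are illustrative assumptions.

```python
def train_linear_neuron(samples, lr=0.05, grad_tol=1e-6, err_tol=1e-4, max_iter=10_000):
    """Fit y ~ w*x + b by gradient descent; report which stopping rule fired."""
    w, b = 0.0, 0.0
    n = len(samples)
    for it in range(1, max_iter + 1):
        # Forward pass: residuals of the current weight and bias.
        residuals = [(w * x + b) - y for x, y in samples]
        mse = sum(r * r for r in residuals) / n
        # Backward pass: gradient of the MSE with respect to w and b.
        grad_w = 2.0 * sum(r * x for r, (x, _) in zip(residuals, samples)) / n
        grad_b = 2.0 * sum(residuals) / n
        grad_norm = (grad_w ** 2 + grad_b ** 2) ** 0.5
        if grad_norm < grad_tol:      # criterion (a): gradient below threshold
            return w, b, "gradient below threshold", it
        if mse < err_tol:             # criterion (b): error limit reached
            return w, b, "error limit reached", it
        w -= lr * grad_w
        b -= lr * grad_b
    # criterion (c): iteration budget exhausted
    return w, b, "maximum iterations reached", max_iter


if __name__ == "__main__":
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1
    print(train_linear_neuron(data))
```

The same loop structure carries over to multi-layer networks, where the gradient is obtained by propagating the output error backwards through the layers.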

B. ADAPTIVE NEURO-FUZZY INFERENCE SYSTEMS
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is a neuro-fuzzy technique that combines the neural network and the fuzzy inference system. The relational structure and learning ability of artificial neural networks are fused with the decision-making mechanism of fuzzy logic. The neural network is highly adaptable, while fuzzy logic handles the imprecision and uncertainty of the system. The disadvantages of the artificial neural network, namely its black-box nature and indeterminable weights, are eliminated by the fuzzy inference structure of ANFIS. Detailed descriptions of the ANFIS model can be found in [10].

C. CUCKOO SEARCH ALGORITHM
The Cuckoo Search Algorithm (CSA) is inspired by the lifestyle and behavior of a bird species called the cuckoo, in particular its peculiar reproductive behavior of laying its eggs in another bird's nest. A cuckoo can detect a new nest and recognize its own egg, even when it is mixed with the host bird's eggs. Some host birds retaliate by either rejecting the cuckoo eggs or moving to a new location to build a new nest. For simplicity, each egg in a nest represents a solution and each cuckoo egg represents a new one [11]. The cuckoo search procedure is presented below and its flowchart is shown in Fig. 4.
• Each cuckoo lays one egg at a time and puts it in a randomly chosen nest.
• The best nest with a high-quality egg is carried over to the next generation.
• The number of available host nests is constant, and a host has a probability $P_a \in [0, 1]$ of discovering an alien egg.
• Consequently, the host bird can either throw away the cuckoo egg or move to a new location to build a new nest.
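The rules above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not the algorithm of [11] as published: new eggs are generated with a Gaussian random step instead of a Lévy flight, and the function name, parameter names, and default values are hypothetical.

```python
import random


def cuckoo_search(objective, dim, bounds, n_nests=15, pa=0.25,
                  n_iter=200, step=0.1, seed=0):
    """Minimise `objective` with a simplified cuckoo search."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def clip(value):
        return min(max(value, lo), hi)

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(nest) for nest in nests]
    for _ in range(n_iter):
        # A cuckoo lays one egg (a perturbed solution) in a randomly chosen nest.
        source = rng.randrange(n_nests)
        egg = [clip(x + rng.gauss(0.0, step)) for x in nests[source]]
        egg_fit = objective(egg)
        target = rng.randrange(n_nests)
        if egg_fit < fitness[target]:
            nests[target], fitness[target] = egg, egg_fit
        # The best nest is carried over; every other host discovers the alien
        # egg with probability pa and builds a new nest at a random location.
        best = min(range(n_nests), key=fitness.__getitem__)
        for i in range(n_nests):
            if i != best and rng.random() < pa:
                nests[i] = random_nest()
                fitness[i] = objective(nests[i])
    best = min(range(n_nests), key=fitness.__getitem__)
    return nests[best], fitness[best]


if __name__ == "__main__":
    sphere = lambda v: sum(x * x for x in v)  # toy objective with minimum at 0
    print(cuckoo_search(sphere, dim=2, bounds=(-5.0, 5.0)))
```

Because the best nest is never abandoned, the best fitness found cannot get worse from one iteration to the next.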

D. EXTREME LEARNING MACHINE
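Extreme Learning Machine (ELM) is a learning algorithm for a single hidden layer feedforward neural network (SLFN) with $\tilde{N}$ hidden neurons, where $\tilde{N} < N$ and $N$ is the number of training samples. The learning process, as proposed in [12], is described as follows. Given $N$ training samples $(x_j, t_j)$ with input $x_j \in \mathbb{R}^n$ and target $t_j \in \mathbb{R}^m$, a standard SLFN with $\tilde{N}$ hidden neurons and activation function $g(x)$ is defined as:
$$\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \dots, N$$
where $w_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ is the weight vector connecting the $i$th hidden neuron and the input neurons, $\beta_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden neuron and the output neurons, $b_i$ is the threshold of the $i$th hidden neuron, $w_i \cdot x_j$ is the inner product of $w_i$ and $x_j$, and $o_j$ is the network output for the $j$th sample. The standard SLFN with $\tilde{N}$ hidden neurons and activation function $g(x)$ is to approximate these $N$ samples with zero error, that is,
$$\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \dots, N$$
The above equation can be re-written as
$$H\beta = T$$
where
$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}$$
As proposed in [13], $H$ is called the hidden layer output matrix of the neural network; the $i$th column of $H$ is the $i$th hidden neuron's output vector.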
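To make the role of the hidden layer output matrix concrete, the following is a minimal sketch of ELM training for a scalar output. It assumes a sigmoid activation, and it solves $H\beta = T$ in the least-squares sense through lightly regularised normal equations rather than the Moore-Penrose generalised inverse usually associated with ELM. All function names and default values are illustrative.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def hidden_row(x, w, b):
    """One row of H: sigmoid g(w_i . x + b_i) for every hidden neuron i."""
    return [1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w_i, x)) + b_i)))
            for w_i, b_i in zip(w, b)]


def elm_train(xs, ts, n_hidden=5, ridge=1e-6, seed=0):
    """Return (w, b, beta) for inputs xs (list of vectors) and scalar targets ts."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    # Input weights and thresholds are drawn at random and never trained.
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    h = [hidden_row(x, w, b) for x in xs]  # hidden layer output matrix H
    # Normal equations (H^T H + ridge * I) beta = H^T T.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [sum(row[i] * t for row, t in zip(h, ts)) for i in range(n_hidden)]
    return w, b, solve_linear(hth, htt)


def elm_predict(x, model):
    w, b, beta = model
    return sum(bi * hi for bi, hi in zip(beta, hidden_row(x, w, b)))
```

Only the output weights are computed from the data, and in a single linear solve, which is why ELM training is fast compared with iterative backpropagation.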

E. FUNCTIONAL NETWORKS
A Functional Network (FN) is an extension of the neural network that aims at efficient generalization by combining domain knowledge and data. In an FN, no weights are associated with the links; the network selects the best functions by learning from the data and attempts to minimize the MSE. Learning in an FN is both parametric and structural: parametric learning estimates the neural functions, while structural learning concerns the topology of the network [14]. An FN has the following components: an input layer, an output layer, and one or several layers of processing units. In general, an $n$-layer FN model maps the input to the output through the functions $F_1, F_2, \dots, F_n$, where $Y \in \mathbb{R}^m$ represents the output of the system and $X \in \mathbb{R}^n$ represents the input to $F_1, F_2, \dots, F_n$.

F. FUZZY LOGIC
The type I fuzzy set was introduced in [15]. It is used to model uncertain quantities such as temperature and pressure. The membership function $U_x(x)$ takes values in $[0, 1]$ and is chosen based on individual experience, so expert experience is always needed. A simple example of a triangular membership function for a type I fuzzy set is shown in Fig. 6. The membership grade of a type I fuzzy set is crisp. Type I fuzzy sets have been applied in several areas such as data mining, time series modeling, reservoir characterization, and permeability prediction. Despite these successes, there are limits to the level of uncertainty that type I can model. Type II, an extension of type I, addresses this limitation: the membership function $U_x(x)$ in type II is itself fuzzy. A widely used special case of type II fuzzy logic is interval type II, in which the membership grade is an interval. For example, for the Gaussian membership function in Fig. 7, $U_x(5) = [0.7, 1]$. A type II set is bounded above and below by two type I fuzzy sets, called the Upper Membership Function (UMF) and the Lower Membership Function (LMF), and the shaded area between them is called the footprint of uncertainty. The membership function can be constructed from surveys or by using an optimization algorithm [16]. The type II fuzzy logic system is shown in Fig. 8. The fuzzifier maps the crisp input into a fuzzy set. The rules are of the form: $R^l$: if $x_1$ is $f_1^l$ and $x_2$ is $f_2^l$ and $\dots$ $x_p$ is $f_p^l$ then $y$ is $G^l$. Note that not every antecedent or consequent needs to be type II: if at least one antecedent or consequent is type II, the whole rule is treated as type II. The inference engine combines the rules and maps the input type II fuzzy sets to output type II fuzzy sets. The output is then processed by applying the extension principle to type I defuzzification methods, which yields a type I set. This process is called type reduction.
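The difference between a crisp type I grade and an interval type II grade can be sketched as follows. The example assumes a Gaussian primary membership function with a fixed mean and an uncertain width, which is one common way of building the upper and lower membership functions; the function and parameter names are illustrative.

```python
import math


def triangular(x, a, b, c):
    """Type I triangular membership with feet a and c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def gaussian(x, mean, sigma):
    """Type I Gaussian membership grade, a single crisp number in (0, 1]."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def interval_type2_gaussian(x, mean, sigma_lower, sigma_upper):
    """Interval type II grade [LMF(x), UMF(x)], assuming sigma_lower <= sigma_upper.

    The narrower Gaussian is the lower membership function and the wider one
    is the upper membership function; the band between them is the footprint
    of uncertainty.
    """
    return gaussian(x, mean, sigma_lower), gaussian(x, mean, sigma_upper)


if __name__ == "__main__":
    print(triangular(2.5, 0.0, 5.0, 10.0))             # a single crisp grade
    print(interval_type2_gaussian(4.0, 5.0, 1.0, 2.0))  # an interval of grades
```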

G. SUPPORT VECTOR MACHINE
Support Vector Machine (SVM) is a statistical learning model developed in [17]. SVM is a supervised learning model that receives input data and outputs a function that can be used to predict the features of future data [18]. It constructs nonlinear decision functions by training a classifier to perform a linear separation in a high-dimensional space that is nonlinearly related to the input space. SVM has been applied to several problems such as classification, regression, function approximation, text categorization, and pattern recognition. Its ability to avoid getting stuck in local optima and its ease of generalization make it a model of choice for many applications. Given training samples $(x_i, y_i)$, $i = 1, \dots, N$, the SVM model is formulated as a minimization of Vapnik's $\varepsilon$-insensitive loss function with the target value $y$ as:
$$L_\varepsilon(y) = \begin{cases} 0, & |f(x) - y| < \varepsilon \\ |f(x) - y| - \varepsilon, & \text{otherwise} \end{cases}$$
To estimate a linear regression
$$f(x) = w \cdot x + b$$
where $x$ is the input vector, $w$ is the weight, and $b$ is the bias term, the objective is to minimize
$$\frac{1}{2}\|w\|^2$$
To ensure that the margin is maximized and the error is minimized, $C$ is introduced as a trade-off parameter. Considering the set of constraints, the problem can be formulated as an optimization problem as:
$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*)$$
subject to:
$$y_i - w \cdot x_i - b \le \varepsilon + \xi_i \tag{11}$$
$$w \cdot x_i + b - y_i \le \varepsilon + \xi_i^* \tag{12}$$
$$\xi_i,\ \xi_i^* \ge 0$$
According to constraints (11) and (12), a sample whose error is less than $\varepsilon$ does not enter the objective function and does not require a positive $\xi_i$ or $\xi_i^*$ [19]. The SVM kernel function can be a polynomial, a radial basis function, or a sigmoid function.
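The effect of the $\varepsilon$-insensitive loss and of the trade-off parameter $C$ can be checked numerically. The sketch below only evaluates the linear support vector regression objective for given weights; it does not solve the optimization problem. It assumes that each slack variable takes its smallest feasible value, which equals the $\varepsilon$-insensitive loss of the sample. Function names are illustrative.

```python
def epsilon_insensitive(y_true, y_pred, eps):
    """Errors inside the eps-tube cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_objective(w, b, xs, ys, eps, c):
    """0.5 * ||w||^2 + C * sum of slacks for the linear model f(x) = w . x + b."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slack = sum(
        epsilon_insensitive(y, sum(wi * xi for wi, xi in zip(w, x)) + b, eps)
        for x, y in zip(xs, ys)
    )
    return margin_term + c * slack


if __name__ == "__main__":
    xs, ys = [[1.0], [2.0]], [2.0, 5.0]
    # The first sample lies inside the tube; only the second contributes slack.
    print(svr_objective([2.0], 0.0, xs, ys, eps=0.5, c=10.0))
```

A larger $C$ penalizes samples outside the tube more heavily, while a larger $\varepsilon$ lets more samples be ignored.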

H. GENETIC ALGORITHM
Genetic Algorithm (GA) is an evolutionary search algorithm that attempts to mimic Charles Darwin's theory of survival of the fittest in biology. GA is used to solve optimization problems. It uses three standard processes of selection, crossover, and mutation. Since it is a search problem, the optimal solution is always our objective. The algorithm is summarized in Fig. 9. We start off by randomly selecting an initial population that is likely to have the optimal solution. The population evolves with time as an individual and as a group. During selection, the best performing chromosomes are selected and the weaker ones dropped. The process continues iteratively until an optimal solution is reached.
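The selection, crossover, and mutation steps can be sketched on a bit-string problem. This is an illustrative sketch with assumed design choices that the text does not prescribe: binary tournament selection, one-point crossover, bit-flip mutation, and elitism (the best chromosome is copied unchanged into the next generation). Names and default rates are hypothetical.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximise `fitness` over bit strings of length n_bits."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly drawn chromosomes survives.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the best as is
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy "OneMax" problem: the fitness is the number of ones in the string.
    print(genetic_algorithm(sum, n_bits=16))
```

In the permeability studies reviewed here, the chromosome would instead encode quantities such as network weights or model hyperparameters, and the fitness would be a prediction error to be minimized.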

I. GENETIC PROGRAMMING
Genetic Programming (GP) is an example of an evolutionary algorithm inspired by biological evolution. It is used to discover solutions to problems humans do not know how to solve directly. GP implements algorithms that use random mutation, crossover, a fitness function, and multiple generations of evolution to resolve a user-defined task [20]. It carries out the continuous improvement of an initial random population of programs. These improvements are made possible by stochastic variations of programs and selections. Solutions are reached according to predefined criteria for judging an acceptable solution [21].

J. PARTICLE SWARM OPTIMIZATION
Particle Swarm Optimization (PSO) is inspired by the social behavior and movement dynamics of fish, birds, and insects. It is a stochastic search method suitable for continuous-variable problems. PSO has several advantages, such as efficient global search, simple implementation, and very few algorithm parameters. However, one of its drawbacks is a weak local search [22]. The PSO algorithm is described as follows and its flowchart is shown in Fig. 10:
1) Create a population of particles (agents) uniformly distributed over $x$.
2) Evaluate each particle's position according to the objective function.
3) If a particle's current position is better than its previous best position, then update it.
4) Determine the best particle based on its previous position. Update the particle velocity using
$$V_i^{t+1} = V_i^{t} + \phi_1 U_1 \left(Pb_i^{t} - X_i^{t}\right) + \phi_2 U_2 \left(gb^{t} - X_i^{t}\right)$$
where $V_i^{t+1}$ is the velocity, $Pb_i^{t}$ is the best-remembered individual particle position, $gb^{t}$ is the best-remembered swarm position, $X_i^{t+1} = X_i^{t} + V_i^{t+1}$, $\phi_1$ and $\phi_2$ are the cognitive and social parameters, and $U_1$ and $U_2$ are random numbers between 0 and 1.
5) Move to step 2 until the stopping criteria are satisfied.
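The steps of the PSO procedure can be sketched compactly. The sketch follows the velocity update with cognitive and social terms and no inertia weight; the velocity clamp `v_max` and the clipping of positions to the search bounds are added assumptions to keep the swarm stable, and all names and default values are illustrative.

```python
import random


def pso(objective, dim, bounds, n_particles=20, n_iter=100,
        phi1=1.5, phi2=1.5, v_max=1.0, seed=0):
    """Minimise `objective` with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: particles uniformly distributed over the search space.
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    # Step 2: evaluate every particle; remember personal and swarm bests.
    pbest = [x[:] for x in xs]
    pbest_fit = [objective(x) for x in xs]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                u1, u2 = rng.random(), rng.random()
                # Step 4: velocity update from cognitive and social terms.
                v = (vs[i][d]
                     + phi1 * u1 * (pbest[i][d] - xs[i][d])
                     + phi2 * u2 * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-v_max, min(v_max, v))  # clamp (assumption)
                xs[i][d] = max(lo, min(hi, xs[i][d] + vs[i][d]))
            fit = objective(xs[i])
            # Step 3: keep the better of the current and remembered positions.
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = xs[i][:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    sphere = lambda v: sum(x * x for x in v)  # toy objective with minimum at 0
    print(pso(sphere, dim=2, bounds=(-5.0, 5.0)))
```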

III. FORMATION STUDIED AND DATA AVAILABILITY
The reservoir formation data used in the literature are mostly obtained from carbonate reservoirs in the Middle Eastern region, Iran (Mansouri, Sarawak, Kangan, South Pars), and a few formations in other countries, such as the Hassi field in Algeria, the Western Sichuan Basin and the Mesozoic strata of Gaoqing in China, and the Washakie Basin in the USA. Others are the North Sea and the Ula field in Norway and the Venture gas field in offshore Canada. The dataset labeled Middle East, which is used in the highest number of published papers, consists of a total of 356 observations from well logging with core permeability measurements. The statistical descriptions of the predictors and the corresponding core permeability are given in Table 1. Because the predictors are usually in different units, the majority of researchers normalized their data to the range [0, 1], while the permeability output was mostly expressed on a logarithmic scale. In most of the studies, the datasets were divided randomly into 70% for training the models and 30% for testing.
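The preprocessing described above can be sketched as follows. The sketch assumes min-max scaling for the normalization to [0, 1] and a base-10 logarithm for the permeability output; individual studies may have used other variants, and the function names are illustrative.

```python
import math
import random


def min_max_normalize(column):
    """Scale one predictor column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in column]


def log_permeability(perm):
    """Express (strictly positive) permeability values on a log10 scale."""
    return [math.log10(k) for k in perm]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Randomly divide the rows into training and testing subsets."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_train = round(train_fraction * len(rows))
    return [rows[i] for i in idx[:n_train]], [rows[i] for i in idx[n_train:]]
```

In practice the scaling bounds would be computed on the training subset only and then reused for the test subset, so that no information from the test data leaks into training.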

IV. PERFORMANCE EVALUATION METHODS
In this section, we present the statistical quality measures that are employed in the literature for the performance evaluation of proposed models.

1) Correlation Coefficient (r): The correlation coefficient measures the strength of the linear relationship between the actual and predicted values. If there is no linear correlation or a weak linear correlation, r is close to 0. A value near zero means that there is a random, nonlinear relationship between the two variables. Note that r is a dimensionless quantity; that is, it does not depend on the units employed. A perfect correlation of ±1 occurs only when the data points all lie exactly on a straight line. A correlation greater than 0.8 is generally described as strong, whereas a correlation less than 0.5 is generally described as weak. These thresholds can vary with the type of data being examined; for example, a study utilizing medical data may require a stronger correlation than a study using social experiment data.
$$r = \frac{\sum (y_a - \bar{y}_a)(y_p - \bar{y}_p)}{\sqrt{\sum (y_a - \bar{y}_a)^2 \sum (y_p - \bar{y}_p)^2}}$$
where $y_a$ and $y_p$ are the actual and predicted values and $\bar{y}_a$ and $\bar{y}_p$ are the means of the actual and predicted values.

2) Root Mean Square Error (RMSE): RMSE is the standard deviation of the residuals (prediction errors). The deviations between actual and predicted values are called residuals when they are computed over the data sample that was used for estimation, and errors (or prediction errors) when they are computed out-of-sample. Residuals measure how far the data points are from the regression line; RMSE measures how spread out these residuals are. In other words, it indicates how concentrated the data are around the line of best fit.
$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (y_k - \hat{y}_k)^2}$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted values and $N$ is the number of data samples.

3) Mean Absolute Percentage Error (MAPE): The MAPE measures the size of the error in percentage terms. It is calculated as the average of the unsigned percentage errors. Because the actual value is in the denominator, the MAPE is undefined when the actual value is zero. Furthermore, when the actual value is not zero but quite small, the MAPE often takes on extreme values. This scale sensitivity makes the MAPE unsuitable as an error measure for low-volume data.
$$MAPE = \frac{100}{N}\sum_{k=1}^{N} \left|\frac{y_k - \hat{y}_k}{y_k}\right|$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted values and $N$ is the number of data samples.

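4) Mean Absolute Error (MAE): MAE is the average of the absolute differences between the actual and predicted values.
$$MAE = \frac{1}{N}\sum_{k=1}^{N} |y_k - \hat{y}_k|$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted values and $N$ is the number of data samples.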

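5) Mean Square Error (MSE): MSE is the average of the squared differences between the actual and predicted values.
$$MSE = \frac{1}{N}\sum_{k=1}^{N} (y_k - \hat{y}_k)^2$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted values and $N$ is the number of data samples.

6) Average Absolute Percentage Relative Error ($E_a$):
$$E_a = \frac{1}{N}\sum_{k=1}^{N} \left|\frac{y_k - \hat{y}_k}{y_k}\right| \times 100$$
where $y_k$ and $\hat{y}_k$ are the actual and predicted values and $N$ is the number of data samples.

7) Standard Deviation (SD): SD is a measure that is used to quantify the amount of variation or dispersion of a set of data values from the mean. It is computed from the deviations $y - \bar{y}$, where $y$ is the actual value, $\bar{y}$ is the mean value, and $N$ is the number of data samples.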
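The measures in this section can be computed directly from paired actual and predicted values. The sketch below assumes the usual textbook definitions of the correlation coefficient, MSE, RMSE, MAE, and MAPE; the function names are illustrative, and the MAPE function raises an error when an actual value is zero because the measure is undefined there.

```python
import math


def correlation(actual, predicted):
    """Pearson correlation coefficient r between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)
```

Reporting a correlation measure together with at least one error measure is useful because a model can correlate strongly with the core data while still being biased in magnitude.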

V. APPLICATIONS OF SOFT COMPUTING FOR PERMEABILITY PREDICTION
As the quest for effective permeability prediction techniques continues, soft computing techniques have been harnessed because they have proved able to handle complex problems. In this section, we provide a detailed analysis of the works in the literature.

A. SINGLE MODEL APPLICATION
Single model applications are applications that use only one soft computing technique. All techniques have their limitations, and soft computing is no exception. Despite many successes in solving linear, nonlinear, and complex problems such as classification, regression, and prediction, there are drawbacks. For example, one of the most widely used neural networks is the multilayer perceptron. It can easily get stuck in local minima, it is slow to train, and determining its optimal architecture by trial and error is difficult.
In this section, we present works that use only a single model for the prediction of permeability. As stated earlier, there are traditionally many methods for estimating permeability indirectly from rock properties acquired from well log measurements. In [2], it is shown that permeability is correlated to the pore (micro-scale) model, pore characteristics, and statistical (percolation and fractal) techniques. Singh [23] combined the estimates from well logs and core data using buckle methods to estimate permeability. Early work on the application of soft computing techniques to permeability prediction showed that the approach is promising: it is inexpensive, does not interrupt production, and delivers results quickly. Soft computing model applications have expanded over the years. According to [3], the initial well-known models for permeability prediction are ANN, fuzzy logic (FL), and neuro-fuzzy. Each of these models has its strengths and weaknesses. Mohaghegh et al. [24] proposed multivariate regression analysis as a useful tool for permeability correlation. Furthermore, in [24], the authors proposed a virtual measurement using ANN. Because designing the structure of an ANN is difficult, GRNN was used to design the optimal architecture and backpropagation was used for the prediction. GRNN reaches a stable state in a short time but lacks generalization ability, while ANN generalizes better, i.e., it can predict new data reasonably well. Their results show that ANN performed reasonably well in predicting the permeability of well logs it had not been trained on.
Huang et al. [25] used ANN to model the relationship between the spatial position and permeability of six wells in the Venture gas field in offshore Canada. Helle et al. [26] used ANN to predict the porosity and permeability directly from well logs using data from the North Sea formation. Singh [27] used conventional logs as input to an FFNN trained with backpropagation to estimate permeability. Ben-Awuah and Padmanabhan [28] developed an ANN model using facies instead of conventional logs. The authors used porosity as input data to the model and permeability as the target output. The correlation coefficient rose to 99%. In [29], Abdideh expressed the relationship between porosity and permeability with linear regression. The author created an ANN model by dividing the dataset into different well zones. The correlation coefficient of the zones ranges from 0.73 to 0.85. Because of the complex architecture design and slow training encountered with ANN, Saljooghi and Hezarkhani [30] combined wavelet theory and ANN to form a wavelet network (wavenet). The authors applied different wavelets as the activation function of the ANN, although wavelet parameters such as dilation and translation were kept constant. The wavenet outperformed ANN with an $R^2$ of 92 compared to 89. In [31], Mohebbi et al. noted that the distinct features of high pressure and heterogeneity in the Iranian oil field inhibited the performance of other studies using ANN. They modified the approach by zoning the reservoir by geological characteristics before applying ANN. The estimation of permeability from porosity, specific surface area, and irreducible water saturation was proposed in [32], while the authors in [33] examined the relationship between porosity, permeability, and depth. Depth and porosity were passed as inputs to the ANN to predict permeability. The researchers in [34] estimated a log-derived permeability using ANN.
VOLUME 9, 2021
Researchers in [35] opined that permeability prediction is generally a complex problem, and that it becomes more challenging in a tight sand formation with strong heterogeneity. This is the case for the middle Jurassic Shaximiao formation in the Western Sichuan Basin, China, studied by the authors. They investigated porosity using single linear regression (SLR) and permeability using multiple linear regression (MLR), multi-layer perceptron (MLP), and support vector regression (SVR), with porosity and well logs as multiple inputs. Their results indicated that MLR performed better than SLR, while MLP and SVR outperformed both SLR and MLR. This work affirmed the assertion that, although standard statistical models are still invaluable to petroleum engineers, soft computing techniques have improved prediction accuracy. Hamada and Elshafei [36] combined well logs with nuclear magnetic resonance (NMR) to predict gas-sand permeability using ANN. The permeability is derived from the empirical relationship between NMR porosity and the mean value of the T2 time. The model was tested with a combination of data from two wells in a Middle Eastern reservoir formation. The achieved correlation coefficients of 0.978 and 0.961 for training and testing, respectively, are very close to values obtained from core logs. Elkatatny et al. [37] proposed a reduction in the number of inputs to ANN for permeability prediction. They used three logs, namely neutron porosity, bulk density, and resistivity, as inputs to the ANN to predict permeability. In addition, they proposed a term called the mobility index, derived from studying the interrelationship among the logs. The mobility index shows a high correlation with permeability values from the core. Their results were compared to those of ANFIS and SVM.
The researchers in [38] used data from the Mesaverde tight gas sandstone in the Washakie Basin in the USA to predict permeability with the aid of MLP, SVM, and CANFIS (coactive neuro-fuzzy inference system).
In [39], the authors proposed that, in order to overcome the slow training, high computational cost, and tendency to get stuck in local minima associated with MLP, the structure of ANN needs to be modified. According to the authors, the human brain is modular and massively parallel, and these two features allow its modules to work independently. This modularity concept can be applied to MLP to improve its performance. The authors used a modular neural network (MNN) for permeability prediction in an Iranian offshore formation in the Persian Gulf with high heterogeneity. Different inputs were selected for different indicators in the formation. Spectral gamma-ray was used for the shale region, and electrical resistivity and water saturation were considered for the permeable region, while total and secondary porosity were also applied. Different architectures of MNN were examined, and the results show that MNN reduced training time and CPU time as well as improving performance. Bagheripour [40] proposed the formation of a committee neural network for permeability prediction. First, the overlapping of data was removed using principal component analysis (PCA). A committee was then formed with MLP, RBF, and GRNN as members. The output from each member of the committee was fed as input into the committee neural network. The results show that the committee performed better than its individual members. Furthermore, Irani and Nasimi [41] proposed an improvement to ANN by using GA to select optimal values for the weights and biases. GA is used for its global search capability to overcome the weakness of ANN, which usually gets stuck in local minima. The results of the ANN optimized with GA are better than the prediction without weight optimization. Aïfa et al. [42] developed a nonlinear regression by using fuzzy logic to select the best input to the ANN model. Fuzzy logic was used to calibrate the permeability, and prediction models based on linear regression and backpropagation were constructed and compared.
Olatunji et al. [43] applied ELM to build a model by assuming a nonlinear relationship of permeability with other rock properties using Middle Eastern well data. The ELM surpassed ANN and SVM in terms of speed of training, RMSE, and correlation coefficient. An extension of ANN, functional networks [44], was used in [45] for permeability prediction. FN allows neurons to be multivariate and multiargument and to use different learning functions [44]. The authors used the least squares (LS) method to estimate the activation functions for the network. In general, the estimation can be based on least squares, steepest descent, or mini-max. The authors compared their results to those obtained from NN, linear and nonlinear regression, and fuzzy inference systems. FN performed better than ANN and ANFIS.
Alfaouri et al. [15] observed that most of the earlier work on permeability prediction was done on sandy reservoirs. They applied FL to a carbonate reservoir by modifying the defuzzification part of the fuzzy logic system. Fuzzy C-means was used for rock type clustering in [46], and ANN was then applied to verify the result of the model. In [47], the authors proposed the use of FL for permeability prediction. The authors used ANFIS to learn the rules from the data and thereafter applied the least squares method and backpropagation gradient descent to train the fuzzy inference system (FIS). Rules were extracted using grid partitioning. Although grid partitioning is easy to use, it may give rise to rule explosion. Subtractive clustering was used to generate input data clusters. Wang et al. [48] posited that permeability prediction in tight and heterogeneous sand is more challenging. They proposed the idea of feature engineering for the optimization of FL. This involves applying the Student-Newman-Keuls method to the sample before passing it to FL. They compared the results to an ordinary regression model without feature engineering, and the optimized model performed better. In [49], the author maintained that different litho-facies exhibit different features. Also, log porosity may be associated with permeability, but some associations are more likely than others. The author assigned a data bin to each litho-type for FL to learn rules from. Furthermore, FL was compared with the other methods utilized. ANN needs the right conditions and architecture to perform well. Least squares regression (LSR) cannot predict extreme values but FL can, while FL cannot predict any value outside the range of the data points, in which case LSR helps out. Olatunji et al. [50] expressed that the level of uncertainty in real life is higher than what can be handled by type I fuzzy logic (T1FL). The authors proposed the use of type II fuzzy logic (T2FL) to handle the high uncertainties in real-life well data.
Table 2 summarizes the published papers by techniques and data sources. Table 3 summarizes the published papers by comparison to other techniques.

B. MULTI MODELS APPLICATION
This section discusses works that include multiple models. In the literature, some authors refer to the combination of multiple models as a hybrid, while others call it an ensemble. Hybrid computational intelligence is defined as an effective combination of intelligent techniques that performs better than, or competitively with, a single technique [43]. An ensemble, on the other hand, involves learning and integrating multiple models in order to improve the final prediction. Ensemble methods help to reduce the chance of error while increasing the overall reliability and confidence of the model [53]. Different combination approaches are shown in Figs. 11 and 12. Improving soft computing capability through a combination of models has gained popularity lately [43], [54], [55]. In Fig. 11, models are created from the overall dataset, and each model then contributes to the prediction. Several methods exist in the literature for model switching or selection. In Fig. 12, on the other hand, the data is segmented or clustered. A model is then created for each cluster, and the models are combined to give the final prediction.
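The two combination schemes can be contrasted with a small sketch. It assumes simple averaging as the integration rule for the Fig. 11 scheme and a fixed threshold as a stand-in for clustering in the Fig. 12 scheme; the base models (`fit_line`, `fit_nearest`), the function names, and the data are hypothetical and serve only to show the data flow.

```python
def fit_line(xs, ys):
    """Ordinary least squares line; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """1-nearest-neighbour predictor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def global_ensemble(xs, ys, fitters):
    """Fig. 11 style: every model sees all the data; predictions are averaged."""
    models = [fit(xs, ys) for fit in fitters]
    return lambda x: sum(model(x) for model in models) / len(models)


def clusterwise_ensemble(xs, ys, threshold, fitter):
    """Fig. 12 style: segment the data, fit one model per segment, and
    route each query to the model of its segment.
    Both segments are assumed to be non-empty."""
    low = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    high = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    low_model = fitter(*zip(*low))
    high_model = fitter(*zip(*high))
    return lambda x: low_model(x) if x < threshold else high_model(x)


# Hypothetical data with two regimes: y = x below 5, y = 10x - 40 above.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.0, 2.0, 3.0, 4.0, 20.0, 30.0, 40.0, 50.0]

avg_model = global_ensemble(xs, ys, [fit_line, fit_nearest])
cluster_model = clusterwise_ensemble(xs, ys, threshold=5.0, fitter=fit_line)
print(avg_model(3.0), cluster_model(2.5), cluster_model(7.5))
```

When the data contain distinct regimes, as in this constructed example, the per-segment models can follow each regime closely, whereas the global ensemble has to compromise across both.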
In [54], the authors asserted that determining the structure of ANN is a difficult problem and that an optimization algorithm is needed to overcome this weakness. The authors proposed multi-gene genetic programming (MGGP), a variant of GP, to obtain the optimal structure of ANN. GP represents a problem with a tree structure. The work combined the selection ability of GP and the estimation power of regression. There is always a tradeoff between the complexity and the success of the algorithm. The proposed model was applied to porous media and compared with other models based on ANN, ANFIS, and GP only. Similarly, to improve the generalization of ELM, Olatunji et al. [43] introduced T2FL to handle uncertainties. As ELM is fast, generalizes better, and avoids local minima, the combined model was used to predict permeability in a Middle Eastern reservoir. A comparison with results obtained from T2FL, ELM, SVM, and ANN as single models applied to the same data showed that the combined model performed better. In [55], the researchers opined that T2FL is complex and performs poorly with small data. They proposed a hybrid of least squares functional and T2FL, and their model was tested on North American and Middle Eastern reservoirs. The researchers in [56], in contrast, used PSO to select optimal hyperparameters for SVM (PSO-SVM) and applied the model to the same data. The PSO-based SVM outperformed the ordinary SVM. In related research, the authors in [57] posited that FIS performs poorly without optimization. They combined FL, LS-SVM, and GA to form two distinct models, GA-FL and GA-LS-SVM. The authors achieved correlation coefficients of 0.96 and 0.97 for the two models, respectively. Furthermore, Ahmadi et al. [58] stated that, despite the successes of GA as an optimization algorithm, it is computationally expensive in large-scale optimization problems.
The authors proposed a combination of GA and PSO, drawing on the benefits of both, to optimize ANN for permeability prediction. A similar model was proposed in [59], where GA and PSO were used to optimize BP to generate GA-BP and PSO-BP, respectively. The new hybrids combined the local search ability of BP with the global search abilities of GA and PSO for effective permeability prediction.
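The use of PSO as a hyperparameter optimizer, as in the PSO-SVM work discussed above, can be sketched as follows. This is a generic global-best PSO and not the configuration of any cited study. The objective `validation_error` is a hypothetical stand-in for the validation error of a trained model, and the two searched quantities (logarithms of a penalty parameter and a kernel width), the bounds, and the swarm settings are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Global-best particle swarm optimization over box-bounded parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def validation_error(params):
    """Hypothetical stand-in for a model's validation error.
    In practice this would train a model with the given hyperparameters
    and return its error on held-out data."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(validation_error, [(-3.0, 3.0), (-3.0, 3.0)])
print(best_params, best_error)
```

Replacing `validation_error` with a function that trains and scores a real model turns the same loop into a hyperparameter search; each objective call then costs one training run, which is why the swarm size and iteration count matter in practice.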
Anifowose and Abdulraheem [60] proposed two hybrids from SVM, FN, and FL. The data were divided using stratified sampling and processed using the least squares fitting algorithm. The two hybrids are FN-T2FL-SVM and FN-SVM-T2FL; they differ in the pattern of model combination during training and testing. Earlier, in [61], the authors used the same approach as in [60], using FN to select the best predictor variables. Both Anifowose and Abdulraheem [60] and Helmy et al. [61] applied their models to well logs. However, in [62], the authors deviated slightly by integrating five seismic properties and six oil well logs and applying the same hybrids as in [60], [61]. Olatunji et al. [43] combined T2FL and the sensitivity-based linear learning method (SBLLM) [63]. T2FL was used to handle uncertainties, clean the data, and extract rules; thereafter, SBLLM was used for prediction. Table 4 summarizes the published papers using hybrid techniques.
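Stratified sampling, mentioned above as the way the data were divided, keeps the proportion of each group roughly the same in the training and testing sets. The sketch below is one possible reading and not the procedure of the cited work: the choice of stratum (a facies label per sample), the function name `stratified_split`, and the test fraction are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, strata, test_fraction=0.3, seed=0):
    """Split samples into (train, test), drawing the test share per stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, stratum in enumerate(strata):
        groups[stratum].append(index)
    train, test = [], []
    for stratum in sorted(groups):
        indices = groups[stratum]
        rng.shuffle(indices)
        # Take at least one test sample from every stratum.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    # Return the samples in their original order within each subset.
    return [samples[i] for i in sorted(train)], [samples[i] for i in sorted(test)]


# Hypothetical sample identifiers and facies labels.
sample_ids = list(range(10))
facies = ["A"] * 6 + ["B"] * 4
train_ids, test_ids = stratified_split(sample_ids, facies, test_fraction=0.5)
print(train_ids, test_ids)
```

For a continuous target such as permeability, the strata would first have to be formed, for example by binning the values, which is a further assumption beyond what the text states.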

VI. CONCLUSION
This paper presents a comprehensive review of the available research in the literature on the application of soft computing to the permeability prediction problem in the oil and gas industry. The formations studied in the literature were mostly from the Middle East, specifically oil fields in Saudi Arabia and Iran. Others are a few oil fields in Algeria, Abu Dhabi, China, Canada, and Norway. It is observed that there is limited data available for research in this area. This is not unconnected with the fact that oil companies keep their data close to their chest.
Traditional methods of permeability prediction based on empirical models and theoretical equations, as well as models based on porosity and facies, are still useful to petroleum engineers. Moreover, soft computing methods are a worthy addition to the solutions for permeability prediction. In addition, the cost and time-consuming nature of other approaches such as well logging make soft computing methods appealing to oil and gas companies. No approach is without limitations, however, and soft computing is no exception. Traditional methods are used as a baseline against which other approaches are measured.
Early works on soft computing techniques focused mainly on ANN and FL, and later on SVM. Recently, many research efforts have harnessed optimization techniques to complement ANN in weight selection and thereby improve its performance. Furthermore, optimization algorithms have been used to improve the performance of SVM through optimal parameter selection.
Ensembles and hybrid computational techniques have received tremendous attention from researchers. This is evident from the number of recent publications on the use of hybrid soft computing models for the prediction of permeability. Nonetheless, more work needs to be done on the standardization of the methods for ensemble integration.
This review provides a comprehensive account of the application of soft computing to permeability prediction. The work serves as an entry point for researchers wanting to delve into this area and as a reference point for practitioners in oil and gas exploration.

ACKNOWLEDGMENT
The author gratefully acknowledges the support provided by the University of Hafr Al Batin.