iAceS-Deep: Sequence-based identification of Acetyl Serine sites in Proteins using PseAAC and Deep Neural Representations

In biological systems, acetylation is a crucial post-translational modification, prevalent in various physiological functions and in pathological conditions such as carcinomas and malignancies. The first step toward a better understanding of serine acetylation is its efficient identification. Although multiple large-scale in-vivo, ex-vivo, and in-vitro methods have been applied to detect serine acetylation biomarkers, these experimental methods are time-consuming and labor-intensive. This research aims to develop an in-silico solution that supplements wet-lab experiments for efficient detection of serine acetylation sites by combining Chou's Pseudo Amino Acid Composition (PseAAC) with deep neural networks (DNNs). By employing well-known DNNs for feature learning and classification of peptide sequences, our approach removes the need for a separate, costly, and cumbersome feature engineering process. Based on performance evaluation using standard metrics, the CNN- and FCN-based models for acetyl-serine site identification surpassed previously reported predictors, which demonstrates the efficacy of the proposed approach.


I. INTRODUCTION
Acetylation is a reversible type of post-translational modification deemed important due to its effects on the body's metabolism, gene expression, and various disorders, e.g. carcinomas and malignancies [1]. The antagonistic processes of acetylation and de-acetylation have a regulatory effect on the translation of proteins and on various cellular activities, e.g. protein assembly, sequencing of amino acids over the ribosomes, and regulation of cell functions. Effector proteins, molecular chaperones, and cytoskeleton proteins are all regulated by acetylation [2], [3]. Normally, reactions such as methylation, sulfation, and hydration drive the conjugation processes of post-translational modification, but acetylation commonly takes place over the N-terminal (NH3+) group of the target residue of the histone core of the protein. Acetyl CoA usually acts as the acetyl donor for the target residue [4]. Acetyl CoA is a 2-carbon compound and even acts as a linkage between the metabolisms of the major digestive nutrients, i.e. carbohydrates, lipids, and proteins.
It can yield energy by initiating the Krebs cycle in part, and it is produced in excess by the catabolism of fatty acids [5]. Serine acetylation is less targeted in the literature; however, it is of great biological importance due to its participation in human metabolites and drug development. Thus, to study its physico-chemical properties, identification of serine acetylation sites is very important. However, in vitro, ex vivo, and in vivo identification can be laborious, time-consuming, and costly [6], [7]. This fact calls for an efficient and accurate computational model to help researchers and biologists identify these sites easily. Modern deep learning offers a very powerful framework for solving learning problems [8]-[10]. When a Deep Neural Network (DNN) is sufficiently trained on input/output pairs of sequences, the resultant output label is given by the output layer of the model using classifiers such as logistic regression or softmax [11]. DNNs do not require feature engineering prior to classification because a deep model can automatically learn an optimal, low-dimensional feature representation through hierarchical non-linear transformations of the original PseAAC sequences; these abstract, task-specific deep neural representations are then used by the output layer, which is usually composed of a classifier such as logistic regression or softmax [12]. Although we used the simplest DNNs in this study, some advanced deep learning techniques are proposed in [13], [14]. In this study we have adopted Chou's 5-step rule with some amendments. The adopted methodology (shown in Fig 1) takes advantage of DNNs' inherent feature learning capabilities to learn significant features of the constituent Pseudo Amino Acid Compositions (PseAAC) of peptide samples [15].

II. MATERIALS AND METHODS
The adopted methodology in this study, shown in Fig 2, is derived from the 5-step rule of Chou, popular in proteomics research [16]-[18] and used in our previous works [6], [7], [19]-[21]. To identify A-serine sites more comprehensively, we used the intrinsic feature extraction and classification capabilities of DNNs. Multiple models were developed using deep neural networks, trained, and then tested using standard model evaluation metrics to achieve an effective predictor of A-serine sites.

A. COLLECTION OF BENCHMARK DATASET
We utilized the advanced search and annotation features of UniProt [22] to produce a dataset for conducting the proposed study. The benchmark dataset's consistency has been ensured by choosing experimentally verified protein sequences. The non-redundant protein sequences were used for extraction of positive and negative samples. The PseAAC representation of a peptide sequence containing a positive A-serine site may be described as

P(S) = r(-κ) r(-(κ-1)) ... r(-2) r(-1) S r(+1) r(+2) ... r(+(κ-1)) r(+κ)

where 'S' denotes the A-serine PTM site and 'r' represents each of the neighboring amino acid residues of the positive site. The letter κ denotes the indexes of PseAAC sequence residues: the left-hand residues of the A-serine site are located at negative κ indexes, and the right-hand residues at their respective positive κ indexes. To develop the benchmark dataset, samples of length ξ were extracted from the aforementioned non-redundant protein sequences. Based on empirical observations and literature support [7], [19], the length ξ is set at 41 for both negative and positive samples. Each positive sample is created by setting the A-serine site at index κ = 21 and collecting the 20 left and 20 right neighboring residues of the positive site, which results in a standard ξ-length sequence. For sequences shorter than ξ = 41, a dummy residue symbol X is placed on the deficient sequence side(s) to obtain the standard length. A similar approach was utilized to develop the negative samples from experimentally verified proteins, the only difference being the presence of a non-acetylated Serine at sequence index κ = 21 rather than an A-serine site. (A code sketch of this windowing step appears at the end of this subsection.) The above sample preparation procedure provided 2536 positive and 80568 negative sequences of length ξ.

We applied USEARCH [23] with a 70% threshold on the positive and negative data to reduce redundancy. The positive samples were severely reduced to only 387 samples, while the negative samples were reduced to 18402. Considering the very low number of positive samples after removing homologous sequences, we exempted the positive samples from this reduction, and the negative dataset was selected from the redundancy-reduced negative samples to be five times the positive samples, as opted by Ju and Wang [24]. This process resulted in 12681 negative sequences. The final dataset, which consists of a total of 15217 samples, is given by

S = S+ ∪ S−

where S+ represents the positive group, containing 2536 samples, and S− represents the negative group, containing 12681 samples. The class proportions of the two groups were 20% and 80%, respectively, maintaining a 1:5 ratio. The dataset can be accessed at https://mega.nz/folder/44dhTIrb#sSBFuy95CdrYvsnt8-iLoQ.

Authors in [25] have suggested the two-sample logo, created to visualize residues that are substantially depleted or enriched in the collection of A-serine fragments, to help develop an understanding of sequence biases around A-serine sites. As shown in Fig 3, the benchmark dataset two-sample logo comprises forty-one residues, twenty upstream and twenty downstream, from all Serine (A-serine and non-A-serine) sites present in the experimentally validated proteins. The positive group contains 2536 samples consisting of experimentally confirmed A-serine sites, while the negative group contains the remaining non-redundant Serine sites. There were significant differences between the enriched region (containing A-serine sites) and the depleted region (containing non-A-serine sites).
M, H, N, and Y were more frequently observed in the depleted region, while A, G, and S were more regularly noticed in the enriched region. Furthermore, no stacking residues were discovered for positions -20 to -2 in the enriched region. Multiple amino acid residues were discovered stacked at certain over- or under-represented positions in the negative sequences, meaning that there is a substantial difference between the positive and negative samples. The findings show that more task-specific and non-linear features are needed to differentiate between the two groups of samples. Task-specific features are robust to variations that do not help the output layer with classification, while remaining sensitive to variations that are indeed useful for the classification problem at hand. Features learned through linear operations have a relatively limited representational capability compared to non-linear features [26], because linear features are essentially linear combinations of the initial input.
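As a concrete illustration of the sample-preparation step above, the following is a minimal sketch of the 41-residue windowing and X-padding procedure; the example sequence and the helper name `extract_window` are hypothetical, not taken from the paper's code.

```python
# Hypothetical sketch of the windowing step: a 41-residue window centred
# on each serine site, padded with the dummy symbol 'X' at either end.
XI = 41          # standard sample length (xi)
HALF = XI // 2   # 20 residues on each side of the site

def extract_window(sequence: str, site: int) -> str:
    """Return the xi-length sample centred on `site` (0-based index)."""
    left = sequence[max(0, site - HALF):site]
    right = sequence[site + 1:site + 1 + HALF]
    # Pad the deficient side(s) with 'X' to reach the standard length.
    return ('X' * (HALF - len(left)) + left
            + sequence[site]
            + right + 'X' * (HALF - len(right)))

# Toy example: the first serine of a made-up protein sequence.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
sample = extract_window(seq, seq.index("S"))
assert len(sample) == XI and sample[HALF] == "S"
```

Positive samples would be extracted at experimentally confirmed A-serine sites and negative samples at the remaining serine sites, following the procedure described above.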

B. SAMPLE ENCODING
DNNs require input sequences in the form of quantitative data. A simple quantitative encoding of the PseAAC sequences was therefore utilized to minimize the effect of encoding, as shown in Table 1, where the first row lists the IUPAC amino acid symbols and the corresponding integer in the second row defines the encoding used for the sample. A desirable outcome of this encoding technique is its minimal effect on the final outcomes. The benchmark dataset has been divided into a training set of 10651 samples and a testing set of 4566 samples, a 70/30 ratio; both training and testing sets maintain the original class ratio.
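A minimal sketch of this encoding and of the stratified 70/30 split follows; it assumes `samples` holds the 41-mers from the previous subsection and `labels` their class labels, and the integer mapping shown is illustrative since Table 1's exact values are not reproduced in the text.

```python
# Sketch of the quantitative encoding and stratified 70/30 split.
# The exact integer assignment of Table 1 is assumed, not reproduced.
import numpy as np
from sklearn.model_selection import train_test_split

SYMBOLS = "ACDEFGHIKLMNPQRSTVWYX"          # 20 amino acids + dummy 'X'
ENCODING = {aa: i for i, aa in enumerate(SYMBOLS, start=1)}

def encode(sample: str) -> np.ndarray:
    """Map a 41-residue peptide sample to a vector of integers."""
    return np.array([ENCODING[aa] for aa in sample], dtype=np.int32)

X = np.stack([encode(s) for s in samples])  # `samples`: list of 41-mers
y = np.array(labels)                        # 1 = A-serine, 0 = non-A-serine
# stratify=y keeps the original 1:5 class ratio in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```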

C. CANDIDATE DEEP MODEL TRAINING AND OPTIMIZATION
This section describes the DNN architectures and the optimization utilized to develop the candidate A-serine site prediction models. This study employed commonly used neural network architectures: Standard (fully connected) Neural Networks (FCNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) with simple units, Gated Recurrent Units (GRUs), and Long Short-Term Memory (LSTM) units, respectively. For DNN optimization, we applied the randomized hyperparameter search methodology employed in [27] to maximize the effectiveness of the DNN candidate models. A randomized search over a large hyperparameter space yields better hyperparameters for DNNs within a finite number of computations. In this strategy, hyperparameters are randomly sampled, and the models created using these parameters are evaluated. The following subsections present a quick overview of each DNN architecture that is utilized to predict the A-serine sites.
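The search strategy can be sketched as follows; `build_model` is a hypothetical factory that compiles a Keras model from the sampled hyperparameters, and the search space values are illustrative assumptions.

```python
# Sketch of randomized hyperparameter search: sample configurations
# uniformly from a space and keep the best validation score.
from sklearn.model_selection import ParameterSampler

space = {
    "units": [10, 20, 50, 100],
    "dropout": [0.1, 0.2, 0.3, 0.5],
    "learning_rate": [1e-3, 1e-2, 1e-1],
}
best_score, best_params = -1.0, None
for params in ParameterSampler(space, n_iter=20, random_state=0):
    model = build_model(**params)              # hypothetical model factory
    history = model.fit(X_train, y_train, epochs=30, verbose=0,
                        validation_split=0.3)  # 70/30 train/validation
    score = max(history.history["val_accuracy"])
    if score > best_score:                     # keep the best configuration
        best_score, best_params = score, params
```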

1) Standard Neural Network
A standard neural network (FCN) is composed of layers of neurons such that each neuron in one layer is connected to all neurons in the following layer. The FCN aims to estimate a learning function f*, where f* is a classifier described as y = f*(α, x) that uses parameters α to assign the appropriate category label y to input x. The FCN's task is to discover the optimal set of parameters α so that the mapping y = f*(α, x) provides the best possible approximation of f*. To predict A-serine sites, an FCN architecture comprising two dense layers of 20 and 10 rectified linear (relu) neurons, respectively, is used, as shown in Table 2, along with a dropout layer to minimize over-fitting. A single sigmoid neuron serves as the output layer for the binary classification task. The FCN architecture is illustrated in Fig 4. The stochastic gradient descent (SGD) optimizer is used to train the model with a learning rate of 0.01 via minimization of the negative logarithmic loss. For FCN training, the training set was further divided into a training set and a validation set with a 70/30 ratio. It is important to note that the test set, used to evaluate the resulting A-serine site prediction models' generalization capability, was never shown during the training phase to the FCN or the other DNNs. After the model was successfully trained, the evaluation was done using the benchmark test set, and the performance was assessed using well-known metrics.
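A minimal Keras sketch of this FCN follows, assuming the encoded inputs from the sample-encoding subsection; the dropout rate and its position are assumptions, as the text does not state them.

```python
# Sketch of the FCN: two dense layers (20 and 10 relu units), a dropout
# layer, and a single sigmoid output, trained with SGD at lr = 0.01.
from tensorflow import keras
from tensorflow.keras import layers

fcn = keras.Sequential([
    layers.Input(shape=(41,)),          # one encoded 41-residue sample
    layers.Dense(20, activation="relu"),
    layers.Dropout(0.2),                # rate and position are assumptions
    layers.Dense(10, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
fcn.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
            loss="binary_crossentropy",  # negative logarithmic loss
            metrics=["accuracy"])
fcn.fit(X_train, y_train, validation_split=0.3, epochs=50, verbose=0)
```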

2) Recurrent Neural Networks
A shortcoming of traditional DNNs is that weights are learned by individual neurons, which precludes the DNNs from identifying identical patterns that occur at different locations in sequences. An RNN circumvents this restriction by utilizing a loop that repeats over timesteps. A sequence of vectors x_1, ..., x_n is processed using a recurrence of the form

a_t = f(α, a_{t-1}, x_t)

where the learning function is denoted by f, α is the set of parameters applied at each timestep t, and x_t is the input at timestep t. Three variations of recurrent neurons, i.e., a simple RNN unit, a gated recurrent unit (GRU), and an LSTM unit, are used to develop the candidate RNN-based models for the proposed study. The shared architecture of the three RNNs is shown in Fig 5, where the green circles show recurrent cells and the red squares show timesteps, i.e., the residue vectors of the peptide sequence being classified by the model. At each timestep in a simple recurrent neuron, the weights governing the connections from input to hidden layer, between the previous activation a_{t-1} and the current activation a_t, and from hidden layer to output layer are shared. A basic recurrent neuron's forward pass can be expressed as

a_t = g(W_a [a_{t-1}, x_t] + b_a)

where g is an activation function, t is the current timestep, x_t is the input at timestep t, b_a is the bias, W_a holds the cumulative weights, and a_t denotes the activation output at timestep t. If needed, this activation a_t can be employed to compute the forecast y_t at time t. Table 3 shows the RNN structural design with simple RNN neurons. The model uses an embedding layer to map the amino acid sequence into the vector space R^20 and transform semantic relationships into geometric relationships. The following layers of the DNN model interpret these geometric relationships to learn deep feature representations, which are evaluated by the output layer, a single sigmoid unit, to render predictions. Even though DNNs with simple RNN neurons enjoy favorable outcomes in several applications, they remain susceptible to vanishing gradients and demonstrate a limited capability to learn long-term dependencies. The research community has provided several modified recurrent neuron architectures to overcome this drawback. Well-known architectures include the Gated Recurrent Unit (GRU) proposed by [28] and the LSTM presented by [29], both of which address the vanishing-gradient problem and allow long-term dependencies to be learned. Cho et al. [28] presented the GRU, which shows better performance for long-term relationship learning in sequence data. The GRU unit maintains a memory variable H_t, which contains a running summary of the samples seen by the neuron up to timestep t, with the activation given by a_t = H_t. The GRU unit considers overwriting H_t at each timestep t with the candidate value H̃_t, but this overwriting is regulated by the update gate Γ_u. GRU neuron functionality can be represented by the following series of equations:

Γ_u = σ(W_u [H_{t-1}, x_t] + b_u)
Γ_r = σ(W_r [H_{t-1}, x_t] + b_r)
H̃_t = tanh(W_c [Γ_r * H_{t-1}, x_t] + b_c)
H_t = Γ_u * H̃_t + (1 - Γ_u) * H_{t-1}

where W_r, W_c, and W_u represent the respective weights and b_r, b_c, and b_u denote the corresponding bias terms for input x_t at timestep t.
Here σ is the logistic (sigmoid) function, and the activation value at timestep t is given by a_t = H_t. Except for the usage of GRU neurons, the RNN model developed with GRUs is like that of the simple RNNs. Table 4 presents the GRU-based RNN model architecture for A-serine site identification.
As mentioned earlier, Hochreiter et al. [29] proposed the LSTM neuron with some improvements to the design of the simple RNN unit, which provides a more robust generalization of the GRU. The prominent variations between the LSTM and GRU cells are illustrated as follows:

Γ_u = σ(W_u [a_{t-1}, x_t] + b_u)
Γ_f = σ(W_f [a_{t-1}, x_t] + b_f)
Γ_o = σ(W_o [a_{t-1}, x_t] + b_o)
c̃_t = tanh(W_c [a_{t-1}, x_t] + b_c)
c_t = Γ_u * c̃_t + Γ_f * c_{t-1}
a_t = Γ_o * tanh(c_t)

Instead of a single update gate, the LSTM uses separate update (Γ_u), forget (Γ_f), and output (Γ_o) gates, and the output gate maps the memory cell c_t to the activation a_t at time t. The model using the RNN-LSTM approach is constructed with an architecture similar to the GRU and simple RNN models; the only difference is the use of LSTM units in the recurrent layers. Table 5 shows the architecture of the model that used an LSTM-based RNN to build the A-serine site identification model.
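The three recurrent candidates share one template, sketched below in Keras; the recurrent unit count, dropout rate, and optimizer are assumptions not stated in the text.

```python
# Sketch of the shared RNN template: an embedding into R^20, one
# recurrent layer, and a sigmoid output. Swap layers.LSTM for
# layers.GRU or layers.SimpleRNN to obtain the other two candidates.
from tensorflow import keras
from tensorflow.keras import layers

rnn = keras.Sequential([
    layers.Input(shape=(41,)),
    layers.Embedding(input_dim=22, output_dim=20),  # 21 symbols + padding
    layers.LSTM(16),                                # unit count assumed
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])
rnn.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```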

3) Convolutional Neural Network
Convolutional Neural Networks are designed to handle learning problems involving large inputs with complex spatial structure, such as image, video, and speech signals. CNNs try to learn hierarchical filters which can transform large input data to accurate class labels using minimal trainable parameters. This is accomplished by enabling sparse interactions between input data and trainable parameters through parameter sharing to learn equivariant representations (also called feature maps) of the complex and spatially structured input information [30]. In a deep CNN, units in the deeper layers may indirectly interact with a large portion of the input due to the use of pooling operations, which replace the output of the network at a certain location with a summary statistic and allow the network to learn complex features from this compressed representation [30]. The so-called 'top' of the CNN is usually composed of a stack of fully connected layers, including the output layer, which uses the complex features learned by the previous layers to make predictions. The CNN-based architecture of the A-serine site identification approach is shown in Fig 6. Each peptide sample x of length ξ = 41 is translated via the embedding layer into a tensor X ∈ R^(η×ξ), where η = 20 is the dimension of the symbol vector in R^20 assigned to every amino acid residue. Each convolution block consists of two convolution layers and a sub-sampling layer. The convolution layers of the first block consist of 6 1-D convolution units, and those of the second block use 16 convolution units. By averaging the complete feature map, the GlobalAveragePooling layer flattens the output into a one-dimensional array of 16 scalars, which is used by the output layer to predict the labels. A dropout layer, proposed by Srivastava et al. [31], is employed to reduce overfitting during the training phase. The output layer is formed using a single sigmoid unit to enable binary classification. The details of the trainable parameters for the CNN are elaborated in Table 6.
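A hedged Keras sketch of this architecture follows; the kernel sizes, pooling size, padding, dropout rate, and optimizer are assumptions, as the text specifies only the filter counts and the pooling/output structure.

```python
# Sketch of the CNN: embedding into R^20, two 1-D convolution blocks
# (6 then 16 filters), global average pooling to 16 scalars, dropout,
# and a sigmoid output.
from tensorflow import keras
from tensorflow.keras import layers

cnn = keras.Sequential([
    layers.Input(shape=(41,)),
    layers.Embedding(input_dim=22, output_dim=20),
    layers.Conv1D(6, kernel_size=3, activation="relu", padding="same"),
    layers.Conv1D(6, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),   # sub-sampling layer of block 1
    layers.Conv1D(16, kernel_size=3, activation="relu", padding="same"),
    layers.Conv1D(16, kernel_size=3, activation="relu", padding="same"),
    layers.GlobalAveragePooling1D(),    # flattens to 16 scalars
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])
cnn.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```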

III. RESULTS
In this section, the DNN-based models' performance is evaluated using well-known evaluation metrics and discussed in detail. The critical evaluation metrics employed in this study include the receiver operating characteristic (ROC) curve, the precision-recall curve, the area under the curve, accuracy, and the Matthews correlation coefficient, to name a few. The five proposed DNN models are evaluated and tested on the testing data set, which is not exposed to the models throughout the training process, to guarantee a fair estimation of generalization capability. All the above-mentioned metrics stem from the confusion matrix, which is composed of the following measures: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

A. PRECISION-RECALL CURVE AND MEAN AVERAGE PRECISION
When considering the evaluation of identification models, recall and precision are crucial measures. Recall evaluates the classifier's sensitivity to positive samples and is given by the ratio of correct positive predictions to the total positive samples in the test. Precision, in turn, evaluates the relevance of the predicted positive samples and is calculated as the ratio of correct positive predictions to total positive predictions. High precision and recall indicate that the predictions made by the model for the positive class contain a high percentage of true positives (high precision), together with identification of the majority of positive class samples in the dataset (high recall). A precision-recall curve is determined by plotting precision and recall against each other, and it evaluates the proportion of positive identifications that are true positives [32]. In precision-recall space, the closer a predictor's score is to the ideal classifier point (1,1), the better it is, and vice versa. Fig 7 illustrates the precision-recall curves of the candidate DNN-based predictors.
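In terms of the confusion-matrix counts introduced in the previous section, the two measures take their standard form:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```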

B. RECEIVER OPERATING CHARACTERISTICS AND AREA UNDER CURVE
A receiver operating characteristic (ROC) curve is a method for organizing, visualizing, and selecting classification models based on their performance [33]. Additionally, it is insensitive to changes in class distribution and especially useful for problems involving skewed class distributions [33]. The ROC curve illuminates, in a sense, a cost-benefit analysis of the classifier under evaluation. The false positive rate, defined as the ratio of false positives (FP) to total negative samples, measures the fraction of negative examples misclassified as positive; this is considered a cost, as any further action taken on a false positive is a waste of resources. The true positive rate, defined as the fraction of correctly predicted positive samples, can be considered a benefit, since correctly predicted positive samples assist in solving the problem being examined more effectively. The ROC curve is created by plotting the false positive rate against the true positive rate. In ROC space, the point (0,1) represents the perfect classifier because it depicts an FPR of 0 with a TPR of 1. The closer a curve is to this ideal point, the better the performance, and vice versa. The ROC curves for the proposed five DNN-based A-serine predictors built in this study are introduced in Fig 8. It can be seen from the aforementioned figure that the curve of the CNN-based predictor is nearest to the perfect classification point compared to those of the remaining DNN-based models, demonstrating the better performance of the CNN-based model. Additionally, the ROC curve can be summarized as a scalar value using the area under the ROC curve (AUC). The AUC indicates a classifier's capability to differentiate between classes and is employed as an ROC curve summary. AUC reduces the ROC curve to a single value and highlights mathematical insights into the success of the model. AUC is equal to the probability that a randomly chosen positive sample will be ranked higher than a randomly chosen negative instance by the classifier; moreover, AUC is equivalent to the Wilcoxon rank test [33]. The greater the AUC score, the better the model distinguishes the negative and positive samples [34], and vice versa. The five proposed models' AUC values in this analysis are presented in the legend portion of Fig 8.
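As an illustration, a minimal scikit-learn sketch for computing the ROC curve and AUC of one candidate model is given below; `model`, `X_test`, and `y_test` are assumed to come from the earlier sketches.

```python
# Sketch: ROC curve and AUC for one trained candidate model.
from sklearn.metrics import roc_curve, roc_auc_score

probs = model.predict(X_test).ravel()   # sigmoid scores in [0, 1]
fpr, tpr, thresholds = roc_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)      # area under the ROC curve
print(f"AUC = {auc:.3f}")
```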

C. ACCURACY, F1-SCORE, AND MATTHEWS CORRELATION COEFFICIENT
Accuracy is the most widely used metric for models trained using balanced datasets. It indicates the fraction of correctly estimated samples out of the overall number of samples under evaluation. Fig 9 shows the accuracy scores for the A-serine site prediction models, determined from the independent test set. As depicted in Fig 9, out of the five proposed models, the CNN-based model achieved the best accuracy score of 0.856, followed by a 0.83 score for the LSTM-based RNN model. Although accuracy is a popular standard measure, it has its drawbacks; mainly, when there is a class imbalance in the samples, accuracy does not provide a true picture of model performance. For this reason, accuracy is often used along with other measures like the F1-score or the Matthews correlation coefficient. The F1-score is frequently employed in circumstances where an optimal combination of precision and recall is necessary; it is the harmonic mean of the precision and recall scores of a model. The Matthews Correlation Coefficient (MCC) is another point metric and an effective remedy for the class imbalance problems that afflict accuracy and other binary identification model evaluators [34], [35]. Matthews first developed the MCC in 1975 to compare chemical structures [36].
Later, in 2000, Baldi and colleagues promoted the MCC as an assessment metric for binary identification models, one that can be easily extended to multi-class identification models [37]. The MCC is a more robust statistical metric that produces a high score only if the classifier obtains good results on all four confusion matrix measures (true positives, false negatives, true negatives, and false positives), proportionate to both positive and negative class sizes in the test dataset. Fig 9 also illuminates the F1 and MCC scores of the proposed models.
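From the four confusion-matrix counts, the MCC takes its standard form:

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}}
```

The score ranges from -1 (total disagreement) through 0 (random prediction) to +1 (perfect prediction), which is why it remains informative under the 1:5 class ratio of the benchmark dataset.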

D. MODEL DEPLOYMENT AS WEBSERVER
The final step of Chou's 5-step rule, as shown in Fig 1, is the deployment of the developed model as a web service to enable easy access for the research community. To this end, we developed a web application based on our best-performing CNN-based model for identification of A-serine sites. The webserver is deployed on the Streamlit public server at https://share.streamlit.io/sheraz-n/iacesdeep/main/app.py, while the server code is available in the https://github.com/sheraz-n/iAceS-Deep repository. The web application accepts a peptide sample in the form of a string and returns the identified Serine sites likely to become A-serine. The homepage of the iAceS-Deep webserver is shown in Fig 10a, while Fig 10b highlights the peptide sequence submission process for computing A-serine sites.

IV. COMPARATIVE ANALYSIS AND DISCUSSION
A specific method for predicting serine acetylation was not available in the literature, so we compared our results with a few recently proposed predictors of acetylation [38]-[41], shown in Table 7. Chen et al. [38] constructed nine models based on feature selection and optimization for acetylation prediction in different species; their best model was for B. subtilis, with an AUC score of 0.94 and an mAP score of 0.8. Xu et al. [39] used the RankSVM algorithm to develop an acetylation predictor with 73.86% accuracy. Qiu et al. [40] fused PseAAC and functional domain annotations to devise a computational method for acetylation prediction; their model achieved 77.10% accuracy and a 0.84 AUC score. Another acetylation site predictor was devised by Ning et al. [41] using a cascade SVM and two-step feature extraction, which achieved 74.45% accuracy and 0.75 mAP. The comparison shown in Table 7 indicates that the proposed DNN-based predictors compare favorably with these methods. We further assessed the statistical significance of the results by comparing the resulting paired AUCs using Delong's method [42] for non-parametric comparison of two or more ROC curves. We used the fast implementation of Delong's method by Sun et al. [43] to calculate the p-values by comparing each AUC with a random classifier. We also constructed 95% confidence intervals of the AUCs for the DNN-based predictors developed in this study, again by comparing each of them with a random classifier. AUC scores, Delong p-values, and 95% confidence intervals of the AUCs, along with accuracy, are shown in Table 8.

The reader may wonder why the DNNs perform better than previous conventional machine learning approaches. For understanding the deep feature representations of peptide sequences learned by the DNNs to predict A-serine sites, visualizing these feature spaces can provide an intuitive understanding of why the representations work. To create these visualizations, we calculated the output of the penultimate layer of each trained model on the test-set peptide sequences and extrapolated 2-D projections of the same using the t-distributed stochastic neighbor embedding (t-SNE) algorithm developed by Maaten and Hinton [44]. t-SNE uses a non-linear statistical approach to extrapolate 2-D projections of the deep features calculated from non-linearly transformed input peptide sequences. t-SNE has several hyperparameters, including perplexity, initialization, and the number of iterations, for developing the projections in lower dimensions. Since our test set contained only 4566 samples, with a maximum of 41 dimensions for raw sequences and 8 dimensions for deep representations, the recommended range for perplexity is 5-50. We used a perplexity value of 40 with the scikit-learn t-SNE implementation [45], used PCA initialization for efficient dimensionality reduction, and fixed the iterations at 1000 for calculating the 2-D projections of the deep features; a sketch of this projection step follows below. While labeling classes, we followed the binary classification convention of machine/deep learning, in which the negative class is depicted by '0' and the positive class by '1'. The 2-D projections of the deep models were plotted by class label using the matplotlib and seaborn packages of Python. Fig 11a shows the 2-D projections of the raw peptide sequences, while the remaining panels of Fig 11 show the projections of the deep representations learned by each model. Notably, the predictors in Table 7 use different feature extraction and selection techniques that require domain knowledge and human expertise, while the current study learns representations from raw PseAAC sequences and does not require domain expertise or human intervention.
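The projection step just described can be sketched as follows; `feature_extractor` is an assumed Keras sub-model that outputs the penultimate-layer activations of one trained predictor, and `X_test`/`y_test` come from the earlier sketches.

```python
# Sketch: project penultimate-layer activations to 2-D with t-SNE,
# using the settings reported above (perplexity 40, PCA init, 1000 iters).
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

deep_features = feature_extractor.predict(X_test)  # e.g. 8-D representations
proj = TSNE(n_components=2, perplexity=40, init="pca",
            n_iter=1000, random_state=0).fit_transform(deep_features)
# Color by class label: '0' = negative, '1' = positive.
plt.scatter(proj[:, 0], proj[:, 1], c=y_test, cmap="coolwarm", s=5)
plt.title("t-SNE projection of deep features")
plt.show()
```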
In addition, the deep models proposed in this research represent only a first step toward utilizing DNNs; further research can build on the work presented in this study to improve the prediction of acetylation sites of serine or other amino acids.

V. CONCLUSION
In this study we proposed a new predictor for computational identification of serine acetylation sites in proteins by using deep neural networks to learn representations from Chou's Pseudo Amino Acid Composition (PseAAC). In vitro, ex vivo, and in vivo identification of such sites can be tedious, time-consuming, and costly, which calls for supplemental computational methods to reduce the cost of site identification. For this task, we used well-known DNNs to learn feature representations of peptide sequences and perform classification. Among the DNNs used in this study, the convolutional neural network and the standard neural network showed the best performance on the independent test data, and the other performance evaluation measurements corroborated the results. Based on these results and the performance comparison with notable research contributions, we conclude that the proposed predictors will help scientists identify serine acetylation efficiently and accurately and so better understand the mechanism of this protein modification.

He has authored over 20 ISI- and Scopus-indexed journal/conference papers, books, and book chapters. He is the lead editor of an edited book, shortly to be published by Wiley, entitled "A Comprehensive Guide to IPTV Delivery Networks". Besides, he is a member of different professional bodies such as IEEE, IACSIT, IAENG, and the Institute of Research Engineers and Doctors, USA. He is a reviewer for many international impact-factor journals and a technical program committee member for a number of international conferences.
AMGAD MUNEER received his B.Eng. degree (with honors) in Mechatronic Engineering from the Asia Pacific University of Technology and Innovation (APU), Malaysia, in 2018. He is currently pursuing a Master's in Information Technology at Universiti Teknologi PETRONAS, Malaysia. His research interests focus on machine learning, image processing, the Internet of Things, machine vision, robotics, and automation. He has authored several ISI- and Scopus-indexed journal/conference papers. He is a reviewer for several international impact-factor journals, such as the Journal of Combinatorial Optimization, and several IGI Global journals.
RAO FAIZAN ALI received the bachelor's degree in computer science from COMSATS University Islamabad, Pakistan, and the M.Phil. degree in computer science from the University of Management and Technology, Lahore, Pakistan. He has eight years of experience in teaching and research and has held various computer science positions in the financial, consulting, academic, and government sectors. He is pursuing a Ph.D. degree at Universiti Teknologi PETRONAS, Malaysia, where he is currently working as a research officer in the Department of Computer and Information Sciences, Perak, Malaysia.