OPCNN-FAKE: Optimized Convolutional Neural Network for Fake News Detection

Recently, there has been a rapid and wide increase in fake news, defined as provably false information spread with the intent to deceive. The spread of this type of misinformation is a severe danger to social cohesiveness and well-being, since it increases political polarization and people's distrust of their leaders. Fake news is thus a phenomenon with a significant impact on our social lives, particularly in politics. To address this phenomenon, this paper proposes novel approaches based on Machine Learning (ML) and Deep Learning (DL) for fake news detection. The main aim of this paper is to find the optimal model that achieves high accuracy. Therefore, we propose an optimized Convolutional Neural Network model to detect fake news (OPCNN-FAKE). We compare the performance of OPCNN-FAKE with a Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and six regular ML techniques: Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB), using four fake news benchmark datasets. Grid search and hyperopt optimization techniques were used to optimize the parameters of the ML and DL models, respectively. In addition, N-gram and Term Frequency-Inverse Document Frequency (TF-IDF) were used to extract features from the benchmark datasets for the regular ML models, while GloVe word embeddings were used to represent features as a feature matrix for the DL models. To evaluate the performance of OPCNN-FAKE, accuracy, precision, recall, and F1-measure were applied to validate the results. The results show that the OPCNN-FAKE model achieved the best performance on each dataset compared with the other models. Furthermore, OPCNN-FAKE achieved higher cross-validation and testing results than the other models, which indicates that OPCNN-FAKE is significantly better for fake news detection.


I. INTRODUCTION
In recent years, the ability of a user to write anything on online news platforms such as social media and news websites has led to the propagation of misleading information [1]. Online social media platforms (Twitter, Facebook, Instagram, YouTube, etc.) have become the primary source of news for people around the world, particularly in developing nations. Therefore, anyone from anywhere in the world can use popular social media and social networking platforms to publish any statement and spread fake news through various networking sites to achieve various goals, which may be illegitimate.
The associate editor coordinating the review of this manuscript and approving it for publication was Cheng Chin.
We are currently experiencing significant ramifications for society, business, and culture as a result of the increasing use of social media, which have the potential to be both detrimental and beneficial [2].
Fake news is widely regarded as one of the most severe dangers to global commerce, journalism, and democracy, with significant collateral harm. The stock market suffered a $130 billion loss as a result of a false news story claiming that US President Barack Obama had been injured in an explosion [3]. According to statistics published by Stanford University academics, 72.3 percent of fake news originates from official news outlets and online social media platforms [4]. Because of the negative impact of fake news on society, and because fake news is widely regarded as one of the most serious challenges to global commerce, media, and democracy, it is critical to build effective fake news detection systems.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
With the rapid advances in Artificial Intelligence (AI), a significant number of experiments are being undertaken to tackle issues that were never before addressed within computer science, such as fake news detection [5]-[8]. Automatic detection approaches based on Machine Learning (ML) have been studied to combat the emergence and dissemination of false news. The majority of fake news detection systems utilize ML approaches to help consumers filter the content they are seeing and determine whether a given news piece is misleading [5], [9]. The recent accomplishments of Deep Learning (DL) techniques in difficult natural language processing tasks make them viable for detecting fake news effectively and efficiently. Creating automatic, trustworthy, and accurate systems for identifying fake news on social media is a trending topic of research. Fake news detection can be characterized as the process of determining whether a certain news item, from any field or social media domain, is purposefully or inadvertently misleading [6]-[8]. Convolutional Neural Networks (CNNs) have been prominent in many fields with the best performance, including computer vision [10], smart building structures [11], and natural language processing [12]. A CNN uses convolution layers, pooling layers, and fully connected layers to extract both high-level and low-level features. Therefore, we propose an optimized CNN model for fake news detection (OPCNN-FAKE) that extracts high-level and low-level features from the dataset to detect fake news; it registered the best performance compared with the other models.

A. MOTIVATION AND CONTRIBUTION
Fake news affects journalism, global commerce, and democracy, with significant collateral harm. Fake news detection is an area of artificial intelligence that has attracted the curiosity of researchers from all over the world. Unfortunately, regular ML techniques have not provided significant performance for the detection of fake news. On the other hand, DL is more efficient than regular ML at extracting features for fake news detection because of its ability to extract both high-level and low-level features. In this paper, we propose an efficient OPCNN-FAKE model based on an optimized CNN for detecting fake news; a CNN can extract more features by using different layers. The contributions of this study are as follows:
• We propose the OPCNN-FAKE model for detecting fake news; the proposed model uses various layers to extract high-level and low-level features.
• We optimized OPCNN-FAKE by selecting the best values of its parameters in each layer using the hyperopt optimization technique.
• We utilized four benchmark datasets, each divided into an 80% training dataset and a 20% testing dataset.
• We evaluated the performance of the OPCNN-FAKE model based on accuracy, precision, recall, and F1-measure.
• We compared the performance of OPCNN-FAKE with the DT, RF, SVM, NB, LR, KNN, RNN, and LSTM models, and registered the results for cross-validation (training set) and the testing set (unseen data).
The experimental results demonstrated the significant performance of OPCNN-FAKE compared to the other models, showing that it can effectively and efficiently detect fake news with a high level of accuracy.

B. PAPER ORGANIZATION
The remainder of the paper is organized as follows: Section II reviews related works on fake news detection. The proposed OPCNN-FAKE model is presented in Section III. The experimental results, as well as a comparison to the baseline models and a discussion, are presented in Section IV. Finally, Section V provides a summary of the paper.

II. RELATED WORK
This section covers a variety of machine learning algorithms for detecting fake news. Jing et al. [13] proposed a model that builds hidden representations capturing changes in the contextual information of relevant posts over time. They conducted experiments using 5 million posts collected from the Twitter and Sina Weibo microblogs, and compared DT, RF, SVM, LSTM, Gated Recurrent Unit (GRU), and RNN models. On the same dataset, another study developed a hybrid DL model: Ruchansky et al. [14] proposed a model that includes three modules: Capture, Score, and Integrate (CSI). The Capture module uses LSTM and RNN to extract temporal patterns of user activity on a particular article.
The Score module uses a fully connected neural network layer to capture characteristics of users' behavior. Both modules are integrated with the third module to classify articles as fake or not. Shu et al. [15] released the FakeNewsNet dataset and applied different algorithms to it: SVM, LR, NB, and CNN. Salem et al. [16] used the FA-KES dataset, which comprises news events around the Syrian war. There are 804 news articles in the collection, 376 of which are fake. A semi-supervised, fact-checking-based labeling approach was used to annotate the dataset, which can be used to train machine learning models for detecting fake news. Popat et al. [17] introduced DeClarE, an end-to-end neural network model for debunking fake news and fraudulent claims. To support or reject a claim, it uses evidence and counter-evidence gathered from the internet. The authors trained a bi-directional LSTM model with at least four different datasets and achieved an overall classification accuracy of 80%. Ksieniewicz et al. [18] proposed decision tree ensembles diversified using the Random Subspace method to detect fake news.
Singh et al. [19] proposed an attention-based LSTM network that uses tweet text with thirteen different linguistic and user features to distinguish rumor and non-rumor tweets. They compared the attention-based LSTM network with various conventional ML and DL models, and the results showed that it achieved the best performance. Ahmed et al. [20] proposed the ISOT dataset and made a comparison between six machine learning models, applying n-grams with two feature extraction techniques, Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF), to the ISOT dataset. Pérez-Rosas et al. [21] also developed classification models using linguistic features, such as lexical, syntactic, and semantic-level features, with a linear SVM to detect fake and real news. CNNs have been utilized in a variety of computer vision applications in recent years, and they have improved the state-of-the-art performance of a variety of visual classification tasks, such as image processing [22], face verification [23], object recognition [24], and natural language processing tasks [25].
Yang et al. [26] proposed a model using Text and Image information based CNN (TI-CNN). They compared their model with several models such as LSTM, CNN, and GRU using two datasets. Abdullah et al. [27] used CNN and LSTM to classify fake news articles and achieved significant performance. They conducted their experiments using one fake news dataset from Kaggle.
To detect fake news, the authors of [28] proposed a Deep Convolutional Neural Network (FNDNet) to learn discriminatory features for fake news detection. Furthermore, the authors of [29] introduced a hybrid deep learning model that blends CNN and RNN. For the same aim of detecting fake news articles, the authors of [27] presented CNN and LSTM models to categorize fake news, producing significant results. Also, the authors of [30] developed a multi-level CNN, which incorporated local and global convolutional features to collect semantic information from article texts efficiently [31], and the authors of [32] focused on the substance of the news piece and the presence of echo chambers in the social network. Table 1 summarizes the comparison of the existing works and our proposed work.

III. THE PROPOSED SYSTEM
Figure 1 presents the main steps of the proposed system. It consists of many steps: fake news data collection, text preprocessing, dataset splitting, feature extraction, model training/optimization, and model evaluation. There are two approaches in the proposed system: the regular ML approach and the DL approach. In the ML approach, six ML models (DT, LR, KNN, RF, SVM, and NB) are trained and evaluated. Different sizes of n-gram, including uni-gram, bi-gram, tri-gram, and four-gram, with the TF-IDF feature extraction method are used to extract features and build the feature matrix. Grid search with cross-validation is used to optimize the ML models. In the DL approach, the OPCNN-FAKE model is proposed and compared with LSTM and RNN models, which are trained and evaluated on the same data. The hyperopt optimization method is used to optimize OPCNN-FAKE, RNN, and LSTM, and word embedding is used to build the feature matrix. Each step is described in detail as follows.

A. FAKE NEWS DATASET
We trained, optimized, and evaluated the models using four datasets. Each dataset was split into an 80% training dataset and a 20% testing dataset (unseen data). In this section, these datasets are introduced as follows.

1) DATASET1
The Fake News detection dataset was collected from Kaggle [34]. There are 3988 news articles in this dataset. In addition to the body of the text, each article includes a headline and a list of URLs. There is also a class label with the value ''0'' for fake news and ''1'' for real news. Only the article body and headline are used in the models. Of these articles, 1868 are real news, while the remaining 2120 are fake news. The statistics of the training set and testing set for dataset1 are shown in Table 2.

2) FakeNewsNet (DATASET2)
The FakeNewsNet [33] dataset includes data about two topics: gossipcop and politifact. Each topic includes two files. The politifact part contains two files: politifact_real.csv, which includes 432 tweets with samples relevant to real news, and politifact_fake.csv, which contains 618 tweets with samples related to fake news.
The gossipcop part contains two files: gossipcop_real.csv, which includes 5328 tweets with samples relevant to real news, and gossipcop_fake.csv, which contains 5322 tweets with samples related to fake news.
Each file includes an id, URL, title, and tweet. We created a new dataset by merging the four files and adding a new label column with two values: 0 for fake news and 1 for real news. The total number of tweets is 44280. The FakeNewsNet dataset has been split into an 80% training set and a 20% testing set. The statistics of the training set and testing set for dataset2 are shown in Table 3.

3) FA-KES (DATASET3)
The FA-KES [16] dataset includes 804 news articles about the Syrian war, each labeled 0 (fake) or 1 (real). Each article has a headline, date, location, and full body of text. Of these, 426 articles are real and 376 are fake. The statistics of the training set and testing set for dataset3 are shown in Table 4.

4) THE ISOT (DATASET4)
The ISOT dataset [20] consists of 44202 news articles; 21416 of the news articles are real and 22756 are fake. Real news was collected from the Reuters website, and fake news was collected from Wikipedia and from the Politifact website. Each news article consists of a title, text, date, and subject. The dataset includes two files: a fake file and a real file. We created a new dataset by merging the two files and adding a new label column with two values: 0 for fake news and 1 for real news. The statistics of the training set and the testing set for dataset4 are shown in Table 5.
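The merge-and-label step described above can be sketched in pandas. The tiny in-memory frames below stand in for the two ISOT files (a real run would load them with pd.read_csv; the exact file names are an assumption not stated in the paper):

```python
import pandas as pd

# Tiny stand-ins for the ISOT fake and real files; a real run would use
# pd.read_csv on the two shipped CSVs.
fake = pd.DataFrame({"title": ["f1", "f2"], "text": ["fake a", "fake b"]})
real = pd.DataFrame({"title": ["r1"], "text": ["real a"]})

fake["label"] = 0  # 0 = fake news
real["label"] = 1  # 1 = real news

# Merge the two files into one labeled dataset.
dataset = pd.concat([fake, real], ignore_index=True)
print(dataset.shape)              # (3, 3)
print(dataset["label"].tolist())  # [0, 0, 1]
```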

B. DATA PREPROCESSING
Data preprocessing is a critical step in natural language processing tasks such as fake news detection, as it directly impacts the model's effectiveness given the complexity of the data. Fake news datasets contain many links, hashtags, special symbols, etc. Therefore, we applied several preprocessing steps to each dataset, as follows.
• Lower casing: Lowercasing is one of the simplest and most effective forms of text preprocessing; it ensures consistency within the feature set and reduces sparsity.
• Removal of URLs: Irrelevant links embedded in the news have been removed.
• Removal of stop words: Stop words are small words that carry little meaning for text mining and are used to structure a language's grammar. These stop words have been filtered out, including articles, conjunctions, prepositions, some pronouns, and common terms such as the, a, an, about, by, from, and to.
• Stemming: The stemming step reduces words to their root form. For example, the words ''Walking'', ''Walked'', and ''Walker'' are reduced to the word ''walk''.
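The steps above can be sketched as a single function. This is a minimal illustration: the stop-word set is a small subset and the suffix-stripping stemmer is deliberately naive; a real pipeline would use NLTK's full stop-word list and PorterStemmer.

```python
import re

# Small illustrative stop-word list; a full pipeline would use NLTK's list.
STOP_WORDS = {"the", "a", "an", "about", "by", "from", "to", "and", "is", "in"}

def preprocess(text: str) -> str:
    text = text.lower()                                # lower casing
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    tokens = re.findall(r"[a-z]+", text)               # keep alphabetic tokens
    tokens = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
    # Naive suffix-stripping stemmer for illustration only;
    # a real pipeline would use NLTK's PorterStemmer.
    stemmed = []
    for t in tokens:
        for suffix in ("ing", "ed", "er", "s"):
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return " ".join(stemmed)

print(preprocess("Walking to the store: see https://example.com for FAKE news"))
# → walk store see for fake new
```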

C. DATA SPLITTING
Using a stratified technique, each dataset is divided into an 80% training dataset and a 20% testing dataset (unseen data). The training dataset is used to optimize and train the machine learning and deep learning models, while the unseen dataset is used to evaluate them.
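The stratified 80/20 split above maps directly onto scikit-learn's train_test_split with the stratify argument, sketched here on synthetic labels (the data is a placeholder, not the paper's):

```python
from sklearn.model_selection import train_test_split

texts = ["news %d" % i for i in range(100)]
labels = [0] * 60 + [1] * 40   # imbalanced: 60 fake, 40 real

# stratify=labels keeps the 60/40 class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.20, stratify=labels, random_state=42)

print(len(X_train), len(X_test))         # 80 20
print(y_test.count(0), y_test.count(1))  # 12 8
```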

D. FEATURE EXTRACTION METHODS
N-gram with TF-IDF is used to extract features for the ML models and build the feature matrix. To describe the context of the text, we employed several sizes of the N-gram approach, ranging from n = 1 to n = 4 (i.e., uni-gram, bi-gram, tri-gram, and four-gram). TF-IDF assigns a weight to each word representing the importance of the word in the document and corpus. Word embedding is a technique for converting text data (words) into vectors. Every word is represented as an n-dimensional dense vector, with similar words having comparable vectors. We used GloVe [35] word embeddings to build the embedding matrix. GloVe is an unsupervised learning technique that generates word vector representations. The resulting representations highlight intriguing linear substructures of the word vector space, and are trained using aggregated global word-word co-occurrence statistics from a corpus. We utilized glove.6B.zip, which contains vectors in four different dimensions: 50d, 100d, 200d, and 300d. The embedding matrix was constructed using the 200d vectors.

E. THE PROPOSED MODEL (OPCNN-FAKE)
In this section, we describe the architecture of the proposed OPCNN-FAKE model, shown in Figure 2, that is used to detect fake news. We also describe the optimization method used to select the best values for OPCNN-FAKE's parameters. OPCNN-FAKE consists of six layers: an embedding layer, a dropout layer, a convolutional layer, a pooling layer, a flatten layer, and an output layer.
• In the embedding layer, each news item is embedded at the word level and represented as a matrix in which each row corresponds to a word. The layer is implemented in the Keras library [36] and has three arguments: input-dim, the vocabulary size of the dataset; output-dim, the dimensionality of the vector space in which words are embedded; and input-length, the length of the input sequences. We configured output-dim as 200 (because the GloVe vectors are 200d), input-dim as 20000, and input-length as 32.
• The dropout layer is an efficient regularization technique that prevents overfitting and reduces the complexity of the model [37]. It receives the output of the embedding layer. The dropout value was selected by the optimization method from the range 0.1 to 0.9.
• The convolutional layer receives the output of the dropout layer. It applies convolution filters (kernels) to the input word matrix to produce feature maps indicating valuable patterns in the input data. Each filter employs the Rectified Linear Unit (ReLU) activation function [38] to identify multiple features in the news. ReLU removes negative values from an activation map by setting them to zero. Its most significant benefit is the non-saturation of the gradient, which considerably accelerates the convergence of stochastic gradient descent compared to other activation functions [38]. Furthermore, it addresses the vanishing gradient problem and is more computationally efficient than the sigmoid or tanh activation functions.
• The pooling layer uses the max operation to reduce the features in the feature map. Choosing the highest value captures the most significant features while reducing the amount of computation required in the next layer.
• The flatten layer converts the feature maps into a 1-dimensional array for input to the next layer.
• The output layer receives the flatten layer's output and produces the model's final output, in which the neural network identifies the news as real or fake. It has one neuron that determines whether the news is fake or not. The ADAM optimizer [39] was used for training, and the activation function of this layer is sigmoid [40].
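The six-layer stack above can be sketched in Keras. The filter count, kernel size, pool size, and dropout rate shown are placeholders (the paper selects them with hyperopt); only the embedding settings (20000 vocabulary, 200d vectors, length 32) come from the text:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Embedding, Dropout, Conv1D,
                                     MaxPooling1D, Flatten, Dense)

inputs = Input(shape=(32,))                            # input-length = 32
x = Embedding(input_dim=20000, output_dim=200)(inputs) # GloVe-sized vectors
x = Dropout(0.5)(x)                                    # placeholder rate
x = Conv1D(filters=128, kernel_size=3, activation="relu")(x)  # placeholders
x = MaxPooling1D(pool_size=2)(x)
x = Flatten()(x)
outputs = Dense(1, activation="sigmoid")(x)            # 1 neuron: fake vs. real

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 1)
```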
As for the optimization method, a crucial aspect of DL solutions is the selection of hyper-parameters. The Distributed Asynchronous Hyper-parameter Optimization (hyperopt) [41] technique has been used to optimize RNN, LSTM, and OPCNN-FAKE. Hyperopt is designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees. For OPCNN-FAKE, we adapted sets of values for different parameters: filter sizes, kernel size, pool size, dropout, batch size, and epochs. Table 6 presents the values of the parameters adapted for OPCNN-FAKE.

F. RNN AND LSTM MODELS
We used RNN [42] and LSTM [43] models. Figure 3 shows the architecture of the RNN and LSTM models. It consists of five layers: an embedding layer, hidden layers, a dropout layer, a flatten layer, and an output layer.
The embedding layer is the first layer and is similar to the embedding layer in OPCNN-FAKE. In the hidden layers, RNN [13] and LSTM [44] units have been used; for each model, versions with one and two hidden layers have been evaluated. For each hidden layer, the L2 weight regularization technique [45] has been applied by adopting a reg_rate value for L2, and a dropout layer has been used. The next layer is the flatten layer, which converts the output into a single long feature vector. The output layer receives the flatten layer's output and produces the model's final output, in which the neural network identifies the news as real or fake. It has one neuron that determines whether the news is fake or not. The ADAM optimizer was used for training, and the activation function is sigmoid.
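The one-hidden-layer LSTM variant described above can be sketched in Keras as follows; the unit count, reg_rate, and dropout values are placeholders (hyperopt selects the real ones):

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Flatten, Dense
from tensorflow.keras.regularizers import l2

inputs = Input(shape=(32,))
x = Embedding(input_dim=20000, output_dim=200)(inputs)
# One LSTM hidden layer with L2 weight regularization (reg_rate placeholder).
x = LSTM(64, return_sequences=True, kernel_regularizer=l2(1e-4))(x)
x = Dropout(0.3)(x)               # per-hidden-layer dropout (placeholder)
x = Flatten()(x)                  # single long feature vector
outputs = Dense(1, activation="sigmoid")(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 1)
```

The RNN variant swaps the LSTM layer for a SimpleRNN layer with the same surrounding structure.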
To optimize the RNN and LSTM models, the hyperopt optimization technique is used. We adapted sets of values for different parameters in RNN and LSTM: the number of neurons, dropout, reg_rate, batch size, and epochs. Table 7 presents the values of the parameters adapted for RNN and LSTM.
To optimize the ML models, there are many ways to tune hyper-parameters, including grid search, random search, Bayesian optimization, hyperband optimization, gradient-based optimization, and metaheuristic optimization. Each method has its advantages and disadvantages. For example, the hyper-parameter search space is neither convex nor differentiable, so reaching the global optimum cannot be guaranteed. On the other hand, grid search performs an exhaustive search of the hyper-parameter search space. This allows grid search to reach the best results compared to other techniques, especially when the number of hyper-parameters is not large. As a result, we expected this technique to achieve the best results. Grid search with stratified 10-fold cross-validation was used to select the best value for each parameter of the regular ML models. We define a set of values for each parameter of a model; grid search then tests all value combinations using stratified 10-fold cross-validation and selects the values that achieve the best performance. In k-fold cross-validation, the dataset is split into k equal divisions, with k-1 folds used for training and one fold reserved for testing.
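This grid search with stratified 10-fold cross-validation maps onto scikit-learn's GridSearchCV. The synthetic data and the SVM grid below are illustrative stand-ins; the paper defines a grid per regular ML model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the TF-IDF feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Illustrative grid for one model (SVM); every combination is tried.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid,
                      cv=StratifiedKFold(n_splits=10), scoring="accuracy")
search.fit(X, y)

print(search.best_params_)  # best C / kernel combination found
```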

G. EVALUATING THE MODELS
The accuracy, precision, recall, and F1-score of the models were used to evaluate the models, where TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative:
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (1)
Precision = TP / (TP + FP)  (2)
Recall = TP / (TP + FN)  (3)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (4)
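These four metrics are available directly in scikit-learn; a small worked example on toy predictions (TP = 3, TN = 3, FP = 1, FN = 1, so all four metrics equal 0.75):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Toy predictions: 1 = real news, 0 = fake news.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # TP=3, TN=3, FP=1, FN=1

print(accuracy_score(y_true, y_pred))   # (3+3)/8      = 0.75
print(precision_score(y_true, y_pred))  # 3/(3+1)      = 0.75
print(recall_score(y_true, y_pred))     # 3/(3+1)      = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean = 0.75
```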

IV. EXPERIMENTAL RESULTS AND DISCUSSION
A. EXPERIMENT SETUP
The experiments in this paper were conducted on Google Colab with 25 GB of RAM, Python 3, and a GPU. The OPCNN-FAKE, RNN, and LSTM models were implemented with the Keras library, and the ML models with the scikit-learn package. The hyperopt library and grid search were used to optimize the DL models and ML models, respectively. To initialize the embedding layer, we used the 200-dimensional word vectors pre-trained in the GloVe set. The four benchmark fake news datasets were split into 80% training datasets, used to optimize the models and register cross-validation results, and 20% testing datasets (unseen data), used to evaluate the models and register the testing results. All the experiments were run 10 times separately.

B. RESULTS OF DATASET1
The performance of the cross-validation and testing results for the ML and DL models will be discussed in the two subsections below. Table 10 shows the best parameter values for the OPCNN-FAKE model, and Table 11 presents the best parameter values for RNN and LSTM.

D. RESULTS OF DATASET2 (Fakenewsnet)
The performance of the cross-validation and testing results for the ML and DL models will be discussed in two subsections. Table 12 shows the cross-validation performance and testing results of applying the regular ML models to dataset2. Overall, SVM with uni-gram ranks highest among the regular ML models for the cross-validation results, and NB with uni-gram ranks highest among the regular ML models for the testing results.

E. RESULTS OF DATASET3
The performance of the cross-validation and testing results for the ML and DL models will be discussed in two subsections. Table 16 shows the cross-validation performance and testing results of applying the regular ML models to dataset3. Table 17 shows the cross-validation and testing results of LSTM, RNN, and OPCNN-FAKE for dataset3.

2) The testing results
We can see that OPCNN-FAKE obtained the highest performance (Accuracy = 53.99%, Precision = 53.86%, Recall = 53.91%, and F1-score = 53.99%), while LSTM with two layers obtained the lowest performance (Accuracy = 47.26%, Precision = 47.17%, Recall = 47.32%, and F1-score = 47.26%). Overall, OPCNN-FAKE ranks highest for both the cross-validation and testing results compared with the regular ML models and the DL models (RNN and LSTM). Table 18 shows the best parameter values for the OPCNN-FAKE model, and Table 19 presents the best parameter values for RNN and LSTM.

G. RESULTS OF DATASET4
The performance of the cross-validation and testing results for the ML and DL models will be discussed in two subsections. Table 20 shows the cross-validation performance and testing results of applying the regular ML models to dataset4. Overall, RF with tri-gram ranks highest among the regular ML models for the cross-validation results, and NB with four-gram ranks highest among the regular ML models for the testing results.

1) Cross-validation results
We can see that OPCNN-FAKE obtained the highest performance (Accuracy = 100%, Precision = 100%, Recall = 100%, and F1-score = 100%), while RNN with two layers obtained the lowest performance. Overall, OPCNN-FAKE ranks highest for both the cross-validation and testing results compared with the regular ML models and the DL models (RNN and LSTM). Figure 4 and Figure 5 illustrate the broad picture of the experimental results for the cross-validation and testing performance of the best models, respectively, based on the results acquired in our experiments for dataset1. Overall, compared to the other models, the OPCNN-FAKE model provides the highest cross-validation and testing performance, while NB with uni-gram achieved the worst cross-validation and testing performance. For the cross-validation results, the OPCNN-FAKE model achieved the highest performance (Accuracy = 99.99%, Precision = 100%, Recall = 99.97%, and F1-score = 99.97%); NB with uni-gram registered the lowest performance (Accuracy = 94.49%, Precision = 94.79%, Recall = 94.49%, and F1-score = 94.49%); and RF with tri-gram achieved the second-best performance (Accuracy = 98.75%, Precision = 98.82%, Recall = 98.78%, and F1-score = 98.82%). For the testing results, the OPCNN-FAKE model achieved the highest performance (Accuracy = 97.84%, Precision = 97.86%, Recall = 97.84%, and F1-score = 97.84%); NB with uni-gram registered the lowest performance (Accuracy = 91.57%, Precision = 92.27%, Recall = 91.57%, and F1-score = 91.57%); and RF with tri-gram achieved the second-best performance (Accuracy = 96.97%, Precision = 97.04%, Recall = 96.97%, and F1-score = 96.97%).

4) THE BEST MODELS OF FAKE NEWS DETECTION FOR DATASET4
Briefly, the proposed OPCNN-FAKE model has the highest performance compared to the other models based on accuracy, precision, recall, and F1-score. Furthermore, this indicates that the OPCNN-FAKE model performs significantly better than other existing works based on CNN. For instance, the authors of [27] used a dataset from Kaggle and registered an accuracy of 97.5%; in [28], the performance for CNN was only (Accuracy = 91.50%, Precision = 90.74%, Recall = 92.07%, F1-score = 91.40%) and the performance for FNDNet was (Accuracy = 98.36%, Precision = 99.40%, Recall = 96.88%, F1-score = 98.12%), while OPCNN-FAKE achieved the highest performance for the cross-validation results (Accuracy = 99.99%, Precision = 100%, Recall = 99.97%, and F1-score = 99.97%) and for the testing results (Accuracy = 97.84%, Precision = 97.86%, Recall = 97.84%, and F1-score = 97.84%).

V. CONCLUSION
This paper has introduced a fake news detection system using two approaches, namely regular ML and DL. For DL, we have proposed the OPCNN-FAKE model, which achieved the best performance. The proposed OPCNN-FAKE model consists of six layers: an embedding layer, a dropout layer, a convolutional layer, a pooling layer, a flatten layer, and an output layer. It has been optimized using the hyperopt optimization technique: different values of the parameters for each layer were adapted, and the values that achieved the best performance were selected. Also, n-gram with TF-IDF and word embedding feature extraction methods have been used for the ML and DL models, respectively. We compared OPCNN-FAKE with RNN, LSTM, and the six regular ML techniques (DT, LR, KNN, RF, SVM, and NB) using four fake news benchmark datasets. Each dataset was split into an 80% training dataset and a 20% testing dataset; the training datasets were used to optimize and train the models, while the testing datasets were used to evaluate them. The registered cross-validation and testing results show that the OPCNN-FAKE model achieved the best performance on each dataset compared with the other models. For dataset1, the OPCNN-FAKE model achieved the best testing results (Accuracy = 97.84%, Precision = 97.86%, Recall = 97.84%, and F1-score = 97.84%). For dataset2 (FakeNewsNet), the OPCNN-FAKE model achieved the highest testing results (Accuracy = 95.26%, Precision = 95.28%, Recall = 95.26%, and F1-score = 95.27%). For dataset3, the OPCNN-FAKE model achieved the highest testing results (Accuracy = 53.99%, Precision = 53.86%, Recall = 53.91%, and F1-score = 53.99%). For dataset4, the OPCNN-FAKE model achieved the highest testing results (Accuracy = 99.99%, Precision = 99.99%, Recall = 99.99%, and F1-score = 99.99%).
In the future, we will use our proposed model to detect COVID-19 fake news. We also plan to apply multimodal methods with recently pre-trained embeddings (e.g., ELMo, XLNet) to handle visual information such as video and images. In addition, we may use knowledge-based and fact-based approaches to detect fake news. We will also expand our datasets to include data from additional languages.