Intelligent Question Answering in Restricted Domains Using Deep Learning and Question Pair Matching

With the rapid expansion of the Internet, intelligent question answering for information retrieval has again attracted widespread attention. However, current question answering models mainly target general and common-sense questions in open domains and cannot effectively handle more complex questions in professional domains. This paper proposes an integrated framework for Chinese intelligent question answering in restricted domains. The proposed model fuses a convolutional neural network and a bidirectional long short-term memory network to perform efficient semantic analysis of question pairs and extract more effective textual features. Meanwhile, a coattention mechanism and an attention mechanism are combined to obtain the semantic interaction and feature representation of each question pair, providing complete information for subsequent computation. In addition, we introduce question pair matching to implement Chinese intelligent question answering in a restricted domain. The models were evaluated on the open-source CCKS2018 dataset and on our private, self-built inverted pendulum control question answering (IPC-QA) dataset for an automation control virtual learning environment. Experimental results confirm that the proposed models are efficient, achieving precisions of 0.86042 on CCKS2018 and 0.8031 on IPC-QA.


I. INTRODUCTION
As information on the Internet grows exponentially, people have an increasing number of ways to obtain it; traditional search engines such as Baidu and Google return numerous results of varying quality [1]. Intelligent question answering is applied in websites, online communities, and chatbots to offer more accurate and relevant information. In intelligent question answering, the user types a question in natural language, and the system uses a model built on a background knowledge base to output a concise answer or a list of candidate answers, rather than returning a set of related documents [2]. Current intelligent question answering mainly targets general and common-sense questions in open domains, and the main methods are based on surface pattern matching, syntactic structure comparison, massive data redundancy, logical reasoning over answers, and multi-feature statistical machine learning. These methods suffer from defects such as incomplete matching and insufficient semantic connection, which deep learning can effectively alleviate.
The associate editor coordinating the review of this manuscript and approving it for publication was Luigi De Russis.
Most previous intelligent question answering systems used machine learning to analyze and retrieve texts and applied a knowledge base to answer common questions [3]-[5]. These systems required large amounts of annotated data and could not effectively handle questions with complex semantic structures. Traditional machine learning also represents the semantic relevance between questions and answers poorly. With the resurgence of artificial intelligence, deep learning has become the main research approach in intelligent question answering. Mikolov et al. used a neural network model to obtain a new vector representation called word embedding [6], a low-dimensional, dense, continuous representation that encodes both the semantic and grammatical information of words. Deep neural networks such as the recurrent neural network (RNN) perform outstandingly in uncertainty modeling and can be applied to tasks such as video captioning [7]. On this basis, researchers designed deep neural network models to obtain vector representations of sentences, for example sentence modeling with RNNs and convolutional neural networks (CNNs) [8], [9]. At AAAI 2016, to exploit valuable social information beyond text content, Fang et al. (2016) incorporated a heterogeneous social network into an RNN to import rich user interaction information, effectively alleviating semantic sparsity in intelligent question answering [10]. At a deeper level, the authority of the answers to a given question also plays an important role. In [11], Zhao et al. (2017) formulated community question answering as asymmetric multi-faceted ranking network learning, which offers insight for professional question answering in restricted domains. Nevertheless, most of these models paid little attention to the important information in the texts.
At the same time, the long short-term memory network (LSTM) performs well in image and video tasks because of its sensitivity to temporal information, which inspired the application of LSTM in natural language processing [12]. In recent years, the bidirectional long short-term memory network (BiLSTM) combined with the attention mechanism has attracted wide attention from researchers [13]. A BiLSTM network can construct a better vector representation of an answer based on the input question, but it ignores the influence of the answer on the question representation, which introduces some bias into the results. On the other hand, deep learning has brought remarkable progress to open-domain intelligent question answering. However, questions in a restricted domain are difficult to process because the related knowledge base contains specialized and complex domain-specific corpus data, which is a bottleneck hindering the development of intelligent question answering. Because knowledge in a professional field is highly logical and factual, the standard answer to each question is mostly limited and unique, while the questions corresponding to a given answer have multiple formulations. We therefore model and match questions against questions to resolve this dilemma of one answer corresponding to multiple question formulations in a restricted domain. The main advantages are as follows: (1) Corpus stability. In a restricted domain, where the underlying knowledge base or dataset is vertical and closed, each question corresponds to exactly one answer and the data size is bounded, whereas user-input questions are enormous in number and vary widely in form and expression [14]. By modeling question pairs, an input question is matched with an existing question, and the answer associated with that question is output.
(2) Semantic space. There may be a semantic gap between the encodings of questions and answers. By contrast, when multiple questions correspond to the same answer, their starting points and forms in the learning process are close to each other, so their semantic space is consistent. (3) Online running speed. With question pair matching, questions whose dense vectors are similar can be indexed with tools such as Artificial Intelligence Markup Language (AIML), saving computing resources and increasing running speed.
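The question pair matching idea in (1) to (3) can be illustrated with a minimal Python sketch. It assumes that every existing question has already been encoded as a dense vector and stored together with its standard answer; the function name answer_by_question_matching and the toy vectors are hypothetical, and plain cosine similarity stands in for whatever matching function the model actually uses. The input question is compared with the stored questions and the answer of the closest one is returned, so several formulations of a question can share one answer.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def answer_by_question_matching(query_vec, question_bank):
    """Return (answer, score) of the stored question most similar to the query.

    question_bank is a list of (question_vector, answer) tuples; several
    question vectors (different formulations) may share the same answer.
    """
    best_vec, best_answer = max(question_bank, key=lambda item: cosine(query_vec, item[0]))
    return best_answer, cosine(query_vec, best_vec)


if __name__ == "__main__":
    # Hypothetical 2-d vectors: two formulations share answer "A1".
    bank = [([1.0, 0.0], "A1"), ([0.9, 0.1], "A1"), ([0.0, 1.0], "A2")]
    print(answer_by_question_matching([0.1, 0.95], bank))
```

Only the question bank grows with new formulations; the answer set stays fixed, which is the corpus-stability argument in (1).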
The main contributions of this paper are summarized as follows: (1) An integrated framework for Chinese intelligent question answering in restricted domains is presented, based on a CNN-BiLSTM network, a coattention mechanism, and an attention mechanism. We fuse CNN and BiLSTM to extract and represent important textual and contextual information. The coattention and attention mechanisms are combined to obtain the semantic interaction and feature representation of each question pair, providing complete information for subsequent computation. Additionally, we introduce question pair matching to implement Chinese intelligent question answering in a restricted domain. The proposed method not only achieves high accuracy but also improves resource utilization.
(2) We construct an inverted pendulum control question answering (IPC-QA) dataset for an automation control virtual learning environment to test the proposed method. IPC-QA is a typical restricted-domain teaching question answering dataset containing 9450 question pairs that cover the main inverted pendulum control experiments in the Automatic Control Theory course.
The rest of this paper is organized as follows: Section II briefly reviews related work. Section III presents the proposed framework. Section IV describes in detail the datasets and parameter settings used in the experiments. Section V analyzes and summarizes the experimental results. Section VI concludes the paper and discusses future work.

II. RELATED WORK
The introduction of the Question Answering Track (QA Track) at the Text Retrieval Conference (TREC) in 1999 greatly promoted research and development of natural language processing (NLP) technology for question answering, and numerous researchers have proposed effective models [15]-[18]. Building on this prior work, this section focuses on research methods for Chinese intelligent question answering in restricted domains and on the advantages of question pair matching, CNN and BiLSTM, and the coattention and attention mechanisms reported in the literature.

A. CHINESE RESTRICTED DOMAINS AND QUESTION PAIR MATCHING
In 1961, the first intelligent question answering system for a restricted domain appeared outside China. Built by Green, it answered questions about baseball. In the following decades, restricted-domain intelligent question answering received extensive attention. For example, Pathak and Mishra (2016) followed the principles of information retrieval and natural language processing to propose a tourism question answering system containing information about specific tourist places, with a precision of 70% and an accuracy of 80% [19]. Although Chinese intelligent question answering in restricted domains started later, it has developed continuously. Similar to [19], Sun et al. (2007) [20] focused on the tourism domain and built the InsunTourQA system from five subsystems, achieving question answering in the specific domain of Chinese tourism. Traditional research methods of intelligent question answering are mainly based on syntactic matching between questions and answers. Moreda et al. (2011) [21] presented two proposals for the answer extraction module of a question answering system based on semantic information, semantic roles, and WordNet. Zhu et al. (2011) [22] designed and implemented a Chinese intelligent question answering system based on a domain ontology and sentence templates, using the domain ontology as the knowledge base to supply domain vocabulary for question analysis and answer generation and then performing knowledge retrieval. The work in [23] improved a baseline system by combining simple rules and unsupervised learning models with deep linguistic features to choose yes or no answers for questions in the legal domain. The work in [5] used a vocabulary model based on word relationships to select answer sentences. However, most of these models derive from question-answer matching; they involve heavy data computation and depend excessively on external conditions.
For instance, manual annotation and constantly changing external conditions require too much additional work. Therefore, Hao and Agichtein (2012) [24] proposed an approach that automatically finds an answer to an input question by identifying "equivalent" questions submitted and answered in the past. By automatically generating equivalent question patterns, the model achieved over 57% recall and over 54% precision, but it still did not fully overcome the defects of question-answer matching. Later, Palmera and Figueroa (2017) [25] matched the intention of a newly posted question with the intention of the archived answers presented to the asker. By manually annotating how-to questions and answers, they increased the accuracy of best answer retrieval by 4.12%. Inspired by Palmera and Figueroa [25], this paper uses question pair matching to address the large number and high variability of externally input questions in Chinese restricted professional domains, which markedly improves accuracy.

B. CNN AND BILSTM
Since Krizhevsky et al. [26] used an extended CNN in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 to achieve the best classification performance at the time, convolutional neural networks have again triggered a research boom. As a typical deep learning architecture, a CNN adopts a local receptive field strategy, which greatly reduces the complexity of traditional neural networks [27]. The work in [28] used CNNs for the semantic representation and semantic matching of texts. Zhou et al. (2018) [29] combined CNN with RNN to capture both the semantic matching between questions and answers and the semantic relevance embedded in answer sequences, achieving the best Macro-F1 of 58.77%. A CNN takes word order into account in the semantic representation and matching of text and can capture sentence semantics from multiple aspects. However, it only combines consecutive phrases and considers word order only within the sliding window, so it cannot capture long-distance dependencies and represents semantically complex sentences poorly.
In recent years, the long short-term memory network has been widely used to capture contextual semantic relationships [12], [13]. Palangi et al. (2016) [30] proposed LSTM-RNN, a text matching model based on the long short-term memory network. Cheng and Hu (2018) [14] built a network model using a conditional random field (CRF) and an LSTM network to improve information extraction and establish a question answering system over a knowledge base for the machinery industry. Gao et al. (2019) [31] integrated the attention mechanism with hierarchical LSTMs for video and image captioning. Inspired by these works, we combine CNN and BiLSTM to effectively capture local keyword information in a sentence. At the same time, the model achieves a preliminary analysis and understanding of the relevance between the words of a Chinese sentence.

C. COATTENTION MECHANISM AND ATTENTION MECHANISM
The attention mechanism assigns different weights to content vectors of different importance according to the scenario, which helps extract the key information in a text and yields more effective representations. Researchers have extensively studied the application of the attention mechanism in intelligent question answering. Bahdanau et al. (2014) [32] added an attention mechanism to a bidirectional recurrent neural network model that encodes and decodes sentences in machine translation. Inspired by the work of Bahdanau et al. [32] and    [35] introduced an attention mechanism fused with a bidirectional single-layer LSTM for question and answer matching, constructing a better answer representation according to the input question. The work in [36] studied internal and external attention mechanisms for discourse representation in implicit discourse relation recognition. Subsequently, the coattention mechanism attracted researchers' interest because of its prominent role in modeling semantic relevance [37]. Yang et al. (2019) [38] developed a coattention mechanism within an end-to-end deep network architecture to jointly learn image and question features for the challenging visual question answering (VQA) task. Motivated by Yang et al. [38], this paper combines the coattention and attention mechanisms to jointly learn the features of the input question and the existing question and obtain an interactive vector representation of the question pair. Exploiting the internal semantic connections of the question pair improves the accuracy of the experiments.

III. PROPOSED MODEL FOR QUESTION ANSWERING
Chinese intelligent question answering based on question pair matching is a multi-stage process. Our proposed framework for Chinese intelligent question answering in restricted domains is shown in Fig. 1. After the input corpus is preprocessed by a Chinese word segmentation tool, we encode different expressions of the same target and extract their features. First, a well-organized underlying question answering knowledge base is needed to train the semantic representation and matching model for question pairs. Second, we analyze the semantics and structure of the question pairs to construct the feature vector model; this is the crucial step.
Finally, a matching algorithm calculates the similarity between the input question vector and each existing question vector, and the answer is output.
The model for Chinese sentence semantic feature extraction is shown in Fig. 2. First, the segmented sentences are encoded by a word2vec model trained on the self-built corpus dataset. A convolutional layer then takes the word order and semantic context of the input sentence into account: it extracts the local key information of the sentence and compresses it to a fixed length in preparation for the semantic interaction and representation of the question pair in the subsequent network. Next, a stacked BiLSTM network combined with the coattention mechanism extracts semantic information and further refines the feature vectors. This design not only handles dependencies between preceding and following words in long sentences but also yields correlated feature representations across the question pair. Finally, the output is predicted by the softmax function.
Chinese sentences are complex and diverse in expression, and lexical parsing and encoding in restricted domains pose additional difficulties. Compared with traditional feature extraction models, the network architecture in Fig. 2 is better suited to Chinese semantic understanding: the CNN captures crucial local feature information, while the BiLSTM network analyzes the semantics of the entire sentence.

A. SEMANTIC ANALYSIS WITH THE FUSION STRUCTURE OF CNN AND BILSTM
Words in Chinese expressions are closely related, and sentence semantic analysis and feature extraction involve both the sequential encoding of each word and the assignment of weights among words. Therefore, inspired by related research [39]-[41], this paper uses a fusion structure of CNN and BiLSTM in the question feature extraction framework to compute the relative importance of words and to map shallow local features of the sentence to deeper representations.
VOLUME 8, 2020
In the CNN layer, the input is turned into a preliminary parsed vector of the text mainly through two operations: convolution and pooling. The internal structure used in the experiments is illustrated in Fig. 3.
The input of the CNN is the n × k word matrix generated by the pre-trained word2vec model, where n is the number of words in the sentence (i.e., the maximum sentence length) and k is the dimension of each word vector. In this paper, the word2vec model is trained with dimension k = 300. Let $x_i \in \mathbb{R}^k$ be the k-dimensional word vector of the i-th word of the sentence, where out-of-vocabulary (unregistered) words are filled with zero vectors. A sentence of length n is then represented as

$$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n$$

where ⊕ is the concatenation operator and $x_{a:b}$ denotes the word vector matrix composed of $\{x_a, x_{a+1}, \ldots, x_b\}$.
When a convolutional neural network processes natural language text, a convolution kernel usually covers several consecutive words (rows of the word matrix) and spans the full word-vector dimension, so in text the kernel slides only along the sentence length, not across its width. We set a convolution kernel window $W \in \mathbb{R}^{h \times k}$ of fixed length h, where h × k is the convolution window size, h is the number of words in the sliding window, and k is the word-vector dimension, and convolve it with the words $x_{i:i+h-1}$ of the sentence. The feature produced by the kernel at the i-th word is

$$c_i = \sigma\left(W \cdot x_{i:i+h-1} + b\right)$$

where $c_i$ is the feature output for word i after convolution, σ is the activation function, $x_{i:i+h-1}$ is the matrix composed of the word vectors from i to i + h − 1, b is the bias, and W is the weight matrix of the convolutional layer, i.e., the sliding window. Sliding a kernel of length h over the entire sentence of length n yields n − h + 1 features, which form the one-dimensional feature map of the sentence:

$$\mathbf{c} = [c_1, c_2, \ldots, c_{n-h+1}]$$

Pooling downsamples the output of the convolutional layer, reducing the amount of data to be processed while extracting and retaining the important semantic features of the sentence. In this paper, we use 1-max pooling to extract the maximum feature value of each feature map after convolution and combine these values as the input of the BiLSTM network:

$$\hat{c} = \max[c_1, c_2, \ldots, c_{n-h+1}]$$

With 1-max pooling, each feature map outputs only its maximum value, so every feature map produced by the convolutional layer is reduced to a single value.
Consequently, the combined output vector C obtained after pooling has a fixed size equal to the number of feature maps output by the convolutional layer. Through convolution and pooling, the sentence matrix undergoes an initial semantic analysis and feature extraction that captures locally important features of the sentence and effectively reduces the number of training parameters.
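The convolution and 1-max pooling steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: σ is assumed to be ReLU, the kernels are random, the sizes are toy values, and the helper names are hypothetical. Each kernel W of size h × k slides down the n × k word matrix to give n − h + 1 features, and 1-max pooling keeps one value per kernel, so the pooled vector has one entry per feature map.

```python
import random


def relu(z):
    return max(0.0, z)


def conv_feature_map(X, W, b, activation=relu):
    """Slide an h x k kernel W down the n x k word matrix X.

    Returns the 1-D feature map [c_1, ..., c_{n-h+1}] where
    c_i = activation(sum of the element-wise product W * X[i:i+h] + b).
    """
    n, h = len(X), len(W)
    feature_map = []
    for i in range(n - h + 1):
        window = X[i:i + h]  # rows x_i .. x_{i+h-1}
        s = sum(w * x
                for w_row, x_row in zip(W, window)
                for w, x in zip(w_row, x_row))
        feature_map.append(activation(s + b))
    return feature_map


def conv_1max(X, kernels):
    """Apply every (W, b) kernel and keep the maximum of each feature map."""
    return [max(conv_feature_map(X, W, b)) for W, b in kernels]


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, h, num_kernels = 35, 8, 3, 4  # toy sizes; the paper uses k = 300
    X = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    kernels = [([[rng.uniform(-1, 1) for _ in range(k)] for _ in range(h)], 0.0)
               for _ in range(num_kernels)]
    C = conv_1max(X, kernels)
    print(len(conv_feature_map(X, *kernels[0])), len(C))  # n - h + 1 features, one value per kernel
```

The pooled length depends only on the number of kernels, not on n, which is why the output of this stage has a fixed size.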
After the CNN extracts structured, representative semantic information from the sentences of the question pair $Q = (q_1, q_2, \ldots, q_n)$ and $Q' = (q'_1, q'_2, \ldots, q'_m)$, the outputs are $C_Q = \max[c_1, c_2, \ldots, c_{n-h+1}]$ and $C_{Q'} = \max[c'_1, c'_2, \ldots, c'_{m-h+1}]$, where n and m are the numbers of words in questions Q and Q' (i.e., the sentence lengths) and h is the convolution window size. Next, $C_Q$ and $C_{Q'}$ are passed to the BiLSTM network to extract the interdependencies and influences of distant words in the sentence.
A unidirectional LSTM captures only the semantic information preceding each word in the sentence and cannot capture the information that follows it. To overcome this drawback, Schuster and Paliwal [42] designed a bidirectional structure with two hidden layers running in opposite directions. The forward LSTM traverses the sentence from beginning to end and the backward LSTM from end to beginning, yielding, for the sentence sequence $x = (x_1, x_2, \ldots, x_n)$, the forward hidden output sequence $\overrightarrow{h} = (\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_n)$ and the backward hidden output sequence $\overleftarrow{h} = (\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_n)$. The encoded output $y_t$ of the bidirectional LSTM hidden layers is obtained by concatenating the forward and backward outputs:

$$y_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t$$

In both directions, the LSTM cells compute their three gates (input, forget, and output) from weight matrices W and bias terms b.
Related experiments [43]-[46] show that appropriately stacking networks can effectively improve a model's classification and regression ability. Therefore, this paper constructs a stacked BiLSTM network based on the LSTM cell to fully achieve semantic parsing of a single sentence. The output $y_t$ of each BiLSTM layer serves as the input of the next layer. The stacked BiLSTM network structure is shown in Fig. 4. The state output of each word in the sentence at layer l is

$$y_t^{(l)} = \mathrm{BiLSTM}^{(l)}\big(y_t^{(l-1)}\big)$$

where $y_t^{(0)}$ is the input vector at step t. Let $p_t$ and $p'_t$ denote the t-th vectors of the question pair sequences $C_Q$ and $C_{Q'}$, respectively. The two sequences output by the CNN are fed separately into the stacked BiLSTM network, which yields their state matrices:

$$H_Q = \mathrm{BiLSTM}(p_t), \qquad H_{Q'} = \mathrm{BiLSTM}(p'_t)$$

where d is the dimension of the hidden-layer output states that form the columns of $H_Q$ and $H_{Q'}$.
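A compact sketch of a stacked BiLSTM follows, written in dependency-free Python for readability rather than speed. It assumes a standard LSTM cell (input, forget, and output gates plus a candidate cell state, all acting on the concatenation [h_{t−1}, x_t]) with randomly initialized weights; the layer sizes and class names are hypothetical and are not taken from the paper. The backward LSTM reads the reversed sequence and its outputs are re-aligned with step t before concatenation, and each layer consumes the 2 × hidden outputs of the layer below.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LSTMCell:
    """Standard LSTM cell; every gate acts on the concatenation [h_{t-1}, x_t]."""

    GATES = ("i", "f", "o", "c")  # input, forget, output gates and candidate state

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = hidden_dim + input_dim
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}

    def run(self, xs):
        """Process the sequence in the given order and return every h_t."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = h + x  # list concatenation [h_{t-1}, x_t]
            pre = {g: [s + bias for s, bias in zip(matvec(self.W[g], z), self.b[g])]
                   for g in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            c_tilde = [math.tanh(v) for v in pre["c"]]
            c = [fv * cv + iv * ct for fv, cv, iv, ct in zip(f, c, i, c_tilde)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


class StackedBiLSTM:
    """Stack of bidirectional layers; layer l consumes the outputs y_t of layer l-1."""

    def __init__(self, input_dim, hidden_dim, num_layers, rng):
        self.layers = []
        dim = input_dim
        for _ in range(num_layers):
            self.layers.append((LSTMCell(dim, hidden_dim, rng), LSTMCell(dim, hidden_dim, rng)))
            dim = 2 * hidden_dim  # the next layer sees [forward ; backward]

    def __call__(self, xs):
        for fwd, bwd in self.layers:
            forward = fwd.run(xs)               # reads x_1 ... x_n
            backward = bwd.run(xs[::-1])[::-1]  # reads x_n ... x_1, re-aligned to t
            xs = [hf + hb for hf, hb in zip(forward, backward)]  # y_t = forward_t (+) backward_t
        return xs


if __name__ == "__main__":
    rng = random.Random(0)
    model = StackedBiLSTM(input_dim=6, hidden_dim=4, num_layers=2, rng=rng)
    sequence = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(10)]
    H = model(sequence)
    print(len(H), len(H[0]))  # one 2*hidden_dim state per time step
```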

B. SEMANTIC INTERACTION AND FEATURE REPRESENTATION USING COATTENTION AND ATTENTION MECHANISMS
In this paper, we design an affinity (incidence) matrix in the coattention mechanism to capture the correlation and interaction between vectors, and use the softmax function to map the outputs of multiple neurons to a normalized range. The state matrices $H_Q$ and $H_{Q'}$ produced by the stacked BiLSTM network are taken as input; the internal structure of the coattention mechanism is shown in Fig. 5. We multiply the state matrices $H_Q$ and $H_{Q'}$ to obtain the correlation matrix L, each entry of which is the correlation score between a word of Q and a word of Q', so that L reflects the interaction between the two sentences:

$$L = H_Q^{\top} H_{Q'}$$

The softmax function normalizes vector elements and performs well on multi-class classification and probability distribution problems. Applying softmax to L yields the attention weights over the hidden states of the question pair, and the interacted feature outputs of the question pair are computed from these weights. During computation and propagation through the stacked BiLSTM network and the coattention mechanism, some information in the question pair sentences may become mismatched. To address this, an attention layer is added after the coattention mechanism to connect and integrate the sentence vector information from the preceding steps.
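Because the exact softmax normalization of the correlation matrix is not spelled out in the text, the following sketch shows one common coattention formulation as an assumption: L is built from dot products between hidden states, a row-wise softmax lets each word of Q attend over the words of Q', and a column-wise softmax lets each word of Q' attend over the words of Q. Hidden states are plain lists of length d, and the function name coattention is hypothetical.

```python
import math


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def coattention(HQ, HQp):
    """HQ: n hidden states of Q; HQp: m hidden states of Q' (each of length d).

    Returns (L, CQ, CQp) where
      L[i][j] = <h_i, h'_j>                        affinity of word i of Q and word j of Q'
      CQ[i]   = sum_j softmax_j(L[i][:]) * h'_j    Q' summarized for word i of Q
      CQp[j]  = sum_i softmax_i(L[:][j]) * h_i     Q summarized for word j of Q'
    """
    n, m, d = len(HQ), len(HQp), len(HQ[0])
    L = [[dot(h, hp) for hp in HQp] for h in HQ]
    CQ = []
    for i in range(n):
        a = softmax(L[i])  # row-wise weights over the words of Q'
        CQ.append([sum(a[j] * HQp[j][k] for j in range(m)) for k in range(d)])
    CQp = []
    for j in range(m):
        a = softmax([L[i][j] for i in range(n)])  # column-wise weights over the words of Q
        CQp.append([sum(a[i] * HQ[i][k] for i in range(n)) for k in range(d)])
    return L, CQ, CQp


if __name__ == "__main__":
    HQ = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    HQp = [[2.0, 0.0], [0.0, 2.0]]
    L, CQ, CQp = coattention(HQ, HQp)
    print(L)
    print(CQ[2])  # word 3 of Q scores equally against both words of Q', so it averages them
```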
The input of the last module is the new feature vector representations $C_Q$ and $C_{Q'}$ of the question pair, where $C_{Q,t}$ denotes the t-th attention feature vector of the input question. We apply max pooling to convert the input into a fixed-length vector $O_q$, the final feature vector of the input question. When the attention mechanism is used to compute the final vector $O_{q'}$ of the existing question, softmax weights are computed over all of its feature vectors $C_{Q',t}$ with respect to $O_q$. The weight $S_{q'q}(t)$, normalized by the softmax function, is the attention weight of the t-th text vector of the existing question. The higher $S_{q'q}(t)$ is, the more relevant $C_{Q',t}$ is to the input question, and the more that vector contributes to the final representation of the existing question used for matching.
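The attention step can be sketched as follows under stated assumptions. $O_q$ is obtained by element-wise max pooling over time steps, as described above. The relevance score of each existing-question vector $C_{Q',t}$ with respect to $O_q$ is assumed here to be additive (tanh) scoring with hypothetical parameters W_a, W_q, and w_s, and $O_{q'}$ is assumed to be the softmax-weighted sum of the $C_{Q',t}$; the scoring and aggregation actually used by the model may differ.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def max_pool(vectors):
    """Element-wise max over time steps -> one fixed-length vector (O_q)."""
    return [max(col) for col in zip(*vectors)]


def attentive_pool(CQp, O_q, W_a, W_q, w_s):
    """Weight each existing-question vector C_{Q',t} by its relevance to O_q.

    score_t = w_s . tanh(W_a C_{Q',t} + W_q O_q)   (additive scoring, an assumption)
    S_t     = softmax(scores)_t
    O_q'    = sum_t S_t * C_{Q',t}
    """
    query_term = matvec(W_q, O_q)
    scores = []
    for c in CQp:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_a, c), query_term)]
        scores.append(sum(w * x for w, x in zip(w_s, hidden)))
    S = softmax(scores)
    d = len(CQp[0])
    O_qp = [sum(S[t] * CQp[t][k] for t in range(len(CQp))) for k in range(d)]
    return O_qp, S


if __name__ == "__main__":
    O_q = max_pool([[0.1, 0.5], [0.4, -0.2], [0.3, 0.9]])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    O_qp, S = attentive_pool([[1.0, 0.0], [0.0, 1.0]], O_q, identity, identity, [1.0, 1.0])
    print(O_q, S, O_qp)
```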

C. QUESTION PAIR MATCHING
After the final feature vector representations of the question pair are obtained, the output is determined by measuring the similarity between the feature vectors; cosine similarity and Euclidean distance are commonly used for this purpose. Cosine similarity measures the angle between two vectors, whereas Euclidean distance measures their absolute distance in space. Cosine similarity and Euclidean distance are illustrated in Fig. 6.
Ideally, the feature vectors of a matching question pair should have both a small angle and a short distance, so that the similarity of the question pair vectors is captured as fully as possible. Therefore, this paper combines cosine similarity and Euclidean distance into a similarity function that considers both factors simultaneously. The vector similarity functions are as follows.
Cosine similarity:

$$\cos(O_q, O_{q'}) = \frac{O_q \cdot O_{q'}}{\lVert O_q \rVert \, \lVert O_{q'} \rVert} \tag{17}$$

Euclidean distance:

$$d(O_q, O_{q'}) = \lVert O_q - O_{q'} \rVert_2 \tag{18}$$

The cosine similarity normalized to the range [0, 1]:

$$\mathrm{sim}_{\cos}(O_q, O_{q'}) = \frac{1 + \cos(O_q, O_{q'})}{2} \tag{19}$$

The final vector similarity matching function (20) combines the normalized cosine similarity (19) with the Euclidean distance (18). Here · denotes the dot product, $\lVert O_q \rVert$ and $\lVert O_{q'} \rVert$ are the norms (moduli) of the corresponding feature vectors, and $\lVert O_q - O_{q'} \rVert_2$ is the Euclidean distance between the two points. The values of equations (19) and (20) lie in the range [0, 1].
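The cosine similarity, Euclidean distance, and normalized cosine similarity described above can be computed with the short sketch below. The normalization is assumed to be (1 + cos)/2, which maps [−1, 1] onto [0, 1]. The function combined_similarity is a hypothetical combination chosen only to show how a small angle and a short distance can be rewarded jointly while staying in [0, 1]; it is not the paper's equation (20).

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def normalized_cosine(u, v):
    # Assumed affine map from [-1, 1] to [0, 1].
    return 0.5 * (1.0 + cosine_similarity(u, v))


def combined_similarity(u, v):
    # Hypothetical combination, NOT the paper's equation (20):
    # a larger normalized cosine raises the score, a larger distance lowers it.
    return normalized_cosine(u, v) / (1.0 + euclidean_distance(u, v))


if __name__ == "__main__":
    O_q, O_qp = [0.6, 0.8], [0.8, 0.6]  # toy feature vectors
    print(cosine_similarity(O_q, O_qp), euclidean_distance(O_q, O_qp),
          normalized_cosine(O_q, O_qp), combined_similarity(O_q, O_qp))
```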

IV. EXPERIMENTAL DATA AND SETTINGS
A. EXPERIMENTAL DATA
Most current open-source corpora, such as Baidu's WebQA, Stanford's SQuAD, the CMU QA dataset, and the SemEval 2015 cQA dataset [35], are built for traditional question-answer matching in open domains. In addition to experiments on the CCKS2018 dataset, this paper constructs a Chinese question answering dataset for a professional field, IPC-QA.

1) DATASET CCKS2018
The CCKS2018 dataset comes from the third task of the WeBank Intelligent Customer Service Question Matching Contest. The core of the contest is to understand the structure and semantics of two input questions and to determine, by matching their similarity, whether they share the same inclination, intention, or purpose. The data in the CCKS2018 Task 3 corpus mainly come from real texts in the financial field, and most questions are customer business inquiries. In the training set, positive question pairs with the same intention are labeled 1, and negative question pairs without the same intention are labeled 0. The original corpus provided on the official website contains 100000 labeled question pairs as the training set, 10000 unlabeled question pairs as the validation set, and 10000 unlabeled question pairs as the test set. The dataset is summarized in Table 1.

2) INVERTED PENDULUM DATASET IPC-QA
a: DATA COLLECTION PROCESS AND STANDARDS
Because the inverted pendulum is a complex and unstable system, many typical problems of Automatic Control Theory, such as nonlinearity, stabilization, and tracking analysis, can be simulated on this basic experimental platform. Based on the basic knowledge involved in the inverted pendulum experiment, this paper constructs the IPC-QA dataset for restricted-domain teaching question answering using online search engines and professional books.
The collection criteria are as follows: a total of 60 questions closely related to the inverted pendulum experiment are constructed. Each question has five expressions and exactly one standard answer. Each question is limited to 35 words and each answer to 100 words. The different expressions of each question can be conveniently obtained from the similar-question recommendations of the Baidu or Google search engines. To ensure effectiveness and reliability, the answers are obtained mainly by consulting professional books.
Following these criteria, the corpus data were collected, and the initial dataset contains 60 questions related to the inverted pendulum experiment. The five expressions of each question are chosen to differ from one another as much as possible without deviating from the core semantics. An example from the dataset is shown in Table 2.
The collected questions and answers are closely related to the inverted pendulum experiment. The initial dataset covers most of the relevant knowledge involved in the experiment, giving it good coverage and practical application value, and it is stored in an Excel document.

b: CONSTRUCTION OF EXPERIMENTAL DATASET OF INVERTED PENDULUM
To train the model in this paper, the initial dataset needs to be sorted, divided and labeled. Following the financial question matching dataset provided by the CCKS2018 WeBank question matching evaluation task, the initial dataset is structured as follows. For the five phrasings of each question in Table 2, every two distinct phrasings form one pair without repetition (10 pairs per question), giving a total of 600 positive question pairs. The phrasings of each question are then combined in turn with phrasings of every other question without repetition, giving a total of 8850 negative question pairs (5 for each of the 1770 pairs of distinct questions). Following the common 8:2 rule, the positive and negative question pairs are split into a training-verification set and a test set: the training-verification set contains 480 positive and 7080 negative question pairs, and the test set contains 120 positive and 1770 negative question pairs. The training-verification set is further split so that 16.7% (one sixth) of it forms the verification set. Consequently, the training set has 400 positive and 5900 negative question pairs, and the verification set has 80 positive and 1180 negative question pairs. The IPC-QA dataset is shown in Table 3.
When labeling the question pairs in the training set, as in the CCKS2018 dataset, positive cases are labeled 1 and negative cases are labeled 0.
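The pair counts and the split sizes above can be checked with a short script. This is a minimal sketch, not the authors' code: the phrasing identifiers are placeholders, and the negative pairing rule (phrasing k of one question paired with phrasing k of another) is an assumption, chosen only because it reproduces the stated total of 8850; the text does not say which phrasings are combined. The split uses fixed fractions rather than random sampling so that the resulting sizes can be compared with the figures above.

```python
from itertools import combinations

NUM_QUESTIONS = 60   # questions in the initial IPC-QA dataset
NUM_PHRASINGS = 5    # phrasings per question

def build_pairs(num_questions=NUM_QUESTIONS, num_phrasings=NUM_PHRASINGS):
    """Return (positive, negative) lists of question pairs.

    Each phrasing is identified by (question index, phrasing index); these
    placeholder IDs stand in for the real IPC-QA sentences.
    """
    # Positive pairs: every unordered pair of distinct phrasings of one question.
    positive = [
        ((q, a), (q, b))
        for q in range(num_questions)
        for a, b in combinations(range(num_phrasings), 2)
    ]
    # Negative pairs (assumed rule): phrasing k of question q1 with phrasing k of q2.
    negative = [
        ((q1, k), (q2, k))
        for q1, q2 in combinations(range(num_questions), 2)
        for k in range(num_phrasings)
    ]
    return positive, negative

def split_counts(n, test_frac=0.2, val_frac=1 / 6):
    """Return (train, verification, test) sizes: an 8:2 split, then 1/6 (16.7%) for verification."""
    n_test = round(n * test_frac)
    n_trainval = n - n_test
    n_val = round(n_trainval * val_frac)
    return n_trainval - n_val, n_val, n_test

positive, negative = build_pairs()
print(len(positive), len(negative))  # 600 8850
print(split_counts(len(positive)))   # (400, 80, 120)
print(split_counts(len(negative)))   # (5900, 1180, 1770)
```

Any pairing rule that yields five negative pairs for each of the 1770 question pairs gives the same totals; the rule used here is only one such choice.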

B. EXPERIMENTAL SETTINGS AND PARAMETER OPTIMIZATION
To train the model, the experimental settings are as follows. Part of the training data from CCKS2018 and IPC-QA is collected to train the word2vec model. The dimension of the output word vectors is set to 300, and the length of each question is fixed at 35 tokens. Questions shorter than 35 tokens are padded to this length with zero vectors (the OOV, out-of-vocabulary, zero-padding method).
The CNN uses two convolution windows, of height h = 2 and h = 3, which slide over the sentence matrix with a stride of 1. With 100 convolution kernels for each window, 200 feature maps are generated. After max pooling, each question therefore yields a 200-dimensional feature output, which serves as the input to the stacked BiLSTM network.
We set the initial learning rate, the IPC-QA batch size, the fixed margin M of the loss function and the regularization parameter λ to 0.001, 30, 0.2 and 1e-5, respectively. During training, a dropout layer with a rate of 0.5 is applied to the convolutional layer to avoid over-fitting, gradient clipping with a threshold of 5 is used to prevent exploding gradients, and the hinge loss is used as the training objective. The network parameters are updated with the Adam optimizer with a decay rate of 0.95. The specific calculation process is shown in Algorithm 1.
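The fixed-length input described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing: the word vectors are taken from a plain dictionary that stands in for a trained word2vec model, out-of-vocabulary tokens are assumed to map to zero vectors like the padding positions, and truncating questions longer than 35 tokens is an assumption (the IPC-QA questions are limited to 35 words anyway). The tokens and vector values are hypothetical.

```python
import numpy as np

EMBED_DIM = 300  # word2vec output dimension from the text
MAX_LEN = 35     # fixed question length from the text

def sentence_matrix(tokens, word_vectors, max_len=MAX_LEN, dim=EMBED_DIM):
    """Map a segmented question to a (max_len, dim) matrix of word vectors.

    OOV tokens and padding positions are left as zero vectors; tokens beyond
    max_len are dropped (truncation is an assumption).
    """
    matrix = np.zeros((max_len, dim), dtype=np.float32)
    for i, token in enumerate(tokens[:max_len]):
        vector = word_vectors.get(token)
        if vector is not None:  # OOV tokens keep their zero row
            matrix[i] = vector
    return matrix

# Hypothetical toy vocabulary standing in for a trained word2vec model.
rng = np.random.default_rng(0)
toy_vectors = {w: rng.standard_normal(EMBED_DIM).astype(np.float32)
               for w in ["inverted_pendulum", "how", "work"]}
m = sentence_matrix(["inverted_pendulum", "how", "unseen_word", "work"], toy_vectors)
print(m.shape)  # (35, 300)
```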
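The arithmetic behind the 200 feature maps can be illustrated with a small NumPy sketch. It assumes a ReLU activation and max-over-time pooling across the whole sentence, neither of which is specified in the text, and it uses random weights in place of trained kernels. The text also does not specify how the pooled output is arranged as a sequence for the stacked BiLSTM, so the sketch only shows where the two windows of 100 kernels each produce a 200-dimensional result.

```python
import numpy as np

def conv_max_pool(sentence, kernels, bias):
    """Valid 1-D convolution (stride 1) over word positions, then max over time.

    sentence: (L, d) matrix; kernels: (h, d, c) for window height h and c kernels.
    Returns a (c,) vector holding the maximum ReLU activation of each kernel.
    """
    h, d, c = kernels.shape
    length = sentence.shape[0]
    # Each row is one window of h consecutive word vectors, flattened.
    windows = np.stack([sentence[i:i + h].reshape(-1) for i in range(length - h + 1)])
    feature_maps = np.maximum(windows @ kernels.reshape(h * d, c) + bias, 0.0)  # (L-h+1, c)
    return feature_maps.max(axis=0)

def cnn_features(sentence, params):
    """Concatenate the pooled outputs of every window size."""
    return np.concatenate([conv_max_pool(sentence, k, b) for k, b in params])

rng = np.random.default_rng(0)
L, d, c = 35, 300, 100  # sentence length, embedding size, kernels per window
params = [(rng.standard_normal((h, d, c)) * 0.01, np.zeros(c)) for h in (2, 3)]
sentence = rng.standard_normal((L, d))
features = cnn_features(sentence, params)
print(features.shape)  # (200,)
```

With a 35-token input, the h = 2 window yields 34 positions and the h = 3 window yields 33, but pooling reduces each kernel to a single value, so the output size depends only on the number of kernels.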
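The loss and clipping settings can be sketched in NumPy as follows. The exact form of the hinge loss is defined outside this section, so the pairwise ranking form below, max(0, M - s(q, q+) + s(q, q-)) plus λ times the squared L2 norm of the weights, is an assumption. Clipping by global norm is likewise an assumption, since "clip gradient of 5" could also mean element-wise clipping. Only the values M = 0.2, λ = 1e-5 and the threshold 5 come from the text.

```python
import numpy as np

MARGIN = 0.2    # fixed margin M from the text
LAMBDA = 1e-5   # regularization parameter λ from the text
CLIP = 5.0      # clip gradient value from the text

def hinge_loss(sim_pos, sim_neg, weights, margin=MARGIN, lam=LAMBDA):
    """Assumed pairwise hinge loss plus L2 penalty.

    Penalizes a batch whenever a matching pair is not scored at least `margin`
    above its paired non-matching question pair.
    """
    sim_pos = np.asarray(sim_pos, dtype=float)
    sim_neg = np.asarray(sim_neg, dtype=float)
    ranking = np.maximum(0.0, margin - sim_pos + sim_neg).mean()
    l2 = sum(float(np.sum(w ** 2)) for w in weights)
    return float(ranking + lam * l2)

def clip_by_global_norm(grads, max_norm=CLIP):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / total) if total > 0 else 1.0
    return [g * scale for g in grads]

loss = hinge_loss([0.9, 0.5], [0.3, 0.4], [np.zeros(3)])
print(round(loss, 4))  # 0.05
clipped = clip_by_global_norm([np.array([6.0, 8.0])])
print(clipped[0])      # [3. 4.]
```

In the example, the first pair already clears the margin and contributes nothing, while the second falls 0.1 short, so the batch mean is 0.05.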

V. EXPERIMENTAL RESULTS AND ANALYSIS
According to the evaluation criteria commonly used in intelligent question answering, we use precision, accuracy, recall and F1 score to evaluate the experimental results on both datasets. Precision is defined as

$$\mathrm{Precision} = \frac{N_{pc}}{N_{pt}} \tag{23}$$

where $N_{pc}$ is the number of correctly labeled positive cases and $N_{pt}$ is the total number of cases labeled positive.

$$\mathrm{Accuracy} = \frac{N_{qc}}{N_{qt}} \tag{24}$$

where $N_{qc}$ is the number of correctly labeled question pairs and $N_{qt}$ is the total number of question pairs.

$$\mathrm{Recall} = \frac{N_{pc}}{N_{p}} \tag{25}$$

where $N_{pc}$ is the number of correctly labeled positive cases and $N_{p}$ is the total number of positive cases.

Algorithm 1 Feature Extraction and Similarity Calculation of Question Pairs Using Deep Learning
Input: The question pair sequences $Q = (q_1, q_2, \ldots, q_n)$ and $Q' = (q'_1, q'_2, \ldots, q'_m)$ after Chinese word segmentation and word2vec vectorization
Output: The similarity degree of question pairs
Process:
1. Input the initial word vector $x_i \in \mathbb{R}^k$ of the statement in turn to perform the convolution operation
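The metric definitions in Eqs. (23) to (25) translate directly into counts over predicted and gold labels. This is a minimal sketch that reads "marked positive cases" as the pairs predicted positive; F1 is computed as the standard harmonic mean of precision and recall, since its formula is not included in this section. The example labels are made up.

```python
def evaluate(predicted, gold):
    """Precision, accuracy, recall and F1 for binary labels (1 = same intent).

    precision = N_pc / N_pt, accuracy = N_qc / N_qt, recall = N_pc / N_p;
    F1 is the harmonic mean of precision and recall.
    """
    pairs = list(zip(predicted, gold))
    n_pc = sum(1 for p, g in pairs if p == 1 and g == 1)  # correctly labeled positives
    n_pt = sum(1 for p, _ in pairs if p == 1)              # pairs labeled positive
    n_p = sum(1 for _, g in pairs if g == 1)               # actual positive pairs
    n_qc = sum(1 for p, g in pairs if p == g)              # correctly labeled pairs
    n_qt = len(pairs)                                      # all question pairs
    precision = n_pc / n_pt if n_pt else 0.0
    recall = n_pc / n_p if n_p else 0.0
    accuracy = n_qc / n_qt if n_qt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "accuracy": accuracy, "recall": recall, "f1": f1}

scores = evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(scores)
```

In this toy example, two of the three predicted positives are correct and two of the three true positives are found, so precision and recall are both 2/3, while three of the five pairs are labeled correctly, giving an accuracy of 0.6.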
A. EXPERIMENTAL RESULTS AND ANALYSIS ON CCKS2018 DATASET
When testing on the CCKS2018 dataset, the initial feature representation was obtained using the jieba word segmentation tool and a word2vec model with 300-dimensional output. Because the dataset is large, the batch size was set to 50 and the number of iterations to 25. Following the evaluation criteria, the task can be treated as binary classification with the label set {1, 0}, where 1 indicates that the two questions have the same or similar intent and 0 indicates that they do not. Label 1 is assigned if the similarity between the two questions exceeds 0.65; otherwise label 0 is assigned. In Table 4, rows 1 to 3 are the three teams with the highest precision in the competition. The precision of our model is 0.86042, which is 0.443% lower than that of the best-performing team and ranks eighth among the participating teams. However, comparing rows 4 and 5 shows that adding the convolutional layer raises the precision of question pair prediction by 0.228%, indicating that the convolutional layer moderately improves the analytical ability and prediction performance of the model. The recall and F1 score of our model are also favorable. These results indicate that the model has a high probability of correctly labeling positive question pairs and can understand and extract the relevant features of similar sentences for the representation calculations.
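The labeling rule for the CCKS2018 experiment reduces to a threshold on the similarity score. In the sketch below, "exceeds" is read as a strict inequality, so a score of exactly 0.65 receives label 0; the similarity values in the example are hypothetical.

```python
THRESHOLD = 0.65  # similarity cut-off used in the CCKS2018 experiment

def label_pairs(similarities, threshold=THRESHOLD):
    """Label 1 (same or similar intent) when the similarity exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in similarities]

print(label_pairs([0.91, 0.65, 0.42, 0.7]))  # [1, 0, 0, 1]
```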

B. EXPERIMENTAL RESULTS AND ANALYSIS ON IPC-QA DATASET
Building on the CCKS2018 experiments, the model was then validated and tested on IPC-QA. The preset parameters are given in Section IV-B. First, the number of iterations was set to 10, 20 and 30, and three questions A, B and C were selected from the test set to examine the model's predictions after different numbers of iterations. A: How does LQR in the inverted pendulum work? B: What is the working state of the inverted pendulum? C: The application methods of LQR in the inverted pendulum. A pair (positive or negative) that is predicted correctly is labeled 1, and a pair that is predicted incorrectly is labeled 0; a question is not paired with itself. The test results are shown in Fig. 7.
(1) Fig. 7 shows that at 10 iterations the model already correctly predicts the positive pair formed by A and C. However, even at 20 iterations, the negative pair formed by A and B is still not predicted correctly. A possible reason is that A and B share some of the same words, and the semantic features extracted by the model are not yet deep enough. At 30 iterations, the model correctly classifies both the positive and the negative pairs formed from the three sentences.
(2) In Table 5, comparing rows 1 and 2 shows that the model with the internally stacked BiLSTM network outperforms the model with a single-layer BiLSTM. In general, an appropriately stacked LSTM network helps the model capture deeper relationships between the words of the input sentence.
(3) In Table 5, compared with the plain stacked BiLSTM, the model with the coattention mechanism improves precision and recall by 2.35% and 2.7%, respectively. These results suggest that the coattention mechanism effectively bridges the lexical gap between the two sentences of a question pair, and that a well-designed attention structure helps NLP models extract the key information of a sentence.
(4) The best precision and recall, 0.8031 and 0.8214 respectively, are obtained by the model combining CNN, stacked BiLSTM and the coattention mechanism. This indicates the feasibility and accuracy of the model, and also suggests that harmonically combining cosine similarity and Euclidean distance balances the angular and distance relationships between the vectors, yielding better matching of the sentence vectors.
(5) In Table 5, the model with the CNN layer achieves 0.71% higher precision than the model without it. For both models, recall exceeds precision: by 2.23% for the model without the CNN layer and by 1.83% for the model with it. This indicates that the model has a reasonable capacity for semantic analysis and performs well in understanding and matching similar sentences. However, the final precision is still limited; one possible reason is that the IPC-QA corpus is small, so the model does not sufficiently distinguish the negative question pairs.
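Item (4) attributes part of the gain to combining cosine similarity and Euclidean distance. The exact formula is defined outside this section, so the function below is only one hypothetical way to realize such a combination: cosine similarity is rescaled to [0, 1], Euclidean distance d is mapped to 1 / (1 + d), and the two scores are merged by their harmonic mean. The rescaling choices are assumptions, not the authors' method.

```python
import numpy as np

def harmonic_similarity(u, v):
    """Hypothetical blend of angle and distance information for two sentence vectors.

    Cosine similarity is rescaled from [-1, 1] to [0, 1]; Euclidean distance d
    is mapped to 1 / (1 + d). The two scores are combined by their harmonic mean.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norms = np.linalg.norm(u) * np.linalg.norm(v)
    cos = float(np.dot(u, v) / norms) if norms > 0 else 0.0
    angle_score = (1.0 + cos) / 2.0
    distance_score = 1.0 / (1.0 + float(np.linalg.norm(u - v)))
    # distance_score > 0, so the denominator is never zero.
    return 2.0 * angle_score * distance_score / (angle_score + distance_score)

print(harmonic_similarity([1.0, 0.0], [3.0, 0.0]))
```

Vectors [1, 0] and [3, 0] point in the same direction (cosine similarity 1), yet this score is 0.5 because they lie a distance of 2 apart. Cosine similarity alone cannot make that distinction, which is the kind of balance between angle and distance described in item (4).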

VI. CONCLUSION
This paper addresses the difficulty of Chinese sentence semantic understanding and the lack of valid question answering datasets. Using a convolutional neural network, a long short-term memory network, the coattention mechanism and the attention mechanism to analyze sentence information, we validated and analyzed the model on the CCKS2018 dataset and on the self-built Chinese question answering dataset for the inverted pendulum experiment. The experimental results verified the model's analytical and expressive ability on Chinese corpora. The idea of question pair matching and the construction of the inverted pendulum dataset lay a solid foundation for further research on intelligent question answering. However, the model still has limitations in extending to other professional domains and larger datasets. In addition, although the model achieved good results in experiments, it has not yet been deployed in practical applications.
In the future, providing knowledge solutions for professional restricted domains is expected to become a general trend in intelligent question answering. Deep learning can predict effective answers from summarized and reorganized knowledge bases in specific fields. However, the existing large-scale question answering corpora are oriented to open domains. Researchers will therefore focus on expanding restricted-domain datasets and designing more refined models for them. As data transmission speeds increase, applying restricted-domain intelligent question answering to real scenarios, such as intelligent teaching in virtual environments, will also become a focus of future development.