A Quantum Entanglement-Based Approach for Computing Sentence Similarity

It is important to learn directly from original texts in natural language processing (NLP). Many deep learning (DL) models require large amounts of manually annotated data and are therefore ineffective at deriving information from corpora with few annotated labels, while existing methods that exploit unlabeled language information consume considerable time and cost. Our sentence representation based on quantum computation (called Model I) needs no prior knowledge except word2vec. To reduce the semantic noise caused by applying the tensor product to the entangled words vectors, two improved models (called Model II and Model III) are proposed that reduce the dimensionality of the sentence embedding produced by Model I. The proposed models are evaluated on the STS tasks of 2012, 2014, 2015 and 2016, for a total of 21 corpora. Experimental results show that combining quantum entanglement and dimensionality reduction in sentence embedding yields state-of-the-art performance on semantic relations and syntactic structures. Measured by the Pearson correlation coefficient (Pcc) and mean squared error (MSE), the results on 16 out of 16 corpora are better than those of the comparative methods.


I. INTRODUCTION
Semantic textual similarity (STS) is a task that measures the degree of semantic similarity between two sentences. Many applications in natural language processing (NLP) rely on textual similarity in semantics, such as document summarization, semantic search, question answering, document classification, and natural language inference (NLI). The main challenge of STS in recent years is how to mine more semantic information so that the computed results closely approach those of humans: the only criterion for evaluating the computational results is the degree of approximation to human-made scores, and the closer the calculated result is to the human-made score, the more general the model is. With the development and improvement of word embedding, new text analysis methods, mainly based on word representations, are continuously emerging. Some of these methods and their results on semantic analysis based on word vectors are collected in [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Thomas Canhao Xu.
Compared to the aforementioned models, two problems should be considered. First, only the semantic information of individual words is considered, while the influence between words is ignored. Second, methods based on dependency trees do consider the relationship between words, but with complex computational processing. To solve these problems, our proposed methods consider the influence between words and integrate theoretical knowledge of quantum entanglement into the textual representation. In the models, we use the tensor product to extend the dimensionality of the sentence representation so that it carries more semantic information. Because of the entanglement between adjoining words, the impact of continuous synonyms in a sentence can widen the semantic differences within a sentence pair.
To counter the influence of continuous synonyms, we provide two approaches for reducing the dimensionality of the sentence representation. The first introduces a sentence-level improvement on the quantum entanglement-based sentence representation, in which we directly decrease the dimensionality of the sentence embedding. The second gives an entangled-words-level advancement to the quantum entanglement-based sentence representation by removing the dimensions with relatively small terms from the entangled words vectors; as a consequence, the ultimate dimensionality of the sentence representation is also reduced. Our proposed method is composed of the three models mentioned above. Experimental results on the similarity of sentence pairs for the STS tasks of 2012, 2014, 2015 and 2016 demonstrate the high performance of our proposed approaches.
In brief, the innovations are as follows. First, introducing quantum mechanical methods, two continuous notional words are entangled together through the numerical computation of the tensor product; taking the entangled word pair as a whole can mine more semantic information. Then, following the physical idea of extracting the primary factors while ignoring the secondary factors, two dimensionality reduction models are proposed to optimize the quantum entanglement-based sentence representation, which differs from the dimensionality reduction ideas of DL. Last, the experimental results of the proposed models are excellent, and the algorithms are very simple, needing no prior knowledge except for word2vec.
The paper is organized as follows. Section II summarizes some related literature on quantum computation and sentence similarity. Section III explains our proposed models in detail. The detailed comments on the different combinations of the proposed models are explained in Section IV. Section V demonstrates the experimental results and lists the comparison to other methods. In Section VI, some conclusions are drawn.

II. RELATED WORKS
The main idea of this paper is to improve the sentence representation based on quantum entanglement with dimensionality reduction. In this section, we review some related works on quantum computation and sentence similarity.

A. QUANTUM COMPUTATION
In recent years, the integration of quantum computation with other disciplines has become increasingly popular, such as the application of machine learning in quantum computation [2]-[7], the introduction of quantum theory into artificial intelligence [8]-[11], the application of quantum theory in information science [12]-[17], quantum chemistry [18], and quantum annealing algorithms [19]. In [2], it was shown experimentally that machine learning can efficiently learn and classify quantum states, indicating that the classification of quantum states can be achieved with limited resources. J. Venderley et al. established a machine-learning-based approach that can enable rapid exploration of large phase spaces [6]. In [14], efficient verification protocols for any stabilizer state were given. C. Guo et al. introduced a machine learning model in which matrix product operators were trained to implement sequence-to-sequence prediction of the next sequence [20]. In [3], a number of neural network features used in machine learning were shown to map naturally into the quantum optical domain through the introduction of the quantum optical neural network. A framework that captures entanglement distillation in the presence of natural correlations arising from memory channels was introduced by Waeldchen et al. [21].
There are very few works integrating natural language processing with quantum computation [22]. In [22], P. Zhang et al. designed a sentence representation described in the language of quantum theory. The sentence embedding, represented with Dirac symbols, was input into deep neural networks to compute the similarity of question-answer sentence pairs.

B. SENTENCE SIMILARITY
The key to improving many top-level applications is the development of supporting technologies. In the big data era, it is important to improve the accuracy of text similarity, as the computation of text similarity is a significant part of NLP. Common techniques for word embedding [23] or sentence embedding [24] include GloVe [25], PSL [26], ST [27], SCBOW [28], PROJ [29], PP-tf-idf [30], DAN [31], LSTM [32], and RNN [29]. Similarity in semantics can be applied to many NLP fields [33]-[35]. Z.-T. Guan et al. proposed a cross-lingual multikeyword ranked search scheme based on the open multilingual WordNet with flexible keyword and language preference settings [33]. A. J. M. Traina et al. provided technologies and tools that address the variety and veracity characteristics of big and complex data while considering the semantic information of the data [34]. In [35], a method for querying relational databases with keywords to simplify access to these data is proposed. An algorithm using latent Dirichlet allocation (LDA) and OpenAI-GPT to generate negative examples has been introduced for multilingual STS [36]. Latent semantic analysis and LDA have been compared for unit identification [37]. Quan et al. [1] provide an efficient framework for sentence similarity that merges an attention weight mechanism with a constituency tree, and give comparative experimental results against other methods. All the results of these classic methods are collected from [1], as shown in Table 2.
The main methods for computing sentence similarity are text embedding or neural network models [38], [39], [41], [42]. A multitask learning approach for understanding the relationship between two sentences is reported by Choi and Lee [43]. A. Skabar and K. Abdalgader provide an algorithm based on fuzzy relations to identify overlapping clusters of semantically related sentences [44]. A text expansion and deep model-based approach for service recommendation has been proposed, which can bridge the vocabulary gap between services and user queries with the collective semantic similarity of sentences and descriptions [39]. An interactive self-attentive Siamese neural network has been used to verify the effectiveness of interactive self-attention [40]. With the development of capsule networks, text representations preprocessed by neural networks can achieve state-of-the-art results as the input for classification and machine translation. ELMo transforms the top LSTM layer into a linear combination of the vectors stacked above each input word for each end task, with marked improvement [45]. Due to the powerful pretraining capability of the transformer, some new models for classification and language inference have achieved state-of-the-art results, such as XLNet [46], BERT and its variants [47]-[50], and UNILM [51], [52]. Minaee et al. provide a comprehensive review of deep learning-based text classification [53]. However, few studies have examined semantic similarity [54].
Compared to the works mentioned above, existing computations of sentence similarity mainly focus on the similarity between word representations and do not fully consider similarity in semantics. When only the similarity between words is considered, the calculated sentence similarity falls far below the human-made score whenever several synonyms are included in the sentence pair. Some methods combine other datasets or contexts to infer the implied semantics of sentences, at the cost of complex computational processes or large corpora. Our provided model (called Model I), which integrates quantum entanglement into the sentence representation, can capture the modification between words and express more semantic information. Considering that the tensor product can widen the semantic difference between sentences when several synonyms are included in the sentence pair, our two advanced models (called Model II and Model III), based on Model I, reduce the dimensions of the sentence representation to decrease this semantic difference. We use different combinations of the three models to mine more semantic information in the sentence pair. Our method can compute the sentence similarity of any sentence pair with at least two notional words in each sentence. Moreover, it integrates quantum entanglement with sentence embedding and utilizes dimensionality reduction to reduce the semantic error in the sentence pair, with experimental results achieving state-of-the-art performance on 21 corpora.

III. APPROACHES
In this section, we first provide a sentence embedding based on quantum computation (Model I) and then construct two advanced models (Model II and Model III) to reduce the semantic error caused by the tensor product. Because the tensor product expands the dimensionality of the sentence representation to d² (where d is the dimension of word2vec), the high dimensionality of the sentence representation may give rise to unnecessary semantic errors that cause the similarities of sentence pairs to decline. The advanced models affect the semantic sentence similarity differently. Model II, which considers the overall influence on sentence pairs, reduces the dimensionality at the sentence level. Model III, which considers the local information of the sentence pair, reduces the dimension of the sentence embedding at the entangled-words level. For the combination of the models, our method first applies Model II to sentence pairs with large similarity and then introduces Model III to decrease the semantic errors of sentence pairs with small similarity.

A. MODEL I

1) EXTRACT WORDS
Remove the function words to extract the notional words from the sentence, and store the notional words in an array A = {w_1, w_2, . . . , w_n} in their original order in the sentence, where w_i is the ith notional word and n is the total number of notional words.

2) NORMALIZE WORD VECTOR
The normalized word vector is

|w_i⟩ = s_i / |s_i|

where s_i is the vector of the ith word and |s_i| is the module (norm) of s_i. |w_i⟩, the normalized word vector, is called a ket. A ket in quantum mechanics expresses a state in the Hilbert space and is a column vector. The dimension of the word vector is d, so |w_i⟩ is also a d-dimensional column vector.
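As a minimal sketch of this normalization (using a hypothetical 4-dimensional stand-in for the 300-dimensional word2vec vector):

```python
import numpy as np

# Hypothetical word vector; in the paper, s_i comes from word2vec with d = 300.
s_i = np.array([3.0, 0.0, 4.0, 0.0])

# |w_i> = s_i / |s_i|: divide by the vector's module so the ket has unit norm.
w_i = s_i / np.linalg.norm(s_i)
```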

3) ENTANGLED WORDS VECTOR
Two adjacent words are entangled together in order, forming the array

B = {(w_1 w_2), (w_2 w_3), . . . , (w_{n−1} w_n)}.

The entangled words vector is defined as

|w_i w_{i+1}⟩ = |w_i⟩ ⊗ |w_{i+1}⟩

where w_i is the ith word in array A, w_{i+1} is the right-adjacent word of w_i, and ⊗ denotes the tensor product. For column vectors a = (a_1, . . . , a_m)ᵀ and b, the tensor product is

a ⊗ b = (a_1 b, a_2 b, . . . , a_m b)ᵀ.

If V and W are Hilbert spaces with m and n dimensions, respectively, then V ⊗ W is an mn-dimensional vector space. Thus, since |w_i⟩ and |w_{i+1}⟩ are d-dimensional vectors, |w_i w_{i+1}⟩ is a d²-dimensional vector.
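Numerically, the tensor product of two kets is the Kronecker product; a sketch with hypothetical 2-dimensional kets standing in for the d = 300 word2vec kets:

```python
import numpy as np

# Hypothetical normalized word kets (2-d stand-ins for 300-d word2vec kets).
w1 = np.array([1.0, 0.0])
w2 = np.array([0.6, 0.8])

# |w_i w_{i+1}> = |w_i> (x) |w_{i+1}>: np.kron realizes the tensor product,
# producing a d^2-dimensional entangled words vector.
pair = np.kron(w1, w2)
```

For d = 300 each entangled pair is a 90000-dimensional vector, which is why the advanced models later reduce dimensions.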

4) SENTENCE REPRESENTATION
The sentence representation is defined as

|T⟩ = Σ_{i=1}^{n−1} |w_i w_{i+1}⟩

where we set all the entangled coefficients to 1 to simplify the sentence embedding.
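A sketch of this superposition, again with hypothetical low-dimensional kets and all entangled coefficients set to 1 as in the definition:

```python
import numpy as np

# Hypothetical normalized kets for three notional words of one sentence.
kets = [np.array([1.0, 0.0]),
        np.array([0.0, 1.0]),
        np.array([0.6, 0.8])]

# |T> = sum_{i=1}^{n-1} |w_i w_{i+1}>, with all entangled coefficients set to 1.
T = sum(np.kron(kets[i], kets[i + 1]) for i in range(len(kets) - 1))
```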

5) SENTENCE SIMILARITY
The direction cosine of the two sentence representations is defined as the sentence similarity of the sentence pair:

cos θ = ⟨T_1|T_2⟩ / (|T_1| |T_2|)

where ⟨T_1|T_2⟩ denotes the inner product of ⟨T_1| and |T_2⟩, |T_1| and |T_2| are the norms of |T_1⟩ and |T_2⟩, respectively, and ⟨T_1| is the conjugate transpose of |T_1⟩.
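For real-valued word2vec embeddings the conjugate transpose reduces to an ordinary transpose, so the similarity is a plain cosine; a sketch with hypothetical sentence vectors:

```python
import numpy as np

def sentence_similarity(T1, T2):
    """cos(theta) = <T1|T2> / (|T1| |T2|); with real-valued embeddings the
    bra <T1| is simply the transpose of the ket |T1>."""
    return float(np.dot(T1, T2) / (np.linalg.norm(T1) * np.linalg.norm(T2)))

# Hypothetical sentence representations.
sim = sentence_similarity(np.array([0.0, 1.0, 0.6, 0.8]),
                          np.array([0.0, 1.0, 0.0, 0.0]))
```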

B. ADVANCED MODELS
A vector with n² dimensions can be produced by applying the tensor product to two vectors with n dimensions, so tensors can describe the states of objects in more detail than vectors can.
Consequently, using tensors to analyze the semantic relations between words may expand the differences between sentences, as in the following sentence pair. S_a: A man is playing on a guitar and singing. S_b: A woman is playing an acoustic guitar and singing. The two sentences differ even though they share many words; the human-made score is only 0.44 (after division by 5). Comparing the two sentences, only the word 'man' in S_a differs from S_b, and the word 'acoustic' is absent. If we entangle the adjacent notional words together, 3 out of 5 entangled word pairs differ, which is more than two. Due to the dimensionality expansion of the tensor product, the calculated similarities of such sentence pairs from the semantic analysis are apparently lower than the human-made scores. The main reason is that the tensor product expresses the semantic relations of the words in a very particular way. To reduce the impact of secondary factors, we provide two methods with dimensionality reduction: Model II and Model III.

1) MODEL II
In Model II, the dimension of the sentence representation is decreased to D_1, which is smaller than d² (d is the dimensionality of word2vec). The main idea is to extract the top D_1 values (by absolute magnitude) from the quantum entanglement-based sentence representation and reorder them by their original indexes. The algorithm is illustrated in Algorithm 1.

Algorithm 1 An Improved Sentence Level Dimension Reduction Model Based on Quantum Entanglement
Input: arrays A, B, word2vec, sentence embedding dimensions D, D_1.
Output: cos θ
1: Input one sentence of the sentence pair, extract all the notional words and store them in an array A = {w_1, w_2, . . . , w_n}
2: Entangle the two adjacent notional words together to form the array B = {(w_1 w_2), (w_2 w_3), . . . , (w_{n−1} w_n)}
3: Obtain the entangled words representation by the tensor product: |w_i w_{i+1}⟩
4: Generate the sentence representation by linear superposition of all the entangled words representations with D dimensions: |T⟩ = Σ_{i=1}^{n−1} |w_i w_{i+1}⟩
5: Reduce the dimensionality of the sentence representation to D_1: remove the D − D_1 dimensions with smaller absolute values to obtain the sentence embedding |T_1⟩
6: Input the other sentence and repeat Steps 1 to 5 to obtain the sentence embedding |T_2⟩
7: Compute the direction cosine between the sentence pair

2) MODEL III
In Model III, the dimension of the entangled words representation declines to D_2, which is smaller than d² (d is the dimensionality of word2vec). The main idea is to extract the top D_2 values and reorder them by their original indexes. Then, the modified entangled words vector is substituted into the sentence representation; as a consequence, the sentence embedding becomes a vector with D_2 dimensions. We show the simulation steps in Algorithm 2.
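Under one reading of Step 5 of Algorithm 1 (zero the D − D_1 smallest-magnitude components, keeping the survivors at their original indexes), Model II can be sketched as follows; the 3-dimensional word vectors and the choice D_1 = 4 are hypothetical toy values:

```python
import numpy as np

def keep_top(vec, k):
    # Zero all but the k largest-magnitude entries; surviving values stay
    # at their original indexes, so component order is preserved.
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

def model2_embedding(word_vecs, d1):
    # Normalize word vectors into kets, entangle adjacent pairs with the
    # tensor product, superpose, then reduce the sentence vector to D1 terms.
    kets = [v / np.linalg.norm(v) for v in word_vecs]
    T = sum(np.kron(kets[i], kets[i + 1]) for i in range(len(kets) - 1))
    return keep_top(T, d1)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-d word vectors for a toy sentence pair (d^2 = 9, D1 = 4).
s_a = [np.array([1.0, 0.2, 0.1]), np.array([0.1, 1.0, 0.3]), np.array([0.2, 0.1, 1.0])]
s_b = [np.array([0.9, 0.3, 0.1]), np.array([0.1, 1.0, 0.2]), np.array([0.3, 0.1, 0.9])]
sim = cosine(model2_embedding(s_a, 4), model2_embedding(s_b, 4))
```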

C. CORRELATION OF THE THREE MODELS
If the dimensions of Model II and Model III are not reduced, namely, D 1 = D 2 = D, the three models are equivalent.
Algorithm 2 An Improved Entangled Words Level Dimension Reduction Model Based on Quantum Entanglement
Input: arrays A, B, word2vec, sentence embedding dimensions D, D_2.
Output: cos θ
1: Input one sentence of the sentence pair, extract all the notional words and store them in an array A = {w_1, w_2, . . . , w_n}
2: Entangle the two adjacent notional words together to form the array B = {(w_1 w_2), (w_2 w_3), . . . , (w_{n−1} w_n)}
3: Obtain the entangled words representation by the tensor product: |w_i w_{i+1}⟩
4: Decrease the dimensions of the entangled words representation to D_2: remove the D − D_2 dimensions with smaller absolute values to obtain the reduced entangled words embedding
5: Generate the sentence representation by linear superposition of all the reduced entangled words representations with D_2 dimensions: |T_1⟩
6: Input the other sentence and repeat Steps 1 to 5 to obtain the sentence embedding |T_2⟩
7: Compute the direction cosine between the sentence pair

Model II reduces the dimensions of the sentence representation at the sentence level, whereas Model III reduces them at the level of the entangled words. The two methods also have different effects on sentence similarity. Consequently, we utilize the different models to discover their different influences on semantic analysis and syntactic structures. Model II focuses on the selection of overall sentence attributes, which is suitable for the semantic analysis of sentence pairs with large similarity. Model III focuses on the characteristic distribution of the entangled words and describes the semantics in a more detailed way; it can grasp the main influential factors of the entangled words while ignoring the secondary factors and is suitable for sentence pairs with low similarity. Thus, we use Model II and Model III to optimize the sentence representation in different ways. Subscripts 1 and 2 identify the physical quantities of Model II and Model III, respectively.
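Analogously, Model III (Algorithm 2) can be sketched by reading Step 4 as keeping each entangled words vector's D_2 largest-magnitude terms at their original indexes before the superposition; the toy dimensions below are hypothetical:

```python
import numpy as np

def reduce_pair(pair_vec, d2):
    # Keep the d2 largest-magnitude terms of one entangled words vector at
    # their original indexes and zero the rest (Step 4 of Algorithm 2).
    out = np.zeros_like(pair_vec)
    idx = np.argsort(np.abs(pair_vec))[-d2:]
    out[idx] = pair_vec[idx]
    return out

def model3_embedding(word_vecs, d2):
    # Normalize, entangle adjacent pairs, reduce each pair vector to D2
    # terms, then superpose into the sentence representation (Step 5).
    kets = [v / np.linalg.norm(v) for v in word_vecs]
    pairs = [np.kron(kets[i], kets[i + 1]) for i in range(len(kets) - 1)]
    return sum(reduce_pair(p, d2) for p in pairs)

# Hypothetical 3-d word vectors; each pair vector has d^2 = 9 terms, D2 = 3.
sent = [np.array([1.0, 0.2, 0.1]), np.array([0.1, 1.0, 0.3]), np.array([0.2, 0.1, 1.0])]
T = model3_embedding(sent, 3)
```

The difference from Model II is where the truncation happens: per entangled pair here, versus once on the whole sentence vector there.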

IV. COMBINATIONS OF THE PROPOSED MODELS

A. COMBINATION OF MODEL I AND MODEL II
We first define three variables: σ_1, E_1 and λ_1. σ_1 is the threshold of the human-made score y_1 of the sentence pair. The relative error E_1 is defined as

E_1 = |y_1 − S_1| / y_1    (9)

where S_1 is the calculated sentence similarity given by Model I, and λ_1 is defined as the threshold of the relative error. When y_1 > σ_1 and E_1 > λ_1, Model II is introduced. The Pearson correlation coefficient is

Pcc = Σ_{i=1}^{N} (x_i − x̄)(y_i − ȳ) / (N δ_x δ_y)    (10)

where x_i denotes the experimental result and y_i the human score of the ith sentence pair, x̄ and ȳ are their means, δ_x and δ_y are their standard deviations, and N is the total number of sentence pairs in a corpus. If the sentences in one corpus are computed by two different models, we change the standard deviation of the calculated sentence similarities, denoted by δ_x, as follows:

δ_x = sqrt( (N_0 δ_0² + N_1 δ_1²) / N )    (12)
where N_0 + N_1 = N: N_0 sentence pairs are computed by Model I with standard deviation δ_0, and the other N_1 sentence pairs are calculated by Model II with standard deviation δ_1. We replace δ_x in Equation (10) with Equation (12) to obtain the expression for Pcc:

Pcc = [ Σ_{i=1}^{N_0} (x_i − x̄)(y_i − ȳ) + Σ_{j=1}^{N_1} (x_j − x̄)(y_j − ȳ) ] / (N δ_x δ_y)
where x_i is the calculated similarity of a sentence pair modeled by Model I (N_0 texts in total are modeled by Model I), and x_j is the calculated similarity of a sentence pair modeled by Model II (N_1 texts in total are modeled by Model II).
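Equation (10) can be checked with a short sketch; the score lists are hypothetical, and numpy's default standard deviation is the population form used here:

```python
import numpy as np

def pcc(x, y):
    # Pcc = sum_i (x_i - mean_x)(y_i - mean_y) / (N * d_x * d_y), where d_x
    # and d_y are the population standard deviations (numpy's default std).
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum((x - x.mean()) * (y - y.mean()))
                 / (len(x) * x.std() * y.std()))

# Hypothetical calculated similarities x and human scores y.
r = pcc([0.2, 0.5, 0.9], [0.1, 0.6, 0.8])
```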

B. COMBINATION OF MODEL I, MODEL II AND MODEL III
Four variables σ_2, γ_2, E_2 and λ_2 are introduced. σ_2, E_2, y_2 and λ_2 are defined analogously to σ_1, E_1, y_1 and λ_1, and γ_2 is the minimum of the human-made score y_2 of the sentence similarity. When y_1 > σ_1 and E_1 > λ_1, Model II is introduced; when γ_2 < y_2 < σ_2 and E_2 > λ_2, Model III is introduced. λ_1 and λ_2 can take different values, σ_1 and σ_2 need not be the same, and γ_2 cannot be equal to 0. The standard deviation of the calculated similarities of all the sentence pairs is again denoted by δ_x:

δ_x = sqrt( (N_0 δ_0² + N_1 δ_1² + N_2 δ_2²) / N )
where δ_0, δ_1 and δ_2 are the standard deviations of the similarities of the sentence pairs modeled by Model I, Model II and Model III, respectively, and N_0 + N_1 + N_2 = N. δ_x in Equation (10) is replaced by this combined expression to obtain Pcc. The MSE is changed as follows:

MSE = (1/N) Σ_{i=1}^{N} (x_i − y_i)²
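The model-selection rule of this section and the MSE can be sketched together; the threshold values below are hypothetical defaults, not the tuned values reported in the experiments:

```python
def choose_model(s1, y, sigma1=0.8, lambda1=0.2,
                 gamma2=0.1, sigma2=0.6, lambda2=0.2):
    # Decision rule of Section IV (threshold values here are hypothetical):
    # Model II for high-similarity pairs with large relative error,
    # Model III for low-similarity pairs with large relative error.
    e = abs(y - s1) / y                   # relative error, Equation (9)
    if y > sigma1 and e > lambda1:
        return "Model II"
    if gamma2 < y < sigma2 and e > lambda2:
        return "Model III"
    return "Model I"

def mse(x, y):
    # MSE = (1/N) sum_i (x_i - y_i)^2 between calculated similarities x_i
    # and human-made scores y_i.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```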

V. EXPERIMENTAL RESULTS

A. DATASETS
In this study, the public word2vec library is used, and every word is represented by a 300-dimensional vector. The SemEval-2016 task involves plagiarism detection, post-edited machine translations, question-answer pairs and news article headlines [60]. These corpora consist of sentence pairs and their textual similarities, which range from 0.0 to 5.0. To compare with the experimental scores, we divide each human-made score by 5.

B. EXPERIMENTAL SETTINGS
In this subsection, we discuss the results of the designed experiments in various aspects and then compare them with methods that use word/sentence embeddings, as illustrated in Table 2 [1]. 'ACVT' is the method of [1], and the bold-type figures are the best values for every corpus, as indicated in Table 2. The adjustment of the parameters λ_1, σ_1, γ_2, λ_2 and σ_2 proceeds as follows. First, for the combination of Model I and Model II, we adjust λ_1 and σ_1 to obtain the best Pcc on each corpus. Second, for the combination of Model I, Model II and Model III, we first adjust λ_1 and σ_1 to their optimum values and then obtain the best Pcc by adjusting γ_2, λ_2 and σ_2. The final Pcc is the optimal value, and the MSE at that Pcc is taken as the optimum MSE. Table 3 lists the comparison of the results of the proposed methods with the best results selected from Table 2. For the combination of Model I and Model II, the corpus with the greatest improvement rate is STS'12.SMTeuroparl, with an improvement rate of 32.7% relative to 'Best Values' and 60.5% relative to 'ACVT'. The corpus with the second-highest improvement rate is STS'15.answers-forum, at 23.2%. The datasets with the third-highest improvement rate are STS'12.SMTnews and STS'14.deft-forum, at 16.7%. In addition, the datasets with improvement rates over 10% are STS'12.OnWN and STS'15.belief, with improvement rates of 13.7% and 10.3%, respectively. Moreover, the Pcc of STS'14.headlines is 0.79, which is 0.07 higher than 'Best Values', an improvement rate of 9.7%, slightly below 10%. Contrasting the influence of the different combinations on the Pcc of all the corpora, the maximum absolute improvement is 0.17 for STS'12.SMTeuroparl, followed by STS'15.answers-forums at 0.15. There are four corpora with Pcc improvements exceeding 0.1.

C. COMPARING WITH WORD EMBEDDING-BASED METHODS
Considering the combination of Model I, Model II and Model III, the results of our proposed method are higher than 'Best Values' for all the STS datasets. The corpus with the greatest improvement rate is also STS'12.SMTeuroparl with an improvement rate of 32.7% relative to 'Best Values' and an improvement rate of 60.5% relative to 'ACVT'. The corpus with the second-highest improvement rate is STS'15.answers-forum with an improvement rate of 24.6%.
The datasets with the next highest improvement rates are STS'12.SMTnews and STS'14.deft-forum, with improvement rates of 22.2% and 18.2%, respectively. Additionally, the datasets with improvement rates over 10% are STS'12.OnWN, STS'14.headlines and STS'15.belief, with improvement rates of 15.1%, 12.5% and 11.5%, respectively. There are 7 out of 16 datasets that improved by over 10%, including one corpus that improved by over 30% and two corpora that improved by over 20%, with rates of 24.6% and 22.2%, respectively. Comparing the effects of the two proposed combinations, the maximum absolute growth is 0.17 for STS'12.SMTeuroparl, followed by STS'15.answers-forums at 0.16. In addition, there are three corpora whose Pcc increased by over 0.1. Fig. 1 and Fig. 2 illustrate the comparison of the calculated results for the different years of STS with histograms. The histogram bar of the proposed methods for each corpus is higher than that of the best results collected from other studies listed in Table 2. Excluding MSRpar, shown in Fig. 1, the differences in the bar heights for all the other corpora are substantial. Consequently, our proposed methods significantly improve the Pcc on every STS dataset, which demonstrates that the proposed methods are effective and valuable. Table 4 evaluates the influence of the different combinations of models. Every result calculated by the combination of Model I, Model II and Model III is no less than the result on the corresponding corpus computed by Model I and Model II: four corpora increase by 0.03, five by 0.02, ten by 0.01, and two remain unchanged. Moreover, for the combination of Model I and Model II, the corpus with the maximum Pcc is STS'14.OnWN, which reaches 0.91. There are fifteen out of twenty-one corpora with Pcc exceeding 0.8 and three corpora close to 0.8, at 0.77, 0.76 and 0.78.
Subsequently, for the combination of Model I, Model II and Model III, the maximum Pcc reaches 0.92, on STS'14.OnWN. The Pccs of STS'12.MSRvid and STS'14.OnWN exceed 0.9; notably, these two corpora each contain 750 sentence pairs. There are sixteen Pccs out of twenty-one higher than 0.8, and two corpora close to 0.8, both at 0.78. Remarkably, all the Pccs of STS'15 and STS'16 surpass 0.8, which means that the influence of the proposed models is predominant.

D. OVERALL RESULTS
The MSE of the different corpora under the different models can be observed in Table 5. MSE is a measure reflecting the degree of difference between estimated values and measured values; in this work, it indicates the degree of fit between the calculated values and the human-made scores of sentence similarity from the semantic analysis. The smaller the MSE is, the better the fit. All the values in Table 5 are less than 0.05; the minimum MSE is just 0.015 and the maximum is only 0.048. The data for the combination of Model I and Model II show that the MSEs of the corpora STS'14.deft-forum and STS'16.headlines peak at 0.048, though this is still small. There are five corpora whose MSE exceeds 0.04 and seven between 0.03 and 0.039. On the corpora of STS'12, the results show that the proposed method performs particularly well, as all the MSEs of these datasets are less than 0.035; the dataset with the lowest MSE is STS'12.SMTeuroparl, at just 0.015. As detailed in Table 5, for the combination of Model I, Model II and Model III, the minimum MSE is 0.016, which is 0.001 higher than that of Model I and II.
The maximum MSE is only 0.044, which is 0.004 lower than that of Model I and II. The MSEs of the two corpora STS'12.MSRvid and STS'12.SMTeuroparl fall below 0.02, at 0.019 and 0.016, respectively. There are fourteen out of twenty-one corpora with MSEs below 0.03, which demonstrates the strong performance of our proposed methods. Comparing the results of the two proposed combinations, except for the dataset STS'12.SMTeuroparl, all the corpora's MSEs are decreased by the introduction of Model III. The MSEs of STS'12.OnWN and STS'16.answer-answer are reduced by 0.008, and the greatest reduction, 0.011, is for STS'15.answers-students.
Comparing Table 4 with Table 5, except for STS'12.SMTeuroparl, the Pccs of all the other corpora increase and the MSEs are reduced by introducing Model III, as illustrated in Fig. 3. The main reason is that the combination of Model I, Model II and Model III decreases the difference between the calculated values and the human-made values for some sentence pairs. The large reduction in MSE can be explained by the characteristics of Model II and Model III. Moreover, comparing the variation tendencies between the Pcc and MSE of the same corpus in Table 4 and Table 5, we conclude that the Pcc and MSE analyze the semantic information of sentence pairs from different standpoints.

E. DETAILED RESULTS
In this subsection, we discuss the Pcc and MSE of some specific corpora when adjusting the parameters of Model II and Model III. The adjustment process is as follows. First, the similarities S_1 of all the sentence pairs in the corpus are calculated by Model I. Second, Equation (9) is used to compute the similarity error E_1. When the sentence similarity error satisfies E_1 > λ_1, Model II is selected to recalculate the sentence similarity; this forms the combination of Model I and Model II. Third, the parameters of Model II (σ_1, λ_1 and D_1) are tuned to achieve the optimal values of Pcc and MSE. Finally, Model III is introduced to optimize the combination of Model I and Model II, which is called the combination of Model I, Model II and Model III. We choose appropriate values for γ_2 and σ_2 according to the value of σ_1. When the sentence similarity and the similarity error both satisfy the conditions E_2 > λ_2 and γ_2 < y_2 < σ_2, Model III is used to compute the sentence similarity. The parameters of Model III (σ_2, γ_2, λ_2 and D_2) are adjusted to optimize the values of Pcc and MSE. The values of σ_1 and σ_2 satisfy the condition σ_1 > σ_2.

1) EXPERIMENTAL RESULTS OF STS'12.SMTeuroparl
Table 6 exhibits the influence of γ_2, λ_2 and D_2 on STS'12.SMTeuroparl, which comprises 459 sentence pairs. A large number of the annotated similarities of the sentence pairs are greater than 0.8, and no sentence pairs have annotated similarities less than 0.3. As a result, when changing γ_2 from 0.2 to 0.3, the Pcc and MSE both remain invariant. Owing to the large number of human-made scores over 0.8, the effect of Model II is better than that of Model III; therefore, the Pcc of the combination of Model I and Model II is higher than that of the combination of the three provided models. In the majority of cases, the MSEs decrease appreciably with the introduction of Model III, as shown in Table 6.
The detailed comparison charts affected by λ_2 and D_2 are shown in Fig. 5(a) and Fig. 6(a). Both λ_2 and D_2 can markedly alter the Pcc of the dataset.
2) EXPERIMENTAL RESULTS OF STS'14.input.images
Table 7 compares the combination of Model I and Model II with the combination of the three provided models on STS'14.input.images, which consists of 750 sentence pairs with sentence similarities ranging from 0.0 to 1.0. Comparing the Pcc and MSE, all the values of MSE are very small while the Pccs are high, which illustrates that Model III can apparently decrease the semantic error for the sentence pair.

3) EXPERIMENTAL RESULTS OF STS'15.input.images
The effect of the parameters on STS'15.input.images is given in Table 8, and the detailed comparison charts of the Pcc and MSE are shown in Fig. 5(c) and Fig. 6(c), respectively. The corpus STS'15.input.images is composed of 750 sentence pairs with human-made sentence similarities from 0.0 to 1.0. The dataset contains some long sentences with more than 10 notional words. The longer the sentence is, the greater the semantic differences between sentences are. The semantic impact of the entangled words produced by the tensor product is marked; as a consequence, Model III plays an irreplaceable role in the reduction of semantic noise. As Fig. 5(c) and Fig. 6(c) show, the influence of D2 is more obvious than that of λ2, as reflected in the considerably changed Pcc and MSE.

4) EXPERIMENTAL RESULTS OF STS'16.ANSWER-ANSWER
Table 9 shows the influence of the different model combinations on STS'16.input.answer-answer, which contains 259 sentence pairs with annotated sentence similarities. The lengths of the two sentences differ in some sentence pairs; for example, one sentence of a pair may be very short with just two notional words, while the other contains more than five notional words. In this corpus, many high-frequency words are not considered by the provided methods, according to the abstraction laws of words. With some colloquial words and short sentence pairs, the Pcc and MSE are sensitive to changes in the parameters, as shown in Fig. 5(d) and Fig. 6(d). When σ1 = 0.4, λ1 = 0.3, D1 = 10000, λ2 = 0.3 and D2 = 85000, comparing the Pcc and MSE of 0.15 < y2 < 0.3 to those of 0.15 < y2 < 0.5, the differences are insignificant at only 0.001, as detailed in the first columns of Fig. 5(d) and Fig. 6(d).

Compared to the second columns of Fig. 5(d) and Fig. 6(d), when D2 is changed from 85000 to 75000 under the conditions σ1 = 0.4, λ1 = 0.3, D1 = 10000, 0.0 ≤ y2 < 0.5 and λ2 = 0.3, the Pcc declines by 0.085, from 0.859 to 0.774, and the MSE approximately doubles, from 0.033 to 0.066. When only γ2 is altered from 0.0 to 0.1, the Pcc and MSE, as shown in the third columns of Fig. 5(d) and Fig. 6(d), vary dramatically, though less than under the influence of D2 shown in the second columns. When σ1 = 0.4, λ1 = 0.3, D1 = 10000, λ2 = 0.3 and D2 = 75000, comparing the Pcc and MSE of 0.0 ≤ y2 < 0.5 to those of 0.1 ≤ y2 < 0.5, the changes are apparent, with the Pcc rising from 0.774 to 0.842 and the MSE falling from 0.066 to 0.042, as displayed in the last columns of Fig. 5(d) and Fig. 6(d).

5) SUMMARY
In summary, comparing the Pcc and MSE of the four corpora influenced by Model II and Model III from Table 6 to Table 9, the effect of Model III on long sentence pairs with low sentence similarities is better than that of Model II, but for sentence pairs with high similarities, the effect of Model II is better. The impact of Model III on corpora consisting of more sentences with more notional words is clearer. Take STS'14.input.images as an example: all Pccs of the combination of Model I, Model II and Model III are much higher than those of the combination of Model I and Model II, and all its MSEs are lower. The influence of the parameters on datasets with a small number of sentence pairs is more obvious, as evident from STS'16.answer-answer.
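For reference, the two evaluation measures compared throughout this section can be computed directly from the predicted and human-annotated score lists. The following is a standard NumPy sketch of these measures, not code from the paper:

```python
import numpy as np

def pcc_and_mse(predicted, annotated):
    """Pearson correlation coefficient (Pcc) and mean squared error (MSE)
    between model similarities and human-annotated similarities."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(annotated, dtype=float)
    pcc = np.corrcoef(p, a)[0, 1]   # Pearson r between the two score lists
    mse = np.mean((p - a) ** 2)     # average squared deviation
    return pcc, mse
```

A higher Pcc indicates that the model's ranking of sentence pairs agrees with the human ranking, while a lower MSE indicates that the absolute scores are close to the human-made scores; the two measures can move independently, as the results on STS'12.SMTeuroparl show.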

VI. CONCLUSION
This study demonstrates how to integrate quantum theory into text embedding to construct sentence representations based on quantum entanglement. Considering that the dimension expansion of the entangled words vector caused by the tensor product may introduce semantic noise, our dimensionality-reduction models are reported, which incorporate the physical idea of identifying the principal contradictions and ignoring the secondary ones. The Pcc and MSE of each corpus are obtained and compared with the results of other models. Experiments are implemented on 21 datasets, including the SemEval Semantic Textual Similarity tasks of 2012, 2014, 2015 and 2016. It is clear from the above discussions that our models significantly outperform the comparative methods on 16 out of 16 compared datasets and require no prior knowledge except word2vec. The experimental data indicate the advantage of our approaches: sentence embedding based on quantum computation that takes dimensionality reduction into account can efficiently mine semantic information without complex computing processes. For future work, we will attempt to extend the current framework to the study of the semantic structure of sentences, considering the different weights of words.