Classical Computer Assisted Analysis of Small Multiqudit Systems

Quantum computation model is regarded as a model which can overcome barriers in calculations efficiency of problems which appear in modern science. In spite of hardware development, in particular a recent emergence of several different physical installations of the pioneering quantum machines, the contemporary and numerical analysis of problems concerning quantum computing is very important. In the first part of this article, some useful computing techniques for quantum registers processed by a quantum circuits are presented. Applied classical parallel computational techniques are utilised to shorten the whole computational time. New methods of processing state vectors for qudits and density matrices are presented, indicating which operations may be performed in parallel in the context of the implementation of local unitary operations. There is also shown, how to use the reduction operation in parallel implementation of the von Neumann measurement by performing local measurements on a system of qudits. In addition to the purely technical results as described above, the paper includes also a bunch of purely theoretical results which substitute a solid mathematical ground for the computations performed with the help of the computational routines as described in Section III. In particular, a discussion concerning general multi-qudit quantum states through the prism of Entropy and Negativity measures of entanglement included in has been presented. Additionally, the notion of the total entanglement has been introduced. For certain classes of popular multi-qudit states, the introduced deficits of entanglement defined with the use of von Neumann Entropy and Negativity have been discussed. In particular, by the use of Gram matrix technique, the corresponding deficits of entanglement in the analysed states have been computed in an explicite way. Additionally, some new results on AME states for some multiqudit systems are also included in Section V. The last part of the article presents some numerical experiments on multi-qudit entanglement and determination of total entanglement values for convex combinations of GHZ and W states. Some details concerning technical nature of results are included in the attached Appendix.


I. INTRODUCTION
Quantum computing is believed to be a source of new ideas and solutions leading to a very significant increase of computations performance, especially by shortening its duration The associate editor coordinating the review of this manuscript and approving it for publication was Abdullah Iliyasu . in many areas of science e.g. applications to fuzzy logic [1], machine [2] and deep learning methods [3]. Unfortunately, quantum computers are still in an early stage of development what may be seen in restricted capabilities and an appearance of a high intensity of errors appearance during the computational process. Presently, there exist some physical installations of a small scale quantum computational systems, e.g. IBM Q Experience [4], D-Wave Systems [5], Rigetti Computing [6], IonQ [7], Xanadu Quantum Technologies [8]. However, theoretical analysis -but also computation and simulation (a very actual and complete list of software simulators is available at Quantiki [9]) -in the field of quantum computing are still a basic cognitive tools, despite the expected quantum supremacy [10]- [12] with large scale quantum machines [13]. On the other hand, small quantum computers with intermediate noise scale, so-called NISQ computers, are used to perform quantum computations and simulations of quantum systems that are much larger than the NISQ machine itself [14], nowadays.
The reasons mentioned above maintain importance of developing classical simulation techniques for quantum computations. This approach has already appeared in nineties of the previous century [15] and it is still utilised what is shown in [16], [17]. For example, recent works on the topic show that even some computationally hard problems may be solved successfully with use of the real quantum computer [18], e.g. tensor networks are capable of solving high-dimensional problems [19]. Other simulations techniques of quantum programs on modern multi-GPU platforms may be crucial in validating quantum algorithms i.e. the robustness of the proposed error corrections codes against quantum noise, and designing applications for future large scale and universal quantum computers [20], [21] what is also confirmed in [22] where the constructed simulator is highly efficient.
To sum up, thanks to the computer simulations of quantum computing, it is possible e.g. to model a quantum noise and check whether analysed system is capable to successfully perform calculations. Therefore, it seems valuable to propose efficient, or at least better, approaches to performing numerical calculations. In this paper, we discuss classically based computational solutions prepared for simulation of quantum circuits what causes a need of basic concepts introduction, like quantum registers, vector states, density matrices, quantum measurement, etc. The nature of quantum computation, where parallel processing [23], [24] is present on the exponential level, obliges us to build a very demanding modelespecially, in the context of restricted utilisation of additional temporary data buffers which are necessary to process a vector state or a density matrix. This paper is organised as follows. In Section II, there are included selected basic concepts relating to simulations of quantum circuits. Next, in Section III, computational procedures are shown which serve to process quantum register in a form of vectors (our approach is also adapted to density matrices case). The notion and importance of local operations is also highlighted, and computational complexity with the possibilities of parallel processing is analysed. In Section IV, several new variants of operator decompositions for general multi-qudit quantum states are discussed. In particular, a canonical relation between the coefficients of basic Hilbert-Schmidt inner product based on Schmidt decomposition with coefficients arising from the combined spectral and Schmidt decompositions of the corresponding eigenstates is presented there for a general quantum states [25]. Some problems concerning multi-qudit entanglement anatomy are presented in Section V. In this section, von Neumann entropy, negativity, and entanglement deficit notions for some quantum states, like multi-qudit GHZ and W states, are discussed. In the last part of Section V, some numerical examples are also presented. Appendix includes mathematical description of local quantum operation introduced in Section III. Summary, acknowledgements, and bibliography end this article.

II. MATHEMATICAL PRELIMINARIES
In this part of the article, the basic mathematical concepts directly referring to quantum states processing are presented. In particular, we describe single-qubit (single-qudit) unitary operations, controlled two-qubit (two-qudit) operations, and von Neumann quantum measurement. Table 1 contains symbols, notations, sets, and functions which are used in further parts of the paper.
We denote a set of complex numbers as C, the real values as R, and integers by N. The floor function for real numbers will be denoted as · .
Let the space . . ⊗ C d be a finite d N dimensional complex Hilbert space where d and N are integers d ≥ 2, N ≥ 1. The symbol ⊗ stands for the tensor product and it is repeated N times.
A given pure quantum state | ∈ H forms a quantum register containing N quantum information units. It should be emphasised that in the further part of the article the expression quantum register is a substitute of quantum state denoted as | . Naturally, the Dirac notation is applied to express so-called pure states, i.e. column vectors | (additionally by | we denote co-vector i.e. transposed and complex conjugated component-wise row vector from | ). VOLUME 10, 2022 The generalised unit of quantum information is called a qudit (its application to quantum computing is discussed in paper [26]): and d−1 i=0 |α i | 2 = 1 where α i are so-called probability amplitudes. For d = 2, a unit termed as a qubit (quantum bit) is obtained. Another well-known unit of quantum information is a qutrit with d = 3. The degree of freedom (marked as d) is an integer which points out the informational capacity of single qudit.
An arbitrary qubit vector state, which is expressed with the use of basis states, may be denoted as: where the basis vectors |0 , |1 are: and mentioned vectors are orthonormal, and are also called the standard computational basis. For vector states with d ≥ 2, the following basis vectors are very often marked with integers from 0 to d − 1 written in decimal form, but the binary number or general dit coding is also used. For example, the state |010 is a three qubit state where the first qubit is |0 , the second |1 , and the third |0 . Of course, for qubits (d = 2), 010 is treated as the binary number. If there is a four qutrit state |0221 , then 0221 is a ternary number.
Any complete factorizable state of N qudits may be put into a quantum register with the use of tensor product: Remark 1: In this paper, for simplicity, all qudits |ψ i , 1 ≤ i ≤ N have the same dimension. It should be emphasised that numerical routines which will be presented later in the text, can be also applied to the registers with qudits possessing different dimensions d. In this case, an additional array of used d values for the corresponding qudits is required.
States of quantum registers may be expressed also as density matrices ρ. The density matrix for the pure state | is: It should be mentioned that in case of a density matrix ρ, we sometimes also use the name quantum register. Remark 2: It should be also stressed that some pure quantum states cannot be presented as a tensor product states of single qudits, as in (4). These states are called entangled states [27]- [30]. As an example of entangled state, we can present one of so-called Bell states (or EPR pairs): Further definitions of important notions, in the context of quantum entanglement as von Neumann entropy and Schmidt decompositions, will be presented in Section IV and Section V. A fundamental restriction of efficient structures processing, containing information about quantum states, is based on their dimensions. If N qubit system is taken into consideration, i.e. d = 2, then dimensions of vector and matrix ρ are: For any d ≥ 2, the dimensionality of quantum states, expressed as the vector or the matrix, is as follows: Taking into account Remark 2 for entangled systems, there is no possibility of expressing a quantum register as a direct tensor product of smaller subsystems.
One of the basic operations performed on a given register | are unitary operations given by unitary matrices U . The second type of operation which can change a state of a quantum register is measurement. The von Neumann measurement (VNM) [31] is a basic scheme which may be presented in the context of quantum register measurements. The unitary operation performed on the quantum register | is described by the matrix U (it is naturally a unitary matrix, i.e. UU † = I). In the context of the so-called quantum circuits, a unitary operation is called a quantum gate. Its action on the vector state | and the density matrix ρ is described as follows: In other words, the operation U transforms state | (or state ρ in the case of density matrices) to the state | (or to the state ρ , respectively).
It is worth to mention that the global unitary operation can be composed from a smaller unitary operations. As an example, let us consider the following unitary operator U : where only two of four qubits are modified, because operators I (2) , I (4) are identity operations, and u (1) , u (3) are unitary operations performed on the first and the third qubit. Let us emphasise that this product-like structure of U allows defining a computational procedure which, in spite of exponential grow of sizes of quantum state, processes the quantum register by the direct use of the parallel computing methods. Adequate solutions are shown and discussed in Section III. In case of qubits, a basic set of Pauli operators (together with the identity operation) and Hadamard gate can be expressed with the following matrix representations: 82638 VOLUME 10, 2022 Next to single-qubit (qudit) gates, there also exist controlled quantum gates. A basic example of such a gate is the controlled negation operation -CNOT. The way it works for qubits can be written as follows: If the state of the first qubit is |1 , then the state changing operation is performed on the second qubit using a negation gate -NOT. The first qubit is called the controlling qubit and the second qubit is the controlled qubit. The described mode of operation can be also expressed in the form of a matrix: The CNOT gate works on two qubits, but there are variants with more controlling lines, e.g. Toffoli gate [36]. An important property of the CNOT gate is the possibility of introducing entanglement to the quantum register, because the introduction of entanglement is not possible using only gates operating on one qubit/qudit. The entanglement phenomenon is an important and vital quantum resource, necessary to perform teleportation protocol [37], [38]. Remark 3: It is important to notice that unitary operations, just like entangled states (Remark 2), may have a form which cannot be obtained as a result of other local unitary operations in the tensor product. These operations may be called non-primitive [39]. The basic example of such gate is the CNOT gate, i.e. controlled negation: where X represents the unitary negation (Pauli gate X) and In the context of quantum gates, the notion of a universal quantum gates set should be recalled. It can be said that a set of universal quantum gates is represented by any set of gates with following property: any other unitary operation can be expressed as a finite sequence of gates from the universal set of gates. For the case of qubits, the rotation gates build on the base of Pauli gates and CNOT gate can be regarded as the universal set of quantum gates, for more details please see [40].
The von Neumann measurement [31] is a basic scheme which may be presented in a context of quantum register measurement. The operation is described by the observable M which is a Hermitian operator acting in a subspace of a system to be measured (naturally, also the whole system may be measured not only its subspace). The spectral decomposition of observable M is: where P i is a projector on eigenspace of M with the eigenvalue λ i . The measurement's results are represented by the eigenvalues λ i . The probability of obtaining the result λ i on the state |ψ is given as: After a measurement, the result λ i influences the system's state in the following way:

Remark 4: Quantum measurements may be realised in various computational bases what requires a selection of proper set of projectors. However, in case of the standard basis, a very efficient implementation of measurement can be presented, what is shown in Section III-A.
A broader introduction to quantum gates, quantum computation models, and quantum circuits is beyond the scope of this article. A solid introduction to the topics of quantum circuits can be found, for example, at [41]. Let us just mention that the Pauli X gate plays the role of a negation gate, that is, like its classical equivalent, the state |0 transforms into |1 and by analogy |1 → |0 .

III. PROCESSING OF QUANTUM REGISTER FOR QUANTUM CIRCUITS
Processing of a quantum register [41], in the context of quantum circuits simulation, can be depict by the use of three main procedures: single unitary operators, controlled unitary operators, and a measurement operations.
Utilising the basic properties of the tensor product, which describes the whole quantum register, we can present some characteristic values for effective addressing of particular probability amplitudes in a quantum state. The applied solutions, presented in previous publications [42], and in our papers [43], [44], refer mainly to qubits. In this paper, the proposed routines are extended to the case of qudits. Parallel techniques, in a context of quantum simulators harnessing qubits, are also known [45]- [47] but routines, presented here, can be implemented directly and do not depend on other computational libraries for linear algebra. We also directly emphasise which operations can be parallelised. The procedures, shown in this section, have a characteristic feature: the operations are always grouped in several groups which contain one set of local operations which may be executed regardless of other sets of local operations. This makes possible the parallel processing of indicated local operations, and applying additional optimisation when it comes to operational memory access.
In the proposed calculation procedures, in the context of the measurement implementation, we propose the use of a parallel reduction operation. The reduction operation itself can be described as the transformation of a sequence of data, e.g. an array of elements, into a single element/scalar. An example of this type of operation is the sum of array elements, e.g.: 3,5,7,9,11], (18) the result of the reduction is the following operation: Naturally, the sum operation can be split into e.g. pairs of sums: Each of these operations can be performed independently of the others, and then the partial results can be combined together. Despite the triviality of the example, in an analogous way, it is possible to effectively implement the state's measurement of a given quantum register, what is shown in the next Section III-A. Additionally, for a detailed mathematical description of an arbitrary local quantum operation's action on the general multi-qudit states, we refer to the paper [48] and Appendix of the present article.

A. LOCAL MEASUREMENT OPERATION AS PARALLEL REDUCTION
The von Neumann measurement [31] in the standard basis may be efficiently performed by the use of the parallel reduction. Let us consider a system composed of two qutrits being in the pure state | = |ab = 8 i=0 α i |i and the second qutrit b is measured. If the measurement is realised as described in Section II, then three probability values have to be calculated: p(λ b 0 ), p(λ b 1 ), p(λ b 2 ). However, instead of vector operations, three following sums have to be computed: where lower indexes are ternary numbers. All above operations may be performed in separate threads. The parallel reduction is easy to apply. The reduction process may be depict in a form of a tree what is shown in Fig. 1.
If the result of the von Neumann measurement is denoted as λ i , then according to (16), probability is expressed as p(λ i ) = where P i is a projector representing the result eigenspace. Performing the measurement operation requires the modification of a vector which represents a pure state. Adopted solution is based on putting zeros in amplitudes which are not related to measurement result λ i . In other words, if the result is e.g. λ b 1 with probability equal to p(λ b 1 ), then the amplitudes in which the second qutrit is not in state  This operation may be performed independently on indicated amplitudes. Finally, the measurement process can be realised in parallel because each probability amplitude is processed independently. Algorithm 1: The computational procedure VNM realising a special case of measurement, i.e. the von Neumann measurement in the standard basis. The first parameter | represents a quantum state, α i stands for i-th amplitude of | , d is qudit's freedom level, n marks the number of qudits, t stands for ordinal number of qudit, mm is a mask function which points out the qudits to be measured.
If we use a mask, for example mm(j, 1), then we check whether j-th amplitude refers to the state result denoted as λ 1 under measurement operation and the result of measurement is given as (r, m) pair. In the case like mm 1 , the outcome of the measurement is related to the result denoted as λ 1 .
The estimation of computational complexity T VNM of the VNM procedure directly refers to two for loops: where T mm represents complexity of operations on a mask, and T RS stands for the operation of measurement selection, T norm denotes the last part of the measurement operation -normalisation. However, it is easy to allocate individual operations in both loops between particular computational cores, the basic unroll optimisation for loops can be also applied in this case. It should be added that the VMN procedure, described by Algorithm 1, may be used as measurement's simulation in a context of One-Way Quantum Computation Model (1WQC) [32]- [35]. In this model, the operation of measurement (realised in different computational bases) is a fundamental computational mechanism. The relevant modifications, concerning a measurement in basis described by a set of P i projectors for observable M (as in (15)), are presented as Algorithm 2.
Algorithm 2: The computational procedure VNM {P i } realising a measurement in a basis described by the set of projectors P i . Parameters have the same meaning as for the VNM routine The masks are used just like in the previous algorithm, e.g. mm(i, 1) stands for the i-th amplitude referring to the state result denoted as λ 1 .
The main modifications are, naturally, additional multiplications on probability amplitudes which include projectors P i with their dimension d, what may be stated as: and it means a fragmentary execution of the operation (16) on the j-th probability amplitude related to one of P i operators under the measure mask (P i (j/mm(j,0))0 ) when the probability distribution is calculated. The entry expresses amplitude's modification according to utilised projectors for final measurement result denoted as r.
Just like in the VNM routine for measurements in the standard basis, the computational complexity of the modified procedure is still O(d N ). The additional operations, connected with the computational basis, influence only parameters T mm and T norm .

B. LOCAL UNITARY OPERATORS
An impact of a unitary operation on one qubit, also when we deal with controlled operations, may be implemented with an effective usage of many computational cores (thanks to the way in which a unitary operator is applied to a register). Fig. 2 depicts some quantities which may be formalised in the following way: procedure

Proposition 1: If N is the number of qudits in a quantum register, d stands for the the corresponding dimension of single qudit, and t is the number of qudit modified by a unitary operation U , then: (I) the number of so-called local blocks p = d t−1 , (II) the space between elements in a local operation step
The operation LocUniOpForStateVector(·) is a realisation of interaction between operator u and pointed out elements of state | :  Additionally, the number of LocUniOpForStateVector(·) calls in Algorithm 3 is In other words, a matrix is multiplied by a vector, but this operation concerns only d elements of a vector | . Each operation of this type is local and may be performed independently. Again, just like for the operation VNM, the computational complexity of procedure described as Algorithm 3 is: where T ini represents initialisation operation. Main loop contains operations T LUO , but complexity T LUO is constant and depends only on d. Finally, d N complexity is obtained. Owing to UPST approach, an algorithm for controlled unitary operations can be presented.

Algorithm 4: The algorithm of vector state processing for a controlled 1-qudit gate u (CUPST). Parameters are the same as in Algorithm 3, and c is an ordinal number of controlling qudit.
procedure is responsible for preparing the index number based on the information which qudits are used as controlling qudits. This information and information from Proposition 1 allows pointing out indexes of probability amplitudes and the expression: means that the index which points out the given row r i is calculated under additional dits mask which points the controlling qudit. The computational complexity remains exponential, but it should be emphasised that the number of controlling lines may significantly decrease the number of local operations: Also for the case when the quantum register is described by a density matrix, one can analogously formulate computational procedures in which the processing is done in-place by means of the indicated local operations which can be performed independently. In earlier work [44], the computational procedure for density matrices for qubits has been presented. However, the Proposition 1 can be used directly to construct analogous routines for density matrices containing qudits. Algorithm 5 shows how to process density matrices of N qudits with local unitary operator which acts on qudit numbered as t. The Algorithm 6 implements computational routine to process density matrix ρ where qudit t is modified by an operator u if qudit c is in given state represented by the function mask(·) -similarly as in computational routines for a vector state.
Algorithm 5: The algorithm of density matrix processing for single-qubit gate (DMT). The parameter ρ represents a density matrix, n number of qudits, t ordinal number of a qudit, u is a unitary matrix for single qudit. procedure end for end for end for end for end procedure Algorithm 6: The algorithm of density matrix processing for a controlled 1-qudit gate u (CDMT). The parameter ρ represents a quantum state, other parameters remain the same as in the previous algorithms.
procedure CDMT end for end for end for end for end procedure The local operation LocalUnitaryOpForDenMatrix(·) performs changes in indicated elements of matrix ρ. In this case, d × d elements are modified. For example, in case of a qutrit, the local operations are (s is a shorthand of the variable step): where

IV. OPERATOR DECOMPOSITIONS OF GENERAL QUANTUM STATES
Let −|− be any factorisable inner product on the space L(H A ⊗ H B ) of linear operations acting in the product space ∞ and equipped with the standard Euclidean scalar products −|− . This means that there are two inner products: −|− A on L(H A ), and respectively −|− B on L(H B ) and such that for any Q A ∈ L(H A ), Q B ∈ L(H A ) the following equality is true: The following theorem generalises the well known operator Schmidt decomposition, formulated for the standard Hilbert-Schmidt products structure [41], [49] and applied in the context of quantum computations in several papers [50]- [53].
and such that Example 1: Let the inner product be given by the Hilbert-Schmidt's formula: where tr (−) stands for the trace of (−) in the corresponding Hilbert space. The corresponding Schmidt decomposition of Q will be called the HS-Schmidt decomposition of Q and is given by: and similarly for E HS(B) k . The phenomenon of quantum entanglement is, by no doubts, one of the fundamental resources for the emerging VOLUME 10, 2022 Quantum Information Processing technology. Basic information concerning entanglement can be found in widely available textbooks and several review papers [49], [50], [53]- [55]. The following result is known [51], [52], [56] for the Hilbert-Schmidt decomposition.
Using (36) it follows that: Then the HS-Schmidt numbers of Q are given by (2) the state Q is separable if and only if ( α τ α ) 2 = 1 which means that the canonical Schmidt's rank of | is equal to 1. Proof: It is easy to observe: and similarly for |θ α θ β |. Moreover, the system {|ψ α ψ β |} forms a complete system in HS(H A ), and similarly {|θ α θ β |} forms a complete system in HS(H B ). The part (2) follows from the use of Theorem 2.
For the case of a general Q ∈ E(H A ⊗ H B ), the spectral decomposition of Q is considered: where λ k ≥ 0, k λ k = 1 are eigenvalues of Q, and E k = | k k | are the orthogonal projectors onto the corresponding eigenvectors. Let, be the Schmidt decomposition of the corresponding eigenvectors of | k . Then, comparing the arising expansions for Q: with expansion (32) one can obtain: Let: For further use, let us define: where and similarly for the part B, and let: where I is the identity matrix.
D + ≥ 1. In this case, if Q is separable the following is true: Proof: We start the proof with assumptions: . For the complete proof, it is enough to assume (without lost of generality) that the actual Q is of the form Q = Q A ⊗ Q B . Let us define, for any Q ∈ L(H A ⊗ H B ), the following (witness-like) operator: where q A k , and resp. q B k are given from the corresponding Schmidt decomposition (32). Then, by an easy computations from which it follows: Remark 7: It easy to observe that in the HS-inner products Theorem 3 reproduces the well known Rudolph theorem [50], which is also called the realignment criterion. The proven by us Theorem 3 is a slight generalisation of the original realignment criterion although is of a similar nature.

A. MULTI-QUDIT QUANTUM STATES
Let d ≥ 2 be a fixed integer and let h(N ) be d N -dimensional, complex euclidean Hilbert space = tr (Q) = 1}. If Q ∈ ∂E, then there exists (essentially) unique normalised vector | ∈ C d N such that Q = | | (i.e. Q represents pure state).

D. THE NEGATIVITY
For a given | ∈ C d N and π 1 ∈ 2q − Par(I N ), we define negativity of and corresponding to the decomposition I N = π 1 ∨ π 2 as Note that: The total negativity of the pure quantum state | is given by Note that: Remark 8: The defined by (69) and resp. by (71) quantities corresponding to the negativities of a given many-qudit state along given cut π 1 (see (69)) and the total negativity defined by (71)

also posses most of the properties as it was listed for the von Neumann entropies in points (i)-(iv) at the end of the previous section.
The computational complexity of the evaluations of the entanglement deficit by the use the corresponding von Neumann entropy or the use of Negativity measure grows exponentially fast as the number of qudits N is increasing and this is why the suitable parallelisation has to be prepared. For example, taking N = 10 and d = 2, one has to deal with the ensemble of 31 matrices of size 1024 × 1024.
Remark 9: The states for which the maximal value of the total entropy (56) or the total negativity (71) are attained, are known under the name of absolutely maximally multi-partite entangled states (AME [59], [60]). The problem of their existence is strictly connected to the so-called quantum marginal problems [61] and belongs, according to [62]

to one of the most important (and mathematically most complicated) problems of the quantum information theory foundations.
The new results on the existence and morphology of AME states for some special many qudit systems are presented here.
Let S be a system composed of N -qudits where each qudit's dimension is d i (for i = 1 : N ). Let us assume that there exists a value of i such that Then, the set AME(d 1 , . . . , d N ) of all AME states in such system is given by the following construction. Let us assume, without loss of generality, that the value of i for which the condition (74) is valid is equal to N . Let {e i α , α = 1 : d i } be a system of orthonormal bases, for i = 1 : N in the corresponding spaces d i . Any unit vector | ∈ ∂E(⊗ N i=1 C d i ) is given by the following formula where Let Q = | |. Then the corresponding reduced density matrix Q 1,...,N −1 , given by is a d 1 ·. . .·d N −1 dimensional matrix with the matrix elements If the state Q is AME state then the following must be true: If these equalities are true, then it follows that for any (i 1 , . . . , i k ) ⊂ (1, . . . , N − 1) the corresponding reduced density matrix Q i 1 ...i k N , given by where the partial trace operation tr i 1 ...i k (·) is performed on corresponding Hilbert spaces describing the system of qudits labelled by lists (i 1 . . . i k ). Then all the appearing reduced density matrices are maximally mixed as it follows from simple handmade calculations. Therefore, the entanglement of the all (i 1 . . . i k ) is maximal possible (in the sense that entropy and negativity are maximal). Thus, we have proved the following results.
where (e α i α ) is any complete orthonormal system in C d α , for each α, (F α 1 ...α N −1 ) is as system of orthogonal vectors in C d N and such that (82) Remark 10: The introduced here notion of deficite of entanglement and the notion of the total negativity are very close to the previously used in the similar context notion of the many qudit tangle [63]- [65]. In particular, for a specific bipartite division, the notion of the corresponding negativity is essentially equal to the widely used in the literature notion of the generalised concurrence [66], [67].
The definitions of entropy (and negativity respectively) for the partition π, and the definitions of total entropy (and respectively negativity) allow constructing the algorithm for determining these quantities: Algorithm 7: Algorithm Bipartition Entropy and Negativity -BPEN for calculating values of entropy and negativity. Q represents the analysed quantum state, N is the number of qudits, and d is qudits' degree of freedom. procedure BPEN( , N , d) for π = π 1 ∧ π 2 ∈ L 2p do in parallel calculate τ π 1 j for j = 1:d |π 1 | and such that | = j τ π 1 j |ψ π 1 j ⊗ |θ π 1 j , The 2 N -element list of partitions influences the complexity of the algorithm. The particular operations interact with data sized d N . The algorithm's complexity is: where T SHM represents the Schmidt decomposition which is performed at O(N 3 ). Therefore, the final computational complexity contains two exponential factors. However, the analysis of particular partitions π may be performed independently, what shows the pseudo-code in Algorithm 7. It means that analysis of a each individual partition may be dispersed among available computational nodes which of course is still of exponential complexity computation as number of qudits, the number N grows.

E. ENTANGLEMENT DEFICIT FOR SOME QUANTUM STATES
Let d = 2 and let N ≥ 2. The following N-qubits state: is called the (generalised) Greenberger-Horne-Zeilinger state (GHZ state for shorthand). For N = 2, the GHZ 2 (2) state is maximally entangled 2-qubit state in the sense that the corresponding one qubit reduced density matrices (RDM) are maximally mixed. Additionally, the GHZ 2 (2) state is maximally entangled with respect to LOCC semi-order relation on the set ∂E(C 2 ⊗ C 2 ). VOLUME 10, 2022 For N = 3, all the one qubit RDMs: are equal to each other and are maximally mixed (where tr ij (−) means the quantum operation of tracing out the corresponding to qubits i and j degrees of freedom). This shows that the GHZ 3 (2) state has maximal total entropy (56) and maximal negativity as well. Additionally, one can show that GHZ 3 (2) state is maximal state with respect to LOCC semi-order relation.
In the case of N = 4, the situation is different as it was proved by Higuchi and Sudberry [68], and also confirmed in [69] (see also the fundamental work of Klyachko [61]) that four qubits states which have maximally mixed all two-qubits RDMs do not exist. However, the state GHZ 4 (2) is maximal with respect to LOCC semi-ordering relation.
Moreover, it appears that the corresponding Schmidts ranks are equal to two and the values of non-vanishing Schmidt's numbers are λ 1 = 1 √ 2 and λ 2 = 1 √ 2 . The corresponding to π 1 partition von Neumann entropy is equal to ln(2), for each k = 1:N . Therefore, the total entropy comparing it with (57), concludes the proof. Another well known, and frequently used for several purposes, is the class of the so-called W-states, defined for N -qubit system as: Observation 2: For W N (2) states: Proof: Using the permutational symmetry of W N (2) state, it follows that it is enough to compute the corresponding entropies for the one of the representative of all classes in partition groups for which |π 1 | = k. So, let k = 1: N /2 , and the corresponding partitions of I N will be denoted as 1 . . . k = {1, 2, . . . , k}. Then, some easy and handy computations show that rank of the corresponding Gram matrix (see [70], and also the Example 5 and below) is equal to k +1, and the corresponding spectrum has the non-zero part equal . Therefore From (90) it follows that Comparing this formula with definition (57) concludes the proof by an elementary arguments (see also below).

Definition 1: Let T ⊆ S N be a subset of the symmetric group S N . The state Q ∈ E(C d ⊗N ) will be called T-symmetric state if and only if
where: and |i α j α | is the canonical basis in the corresponding space L(C d ) (the space of d × d matrices), then

Remark 11: It is easy to conclude that if the state Q is T invariant, then it is also invariant under the action of the subgroup, generated by the set T of the corresponding symmetric group. This gives the opportunity to divide the sets of k-th elements partitions into equivalence classes, on which the corresponding entropy is constant.
Example 3: The states GHZ N (2) and W N (2) are examples of a S N -invariant states. Also their generalisations, introduced below, denoted as GHZ N (d) and resp. W N (d), are S N -invariant states.
Let Q ∈ ∂E((C d ) ⊗N ) be a S N -invariant state. Let π ∈ 2 − Par(I N ) and |π 1 | = k ≤ N /2 . Then, for any such π the corresponding Gram matrices (of dimension d k ) have the same (modulo numbers of zeros eigenvalues) spectrum (λ π 1 1 , . . . , λ π 1 d k ). Therefore, for any π 1 such that |π 1 | = k, the von Neumann entropy is the same. So, it is enough to compute entropy in the case when π 1 = {1, . . . , k}. Lemma 1: Let Q ∈ ∂E((C d ) ⊗N ) be a S N -invariants a straightforward corol N -qudit pure state. Then, the total entropy of Q is given by As a straightforward corollary, we have: where c ∈ R, then Much more stronger version of this simple observation is the following result.
Proposition 3: Let Q N ∈ ∂E((C d ) ⊗N ) be any sequence of qudit states, and let for any (π 1 , π 2 ) ∈ 2 − Par(I N ) with |π 1 | = k the following estimate be valid: Then, Remark 12: The very interesting class of highly entangled N -qubit states is given by the so-called cluster states [35], [71]- [73]. It appeared that for most of them the corresponding entropy deficit also grows fast with an increase of N .
Example 4 (The Generalised GHZ-States): The following family of states generalises the notion of GHZ N (2) states for qudits: This family is evidently S N -invariant family. The corresponding to the cut 1 . . . k Gramian G 1...k (GHZ N (d)) is spanned by the frame where |α . . . α N −k stands for state |α ⊗. . .⊗|α of the qudits with label k + 1 up to N . The Gramian (modulo the permutations of rows and cutting-off null sectors) looks like: The von Neumann entropy of the cut 1 . . . k is now easy to compute: and the total entropy is equal

Example 5 (The Generalised W N (d)-States):
The following family of states seems to be a natural generalisation of the standard N -qubit W-states to the case of qudits of dimensions d: where α i is putted at i-th position, and For a given k, the corresponding Gram matrix G 1...k (W N (d)) is spanned by the frame consisting of 1 + (d − 1)k vectors: where and The corresponding Gram matrix G 1...k (W N (d)) after permuting the rows and eliminating the null sectors is given by the following (1 + (d − 1)k) × (1 + (d − 1)k) matrix:   and the corresponding entropy according to cut 12 is equal to: where particular eigenvalues of the matrix are: G 12 (W 4 (3)) this λ 1 =

F. NUMERICAL EXPERIMENTS
Calculating Entropy in multi-qubit entanglement may be shown for two states families, i.e. GHZ N and W N states, where N stands for the number of qubits utilised to build the subregister. In the presented numerical example, a convex combination of these states, for 0 ≤ p ≤ 1, is: Calculating values of Entropy/Negativity with Algorithm 7, as it is pointed out by the algorithm's description, may be realised in parallel by a multiprocessor system. This approach shortens the calculations time and allows better utilisation of computing resources. It should be emphasised once more that the parallelisation is useful in calculating partial Entropy values for each partition (by creating several partition sets and performing calculations in one computational task for a specific set). Additionally, algebraic operations performed on quantum states, e.g. Schmidt decomposition, may be also realised in parallel due to the NumPy package.
Values presented in Table 2 depict calculations duration for the state Q given by (112). Gathered data show that more effective usage of computational resources may be gained by properly selected number of threads and tasks (i.e. partition sets). In the testing environment, octa-core Intel Xeon W-2245 was utilised but the true number oif threads is sixteen thanks to the Hyperthreading technology. A split into eight main computational tasks -where each task has two dedicated NumPy threads -results with the quickest execution of the computation. The obtained acceleration, in systems of seven and lesser number of qubits, is about two. However, for greater number of qubits acceleration increases. The computation is performed four times quicker for a ten-qubit system in comparison to basic approach when particular steps of partition analysis are executed one after the other.
Algorithm 7 is easily scalable. Even for one node, the partition set split results with shorter calculation time. Especially, when the number of nodes is greater, the acceleration is more effective what is shown in Fig. 4. The testing environment was based on processors Intel Xeon E5420 2.50 GHZ (older solution than in the first experiment), the calculations were performed on sixteen nodes, and each node consists of two processors. This approach allows checking, if greater number of nodes results with shorter computation time.
The communication between nodes is based on the mpi4py 3.1.3 package [75]. Fig. 4 depicts that doubling the number of nodes causes proportional time shortening, what is especially visible for the ten-qubit system. It should be added that the communication between nodes is restricted only to passing partial Entropy values what means that it has no significant impact on the whole calculation process. The biggest difference between the worst and the best case is over fortyfold, if we compare cases with and without parallelisation. Naturally, this figure is lesser if we compare a node where eight compute tasks receive a split set of partitions with the cases where 2, 4, 8, and 16 compute nodes are used. For 16 nodes (10 qubit system), the acceleration is approximately  fifteen times (or even fourteen is we compare node ''1'' where one computational task with the case where 16 nodes is used). For systems with lesser number of qubits, the acceleration is not so spectacular because of smaller matrix size expressing analysed quantum states.
The results of the last numerical example, presented here, are shown in Fig. 5. We calculate entropy deficit for states GHZ N (2), GHZ N (3), GHZ N (4), and in Fig. 6 for W N (2). We also present the differences between the total entropy value (58) and the entropy value for two defined states by formulas (87) and (91), respectively. Values of total entropy

VI. CONCLUSION
The article presents fundamental opportunities of parallelising the operations of the quantum register processing. The register may be expressed as a qudit vector state or a density matrix. The crucial element of the presented procedures is the indication of local operations that can be performed independently of one another. The approach to a very efficient implementation of the von Neumann measurement operation in the form of a parallel reduction is the contribution of this work, and it allows using many available computational units.
The problem of an entanglement's presence in the multiqubit (multi-qudit) register is also discussed and enriched by providing the necessary definitions of the entropy value and the negativity measure for a register divided into two parts. The basic technique of parallel computation for this task was also indicated and the measures for an exemplary register, which state is a convex combination of two entangled states, are shown.
Although simulations of quantum computing are burdened with the exponential size of the problems, they are subject to the possibility of parallelisation, both at the level of quantum circuit simulations or other problems, such as the presented study of multipartite entanglement (see Algorithm 7), which allows obtaining a significant reduction in the processing time of the analysed problem or simulation.
It was also shown how the problem of calculating the entanglement deficit can be effectively calculated for selected quantum states. In particular, it is shown how to use the Gramian based techniques to perform the necessary calculations. The given formulas no longer require exponential computing resources to calculate the values of total entropy or entropy at the given cuts. VOLUME 10, 2022

APPENDIX LOCAL OPERATIONS ON QUANTUM COMPOSITE SYSTEMS
A compound quantum system of N independent subsystems θ i (i = 1:N ) is analysed in this appendix. Let H i be a adequate Hilbert space for i-th qudit (under the assumption that each space has a finite number of dimensions n i ). State space describing states of i-th qudit H i is marked as E(H i ). Following quantum physics rules, a global Hilbert space of the composite system of all qudits is defined by H = H 1 ⊗ H 2 ⊗ . . . ⊗ H N , and the dimensions of this space are D = n 1 · n 2 · . . . · n N . Let us set I N = {1, 2, . . . , N }, then a space of all partitions of I N is denoted by Par(I N ). Elements π of Par(I N ) fulfil π = (π 1 , . . . , π k ), where π i ⊂ I N , π i ∩ π i = ∅ for i = i and i π i = I N . A partition π = (π 1 , . . . , π k ) is called finer than π = (π 1 , . . . , π l )) only if for i ∈ I k there exists j ∈ I l such that π i ⊆ π j . This allows defining the partial order α in the space Par(I N ). Now, it is simple to point out the maximal element π max = {I N } and the minimal one π min = {{1}, . . . , {n}}. For an arbitrary π = (π 1 , . . . , π k ) ∈ Par(I N ), the space H π = (H π 1 , . . . , H π k ) may be assigned where H π k = i∈π k H i . Let QP be a physical quantum operation on E(H). The QP may be termed as an elementary π-local operation only if there exists a sequence of operations (QP π 1 , . . . , QP π k ) each transforming E(H π i ). Additionally, QP = i=1:kQ P π i whereQP π i are equivalents of QP π i in the space QP(H). Usually, a global state ρ ∈ E(H) and a π-local operations are given as arithmetic bases and this raises the problem of calculating the action QP as the global operation and solving this problem is the goal of this appendix.
Another problem is that if operation QP is separable, the result of it action on entangled register may affect heavily the order of local operations performed. This problem will be not discussed here.
Let M n×m (C) stand for the space of C-valued matrices sized n × m. In this space, a family of matrices A = A 1 ⊗ . . . ⊗ A N ∈ M n 1 ·...n N ,m 1 ·...·m N (C) may be defined where A i ∈ M n i ×m i (C), i = 1:N . Let A(i, j) be an element of the matrix A. Of course, the value of A(i, j) may be computed in terms of the local indices (i α , j β ) ∈ I n 1 × I m 1 .
Proof: The easy proof follows from the associativity of the Kronecker product and from the ancient theorem about dividing integers with remainder. The details can be found in [48].
The above proof also determines the way how tuples α and β have to be calculated. Some examples presenting the usage of Proposition 4 are shown below.
If (E i,j ) stands for the system of canonical unit-matrix, i.e. (E i,j )(k, l) = δ ki · δ lj , then the system (E α 1 ,β 1 ⊗ . . . ⊗ E α N ,β N ) forms a product computational basis in the space Any linear mapping Op : M n×m (C) → M n×m (C), termed sometimes as a superoperator, can be expressed by the (super-)matrix representation in the canonical basis (E ij ): and therefore for A ∈ M n×m (C): For any k-element tuple σ = (i 1 , . . . , i k ) ⊂ I N exists the adequate two-element partition π σ = (σ, σ c ) where σ c = I N \ {σ }. Each partition π = (π 1 , . . . , π k ) ∈ Par(I N ) can be defined by the concept of π-locality in the product ⊗ N i=1 M n i ,m i (C). A superoperator Op acting in the space ⊗ N i=1 M n i ,m i (C) is called a π-local operation only if there is a sequence of superoperators Op π i , each affecting the space ⊗ α∈π i M n α ,m α (C), such that whereÔ p π i are natural counterparts of Op π i in the space Let a π-local operation Op π act on ρ ∈ E(H). The problem is, how to compute the result of this operation on ρ, if only the local information on Op π is available. For this purpose, the following steps have to be performed: (1) If ρ is represented in the canonical basis (E ij ) ij of the global space H, then use Proposition (4) and pass to the corresponding representation in the product basis (E α 1 β 1 ⊗ . . . ⊗ E α n β n ): ρ = α=(α 1 ,...,α N ), β=(β 1 ,...,β N )ρ (α|β)E αβ , where E αβ = ⊗ N i=1 E α i β i . (2) Let Op σ be a superoperator acting in the space ⊗ i∈σ H i , and let Op σ also stand for its matrix representation in the product basis ⊗ i∈σ E k i ,l i . Then: where the multi-indices α and β are given according to Proposition (4), and (α σ ∨ α σ c ) = α, (β σ ∨ β σ c ) = β are their decompositions. (3) For a general π = (π 1 , . . . , π k )-local operation Op π = Op π 1 ⊗ . . . ⊗Ôp π k apply step (2) k-times, independently of the order, and this property allows performing step (3) in parallel. Example 8: Let us consider three-qudit case with the following partial transposition coresponding to the decomposition of π = ({1, 3}, {2}). Let the action ofT 2 = I (1) ⊗ T ⊗ I (3) on a given tripartite state ρ ∈ M n 1 ·n 2 ·n 3 ,m 1 ·m 2 ·m 3 (C) be analysed here. The locality of the operationT 2 and the use of previous computations (see Proposition 4 above) allows obtaining: where i = (α 1 − 1)n 2 · n 3 + (β 2 − 1)n 3 + β 3 , j = (β 1 − 1)m 2 · m 3 + (α 2 − 1)m 3 + α 3 . (124) In the last couple of years, he is conducting research on the mathematical foundations of quantum information processing together with its applications to the several technological applications. Currently, he is associated with Zielona Góra University on the status of a Professor Emeritus.
JOANNA WIŚNIEWSKA received the M.Sc. degree in computer science from the Military University of Technology, Warsaw, Poland, in 2003, and the Ph.D. degree in a field of quantum algorithms, in 2012. After the graduation, she has started the work at her alma mater with the Faculty of Cybernetics, Institute of Information Systems, as a Research Assistant. As an Assistant Professor, she continues her research concerning the problems of pattern recognition, quantum algorithms, quantum communication, and computer simulation.
MAREK SAWERWAIN received the M.E. and Ph.D. degrees from the Faculty of Electrical Engineering, Computer Science and Telecommunications, University of Zielona Góra, Zielona Góra, Poland, in 2004 and 2010, respectively. He is currently associated with the Faculty of Computer, Electrical and Control Engineering, University of Zielona Góra, as an Assistant Professor. His current research interests include quantum communication methods, quantum computation methods, and theory of quantum programming languages. He also realizes, his research in the field of quantum computations models simulations and creating of effective algorithms for solutions based on modern multicore CPU and GPU technology. He is currently the Chair of the Committee on Automatic Control and Robotics of the PAS and a member of the IFAC SAFEPROCESS TC. For more information visit the link (http://www.uz.zgora.pl/ jkorbicz/). VOLUME 10, 2022