An Improved Consensus Clustering Algorithm Based on Cell-Like P Systems With Multi-Catalysts

Consensus clustering algorithms, which integrate several clustering results obtained by common algorithms, can find a better result that is independent of parameter settings. However, this kind of algorithm is often designed based on simple algorithms, such as K-means, and is therefore limited by their time complexity. In this work, a P system, a novel branch of bio-inspired computing with inherent parallel and distributed computation, is combined with the consensus clustering algorithm. As a result, an improved consensus clustering algorithm is constructed using the hierarchical membrane structure and parallel evolution mechanism of a cell-like P system with multi-catalysts, where the catalysts are utilized to regulate the parallelism of object evolution. The integration strategy of the algorithm is based on a revised PAM in which only the q-nearest neighbors of the original medoids are considered as candidates for the new medoids. The experimental results indicate that the clustering quality of the proposed algorithm is more robust than that of well-known consensus clustering algorithms on data sets with noises and outliers. This work gives evidence that the effectiveness and efficiency of consensus clustering algorithms can be improved using P systems.


I. Introduction
Information plays an important role in every field of modern society. Analyzing data and extracting useful information from huge amounts of data are always important topics in both research and practice. Clustering analysis, which reveals relationships between data, is an important class of data analysis methods. Common clustering algorithms have three limitations: 1) The clustering results largely depend on the parameter settings and their initial values. 2) Most clustering algorithms cannot determine the actual number of clusters. 3) Different clustering algorithms may generate different clustering results.
The associate editor coordinating the review of this manuscript and approving it for publication was Weipeng Jing.
Consensus clustering gives a solution to these limitations [1]. It aggregates several basic partitionings (BPs) obtained by common algorithms and obtains a final result that is better than each of these BPs. There are two types of consensus strategies, the object co-occurrence strategy and the median partition strategy [2]. The object co-occurrence strategy computes the number of occurrences of a point in a certain cluster, or the number of occurrences of two points belonging to the same cluster, and then determines the final cluster label of each point [3]. The relabelling and voting method [4] and the graph partitioning based method [5], [6] follow this strategy. The median partition strategy turns consensus clustering into an optimization problem aiming to find the median partition, that is, the partition which maximizes the similarity to the BPs. The kernel-based method [7] and the genetic-based method [8] follow this strategy. Consensus clustering is very popular, and researchers have proposed many improvements since it was proposed. For high dimensional datasets, Li et al. [9] designed a random projection framework using a cumulative agreement scheme. For large-scale datasets, Yu et al. [10] proposed a three-way algorithm based on Spark inspired by the theory of three-way decisions. For multi-view datasets, Zhao et al. [11] projected incomplete multi-view data to a complete and unified representation in a common subspace, Liu and Yun [12] presented an interactive consensus clustering approach in which the BPs from each view interact with the final result, and Tao et al. [13] proposed a marginalized algorithm considering higher-level information such as BPs generated by a single view. For large-scale group decision-making processes, Wu and Xu [14] designed a model where the clusters could be changed and the decision-makers used fuzzy preference relations to express their preferences.
For datasets with dubious data points in which many features are uninformative for clustering, Wang and Tsuchiya [15] performed sub-clustering on data points using subsets of their features to handle these dubious data. Considering the relationship among BPs, Rathore et al. [16] proposed a maximum relative density path accumulation algorithm. Considering the local diversity of clusters inside the same BP, Huang, Wang and Lai [17] proposed a locally weighted clustering approach using uncertainty estimation and a local weighting strategy. Considering cluster-level selection in the selection of BPs, Nazari et al. [18] designed a framework based on cluster-level weighting. However, a significant drawback of consensus clustering is its slow running time, and simple strategies are used in various algorithms to balance the running time against the quality of the clustering results. How to improve algorithm quality within an acceptable time complexity is an important research topic in consensus clustering. This work addresses this problem using a bio-inspired computing method, i.e., a P system.
A P system [19] is a novel bio-inspired computational model that is inherently parallel and distributed [20]. There are three types of P systems, i.e., cell-like [19], tissue-like [21] and neural-like [22], which were inspired by a single cell, a set of multiple connected cells, and a network of nerve cells, respectively. Up to now, many variants of P systems have been proposed which abstract various biological mechanisms, such as cell-like P systems with channel states and symport/antiport rules [23], cell-like P systems with polarizations and minimal rules [24], P systems with rule production and removal [25], tissue-like P systems with evolutional symport/antiport rules [26], spiking neural P systems with white hole neurons [27] and so on [28]-[31]. These types of P systems have been shown to be computationally complete [32]-[36]. The current research on using P systems to solve practical real-world problems takes two approaches. One approach, called the indirect membrane algorithm approach [37], [38], is to design algorithms based on certain features of a P system, such as separating the whole computational area into several regions (membranes), and to execute these algorithms on a modern electronic computer. While this approach leads to novel algorithms for solving many complex real-world problems, the efficiency of these algorithms is still bounded by the physical limitations of electronic computers. The other approach, called the direct membrane algorithm approach [39]-[41], is to design P systems that are capable of performing algorithms for solving real-world problems, where the computation of these P systems can be executed on hardware, software or living cells as a natural evolution process.
While execution on living cells has promising potential to significantly improve computational efficiency, it also faces many challenges, such as defining a P system that solves a specific real-world problem and making living cells perform the computation defined by a P system. While the construction of a bio-computer relies on the participation of biology researchers [42]-[44], the design of P systems appropriate for solving specific problems falls to computer science research and is the focus of this study.
This work follows the direct membrane algorithm approach and chooses the cell-like P system with multi-catalysts (MCC-P), a specific cell-like P system, to develop and improve a consensus clustering algorithm based on PAM (Partitioning Around Medoids) [45] with the q-nearest neighbors. For easy reference, this improved consensus clustering algorithm is called the P-PCC.
A MCC-P system consists of a membrane structure, a multiset of objects and a set of evolution rules (rules for short) for each membrane. Once entering an input membrane, the objects will start evolving according to applicable rules under the regulation of specific multi-catalysts. This characteristic of the MCC-P system makes it attractive for improving consensus clustering. The evolution of all objects under an applicable rule will occur simultaneously in a nondeterministic maximally parallel manner. When implemented on a living cell, this distinct feature will likely result in much improved computational efficiency.
This work makes the following contributions. 1) A revised PAM is proposed in which only the q-nearest neighbors of the original medoids are considered as candidates for replacing the original medoids, which improves the computational efficiency. This medoid replacement approach is used as the BP integration strategy of the consensus clustering algorithm, which is robust to noises and outliers. 2) A specific MCC-P system is used to perform the computation required by the consensus clustering algorithm in a maximally parallel fashion. Specifically, by using this system, the medoids of all clusters are determined in parallel and the information about each cluster is stored in one membrane regulated by multi-catalysts. A specific output membrane is used to output the clustering result. To the best of our knowledge, this is the first direct membrane algorithm for consensus clustering. 3) Simulation is performed using well-known data sets from the UCI machine learning repository [46] to verify the clustering quality of the P-PCC and to analyze the major factors affecting its performance. If realized in a living cell, the P-PCC will be much more efficient than when implemented on an electronic computer.
The remainder of this article is organized as follows. Section II briefly reviews the basic concepts of consensus clustering and cell-like P systems with multi-catalysts. In Section III, the P-PCC algorithm is presented. In Section IV, the simulation results are reported and analyzed. Conclusions are drawn and future works are outlined in Section V.

II. PRELIMINARIES
In this section, some key concepts about consensus clustering are reviewed and the concepts of a MCC-P system are introduced. For more details, readers can refer to [1], [47], [48].

A. CONSENSUS CLUSTERING
Consensus clustering is a computing framework that can aggregate the results obtained by several algorithms and produce a better final result. As shown in Fig. 1, let X = {x_1, x_2, . . . , x_n} denote a dataset with n data points. Arbitrary clustering algorithms are applied p times to cluster X into K_1, K_2, . . . , K_p clusters, respectively. Generally, different algorithms, or the same algorithm with different parameters, are chosen to capture different information about the dataset. The corresponding clustering results π_1, π_2, . . . , π_p are called BPs, where π_i = <L_{π_i}(x_1), . . . , L_{π_i}(x_n)> with L_{π_i}(x_j) ∈ {1, 2, . . . , K_i}, for 1 ≤ i ≤ p and 1 ≤ j ≤ n, is a label vector; the element L_{π_i}(x_j) indicates that the data point x_j belongs to cluster L_{π_i}(x_j) in clustering run i. The consensus function aggregates the BPs and obtains the final result π, whose clustering quality is typically better than that of the best BP. The clustering quality can be evaluated using the normalized Rand index and the normalized mutual information, among others. Note that consensus clustering may take two forms of input. In the first form, the BPs are the input and the original dataset cannot be accessed. In the second form, the original dataset is the input.
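The object co-occurrence strategy can be illustrated with a small sketch: given the label vectors π_1, . . . , π_p, a co-association matrix records how often each pair of points shares a cluster. This is a common concrete realization of the strategy, not necessarily the exact computation used in the cited methods.

```python
# Sketch: the object co-occurrence strategy via a co-association matrix.
# Each BP pi_i is a label vector of length n; S[i][j] is the fraction of
# BPs in which points x_i and x_j fall into the same cluster.

def co_association(bps):
    p = len(bps)
    n = len(bps[0])
    S = [[0.0] * n for _ in range(n)]
    for labels in bps:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    S[i][j] += 1.0 / p
    return S

bps = [
    [1, 1, 2, 2],   # pi_1: points 0,1 together and 2,3 together
    [1, 1, 1, 2],   # pi_2
    [2, 2, 1, 1],   # pi_3: same grouping as pi_1, labels renamed
]
S = co_association(bps)
```

Note that the matrix is invariant to how each BP names its clusters, which is exactly why co-occurrence avoids the relabelling problem.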
The P-PCC takes the BPs as the input, and uses the object co-occurrence strategy. The details of the method are as follows [48].
The revised PAM is used as the consensus function. The final result can be obtained by clustering this new binary dataset X^(2).
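The construction of X^(2) is not spelled out in this excerpt; a common choice in K-means/PAM-based consensus clustering, assumed here, is to one-hot encode each BP's label vector and concatenate the encodings, so that each data point becomes a binary row describing its labels across all BPs:

```python
# Sketch (assumption): build the binary dataset X^(2) by one-hot encoding
# each BP's label vector and concatenating the encodings column-wise,
# so row i describes how x_i was labelled across all p BPs.

def binary_dataset(bps):
    n = len(bps[0])
    rows = []
    for i in range(n):
        row = []
        for labels in bps:
            ks = sorted(set(labels))        # cluster ids used in this BP
            row.extend(1 if labels[i] == k else 0 for k in ks)
        rows.append(row)
    return rows

X2 = binary_dataset([[1, 1, 2], [2, 1, 2]])
```

Clustering these binary rows with the revised PAM then yields the consensus partition.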

B. MCC-P SYSTEMS
A cell-like P system is a distributed and parallel computational model that mimics the biological behavior of a single living cell. A living cell consists of one or more membranes, such as the plasma membrane and the membranes surrounding the internal organelles, which separate the inner space of the cell into relatively independent regions. These membranes may form a nested structure, in which some membranes contain one or more other membranes. Each membrane may contain some chemical elements, and the type, the number, and the location of these elements are controlled by biological processes, such as chemical reactions and transmembrane transport. These biological processes may be influenced by catalysts. Chemical elements can enter the cell through a specific membrane and evolve under biological regulation by changing their types, numbers, and locations (e.g., moving among membranes). Evolution can occur simultaneously for different chemical elements.
To model the biological behavior of a living cell, a cell-like P system uses a hierarchical structure of regions to represent the nested membrane structure, a multiset of objects to represent the chemical elements, and a set of rules to mimic the regulation of the biological processes. In this work, a novel variant of the cell-like P system, called the MCC-P system and represented by Π, of degree m, is defined as follows, where: -O is the alphabet, i.e., a set of symbols that denotes the various types of objects in Π.
-R_1, . . . , R_m are finite sets of rules of the form u_Q → v, where u is a multiset of objects over O, v is a multiset of objects of the form (a, tar) with a ∈ O and tar ∈ {here, out, in}, and Q is a list of catalysts, i.e., promoters and inhibitors represented by special symbols in O.
A rule can associate with multiple catalysts, different catalysts are separated by commas. A rule is enabled when any catalyst is satisfied. A catalyst can contain either or both promoters and inhibitors. A catalyst is satisfied only when a promoter but no inhibitor appears.
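One possible reading of this enabling condition can be sketched as a small predicate, where a catalyst is represented as a pair of promoter and inhibitor sets (this encoding is an assumption made for illustration; the ¬ notation of the paper corresponds to membership in the inhibitor set):

```python
# Sketch: a rule with multi-catalysts is enabled when ANY of its catalysts
# is satisfied; a catalyst is satisfied when its promoters are present in
# the membrane and none of its inhibitors is present.

def catalyst_satisfied(promoters, inhibitors, membrane):
    return all(p in membrane for p in promoters) and \
           not any(i in membrane for i in inhibitors)

def rule_enabled(catalysts, membrane):
    # catalysts: list of (promoters, inhibitors) pairs; an empty list
    # models a rule whose Q is omitted (no catalyst needed).
    if not catalysts:
        return True
    return any(catalyst_satisfied(p, i, membrane) for p, i in catalysts)

m = {"a", "c"}
```
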
To distinguish promoters from inhibitors, a symbol ¬ is placed before an inhibitor. If a rule does not need a catalyst, Q can be omitted. -ρ is a partial ordering relationship over the rules, where r_1 > r_2 means that the priority of r_1 is higher than that of r_2.

The computation starts in the initial state when objects are placed into the input membrane. The system then evolves into new states in steps. At each step, one or more enabled rules for the membranes are activated and fired in a non-deterministic maximally parallel fashion. Specifically, a rule u_Q → v in a membrane is enabled if 1) at least one catalyst listed in Q is satisfied in the membrane, 2) every object specified in u is present in the membrane, and 3) no rule with a higher priority is enabled in the membrane. In the rule, u is called the activation set of the rule. When an enabled rule is activated, it will be fired, which consumes the objects in u and creates and transfers objects according to v. If v contains an (a, here), then object a will be created and kept in the current membrane (note that here can be omitted). If v contains an (a, out), then object a will be created and moved from the current membrane into the immediate parent membrane (the one immediately outside the current membrane). If v contains an (a, in), then object a will be created and moved into one of the immediately nested membranes, selected at random. As a special case, a specific inner membrane, say membrane j, can be specified by (a, in_j). At any time in any membrane, an object may be listed in the activation sets of different rules, but it can only stay in one of those sets, and it is undetermined which set it should be in. Furthermore, there are many ways to group objects to form an activation set for a rule, and it is undetermined how the objects are grouped.
No matter how these objects are grouped, during each step of the computation, as many enabled rules as possible are activated and fired within their respective membranes, meaning that no further rule can be enabled, and the firing of all the rules in all membranes takes place at the same time, without any particular order. This style of computation is the so-called non-deterministic maximally parallel computation and is a characteristic of a P system. After a step of the computation, the system is transferred into a new state, and in the next step, another set of rules may be activated and fired. The computation halts when no more rules are enabled.
At that point, the specified set of objects remaining in the output membrane represents the computation results. It is emphasized that while it is possible to implement the computation of a cell-like P system in a conventional computer (with or without parallel programming), the real power of the non-deterministic maximally parallel computation can only be fully unleashed in a novel biocomputing environment.
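The maximally parallel step described above can be sketched in ordinary sequential code. This is only a minimal single-membrane sketch under simplifying assumptions: rules have no catalysts or priorities, all targets are `here`, and the non-deterministic choice among groupings is replaced by a fixed iteration order.

```python
from collections import Counter

# Sketch: one maximally parallel step in a single membrane. Each rule is a
# pair (u, v): consume multiset u, produce multiset v. Rules are applied as
# many times as possible within the step; production is deferred so that
# objects created in this step cannot enable rules until the next step.

def max_parallel_step(membrane, rules):
    produced = Counter()
    fired = True
    while fired:
        fired = False
        for u, v in rules:
            if all(membrane[o] >= c for o, c in u.items()):
                membrane.subtract(u)      # consume the activation set u
                produced.update(v)        # defer creation of v
                fired = True
    membrane.update(produced)
    return +membrane                      # drop zero counts

m = Counter({"a": 3, "b": 1})
rules = [(Counter({"a": 2}), Counter({"c": 1}))]   # rule: a a -> c
out = max_parallel_step(m, rules)
```

With three copies of a, the rule fires once (two a consumed), leaving one a that cannot form another activation set, so the step halts with a, b and c in the membrane.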

III. A PAM-BASED CONSENSUS DIRECT MEMBRANE CLUSTERING ALGORITHM
In this section, the improved consensus clustering algorithm is developed based on cell-like P systems with multi-catalysts in the form of a direct membrane algorithm (P-PCC). The PAM based on the q-nearest neighbors is first constructed as the BP aggregation strategy of the P-PCC. The whole P-PCC algorithm, in the form of a MCC-P system, is then specified. The computational process of the P-PCC is also given to show its usability and effectiveness. The flowchart of the proposed P-PCC algorithm is shown in Fig. 2.

A. PAM BASED ON THE q-NEAREST NEIGHBORS
PAM is a classical K-medoids clustering algorithm, and the K-medoids clustering algorithm is an improvement of the K-means algorithm. The K-medoids clustering algorithm is more robust to noises and outliers than the K-means algorithm [45].
For a dataset X = {x_1, x_2, . . . , x_i, . . . , x_n} with n data points such that x_i ∈ R^m, PAM clusters the data points into K clusters C = {C_1, C_2, . . . , C_K} with the following properties: C_k ≠ ∅ for 1 ≤ k ≤ K, C_{k_1} ∩ C_{k_2} = ∅ for k_1 ≠ k_2, and C_1 ∪ · · · ∪ C_K = X. The absolute-deviation criterion

E = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} dist(x_i, o_k),  (1)

is used to measure the clustering quality, where dist(x_i, o_k) = |x_i − o_k| is the absolute deviation between x_i and the medoid o_k of cluster C_k, representing their dissimilarity, and E is the sum of the absolute deviations. The aim of the PAM algorithm is to partition the dataset X into K clusters with the minimum value of E. The steps of PAM are as follows: 1) Randomly select K data points as the initial medoids. 2) Assign each remaining data point to the cluster with the nearest medoid. 3) Randomly select a non-medoid data point o'_k as a candidate medoid. 4) Compute E with o'_k in place of o_k. 5) Replace o_k with o'_k if E decreases. 6) Repeat Steps 2 to 5 until none of the medoids changes.

This work extends PAM in two ways. First, flexibility is introduced to specify different distance metrics dynamically. The original PAM does not admit alternative distance measures when computing E using (1). However, using different distance measures according to the characteristics of specific applications and datasets may improve the clustering quality. Furthermore, a robust measure should work well when no external information is provided. In the experiments, four distance measures are used, including the Euclidean d_euc, the KL-divergence d_kl, the cosine d_cos and the L_p distance d_lp (see Table 1), as well as their corresponding normalized distances d_n_euc, d_n_kl, d_n_cos and d_n_lp.

Second, only the q-nearest neighbors of o_k are considered when selecting a new candidate medoid o'_k; the same distance measure used to compute E measures the closeness between a data point and o_k for this purpose. The conventional PAM randomly selects a non-medoid data point as a candidate medoid, and the absolute-deviation criterion E of the whole dataset is then computed; for large datasets, the resulting computational time limits the applicability of the PAM algorithm. Furthermore, when o_k and o'_k are compared, only the data points within the current cluster C_k are considered, so only the deviations within C_k need to be computed. Since the data points and the medoid in each cluster change during the computation, every data point has the opportunity to become a candidate medoid. This modification to the algorithm does not affect the clustering quality of PAM. These two changes improve the computational efficiency of the PAM algorithm.
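The four distance measures referred to in Table 1 can be sketched as follows. Since the table itself is not reproduced here, these are standard textbook definitions and thus assumptions; in particular, the KL-divergence is assumed to apply to nonnegative vectors that sum to 1, with a small epsilon guarding zero entries.

```python
import math

# Sketch of the four distance measures (standard definitions assumed):
# Euclidean, KL-divergence, cosine and L_p distance between two vectors.

def d_euc(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_kl(x, y, eps=1e-12):
    # assumes x and y are nonnegative and each sums to 1
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(x, y))

def d_cos(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def d_lp(x, y, p=3):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```
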
The steps of the improved PAM algorithm based on the q-nearest neighbors are as follows: 1) Generate the q-nearest neighbors of each data point in the data set. 2) Randomly select K data points as the initial medoids. 3) Assign each non-medoid data point x_i to the cluster k such that dist(x_i, o_k) is minimal over all medoids. 4) Select a non-medoid data point o'_k in C_k from the q-nearest neighbors of the original medoid o_k as a candidate medoid. 5) Compute Σ_{x_i ∈ C_k} dist(x_i, o'_k) and Σ_{x_i ∈ C_k} dist(x_i, o_k). 6) Replace o_k with o'_k if the former sum is smaller. 7) Repeat Steps 4 to 6 for the remaining candidates. 8) Repeat Steps 3 to 7 until none of the medoids can be replaced.
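The q-nearest-neighbor PAM described above can be sketched in Python. This is a minimal sequential sketch under stated assumptions: Euclidean distance, a fixed random seed, and a bounded number of outer iterations; the paper's P system performs the equivalent work in parallel.

```python
import math, random

# Sketch of the q-nearest-neighbour PAM: candidate medoids are drawn only
# from the q nearest neighbours of the current medoid, and the deviation
# comparison is restricted to the medoid's own cluster.

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def qnn_pam(X, K, q, iters=50, seed=0):
    rng = random.Random(seed)
    n = len(X)
    knn = [sorted(range(n), key=lambda j: dist(X[i], X[j]))[1:q + 1]
           for i in range(n)]                       # step 1: q-NN lists
    medoids = rng.sample(range(n), K)               # step 2: initial medoids
    for _ in range(iters):
        clusters = [[] for _ in range(K)]           # step 3: assignment
        for i in range(n):
            k = min(range(K), key=lambda k: dist(X[i], X[medoids[k]]))
            clusters[k].append(i)
        changed = False
        for k in range(K):                          # steps 4-7: local swaps
            for cand in knn[medoids[k]]:
                if cand not in clusters[k]:
                    continue
                old = sum(dist(X[i], X[medoids[k]]) for i in clusters[k])
                new = sum(dist(X[i], X[cand]) for i in clusters[k])
                if new < old:
                    medoids[k] = cand
                    changed = True
        if not changed:                             # step 8: stop criterion
            break
    return medoids, clusters

X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = qnn_pam(X, K=2, q=3)
```

Restricting candidates to the q-NN lists replaces the O(n) candidate scan of conventional PAM by an O(q) scan per medoid.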

B. P-PCC IN THE FORM OF A MCC-P SYSTEM
This section describes a specific MCC-P system that realizes the P-PCC and discusses the rules in each membrane for performing the computation. In a cell-like P system, there is a trade-off between the number of membranes and the number of evolution rules: reducing one typically increases the other. In the design of the MCC-P system, a trade-off is made between the number of membranes and the number of rules, in a way similar to the space-time trade-off in conventional algorithms. Hence, the MCC-P system has one membrane for each cluster, plus three additional membranes: one for data preprocessing (obtaining the q-nearest neighbors of each data point), one for storing intermediate results during the computation, and one for storing the final result. In this system, the count, i.e., the number of occurrences, of an object, i.e., a symbol in the alphabet, is used to represent a numeric value of a type of information. The computation of the system uses rules to change the type and number of objects and to move objects between membranes, so as to complete tasks such as finding and remembering the nearest neighbor or the nearest medoid. Hence, each membrane in the structure is used to hold objects representing types and numeric values of information.
Given any dataset X = {x_1, x_2, . . . , x_n} of size n and an integer K > 1 as the number of clusters, a MCC-P system is defined as follows. The details of each component in this definition are provided in the following.
The alphabet O contains objects (i.e., symbols) that denote various types of information about the data and the status of the computation. Specifically, O = {a_i, Û_ij, U_ij, D_ij, c, c_ij, c'_ij, A_ki, A'_ki, B_ki, o_i, d_i, β_k, ξ, ψ, α, θ, a_ik, s, s', σ, η, d, δ, δ', e_i, τ}, for 1 ≤ i, j ≤ n and 1 ≤ k ≤ K, where a_i denotes data point x_i; Û_ij, U_ij and D_ij denote one unit of distance between data points x_i and x_j; c denotes a neighbor of a medoid; c_ij indicates that x_j is a neighbor of x_i; c'_ij indicates that the operation on c_ij has been completed; A_ki, A'_ki and B_ki indicate that x_i is the medoid of cluster k; o_i denotes the candidate medoid; d_i denotes a neighbor of data point x_i; β_k is an auxiliary to choose the medoid of cluster k; ξ indicates that a data point has been assigned to a certain cluster; ψ indicates that the medoid of a cluster does not change; α indicates that all the K clusters have been formed; θ indicates that not all the K clusters have been formed; a_ik indicates that data point x_i belongs to cluster k; s denotes one unit of distance between the data points in a certain cluster and the medoid; s' denotes one unit of distance between the data points in a certain cluster and the candidate medoid; σ indicates that the current candidate medoid cannot reduce the absolute deviation; η indicates that the current candidate medoid can reduce the absolute deviation; d indicates that an iteration in a certain cluster has ended; δ, δ', e_i and τ are all auxiliaries that control the computational process of the system; and λ denotes the empty multiset.
For any object s ∈ O and any integer f > 0, s f denotes f copies of s.
The membrane structure µ, with skin membrane 0, is shown graphically in Fig. 3.
The set of rules consists of three subsets: R_0 is the set of rules for membrane 0; R_{K+2} is the set of rules for membrane K + 2; and R_k, for 1 ≤ k ≤ K, is the set of rules for membrane k. The rules in R_k, for 1 ≤ k ≤ K, are similar across these membranes. The partial ordering of the rules is given by ρ, where r_{i,j_1} > r_{i,j_2} indicates that rule r_{i,j_1} has priority over rule r_{i,j_2}. Note that in the MCC-P system, the range for the parameters i, i_1, i_2, . . . , i_K, j, j_1, j_2, . . . , j_{n−1}, t is 1, . . . , n and the range for k, k_1, k_2 is 1, . . . , K, unless explicitly indicated otherwise.
The rules for membrane 0 constitute the set R_0. The computation of this system starts by placing objects a_i, Û_ij^{f_ij}, c^q and d_i^q, for 1 ≤ i, j ≤ n, into membrane 0.

1) FINDING q-NEAREST NEIGHBORS OF EACH DATA POINT
According to the partial ordering ρ, rule r_{0,1} fires first, which transforms each Û_ij into one D_ij and one U_ij. Since there are f_ij copies of Û_ij, this serves to gather the distance information for each data point. Rules r_{0,2}, . . . , r_{0,n} are used to generate the q-nearest neighbors of each data point.
Rule r_{0,3}: (S^{n−1})_c → λ is enabled only when the promoter, a copy of c, is present in membrane 0. For any data point x_i, the set S^{n−1} = {D_{i1} D_{i2} . . . D_{in}} contains one unit of distance between x_i and each of the other n − 1 data points. Firing r_{0,3} once will consume one unit of each of these distances. Because of the non-deterministic maximally parallel computation fashion, if a rule is activated by multiple copies of the activation set u, the rule fires multiple times, once for each copy of the activation set u, at the same time. After r_{0,3} fires multiple times, all complete sets S^{n−1} are consumed, and at least one type of object, say D_ij, will no longer be present in membrane 0, indicating that x_j is the nearest neighbor of x_i. Hence, rule r_{0,3} identifies the nearest neighbor of every data point.
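The unary consumption trick behind r_{0,3} can be mimicked in ordinary code. This is an illustrative sketch only: `units` stands for the multiset of D_ij objects for a single data point x_i, and simultaneous firing is modelled by decrementing every counter once per round.

```python
# Sketch: finding the nearest neighbour of x_i the way membrane 0 does it.
# Each distance D_ij is kept in unary (f_ij copies); each firing of the
# rule removes one copy of every D_ij, so the first type of object to run
# out marks the smallest distance.

def nearest_by_consumption(units):
    # units: {j: f_ij}, the unary counts of D_ij for one data point x_i
    counts = dict(units)
    while all(c > 0 for c in counts.values()):
        for j in counts:
            counts[j] -= 1          # one maximally parallel firing
    return [j for j, c in counts.items() if c == 0]

nearest = nearest_by_consumption({1: 5, 2: 3, 3: 7})
```

The result is a list because several objects can run out in the same firing, mirroring the non-deterministic tie case in the P system.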
As soon as all copies of D_ij are removed by the firing of r_{0,3} and x_j has not yet been marked as the nearest neighbor of x_i, rule r_{0,2} is enabled, and the firing of r_{0,2} will consume one copy of d_i and create a copy of c_ij, which means that the nearest neighbor of x_i (one of the q-nearest neighbors, as d_i denotes) has been found to be x_j (denoted by c_ij). Note that once c_ij is created, rule r_{0,2} will be disabled for the pair x_i and x_j, but can still be activated for pairs involving x_i and other data points.
The process defined by the remaining q − 1 rules, namely r_{0,4} to r_{0,q+2}, is similar to that of r_{0,3}. Each of these rules is designed to obtain the next nearest neighbor of each data point. These rules will be enabled and fired one after another in the given order, as controlled by the promoter c^t. After each of these rules is fired, rule r_{0,2} will fire to mark the next nearest neighbor.
After all d_i objects, for 1 ≤ i ≤ n, are consumed by r_{0,2} (indicating that the q-nearest neighbors of all data points have been obtained), rule r_{0,n+1} is enabled, and the firing of this rule will move copies of a_i, Û_ij and c_ij to membrane K + 2 for further processing.
The rules for membrane K + 2 constitute the set R_{K+2}.

2) RANDOMLY SELECTING K INITIAL MEDOIDS
Rule r K +2,1 is activated as soon as objects are moved into membrane K + 2 by rule r 0,n+1 . When it is fired, the distance information U ij and the q-nearest neighbor information c ij are placed into membranes 1 to K , with a copy in membrane K + 2.
Rule r K +2,2 is used to select at random a medoid for each cluster. Since there is exactly one object β k , for 1 ≤ k ≤ K , and exactly one a i for each data point, the activation of r K +2,2 for any pair of i and k is a random event. Once activated, this rule creates a unique copy of A ki which indicates that the medoid of cluster k is x i , i.e., o k = x i . Let A = {A ki } be the objects describing the current set of medoids. Note that r K +2,2 and r K +2,1 will be activated in the same computational step.

3) ASSIGNING NON-MEDOID DATA POINTS TO THE CLUSTERS
Rules r K +2,3 , . . . , r K +2,6 are used to assign data points to the clusters.
Rule r_{K+2,4} is enabled for data points i and j only if 1) x_i is a medoid of cluster k (indicated by promoter A_ki), 2) x_j is a non-medoid data point (indicated by promoter a_j), and 3) the rule has not been fired before in this loop (indicated by promoter δ^K). The firing of this rule once creates one copy of D_jk to denote a unit of distance between x_j and the medoid of cluster k (i.e., o_k), an object δ to denote that this rule has been fired, and an auxiliary object to be used as a promoter in rule r_{K+2,5}. Because of the maximally parallel nature of the P system, the firing of rule r_{K+2,4} finds the distance between each non-medoid data point and the K medoids in one step.
Similar to rule r_{0,t+2}, the firing of r_{K+2,6} will consume one unit of each distance D_j1 D_j2 . . . D_jK between the non-medoid x_j and the medoid of each cluster. As a result, the object denoting the shortest distance will be removed completely. If this object is D_jk, its absence in membrane K + 2 will enable rule r_{K+2,5}, which will move a_j into membrane k when fired, resulting in data point x_j being assigned to cluster k.
Once all data points are assigned to their nearest clusters, rule r K +2,7 fires to reset membrane K + 2 back to the initial state.
Rule r_{K+2,8} is used to detect whether any medoid has been replaced and to trigger the computational process in membranes 1 to K. If any medoid in the K clusters is replaced, object θ is sent to each of the membranes 1, 2, . . . , K; otherwise, if none of the medoids in the K clusters is replaced, which means that the K clusters have been formed, object α instead of θ is sent to each of the membranes 1, 2, . . . , K. At the same time, A_ki is sent to membrane k to trigger further processing.
Rule r K +2,9 is enabled when the whole clustering process ends. This rule is fired to move object a ik , which is sent by rule r k,1 , to the output membrane K + 1 to record the assignment of data point x i to cluster k.
The rules for each of the remaining membranes k, for 1 ≤ k ≤ K, constitute the set R_k; the last of them is r_{k,13}: d A_ki B_ki → (δ A_ki, out).

4) REDETERMINING THE BEST MEDOID IN EACH CLUSTER
The best medoid is determined among the current medoid and all of its q-nearest neighbors in each cluster, within the corresponding membrane. Rule r_{k,3} is activated after object A_ki is moved into membrane k by rule r_{K+2,8}; it transforms each A_ki into one A'_ki and one B_ki, where A'_ki stores the current medoid and B_ki stores the original medoid of this iteration.
-If membrane k contains no a_i, rule r_{k,2} is activated to send ψ, A_ki and δ out. -If membrane k contains some object a_i, rules r_{k,4} to r_{k,7} are applied multiple times to find the best medoid.

Selection of a non-medoid point o'_k in C_k from the q-nearest neighbors of the original medoid o_k as a candidate medoid
Rule r_{k,4} is enabled and activated with B_ki, generated by rule r_{k,3}, as a promoter. One neighbor of the original medoid is chosen at random as a candidate medoid, and object a_j is transformed into o_j to mark the candidate medoid. At the same time, c_ij is transformed into c'_ij, which indicates that a_j does not need to be chosen again. If a certain a_j is not in this membrane, c_ij is also transformed into c'_ij, indicating that a_j does not need to be chosen again, but for a different reason from the one just discussed.
Computation of Σ_{x_t ∈ C_k} dist(x_t, o'_k) and Σ_{x_t ∈ C_k} dist(x_t, o_k): Rule r_{k,5} is enabled for data points i, j and t only if: 1) x_i is the medoid of cluster k (indicated by promoter A'_ki), 2) x_j is the candidate medoid of cluster k (indicated by promoter o_j), 3) x_t is neither the medoid nor the candidate medoid in the cluster (indicated by a_t), and 4) the rule has not been fired before in this loop (indicated by inhibitor e_j). The firing of this rule once creates one copy of s' representing one unit of distance between x_t and x_j, one copy of s representing one unit of distance between x_t and x_i, and an object e_j indicating that this rule has been fired. Because of the maximally parallel nature of the P system, the firing of rule r_{k,5} calculates and stores the values of Σ_{x_t ∈ C_k} dist(x_t, o'_k) and Σ_{x_t ∈ C_k} dist(x_t, o_k) in the form of the numbers of s' and s, respectively, in one step. Rule r_{k,6} is then activated to remove the same number, determined by whichever is smaller, of copies of s and s', and rules r_{k,7} and r_{k,8} are used to compare the two medoids. If all copies of object s are consumed, which indicates that the candidate medoid cannot reduce the absolute deviation, rule r_{k,7} is enabled to transform o_j back into a_j and to generate an object σ indicating that the medoid is not changed. Otherwise, rule r_{k,8} is enabled to transform A'_ki into A'_kj, indicating that x_j is set as the new medoid, and to generate an object η indicating that the medoid has been changed.
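The comparison carried out by these rules can be sketched as unary cancellation (illustrative only; `s` and `s_prime` stand for the counts of the objects s and s', i.e., the in-cluster deviations of the current medoid and the candidate, respectively):

```python
# Sketch: comparing the candidate medoid against the current one the way
# rules r_{k,6}-r_{k,8} do it. s counts the in-cluster deviation of the
# current medoid in unary, s_prime that of the candidate; cancelling equal
# numbers of copies leaves a surplus only for the larger deviation.

def compare_medoids(s, s_prime):
    cancelled = min(s, s_prime)      # r_{k,6}: remove equal numbers of s, s'
    s -= cancelled
    s_prime -= cancelled
    if s > 0:
        return "replace"             # r_{k,8}: candidate strictly better
    return "keep"                    # r_{k,7}: candidate not better (or tie)

decision = compare_medoids(s=10, s_prime=7)
```

A tie leaves both counts at zero, so all copies of s are consumed and the medoid is kept, matching the rule priorities described above.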
Rules $r_{k,9}, \ldots, r_{k,13}$ are used to reset membrane $k$ to its initial state and trigger the computational process in membrane $K+2$.
Once the best medoid is found, rule $r_{k,9}$ fires and generates an object $\tau$ to enable $r_{k,10}$. Rule $r_{k,10}$ is used to transform $\bar{c}_{ij}$ back into $c_{ij}$ and to remove the auxiliary objects $e_j$. If the membrane does not contain object $\eta$, meaning that the medoid has not changed, the first part of $r_{k,11}$ is activated: it transforms $\sigma$ and $\tau$ into $d$, which enables rules $r_{k,12}$ and $r_{k,13}$, and into $\psi$, which is sent out to show that the medoid in this membrane is not changed. Otherwise, the rest of $r_{k,11}$ is activated to remove $\sigma$ and $\eta$ and to transform $\tau$ into $d$. Rule $r_{k,12}$ is enabled by $d$, as a promoter, to send $a_i$ out. Similarly, rule $r_{k,13}$ is enabled by $d$ to send $\delta$ out, indicating that the computation in this membrane during this iteration is over, and to send $A_{ki}$ out to mark the updated medoid.
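The medoid update realized by rules $r_{k,4}$–$r_{k,8}$ has a simple sequential counterpart. The sketch below (a hypothetical `update_medoid` helper, not the paper's implementation) replaces the parallel token-counting comparison of $s$ and $\bar{s}$ with ordinary arithmetic, and for clarity scans all $q$ nearest neighbors of the current medoid rather than trying one random candidate per iteration; a candidate is accepted only when it reduces the sum of within-cluster distances, exactly the criterion that rules $r_{k,7}$/$r_{k,8}$ decide:

```python
def update_medoid(points, cluster, medoid, q, dist):
    """Try to replace `medoid` with one of its q nearest neighbors
    (a sequential sketch of rules r_{k,4}-r_{k,8})."""
    cost = lambda c: sum(dist(points[t], points[c]) for t in cluster)
    # Only the q nearest neighbors of the current medoid are candidates.
    neighbors = sorted((i for i in cluster if i != medoid),
                       key=lambda i: dist(points[i], points[medoid]))[:q]
    best, best_cost = medoid, cost(medoid)
    for j in neighbors:
        cj = cost(j)
        if cj < best_cost:  # the candidate reduces the absolute deviation
            best, best_cost = j, cj
    return best
```

In the P system, the same comparison costs a constant number of steps, because all copies of $s$ and $\bar{s}$ are produced and cancelled in parallel; the sketch only makes the underlying acceptance criterion explicit.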

IV. EXPERIMENTS AND ANALYSES
In this section, the performance of the P-PCC is demonstrated using publicly available datasets. The major factors affecting the performance of the P-PCC are also examined.

A. EXPERIMENT SETUP
Fifteen datasets from the UC Irvine Machine Learning Repository [46] are used in the experiments. Table 2 shows the basic characteristics of these datasets. The normalized Rand index R_n is used to measure the clustering quality; a larger R_n value means a higher clustering quality. The BPs π_i (1 ≤ i ≤ p) are generated by the MATLAB kmeans function using the squared Euclidean distance. Two strategies are used to generate the BPs: random parameter selection (RPS), which randomizes the number of clusters K_i in a given interval, and random feature selection (RFS), which randomizes the selected features. The default settings are as follows. The parameter K of the P-PCC is set to the actual number of clusters K_a; the number of BPs p is set to 100; the weight w_i of each BP is set to the same value, 1. The default generation strategy is RPS, with the interval for the number of clusters set to [K_a, √n]. In RFS, the number of clusters is set to K_a, and two features are selected randomly from all the features. All experiments were simulated in MATLAB R2019b on 64-bit Windows 7, on a PC with an Intel Core i7-5500U 2.4 GHz CPU and 8 GB of RAM.
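The normalized Rand index R_n is not defined in this excerpt; assuming it denotes the standard adjusted Rand index, a minimal self-contained sketch (the function name is hypothetical) would be:

```python
from math import comb
from collections import Counter

def normalized_rand_index(labels_a, labels_b):
    """Adjusted (normalized) Rand index between two partitions of the
    same n points; 1.0 means identical partitions, 0 is chance level."""
    pairs = Counter(zip(labels_a, labels_b))      # contingency-table cells
    ca, cb = Counter(labels_a), Counter(labels_b)  # row and column sums
    n = len(labels_a)
    sum_ij = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in ca.values())
    sum_b = sum(comb(v, 2) for v in cb.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The index is invariant to label renaming, which is why it is suitable for comparing a clustering result against the ground-truth partition.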

B. CLUSTERING QUALITY
The clustering results, measured by R_n, obtained by the P-PCC using different distance measures are shown in Table 3. The P-PCC performs best on nine of the fifteen datasets when the normalized KL-divergence is used. The normalized KL-divergence is therefore the most robust distance measure and is set as the default distance measure when no external information is provided.
The clustering qualities of the P-PCC and another K-means-based consensus clustering algorithm (KCC) [48] are compared in Fig. 4, where the normalized KL-divergence is chosen as the distance measure. As can be seen, the clustering qualities of the P-PCC and the KCC are almost the same on twelve of the fifteen datasets, while the P-PCC obtains better results on the Wine, Seed. and Some. datasets.
In summary, the P-PCC performs at least as well as the KCC in terms of clustering quality, and is robust to datasets with noises and outliers.

1) NUMBER OF NEAREST NEIGHBORS q
Experiments are performed to investigate the effect of the number of nearest neighbors q on the quality and efficiency of the P-PCC. The results for four datasets are shown in Fig. 5. The range of q is set to 1, 2, . . . , n − 1, and the value of R_n and the running time are used to measure the results. The results show that, for all tested cases except Iris with q = 2, R_n is almost unaffected by the choice of q. On the other hand, the running time increases at a decreasing rate in all cases as q increases. Based on these experimental results, q = min{√n, 20}, which gives overall better results, is used in this work to report the experimental results.

2) NUMBER OF BPs
The number of BPs p is set to 10, 20, . . . , 90, and the experiments are repeated 100 times for each value of p. The clustering results are shown in Fig. 6 for four datasets. As shown in Fig. 6, the clustering quality improves as the value of p increases, and becomes stable when p reaches about 50.

3) QUALITY AND DIVERSITY OF BPs
The Ecoli, Iris, Wine and Frogsmfcc datasets are used for illustration. The frequency distributions of the BPs with different R_n values on these datasets are shown in Fig. 7. To reveal the relationship between the clustering quality of the P-PCC and the quality/diversity of the BPs, the BPs are first sorted by their R_n values in descending order. The BPs with larger R_n values are then gradually removed to observe the changes in the R_n values of the P-PCC. The results are shown in Fig. 8. As can be seen from Fig. 8, the clustering quality drops when the relatively high-quality BPs are removed, which is most evident on the Iris, Wine and Frogsmfcc datasets, indicating that the clustering quality of the P-PCC mainly depends on the high-quality BPs. The BPs with smaller R_n values are also gradually removed, and the changes in the R_n values of the P-PCC are shown in Fig. 9. As can be seen from Fig. 9, the clustering quality basically remains unchanged, indicating that the clustering quality of the P-PCC is affected more by the quality than by the diversity of the BPs.
For the Ecoli dataset, only 13 BPs have R n values above 0.4. However, the clustering quality starts dropping when 60 BPs with larger R n values are removed, indicating that the diversity can uphold the clustering quality when the qualities of all BPs are relatively low.
Furthermore, when all BPs are used, the R n values of the P-PCC on the Ecoli, Iris, Wine and Frogsmfcc datasets are 0.4095, 0.7302, 0.4024 and 0.5152 respectively, which are basically consistent with the best R n value of all BPs. These results indicate that the P-PCC can take advantage of the high quality BPs even if the percentage of the high quality BPs is very low.
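The removal experiment above can be stated generically. In the sketch below, `consensus` stands for any consensus-clustering routine (such as the P-PCC) and `rn_of` for the R_n score of a partition against the ground truth; both are placeholders, not the paper's implementation:

```python
def ablation_curve(bps, rn_of, consensus, steps):
    """Drop the k best BPs (ranked by R_n) for each k in `steps` and
    report the quality of the consensus built from the remaining BPs."""
    ranked = sorted(bps, key=rn_of, reverse=True)  # highest quality first
    return [rn_of(consensus(ranked[k:])) for k in steps if ranked[k:]]
```

Running the same procedure on the low-quality end (sorting in ascending order instead) yields the curves of Fig. 9.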

4) THE GENERATION STRATEGY OF BPs
Different generation strategies of BPs can be used to improve the clustering quality. In the following, the results of two such strategies are reported: one uses a different interval for the number of clusters, and the other uses only a selected subset of features for each dataset.
For large-scale datasets, the interval [K_a, √n] for the number of clusters is too large, so when dealing with these datasets it is set to [2, 2K_a] instead. The clustering results, measured by R_n, obtained with different distance measures are shown in Table 4. The clustering quality on the Breast_w dataset is substantially improved, with the average R_n value increasing from 0.1944 to 0.8465, while the clustering qualities on the other datasets are almost the same as the previous results. These results indicate that the interval [2, 2K_a] for the number of clusters can sometimes improve the clustering quality on large-scale datasets.
For datasets with many features, i.e., with 10 or more features, the RFS strategy is adopted. Four datasets are used for illustration in Fig. 10.
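The two generation strategies can be sketched end to end. The code below is a minimal stand-in (pure Python, with a tiny Lloyd's k-means replacing MATLAB's kmeans; function names `kmeans` and `generate_bps` are illustrative, not the paper's code): RPS draws K uniformly from [K_a, √n] for each BP, while RFS fixes K = K_a and clusters on two randomly selected features:

```python
import math
import random

def kmeans(X, k, rng, iters=20):
    """Minimal Lloyd's k-means (stand-in for MATLAB's kmeans)."""
    centers = [list(X[i]) for i in rng.sample(range(len(X)), k)]
    labels = [0] * len(X)
    for _ in range(iters):
        for i, x in enumerate(X):  # assign each point to its nearest center
            labels[i] = min(range(k),
                            key=lambda j: sum((a - b) ** 2
                                              for a, b in zip(x, centers[j])))
        for j in range(k):         # recompute centers as cluster means
            members = [X[i] for i in range(len(X)) if labels[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def generate_bps(X, p, Ka, strategy="RPS", seed=0):
    """Generate p base partitions.
    RPS: random K drawn from [Ka, sqrt(n)]; RFS: K = Ka on two random features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    hi = max(Ka, math.isqrt(n))
    bps = []
    for _ in range(p):
        if strategy == "RPS":
            bps.append(kmeans(X, rng.randint(Ka, hi), rng))
        else:  # RFS
            f1, f2 = rng.sample(range(d), 2)
            bps.append(kmeans([[x[f1], x[f2]] for x in X], Ka, rng))
    return bps
```

For the large-scale variant of Section IV-D.4, the draw in the RPS branch would simply use the interval [2, 2K_a] instead of [K_a, √n].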

V. CONCLUSIONS
This work developed an improved PAM-based consensus clustering algorithm using the MCC-P system, called the P-PCC, which is robust to noises and outliers and is able to achieve an acceptable time complexity. Experimental results on fifteen datasets show that the P-PCC performs well for clustering. The results show it is viable to improve conventional clustering algorithms by using the parallel mechanism of membrane computing models.
Reducing the amount of computing resources in use has been a research focus; it is therefore of interest to find ways to reduce the computing resources used by the P-PCC.
For further research, it is also of interest to use other membrane computing models, such as spiking neural P systems (SN P systems) [49], to improve the efficiency of consensus clustering. SN P systems are inspired by the mechanism of neurons that communicate by transmitting spikes. The cells in SN P systems are neurons that contain only one type of object, called spikes, which are easier to control in biological experiments. Furthermore, other clustering algorithms, and even data mining algorithms, such as spectral clustering, support vector machines, and genetic algorithms [50], may also be improved by using parallel evolution mechanisms and tree membrane structures [19]. The use of evolutionary computing in genetic algorithms has proved effective in solving several problems [51]-[53]. It is also of interest to combine membrane computing models with evolutionary computing to improve the effectiveness and efficiency of problem-solving algorithms.