Mining Correlated High Utility Itemsets in One Phase

High-utility itemset mining (HUIM) in transaction databases has been extensively studied to discover interesting itemsets from users' purchase behaviors. With these results, business managers can adjust their sales strategies appropriately to increase profit. HUIM approaches usually focus on the utility values of itemsets but rarely evaluate the correlation of the items within them, so many high-utility itemsets are weakly correlated and have no real meaning. To address this issue, we propose an algorithm, called CoHUI-Miner, to efficiently find correlated high-utility itemsets. The proposed algorithm uses a database projection mechanism to reduce the database size and introduces a new concept, the prefix utility of projected transactions, to eliminate itemsets that do not satisfy the minimum thresholds during the mining process. Experimental evaluation on datasets ranging from sparse to dense shows that CoHUI-Miner efficiently mines correlated high-utility itemsets with regard to both execution time and memory usage.


I. INTRODUCTION
Data in a variety of applications are increasingly collected and exploited, especially in business activities, which has attracted great attention from researchers. Many methods have been proposed and brought into practical use, among which data mining is an interesting research direction. It includes frequent itemsets [1]-[5], association rules [6]-[8], high-utility itemsets [9]-[11], etc. Frequent pattern mining (FPM) is a fundamental problem in the field of data mining, with the goal of finding frequent itemsets in a transaction database. Frequent itemsets have been widely studied and play an important role in association rule mining (ARM) [1], [2], [5] for analyzing customers' buying behaviors. However, a limitation of ARM is that it does not consider the benefits of items or the profits of itemsets. Therefore, in 2004 Yao et al. [12] defined two types of utilities for items, internal utility and external utility, to mine itemset utilities from databases. High-utility itemset mining (HUIM) was introduced to mine itemsets having high utility values (or profits) in a transaction database, and was implemented in approaches such as Two-Phase [13], UP-Growth [14], HUI-Miner [15], FHM [16], EFIM [17], and D2HUP [18]. (The associate editor coordinating the review of this manuscript and approving it for publication was Xin Luo.) Yun et al. [19] proposed the MU-Growth (Maximum Utility Growth) algorithm, which includes two effective candidate pruning techniques in the mining process. In addition, the authors built the MIQ-Tree structure, which captures the data in a single pass and can be used to find patterns efficiently in incremental databases. The experimental results showed that MU-Growth is more effective than other algorithms in terms of execution time and candidate storage; in particular, the algorithm excels when the database contains many long transactions or low minimum utility thresholds are used. In 2016, Dawar et al.
[20] presented the UFH algorithm to exploit HUIs efficiently on sparse datasets. Moreover, for sparse data, other techniques have also been presented to discover useful knowledge, such as a randomized latent factor model [21], the distributed alternative stochastic gradient descent model DASGD [22], and models based on SGD extensions [23]. Recently, Nguyen et al. [24] proposed an algorithm that relies on a new tight database format, named MEFIM (Modified EFficient high-utility Itemset Mining), which can discover the desired itemsets efficiently. The authors then introduced the iMEFIM algorithm, an improved version of MEFIM that employs the P-set structure to reduce the number of transaction scans and accelerate the mining process. Experimental evaluation on several databases showed that iMEFIM is efficient in terms of both time and memory.
However, many of the discovered HUIs contain items that are only weakly correlated. For example, suppose a customer purchases a diamond, which yields a huge profit, and by chance the same transaction also contains a tissue product. Since {Diamond} is already an HUI, combining it with the tissue product, or with any other item in that transaction, also yields a high-utility itemset, even though the correlation between diamonds and tissues is not strong in reality. Lin et al. [25] thus offered the FDHUP algorithm to efficiently discover discriminative high-utility itemsets with strong frequency affinity. After that, Gan et al. [26] proposed CoHUIM, an algorithm that considers both the utility of and the correlation between items in a transaction database. However, the number of candidates generated by these algorithms is huge, and the original database must be scanned several times, so the execution time is relatively long and the storage requirements are large. In this paper, we propose a novel algorithm, called CoHUI-Miner, to reduce the space requirements and improve the performance of mining correlated high-utility itemsets (CoHUIs).

A. RESEARCH CONTRIBUTIONS
• It applies the projected database strategy to reduce the database size when exploring the search space, and proposes a new concept called the prefix utility of the projected transaction to directly calculate the utility of itemsets.
• It presents a new method, SearchCoHUI, to mine CoHUIs in a single phase without generating candidates.
• Several pruning strategies for mining CoHUIs are applied efficiently to the projected database, namely TWU-Prune, KULC-Prune, Variant of U-Prune, and LA-Prune.
• The experimental results on sparse, moderately dense, and very dense datasets show that the proposed algorithm outperforms the state-of-the-art CoHUIM algorithm in both runtime and memory usage.

B. ORGANIZATION
The rest of this paper is organized as follows. Section 2 presents influential studies related to CoHUIs and the key definitions of the CoHUI mining problem. In Section 3 we propose the CoHUI-Miner algorithm to mine CoHUIs. Section 4 presents experimental evaluations of the proposed algorithm. In Section 5 we give our conclusions and discuss some extensions of the problem.

II. RELATED WORK
In this section, we briefly review prior work on both high-utility itemset mining and correlated high-utility itemset mining, and the reasons for mining correlated high-utility itemsets.

A. HIGH-UTILITY ITEMSET MINING
Recently, many studies have been carried out to mine HUIs, with applications that improve the quality of human life. Some previous works on utility mining use a two-phase approach and generate a large quantity of candidates, such as Two-Phase [13], CTU-Mine [27], TWU-Mining [28], UP-Growth [14], and UP-Growth+ [29], and thus consume more execution time and storage space. To solve this problem, in 2012 Liu and Qu proposed the HUI-Miner algorithm [15] to discover HUIs with a list data structure (the utility list) and a TWU pruning strategy. In 2016, Dawar et al. [20] proposed the hybrid UFH algorithm to effectively exploit HUIs on sparse databases. In 2018, Duong et al. [31] built a new utility-list buffer (ULB) structure, evolved from the earlier utility-list structure, and proposed the ULB-Miner (Utility-List Buffer for high utility itemset Miner) algorithm, which uses an effective method for creating utility-list segments to reduce the cost of constructing utility lists, so that HUIs are mined more efficiently in terms of both runtime and memory consumption. Moreover, the problem has been extended to mining High-Utility Sequential Patterns (HUSPs) from sequence databases [32]-[34]. Taking advantage of the global optimization abilities of evolutionary algorithms, many HUI mining algorithms based on GA (Genetic Algorithm), PSO (Particle Swarm Optimization), and bio-inspired approaches have been proposed. In 2014, to reduce the search space and the difficulty of determining the minimum utility threshold, Kannimuthu and Premalatha [35] proposed the GA-based HUPE-UMU-GRAM and HUPE-WUMU-GARM algorithms and presented results on databases containing negative item values. In 2016, Lin et al. [36] applied PSO in the proposed HUIM-BPSO algorithm by integrating the sigmoid strategy into the TWU update model and using an OR/NOR-tree structure to trim invalid candidates and reduce the number of database scans.
Biology focuses on the study of organisms, their interrelationships, and their environments, and a bio-inspired algorithm is one based on measuring and replicating the appearance or behavior of a biological system. These algorithms achieve high performance but are not guaranteed to discover all the HUIs. In 2018, Song and Huang [37] proposed a new model based on bio-inspired algorithms to find a set of HUIs; this approach uses the proportion of HUIs already detected to set the target values of the next population. Three algorithms were developed, based on a genetic algorithm, particle swarm optimization, and the bat algorithm. Extensive tests conducted on publicly available datasets showed that the proposed algorithms outperform existing state-of-the-art algorithms in terms of efficiency, quality of results, and convergence speed.
In 2019, Zhang et al. [38] presented three new strategies in the HUIM-IGA algorithm: a neighborhood search strategy to find HUIs more effectively, an individual-diversity strategy in the population to address the loss of HUIs, and an individual-adjustment strategy to limit the generation of invalid candidates; combined, these ensure that HUIs are preserved. Experiments showed that HUIM-IGA is more effective than previous evolutionary-computation-based algorithms with regard to time consumption.

B. CORRELATED HIGH-UTILITY ITEMSET MINING
High-utility pattern mining (HUPM) has the major drawback that it considers only the utility of itemsets and ignores the affinity between the items in an itemset. As a result, HUPM algorithms can find HUPs, but many of them may be uninteresting due to poor correlations. Many studies have therefore worked to calculate the correlation between the items in an itemset, with researchers proposing various measures.
The concept of interestingness measures between items was also proposed by Omiecinski [6] to mine associations in databases. Omiecinski presented three interestingness measures for associations: any-confidence, all-confidence, and bond. These measures describe the strength of the association of items with other related items. He also demonstrated that the downward closure property holds for both the all-confidence and bond metrics, and proposed an algorithm to mine association rules based on them. However, these measures do not evaluate the correlation among items well in a transaction database that contains many unstable and null transactions. In 2010, Wu et al. [3] re-examined a collection of five null-invariant measures and established the relationships and a total ordering between them. They also presented a new concept, the imbalance ratio, to assess the degree of skew in a dataset, and proposed the GAMiner algorithm to mine frequent and closely related itemsets under the Kulc and Cosine measures. In another study, Duan and Street [39] proposed finding maximal fully-correlated itemsets in large databases by establishing key properties for choosing the best correlation measure, and gave a complete definition of a fully-correlated itemset. The chosen correlation measure, combined with this definition, eliminates itemsets containing irrelevant items in a reasonable time.
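As a concrete illustration of two of these measures, the following is a minimal Python sketch of all-confidence and bond on a small set of hypothetical toy transactions (not data from any of the cited papers):

```python
# Hypothetical toy transactions for illustrating correlation measures.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c", "d"},
]

def sup(X):
    """Number of transactions containing every item of X."""
    return sum(1 for T in transactions if X <= T)

def all_confidence(X):
    """sup(X) divided by the largest single-item support within X."""
    return sup(X) / max(sup({x}) for x in X)

def bond(X):
    """Conjunctive support over disjunctive support (transactions
    containing at least one item of X)."""
    disjunctive = sum(1 for T in transactions if X & T)
    return sup(X) / disjunctive
```

For this toy data, {a, b} appears in 2 of 4 transactions, each of a and b appears in 3, and every transaction contains a or b, so all_confidence({a, b}) = 2/3 and bond({a, b}) = 2/4. Both measures are downward-closed, which is what makes them usable for pruning.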
In 2011, Ahmed et al. [40] proposed the high-utility interesting pattern mining (HUIPM) algorithm with frequency affinity, to explore interesting itemsets in which the relations among items are meaningful. They presented the UTFA structure, a utility tree based on frequency affinity, for the single-pass mining of HUIPs. In 2017, Lin et al. [25] proposed the element-information table (EI-table) and frequency-utility tree (FU-tree) structures, together with pruning strategies based on them, in the FDHUP algorithm, which discovers discriminative high-utility patterns (DHUPs) with strong frequency affinity without candidate generation. The algorithm performs considerably better than the HUIPM algorithm in terms of runtime, memory usage, and scalability.
In 2018, Fournier-Viger et al. [41] proposed combining the concepts of correlation and high-utility itemset mining to find correlated high-utility itemsets (CHIs), presenting the FCHM (fast correlated high-utility itemset miner) algorithm to mine them. This algorithm has two versions: FCHMall-confidence, based on the all-confidence measure, and FCHMbond, based on the bond measure. The algorithm integrates several strategies to discover CHIs efficiently; the experimental results were evaluated using the min_measure and min_util thresholds, and the algorithm was also compared with the FHM algorithm for HUIM with regard to runtime and memory usage. More recently, Fournier-Viger et al. [42] proposed mining LHUIs (local high-utility itemsets) and extended this to PHUIs (peak high-utility itemsets), with the LHUI-Miner and PHUI-Miner algorithms being proposed to find these patterns. However, the set of PHUIs can be quite large, and some items appearing in PHUIs do not contribute much to their peaks, so NPHUIs (non-redundant peak high-utility itemsets) were proposed to extract a smaller set of patterns. LHUIs, PHUIs, and NPHUIs are completely new issues in HUI mining.
Djenouri et al. [43] introduced a highly effective pattern mining approach, namely Clustering-Based Pattern Mining (CBPM). This approach involves two steps: first, the transaction database is clustered by the K-Means algorithm so that highly correlated transactions are grouped together; then a HUIM algorithm is applied to each cluster to discover the relevant HUIs. An experimental evaluation showed that CBPM is efficient, especially in large search spaces. Moreover, Gan et al. [44] proposed an efficient utility mining approach that considers the correlation of items within patterns, called non-redundant CoUPM (Correlated high-Utility Pattern Miner). This algorithm works in a single phase on the utility list, revising this structure to contain both utility and correlation information, and applies the downward closure properties of Kulc and of the remaining utility for pruning. The design and experimental results of this algorithm were compared with those of the HUPM and CoHUIM algorithms.

III. PRELIMINARIES AND PROBLEM STATEMENT
Let I = {i1, i2, ..., im} be a set of m distinct items and D = {T1, T2, ..., Tn} be a transaction database, where n is the number of records in D. For every Tj ∈ D, Tj = {xl | l = 1, 2, ..., Nj, xl ∈ I}, where Nj is the number of items in transaction Tj. An example transaction database D is given in Table 1, and the item profits are given in Table 2. Each item xi ∈ I is assigned a profit value, denoted Prf(xi), and each item xi in a transaction Tj is assigned a purchase quantity, denoted Pq(xi, Tj). The utility of an item xi in Tj is u(xi, Tj) = Prf(xi) × Pq(xi, Tj); the utility of an itemset X in Tj is u(X, Tj) = Σxi∈X u(xi, Tj); and the utility of X in D is u(X) = Σ u(X, Tj) over the transactions Tj that contain X. Definition 1: An HUI is an itemset X whose utility is no less than a minimum utility threshold (minUtil) given by the user. HUIs denotes the set of all HUIs in D: HUIs = {X | u(X) ≥ minUtil}. Definition 2: The transaction utility of Tj is tu(Tj) = u(Tj, Tj), and the transaction-weighted utility of X is twu(X) = Σ tu(Tj) over the transactions Tj that contain X; Table 3 shows the twu values of the 1-itemsets in D. Property 1 [13]: If twu(X) < minUtil, then no superset of X is an HUI.
Definition 3: The support of X is the number of transactions that contain X, denoted sup(X). For example, in Table 1, sup(ac) = 3 because the three transactions T1, T3, and T6 contain {a, c}. The support of each item is shown in Table 3.
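The basic quantities above (utility, transaction utility, transaction-weighted utility, and support) can be sketched in Python on a small hypothetical database (not the paper's Table 1; all values are made up):

```python
# Hypothetical toy database: each transaction maps items to purchase
# quantities Pq(x, T); profit holds the external utility Prf(x).
db = {
    "T1": {"a": 2, "c": 1, "d": 1},
    "T2": {"a": 1, "b": 2, "e": 1},
    "T3": {"a": 1, "c": 3},
}
profit = {"a": 5, "b": 2, "c": 1, "d": 4, "e": 3}

def u_in(X, T):
    """u(X, T): sum of Prf(x) * Pq(x, T) over the items of X."""
    return sum(profit[x] * T[x] for x in X)

def u(X):
    """u(X): total utility of X over the transactions containing X."""
    return sum(u_in(X, T) for T in db.values() if set(X) <= T.keys())

def tu(T):
    """Transaction utility: utility of every item in T."""
    return u_in(T.keys(), T)

def twu(X):
    """twu(X): sum of tu(T) over the transactions containing X."""
    return sum(tu(T) for T in db.values() if set(X) <= T.keys())

def sup(X):
    """sup(X): number of transactions containing X."""
    return sum(1 for T in db.values() if set(X) <= T.keys())
```

For this toy data, u({a, c}) = 19, twu({a, c}) = tu(T1) + tu(T3) = 15 + 8 = 23, and sup({a, c}) = 2; by Property 1, once twu(X) falls below minUtil, X and all its supersets can be discarded.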
In 2018, the first correlated high-utility itemset mining (CoHUIM) algorithm was proposed by Gan et al. [26] to mine HUIs whose items have strong correlations. The algorithm uses a correlation measure and applies the sorted downward closure property of Kulc (the Kulc correlation measure) to efficiently prune unpromising candidates.

Definition 4 (Ordering of items): A total order ≺ is built on the increasing support of the items in D (given in Table 3). For the database of Table 1, the total order follows the supports shown in Table 3.

Selecting minUtil = 70 and the above total order ≺, the database is transformed as shown in Table 4. A large number of HUIs contain items whose correlation is weak rather than strong. Many measures have been proposed to evaluate the correlation of HUIs, such as affinitive frequency, all-confidence, bond, any-confidence, and Kulczynsky (Kulc); among these, Kulc is widely used to assess the inherent correlation of itemsets.

Definition 5: The correlation between the items of an itemset X = {x1, x2, ..., xk} is defined as kulc(X) = (1/k) Σi=1..k sup(X)/sup(xi). Each ratio sup(X)/sup(xi) is the proportion of the transactions containing xi that also contain all the other items of X; if this ratio is high, xi is highly associated with all the items in X. Since kulc(X) is the average of these ratios, a high value means that the items of X appear together many times and are highly related.

Definition 6: A correlated high-utility itemset (CoHUI) is an itemset X in D with kulc(X) ≥ minCor and u(X) ≥ minUtil. CoHUIs denotes the set of all CoHUIs in D: CoHUIs = {X | kulc(X) ≥ minCor and u(X) ≥ minUtil}. For instance, in Table 4, u(bde) = 75 and kulc(bde) = 0.5333; if minUtil = 70 and minCor = 0.52, then the itemset {bde} is a CoHUI.
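The Kulc measure and the CoHUI condition above can be sketched as follows, on hypothetical toy data (not the paper's Table 4; quantities and profits are made up):

```python
# Hypothetical toy database: item -> purchase quantity per transaction.
transactions = [
    {"b": 1, "d": 2, "e": 1},
    {"b": 2, "d": 1},
    {"d": 1, "e": 3},
    {"b": 1, "e": 1},
]
profit = {"b": 10, "d": 5, "e": 15}

def sup(X):
    """Number of transactions containing every item of X."""
    return sum(1 for T in transactions if set(X) <= T.keys())

def u(X):
    """Utility of X summed over the transactions containing X."""
    return sum(sum(profit[x] * T[x] for x in X)
               for T in transactions if set(X) <= T.keys())

def kulc(X):
    """Average of sup(X) / sup(x_i) over the items of X."""
    return sum(sup(X) / sup({x}) for x in X) / len(X)

def is_cohui(X, min_util, min_cor):
    """CoHUI condition: both the utility and correlation thresholds hold."""
    return u(X) >= min_util and kulc(X) >= min_cor
```

Here {b, d, e} occurs in one transaction while each single item occurs in three, so kulc({b, d, e}) = 1/3 and u({b, d, e}) = 35; whether it is a CoHUI depends on the chosen minUtil and minCor.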

IV. COHUI-MINER ALGORITHM

A. DATABASE PROJECTION STRATEGY
To reduce the database size while exploring the search space, we use the projected database strategy [17], which significantly reduces the time needed to scan the database. Therefore, the process of mining for CoHUIs becomes more effective. The details of the projection mechanism are presented below.
Definition 7: An item xi is after an itemset X in Tj if xi comes after all the items of X when the items of Tj are sorted in ascending order of their support values; this is denoted X ≺ xi.
Definition 8: The projected transaction of Tj on itemset X is denoted Tj\X, where Tj\X = {xi ∈ Tj | X ≺ xi}.
For instance, in Table 4, the projected transaction of T6 on the itemset {a, e} is T6\{a, e} = {c}.
Definition 9: The projected database on itemset X is denoted D\X, where D\X = {Tj\X | Tj ∈ D}.
For instance, in Table 4, consider the projected database on the itemset X = {a, b}, which is contained in the three transactions T1, T2, and T6. Then D\{a, b} consists of T1\{a, b} = {d, c}, T2\{a, b} = {d, e}, and T6\{a, b} = {d, e, c}. This example shows that the projected database D\{a, b} is much smaller than the original database, which increases the effectiveness of the CoHUI mining process. However, because D\{a, b} no longer contains {a, b}, u({a, b}) cannot be calculated directly on this projected database. Gan et al. [26] put {a, b} into the candidate set if kulc({a, b}) ≥ minCor; after a large quantity of candidates was found in phase 1, they scanned the original database again in phase 2 to compute u({a, b}). This approach therefore requires a long execution time and a large amount of memory. To solve these issues, we propose the prefix utility concept.
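The projection mechanism above can be sketched in Python, assuming each transaction is already sorted by the total order ≺ (a plain list stands in for the ordered transaction; the data is a hypothetical mock-up shaped like the {a, b} example):

```python
def project_transaction(T, X):
    """T \\ X: the items of T that come after every item of X in the
    order; returns None when T does not contain X (it is dropped)."""
    if not set(X) <= set(T):
        return None
    last = max(T.index(x) for x in X)
    return T[last + 1:]

def project_database(db, X):
    """D \\ X: projected transactions of all transactions containing X."""
    projected = (project_transaction(T, X) for T in db)
    return [p for p in projected if p is not None]

# Hypothetical ordered transactions (items sorted by the total order).
db = [["a", "b", "d", "c"],
      ["a", "b", "d", "e"],
      ["a", "e", "c"],
      ["a", "b", "d", "e", "c"]]
```

On this mock database, project_database(db, ["a", "b"]) keeps only the three transactions containing {a, b} and strips their prefixes, yielding [["d", "c"], ["d", "e"], ["d", "e", "c"]].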
Definition 10: The prefix utility of the projected transaction Tj\X is defined as pru(Tj\X) = u(X, Tj).
Thus, when a transaction Tj is projected on an itemset X, the utility of X on Tj is calculated and saved as the prefix utility. Different transactions may yield different prefix utility values, and the prefix utility of a transaction changes under different projections. In the projected database D\X, an itemset X′ extended from X has its utility identified from the utility of X and the prefix utilities of the transactions in D\X. Combined with the Kulc value, this lets our algorithm determine whether X′ is a CoHUI in the projected database without re-scanning the original database. The CoHUI-Miner algorithm is shown in Algorithm 1. Definition 11: The remaining utility of X in Tj, denoted ru(X, Tj), is the sum of the utilities of all the items that come after X in Tj according to the total order. For instance, in Table 4, ru(ab, T1) = u(d, T1) + u(c, T1) = 2 + 1 = 3.
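The prefix and remaining utility concepts can be sketched together: projecting a transaction on X returns the leftover items plus pru and ru. This is a hypothetical mock-up in which per-item utilities inside the transaction are given directly (the d and c values match the ru(ab, T1) example; the a and b values are made up):

```python
def project_with_pru(T, X, util):
    """Project transaction T (an ordered item list) on itemset X.
    Returns (projected items, prefix utility pru, remaining utility ru),
    or None when T does not contain X."""
    if not set(X) <= set(T):
        return None
    last = max(T.index(x) for x in X)
    pru = sum(util[x] for x in X)        # u(X, T), saved as prefix utility
    rest = T[last + 1:]
    ru = sum(util[x] for x in rest)      # ru(X, T): utility after X
    return rest, pru, ru

T1 = ["a", "b", "d", "c"]
util_T1 = {"a": 4, "b": 3, "d": 2, "c": 1}   # hypothetical item utilities
```

For this mock-up, projecting T1 on {a, b} gives the leftover items [d, c] with pru = 7 and ru = 2 + 1 = 3, so the utility of any extension of {a, b} in T1 can be recovered without the original database.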

B. THE COHUI-MINER ALGORITHM
Some pruning strategies are applied to reduce the search space and the execution time of the algorithm, as follows. With strategy 1 (TWU-Prune), in the first scan of the database, the twu values of the 1-itemsets X are calculated. According to Property 1, if twu(X) is smaller than minUtil, then X and all its supersets are not HUIs and therefore not CoHUIs [13] (since being an HUI is a prerequisite for being a CoHUI), so the algorithm stops expanding from X. With strategy 2 (KULC-Prune), based on the total order ≺ of all the items in the database, if an itemset X has kulc(X) < minCor, then every itemset Y extended from X also has kulc(Y) < minCor, because kulc(Y) ≤ kulc(X) (Property 2). This means that X is not a CoHUI and no itemset extended from the prefix X is a CoHUI. With strategy 3 (Variant of U-Prune), projection significantly reduces the size of the database during the run: the longer the extended itemset, the smaller the projected database, so the search space is reduced effectively. The utility and the remaining utility of an itemset X′ extended from X are then calculated; if u(X′) + ru(X′) < minUtil, then no itemset Y extended from X′ is an HUI [15] and hence none is a CoHUI. With strategy 4 (LA-Prune), in the process of extending X to X′ by the projection method, the ULA value of X′ is initialized to u(X) + ru(X). The projection considers every transaction T of the input database; if X′ ⊄ T, the ULA value is reduced by pru(T) + uτ(T), the contribution of T. If ULA < minUtil, then X′ is not a CoHUI and no itemset extended from X′ is a CoHUI, so the projection on X′ stops. The CoHUI-Miner procedure takes as input D, a transaction database; minCor, a user-specified minimum correlation threshold; and minUtil, a user-specified minimum utility threshold. It performs mainly in one phase.
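As one concrete illustration, LA-Prune (strategy 4) can be sketched as follows. This is a simplified reading, not the paper's exact implementation: each transaction containing X is represented as a hypothetical tuple of its item set, its prefix utility pru(T), and the utility of its remaining items:

```python
def la_prune_survives(u_x, ru_x, transactions_with_x, extension, min_util):
    """LA-Prune sketch: start from ULA = u(X) + ru(X) and subtract the
    contribution pru(T) + tail utility of every transaction that contains
    X but not the extension item; stop as soon as ULA drops below minUtil."""
    ula = u_x + ru_x
    for items, pru, tail in transactions_with_x:
        if extension not in items:
            ula -= pru + tail
            if ula < min_util:
                return False   # prune: no extension by this item can qualify
    return True

# Hypothetical projected transactions containing X:
# (item set, pru(T), utility of the items remaining in T).
txs = [({"d", "e"}, 5, 8), ({"d"}, 4, 6), ({"e"}, 3, 2)]
```

With u(X) = 10 and ru(X) = 20, ULA starts at 30; the one transaction lacking item e removes 4 + 6 = 10 of that bound, so extending X by e is pruned at minUtil = 25 but survives at minUtil = 15.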
It first scans the database to calculate sup(i), twu(i), and u(i) for each item i in D (line 1), then constructs the set Ikeep, consisting of the items whose twu value is not less than minUtil, and finally updates SUP, U, and the database D with respect to Ikeep (lines 2, 3). After that, Ikeep is sorted in increasing order of SUP, and the items of all the transactions in D are sorted w.r.t. Ikeep (line 4). In line 5, for each 1-item X ⊆ Ikeep, if U(X) ≥ minUtil then X is a CoHUI, because kulc(X) = 1 (lines 6, 7). From lines 9 to 24, the algorithm creates the projected database from the 1-item X; each projected transaction is assigned the prefix utility of itemset X (line 19). In line 25, the SearchCoHUI procedure is called to extend the set of CoHUIs.
The SearchCoHUI procedure (Algorithm 2) takes as input X, the prefix itemset; U(X), the utility of X; RU(X), the remaining utility of X; dbProjectX, the projected database with prefix X; and k, the length of the itemset X. Its output is the set of CoHUIs with prefix X. It is a DFS-based procedure that finds each extended itemset X′ with prefix X, then calculates U(X′), RU(X′), support(X′), kulc(X′), and the projected database of X′.
The calculation of U(X′) is based on U(X) and the prefix utilities of the transactions; based on the values of U(X′) and kulc(X′), we can determine whether X′ is a CoHUI. For each item in Ikeep (starting from position k), the procedure builds the extended itemset X′ of X (lines 2, 3). The values U(X′), RU(X′), Support(X′), and ULA(X′) are initialized in line 4. From lines 5 to 28, the algorithm processes each transaction T of the database dbProjectX to calculate U(X′): if X′ ⊆ T, it increases U(X′) by u(xj, T) and updates RU(X′) and support(X′); otherwise it decreases U(X′) by pru(T). At lines 14 and 15, the LA-Prune strategy is applied: if X′ ⊄ T, the ULA value is decreased by pru(T) + uτ(T), and if ULA < minUtil then X′ is not a CoHUI, so the projection on X′ stops via the return command. Lines 20 to 26 calculate the projected transaction T\X′ and update the corresponding prefix utility, pru(T\X′) = pru(T) + u(xj, T). In lines 29 to 39, kulc(X′) is computed, and if kulc(X′) ≥ minCor and U(X′) ≥ minUtil then X′ is a CoHUI. At lines 35 and 36, the U-Prune strategy is applied: if U(X′) + RU(X′) ≥ minUtil, SearchCoHUI is called recursively to continue searching for CoHUIs with the prefix X′; otherwise the procedure ends.
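To summarize the overall search, here is a deliberately simplified one-phase DFS sketch. It follows the spirit of SearchCoHUI (a fixed total order, the KULC-prune, and a utility check at every prefix) but, for brevity, omits the projection, prefix utility, and the U-Prune/LA-Prune bounds, recomputing utilities by scanning a toy database instead; all names and data here are hypothetical:

```python
def mine(db, order, min_util, min_cor):
    """db: list of dicts mapping item -> utility in that transaction;
    order: items listed in the total order; returns (itemset, utility)
    pairs for every CoHUI found by the DFS."""
    sup = {i: sum(1 for T in db if i in T) for i in order}
    results = []

    def u_and_sup(X):
        us, s = 0, 0
        for T in db:
            if all(x in T for x in X):
                us += sum(T[x] for x in X)
                s += 1
        return us, s

    def search(prefix, start):
        for idx in range(start, len(order)):
            X = prefix + [order[idx]]
            ux, sx = u_and_sup(X)
            if sx == 0:
                continue
            kulc = sum(sx / sup[x] for x in X) / len(X)
            if kulc < min_cor:
                continue          # KULC-prune: no extension can qualify
            if ux >= min_util:
                results.append((tuple(X), ux))
            search(X, idx + 1)    # utility is not anti-monotone: recurse

    search([], 0)
    return results

# Hypothetical toy database (item -> per-transaction utility).
db = [{"a": 5, "b": 3},
      {"a": 5, "b": 2, "c": 1},
      {"b": 4, "c": 2}]
```

Note that the DFS must keep extending a prefix even when its utility is below minUtil, because utility is not anti-monotone; only the Kulc condition is safe to prune on here.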

V. EXPERIMENTS

A. GENERAL SETTINGS
We implemented the CoHUI-Miner algorithm in Java and ran it on a Dell Precision Tower 3620 with an Intel Core i7-7800X CPU @ 3.5 GHz and 32 GB of memory, running Windows 10. We used standard datasets downloaded from the SPMF library [45]: Chainstore, Kosarak, Accident, Mushroom, Chess, and Connect. The detailed characteristics of the experimental datasets, which include dense, moderately dense, and sparse ones, are given in Table 7. The obtained results show that the CoHUI-Miner algorithm has a better execution time than CoHUIM [26] on all the datasets of Table 7. The major difference between the two algorithms is that CoHUIM generates a candidate set in phase 1 and rescans the dataset several times in phase 2 to discover CoHUIs, whereas CoHUI-Miner uses the prefix utility of the projected transactions to determine the utility of an itemset without generating candidates. Furthermore, we apply two effective pruning strategies, Variant of U-Prune and LA-Prune, to the projected database to reduce the search space. The CoHUI-Miner algorithm was run in three configurations for comparison with the CoHUIM algorithm. Basic algorithm: the single-phase CoHUI-Miner algorithm uses the prefix utility value and applies the KULC-Prune and TWU-Prune strategies to reduce the search space; the CoHUIM algorithm that we compare against also adopts these strategies.
Improved algorithm: to increase the performance of the basic algorithm, the Variant of U-Prune and LA-Prune strategies are added, giving CoHUI-Miner_RU (basic algorithm + U-Prune) and CoHUI-Miner_RU+LA (basic algorithm + U-Prune and LA-Prune).

B. COMPARISON OF RUNTIME AND MEMORY USAGE
Figures 1, 2, and 3 present the runtime of CoHUI-Miner in its three configurations, namely CoHUI-Miner (basic algorithm), CoHUI-Miner_RU, and CoHUI-Miner_RU+LA. As the density of the dataset increases, the cost of dataset scanning also increases; thus CoHUIM suffers from excessive dataset scanning in phase 2, and the experimental performance of CoHUI-Miner is significantly better than that of CoHUIM on all the dense datasets.
In the very dense databases (Fig. 1), all the configurations of the CoHUI-Miner algorithm are much more efficient than CoHUIM in execution time; in particular, the smaller minUtil and minCor are, the more obvious the effect. In some cases, when the Connect database is run at low minUtil thresholds, the CoHUIM algorithm cannot find the CoHUI sets at all, whereas CoHUI-Miner still performs effectively and does not need much time. CoHUIM is a two-phase algorithm: in phase 1, the CHUUBI set is built to store the candidates that satisfy the minCor threshold, and phase 2 rescans the database to calculate the candidates' utilities and identify the CoHUIs among them. Meanwhile, CoHUI-Miner is a single-phase algorithm that calculates the correlation between the items of an itemset and its utility immediately when determining CoHUI itemsets. CoHUI-Miner is also more efficient than CoHUIM because it does not spend time determining the utilities of candidates in a second phase. In dense databases, the correlation between items is high and the TWU values of itemsets are large, so the number of candidates is huge; therefore, CoHUI-Miner takes much less time than CoHUIM. The CoHUI-Miner_RU configuration (applying the Variant of U-Prune strategy) is better than the basic CoHUI-Miner algorithm at all minCor and minUtil thresholds, and CoHUI-Miner_RU+LA (applying both U-Prune and LA-Prune) has the best results in all cases across the different types of databases. This demonstrates that both pruning strategies play a very meaningful role in mining CoHUIs.
The execution times of the CoHUI-Miner and CoHUIM algorithms on moderately dense databases are shown in Fig. 2 at various thresholds of minCor and minUtil.
The results show that all the configurations of CoHUI-Miner are more efficient than CoHUIM, especially for small minCor and minUtil. With regard to the runtime on the Accident dataset with minCor = 0.62, the proposed basic CoHUI-Miner is two times faster than CoHUIM. Moreover, the CoHUI-Miner_RU configuration (using the U-Prune strategy) has an effective execution time at high minUtil thresholds; for example, at minUtil = 28,000,000, the execution time of CoHUI-Miner_RU is about seven times faster than the basic CoHUI-Miner algorithm and about 14 times faster than CoHUIM. The execution time of CoHUI-Miner_RU+LA is the fastest at all the minUtil thresholds, about 14 times faster than the basic CoHUI-Miner algorithm and 28 times faster than CoHUIM. Fig. 2, which also shows the runtimes on the Mushroom dataset, shows similar results. These experimental results demonstrate that the U-Prune and LA-Prune strategies combined with the prefix utility value are effective on moderately dense datasets. Figure 3 shows the runtimes of CoHUI-Miner and CoHUIM on very sparse datasets. In these datasets the correlation between items is very weak, so the number of candidates in the CHUUBI set is rather small; although CoHUIM works in two phases, performing phase 2 to determine the utility of each candidate by rescanning the database takes little time. Therefore, the execution time of the basic CoHUI-Miner algorithm is faster than that of CoHUIM, but not significantly. However, the combination of the U-Prune and LA-Prune strategies shows significantly improved performance: for the Chainstore dataset, the average execution time of CoHUI-Miner_RU+LA is faster than those of the basic CoHUI-Miner and CoHUIM algorithms by about 1.2 and 1.5 times, respectively. Moreover, for the Kosarak dataset, the smaller the minUtil threshold, the more effective CoHUI-Miner_RU and CoHUI-Miner_RU+LA are.
In Fig. 4, we compare the memory usage of the CoHUI-Miner and CoHUIM algorithms on the very dense datasets, Chess and Connect. On these datasets, the memory consumption of all the CoHUI-Miner configurations is much less than that of CoHUIM. For instance, on the Chess dataset with minCor = 0.7 and all of the minUtil thresholds, the memory usage of the CoHUI-Miner configurations differs little, at about 550 MB, whereas the memory usage of the CoHUIM algorithm is 6,800 MB, about 12 times higher. In addition, the memory usage of the CoHUI-Miner_RU+LA algorithm on Connect is the lowest, while the CoHUIM algorithm is overloaded at minUtil < 33,500. This proves that CoHUI-Miner uses memory effectively on very dense datasets, which have a large number of candidates: applying the projected database strategy and the prefix utility structure has a clear effect on shrinking the search space. Moreover, the U-Prune and LA-Prune strategies are very effective; they eliminate a large number of unpromising itemsets early, so memory usage is reduced significantly.
Next, the algorithms are run on moderately dense databases (Fig. 5), and all the CoHUI-Miner models still consume less memory than CoHUIM. On the Accident dataset, the memory usage of the upgraded algorithm models is nearly the same, but lower than CoHUIM's at each minCor threshold: about two times lower with minCor = 0.68, 2.5 times lower with minCor = 0.65, and three times lower with minCor = 0.62. Similarly, the memory usage on the Mushroom dataset is also about five times lower. The strategies for reducing the search space and memory usage are still effective, but less so than on dense databases, because these datasets have significantly fewer candidates than Connect or Chess. On sparse or very sparse datasets (Fig. 6), such as Chainstore, the memory usage of the basic CoHUI-Miner algorithm differs little from that of CoHUIM, because in a very sparse dataset the length of each candidate is small, so the CoHUIM algorithm uses only a little memory to store the candidates. However, the U-Prune and LA-Prune strategies decrease memory usage significantly because they remove many patterns that do not satisfy the CoHUI property. For instance, at minCor = 0.2 and minUtil = 1,900,000, CoHUI-Miner RU+LA uses only 1,800MB and CoHUI-Miner RU uses 2,800MB, while CoHUIM uses about 8,500MB.
For the Kosarak dataset, similar to the Chainstore dataset, the memory usage of the basic CoHUI-Miner differs little from that of CoHUIM. However, the CoHUI-Miner RU+LA algorithm shows the lowest memory usage at all minCor and minUtil thresholds.

C. SUMMARY
Based on the experimental results, some observations can be summarized as follows.
1) The projection strategy on the database significantly reduces the database size when exploring the search space, because it stores only the transaction information needed for computation at the next level.
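The projection idea can be illustrated with a minimal sketch. This is not the paper's implementation: the list-of-pairs transaction representation and the function name are illustrative assumptions.

```python
# Illustrative sketch of database projection (not the paper's exact
# implementation): when extending a prefix with item x, keep only the
# transactions that contain x, and within each, only the items after x.

def project(database, item):
    """Return the database projected on `item`.

    Each transaction is a list of (item, utility) pairs, assumed to be
    sorted by a fixed total order on items.
    """
    projected = []
    for transaction in database:
        for i, (it, _) in enumerate(transaction):
            if it == item:
                suffix = transaction[i + 1:]
                if suffix:  # drop transactions with no remaining items
                    projected.append(suffix)
                break
    return projected

db = [
    [("a", 5), ("b", 2), ("c", 1)],
    [("a", 4), ("c", 3)],
    [("b", 6), ("c", 2)],
]
print(project(db, "a"))  # [[('b', 2), ('c', 1)], [('c', 3)]]
```

Each recursive call then works on this smaller projected database instead of the full one, which is why the database size shrinks as the search deepens.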
2) The new concept of prefix utility can be used to quickly calculate the utility values of itemsets when mining CoHUIs. As mentioned before, the utility of an itemset is the sum of the prefix utility values and the utility values of the items under consideration. Thus, once the prefix utility value is determined, the utility of an itemset can be calculated easily during the mining process.
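The prefix-utility computation described above can be sketched as follows, reusing the same toy representation as in the projection sketch; storing the prefix utility per projected transaction is the assumption, and the names are illustrative.

```python
# Sketch of the prefix-utility idea: each projected transaction carries
# the utility its prefix already contributed, so extending the prefix
# with an item x needs only one addition per transaction containing x.

def utility_of_extension(projected, item):
    """Utility of (prefix ∪ {item}) over the projected transactions.

    `projected` is a list of (prefix_utility, suffix) pairs, where
    suffix is a list of (item, utility) pairs.
    """
    total = 0
    for prefix_utility, suffix in projected:
        for it, u in suffix:
            if it == item:
                total += prefix_utility + u
                break
    return total

# Suppose prefix {a} contributed utility 5 and 4 in the two
# transactions that contain it.
projected_on_a = [
    (5, [("b", 2), ("c", 1)]),
    (4, [("c", 3)]),
]
print(utility_of_extension(projected_on_a, "c"))  # (5+1) + (4+3) = 13
```

No rescan of the original database is needed: the utility of the extended itemset falls out of one pass over the projected transactions.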
3) The CoHUI-Miner algorithm runs in one phase and does not generate candidates. The experimental results demonstrate that it outperforms the previous CoHUIM algorithm in both runtime and memory usage, especially on dense and moderately dense datasets.
4) Several pruning strategies, such as TWU-Prune, KULC-Prune, the variant of U-Prune, and LA-Prune, are applied to mine CoHUIs effectively on the projected database.
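The pruning principle behind these strategies can be illustrated with a simplified remaining-utility upper bound; this is a generic stand-in in the spirit of U-Prune and LA-Prune, not the paper's exact formulation.

```python
# Simplified illustration of remaining-utility pruning: if an itemset's
# utility plus the utility remaining after it in every transaction
# cannot reach minUtil, then no extension of it can be high-utility,
# so the whole subtree can be skipped.

def can_prune(projected, min_util):
    """`projected` is a list of (prefix_utility, suffix) pairs."""
    upper_bound = 0
    for prefix_utility, suffix in projected:
        remaining = sum(u for _, u in suffix)
        upper_bound += prefix_utility + remaining
    return upper_bound < min_util

projected_on_a = [
    (5, [("b", 2), ("c", 1)]),
    (4, [("c", 3)]),
]
print(can_prune(projected_on_a, 20))  # bound (5+3)+(4+3)=15 < 20 -> True
print(can_prune(projected_on_a, 10))  # 15 >= 10 -> False, keep exploring
```

Eliminating such subtrees early is what produces the runtime and memory savings reported for the RU and RU+LA variants.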

VI. CONCLUSIONS
This paper presented a new method for efficiently mining CoHUIs, in which the utility values of projected transactions are calculated via projected databases. We proposed the prefix utility to calculate the utilities of extended itemsets during the search process. Storing the prefix utility, combined with several pruning strategies applied during the mining process, significantly increases the performance of the proposed algorithm. CoHUI-Miner is a single-phase algorithm and generates no candidates during the discovery of CoHUIs. The experimental results on several databases of different densities, from sparse to very dense, showed that our proposed algorithm is more efficient than the previous CoHUIM algorithm.
In future work, we aim to use a novel data structure to reduce memory usage and to explore further pruning strategies to increase the performance of the algorithm. Moreover, we will apply this approach to incremental and dynamic-profit databases, and combine it with optimization techniques [46]-[49] to improve the quality of industrial production activities.