Mining High Utility Time Interval Sequences Using MapReduce Approach: Multiple Utility Framework

Mining high utility sequential patterns is a significant research topic in data mining. Several methods mine sequential patterns while taking utility values into consideration. Patterns of this type can determine the order in which items were purchased, but not the time interval between them. The time interval between items is important for prediction in many real-world circumstances, including retail market basket data analysis, stock market fluctuations, DNA sequence analysis, and so on. There are very few algorithms for mining sequential patterns that consider both the utility and the time interval. Moreover, they assume the same threshold for each item, implying the same unit profit. In addition, with the rapid growth of data, the traditional algorithms cannot handle big data and are not scalable. To handle these problems, we propose a distributed three-phase MapReduce framework that considers multiple utilities and is suitable for handling big data. The time constraints are pushed into the algorithm instead of being applied as predefined intervals. Also, the proposed upper bound minimizes the number of candidate patterns during the mining process. The approach has been tested, and the experimental results show its efficiency in terms of run time, memory utilization, and scalability.


I. INTRODUCTION
Sequential pattern mining (SPM) [1], [2], [3], [4], [5], [6], [7], [8] is a significant research theme in data mining. The primary goal of SPM is to identify frequent sequences that satisfy a minimum support level specified by the user. For instance, a customer who purchased a ''Television'' may return to the store and purchase ''Speakers''. Market analysts can use this information to develop novel marketing tactics such as product cross-selling and advertising activities. The standard SPM techniques employ a frequency-based framework and are regarded as uninformative because they cannot mine highly profitable sequences. As a result, high utility sequential pattern mining (HUSPM) [9] has been offered as a solution to this problem; it mines high utility sequences while taking into consideration both the item's profit and quantity.
The associate editor coordinating the review of this manuscript and approving it for publication was Chien-Ming Chen.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Although previous HUSPM algorithms [9], [10], [11], [12], [13], [14], [15] produce very profitable sequences, they are unable to estimate the time gap between consumers' subsequent visits to the store. Therefore, high utility time interval sequential pattern mining (HUTISP) [16] was developed to take time periods into account. Its primary objective is to find patterns that carry the time interval between consecutive items. Consider a store that offers groceries such as soaps, cereals, books, and ice creams. Assuming these items are in a database, the goal is to determine the time interval between purchases of specific items being sold. The store keeper may then keep track of the quantity of consumed items by time period. For instance, consider an output sequence pattern with time intervals denoted as ⟨x, 4, y, 6, z⟩. It signifies that a person who bought item x also bought item y after 4 months and returned to the store after 6 more months to purchase item z. HUTISP mines extremely profitable sequences as well as the time intervals between them. In spite of this, it uses a single threshold for every item in the input database, implying that every item has the same unit profit. This is unsatisfactory because every item in real-world situations is unique and should not be treated equally. For instance, sales of washing machines will make more profit than sales of detergent. The problem of finding sequential patterns considering multiple utility thresholds was first addressed in [17] and later extended in [18]. The authors of [18] designed a lexicographic sequence tree and a utility array.
The tree structure represents the possible HUSPs as nodes, and the extension of each node is done using I-concatenation and S-concatenation mechanisms. The former denotes itemset extension, whereas the latter denotes sequence extension. The utility array allows finding the utility value of each node in the tree. However, the authors in [17] and [18] do not deal with the time intervals, which would generate more significant patterns. Due to ever-growing database sizes, many data mining researchers have revised the conventional mining algorithms and formulated distributed algorithms to handle big data more efficiently. The most efficient big data framework that helps in designing distributed algorithms is MapReduce [19], a distributed programming framework developed by Google in 2008. It can handle the processing of big data by distributing the work into two parallel processes, namely, Mapper and Reducer. The Mapper is a process that partitions the input data into multiple chunks and processes them in parallel. The processed output is sent to the Reducer for further processing, which leads to the final output. The output of the parallel program is always stored inside a distributed storage called the Hadoop distributed file system (HDFS). The workflow of the MapReduce framework is given in Fig. 1.
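The Mapper/Reducer flow just described can be illustrated with a minimal in-memory Python sketch. The item-counting job, the function names, and the absence of real partitioning are illustrative simplifications only; an actual deployment would run on a Hadoop cluster over HDFS.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Minimal simulation: map every record, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # mapper emits (key, value) pairs
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Example job: count item occurrences across input sequences.
mapper = lambda seq: [(item, 1) for item in seq]
reducer = lambda key, values: sum(values)

counts = run_mapreduce([["a", "b"], ["a", "c"], ["a", "b"]], mapper, reducer)
print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```

The same map, group-by-key, reduce skeleton underlies all three phases of the proposed algorithm; only the emitted key-value pairs change.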
HUTISP [16] is a centralized algorithm that is ineffective for managing large amounts of data. In light of this, DHUTISP [20], a distributed MapReduce algorithm, was introduced. Recently, a MapReduce algorithm, namely DHUTISP-MMU [21], was proposed to handle the issue of multiple utilities of items. However, it considers a predefined time interval set before generating the patterns. In consideration of this, we propose MRHUTSP-MMU in this study, which handles the issue of setting time constraints.
The work plan of the current paper is to: 1) investigate HUTISP mining using multiple utility thresholds; 2) propose a distributed framework that deals with big data; 3) provide an upper bound that maintains the downward closure property and can efficiently prune unpromising candidates; 4) introduce an efficient mechanism to push the time constraints into the algorithm instead of using predefined time intervals; and 5) provide an empirical analysis and comparison between the distributed and non-distributed algorithms and analyze the impact of applying time constraints.
The significant contributions of the current paper include: 1) defining a novel sequential pattern that includes utility, time intervals, and multiple utility thresholds; 2) proposing MRHUTSP-MMU, a distributed algorithm based on the MapReduce framework that mines high utility sequential patterns including time intervals, with each item having its own threshold; 3) introducing a downward closure property that can be tested on patterns under multiple utility thresholds; 4) proving the correctness of MRHUTSP-MMU theoretically; and 5) comparing the distributed version with the non-distributed approach in terms of processing time, memory usage, and scalability.
The remaining sections include the following. Section II provides a quick overview of the literature. The problem is defined in Section III. In Section IV, the proposed algorithm is described. Section V contains the experimental outcomes. Section VI discusses the conclusions.

II. LITERATURE REVIEW

A. SEQUENTIAL PATTERN MINING AND TIME INTERVAL MINING
Sequential pattern mining [1], [2], [3], [4], [5], [6], [7] is applied in a variety of data mining research disciplines. For mining sequential patterns, there are mainly two kinds of algorithms: Apriori-based [22] and pattern growth [5] approaches. In 1996, Srikant and Agrawal invented the GSP algorithm. Candidate generation and frequency calculation are the two processes of GSP. Zaki invented the SPADE method [4], which is grounded on a vertical mining procedure. Han et al. developed pattern growth approaches such as FreeSpan [23] and PrefixSpan [5]. Garofalakis et al. conceived and implemented the SPIRIT algorithm for sequential pattern mining using constraints [2]. Researchers focused on time interval pattern mining after investigating the problem of mining sequential patterns. Chen et al. presented novel algorithms for time interval mining, notably I-Apriori and I-PrefixSpan. These techniques were extended by Chen et al., who proposed FTI-Apriori and FTI-PrefixSpan, which use fuzzy theory to partition time periods [24].

B. HIGH UTILITY ITEMSET MINING WITH MULTIPLE THRESHOLDS
The above-listed algorithms are concerned with mining frequent sequences and time interval patterns. They consider the occurrence of each item as binary and miss the number of units purchased and the profit raised by each item. In view of this, [25], [26] introduced a new framework called high utility itemset mining (HUIM), which is an improvement over frequent itemset mining [27], [28], [29], [30]. Considering non-binary transactions, utility mining [25], [26], [31] was introduced and became an essential research theme in data mining. HUIM with multiple thresholds was first introduced in [32]. The authors proposed two algorithms: a baseline algorithm provided as an initial solution for mining HUIs using more than one threshold, and an improved version of it that makes use of a TID-index procedure. Using this index, the utility of a candidate can be found by reading its TID-index instead of scanning the complete database. Further, the above two algorithms were extended using the estimated utility co-occurrence structure (EUCS), and a novel algorithm, namely HUI-MMU_TE [33], was proposed. However, these are Apriori-based and follow a candidate generation, level-wise approach involving multiple database scans. In contrast to the Apriori-based algorithms, the authors of [34] proposed the HIMU and HIMU_EUCP algorithms, which make use of a compact tree structure. Also, the properties introduced in [34] ensure the anti-monotonicity needed to mine the patterns from the MIU-tree. However, these algorithms order the items in a transaction with reference to the values of the minimum utility threshold, which makes them sensitive to a specific ordering of items. The MHUI algorithm proposed in [35] does not consider any ordering among the items and is superior to all the above-mentioned high utility mining algorithms with multiple thresholds. Its author proposed the suffix minimum utility, which is used to develop generic pruning strategies that are independent of any item ordering based on minimum utility thresholds.

C. SEQUENTIAL PATTERN MINING INCLUDING UTILITY AND MULTIPLE THRESHOLDS
All the above-mentioned algorithms relate to HUIM with multiple utility thresholds and are unable to address the HUSP problem. A framework for HUSP mining that considers a different utility threshold for each item was introduced in [17]. From the proposed framework, the authors introduced a baseline algorithm and later extended it with three pruning strategies. The pruning techniques help tighten the upper bound, thereby reducing the search space. Later, the framework proposed in [17] was extended into the USPT algorithm [18]. The authors made use of a compressed utility array structure, which helps in the construction of the lexicographic sequential tree. To the best of our understanding, these are the only algorithms that address HUSP mining using multiple utility thresholds. Yet, they are unable to mine the time intervals.

D. HIGH UTILITY SEQUENTIAL PATTERN MINING INCLUDING TIME INTERVALS AND SINGLE THRESHOLD
Considering the time intervals, Wang et al. [16] introduced the UTMining_A algorithm that can mine high utility sequences including the time intervals. Considering the need for processing big data, a distributed approach was proposed in [20], which uses a single threshold for all items. The authors in [36] proposed an efficient way of imposing the time constraints while generating the sequential patterns with high utility. The above-mentioned techniques in [16], [20], and [36], on the other hand, treat each item as equally essential, considering a single minimum utility criterion. Recently, the UIPrefixSpan-MMU [37] algorithm was proposed, which can handle the multiple utility threshold problem. However, it assumes that the data fit into a single centralized system and is not suitable for the current era of big data. As a result, in the current work, we present a distributed approach for HUTSP mining with multiple utilities.

E. DIFFERENCE FROM EARLIER WORKS
The existing works on mining HUSPs using multiple utility thresholds ignore the time frame between the items. Also, time interval HUSP mining algorithms use a single threshold for every item. Furthermore, the research on HUSP mining lacks a distributed algorithm using the MapReduce framework that deals with utility, time interval sequences, and multiple utility thresholds. To the best of our understanding, the current work is the first study to propose a MapReduce framework that can generate high utility sequences including time intervals along with a different threshold for each item.

III. PROBLEM DEFINITION
To illustrate the mining process, we consider a sample quantitative sequence dataset as given in Table 1.
If each item of an itemset I is associated with a purchase quantity, then I is referred to as a q-itemset. An input sequence S is denoted as ⟨(t_{1,1}, I_1), (t_{1,2}, I_2), ..., (t_{1,n}, I_n)⟩, where I_i denotes an itemset and t_{α,β} denotes the time gap between the purchases of the two itemsets I_α and I_β. A quantitative sequence dataset is denoted as D = {S_1, S_2, S_3, ..., S_n}, where every input sequence has a unique identifier called its sequence id. Each item i is allotted a profit value called its external utility, E(i), and each of its occurrences in an input sequence is allotted a quantity called its internal utility, IU(i, I_j, S_n), where i is an item in I_j. For instance, IU(a, I_1, S_1) = 3 and E(a) = 2 according to Table 2.
Definition 1: The utility of an item i in itemset I_j of a given sequence S_a is defined as u(i, I_j, S_a) = IU(i, I_j, S_a) × E(i). The utility of i in S_a is the maximum utility value over the itemsets of S_a in which i occurs, denoted u(i, S_a) = max{u(i, I_j, S_a) | I_j ∈ S_a, i ∈ I_j}. For instance, u(a, I_1, S_1) = 3 × 2 = 6 and u(a, S_1) = 8.
Definition 2: The utility of a pattern P = ⟨(t_{1,1}, I_1), (t_{1,2}, I_2), ..., (t_{1,n}, I_n)⟩ of length n in a sequence S_a with P ⊆ S_a is denoted u(P, S_a) and is derived as the maximum, over all occurrences of P in S_a, of the sum of the utilities of the items of P in that occurrence.
Definition 3: The sequence utility of a sequence S_a is the sum of the item utilities in S_a, i.e. SU(S_a) = Σ_{i ∈ S_a} u(i, S_a). For example, SU(S_1) = 43.
Definition 4: The dataset D is divided into input splits (partitions) T_1, T_2, .... The input splits for the dataset shown in Table 1 are assumed to be T_1 = {S_1, S_2, S_3} and T_2 = {S_4, S_5}.
Definition 5: The local utility of a pattern P in a partition T_i is represented as U_L(P, T_i) = Σ_{S_a ∈ T_i} u(P, S_a).
Definition 6: The utility of a partition T_i is U(T_i) = Σ_{S_a ∈ T_i} SU(S_a). For instance, U(T_1) = SU(S_1) + SU(S_2) + SU(S_3) = 43 + 34 + 40 = 117.
Definition 7: The global utility of a pattern P is represented as U_G(P) and derived as the sum of its local utilities over all partitions, U_G(P) = Σ_i U_L(P, T_i).
Definition 8: The utility of a given dataset D is defined as the summation of the partition utilities, U(D) = Σ_i U(T_i). To illustrate, the utility of the sample dataset is U(D) = U(T_1) + U(T_2) = 117 + 67 = 184.
Definition 9: For a pattern P = ⟨(t_{1,1}, I_1), (t_{1,2}, I_2), ..., (t_{1,n}, I_n)⟩, four time constraints are imposed. Considering the time interval between two consecutive itemsets, C_1 and C_2 denote the minimum and maximum time. Similarly, considering the interval between the first and last itemsets, C_3 and C_4 denote the minimum and maximum time.
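The utility computations of Definitions 1 through 3 can be sketched in Python as follows. The quantities and unit profits below are illustrative toy values, not those of Tables 1 and 2.

```python
# A sequence is a list of (time, itemset) pairs; an itemset maps items to
# their internal utilities (quantities). E holds external utilities (profits).
E = {"a": 2, "b": 3}

def item_utility(item, itemset):           # u(i, I_j, S_a) = IU * E(i)
    return itemset[item] * E[item]

def item_utility_in_sequence(item, seq):   # max over itemsets containing i
    return max(item_utility(item, I) for _, I in seq if item in I)

def sequence_utility(seq):                 # SU(S_a): sum of item utilities
    items = {i for _, I in seq for i in I}
    return sum(item_utility_in_sequence(i, seq) for i in items)

S1 = [(0, {"a": 3}), (2, {"a": 4, "b": 1})]
print(item_utility_in_sequence("a", S1))   # max(3*2, 4*2) = 8
print(sequence_utility(S1))                # 8 + 1*3 = 11
```

Local, partition, and global utilities (Definitions 5 through 8) are then plain sums of these per-sequence values over the sequences of each partition.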
Definition 10: The multiple minimum utility threshold table (MMU table) is used to construct and store a utility threshold mu(i_j) for every item i_j. Table 3 shows the MMU table for the sample dataset. Following [32], mu(i_j) is defined as mu(i_j) = max(c × E(i_j), LMU), where LMU is the user-defined least minimum utility threshold, E(i_j) is the external utility of i_j, and c is a constant that adjusts the mu of each item.
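A short sketch of the MMU table construction. The formula mu(i_j) = max(c × E(i_j), LMU) is assumed here as the reading of Definition 10, and all numeric values are illustrative, not those of Table 3.

```python
# Build an MMU table: one minimum utility threshold per item, bounded below
# by the user-defined least minimum utility (LMU).
def build_mmu_table(external_utility, lmu, c):
    return {item: max(c * e, lmu) for item, e in external_utility.items()}

mmu = build_mmu_table({"a": 2, "b": 5}, lmu=8, c=3)
print(mmu)  # {'a': 8, 'b': 15}
```

Item a falls back to the LMU floor (3 × 2 = 6 < 8), while item b gets a profit-proportional threshold.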

Definition 11: The minimum utility threshold of a pattern P is expressed as MIU(P) = min{mu(i_j) | i_j ∈ P}.
Definition 12: The potential minimum utility threshold of a pattern P is denoted as PMIU(P) = min{mu(i_j) | i_j ∈ P ∪ ext(P)}, where ext(P) represents the possible extensions of P in the database.
Definition 13: A pattern P is a local high utility sequential time interval pattern only if the local utility of P is not less than MIU(P) and P holds C_1, C_2, C_3, and C_4.
Definition 14: A pattern P is a global high utility sequential time interval pattern only if the global utility of P is not less than MIU(P) and P holds C_1, C_2, C_3, and C_4.
Definition 15: Problem Statement: Mining HUTISPs under multiple utility thresholds is to extract, from a given input dataset, all time interval sequential patterns that satisfy their utility thresholds.
Definition 16: The sequence weighted utility (SWU) of a pattern P is defined as the sum of the sequence utilities of the input sequences in which P occurs, i.e. SWU(P) = Σ_{S_a ∈ D, P ⊆ S_a} SU(S_a).
Definition 17: Given a pattern P, its upper bound in a sequence S_a is denoted as UB(P, S_a) = u(P, S_a) + RU(P, S_a), where RU(P, S_a) is the remaining utility of P in S_a, i.e. the summation of the utilities of the items that appear after the last item of P in S_a. When P occurs multiple times in S_a, the maximum value is taken as its upper bound.
Definition 18: For a pattern P, its upper bound in an input partition T_i is UB(P, T_i) = Σ_{j=1}^{x} UB(P, S_j), where x is the count of sequences present in T_i in which P appears. For example, UB(⟨(0, a)(1, b)⟩, T_1) is obtained by summing UB(⟨(0, a)(1, b)⟩, S_a) over the sequences S_a of T_1 that contain the pattern.

Algorithm 1 First Phase
Input: Time interval quantitative sequence dataset; external utility of each item; MMU-Table
Output: Promising items and their global utilities
function Mapper
1: Scan each input sequence and extract the local utility LU of every item together with the sequence utility SU (Definitions 1 and 3).
2: Emit the pair ⟨i, (LU, SU)⟩ for every item i.
end function
function Reducer
3: Let GU ← 0
4: Let SWU ← 0
5: Let i be the item received as key
6: Let V be the list of (LU, SU) values of i
7: for each (LU, SU) in V do
8:   GU ← GU + LU
9:   SWU ← SWU + SU
10: end for
11: if SWU ≥ MIU(i) then
12:   Emit the item i and global utility GU
13: end if
14: end function

Definition 20: The projected database of a sequence S over the database D is expressed as D | S and is defined as the collection of all the postfixes of S in each input sequence of D.
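The two bounds compared later in Section V-D can be contrasted with a small sketch. The utility values here are illustrative precomputed numbers, not derived from Table 1.

```python
def swu(containing_sequences, sequence_utility):
    """SWU(P): sum of SU(S_a) over the sequences that contain P (Def. 16)."""
    return sum(sequence_utility[s] for s in containing_sequences)

def ub_in_sequence(occurrences):
    """UB(P, S_a): max over occurrences of u(P, S_a) + RU(P, S_a) (Def. 17)."""
    return max(u + ru for u, ru in occurrences)

# P occurs in S1 and S2 with SU(S1) = 43 and SU(S2) = 34.
print(swu(["S1", "S2"], {"S1": 43, "S2": 34}))  # 77
# Two occurrences of P in S1 with (utility, remaining utility) pairs.
print(ub_in_sequence([(10, 12), (14, 5)]))      # 22
```

Because UB counts only the pattern's own utility plus what can still be appended after it, it is never larger than the whole-sequence sum that SWU uses, which is why it prunes more candidates.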

IV. MRHUTSP-MMU
In this section, the proposed system is outlined by detailing each MapReduce phase. The major goal of the proposed algorithm is to impose time constraints and consider different utility thresholds while keeping the downward closure property. Using the MapReduce framework, the algorithm is designed in three phases. In the first phase, the proposed algorithm finds the global utility of every item and identifies the promising items. The detailed approach for the first phase is stated in Algorithm 1. In order to generate all the output sequences, we invoke the second phase. In other words, Algorithm 2 generates all the output sequences that are local to each partition. We conduct the third MapReduce phase to generate the patterns that are global. This is detailed in Algorithm 4. The workflow of MRHUTSP-MMU is given in Fig. 2.

A. FIRST PHASE OF MRHUTSP-MMU
The first phase of MRHUTSP-MMU is explained in Algorithm 1. It includes a mapper and a reducer. The mapper takes the time interval quantitative sequence dataset, the external utility associated with every item, and the MMU-Table as input. Initially, the mapper scans the database and extracts each item's local utility and the sequence utility of each input sequence (Line 1) (refer to Definitions 1 and 3). The local and sequence utilities of each item are emitted by the mapper (Line 2). This in turn is the input to the reducer. For each item, the reducer is responsible for calculating the item's global utility and sequence weighted utility. From Definition 7, the global utility of an item is the sum of its local utilities (Line 8). Similarly, from Definition 16, the SWU is the sum of all the sequence utilities (Line 9). Finally, the items whose sequence weighted utility satisfies the MIU threshold are emitted as output (Lines 11-13). These promising items are written to the distributed cache file.
For example, consider the sequence S_1. According to Definition 1, the utilities of a, b, d, and f are 8, 6, 6, and 3, respectively. Similarly, from Definition 3, the sequence utility of S_1 is 43. Hence, the mapper emits the key-value pairs ⟨a, (8, 43)⟩, ⟨b, (6, 43)⟩, ⟨d, (6, 43)⟩, and ⟨f, (3, 43)⟩. In the same way, the output emitted by the mapper for every sequence is given in Table 4. Now, these key-value pairs reach the reducer. In our example, the values (8, 43), (4, 34), (4, 54), and (8, 13) are associated with the key a (refer to Table 4). The values associated with the other keys are shown in Table 5. Each value is a pair of local utility and sequence utility. The sum of the local utilities, i.e., 8 + 4 + 4 + 8 = 24, is the global utility of a. Similarly, the sum of the sequence utilities, i.e., 43 + 34 + 54 + 13 = 144, is the SWU of a. The global utility and sequence weighted utility of each item are given in Table 6. As described in Algorithm 1, if the SWU of an item does not satisfy its MIU, then the item cannot generate high utility patterns. So, the reducer returns the item and its global utility only when the above condition is satisfied. The MIU of each item is varied in the current study (refer to Table 3): it is 45% of the database utility for item a, 28% for item b, and so on. As mentioned in Definition 8, the database utility is 43 + 34 + 40 + 54 + 13 = 184. In the current example, every item satisfies its MIU threshold and is emitted as output.
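The first phase can be sketched as an in-memory map-and-reduce in the spirit of Algorithm 1. The per-item utilities and the MIU thresholds below are toy values, not those of Tables 1-3.

```python
from collections import defaultdict

def phase1(sequences, miu):
    """Mapper emits (item, (LU, SU)); reducer sums both and filters by MIU."""
    emitted = []                                  # mapper output
    for item_utils, su in sequences:              # one input sequence each
        for item, lu in item_utils.items():
            emitted.append((item, (lu, su)))
    grouped = defaultdict(list)                   # shuffle: group by key
    for item, pair in emitted:
        grouped[item].append(pair)
    promising = {}                                # reducer
    for item, pairs in grouped.items():
        gu = sum(lu for lu, _ in pairs)           # global utility (Def. 7)
        swu = sum(su for _, su in pairs)          # SWU (Def. 16)
        if swu >= miu[item]:
            promising[item] = gu
    return promising

seqs = [({"a": 8, "b": 6}, 43), ({"a": 4, "c": 5}, 34)]
print(phase1(seqs, {"a": 50, "b": 40, "c": 40}))  # {'a': 12, 'b': 6}
```

Item c is dropped because its SWU (34) falls below its assumed threshold (40), exactly the Line 11 check of Algorithm 1.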

Algorithm 2 Second Phase
Input: Time interval quantitative sequence dataset; DC_p - distributed cache file with promising items; MMU-Table; constraints C_1, C_2, C_3, and C_4
Output: LHUTISP, LU - output patterns from each input partition and their utilities
function Mapper
1: Let the promising item read from DC_p be i_p.
2: Let C be the candidate pattern set.
3: Let P be the output pattern.
4: Modify the input sequence by removing the unpromising items.
5: for each i_p do
6:   Include P ← ⟨(0, i_p)⟩ into C.
7:   if the utility of P satisfies the MIU threshold then
8:     Emit P and its utility
9:   end if
10:  Invoke MRHUTSP-MMU(D | P, C, MMU, C_1, C_2, C_3, C_4)
11: end for
12: end function

B. SECOND PHASE OF MRHUTSP-MMU
The input for the second phase is the time interval quantitative sequence dataset, the promising items, the MMU-Table, and the time constraints. Note that not every candidate pattern will be an output pattern. Hence, we use two structures: one represents the candidate set C and the other represents the output pattern P (Lines 2-3 of Algorithm 2). Initially, the unpromising items are pruned from each input sequence (Line 4). Then, an initial pattern P is created which includes time 0 and a promising item i_p (Line 6). If the utility of P (which is the global utility of i_p) satisfies its MIU, then P is emitted as output (Lines 7-8). In order to extend the pattern P, we invoke the function MRHUTSP-MMU. This is a recursive function that extends the pattern P by scanning the projected database of P.
For example, the promising items received by the second-phase mapper are a, b, c, d, e, and f. Hence, the initial patterns generated are of the form ⟨(0, promising item)⟩. The global utility of a is 24, which is less than its MIU (i.e., 45% of the database utility). That is, ⟨(0, a)⟩ is not an output pattern. However, a super pattern of a non-high-utility pattern may still be of high utility. Hence, we proceed to generate the super patterns of all the initial patterns by invoking Algorithm 3.
Algorithm 3 first reads the projected database of the candidate pattern and generates all the possible ⟨time, item⟩ pairs. In this process, only the pairs that satisfy the upper bound are considered. Also, the time interval of each pair must obey the constraints C_1 and C_2 (Line 1). Next, the candidate pattern is updated by including such a pair (Line 3). If the resulting candidate pattern satisfies the constraint C_4, then the function is called recursively with the new candidate pattern as its argument (Lines 4-5). Later, if it satisfies the constraint C_3, we include it in the candidate pattern set (Lines 6-8). Finally, if the utility of the candidate pattern satisfies the MIU threshold, then it is emitted as output along with its projected database (Lines 10-12).

Algorithm 3 MRHUTSP-MMU
function MRHUTSP-MMU(D | P, C, MMU, C_1, C_2, C_3, C_4)
1: Read the projected database D | P and generate all the ⟨time, item⟩ pairs such that each pair satisfies the constraints C_1 and C_2.
2: for each ⟨time, item⟩ pair do
3:   Update P ← ⟨P, (time, item)⟩.
4:   if P satisfies the constraint C_4 and UB(P) ≥ PMIU(P) then
5:     Invoke MRHUTSP-MMU(D | P, C, MMU, C_1, C_2, C_3, C_4)
6:     if the pattern P holds the constraint C_3 then
7:       Include P in list C
8:     end if
9:   end if
10:  if U_L(P) ≥ MIU(P) then
11:    Emit P and D | P
12:  end if
13: end for
14: end function
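The recursion of Algorithm 3 can be sketched as follows, simplified to the time-constraint logic only: projected databases and the UB/PMIU pruning are omitted, and the extension table `ext` is a toy stand-in for the pairs generated from D | P.

```python
def grow(pattern, ext, c1, c2, c3, c4, out):
    """Extend `pattern` (a list of (time_gap, item) pairs) depth-first."""
    for t, item in ext.get(pattern[-1][1], []):
        if not (c1 <= t <= c2):                  # C1/C2: consecutive gap
            continue
        cand = pattern + [(t, item)]
        span = sum(gap for gap, _ in cand[1:])   # time from first to last
        if span > c4:                            # C4: maximum overall span
            continue
        grow(cand, ext, c1, c2, c3, c4, out)     # recursive extension
        if span >= c3:                           # C3: minimum overall span
            out.append(cand)
    return out

ext = {"a": [(1, "b"), (5, "c")], "b": [(2, "c")]}
patterns = grow([(0, "a")], ext, 0, 2, 1, 3, [])
print(patterns)  # [[(0, 'a'), (1, 'b'), (2, 'c')], [(0, 'a'), (1, 'b')]]
```

The pair (5, "c") is rejected at the C_2 check before any recursion, illustrating how the constraints shrink the search space up front rather than filtering finished patterns.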

Algorithm 4 Third Phase
Input: ⟨P, D | P⟩; MMU-Table
Output: ⟨P, U_G(P)⟩
function Mapper
1: for each pattern P_i in P do
2:   Find the local utility of pattern P_i
3:   Emit P_i and U_L(P_i)
4: end for
end function
function Reducer
5: By adding the local utilities of P_i, calculate the global utility U_G(P_i).
6: if U_G(P_i) ≥ MIU(P_i) then
7:   Emit P_i and U_G(P_i)
8: end if
end function

C. THIRD PHASE OF MRHUTSP-MMU
In the third phase, the output from the second phase, i.e., the local output patterns and their projected databases, is received. The mapper of the third phase scans the projected database of each local pattern and finds its local utility (Line 2). The reducer finds the sum of the local utilities, which results in the global utility (Line 5). Finally, only the patterns that satisfy the MIU threshold are emitted as output (Lines 6-8).
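The third phase reduces to summing local utilities per pattern and filtering by MIU. The following minimal sketch uses the pattern utilities of the running example, while the MIU values are toy assumptions.

```python
from collections import defaultdict

def phase3(local_results, miu):
    """Sum local utilities per pattern (mapper output), filter by MIU."""
    totals = defaultdict(int)
    for pattern, lu in local_results:        # (pattern, U_L) pairs
        totals[pattern] += lu
    return {p: gu for p, gu in totals.items() if gu >= miu[p]}

local_results = [("(0,b),(1,d)", 54), ("(0,b),(1,d)", 0), ("(0,a)", 24)]
print(phase3(local_results, {"(0,b),(1,d)": 50, "(0,a)": 80}))
# {'(0,b),(1,d)': 54}
```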
In our running example, for each local pattern, its utility in both partitions is calculated. For instance, the utility of ⟨(0, b), (1, d)⟩ is 54 from partition 1 and 0 from partition 2. Hence, the global utility of ⟨(0, b), (1, d)⟩ is 54, which satisfies its MIU, i.e., U_G(⟨(0, b), (1, d)⟩) = 54 ≥ MIU(⟨(0, b), (1, d)⟩).
Theorem 1: Given a time interval quantitative sequence dataset and an MMU table, MRHUTSP-MMU will generate all possible high utility time interval sequential patterns.
Proof: We prove the theorem by showing that MRHUTSP-MMU does not lose any pattern in any phase.
1) Pruning the unpromising items in the first MapReduce phase: According to Algorithm 1, the mapper generates all the items together with their local utility and sequence utility, so no item present in the dataset is missed. The reducer aggregates the local utilities into the global utility of each item and the sequence utilities into its sequence weighted utility, and outputs the items whose SWU satisfies their MIU. Thus, no promising item is missed.
2) Generating the candidate patterns of length two and above: In the second phase, each mapper generates all local high utility sequential patterns that obey the time constraints. During this process, each candidate pattern is extended to its super patterns following Property 1; hence, a pattern is not extended if its upper bound does not satisfy the PMIU threshold. From Definition 13, a candidate pattern can be pruned whenever its local utility is less than its MIU. This pruning in the second phase does not lose any local candidate pattern.
3) Pruning the patterns whose global utility does not satisfy their MIU: This is done in the third MapReduce phase. According to Definition 14, if the global utility of a pattern is less than its MIU, it is not a high utility sequential pattern. Therefore, the third phase does not lose any output pattern.
Hence, no phase of MRHUTSP-MMU loses a pattern whose utility satisfies the threshold.

V. RESULTS
We conducted several experiments on three real datasets and two synthetic datasets to assess the performance of MRHUTSP-MMU. Kosarak, BMSWebview2, and MSNBC are the three real datasets. Kosarak, obtained from the FIMI repository, consists of 990,002 click-stream sequences. The characteristics of the datasets are summarized in Table 8. The parameters passed to the synthetic data generator are mentioned in Table 9. However, the above-mentioned datasets do not include any internal/external utility information. So, we used a random number generator to assign internal/external utilities from 1 to 10. The time information is included in the dataset based on the occurrence order of the itemsets in a transaction. We employed a Hadoop cluster with one master node and eight data nodes. Each node has a 2.5 GHz Intel Xeon CPU with 16 GB RAM and Hadoop 2.9.1 installed. Java is used to implement all of the algorithms.

A. RUN TIME COMPARISON WITH RESPECT TO CONSTRAINTS AND LEAST MINIMUM UTILITY
The constraints used in the performance assessment are C_1 = 0, C_2 = 2, C_3 = 0, and C_4 = 4 on the real datasets and C_1 = 0, C_2 = 5, C_3 = 0, and C_4 = 15 on the synthetic datasets. MRHUTSP-MMU is compared with UIPrefixSpan-MMU in terms of run time, and the findings are shown in Fig. 3. Both algorithms are executed with and without constraints. It is found that the algorithms perform better with time constraints, because the time constraints cause a smaller number of candidates to be generated, which lowers the search space and increases the performance. Furthermore, for lower values of LMU, the run time increases. This is because of the huge number of candidates generated for lower LMU values; additionally, more effort is spent during candidate evaluation. It is also observed that MRHUTSP-MMU is more efficient than UIPrefixSpan-MMU on all five datasets. UIPrefixSpan-MMU requires more time because it needs to read the database three times, and the time for scanning the database is directly proportional to the database size, whereas MRHUTSP-MMU scans the database only twice. Also, the parallel execution of multiple map and reduce functions leads to reduced processing time. Hence, the distributed MRHUTSP-MMU is more efficient than the original UIPrefixSpan-MMU.

B. MEMORY CONSUMPTION OF THE ALGORITHMS
The memory consumed by MRHUTSP-MMU is lower than that of the other three approaches. The test reports of both algorithms with and without constraints are presented in Fig. 4. It is observed that both of them consume less memory when constraints are applied, owing to the fewer candidate sequences generated under the constraints. It is also noticed that the memory requirement tends to decrease with an increase in the threshold; this is mainly due to the generation of more patterns for lower threshold values. With constraints, MRHUTSP-MMU consumes nearly 1.5, 2.8, and 4.2 times less memory than MRHUTSP-MMU without constraints, UIPrefixSpan-MMU with constraints, and UIPrefixSpan-MMU without constraints, respectively, on the Kosarak dataset. The corresponding factors are 1.2, 2.3, and 2.7 on BMSWebView2; 1.6, 2.2, and 2.7 on MSNBC; 1.7, 3.1, and 3.6 on Synthetic Dataset1; and 2.3, 3.3, and 3.9 on Synthetic Dataset2.

C. SCALABILITY TEST
To test the scalability of the MRHUTSP-MMU algorithm, we carried out two experiments: scalability with respect to dataset size and scalability with respect to the node count in the cluster. The results of the former experiment are given in Fig. 5. For the Kosarak dataset, the number of sequences ranged from 100,000 to the entire dataset (990,002 sequences), increasing by 200,000 each time. For the BMSWebView2 dataset, the sequence count ranged from 10,000 to the full dataset size (77,512), increasing by 20,000 each time. For the MSNBC dataset, the sequence count ranged from 200,000 to the full dataset size (989,818), increasing by 200,000 each time. The size of Synthetic Dataset1 is varied from 200,000 to 1,000,000, in steps of 200,000. The number of sequences considered for Synthetic Dataset2 ranges from 2,000,000 to 10,000,000, in steps of 2,000,000. The least minimum utility used in the above experiments is 0.6, 1.7, and 0.6 for the Kosarak, BMSWebview2, and MSNBC datasets, and 0.5 for Synthetic Dataset1 and Synthetic Dataset2. It is visible that the performance gradually decreases with the rise in dataset size. It is also noticed that MRHUTSP-MMU scales better on Synthetic Dataset1 and Synthetic Dataset2 than on the real datasets. This shows that MRHUTSP-MMU is more scalable than UIPrefixSpan-MMU, especially on large datasets. The reasons are twofold. First, MRHUTSP-MMU scans the dataset only twice, whereas UIPrefixSpan-MMU scans the dataset three times. Second, the distributed nature of the proposed algorithm parallelizes the processing of sequences, thereby reducing the execution time. Figure 6 demonstrates the scalability of MRHUTSP-MMU with respect to the node count. For this experiment, the run time of MRHUTSP-MMU is noted with and without constraints.
The algorithms are executed using 2, 4, 6, and 8 nodes. The centralized approaches are excluded, as they exhibited much longer running times, especially on the synthetic datasets with 2 nodes. It is noticed that the reduction in run time is greater when scaling from 2 to 4 nodes than from 4 to 6 or from 6 to 8 nodes.
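The diminishing-returns pattern described above can be quantified as a per-step speedup ratio. The following is a minimal sketch with entirely hypothetical run times (the paper reports the actual measurements in Fig. 6); it only illustrates how such ratios would be computed.

```python
# Hedged sketch: per-step speedups from hypothetical run times measured on
# 2, 4, 6, and 8 nodes. The times below are made up purely to illustrate
# the diminishing-returns pattern described above.

run_times = {2: 800.0, 4: 440.0, 6: 340.0, 8: 300.0}  # seconds (hypothetical)

nodes = sorted(run_times)
for prev, curr in zip(nodes, nodes[1:]):
    speedup = run_times[prev] / run_times[curr]
    print(f"{prev} -> {curr} nodes: {speedup:.2f}x speedup")
```

With these illustrative numbers, the 2-to-4-node step yields the largest speedup, while later steps add progressively less, mirroring the behavior observed in the experiment.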

D. UPPER BOUND EVALUATION
Two upper bounds, UB and SWU, are evaluated, and the results are shown in Fig. 7. UB is tighter than SWU, which is evident from Definition 16 and Definition 18. A tighter upper bound always yields fewer promising sequences, thereby reducing the number of candidates to evaluate. This affects the algorithm's run time: MRHUTSP-MMU executes faster with UB as the upper bound than with SWU. Especially at lower thresholds, the UB approach outperforms the SWU approach.
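The effect of bound tightness on pruning can be seen in a toy example. The sketch below is not the paper's implementation (the real bounds are given by Definitions 16 and 18); the candidate patterns and bound values are invented to show that a bound closer to the true utility discards more unpromising candidates at the same threshold.

```python
# Hedged sketch: why a tighter upper bound prunes more candidates.
# All values are illustrative, not taken from the paper's datasets.

def prune(candidates, bound, min_util):
    """Keep only candidates whose upper bound meets the utility threshold."""
    return [c for c in candidates if bound[c] >= min_util]

candidates = ["ab", "ac", "bc", "abc"]

# Both bounds overestimate the (unknown) true utility, but the tighter
# bound overestimates less, so it falls below the threshold more often.
loose_bound = {"ab": 90, "ac": 95, "bc": 80, "abc": 75}  # SWU-like (looser)
tight_bound = {"ab": 60, "ac": 70, "bc": 35, "abc": 25}  # UB-like (tighter)

min_util = 50
kept_loose = prune(candidates, loose_bound, min_util)
kept_tight = prune(candidates, tight_bound, min_util)

print(len(kept_loose), len(kept_tight))  # the tighter bound keeps fewer
```

Fewer surviving candidates mean fewer expensive exact-utility evaluations, which is why the tighter bound translates directly into shorter run times, most visibly at low thresholds where the candidate space is largest.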

VI. CONCLUSION
Conventional algorithms for finding high utility sequential patterns exclude the time factor and assume that all items in a transaction have the same utility. The prime contribution of this paper is to study the problem of multiple utilities in time interval sequence mining and to propose a distributed solution that handles big data. To this end, we contributed MRHUTSP-MMU, which finds sequential patterns with a separate utility for each item and generates the time interval between consecutive itemsets of a pattern. Experiments were carried out to evaluate the efficiency of the algorithm with respect to the time constraints. MRHUTSP-MMU also outperforms the non-distributed algorithms in terms of run time, memory, and scalability. As an extension of the current work, MRHUTSP-MMU can be further investigated to introduce more efficient pruning strategies.
She is also a reviewer of reputed international journals. She has an overall teaching experience of 22 years. Her research interests include graph mining, recommender systems, natural language processing, and security analytics.
MOHD WAZIH AHMAD is currently working as an Assistant Professor with the Department of Computer Science and Engineering, Adama Science and Technology University, Adama, Ethiopia. He has supervised more than 20 postgraduate theses and leads the Intelligent Systems SIG on the ASTU campus. His current research interests include machine learning, the Internet of Things, information retrieval, and soft computing applications in agriculture and health.
VOLUME 10, 2022