Finding Partial Periodic and Rare Periodic Patterns in Temporal Databases

Most periodic pattern mining algorithms extract fully periodic patterns by strictly monitoring the cyclic behaviour of patterns in transactional as well as temporal databases. Partial periodic pattern mining, which controls how strictly cyclic repetitions of patterns are enforced, is the most recent and preferred method for discarding non-periodic, uninteresting patterns. Recently, a variety of domains, including fraud detection, telecommunications, retail marketing, research, and medicine, have found applications for rare association rule mining, which uncovers unusual or unexpected combinations. A limited body of literature has demonstrated that periodicity is essential in mining low-support rare patterns. However, the time of occurrence, a vital factor that further aids significant information retrieval, has been ignored. With this motivation, a novel depth-first search framework named 3P-BitVectorMiner is proposed to extract the complete set of partial periodic patterns from a temporal database. Experiments are carried out on a variety of datasets by varying the support and periodicity thresholds. 3P-BitVectorMiner is found to consistently outperform the state-of-the-art algorithm 3P-Growth. Further, a scalability study demonstrates the efficiency of 3P-BitVectorMiner over 3P-Growth on large temporal databases. In addition, two variations named RFPP-BitVectorMiner and R3P-BitVectorMiner are proposed to mine rare fully periodic patterns and rare partial periodic patterns, respectively, from temporal databases. Experiments show that these frameworks successfully capture periodic rare patterns in temporal databases.


The associate editor coordinating the review of this manuscript and approving it for publication was Giacomo Fiumara.

I. INTRODUCTION
Pattern mining, a key component of data mining, aims to extract valuable information from large volumes of data. The most researched area of pattern mining is Frequent Pattern Mining (FPM), which extracts frequently recurring patterns from a dataset. Rare Pattern Mining (RPM), on the other hand, extracts hidden, unusual, yet valuable information from a set of transactions. However, a significant drawback of these techniques is that they do not take the occurrence behaviour of patterns into account. Additionally, the time at which each transaction occurred is completely ignored. Periodic Frequent Pattern Mining (PFPM) has emerged as a promising area that studies the occurrence characteristics of itemsets. An inter-arrival time of an itemset is said to be periodic (or cyclic) if it does not exceed the user-specified periodicity threshold. Tanbeer et al. [1] were the first to demonstrate the significance of taking regularity into account in a static database. PFPM is now employed in a variety of applications, including the analysis of gene and medical data [2], [3], mobility intention analysis [4], and website user behaviour analysis [5]. Early PFPM works closely check periodic behaviour using user-specified periodicity measures such as maximum periodicity [1], [6], [7], variance [8], [9], multiple periodicity measures [10], and lability [11], [12], among others. The patterns extracted by these methods are called full periodic patterns: a pattern is selected only when all of its periodicities satisfy the threshold. However, in real-life applications, mining partial periodic patterns [13], [14] is also vital. While some events occur regularly, others occur only on weekends, at a specific time of day, or on a particular day of the month.
For example, heavier traffic is observed on weekends than on weekdays. In supermarkets, customers may purchase milk and butter regularly, whereas rice is purchased only once a month. Partial periodic patterns relax this strictness by requiring only a minimum number of cyclic repetitions. In addition, maintaining temporal information gives still more insight during knowledge discovery. For example, on weekends, traffic congestion is observed more from 10 a.m. to 2 p.m. and from 6 p.m. to 9 p.m. than at other times. Therefore, it is important to preserve the occurrence time of transactions in temporal databases. Temporal databases differ from transactional databases in the following respects:
• The transactions are sorted in ascending order of their arrival time.
• The inter-arrival times between transactions are not uniform.
• Multiple transactions may arrive at the same time.
It should also be noted that converting temporal information into transactional information by merging transactions with common time-stamps should be avoided, because it leads to the following issues:
• The actual support of a pattern is lost when transactions with common time-stamps are merged, and in some cases required partial periodic patterns may be missed. For example, consider the sample temporal database presented in Table 1. It comprises 8 transactions with 5 distinct items. Here, transactions T3 and T4 share a common time-stamp. Merging these transactions results in the itemset {m,n,o,p,q}, which causes a loss of the actual support of items {n}, {o} and {p}. As a result, these items may not be selected as partial periodic patterns.
• Conversely, merging transactions may also create false associations, leading to the generation of uninteresting partial periodic patterns. For example, merging transactions T3 and T4 creates an invalid association between items {m} and {q}.
A few studies have focused on extracting partial periodic-frequent patterns from columnar databases [15], [16], and partial periodic patterns [17], [18] as well as partial periodic patterns involving both frequent and rare items [19] from row-oriented temporal databases.
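Both issues can be reproduced on a tiny example. The following Python sketch uses a hypothetical five-transaction database (not the database of Table 1) in which two transactions share time-stamp 3; the helper names `support` and `merge_common_timestamps` are illustrative only.

```python
from collections import defaultdict

# Hypothetical temporal database: (tid, ts, items). T3 and T4 share time-stamp 3.
TD = [
    (1, 1, {"a", "b"}),
    (2, 2, {"b", "c"}),
    (3, 3, {"a", "b"}),
    (4, 3, {"b", "d"}),
    (5, 5, {"b", "c"}),
]


def support(db, pattern):
    """Number of transactions whose item set contains every item of `pattern`."""
    return sum(1 for _, _, items in db if pattern <= items)


def merge_common_timestamps(db):
    """Collapse transactions sharing a time-stamp into one (the conversion to avoid)."""
    merged = defaultdict(set)
    for _, ts, items in db:
        merged[ts] |= items
    return [(new_tid, ts, items)
            for new_tid, (ts, items) in enumerate(sorted(merged.items()), start=1)]


merged = merge_common_timestamps(TD)
print(support(TD, {"b"}), support(merged, {"b"}))            # 5 4: support of {b} is lost
print(support(TD, {"a", "d"}), support(merged, {"a", "d"}))  # 0 1: false association {a, d}
```

Merging loses one occurrence of {b} and fabricates a co-occurrence of a and d that no original transaction contains.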
Periodic Rare Pattern Mining is emerging as a promising area for discovering hidden, unexpected, or unusual activities together with their occurrence behaviour. Its primary focus is discovering uncommon or unexpected combinations that are missed by PFPM algorithms. If rare patterns are evenly spread throughout the transaction dataset, they are periodic and significant. Very few algorithms have been designed to mine these patterns [20], [21]. However, studying time-stamp information further enhances the knowledge discovered. For example, traffic congestion may be heavier at particular times on festival days. Because festivals occur rarely but are spread throughout the year, it is important to capture this information. Unlike periodic frequent patterns, these patterns have low support and larger periodicities, and their occurrences are not captured by most PFPM algorithms. With this motivation, this paper makes the following major contributions:
• A novel algorithm named 3P-BitVectorMiner is proposed to capture all required partial periodic patterns from a temporal dataset. An inter-arrival time of a pattern is treated as periodic (or cyclic) when it satisfies a user-given periodicity threshold called maxPer. The number of such cyclic repetitions is counted, and partial periodic patterns are selected based on a user-specified periodic support threshold called minPS.
• The temporal database is converted into bit-vector form. A 3PTSList structure is created, which plays an important role in producing the one-length partial periodic patterns. It then eliminates the non-periodic one-length patterns, which considerably reduces the search space. In contrast to the pattern-growth method used in the 3P-Growth algorithm, longer partial periodic patterns are generated here with simple logical (bitwise AND) operations.
• Because real-life applications often require capturing periodic rare patterns from temporal databases, 3P-BitVectorMiner is modified and two variations are proposed. First, RFPP-BitVectorMiner mines rare fully periodic patterns. Second, R3P-BitVectorMiner extracts rare partial periodic patterns. To our knowledge, these are the first algorithms that successfully capture periodic uncommon patterns in temporal databases. When the periodic support threshold minPS is set low enough to extract periodic rare patterns, numerous periodic patterns, both frequent and rare, are produced, along with many spurious patterns. If minPS is set high, many rare periodic patterns cannot be extracted. To overcome this issue, two periodic support thresholds, minFreqPS and minRarePS, are used along with maxPer to control the required number of cyclic repetitions. Here, minRarePS assists in discarding uncommon patterns that are associated by chance and are considered noisy itemsets.
• Many real-life as well as synthetic datasets, both sparse and dense, are used for experimentation. Results show that 3P-BitVectorMiner is faster, highly scalable, and uses less memory than 3P-Growth. Additionally, several analyses using different periodicity and support thresholds are shown for RFPP-BitVectorMiner and R3P-BitVectorMiner.
The remainder of the paper is organized as follows: Section II discusses related work in the field of periodic pattern mining. Section III defines the basic concepts of the proposed model. The modules of 3P-BitVectorMiner are illustrated in Section IV. The two variations, RFPP-BitVectorMiner and R3P-BitVectorMiner, are presented in Section V. Experimental evaluation and result analysis are presented in Section VI. Section VII discusses the time complexity of the proposed algorithms. Section VIII concludes the paper and highlights future directions.
VOLUME 11, 2023

II. LITERATURE WORK
A. PERIODIC FREQUENT PATTERN MINING (PFPM)
A generalization of FPM, Periodic Frequent Pattern Mining (PFPM), addresses periodicity (occurrence behaviour or regularity). The literature is reviewed according to whether the time of occurrence is considered along with the periodicity threshold.

1) RELATED WORK IN STATIC/STREAM DATASET
FP-Tree is a prominent data structure designed by Han et al. [22] for enumerating frequent patterns using the support count measure. The importance of considering the periodic behaviour of patterns was first shown by Tanbeer et al. [1]. There, regularity is controlled by the maxPer threshold. To support the regularity computation, a Regular-Pattern tree is constructed in which transaction ids are stored only in the leaf nodes. The work was further extended to find regular patterns in stream data [6] and body sensor networks [7]. These algorithms discard an itemset even when a single periodicity fails to satisfy the maxPer threshold. To overcome this drawback, several other models have been designed with different periodicity measures. Rashid et al. [8] extracted regularly frequent patterns from static data by using support count and variance measures to assess frequency and regularity, respectively. This work was applied to wireless sensor networks to find regular frequent sensor patterns [9]. Kiran et al. [23], [24], [25] addressed the "rare item problem" with multiple support and periodicity thresholds to extract regular patterns involving both frequent and rare items from a static database. Fournier-Viger et al. [11] proposed a novel measure named lability to mine stable periodic patterns. A flexible method named "Periodic Frequent Pattern Miner" was designed by Fournier-Viger et al. [10]; it combines minimum, maximum, and average periodicity thresholds to discover all periodic frequent patterns in a given set of transactions. To avoid the tedious task of setting frequency thresholds, several models discover the top-k regular frequent patterns instead. Amphawan et al. [26] developed a single-scan algorithm to discover top-k frequent regular patterns based on partition and estimation techniques.
To deal with stream data, this work was extended into TFRIM-DS, a single-pass algorithm that uses a sliding window to mine the top-k regular itemsets with the highest support in the stream. TSPIN, a model designed by Fournier-Viger et al. [12], constructs a stable periodic-frequent tree and employs a pattern-growth approach to mine the top-k stable periodic frequent patterns. The regularity concept has also been introduced in mining high-utility itemsets. MHUIRA, an efficient single-scan algorithm contributed by Amphawan et al. [27], finds regularly co-occurring items with high utility values. In addition, HUIIs-Miner [28] extracts infrequently purchased itemsets with high profits. PHM, contributed by Fournier-Viger et al. [29], uses the minimum and average periodicity measures to extract high-utility periodic patterns from static datasets.
These methods are found to be too strict, as patterns are pruned even if a single periodicity does not satisfy the periodicity threshold. To overcome this strictness, partial periodic pattern mining algorithms have been developed. These algorithms retain itemsets even when some of their periodicities do not satisfy the periodicity threshold; the number of cyclic repetitions is controlled by a minimum periodic support threshold. Kiran et al. [13] designed the GPF-growth algorithm to extract the complete collection of partial periodic frequent patterns from a dataset, using a novel periodic-ratio measure that considers the proportion of cyclic repetitions of frequent itemsets in the database. Venkatesh et al. [14] addressed the rare-item problem by introducing new measures named all-confidence and periodic-all-confidence. A pattern-growth method, "Extended Periodic-Frequent pattern-growth", is proposed to enumerate periodic-frequent patterns containing both frequent and rare items.

2) RELATED WORK IN TEMPORAL DATASET
Temporal transactions are characterized by non-uniform arrival times and by multiple transactions occurring at a common time-stamp. These features were first handled by the 3P-Growth algorithm designed by Kiran et al. [17], [18], which introduces the 3P-list and 3P-tree data structures to store time-stamp information instead of tid information. The 3P-Growth mining method enumerates all partial periodic patterns by considering inter-arrival time information. To extract patterns comprising both rare and frequent items in non-uniform temporal databases, Kiran et al. [19] proposed a novel measure named relative periodic support. In the initial phase, the temporal dataset is compressed into a G3P-tree, from which the G3P-growth algorithm recursively extracts the complete set of partial periodic patterns. To extract periodic patterns that have a minimum number of cyclic repetitions while exhibiting non-uniform periodic behaviour, Kiran et al. [30] introduced a relative periodic-support measure; the periodic pattern growth method mines the periodic pattern tree to capture periodic patterns in the non-uniform temporal database. Venkatesh et al. [31] extended the rare-item problem solution used for transactional data and proposed the "Extended Periodic-Correlated pattern-growth" method, which mines frequent patterns that are periodically correlated. A few algorithms have been developed for columnar databases. Given a columnar temporal database, "Frequent-Equivalence CLass Transformation", presented by Ravikumar et al. [16], is a runtime- and memory-efficient method for enumerating periodic-frequent patterns. Further, Ravikumar et al. [15] designed 3P-ECLAT, a depth-first search framework in which one-length partial periodic patterns are first generated by storing time-stamps in a list named TS-list.
Next, intersection operations are performed on the TS-lists to generate all partial periodic patterns in the temporal database. To reduce memory, runtime, and energy requirements, Likhitha et al. [32] developed max3P-Growth to extract maximal partial periodic patterns from temporal datasets. In addition, to avoid the tedious task of setting the minSup threshold, Likhitha et al. [33] designed the "Top-k Periodic-Frequent Pattern Miner", which accepts a threshold value k and reports the k periodic-frequent patterns with the lowest periodicity in a temporal dataset. Further, Likhitha et al. [33] contributed the SPP-ECLAT method to extract stable periodic-frequent patterns from a temporal dataset represented in vertical format.

B. RARE PATTERN MINING
1) RELATED WORK WITHOUT CONSIDERING PERIODICITY MEASURE
To mine rare itemsets and non-present itemsets, Adda et al. [34] developed the ARANIM algorithm. It follows a top-down approach, starting from a k-itemset consisting of all items in the database. Subsets of this k-itemset are then repeatedly produced, and at every level many non-existent patterns are generated and pruned. The Rarity algorithm, a level-wise top-down strategy, was created by Troiano et al. [35], [36]. It first finds the longest itemset and then its level-wise subsets. The advantage of this approach is that repeated database scanning is avoided; however, more memory is required to maintain the various list structures. To extract the minimal rare itemsets (MRIs), which lie in the negative border of the frequent itemset zone, Szathmary et al. [37] designed Apriori-Rare and MRG-Exp. Further, the ARIMA algorithm [38] mines all rare itemsets from the already discovered MRIs; the database is scanned multiple times to compute support, which degrades performance. Bhasker and Yahia [39], [40] utilized the bond threshold along with the support measure to handle the spurious rare itemsets extracted when mining with a low support threshold. Their CORI algorithm first transforms the input database into a vertical bit-wise format, so that simple logical operations compute both the disjunctive and conjunctive support of itemsets. All rare correlated patterns are then mined in a bottom-up fashion with the help of this metric. To handle both frequent and rare patterns, Borah and Nath [41] constructed SSP (Single Scan Pattern Tree). Transactions are inserted into a compact tree in non-ascending order of support count; if any path of the tree deviates from this descending support-count order during construction, the path is rearranged. To store static data in main memory, Rai et al. [42] proposed a compact tree structure called the Binary count tree (BIN-Tree), together with an efficient mining technique to extract both rare and frequent itemsets. Lu et al. [43] contributed NII-Miner, the first tree-based method to discover all rare itemsets using a top-down depth-first strategy. This method considers a dual view of the original database by representing negative items.
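In a vertical bit-wise format, conjunctive and disjunctive supports reduce to bitwise AND and OR over item bit-vectors. The following minimal Python sketch assumes the usual definition of bond as conjunctive support divided by disjunctive support; the bit-vectors and function names are hypothetical and are not taken from the CORI implementation.

```python
from functools import reduce
from operator import and_, or_


def popcount(bits):
    """Number of 1-bits (transactions) in a bit-vector."""
    return bin(bits).count("1")


def conjunctive_support(vectors):
    """Transactions containing ALL items: bitwise AND of the item bit-vectors."""
    return popcount(reduce(and_, vectors))


def disjunctive_support(vectors):
    """Transactions containing AT LEAST ONE item: bitwise OR of the item bit-vectors."""
    return popcount(reduce(or_, vectors))


def bond(vectors):
    """Bond = conjunctive support / disjunctive support (0.0 if no item occurs)."""
    disj = disjunctive_support(vectors)
    return conjunctive_support(vectors) / disj if disj else 0.0


# Hypothetical vertical database over 6 transactions (bit k = transaction k + 1).
x, y, z = 0b101101, 0b100101, 0b010010
print(bond([x, y]))  # 0.75: 3 common transactions out of 4 covering transactions
print(bond([x, z]))  # 0.0: x and z never co-occur
```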
To minimize the search space, some existing RPM algorithms mine subsets of rare patterns in a bottom-up fashion. These algorithms find rare 1-itemsets, along with their supersets, from the transactions that contain at least one rare item. Rare Pattern Tree (RP-Tree), a compact tree structure similar to FP-Tree [22], was contributed by Tsang et al. [44]. A pattern-growth approach is used to extract only those rare itemsets whose support lies between the minRareSup and minFreqSup thresholds. Similarly, to mine subsets of rare patterns, Borah and Nath [45] proposed Hyper-Linked Rare Pattern Mining, which uses a memory-based queue data structure with hyper-links. The algorithms proposed in [46] and [47] discover minimal rare itemsets using a bottom-up approach that traverses several frequent itemsets to reach the minimal rare itemsets in the lattice.

2) RELATED WORK CONSIDERING PERIODICITY THRESHOLD
Periodicity plays a vital role in discovering significant rare patterns in a wide variety of applications. MRCPPS, contributed by Fournier-Viger et al. [20], extracts periodic rare correlated itemsets from multiple sequences. Along with the support threshold, the standard deviation of periods is used as the periodicity measure, and the bond measure filters the many spurious patterns generated while extracting useful periodic rare correlated itemsets. PRCPMiner, our earlier contribution [21], discovers periodic rare correlated patterns by enhancing the CORI algorithm with three threshold measures: support, periodicity, and bond.
As the literature shows, very few RPM algorithms have concentrated on the occurrence behaviour of rare patterns, and most existing RPM works do not consider temporal information at all. To examine the periodic behaviour of partial patterns in temporal databases, PRCPMiner is modified and a novel algorithm called 3P-BitVectorMiner is proposed. Further, two variations named RFPP-BitVectorMiner and R3P-BitVectorMiner are presented to study the occurrence behaviour of rare patterns in temporal datasets.

III. PERIODIC PATTERN MODEL
A complete collection of k unique data items is represented as $D = \{d_1, d_2, \ldots, d_k\}$. A set of data items $P \subseteq D$ is called a pattern. A pattern comprising c unique items, where $c \ge 1$, is called a c-pattern. A temporal dataset TD over D is an ordered group of transactions, i.e., $TD = \{t_1, t_2, \ldots, t_x\}$, where $x \ge 1$ represents the total number of transactions; the database size is represented as $|TD|$. A temporal transaction comprises three fields, $t = (tid, ts, P)$, where tid is a unique transaction identifier, ts is the time-stamp, and P denotes a pattern. Let $ts_{min}$ and $ts_{max}$ represent the lowest and highest time-stamp values in TD, respectively. As can be observed from Table 1, two transactions can occur at the same time-stamp, and there can be a delay between two consecutive time-stamps. As a result, $(ts_{max} - ts_{min} + 1)$ may not equal $|TD|$. This shows that a temporal dataset can represent a transactional dataset, but not vice versa. The time-stamp of a pattern $Q \subseteq P$ is expressed as $ts^Q$ if Q appears in a transaction $t = (tid, ts, P)$.
Let $TS^Q = \{ts^Q_i, ts^Q_j, \ldots, ts^Q_n\}$, where $i \le j \le n$, denote the ordered time-stamps at which Q appears in TD. The support of Q is the number of transactions in TD in which Q appears, denoted $Sup(Q) = |TS^Q|$.
Example 1: Table 1
Definition 3 (Partial periodic pattern Q): Given the user-specified minimum periodic support threshold minPS, an itemset Q is considered partial periodic if $PS(Q) \ge minPS$.
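The definitions above can be made concrete in a few lines of Python. Definition 2 (periodic support) is not reproduced in this section; the sketch assumes PS(Q) is the number of consecutive time-stamp pairs in $TS^Q$ whose difference does not exceed maxPer, which is how Procedure CalculatePeriodSupport in Algorithm 2 computes it. The database and function names are hypothetical.

```python
# Hypothetical temporal database: (tid, ts, items), sorted by time-stamp.
TD = [
    (1, 1, {"a", "b"}),
    (2, 2, {"a"}),
    (3, 2, {"a", "c"}),   # shares time-stamp 2 with tid 2
    (4, 6, {"a", "b"}),
    (5, 7, {"b"}),
    (6, 8, {"a", "b"}),
]


def ts_list(td, q):
    """TS^Q: ordered time-stamps of the transactions containing every item of q."""
    return [ts for _, ts, items in td if q <= items]


def support(td, q):
    """Sup(Q) = |TS^Q| (one entry per transaction, so shared time-stamps count twice)."""
    return len(ts_list(td, q))


def periodic_support(ts, max_per):
    """PS(Q): consecutive time-stamp pairs whose gap is at most maxPer (assumed Definition 2)."""
    return sum(1 for a, b in zip(ts, ts[1:]) if b - a <= max_per)


def is_partial_periodic(td, q, max_per, min_ps):
    """Definition 3: Q is partial periodic if PS(Q) >= minPS."""
    return periodic_support(ts_list(td, q), max_per) >= min_ps


print(ts_list(TD, {"a"}), support(TD, {"a"}))     # [1, 2, 2, 6, 8] 5
print(periodic_support(ts_list(TD, {"a"}), 2))    # gaps 1, 0, 4, 2 -> 3
print(is_partial_periodic(TD, {"a", "b"}, 2, 2))  # TS = [1, 6, 8], gaps 5, 2 -> PS 1 -> False
```

In this toy database $ts_{max} - ts_{min} + 1 = 8$ while $|TD| = 6$, illustrating why the time-stamp range does not determine the database size.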
Example 3: For a user-specified minPS value of 2, itemset {mo} is a partial periodic pattern.
Problem Definition: Given a temporal database TD, a periodic support threshold minPS, and a periodicity threshold maxPer, partial periodic pattern mining finds the complete collection of patterns whose periodic support is not less than minPS.
The partial periodic patterns (PPPs) enumerated by the proposed model satisfy the downward closure property. This is shown by Lemma 1, which relies on Property 1.
Lemma 1: If Q is a partial periodic pattern, then every non-empty $R \subset Q$ is also a partial periodic pattern.
Proof: If PS(Q) ≥ minPS then Q is a partial periodic pattern, with respect to Definition 3. Then PS(R)≥PS(Q)≥minPS based on Property 1. Therefore, R is also a partial periodic pattern.
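Property 1, on which the proof relies, is not stated in this section. Assuming it asserts $PS(R) \ge PS(Q)$ for every non-empty $R \subset Q$, with periodic support counted over consecutive occurrences as in Procedure CalculatePeriodSupport, the following short argument sketches why it holds:

```latex
\begin{aligned}
&R \subset Q \;\Rightarrow\; \text{every transaction containing } Q \text{ also contains } R.\\
&\text{Let } u < v \text{ be the tids of consecutive occurrences of } Q \text{ with } ts_v - ts_u \le maxPer,\\
&\quad \text{and let } u = r_0 < r_1 < \dots < r_s = v \;(s \ge 1) \text{ be the tids of } R \text{ in } [u, v].\\
&\text{Time-stamps are non-decreasing in tid, so }
  ts_{r_{l+1}} - ts_{r_l} \le ts_v - ts_u \le maxPer \quad (0 \le l < s).\\
&\text{Distinct periodic gaps of } Q \text{ span intervals that share at most an endpoint,}\\
&\quad \text{so each yields at least one distinct periodic gap of } R:\; PS(R) \ge PS(Q).
\end{aligned}
```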

IV. 3P-BitVectorMiner (PARTIAL PERIODIC PATTERN BIT VECTOR MINER): THE PROPOSED ALGORITHM
3P-BitVectorMiner is a novel method to extract the complete collection of partial periodic patterns (PPPs) from a given temporal dataset TD. The complete set of PPPs is discovered in two steps: (i) transform TD into bit-vector form and produce the one-length PPPs, which are maintained in a 3PTSList structure; (ii) construct a 3PTSTree and recursively traverse it in Depth-First Search (DFS) order to extract the complete set of PPPs, discarding non-periodic patterns during mining.

A. TRANSFORM THE DATASET INTO BIT-VECTOR FORM AND GENERATE ONE LENGTH PPPs
In the initial stage, the temporal database TD is scanned and each item is converted into bit-vector form. Each bit in a bit-vector represents one temporal transaction, in order; the presence of the item is indicated by '1' and its absence by '0'. A 3PTSList structure that stores the bit-vector of every item is created simultaneously. For each item i, 3PTSList maintains two fields: the bit-vector of i, bitVector(i), and the periodic support of i, PS(i). As every transaction is transformed into bit-vector form, the 3PTSList entries of all items appearing in the transaction are updated. Along with the bit-vectors, the periodic support values are updated in 3PTSList, as shown in Algorithm 1. The transaction id, $tid_{cur}$, is assumed to take consecutive values beginning from 1. As shown in line 3, the current time-stamp $ts_{cur}$ is stored in an array TSList (denoted TS in the algorithms), which serves as common time-stamp information for all items. The bit-vector representation retains all occurrence information of an item, including its previous occurrence, and Sup(i) can be computed at any time from bitVector(i). Hence, unlike 3P-Growth [17], the support and previous time-stamp need not be maintained for each item. Lines 6 and 7 show how the previous time-stamp is obtained by looking up, in the TSList array, the tid of the last bit set in bitVector(i) before the current one. As shown in line 8, the current periodicity is computed by subtracting this previous time-stamp from the current time-stamp. The periodic support PS of the current item i is incremented by one if the resulting periodicity is not greater than the maxPer threshold. Finally, once the entire database has been scanned, the non-periodic one-length items i with PS(i) less than minPS are eliminated from 3PTSList. For the database shown in Table 1, the creation and update of 3PTSList are shown in Fig. 1.
Initially, after reading the first transaction, the bit-vectors created for items m, n, o, and p are as shown in Fig. 1(a). As this is the first appearance of these items, their PS values are zero. Fig. 1(b) shows the updated bit-vectors after reading the second transaction. Only item n occurs in both of the first two transactions, so its period is calculated. In bitVector(n), the last bit set to 1 before the current one corresponds to tid 1; this is the value returned as lastSetTid. The previous time-stamp of item n is then extracted from the common time-stamp array TS and used in the current periodicity calculation. Here curPrd = 2 − 1 = 1. According to maxPer, this period is periodic, and therefore the PS value of n is incremented by 1. These steps are shown in lines 6 to 10 of Algorithm 1. The remaining transactions are read in a similar way, and the PS values are updated as shown in Fig. 1(c) and (d). After the complete database has been scanned, the non-periodic one-length items are removed and the remaining partial periodic 1-itemsets are sorted by their support value in ascending order.

Algorithm 1
4: for each item i ∈ T do
5:   Set the tid_cur bit of bitVector(i) to 1
6:   Let lastSetTid be the tid of the last bit set in bitVector(i) before tid_cur (if there is none, continue with the next item)
7:   Set prevTS ← TS[lastSetTid]
8:   Compute the current periodicity curPrd ← ts_cur − prevTS
9:   if curPrd ≤ maxPer then
10:    Increment periodic support PS(i) by 1
11:  end if
12: end for each
13: end for each
14: Remove every non-periodic item i with PS(i) < minPS from 3PTSList and sort the remaining items in 3PTSList in ascending order of support count
15: Save all one-length PPPs to 3POutputList

Fig. 1(e) shows the final 3PTSList after discarding the non-periodic 1-itemset {q} and sorting the remaining 1-itemsets in ascending order with respect to the periodic support PS. These one-length PPPs are also saved into 3POutputList.
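A minimal Python sketch of this first scan follows. It assumes transactions arrive as (tid, ts, items) tuples with tids 1, 2, 3, ...; Python integers stand in for bit-vectors (bit tid − 1 marks an occurrence in transaction tid), `ts_array` plays the role of TSList/TS, and the function name and sample data are hypothetical. The previous occurrence is read from the bit-vector before the current bit is set, so that lastSetTid refers to the earlier transaction.

```python
def build_3pts_list(td, max_per, min_ps):
    """First database scan (sketch of Algorithm 1).

    td: (tid, ts, items) tuples with tids 1, 2, 3, ... in ascending ts order.
    Returns (ts_array, entries): ts_array[tid] is the time-stamp of tid, and entries
    lists (item, bit_vector, ps) for the one-length PPPs in ascending support order.
    """
    ts_array = [None]   # index 0 unused, so ts_array[tid] is valid for tid >= 1
    bit_vector = {}     # item -> int; bit (tid - 1) is set when the item occurs in tid
    ps = {}             # item -> periodic support
    for tid, ts_cur, items in td:
        ts_array.append(ts_cur)                     # common time-stamp array
        for item in items:
            bv = bit_vector.get(item, 0)
            ps.setdefault(item, 0)
            if bv:                                  # seen before: measure the period
                last_set_tid = bv.bit_length()      # highest set bit = last occurrence
                cur_prd = ts_cur - ts_array[last_set_tid]
                if cur_prd <= max_per:
                    ps[item] += 1
            bit_vector[item] = bv | (1 << (tid - 1))  # set the tid_cur bit
    support = {i: bin(bv).count("1") for i, bv in bit_vector.items()}
    entries = [(i, bit_vector[i], ps[i]) for i in bit_vector if ps[i] >= min_ps]
    entries.sort(key=lambda e: (support[e[0]], e[0]))  # ascending support, ties by name
    return ts_array, entries


# Hypothetical temporal database (not Table 1); tids 3 and 4 share time-stamp 4.
TD = [
    (1, 1, {"m", "n"}),
    (2, 2, {"n", "o"}),
    (3, 4, {"m", "o"}),
    (4, 4, {"n", "o"}),
    (5, 5, {"m", "q"}),
    (6, 9, {"q"}),
]
ts_array, entries = build_3pts_list(TD, max_per=2, min_ps=2)
print(entries)  # [('n', 11, 2), ('o', 14, 2)]; m (PS 1) and q (PS 0) are pruned
```

Because `bit_length()` of the bit-vector directly yields the tid of the last occurrence, no per-item "previous time-stamp" field is needed, which mirrors the design argument made above.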
The first layer of 3PTSTree is constructed with the sorted PPPs. Next, the supersets of one-length PPPs are generated as shown in Algorithm 2.

Algorithm 2 3P-BitVectorMiner(3PTSList)
1: for each item i in 3PTSList do
2:   Set pi ← ∅ and k ← i
3:   for each item j that comes after item i in 3PTSList do
4:     Set δ ← k ∪ j and generate bitVector(δ) ← bitVector(k) ∧ bitVector(j)
5:     Set PS(δ) ← CalculatePeriodSupport(bitVector(δ))
6:     if PS(δ) ≥ minPS then
7:       Store δ and PS(δ) in 3POutputList and add δ to pi
8:     end if
9:   end for each
10:  3P-BitVectorMiner(pi)
11: end for each
12: Procedure CalculatePeriodSupport(bitVector)
13:   Set PS ← 0
14:   for each pair of consecutive bits that are 1 in bitVector do
15:     Let x and y (x < y) be the tids of these two bits
16:     Compute the current periodicity curPrd ← TS[y] − TS[x]
17:     if curPrd ≤ maxPer then
18:       Increment PS by 1
19:     end if
20:   end for each
21:   Return PS
22: end procedure

Every item i is considered in turn, and in each iteration every item j that comes after i in 3PTSList, i.e., that has a higher PS value than i, is considered. First, the superset of items i and j, named δ, is generated. Next, an AND operation is performed on the bit-vectors of i and j to generate bitVector(δ). The periodic support of δ is then calculated by Procedure CalculatePeriodSupport: let x and y be the tids of two consecutive bits that are 1 in bitVector(δ); curPrd is computed by subtracting the time-stamp corresponding to x from that corresponding to y, and if curPrd ≤ maxPer, PS is incremented by 1. This is repeated for all set bits of bitVector(δ). Finally, if the resulting PS value is ≥ minPS, itemset δ is partial periodic and is saved into 3POutputList. When child nodes produce partial periodic itemsets, the DFS continues along that path to generate all remaining PPPs. In every other case, non-periodic itemsets are pruned, and their children are not traversed, as they cannot generate any PPPs.
The PPPs generated for item {m} are shown in Figure 2. Since itemsets {mo} and {mp} are both PPPs, they are saved into 3POutputList. The DFS traversal continues to extract the superset {mop}, which is also found to be a partial periodic pattern and is saved into 3POutputList. In contrast, itemset {mn} is pruned because it is non-periodic, so traversal along that path does not proceed. Once no further supersets of {m} remain, the process is repeated for item {o}, and then for all remaining one-length PPPs in the first layer of 3PTSTree. The final 3PTSTree after the DFS traversal of all items is presented in Figure 3. Because PPPs satisfy the downward closure property (Lemma 1), pruning non-periodic itemsets reduces the search space and improves mining performance.
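The depth-first extension and Procedure CalculatePeriodSupport can be sketched in Python as follows, again using integers as bit-vectors (bit tid − 1 set when the itemset occurs in transaction tid) and a 1-indexed time-stamp array. The names, the input format, and the sample data are hypothetical; set bits are visited from lowest to highest with the `bv & -bv` idiom, which isolates the lowest set bit of a non-negative integer.

```python
def calculate_period_support(bv, ts_array, max_per):
    """Count consecutive set bits x < y of bv with ts_array[y] - ts_array[x] <= max_per."""
    ps, prev_tid = 0, None
    while bv:
        low = bv & -bv                # isolate the lowest set bit
        tid = low.bit_length()        # bit (tid - 1) -> tid
        if prev_tid is not None and ts_array[tid] - ts_array[prev_tid] <= max_per:
            ps += 1
        prev_tid = tid
        bv ^= low                     # clear it and move on to the next set bit
    return ps


def mine(level, ts_array, max_per, min_ps, output):
    """Depth-first extension of one 3PTSTree layer (sketch of Algorithm 2).

    level: (itemset, bit_vector) pairs that are already PPPs, in 3PTSList order.
    output: dict that receives itemset -> PS for every multi-item PPP found.
    """
    for idx, (k, bv_k) in enumerate(level):
        children = []
        for j, bv_j in level[idx + 1:]:
            delta = k | j                     # superset δ = k ∪ j
            bv_delta = bv_k & bv_j            # bitVector(δ) = AND of the bit-vectors
            ps = calculate_period_support(bv_delta, ts_array, max_per)
            if ps >= min_ps:                  # δ is a PPP: record it and extend it
                output[delta] = ps
                children.append((delta, bv_delta))
            # otherwise δ is pruned; by Lemma 1 none of its supersets is a PPP
        mine(children, ts_array, max_per, min_ps, output)


# Hypothetical input: 1-indexed time-stamps and one-length PPPs a, b, c, d.
ts_array = [None, 1, 2, 3, 3, 4, 6]
level = [(frozenset("a"), 0b001111), (frozenset("b"), 0b010111),
         (frozenset("c"), 0b100111), (frozenset("d"), 0b011100)]
found = {}
mine(level, ts_array, max_per=1, min_ps=2, output=found)
print(sorted("".join(sorted(s)) for s in found))  # ['ab', 'abc', 'ac', 'bc']
```

Here {a, d}, {b, d} and {c, d} fail the minPS test, so none of their supersets is ever generated, while {a, b, c} is reached by extending {a, b} with {a, c}.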

V. RARE PERIODIC PATTERN MINING
In many real-life situations, it is necessary to extract rare patterns that occur throughout the database. When the periodic support threshold minPS is set low enough to extract rare patterns, numerous periodic patterns, both frequent and rare, are produced, along with many spurious patterns. If minPS is set high, many rare patterns cannot be extracted. To overcome this issue, two periodic support thresholds, minFreqPS and minRarePS, are used along with the maxPer threshold to control the number of cyclic repetitions. Here, minRarePS assists in discarding uncommon patterns that are associated by chance and are considered noisy itemsets. Depending on how strictly periodicity is enforced, rare periodic patterns can be classified as fully periodic or partial periodic.
The rare full periodic pattern measure is very strict: a pattern is discarded if even one of its inter-arrival times exceeds the maxPer threshold. Because rare patterns tend to behave non-periodically during certain time periods, a relaxed measure is needed.
To handle rare full and partial periodic patterns, 3P-BitVectorMiner is modified into two variations: RFPP-BitVectorMiner, which mines rare fully periodic patterns, and R3P-BitVectorMiner, which extracts rare partial periodic patterns. Algorithm 1 is modified as follows to deal with rare patterns: when non-periodic items are discarded from the 3PTSList, items Q with PS(Q) < minRarePS are also discarded as noise. Removing these noisy itemsets helps minimize the search space. All items Q with PS(Q) ≥ minFreqPS are retained, because their supersets may still satisfy minRarePS ≤ PS < minFreqPS. Further, line 6 of Algorithm 2 is modified according to Definitions 4 and 5 to extract RFPPs and R3Ps, respectively.
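Definitions 4 and 5 are not reproduced in this section, so the following Python sketch only illustrates the modified checks under stated assumptions. It assumes that an itemset is an R3P when minRarePS ≤ PS < minFreqPS, and an RFPP when it is in that band and, in addition, none of its inter-arrival times exceeds maxPer, which matches the strictness described above. Thresholds are absolute counts rather than percentages, and all function names are hypothetical.

```python
from typing import List


def inter_arrival_times(tids: List[int], ts: List[int]) -> List[int]:
    """Gaps between consecutive occurrences; tids must be in ascending order."""
    return [ts[b] - ts[a] for a, b in zip(tids, tids[1:])]


def period_support(iats: List[int], max_per: int) -> int:
    """Number of inter-arrival times that do not exceed max_per."""
    return sum(1 for gap in iats if gap <= max_per)


def keep_for_extension(iats: List[int], min_rare_ps: int, max_per: int) -> bool:
    """Modified pruning: only itemsets below minRarePS are dropped as noise.
    Frequent itemsets (PS >= minFreqPS) are kept because supersets may be rare."""
    return period_support(iats, max_per) >= min_rare_ps


def is_r3p(iats: List[int], min_rare_ps: int, min_freq_ps: int, max_per: int) -> bool:
    """Assumed rare partial periodic test: minRarePS <= PS < minFreqPS."""
    return min_rare_ps <= period_support(iats, max_per) < min_freq_ps


def is_rfpp(iats: List[int], min_rare_ps: int, min_freq_ps: int, max_per: int) -> bool:
    """Assumed rare fully periodic test: rare band and no gap above maxPer."""
    return (is_r3p(iats, min_rare_ps, min_freq_ps, max_per)
            and all(gap <= max_per for gap in iats))


# Hypothetical timestamps indexed by tid, and four example itemsets.
ts = [1, 2, 4, 5, 9, 10]
steady = inter_arrival_times([0, 1, 2, 3], ts)        # gaps 1, 2, 1
bursty = inter_arrival_times([0, 1, 4, 5], ts)        # gaps 1, 7, 1
common = inter_arrival_times([0, 1, 2, 3, 4, 5], ts)  # gaps 1, 2, 1, 4, 1
sparse = inter_arrival_times([0, 4], ts)              # gap 8
```

With minRarePS = 2, minFreqPS = 4 and maxPer = 2, `steady` is both an RFPP and an R3P, `bursty` is only an R3P because one gap exceeds maxPer, `common` is frequent (not reported) but still extended, and `sparse` is discarded as noise. Under these assumptions every RFPP is also an R3P.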

A. EXPERIMENTAL SETUP
The proposed framework 3P-BitVectorMiner accepts the minPS and maxPer threshold values from the user and discovers all partial periodic patterns in the temporal database. 3P-Growth [17] is the state-of-the-art algorithm for mining PPPs; it considers row temporal information, accepts the same inputs and generates the same number of PPPs as 3P-BitVectorMiner. Therefore, 3P-BitVectorMiner is compared with 3P-Growth. This section analyses how the approaches differ in memory usage and execution time. A scalability test is also presented to demonstrate the performance of 3P-BitVectorMiner over 3P-Growth on large temporal datasets. Further, the two variations RFPP-BitVectorMiner and R3P-BitVectorMiner, proposed to mine rare fully and partial periodic patterns respectively, are evaluated. To extract rare full and partial periodic patterns, these algorithms ask the user for two support thresholds, minFreqPS and minRarePS, as well as a periodicity threshold, maxPer. Since these are the first algorithms for mining rare periodic patterns from temporal databases, they are tested on different datasets with varying values of maxPer, minFreqPS and minRarePS. All the proposed algorithms are implemented in Java and tested on a system with an Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz and 8GB RAM running Windows 10 Enterprise.

B. DATASETS
For the experiments, three real and one synthetic dataset with varying numbers of transactions are downloaded from the ''frequent itemset mining dataset repository'' (http://fimi.ua.ac.be/data/). Mushroom is a dataset with about 8k transactions, whereas Accidents is a large dataset with more than 340k transactions. Chess is a small real-world dataset with about 3k transactions. T20I6D100K is a sparse synthetic dataset generated by the IBM data generator. Pollution is a real-world, dense, high-dimensional dataset with long transactions, taken from https://uaizu.ac.jp/ udayrage/datasets.html. Both sparse and dense datasets are chosen, and their descriptions are given in Table 2.

C. EVALUATION OF 3P-BitVectorMiner AND 3P-GROWTH ALGORITHMS
The run-time performance of the algorithms is measured on various datasets for different maxPer and minPS threshold values. Fig. 4 displays the run-time comparison of 3P-BitVectorMiner and 3P-Growth for various thresholds. When the X-axis represents minPS, maxPer is kept constant; when the X-axis represents maxPer, minPS is kept constant. The Y-axis represents the run-time in seconds. 3P-BitVectorMiner is observed to outperform 3P-Growth in all cases.

1) RUN-TIME PERFORMANCE
For the Accidents dataset, minPS is varied from 45% to 65% while maxPer is kept constant at 1%, as shown in Fig. 4(a). Fig. 4(b) depicts the run-time performance on the Chess dataset, where maxPer is kept constant at 0.5% and minPS is varied from 55% to 75%. Compared to 3P-Growth, 3P-BitVectorMiner demonstrates a performance improvement of 88.87% and 92.38% on the Accidents and Chess datasets, respectively. As shown in Fig. 4(c), for the T20I6D100K dataset, 3P-BitVectorMiner shows a performance improvement of 74.2% when minPS is varied from 0.4% to 0.8% and maxPer is kept constant at 2%. As observed in Fig. 4(d), 3P-BitVectorMiner shows a performance gain of 91.33% and 91.05% on the Mushroom dataset when maxPer is set to 0.1% and 3%, respectively; here, minPS is varied from 5% to 40%. Fig. 4(e) shows the case of the Pollution database, where maxPer is kept constant at 15% and minPS is varied from 50% to 54%. In this case, 3P-BitVectorMiner achieves a performance improvement of 91.98%. It is also noticed that the algorithms take a long time to run as minPS is reduced further, especially when the dataset is large. Fig. 4(g) depicts the total execution time on the Chess dataset, where maxPer is varied from 0.1% to 10% while minPS is kept constant at 55%. Compared to 3P-Growth, the performance of 3P-BitVectorMiner improves by 94.04%. Similarly, 3P-BitVectorMiner shows a performance improvement of 73.88% on T20I6D100K, as shown in Fig. 4(f); here, maxPer is varied from 0.25% to 5% while minPS is kept constant at 0.5%. Fig. 4(h) presents the case of the Pollution database, where minPS is kept constant at 50% and maxPer is varied from 2% to 50%. In this case, 3P-BitVectorMiner achieves a performance improvement of 92.23%.
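The reported percentages are easier to compare when read as run-time ratios. The section does not state how "performance improvement" is computed, so the short Python sketch below assumes it is the relative reduction of run time with respect to 3P-Growth, (T_3P-Growth − T_3P-BitVectorMiner) / T_3P-Growth × 100; the helper names are hypothetical.

```python
def improvement(baseline: float, proposed: float) -> float:
    """Assumed definition: percentage reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline * 100.0


def speedup(improvement_pct: float) -> float:
    """Run-time ratio baseline / proposed implied by a percentage reduction."""
    return 100.0 / (100.0 - improvement_pct)


accidents_speedup = speedup(88.87)  # reported improvement on Accidents
chess_speedup = speedup(92.38)      # reported improvement on Chess
```

Under this assumption, the 88.87% figure for Accidents would correspond to roughly a 9× speed-up, and the 92.38% figure for Chess to roughly 13×.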
However, in the other cases, increasing or decreasing maxPer does not significantly change the number of patterns generated.
Influence of minPS and maxPer threshold values: The number of PPPs generated for the datasets and thresholds considered in Fig. 4 is shown in Fig. 5. The figures reveal the following key points: (i) The minPS threshold has a negative effect on the generation of PPPs; that is, as minPS increases, the number of PPPs decreases, and vice versa. This is because more 1-itemsets fail to satisfy the increased minPS value, which in turn reduces the number of PPPs generated. Consequently, the execution time decreases as fewer PPPs are generated. (ii) In contrast, maxPer has a positive effect on the generation of PPPs. As maxPer increases, most of the non-periodic 1-itemsets turn into partial periodic 1-itemsets, which increases the number of resulting PPPs. As Fig. 4 depicts, the run-time advantage of 3P-BitVectorMiner over 3P-Growth grows as the number of PPPs increases. The key difference is that, in 3P-Growth, the tree grows when non-periodic 1-itemsets turn into partial periodic 1-itemsets, which results in more recursive tree constructions and longer mining time. In 3P-BitVectorMiner, these complex operations are replaced with simple bitwise logical operations, which is the main contributor to its performance gain. (iii) Changing minPS has a greater impact on the generation of PPPs than changing maxPer.
Fig. 6 displays the memory consumption of both algorithms on various datasets with the threshold settings used in Fig. 4. Fig. 6(b), (c), (d) and (e) present the memory used by both algorithms when minPS is varied for the Chess, T20I6D100K, Mushroom and Pollution datasets, respectively. The memory usage for the Chess, T20I6D100K and Pollution datasets under maxPer variation is shown in Fig. 6(f), (g) and (h), respectively.
In all these cases, 3P-BitVectorMiner consumes less memory than 3P-Growth. The following observations are noted from the figures: (i) Memory consumption decreases as the minPS threshold increases, and vice versa. (ii) Conversely, memory usage increases as the maxPer threshold increases. This shows that when more non-periodic 1-itemsets change into partial periodic 1-itemsets, the number of PPPs formed rises, increasing the need for memory. In 3P-BitVectorMiner, partial periodic 1-itemsets are represented as bit vectors, and the PPPs are extracted using logical operations on vectors with the same number of bits. In 3P-Growth, by contrast, when non-periodic 1-itemsets turn into partial periodic 1-itemsets, the tree grows and the memory need increases, which is further compounded by the memory needed for the additional recursive tree constructions. For the T20I6D100K dataset, 3P-BitVectorMiner uses 31.52% and 32.32% less memory than 3P-Growth when minPS and maxPer are varied, respectively. On the Mushroom dataset, 3P-BitVectorMiner consumes 31.11% and 26.59% less memory when maxPer is set to 0.1% and 3%, respectively. The Chess dataset shows the same characteristics as the Mushroom dataset: the memory consumption of 3P-BitVectorMiner is reduced by 29.48% and 17.32% when minPS and maxPer are varied, respectively. Similar behaviour is observed for the Pollution dataset, where 3P-BitVectorMiner consumes 65.76% and 57.56% less memory than 3P-Growth when minPS and maxPer are varied, respectively. However, as shown in Fig. 6(a), for the Accidents dataset alone, 3P-BitVectorMiner consumes 20.22% more memory than 3P-Growth. Although Accidents is a large dataset, very few PPPs are produced, and the memory is consumed in finding and discarding non-periodic itemsets.

3) SCALABILITY
The scalability test evaluates the effectiveness of 3P-BitVectorMiner relative to 3P-Growth on large temporal datasets. The T20I6D100K database, a large sparse synthetic dataset, is employed in this experiment. The database is split into five equal portions, each comprising around 20K transactions. At each iteration, the next portion is added to the transactions accumulated so far. The sparse dataset is suitable for this test because every portion contains a different number of items. With minPS set to 0.4% and maxPer set to 2%, Fig. 7(a) and (b) display the run-time performance and memory consumption of both methods for various database sizes, respectively. Fig. 7(c) depicts the corresponding number of PPPs extracted by both algorithms. The following significant points are observed from these figures: (i) The number of PPPs rises as the database size increases; as a result, the run-time and memory needs of both algorithms increase linearly. (ii) 3P-BitVectorMiner shows a run-time performance improvement of 76.6% compared to the 3P-Growth algorithm. (iii) Compared to 3P-Growth, 3P-BitVectorMiner consumes 38.77% less memory. The scalability test demonstrates that 3P-BitVectorMiner can extract partial periodic patterns from massive temporal databases with lower run-time and memory requirements.
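The incremental protocol of this test (five equal portions, each run on all portions accumulated so far) can be stated precisely with a short Python sketch. The helper `cumulative_portions` is hypothetical, and it assumes the transactions are kept in timestamp order so that each prefix is itself a valid temporal database.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cumulative_portions(db: Sequence[T], parts: int = 5) -> List[Sequence[T]]:
    """Split db into `parts` near-equal consecutive portions and return the
    growing prefixes: portion 1, portions 1-2, ..., the whole database."""
    size = len(db)
    bounds = [round(size * k / parts) for k in range(1, parts + 1)]
    return [db[:b] for b in bounds]


# For a 100K-transaction database, the prefixes hold 20K, 40K, ..., 100K rows.
sizes = [len(p) for p in cumulative_portions(list(range(100_000)))]
```

Each prefix would then be mined by both algorithms with the same thresholds (minPS = 0.4%, maxPer = 2% in this experiment).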

D. EVALUATION OF RFPP-BitVectorMiner AND R3P-BitVectorMiner
The two variations, RFPP-BitVectorMiner and R3P-BitVectorMiner, use the minFreqPS and minRarePS thresholds to restrict the retrieved patterns to rare patterns, while the maxPer threshold ensures that only periodic patterns are enumerated. Since rare periodic patterns have lower support and larger periodicity than frequent periodic patterns, the thresholds are set accordingly. Threshold minRarePS is varied from 2 to 12%, 40 to 48% and 0.1 to 0.5% in Fig. 8(a), (b) and (c), respectively, whereas minFreqPS is kept constant at 16, 18 and 20% in Fig. 8(a), 49, 50 and 55% in Fig. 8(b) and 0.6, 0.7 and 1% in Fig. 8(c). The threshold maxPer is set to 2% in Fig. 8(c) and to 15% in both Fig. 8(a) and (b). Fig. 9(a), (b) and (c) present the corresponding RFPPs and R3Ps generated for these settings. The experiments led to the following significant outcomes: (i) As can be observed from Fig. 9(a) and (b), minRarePS has a negative effect on the number of resulting RFPPs and R3Ps. Decreasing minRarePS, especially to low values, turns more noisy itemsets into rare 1-itemsets. This increase in rare 1-itemsets raises the number of RFPPs and R3Ps produced at low minRarePS values. As the number of RFPPs and R3Ps rises, the execution time also increases, as shown in Fig. 8(a), (b) and (c). It is also observed that the numbers of RFPPs and R3Ps begin to differ at low minRarePS thresholds. However, because T20I6D100K is a sparse dataset, more R3Ps than RFPPs are extracted, as depicted in Fig. 9(c). (ii) On the other hand, for the sparse T20I6D100K dataset, the number of RFPPs and R3Ps increases very slowly as minFreqPS rises, as shown in Fig. 8(c).
Threshold minFreqPS is varied from 10 to 18%, 50 to 75% and 0.4 to 0.8% in Fig. 10(a), (b) and (c), respectively, whereas minRarePS is kept constant at 4, 6 and 8% in Fig. 10(a), 46, 47 and 48% in Fig. 10(b) and 0.2, 0.25 and 0.3% in Fig. 10(c). The threshold maxPer is set to 2% in Fig. 10(c) and to 15% in both Fig. 10(a) and (b). Fig. 11(a), (b) and (c) present the corresponding RFPPs and R3Ps generated for these settings. The experiments led to the following significant outcomes: (i) As can be observed from Fig. 11(a) and (b), minRarePS has a negative effect on the number of resulting RFPPs and R3Ps. Decreasing minRarePS increases the number of rare 1-itemsets, which in turn increases the number of RFPPs and R3Ps produced. A similar observation holds for the generation of R3Ps for the T20I6D100K dataset, as depicted in Fig. 11(c). Additionally, there is a slight variation where a decrease in minRarePS has little effect on
The total execution time for the Mushroom, Pollution and T20I6D100K datasets is presented in Fig. 12(a), (b) and (c), respectively. The figures show that changing maxPer while keeping minRarePS and minFreqPS constant has a significant impact on the total execution time. Threshold maxPer is varied from 0.1 to 20%, 2 to 50% and 0.25 to 20% in Fig. 12(a), (b) and (c), respectively, whereas minRarePS is kept constant at 4, 6 and 8% in Fig. 12(a), 46, 47 and 48% in Fig. 12(b) and 0.2, 0.25 and 0.3% in Fig. 12(c). The threshold minFreqPS is set to 10%, 50% and 1% in Fig. 12(a), (b) and (c), respectively. The corresponding RFPPs and R3Ps generated for these settings are shown in Fig. 13(a), (b) and (c). The tests led to the following major outcomes: (i) For very low maxPer values, either no RFPPs are generated (Fig. 13(a), (b)) or very few are generated (Fig. 13(c)). R3Ps, however, are still generated, because an itemset is retained even when some of its inter-arrival times exceed maxPer.
An increase in maxPer raises the PS count, which turns more aperiodic 1-itemsets into partial periodic 1-itemsets. Consequently, the number of RFPPs and R3Ps produced also increases. When maxPer reaches around 20%, the same number of RFPPs and R3Ps is generated, and beyond that point both increase very slowly, because few additional non-periodic 1-itemsets become periodic. (ii) The minRarePS threshold has a negative effect on the number of RFPPs and R3Ps produced: a decrease in minRarePS reduces the number of noisy itemsets and increases the number of rare 1-itemsets, which in turn increases the number of RFPPs and R3Ps produced. (iii) As expected, the increase in the number of RFPPs and R3Ps demands more execution time, as shown in Fig. 12(a), (b) and (c).

VII. DISCUSSION SECTION
This section compares the time complexity of the 3P-BitVectorMiner and 3P-Growth algorithms. 3P-Growth [17], [18] is the state-of-the-art method that handles simultaneous arrivals of multiple transactions in a row temporal database. Time complexity of 3P-Growth: this model constructs the two components of the 3P-tree, namely a 3P-list and a prefix tree. The operations performed by 3P-Growth are: (i) Initially, the entire dataset is scanned and all one-length periodic items are retained in the 3P-list. Consider a temporal dataset with C unique items and S transactions. In the worst case, all C items are periodic (interesting) and present in each of the S transactions, so generating the 3P-list takes O(C × S). (ii) The dataset is scanned once again, and the items in every transaction are sorted and saved in the prefix tree; in the worst case, this construction is also O(C × S). (iii) Next, the prefix tree is recursively mined in a DFS manner to extract PPPs. The number of possible itemsets is N = 2^C − 1. To generate the conditional pattern base, 3P-list and prefix tree of α for every itemset α that extends an itemset β, 3P-Growth traverses the node-links of the 3P-list of β. As these structures of β are visited only once, this construction is completed in linear time. Hence, the overall time complexity of 3P-Growth is O(C × S × N). In real-world applications, the number of itemsets considered depends on the characteristics of the database and on the threshold values of the algorithm. Owing to search space pruning, fewer itemsets may be considered when minPS is increased and maxPer is decreased.
In 3P-BitVectorMiner, the resultant PPPs are generated using two algorithms. In Algorithm 1, the entire database is scanned and transformed into a bit-vector representation; simultaneously, the one-length PPPs are generated and stored in the 3PTSList data structure. In the worst case, creating 3PTSList takes O(C × S). Algorithm 2 then performs a logical AND on the bit vectors of two current-length itemsets to generate a longer itemset. Every item is represented by S bits, so the AND takes O(S/w) word operations, where w is the machine word size; this is no more than the cost of the period support calculation, which examines every bit and therefore requires at most S operations. The algorithm applies DFS on the itemset lattice, in which N = 2^C − 1 itemsets are possible. As a result, it takes O(N × S) time to generate all potentially interesting itemsets, and 3P-BitVectorMiner has a total time complexity of O(N × S). Since RFPP-BitVectorMiner and R3P-BitVectorMiner are variations of 3P-BitVectorMiner, their time complexity is the same. The overall effectiveness of 3P-BitVectorMiner ultimately depends on the real-world values of parameters such as C, S and N. To demonstrate that the proposed 3P-BitVectorMiner outperforms the state-of-the-art 3P-Growth algorithm, extensive experiments on a variety of real-world databases are carried out in this paper.
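Summing the per-step costs given in this section puts the two bounds side by side. This is only a restatement of the section's own accounting (3P-Growth's per-itemset conditional-structure cost is taken as linear in the O(C × S) tree size), not an independent analysis; the bitwise AND is folded into the O(S) per-itemset term because it touches no more than the S bits that the period support scan reads.

```latex
\begin{aligned}
T_{\text{3P-Growth}} &= \underbrace{O(C\,S)}_{\text{3P-list}}
  + \underbrace{O(C\,S)}_{\text{prefix tree}}
  + \underbrace{N \cdot O(C\,S)}_{\text{conditional structures}}
  = O(C \cdot S \cdot N)\\
T_{\text{3P-BitVectorMiner}} &= \underbrace{O(C\,S)}_{\text{3PTSList}}
  + \underbrace{N \cdot O(S)}_{\text{AND + period support}}
  = O(N \cdot S), \qquad \text{since } N = 2^{C} - 1 \ge C
\end{aligned}
```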

VIII. CONCLUSION
The proposed efficient and novel algorithm, 3P-BitVectorMiner, is a depth-first search method that captures all partial periodic patterns from a temporal database. The thresholds maxPer and minPS control the periodicity and the number of cyclic repetitions, respectively. The input temporal database is converted into bit-vector form. The 3PTSList structure plays a vital role in producing the one-length partial periodic patterns by eliminating the non-periodic one-length patterns, which reduces the huge search space. In contrast to the pattern-growth method used in the 3P-Growth algorithm, subsequent partial periodic patterns are generated here with simple logical operations. An in-depth study of 3P-BitVectorMiner on various real-world, synthetic, sparse and dense datasets showed that it is run-time efficient, highly scalable and, in most cases, consumes less memory than the state-of-the-art 3P-Growth algorithm. Because real-life applications also require capturing rare periodic patterns from temporal databases, two variations named RFPP-BitVectorMiner and R3P-BitVectorMiner are proposed to mine rare fully periodic patterns and rare partial periodic patterns from temporal databases, respectively. Along with maxPer, two support thresholds, minFreqPS and minRarePS, are used to control the number of cyclic repetitions and to prune the noisy itemsets. Different experiments show that these frameworks successfully capture rare periodic patterns in temporal databases.
The proposed 3P-BitVectorMiner framework and its variants are limited to extracting partial and rare periodic patterns from static databases based on the maxPer and minPS threshold values. However, alternative periodic support metrics can be applied as per user requirements. The technique can also be extended to incremental mining of rare periodic patterns, including partial periodic patterns. Further, the proposed algorithms may be enhanced to extract partial and rare periodic patterns from stream data. In addition, suitable real-world scenarios can be considered, and the proposed frameworks may be employed to extract uncommon periodic patterns in them. Finally, an in-depth study of the patterns discovered by the proposed methods may uncover hidden knowledge.
He has around 30 years of teaching experience. He has published many research papers in national/international journals/conferences. His research interests include fractals and theoretical computer science.
MINI SHAIL CHHABRA is currently pursuing the bachelor's degree with the Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.