Mining Periodic-Frequent Patterns in Irregular Dense Temporal Databases Using Set Complements

Periodic-frequent patterns are a vital class of regularities in a temporal database. Most previous studies find these patterns by storing the temporal occurrence information of a pattern in a list. While this approach makes the existing algorithms practicable on sparse databases, it makes them impracticable (or computationally expensive) on dense databases due to increased list sizes. A well-known fact in set theory is that, within a fixed universe, the larger a set, the smaller its complement. Based on this fact, this paper explores complements, redefines the periodic-frequent pattern, and proposes an efficient depth-first search algorithm that finds all periodic-frequent patterns by storing only the non-occurrence information of a pattern in a database. Experimental results on several databases demonstrate that our algorithm is efficient.


I. INTRODUCTION
The big data generated by real-world applications naturally exists as a temporal database, an ordered set of transactions by timestamp. Beneficial patterns that can empower users with competitive information to achieve socio-economic development lie hidden in this data. Tanbeer et al. [1] described a model to find periodically occurring frequent patterns in a uniform temporal database. Venkatesh et al. [2] generalized this model to an (irregular) temporal database. Since then, the problem of finding these patterns has received considerable attention [3], [4], [5], [6], [7], [8]. A classic use case of periodic-frequent patterns is market-basket analysis, which involves finding the regularly purchased itemsets in market-basket data. An example of a periodic-frequent pattern is as follows: {Bread, Jam, Bag} [support = 5%, periodicity = 2 hours]. The above pattern says that five percent of the customers have purchased the items 'Bread,' 'Jam,' and 'Bag' at least once every two hours. Supermarket managers may find this information beneficial for campaigning, inventory management, and product placement.
The associate editor coordinating the review of this manuscript and approving it for publication was Juan A. Lara.
The basic periodic-frequent pattern model is as follows [2]: Let $I = \bigcup_{x=1}^{m} i_x$, $m \geq 1$, be a set of items. Let $TS = \bigcup_{x=i}^{f} ts_x$, where $ts_x \in \mathbb{R}^{+}$ represents a timestamp, $ts_i = 0$ represents the initial timestamp in $TS$, and $ts_f$ represents the final timestamp in $TS$. The $ts_i$ is a hypothetical timestamp, which is crucial to determine the time taken for the first appearance of a pattern in a database. Let $P \subseteq I$ be a pattern (or an itemset). A pattern containing $\beta$, $\beta \geq 1$, items is called a $\beta$-pattern. A transaction, $tr = (tid, ts_a, Q)$, $a \geq 1$, is a triplet, where $tid \in \mathbb{R}^{+}$ denotes the transaction identifier, $ts_a \in TS \setminus \{0\}$ represents the timestamp, and $Q$ is a pattern. A temporal database, denoted as $TDB$, over $I$ and $TS$ is an ordered set of transactions by timestamp. That is, $TDB = \bigcup_{x=1}^{n} tr_x$, $n = |TDB|$, where $|TDB|$ represents the number of transactions in $TDB$. A temporal database is said to be regular if $n = ts_f$; otherwise, the database is said to be irregular. For a transaction $tr = (tid, ts_x, Q)$, if $P \subseteq Q$, it is said that $P$ occurs in $tr$ (or $tr$ contains $P$), and that timestamp is denoted as $ts_x^{P}$. Let $TS^{P} = \bigcup_{x=j}^{k} ts_x^{P}$, $j, k \in [1, ts_f]$, be the ordered set of timestamps at which $P$ has occurred in $TDB$. The support of $P$ is defined as $sup(P) = |TS^{P}|$, where $|TS^{P}|$ represents the number of transactions containing $P$. The pattern $P$ is said to be a frequent pattern if $sup(P) \geq minSup$, where $minSup$ refers to the user-specified minimum support value. Let $ts_c^{P}$ and $ts_d^{P}$, $j \leq c < d \leq k$, be two consecutive timestamps in $TS^{P}$. The time difference (or inter-arrival time) between $ts_d^{P}$ and $ts_c^{P}$ is defined as a period of $P$, say $per_e^{P}$. That is, $per_e^{P} = ts_d^{P} - ts_c^{P}$. Let $PER^{P} = \{per_1^{P}, per_2^{P}, \cdots, per_d^{P}\}$ be the set of all periods for pattern $P$.
The periodicity of $P$ is defined as $prd(P) = \max(per_x^{P} \mid \forall per_x^{P} \in PER^{P})$. The frequent pattern $P$ is said to be a periodic-frequent pattern if $prd(P) \leq maxPRD$, where $maxPRD$ refers to the user-specified maximum periodicity value. Given a temporal database ($TDB$) and the user-specified minimum support ($minSup$) and maximum periodicity ($maxPRD$) constraints, the problem of periodic-frequent pattern mining is to find the complete set of periodic-frequent patterns having support no less than $minSup$ and periodicity no more than $maxPRD$. Please note that a pattern's support and periodicity can be expressed as a percentage of $|TDB|$ and $ts_f$, respectively.
Example 1: Let $I = \{p, q, r, s, t, u, v\}$ be the set of items. Let $TS = \{0, 1, 2, \cdots, 12\}$ be the set of timestamps. A hypothetical temporal database generated from $I$ is shown in Table 1. This database contains 10 transactions, i.e., $n = 10$. The initial timestamp of this database is $ts_i = 0$, and the final timestamp is $ts_f = 12$. Since $n \neq ts_f$, this database is an irregular temporal database, with no transaction occurring at the timestamps 2 and 10. The set of items p and q, i.e., {p, q} (or pq, in short), is a pattern. It is a 2-pattern as it contains two items. The pattern pq appears in the transactions whose timestamps are 1, 4, 5, 8, 9, and 12. Therefore, the list of timestamps containing pq is $TS^{pq} = \{1, 4, 5, 8, 9, 12\}$. The support of pq is $sup(pq) = |TS^{pq}| = 6$. If the user-specified $minSup = 5$, then pq is a frequent pattern as $sup(pq) \geq minSup$. The periods for this pattern, measured from $ts_i$ to the first occurrence, between consecutive occurrences, and from the last occurrence to $ts_f$, are: $per_1^{pq} = 1 - 0 = 1$, $per_2^{pq} = 4 - 1 = 3$, $per_3^{pq} = 5 - 4 = 1$, $per_4^{pq} = 8 - 5 = 3$, $per_5^{pq} = 9 - 8 = 1$, $per_6^{pq} = 12 - 9 = 3$, and $per_7^{pq} = 12 - 12 = 0$. The periodicity of pq is $prd(pq) = \max(1, 3, 1, 3, 1, 3, 0) = 3$. If the user-specified $maxPRD = 3$, then the frequent pattern pq is a periodic-frequent pattern as $prd(pq) \leq maxPRD$. The complete set of periodic-frequent patterns generated from Table 1 is shown in Table 2.
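The support and periodicity of pq in Example 1 can be checked with a short script. This is a minimal sketch rather than the paper's implementation: the function names are hypothetical, and it assumes that the first period is measured from the initial timestamp and the last period runs up to the final timestamp.

```python
from typing import Iterable, List


def periods(ts_list: Iterable[int], ts_initial: int, ts_final: int) -> List[int]:
    """Inter-arrival times of a pattern.

    Assumption of this sketch: the gap from ts_initial to the first
    occurrence and the gap from the last occurrence to ts_final both
    count as periods.
    """
    points = [ts_initial] + sorted(set(ts_list)) + [ts_final]
    return [later - earlier for earlier, later in zip(points, points[1:])]


def is_periodic_frequent(ts_list, ts_initial, ts_final, min_sup, max_prd) -> bool:
    """True when support >= min_sup and periodicity <= max_prd."""
    support = len(set(ts_list))
    periodicity = max(periods(ts_list, ts_initial, ts_final))
    return support >= min_sup and periodicity <= max_prd


ts_pq = [1, 4, 5, 8, 9, 12]  # TS^pq from Example 1
print(len(ts_pq))             # support of pq
print(periods(ts_pq, 0, 12))  # all periods of pq
print(is_periodic_frequent(ts_pq, 0, 12, min_sup=5, max_prd=3))
```

Both checks read the whole occurrence list, which is the cost that grows with database density.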
Several algorithms [3], [9], [10], [11], [12], [13], [14] were described in the literature to find the periodic-frequent patterns in a database. The basic approach used in these algorithms has always been the same. It involves the following two steps: 1) Construct a list, say ts-list, containing the occurrence timestamps of a pattern. (The ts-list captures $TS^{P}$.) 2) Determine whether a pattern is periodic-frequent by performing an exhaustive search on its ts-list. The time complexity to search a pattern's ts-list is O(n), where n represents the length of the ts-list. Hence, the performance of a periodic-frequent pattern mining algorithm primarily depends on the lengths of the ts-lists generated for the patterns in a database. The ts-lists in sparse databases are smaller (and more manageable) than those in dense databases. Consequently, the existing algorithms are practicable on sparse databases but impracticable (or computationally expensive) on dense databases. This paper tackles this challenging problem by exploring the concept of ''(set) complements'' and proposing an efficient algorithm to find the patterns in dense databases.
It has to be noted that discovering periodic-frequent patterns in dense databases is a challenging and non-trivial task for the following reasons: 1) Many algorithms [15], [16], [17], [18] were described in the literature to find frequent patterns in a dense transactional database. Since these algorithms completely disregard the temporal occurrence information of an item, they cannot be extended to discover periodic-frequent patterns in a dense temporal database. 2) Zaki and Gouda [19] explored the concept of set difference to calculate the frequency (or support) of a pattern in a database. They have not described any methodology to determine the periodicity of a pattern.
This paper proposes a novel method to calculate a pattern's periods and periodicity using complements.
The contributions of this paper are as follows. First, we introduce the concept of a complement timestamp-list for an item and a pattern. Second, we redefine the support, period, periodicity, and periodic-frequent pattern using the complement timestamp-lists. Third, we propose a novel depth-first search algorithm to find all periodic-frequent patterns in a dense temporal database. We call our algorithm Periodic-Frequent Pattern Miner with Complements (PFPM-C). We also present our algorithm's correctness and theoretical complexity. Fourth, we propose a new synthetic database generator algorithm to create synthetic sparse and dense temporal databases. We conduct experiments on synthetic and real-world databases and show that our algorithm is efficient with respect to memory and runtime and is highly scalable. Fifth, we present a case study on air pollution analytics to demonstrate the usefulness of finding periodic-frequent patterns using PFPM-C. This case study involves identifying the areas in Japan where people were regularly exposed to harmful pollution levels.
This study is a significantly expanded version of our previous work [20], which reported a preliminary version of PFPM-C. This study extends the related work with a more comprehensive review of the existing literature. It also provides the theoretical correctness of the extended model of periodic-frequent patterns based on set complements. The time complexity of PFPM-C has also been investigated in this paper. Furthermore, the experimental findings section (Section V) has been significantly expanded by incorporating new databases. This study demonstrates that PFPM-C outperforms the state-of-the-art on dense databases, regardless of the minSup and maxPRD values.
The remainder of this study is organized as follows. Section II describes the literature on frequent pattern mining and periodic-frequent pattern mining. In Section III, we define the model of periodic-frequent patterns using complements. In Section IV, we introduce our algorithm. In Section V, we present our experimental results. Finally, in Section VI, we present our conclusions and future work.

II. RELATED WORK

A. FREQUENT PATTERN MINING
Agrawal et al. [21] introduced frequent pattern mining as a key intermediary step to discover interesting associations between the itemsets in a transactional database. Since then, several algorithms (e.g., Apriori [21], ECLAT [22], and Frequent Pattern-growth [23]) were described in the literature to find these patterns effectively. Most of these algorithms can discover frequent patterns effectively in a sparse transactional database; however, they suffer from performance issues while mining the patterns in a dense transactional database. Zaki and Gouda [19] first explored the concept of set difference and proposed a depth-first search algorithm, ECLAT-diffSets, to find all frequent patterns in a dense transactional database. This algorithm uses the size of the set difference list to determine whether a pattern is frequent or infrequent in the database. However, it does not describe any methodology to determine the periodicity of a pattern from the set difference list. Hence, this algorithm cannot be directly extended to find periodic-frequent patterns. This paper proposes a novel methodology to calculate the periodicity of a pattern given its set difference information. Luna et al. [18] conducted a detailed survey on frequent pattern mining and presented the improvements made in the past 25 years.

B. PERIODIC PATTERN MINING IN TIME SERIES, EVENT SEQUENCES, AND GRAPHS
A key limitation of frequent pattern mining studies is their inability to consider the temporal occurrence information of the items in a database. Ozden et al. [24] tried to address this limitation by adding a time attribute to a transactional database, then splitting the database into non-overlapping subsets by time, and finding cyclic association rules by counting the number of subsets in which a pattern has occurred. This approach simplifies the mining algorithm but also introduces a major limitation of missing the patterns that span multiple windows.
Inspired by Ozden's work [24], Han et al. [25] described a model to find partial periodic patterns in an evenly spaced binary time series. Later, the authors proposed an efficient algorithm [26] to discover the partial periodic patterns. In this model, a binary series is split into multiple sequences of a particular length specified by the user, and interesting patterns are discovered using only the minSup threshold value. Yang et al. [27] extended Han's model to multiple minimum supports to address the rare item problem. Yang et al. [28] extended the model to discover asynchronous periodic patterns in a time series. Xun et al. [29] proposed an ECLAT variant to discover partial periodic patterns in multi-source time series data. A key limitation of these models is that they fail to discover patterns spanning multiple sequences. More importantly, the periodic patterns discovered from a time series are conceptually different from those in a temporal database. In particular, the periodic patterns generated in a time series are very close to frequent patterns generated in a transactional database (if the length of a segment or period is set to 1), as these models only use the minSup constraint to determine the interestingness of a pattern.
An event sequence represents an ordered list of events, where each event has a distinct timestamp. Mannila et al. [30] introduced frequent episode mining to find all the episodes (or subsequences of events) that frequently appear in a sequence over time. An episode is said to be frequent if its support is no less than the user-specified minSup value. Huang and Chang [31] extended Mannila's model to discover frequent episodes in complex event sequences. Although temporal and event-sequence databases capture the temporal occurrence information of the items in a database, they differ in their underlying data models and how they handle temporal aspects. As a result, the knowledge discovered from frequent episodes is completely different from the partial periodic patterns discovered from temporal databases.
Lahiri and Berger-Wolf [32] proposed a model to discover periodic patterns in graphs. Zhang et al. [33] described a model to discover seasonal periodic subgraphs in a network. It has to be noted that the patterns discovered from graphs are completely different from the patterns generated in databases.

C. PERIODIC-FREQUENT PATTERN MINING
Tanbeer et al. [1] described the periodic-frequent pattern model that eliminated the need for splitting the database by time. This model involves discovering all patterns in a temporal database that satisfy the user-specified minSup and maxPRD constraints. A pattern-growth technique was presented to generate all periodic-frequent patterns in a database. Amphawan et al. [11] designed an efficient depth-first search-based algorithm for mining top-K periodic-frequent patterns without using the user-specified minSup constraint. Kiran and Reddy [3] introduced a novel greedy approach to discover periodic-frequent patterns effectively. Anirudh et al. [9] introduced a novel concept of periodic summaries to find the periodic-frequent patterns in a temporal database. Anirudh et al. [10] also presented a distributed in-memory algorithm based on the map-reduce and Spark environment. Ravikumar et al. [14] proposed an ECLAT-based [22] algorithm to find periodic-frequent patterns in columnar databases. Tarun et al. [5] described a CUDA-based GPU algorithm to find periodic-frequent patterns. Since these algorithms store a pattern's complete temporal occurrence information in a list, they suffer from computational issues while dealing with dense databases.

D. EXTENSIONS OF PERIODIC-FREQUENT PATTERN MINING
Recently, the periodic-frequent pattern model was extended to find fuzzy periodic-frequent patterns [34], stable periodic-frequent patterns [7], [35], non-redundant periodic-frequent patterns [13], periodic-frequent patterns in uncertain data [36], geo-referenced periodic-frequent patterns [37], periodic-correlated patterns [2], regular patterns [38], [39], [40], periodic high-utility patterns [41], fuzzy driven periodic mining [6], and maximal periodic-frequent patterns [42]. Unfortunately, most of these algorithms maintain the temporal occurrence information in a list structure and, thus, suffer from computational issues while dealing with dense databases. The solution presented in this paper can be extended to improve the performance of the above algorithms. However, in this paper, we confine ourselves to improving the performance of the basic periodic-frequent pattern mining algorithms for brevity.

III. REDEFINITION OF A PERIODIC-FREQUENT PATTERN USING COMPLEMENTS
In set theory, given a universe of elements $U$, the complement of a set $A$, denoted as $A^{c}$, is the set of elements not in $A$. That is, $A^{c} = U - A$. More importantly, the larger the set $A$, the smaller its complement $A^{c}$ will be, and vice versa. This motivated us to discover periodic-frequent patterns in dense databases using set complements. However, a key challenge we encountered is the methodology to determine the periodicity of a pattern from a complement set, as no prior studies exist in the literature. In this section, we resolve this challenge and establish the correctness of our approach.
Definition 1: The complement timestamp-list (cts-list) of an item $i_j \in I$ is denoted as $\overline{TS^{i_j}} = TS - TS^{i_j}$.
Definition 2: The cts-list of a pattern $P$, denoted as $\overline{TS^{P}}$, represents the union of the complement timestamp-lists of its items. That is, $\overline{TS^{P}} = \bigcup_{i_k \in P} \overline{TS^{i_k}}$.
Property 1: The timestamp-list of a pattern $P$ is $TS^{P} = \bigcap_{i_k \in P} TS^{i_k}$.
Theorem 1: Let $P$ be a pattern in $TDB$. The complement timestamp-list of $P$ is $\overline{TS^{P}} = \bigcup_{i_k \in P} \overline{TS^{i_k}}$.
Proof: According to set theory, the complement of $TS^{P}$ is $\overline{TS^{P}} = \overline{\bigcap_{i_k \in P} TS^{i_k}} = \bigcup_{i_k \in P} \overline{TS^{i_k}}$ (by Property 1 and De Morgan's law). Hence proved.
Theorem 2: [...] Hence proved.
Theorem 3: [...] Hence proved.
Please note that the definition of a periodic-frequent pattern remains unchanged in our approach. We consider a pattern periodic-frequent if it satisfies the user-specified minSup and maxPRD constraints. The following section presents our algorithm to find periodic-frequent patterns in a dense database.
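The definitions above can be exercised on the pattern pq of Example 1. The sketch below is illustrative only and makes several assumptions: the per-item occurrence lists of p and q are hypothetical (Table 1 is not reproduced here) and were chosen so that their intersection equals the occurrence list of pq given in Example 1; the universe is taken to be the timestamps at which a transaction occurs; and the routines for recovering support and periodicity from a cts-list are one straightforward way to do it, not necessarily the paper's own formulation.

```python
from typing import Sequence, Set

# Timestamps that carry a transaction in Example 1 (1..12 without 2 and 10).
universe = [1, 3, 4, 5, 6, 7, 8, 9, 11, 12]

# Hypothetical occurrence lists; only their intersection is taken from Example 1.
ts_p = {1, 3, 4, 5, 8, 9, 11, 12}
ts_q = {1, 4, 5, 6, 7, 8, 9, 12}


def complement(universe: Sequence[int], ts_list: Set[int]) -> Set[int]:
    """cts-list: timestamps of the universe at which the item does NOT occur."""
    return set(universe) - ts_list


def support_from_cts(universe: Sequence[int], cts: Set[int]) -> int:
    """Support = |universe| - |cts|, because the cts-list is a subset of the universe."""
    return len(universe) - len(cts)


def periodicity_from_cts(universe, cts, ts_initial, ts_final) -> int:
    """Largest gap between consecutive occurrences, found by walking the
    ascending universe and skipping the timestamps listed in the cts-list."""
    last, worst = ts_initial, 0
    for ts in universe:
        if ts in cts:
            continue
        worst = max(worst, ts - last)
        last = ts
    return max(worst, ts_final - last)


# cts-list of pq as the union of the item cts-lists (Definition 2).
cts_pq = complement(universe, ts_p) | complement(universe, ts_q)

# Same result as complementing the intersection of the occurrence lists.
assert cts_pq == complement(universe, ts_p & ts_q)

print(sorted(cts_pq))
print(support_from_cts(universe, cts_pq))
print(periodicity_from_cts(universe, cts_pq, ts_initial=0, ts_final=12))
```

In this dense toy case the cts-list of pq holds four timestamps while its occurrence list holds six; the gap widens as the database becomes denser.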

IV. PROPOSED ALGORITHM

A. BASIC IDEA: CONSTRUCTION OF CTS-LIST DURING DEPTH-FIRST SEARCH
An itemset lattice represents the space of items in a database. This lattice represents the search space of periodic-frequent pattern mining. Hence, the search space size of periodic-frequent pattern mining is $2^{n} - 1$, where $n$ represents the total number of items in a database. Reducing this huge search space is a challenging task in pattern mining. We reduce this search space by performing the depth-first search using the downward closure property of periodic-frequent patterns (see Properties 3 and 4).
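To make the size of the search space concrete, the following small sketch enumerates every non-empty itemset over five items and confirms the count. The item names are taken from the running example; the enumeration itself is only an illustration and is not part of the mining algorithm, which avoids visiting most of these nodes.

```python
from itertools import combinations

items = ["p", "q", "t", "r", "s"]

# Every non-empty subset of the items is one node of the itemset lattice.
lattice = [
    frozenset(combo)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
]

print(len(lattice))  # 31, i.e. 2**5 - 1
```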
A key challenge in performing the depth-first search on the itemset lattice is the construction of the correct cts-list for a child node. We tackle this challenge by constructing the cts-list of a child node, say $Q$, by performing the union operation between the cts-list of its parent node, say $P$, and the cts-list of an item, say $i_k$, that exists in the child node but not in the parent node. That is, if $Q = P \cup \{i_k\}$, where $i_k \notin P$ and $i_k \in I$, then $\overline{TS^{Q}} = \overline{TS^{P}} \cup \overline{TS^{i_k}}$. The correctness of our idea is shown in Lemma 1.
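Lemma 1: If $Q = P \cup \{i_k\}$, where $i_k \notin P$ and $i_k \in I$, then $\overline{TS^{Q}} = \overline{TS^{P}} \cup \overline{TS^{i_k}}$.
Proof: According to Definition 2, the cts-list of $P$ is $\overline{TS^{P}} = \bigcup_{i_j \in P} \overline{TS^{i_j}}$ (4). Similarly, the cts-list of $Q$ is $\overline{TS^{Q}} = \bigcup_{i_j \in Q} \overline{TS^{i_j}} = \left(\bigcup_{i_j \in P} \overline{TS^{i_j}}\right) \cup \overline{TS^{i_k}}$ (5). Substituting Equation 4 in Equation 5, we get $\overline{TS^{Q}} = \overline{TS^{P}} \cup \overline{TS^{i_k}}$. Hence proved.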
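The incremental construction can be checked numerically. In the sketch below the cts-lists of the items p, q, and t are hypothetical values chosen for illustration, and `extend` is an assumed helper name; the check confirms that building the child's cts-list from the parent's gives the same result as taking the union over all of the child's items.

```python
from typing import Dict, FrozenSet

# Hypothetical cts-lists of three items (not taken from Table 1).
cts: Dict[str, FrozenSet[int]] = {
    "p": frozenset({6, 7}),
    "q": frozenset({3, 11}),
    "t": frozenset({3, 6}),
}


def extend(parent_cts: FrozenSet[int], item_cts: FrozenSet[int]) -> FrozenSet[int]:
    """cts-list of a child node = parent's cts-list united with the new item's."""
    return parent_cts | item_cts


cts_pq = extend(cts["p"], cts["q"])   # p  -> pq
cts_pqt = extend(cts_pq, cts["t"])    # pq -> pqt

# Direct construction over all items of pqt gives the same set.
direct = frozenset().union(*(cts[item] for item in "pqt"))
print(sorted(cts_pqt), cts_pqt == direct)
```

Because a union never shrinks, the cts-list of a child is at least as large as that of its parent, so support can only decrease along a branch of the depth-first search.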

B. CONSTRUCTION OF CTS-LIST
Our algorithm, PFPM-C, has the following three steps: (i) find all periodic-frequent items (or 1-patterns) by scanning the database, (ii) construct the complement set, i.e., $\overline{TS^{i_j}}$, for every periodic-frequent item $i_j$, and (iii) using the downward closure property (see Property 4), find all periodic-frequent patterns from the database by performing the depth-first search on the lattice. Algorithms 1 and 2 provide the procedures to find the complete set of periodic-frequent patterns in a database.

Now, we illustrate the working of these algorithms using the database shown in Table 1. Let minSup = 5 and maxPRD = 3. We first construct the list of timestamps for every item by scanning the database (Lines 1 to 8 in Algorithm 1). Fig. 2(a)-(c) shows the TS-list generated after scanning the first, second, and every transaction in the database. Next, we calculate the support and periodicity for each item in the list. Using the downward closure property, we prune the items whose support is less than minSup or whose periodicity is more than maxPRD (Lines 12 to 18 in Algorithm 1). We consider the remaining items in the list as periodic-frequent items and sort them in support descending order (Line 22 in Algorithm 1). Fig. 2(d) shows the sorted list of all periodic-frequent items discovered from Table 1. Let the sorted list of all discovered periodic-frequent items be denoted as L. We then update the list of timestamps of all periodic-frequent items with their complements (Lines 23 and 24 in Algorithm 1). Fig. 2(e) shows the resultant TS-list produced after applying the set complement. We now call this list a CTS-list for brevity. Next, we find all periodic-frequent patterns by performing a depth-first search on the itemset lattice generated by L (Line 26 in Algorithm 1). The search space optimization during the depth-first search is achieved by preventing the search on child nodes if a parent node fails to be a periodic-frequent pattern (Algorithm 2). Fig. 3 shows the depth-first search performed on the lattice. We start with the first item in L, i.e., p. Since p is a periodic-frequent pattern (or item), we concatenate p with the second item in L, i.e., q. The result is a new pattern pq. Using Property 2, we construct its complement list and determine whether it is a periodic-frequent pattern or not using Definition 4. We repeat this process until the child node fails to be a periodic-frequent pattern or the lattice is traversed.

Algorithm 1 PeriodicFrequentItems(TDB: Temporal Database, minSup: Minimum Support, maxPRD: Maximum Periodicity)
1: Let TS-list = (i_j, ts-list(i_j)) be a dictionary that records the temporal occurrence information of an item i_j in a TDB. Let TS_l be a temporary list to record the timestamp of the last occurrence of an item in the database. Let Per be a temporary list to record an item's periodicity in the database.
2: for each transaction t_cur ∈ TDB do
3:   Set ts_cur = t_cur.ts;
4:   for each item i_j ∈ t_cur.X do
5:     if i_j does not appear in TS-list then
6:       Insert i_j and its timestamp into the TS-list. Set TS_l[i_j] = ts_cur and Per[i_j] = (ts_cur − ts_initial);
...
13:   if len(ts-list(i_j)) < minSup then
14:     Remove i_j from the TS-list;
      Set ...
17:   if Per[i_j] > maxPRD then
18:     Remove i_j from the TS-list.
...
24:   Update i_j's TS-list with its complement information. That is, set ts-list(i_j) = TS − ts-list(i_j). Let us call this new TS-list a CTS-list for brevity.
25: end for
26: Call PFPM-C(CTS-List).

Algorithm 2 PFPM-C(CTS-List)
...
2:   Set pi = ∅ and X = i;
3:   for each item j that comes after i in the CTS-list do
...
8:     if ts_final is in ts-list then
9:       Remove ts_final from ts-list
      for ...
...
24:   Call PFPM-C(pi);
25: end for
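The three steps described in this subsection can be sketched end to end in a few lines. The code below is a simplified illustration written for this purpose and is not the authors' Algorithm 1 or Algorithm 2: the function name `mine`, the toy database, and the thresholds are hypothetical; the universe is assumed to be the set of transaction timestamps; the final timestamp is assumed to be that of the last transaction; and items are ordered by descending support, as the text states for Line 22.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mine(db: List[Tuple[int, Set[str]]], ts_initial: int, min_sup: int, max_prd: int):
    """Depth-first search over cts-lists; returns {pattern: (support, periodicity)}."""
    universe = sorted(ts for ts, _ in db)
    all_ts = frozenset(universe)
    ts_final = universe[-1]  # assumption: last transaction marks the final timestamp

    def measures(cts: FrozenSet[int]) -> Tuple[int, int]:
        # Walk the universe once, skipping the non-occurrence timestamps.
        last, worst, sup = ts_initial, 0, 0
        for ts in universe:
            if ts not in cts:
                worst = max(worst, ts - last)
                last, sup = ts, sup + 1
        return sup, max(worst, ts_final - last)

    def ok(cts: FrozenSet[int]) -> bool:
        sup, prd = measures(cts)
        return sup >= min_sup and prd <= max_prd

    # Steps (i) and (ii): occurrence lists, then cts-lists of the surviving items.
    ts_lists: Dict[str, Set[int]] = {}
    for ts, items in db:
        for item in items:
            ts_lists.setdefault(item, set()).add(ts)
    cts = {item: all_ts - ts for item, ts in ts_lists.items()}
    cts = {item: c for item, c in cts.items() if ok(c)}
    # Smallest cts-list first, which is the same as support descending.
    order = sorted(cts, key=lambda item: (len(cts[item]), item))

    found: Dict[Tuple[str, ...], Tuple[int, int]] = {}

    # Step (iii): extend a pattern only while it stays periodic-frequent.
    def dfs(prefix: Tuple[str, ...], prefix_cts: FrozenSet[int], start: int) -> None:
        for k in range(start, len(order)):
            child_cts = prefix_cts | cts[order[k]]
            if ok(child_cts):
                pattern = prefix + (order[k],)
                found[pattern] = measures(child_cts)
                dfs(pattern, child_cts, k + 1)

    dfs((), frozenset(), 0)
    return found


# Hypothetical toy database: (timestamp, items).
toy_db = [
    (1, {"p", "q"}), (2, {"p", "q", "r"}), (3, {"p"}),
    (4, {"p", "q", "r"}), (5, {"q", "r"}), (6, {"p", "q"}),
]
result = mine(toy_db, ts_initial=0, min_sup=3, max_prd=2)
for pattern, (sup, prd) in result.items():
    print("".join(pattern), sup, prd)
```

On the toy data the search visits pq, rejects pqr and pr because their support drops to 2, and never needs to store an occurrence list for any multi-item pattern.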

C. TIME COMPLEXITY ANALYSIS
Suppose we are examining a database that stores temporal information. This database contains $a$ transactions, each corresponding to a specific time. Across all of these transactions, $c$ unique items exist. Furthermore, the average transaction length is equal to $b$. In this database, all items are deemed of interest and, therefore, included in the analysis. Understanding the characteristics of the database, including the number of transactions, the number of unique items, and the length of transactions, is crucial for performing the complexity analysis.
The PFPM-C algorithm contributes to periodic-frequent pattern mining (PFPM) by efficiently computing and identifying periodic-frequent patterns (PFPs). Algorithm 1 starts by scanning the entire database and calculating the support and periodicity of each item. The list of items satisfying the minSup and maxPRD constraints is sorted based on their support in ascending order. In the final step of the algorithm, the timestamp information for each item is replaced with its complement information, which has a time complexity of $O(a)$, where $a$ is the total number of transactions (or the worst-case size of a TS-list). This process is repeated for all items in the CTS-List, resulting in a time complexity of $O(c)$. Overall, the complexity of Algorithm 1 is $O(ab) + O(c) = O(ab)$, where $a$ represents the number of transactions and $b$ represents the average transaction length.

FIGURE 2. Finding periodic-frequent patterns in the itemset lattice using the depth-first search technique. (a) Let us start from the periodic-frequent item p. (b) After determining p as a periodic-frequent item, we visit its child node pq, construct its cts-list, and determine its support and periodicity. (c) As pq is a periodic-frequent pattern, PFPM-C visits its child node pqt and determines it as a periodic-frequent pattern. (d) The child node, pqtr, is later visited and pruned as it fails to be a periodic-frequent pattern. (e) The process of depth-first search performed on the itemset lattice of the p, q, t, r, and s items.
Once the CTS-List (or one-length PFPs) has been identified, we generate combinations of items to form larger PFPs. This is accomplished using the procedures outlined in Algorithm 2. This algorithm consists of two steps. In the first step, it accesses two items and compares their $(d - 1)$-itemset complement TS-lists to generate a $d$-itemset complement TS-list. This step has a complexity of $O(c^{2})$, where $c$ represents the number of unique items. Notably, in dense databases where the original TS-list size is large, the size of the complement TS-list is relatively small. As a result, the $d$-itemset construction process is faster, especially in dense databases. The second step involves calculating each itemset's periodicity and support and discarding uninteresting patterns based on the user-specified minSup and maxPRD criteria. The overall complexity of Algorithm 2 is $O(c^{2})$.
In conclusion, the PFPM-C algorithm has a complexity of $O(c^{2})$ for finding all the PFPs. This efficiency makes PFPM-C a highly effective method for PFPM in dense temporal databases.

V. EXPERIMENTAL RESULTS
In this section, we present the results of our experiments. The proposed algorithm, PFPM-C, is evaluated against the state-of-the-art algorithms (PFP-growth [1], PFP-growth++ [3], and PFECLAT [14]) in terms of runtime requirements and memory consumption. We conducted experiments on various real-world dense databases by varying the minSup and maxPRD thresholds.

A. EXPERIMENTAL SETUP
In this subsection, we describe the complete environment used for the experiments. The server machine (Gigabyte R282-z94 rack) is equipped with two AMD EPYC 7542 CPUs and 600 GB RAM, and runs Ubuntu Server OS 20.04. All algorithms were written in Python 3.7. Both synthetic (C20D10K) and real-world (Chess, Connect, PUMSB, Mushroom, and Pollution) databases were used to conduct the experiments. Mostly dense databases were chosen, as PFPM-C performs poorly on sparse databases.
The C20D10K is a synthetic database generated using the procedure described in [21]. The Chess database is a high-dimensional real-world database containing 75 items and 3196 transactions prepared from the UCI Chess database. The Connect database is a dense real-world database that contains all positions in the game of Connect-4, prepared from the UCI Connect database. The Mushroom database is a dense real-world database containing different species of gilled mushrooms, prepared from the UCI mushrooms dataset. The PUMSB database is a real-world database containing census data for population and housing with 49,046 transactions.
Many cardio-respiratory issues are caused by air pollution. The Japanese Ministry of the Environment developed the Atmospheric Environmental Regional Observation System (AEROS) to tackle air pollution problems. Several air pollution measurement sensors are scattered around Japan as part of this system. Each station collects the data of various air pollutants, say PM2.5, NO2, and O3, hourly. For our experiment, we confine ourselves to PM2.5 since particle size is the primary contributor to the wide variety of cardio-respiratory issues experienced by Japanese citizens. According to Air Quality Index Standards, PM2.5 values greater than 16 µg/m^3 per hour are unsuitable for people. Consequently, the hourly raw data of PM2.5 was transformed into a temporal database. As there is no optimal way to identify the appropriate values for the user-defined parameters, the parameters were chosen based on the statistics of the database. The statistics of all databases are shown in Table 3. All algorithms employed for evaluation purposes were made available in the GitHub-hosted PAttern MIning (PAMI) repository [43] to verify the repeatability of our experiments.
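One plausible way to turn hourly sensor readings into a temporal database is sketched below. The exact transformation used for the Pollution database is not spelled out in the text, so everything here is an assumption made for illustration: items are taken to be station identifiers, a station enters the transaction of an hour when its PM2.5 reading exceeds the quoted threshold, hours in which no station exceeds it produce no transaction, and the station names and readings are invented.

```python
from typing import Dict, List, Tuple

THRESHOLD = 16.0  # µg/m^3, the hourly PM2.5 limit quoted in the text


def to_temporal_db(readings: Dict[int, Dict[str, float]]) -> List[Tuple[int, List[str]]]:
    """readings maps an hourly timestamp to {station_id: PM2.5 value}.

    Each hour becomes one transaction listing the stations above THRESHOLD.
    Hours without any such station are skipped, so the result may be an
    irregular temporal database.
    """
    db = []
    for ts in sorted(readings):
        items = sorted(s for s, value in readings[ts].items() if value > THRESHOLD)
        if items:
            db.append((ts, items))
    return db


# Invented readings for two hypothetical stations over three hours.
readings = {
    1: {"s1": 20.0, "s2": 10.0},
    2: {"s1": 12.0, "s2": 9.5},
    3: {"s1": 18.2, "s2": 30.1},
}
temporal_db = to_temporal_db(readings)
print(temporal_db)
```

Under these assumptions, a periodic-frequent pattern such as a set of stations corresponds to areas that exceed the threshold together at regular intervals.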

B. EVALUATION OF ALGORITHMS BY VARYING MINSUP
In this first experiment, we evaluate PFPM-C against the PFP-growth, PFP-growth++, PS-growth, and PFECLAT algorithms by varying only the minSup constraint in each database. The maxPRD value in each database is fixed at a particular value. The maxPRD (specified in count) in Chess, Connect, PUMSB, Mushroom, C20D10K, C73D10K, and Pollution has been set at 400, 2000, 14, 2500, 5000, 4000, and 45, respectively. Fig. 3 illustrates the total number of periodic-frequent patterns generated by PFP-growth, PFP-growth++, PFECLAT, and PFPM-C in different databases by varying minSup values. The X-axis denotes the minSup, and the Y-axis denotes the number of periodic-frequent patterns generated at a particular minSup value. Based on this figure, the following observations can be made: 1) Since all algorithms have generated the same number of periodic-frequent patterns, the line plots overlap with one another. 2) Increasing the minSup threshold value results in the generation of fewer periodic-frequent patterns. This is because many patterns fail to satisfy the increased minSup value. More importantly, long patterns fail to meet the increased minSup value due to the downward closure property.
The results of runtime requirements are shown in Fig. 4. These graphs help us to analyze the relationship between minSup and the runtime required by all algorithms at each minSup. The X-axis denotes the minSup, and the Y-axis denotes the runtime requirement of the algorithms at a particular minSup value. Based on this figure, the following observations can be made: 1) An increase in the value of minSup tends to decrease the runtime of all algorithms. This is because, as the minSup increases, the number of periodic-frequent patterns generated decreases, which in turn requires less runtime. 2) Through these graphs, we note that PFPM-C performs strongly in dense real-world databases like Chess, Connect, PUMSB, Mushroom, and Pollution. However, in the case of the sparse C20D10K database, PFPM-C requires higher runtime when compared to state-of-the-art algorithms. 3) PFPM-C excels in dense databases primarily due to the pervasive occurrence of items in all transactions, resulting in shorter complement sets. Conversely, in sparse databases, the complement sets tend to be longer. Overall, complement sets play a pivotal role in reducing the runtime required to determine the periodicity of a pattern.
The memory consumption results are presented in Fig. 5. The graph allows us to analyze the relationship between minSup and the memory consumed to mine the periodic-frequent patterns. The Y-axis represents the memory requirements of the PFP-growth, PFP-growth++, PFECLAT, and PFPM-C algorithms. The following observations can be made based on this figure: 1) An increase in minSup decreases the memory requirements of all algorithms. This is because fewer patterns are generated as minSup increases. 2) The PFP-growth and PFP-growth++ algorithms consumed less memory in all databases, whereas PFPM-C was more memory-efficient than PFECLAT. 3) PFPM-C has more advantages in large databases like PUMSB and Connect.

C. EVALUATION OF ALGORITHMS BY VARYING MAXPRD CONSTRAINT
We evaluated the PFP-growth, PFP-growth++, PFECLAT, and PFPM-C algorithms in the previous subsection by varying only the minSup value. We now evaluate the algorithms by varying only the maxPRD value. [...] An increase in maxPRD results in more periodic-frequent patterns, as more patterns meet the maxPRD criterion. All algorithms generate the same set of periodic-frequent patterns at a given maxPRD value.
The runtime results of the algorithms are shown in Fig. 7. These graphs show the relationship between maxPRD and the runtime required by each algorithm. The X-axis denotes maxPRD, and the Y-axis denotes the runtime of PFP-growth, PFP-growth++, PFECLAT, and PFPM-C at a particular maxPRD value. Based on this figure, the following observations can be made:
1) Increasing maxPRD increases the runtime of all mining algorithms. This is because more patterns are generated.
2) Except on the sparse C20D10K database, PFPM-C performed well on all databases and took less time to mine the patterns than the other state-of-the-art algorithms.
The memory consumption results are presented in Fig. 8. These graphs show the relationship between maxPRD and the memory consumed to mine the periodic-frequent patterns. The Y-axis represents the memory requirements of the PFP-growth, PFP-growth++, PFECLAT, and PFPM-C algorithms. The following observations can be made based on this figure:
1) Increasing maxPRD increases the memory requirements of the mining algorithms. This is because the algorithms must generate and store more patterns in memory.
2) The tree-based algorithms, PFP-growth and PFP-growth++, consumed less memory than PFECLAT and PFPM-C.
3) PFPM-C consumes less memory than PFECLAT on some databases, and more on others.

VI. CONCLUSION AND FUTURE WORK
This paper introduced PFPM-C, an efficient algorithm for finding periodic-frequent patterns in a database. The algorithm exploits the notion of "set complements" to reduce the size of patterns' timestamp lists. We introduced a new property to calculate the periodicity of a pattern from its complement temporal information. We studied the performance of PFPM-C on various real-world and synthetic databases. Empirical results demonstrate that PFPM-C is memory efficient and, on dense databases, obtains all periodic-frequent patterns faster than state-of-the-art algorithms. This paper focused on developing a sequential CPU-based algorithm to find periodic-frequent patterns in a temporal database. As part of future work, we would like to investigate approaches to find the patterns in a distributed fashion, and to explore new measures or techniques that further reduce the computational cost of mining periodic-frequent patterns.
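Let P be a pattern in TDB. The support of P is $sup(P) = |TS| - |\overline{TS^{P}}|$, where $\overline{TS^{P}}$ denotes the set of timestamps in TS at which P does not occur. Proof: According to the definition of support (Definition 3), $sup(P) = |TS^{P}| = |TS - \overline{TS^{P}}| = |TS| - |\overline{TS^{P}}|$.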
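The support property above can be checked with a minimal sketch. The function name `support_from_complement` and the toy timestamps are hypothetical; the sketch only assumes that the complement is a subset of the database timestamps.

```python
def support_from_complement(all_timestamps, complement_timestamps):
    """sup(P) = |TS| - |complement of TS^P|, without storing TS^P itself."""
    return len(all_timestamps) - len(complement_timestamps)


# Toy data (made up): an irregular set of database timestamps.
TS = [1, 2, 3, 5, 6, 8]
ts_p = {1, 2, 5, 6, 8}          # timestamps at which pattern P occurs
complement_p = set(TS) - ts_p   # timestamps at which P does not occur

sup_p = support_from_complement(TS, complement_p)
print(sup_p)
```

Here the complement holds a single timestamp, so the support is obtained from a list of length one instead of a list of length five.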
Let P be a pattern in TDB. The periodicity of P in TDB is $prd(P) = \max\left(|MTS^{P}_{i}| \;\; \forall\, MTS^{P}_{i} \in TS^{Z}\right) + 1$. Proof: According to the definition of the periodic-frequent pattern model, the periodicity of P, i.e., prd(P) = max(
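A minimal sketch of computing periodicity when only the non-occurrence timestamps of a pattern are stored. It does not reproduce the MTS-based property stated above. Instead, it assumes the usual convention that periodicity is the largest gap between consecutive occurrences, with a hypothetical starting timestamp `ts_start = 0` and the last database timestamp as the closing boundary. The function name and the toy data are also assumptions.

```python
def periodicity_from_complement(all_timestamps, complement, ts_start=0):
    """Largest gap between consecutive occurrences, using only the complement.

    Assumes all_timestamps is non-empty and complement is a subset of it.
    """
    last_seen = ts_start
    max_period = 0
    for ts in sorted(all_timestamps):
        if ts in complement:
            continue  # the pattern is absent at this timestamp
        max_period = max(max_period, ts - last_seen)
        last_seen = ts
    # Closing period: from the last occurrence to the final database timestamp.
    ts_final = max(all_timestamps)
    return max(max_period, ts_final - last_seen)


TS = [1, 2, 3, 5, 6, 8]  # toy irregular timestamps (made up)
prd_sparse = periodicity_from_complement(TS, {3, 5})  # occurs at 1, 2, 6, 8
prd_dense = periodicity_from_complement(TS, set())    # occurs everywhere
print(prd_sparse, prd_dense)
```

The occurrence list is never materialised: the loop walks the database timestamps once and skips those found in the complement, so the cost of the membership test depends on the complement size rather than on the support.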

FIGURE 1. The process of finding periodic-frequent items and constructing their cts-lists. (a) After scanning the first transaction. (b) After scanning the second transaction. (c) After scanning all of the transactions. (d) An unordered list of periodic-frequent items. (e) Sorted list of periodic-frequent items and their cts-lists.

FIGURE 3. Number of patterns generated at different minSup thresholds.

FIGURE 6. Number of patterns by varying maxPRD in different databases.

FIGURE 7. Runtime evaluation by varying maxPRD in different databases.
In this experiment, only the maxPRD constraint is varied in each of the databases. The minSup values in the Chess, Connect, PUMSB, Mushroom, C20D10K, and Pollution and Retail databases were set at 2200, 60000, 42500, 70, 2000, and 50, respectively. Fig. 6 illustrates the total number of periodic-frequent patterns generated by PFP-growth, PFP-growth++, PFECLAT, and PFPM-C in different databases by varying maxPRD values. The X-axis denotes maxPRD, and the Y-axis denotes the number of periodic-frequent patterns generated at a particular maxPRD value. Based on this figure, raising the maxPRD threshold yields more periodic-frequent patterns.

FIGURE 8. Memory evaluation by varying maxPRD in different databases.

TABLE 1. An irregular temporal database.
Add i_j's timestamp to the TS-list. Update TS_l[i_j] = ts_cur and Per[i_j] = max(Per[i_j], (ts_cur − TS_l[i_j]));
for every item i_j in TS-list do
[...] 1 then per(Y) = max(periods) + 1
if sup(Y) ≥ minSup and per(Y) ≤ maxPRD then
    Add Y to pi; Y is considered a periodic-frequent pattern;
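The surviving steps of this listing can be sketched in Python under stated assumptions. The function name `scan_items`, the starting timestamp `ts_start = 0`, and the closing-period step inside the final loop are assumptions that stand in for the missing steps; they are not recovered from the original listing. The period is also computed before the last-seen timestamp is overwritten, since updating it first would always yield a gap of zero.

```python
from collections import defaultdict


def scan_items(transactions, min_sup, max_prd, ts_start=0):
    """One pass over (timestamp, itemset) pairs sorted by timestamp.

    Returns the TS-lists of items whose support is at least min_sup and
    whose periodicity is at most max_prd.
    """
    ts_list = defaultdict(list)  # item -> timestamps at which it occurs
    last_ts = {}                 # item -> most recent timestamp (TS_l)
    per = defaultdict(int)       # item -> largest period so far (Per)
    ts_cur = ts_start
    for ts_cur, items in transactions:
        for item in items:
            ts_list[item].append(ts_cur)
            # Compute the gap first, then record the new last timestamp.
            per[item] = max(per[item], ts_cur - last_ts.get(item, ts_start))
            last_ts[item] = ts_cur
    result = {}
    for item, stamps in ts_list.items():
        # Assumed closing step: gap from the last occurrence to the final timestamp.
        per[item] = max(per[item], ts_cur - last_ts[item])
        if len(stamps) >= min_sup and per[item] <= max_prd:
            result[item] = stamps
    return result


# Toy irregular database (made up for illustration).
toy_db = [(1, {"a", "b"}), (2, {"a"}), (4, {"a", "b"}), (5, {"a", "c"})]
found = scan_items(toy_db, min_sup=2, max_prd=3)
print(found)
```

In this toy run, item `c` is discarded because it occurs only once, while `a` and `b` satisfy both constraints.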

TABLE 3. Statistics of the databases.