k-PFPMiner: Top-k Periodic Frequent Patterns in Big Temporal Databases

Finding periodic-frequent patterns in temporal databases is a prominent data mining problem with many applications. It involves discovering all patterns in a database that satisfy the user-specified minimum support (<inline-formula> <tex-math notation="LaTeX">$min\_{}sup$ </tex-math></inline-formula>) and maximum periodicity (<inline-formula> <tex-math notation="LaTeX">$max\_{}per$ </tex-math></inline-formula>) constraints. <inline-formula> <tex-math notation="LaTeX">$Min\_{}sup$ </tex-math></inline-formula> controls the minimum number of transactions in which a pattern must appear in a database. <inline-formula> <tex-math notation="LaTeX">$Max\_{}per$ </tex-math></inline-formula> controls the maximum time interval within which a pattern must reappear in the database. The widespread adoption of this task has been hindered by an open problem: setting appropriate <inline-formula> <tex-math notation="LaTeX">$min\_{}sup$ </tex-math></inline-formula> and <inline-formula> <tex-math notation="LaTeX">$max\_{}per$ </tex-math></inline-formula> values for any given database. This paper addresses this open problem by proposing a solution to discover top-<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> periodic-frequent patterns in a temporal database. Top-<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> periodic-frequent patterns are the <inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> periodic-frequent patterns having the lowest <inline-formula> <tex-math notation="LaTeX">$periodicity$ </tex-math></inline-formula> value in a database.
An efficient depth-first search algorithm, Top-<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> Periodic-Frequent Pattern Miner (<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula>-PFPMiner), which takes only the threshold <inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> as input, is presented to find all desired patterns in a database. Experimental results on synthetic and real-world databases demonstrate that our algorithm is efficient and scalable.


I. INTRODUCTION
Pattern mining is an important sub-field of data mining. It involves discovering all patterns of user interest hidden in a database. Frequent pattern mining (FPM) [1] is a fundamental knowledge discovery technique introduced as a key intermediary step to discover association rules between the items in a database. It involves discovering all patterns having support no less than the user-specified minimum support (min_sup) value. The min_sup is a hyper-parameter that controls the minimum number of transactions in which a pattern must appear in a database. It is also an important parameter that controls the number of patterns discovered in a database. Unfortunately, setting this parameter for any database is an open research problem. When confronted with this problem in real-world applications, researchers try to solve it by finding top-k frequent patterns [2], as the constraint k is relatively easier to specify than the min_sup value. Since then, the problem of finding top-k frequent patterns has received much attention [2]. A classic application of top-k frequent pattern mining is market-basket analysis. It involves identifying the top-k itemsets most frequently purchased by customers. An example of a top-k frequent pattern is as follows: {Milk, Beer} [support = 10%]. The above pattern indicates that 10% of the customers have purchased the items 'Milk' and 'Beer' together. This knowledge may help managers to make profitable business decisions, such as product placement and campaigning.
The associate editor coordinating the review of this manuscript and approving it for publication was Kostas Kolomvatsos.
Over the years, FPM has inspired many knowledge discovery techniques, such as closed frequent pattern mining [3], maximal frequent pattern mining [4], fuzzy frequent pattern mining [5], high utility pattern mining [6], frequent sequence pattern mining [7], and generators [8]. However, the wide societal adoption of this technique has been hindered by its inability to consider the temporal occurrence information of an item in a database. Tanbeer et al. [9] tried to solve this problem by extending FPM to discover periodic-frequent patterns (PFPs) in a temporal database. An example of a PFP is as follows: {Milk, Beer} [support = 80%, periodicity = 2 hours].
The pattern indicates that the items are sold together at least once every 2 hours, with 80% support. Based on the identified PFP, the retailer can send text reminders to customers on weekends, encouraging them to purchase the associated items and providing discounts as an incentive. This proactive approach aims to enhance customer engagement, foster loyalty, and drive sales. Furthermore, in the literature, the model of PFP was extended to find local periodic patterns [10], periodic sequential patterns [11], fuzzy PFPs [12], maximal PFPs [13], recurring patterns [14], geo-referenced PFPs (GPFPs) [15], fuzzy GPFPs [16] and stable periodic patterns [17], [18], [19]. However, periodic-frequent pattern mining (PFPM) has two significant limitations. First, it generates too many patterns that are uninteresting to the user. Second, it relies on the min_sup and max_per constraints to extract the desired patterns.
This paper presents a novel approach to address the abovementioned limitations of PFPM. Rather than relying on the conventional min_sup and max_per constraints, we propose utilizing a single constraint k to discover the desired patterns. This constraint selects the top-k PFPs, that is, the k PFPs with the lowest periodicity. By adopting this approach, we mitigate the challenge of selecting appropriate min_sup and max_per values, which often requires prior knowledge of the database characteristics. Instead, the user can work directly with the top-k PFPs, which are helpful in analyzing pattern behavior.
The contributions of this paper are as follows:
1) This paper proposes a novel model of top-k PFPs that may exist in a temporal database. Informally, a PFP is a pattern that frequently occurs at regular intervals in a temporal database. The top-k PFPs are the k PFPs with the lowest periodicity in the entire database.
2) The itemset lattice represents the search space of finding PFPs. The size of this search space is $2^n - 1$, where n represents the total number of items in a temporal database. Reducing this enormous search space is challenging without the min_sup and max_per constraints. Our model employs a new upper-bound measure, dynamic maximum periodicity, to reduce the search space effectively. Please note that our model's upper-bound value is calculated and updated automatically, without human intervention.
3) This paper presents two novel pruning techniques to reduce the computational cost of finding the PFPs. Both of these techniques rely on dynamic maximum periodicity. The objective of the first pruning technique is to prune the search space in the itemset lattice effectively. The objective of the second pruning technique is to determine efficiently whether a pattern is periodic or aperiodic in the database. Traditionally, the time complexity to determine whether a pattern is periodic or aperiodic in the database is O(m), where m represents the number of occurrences of a pattern in the database.
4) An efficient single-pass algorithm using a best-first search strategy, called top-k PFP Miner (k-PFPMiner), is proposed to find all desired patterns in a database.
5) Experimental results on synthetic and real-world databases demonstrate that our algorithm is memory and runtime efficient and also highly scalable.
This paper is an expanded version of a previous conference paper [20], which provided only a brief overview of the literature on this topic. In this paper, we have extended the related work by comprehensively reviewing the current literature. Furthermore, we report new experimental findings that demonstrate the superior performance of k-PFPMiner over the naïve algorithm on various databases, regardless of the k value.
The rest of this paper is organized as follows. Section II describes the related work on finding top-k PFPs in databases. Section III describes the proposed model to find top-k PFPs in databases. Sections IV and V describe the proposed algorithm to discover the top-k PFPs. Section VI presents the experimental results obtained. Finally, in Section VII, we conclude and discuss future research.

II. RELATED WORK
In this section, we briefly look at past studies about frequent pattern mining, periodic-frequent pattern mining, and top-k periodic-frequent pattern mining.

A. FREQUENT PATTERN MINING
Frequent pattern mining is a fundamental technique in the field of big data analytics, finding extensive applications across various domains such as bio-informatics [21], market basket analysis [22], energy reduction in smart homes [23], malware analysis [24], webpage click-stream analysis [25], proof sequence analysis [26], and text analysis [27]. The goal of frequent pattern mining is to discover patterns that occur frequently in a given transactional database, enabling valuable insights and knowledge extraction. Several well-known algorithms, including Apriori [1], FP-growth [28], and Eclat [29], have been developed to tackle the challenge of finding frequent patterns efficiently. These algorithms employ different strategies and have varying degrees of scalability and efficiency, offering flexibility for different data scenarios [27]. However, while frequent pattern mining techniques have proven effective in uncovering commonly occurring patterns, they may not adequately capture patterns that exhibit consistent temporal behavior. Traditional frequent pattern mining approaches focus solely on identifying patterns that occur frequently, without considering the temporal dimension of the data. This limitation hinders their ability to detect patterns that occur consistently over time or exhibit periodicity. To address this gap, researchers have proposed alternative techniques, such as temporal data mining and periodic-frequent pattern mining, to specifically target temporal patterns. These approaches take into account the temporal variations and dependencies within the data, allowing for the discovery of patterns that exhibit temporal consistency or periodicity. By incorporating time-related information into the mining process, these methods identify valuable patterns that may have significant implications in various domains, such as understanding customer behavior over time or detecting recurring events in temporal databases [9].

B. PERIODIC FREQUENT PATTERN MINING
Periodic-frequent pattern mining considers temporal patterns that occur periodically or cyclically in databases. Tanbeer et al. [9] proposed a novel algorithm, periodic frequent pattern-growth, to identify periodic-frequent patterns (PFPs) in transactional databases. Their algorithm utilizes the PF-tree, a tree data structure with tail nodes storing transaction identifiers for each pattern. Pruning involves moving the tail node to the parent node, enabling efficient pattern storage. PFPs are generated using a measure based on min_sup and max_per. The authors claimed efficiency in their mining process and demonstrated effective identification of PFPs in large databases.
Kiran and Kitsuregawa [30] proposed the ExPF-growth algorithm for identifying PFPs using the minimum periodic ratio measure. The algorithm utilizes potential patterns and employs an ExPF-list and a prefix tree to store transactional identifiers. It recursively expands patterns, checking their periodicity and support thresholds and pruning uninteresting patterns. The ExPF-growth algorithm terminates when no more patterns can be expanded or when all patterns have been mined.
Kiran et al. [31], [32] proposed the PFP-growth++ algorithm for mining PFPs from large transactional databases. The algorithm utilizes the PF-tree++ data structure, an extension of the PF-tree, to efficiently store patterns. A significant contribution of the algorithm is the introduction of local periodicity, which allows for early termination when no new periodic patterns can be found. By leveraging the PF-tree++ structure, the local periodicity concept, and pruning techniques, the PFP-growth++ algorithm can handle large databases and reduce unnecessary computations.
Anirudh et al. [33] proposed PS-growth, a memory-efficient algorithm for mining PFPs in sparse databases. It uses periodic summaries to reduce search space and memory usage. Transactions are divided into intervals based on periodicity, and candidate patterns are generated using these summaries. PS-growth achieves faster mining and lower memory consumption by reducing the number of candidate patterns at each level of the search tree.
Surana et al. [34] introduced the MaxCPF model for discovering PFPs with constraints on maximum length or size. They developed the MaxCPF-tree, a modified version of the FP-tree, to efficiently mine these patterns. The MaxCPF-tree stores pattern information in a compressed form, reducing memory usage and improving efficiency. They also proposed the MaxCPF-List to filter out false positives during mining. This approach effectively identifies frequent and periodic patterns while considering constraints.
Ravikumar et al. [35], [36] proposed PF-ECLAT, an extension of the ECLAT algorithm, for mining PFPs in columnar temporal databases. It employs a list-based approach and utilizes pruning techniques to efficiently discover interesting patterns.

C. TOP-K PERIODIC PATTERN MINING
Top-k periodic-frequent pattern mining involves finding the k most periodic-frequent patterns in a database. Amphawan et al. [37] proposed MTKPP, a non-support metric-based algorithm for discovering PFPs. It utilizes a sliding window technique and a top-k list structure to identify the k most frequent PFPs. MTKPP employs a best-first strategy, pruning unlikely candidates until either k patterns are found or no further pruning is possible. Its uniqueness lies in not relying on a support metric, making it suitable for diverse applications.
Viger et al. [17] proposed TSPIN to find the top-k stable periodic patterns. It considers the user-specified constraints max_per and maxLA in addition to k. It stores all the transactions in an SPP-tree and sets the internal min_sup to 1. The SPP-tree is mined recursively: the least frequent pattern is removed from the queue, and min_sup is raised to the support of the least frequent pattern remaining in the queue. However, the above studies still require users to specify other constraints in addition to k.

III. MODEL OF TOP-K PERIODIC-FREQUENT ITEMSET
Let O denote the set of objects (or items). A set P ⊆ O is referred to as an itemset (or pattern). If an itemset contains α, α ≥ 1, items, it is known as an α-pattern. A transaction is represented by $tr_k = (ts, X)$, where the tuple contains the timestamp ts indicating when the pattern X occurred. A temporal database TDB over O is a collection of transactions, specifically $TDB = \{tr_1, \cdots, tr_m\}$, where m = |TDB| represents the number of transactions in TDB. For a transaction $tr_k = (ts, X)$ with k ≥ 1 and a pattern Z ⊆ X, it is said that Z occurs in $tr_k$ (or $tr_k$ contains Z), and the timestamp associated with this occurrence is denoted as $ts^Z_k$. Let $TS^Z = \{ts^Z_j, \cdots, ts^Z_k\}$, $j, k \in [1, m]$ and $j \le k$, be an ordered set of timestamps indicating the occurrences of pattern Z in the temporal database TDB.
Example 1: Consider a set of items O = {p, q, r, s, t}. Table 1 shows a temporal database constructed from these items. We are interested in the pattern {r, p}, which we represent as 'rp' for brevity. This pattern is referred to as a 2-pattern because it contains two items. In the given database, the pattern 'rp' appears at the timestamps 1, 3, 4, 6, 7, 8, and 9. Therefore, the list of timestamps in which the pattern 'rp' occurs is denoted as $TS^{rp} = \{1, 3, 4, 6, 7, 8, 9\}$.
Definition 1 (Periodicity of Z): Let $ts^Z_a$ and $ts^Z_c$ denote the first and last timestamps in $TS^Z$. A period of Z in TDB is calculated in the following three ways: (i) $p^Z_1 = ts^Z_a - ts_{min}$, (ii) $p^Z_i = ts^Z_q - ts^Z_p$, where $2 \le i \le |TS^Z|$ and $a \le p \le q \le c$, represent the periods (or inter-arrivals) of Z in the database, and (iii) $p^Z_{|TS^Z|+1} = ts_{max} - ts^Z_c$. The minimal and maximal timestamps of all transactions in the database are represented as $ts_{min}$ and $ts_{max}$. Let $P^Z = \{p^Z_1, p^Z_2, \cdots, p^Z_r\}$, $r = |TS^Z| + 1$, be the set of all periods of Z in the temporal database. The periodicity of Z, denoted as per(Z), is defined as the maximum value among the periods in $P^Z$, i.e., $per(Z) = \max(p^Z_1, p^Z_2, \cdots, p^Z_r)$.
Example 2: The periods for the pattern 'rp' are calculated as follows: …
Definition 2 (Top-k Periodic Pattern): … A pattern $X_a$ is considered a top-k periodic pattern if its periodicity is no greater than the periodicity of pattern $X_k$ in the database. In other words, $X_a$ is classified as a top-k periodic pattern if $per(X_a) \le per(X_k)$.
Example 3: Let us assume that the current candidate top-k periodic pattern list is {p, q, r, s, t}. If per(rp) ≤ per(t), the pattern rp is considered a top-k periodic-frequent pattern.
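Definition 1 can be made concrete with a short sketch. The function names `periods` and `periodicity` are illustrative and not taken from the paper, and the values `ts_min = 1` and `ts_max = 10` used in the usage lines are assumptions, since Table 1 is not reproduced in this text.

```python
from typing import List, Sequence


def periods(ts_list: Sequence[int], ts_min: int, ts_max: int) -> List[int]:
    """Return all periods of a pattern as described in Definition 1.

    ts_list holds the timestamps at which the pattern occurs;
    ts_min and ts_max are the database's first and last timestamps.
    """
    if not ts_list:
        raise ValueError("a pattern must occur at least once")
    ts = sorted(ts_list)
    result = [ts[0] - ts_min]                          # (i) gap before the first occurrence
    result += [q - p for p, q in zip(ts, ts[1:])]      # (ii) inter-arrival times
    result.append(ts_max - ts[-1])                     # (iii) gap after the last occurrence
    return result


def periodicity(ts_list: Sequence[int], ts_min: int, ts_max: int) -> int:
    """per(Z): the largest period of the pattern."""
    return max(periods(ts_list, ts_min, ts_max))


# Timestamps of 'rp' from Example 1; ts_min and ts_max are assumed values.
ts_rp = [1, 3, 4, 6, 7, 8, 9]
print(periods(ts_rp, ts_min=1, ts_max=10))
print(periodicity(ts_rp, ts_min=1, ts_max=10))
```

Under these assumed boundary timestamps the largest gap is 2, which agrees with the value per(rp) = 2 used later in the paper.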

Definition 3 (Problem Definition):
The objective of top-k periodic pattern mining in a temporal database (TDB) is to identify the k periodic-frequent patterns with the lowest periodicities. The goal is to focus specifically on these top-k patterns rather than discovering all periodic-frequent patterns in the database.

IV. OUR ALGORITHM

A. BASIC IDEA: DYNAMIC MAXIMUM PERIODICITY
To address the challenge of reducing the large search space in top-k periodic-frequent pattern mining, we propose an approach that involves the following steps:
• Create an empty list called the candidate periodic pattern-list (cPP-List) and initialize a Max-heap data structure with its root set to null.
• Scan the database and add the patterns to the cPP-List. At the same time, update the Max-heap with the periodicity values of these patterns.
• As the cPP-List grows, keep track of its size. Once the size of the cPP-List reaches the desired value of k, set the dynamic maximum periodicity (dMaxPer) equal to the value stored at the root of the Max-heap.
• Prune the search space by applying the dMaxPer constraint to the itemsets. Remove any patterns that do not satisfy the constraint.
• If any pattern satisfies the constraint, add it to the cPP-List, removing the existing k-th pattern and updating dMaxPer accordingly.
• Repeat this process until the entire search space is explored.
By employing this approach, we can effectively reduce the search space by dynamically updating the maximum periodicity value and pruning patterns that do not meet the constraint. The time complexity to determine the periodicity of a pattern is O(1) for each pattern and O(n) for the entire database, where n represents the number of timestamps or frequency of a pattern in the database. This approach allows us to focus on the most relevant top-k periodic-frequent patterns, improving the efficiency and effectiveness of the mining process.
Definition 4 (Dynamic Maximum Periodicity Constraint): Let AP represent the set of all patterns in the database, where $|AP| = 2^n - 1$ and n is the number of items. The patterns explored by our algorithm so far are stored in the set EP, a subset of the set of all patterns ($EP \subseteq AP$). Among the patterns explored, we have a subset $EP_k \subseteq EP$ that contains the top-k candidate periodic patterns found up to this point. The dynamic maximum periodicity, denoted as dMaxPer, measures the highest periodicity among all the patterns in $EP_k$. In other words, dMaxPer is calculated as the maximum value of the periodicity (per) among all the patterns in $EP_k$: $dMaxPer = \max(per(X_p) \mid \forall X_p \in EP_k)$.
Example 4: Consider the set of all patterns in the database AP = {p, q, r, s, t, pq, pr, · · · , pqrst} and the set of patterns explored so far EP = {p, q, r, s, t}, which is a subset of AP. If k = 5, then $EP_k$ represents the top-5 candidate periodic patterns discovered so far, which is $EP_k$ = {p, q, r, s, t}. The dMaxPer is calculated as the maximum periodicity among the patterns in $EP_k$, which is 4 here. Let us consider the pattern rp as an example. The timestamps at which rp occurs are $TS^{rp} = \{1, 3, 4, 6, 7, 8, 9\}$, and per(rp) = 2. If we explore the additional pattern rp and update EP to EP = {p, q, r, s, t, rp}, we need to determine whether rp should be included in the top-5 candidate periodic patterns. Since per(rp) = 2 and dMaxPer = 4, we compare the periodicity of rp with dMaxPer. As per(rp) is less than dMaxPer, we remove the pattern t from $EP_k$ since its periodicity is higher than that of rp. The updated $EP_k$ becomes {p, q, r, s, rp}, and the new dMaxPer is calculated as the maximum periodicity among the patterns in $EP_k$, which is max(2, 2, 1, 3, 2) = 3. By updating $EP_k$ and dMaxPer in this way, we ensure that the set of top-k candidate patterns always holds the lowest periodicities observed so far.
The constraint mentioned above states that for a pattern to be considered a candidate top-k periodic pattern, its periodicity must be less than the current value of dMaxPer.This constraint serves as a criterion to determine the minimum occurrences required for a pattern to be considered a candidate top-k periodic-frequent pattern.
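The bookkeeping behind Definition 4 and Example 4 can be sketched with a bounded max-heap. This is a minimal illustration, not the authors' implementation: the class name `TopKTracker` and its methods are hypothetical, and Python's `heapq` (a min-heap) is used with negated periodicities to emulate the Max-heap mentioned in the text. The periodicities fed in below are the ones implied by Example 4.

```python
import heapq
from typing import List, Tuple


class TopKTracker:
    """Keeps the k patterns with the lowest periodicity seen so far."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        # Entries are (-periodicity, pattern), so heap[0] is the worst candidate.
        self._heap: List[Tuple[int, str]] = []

    @property
    def d_max_per(self) -> float:
        """dMaxPer: highest periodicity among the k candidates (infinite until k are held)."""
        return -self._heap[0][0] if len(self._heap) == self.k else float("inf")

    def offer(self, pattern: str, per: int) -> bool:
        """Try to add a pattern; return True if it enters the top-k candidates."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-per, pattern))
            return True
        if per < self.d_max_per:
            # Evict the current worst candidate and insert the new pattern.
            heapq.heapreplace(self._heap, (-per, pattern))
            return True
        return False

    def patterns(self) -> List[str]:
        return sorted(pattern for _, pattern in self._heap)


tracker = TopKTracker(k=5)
for name, per in [("p", 2), ("q", 2), ("r", 1), ("s", 3), ("t", 4)]:
    tracker.offer(name, per)
print(tracker.d_max_per)        # 4, as in Example 4
tracker.offer("rp", 2)          # replaces t
print(tracker.patterns(), tracker.d_max_per)
```

Reading dMaxPer from the heap root costs constant time, and each replacement costs O(log k), which is one reason a heap is a natural fit for this step.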

Property 1 (Pruning Technique):
If per(X) is greater than the current value of dMaxPer, it implies that X and its supersets cannot be considered top-k periodic-frequent patterns. In other words, if the periodicity of X exceeds dMaxPer, the pattern occurs less frequently or exhibits a longer time interval between occurrences than the existing top-k periodic-frequent patterns.
The correctness of this property is based on Properties 2, 3, and Lemma 1. Our algorithm uses the above pruning technique to discover top-k periodic patterns effectively.
Property 2: For a pattern X , if per(X ) > dMaxPer, then X cannot be a top-k periodic pattern.
Lemma 1: For a pattern X , if per(X ) > dMaxPer, then X cannot be a top-k periodic pattern.
Proof: The correctness is straightforward to prove from Property 2.
Property 3: If $X \subset Y$, then $per(X) \le per(Y)$ as $TS^X \supseteq TS^Y$.
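Property 3 can be checked empirically on a small example: removing timestamps from a TS-list merges adjacent gaps, so the largest gap can only stay the same or grow. The sketch below enumerates every non-empty subset of a TS-list and verifies this. The helper names are illustrative, and the boundary values `ts_min = 1` and `ts_max = 10` are assumptions.

```python
from itertools import combinations
from typing import Sequence


def periodicity(ts_list: Sequence[int], ts_min: int, ts_max: int) -> int:
    """Largest gap between consecutive occurrences, including both database boundaries."""
    ts = sorted(ts_list)
    gaps = [ts[0] - ts_min] + [q - p for p, q in zip(ts, ts[1:])] + [ts_max - ts[-1]]
    return max(gaps)


def property3_holds(ts_x: Sequence[int], ts_min: int, ts_max: int) -> bool:
    """Check per(X) <= per(Y) for every non-empty TS_Y that is a subset of TS_X."""
    per_x = periodicity(ts_x, ts_min, ts_max)
    for size in range(1, len(ts_x) + 1):
        for ts_y in combinations(ts_x, size):
            if periodicity(ts_y, ts_min, ts_max) < per_x:
                return False
    return True


# TS-list of 'rp' from Example 1; any superset pattern occurs on a subset of it.
print(property3_holds([1, 3, 4, 6, 7, 8, 9], ts_min=1, ts_max=10))
```

This exhaustive check is only practical for short TS-lists; it is meant to illustrate why pruning X also safely prunes all of its supersets.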

V. K-PFPMINER

A. FINDING 1-LENGTH PERIODIC PATTERNS
The downward closure property of periodic-frequent patterns, as demonstrated in Property 1, is a fundamental characteristic that contributes to identifying top-k periodic patterns in a temporal database. This property asserts that if a pattern is deemed periodic-frequent, all of its subsets are also periodic-frequent. Consequently, 1-length periodic-frequent patterns hold significant importance in discovering top-k periodic patterns. Algorithm 1 presents a systematic approach that leverages the cPP-List to identify periodic-frequent patterns. To illustrate how the algorithm works, we now walk through its operation on the temporal database shown in Table 1, with the parameter k set to 7.
The database is scanned sequentially to generate 1-length periodic-frequent patterns. Consider the first transaction, "1:pqr", with the current timestamp $ts_{cur} = 1$. The items p, q, and r are inserted into the cPP-List. Their corresponding timestamps are set to 1, and the values of P and $TS_l$ are also set to 1. This process is described in lines 5 and 6 of Algorithm 1. The cPP-List after scanning the first transaction is illustrated in Figure 1(a). Moving on to the second transaction, "2:qrs", with $ts_{cur} = 2$, the new item s is added to the cPP-List with its timestamp set to 2. Similarly, the timestamps of the existing items q and r are updated to 2. The values of P and $TS_l$ are also adjusted accordingly. This process is carried out in lines 7 and 8 of Algorithm 1. The resulting cPP-List after scanning the second transaction is depicted in Figure 1(b). No new items are added in the third transaction, "3:pqrs", with $ts_{cur} = 3$. Therefore, only the timestamps of the existing items are updated. This step is executed in lines 7 and 8 of Algorithm 1. The resulting cPP-List after scanning the third transaction is shown in Figure 1(c). Proceeding to the fourth transaction, "4:pqrt", with $ts_{cur} = 4$, the new item t is inserted into the cPP-List with its timestamp set to 4. Additionally, the timestamps of the existing items are updated accordingly. This process is performed in lines 7 and 8 of Algorithm 1. The resulting cPP-List after scanning the fourth transaction is displayed in Figure 1(d). A similar procedure is repeated for the remaining transactions in the database. The final cPP-List obtained after scanning the entire database is shown in Figure 1(e). The cPP-List is then sorted in ascending order based on the periodicity of the patterns, as shown in Figure 1(f). The top k patterns are stored in the variable topkPatterns, and dMaxPer is calculated as the maximum periodicity among them, as outlined in lines 13-15 of Algorithm 1.

Algorithm 1 PeriodicItems(Temporal Database (TDB), K (k))
1: Let cPP-list = (X, TS-list(X)) be a dictionary that records the temporal occurrence information of a pattern in a TDB. Let $TS_l$ be a temporary list to record the timestamp of the last occurrence of an item in the database. Let P be a temporary list to record the periodicity of an item in the database. Let topkPatterns be a list to record the top items with the lowest periodicity. Let dMaxPer be a variable to store the dynamic maximum periodicity among topkPatterns.
…
19: end if
20: end for
21: dMaxPer = max(periodicity of all items in topkPatterns)
22: Call k-PFPMiner(cPP-List).
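Because several steps of Algorithm 1 are not legible in this text, the following is only a sketch of the first database scan as the walkthrough above describes it, not a reconstruction of the authors' pseudocode. The function name `periodic_items` and its return values are hypothetical. It assumes the first period of an item is measured from timestamp 0 (the walkthrough sets P to 1 for items first seen at timestamp 1) and uses only the four transactions quoted in the walkthrough.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def periodic_items(
    tdb: Iterable[Tuple[int, Iterable[str]]], k: int
) -> Tuple[Dict[str, List[int]], Dict[str, int], List[str], Optional[int]]:
    """Single scan building the cPP-List, item periodicities, top-k items, and dMaxPer."""
    ts_lists: Dict[str, List[int]] = {}   # cPP-List: item -> TS-list
    last_ts: Dict[str, int] = {}          # TS_l: last timestamp of each item
    per: Dict[str, int] = {}              # P: running periodicity of each item
    ts_max = 0
    for ts_cur, items in tdb:
        ts_max = max(ts_max, ts_cur)
        for item in items:
            if item not in ts_lists:
                ts_lists[item] = [ts_cur]
                per[item] = ts_cur        # assumed initial timestamp of 0
            else:
                ts_lists[item].append(ts_cur)
                per[item] = max(per[item], ts_cur - last_ts[item])
            last_ts[item] = ts_cur
    # Account for the gap between the last occurrence and the end of the database.
    for item in per:
        per[item] = max(per[item], ts_max - last_ts[item])
    # Sort items by ascending periodicity (ties broken by item name for determinism).
    ranked = sorted(per, key=lambda item: (per[item], item))
    top_k = ranked[:k]
    d_max_per = max(per[item] for item in top_k) if top_k else None
    cpp_list = {item: ts_lists[item] for item in ranked}
    return cpp_list, per, top_k, d_max_per


# Only the first four transactions of Table 1 are quoted in the text.
tdb = [(1, "pqr"), (2, "qrs"), (3, "pqrs"), (4, "pqrt")]
cpp_list, per, top_k, d_max_per = periodic_items(tdb, k=3)
print(top_k, d_max_per)
```

The returned `cpp_list` preserves the ascending-periodicity order, which is the order in which the subsequent mining step would visit the items.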

1) FINDING TOP-K PERIODIC PATTERNS USING CPP-LIST
Algorithm 2 outlines the procedure for discovering top-k periodic patterns in a database. We now describe the algorithm's functioning using the recently generated cPP-List.
We initiate the process by considering the first pattern in the cPP-List, which is pattern r (line 2 in Algorithm 2). The periodicity of r is recorded and displayed in Fig. 2(a). As r is a periodic-frequent pattern, we proceed to its child node rp and generate its TS-list by intersecting the TS-lists of r and p, denoted as $TS^{rp} = TS^r \cap TS^p$ (lines 3 and 4 in Algorithm 2). The periodicity of rp is recorded and displayed in Fig. 2(b). We then check if rp is a candidate periodic-frequent or uninteresting pattern (line 6 in Algorithm 2). Since rp is a candidate periodic-frequent pattern, we proceed to check if its periodicity (per(rp)) is less than dMaxPer, as defined in Algorithm 3. We calculate dMaxPer as the maximum periodicity among all the periodic-frequent patterns in the current set of topkPatterns. Since rp satisfies the condition of being a top-k periodic-frequent pattern, we proceed to its child node rpq and generate its TS-list by intersecting the TS-lists of rp and q, denoted as $TS^{rpq} = TS^{rp} \cap TS^q$. The periodicity of rpq is recorded and displayed in Fig. 2(c). We identify rpq as a periodic-frequent pattern and check if it qualifies as a top-k periodic-frequent pattern. Once again, we proceed to its child node rpqs and generate its TS-list by intersecting the TS-lists of rpq and s, denoted as $TS^{rpqs} = TS^{rpq} \cap TS^s$. As the periodicity of rpqs is greater than dMaxPer, we prune it from the candidate periodic patterns list, as shown in Fig. 2(d). We then move to the other child of rp and generate its TS-list by intersecting the TS-lists of rp and s, denoted as $TS^{rps} = TS^{rp} \cap TS^s$. As the periodicity of rps is greater than dMaxPer, we prune it from the candidate periodic-frequent patterns list, as shown in Fig. 2(e). This process is repeated for the remaining nodes in the set-enumeration tree to find all periodic-frequent patterns. The final list of periodic-frequent patterns generated from the temporal database in Table 1 is shown in Fig. 2(f). This approach of finding periodic-frequent patterns using the downward closure property is efficient as it effectively reduces the search space and computational cost.

Algorithm 2 k-PFPMiner(cPP-List)
1: for each item i in cPP-List do
2: Set pi = ∅ and X = i;
3: for each item j that comes after i in the cPP-List do
4: Set Y = X ∪ j and $TS^Y = TS^X \cap TS^j$;
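Only the first four steps of Algorithm 2 are legible here, so the sketch below fills in the rest from the prose walkthrough rather than from the authors' pseudocode. It performs a depth-first set enumeration over the cPP-List, intersects TS-lists to form child patterns, and prunes a pattern together with its supersets once its periodicity exceeds dMaxPer. The function names, the assumed initial timestamp of 0, and the four-transaction toy data are all assumptions for illustration.

```python
import heapq
from math import inf
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def periodicity(ts_list: Sequence[int], ts_max: int, ts_initial: int = 0) -> int:
    """Largest gap of a sorted, non-empty TS-list (initial timestamp assumed to be 0)."""
    gaps = [ts_list[0] - ts_initial]
    gaps += [q - p for p, q in zip(ts_list, ts_list[1:])]
    gaps.append(ts_max - ts_list[-1])
    return max(gaps)


def k_pfp_miner(cpp_list: Dict[str, List[int]], k: int, ts_max: int) -> List[Tuple[int, Pattern]]:
    """Return up to k (periodicity, pattern) pairs with the lowest periodicity."""
    if k < 1:
        raise ValueError("k must be at least 1")
    items = list(cpp_list)                 # assumed sorted by ascending periodicity
    heap: List[Tuple[int, Pattern]] = []   # max-heap emulated with negated periodicity

    def d_max_per() -> float:
        return -heap[0][0] if len(heap) == k else inf

    def offer(pattern: Pattern, per: int) -> None:
        if len(heap) < k:
            heapq.heappush(heap, (-per, pattern))
        elif per < d_max_per():
            heapq.heapreplace(heap, (-per, pattern))

    def extend(prefix: Pattern, prefix_ts: Sequence[int], start: int) -> None:
        for idx in range(start, len(items)):
            # TS_Y = TS_X ∩ TS_j
            ts = sorted(set(prefix_ts) & set(cpp_list[items[idx]]))
            if not ts:
                continue                   # Y never occurs in the database
            per = periodicity(ts, ts_max)
            if per > d_max_per():
                continue                   # prune Y and all of its supersets
            pattern = prefix + (items[idx],)
            offer(pattern, per)
            extend(pattern, ts, idx + 1)

    for item in items:
        offer((item,), periodicity(cpp_list[item], ts_max))
    for idx, item in enumerate(items):
        if periodicity(cpp_list[item], ts_max) <= d_max_per():
            extend((item,), cpp_list[item], idx + 1)
    return sorted((-neg_per, pattern) for neg_per, pattern in heap)


# TS-lists derived from the four transactions quoted in Section V-A.
cpp_list = {"q": [1, 2, 3, 4], "r": [1, 2, 3, 4], "p": [1, 3, 4], "s": [2, 3], "t": [4]}
print(k_pfp_miner(cpp_list, k=3, ts_max=4))
```

On this toy input the three patterns with the lowest periodicity are q, r, and qr. Ties at the k-th position are resolved by discovery order, which is a simplification of this sketch.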

B. TIME COMPLEXITY ANALYSIS
Suppose we are examining a database that stores temporal information. This database contains a total of a transactions, each corresponding to a specific time point. Across all of these transactions, c unique items have been recorded. Furthermore, the average transaction length is equal to b. In this database, all items are deemed of interest and therefore included in the analysis. Understanding the characteristics of the database, including the number of transactions, unique item count, and length of transactions, is crucial for performing the complexity analysis.
Algorithm 1 begins by calculating the periodicity and storing the timestamps of each item while building the cPP-List. The current dMaxPer is updated as the maximum periodicity of all the current top-k periodic-frequent patterns. The cPP-List is then sorted in ascending order of periodicity. The complexity of Algorithm 1 is O(ab), where a is the number of transactions and b is the average transaction length.
Once the 1-length k-PFPs have been identified, we generate combinations of items to form larger k-PFPs. This is accomplished using the procedure outlined in Algorithm 2. Finally, the overall complexity of finding all the k-PFPs using k-PFPMiner is $O(c^2)$, which makes k-PFPMiner a highly efficient method for pattern mining.

VI. EXPERIMENTAL RESULTS
We evaluated our algorithm, k-PFPMiner, on the task of discovering top-k periodic-frequent patterns in temporal databases. As no existing algorithm specifically addresses this task with only the k constraint, our evaluation aimed to assess the effectiveness and performance of k-PFPMiner. We evaluated various databases, systematically varying the value of k to gain insights into the algorithm's behavior and capabilities.

A. EXPERIMENTAL SETUP
Our k-PFPMiner algorithm was implemented in Python 3.7 and executed on a high-performance Gigabyte R282-z94 rack server machine. This server machine has two AMD EPYC 7542 CPUs and 600 GB RAM, running the Ubuntu Server OS 20.04. The experiments used synthetic (T10I4D100K, T20I6D100K) and real-world (Retail, BMS-WebView-2, Chess, Pollution, and Kosarak) databases.
The synthetic databases, T10I4D100K and T20I6D100K, were generated following the procedure described in [1]. The Retail database is a real-world database. The real-world database named BMS-WebView-2 is sparse and contains click-stream data of an e-commerce website for several … Kosarak is a clickstream data collection of 990,000 sequences from a Hungarian news portal.
The Pollution database, on the other hand, is a high-dimensional real-world database provided by the Japanese Ministry of the Environment through the Atmospheric Environmental Regional Observation System (AEROS) [38]. It aims to address air pollution issues. Each transaction in this database contains information such as the timestamp in hours and the station identifiers that have recorded PM2.5 values equal to or greater than 16 µg/m³. The Pollution database comprises 1,600 unique items and 720 transactions. The transaction lengths also vary in this database, with the minimum, average, and maximum lengths being 11, 460, and 971 items, respectively. These databases were chosen to evaluate our k-PFPMiner algorithm's performance and effectiveness in different scenarios and assess its scalability and applicability in both synthetic and real-world contexts. We provide statistical details for each database used in our experiments in Table 2. The algorithms are available on GitHub [39] to verify the repeatability of our experiments.

B. INFLUENCE OF THE K-PARAMETER ON THE PERFORMANCE OF THE K-PFPMINER
In the first experiment, we varied the parameter k to assess its impact on the performance of k-PFPMiner in terms of runtime and memory usage. The experiments were conducted on the six databases: T10I4D100K, T20I6D100K, Retail, BMS-WebView-2, Chess, and Pollution.
Fig. 3 illustrates the number of PFPs generated by the naïve and k-PFPMiner algorithms over different databases at varying k values. The Y-axis represents the number of PFPs generated by each algorithm, while the X-axis denotes the k value at which the patterns were generated. Based on this figure, the following observations can be made:
• Firstly, the naïve and k-PFPMiner algorithms generate an equal number of PFPs for each database. As a result, the two curves overlap.
• Secondly, an increase in the k threshold increases the number of PFPs generated. This is because a higher k value allows more patterns to be extracted from the complete set of PFPs.
The runtime results of the experiments are presented in Fig. 4, which shows the execution time for different values of k. This graph allows us to analyze the relationship between k and the runtime required to mine the top-k periodic-frequent patterns.
In Fig. 4, the x-axis represents the values of the parameter k, while the y-axis represents the execution time of the k-PFPMiner algorithm. Based on this figure, the following observations can be made:
• Firstly, increasing the value of k tends to increase the runtime of the k-PFPMiner algorithm. This is reasonable because, as k increases, more patterns are discovered, and a larger number of itemsets from the search space must be considered to identify the top-k periodic-frequent patterns. Consequently, k-PFPMiner may need to evaluate more patterns to populate the candidate set cPP-List, resulting in increased computational time.
• Secondly, the k-PFPMiner algorithm requires considerably less runtime than the naïve algorithm on all the databases, whether sparse or dense, regardless of the k value.
The memory results of the experiments are presented in Fig. 5, which shows the memory consumption for different values of k. This graph allows us to analyze the relationship between k and the memory required to mine the top-k periodic-frequent patterns.
In Fig. 5, the x-axis represents the values of the parameter k, while the y-axis represents the memory consumption of the k-PFPMiner algorithm. Based on this figure, the following observations can be made:
• As the value of k increases, the memory consumption of the k-PFPMiner algorithm also tends to increase. This is primarily because, when k is set to larger values, the algorithm needs to consider more itemsets to populate the candidate set Q_k. Evaluating and storing these additional itemsets requires more memory.
• The k-PFPMiner algorithm requires significantly less memory than the naïve algorithm across all evaluated databases, regardless of their density or sparsity and the k value. Furthermore, this difference in memory usage is especially pronounced at high k values.
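The dependence of candidate-set size on k discussed above can be illustrated with a bounded candidate set whose pruning threshold tightens as better patterns arrive. This is a minimal sketch and not the authors' data structure: the class name `TopKCandidates`, the heap-based layout, and the representation of patterns as tuples are assumptions made for illustration.

```python
import heapq


class TopKCandidates:
    """Keep the k patterns with the lowest periodicity seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # max-heap on periodicity, emulated by negating values

    def threshold(self):
        """Current dynamic bound: unbounded until k patterns are held."""
        if len(self._heap) < self.k:
            return float("inf")
        return -self._heap[0][0]

    def offer(self, pattern, periodicity):
        """Try to add a pattern; return True if it enters the top-k set."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-periodicity, pattern))
            return True
        if periodicity < self.threshold():
            # Evict the current worst pattern and tighten the bound.
            heapq.heapreplace(self._heap, (-periodicity, pattern))
            return True
        return False

    def result(self):
        """Return (periodicity, pattern) pairs in ascending periodicity."""
        return sorted((-neg, pattern) for neg, pattern in self._heap)


candidates = TopKCandidates(k=2)
candidates.offer(("a",), 3)
candidates.offer(("b",), 5)
candidates.offer(("c",), 4)  # replaces ("b",); the bound drops from 5 to 4
```

A larger k keeps the threshold loose for longer, so more itemsets pass the check and must be stored, which matches the runtime and memory trends reported for Figs. 4 and 5.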

C. INFLUENCE OF THE NUMBER OF TRANSACTIONS ON K-PFPMINER PERFORMANCE
To assess the scalability of the proposed algorithm, we measured its execution time and peak memory usage as the number of transactions in the database was varied. The real-world database Kosarak was selected for this experiment because it contains a large number of distinct items and transactions.
The database was partitioned into five parts, and the algorithm's performance was evaluated after adding each part to the previously processed data. Figs. 6a and 6b show the runtime and memory requirements, respectively, of the k-PFPMiner algorithm at different database sizes when k = 200. Both the execution time and the memory usage increase as the size of the database grows. This is reasonable, as larger databases tend to contain more itemsets, which require additional processing time to evaluate.
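The incremental protocol described above can be sketched as a small measurement harness. The function `measure_incremental` and the `miner` callable are hypothetical names, and the use of `time.perf_counter` and `tracemalloc` is an assumption; the source does not state how runtime and memory were recorded.

```python
import time
import tracemalloc


def measure_incremental(parts, miner, k):
    """Run `miner` on a growing prefix of the database.

    `parts` is a list of transaction lists; `miner(data, k)` is any callable
    that mines the top-k patterns (hypothetical interface).
    Returns one (num_transactions, seconds, peak_bytes) tuple per step.
    """
    results = []
    data = []
    for part in parts:
        data.extend(part)  # add the next partition to the processed data
        tracemalloc.start()
        start = time.perf_counter()
        miner(data, k)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append((len(data), elapsed, peak))
    return results


# Toy usage with a stand-in miner that just sorts and truncates.
demo = measure_incremental([[3, 1], [2], [5, 4]], lambda d, k: sorted(d)[:k], k=2)
```

Note that `tracemalloc` reports only memory allocated through Python's allocator, so the figures it gives may differ from process-level measurements.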

VII. CONCLUSION AND FUTURE RESEARCH
Traditional algorithms for mining periodic-frequent patterns often rely on manually set thresholds for min_sup and max_per, which can be challenging to choose and may not yield optimal results. This paper addresses these limitations by proposing a novel model for discovering top-k periodic-frequent patterns in temporal databases. These patterns are the k periodic-frequent patterns with the lowest periodicity values across the entire database. Our model introduces a new upper-bound measure called dynamic maximum periodicity to reduce the search space effectively. Importantly, this upper-bound value is automatically calculated and updated without human intervention, providing a more efficient and adaptive approach. Furthermore, we have developed a novel pruning technique that significantly reduces the time complexity of identifying whether a pattern is periodic or aperiodic. In the best case, this complexity is reduced to O(1); in the worst case, it is O(n). To discover all desired patterns in the database, we propose an efficient single-pass algorithm called top-k Periodic-Frequent Pattern Miner (k-PFPMiner). This algorithm utilizes a best-first search strategy, enabling it to navigate the search space effectively and find the top-k periodic-frequent patterns. Extensive experimental evaluations have been conducted using synthetic and real-world databases to assess the performance of k-PFPMiner. The results demonstrate that our algorithm is efficient and can discover valuable patterns in temporal data. In future work, we plan to investigate the discovery of top-k periodic-frequent patterns in uncertain databases.
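The O(1) best case and O(n) worst case mentioned above are consistent with a periodicity check that stops at the first gap exceeding the current bound. The sketch below shows one plausible form of such a check; it is not taken from the paper, and the function name `is_periodic` and its parameters are assumptions.

```python
def is_periodic(ts_list, max_per, ts_initial, ts_final):
    """Return True if no gap between consecutive occurrences exceeds max_per.

    `ts_list` is the ascending list of timestamps at which a pattern occurs.
    The scan aborts at the first violating gap, so an aperiodic pattern can
    be rejected after inspecting a single timestamp (best case), while a
    periodic pattern requires a full pass over the list (worst case).
    """
    prev = ts_initial
    for ts in ts_list:
        if ts - prev > max_per:
            return False  # early termination on the first large gap
        prev = ts
    # The gap between the last occurrence and the end of the database counts too.
    return ts_final - prev <= max_per
```

For example, with `max_per = 3` on a database spanning timestamps 0 to 6, a pattern first occurring at timestamp 5 is rejected immediately, because the initial gap already exceeds the bound.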

FIGURE 1. Finding periodic-frequent patterns. (a) After scanning the first transaction. (b) After scanning the second transaction. (c) After scanning the third transaction. (d) After scanning the fourth transaction. (e) After scanning all the transactions. (f) The final list of periodic-frequent items sorted in ascending order of their periodicity.
Algorithm 3 follows a two-step procedure. The first step accesses two items and compares their (d − 1)-itemset timestamp lists to generate a d-itemset timestamp list, with a complexity of O(c²). The second step calculates the dMaxPer of each itemset and discards the uninteresting patterns based on the user-specified k parameter, which has a complexity of O(a). Overall, the complexity of this algorithm is O(c²).
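The two steps can be illustrated with a short sketch, given here under stated assumptions rather than as the paper's procedure. The function names are hypothetical, the timestamp lists are assumed to be sorted in ascending order, and the intersection uses a linear merge, which is a choice of this sketch and not necessarily how the paper's comparison is performed.

```python
def intersect_timestamps(ts_x, ts_y):
    """Step 1: combine two ascending timestamp lists into the list of
    timestamps at which both (d - 1)-itemsets occur together."""
    out = []
    a = b = 0
    while a < len(ts_x) and b < len(ts_y):
        if ts_x[a] == ts_y[b]:
            out.append(ts_x[a])
            a += 1
            b += 1
        elif ts_x[a] < ts_y[b]:
            a += 1
        else:
            b += 1
    return out


def periodicity(ts_list, ts_initial, ts_final):
    """Largest gap between consecutive occurrences, including both ends."""
    prev = ts_initial
    worst = 0
    for ts in ts_list:
        worst = max(worst, ts - prev)
        prev = ts
    return max(worst, ts_final - prev)


def extend_pattern(ts_x, ts_y, d_max_per, ts_initial, ts_final):
    """Step 2: keep the d-itemset only if its periodicity is within the
    current dynamic bound (named d_max_per here); otherwise discard it."""
    ts_xy = intersect_timestamps(ts_x, ts_y)
    per = periodicity(ts_xy, ts_initial, ts_final)
    return (ts_xy, per) if per <= d_max_per else None
```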

FIGURE 3. Top-k patterns generated by varying k in different databases.

FIGURE 4. Runtime evaluation on various databases by varying k.

FIGURE 5. Memory evaluation on various databases by varying k.
for each transaction t_cur ∈ TDB do
  ...
  Insert i and its timestamp into the PFP-list. Set TS_l[i] = ts_cur and P[i] = (ts_cur − ts_initial);
  ...
  Add i's timestamp in the cPP-List. Update TS_l[i] = ts_cur and P[i] = max(P[i], (ts_cur − TS_l[i]));
  ...
for each item i in PFP-list do
  ...
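The surviving steps of this listing can be read as a single scan that tracks, for each item, its last timestamp TS_l and its running periodicity P. The sketch below follows that reading under stated assumptions: the function name and the `(timestamp, items)` input layout are hypothetical, the gap is computed against the previous TS_l before TS_l is overwritten, and the closing loop that accounts for the gap to the final timestamp is assumed from the standard definition of periodicity, since the corresponding lines are missing here.

```python
def scan_item_periodicity(tdb, ts_initial=0):
    """One pass over (timestamp, items) pairs.

    Returns (item, periodicity) pairs sorted in ascending order of periodicity.
    """
    last_ts = {}      # TS_l: last timestamp at which each item was seen
    periodicity = {}  # P: largest gap observed so far for each item
    ts_final = ts_initial
    for ts_cur, items in tdb:
        ts_final = max(ts_final, ts_cur)
        for i in items:
            if i not in last_ts:
                # First occurrence: gap from the start of the database.
                periodicity[i] = ts_cur - ts_initial
            else:
                # Later occurrence: compare against the previous timestamp.
                periodicity[i] = max(periodicity[i], ts_cur - last_ts[i])
            last_ts[i] = ts_cur
    # Assumed closing step: gap between the last occurrence and the end.
    for i in last_ts:
        periodicity[i] = max(periodicity[i], ts_final - last_ts[i])
    return sorted(periodicity.items(), key=lambda pair: (pair[1], pair[0]))


toy_tdb = [(1, ["a", "b"]), (2, ["a"]), (3, ["a", "b"]), (5, ["a"]), (7, ["b"])]
ranked = scan_item_periodicity(toy_tdb)
```

In the toy database, item a occurs at timestamps 1, 2, 3, and 5 and item b at 1, 3, and 7, so the largest gaps are 2 for a (including the gap from 5 to the final timestamp 7) and 4 for b.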

TABLE 2. Statistics of the databases.