Combining Graph Clustering and Quantitative Association Rules for Knowledge Discovery in Geochemical Data Problem

Identifying geochemical patterns against background and relating them to mineralization remains challenging because of the complex structure of mineral deposits. To identify geochemical anomalies that are spatially associated with mineralization, in-depth knowledge of the dependence process is needed. Quantitative association rules (QARs) are used to discover notable relations and dependencies between attributes in a dataset, but it is difficult to derive such relationships from geochemical data. In previous studies, no methodology for finding association rules has been proposed for geochemical data, and the classical methods designed for Boolean and nominal attributes require prior discretization, which limits their ability to process complex data. In this paper, we propose a hybrid method of graph clustering and quantitative association rules (GCQAR) as a new way of identifying significant geochemical patterns. Graph clustering (GC) is used as the partitioning paradigm because of its ability to handle large-scale datasets. The GC is based on modularity in order to generate the groups of the graph effectively, to avoid over-partitioning, and to cover all the rules. In each partition, a set of geochemical quantitative association rules is produced. The experimental study was performed on data collected in Xiaoshan, Henan province, China. The results show that GCQAR has significant benefits in recognizing geochemical patterns compared with the traditional methods used in geochemistry.


I. INTRODUCTION
In recent decades, research on the processing and recognition of geochemical anomalies for mineral exploration has made important progress. It is essential to look for the anomalies associated with mineral deposits [1], called significant anomalies, which are often interpreted as a basic sign of mineralization [1]. However, the distribution of geochemical elements is heterogeneous, usually occurs at different temporal and spatial scales, and is interconnected in various ways. Computational methods are therefore necessary to extract knowledge from geochemical elements [2] and to help identify hidden geochemical patterns related to mineralization [1]. Association rule mining is a machine learning method and one of the most frequently used approaches to find relationships between different attributes in a database. It was first introduced in 1993 by Agrawal et al. [3], and the main target was to discover frequent patterns [4]. Since then, a large number of studies have proposed ways to find quantitative association rules (QARs) [5]-[9].
The associate editor coordinating the review of this manuscript and approving it for publication was Byung-Gyu Kim.
Discovering frequent patterns plays a fundamental role in producing interesting relationships among quantitative data. Once the frequent patterns have been found, it is simple to generate association rules that satisfy both minimum support and minimum confidence [10].
QAR methods are grouped into various categories [11] according to their computational techniques [12]-[16]. Clustering-based approaches are commonly used [15]. Many of them apply a domain partition technique and focus on generating logical intervals using the notion of dense regions.
The difficulty of these methods lies in reaching an optimal partitioning, and they may give rise to information loss. In addition, not all clustering methods scale to high-dimensional cases, particularly when the data are highly skewed and very sparse.
A basic issue in traditional association rule mining is finding frequent patterns in a database. This is even more problematic for geochemical data because of the compositional nature of the data [17], [18], the various dependencies that exist, and large-scale datasets that surpass the processing capability of conventional systems. In addition, geochemical exploration involves a huge number of variables from a relatively large area, and the elements in the real world are more or less associated through certain relationships. Hence, traditional association rules have limitations in processing complex data. As far as we know, no previous research has investigated the identification of geochemical patterns using association rules.
To address this issue properly, it is worthwhile to discover hidden structures in geochemical data, and thereby manage nonlinear and complex relationships, before applying quantitative association rules. Geochemical data usually coexist in heterogeneous geologic systems and are connected with each other in complicated ways, so this step helps to identify significant anomalies appropriately. Furthermore, the geochemical anomalies generated serve as frequent patterns that guide the discovery of significant patterns and the form of the rules. Geochemical patterns also act as conditions for the rules, which eliminates the discovery of certain redundant and uninteresting rules.
This work presents GCQAR, a method to discover significant patterns associated with mineral deposits from massive amounts of input data. The proposed method sequentially applies graph clustering and quantitative association rules. The geochemical anomalies identified are more meaningful in the context of mineralization and have a stronger spatial association with the known deposits in the study area.
This work is organized as follows: Section 2 introduces related works. The geochemical data and pre-processing are described in Sections 3 and 4. Section 5 provides details of our GCQAR method for generating quantitative association rules from geochemical data. In Section 6, the experimental results and comparisons with other approaches are provided. In Section 7, we summarize the advantages and disadvantages of GCQAR observed in the experiments.

II. RELATED WORKS
Various approaches have been proposed to identify geochemical anomalies related to mineralization. Bölviken et al. [19] introduced the application of fractal/multifractal models to quantify the spatial distribution of geochemical data. Later, a variety of fractal/multifractal models were developed on the basis of the scaling characteristics of geochemical data, such as the concentration-area (C-A) fractal model [20], [21], the spectrum-area (S-A) multifractal model [22], and the concentration-distance (C-D) fractal model [23]. Multivariate statistics such as principal component analysis (PCA) [24] and factor analysis (FA) [25] are used to extract information from multivariate geochemical data for mineral exploration. These methods are based on certain idealized assumptions, and because they consider only lower-order, linear features, they fail to capture the complex nature of geochemical data.
A few works in the literature identify geochemical anomalies using machine learning. In supervised learning, Abedi et al. [26] introduced the support vector machine (SVM) to explore the Now Chun porphyry-Cu deposits, located in the Kerman province of Iran. Logistic regression (LR) [27], [28] is used to create a multivariate relationship between dependent variables (e.g., deposits or non-deposits) and independent variables (e.g., faults, geochemical anomalies) to estimate the probability of a specific event related to mineralization. Artificial neural networks [29]-[32] have shown advantages over many other methods in geochemical anomaly recognition. Chen et al. [30] employed a continuous restricted Boltzmann machine (CRBM) to recognize multivariate geochemical anomalies in the Baishan district in northeastern China. Hinton et al. [33] used a deep belief net (DBN) to identify multivariate geochemical anomalies. Carranza and Laborte [34] used random forest for data-driven modeling of mineral prospectivity with a small number of prospects and data with missing values in Abra (Philippines). A combination of m-branch smoothing, C4.5 decision tree and weights-of-evidence techniques was introduced by Chen et al. [35] for mineral prospectivity mapping. In unsupervised learning, a deep autoencoder network was introduced by Xiong and Zuo [29] to encode and reconstruct a geochemical sample population with unknown complex multivariate probability distributions. Unsupervised clustering methods [36]-[40] mainly include k-means clustering [41] and fuzzy c-means clustering [41], [42]. These methods are used to describe the spatial distribution of data and define the locations of anomalies. Fouedjio [43] developed an agglomerative hierarchical clustering approach that considers the spatial dependency between observations. Self-organizing maps (SOM) [44], [45] are used to identify relationships and patterns in multidimensional datasets.
Although studies have been conducted by many authors, this problem is still insufficiently explored.
On the other hand, the massive amount of data and applications has led to the development of numerous methods for generating association relationships. In the literature, most of the existing association rule methods are based on classical methods proposed by Agrawal, Imielinski and Swami such as Apriori [46]-[48], FP-Growth [49] and SETM [50]. These methods are designed to work with Boolean, nominal and categorical values. Apriori is based on candidate generation followed by testing, while other methods such as FP-Growth build a tree without candidate generation and then find the frequent items by scanning the tree. Later, extensive studies were carried out to improve these methods and their applications [46], [51]. However, these methods generate a large number of rules, suffer from the problem of choosing a threshold, and require multiple database scans to calculate the frequency of itemsets, which increases execution time and memory overhead. Besides, rules with numerical attributes cannot be discovered by these methods. Although a number of contributions have been proposed to adapt these methods to QARs, they all require prior discretization, where data are replaced by interval labels using data discretization or concept hierarchies. Such simple discretization may lead to the generation of an enormous number of rules, most of which end up being unrelated or uninteresting. Even though minimum support thresholds help to avoid exploring a good number of uninteresting rules, several of the remaining rules are still not interesting. Another large family of strategies, based on evolutionary algorithms (EA) [52], [53], has been introduced to build a set of QARs. However, these methods require high implementation of knowledge exploration.
A new approach is therefore needed for the generation of frequent patterns from geochemical data. In this study, we propose a three-stage approach to this problem: 1) Implement graph clustering (GC) to generate clusters with significant frequent patterns (or geochemical patterns) from a complex background. 2) Obtain a set of QARs (RuleSet) from the frequent patterns discovered in each cluster. 3) Evaluate the quality of the rules over all clusters, with the aim of selecting the remarkable rules that present the best behavior between variables in the entire dataset.

III. STUDY AREA
The geological map of the study area is provided by the Institute of Geology and Mineral Resources and Development of Henan Bureau. The investigation area is located on the southern margin of the North China Platform and in the middle section of the Huaxiong Tailong Group (Fig. 1). It is an important metallogenic belt of the Yuxi Gold Mine. The stratigraphic zone of the investigation area belongs to the western Henan section of the North China stratigraphic zone, and spans the Xiong'er Mountain Community and the Dianchi-Cheng Mountain Community.
The study area has a typical double-layer structure. The first layer consists of the crystalline basement, which is the Taihua metamorphic complex group. The second layer is the caprock, which comprises, from bottom to top, the Lushan group, Xiong'er group, Guandaokou group, Fuyang group, Luojing group, Sinian, Cambrian, Cretaceous, Paleogene, Neogene, and Quaternary. The Taihua complex is exposed in the core of the Lushan fault and is surrounded by the broad-angled Xiong'er group. The Quaternary system forms the loess area in the southeastern and northwestern fault basins, and the remaining strata are scattered. The metamorphic rocks in the inspection area are well developed, with an exposed area of 340 km², and constitute the crystalline basement of the Lushan fault, which is an important gold-bearing geological body in the area. The metamorphic rocks are composed of six major types of rocks, including slightly metamorphic rock, amphibolites, quartzite, schist, felsic rock, and gneissic granite. Furthermore, the exposed area of the intrusive rocks is about 260 km², of which more than 90% are Neoarchean metamorphic granitoids concentrated in the crystalline basement of the Lushan faulted area. The gabbro, diabase, granite porphyry, indosinian syenite porphyry and late Yanshanian granite porphyry in the Mesoproterozoic bear period are scattered in the caprock zone. The middle Proterozoic Xiong'er volcanic rocks spread throughout the region, accounting for the total of the investigation area. These rocks consist of volcanic lava and volcanic clastic rock. Lava rock can be divided into a calc-alkaline series and an alkali-calcium series according to alkalinity.
The fracture structures in the survey area are well developed and are divided into four groups according to their distribution direction, namely: northeast (NE), northwest (NW), east-west (EW), and north-south (NS). There are more than 20 large-scale fracture zones with a trend of about 60°, dipping to the northwest (NW) or southeast (SE), with an inclination generally between 60° and 80°. The northwestward fracture is relatively developed and occurs in concentrated belts. A considerable part of this fracture is the ore-controlling structure of gold, silver, copper, lead, tungsten, barite and other minerals. The east-west fracture zones are generally of a huge scale, and there are many other normal fractures. Besides, two north-south fracture zones are developed in the area: the Zhuyuangou-Yuwang fracture zone in the west and the Dagugou-Taowangcun fracture zone in the east.
The Neoarchean granitic greenstone terrane and the middle Proterozoic Xiong'er group continental volcanic activity provide a source of gold for the group of gold deposits. The multi-stage tectonic-magmatic thermal events provide conditions for the group of gold deposits. In addition, copper, lead, silver, tungsten polymetallic and non-metallic minerals based on barite have also formed a number of mineral deposits in the investigation area, so the metallogenic conditions are favorable. The characteristics of geochemical elements in the study area are a comprehensive reflection of the geochemical fields in the Xiaoqinling gold ore field and the Xiong'er mountain gold and molybdenum polymetallic metallogenic belt. The distribution of the geochemical elements is uniform or uneven, while the majority of the elements are not highly differentiated. Only W is a strongly differentiated type.
The study area is divided into different regions, and two regions with different geological structures were selected for this study. Gaobeigou area is located in the north of Changshui Township, Luoning Province. The known deposits in this region are W-Ni-Zn-Mo-Au, with large and small W deposits present in the middle of the area.
Gushanling area is located in Chenjiayuan Village and Dashitun Village in the south of Gongqian Township, Shaanxi Province, where the known deposit is Cu.
Within the study area, 12 elements were collected from soil sampling for further analysis, including Ag, As, Au, Bi, Co, Cu, Mo, Ni, Pb, Sb, W, Zn, with a total of 28270 sample points.

IV. DATA PREPROCESSING
Data preprocessing is often problem-dependent and should be applied carefully, since the input data significantly influence the results of many algorithms. It is suggested to prepare data in particular ways before implementing any methods. In addition, geochemical data are reported as compositions and represented as vectors with a constant-sum constraint, typically summing to 100%. This poses a difficulty when looking for statistical correlation in compositional data, because values are relative rather than absolute, and can lead to spurious results [54], [55]. The log-ratio transformation [56] is a solution to the constraints of closed data. An isometric log-ratio (ilr) transformation [57]-[59] was employed to open the raw geochemical data prior to data analysis. The ilr transformation is presented as
$$ z = \sqrt{\frac{r s}{r + s}} \, \ln \frac{g(y_{+})}{g(y_{-})} $$
where g(·) is the geometric mean of the argument, y+ is the group of r parts marked with +1 and y− is the group of s parts marked with −1.
After transformation, standardisation of feature values is required to provide relative measures of scale, and a z-score [60] standardisation was selected for this purpose:
$$ x' = \frac{x - \mu}{\sigma} $$
Here x is the transformed data, µ is the mean and σ is the standard deviation. Later, to avoid high values of the interestingness measures [10] that lead to misleading results, all the inputs are transformed into [0,1] by using
$$ X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} $$
where X* is the normalized value, X is the input value, and X_max and X_min are the maximum and minimum values of X, respectively.
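The three preprocessing steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the choice of which parts go into the +1 and −1 groups, and the sample values are assumptions made for the example, not values from the study.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def ilr_balance(plus_parts, minus_parts):
    """One ilr coordinate between the group marked +1 and the group marked -1."""
    r, s = len(plus_parts), len(minus_parts)
    ratio = geometric_mean(plus_parts) / geometric_mean(minus_parts)
    return math.sqrt(r * s / (r + s)) * math.log(ratio)

def z_score(column):
    """Standardise a column (population standard deviation; assumes it is not constant)."""
    mu = sum(column) / len(column)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / len(column))
    return [(x - mu) / sigma for x in column]

def min_max(column):
    """Rescale a column into [0, 1] (assumes max > min)."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

# Hypothetical samples: (W, Mo, Zn) concentrations, illustrative numbers only.
samples = [(4.0, 1.0, 1.0), (2.0, 2.0, 2.0), (9.0, 3.0, 3.0)]

# Balance of {W} (marked +1) against {Mo, Zn} (marked -1) for every sample.
balances = [ilr_balance([w], [mo, zn]) for w, mo, zn in samples]
normalised = min_max(z_score(balances))
```

Applying `min_max` after `z_score` mirrors the order described in the text; the final values depend only on the ordering and spacing of the balances, since both steps are affine.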

V. PROPOSED METHOD
A. GCQAR
The proposed method sequentially applies graph clustering and quantitative association rules to geochemical data; Fig. 2 describes the conceptual scheme. Graph clustering is first applied to identify geochemical patterns against a complex background (Fig. 3). After the GC step, detailed features in each cluster are examined based on the concept of quantitative association rules (Figs. 4-6), which allow the generation of unknown interrelations present in the clusters being studied. The result of this process can provide a useful coarse-grained representation of the data [61]. It can improve our understanding of the distribution of geochemical patterns and the interactions between the elements. Furthermore, it helps us to learn deeper structures of geochemical data and predict the future behavior of the elements. Details of the graph clustering and quantitative association rules used in this study are given in the following subsections.

1) GRAPH CLUSTERING
Graph clustering [61], [62] is a field in cluster analysis that looks for groups of similar vertices (i.e., nodes) in a graph. Graph clustering represents data as vertices connected to one another by edges with a set of properties. It plays a basic role in modeling meaningful systems in different disciplines [68]. The ultimate goal of graph clustering is to partition vertices into several subgraphs whose vertices are highly cohesive internally but sparsely connected to other subgraphs. A number of approaches aim at discovering natural divisions of the graph, based on different measures of similarity. A more comprehensive description can be found in [63]-[67].
In the present study, we use the modularity optimisation method [68], since it is suitable for handling large datasets. The groups can be quantified in terms of quality functions that give the best split.
Suppose the geochemical dataset is a graph containing n vertices. Each sample point is a node, and edges represent interactions among them. Consider a sparse graph G(V,E) that consists of the node set V and the edge set E. The graph can be divided into two groups using a membership variable s: for a specific partition of the data into two groups, let $s_v = 1$ if vertex v belongs to group 1 and $s_v = -1$ if it belongs to group 2. Let the number of edges between vertices v and u be $M_{vu}$, which will generally be 1 if there is an edge between the two and 0 otherwise. The modularity Q is defined as
$$ Q = \frac{1}{4m} \sum_{v,u} \left( M_{vu} - \frac{k_v k_u}{2m} \right) s_v s_u $$
If edges are randomly placed between vertices v and u, then the expected number of edges is $k_v k_u / 2m$, where $k_v$ and $k_u$ are the degrees of the vertices and $m = \frac{1}{2} \sum_v k_v$ is the total number of edges in the graph, so that $2m = \sum_v k_v = \sum_{vu} M_{vu}$. Up to a multiplicative constant, the modularity Q is given by the sum of $M_{vu} - k_v k_u / 2m$ over all pairs of vertices v, u that fall in the same group. The whole procedure is repeated to subdivide the graph until every remaining subgraph is indivisible and no further improvement in the modularity is possible. In this study, we focus on unweighted graphs.
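As a worked illustration of the two-group modularity, the sketch below evaluates Q for a toy graph made of two triangles joined by a single edge. It assumes the spectral two-group form with the 1/(4m) prefactor from [68]; the function name and the toy graph are hypothetical and are not taken from the study.

```python
def modularity_two_groups(adj, s):
    """Q = (1/4m) * sum_{v,u} (M_vu - k_v*k_u/(2m)) * s_v * s_u.

    adj is a symmetric 0/1 adjacency matrix (unweighted, undirected graph),
    s holds the membership value +1 or -1 of every vertex.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # 2m = sum of degrees
    total = 0.0
    for v in range(n):
        for u in range(n):
            expected = degrees[v] * degrees[u] / two_m
            total += (adj[v][u] - expected) * s[v] * s[u]
    return total / (2 * two_m)  # 1/(4m) == 1/(2 * 2m)

# Toy graph: triangle 0-1-2 and triangle 3-4-5 joined by the edge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy_adj = [[0] * 6 for _ in range(6)]
for a, b in toy_edges:
    toy_adj[a][b] = toy_adj[b][a] = 1

q_split = modularity_two_groups(toy_adj, [1, 1, 1, -1, -1, -1])  # one triangle per group
q_single = modularity_two_groups(toy_adj, [1, 1, 1, 1, 1, 1])    # everything in one group
```

Splitting along the bridge edge gives Q = 5/14, whereas placing all vertices in one group gives Q = 0, which is why the split is preferred.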
The main process of the graph clustering algorithm used in this work is described as follows (Fig. 3):
Input: A graph G(V,E)
Require: unweighted graphs.
1) Initially, each vertex belongs to its own single group.
2) Consider each group pair, and assess the modularity score Q that could be achieved by joining them.
3) Join the two clusters that have positive, large values of the modularity change (ΔQ) [68].
4) Repeat steps 2 and 3 until only one group remains.
5) Return the splits that allowed obtaining the highest modularity score.
Output: The final partitions (disjoint modules).
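The agglomerative steps listed above can be sketched as a deliberately naive procedure that rescans all group pairs at every iteration. It is meant only to make the steps concrete and is not the optimized implementation of [68]; the function names and the toy graph are assumptions for the example.

```python
from itertools import combinations

def partition_modularity(edges, communities):
    """Modularity of a partition: sum over groups of e_c/m - (d_c/(2m))^2."""
    m = len(edges)
    label = {v: i for i, group in enumerate(communities) for v in group}
    internal = [0] * len(communities)    # edges with both ends in the group
    degree_sum = [0] * len(communities)  # total degree of the group
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in range(len(communities)))

def greedy_modularity(nodes, edges):
    """Return (best Q, best partition) found by greedy pairwise merging."""
    communities = [frozenset([v]) for v in nodes]  # step 1: singleton groups
    best = (partition_modularity(edges, communities), list(communities))
    while len(communities) > 1:                    # step 4: until one group remains
        scored = []
        for a, b in combinations(communities, 2):  # step 2: score every pair
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            scored.append((partition_modularity(edges, merged), merged))
        q, communities = max(scored, key=lambda item: item[0])  # step 3: best merge
        if q > best[0]:
            best = (q, list(communities))          # step 5: remember the best split
    return best

# Toy graph: two triangles joined by one edge (illustrative only).
two_triangles = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
best_q, best_groups = greedy_modularity(range(6), two_triangles)
```

On this toy graph the procedure recovers the two triangles as separate groups. Recomputing Q from scratch for each candidate merge keeps the sketch short at the cost of speed, so it is only suitable for small graphs.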
In the initial work [68], the method was used to identify community structure and to reveal the structural features of networks. In the present study, we are specifically interested in delineating geochemical anomalies from a complex background; the result obtained is then used as frequent patterns.
The GC method can find arbitrarily shaped clusters, which matters because geochemical data are often not spherical. Besides, we used modularity to identify disjoint groups, which generally lead to better results than overlapping clusters, keep particular features within the clusters for further analysis, and avoid the generation of redundant rules [69].

2) QARs
Although graph clustering creates groups in which geochemical patterns share some degree of similarity in terms of the quality function known as modularity [68], the relation between the elements remains unclear. In addition, knowing the degree of association among the elements in the graph is important for analysing their behavior. In this section, our interest is in finding significant interrelations among nodes and explaining variations in geochemical datasets, because understanding the interaction between elements through the obtained clusters and exploring the associated mineralization is worthwhile in geochemistry. The question now is: how can we measure the interrelation between two given elements on a graph?
In order to address the question outlined here, we need to develop a new method to quantify the interrelation between the elements.
In this section, we introduce quantitative association rules to find useful information among the vertices. The QAR problem [5] is to identify all interesting rules of the form A → B where A is the antecedent and B is the consequent of the rule, A, B ⊆ I and A ∩ B = ∅. I represents itemset, A and B represent the set of items.
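For readers unfamiliar with the rule form A → B, the sketch below shows the classical record-based definitions of support and confidence for a quantitative rule whose antecedent and consequent are interval conditions on disjoint attributes. It illustrates the general QAR setting of [5] only; it is not the graph-based support and confidence defined later in this section, and the records, attribute names and intervals are hypothetical.

```python
def matches(record, conditions):
    """True if every attribute of the record lies inside its [low, high] interval."""
    return all(low <= record[attr] <= high
               for attr, (low, high) in conditions.items())

def support(records, conditions):
    """Fraction of records that satisfy all interval conditions."""
    return sum(matches(r, conditions) for r in records) / len(records)

def confidence(records, antecedent, consequent):
    """support(A and B) / support(A); A and B use disjoint attributes."""
    both = {**antecedent, **consequent}
    return support(records, both) / support(records, antecedent)

# Hypothetical normalised concentrations in [0, 1].
records = [
    {"W": 0.9, "Au": 0.8},
    {"W": 0.8, "Au": 0.2},
    {"W": 0.1, "Au": 0.9},
    {"W": 0.7, "Au": 0.7},
]
rule_a = {"W": (0.6, 1.0)}   # antecedent A: W in [0.6, 1.0]
rule_b = {"Au": (0.6, 1.0)}  # consequent B: Au in [0.6, 1.0]

rule_support = support(records, {**rule_a, **rule_b})
rule_confidence = confidence(records, rule_a, rule_b)
```

Here two of the four records satisfy both conditions and three satisfy the antecedent, so the rule has support 0.5 and confidence 2/3.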
The learning phase of QARs used in this work consists of the following steps (Fig. 4):
• Obtain a set of QARs for each cluster into which the input dataset is divided. The antecedent and consequent of the rules are arbitrarily selected. Besides, the length of the rules is always fixed to the number of nodes in each partition (Fig. 5(a)).
• Evaluate the quality of the rules over all splits, using the concept of support and confidence [70]. We focus on the following rule: if two elements are strongly related across all splits, their relationship may lead to significant patterns (mineralization).
• Obtain the local support (L.Sup) of each rule in the ruleset. Any rule in a partition that does not satisfy a minimum threshold is removed (Fig. 6).
• Finally, the ruleset from each cluster is collected. Then, the local results generated (i.e., the local supports of the ruleset) are merged to compute the global final result, the global support of the ruleset (G.Sup) (Fig. 6).
However, the vertices lack additional attributes, and there is nothing in the nodes themselves that allows the computation of a relationship. A path from one vertex to another is a sequence of edges (Fig. 5(b)). Considering this information, we define the local support (L.Sup) as the probability of finding a sequence of internal edges ''e'' between each pair of vertices (i.e., elements) A and B in the same cluster. The global support (G.Sup) represents the ratio of the number of internal edges between two elements to the number of clusters (NC).

L.Sup(A → B) =
And confidence is defined as follows: where $E^{e}_{l}$ contains the internal edges ''e'' belonging to the lth cluster, and global support(A) is the ratio of the probability distribution of |A| to the number of clusters (NC).
On the other hand, in graph clustering the number of edges exceeds the number of nodes; thus, to avoid high values of support and confidence, only one edge is counted for each vertex (the edge that starts from the antecedent of the rule (Fig. 5(b))) instead of all of them.
In addition, an edge between two given nodes can be defined with the adjacency matrix M, whose elements are $M_{A,B} = 1$ when there is an edge from vertex A to vertex B and $M_{A,B} = 0$ when there is no edge (Fig. 5(b)). In this paper, the edges from a vertex to itself (loops) are ignored.
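The adjacency-matrix convention described above can be illustrated with a short sketch that builds M from an edge list and skips loops. The function name, the vertex labels and the edge list are hypothetical and serve only as an example.

```python
def adjacency_matrix(vertices, edges):
    """Build M with M[A][B] = 1 for an edge from A to B; loops are ignored."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for a, b in edges:
        if a == b:  # an edge from a vertex to itself is skipped
            continue
        matrix[index[a]][index[b]] = 1
    return matrix

# Hypothetical element graph; the ("Mo", "Mo") loop is dropped.
elements = ["W", "Au", "Mo"]
element_edges = [("W", "Au"), ("Au", "Mo"), ("Mo", "Mo")]
M = adjacency_matrix(elements, element_edges)
```

Each entry is set only in the direction given by the edge, from the first vertex (the antecedent side) to the second.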
The resulting QARs are presented as follows:
• If the confidence is more than 50%, the relation is very significant and the edges between the two elements are effective.
• If the confidence is more than 39%, the relation is significant and the edges are mostly effective.
• If the confidence is more than 10%, the relation is low and the edges are ineffective.
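The confidence thresholds stated above can be expressed as a small grading function. The function name and the label returned for confidences of 10% or less are assumptions, since the text does not say how such rules are treated.

```python
def grade_relation(confidence):
    """Map a confidence in [0, 1] to the relation levels listed in the text."""
    if confidence > 0.50:
        return "very significant"
    if confidence > 0.39:
        return "significant"
    if confidence > 0.10:
        return "low"
    return "ungraded"  # assumption: the text gives no level at or below 10%

# Example with a hypothetical confidence value.
level = grade_relation(0.45)
```

Because the comparisons are strict, a confidence of exactly 0.50 is graded as significant rather than very significant.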

VI. EXPERIMENTAL ANALYSIS
In our experiments, we apply GCQAR to regional geochemical pattern recognition for W-Zn-Mo-Ni-Au from 896 soil samples and for Cu-Zn-Co-Pb-Ni-Ag from 1136 soil samples in the Gaobiegou and Gushenling areas, respectively, in Xiaoshan, Henan province, China.
In this section, we compare GC based on the modularity optimization method [68] with spectral partitioning, which is used to generate overlapping groups (Luxburg) [71], Danon's greedy agglomerative community detection method (Martelot and Hankin) [72], and K-means (Serra and Tagliaferri) [73], which is widely used as a partitioning method in geochemistry, in order to demonstrate the features and operation of the proposed method for knowledge discovery in geochemistry.
Algorithm 2 Confidence of the Ruleset
Input: Ruleset, a set of rules discovered; Local support of rule; Local support of antecedent of rule;
Require: Confidence in NC 1 100%;
1. For each rule ∈ Ruleset do
2. Compute Confidence of rule.
3. end for
The results were coded by lithology using the MAPGIS software package [74]. The experimental environment comprises an Intel Core i7-8550U 4.0-GHz CPU and 8 GB RAM.

A. STATISTICAL ANALYSIS
Statistical methods were used to describe the critical geochemical patterns [55]. Descriptive statistics such as the mean, maximum and minimum were computed for the twelve elements (Tab. 1).
The element concentrations are not normally distributed for the Gaobiegou and Gushenling areas (Figs. 7(a), 8(a)).
Figs. 7(b), 8(b) show the histograms of the data after ilr transformation. It can be seen that the distribution of the data has changed significantly.

B. GRAPH CLUSTERING RESULTS
The visualization of the graph clustering results is shown in Figs. 9-16 for Gaobiegou and Gushenling data, respectively.
VOLUME 8, 2020
FIGURE 10. The partitions of the geochemical anomaly generated by (a) spectral clustering algorithm, (b) modularity maximization algorithm, and geochemical anomaly maps obtained by (c) spectral clustering algorithm, (d) modularity maximization algorithm, for Gaobiegou area.
Figs. 9, 13 present the clusters generated by the modularity maximization algorithm; the final separation is achieved at a parameter value of 0.5, shown on the x-axis.
The Danon algorithm and the modularity maximization algorithm can automatically discover the optimal number of clusters, and their results are very close. The spectral clustering algorithm requires the maximum number of clusters to be provided. Figs. 10, 14 (a,b) present the clusters of the geochemical anomaly generated by the spectral clustering and modularity maximization algorithms for the Gaobiegou and Gushenling areas, respectively. The x-axis values describe the id of each node, and the y-axis values describe the number of clusters.

1) RESULTS USING DATA OF GAOBIEGOU AREA
In the Gaobiegou area (Fig. 10d), the geochemical anomalies are typically detected in a stratigraphic unit consisting of a set of metamorphic sedimentary clastic rocks divided into two lithologic sections, and they fit the tungsten deposit well. In Fig. 10b the anomalies are characterised by large size, high intensity and an obvious concentration center, and they show a ring shape at the W deposit, which deserves more attention. Both the spectral clustering (Fig. 10c) and k-means (Fig. 11a) methods identify anomalies in different locations and with different shapes, but with low intensity. Fig. 12 shows the distribution of each of the four clusters of the Gaobiegou area presented in Fig. 10d.
In cluster 1, the geochemical anomalies are mainly detected at the center of the investigation area, appear with a ring shape, and cover the W deposit well. The anomalies are also spread along the faults and are related to faulting activities. In clusters 2 and 4, the geochemical anomalies are primarily identified in the Yangsigou Rock Group. This is a set of metamorphic sedimentary clastic rocks, divided into two lithologic sections. The lithology of the lower rock section is black cloud and shallow granulite. It contains dolomitic shallow-grained rocks with dolomite schist and the enrichment element W is relatively higher in these rocks.
FIGURE 13. The visualization of (a) graph clustering generated by modularity maximization algorithm, and (b) partitions produced by danon algorithm, modularity maximization algorithm and spectral clustering, for Gushenling area.
In cluster 3, the geochemical anomalies are generally recognised at Xushan Group, and are slightly distributed at Yangsigou Rock Group. The Xushan Group is a set of medium-acid volcanic lava, which is mainly characterized by surface overflow, good layering, and forming a clear stacking layer. There are two lithological sections from bottom to middle. The lower section is gray-green andesite, andesite shale. The middle section is mainly the andesite porphyrite of the Great porphyry, where deposits of Au, Ni, Mo, and Zn are hosted. The elements W and Zn are higher in these sections. Besides, the anomalies are detected at the green amphibolite metamorphic domains of Fuping period, mainly the slanted amphibolite.

2) RESULTS USING DATA OF GUSHENLING AREA
In the Gushenling area (Fig. 14d), the geochemical anomalies are typically detected in magmatic rock, where volcanic activity provides a source of deposits. In addition, the anomalies are spread along the rivers and are characterised by high intensity and an obvious concentration center. In Fig. 14c the anomalies detected by spectral clustering are obvious, but show low intensity at the same locations. However, the anomalies identified by the k-means method (Fig. 15a) are generally detected on the west side of the investigation area. Fig. 16 shows the distribution of each of the four clusters of the Gushenling area presented in Fig. 14d.
In cluster 1, the geochemical anomalies are typically detected in magmatic rock of the Xushan group, where the lithology is divided into two sections. The first (middle) section is mainly the porphyrites of the Great porphyry. The second (upper) section is andesites and almond-shaped andesites. The porphyry is the primary cause for the presence of mineral deposits such as Cu, Zn, and Pb. The anomalies fit the Cu deposit well. Besides, the anomalies are distributed along the rivers, which provide a source of mineral deposits.
In cluster 2, the geochemical anomalies are generally distributed along faults and are related to the faulting activity of the investigation area. Furthermore, the fault occasionally opens and releases pulses of high-pressure fluid toward the top. This fluid is particularly rich in the elements of interest and is important in hosting mineralization. In addition, the geochemical anomalies fit the Cu deposit well.
In clusters 3 and 4, the geochemical anomalies are distributed in the magmatic rocks, which in general contain mineral deposits.

C. QUANTITATIVE ASSOCIATION RULES
In this section, we apply quantitative association rules to reveal important details within each cluster, using the support and confidence measures (Section V). The minimum support was fixed according to the proportion of each cluster. To balance the reliability and the number of the rules generated, the minimum values for the support and confidence measures were set to 0.1 and 0.4, respectively.
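As a rough illustration of how a rule can be checked against these thresholds, the following sketch computes support and confidence for a single interval-based rule inside one cluster. It assumes the standard definitions (support is the fraction of samples satisfying both sides of the rule; confidence is that count divided by the number of samples satisfying the antecedent), which may differ in detail from the measures defined earlier in the paper. The function names, interval bounds, and sample values are hypothetical and are not taken from the study data.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # closed interval [low, high]
Sample = Dict[str, float]        # element name -> measured concentration

MIN_SUPPORT = 0.1       # minimum support quoted in the text
MIN_CONFIDENCE = 0.4    # minimum confidence quoted in the text


def matches(sample: Sample, condition: Dict[str, Interval]) -> bool:
    """Return True if every element in the condition lies inside its interval."""
    return all(low <= sample[elem] <= high
               for elem, (low, high) in condition.items())


def support_confidence(samples: List[Sample],
                       antecedent: Dict[str, Interval],
                       consequent: Dict[str, Interval]) -> Tuple[float, float]:
    """Compute (support, confidence) of the rule antecedent -> consequent."""
    n_total = len(samples)
    n_ante = sum(matches(s, antecedent) for s in samples)
    n_both = sum(matches(s, antecedent) and matches(s, consequent)
                 for s in samples)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def is_accepted(support: float, confidence: float) -> bool:
    """Keep a rule only if it reaches both minimum thresholds."""
    return support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE


# Hypothetical toy cluster; the values are made up for illustration only.
cluster = [
    {"W": 5.0, "Au": 2.0},
    {"W": 6.0, "Au": 0.5},
    {"W": 1.0, "Au": 2.5},
    {"W": 7.0, "Au": 3.0},
    {"W": 0.5, "Au": 0.2},
]
rule_antecedent = {"W": (4.0, 8.0)}     # W in [4.0, 8.0]
rule_consequent = {"Au": (1.5, 3.5)}    # Au in [1.5, 3.5]

support, confidence = support_confidence(cluster, rule_antecedent, rule_consequent)
print(f"support={support:.2f}, confidence={confidence:.2f}, "
      f"accepted={is_accepted(support, confidence)}")
```

In this toy cluster, three of the five samples satisfy the antecedent and two of those also satisfy the consequent, so the rule passes both thresholds.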
The results obtained by QARs are shown in Tables 2 and 3. Figs. 17 and 18 show the local support of the elements in each cluster. The proposed QARs build a set of rules that cover different areas of the problem, which allows us to understand the anomalies generated.
In the Gaobiegou area, the concentrations of tungsten (W) and gold (Au) are high in all four clusters. Meanwhile, molybdenum (Mo), nickel (Ni), and zinc (Zn) are concentrated in three clusters.
As can be observed in Table 2 and Fig. 17, very significant associations exist for W-Au and Au-Ni, with confidence of 0.67 and 0.55, respectively.
The strong W-Au association can be explained by the fact that tungsten occurs together with gold in vein deposits associated with granites, and can also be associated with various lithologies. There are, however, other possible explanations related to the geological process [75]. Significant associations are also found for Mo-Ni, W-Mo, and Au-Mo, with confidence of 0.50, 0.50, and 0.41, respectively, while the association of Mo with Zn is slightly lower, with confidence of 0.37.
Hence, the anomaly values are divided into three categories: high anomaly (> 0.50), moderate anomaly (0.39-0.50), and low anomaly (≥ 0.10) (Fig. 19). The high anomaly area occupies 3.4% of the total area, and the moderate anomaly area occupies 39.1%.
In the Gushenling area, the concentrations of silver (Ag) and lead (Pb) are high in all four clusters, and copper (Cu) and cobalt (Co) are concentrated in three clusters. Meanwhile, nickel (Ni) and zinc (Zn) are concentrated in only two clusters.
As can be observed in Table 3 and Fig. 18, a very significant association exists between Ag and Pb, with confidence of 0.7. These elements are typically geochemically coherent, and their strong association probably indicates similar characteristics in the hydrothermal mineralization process and a common geological origin. A significant association exists between Cu and Pb, with confidence of 0.62. This strong association suggests that their presence is genetically related to the volcanic and/or subvolcanic quartz porphyry. Very significant to significant associations are found for Ag with Ni (0.41), Ag with Cu (0.42), Cu with Co (0.42), and Cu with Ni (0.41). Meanwhile, the association of Ag with Zn is lower, with confidence of 0.25. This can be explained by the low Zn concentration of the samples in the study area.
Thus, the anomaly values are divided into three categories: high anomaly (> 0.50), moderate anomaly (0.39-0.50), and low anomaly (≥ 0.10) (Fig. 20). In the Gushenling area, the high anomaly area occupies 2.4% of the total area, and the moderate anomaly area occupies 40.1%.
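The three-way categorization can be sketched as a simple thresholding step, followed by a count of how much of the map falls in each class. This is only an illustrative sketch under two assumptions that the text does not state explicitly: the low class is read as values from 0.10 up to (but not including) 0.39, and every grid cell is taken to represent an equal area. The function names and the input values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def anomaly_class(value: float) -> str:
    """Map an anomaly value to a category using the thresholds in the text."""
    if value > 0.50:
        return "high"
    if value >= 0.39:
        return "moderate"
    if value >= 0.10:
        # Assumption: "low" covers 0.10 up to the moderate lower bound.
        return "low"
    return "below threshold"


def area_share(values: Iterable[float]) -> Dict[str, float]:
    """Fraction of cells per category, assuming equal-area grid cells."""
    labels = [anomaly_class(v) for v in values]
    total = len(labels)
    if total == 0:
        return {}
    return {label: count / total for label, count in Counter(labels).items()}


# Hypothetical anomaly values for eight grid cells (illustration only).
grid_values = [0.67, 0.55, 0.50, 0.41, 0.37, 0.25, 0.12, 0.05]
shares = area_share(grid_values)
for label, share in shares.items():
    print(f"{label}: {share:.1%} of cells")
```

On real data, the shares computed this way would correspond to the area percentages reported for each anomaly class only if the grid cells are of equal size.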
The results show that GCQAR achieves superior results compared with k-means and spectral clustering. Our results therefore shed new light on learning normal element behavior and highlighting the related anomalies in geochemical data.

VII. CONCLUSION AND FUTURE DIRECTIONS
In this study, the GCQAR method was implemented to recognize geochemical anomalies. The proposed method sequentially applies graph clustering and quantitative association rules. The results of this work lead to the following conclusions:
The hybrid methodology combining graph clustering and QAR is a useful method for recognizing geochemical anomalies. Graph clustering is used to segment data into meaningful groups, and QAR is performed to learn the normal behavior of the elements and to highlight anomalies related to them.
GCQAR offers significant benefits in recognizing significant geochemical patterns compared with the traditional methods used in the field of geochemistry.
GCQAR can be used not only to delineate geochemical anomaly zones, but also to improve our understanding of mineralization. It can be a very suitable method for examining nonlinear and complex relationships caused by a variety of geological processes. Thus, GCQAR is a promising method for geochemical problems.
It can cluster high-dimensional data and provide the most suitable intervals of values belonging to the rules without a discretization step. Moreover, it helps find reduced sets of significant rules from large datasets. This study helps bridge a knowledge gap in recognizing geochemical patterns formed over various lithologies. Despite the success demonstrated, a significant limitation of GC (modularity maximization algorithm) is that it is time consuming.
More broadly, further research is needed to determine negative quantitative association rules. In future work, we plan to use association rules to isolate overlapping groups by analysing the relations between the external edges and by considering negative quantitative association rules.