Research on Network Public Opinion in War Damage Incident of Major Water Conservancy Projects

Major water conservancy projects face the risk of damage from conflict, and the potential consequences of dam failure can lead to widespread public outrage, posing a threat to social stability. This study analyzed the formation model of network public opinion risk, using the Kakhovka Hydropower Station, which was damaged during conflict, as a case study. Based on an analysis of netizens’ comments on Twitter and Weibo (a Chinese social media platform similar to Twitter), we used data mining algorithms to identify the primary topics of interest and examine the evolution of public opinion. The findings indicate that attention to war damage incidents has a negative impact on social stability through a feedback loop. Netizens on Twitter and Weibo focused primarily on the “Flood,” “Crisis,” “Charge,” and “Help” topics. Of the two platforms, Twitter showed higher public opinion intensity and faster dissemination. “Crisis” and “Charge” were more likely to generate public opinion risks. The potential impact of war damage on social stability is noteworthy, and social stability depends largely on the attention level of netizens.


I. INTRODUCTION
Major water conservancy projects face a high risk of being targeted or utilized as weapons in war [1]. The destruction of the Ruhr Dam in Germany during World War II caused significant damage to local industries. Likewise, the catastrophic collapse of the Huayuankou Dam in China had a disastrous impact on the surrounding region. In recent years, global instability has risen, leading to frequent incidents of war damage, particularly those involving major water conservancy projects. On June 6, 2023, the Kakhovka Hydropower Station in Ukraine was destroyed, affecting over 16,000 people. Such incidents bring unforeseen risks to residents, who become stakeholders and accelerate the escalation and dissemination of the situation. The dissemination of potential risks through interest groups on social networks increases public concern about the negative outcomes linked to such incidents. This affects online public sentiment and may even disturb social stability [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Qilian Liang.
Scholars have extensively studied the potential impact of water conservancy projects on social stability. To evaluate the risk factors associated with these projects, complex systems are often used to create evaluation index models at the macro level. For instance, the physical-mature-human theory has been applied to land requisition and resettlement in water conservancy projects to determine the index values of risk factors [3]. Furthermore, an evaluation of indicators for hydropower projects has been conducted that simplifies the indicators and enhances their objectivity [4]. However, macro models lack a comprehensive analysis of the interactions among social network participants. They therefore cannot reveal the complex networks built on social connections or the impact of stakeholder interaction on conflict escalation.
Researchers have therefore turned to micro-level studies, which uncover the macro features of interaction relationships through social networks. Specifically, they analyze the pertinent risks and mutual connections among stakeholders and identify crucial social stability risks [5]. Moreover, the NetLogo simulation platform has been applied to examine the propagation of risks through social networks based on interpersonal connections [6]. Micro-level model simulations reveal that stakeholder interactions over social networks accelerate the development and dissemination of potential hazards.
To align simulation results with engineering practice, researchers have implemented various enhancements to social network models. For example, the behavior of the public in five different risk states was simulated using an agent-based model [2], and the causality of social stability risk variables was explored through fault tree analysis [7]. Additionally, the complexity of stakeholder attention and the uncertainty arising from stakeholder diversity were addressed using a Bayesian network [8]. However, these updated models limit the stakeholders to network nodes based on interpersonal relationships within the project and region, without considering any subjective evaluations. Consequently, social network models have failed to account for the role of network public opinion, resulting in disparities between engineering practice and simulation outcomes.
It has been found that network public opinion can amplify conflicts between stakeholders and exacerbate risks to social stability. For example, conflicts over hydropower development in a certain region have sparked an anti-small hydropower movement at the societal level [9]. Researchers have also found that studying network public opinion can better identify stakeholders and analyze conflicts of interest. Druckman [10] examined the polarization of public opinion during the COVID-19 crisis to understand how American citizens hold different positions on policies and parties. Lyu [11] collected Twitter users' views on hashtags such as #StopAsianHate to identify supportive and opposing groups in the movement. Meanwhile, Lamsal et al. and Khan et al. [12], [13] have used social media platforms such as Twitter and Weibo to mine target groups and analyze public attitudes. Su and Chen [14], [15] have compared the behaviors of Twitter and Weibo user groups to analyze cognitive differences. Clearly, network public opinion plays a vital role in identifying stakeholders, clarifying interaction relationships, and addressing uncertainties.
The aforementioned studies outline the potential of network public opinion for assessing social stability risks. However, current research in this area is limited and primarily focuses on evaluating conventional risks inherent in the project, neglecting sudden risks associated with network public opinion. Furthermore, comprehensive research assessing the social stability risks posed by incidents of war damage is lacking. In particular, the views of Twitter (English) and Weibo (Chinese) netizens on such incidents, their public opinion patterns, and the development of their topics of interest remain unexamined.
Therefore, this paper examines war damage incidents at major water conservancy projects to better understand their impact on social stability. Section I examines how network public opinion risks are formed. Section II details the data collection process and the research methods employed. We then analyze the topics of concern and the evolution patterns among netizens. The paper closes with conclusions, and the findings are significant for guiding online public opinion and reducing the risk of social instability.

A. RISK INFLUENCE MECHANISM
During times of war, major water conservancy projects face a substantial risk of war damage, which can result in catastrophic flooding. Floods are among the most severe natural disasters, and dam failure or collapse can cause the most devastating flooding. Albu [16] examined the dangers of floods caused by breaches in dams, while Wang [1] assessed dams' susceptibility to underwater explosions. For instance, the Kakhovka Hydropower Station is located in a conflict zone between Russia and Ukraine. The collapse of the dam, resulting from war damage, inflicted catastrophic downstream damage, which ignited public outrage on social media.
Objectively, war damage poses a significant risk to the public. According to social psychology, the public perceives natural disasters and war damage differently [17]. The most severe psychological trauma comes from war damage and other ''man-made disasters'', which bring the public closer to the disaster. Through social media, this trauma can affect various tiers of society, creating public opinion risks and leading to unpredictable incidents [18].
In 1958, Forrester [19] introduced System Dynamics (SD), which proposes that a system's character and behavior depend substantially on its internal dynamic structure and feedback mechanism. SD is an appropriate approach for investigating the dynamic principles of the propagation of public opinion based on social network ties [20]. Therefore, this paper uses an SD model to analyze the rules that govern the generation of public opinion risk. It views netizens, online media, and society as subsystems and explores the impact of public opinion on social stability, as illustrated in Figure 1.
(a) The netizen subsystem forms a positive feedback loop between the proliferation of network public opinion and the growing attention of netizens. The incident triggered debates on Twitter and Weibo, resulting in a rise in netizen posts and greater involvement, ultimately intensifying network public opinion.
(b) The online media subsystem also operates as a positive feedback loop. Online news coverage shapes netizens' views and behaviors, while netizens' increased attention to an incident leads online media to publish more news. Consequently, online media participation intensifies, further increasing the intensity of network public opinion.
(c) The social subsystem forms a negative feedback loop. Increased social attention results in a higher frequency of news releases, demonstrating greater social participation and an increase in network public opinion intensity. However, this surge in public opinion can harm social stability [18]. Consequently, the resulting rise in social instability further elevates social attention.
(d) The overall system consists of three subsystems, two of which are positive feedback systems, while one is a negative feedback system. The attention of netizens serves as the initial factor of the entire system, and the resultant network public opinion carries potential risks to social stability.
To sum up, the main objective of this study is to analyze the topics that draw the attention of netizens and to examine the evolution of network public opinion.

B. DATA COLLECTION
Twitter and Weibo are two notable social media platforms with comparable influence, functionality, and target audiences. Despite these similarities, they remain independent platforms that offer valuable insights into network public opinion and together support a comprehensive analysis.
The data were collected from Twitter (English-speaking netizens) and Weibo (Chinese-speaking netizens) over a 30-day period from June 7 to July 6, 2023, accounting for time differences. The search term ''Kakhovka'' was used, and a Python crawler was employed to capture comment text data hourly. In total, 16,679 Twitter comments and 4,052 Weibo comments were collected, forming the output text set D.
To ensure accurate analysis of netizens' comments, it is crucial to eliminate redundant and duplicate data. This is achieved in three steps: eliminating duplicates, filtering, and cleaning. First, repeated comments are filtered so that only the text of the first comment is retained, to ensure objectivity. Second, string filtering is used to remove comments irrelevant to the incident, together with invalid symbols, punctuation, and meaningless text. Finally, the previous steps are repeated, and the filters are modified, extended, or pruned based on the actual results. For instance, strings such as ''coupons'' and ''welfare'' are removed because they are unrelated to the topic.
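The three cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' crawler pipeline: the function name `clean_comments`, the regular expressions, and the choice to lowercase before comparing are all assumptions; only the example spam strings ''coupons'' and ''welfare'' come from the text.

```python
import re

# Example spam markers named in the text; a real list would be longer.
SPAM_TERMS = ("coupons", "welfare")

def clean_comments(comments, spam_terms=SPAM_TERMS):
    """Clean, filter, and deduplicate raw comment strings (hypothetical helper)."""
    seen = set()
    cleaned = []
    for text in comments:
        # Cleaning: drop URLs, then symbols and punctuation, then extra spaces.
        norm = re.sub(r"https?://\S+", " ", text.lower())
        norm = re.sub(r"[^\w\s]", " ", norm)
        norm = re.sub(r"\s+", " ", norm).strip()
        if not norm:
            continue  # nothing meaningful is left
        # Filtering: discard comments that contain an off-topic marker.
        if any(term in norm for term in spam_terms):
            continue
        # Deduplication: keep only the first occurrence of a comment.
        if norm in seen:
            continue
        seen.add(norm)
        cleaned.append(norm)
    return cleaned
```

In practice the spam list would be grown iteratively, which mirrors the "repeat and adjust" step described above.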

C. NOISE REDUCTION
The commenting behavior of netizens on Twitter and Weibo is characterized by high frequency, low volume, and high noise. To mitigate the effects of this noise, the document frequency ($DF_n$) can be reduced using the least common term (LCM) threshold, and the low-frequency feature set can be processed more effectively using the information gain (IG) method. Therefore, this paper integrates the two into a $DF_n$-IG function to perform noise reduction on the text set $D$, following equation (1). Figure 2 displays the noise reduction outcomes for various thresholds $n$. As the frequency threshold increases, the number of terms eliminated in each pass gradually decreases. The optimal noise reduction effect is achieved at $n=7$ for both Chinese and English keywords. Therefore, the minimum term frequency threshold is set to 7, and words with frequencies below 7 are removed from the text set $D$. Ultimately, 3127 words remain for Twitter and 2135 for Weibo.
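The final thresholding step (removing words whose frequency falls below 7) can be illustrated as follows. This sketch covers only that step and does not implement the DF-IG scoring itself; the function name `drop_rare_terms` and the representation of each text as a list of tokens are assumptions.

```python
from collections import Counter

def drop_rare_terms(tokenized_texts, min_freq=7):
    """Remove every token whose corpus-wide frequency is below min_freq."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    vocab = {term for term, c in counts.items() if c >= min_freq}
    filtered = [[tok for tok in text if tok in vocab] for text in tokenized_texts]
    return filtered, vocab
```

Sweeping `min_freq` over a range and recording `len(vocab)` gives a curve of the kind the text describes for choosing the threshold.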

D. KEYWORD EXTRACTION
Public opinion keywords should precisely reflect the event's characteristics and scope. The term frequency-inverse document frequency (TF-IDF) algorithm identifies significant keywords from the high-frequency words in the denoised text. As a statistical analysis method for words, it differs from traditional word frequency statistics. Under TF-IDF, a word's importance is directly proportional to its frequency in a text and inversely proportional to its frequency in the corpus. It effectively filters out common words, identifies infrequent feature words, and highlights important words that reveal event characteristics and define event scope [21].
Identifying keywords is fundamental to topic analysis. After several topic cluster analyses, we found that keeping keywords with a weight greater than 0.01 leads to better convergence and more comprehensive coverage of the core views of public opinion. The equations are as follows:
$$TF_{i,j} = \frac{N_{i,j}}{\sum_{k} N_{k,j}} \tag{2}$$
$$IDF_{i,j} = \log \frac{|D|}{1 + |\{j : t_i \in d_j\}|} \tag{3}$$
$$TF\text{-}IDF_{i,j} = TF_{i,j} \times IDF_{i,j} \tag{4}$$
where $TF_{i,j}$ denotes the frequency of keyword $t_i$ in text $j$, $N_{i,j}$ is the number of occurrences of $t_i$ in text $j$, and $\sum_{k} N_{k,j}$ is the total number of occurrences of all terms in text $j$. $IDF_{i,j}$ reflects how rarely $t_i$ occurs across the text set $D$, $|D|$ denotes the total number of texts in $D$, and $|\{j : t_i \in d_j\}|$ is the number of texts containing $t_i$; 1 is added to the denominator to avoid division by zero when no text contains $t_i$. The product $TF_{i,j} \times IDF_{i,j}$ indicates the importance of $t_i$.
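As a concrete illustration of the weighting just described (term frequency within a text, multiplied by a log inverse document frequency with 1 added to the document count), the following standard-library sketch computes one weight per term per text. The function name `tf_idf` and the token-list input format are assumptions, and the natural logarithm is assumed because the text does not state the base.

```python
import math
from collections import Counter

def tf_idf(tokenized_texts):
    """Return one {term: weight} dict per text."""
    n_texts = len(tokenized_texts)
    # Document frequency: number of texts that contain each term.
    doc_freq = Counter()
    for text in tokenized_texts:
        doc_freq.update(set(text))
    weights = []
    for text in tokenized_texts:
        counts = Counter(text)
        total = sum(counts.values())
        weights.append({
            term: (c / total) * math.log(n_texts / (1 + doc_freq[term]))
            for term, c in counts.items()
        })
    return weights
```

With this smoothing, a term that appears in every text receives a negative weight, so a cutoff such as the 0.01 mentioned above also discards ubiquitous words.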

E. PERPLEXITY ANALYSIS
The optimal number of clusters for the extracted keywords is determined using the perplexity function. This function evaluates the distribution of the probability model, and lower perplexity indicates a better clustering effect.
When the perplexity curve reaches its lowest point while the number of topics is still relatively small, that number is taken as the value of $k$. The keywords are then clustered into $k$ topics using the K-Means algorithm. The calculation equation is as follows:
$$\mathrm{Perplexity}(D) = \exp\left(-\frac{\sum_{n=1}^{|D|} \log P(w_n)}{\sum_{n=1}^{|D|} D_n}\right) \tag{5}$$
where $n$ indexes the $n$th text, $P(w_n)$ is the occurrence probability of each keyword in the $n$th text, and $D_n$ is the total number of words in the $n$th text.
The perplexity values of the text set D are calculated by (5). As the number of topics increases, the perplexity curve initially decreases and subsequently increases, as shown in Figure 3. Notably, both the Chinese and English perplexity curves reach their lowest point at 8 topics, indicating optimal clustering at this point. Therefore, we set the number of clustering topics to k=8.
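The perplexity computation can be sketched as the exponential of the negative average log-probability per word. The text does not say how the keyword probabilities are obtained for each candidate k, so this sketch simply takes them as an input; the function name `perplexity` and the `word_prob` mapping are assumptions.

```python
import math

def perplexity(tokenized_texts, word_prob):
    """exp(-sum of log P(w) over all words / total number of words)."""
    log_likelihood = 0.0
    n_words = 0
    for text in tokenized_texts:
        for tok in text:
            log_likelihood += math.log(word_prob[tok])
            n_words += 1
    if n_words == 0:
        raise ValueError("perplexity is undefined for an empty text set")
    return math.exp(-log_likelihood / n_words)
```

Evaluating this for each candidate number of topics and taking the minimum corresponds to the selection rule described above.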

F. CLUSTER ANALYSIS
The K-Means algorithm, with the number of clusters chosen by the perplexity function, groups keywords into clusters and is suitable for the large datasets and complex contexts of netizen comment text [22]. This clustering method measures sample similarity, seeking the lowest similarity between clusters and the highest similarity within clusters. The closer the distance, the greater the similarity.
Initially, define $c_i$ ($i = 1, \dots, k$) as the $i$th cluster, with $k$ cluster centers, and $d_j$ ($j = 1, \dots, n$) as the $j$th text, where $n$ is the total number of texts in $D$. Next, the distance between each text in $D$ and each cluster center is calculated. Each text is assigned to the cluster with the nearest center, and the recalculated mean of each cluster forms its new center. This process is repeated until the K-Means clustering algorithm converges.
Finally, combined with the results of the perplexity analysis (k=8), the K-Means algorithm is used to cluster the extracted keywords. The calculation equation is as follows:
$$E = \sum_{i=1}^{k} \sum_{d_j \in c_i} \lVert d_j - u_i \rVert^2, \qquad u_i = \frac{1}{|c_i|} \sum_{d_j \in c_i} d_j$$
where $c_i$ is the $i$th of the $k$ clusters, $d_j$ is the $j$th text in text set $D$, and $u_i$ is the mean of the keywords in cluster $c_i$. The clustering objective is to minimize the squared error $E$; the model distribution improves as $E$ decreases.
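The assign-and-update loop and the squared-error objective E can be illustrated with a small pure-Python K-Means. This is a generic textbook sketch rather than the authors' implementation: how keywords are turned into numeric vectors is not stated in the text, so the points here are plain numeric tuples, and the random initialization with a fixed seed is an assumption.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, n_iter=100, seed=0):
    """Cluster numeric tuples; return (centers, clusters, squared error E)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    error = sum(
        squared_distance(p, centers[i])
        for i, cluster in enumerate(clusters)
        for p in cluster
    )
    return centers, clusters, error
```

For real comment data, a library implementation such as scikit-learn's `KMeans` would be the more practical choice; the loop above is only meant to make the two alternating steps explicit.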

G. CO-OCCURRENCE ANALYSIS
There is a complex network of relationships among topic words, rather than a simple linear relationship. Building on the cluster analysis, it is crucial to evaluate the co-occurrence correlation among topic words in order to extract potential semantic information. The Jaccard algorithm is implemented in this study to examine the co-occurrence frequency of keywords and form a semantic-oriented co-occurrence network [23].
The Jaccard coefficient is used as the weight of each co-occurrence relation (edge), and it reflects the strength of the co-occurrence frequency and the relevance between keywords. When two words frequently appear together within the same unit of time and space (such as a text, paragraph, or sentence), their co-occurrence frequency and semantic significance are both high [24]. The calculation equation is as follows:
$$J_{mn} = \frac{Count(t_{mn})}{Count(t_m) + Count(t_n) - Count(t_{mn})}$$
where $J_{mn}$ represents the co-occurrence coefficient of keywords $t_m$ and $t_n$, $Count(t_{mn})$ is the number of times they are observed together, and $Count(t_m)$ and $Count(t_n)$ are the frequencies of $t_m$ and $t_n$, respectively.
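A minimal sketch of the edge-weight computation follows, using the standard Jaccard form (joint count divided by the sum of the individual counts minus the joint count). Counting co-occurrence at the level of whole comments is an assumption, as are the function name `jaccard_edges` and the token-list input.

```python
from collections import Counter
from itertools import combinations

def jaccard_edges(tokenized_texts, keywords):
    """Return {(term_m, term_n): Jaccard coefficient} for co-occurring keywords."""
    keywords = set(keywords)
    term_count = Counter()  # number of texts containing each keyword
    pair_count = Counter()  # number of texts containing both keywords
    for text in tokenized_texts:
        present = sorted(keywords.intersection(text))
        term_count.update(present)
        pair_count.update(combinations(present, 2))
    return {
        (m, n): both / (term_count[m] + term_count[n] - both)
        for (m, n), both in pair_count.items()
    }
```

Each returned pair can be drawn as an edge whose thickness follows the coefficient, which is how a co-occurrence network of the kind described here is usually rendered.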

A. THE ATTENTION OF NETIZEN
Based on the above analysis, netizens' comments from Twitter and Weibo are grouped into English public opinion topics $E_1$-$E_8$ and Chinese public opinion topics $C_1$-$C_8$, and a co-occurrence network of keywords is produced, as depicted in Figure 4.
In the network, keywords are represented as ''bubbles'' and their co-occurrence relationships as ''edges''. The thickness of an edge indicates the co-occurrence frequency, while the size of a bubble depicts the frequency of keyword occurrence, with different colors indicating distinct topics. Solid lines represent interactions within a topic, and dotted lines indicate connections between topics.
(a) The English public opinion topics $E_1$, $E_2$, $E_3$, and $E_5$ are strongly correlated. This indicates that, in netizens' view, the Kakhovka Hydropower Station War Damage Event ($E_1$) resulted in Nuclear Crisis ($E_2$), Floods ($E_3$), and Water Shortage ($E_5$). According to Twitter netizens, there is a correlation between floods ($E_3$) and the obstruction of Ukraine's counteroffensive ($E_4$). They also support providing assistance to victims ($E_8$). Additionally, the comments claim that the Kremlin is lying ($E_6$) and that a significant number of Black Sea creatures have perished ($E_7$).
(b) The Chinese public opinion topics Floods ($C_2$), Nuclear Crisis ($C_5$), and Civilian Deaths ($C_8$) are closely linked to the Kakhovka Hydropower Station War Damage Event ($C_1$). Additionally, topic $C_3$ is connected to $C_4$, indicating that Weibo netizens suspect US involvement in a conspiracy similar to the sabotage of the Nord Stream pipeline ($C_4$), which they also relate to humanitarian disaster ($C_3$). Finally, the comments point to a worldwide food crisis arising from the ongoing conflict ($C_6$) and reflect China's demand for peace ($C_7$).
To sum up, the topic networks reveal semantic connections between keywords and show the context of viewpoints within the plot, enabling exploration of the context and characteristics of netizen perspectives. For instance, the keywords ''Kherson'', ''region'', ''flooding'', ''people'', and ''evacuate'' suggest that the flood in the Kherson region compelled citizens to evacuate. Moreover, the keywords ''China'', ''concern'', ''conflict'', ''urged'', and ''peace'' suggest China's concern about the conflict and its call for peace, as summarized in Table 1. The truth of the incident is unknown, and this paper only reports netizens' comments.
From the perspective of topic distribution, as shown in Table 1 and Figure 4, the comments of Twitter and Weibo netizens on this incident focus on the following topics: ''Flood'' represents the direct consequences of the incident and their impacts, including Floods ($E_3$ and $C_2$) and related consequences such as the obstruction of Ukraine's counteroffensive ($E_4$), Water Shortages ($E_5$), and Civilian Deaths ($C_8$). The discussion appears to center on these topics because of their similarity and relevance.
''Crisis'' represents the indirect consequences of the incident, including Nuclear Crisis ($E_2$ and $C_5$), Ecological Crisis ($E_7$), and Food Crisis ($C_6$). Twitter netizens tend to focus more on the Ecological Crisis ($E_7$), while Weibo netizens tend to prioritize the Food Crisis ($C_6$).
''Charge'' represents the allegations made by netizens about the incident, such as the Kremlin's lies ($E_6$), humanitarian disaster ($C_3$), and US conspiracy ($C_4$). There is a noticeable disparity in netizens' perspectives: Twitter netizens tend to support the notion of Russian sabotage ($E_6$), while Weibo netizens tend to favor the notion of a US conspiracy ($C_4$).
''Help'' represents the appeals made by netizens in response to the incident, which may involve providing aid ($E_8$) or promoting peace ($C_7$). Overall, Twitter and Weibo netizens agree on ''Flood'' and ''Help''; however, their views diverge on ''Crisis'' and ''Charge''.

B. EVOLUTION TREND OF NETWORK PUBLIC OPINION
1) MACRO TREND
To analyze evolutionary trends, text set D was divided into daily time windows of equal duration. We then plotted the level of public engagement on Twitter (English) and Weibo (Chinese) based on the daily count of netizen posts over the 30-day period that followed the war damage incident. The results are displayed in Figure 5.
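The daily windowing can be sketched as a simple count of posts per calendar day over the 30-day period. The function name `daily_post_counts` and the assumption that each post has already been reduced to a `datetime.date` (after time-zone adjustment) are illustrative choices, not details given in the text.

```python
from collections import Counter
from datetime import date, timedelta

def daily_post_counts(post_dates, start, n_days=30):
    """Return [(day, number of posts)] for n_days consecutive days from start."""
    counts = Counter(post_dates)
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Days without any post are kept with a count of zero.
    return [(day, counts.get(day, 0)) for day in days]
```

Calling it once per platform with `start=date(2023, 6, 7)` yields two series that can be plotted against each other to compare peak timing.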
First, Twitter shows faster transmission and higher public opinion intensity than Weibo. Weibo's transmission lags behind Twitter's by approximately 1-2 days, as seen in the time it takes each public opinion curve to reach its peak.
Second, Chinese public opinion intensity gradually decreases after the peak. In contrast, English public opinion fluctuates, with several peaks and valleys, and can be divided into three stages: June 7-16, June 17-26, and June 27-July 6.

2) MICRO TREND
This study presents an improved analysis of netizens' attention trends by comparing the percentage of heat for each topic, rather than simply aggregating comment data. The data can be analyzed independently within a specific time frame for a static model, or linked to data from other time frames for a dynamic model.
The abscissa is the date and the ordinate is the popularity percentage, which represents the proportion of comment text data for a given topic within a given time period in the corresponding text set. Based on the topics that netizens pay attention to (''Flood'', ''Crisis'', ''Charge'', and ''Help''), we construct the change curves of netizens' attention in three stages (June 7-16, June 17-26, and June 27-July 6). The results are shown in Figure 6.
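The popularity percentage (the share of a stage's comments that belong to a given topic) can be computed as below. This is a sketch under the assumption that every comment has already been labeled with a stage and a topic; the function name `topic_share_by_stage` and the `(stage, topic)` pair format are hypothetical.

```python
from collections import Counter

def topic_share_by_stage(labeled_posts):
    """labeled_posts: list of (stage, topic) pairs, one pair per comment."""
    totals = Counter(stage for stage, _ in labeled_posts)
    pair_counts = Counter(labeled_posts)
    # Percentage of the stage's comments that fall under each topic.
    return {
        (stage, topic): 100.0 * count / totals[stage]
        for (stage, topic), count in pair_counts.items()
    }
```

Because the shares are normalized within each stage, they remain comparable between platforms even though the Twitter and Weibo samples differ in size.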
For ''Flood'', as shown in (a) and (b), Twitter and Weibo netizens show considerable interest in flooding. However, their attention diminishes as the incident unfolds. At the same time, the extent of change in attention to ''Flood'' remains stable.
For ''Crisis'', as shown in (c) and (d), the attention of Twitter and Weibo netizens shows a rising trend, with minor fluctuations in the first and second stages and a rapid surge in the third stage. Attention to the nuclear crisis is comparatively high.
For ''Charge'', as shown in (e), Twitter netizens demonstrate a steady rise in interest. In contrast, Weibo netizens initially display an upward trend of interest in ''Charge'', followed by a decline. Additionally, topic $C_4$ generates more attention than $C_3$. Weibo netizens associate the US conspiracy in $C_4$ with the humanitarian crisis in $C_3$, which is consistent with the curves.
For ''Help'', as shown in (f), Weibo netizens become increasingly interested over time, whereas Twitter netizens initially show interest that then decreases.
Analyzing the evolution rules of the topics that netizens pay attention to helps trace the evolution of network public opinion at the micro level and helps discover important risk factors from the evolution curves. To sum up the above analysis, the attention of Twitter and Weibo netizens follows these rules:
1) On Twitter, war damage incidents spread faster, generate significant public interest, and are more likely to elicit public opinion reactions. Conversely, the public opinion heat on Weibo tends to be short-lived.
2) During the first stage, Twitter and Weibo netizens showed high interest in ''Flood'', culminating in the initial peak of public opinion. In the second and third stages, Twitter netizens' growing focus on ''Crisis'' and ''Charge'' was instrumental in creating a subsequent public opinion peak.
3) As per the previous analysis, there is a consensus among netizens on the ''Flood'' and ''Help'' topics, while ''Charge'' and ''Crisis'' are controversial. The trends of the attention curves show that ''Flood'' is the primary consensus and ''Help'' is the secondary consensus, whereas ''Charge'' is the main point of contradiction and ''Crisis'' is the secondary point of contradiction.

IV. DISCUSSION
This paper presents new research showing that major water conservancy projects not only face war damage risk but also give rise to public opinion risk, which exposes the public to psychological trauma and brings social instability risks. To analyze this process, we employ SD theory to examine public opinion risk through three subsystems: netizen, media, and social. Positive feedback loops driven by the attention of netizens are found to shape network public opinion, which ultimately impacts social stability through a negative feedback loop.
To elaborate further, the $DF_n$-IG function is used to reduce the noise in comments, and the TF-IDF algorithm is employed to extract high-weight keywords. The perplexity function is used to determine the optimal number of keyword clusters $k$, and the K-Means algorithm is used for clustering. The Jaccard algorithm is then used to analyze the co-occurrence relationships of keywords, which yields the English topics $E_1$-$E_8$ and the Chinese topics $C_1$-$C_8$, centered on ''Flood'', ''Crisis'', ''Charge'', and ''Help'' for both groups of netizens. The two sides differ in their opinions on ''Crisis'' and ''Charge''. Additionally, we examined the macro trend in the evolution of network public opinion and the micro trend in attention to the four types of topics. The results indicate that Twitter exhibits quicker transmission, higher public interest, and a greater likelihood of producing public opinion peaks than Weibo, whose public opinion peaks occur a day or two after Twitter's. Initially, the topics ''Flood'' and ''Charge'' received significant attention, resulting in the first peak of public opinion. Subsequently, Twitter netizens' interest in ''Crisis'' and ''Charge'' increased, leading to a subsequent surge in public opinion.
This study presents a novel method for the cluster analysis of topics. Conventional methods classify keywords into various clusters but neglect the correlation among keywords within clusters [25]. Therefore, we included an additional co-occurrence analysis phase, which constructs a network of topic keywords based on their co-occurrence relationships. Furthermore, we divided the evolution of network public opinion into three stages and analyzed the varying proportions of the four types of topics that concern netizens within these stages, which facilitates tracking netizens' attention trends. This method offers an intuitive understanding of netizens' public opinion.
This is the first study to examine public opinion, across different cultural backgrounds, on incidents of war damage to major water conservancy projects. The resulting public opinion pressure can encourage governments to adopt more effective protection policies and measures for water conservancy projects. This study also has some limitations. First, due to data protection policies, the comment text data from Weibo netizens may be insufficient, which may cause the analysis results to deviate from reality. Conversely, more information is available on open social platforms like Twitter, providing a better reflection of reality. Second, we examined the topics of netizens' attention based on co-occurrence relationships, which depend on similarity; the cognitive differences between Twitter and Weibo netizens are not discussed in depth. Moreover, there may be more efficient or different methods, such as the deep learning models proposed by Alsaeedi for monitoring Twitter data [26]. We will address these aspects in future work.

V. CONCLUSION
The netizen and media subsystems create positive feedback loops, attracting more attention from groups and triggering public opinion. This, in turn, forms a negative feedback loop through the social subsystem, which affects social security and stability.
VOLUME 12, 2024
For the same extreme risk, Twitter and Weibo netizens agree on ''Flood'' and ''Help''; however, their views diverge on ''Crisis'' and ''Charge''. Initially, the topics ''Flood'' and ''Charge'' received significant attention, resulting in the first peak of public opinion. Subsequently, Twitter netizens' interest in ''Crisis'' and ''Charge'' increased, leading to a subsequent surge in public opinion.
Twitter exhibits quicker transmission, higher public interest, and a greater likelihood of producing public opinion peaks than Weibo, whose public opinion peaks occur a day or two after Twitter's.

FIGURE 1. Public opinion risk formation system based on the netizen, media, and social subsystems.

FIGURE 2. Noise reduction results at different thresholds.

FIGURE 5. The change curves of public opinion popularity.
In equation (1), $t$ refers to a single word in the text, $D$ represents the text set, and $T$ is a single text within the set. $DF_n(t, D)$ represents the number of texts in the set in which $t$ occurs at least $n$ times. Similarly, $\overline{DF_n}(t, D)$ represents the number of texts in the set in which $t$ occurs fewer than $n$ times. $DF_n(t, T)$ is assigned a value of 2 if $t$ occurs at least $n$ times in a single text $T$; otherwise its value is 1. Similarly, $\overline{DF_n}(t, T)$ is assigned a value of 2 if $t$ occurs fewer than $n$ times in a single text $T$; otherwise its value is 1. The calculation of $DF_n$-IG follows equation (1), and Figure 2 displays the noise reduction results at different thresholds.