Panel Data Causal Inference Using a Rigorous Information Flow Analysis for Homogeneous, Independent and Identically Distributed Datasets

Panel data, which consist of observations on many individual units over two or more instances of time, have gradually become an important type of scientific data. Subsequently causal inference for panel data has attracted enormous interest from many fields as well as statistics. In this study, the rigorously formulated information flow analysis for time series, which is very concise in form and has been successfully applied in different disciplines, is generalized to identify the causality from homogeneous and independent identically distributed panel data. The resulting formula bears the same form as that for the former, though the meanings of the symbols differ. An algorithm is then proposed for panel data causality analysis, which has been validated with both linear and nonlinear problems. It has also been put to application to examine the causal relations among economic growth, energy consumption, trade openness, and energy price based on 15 Asian countries. Clearly identified are a strong bidirectional causality between economic growth and energy consumption, and a strong causality from import and export trade to economic growth. Energy price has no direct impact on energy consumption; it, instead, exerts a weak effect on the latter through influencing economic growth.


I. INTRODUCTION
In the past two decades, data have been accumulated at an exponential rate in essentially all fields, partly due to the easy access to social media and the interconnectivity of our society [1]. How to mine causal information from the different datasets has hence become a hot issue in the digitized society [2]. One direct way is to identify the possible causal relations. Unfortunately, causal inference is a very challenging problem, and the existing methodologies for identifying causality are yet to be improved [3].
Most data can be classified into three categories: temporal data, cross-sectional data, and panel data. A set of temporal data, or time series, is a series of data points indexed (or listed or graphed) in time order. In contrast, data collected by observing many individuals at one instance of time are termed cross-sectional. Time series and cross-sectional data can be thought of as special cases of panel data, which consist of observations on many individual units over two or more periods of time. Panel data have several important advantages compared to datasets with only a temporal (time series) or individual (cross section) dimension [4], one being the ability to control for possibly correlated, time-invariant heterogeneity without actually observing it. Besides, panel data can reduce the collinearity among explanatory variables, increase the efficiency of estimators, and alleviate problems of aggregation.
The associate editor coordinating the review of this manuscript and approving it for publication was Arif Ur Rahman.
Several methods have been proposed to make causal inference with panel data, among which the most popular one is Granger causality analysis. It is based on the idea that the cause occurs before the effect; hence if an event X is the cause of another event Y, then X should precede Y [5]. (This basis, however, has recently been challenged by an observation with a purported generated dynamical system with synchronization; see [6].) For example, Holtz-Eakin et al. [7] considered estimation and testing of vector autoregression (VAR) coefficients in panel data to calculate the Granger causality, and applied the techniques to analyze the dynamic relationships between wages and hours worked in two samples of American males. The same method was used by [8] to report the findings on the relationship between foreign direct investment and pollution across 112 countries over 15-28 years. Kónya [9] used the panel data Granger causality test approach based on bootstrap and studied the relationship between exports and economic growth in OECD countries. A similar approach was adopted by Bedir and Yilmaz [10] to examine the causal relation between the logarithms of the human development index and CO2 emissions in 33 Organisation for Economic Co-operation and Development (OECD) countries. Gupta and Singh [11] employed the Johansen cointegration technique followed by a vector error correction model (VECM) and the standard Granger causality test to investigate the causal linkage between FDI and GDP of the BRICS nations. These applications are generally successful in their respective contexts.
Despite these studies, the causality analysis for panel data is still in its early stage of development. Both theoretically and practically there still exists much room for improvement. Recently, it has been realized that causality and information flow (IF) are real physical notions and hence can be put on a rigorous footing. In other words, they can be derived from first principles in physics [12], rather than axiomatically proposed as an ansatz.
Effort along this line can be traced back to the early work by Liang and Kleeman [13] on IF, but its ability has only been recognized since the publication of the time series study by Liang [14], where it is shown that causality can be assessed in a very easy way, with only sample covariances involved. The resulting formula, albeit simple, proves to be remarkably successful in solving many problems which defy the traditional approaches. It also fixes the philosophical debate on causation versus correlation (cf. Section II). Ever since then, the IF-based causality analysis has been widely applied to problems with time series such as global warming [15], [16], El Niño [14], typhoon genesis prediction [17], space weather [18], chlorophyll variability [19], the relation between soil moisture and precipitation [20], financial time series analysis [21], and neuroscience problems [22], to name a few.
Considering the success of the IF-based causality analysis for time series, we want to generalize it to panel data. In the following, a brief introduction of the theory is first presented. In Section III, we show that a generalization can be fulfilled, and an algorithm is then proposed. In Section IV, the algorithm is validated with a linear stochastic system and a highly chaotic deterministic system. Section V gives an application and Section VI summarizes the whole study.

II. INFORMATION FLOW AND CAUSALITY BETWEEN TIME SERIES: A BRIEF REVIEW
Different from the various statistical approaches for causal inference, the information flow-based causality analysis is derived from first principles in physics. Ever since Liang and Kleeman [13], much effort has been invested to establish a rigorous formalism, which has only recently been completed [12]. Accordingly, a causal inference technique has been developed for time series [14]. It is concise in form, easy to implement and, moreover, quantitative in nature (see (3) below). Since its advent, many applications in different disciplines have been carried out with remarkable success. The following material is just a very brief introduction of the theory that is needed for this study. For a systematic treatment and other materials, see [12], among other papers.
This line of work begins with the concept of information flow, which is defined as follows:
Definition II.1 In a dynamical system $(\Omega, \Phi_t)$, where $\Omega$ is the phase space and $\Phi_t$ may be a flow or a discrete mapping, the information flow from a component $X_2$ to another component $X_1$, written $T_{2\to1}$, is defined as the contribution of entropy from $X_2$ per unit time (continuous time case) or per step (discrete mapping case) in increasing the marginal entropy of $X_1$ as the state is steered forth by $\Phi_t$.
With this, causality can be defined in a quantitative sense:
Definition II.2 $X_2$ is causal to $X_1$ iff the information flow $T_{2\to1} \neq 0$. The strength of the causality from $X_2$ to $X_1$ is measured by $|T_{2\to1}|$. Likewise, the causality from $X_1$ to $X_2$ can be defined.
Remark 1. A nonzero $T_{2\to1}$ may be either positive or negative. A positive $T_{2\to1}$ means that $X_2$ makes $X_1$ more uncertain, and vice versa. But for the purpose of causal inference, the sign is not essential; we just consider its magnitude.
Remark 3. In the above definitions entropy is generally understood as Shannon entropy, but other entropies may also apply. In this study, we stick to Shannon entropy.
Now consider a two-dimensional (2D) stochastic dynamical system
$$ d\mathbf{X} = \mathbf{F}(\mathbf{X}, t)\,dt + \mathbf{B}(\mathbf{X}, t)\,d\mathbf{W}, \tag{1} $$
where $\mathbf{F} = (F_1, F_2)$ is the vector of drift coefficients, $\mathbf{X} = (X_1, X_2) \in \mathbb{R}^2$ are the random variables, $\mathbf{W} = (W_1, W_2)$ is a standard 2D Wiener process, and $\mathbf{B} = (b_{ij})$ is the matrix of diffusion/volatility coefficients. Liang [24] established that the time rate of IF from $X_2$ to $X_1$ with respect to Shannon entropy is
$$ T_{2\to1} = -E\left[\frac{1}{\rho_1}\frac{\partial (F_1 \rho_1)}{\partial x_1}\right] + \frac{1}{2} E\left[\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right], \tag{2} $$
where $\rho$ is the joint probability density function of $\mathbf{X}$, $\rho_1$ is the marginal density of $X_1$, $g_{11} = \sum_{k=1}^{2} b_{1k}^2$, and $E$ is the expectation with respect to $\rho$. Later on it was shown that the formula is the same with respect to Kullback-Leibler divergence [25]. Likewise, the IF from $X_1$ to $X_2$ is
$$ T_{1\to2} = -E\left[\frac{1}{\rho_2}\frac{\partial (F_2 \rho_2)}{\partial x_2}\right] + \frac{1}{2} E\left[\frac{1}{\rho_2}\frac{\partial^2 (g_{22}\rho_2)}{\partial x_2^2}\right]. $$
Ideally, if $T_{2\to1} = 0$, then $X_2$ is not causal to $X_1$; otherwise it is causal, and the magnitude $|T_{2\to1}|$ measures the strength of the causality. The larger $|T_{2\to1}|$, the stronger the causality from $X_2$ to $X_1$. In practice, significance should be tested prior to making the inference.
The above derived information flow has many important properties. The first is the ''Principle of Nil Causality'' [12]: a process, say $X$, has a zero causality to another process, say $Y$, if the evolution of $Y$ does not depend on $X$. This is a basic principle that all formalisms try to verify in applications, while in this formalism, it is a proven theorem. Many other properties can be seen in [12] and [25].
The IF formula has been validated with many highly chaotic systems, such as the baker transformation, Hénon map, Kaplan-Yorke map, Rössler system, and truncated Burgers-Hopf system, to name a few [12], [26]. Under a linearity assumption, Liang [14] further established that it can be estimated from two time series, say, $X_1$ and $X_2$. The resulting maximum likelihood estimator is
$$ T_{2\to1} = \frac{C_{11}C_{12}C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11}C_{12}^2}, \tag{3} $$
where $C_{ij}$ is the covariance between $X_i$ and $X_j$, $C_{i,dj}$ is the covariance between $X_i$ and $\dot{X}_j$, and $\dot{X}_j = (X_j(t + k\Delta t) - X_j(t))/(k\Delta t)$ is the difference approximation of $dX_j/dt$ using the Euler forward scheme. Here $k$ is usually 1; for cases of deterministic chaos, it should be set to 2. This formula is very simple in form but evidently very successful in real applications, some of which have been mentioned in the introduction above.
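As an illustration of how (3) can be evaluated in practice, the following Python sketch computes the estimator from two series using sample covariances only. It assumes that (3) has the covariance-ratio form given in [14], namely $(C_{11}C_{12}C_{2,d1} - C_{12}^2C_{1,d1})/(C_{11}^2C_{22} - C_{11}C_{12}^2)$. The function names and the autoregressive test system, including its coefficients, are hypothetical and are not taken from the paper.

```python
import numpy as np


def info_flow_series(x1, x2, dt=1.0, k=1):
    """Estimate T_{2->1} from two equal-length series with sample covariances."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Euler forward difference approximating dX1/dt with lag k.
    dx1 = (x1[k:] - x1[:-k]) / (k * dt)
    # Covariance matrix of (X1, X2, dX1/dt); each row is one variable.
    c = np.cov(np.vstack([x1[:-k], x2[:-k], dx1]))
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    c1d1, c2d1 = c[0, 2], c[1, 2]
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)


def simulate_ar(n=20000, seed=0):
    """Hypothetical test system: X1 drives X2, but X2 does not drive X1."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((n, 2))
    x1 = np.zeros(n)
    x2 = np.zeros(n)
    for i in range(n - 1):
        x1[i + 1] = 0.6 * x1[i] + e[i, 0]
        x2[i + 1] = 0.5 * x2[i] + 0.5 * x1[i] + e[i, 1]
    return x1, x2
```

Calling `info_flow_series(x1, x2)` gives the flow from the second argument to the first; swapping the arguments gives the flow in the opposite direction. For the test system above one would expect the flow from `x1` to `x2` to be clearly nonzero and the reverse flow to be close to zero.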
Considering that there is a long-standing philosophical debate over causation versus correlation, rewrite (3) in terms of correlation coefficients:
$$ T_{2\to1} = \frac{r}{1 - r^2}\left(r_{2,d1} - r\, r_{1,d1}\right), \tag{4} $$
where $r = C_{12}/\sqrt{C_{11}C_{22}}$ is the sample correlation coefficient between $X_1$ and $X_2$, and $r_{i,dj} = C_{i,dj}/\sqrt{C_{ii}C_{jj}}$. Clearly, if $r = 0$ then $T_{2\to1} = 0$; the converse, however, is not necessarily true. In other words, causation implies correlation, but correlation does not imply causation. Equation (4), therefore, bridges causation and correlation with a simple mathematical relation.
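The equivalence between the covariance form and the correlation form can be checked numerically. The sketch below assumes the normalization $r_{i,dj} = C_{i,dj}/\sqrt{C_{ii}C_{jj}}$ and the correlation form $T_{2\to1} = \frac{r}{1-r^2}(r_{2,d1} - r\,r_{1,d1})$, which is what one obtains by dividing the numerator and denominator of the covariance form by $C_{11}^2 C_{22}$. The function names are illustrative only.

```python
import numpy as np


def t21_covariance_form(c):
    """T_{2->1} from the 3x3 covariance matrix of (X1, X2, dX1/dt)."""
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    c1d1, c2d1 = c[0, 2], c[1, 2]
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)


def t21_correlation_form(c):
    """Same quantity written with correlation-like coefficients (assumed normalization)."""
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    r = c12 / np.sqrt(c11 * c22)          # ordinary correlation of X1 and X2
    r_2d1 = c[1, 2] / np.sqrt(c22 * c11)  # C_{2,d1} / sqrt(C22 * C11)
    r_1d1 = c[0, 2] / c11                 # C_{1,d1} / sqrt(C11 * C11)
    return r / (1.0 - r**2) * (r_2d1 - r * r_1d1)
```

In the second form the factor `r` multiplies the whole expression, so a vanishing correlation forces a vanishing information flow, whereas a nonzero correlation does not by itself force a nonzero flow.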

III. CAUSALITY ANALYSIS FOR HOMOGENEOUS I.I.D. PANEL DATA: AN ALGORITHM
Panel data consist of observations not only over time, but also over many individual units. The above dynamical system-based formula then may not be directly applicable. This is different from Granger causality, which is fundamentally a notion of probabilistic conditional independence, and hence can be applied not only to time series data but also to cross-section and panel data [27]. We need to re-establish from scratch a formula like (3). We first give a definition for panel data causality.
Definition III.1 For a homogeneous panel dataset, the causality from a variable, say $X_2$, to another variable $X_1$ between two cross-sections is defined as the absolute value of the information flow from $X_2$ to $X_1$ as the underlying system evolves between the two steps.
Remark: For panel datasets with more than 2 cross-sections, a relation of causality vs. time step can be obtained by computing the information flows between adjacent steps.
As in Liang [14], we assume a linear model. Though this sets a limitation, the formula (3) has proved to be remarkably successful in many highly nonlinear problems. In fact, this is not surprising; after all, when correlation is referred to, we usually mean linear correlation.
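Theorem III.1 Suppose a homogeneous i.i.d. panel dataset is generated through some linear system with Wiener processes, and $X_2$ and $X_1$ are two variables of the dataset. Then the information flow from $X_2$ to $X_1$ between two adjacent steps $(t, t + \Delta t)$ is
$$ T_{2\to1} = \frac{\sigma_{11}\sigma_{12}\sigma_{2,d1} - \sigma_{12}^2 \sigma_{1,d1}}{\sigma_{11}^2 \sigma_{22} - \sigma_{11}\sigma_{12}^2}, \tag{5} $$
where $\sigma_{ij}$ are the population covariances between $X_i(t)$ and $X_j(t)$, and
$$ \sigma_{i,dj} = E\left[(X_i - EX_i)(\dot{X}_j - E\dot{X}_j)\right], \qquad \dot{X}_j = \frac{X_j(t + \Delta t) - X_j(t)}{\Delta t}. \tag{6} $$
Proof It has been established in [14] that, for a linear system with $\mathbf{F} = \mathbf{f} + \mathbf{A}\mathbf{X}$, $\mathbf{A} = (a_{ij})$, and constant $\mathbf{B}$, (2) is reduced to
$$ T_{2\to1} = a_{12}\frac{\sigma_{12}}{\sigma_{11}}, \tag{7} $$
which is remarkably simple. Here $\sigma_{ij}$ are the entries of the population covariance matrix. We now estimate this formula, given an ensemble of independent individuals of panel data with two time instances separated by an interval $\Delta t$. Different from the time series considered in [14], which require some extra assumption such as stationarity, here the estimation of (7) turns out to be much easier. The reason is that (2) appears in the form of an ensemble mean, while a set of panel data provides a natural ensemble. As in Liang [14], discretize the dynamical system (1) with the Euler-Bernstein scheme to get
$$ X_1(t + \Delta t) = X_1(t) + \left[f_1 + a_{11}X_1(t) + a_{12}X_2(t)\right]\Delta t + b_1 \Delta w, $$
where $b_1 = \sqrt{g_{11}}$ and $\Delta w \sim N(0, \Delta t)$. For convenience, rewrite it as
$$ \dot{X}_1 = f_1 + a_{11}X_1 + a_{12}X_2 + b_1 \dot{w}, \tag{8} $$
with $\dot{w} = \Delta w/\Delta t$. Considering the availability of the ensemble, take the ensemble mean to get
$$ E\dot{X}_1 = f_1 + a_{11}EX_1 + a_{12}EX_2. \tag{9} $$
Subtracting (9) from (8), then multiplying by $(X_1 - EX_1)$ and taking expectation, we get
$$ E\left[(X_1 - EX_1)(\dot{X}_1 - E\dot{X}_1)\right] = a_{11}E(X_1 - EX_1)^2 + a_{12}E\left[(X_1 - EX_1)(X_2 - EX_2)\right], $$
where the noise term vanishes because the increment $\Delta w$ is independent of $X_1(t)$. This is
$$ \sigma_{1,d1} = a_{11}\sigma_{11} + a_{12}\sigma_{12}, \tag{10} $$
where $\sigma_{ij}$ are the covariances between $X_i$ and $X_j$, and $\sigma_{1,d1}$ is as defined in (6). Likewise, the difference between (8) and (9) multiplied by $(X_2 - EX_2)$, followed by a mathematical expectation, results in
$$ \sigma_{2,d1} = a_{11}\sigma_{12} + a_{12}\sigma_{22}. \tag{11} $$
(10) and (11) combine to give
$$ a_{12} = \frac{\sigma_{11}\sigma_{2,d1} - \sigma_{12}\sigma_{1,d1}}{\sigma_{11}\sigma_{22} - \sigma_{12}^2}. \tag{12} $$
Substituting (12) back into (7), we get (5). Q.E.D.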
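The algebra in the proof can be checked numerically. Assuming the two moment relations read $\sigma_{1,d1} = a_{11}\sigma_{11} + a_{12}\sigma_{12}$ and $\sigma_{2,d1} = a_{11}\sigma_{12} + a_{12}\sigma_{22}$, the following sketch solves them for $a_{12}$ with a generic linear solver and compares the result with the closed-form ratio. All numerical values are arbitrary and serve only as an illustration.

```python
import numpy as np


def a12_closed_form(s11, s12, s22, s1d1, s2d1):
    """Closed-form solution of the 2x2 moment equations for a12."""
    return (s11 * s2d1 - s12 * s1d1) / (s11 * s22 - s12**2)


def info_flow_from_moments(s11, s12, s22, s1d1, s2d1):
    """T_{2->1} = a12 * sigma12 / sigma11, with a12 taken from the closed form."""
    return a12_closed_form(s11, s12, s22, s1d1, s2d1) * s12 / s11


# Arbitrary population covariances and drift coefficients (illustration only).
s11, s12, s22 = 2.0, 0.5, 1.0
a11, a12 = -0.3, 0.7
s1d1 = a11 * s11 + a12 * s12
s2d1 = a11 * s12 + a12 * s22

# Generic solve of [[s11, s12], [s12, s22]] @ [a11, a12] = [s1d1, s2d1].
solved = np.linalg.solve(np.array([[s11, s12], [s12, s22]]),
                         np.array([s1d1, s2d1]))
```

Both routes recover the same $a_{12}$, so the information flow can be written purely in terms of covariances without estimating the drift matrix explicitly.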
In real applications, the population covariances need to be replaced with sample covariances. This results in
$$ T_{2\to1} = \frac{C_{11}C_{12}C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11}C_{12}^2}, \tag{13} $$
which is in a form precisely the same as (3), except that now the mean is the ensemble mean at time $t$, not the time average. Here $T_{2\to1}$ is understood as an estimator, and should have been written $\hat{T}_{2\to1}$, but for simplicity, the hat is dropped. Similarly, the IF from $X_1$ to $X_2$ is
$$ T_{1\to2} = \frac{C_{22}C_{21}C_{1,d2} - C_{21}^2 C_{2,d2}}{C_{22}^2 C_{11} - C_{22}C_{21}^2}, \tag{14} $$
which generally differs from (13). This asymmetry naturally indicates the direction of causality. If $|T_{2\to1}|$ passes the significance test, $X_2$ is taken to be a cause of $X_1$. Similarly, if $|T_{1\to2}|$ passes the test, $X_1$ is taken to be a cause of $X_2$. When there are multiple time steps, say $K$ steps, (13) may be applied to each pair of adjacent time instances to obtain $(K - 1)$ information flows, which are then averaged. We hence have the following algorithm.
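A minimal Python sketch of this procedure is given below. It is not the authors' code. It assumes that (13) and (14) have the same covariance-ratio form as (3), with covariances taken across the $N$ units at each time step, and that the flows from the $K - 1$ adjacent pairs are combined by a plain arithmetic mean. The function name `panel_info_flow` and the array layout (units along rows, time steps along columns) are choices made here for illustration.

```python
import numpy as np


def panel_info_flow(xp1, xp2, dt=1.0):
    """Return (T_{2->1}, T_{1->2}) averaged over all adjacent time-step pairs.

    xp1, xp2: arrays of shape (N, K), N units observed at K time steps.
    """
    xp1 = np.asarray(xp1, dtype=float)
    xp2 = np.asarray(xp2, dtype=float)
    n_steps = xp1.shape[1]
    t21, t12 = [], []
    for j in range(n_steps - 1):
        x1, x2 = xp1[:, j], xp2[:, j]
        # Forward differences between adjacent cross-sections.
        dx1 = (xp1[:, j + 1] - x1) / dt
        dx2 = (xp2[:, j + 1] - x2) / dt
        # Ensemble covariances at step j; each row is one variable.
        c = np.cov(np.vstack([x1, x2, dx1, dx2]))
        c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
        c1d1, c2d1 = c[0, 2], c[1, 2]
        c1d2, c2d2 = c[0, 3], c[1, 3]
        t21.append((c11 * c12 * c2d1 - c12**2 * c1d1)
                   / (c11**2 * c22 - c11 * c12**2))
        t12.append((c22 * c12 * c1d2 - c12**2 * c2d2)
                   / (c22**2 * c11 - c22 * c12**2))
    return float(np.mean(t21)), float(np.mean(t12))
```

With $K = 2$ the loop runs once and the function reduces to a single application of the two formulas. Significance testing is deliberately left out of this sketch.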

A. A LINEAR PROBLEM
We first use a discretized version of (1) to generate a set of panel data. Assuming that F and B have the following form

FIGURE 1. a) The initial distribution (blue spots) and the ensemble mean (red spot) of $X_1$ and $X_2$. b) The distribution of $X_1$ and $X_2$ at t = 150 unit time.

FIGURE 2. A typical series generated by the 2D autoregressive process initialized with $X_1$ = 0.3 and $X_2$ = 0.4.
Choose $\Delta t = 0.01$, and hence $\Delta W = \sqrt{\Delta t}\, R_N$, where $R_N$ is a random number drawn from the standard normal distribution. This forms a 2D autoregressive process. Clearly, $X_1$ causes $X_2$, but not vice versa. This kind of problem is usually used to verify a causality analysis: one component causes another, but the latter does not cause the former. We initialize the system by making 10000 draws as follows:
Fig. 1a shows that the initial distribution of $X_1$ and $X_2$ roughly follows the normal distribution of:
For each initial condition the system is integrated for 15,000 steps, and the resulting $X_1$ and $X_2$ are recorded to form the ensemble. When t = 150 (Fig. 1b), the distribution is inclined along the direction of $X_2 = X_1$, which means that $X_1$ and $X_2$ are no longer independent. Fig. 2 is a typical series with the initial condition of $X_{1,t=0}$ = 0.3 and $X_{2,t=0}$ = 0.4. After t = 10, the system reaches a quasi-stationary state. We hence discard the segment t < 10 in forming the panel data. According to the ensemble size, i.e., the number of individual units ($N$), and the number of time steps ($K$), panel data are generally divided into three categories: the 'large $K$, small $N$' temporal-style long sequences; the 'small $K$, large $N$' panel literature; and the 'large $K$, large $N$' heterogeneous panel data [4]. By the assumptions in Theorem III.1, the heterogeneous case is excluded here. Based on this we generate three datasets, and respectively calculate the causalities between $X_1$ and $X_2$. It is found that in all these cases, $|T_{1\to2}|$ is nearly an order of magnitude larger than $|T_{2\to1}|$. Further, we adopted the same significance test as in [14]. $|T_{1\to2}|$ passes the 99% significance test in all three cases, while in no case does $|T_{2\to1}|$ pass even the 80% significance test. We remark that in Case 1, using Algorithm-IF or (3) gives exactly the same result, indicating that time series data are just a particular case of panel data. By Table 1, Algorithm-IF for these panel data is robust.

Algorithm-IF Information Flow for Homogeneous i.i.d. Panel Data
Input: Panel data $X_{p1}$ and $X_{p2}$ with dimension $N \times K$, where $N$ is the number of individual units and $K$ the number of time steps, every two adjacent steps separated by a time interval $\Delta t$.
Step 3: Substitute the covariances into (13) and (14) to get the information flow from $X_2$ to $X_1$ ($T_{2\to1}$) and that from $X_1$ to $X_2$ ($T_{1\to2}$).
Output: $T_{2\to1}$, $T_{1\to2}$.
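An experiment of this kind can be sketched in Python as follows. The drift and diffusion coefficients below are assumptions chosen only so that $X_1$ drives $X_2$ and not the reverse; they are not the values used in the paper, and the ensemble size, burn-in length and number of retained steps are likewise illustrative. The estimator is the covariance-ratio form assumed for (13), applied to adjacent cross-sections and averaged.

```python
import numpy as np


def simulate_linear_panel(n_units=20000, n_keep=51, burn_in=1000, dt=0.01, seed=0):
    """Euler-Maruyama ensemble of a hypothetical linear system:

    dX1 = -X1 dt + b dW1,   dX2 = (X1 - X2) dt + b dW2   (assumed coefficients)
    """
    rng = np.random.default_rng(seed)
    b = 0.1
    x1 = rng.normal(0.0, 0.1, n_units)
    x2 = rng.normal(0.0, 0.1, n_units)
    xp1 = np.empty((n_units, n_keep))
    xp2 = np.empty((n_units, n_keep))
    for step in range(burn_in + n_keep):
        if step >= burn_in:  # discard the transient before recording
            xp1[:, step - burn_in] = x1
            xp2[:, step - burn_in] = x2
        dw = np.sqrt(dt) * rng.standard_normal((2, n_units))
        # The right-hand side is evaluated with the old x1 and x2.
        x1, x2 = (x1 - x1 * dt + b * dw[0],
                  x2 + (x1 - x2) * dt + b * dw[1])
    return xp1, xp2


def info_flow_pair(x1, x2, x1_next, dt):
    """Flow from the second variable to the first between two cross-sections."""
    c = np.cov(np.vstack([x1, x2, (x1_next - x1) / dt]))
    return ((c[0, 0] * c[0, 1] * c[1, 2] - c[0, 1]**2 * c[0, 2])
            / (c[0, 0]**2 * c[1, 1] - c[0, 0] * c[0, 1]**2))


def mean_info_flows(xp1, xp2, dt):
    """Average T_{2->1} and T_{1->2} over all adjacent step pairs."""
    pairs = range(xp1.shape[1] - 1)
    t21 = np.mean([info_flow_pair(xp1[:, j], xp2[:, j], xp1[:, j + 1], dt) for j in pairs])
    t12 = np.mean([info_flow_pair(xp2[:, j], xp1[:, j], xp2[:, j + 1], dt) for j in pairs])
    return float(t21), float(t12)
```

Because the forward difference divides the noise increment by a small $\Delta t$, a single pair of cross-sections gives a noisy estimate; averaging over many pairs or using a larger ensemble reduces the scatter.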

B. A HIGHLY NONLINEAR PROBLEM
In deriving (13), a linear assumption is invoked. That is to say, strictly speaking, Algorithm-IF computes linear causality. Since (3) has proved remarkably successful in highly nonlinear problems, we here test (13) and Algorithm-IF with such a dataset.
The panel dataset is generated with a one-way coupled anticipatory map. This is a highly chaotic system designed by Hahs and Pethel [28], on which the causal inference techniques existing at the time fail:
$$ X_1(n+1) = f(X_1(n)), \qquad X_2(n+1) = (1 - \varepsilon) f(X_2(n)) + \varepsilon\, g_\alpha(X_1(n)), \tag{15} $$
where $f(x) = 4x(1 - x)$ is the logistic map, $g_\alpha(x) = (1 - \alpha) f(x) + \alpha f^2(x)$, $f^2$ means that the logistic map $f$ applies twice, and $\alpha$ is a parameter called the ''anticipation parameter''. Picking $\varepsilon = 0.3$, $\alpha = 0.8$, an example series pair is shown in Fig. 3. From (15), obviously $X_1$ causes $X_2$, but not vice versa. However, Hahs and Pethel [28] showed that, with the existing technique, the causality thus inferred goes widely off track as $\alpha$ increases on $\alpha \in [0, 1]$. When $\alpha > 0.5$, the computed causality from $X_2$ to $X_1$ even dominates that in the opposite direction. We hence generate some panel datasets with this touchstone system to test our algorithm. The anticipation parameter $\alpha$ takes values from 0 to 1 at an interval of 0.1. Like the linear runs, an ensemble is generated for each $\alpha$, with the initial conditions as:
For each group of runs, the system is iterated 10,000 times, and the resulting $X_1$ and $X_2$ are recorded. We check three cases with this map: cases 4, 5, 6, which are the same as cases 1, 2, 3, respectively, but with the nonlinear anticipatory system. The two time steps for case 5 are 9998 and 10000, respectively. Fig. 4 shows the absolute values of the information flows ($|T_{2\to1}|$ and $|T_{1\to2}|$) for the different cases and different anticipation parameters. The information flows with the panel data (whether with large $N$, small $K$ or with large $N$, large $K$) are similar to the time series result obtained by Liang [14]. Most importantly, $|T_{2\to1}|$ is very small throughout, though not exactly zero (perhaps due to the linear model used). Secondly, for $0 \le \alpha \le 0.3$ or $0.8 \le \alpha \le 1.0$, $|T_{1\to2}|$ is much larger than $|T_{2\to1}|$, indicating a one-way causality in a consistent way. This is in sharp contrast to the counterintuitive result of spurious causality as discovered by Hahs and Pethel [28].
When $0.4 \le \alpha \le 0.7$, the information flow from $X_1$ to $X_2$ is quite small. But even in such situations, the ratio $|T_{1\to2}| / |T_{2\to1}|$ is no less than 1.5 in all the cases and, besides, $T_{1\to2}$ passes the 99% significance test, while $T_{2\to1}$ does not pass the 95% significance test. In a word, though with a linear assumption, Algorithm-IF can capture the causality in an otherwise highly nonlinear panel dataset in a consistent way.
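A panel of this type can be generated along the following lines. The sketch assumes the usual form of the Hahs-Pethel anticipatory map, $X_1(n+1) = f(X_1(n))$ and $X_2(n+1) = (1-\varepsilon) f(X_2(n)) + \varepsilon g_\alpha(X_1(n))$ with $f(x) = 4x(1-x)$ and $g_\alpha = (1-\alpha) f + \alpha f^2$. The uniform initial ensemble, the ensemble size and the number of iterations are hypothetical choices, not those of the paper, and no particular numerical outcome is claimed here.

```python
import numpy as np


def logistic(x):
    """Logistic map f(x) = 4 x (1 - x)."""
    return 4.0 * x * (1.0 - x)


def anticipatory_step(x1, x2, alpha, eps):
    """One iteration of the (assumed) one-way coupled anticipatory map."""
    fx1 = logistic(x1)
    g = (1.0 - alpha) * fx1 + alpha * logistic(fx1)  # g_alpha(x1)
    return fx1, (1.0 - eps) * logistic(x2) + eps * g


def anticipatory_cross_sections(alpha, eps=0.3, n_units=5000, n_iter=500, seed=0):
    """Return two adjacent cross-sections (x1, x2) and (y1, y2) of the ensemble."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.1, 0.9, n_units)  # hypothetical initial ensemble
    x2 = rng.uniform(0.1, 0.9, n_units)
    for _ in range(n_iter):
        x1, x2 = anticipatory_step(x1, x2, alpha, eps)
    y1, y2 = anticipatory_step(x1, x2, alpha, eps)
    return x1, x2, y1, y2


def info_flow_pair(x1, x2, x1_next, dt=1.0):
    """Flow from the second variable to the first, covariance-ratio form."""
    c = np.cov(np.vstack([x1, x2, (x1_next - x1) / dt]))
    return ((c[0, 0] * c[0, 1] * c[1, 2] - c[0, 1]**2 * c[0, 2])
            / (c[0, 0]**2 * c[1, 1] - c[0, 0] * c[0, 1]**2))
```

Looping `alpha` over 0, 0.1, ..., 1 and evaluating `info_flow_pair(x1, x2, y1)` and `info_flow_pair(x2, x1, y2)` gives the two flows as functions of the anticipation parameter, which can then be compared with the behavior described for Fig. 4.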

V. REAL PROBLEM APPLICATION
So far panel data are mostly investigated in economics. For this reason, we apply Algorithm-IF to a problem on economy versus energy. Specifically, it is about the causal relationship among economic growth, energy consumption, trade openness, and energy price, based on the data of 15 Asian countries (Pakistan, India, Bangladesh, Sri Lanka, Philippines, Thailand, Indonesia, China, Malaysia, Japan, Jordan, Iran, South Korea, Nepal and Vietnam) over the period of 1980-2011. The problem has been studied by [29], hereafter NA14. They found a bidirectional causality among the above four factors (Table 7 in NA14). Here, we re-examine the problem with the above proposed new algorithm based on a rigorously developed theory.
For a detailed description of the data, the reader is referred to NA14. Briefly, energy consumption is measured by the kg of oil equivalent per capita; economic growth is measured by real GDP per capita in constant international dollars; exports (US$) plus imports (US$) divided by population is used to measure trade openness; the price of Dubai crude oil (US$) deflated by the country's consumer price index (100 in the year of 2005) is used as a proxy for energy price due to the unavailability of energy price data. Data on energy consumption per capita, merchandise exports, merchandise imports, consumer price index and population are obtained from the World Development Indicators (2013) of the World Bank. Data on real GDP per capita are collected from Penn World Tables Version 8.0 [30], and Dubai crude oil price data are taken from British Petroleum's 2013 statistical review of world energy [31].
We calculate the information flows/causalities among the four factors with our Algorithm-IF. Similar to the Granger causalities as computed in NA14, we regard the causality with a p-value of information flow less than 0.05, 0.10, 0.15 as, respectively, strong, normal, and weak causality. The results are tabulated in Table 2, with information flows significant at an 85% confidence level blackened. For easy illustration, the causal relation is summarized in Fig. 5. From it, economic growth and energy consumption are mutually causal, but the causality between economic growth and trade openness, and that between economic growth and energy price, are one-way. Specifically, there is a strong bidirectional causality between economic growth and energy consumption, a strong unidirectional causality from trade openness to economic growth, and a weak unidirectional causality from energy price to economic growth. The first two are significant at a 99% confidence level; the third is significant at an 85% level. All other causalities (in total there could be 4 × 3 = 12 causalities) have not passed the significance test at the 85% confidence level. In particular, energy price (oil price) has no direct causal relationship with either energy consumption or trade openness, though it does exert a limited impact on economic growth (significant at the 85% confidence level; indicated by a dashed line).
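The labeling rule stated above can be written down compactly. The sketch below only encodes the three p-value thresholds from the text; the function name and the label returned for p-values of 0.15 or more are choices made here for illustration.

```python
def classify_causality(p_value):
    """Map the p-value of an information flow to a causality label."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "normal"
    if p_value < 0.15:
        return "weak"
    return "not significant"
```

Under this rule a flow significant at the 99% confidence level (p < 0.01) is labeled strong, and one significant only at the 85% level is labeled weak.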
The above inferred causal relations are supported by reports in the literature. First, the bidirectional causal relationship between energy consumption and economic growth has been discussed in many papers. Since the energy crisis in the 1970s, many studies have confirmed the existence of such a causal relationship, e.g., [32]-[36], among others. Recently, some studies have argued that no direct causal relationship between energy consumption and economic growth may exist [37]-[39]. Even if this is true, most of such studies are based on data from developed countries. Of the 15 countries selected here, most are developing countries, where the improvement of people's living standard is bound to the increase in energy consumption. Indeed, other studies based on data from South Asia [40], [41] and Southeast and East Asia [42], [43] all attest to this mutual causal relation.
For these 15 Asian countries, trade openness does not directly affect energy consumption; the converse does not hold, either. However, trade openness can affect energy consumption by influencing economic growth. This is similar to the conclusion of Cole [44], who found that trade liberalization promotes economic growth, which then boosts energy demand. It is noted that these 15 countries, especially those from East Asia and Southeast Asia, and India, have taken over a large portion of the manufacturing from Europe and the United States since the 1980s, promoting economic growth and hence energy consumption through the globalized industrial chain. A slightly counterintuitive finding is that no direct causality between oil price and energy consumption is identified. But with the unidirectional causal link from oil price to economic growth, oil price can still exert an impact on energy consumption. This does make more sense than a direct causality from oil price to energy consumption. Based on our observation, we would not drive more just because gasoline becomes cheaper. For oil importing/exporting countries, the rise in oil price is negatively/positively correlated with economic growth [45]. By influencing economic growth, oil price may affect energy consumption to a certain extent. Sarwar et al. [46] point out that fluctuations in oil price will affect economic growth, but electricity consumption can compensate for this effect to a certain extent. This is also a possible reason why oil price has only a weak impact on economic growth.

VI. CONCLUSION
Since it was found that information flow (IF) and causality are real physical notions and can be formulated on a rigorous footing (see [12]), many efforts have been made to apply them to the important field of causal inference in data science. In this study, we generalized the method for time series, as established by Liang [14], to causal inference for homogeneous and i.i.d. panel data. The generalization is mathematically rigorous but straightforward, and the resulting formula bears the same form as that for time series, though the meanings of the symbols differ. We then proposed an algorithm, Algorithm-IF, for homogeneous and i.i.d. panel data causality analysis.
The algorithm has been validated with panel data sets from a linear stochastic model and a highly chaotic deterministic system. Three kinds of datasets, namely, time series, temporal style long sequences, and panel literature, have been generated and used for the validation. We found that in all these cases, the algorithm proves to be successful. Particularly, the performance with a touch-stone highly nonlinear problem proposed by Hahs and Pethel [28] turns out to be remarkably successful, though currently a linear assumption is made, in sharp contrast to the classical inference problem as discovered by Hahs and Pethel [28].
As a real-world application, we applied the algorithm to examine the causal relation among economic growth, energy consumption, trade openness, and energy price based on 15 Asian countries over the period 1980-2011. It is found that there are a strong bidirectional causality between economic growth and energy consumption, and a strong causality from import and export trade to economic growth.
Energy price does not have a direct impact on energy consumption, but it does exert a limited effect on the latter through influencing economic growth. These inferred causal relations are rather robust, and have been well justified by previous studies and observations. Some issues remain. Recall the assumptions we have made in Theorem III.1: homogeneity and independence (and identical distribution). But a general panel dataset may be heterogeneous and may be subject to pervasive cross-sectional dependence. For heterogeneous panel data, where some individuals may be causal while others may not be (e.g., [47]), more than one dynamical system should be taken into account in arriving at the information flow. For panel data with cross-sectional dependence, whereby all units in the same cross-section are correlated due to, for instance, the presence of common shocks and unobserved components that have been taken as part of the error ([48], [49]), the problem becomes more severe. These issues, among others, are to be investigated in future studies.

ACKNOWLEDGMENT
The constructive comments from the anonymous reviewers are appreciated.