Identifying Influential Nodes in Two-mode Data Networks using Formal Concept Analysis

Identifying important actors (or nodes) in a two-mode network remains a crucial challenge in mining, analyzing, and interpreting real-world networks. While traditional bipartite centrality indices are often used to recognize key nodes that influence the network information flow, they frequently produce poor results in intricate situations such as massive networks with complex local structures or incomplete knowledge about the network topology and certain properties. In this paper, we introduce Bi-face (BF), a new bipartite centrality measure for identifying important nodes in two-mode networks. Using the powerful mathematical formalism of Formal Concept Analysis, the BF measure exploits the faces of concept intents to identify nodes that have influential biclique connectivity and are not located in irrelevant bridges. Unlike off-the-shelf centrality indices, it quantifies how a node exerts a cohesive-substructure influence on its neighbour nodes via bicliques while not lying in core-peripheral substructures of the network, as indicated by its absence from non-influential bridges. Our experiments on several real-world and synthetic networks show the efficiency of BF over existing prominent bipartite centrality measures such as betweenness, closeness, eigenvector, and vote-rank, among others.


Introduction
In today's world, complex real-life systems are ubiquitous. For example, mobile phone networks, as well as Facebook and Twitter, shape the way we interact with one another. Airline and railway networks provide us with the most efficient modes of transportation while also greatly reducing travel times. Energy and electric power networks play a significant role in supplying our domestic and industrial lives. Most of these systems feature two types of data with complex substructures and can thus be represented as two-mode networks (also known as bipartite graphs or affiliation networks). Due to the complex structure of such networks, the spread of information across the network makes some nodes more important than others in certain contexts. As such, the question of how to measure the relative importance of nodes in a two-mode network is increasingly challenging in the field of complex network analysis (CNA). As it is frequently used to understand the role of nodes within a network, node centrality analysis can provide efficient answers to this question. A centrality measure ranks nodes based on how they influence, or are affected by, other nodes via their connection topology. Since there is no consensus on a unique definition of centrality for two-mode networks, which leaves the door open to new ones, various centrality measures have been proposed in the CNA literature (cf. [Jackson, 2010, Jalili et al., 2015, Oldham et al., 2019] for a detailed survey), each of which takes into account a distinct aspect of a central node. In mainstream CNA research, bipartite centrality is frequently classified as local or global.
The local centrality metrics focus on the relative importance of the node in its neighbourhood within local cohesive communities. For example, the degree centrality [Borgatti and Everett, 1997] is a basic local metric that counts the number of links that each node has. However, it frequently captures irrelevant local information about a node in practice. Intuitively, it is assumed that only the node with the highest degree should be in the centre (because it is the most densely linked node, i.e., a hub), but it does not account for the cascade effects of its neighbour nodes. Hence, it is sometimes necessary to remove nodes with high degree values because they provide no information. For example, Angelina Jolie has a high degree centrality in Facebook's network because so many people follow her; however, if you explore your friends' Facebook pages to find out what they are interested in, or who among them enjoys soccer the most, Angelina Jolie becomes completely irrelevant in that network.
The k-shell centrality [Kitsak et al., 2010] is a community-based local centrality that enhances the degree of a node in terms of its neighbourhood connections using the k-core. Thus, the larger the share of k-cores that contain a node, the more likely it is to be a hub in the cores of a network, and thus the more important it is in the network. However, k-shell frequently produces inaccurate results when the network structure has a small number of k-cores, which is prevalent in two-mode networks. This is because, in this case, many nodes are assigned an approximately equal number of k-cores. From the perspective of the topological graph of a two-mode network, k-bicliques may be more accurate graphical components than k-cores. That is, the number of k-bicliques among a node's neighbours is counted in order to estimate its importance using the Cross k-bicliques connectivity measure, which quantifies how the node affects information propagation through the network. However, in general, its calculation requires exponential time and space and is often sensitive to the k parameter. To compute Cross k-bicliques connectivity for a given node, we must first extract all k-bicliques from the network containing this node, which is an NP-hard problem [Chiba and Nishizeki, 1985]. Furthermore, determining the optimal value of k may be problematic in many applications. Strictly speaking, picking a large k value may result in overlooking all k-bicliques with k less than the chosen one, leading to an underestimation of the influence of other nodes in local cohesive communities within the network. A small k value may lead to overestimating the importance of other neighbour nodes, generating a behaviour similar to degree centrality.
Bipartite Closeness [Borgatti and Everett, 1997, Borgatti and Halgin, 2011] is a common type of local centrality that is based on geodesics. It computes the reciprocal of the sum of the distances between the node and all of the other nodes in the network. Its basic form intuitively assumes that information can efficiently flow from one node to every other node via the shortest distances. The important node is therefore the independent one that is close to other nodes in the network in terms of shortest paths. Thus, at a high level, it can address the degree centrality limitation in a few cases. However, on non-spatial networks, bipartite closeness frequently produces inaccurate results [Rodrigues, 2019], and its values on spatial networks tend to span a rather small dynamic range from smallest to largest. This is because most complex real-world networks may have a high average shortest-path length, as their largest distance increases exponentially in terms of the number of nodes. That is, assuming that the minimum distance is equal to one, the asymptotic ratio between the minimum and the largest distances is $O(1/\log n)$. This frequently implies that numerous nodes, with diverse roles in the network's information flow, may have comparable closeness scores. On the contrary, most non-spatial networks feature low geodesic distances among nodes, given that the largest geodesic distances increase only logarithmically with the network size. As a result, the dynamic range of variations, as well as the network diameter, will be too small, and even slight changes in the network structure can have a significant impact on nodal closeness values.
The bipartite Betweenness [Brandes, 2001, Borgatti and Everett, 1997, Borgatti and Halgin, 2011] is another common geodesic-based measure. To evaluate the importance of a node, it counts the number of times the node acts as a bridge along the geodesic paths among the other nodes in the network. Thus, it considers other nodes' dependence on a given node and measures its control over the information passing among nodes, whereas closeness captures connection efficiency, or independence from potential flow control by intermediary nodes (cf. [Brandes et al., 2016], a detailed study differentiating between closeness and betweenness). In general, bipartite betweenness does not consider node connectivity and its calculation is frequently time-consuming. The fundamental assumption of betweenness is that every pair of nodes exchanges information through shortest paths with equal probability. However, this is, in many situations, not a realistic assumption since information does not necessarily take the shortest path [Newman, 2005] (e.g., news related to a friend might not be learned directly from another close friend but from other mutual friends). As a result, it does not provide a precise representation of the most influential nodes within these groups, but rather a fair approximation (see [Newman, 2018] for a more detailed explanation). Furthermore, its exact computation on large or dense two-mode networks requires a time complexity of $O(n_1^3 + n_2^3)$, where $n_1$ and $n_2$ are the numbers of nodes of the two types, respectively.
Looking at local centrality from a different angle, bipartite percolation centrality [Piraveenan et al., 2013] estimates a node's relative importance by counting the number of percolated paths that pass through it. A percolated path is a shortest path between two nodes in which the source node is percolated (e.g., infected) but the target node may not be. The percolation centrality captures the essential mechanics of contagion-mediated network spreading by associating percolation paths with weight terms that determine how much importance is given to potential percolation paths originating from given nodes. This helps percolation centrality avoid the limitation of both betweenness and closeness, which rely solely on topological and random diffusion processes via random shortest paths. It may, however, produce poor results when the spread of contagion has no effect on changing the node state, and it is frequently computationally expensive to calculate. Because percolation through a network is affected by both the level of contagion and the network structure [Meyers, 2007], the spread of contagion in a complex network (CN) may not change node states in a few scenarios. From a theoretical standpoint, there may be no transmissibility, and in this case, the percolated contagion spreads over the edges of a complex network without changing the state of a node to either recoverable or infected, leaving the nodes in the default state. Moreover, computing the percolation centrality in worst-case scenarios, with large bipartite networks having complex local structures, requires a time complexity that is cubic in the numbers of nodes of the two types.
Global measures, on the other hand, consider a node's prominence in the context of the entire network. Their principle emphasizes the hypothesis that a few important neighbours can weigh more than a large number of unimportant ones. That is, a node is important if it is connected to other important nodes. For example, Bipartite Eigenvector centrality [Borgatti and Everett, 1997, Borgatti and Halgin, 2011] quantifies whether a node is central based on its connections to other high-score nodes. It estimates the number of traversals of each node through indefinite-length random walks. Intuitively, this implies that the node in the network core is more accessible than the other nodes. From a conceptual standpoint, a node's eigenvector centrality can be thought of as the global extension of its local degree centrality, in which both count walks that begin and terminate at that node. Eigenvector centrality may exhibit a localization transition, which frequently results in inaccurate centrality scores. As demonstrated in [Martin et al., 2014], eigenvector centrality has a localization transition under the common conditions of a network regime, causing the majority of the weight of the centrality to concentrate on a small number of nodes in the network. This implies that when a network structure contains many hubs, the eigenvector weights are skewed toward a few nodes: the hub node and its neighbours have the highest eigenvector values, while the remaining nodes have nearly identical centrality values (likely close to zero).
In this paper, we present Bi-face (BF), a new bipartite centrality that can be used to identify key nodes in complex networks. While we focus on two-mode networks here, its general framework, in tandem with its formulation for one-mode networks which we present in [Ibrahim et al., 2020], can be easily modified and applied to other representations of CNs, such as multidimensional and multilayer networks [Dickison et al., 2016]. The guiding idea of BF is to use a formal concept analysis framework to bring together the centrality aspects of cohesiveness via bicliques, network flow via bridges, and influence of important neighbour nodes for the benefit of actionable node identification. Its conceptual hypothesis is that important nodes should be found in influential bridges and overlapping bicliques with a large number of important nodes. That is, it quantifies how a node affects, and is affected by, its important neighbours via bicliques while also connecting the dense substructures of a network through its presence in influential bridges. Thus, it differs from betweenness in that it considers influential bridges rather than all bridges. Unlike closeness and eigenvector, it can efficiently deal with the diverse topological structures of a network, without potentially undergoing a localization transition, due to this hybridization of the influential-bridge and overlapping-biclique aspects. Furthermore, it leverages the powerful mathematical formulation of Formal Concept Analysis (FCA) to overcome the limitation of Cross k-bicliques connectivity. That is to say, it utilizes the concept lattice related to the network to efficiently extract concepts that capture bridges and k-bicliques from the network while being insensitive to the k parameter. Technically, BF computation is based solely on the set of these extracted concepts, which is often quite small in comparison with polynomial functions of the numbers of nodes and edges. As a result, in contrast to percolation, it is relatively quick to compute in practice.
The paper is organized in the following manner. Section 2 recalls some basic definitions of FCA and traditional bipartite centrality measures. Section 3 explains our proposed Bi-face centrality for identifying key nodes of two-mode networks in more detail. In Section 4, we conduct a thorough experimental study and a discussion. Finally, Section 5 presents our conclusions.

This section briefly reviews the main concepts that support the comprehension of our proposed centrality measure by using an illustrative example, which is a two-mode network of airline alliances and their flying destinations in the year 2000. As shown in Figure 1, the network is modeled as an undirected bipartite graph $\Upsilon = (G, M, I)$, where $G$ is a set of 13 objects (also called type-I nodes) representing airline companies, $M$ is a set of 9 attributes (type-II nodes) representing flying destinations, and $I$ is a set of edges where an edge $(u_i, v_j) \in I$ links two nodes $u_i \in G$ and $v_j \in M$ if a flight from airline company $u_i$ landed at the destination $v_j$.
Figure 1: A two-mode graph network representing flights from 13 airline companies (in red) landing at 9 destinations (in green) in Year 2000.

Formal Concept Analysis
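In the following, we recall the notions of FCA [Ganter and Wille, 1999] that will be used in this paper. Figure 2 is the formal context $K = (G, M, I)$, equivalent to an adjacency matrix, that expresses the two-mode network shown in Figure 1.
Given arbitrary subsets $A \subseteq G$ and $B \subseteq M$, the following derivation operators are defined:
$$A' = \{m \in M \mid \forall g \in A,\ (g, m) \in I\}, \qquad B' = \{g \in G \mid \forall m \in B,\ (g, m) \in I\},$$
where $A'$ is the set of attributes common to all objects of $A$ and $B'$ is the set of objects sharing all attributes from $B$. The closure operator $(\cdot)''$ is the double application of $(\cdot)'$, and it is extensive, idempotent and monotone. A formal concept of $K$ is a pair $c = (A, B)$ with $A' = B$ and $B' = A$, where $A$ is called its extent and $B$ its intent. The object concept of $g \in G$ is expressed by $\gamma g := (g'', g')$ and the attribute concept of $m \in M$ is defined by $\mu m := (m', m'')$. Concepts are partially ordered by extent inclusion:
$$c_1 = (A_1, B_1) \leq c_2 = (A_2, B_2) \iff A_1 \subseteq A_2 \ (\iff B_2 \subseteq B_1). \tag{1}$$
In this case, $c_2$ is called a superconcept (or successor) of $c_1$, and $c_1$ is called a subconcept (or predecessor) of $c_2$. The set of all concepts of the formal context $K$ is expressed by $C(K)$ or simply $C$.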
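The derivation operators and the closure can be illustrated with a minimal Python sketch. The toy context below (objects g1 to g3, attributes a to c) and the function names `up`, `down`, `closure` and `is_concept` are hypothetical and chosen for illustration only; they are not the airline data of Figure 2.

```python
# Hypothetical toy context (NOT the airline context of Figure 2).
OBJECTS = {"g1", "g2", "g3"}
ATTRIBUTES = {"a", "b", "c"}
INCIDENCE = {
    ("g1", "a"), ("g1", "b"),
    ("g2", "a"), ("g2", "b"),
    ("g3", "b"), ("g3", "c"),
}


def up(objs):
    """A': attributes shared by every object in objs."""
    return {m for m in ATTRIBUTES if all((g, m) in INCIDENCE for g in objs)}


def down(attrs):
    """B': objects that have every attribute in attrs."""
    return {g for g in OBJECTS if all((g, m) in INCIDENCE for m in attrs)}


def closure(objs):
    """A'': double application of the derivation operators on an object set."""
    return down(up(objs))


def is_concept(extent, intent):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return up(extent) == set(intent) and down(intent) == set(extent)
```

In this toy context, the closure of {g1} is {g1, g2}, because g2 shares every attribute of g1, and ({g1, g2}, {a, b}) is a formal concept.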
Definition 2.4 (Concept Lattice). The concept lattice of a formal context $K$, denoted by $B(K) = (C, \leq)$, is a Hasse diagram that represents all formal concepts $C$ together with the partial order that holds between them. In $B(K)$, each node represents a concept with its extent and intent, while the edges represent the partial order between concepts.
Figure 3 is the Hasse diagram of the concept lattice that corresponds to the context of Figure 2. More precisely, it is a diagram with reduced labeling. This means that the label $g$ is written below $\gamma g$ and $m$ above $\mu m$. The extent of a concept represented by a node $a$ is given by all labels in $G$ from the node $a$ downwards, and the intent by all labels in $M$ from $a$ upwards.
There are several methods (cf. [Ganter and Wille, 1999, Valtchev et al., 2002, Choi, 2009]) that build the lattice, i.e., compute all the concepts together with the partial order.
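Definition 2.5 (Lower and Upper covers). For any two formal concepts $c_1, c_2 \in C$, if $c_1 < c_2$ and there is no concept $c_3 \in C$ such that $c_1 < c_3 < c_2$, then $c_1$ is called a lower cover of $c_2$ and $c_2$ is called an upper cover of $c_1$. We will use $U(c)$ and $L(c)$ to denote the sets of upper and lower covers of the formal concept $c$, respectively.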
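Definition 2.6 (Concept Intentional Face [Pfaltz and Taylor, 2002]). Given a concept $c = (A, B)$ and its $i$-th upper cover $c^u_i = (A_i, B_i) \in U(c)$, the intentional face of $c$ w.r.t. $c^u_i$ is the difference between their intent sets: $f^i_{in} = B \setminus B_i$.
Definition 2.7 (Concept Extensional Face). Given a concept $c = (A, B)$ and its $i$-th lower cover $c^l_i = (A_i, B_i) \in L(c)$, the extensional face of $c$ w.r.t. $c^l_i$ is the difference between their extent sets: $f^i_{ex} = A \setminus A_i$.
Definition 2.8 (Blocker [Pfaltz and Taylor, 2002]). Given the family of faces $\Lambda_c$, the set $Z$ is said to be a blocker of $\Lambda_c$ if $\forall f_i \in \Lambda_c$, $f_i \cap Z \neq \emptyset$, and the blocker $Z$ is said to be minimal if no proper subset $Z' \subset Z$ is a blocker of $\Lambda_c$.
Definition 2.9 (Generator [Bastide et al., 2000]). Given a concept $c = (A, B)$ in a formal context $K = (G, M, I)$, a subset $H \subseteq B$ is called a generator of $c$ iff $H'' = B$, and it is a minimal generator when no proper subset $H' \subset H$ satisfies $H''' = B$. We use $H^{ex}_c$ and $H^{in}_c$ to denote the sets of minimal generators of a concept $c$ w.r.t. its extent and intent, respectively.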
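As a small illustration of faces and blockers, the following Python sketch computes intent faces as set differences between a concept intent and the intents of its upper covers, and checks whether a set hits every face. The function names and the toy intents are hypothetical. The minimality test removes one element at a time, which suffices because any superset of a blocker is again a blocker.

```python
def intent_faces(intent, upper_cover_intents):
    """Faces of a concept intent: intent minus the intent of each upper cover."""
    return [set(intent) - set(cover) for cover in upper_cover_intents]


def is_blocker(candidate, faces):
    """A blocker has a non-empty intersection with every face."""
    return all(set(candidate) & face for face in faces)


def is_minimal_blocker(candidate, faces):
    """Minimal: a blocker that stops being one when any single element is removed."""
    candidate = set(candidate)
    if not is_blocker(candidate, faces):
        return False
    return not any(is_blocker(candidate - {x}, faces) for x in candidate)


# Hypothetical concept with intent {a, b, c} and two upper covers
# whose intents are {a} and {b}.
FACES = intent_faces({"a", "b", "c"}, [{"a"}, {"b"}])
```

Here the faces are {b, c} and {a, c}; both {c} and {a, b} are minimal blockers, whereas {a, c} is a blocker that is not minimal.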

Social Network Analysis
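Definition 2.10 (Biclique). Let $\Upsilon = (G, M, I)$ be an undirected bipartite graph defined over the objects $G$ and attributes $M$. A biclique of $\Upsilon$ is a pair of disjoint subsets $Q = (A, B)$, with $A \subseteq G$ and $B \subseteq M$, such that every node of $A$ is linked to every node of $B$, i.e., $(u, v) \in I$ for all $u \in A$ and $v \in B$.
The pair of disjoint subsets Q = ({AirCanada, Mexicana, ThaiAirways, UnitedAirlines}, {LatinAmerica, Caribbean, USA}) is an example of a biclique. In the sequel, we use Q as our illustrative biclique (see the lattice node indicated by a red arrow in Figure 3) to support the understanding of definitions and principles related to the Bi-face centrality.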
Definition 2.11 (Bridge).An edge (u, v) ∈ I of a two-mode data network Υ is a bridge iff it is not contained in any cycle and its removal increases the number of connected components in the graph Υ.
For instance, the edge (AnsettAustralia, AsiaPacific) represents a bridge in Υ.
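The bridge condition can be checked naively by deleting an edge and testing whether its endpoints remain connected. The sketch below is a simple quadratic-time illustration under the assumption that type-I and type-II node names are distinct; the function names and the toy edge list are hypothetical, and a linear-time bridge-finding algorithm would be preferable on large networks.

```python
from collections import defaultdict, deque


def connected(adjacency, source, target):
    """Breadth-first search: is target reachable from source?"""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def bridges(edges):
    """Return the edges whose removal disconnects their two endpoints."""
    edges = list(edges)
    found = []
    for removed in edges:
        adjacency = defaultdict(set)
        for (a, b) in edges:
            if (a, b) != removed:
                adjacency[a].add(b)
                adjacency[b].add(a)
        if not connected(adjacency, removed[0], removed[1]):
            found.append(removed)
    return found


# Hypothetical toy network: u1, u2, v1, v2 form a cycle; u3 hangs off v2.
TOY_EDGES = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
```

In this toy network only the pendant edge (u3, v2) is a bridge, since every other edge lies on the cycle u1, v1, u2, v2.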
Definition 2.12 (Bipartite centrality measure). The centrality measure of a type-I node $u \in G$ is a function that assigns a positive real number to $u$ quantifying its centrality w.r.t. all other type-II nodes $v \in M$ in the network $\Upsilon$ (and vice versa).
The bipartite (also called two-mode) centrality measures are frequently used to identify and rank key nodes in two-mode networks. While several centrality measures have been introduced, the degree, closeness, betweenness and eigenvector have been found to be the most prominent in several applications, and are therefore commonly used.
Definition 2.13 (Degree centrality $D_c$ [Borgatti and Everett, 1997, Tsugawa and Ohsaki, 2015]). The degree centrality of a node in a two-mode graph network $\Upsilon$ is defined as:
$$D_c(u_i) = \sum_{j=1}^{|M|} I_{ij}, \qquad D_c(v_j) = \sum_{i=1}^{|G|} I_{ij}, \tag{4}$$
where $I_{ij}$ is equal to 1 when a link exists between $u_i$ and $v_j$, and 0 otherwise. Thus, the summation in Eq. (4) represents the number of edges (or ties with neighbour nodes of the other type) involving the node.
Definition 2.14 (Closeness centrality $C_c$ [Borgatti and Everett, 1997, Borgatti and Halgin, 2011]). The normalized closeness centrality of a node $u_i$, in a two-mode graph network $\Upsilon$, is defined as the reciprocal of the sum of the geodesic distances from $u_i$ to the other nodes of the network, rescaled for the two-mode setting as in [Borgatti and Everett, 1997], where $d(u_i, v_j)$ is the geodesic distance (shortest path) between the nodes $u_i$ and $v_j$.
Definition 2.15 (Betweenness centrality $B_c$ [Brandes, 2001]). In bipartite networks $\Upsilon$, the betweenness centrality of a node $x_i$ is defined as in [Borgatti and Halgin, 2011]:
$$B_c(x_i) = \sum_{j < k} \frac{\sigma_{x_j x_k}(x_i)}{\sigma_{x_j x_k}},$$
where $\sigma_{x_j x_k}$ denotes the total number of shortest paths between nodes $x_j$ and $x_k$, and $\sigma_{x_j x_k}(x_i)$ is the number of those paths that traverse $x_i$. To normalize the betweenness, we simply divide $B_c(u_i)$ and $B_c(v_j)$ by the term corresponding to its node set [Borgatti and Halgin, 2011]:
$$b_{\max}(G) = \tfrac{1}{2}\left[|M|^2 (s+1)^2 + |M|(s+1)(2t - s - 1) - t(2s - t + 3)\right],$$
where $s = (|G| - 1) \text{ div } |M|$ and $t = (|G| - 1) \bmod |M|$, and
$$b_{\max}(M) = \tfrac{1}{2}\left[|G|^2 (p+1)^2 + |G|(p+1)(2r - p - 1) - r(2p - r + 3)\right],$$
where $p = (|M| - 1) \text{ div } |G|$ and $r = (|M| - 1) \bmod |G|$.
Definition 2.16 (Eigenvector centrality $EV_c$ [Borgatti and Everett, 1997, Borgatti and Halgin, 2011]). The eigenvector centrality of a node $u_i$, in a graph network $\Upsilon$, can be iteratively computed as:
$$EV_c(u_i) = \frac{1}{\lambda} \sum_{j=1}^{|M|} a_{u_i v_j}\, EV_c(v_j),$$
where the eigenvalue $\lambda \neq 0$ is a constant, and $a_{u_i v_j}$ is the adjacency element, which is equal to 1 if node $u_i$ is linked to node $v_j$, and 0 otherwise.
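Two of these quantities are simple enough to sketch directly. The Python snippet below counts the raw (unnormalized) degree of each node from an edge list and evaluates the betweenness normalization constant in the form commonly attributed to Borgatti and Halgin, with $s$ and $t$ obtained by integer division and remainder. The function names are hypothetical, and edges are assumed to be listed once each.

```python
def degree_centrality(edges):
    """Raw degree of every node: the number of edges that involve it."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def betweenness_max(n_same, n_other):
    """Largest betweenness a node can reach in a bipartite graph.

    n_same: size of the node set the node belongs to.
    n_other: size of the opposite node set.
    """
    s, t = divmod(n_same - 1, n_other)
    return 0.5 * (
        n_other ** 2 * (s + 1) ** 2
        + n_other * (s + 1) * (2 * t - s - 1)
        - t * (2 * s - t + 3)
    )
```

As a sanity check, a single type-I node linked to four type-II nodes (a star) lies on all 4·3/2 = 6 shortest paths between the leaves, and `betweenness_max(1, 4)` indeed evaluates to 6.0.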

Bi-face Framework
At a conceptual level, our overall Bi-face centrality approach contains the following basic steps.

Building the Formal context of a Two-mode Network
We first build the formal context of the two-mode network $\Upsilon = (G, M, I)$ by computing the adjacency matrix as follows:
$$K_{ij} = \begin{cases} 1 & \text{if } (u_i, v_j) \in I, \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$
In Eq. (14), we assign 1 to the element of $K$ in row $i$ and column $j$ if the object $u_i$ (type-I node) is linked to the attribute $v_j$ (type-II node) in the network $\Upsilon$. Otherwise, we assign 0 to it. For example, the constructed formal context $K$ of our toy graph in Figure 1 is represented in the table of Figure 2. We then construct the concept lattice $B(K)$ from the formal context, as shown in Figure 3. Note that Figure 3 shows the Hasse diagram of $B(K)$ with reduced labelling, where the label $g$ is written below $\gamma g$ and $m$ above $\mu m$. The extent of a concept represented by a node $a$ is given by all labels in $G$ from the node $a$ downwards, and the intent by all labels in $M$ from $a$ upwards.
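Building the formal context amounts to filling a binary object-by-attribute matrix from the edge list. A minimal Python sketch follows; the function name `build_context` and the toy nodes are hypothetical and do not reproduce the airline data of Figure 2.

```python
def build_context(objects, attributes, edges):
    """Binary matrix K with K[i][j] = 1 iff (objects[i], attributes[j]) is an edge."""
    edge_set = set(edges)
    return [
        [1 if (g, m) in edge_set else 0 for m in attributes]
        for g in objects
    ]


# Hypothetical toy network with two type-I and three type-II nodes.
TOY_CONTEXT = build_context(
    ["u1", "u2"],
    ["v1", "v2", "v3"],
    [("u1", "v1"), ("u1", "v3"), ("u2", "v2")],
)
```

Rows follow the order of `objects` and columns the order of `attributes`, so the toy context is [[1, 0, 1], [0, 1, 0]].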

Overlapping Biclique Extraction and Refinement
Using the constructed lattice $B(K)$, it is now possible to extract concepts that capture the corresponding bicliques of the two-mode network as follows:
Proposition 3.1. Given a network $\Upsilon$ and its corresponding concept lattice $B(K)$, each concept $c = (A, B) \in C$ corresponds to a maximal biclique $Q = (A, B)$ of $\Upsilon$.
Proof. A concept represents a rectangular sub-matrix of size $|A| \times |B|$ of the adjacency matrix that is filled with 1's, and hence a biclique, since it is a maximal rectangle in the formal context. Assume now that $Q = (\{u : u \in A\}, \{v : v \in B\})$ is a biclique of $\Upsilon$. Then, from Definition 2.10, for any two nodes $u \in A$ and $v \in B$ of $Q$, there exists an edge $(u, v)$ in $\Upsilon$ that links the two nodes. Based on Eq. (14), the obtained $|A| \times |B|$ adjacency matrix $K(\{u : u \in A\}, \{v : v \in B\}, I_Q)$ that expresses the biclique $Q$ is a sub-matrix consisting of all 1's. Such a sub-matrix coincides with the concept $c = (A, B)$ in which the extent $A$ and the intent $B$ involve only the object nodes $\{u : u \in A\}$ and attribute nodes $\{v : v \in B\}$ of $Q$, respectively. This entails that a biclique $Q$ is identical to a concept $c = (A, B)$.
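The correspondence between concepts and bicliques can be checked on a small example. The following Python sketch enumerates all concepts of a tiny context by closing every attribute subset, which is exponential and only meant for illustration; it is not one of the lattice construction methods cited in Section 2. The names and the toy context are hypothetical.

```python
from itertools import combinations


def enumerate_concepts(objects, attributes, incidence):
    """Naively list every formal concept (extent, intent) of a small context."""
    def up(objs):
        return frozenset(m for m in attributes
                         if all((g, m) in incidence for g in objs))

    def down(attrs):
        return frozenset(g for g in objects
                         if all((g, m) in incidence for m in attrs))

    found = set()
    ordered = sorted(attributes)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            extent = down(subset)            # B'
            found.add((extent, up(extent)))  # (B', B'') is always a concept
    return found


def is_biclique(extent, intent, incidence):
    """Every object of the extent is linked to every attribute of the intent."""
    return all((g, m) in incidence for g in extent for m in intent)


# Hypothetical toy context (not the airline network).
OBJS = {"g1", "g2", "g3"}
ATTRS = {"a", "b", "c"}
INC = {("g1", "a"), ("g1", "b"), ("g2", "a"), ("g2", "b"), ("g3", "b"), ("g3", "c")}
CONCEPTS = enumerate_concepts(OBJS, ATTRS, INC)
```

The toy context has four concepts, and each of them passes the biclique check; for instance, ({g1, g2}, {a, b}) is both a concept and a biclique.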
An interesting question that could be raised now is how to determine the non-influential nodes in a given concept (or biclique). To answer this question, let us define a non-influential node from the viewpoint of FCA.
Definition 3.1 (Non-influential node). For a formal concept (biclique) $c_i = (A_i, B_i) \in C$, a type-I node $u \in A_i$ is non-influential if its removal from $c_i$ (and accordingly from the graph $\Upsilon$) does not violate the closure conditions of the other biclique concepts $C \setminus \{c_i\}$ that involve it. In a dual manner, a type-II node $v \in B_i$ is non-influential if its removal from $c_i$ does not violate the closure conditions of the other concepts that involve it.
That is, the subset of concepts (or bicliques) that contain either node $u$ or node $v$ still maintain their local conceptual structures even after removing $u$ from their extents or $v$ from their intents. Intuitively, this means that the node $u$ or $v$ is not important since taking it off from the graph does not affect the essential connectivity of the network (e.g., the collapsing of other concepts). In fact, Definition 3.1 raises the practical question of how to find these non-influential nodes in a given biclique. Fortunately, the faces of its corresponding concept, w.r.t. its upper and lower covers, can provide information as to what its non-influential nodes would be. Thus, an effective strategy here is to contrast the corresponding concept (biclique) with its lower and upper covers through extensional and intentional faces to identify its potential non-influential type-I and type-II nodes, respectively. That is, the faces of the concept $c_i = (A_i, B_i)$, w.r.t. its lower and upper covers, share the non-influential (type-I and type-II) nodes in its (extent and intent), respectively:
$$u \in \bigcap_{k=1}^{|L(c_i)|} f^k_{ex}, \tag{17}$$
$$v \in \bigcap_{k=1}^{|U(c_i)|} f^k_{in}. \tag{18}$$
For instance, the corresponding concept of Q has two extensional faces $f^1_{ex}$ = {ThaiAirways} and $f^2_{ex}$ = {Mexicana}, respectively. The intersection of these two faces is empty, which means that there are no non-influential type-I nodes in Q. It also has only one intensional face $f^1_{in}$ = {Caribbean}. Thus, the intersection is also $f^1_{in}$, which entails that Caribbean is a non-influential type-II node in Q.
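The face-intersection test can be sketched in a few lines of Python. The function name `non_influential` is hypothetical, and the faces passed in are those reported in the text for the illustrative biclique Q; the sketch assumes the faces have already been computed from the concept lattice.

```python
def non_influential(faces):
    """Nodes shared by all faces of a concept; empty if there are no faces."""
    faces = [set(face) for face in faces]
    if not faces:
        return set()
    return set.intersection(*faces)


# Faces of the illustrative biclique Q, as given in the text.
Q_EXTENSIONAL_FACES = [{"ThaiAirways"}, {"Mexicana"}]
Q_INTENSIONAL_FACES = [{"Caribbean"}]

NON_INFLUENTIAL_TYPE_I = non_influential(Q_EXTENSIONAL_FACES)
NON_INFLUENTIAL_TYPE_II = non_influential(Q_INTENSIONAL_FACES)
```

Consistent with the example, the two extensional faces have an empty intersection, while the single intensional face yields Caribbean as the only non-influential type-II node.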
On the basis of Equations (17) and (18), we can leverage the faces of concepts to define a key biclique as follows:
Definition 3.2 (Face Biclique). Given a two-mode network $\Upsilon$ and its corresponding concept lattice $B(K)$, a concept (representing a biclique) $c = (A, B) \in B(K)$ is called a face biclique if all of its (type-I and type-II) nodes are influential, i.e., none of them satisfies the conditions in Equations (17) and (18).
Based on Definition 3.2, we can obtain the face biclique $\hat{c} = (\hat{A}, \hat{B})$ by refining the original biclique $c = (A, B)$ as follows:
$$\hat{A} = A \setminus \bigcap_{k} f^k_{ex}, \qquad \hat{B} = B \setminus \bigcap_{k} f^k_{in}, \tag{19}$$
where $f^k_{ex}$ and $f^k_{in}$ range over the extensional and intensional faces of $c$. In Equation (19), we remove non-influential type-I nodes from its extent and non-influential type-II nodes from its intent. It is worth noting that when the extent or intent contains only one node, no refinement is applied because this node is influential by default. This is due to the fact that removing this node clearly violates the closure conditions in Equations (17) and (18).
Non-influential bridges are, in turn, identified through the minimal generators of attribute and object concepts. For instance, the attribute concept c = ({AirCanada, AirNewZealand, AllNippnA, TheAustrianAG, BritishMidland, Lufthansa, ScandinavianA, SingaporeA, ThaiAirways, UnitedAirlines, Varig}, {Europe}) that appears in blue/black in Figure 3 has an extensional minimal generator set $H^{ex}_c$ = {BritishMidland}. This implies that BritishMidland (in yellow in Figure 3) is a terminal (type-I) node and the edge (BritishMidland, Europe) represents a non-influential (face-I) bridge. Non-influential (face-II) bridges are identified similarly, using the minimal generators of object concepts.
The question now is, how can we obtain the minimal generators of object and attribute concepts? We can efficiently compute the set of minimal generators $H^{in}_c$ of a concept $c$ intent by applying the Minigen() procedure, which is given in Algorithm 1. It iteratively calculates the face of $c$ w.r.t. each upper cover in $U(c)$ (Line 3). If the set of intentional minimal generators is empty, it then assigns the individual attributes in the first face to $H^{in}_c$ (Lines 4-5). Otherwise, it progressively checks the intersection between the calculated face $f_u$ and each generator $h_i$ in $H^{in}_c$ (Line 8). If the intersection with the current generator $h_i$ is empty, then $h_i$ is not a blocker of the family formed by the face (Line 9). This entails that the generator $h_i$ must then be modified so that it belongs to the minimal blocker family of faces. Thus, the new minimal generators are obtained by adding each element of the current face $f_u$ to $h_i$ (Line 10). If the intersection is not empty, then the current generator $h_i$, which exists in the family of minimal blockers of previous faces, is also a minimal blocker of the family extended with the current face $f_u$. So, we add the generator $h_i$, without any modification, to the minimal generator set $H^{in}_c$ (Line 12). It ultimately verifies the minimality of the obtained set (Line 15) and returns the final set of minimal generators $H^{in}_c$ (Line 18). Note that, in a dual way and using the set of the concept's lower covers $L(c)$, we can apply the Minigen() procedure to compute the set of extensional minimal generators $H^{ex}_c$ of a concept w.r.t. its extent $A$.
Algorithm 1 Minigen(): computing the set of minimal generators of a concept intent. Output: set of minimal generators.
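The procedure described above can be sketched in Python as an incremental computation of minimal blockers over the faces. This is an illustrative reading of the textual description, not a transcription of Algorithm 1: the function names, the representation of concepts by their intents alone, and the toy input are assumptions.

```python
def minimal_sets(family):
    """Keep only the sets that have no proper subset in the family."""
    family = set(family)
    return {h for h in family if not any(other < h for other in family)}


def minigen(intent, upper_cover_intents):
    """Minimal generators of an intent, built face by face.

    intent: the intent of the concept c.
    upper_cover_intents: the intents of the upper covers of c.
    """
    generators = set()
    for cover_intent in upper_cover_intents:
        face = frozenset(intent) - frozenset(cover_intent)
        if not generators:
            # First face: every single attribute of the face is a candidate.
            generators = {frozenset([a]) for a in face}
            continue
        updated = set()
        for h in generators:
            if h & face:
                # h already hits the new face: keep it unchanged.
                updated.add(h)
            else:
                # h misses the new face: extend it with each face element.
                updated.update(h | {a} for a in face)
        # Discard candidates that contain a smaller candidate.
        generators = minimal_sets(updated)
    return generators


# Hypothetical concept with intent {a, b, c} and upper-cover intents {a} and {b}.
TOY_GENERATORS = minigen({"a", "b", "c"}, [{"a"}, {"b"}])
```

For the toy input, the faces are {b, c} and {a, c}, and the sketch returns the two minimal blockers {c} and {a, b}. A concept without upper covers yields an empty result in this sketch, a corner case the text does not discuss.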

Bi-face Centrality
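Definition 3.5 (Bi-face Centrality $BF_c$). The Bi-face centrality of nodes $u \in G$ and of $v \in M$, in a given graph network $\Upsilon$, can be computed as:
$$BF_I(u) = \frac{\left|\{\hat{c} = (\hat{A}, \hat{B}) \in \hat{C} : u \in \hat{A},\ |\hat{A}| > 1,\ |\hat{B}| > 1\}\right|}{|\hat{C}|} + \left(1 - \frac{\left|\{(u, v) \in \Gamma_I\}\right|}{|\Gamma_I|}\right), \tag{22}$$
$$BF_{II}(v) = \frac{\left|\{\hat{c} = (\hat{A}, \hat{B}) \in \hat{C} : v \in \hat{B},\ |\hat{A}| > 1,\ |\hat{B}| > 1\}\right|}{|\hat{C}|} + \left(1 - \frac{\left|\{(u, v) \in \Gamma_{II}\}\right|}{|\Gamma_{II}|}\right), \tag{23}$$
where $\hat{C}$ stands for the set of face bicliques while $\Gamma_I$ and $\Gamma_{II}$ represent the two sets of non-influential (face-I) and (face-II) bridges, respectively. In Eq. (22), the Bi-face centrality computes the sum of the face-biclique and face-bridge terms.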
The numerator of the face-biclique term simply counts the number of refined concepts, with extent and intent sizes greater than 1, that involve a type-I node $u$. Thus, it quantifies the portion of face bicliques, in the graph network $\Upsilon$, which the node $u$ belongs to. From a conceptual perspective, this term can be considered as an efficient way of computing the cross connectivity [Faghani and Nguyen, 2013, Everett and Borgatti, 1998] of the node $u$ using refined overlapping bicliques that only contain influential nodes. In the face-bridge term, we first quantify the ratio of the face bridges that involve the node $u$. This ratio is then subtracted from 1 to approximate the portion of influential bridges in the graph that contain the node $u$. Note that the numerators of both the face-biclique and face-bridge terms are unnormalized quantities. Thus, the denominators in Eq. (22) serve as normalization constants to scale the two terms between 0 and 1. In a similar manner, the Bi-face centrality in Eq. (23) can be interpreted and used to compute the centrality of type-II nodes in the graph.
Algorithm 2 gives the pseudo-code for computing the Bi-face centrality of all type-I nodes in the two-mode network Υ.
The algorithm takes as input the set of all extracted concepts $C = \{c_j = (A_j, B_j)\}_{j=1}^{|C|}$. For each type-I node $u_i \in G$, it first iteratively refines the extents of the bicliques to obtain the face ones by removing all their non-influential type-I nodes (lines 4-5). It then counts the number of those refined face bicliques in the graph that involve $u_i$ (lines 7-9). Thereafter, it iteratively calculates the minimal generators of the attribute concepts w.r.t. their extents to identify the face-bridges that involve the node $u_i$ (lines 11-12). It then counts the number of those face-bridges that involve the node $u_i$ as a terminal (type-I) one (lines 13-15). Subsequently, it computes the Bi-face centrality $BF_I$ of the node $u_i$ (lines 19-21). Finally, it returns a list containing the Bi-face centrality measures $BF_I$ of all type-I nodes in the graph (line 22). Without loss of generality, and in a dual manner, Algorithm 2 can be applied to compute the Bi-face centrality for each type-II node $v_j \in M$ as follows. It iteratively obtains the face bicliques by removing the non-influential type-II nodes from the intents of their corresponding concepts. It then identifies the face bicliques in the graph that involve $v_j$. It then uses the minimal generators of object concepts to count the number of the face-bridges that involve the node $v_j$ as a terminal (type-II) one. Finally, it returns a list containing the Bi-face centrality measures $BF_{II}$ of all type-II nodes in the graph.
Algorithm 2 Computing Bi-face centrality (BF c ) for all type-I nodes in a two-mode network.
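The final scoring step for type-I nodes can be sketched as follows, assuming the face bicliques and face-I bridges have already been extracted. The sketch follows the textual description of the two terms (share of face bicliques with both sides larger than 1 that contain the node, plus one minus the share of face-I bridges that involve it); the function name, the input representation, the handling of empty sets, and the toy data are assumptions rather than the authors' Algorithm 2.

```python
def bi_face_type1(nodes, face_bicliques, face_bridges):
    """Bi-face style score for type-I nodes.

    nodes: iterable of type-I node names.
    face_bicliques: list of (extent, intent) pairs of refined bicliques.
    face_bridges: list of (type-I node, type-II node) non-influential bridges.
    """
    n_bicliques = len(face_bicliques)
    n_bridges = len(face_bridges)
    scores = {}
    for u in nodes:
        # Face bicliques with both sides larger than 1 that contain u.
        in_bicliques = sum(
            1 for extent, intent in face_bicliques
            if u in extent and len(extent) > 1 and len(intent) > 1
        )
        # Face-I bridges in which u is the terminal type-I node.
        in_bridges = sum(1 for (x, _) in face_bridges if x == u)
        # Assumption: an empty family contributes a ratio of 0.
        biclique_term = in_bicliques / n_bicliques if n_bicliques else 0.0
        bridge_ratio = in_bridges / n_bridges if n_bridges else 0.0
        scores[u] = biclique_term + (1.0 - bridge_ratio)
    return scores


# Hypothetical toy input.
TOY_SCORES = bi_face_type1(
    ["u1", "u2", "u3"],
    [({"u1", "u2"}, {"v1", "v2"}), ({"u1"}, {"v3"})],
    [("u3", "v4")],
)
```

In the toy input, u1 and u2 belong to one of the two face bicliques and to no face bridge, giving 0.5 + 1.0 = 1.5, while u3 appears only in the single face bridge and scores 0.0.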

Experimental Evaluation
The goal of our experimental evaluation is to investigate the following key questions.
• (Q1) Is the Bi-face centrality more accurate than the state-of-the-art centrality measures?
• (Q2) Is the Bi-face centrality fast compared to prominent centrality measures?
To find robust answers, we first select the following four (real-life and synthetic ‡ ) two-mode networks which have different configurations, and they thereby facilitate the validation of various scenarios.

Datasets
• Norwegian Interlocking Directorates [Seierstad and Opsahl, 2011], which contains the interlocking boards of 1542 Norwegian women directors in 373 Norwegian public limited companies. A link represents a board membership connecting a woman as a director of a public company in Norway in August 2009.
• PediaLanguages [Morsey et al., 2012], which involves the semantic web of 316 official languages spoken by people living in 169 different countries. An edge connects an official language to a country if people in that country speak that language.
• Southern-Women-Davis [Borgatti, 2009, Freeman, 2003], which is a two-mode social network of 18 women reporting their participation in 14 events (such as a meeting of a social club, a church event and a party) over a nine-month period. A woman is connected to an event if she attends that event.
• ‡ CoinToss, which is a random bipartite network generated by the indirect Coin-Toss model generator [Felde et al., 2020].
A few statistics of the networks are summarized in Table 1.
• Bipartite Eigenvector [Definition 2.13], a state-of-the-art centrality that assesses the importance of a node based on its connections to other highly influential nodes in a network.
• Vote-Rank [Zhang et al., 2016], which is a well-known method for identifying decentralized spreaders. It calculates the ranking of the nodes in the bipartite graph based on a voting scheme. That is, at each turn, all nodes vote for a spreader. The node with the highest number of votes is elected, and the voting ability of the elected spreader's neighbours is decreased in the next turn.
• Percolation [Piraveenan et al., 2013], which measures the proportion of percolated paths that go through a given node. So, it quantifies the relative impact of nodes in various percolation scenarios based on their topological connectivity over time. The percolation state is commonly assigned a value between 0.0 and 1.0; we used 0.5, the most common choice, in our experiment.
• Bipartite Degree [Definition 2.13], which can serve as a good baseline for comparison.
To evaluate the lists of (type-I and type-II) nodes ranked by all the centrality measures, we need to compare them with the corresponding ranked lists obtained from the real spreading process of the nodes. Thus, we applied the following traditional schema to each individual type of nodes [Chen et al., 2012, Zhao et al., 2019, 2020] to validate the performance of a tested centrality measure:
1. Compute the centrality measure for all nodes, and then record the node ranking list.
2. Use the SIR model [Chen et al., 2012] to simulate the spreading ability of the nodes. In the SIR model, every node belongs to one of three states: susceptible, infected, or recovered. At each step, we set only one node to be infected while all the other nodes are susceptible, and then investigate how the information spreads in the network. Every infected node can infect its susceptible neighbours with a spreading (also called infection) probability. Note that instead of considering the recovered state of each node, we focus on the influence within a time t = 10, since the spreading in an early stage is found to be more important in practice. At the end of the SIR simulation process, we calculate the spreading efficiency for every node, and then record the node influence ranked list.
3. Based on the centrality-based ranking list and the one generated by the SIR model, we record the joint score list $B = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ and $y_i$ are the centrality-based and SIR-based measures of a node $g_i \in G$, respectively. For any two randomly selected pairs $(x_i, y_i), (x_j, y_j) \in B$, if both $x_i < x_j$ and $y_i < y_j$, or if both $x_i > x_j$ and $y_i > y_j$, they are said to be concordant. If both $x_i < x_j$ and $y_i > y_j$, or if both $x_i > x_j$ and $y_i < y_j$, they are said to be discordant. If $x_i = x_j$ or $y_i = y_j$, then the pair is neither concordant nor discordant.
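Step 2 of the schema above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: it assumes a discrete-time SIR variant in which every infected node tries once to infect each susceptible neighbour with probability β and then recovers, and it takes the spreading efficiency of a seed node to be the average number of nodes ever infected within t = 10 steps. The function names (`sir_spread`, `spreading_efficiency`) and the toy bipartite graph are hypothetical.

```python
import random

def sir_spread(adj, seed_node, beta, t_max=10, rng=None):
    """Run one SIR simulation from a single infected seed node.

    adj maps every node to a list of its neighbours. Returns the number of
    nodes that were ever infected within t_max steps (seed included).
    Assumption: an infected node recovers after one step.
    """
    rng = rng or random.Random()
    infected = {seed_node}
    recovered = set()
    for _ in range(t_max):
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        recovered |= infected
        infected = newly_infected
        if not infected:
            break
    return len(infected | recovered)

def spreading_efficiency(adj, beta, runs=100, t_max=10, seed=0):
    """Average outbreak size per seed node over several independent runs."""
    rng = random.Random(seed)
    return {
        node: sum(sir_spread(adj, node, beta, t_max, rng) for _ in range(runs)) / runs
        for node in adj
    }

# Hypothetical toy two-mode network: three type-I nodes and two type-II nodes.
toy_adj = {
    "w1": ["e1"], "w2": ["e1", "e2"], "w3": ["e2"],
    "e1": ["w1", "w2"], "e2": ["w2", "w3"],
}
scores = spreading_efficiency(toy_adj, beta=0.05, runs=200)
sir_ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the nodes of one type by these scores gives the SIR-based ranked list that step 3 compares against the centrality-based list.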
Consequently, we calculate the following Kendall's tau rank correlation coefficient τ metric:

$$\tau = \frac{n_c - n_d}{\frac{1}{2}\, n\, (n - 1)} \qquad (24)$$

where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs in B, respectively, and n = |B|. A high τ value indicates that the centrality measure could produce an accurate ranked list. The ideal case is τ = 1, where the ranked list generated by the centrality measure is identical to the ranked list generated by the real spreading process. To evaluate the accuracy of the results, we then calculate the average Kendall's tau rank correlation coefficient as follows:

$$\tau = \frac{\tau_I + \tau_{II}}{2} \qquad (25)$$

where $\tau_I$ and $\tau_{II}$ are the Kendall's tau correlation coefficients calculated using Eq. (24) for type-I and type-II nodes, respectively.
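As a worked illustration of this metric, the sketch below counts concordant and discordant pairs directly from a joint score list. It assumes the standard tie-insensitive form τ = (n_c − n_d) / (n(n − 1)/2) and a plain arithmetic mean of the two per-type coefficients; the function names and the sample scores are hypothetical.

```python
from itertools import combinations

def kendall_tau(pairs):
    """Kendall's tau of a joint score list [(x_1, y_1), ..., (x_n, y_n)].

    x is the centrality-based score and y the SIR-based score of a node.
    Pairs tied in x or in y count as neither concordant nor discordant.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two nodes are required")
    n_c = n_d = 0
    for (x_i, y_i), (x_j, y_j) in combinations(pairs, 2):
        sign = (x_i - x_j) * (y_i - y_j)
        if sign > 0:
            n_c += 1      # same ordering in both lists
        elif sign < 0:
            n_d += 1      # opposite ordering
    return (n_c - n_d) / (0.5 * n * (n - 1))

def average_tau(tau_type_1, tau_type_2):
    """Mean of the coefficients obtained for type-I and type-II nodes."""
    return (tau_type_1 + tau_type_2) / 2

# Hypothetical scores: (centrality, SIR spreading efficiency) per node.
type_1_scores = [(0.9, 4.1), (0.5, 3.0), (0.2, 1.2)]   # fully concordant
type_2_scores = [(0.8, 2.0), (0.6, 3.5), (0.1, 1.0)]   # one discordant pair
tau_1 = kendall_tau(type_1_scores)
tau_2 = kendall_tau(type_2_scores)
tau_avg = average_tau(tau_1, tau_2)
```

The quadratic pair enumeration is adequate for networks of the size studied here; for much larger node sets an O(n log n) implementation would be preferable.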
To assess the scalability, we consider the average elapsed time metric ξ of Eq. (26), which is computed from the elapsed times $t_i$ and $t_j$ needed to compute the underlying centrality measure of a type-I node $u_i \in G$ and a type-II node $v_j \in M$, respectively.
All the experiments were run on an Intel(R) Core-i7 CPU @2.6GHz computer with 16 GB of memory under macOS Mojave. We implemented all the considered indices as an extension to the NetworkX Python package. To extract formal concepts, we make use of the Concepts 0.7.11 Python package, which is implemented by Sebastian Bank.

Experiment I.
This experiment is devoted to answering Question 1. Each infected node has a spreading probability β of infecting its susceptible neighbours in the SIR model simulation. Accordingly, and following the scheme described above, we iteratively increase the spreading probability over the range β ∈ (0, 0.1] in increments of 0.01. At each step, we compute the joint list B of each centrality measure and the real spreading of the nodes for each individual type of nodes separately. We then calculate the corresponding evaluation metric τ in Eq. (25).
Figure 4 displays the average Kendall's tau correlation coefficient τ between the seven tested centrality measures and the ranking list generated by the SIR model, with a spreading probability β ∈ (0, 0.1] and at a given time t = 10. Overall, Bi-face outperforms all the compared centrality measures, achieving the highest Kendall coefficient τ on the Norwegian-Directorate, PediaLanguages and CoinToss networks. On the Women-Davis network, Bi-face has the highest τ value when the spreading probability β ≥ 0.03; otherwise, vote-rank, closeness, betweenness and degree slightly compete with Bi-face. Percolation comes close behind Bi-face on Women-Davis, but considerably further behind on the Norwegian-Directorate, PediaLanguages and CoinToss networks. Except on the Women-Davis network with spreading probability β < 0.03, vote-rank is clearly less accurate than Bi-face on all the tested networks, but it is more accurate than percolation, betweenness, closeness, eigenvector and degree on the PediaLanguages and CoinToss networks.
On the Norwegian-Directorate and Women-Davis networks, vote-rank and percolation compete with each other. Percolation is clearly more accurate than betweenness and eigenvector when the spreading probability β ≥ 0.05 on all the tested networks. Both betweenness and eigenvector dominate degree and closeness on the Norwegian-Directorate, PediaLanguages and CoinToss networks. Betweenness is more accurate than eigenvector on the PediaLanguages network when the spreading probability β ≥ 0.05, but it is outperformed by eigenvector on the CoinToss network, and the two compete with each other on the Women-Davis and Norwegian-Directorate networks.

Experiment II.
The second experiment is dedicated to answering Question 2. The goal here is to evaluate the running time of the centrality measures. That is, we rerun Experiment I while reporting their computational time as in Eq. (26). The average elapsed time ξ of the seven centrality measures on the four underlying networks is depicted in Figure 5. On all the tested networks, Bi-face dominates all centrality measures (except degree). It finishes at least twenty-three times faster than betweenness, eleven times faster than percolation, nine times faster than eigenvector and eight times faster than closeness. Degree is very competitive with Bi-face on Women-Davis and CoinToss, but Bi-face clearly prevails over degree by a significant margin on the Norwegian-Directorate and PediaLanguages networks. Apart from Bi-face, percolation is marginally faster than both closeness and vote-rank, by factors of at least 1.3 and 1.2, respectively, on all networks. In addition, closeness is considerably faster than betweenness, and competes with eigenvector on the Norwegian-Directorate and CoinToss networks. Vote-rank is significantly faster than closeness on the Norwegian-Directorate, PediaLanguages and CoinToss networks; on the Women-Davis network, by contrast, closeness is slightly quicker.

Discussion
Taking the identification of accurate node centrality into consideration, the results of Experiment I in Subsection 4.3.1 indicate that Bi-face outperforms traditional bipartite centrality measures such as vote-rank, percolation, degree, closeness, betweenness, and eigenvector. This is attributed to the use of its face-biclique and face-bridge terms in tandem to leverage local and global aspects of network topology, respectively. That is, the face-biclique term quantifies the structural embeddedness of cohesive regions in a network involving each individual (type-I and type-II) node. From a conceptual perspective, this term considers the local information on how the node influences its immediate important neighbour nodes through the lens of its overlapping face bicliques. The face-bridge term quantifies a node's global role based on how the information flows through influential (face) bridges (i.e., important geodesics).
In terms of effective performance, the results of Experiment II in Subsection 4.3.2 suggest that Bi-face is considerably faster than all other tested bipartite centrality measures (except degree). In practice, this is because Bi-face primarily calculates the centrality of all nodes based on the set of concepts C, whose size is frequently much smaller than the polynomial terms in the number of nodes n and edges m that govern the time complexity of the other tested centrality measures, i.e., $|C| \ll n^p$ and $|C| \ll m^q$, with p, q > 1. Besides that, several well-known observations are clearly consistent with the results obtained in Subsection 4.3. First, in some real-world applications, we may end up with several nodes having approximately equal low or high degrees, and in these cases, degree centrality cannot serve as a descriptive measure that distinguishes between nodes. Second, closeness can address the degree centrality limitation in a few situations. For example, consider a node u that is linked to a node v. Assume that node v is in close proximity to the other nodes in the network, resulting in a high closeness score. Node u has a very low degree score of 1, but a reasonably high closeness score, because node u can propagate information to all other nodes that node v reaches with one extra step. However, closeness, like degree, is usually inappropriate for irregularly connected bipartite networks. Because the shortest-path distance between two nodes is infinite when they are not reachable through a path, the closeness score is equal (or very close) to zero for those nodes in the network that do not reach all other nodes. Third, since betweenness lacks any form of measuring local nodal connectivity, it is expected to produce relevant results only if the goal is to quantify influence on communication among local groups, which is not always the case when studying centrality in real-world networks. Finally, in practice, even with the efficient implementation adapted from the fastest algorithm proposed in [Brandes, 2001], the calculation of percolation centrality for all nodes requires a time complexity of $O(m^2 (n_1 + n_2))$, which still imposes a computational bottleneck even on fairly medium-sized networks.

Conclusion
The detection of influential nodes in a two-mode network is frequently an important task in scientific and industrial data analysis pipelines for explaining various behaviours and outcomes. Our work here addressed an obvious gap in the present CNA literature, namely the efficient identification of key nodes by combining both local cohesiveness and global network flow aspects of centrality through the use of FCA mathematical formalization. On this basis, we devised Bi-face, a new bipartite centrality measure that quantifies the prominence of a node in a two-mode network based on its presence in influential overlapping bicliques and bridges. While we focused on two-mode networks here, the approach can easily be modified to accommodate other complex network representations like multilayer networks.
From a conceptual perspective, the Bi-face score is a distinct centrality in the following three respects: (i) it uses the concept lattice formulation to efficiently extract overlapping bicliques and bridges, (ii) it leverages concept faces to refine bicliques from non-influential nodes and detect influential bridges, and (iii) it exploits the fact that influential bridges and overlapping bicliques with a large number of important neighbour nodes are likely to contain key central nodes. As a result, it measures how a node affects and is influenced by its important neighbours through refined bicliques, while also linking the network's dense substructures via its existence in influential bridges. According to a thorough empirical study on several synthetic and real-life two-mode networks (see Section 4), the Bi-face score can identify key nodes more accurately and efficiently than other state-of-the-art centrality indices such as degree, betweenness, closeness, eigenvector, percolation, and vote-rank.
Definition 2.1 (Formal context). It is a triple K = (G, M, I), where G is a set of objects, M a set of attributes, and I a binary relation between G and M with I ⊆ G × M. For g ∈ G and m ∈ M, (g, m) ∈ I holds (i.e., (g, m) = 1) iff the object g has the attribute m, and otherwise (g, m) ∉ I (i.e., (g, m) = 0).
Subsets A and B are closed when A = A'' and B = B''.
Definition 2.2 (Formal concept). The pair c = (A, B) is called a formal concept of K with extent A and intent B if both A and B are closed, A' = B, and B' = A.
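The two definitions can be made concrete with a small sketch of the derivation operators. It assumes the usual FCA reading in which A' is the set of attributes shared by all objects in A and B' is the set of objects having all attributes in B; the helper names and the toy context are hypothetical and are not taken from the Concepts package.

```python
def common_attributes(A, attributes, incidence):
    """Derivation A': attributes shared by every object in A."""
    return {m for m in attributes if all((g, m) in incidence for g in A)}

def common_objects(B, objects, incidence):
    """Derivation B': objects that have every attribute in B."""
    return {g for g in objects if all((g, m) in incidence for m in B)}

def is_formal_concept(A, B, objects, attributes, incidence):
    """Check Definition 2.2: A' = B and B' = A."""
    return (common_attributes(A, attributes, incidence) == set(B)
            and common_objects(B, objects, incidence) == set(A))

# Hypothetical toy context K = (G, M, I) with three objects and two attributes.
G = {"g1", "g2", "g3"}
M = {"a", "b"}
I = {("g1", "a"), ("g2", "a"), ("g2", "b"), ("g3", "b")}

# ({g1, g2}, {a}) is a maximal biclique, hence a formal concept.
ok = is_formal_concept({"g1", "g2"}, {"a"}, G, M, I)
# ({g1}, {a}) is not: g2 also has attribute a, so the extent is not closed.
not_ok = is_formal_concept({"g1"}, {"a"}, G, M, I)
```

Each formal concept corresponds to a maximal biclique of the two-mode network, which is why the extents and intents can be read directly as the two node sets of a biclique.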

Figure 2: The formal context K for the two-mode network of Figure 1.

Figure 3: The Hasse diagram of the concept lattice B(K) that corresponds to the context of the two-mode network in Figure 1. More precisely, it is a diagram with reduced labeling. This means that the label g is written below γg := (g'', g') and m above µm := (m', m''). The extent of a concept represented by a node a is given by all labels in G from the node a downwards, and the intent by all labels in M from a upwards. The red downward arrow indicates the illustrative biclique cited after Definition 2.10.

3.3 Face-Bridge Detection
Definition 3.3 (Face-I Bridge and Terminal type-I node). Given a 2-mode network Υ and its corresponding concept lattice B(K), an edge (u, B) represents a non-influential (face-I) bridge containing a terminal (type-I) node u ∈ G when there is an attribute concept c = (A, B) ∈ B(K) with |B| = 1 that satisfies the following:

$$u \in A \ \text{ and } \ \exists\, h_i \in H^{ex}_c \ \text{ s.t. } \ h_i = u \ \text{ and } \ |h_i| = 1 \qquad (20)$$

Definition 3.4 (Face-II Bridge and Terminal type-II node). Given a 2-mode network Υ and its corresponding concept lattice B(K), an edge (A, v) represents a non-influential (face-II) bridge containing a terminal type-II node v ∈ M when there is an object concept c = (A, B) ∈ B(K) with |A| = 1 that satisfies the following:

$$v \in B \ \text{ and } \ \exists\, h_j \in H^{in}_c \ \text{ s.t. } \ h_j = v \ \text{ and } \ |h_j| = 1 \qquad (21)$$

Algorithm 1: Minigen() procedure for computing the intentional minimal generators of a concept intent.
Input: Concept intent B, Set of upper covers U(c).
21: end for
22: Return $BF_I$

Complexity Analysis
The calculation of the face-biclique term has time and space complexity equal to O(|C|), since we store and proceed through the extents of all the bicliques to count the face bicliques that contain the node. The face-bridge term of a type-I node requires iterating through the attribute concepts $\tilde{C}$ and calculating their minimal generators w.r.t. their corresponding lower covers. Thus, the Bi-face centrality $BF_I$ of all type-I nodes requires $O(|G| \times |C| + |\tilde{C}| \times |\tilde{L}| \times |\tilde{H}^{ex}|)$, where $\tilde{C}$ is the set of attribute concepts, $|\tilde{H}^{ex}|$ is the largest size of an obtained set of minimal generators for attribute concepts, and $|\tilde{L}|$ is the largest number of lower covers for an attribute concept. Now, since we often have $|\tilde{C}| \ll |C|$ and also $|\tilde{L}| \ll |G|$, the first term frequently dominates the second one. This entails that computing the Bi-face centrality $BF_I$ of all type-I nodes needs a time and space complexity of $O(|G| \times |C|)$. In a dual way, the calculation of the Bi-face centrality $BF_{II}$ of all type-II nodes has a time complexity of $O(|M| \times |C|)$. In total, the Bi-face centrality has time and space complexity of $O(|C| \times (|G| + |M|))$.

Figure 4: The average Kendall's tau coefficient τ between the tested centrality measures and the ranking list generated by the SIR model, with β ∈ (0, 0.1], at t = 10 on the four underlying datasets.

Figure 5: Average elapsed time ξ (in secs) of the seven tested centrality measures: Bi-face, closeness, betweenness, degree, eigenvector, percolation and vote-rank on the four underlying datasets.

Table 1: Brief statistics of the social networks, which include the number |G| of type-I nodes, the number |M| of type-II nodes, the number |I| of edges, and the density Θ in %.
Subsequently, we compared the results of our proposed Bi-face centrality with the following measures:
• Bipartite Closeness [Definition 2.14], a prominent diameter-based centrality.
• Bipartite Betweenness [Definition 2.15], a state-of-the-art geodesics-based centrality.
• Bipartite Eigenvector [Definition 2