Matrix Tri-Factorization Over the Tropical Semiring

The tropical semiring has proven successful in several research areas, including optimal control, bioinformatics, discrete event systems, and decision problems. Previous studies have applied a matrix two-factorization algorithm based on the tropical semiring to investigate bipartite and tripartite networks. Tri-factorization algorithms based on standard linear algebra are used to solve tasks such as data fusion, co-clustering, matrix completion, community detection, and more. However, there is currently no tropical matrix tri-factorization approach that would allow for the analysis of multipartite networks with many parts. To address this, we propose the triFastSTMF algorithm, which performs tri-factorization over the tropical semiring. We applied it to analyze a four-partition network structure and recover the edge lengths of the network. We show that triFastSTMF performs similarly to Fast-NMTF in terms of approximation and prediction performance when fitted on the whole network. When trained on a specific subnetwork and used to predict the entire network, triFastSTMF outperforms Fast-NMTF, achieving an error several orders of magnitude smaller. The robustness of triFastSTMF is due to tropical operations, which are less prone to predicting large values than standard operations.


Introduction
Matrix factorization methods embed data into a latent space using a two-factorization or tri-factorization approach, depending on the number of low-dimensional factor matrices required for the specific task. They can help solve problems in recommender systems [1], pattern recognition [2], data fusion [3], network structure analysis [4], and similar areas. In many of these scenarios, two-factorization achieves state-of-the-art results. However, there are cases where tri-factorization outperforms two-factorization, such as in intermediate data fusion [3], where tri-factorization fuses multiple data sources to improve the predictive power of the model. Matrix factorization methods employ different types of operations to compute the factor matrices [5][6][7]. Most are based on standard linear algebra, such as non-negative matrix factorization (NMF) [8], binary matrix factorization (BMF) [9], and probabilistic NMF (PMF) [10], while some novel approaches such as STMF [11] and FastSTMF [12] are based on the tropical semiring.
The (max, +) semiring, or tropical semiring R_max, is the set R ∪ {−∞} equipped with max as addition (⊕) and + as multiplication (⊗). For example, 2 ⊕ 3 = 3 and 1 ⊗ 1 = 2. Throughout the paper, the symbols "+" and "−" refer to the standard operations of addition and subtraction. The renowned NMF method [8] is based on the element-wise sum, which results in the "parts-of-a-whole" interpretation of factor matrices. By contrast, tropical or (max, +) factorization uses the maximum operator, which results in a "winner-takes-all" interpretation [13]. Matrix factorization approaches using the tropical semiring have demonstrated robustness against overfitting and achieved predictive performance comparable to techniques that use standard linear algebra. Moreover, they also reveal different patterns, as we have demonstrated in our previous studies [11,12].
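As a minimal illustration of the two tropical operations (the function names are ours, not from the released code):

```python
# Tropical (max, +) semiring: addition ⊕ is max, multiplication ⊗ is +.
def t_add(a, b):
    """Tropical addition: a ⊕ b = max(a, b)."""
    return max(a, b)

def t_mul(a, b):
    """Tropical multiplication: a ⊗ b = a + b."""
    return a + b

# The examples from the text: 2 ⊕ 3 = 3 and 1 ⊗ 1 = 2.
assert t_add(2, 3) == 3
assert t_mul(1, 1) == 2
# −∞ is the tropical additive identity: a ⊕ (−∞) = a.
assert t_add(5, float("-inf")) == 5
```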
Tropical semirings have various applications in network structure analysis and other research areas [14][15][16]. Multiplication and addition in the related (min, +) semiring map local edge information to global information on shortest paths, while the (max, +) semiring describes the longest-path problem. In our work, we are interested in an inverse problem that infers information about edges from potentially noisy or incomplete information [4]. To the best of our knowledge, there is no matrix tri-factorization method based on the tropical semiring. Thus, we propose the first tropical tri-factorization method, called triFastSTMF, which introduces a third factor matrix. The proposed triFastSTMF can be used for various tasks that involve a single data source. Our GitHub repository https://github.com/Ejmric/triFastSTMF provides the source code and data required to replicate our experiments. We demonstrate the applicability of triFastSTMF to edge approximation and prediction in a four-partition network. Moreover, this work sets the foundation for future research aimed at creating a tropical data fusion model capable of combining multiple data sources.
The paper is divided into the following sections. Section 2 describes the related methodology, while Section 3 introduces the proposed approach. In Section 4, we present the experimental evaluation. We conclude the work and discuss future opportunities in Section 5.

Related work
Matrix factorization (MF) is one of the most popular methods for data embedding, which enables the discovery of interesting feature patterns by clustering and gaining additional knowledge from the resulting factor matrices. A well-known matrix two-factorization approach is non-negative matrix factorization (NMF), which imposes non-negativity on both the input and output factor matrices for a more straightforward interpretation of the results. The tri-factorization-based version of NMF, called NMTF, is used to extract patterns from relational data [17] and is applied in various research areas, from modeling topics in text data [18] to discovering disease-disease associations [19]. Fast-NMTF [20] is a version of NMTF that uses faster training algorithms based on projected gradients, coordinate descent, and alternating least squares optimization. One of the usual applications of NMTF is in data fusion methods. DFMF [3] is a variant of penalized matrix tri-factorization for data fusion, which simultaneously factorizes data matrices in standard linear algebra to reveal hidden associations.
In the field of tropical matrix factorization, De Schutter & De Moor in 1997 [21] presented a heuristic algorithm, TMF, to compute the factorization of a matrix over the tropical semiring. The STMF method [11] is based on TMF, but it can perform matrix completion over the tropical semiring. With STMF, we have shown that tropical operations can discover patterns that cannot be revealed with standard linear algebra. FastSTMF [12] is an efficient version of STMF, in which we introduced a faster way of updating factor matrices. The main advantage of FastSTMF over STMF is better computational performance, since it achieves better results with less computation. Both STMF and FastSTMF showed the ability to outperform NMF in achieving higher distance correlation and smaller prediction error. However, NMF still achieves better results in terms of approximation error on the train set.
We can also use matrix factorization to solve different network optimization problems. The Floyd-Warshall algorithm [22] for shortest paths can be formulated as a computation over a (min, +) semiring. Hook [4], in his work on linear regression over the tropical semiring, showed how a (min, +) semiring can be used for low-rank matrix approximation to analyze the structure of a network. The basis of this approach is a two-factorization algorithm that can recover the edge lengths of the shortest path distances for tripartite and bipartite networks. Network partitioning can be done using the algorithm for community detection called the Louvain method [23]. Another interesting application of semirings is the fact that we can write the Viterbi algorithm [24] compactly in a (min, +) semiring over probabilities [25].
Currently, no method returns three factor matrices computed over the tropical semiring. In our work, we propose the first tri-factorization algorithm over the tropical semiring, called triFastSTMF, which is based on FastSTMF. To evaluate it empirically, we apply triFastSTMF to approximate and predict the edge lengths of a four-partition network.

Semirings (max, +) and (min, +)
In a matrix semiring, the operations on matrices are based on the main operations of the underlying semiring. We denote by R_max^{t×s} the set of all matrices with t rows and s columns over R_max, and for a matrix X ∈ R_max^{t×s} we denote its element in the i-th row and the j-th column by X_ij. Moreover, R_max^t = R_max^{t×1} is the set of all vectors with t components over R_max. We define the matrix addition over R_max as

(A ⊕ B)_ij = max(A_ij, B_ij)

for all A, B ∈ R_max^{m×n}, i = 1, ..., m and j = 1, ..., n, and the matrix multiplication as

(A ⊗ B)_ij = max_{k=1,...,p} (A_ik + B_kj)

for A ∈ R_max^{m×p} and B ∈ R_max^{p×n}. Similarly, in the (min, +) semiring, the matrix addition is defined as

(A ⊕ B)_ij = min(A_ij, B_ij)

for all A, B ∈ R_min^{m×n}, i = 1, ..., m and j = 1, ..., n, and the matrix multiplication is defined as

(A ⊗* B)_ij = min_{k=1,...,p} (A_ik + B_kj)

for A ∈ R_min^{m×p} and B ∈ R_min^{p×n}, for i = 1, ..., m and j = 1, ..., n. We say that matrix A is less than or equal to matrix B, denoted A ⪯ B, if every element of A is less than or equal to its corresponding element in B. For given matrices A ∈ R_max^{m×n} and B ∈ R_max^{m×p}, the solutions of the matrix equation

A ⊗ X = B    (1)

do not need to exist. However, there might exist some matrices X ∈ R_max^{n×p} such that A ⊗ X ⪯ B. Such an X is called a subsolution of equation (1). The greatest subsolution of (1) is a matrix X_0 ∈ R_max^{n×p} such that A ⊗ X_0 ⪯ B and, for any matrix X satisfying A ⊗ X ⪯ B, we have X ⪯ X_0.
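The matrix operations over R_max and the (min, +) product ⊗* used below can be sketched in NumPy (our own helper names; dense real matrices without −∞ entries are assumed):

```python
import numpy as np

def trop_mul(A, B):
    """(max, +) matrix product: (A ⊗ B)_ij = max_k (A_ik + B_kj)."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def trop_mul_min(A, B):
    """(min, +) matrix product: (A ⊗* B)_ij = min_k (A_ik + B_kj)."""
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def trop_leq(A, B):
    """A ⪯ B: every element of A is ≤ the corresponding element of B."""
    return bool(np.all(A <= B))
```

The broadcast `A[:, :, None] + B[None, :, :]` builds all path sums A_ik + B_kj at once; reducing over the middle axis with max (or min) yields the semiring product.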
It is well known (see, e.g., [26]) that for a vector b ∈ R_max^m the greatest subsolution of A ⊗ x = b exists and is given by x_0 = (−A)^T ⊗* b. More generally, for matrix equations, the greatest subsolution is given by the following theorem.
Theorem 1 (Described by Gaubert and Plus [26]). For any A ∈ R_max^{m×n} and B ∈ R_max^{m×p}, the greatest subsolution of the equation A ⊗ X = B is X_0 = (−A)^T ⊗* B.

In what follows, we need to include both operations ⊗ and ⊗* in our computations. First, we prove the following technical lemma.
To implement a tropical matrix tri-factorization algorithm, we need to know how to solve tropical linear systems. In particular, we need to find the greatest subsolution of the linear system

A ⊗ X ⊗ B = C.    (2)

Theorem 2. For any A ∈ R_max^{m×n}, B ∈ R_max^{p×q} and C ∈ R_max^{m×q}, the matrix X_0 = (−A)^T ⊗* C ⊗* (−B)^T is the greatest subsolution of equation (2).

Proof. Observing the equation A ⊗ Y = C, its greatest subsolution is, by Theorem 1,

Y_0 = (−A)^T ⊗* C,    (3)

and if any matrix Y satisfies the inequality A ⊗ Y ⪯ C, this implies that Y ⪯ (−A)^T ⊗* C. Similarly, the greatest subsolution of the equation Z ⊗ B = C is

Z_0 = C ⊗* (−B)^T,    (4)

and if any matrix Z satisfies the inequality Z ⊗ B ⪯ C, this implies that Z ⪯ C ⊗* (−B)^T.

Define X_0 = (−A)^T ⊗* C ⊗* (−B)^T. Using equations (3), (4) and Lemma 1, observe that A ⊗ X_0 ⊗ B ⪯ C. Assume now there exists a subsolution X of (2), i.e., A ⊗ X ⊗ B ⪯ C. Let us prove that X ⪯ X_0, which will imply that X_0 is the greatest subsolution of equation (2). Since X ⊗ B is a subsolution of the equation A ⊗ Y = C, we have X ⊗ B ⪯ (−A)^T ⊗* C. Hence X is a subsolution of Z ⊗ B = (−A)^T ⊗* C, which implies X ⪯ (−A)^T ⊗* C ⊗* (−B)^T = X_0.
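The formula of Theorem 2 is easy to check numerically. The sketch below (with NumPy helpers for ⊗ and ⊗* written by us, following the definitions in the text) verifies that X_0 = (−A)^T ⊗* C ⊗* (−B)^T is a subsolution, and that it attains equality when C is itself of the form A ⊗ X ⊗ B:

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def trop_mul_min(A, B):
    # (min, +) matrix product, written ⊗* in the text
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

rng = np.random.default_rng(42)
A = rng.uniform(0, 5, (6, 4))
X = rng.uniform(0, 5, (4, 3))
B = rng.uniform(0, 5, (3, 5))
C = trop_mul(trop_mul(A, X), B)  # system is solvable by construction

# Greatest subsolution of A ⊗ X ⊗ B = C from Theorem 2
X0 = trop_mul_min(trop_mul_min(-A.T, C), -B.T)

approx = trop_mul(trop_mul(A, X0), B)
assert np.all(approx <= C + 1e-9)  # X0 is a subsolution
assert np.allclose(approx, C)      # equality holds, since C = A ⊗ X ⊗ B
assert np.all(X <= X0 + 1e-9)      # X0 dominates the known solution X
```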

Tri-factorization over the tropical semiring
We propose a tri-factorization algorithm triFastSTMF over the tropical semiring, which returns three factor matrices that we later use to analyze the structure of four-partition networks.
Matrix tri-factorization over a tropical semiring is a decomposition of the form R ≈ G_1 ⊗ S ⊗ G_2, where R ∈ R_max^{m×n}, G_1 ∈ R_max^{m×r_1}, S ∈ R_max^{r_1×r_2} and G_2 ∈ R_max^{r_2×n}. Since for small values of r_1 and r_2 such a decomposition may not exist, we define the tropical matrix tri-factorization problem as: given a matrix R and factorization ranks r_1 and r_2, find matrices G_1, S and G_2 such that

R ≈ G_1 ⊗ S ⊗ G_2.    (5)

Because an exact solution of equation (5) does not exist in general, we evaluate the computed tri-factorization by the b-norm, defined as ||W||_b = Σ_{i,j} |W_ij|. In particular, we want to minimize the cost function ||R − G_1 ⊗ S ⊗ G_2||_b. In Algorithm 1, we present the pseudocode of the algorithm triFastSTMF, illustrated in Figure 1. The convergence of the proposed algorithm, defined in Algorithm 1, is checked similarly to that of STMF [11] and FastSTMF [12]. The factor matrices are updated only if the b-norm decreases, ensuring that the approximation error is monotonically reduced.
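The b-norm objective can be written directly in NumPy (the helper names are ours):

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def b_norm(W):
    """b-norm: sum of absolute values of all matrix entries."""
    return float(np.abs(W).sum())

def tri_cost(R, G1, S, G2):
    """Cost ||R - G1 ⊗ S ⊗ G2||_b minimized by triFastSTMF."""
    return b_norm(R - trop_mul(trop_mul(G1, S), G2))
```

An exact tri-factorization has cost 0, and perturbing every entry of R by 1 raises the cost by exactly the number of entries.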
The triFastSTMF method consists of the following steps:

1. We follow the results obtained in [12] to preprocess a data matrix into a suitable shape using transformations such as matrix transposition and random permutation of rows. Wide matrices are shown to achieve smaller errors compared to tall matrices [12].

2. The default initialization of the factor matrices G_1, S and G_2 uses the Random Acol strategy [11], which computes the element-wise average of randomly selected columns from matrix R. Fixed initialization for matrices G_1, S and G_2 can be taken directly from the data, see Section 4.2.

3. Until convergence, each iteration of the algorithm first updates G_1 and G_2 using CFL and CFR, presented in Algorithms 2 and 3, respectively, and described below. Then we compute the middle factor S as the greatest subsolution of the equation G_1 ⊗ S ⊗ G_2 = R, using Theorem 2.

4. As the last step of triFastSTMF, we reshape the factor matrices G_1, S and G_2 into appropriate forms depending on the initial transformation of the data matrix R.

If some of the elements of the data matrix R are not given, we apply the operations proposed in [11] to skip all missing values in the calculation.
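Step 3's update of the middle factor can be sketched as follows (a simplified sketch of our own that omits the CFL/CFR updates): given the current G_1 and G_2, the greatest subsolution from Theorem 2 yields a candidate S, which is accepted only if it decreases the b-norm:

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def trop_mul_min(A, B):
    # (min, +) matrix product (⊗*)
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def b_norm(W):
    return float(np.abs(W).sum())

def update_middle_factor(R, G1, S, G2):
    """Replace S by the greatest subsolution of G1 ⊗ S ⊗ G2 = R
    only if that decreases the b-norm of the residual."""
    S_new = trop_mul_min(trop_mul_min(-G1.T, R), -G2.T)
    old = b_norm(R - trop_mul(trop_mul(G1, S), G2))
    new = b_norm(R - trop_mul(trop_mul(G1, S_new), G2))
    return (S_new, new) if new < old else (S, old)
```

When R is exactly of the form G_1 ⊗ S ⊗ G_2, this single update already drives the residual to zero; in the full algorithm it alternates with the CFL/CFR updates of G_1 and G_2.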
Note that triFastSTMF updates one factor matrix at a time using CFL and CFR, presented in Algorithms 2 and 3, respectively. They are both based on FastSTMF and represent the two-factorization with the FastSTMF core [12], with minor changes:

[Algorithm 1: Tri-factorization over the tropical semiring (triFastSTMF). Algorithm 2: Compute Factorization to update the Left factor matrix G_1 (CFL).]

• In CFL/CFR, we remove the initialization of the factor matrices, as they are already initialized at the beginning of triFastSTMF. In CFL, we update only the left factor matrix G_1 and declare Q = S ⊗ G_2 to be the second factor matrix. Similarly, in CFR, we update only the right factor matrix G_2, and Q = G_1 ⊗ S is the first factor matrix. This approach prevents overfitting of the factor matrices, since the optimization alternates between the left and right factorization. Such a process gives equal importance to both factor matrices, allowing patterns to spread across multiple factor matrices instead of being consolidated in one of them.
• We change the computation of the approximation error. FastSTMF computes the error of the two-factorization, while CFL/CFR computes the tri-factorization error using the current factor matrices G_1, S, and G_2.
• We do not transpose the matrices nor permute the rows of matrices in CFL/CFR since this is performed as part of triFastSTMF.
The functions F-ULF, F-URF and TD-A used in CFL and CFR are the same as in the FastSTMF algorithm [12]. We present the pseudocode of TD-A in Algorithm 4, where the notation of the functions used is given in [12].

Different aspects of the tri-factorization on networks
The four-partition network shown in Figure 2 is an illustrative example of where we can apply tri-factorization for network structure analysis. We represent the four-partition network with three factor matrices, which is the basis of tri-factorization methods. Further, different approaches to four-partition networks can be used depending on the nature of the data and the task to be solved.
For a network Γ with a vertex set and an edge set E(Γ), we define a matrix G_1 ∈ R_max^{m×r_1} such that G_1(ij) represents the weight of the edge from x(i) to y(j), a matrix S ∈ R_max^{r_1×r_2}, where S_jk represents the weight of the edge from y(j) to w(k), and a matrix G_2 ∈ R_max^{r_2×n}, where G_2(kℓ) represents the weight of the edge from w(k) to z(ℓ). Then (G_1 ⊗ S ⊗ G_2)_{iℓ} is the length of the longest path from x(i) to z(ℓ), see Figure 2. If a matrix R is given, we can estimate G_1, S and G_2 with triFastSTMF.
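On a tiny hand-checkable network this looks as follows (our own illustration): with one x node, two y nodes, one w node and one z node, the tropical triple product picks the heaviest x→y→w→z path:

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

# Edge weights: x1→y1 = 1, x1→y2 = 2 (G1); y1→w1 = 0, y2→w1 = 3 (S); w1→z1 = 4 (G2)
G1 = np.array([[1.0, 2.0]])   # 1 × 2
S = np.array([[0.0], [3.0]])  # 2 × 1
G2 = np.array([[4.0]])        # 1 × 1

R = trop_mul(trop_mul(G1, S), G2)
# Two x1→z1 paths: 1 + 0 + 4 = 5 via y1 and 2 + 3 + 4 = 9 via y2; the longest is 9.
assert R[0, 0] == 9.0
```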
The main question is how to represent an arbitrary network as a four-partition network. The two main approaches are:

• All nodes in the four-partition network are real nodes. The matrices G_1, S, and G_2 represent weights of the real edges from the original network, which preserves the interpretability of the network, since the relations are only between real nodes. Moreover, the four-partition network remains the same size as the original network. This approach is suitable when the original network's structure already has four partitions.

• Some nodes in the four-partition network are latent nodes. The real nodes are only the outer nodes (x, z), while the latent nodes are the inner nodes (y, w). In this case, the matrices G_1, S and G_2 represent latent features of the outer nodes rather than real weights from the original network, leading to more difficult interpretability of the network, since the relations are now also between real and latent nodes. The four-partition network is larger than the original network, which increases the complexity of the task using this approach.
We focus on the first approach, where all nodes in the network are real nodes, since we want to use the patterns from the data to initialize the factor matrices, maintain network interpretability, demonstrate how to work with real four-partition networks, and consequently obtain a better approximation of the matrices R, G_1, S, G_2. In this way, we fully present the power of tri-factorization over two-factorization and its primary purpose.

Comparison with other strategies
In our work, we developed different tropical tri-factorization strategies, triSTMF and Consecutive, that are based on two-factorizations [11,12]. We compare their effectiveness with the proposed triFastSTMF in Section 4.1.1.
The triSTMF strategy is based on the TD-A method from FastSTMF, and we implement triSTMF tri-factorization as two different two-factorizations: i) one updating the left factor and ii) one updating the right factor. We denote the errors obtained from TD-A in case i) as ε_L and the errors in case ii) as ε_R. We developed two versions, called triSTMF-BothTD and triSTMF-RandomTD, which differ in how the error is computed.
In triSTMF-BothTD, the computation is performed using both ε_L and ε_R. The smaller of the two errors is selected to perform the optimization. In contrast, triSTMF-RandomTD randomly computes either ε_L or ε_R and continues with the optimization. Also, triSTMF uses ULF and URF from STMF as the basis for updating the factor matrices. Note that we cannot use F-ULF and F-URF directly in the case of tri-factorization, since the third factor matrix S introduces additional complexity into F-ULF and F-URF, resulting in incompatible operations. This leads to a slow optimization process for both versions of triSTMF.
The Consecutive strategy has two versions: lrConsecutive and rlConsecutive. The goal of this strategy is to achieve tri-factorization by first applying FastSTMF to the data matrix R, resulting in factor matrices U and V. In the second step, lrConsecutive obtains the third factor matrix by applying FastSTMF to the matrix V to obtain S and G_2, while G_1 = U. In contrast, rlConsecutive applies FastSTMF to the matrix U to obtain G_1 and S, while G_2 = V. The drawback of the consecutive strategy is the consolidation of patterns in one of the factor matrices during the first step.
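To illustrate the shape of the Consecutive strategy, the sketch below substitutes a deliberately naive two-factorization (a random left factor plus a greatest-subsolution right factor) for FastSTMF; it is our own stand-in, not the actual FastSTMF update:

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def trop_mul_min(A, B):
    # (min, +) matrix product (⊗*)
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def naive_two_factorize(R, r, rng):
    """Stand-in for FastSTMF: random left factor U; right factor V is the
    greatest subsolution of U ⊗ V = R, so U ⊗ V ⪯ R."""
    U = rng.uniform(R.min(), R.max(), (R.shape[0], r))
    V = trop_mul_min(-U.T, R)
    return U, V

def lr_consecutive(R, r1, r2, rng):
    """lrConsecutive: factorize R into U, V; then factorize V into S, G2."""
    G1, V = naive_two_factorize(R, r1, rng)
    S, G2 = naive_two_factorize(V, r2, rng)
    return G1, S, G2

rng = np.random.default_rng(0)
R = rng.uniform(0, 10, (12, 8))
G1, S, G2 = lr_consecutive(R, r1=3, r2=2, rng=rng)
approx = trop_mul(trop_mul(G1, S), G2)
# Both steps produce subsolutions, so the triple product never exceeds R.
assert np.all(approx <= R + 1e-9)
```

Monotonicity of ⊗ gives the guarantee: S ⊗ G_2 ⪯ V implies G_1 ⊗ S ⊗ G_2 ⪯ G_1 ⊗ V ⪯ R.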

Synthetic data
We created a synthetic data matrix of size 200 × 100 using the (max, +) multiplication of three random non-negative matrices. Since the purpose of the synthetic data is to present the ideal scenario in which the proposed method works best, we created our synthetic data using three random factor matrices of sufficiently large ranks r_1 = 25 and r_2 = 20. We use the synthetic data matrix to compare different tropical matrix factorization methods in Section 4.1.1.
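The construction can be sketched as follows; the uniform(0, 1) entries are our assumption, since the text only specifies random non-negative factors:

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

rng = np.random.default_rng(0)
m, r1, r2, n = 200, 25, 20, 100
G1 = rng.uniform(0, 1, (m, r1))
S = rng.uniform(0, 1, (r1, r2))
G2 = rng.uniform(0, 1, (r2, n))

# Synthetic 200 × 100 data matrix with a known tropical tri-factorization
R = trop_mul(trop_mul(G1, S), G2)
```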
We also created a synthetic network with four partitions of sizes (m, r_1, r_2, n) = (45, 10, 15, 30) and use it to analyze a four-partition network in Section 4.1.2.

Real data
We downloaded the real-world interaction dataset of an ant colony [27] from the Network Data Repository [28]. The nodes represent 160 ants, the edges represent physical contact (interaction), and the edge weight is the frequency of interaction during 41 days in total. We preprocessed the network into the appropriate format for evaluation, as explained in Section 4.2. In Figure 3, we show the daily average frequency of interactions between ants. The distance between nodes indicates the strength of interaction, i.e., nodes are closer when the interaction is stronger; conversely, nodes are farther apart when the interaction is weaker. The outer nodes interact less frequently with the nodes in the center of the network. We depict the individual frequency of interactions with the transparency of the edge color in Figure 3.
Figure 3: A real-world network of the daily average frequency of interactions in an ant colony.The strength of the interaction is visualized with the distance between nodes and edge transparency.

Evaluation metrics
In our work, we use the following metrics:

• Root-mean-square error (RMSE) is a commonly used metric for comparing matrix factorization methods [12]. We use the RMSE in our experiments to evaluate the approximation error RMSE-A on the train data and the prediction error RMSE-P on the test data.

• b-norm is defined as ||W||_b = Σ_{i,j} |W_ij| and is used in [11] and [12] as the objective function. We also use the b-norm to minimize the approximation error of triFastSTMF.

• Rand score is a similarity measure between two clusterings that considers all pairs of samples and counts pairs assigned to the same or different clusters in the predicted and actual clusterings [29]. We use the Rand score to compare different partitioning strategies on the synthetic network.
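The RMSE-based metrics above can be sketched as follows (a minimal version of our own, where a boolean mask selects either the train entries for RMSE-A or the held-out test entries for RMSE-P):

```python
import numpy as np

def rmse(R_true, R_pred, mask):
    """RMSE over the entries selected by the boolean mask."""
    diff = (R_true - R_pred)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example: the model is exact on the train entries and off by 2 on the test entry.
R_true = np.array([[1.0, 2.0], [3.0, 4.0]])
R_pred = np.array([[1.0, 2.0], [3.0, 6.0]])
train = np.array([[True, True], [True, False]])
test = ~train

assert rmse(R_true, R_pred, train) == 0.0  # RMSE-A
assert rmse(R_true, R_pred, test) == 2.0   # RMSE-P
```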

Evaluation
We conducted experiments on synthetic data matrices with true ranks r_1 = 25 and r_2 = 20. The experiments were repeated 25 times, with each run limited to 300 seconds, using Random Acol initialization.
For the synthetic four-partition network reconstruction, we repeat the experiments 25 times using fixed initialization with different random and partially-random partitionings.Due to the smaller matrices, these experiments run for 100 seconds.
For real data, we used the Louvain method [23] to obtain r 1 and r 2 .Furthermore, we randomly removed at most 20% of the edges.We use fixed initialization and run the experiments for 300 seconds.

Results
We perform experiments on synthetic and real data. First, we compare different tropical matrix factorization methods on the synthetic data matrix and show that triFastSTMF achieves the best results of all tropical approaches. Next, we analyze the effect of different partitioning strategies on the performance of triFastSTMF. Finally, we evaluate the proposed triFastSTMF on real data and compare it with Fast-NMTF.

Comparison between the tropical matrix factorization methods
We experiment with different two-factorization and tri-factorization tropical methods. The set of all tri-factorizations represents a subset of all two-factorizations. Specifically, each tri-factorization is also a two-factorization, meaning that, in general, we cannot obtain better approximation results with tri-factorization than with two-factorization. In Figure 4, we see that the first half of lrConsecutive performs better than its second half. Namely, in the first half, we perform two-factorization, while in the second half, we factorize one of the factor matrices to obtain three factor matrices as the final result. This second approximation introduces uncertainty and larger errors compared to the first half. We see similar behavior in rlConsecutive. In this scenario, we show that the two-factorization is better than the tri-factorization. We see that the results of triSTMF-BothTD and triSTMF-RandomTD overlap and do not make any updates during the limited running time, since they use slow algorithms to update the factor matrices.
Comparing the two-factorization method FastSTMF and the tri-factorization method triFastSTMF, we obtain a similar approximation error in Figure 4. We see that our proposed triFastSTMF achieves the lowest approximation error on the synthetic data matrix among all tested tropical tri-factorization methods. Tri-factorization may outperform two-factorization within a limited running time because of the nature of the data and the initialization of the factor matrices. Theoretically, we expect that two-factorization and tri-factorization would achieve the same results when evaluated across a large number of datasets. Tri-factorization has demonstrated its superiority over two-factorization in many applications; an important one is the fusion of data from different sources [3]. In our work, we show that tri-factorization can be applied to approximate and predict weights in four-partition networks.

Analysis of four-partition network construction
We construct a random tropical network K of 100 nodes in total with a four-partition A ∪ B ∪ C ∪ D. We denote the sizes of the sets A, B, C and D as m, r_1, r_2 and n, respectively, and choose (m, r_1, r_2, n) = (45, 10, 15, 30), see Figure 5. We want to check the robustness of the proposed triFastSTMF to the partitioning process and answer the following question: is the approximation error stable across different choices of partitioning?
Network K contains edges within and between the four sets: A and D are densely connected, while B and C are less connected, following the network construction process (see Figure 5). We propose the following general algorithm for converting the input network K into a suitable form for tri-factorization.
First, partition all network nodes into four sets, X, Y, W and Z, with fixed sizes m, r_1, r_2 and n, respectively, in one of two ways:

• Random partitioning: X ∪ Y ∪ W ∪ Z is a random four-partition of the chosen sizes. Random partitioning is a valid choice when all network nodes represent only one type of object. For example, in a social network, a node represents a person.

• Partially-random partitioning: Y and W are random subsets of nodes of K of sizes r_1 and r_2, while X = A and Z = D, where A and D are given. Partially-random partitioning is applicable when there are two types of objects represented in the network. For example, in a movie recommendation system, users belong to the set X and movies to Z. In this case, the sets Y and W represent the latent features of X and Z.
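The two partitioning schemes can be sketched as follows (a minimal NumPy sketch with our own function names):

```python
import numpy as np

def random_partition(nodes, m, r1, r2, n, rng):
    """Random four-partition X ∪ Y ∪ W ∪ Z of the given sizes."""
    perm = rng.permutation(nodes)
    return perm[:m], perm[m:m + r1], perm[m + r1:m + r1 + r2], perm[m + r1 + r2:]

def partially_random_partition(nodes, A, D, r1, r2, rng):
    """X = A and Z = D are given; Y and W are random among the remaining nodes."""
    inner = np.setdiff1d(nodes, np.concatenate([A, D]))
    perm = rng.permutation(inner)
    return np.asarray(A), perm[:r1], perm[r1:r1 + r2], np.asarray(D)

rng = np.random.default_rng(0)
nodes = np.arange(100)
# Sizes from the synthetic network: (m, r1, r2, n) = (45, 10, 15, 30)
X, Y, W, Z = random_partition(nodes, 45, 10, 15, 30, rng)
```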
See examples of random and partially-random partitioning in Figure 5, where we show only the edges X − Y, Y − W and W − Z for easier readability of the network. Given the (pseudo)random partitioning, we construct the matrix R from the edges X − Z. The matrices G_1, S and G_2 are constructed as explained in Section 3.1.3 and can be used for the initialization of the tri-factorization of R (fixed initialization). For the missing edges, we set the corresponding values in triFastSTMF to a random number drawn from the elements of G_1, S and G_2. Tri-factorization of R returns updated R, G_1, S, G_2 with approximated/predicted weights on the edges.
We show that partially-random partitioning achieves higher Rand scores, but the approximation errors are similar to those obtained by random partitioning, see Figure 6. We conclude that the partitioning process does not significantly affect the approximation error of triFastSTMF. Still, if there is some additional knowledge about the sets of the partition, it is better to use partially-random partitioning. When we do not know the real partition, random partitioning or advanced algorithms, such as the Louvain method, can be used.

Real data
We test our method on the real-world interaction dataset of an ant colony introduced in Section 3.3. We describe the data on interactions between pairs of ants using a weighted adjacency matrix of size 160 × 160, where the diagonal elements are equal to 0. The adjacency matrix is symmetric, and we use the data from the upper triangular part to construct the matrix H, where each row describes one pair of ants and each column represents a specific day. Since H is large, we use k-means clustering to obtain 50 clusters and analyze the behavioral patterns of the ants on each day, shown in Figure 7. Next, we construct ten different networks, N_1, ..., N_10, by sampling with replacement the edges from N. Each sampled network is missing at most 20% of the edges of N, and these missing edges are used for evaluation. For each network N_i, i ∈ {1, ..., 10}, we construct the weighted adjacency matrix A_i with exactly the same size and ordering of the nodes in rows and columns as in matrix A. Now, to apply tri-factorization on networks, we perform Louvain partitioning [23] for each N_i to obtain a four-partition of its nodes. The Louvain method assigns the sets of a four-partition and enables favoring larger communities using the parameter γ. Different partitions are obtained for different values of γ, from which we select a connected four-partition network. We prefer the outer sets X_i and Z_i, of corresponding sizes m and n, respectively, to be larger than the inner sets Y_i and W_i, of sizes r_1 and r_2, respectively. This ensures that the matrix factorization methods embed data into a low-dimensional space using rank values r_1, r_2 ≪ min{m, n}. The Louvain algorithm results in different parameters m, r_1, r_2 and n for each N_i, i ∈ {1, ..., 10}, shown in Table 1. We define µ to represent the percentage of nodes in the outer sets. Table 1 shows that µ ≥ 74% for all N_i. We construct the R_i matrices of corresponding sizes m × n using the edges from X_i to Z_i, and the corresponding matrices G_1, S and G_2 of sizes m × r_1, r_1 × r_2 and r_2 × n, respectively, using all four sets. In R_i, we mask all values equal to 0. We run the matrix factorization methods on each R_i matrix using the corresponding factor matrices G_1, S, and G_2 for fixed initialization and obtain updated matrices G_1, S, and G_2. Since we use fixed initialization, we evaluate each method only once, as there is no randomness involved. In Table 2, we present the comparison between our proposed triFastSTMF and Fast-NMTF. The results show that Fast-NMTF achieves a smaller approximation error RMSE-A, while triFastSTMF outperforms Fast-NMTF with a better prediction error RMSE-P. This result is consistent with previous research in [11] and [12], where we have shown that matrix factorization over the tropical semiring is more robust to overfitting than methods using standard linear algebra.
The matrix R_i contains only the edges X_i − Z_i. All other edges X_i − Y_i, Y_i − W_i and W_i − Z_i are hidden in the corresponding factor matrices G_1, S and G_2. If we want to obtain predictions for all edges of network N using different partitions of N_i, we need to consider the factor matrices as well, not just the matrix R_i. To achieve this, we take into account the corresponding G_1, S and G_2, including their products G_1 ⊗ S, S ⊗ G_2 and G_1 ⊗ S ⊗ G_2. The edges that were removed from N during the sampling process to obtain N_i are used to measure the prediction error, while the edges in N_i are used for approximation.
In Table 3, we present the comparison between our proposed triFastSTMF and Fast-NMTF on network N using different partitions of N_i. The results show that triFastSTMF and Fast-NMTF have the same number of wins regarding RMSE-A and RMSE-P. However, the main difference is that Fast-NMTF achieves an enormous error compared to triFastSTMF in half of the cases. This is because we are now also predicting the edges X_i − Y_i, Y_i − W_i, W_i − Z_i and X_i − W_i, Y_i − Z_i, which we obtain by multiplying the corresponding factor matrices G_1, S and G_2 appropriately. There is no guarantee that the factor matrices G_1, S, and G_2 and their products are on the same scale as the data matrix R_i on which the matrix factorization methods were trained. Since Fast-NMTF uses standard linear algebra, one more matrix multiplication is needed to return to the original data scale. Using the standard + and × operators results in significant error, since the predicted values grow in magnitude quickly. triFastSTMF does not have this problem because it is based on the tropical semiring, and the operators max and + are more averse to predicting large values.
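The scale effect is easy to reproduce: chaining standard matrix products inflates values multiplicatively, whereas a tropical triple product of the same factors is bounded by the largest sum along a single path (an illustrative sketch with factor scales chosen by us):

```python
import numpy as np

def trop_mul(A, B):
    # (max, +) matrix product
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

rng = np.random.default_rng(0)
G1 = rng.uniform(0, 10, (8, 4))
S = rng.uniform(0, 10, (4, 4))
G2 = rng.uniform(0, 10, (4, 8))

standard = G1 @ S @ G2                    # sums of 16 triple products per entry
tropical = trop_mul(trop_mul(G1, S), G2)  # bounded by 10 + 10 + 10 = 30

# The standard product leaves the scale of the factors far behind;
# the tropical product cannot exceed the heaviest three-edge path.
assert tropical.max() <= 30.0
assert standard.max() > tropical.max()
```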

Conclusion
Matrix factorization is a popular data embedding approach used in various machine learning applications. Most factorization methods use standard linear algebra. Recent research introduced the tropical semiring to matrix factorization, which enables the modeling of nonlinear relations. Two-factorization approaches are often applied to study bipartite and tripartite networks. However, tri-factorization is suitable for four-partition networks, and to the best of our knowledge, our work is the first to explore this option.
In this study, we evaluate different strategies based on two-factorization, called triSTMF and Consecutive. Both strategies have drawbacks, such as the slow optimization process of triSTMF and the overfitting of one of the factor matrices in Consecutive. These limitations motivated us to develop a novel tri-factorization approach that addresses them. We propose triFastSTMF, a tri-factorization algorithm over the tropical semiring that can be used with a single data source. Our proposed algorithm is based on FastSTMF, a two-factorization method, with the necessary modifications for tri-factorization. We also provide a detailed theoretical analysis for solving the linear system and computing the third factor matrix. The obtained solution is used for the optimization in the proposed triFastSTMF.
We tested the method on synthetic and real data, applied it to the edge approximation and prediction task in four-partition networks, and demonstrated that triFastSTMF achieves approximation and prediction results close to those of Fast-NMTF. Additionally, triFastSTMF is more robust than Fast-NMTF when the methods are fitted on a part of the network and then used to approximate and predict the entire network.
Although in this study we presented the proposed method on a single data source, we established the basis for creating a model capable of combining multiple data sources.Our future work involves the application and modification of the proposed triFastSTMF to the data fusion problem, which often employs tri-factorization.

Figure 1 :
Figure 1: Schematic diagram of one iteration of the proposed triFastSTMF method for updating the factor matrices G_1, S and G_2 of the data matrix R ≈ G_1 ⊗ S ⊗ G_2. Step 1) updates the factor matrix G_1 through CFL, while step 2) uses the new G_1 to update G_2 through CFR. The last step, 3), updates S using Theorem 2 and the newly computed factor matrices G_1 and G_2. The procedure repeats until convergence.

Figure 2 :
Figure 2: Example of a four-partition network.

Figure 4 :
Figure 4: Comparison of different tropical tri-factorization methods. The median, first and third quartiles of the approximation error in 25 runs on the synthetic random tropical 200 × 100 matrix are shown.

Figure 5 :
Figure 5: (a) A synthetic random tropical network K of 100 nodes created by applying the tropical semiring on four sets A, B, C and D. The sets A and D are densely connected, following the network construction process. In contrast, sets B and C are less connected. Examples of partitioning network K using (b) random and (c) partially-random partitioning.

Figure 6 :
Figure 6: Rand score and approximation error of triFastSTMF on 25 random and 25 partially-random partitionings of synthetic data. We performed one run of 100 seconds for each matrix R and used the true ranks r_1 and r_2 as factorization parameters.

Figure 7 :
Figure 7: Analysis of ants' behavioral patterns over 41 days. The rows represent centroids of ant pairs clustered with k-means using k = 50, and the columns denote daily interactions. Rows and columns are ordered using Optimal Leaf Ordering for Hierarchical Clustering [30] with cosine distance and Ward linkage.

Figure 8 :
Figure 8: Comparison between the daily average of all interactions between ant pairs for different groups of days: (a) days 1-19, (b) days 20-31, and (c) days 32-41. Rows and columns are ordered using Optimal Leaf Ordering for Hierarchical Clustering [30] with cosine distance and Ward linkage.

Table 3 :
RMSE-A and RMSE-P on network N using different partitions of N i .The result of the best method in the comparison between triFastSTMF and Fast-NMTF is shown in bold.