Multi-Modal Entity Alignment Using Uncertainty Quantification for Modality Importance

Knowledge graphs are structured data used for information retrieval. Entity alignment using multi-modal supplementary information plays an important role in knowledge graph integration. However, if the supplementary information is missing or incorrect, it can degrade retrieval quality. If the usefulness of each piece of information for retrieval can be quantified as a degree of importance, the influence of unimportant supplementary information can be reduced. In this study, we propose a method that quantifies the importance of each piece of information by embedding it as a probability distribution. The proposed method improves on an existing method by 7.7% and 7.3% in Hits@1 on two datasets (FB15K-DB15K and FB15K-YAGO15K). Qualitative experiments also show that the importance of information, quantified by uncertainty, successfully captures data that is not useful for information retrieval.


I. INTRODUCTION
Efficient and accurate information retrieval is becoming increasingly vital as stored data grows in volume and becomes multi-modal. Achieving this goal requires structured databases that support effective retrieval. Knowledge Graphs (KGs) such as Freebase [1], DBpedia [2], and YAGO [3] describe real-world entities and relationships in a structured form. In information retrieval using KGs, entities are retrieved based on their relational information. However, as KGs grow larger, missing and erroneous information can occur. Therefore, it is important to complete missing information and correct errors within a KG.
There are two important approaches for completing or correcting a KG. The first is to find the same entity in multiple KGs and integrate their information; this is evaluated in a task known as entity alignment [4]. The second is to utilize supplementary information from other modalities in addition to the relational information. A KG with added supplementary information is referred to as a multi-modal KG (MMKG). Entities are often given supplementary information in the form of text, numerical information, images, etc. Even when the relational information does not match across KGs, retrieval capability improves because entities can be matched based on the correspondence of the supplementary information.
However, supplementary information that is missing or erroneous can negatively impact the quality of information retrieval. Fig. 1 illustrates an example in which some of the supplementary information is not useful for retrieval. In the two KGs, different modality information is given to the entity ''Tokyo''. While the relational and attribute information (numerical information) correspond with each other, such as the area of Tokyo and the capital of Japan, the images of Tokyo Tower and the Tokyo Sky Tree do not correspond, indicating that the image information is not useful when searching for the entity ''Tokyo''. Such useless information can be caused by missing or erroneous information in a KG. The negative effects of useless supplementary information have been noted in previous studies. Wang et al. [5] discussed how ambiguous visual information can act as noise during KG completion. Liu et al. [6] also noted that in multi-modal entity alignment, information from a modality with low discriminative power degrades overall performance.
In the field of machine learning, uncertainty quantification methods exist for identifying data that may not be useful for decision making. This type of uncertainty, originating from missing or incorrect data, has commonly been estimated through probability distributions in previous studies [7]. This study proposes a novel method to improve multi-modal entity alignment by using uncertainty quantification. The proposed method weights the usefulness of each modality's information by measuring its uncertainty: it quantifies the importance of the information of each modality by embedding the entities in the MMKG as probability distributions. This reduces the influence of unimportant information, such as missing or erroneous information, and improves the accuracy of information retrieval. The main contributions of this study are as follows:
• We proposed a novel KG embedding that uses uncertainty to quantify the importance of each modality. Uncertainty is calculated by embedding each entity as a probability distribution.
• We showed that the proposed method achieves significantly higher accuracy than MMEA [8] (multi-modal entity alignment), the baseline model, on two benchmark datasets for entity alignment.
• Through qualitative experiments, we verified the properties of unimportant information captured as uncertain.
This paper is organized as follows. Section II reviews related work on multi-modal knowledge representation learning, multi-modal entity alignment, and uncertainty quantification, and discusses the research progress achieved in these fields. Section III provides a detailed description of the proposed method, highlighting the improvements made over the baseline MMEA [8]. The experimental setup and evaluation details are described in Section IV. The experimental results and a discussion of the proposed uncertainty measure are presented in Section V. Finally, Section VI concludes the paper and outlines future work.

II. RELATED WORK
A. MULTI-MODAL KNOWLEDGE REPRESENTATION LEARNING
In this section, we first describe knowledge representation learning, which is used to complement missing KGs. We then describe related works that incorporate additional information from different modalities to improve the performance.
Recently, knowledge representation learning has emerged as a critical task in the field of KGs. The goal of knowledge representation learning is to project entities and relationships into a low-dimensional vector space, in which semantic information of entities and relationships can be captured implicitly and semantic associations can be efficiently calculated. TransE [9] is a well-known method for knowledge representation learning, which learns to embed relationships as translations between head and tail entities. Due to its simplicity and versatility, a number of methods have been proposed following TransE, including DistMult [10], ComplEx [11], and RotatE [12].
One direction for advancement in knowledge representation learning is the use of multi-modal data. DKRL [13] (description-embodied knowledge representation learning) considers textual information to learn the representations of KGs. Following DKRL, IKRL [14] (image-embodied knowledge representation learning) was proposed to integrate visual and structured knowledge with the TransE model. However, these methods use either only textual knowledge or only visual knowledge, ignoring the simultaneous use of two or more modalities. MKBE [15] (multi-modal knowledge base embeddings) simultaneously uses numerical and categorical information as well as relations and images. These methods use information from other modalities as supplementary information for acquiring embedded representations. MMKRL [16] (multi-modal knowledge representation learning) additionally adopts an adversarial training strategy to enhance robustness, which is rarely considered in existing multi-modal knowledge representation learning methods.

B. MULTI-MODAL ENTITY ALIGNMENT
The task of entity alignment involves mapping entities that have equivalent real-world identities across different KGs, in order to integrate information from diverse sources into a common space. Current methods often embed entities into a low-dimensional space using knowledge representation learning and measure the similarity between entity embeddings. A typical method is MTransE [17], which jointly trains a translational embedding model [9] to encode language-specific KGs in separate embedding spaces and a transformation to align counterpart entities across embeddings. Research on entity alignment can be broadly categorized into three directions: the embedding space, labeling for alignment, and the use of multi-modal information [18]. This paper focuses on the last direction.

FIGURE 2. Conceptual diagram of the proposed model. The information for each modality is embedded as a multivariate normal distribution, expressed as a mean vector and a covariance matrix, before being integrated into a single multivariate normal distribution by common space learning. Finally, alignment learning brings the distribution of each entity close to the distribution of the entity with the same real-world object in the other knowledge graph.
We will now describe related studies that utilize supplementary information from other modalities in entity alignment. A product of experts (PoE) [6] utilizes numerical and image information, in addition to relational information, to improve the accuracy of entity alignment. Expert models are created for each modality, and validation experiments have been performed using different integration methods, including simultaneous learning, ensembles, and concatenation of input data. MMEA [8] achieves better accuracy than PoE by learning embedding vectors for each piece of modal information and integrating them by bringing them closer to a single point in a common space. This method can be viewed as a natural multi-modal extension of MTransE. EVA [19] (entity visual alignment) is another entity alignment study that utilizes visual information. EVA can learn alignments in an unsupervised manner by utilizing the similarity of visual information. Our study takes a different direction, as it focuses on innovations in how multi-modal information is integrated. MMEA is the most suitable baseline model for our study due to its simplicity, high accuracy, and the ease of adding modalities and transferring trained models for each modality.
Despite the prevalence of existing studies on multi-modal entity alignment, they fail to acknowledge the significance of considering the informative value of each modality. The entity alignment accuracy can be reduced when utilizing information from modalities that are not relevant. Hence, it is crucial to develop a knowledge representation approach that incorporates the importance of each modality. In this study, we propose a novel common-space learning technique based on MMEA, which uses uncertainty as a means of weighting the information derived from each modality.

C. UNCERTAINTY QUANTIFICATION
In the domains of simulation optimization [20] and process optimization [21], various uncertainty quantification methods have been proposed to address uncertainties in data and models. Furthermore, recent advancements in deep learning have stimulated research efforts aimed at evaluating the credibility of deep learning model outputs [7], [22], [23]. The objective of this study is to enhance learning algorithms through the incorporation of uncertainty evaluation. The following studies have a similar focus to this study. Gal et al. [24] presented an active learning method that prioritizes uncertain data. Our approach differs in that we aim to reduce the weight of uncertain data. Khan et al. [25] proposed a technique to broaden the margins of classification boundaries for uncertain samples in classification problems. Yu et al. [26] introduced a method where a student model learns the trustworthy output of a teacher model for the task of left atrium segmentation. Our study, on the other hand, tackles a retrieval problem instead of a classification problem and involves performing relational, numerical, and visual learning simultaneously in a more complex setting. Kendall et al. [27] proposed a method to adjust the ratio of loss functions in multitask learning by quantifying task uncertainty. Our proposed method is unique in that it focuses on the uncertainty of individual samples rather than tasks. In addition, this study is the first to present a method for improving multi-modal entity alignment from the perspective of uncertainty.
One approach to uncertainty quantification is embedding into a probability distribution. In the field of knowledge graph embedding, KG2E [28] performs embedding into a multivariate normal distribution. This method is an adaptation of word2gauss [29], which was developed to represent word polysemy, to knowledge graph embedding. Apart from the multivariate normal distribution, the Beta distribution [30] and the Dirichlet distribution [31] are also viable options. Methods such as MC dropout [32] can also be used to learn arbitrary distributions, but they are computationally intensive. In this study, we employ a Gaussian embedding-based uncertainty measure, similar to KG2E, for a fair comparison with the baseline MMEA.

III. PROPOSED METHOD
In this section, we explain the terminology used in this paper, and then describe in detail the proposed method of multi-modal entity alignment using probability distributions. Specifically, the proposed method consists of five parts as shown in Fig. 2. First, we explain relational, numerical, and visual learning, which embed the information from each modality into a multivariate normal distribution. Next, common space learning, which integrates the multivariate normal distribution information obtained from each learning method, is explained. Finally, we explain entity alignment learning, which uses the obtained multivariate normal distribution to bring the distributions of entities with the same object in different KGs closer together.

A. NOTATIONS
Denote the MMKG as $G = (\hat{E}, R, I, N, X, Y, Z)$, where $\hat{E}$ is the set of entities, $R$ is the set of relation names, $I$ is the set of images, $N$ is the set of attribute names, $X$ is the set of relational triples, $Y$ is the set of entity-image pairs, and $Z$ is the set of numerical triples. A relational triple is denoted by $(h, r, t) \in X$, where $h \in \hat{E}$ is an entity called the head, $t \in \hat{E}$ is an entity called the tail, and $r \in R$ is a relation name. It means that the relation $r$ holds between the entities represented by $h$ and $t$. For example, the triple (''Japan,'' capital_of, ''Tokyo'') means that Tokyo is the capital of Japan. An entity-image pair is a tuple denoted by $(e, i) \in Y$, where $e \in \hat{E}$ is an entity and $i \in I$ is an image feature. For example, the entity ''Tokyo'' is given the feature of a photograph of Tokyo scenery, as shown in Fig. 1. The features are the outputs of neural networks pre-trained on image classification. A numerical triple is denoted by $(e, a, n) \in Z$, where $e \in \hat{E}$ is an entity, $a \in N$ is an attribute name, and $n$ is a numerical value. For example, the numerical triple (''Tokyo,'' population, 13,617,445) means that the population of Tokyo is 13,617,445. Entity alignment is the task of matching entities that describe the same real-world object across different KGs. Let $H = \{(e_1, e_2) \mid e_1 \in \hat{E}_1, e_2 \in \hat{E}_2\}$ denote the set of pairs of entities that describe the same object across the two KGs.
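To make the notation concrete, the sketch below shows how the sets $X$, $Y$, $Z$, and $H$ might be represented in code; the container names and the placeholder feature vector are illustrative, not part of the datasets.

```python
import numpy as np

# Placeholder for a pre-trained image feature (e.g., a 4096-d VGG16 output);
# a real pipeline would extract this from an actual image.
vgg16_feature = np.zeros(4096)

relational_triples = [("Japan", "capital_of", "Tokyo")]    # (h, r, t) in X
entity_image_pairs = [("Tokyo", vgg16_feature)]            # (e, i) in Y
numerical_triples = [("Tokyo", "population", 13617445)]    # (e, a, n) in Z
seed_alignments = [("Tokyo_in_KG1", "Tokyo_in_KG2")]       # (e1, e2) in H
```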

B. PROBABILITY DISTRIBUTION EMBEDDING
In this section, we present the implementation of the proposed method, which is based on MMEA [8]. For a fair comparison, as much as possible, the proposed method uses the same training process as MMEA. To highlight the improvements of the proposed method, we first describe the implementation of the base model MMEA and then outline the new improvements made by the proposed method.

1) RELATIONAL LEARNING
This section describes the embedding of relational information, which is the main information in a KG. A triple consisting of two entities and the relation between them is called a fact and is represented as $(h, r, t) \in X$. In MMEA [8], each entity and relation is embedded as a vector by TransE, and the similarity score is calculated as

$$f_{rel}(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\| \tag{1}$$

where $\mathbf{h}$, $\mathbf{r}$, and $\mathbf{t}$ are the embedding vectors of $h$, $r$, and $t$, respectively. The loss function for learning the relation embedding is defined using the margin $\gamma$ as

$$L_{rel} = \sum_{\tau \in D^+} \sum_{\tau' \in D^-} \max\big(0, \gamma - f_{rel}(\tau) + f_{rel}(\tau')\big) \tag{2}$$

where $D^+$ and $D^-$ are the sets of positive and negative examples of facts, respectively. The positive examples are given as $\tau = (h, r, t) \in X$ at training time, but in this study, as in [8], $X$ is extended to $D^+$ using an exchange strategy: given $(h, r, t) \in X$, if $(h, h') \in H$, then $(h', r, t)$ is also added to $D^+$; the same is done for $t$. Once $D^+$ is obtained, the set of negative examples $D^-$ is generated by corrupting the head or tail of each positive fact:

$$D^- = \{(h', r, t) \mid h' \in \hat{E}\} \cup \{(h, r, t') \mid t' \in \hat{E}\} \tag{3}$$

In the proposed method, for $(h, r, t) \in X$, $h$, $r$, and $t$ are embedded into the mean vectors $\mu_h, \mu_r, \mu_t$ and the variance-covariance matrices $\Sigma_h, \Sigma_r, \Sigma_t$ of multivariate normal distributions. Therefore, $h$, $r$, and $t$ are represented by the probability distributions $\mathcal{N}(\mu_h, \Sigma_h)$, $\mathcal{N}(\mu_r, \Sigma_r)$, and $\mathcal{N}(\mu_t, \Sigma_t)$, respectively. The spread of each multivariate normal distribution expresses the uncertainty of the entity. Similarly to [28], we define the similarity score between $\mathcal{N}(\mu_h, \Sigma_h) - \mathcal{N}(\mu_t, \Sigma_t)$ and $\mathcal{N}(\mu_r, \Sigma_r)$ using the Kullback-Leibler (KL) divergence:

$$f_{rel}(h, r, t) = -D_{KL}\big(\mathcal{N}(\mu_h - \mu_t, \Sigma_h + \Sigma_t) \,\|\, \mathcal{N}(\mu_r, \Sigma_r)\big) \tag{4}$$

The KL-divergence between two $k$-dimensional multivariate normal distributions $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ has the closed form

$$D_{KL}\big(\mathcal{N}(\mu_1, \Sigma_1) \,\|\, \mathcal{N}(\mu_2, \Sigma_2)\big) = \frac{1}{2}\left(\mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^{\top}\Sigma_2^{-1}(\mu_2 - \mu_1) - k + \ln\frac{\det\Sigma_2}{\det\Sigma_1}\right) \tag{5}$$

where $\mathrm{tr}(\Sigma)$ is the trace of the covariance matrix $\Sigma$ and $\Sigma^{-1}$ is its inverse. In this study, the covariance matrices are assumed to be diagonal to simplify these calculations. Substituting the similarity score of Eq. (4) into Eq. (2), we obtain the loss function of the proposed method.
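As a concrete illustration, the following Python sketch computes the closed-form KL-divergence of Eq. (5) for diagonal covariances and the relational score of Eq. (4); following KG2E, the difference of the head and tail Gaussians is treated as $\mathcal{N}(\mu_h - \mu_t, \Sigma_h + \Sigma_t)$. The function names are ours, not from the paper's code.

```python
import numpy as np

def kl_diag_gaussians(mu1, var1, mu2, var2):
    """Closed-form KL( N(mu1, diag(var1)) || N(mu2, diag(var2)) ), i.e.,
    Eq. (5) specialized to diagonal covariance matrices."""
    k = mu1.shape[0]
    return 0.5 * (np.sum(var1 / var2)                     # tr(Sigma2^-1 Sigma1)
                  + np.sum((mu2 - mu1) ** 2 / var2)       # Mahalanobis term
                  - k
                  + np.sum(np.log(var2) - np.log(var1)))  # log-det ratio

def f_rel(mu_h, var_h, mu_r, var_r, mu_t, var_t):
    """Relational similarity score of Eq. (4): higher means more plausible."""
    return -kl_diag_gaussians(mu_h - mu_t, var_h + var_t, mu_r, var_r)
```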

2) NUMERICAL LEARNING
First, we explain how MMEA [8] embeds numerical information. A numerical triple is represented as $(e^{(n)}, a, n) \in Z$, where $a$ is the attribute name and $n$ is its numerical value. Because the values are real numbers, they are converted into a distributed representation using radial basis functions (RBF):

$$\phi_i(n) = \exp\left(-\frac{(n - c_i)^2}{\sigma_i^2}\right) \tag{6}$$

where $c_i$ is the $i$-th element of the radial kernel center vector and $\sigma_i$ is the corresponding element of the variance vector. First, all numerical values corresponding to each attribute are normalized. Then $c_i$ and $\sigma_i$ can be computed by a supervised method in an RBF neural network. By concatenating the distributed representation $\phi(n)$ of the numerical value and the attribute embedding vector $\mathbf{a}$, we obtain the $2 \times d$ matrix $M = \langle \mathbf{a}, \phi(n) \rangle$. The score function for embedding numerical information is

$$f_{num}(e^{(n)}, a, n) = -\|\mathbf{e}^{(n)} - \tanh(\mathrm{vec}(\mathrm{CNN}(\tanh(M)))W)\| \tag{7}$$

where $\mathbf{e}^{(n)}$ is the embedding vector of $e^{(n)}$, $\mathrm{vec}(\cdot)$ denotes the projection, $\mathrm{CNN}(\cdot)$ is an $l$-layer convolutional network, and $W$ represents a fully-connected layer. The loss function for the entire numerical information embedding is

$$L_{num} = \sum_{(e^{(n)}, a, n) \in Z} \log\big(1 + \exp(-f_{num}(e^{(n)}, a, n))\big) \tag{8}$$

The proposed method transforms $e^{(n)}$ into the normal distribution $\mathcal{N}(\mu_{e^{(n)}}, \Sigma_{e^{(n)}})$ and creates the distribution $\mathcal{N}(\mu_{(a,n)}, \Sigma_{(a,n)})$ from $(a, n)$. The mean vector is $\mu_{(a,n)} = \tanh(\mathrm{vec}(\mathrm{CNN}(\tanh(M)))W)$, and $\Sigma_{(a,n)}$ is a diagonal matrix with all elements fixed at 0.5. For $(e^{(n)}, a, n) \in Z$, we define the following score function:

$$f_{num}(e^{(n)}, a, n) = -D_{KL}\big(\mathcal{N}(\mu_{e^{(n)}}, \Sigma_{e^{(n)}}) \,\|\, \mathcal{N}(\mu_{(a,n)}, \Sigma_{(a,n)})\big) \tag{9}$$
Substituting the similarity scores in Eq. (9) into Eq. (8), we obtain the loss function of the proposed method.
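A minimal sketch of the RBF expansion of Eq. (6) and the proposed KL-based numerical score of Eq. (9), reusing kl_diag_gaussians from the relational sketch above. The CNN-and-projection pipeline producing $\mu_{(a,n)}$ is abstracted into a precomputed mean vector, and the exact kernel scaling is an assumption.

```python
import numpy as np

def rbf_features(n, centers, sigmas):
    """Distributed representation of a scalar value n (Eq. (6));
    centers and sigmas are the learned kernel parameters."""
    return np.exp(-((n - centers) ** 2) / (sigmas ** 2))

def f_num(mu_e, var_e, mu_an, fixed_var=0.5):
    """Proposed numerical score (Eq. (9)); mu_an stands in for
    tanh(vec(CNN(tanh(M)))W), and Sigma_(a,n) is fixed at 0.5."""
    var_an = np.full_like(mu_an, fixed_var)
    return -kl_diag_gaussians(mu_e, var_e, mu_an, var_an)
```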

3) VISUAL LEARNING
First, we explain how MMEA [8] learns embeddings from visual information. Each entity is given visual information represented as $(e^{(i)}, i) \in Y$. From each image, we extract a 4096-dimensional feature vector from the classification layer of VGG16 [33], which has been pre-trained on ImageNet. The score function for embedding visual information is

$$f_{vis}(e^{(i)}, i) = -\|\mathbf{e}^{(i)} - \tanh(\mathrm{vec}(\mathbf{i}))\| \tag{10}$$

where $\mathbf{e}^{(i)}$ is the embedding vector of $e^{(i)}$ and $\mathrm{vec}(\cdot)$ denotes the projection. The loss function for embedding visual information is defined as

$$L_{vis} = \sum_{(e^{(i)}, i) \in Y} \log\big(1 + \exp(-f_{vis}(e^{(i)}, i))\big) \tag{11}$$

The proposed method transforms $e^{(i)}$ into the normal distribution $\mathcal{N}(\mu_{e^{(i)}}, \Sigma_{e^{(i)}})$ and constructs a distribution $\mathcal{N}(\mu_i, \Sigma_i)$ from the feature $i$ output by VGG16. The mean vector is $\mu_i = \tanh(\mathrm{vec}(i))$, and $\Sigma_i$ is a diagonal matrix with all elements fixed at 0.5. For each pair $(e^{(i)}, i) \in Y$, we define the following score function:

$$f_{vis}(e^{(i)}, i) = -D_{KL}\big(\mathcal{N}(\mu_{e^{(i)}}, \Sigma_{e^{(i)}}) \,\|\, \mathcal{N}(\mu_i, \Sigma_i)\big) \tag{12}$$

Substituting the similarity score of Eq. (12) into Eq. (11), we obtain the loss function of the proposed method.
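In the same style, a sketch of the proposed visual score of Eq. (12), again reusing kl_diag_gaussians; modeling vec(·) as a linear projection W_proj is our assumption.

```python
import numpy as np

def f_vis(mu_e, var_e, feature, W_proj, fixed_var=0.5):
    """Proposed visual score (Eq. (12)). The 4096-d VGG16 feature is
    projected and squashed to form the mean mu_i = tanh(vec(i))."""
    mu_i = np.tanh(feature @ W_proj)        # vec(.) modeled as a linear projection
    var_i = np.full_like(mu_i, fixed_var)   # Sigma_i fixed at 0.5 on the diagonal
    return -kl_diag_gaussians(mu_e, var_e, mu_i, var_i)
```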

4) COMMON SPACE LEARNING
This section describes how the three multivariate normal distributions obtained from each modality are integrated. MMEA brings the embedded representations of each modality close to a single point in terms of Euclidean distance. In the proposed method, each piece of modal information is represented by a multivariate normal distribution; therefore, the loss function for integration in the common space is defined as

$$L_{csl}(e, e^{(r)}, e^{(i)}, e^{(n)}) = D_{KL}\big(\mathcal{N}(\mu_e, \Sigma_e) \,\|\, \mathcal{N}(\mu_{e^{(r)}}, \Sigma_{e^{(r)}})\big) + D_{KL}\big(\mathcal{N}(\mu_e, \Sigma_e) \,\|\, \mathcal{N}(\mu_{e^{(i)}}, \Sigma_{e^{(i)}})\big) + D_{KL}\big(\mathcal{N}(\mu_e, \Sigma_e) \,\|\, \mathcal{N}(\mu_{e^{(n)}}, \Sigma_{e^{(n)}})\big) \tag{13}$$

where $e$ is an entity in the common space and $\mathcal{N}(\mu_e, \Sigma_e)$ is its normal distribution embedding. From the KL-divergence formula in Eq. (5), Eq. (13) can be reduced by increasing the elements of a modality's covariance matrix without bringing that modality's mean vector closer. In MMEA, the contribution of useless information from each modality must be adjusted with hyper-parameters; in the proposed method, it is adjusted automatically through the covariance matrices.
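The common-space loss of Eq. (13) is simply a sum of KL terms from the common-space Gaussian to each modality's Gaussian; a sketch, reusing kl_diag_gaussians:

```python
def common_space_loss(mu_e, var_e, modality_gaussians):
    """L_csl of Eq. (13): KL from the common-space distribution to each of the
    relational, visual, and numerical distributions of the same entity.
    modality_gaussians is an iterable of (mean, diagonal-variance) pairs."""
    return sum(kl_diag_gaussians(mu_e, var_e, mu_m, var_m)
               for mu_m, var_m in modality_gaussians)
```

Note how the down-weighting described above falls out of this form: enlarging a modality's variance var_m shrinks the Mahalanobis term of its KL contribution, so an uninformative modality can reduce the loss without pulling the common-space mean toward it.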

5) ALIGNMENT LEARNING
In this section, we explain alignment learning, which brings entities that refer to the same object closer together in the common space. MMEA learns to minimize the Euclidean distance between the embedding vectors of two entities given as training data. The proposed method instead minimizes the following loss function to bring corresponding entities $(e_1, e_2) \in H$ in different KGs closer together:

$$L_{al}(e_1, e_2) = D_{KL}\big(\mathcal{N}(\mu_{e_1}, \Sigma_{e_1}) \,\|\, \mathcal{N}(\mu_{e_2}, \Sigma_{e_2})\big) \tag{14}$$
The losses $L_{rel}$, $L_{vis}$, $L_{num}$, $L_{csl}$, and $L_{al}$ are trained by iteratively updating the parameters one epoch at a time, as in MMEA. The flowchart in Fig. 3 illustrates the entire learning process of the proposed method. Initially, the mean vectors are learned on the datasets $(G_1, G_2, H)$ while the variance-covariance matrices are kept fixed. In each epoch, relational, visual, and numerical learning are performed in succession; common space learning is then performed on the normal distributions obtained for each modality; finally, alignment learning is performed on the entity pairs given by $H$. Training is terminated by early stopping on the $L_{al}$ metric over the validation data. After the mean vectors have been learned, the variance-covariance matrices are unfixed and the entire training process is repeated. Upon completion of training, similarities are calculated using the common-space embeddings and the model is evaluated through entity alignment.
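The two-stage schedule in Fig. 3 can be summarized as the following skeleton; the model interface (the step_* methods and set_variances_trainable) is hypothetical, intended only to show the order of updates and the early-stopping criterion, not an actual implementation.

```python
def train(model, data, val_pairs, max_epochs=1000, patience=10):
    # Stage 1: learn mean vectors with variances fixed.
    # Stage 2: unfix the variance parameters and repeat the whole process.
    for trainable_variances in (False, True):
        model.set_variances_trainable(trainable_variances)
        best_val, wait = float("inf"), 0
        for epoch in range(max_epochs):
            model.step_relational(data.relational_triples)   # L_rel
            model.step_visual(data.entity_image_pairs)       # L_vis
            model.step_numerical(data.numerical_triples)     # L_num
            model.step_common_space()                        # L_csl
            model.step_alignment(data.seed_alignments)       # L_al
            val_loss = model.alignment_loss(val_pairs)       # early stopping on L_al
            if val_loss < best_val:
                best_val, wait = val_loss, 0
            else:
                wait += 1
                if wait >= patience:
                    break
```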

IV. EXPERIMENT
First, a description of the dataset and evaluation metrics used in the experiment is given. Next, we describe in detail the setup of the quantitative evaluation experiment using entity alignment and the qualitative evaluation experiment on uncertainty.

A. DATASETS AND EVALUATION METRICS
1) DATASET
In this study, we used two MMKG datasets, FB15K-DB15K and FB15K-YAGO15K, both created by Liu et al. [6]. FB15K is a dataset commonly used in KG completion; DB15K and YAGO15K were created by selecting related entities from DBpedia and YAGO, respectively. The datasets used in this study were the same as those used in MMEA [8], and the training data was randomly split 5 times at ratios of 20%, 50%, and 80%. Models were compared based on the mean of the evaluation results over the splits of each dataset. In general, the more entity pairs given for entity alignment, the better the accuracy. By comparing the accuracy of the proposed method and MMEA at 20%, 50%, and 80%, we confirmed whether the accuracy improvement depends on the amount of training data.

2) EVALUATION METRICS
Hits@n, MRR (mean reciprocal rank), and MR (mean rank), which are commonly used in ranking evaluation, were used as evaluation indices for entity alignment. Hits@n is the percentage of correct entities within the top n of the ranking obtained by the similarity calculation, MR is the average of the ranks of correct entities, and MRR is the average of the inverse of the ranks of the correct entities. Therefore, the higher Hits@n and MRR and the lower MR, the better the performance.
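For reference, these three metrics can be computed directly from the (1-indexed) ranks of the correct entities, as in the following sketch:

```python
import numpy as np

def ranking_metrics(ranks, hits_at=(1, 10)):
    """Hits@n, MR, and MRR from the 1-indexed ranks of correct entities."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {f"Hits@{n}": float(np.mean(ranks <= n)) for n in hits_at}
    metrics["MR"] = float(ranks.mean())
    metrics["MRR"] = float(np.mean(1.0 / ranks))
    return metrics

# Example: three queries whose correct entities ranked 1st, 3rd, and 20th.
print(ranking_metrics([1, 3, 20]))
# Hits@1 ~ 0.33, Hits@10 ~ 0.67, MR = 8.0, MRR ~ 0.46
```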

B. EXPERIMENTAL SETTINGS
1) SETTING FOR QUANTITATIVE EXPERIMENT
In this study, OpenEA [18] was used for the implementation. All models in the comparison experiments used the default training settings of OpenEA, except MMEA, which was trained with the parameter settings in [8]. Each model used cross-domain similarity local scaling [34] during evaluation. MMEA embeds an entity as an N = 200 dimensional vector. The proposed method embeds an entity as a pair of vectors corresponding to the mean and the variance; for a fair comparison, we set the dimension N of each vector to 100, using 200 elements in total.
To stabilize the learning process, two stages were used: the mean of the normal distribution was learned first, and then the parameter for variance was unfixed and relearned. To avoid the variance values diverging during learning, the range [C min , C max ] = [0.5, 50] was used. The initial value of all variances was set to 0.5.
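A minimal sketch of the variance constraint: after each update, the diagonal variances are clipped back into $[C_{min}, C_{max}]$. The function and parameter names are ours.

```python
import numpy as np

C_MIN, C_MAX = 0.5, 50.0

def clamp_variances(var_diag):
    """Keep each diagonal variance inside [C_min, C_max] = [0.5, 50]
    to prevent the variances from diverging during training."""
    return np.clip(var_diag, C_MIN, C_MAX)
```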

2) QUALITATIVE EXPERIMENT SETTINGS
Hereafter, the proposed measure will be referred to as the normalized area and, for convenience, as uncertainty. This experiment tested the hypothesis that the proposed uncertainty captures the utility of the information in each modality. To measure the uncertainty of each sample, we considered the (hyper-)area of the (hyper-)ellipse spanned by the probability distribution, as an intuitive summary of its spread. We normalized this quantity to be 1 when all variances equal 0.5, the initial value in our experiments; the normalized area is therefore $\prod_i (\Sigma_{(i,i)} / 0.5)$. The larger the normalized area, the more uncertain the sample. Because each variance is restricted to 0.5 or greater, the normalized area always takes a value greater than or equal to 1. In this section, we examine the validity of the proposed uncertainty from two aspects.
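Under these definitions, the normalized area is a product over the diagonal variances; a one-function sketch:

```python
import numpy as np

def normalized_area(var_diag, init_var=0.5):
    """Normalized area: product of diagonal variances relative to the
    initial value 0.5. Equals 1 when no variance has grown, and is >= 1
    because variances are clamped at 0.5 from below."""
    return float(np.prod(np.asarray(var_diag) / init_var))
```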
First, we examined the relationship between the information and uncertainty of each entity. It is difficult to measure the degree to which an entity in one KG and its corresponding entity in the other KG have similar information. Therefore, in this experiment, for each entity, we plotted the relationship between the statistic representing the amount of relational and numerical information and the magnitude of uncertainty defined earlier. The statistics were the quantities of relational information triples, relation name types, numerical information triples, and attribute name types. If one entity has less information, it is unlikely that the other KG will have similar information. Therefore, by examining the relationship between the information an entity has and uncertainty, we can indirectly show that uncertainty depends on mismatch.
Next, we checked the distribution of uncertainty for each type of entity. Each entity has a type, such as university or organization. Entities of the same type within a single KG are often given similar information, whereas different KGs may show different trends in the information given. Therefore, we plotted the distribution of uncertainty for each modality by type and examined the relationship between information mismatch and uncertainty for each type and modality. Specifically, we examined five types in DBpedia: organization, university, country, person, and film. Each type was obtained as follows:
• organization: entities in the domain of <http://dbpedia.org/ontology/product>
• university: entities in the range of <http://dbpedia.org/ontology/campus>
• country: entities in the domain of <http://dbpedia.org/ontology/country>
• person: entities in the range of <http://dbpedia.org/ontology/residence>
• film: entities in the range of <http://dbpedia.org/ontology/director>
The entities of each of the five types were ordered by uncertainty, and the distribution was plotted for only the top 10% of entities. This is because most entities had an uncertainty of 1, and we wanted to focus on samples with large uncertainty values.

V. RESULTS AND DISCUSSION
A. RESULT FOR QUANTITATIVE EXPERIMENT
Table 1 lists the evaluation results for each model when the training dataset is split at 20% of the total. Compared to the models using only relational information, the models using relational + numerical and relational + numerical + visual information were more accurate. We can confirm that the use of supplementary information tends to improve accuracy. However, since each method uses a different algorithm, there is a possibility of improved accuracy due to factors other than supplementary information. Therefore, we compared the accuracy of the proposed method and MMEA. The proposed method achieved better accuracy than MMEA in all evaluation metrics. The main difference between MMEA and the proposed method was that the supplementary information was weighted according to its importance. Therefore, this result indicates that the accuracy of information retrieval can be improved by considering the importance of the information when utilizing supplementary information. Table 2 lists the evaluation results of MMEA and the proposed method when the training data is divided into 20%, 50%, and 80% of the total. The proposed method consistently outperformed MMEA for all ratio splits, which indicates that the proposed method is effective regardless of the amount of training data.
For all methods, the order of the evaluation computation is O(NM) for N queries whose similarity is computed against M targets. Although the proposed method uses KL-divergence for similarity, it is implemented by a combination of basic functions, and the difference in computation time is negligibly small on the datasets used (FB15K-DB15K and FB15K-YAGO15K). The proposed method also has a lower computational cost than Graph Convolutional Network-based methods. From the experimental setup, the embedding dimension of both MMEA and the proposed method is 200; therefore, the proposed method achieves higher precision with the same number of parameters as MMEA.

B. RESULTS OF QUALITATIVE EXPERIMENT
Fig. 4 shows scatter plots of the relationship between the uncertainty of each entity and the amount of relational and numerical information it has. The left panels show relational information. The most uncertain entities often had only a single relational triple: no correspondence could be found between the relational information given in the two KGs, so the uncertainty increased to reduce the importance of the relational information. Conversely, entities that have one relational triple but low uncertainty often have multiple relational triples in the other KG, because corresponding information can be found among those triples. The lower left panel is a scatter plot of entities showing the relationship between uncertainty and the number of relations. Most entities with many relations have the minimum uncertainty, again because it is easy to find correspondences for entities with a lot of relational information. This indicates that the uncertainty in the proposed method successfully captures the reduction in importance due to a lack of relational information.

The right panels are scatter plots for numerical information. As with relational information, the higher the number of numerical triples, the lower the upper limit of an entity's numerical uncertainty. Because of the nature of numerical information, the number of attribute names and numerical triples coincided in most cases, so the shape of the figure does not change significantly. The most uncertain entity in the upper right panel is ''Matt_Hubbard,'' which has the numerical information (''Matt_Hubbard,'' imdbId, 1863572) in DB15K but no numerical information in FB15K. Because there is no matching numerical information, its uncertainty is high. Conversely, ''Brooklyn_Nets'' is an entity with low uncertainty. In DB15K, ''Brooklyn_Nets'' has the numerical information (''Brooklyn_Nets,'' foundingYear, 1967), while in FB15K the corresponding entity has (''Brooklyn_Nets,'' sports_team.founded, 1967.0). The corresponding information reduces the uncertainty. These results suggest that uncertainty weights useful information by increasing the values for modalities that lack corresponding information.
Next, Fig. 5 shows the magnitude of uncertainty for each entity type. Figs. 5a, 5b, and 5c show boxplots of the magnitude of uncertainty in the relational, numerical, and image information, respectively, for each entity type. The dataset used for this visualization was FB15K-DB15K. Fig. 5c shows low uncertainty for ''person'' and ''film'' but high uncertainty for ''organization,'' ''university,'' and ''country.'' For ''person'' and ''film,'' the images differ little between KGs, typically faces and posters. However, ''country,'' ''organization,'' and ''university'' tend to be assigned different images in each KG, such as buildings, symbols, and maps, making it difficult to match the information to the corresponding entity. Additionally, Fig. 5a shows that ''person'' and ''film'' have less uncertainty in the relational information than the other types. The two KGs have similar relational information, such as hometown, occupation, and institutional affiliation, for entities of type ''person''; similarly, ''film'' entities often share basic information such as director, distributor, and country. Therefore, much corresponding relational information can be found. Figs. 5a and 5c show that the uncertainty of ''country'' is higher than that of ''person'' and ''film'' for relational and image information, whereas Fig. 5b shows it is similarly low for numerical information. Countries are often given common statistical information, such as population, latitude and longitude, and land area, as numerical information in both KGs. This makes the correspondence of numerical information easier to find and less uncertain. The uncertainty of relational and numerical information was higher for ''university'' than for the other entity types. For relational information, FB15K contains more information on research institutes and the specialized subjects of universities, whereas DB15K has more information on addresses and alumni. For numerical information, FB15K contains information about latitude, lightness, year of establishment, etc., whereas DB15K has information on the number of students. These characteristics of the datasets were reflected in the uncertainty. The above results confirm that the proposed method reduces the influence of modality information that does not correspond between KGs by automatically increasing its uncertainty.
To summarize this section, the proposed method is highly uncertain for information that has no correspondence between two KGs. There are several reasons why information does not correspond between KGs. Fig. 4 shows a case in which the cause of uncertainty is the lack of information given to an entity. Fig. 5 shows the case where there is a difference in the tendency of information to be collected for each type of entity in each KG. We confirmed that the proposed method can successfully represent these useless pieces of information as uncertainty.

C. DISCUSSION OF IMPROVED EXAMPLES
Finally, in this section, we will review examples where the entity alignment was improved by increasing the uncertainty. Table 3 lists examples where the alignment results were wrong in MMEA, but the proposed method was able to obtain the correct answer.
For example, MMEA incorrectly matched the entity ''Trumpet,'' whose ID in Freebase is ''/m/07gql,'' with ''Clarinet'' in DBpedia. ''Trumpet'' had an uncertainty close to 1, implying that its embedding was certain. In contrast, the relational embedding of ''Clarinet'' had high uncertainty. The proposed method took the uncertainty of ''Clarinet'' into account and avoided selecting it.
The second example in Table 3 is the alignment result for ''Tsinghua_University.'' MMEA incorrectly matched it with ''Ohio_University.'' In the proposed method, the uncertainty of the relational information of ''Ohio_University'' was larger than that of ''Tsinghua_University,'' so the proposed method lowered its ranking by taking this uncertainty into account. This is consistent with the results in Fig. 5a.
The last example is the alignment result for ''Sea.'' MMEA incorrectly matched it with ''Geology.'' In DB15K, ''Geology'' has only two relational triples: (''Mediterranean_Sea,'' seeAlso, ''Geology'') and (''Charles_Darwin,'' field, ''Geology''). As a result, the uncertainty of the relational information of ''Geology'' was much larger than that of ''Sea,'' and the incorrect match was avoided. This is consistent with the results in Fig. 4. These examples confirm that alignment is indeed improved by increasing the uncertainty of unimportant information.

VI. CONCLUSION
In this paper, we proposed a novel multi-modal KG embedding method that utilizes uncertainty to evaluate the importance of supplementary information, which had not been explored in previous work. The method weights each modality's information based on its significance, reducing the impact of missing or erroneous information in KGs. Experimental results demonstrated a significant improvement over the baseline model for multi-modal entity alignment, validating the effectiveness of weighting information by its importance for retrieval. Qualitative experiments identified general information shared by different KGs as crucial for retrieval, and showed that the proposed uncertainty measure effectively filters out unimportant information.
While this paper focuses on the use of the multivariate normal distribution to express the uncertainty of each modality, the proposed method has limitations: it may not perform as well when the uncertainty is not normally distributed. In future work, we plan to investigate the use of other probability distributions and uncertainty quantification methods that can address these limitations.