I. Introduction
Knowledge Graphs (KGs) are structured semantic networks that have been widely adopted in applications such as semantic search, recommendation systems, and natural language question answering [1]. However, traditional KGs represent entities and relations only as triples, which limits the ability of machines to describe and understand the complexity of the real world [2]. To address this limitation, Multi-Modal Knowledge Graphs (MMKGs) such as MMKG [3] and Richpedia [4] have been developed in recent years. These MMKGs enrich traditional KGs with additional multi-modal knowledge such as text and images. Although MMKGs contain abundant information, they still suffer from low coverage and incompleteness, which significantly hinders their application [1].