Relationship Extraction and Processing for Knowledge Graph of Welding Manufacturing

Acquiring welding domain relationships and forming a knowledge graph can positively impact complex engineering problem solving and intelligent manufacturing applications. However, relationships are lacking in the welding domain. A relationship extraction and processing solution is designed to handle data with different characteristics in welding fabrication. The BiLSTM+Attention and CR-CNN models are employed to extract relations from unstructured documents. A neighborhood rough set-based association rule model is proposed for project-specific documents to accomplish relationship acquisition, in which invalid attributes are removed via neighborhood rough sets and attribute values are related via association rules. In addition, the knowledge graph is built from the extracted relationships, and unique empirical relationships are handled by introducing relational nodes and databases. The results show that BiLSTM+Attention achieves macro-average scores of 0.788 for precision, 0.846 for recall, and 0.816 for F1-score. The relational rules obtained via the proposed model are consistent with production experience. The constructed knowledge graph effectively handles empirical relationships while positively impacting knowledge retrieval, intelligent question answering, and decision-making for complex engineering problems.

The associate editor coordinating the review of this manuscript and approving it for publication was Sathish Kumar.
[...] unresolved. The knowledge graph is considered to address the proposed limitations due to its compatibility in knowledge representation. Relationship extraction is an indispensable step in knowledge graph construction, and it is an essential medium for the logical composition of knowledge and the linking of domain entities. However, the specialized and complex nature of domain relationships makes relationship extraction challenging in manufacturing, especially in welding fabrication.

Relationship extraction methods are generally classified as template-based [8], supervised [9], and weakly supervised [10]. The template-based approach uses relationship templates pre-defined by domain experts and then matches relationships in the text. This method works well on a small range of texts but relies on extensive manual work, making it less portable. Supervised learning models are used for relationship extraction based on pre-labeled data to acquire many textual relationships. The method is also divided into pipeline [...]

VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

(i) We implemented relationship extraction for unstructured documents, oriented to the lack of knowledge in welding manufacturing. A baseline for welding relationship extraction is reported on our data to support related studies.

(ii) A neighborhood rough set-based association rule model for welding data structure characteristics is proposed to extract the relationships among the attributes and among the attribute values, respectively, from actual welding structured documents.

(iii) Empirical relational databases and extracted relationships are embedded in domain knowledge graphs to support complex problem-solving and critical decision-making in welding manufacturing.

The aim is to extract relations and build a knowledge graph for welding manufacturing data. For this reason, we designed three kinds of relationship extraction and processing methods for different data forms. In addition, the experimental results were obtained via data processing, attribute reduction, and relationship model construction. We expect that our work can support the digital construction of welding manufacturing and the data requirements of intelligent systems.

[...] calculation formula is as follows: [...] where σ is the sigmoid activation function, W_f and b_f represent the weights and biases, h_{t−1} represents the output at the previous time step, and x_t is the input at the current moment.

The input gate is employed to update the state at the current moment. The calculation process is expressed in (2) [...] The state of the cell is passed to the current output via the output gate. The output is calculated via (5) and (6), where o_t is the current state of the output gate and h_t is the current output value.
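The gate computations referenced as (1) to (6) can be illustrated with a minimal Python sketch of one LSTM time step. It assumes the standard LSTM formulation with a single weight matrix per gate acting on the concatenation of h_{t−1} and x_t; the equation numbering in the comments is inferred from the surrounding text, and the helper names (`affine`, `lstm_step`, `params`) are hypothetical rather than taken from the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params maps a gate name to (W, b) acting on [h_prev, x_t]."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]      # (1) forget gate (assumed numbering)
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]      # (2) input gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # (3) candidate cell state
    c_t = [f * c + i * g                                     # (4) updated cell state
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]      # (5) output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]       # (6) current output value
    return h_t, c_t

# Toy check: hidden size 2, input size 1, all weights and biases zero.
zero_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zero_b = [0.0, 0.0]
params = {name: (zero_W, zero_b) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved; this makes the role of the forget gate easy to see in isolation.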
BiLSTM+Attention obtains bidirectional semantic information in a sentence by fusing a forward LSTM, a reverse LSTM, and an attention mechanism. It comprises an input layer, an embedding layer, an LSTM layer, an attention layer, and an output layer. The input layer splits the sentence into words to complete the data input. The embedding layer maps the vocabulary to a low-dimensional space. The LSTM layers acquire high-level features from the forward and reverse directions. The attention layer weights the features by position weights to obtain the sentence vector. The output layer completes relational classification based on the sentence feature vector. The schematic diagram is shown in FIGURE 2.
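The attention layer described above can be sketched in plain Python. This is a minimal illustration of one common formulation (scores from a learned vector applied to tanh of the hidden states, followed by softmax weighting); the exact scoring function used in the paper is not stated here, so the form of `attention_pool` and the names `H` and `w` are assumptions.

```python
import math

def attention_pool(H, w):
    """Weight T hidden vectors into one sentence vector.

    H: list of T hidden vectors (forward and reverse LSTM states already merged).
    w: attention weight vector with the same dimension as each hidden vector.
    Returns (alpha, r): the per-position weights and the sentence vector.
    """
    scores = [sum(wk * math.tanh(hk) for wk, hk in zip(w, h)) for h in H]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]    # softmax over positions
    dim = len(H[0])
    r = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(dim)]
    return alpha, r

# Toy sentence of three positions; only the middle one carries signal.
H = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
w = [1.0, 0.0]
alpha, r = attention_pool(H, w)
```

In this toy input the middle position receives the largest weight, so the sentence vector is dominated by its hidden state.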

Companies usually keep many standardized result files in welding production, which guide production and ensure quality. These files contain multiple attributes, and it is not easy to determine whether the attributes are related. Extracting relationships oriented towards valid attributes therefore benefits our work. A neighborhood rough set-based association rule model is proposed, which takes the field attributes in the data as decision attributes and the remaining attributes as conditional attributes, and completes attribute reduction and relationship extraction for welding production data. The reduced attributes and the decision attributes form an inference relationship. The specific workflow of the model can be divided into four steps as follows:

(1) The information decision system S = (U, A, V, f) is built separately for the field attributes in the data, where U is a non-empty finite set of objects called the universe; A is a non-empty set of attributes that is the union of the conditional attributes C and the decision attributes D; V is the value domain; and f is an information function that satisfies ∀x ∈ U, a ∈ A, f(x, a) ∈ V_a. For x_i ∈ U, the neighborhood of x_i needs to satisfy (7).

[...] (8) and (9). [...] (12). [...] in (14). Relationships can be extracted from rules that exceed the confidence threshold t, as in (15).

The knowledge graph is a collection of entities and relations; [...]

The relational database contains fundamental logical relations (greater than, less than, starts at, ends at, not equal to, etc.), empirical relations (logic rules, empirical formulas, etc.), and model relations (classification or regression models based on actual engineering data, etc.). Each relationship is called by its unique key. Physically, the relational library can be seen as a collection of multiple interfaces. When a knowledge search involves relational nodes, the relational content is obtained through key-value pairs. The schematic structure is shown in FIGURE 3.
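The key-based lookup of relational content can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the library layout, the key strings, the node dictionary, and the `resolve` helper are illustrative choices, and the arc energy formula is a generic example of an empirical relation rather than one taken from the paper.

```python
import operator

# Hypothetical relational library: each unique key maps to a callable interface.
RELATION_LIBRARY = {
    "logic:greater_than": operator.gt,   # fundamental logical relation
    "logic:less_than": operator.lt,
    # Illustrative empirical formula (not from the source): arc energy in J/mm
    # from voltage (V), current (A), and travel speed (mm/s).
    "empirical:arc_energy": lambda volts, amps, mm_per_s: volts * amps / mm_per_s,
}

def resolve(node, *args):
    """Look up the relation behind a relational node by its key and evaluate it."""
    key = node["relation_key"]
    try:
        relation = RELATION_LIBRARY[key]
    except KeyError:
        raise KeyError(f"no relation registered under key {key!r}") from None
    return relation(*args)

# A relational node in the graph stores only the key, not the relation itself.
node = {"type": "relational_node", "relation_key": "empirical:arc_energy"}
energy = resolve(node, 24.0, 200.0, 8.0)
thin = resolve({"type": "relational_node", "relation_key": "logic:less_than"}, 3.0, 6.0)
```

Keeping only a key in the graph means logical, empirical, and model relations can share one calling convention, and a relation can be updated in the library without touching the graph.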

Relationships are essential ties between the properties of things and connect entities to form domain knowledge networks in the welding manufacturing knowledge system. In actual production, relationships are often contained in unstructured documents, standardized production data, empirical formulas, etc. Extracting domain relationships from these different structures is significant for domain knowledge system construction and welding engineering applications. Relationship extraction for practical welding production is divided into three sub-tasks: unstructured data extraction, relationship complementation, and empirical relationship processing. Relationship extraction and knowledge graph construction are illustrated in FIGURE 4.

Unstructured data extraction is the process of converting unstructured data into standard relational triples. The relationships are obtained through data processing, word vectors, and the relational model. The characteristics of engineering data are considered in the relationship extraction of standardized result documents. The attribute dependency relationship is established through attribute reduction, and the standard triple relationship between attribute values is obtained through association rules. Empirical data and formulas are collected into relational databases and associated with the knowledge graph through key-value pairs. In addition, the knowledge graph is constructed based on the acquired standard triples.

Welding process specifications (WPS) are essential documents for welding manufacturing and are used to support the task of extracting relationships from standardized result files. The critical attributes that make up the WPS include weld method, weld joint, weld groove, assembly parameters, base material parameters, preheat, and other information in the welding process design. We collected standard WPS files for bogie welding fabrication of high-speed trains. We selected welding position (Position), blunt edge range (Blunt), assembly gap (Assembly), preheating temperature (Preheat), and gas flow rate (Flow) as decision attributes to complete the attribute reduction. The detailed information is listed in TABLE 2. The conditional attributes have 9 categories: weld method (Method), weld type (Type), weld groove (Groove), base material 1 (Base-1), base material 2 (Base-2), the thickness of base material 1 (Min-1, Max-1), and the thickness of base material 2 (Min-2, Max-2). Significance (Sig) and weight (Weight) characterize the influence of conditional attributes on decision attributes. As shown in TABLE 2, the significance and weight of invalid attributes, identified by a significance threshold of 0.01, are denoted as "-".
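The two stages described above, attribute reduction by significance and rule extraction between attribute values, can be sketched in Python under stated assumptions. The sketch uses a Euclidean neighborhood of radius `delta`, defines significance as the drop in neighborhood dependency when one attribute is removed, and scores rules by confidence; these are common textbook choices and may differ from the paper's equations (7) to (15). The toy records, the numeric coding, `delta = 0.2`, and `t = 0.8` are invented for illustration; only the 0.01 significance threshold comes from the text. The Weight column is not computed.

```python
from collections import Counter

def neighborhood(rows, i, attrs, delta):
    """Indices of samples within distance delta of sample i on the given attributes."""
    def dist(a, b):
        return sum((a[k] - b[k]) ** 2 for k in attrs) ** 0.5
    return [j for j in range(len(rows)) if dist(rows[i], rows[j]) <= delta]

def dependency(rows, labels, attrs, delta):
    """Fraction of samples whose whole neighborhood shares their decision value."""
    if not attrs:
        return 0.0
    consistent = sum(
        all(labels[j] == labels[i] for j in neighborhood(rows, i, attrs, delta))
        for i in range(len(rows))
    )
    return consistent / len(rows)

def significance(rows, labels, attrs, a, delta):
    """Drop in dependency when attribute a is removed from attrs."""
    rest = [k for k in attrs if k != a]
    return dependency(rows, labels, attrs, delta) - dependency(rows, labels, rest, delta)

def rules(rows, labels, attr, t):
    """Rules (attribute value -> decision value) whose confidence is at least t."""
    count_a = Counter(r[attr] for r in rows)
    count_ab = Counter((r[attr], y) for r, y in zip(rows, labels))
    return {(v, y): n / count_a[v]
            for (v, y), n in count_ab.items() if n / count_a[v] >= t}

# Toy, numerically coded records (hypothetical): columns are Method, Type, Max-1.
NAMES = ["Method", "Type", "Max-1"]
rows = [(0.0, 0.5, 0.1), (0.0, 0.5, 0.9), (1.0, 0.5, 0.1), (1.0, 0.5, 0.9)]
labels = ["PA", "PA", "PB", "PB"]          # decision attribute, e.g. Position
attrs = [0, 1, 2]

sig = {NAMES[a]: significance(rows, labels, attrs, a, delta=0.2) for a in attrs}
kept = [a for a in attrs if sig[NAMES[a]] >= 0.01]   # threshold from the text
extracted = {NAMES[a]: rules(rows, labels, a, t=0.8) for a in kept}
```

In this toy data only Method separates the two decision values, so Type and Max-1 fall below the threshold and would be shown as "-" in a table like TABLE 2, while Method yields one rule per decision value.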

We train the models on pre-trained word embeddings under supervised conditions to accomplish relationship extraction from unstructured documents. The 1832 sentence-level samples related to welding manufacturing are divided into training, validation, and test sets and contain five relationship categories (belong_to, reference, requirement, applicable_to, unknown). The detailed information is shown in TABLE 3. The programs are written in Python (version 3.7) with the TensorFlow framework (version 1.14.0).

Accuracy is the commonly used evaluation metric in most conditions. However, in classification problems, the accuracy calculation relies on large sample categories and has low [...] (19).

As shown in TABLE 4, BiLSTM+Attention achieves better macro-average results than CR-CNN. For BiLSTM+Attention, the F1-score of the "belong_to" category is the highest (0.912), while that of the "reference" category is the lowest (0.688). The high score of the "belong_to" category may be due to independent entities that make the sentence relationship features clear, as in "CP C1 belong_to the weld quality level". The low score of the "reference" category may arise because most target entities are composed of multiple independent entities, making the sentence relationship features ambiguous, as in "Arc bolt welding of metallic materials reference ENISO14555". Hence, enhanced entity features may positively affect the extraction of welding manufacturing relationships.
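Macro-averaged metrics give every relationship category equal weight regardless of its sample count. The Python sketch below computes them from label lists. It assumes the macro F1-score is the harmonic mean of macro precision and macro recall; this choice is consistent with the reported values (0.788 and 0.846 give 0.816), but the paper's own definition in (19) is not recoverable here, and averaging per-class F1-scores is another common convention. The function name and the toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred, labels):
    """Macro precision, macro recall, and their harmonic mean over the given labels."""
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    macro_p = sum(precisions) / len(labels)
    macro_r = sum(recalls) / len(labels)
    macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r else 0.0
    return macro_p, macro_r, macro_f1

# Toy predictions over two of the five categories.
y_true = ["belong_to", "belong_to", "reference", "reference"]
y_pred = ["belong_to", "reference", "reference", "reference"]
p, r, f1 = macro_metrics(y_true, y_pred, ["belong_to", "reference"])

# Arithmetic check against the reported macro-average scores.
reported_f1 = round(2 * 0.788 * 0.846 / (0.788 + 0.846), 3)
```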

A total of 110 actual production welding procedures were collected as a sample to support the validation of the model. The sample [...]

(3) The blunt edge refers to the part without a groove in the thickness direction, which is used to prevent burn-through. Groove and partial plate thickness information are considered factors affecting the selection of blunt edges. The attributes Method, Base-1, and Base-2 also influence the choice of blunt edge because different base materials and welding methods produce different penetration depths.

(4) The choice of welding gas flow rate directly affects the quality of welding production. In practice, the gas flow rate is related to the welding method, the material of the welded part, and some plate thickness information. Therefore, the reduction results are credible.

(5) Preheating before welding effectively controls welding quality, especially in thick plate welding. Maximum plate thickness information positively influences the selection of preheating temperature. In addition, different bevel geometries and heat flow densities result in different preheating, so the attributes Method and Groove influence the attribute Preheat.

Based on the above information, the reduction results are highly similar to actual production experience. Therefore, we extracted the relationship between different decision attribute values and conditional attribute values [...]

A knowledge graph is a networked form of data storage that helps transform knowledge and relationships into practical engineering applications. This paper focuses on relationships as an essential factor in knowledge graph construction. Relationship extraction in welding manufacturing differs from traditional extraction methods due to the complexity and specialization of engineering data. This study divided the relationship extraction task into three [...]

(ii) The relationships in actual engineering files are extracted through a neighborhood rough set-based association rule model. Several relationship rules are obtained from 110 engineering records and are consistent with engineering experience.

(iii) Relational databases and relational nodes are introduced to embed empirical relationships in the knowledge graph, with positive engineering application effects.

The proposed method can complete the extraction of welding relations and is especially suitable for processing large amounts of redundant data. The domain knowledge graph based on the extracted relationships can support the solution of complex engineering problems such as domain knowledge retrieval, intelligent question answering, and expert decision-making. Furthermore, our research may extend data applicability, improve model accuracy, and enable efficient engineering applications based on the obtained results.

KAINAN GUAN was born in Luoyang, China, in 1994. He received the M.S. degree in materials science and engineering from Dalian Jiaotong University, where he is currently pursuing the Ph.D. degree in materials science and engineering.

His research interests include welding production informatization and intelligence, application of knowledge engineering in traditional manufacturing, expert decision algorithms, and data support. Some of his research results have been successfully applied to practical welding production.