Gated Relational Encoder-Decoder Model for Target-Oriented Opinion Word Extraction

Target-Oriented Opinion Word Extraction (TOWE) is a challenging information extraction task that aims to find the opinion words corresponding to given opinion targets in text. To solve TOWE, it is important to consider the words surrounding the opinion words as well as the opinion targets. Although most existing works have captured the opinion target using Deep Neural Networks (DNNs), they cannot effectively utilize the local context, i.e., the relationships among the words surrounding the opinion words. In this work, we propose a novel and powerful model for TOWE, the Gated Relational target-aware Encoder and local context-aware Decoder (GRED), which dynamically leverages the information of the opinion target and the local context. Intuitively, the target-aware encoder captures the opinion target information, and the local context-aware decoder obtains the local context information from the relationships among surrounding words. Then, GRED employs a gate mechanism to dynamically aggregate the outputs of the encoder and the decoder. In addition, we adopt a pretrained language model, the Bidirectional and Auto-Regressive Transformer (BART), as the backbone of GRED to exploit implicit language knowledge. Extensive experiments on four benchmark datasets show that GRED surpasses all the baseline models and achieves state-of-the-art performance. Furthermore, our in-depth analysis demonstrates that GRED properly leverages the information of the opinion target and the local context for extracting the opinion words.


I. INTRODUCTION
Target-Oriented Opinion Word Extraction (TOWE) [8] is a recently introduced subtask of aspect-based sentiment analysis (ABSA) [12], [20], [27]. In TOWE, entities or objects toward which users show their attitudes are regarded as the opinion targets. Correspondingly, the terms explicitly expressing those attitudes are defined as opinion words. Given a sentence and opinion targets, the goal of TOWE is to extract the opinion words that reveal sentiment about the opinion targets. For example, in the sentence ''The service is amazing and food is out of this world.'', TOWE identifies the word ''amazing'' as the opinion word for ''service'', and the terms ''out of this world'' as the opinion words for ''food''. Table 1 shows more example pairs of opinion targets and opinion words.
Extracting opinion targets or opinion words has been widely used in various Natural Language Processing (NLP) tasks such as sentiment analysis [17], [26], [38], [46] and text mining [15], [30], [45]. Given this trend, TOWE has become more important because it explicitly informs the correlations between opinion targets and opinion words. To address this task, [8] have constructed four benchmark datasets based on the ABSA datasets [27], [28], [29] from two domains, restaurant and laptop. They have also dealt with TOWE by framing it as a BIO-tagged token classification problem [32] for a sentence with opinion targets.

TABLE 1. Example of user reviews and their extracted pairs of opinion targets and opinion words. The red-colored words and the blue-colored words represent opinion targets and opinion words, respectively.

FIGURE 1.
The concept of GRED. The target-aware encoder finds target-dependent opinion words, and the local context-aware decoder seeks local context-dependent opinion words. Then, the gate network dynamically aggregates the outputs of these two networks.
To solve TOWE, previous works have primarily focused on how to incorporate opinion target information into sentence representations. In this literature, there have been various attempts to encode the opinion target such as using an additional BiLSTM-based target encoder [8] or adopting a target position embedding [36], [41]. With the development of pretrained language models (PLMs) [6], [7], [16], recent works have added a special token indicating the opinion targets to exploit the power of PLMs, which have shown promising results [10], [14].
Nevertheless, these approaches have a critical limitation in that they cannot effectively utilize the words surrounding the opinion words. Understanding such local context, in addition to the opinion target information, is helpful in extracting the opinion words. For example, in the sentence ''The turkey was to die for'', a human can easily identify the word ''for'' as part of the opinion words by considering the opinion target ''turkey'' and the surrounding word ''die'' together. Although [14] have attempted to incorporate this local context information into PLMs, they have merely combined the target information with the local context information, resulting in a congested sentence representation. Consequently, their improvements have been limited to cases where the local context information is less important.
In this paper, we propose a Gated Relational target-aware Encoder and local context-aware Decoder-based sequence labeling model (GRED), which dynamically leverages the opinion target information and the local context information for TOWE. Specifically, the target-aware encoder first obtains the opinion target information by using the target relation network [34]. Simultaneously, the local context-aware decoder captures the local context information from the relationships among surrounding words by using the local context relation network. Then, GRED employs the gate network to aggregate the outputs of the encoder and the decoder. The role of the gate network is to determine how much those outputs will impact the final prediction. Therefore, GRED can properly mix the opinion target information and the local context information rather than roughly combining them. The concept of GRED is illustrated in Fig. 1. In addition, to improve the language knowledge of both the encoder and the decoder, we adopt a pretrained language model Bidirectional and Auto-Regressive Transformer (BART) [16] as the structure of GRED.
We evaluate our GRED on four public datasets: 14res, 14lap [29], 15res [27], and 16res [28]. Our extensive experiments demonstrate that GRED performs better than the baselines and achieves state-of-the-art performance. Additionally, further comprehensive analyses validate the effectiveness of the target-aware encoder, the local context-aware decoder, and the gate network of GRED. In summary, the contributions of our work are as follows:
• We propose a novel transformer-based sequence labeling model GRED for TOWE. To enhance the language knowledge of GRED, we adopt both the encoder and decoder of BART, a pretrained language model.
• GRED's target-aware encoder and local context-aware decoder utilize the opinion target information and local context information, which are critical to solving TOWE task. Furthermore, the gate network of GRED can dynamically leverage a target-aware encoder and a local context-aware decoder.
• With our comprehensive analysis of GRED, we demonstrate the effectiveness of GRED from various perspectives. Our proposed methods not only enhance the performance on TOWE task but also help interpret the predicted results.

The remaining parts of our paper are organized into four sections. In section II, we briefly review the previous works on aspect-based sentiment analysis and TOWE tasks. Then we introduce our model GRED in section III. Experimental results are provided and discussed in section IV. Finally, we present the conclusion and future works in section V.

II. RELATED WORKS
Extracting opinion targets and opinion words has been a principal task in natural language processing. One line of this research has focused on opinion target extraction (OTE), which aims to find the opinion target aspect terms in sentences [23], [30], [42], [44]. In other approaches, opinion word extraction (OWE) has attempted to find words expressing users' attitudes [2], [5], [11], [33]. Recently, several works have proposed a co-extraction framework that extracts opinion targets and opinion words jointly, utilizing a word alignment model [21] or multitask learning [18], [39], [40]. These works have been able to obtain information on opinion targets and opinion words, respectively. However, none of them have considered the relationship between opinion targets and opinion words.
To study this relationship, research has been conducted on the task of extracting the corresponding opinion words for given opinion targets. Classical methods have been designed to seek corresponding opinion terms based on word distance [12] and dependency parsing trees [47]. However, these methods require external knowledge and are vulnerable to diverse patterns in the data. Therefore, subsequent works have integrated opinion target information into the context and extracted the corresponding opinion words using deep neural networks such as RNNs [8], [41] and GCNs (Graph Convolutional Networks) [13], [36]. Among these works, [8] first proposed an end-to-end neural network model using IOG (Inward-Outward LSTM + Global context) to fuse opinion target information with the global context, but it required high model complexity. Unlike this previous work, [13], [22], [36], [41] have employed position embeddings of opinion targets, which do not increase the model complexity excessively. Additionally, [22] have used various combinations of embedding architectures such as the Transformer, GCN, and RNN. GRED also utilizes the transformer architecture to obtain sentence embeddings. However, while most studies have adopted only the encoder of the transformer, we use both the encoder and the decoder.
Nevertheless, the above methods cannot fully utilize powerful pretrained language models to address TOWE task. Thus, recent works have adopted pretrained language models and achieved promising results [9], [10], [14], [43]. They have incorporated opinion target information into PLMs by modifying the input sentence to explicitly mark the opinion targets. In addition, [14] have shown that local context information is also important for solving TOWE task. Inspired by these previous works, our proposed method GRED attempts to obtain both the opinion target information and the local context information. However, GRED differs from the previous pretrained language model-based methods in that it dynamically mixes these two important pieces of information. Furthermore, we adopt the pretrained language model BART, which allows us to exploit both the encoder and decoder of the transformer. As a result, GRED achieves the strongest performance on TOWE task.

A. TASK FORMALIZATION
TOWE task can be formalized as a sequence labeling problem for opinion target-specified sentences [8]. Given a sentence s = {w_1, w_2, . . . , w_n} consisting of a sequence of n words and opinion targets, we use the BIO tagging scheme [32], which classifies each word as y_i ∈ {B, I, O} (Beginning, Inside, and Others), to solve TOWE task. Table 2 illustrates the BIO-scheme labels of a sentence for given opinion targets.
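For concreteness, the BIO labeling above can be sketched with a small helper. This is an illustrative function of our own, not code from the paper; the token indices are hypothetical:

```python
def bio_tags(tokens, opinion_spans):
    """Label each token with B/I/O given (start, end) opinion-word spans
    (end exclusive). Illustrative helper, not the paper's implementation."""
    tags = ["O"] * len(tokens)
    for start, end in opinion_spans:
        tags[start] = "B"                 # first token of an opinion span
        for i in range(start + 1, end):
            tags[i] = "I"                 # continuation tokens
    return tags

tokens = "The service is amazing .".split()
# For the target "service", the opinion word is "amazing" (token index 3).
print(bio_tags(tokens, [(3, 4)]))  # ['O', 'O', 'O', 'B', 'O']
```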

B. OVERALL FRAMEWORK
The structure of our Gated Relational target-aware Encoder and local context-aware Decoder (GRED) is illustrated in Fig. 2. GRED consists of a target-aware encoder module and a local context-aware decoder module with a gate network. The overall framework of GRED is as follows. We first add a special token to indicate the opinion targets within the sentences. Next, these modified sentences are fed into the target-aware encoder module and the local context-aware decoder module. These modules aim to capture opinion target information and local context information via the relation networks, as in [34] and [14], respectively. The target-aware encoder module extracts the target-aware representation using multi-head self-attention layers and a target relation network. In the local context-aware decoder module, the local context relation network captures the local context representation from the surrounding words, and the gate network then outputs the final representations by aggregating these two representations. Finally, the Conditional Random Field (CRF) layer determines the tags of the sequence based on the final representation, as in [8], [10], and [14].
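The first step of the pipeline, marking the opinion targets with a special token, can be sketched as follows. The marker tokens `<t>`/`</t>` are placeholders we chose for illustration; this section does not specify GRED's exact tokens:

```python
def mark_target(tokens, target_start, target_end, open_tok="<t>", close_tok="</t>"):
    """Insert special tokens around the opinion target span (end exclusive).
    The concrete marker symbols are an assumption, not from the paper."""
    return (tokens[:target_start] + [open_tok]
            + tokens[target_start:target_end] + [close_tok]
            + tokens[target_end:])

tokens = "The service is amazing .".split()
print(mark_target(tokens, 1, 2))
# ['The', '<t>', 'service', '</t>', 'is', 'amazing', '.']
```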

D. TARGET-AWARE ENCODER MODULE
The target-aware encoder module consists of a text encoder and a target relation network. The text encoder receives the modified sentence s * and produces the embedding of each word of those sentences. For each word in the sentence, the target relation network generates target-aware word representations based on the relationships between the opinion targets and each word in the sentence.

1) TEXT ENCODER SENTENCE EMBEDDINGS
The text encoder adopts the structure of the transformer encoder of Vaswani et al. [35], which has high expressive power due to its multi-head self-attention mechanism. To exploit the power of the pretrained language model, we use the encoder part of BART as the text encoder. BART has the same architecture as the transformer but incorporates pretrained language knowledge. The text encoder takes the modified sentence s* and generates the context embedding of s* (Figure 2 Text Encoder):

H_enc = BART_enc(s*),

where BART_enc is the encoder layers of BART, h_i^e is the encoder representation of each token w_i, and H_enc = {h_1^e, . . . , h_i^e, . . . , h_{i+j}^e, . . . , h_n^e} is the sentence embedding of s*. Here, the indices i through i+j denote the positions of the opinion target tokens.

2) OPINION TARGET EMBEDDING
To obtain the opinion target embedding, we first gather the parts corresponding to the opinion target from the sentence embedding H_enc and then apply a pooling layer to these parts. For the pooling layer, various pooling methods such as max-pooling, mean-pooling, and LogSumExp (LSE)-pooling can be used to compute the opinion target embedding:

h_target^e = LSE({h_i^e, . . . , h_{i+j}^e}) = log Σ_{k=i}^{i+j} exp(h_k^e),

where h_target^e is the opinion target embedding (max-pooling or mean-pooling can be substituted for LSE).
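The LSE pooling above can be illustrated with a small dimension-wise re-implementation (toy vectors of our own; GRED pools BART hidden states):

```python
import math

def lse_pool(vectors):
    """LogSumExp pooling over target-span token embeddings, applied
    dimension-wise: LSE(h_i..h_{i+j})[d] = log(sum_k exp(h_k[d])).
    Toy sketch for illustration, with a max-shift for numerical stability."""
    dim = len(vectors[0])
    pooled = []
    for d in range(dim):
        m = max(v[d] for v in vectors)                   # stabilizing shift
        s = sum(math.exp(v[d] - m) for v in vectors)
        pooled.append(m + math.log(s))
    return pooled

span = [[1.0, -2.0], [0.5, 3.0]]   # two target-token embeddings, dim 2
pooled = lse_pool(span)
```

LSE is a smooth upper bound of max-pooling: max_k h_k ≤ LSE ≤ max_k h_k + log n, which may explain why it preserves more target information than a hard max.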

3) TARGET-AWARE REPRESENTATION
Target-aware representations are derived from the relationships between each word in the sentence embedding and the opinion target embedding. Thus, we employ the relation network [34] to compute these relationships. As shown in Fig. 3(a), the target relation network takes the opinion target and each word in the sentence and produces the target-aware representation. In this work, we use an MLP as the structure of the target relation network. Given the opinion target embedding h_target^e and the sentence embedding H_enc, the target-aware representation is calculated as follows:

r_u^Tar = TRN(C^Tar(h_u^e, h_target^e)),
R^Tar = {r_1^Tar, . . . , r_n^Tar},

where TRN is the target relation network, C^Tar is the combining function, r_u^Tar is the target-aware representation of each token, and R^Tar is the target-aware representation of the sentence. We use the concatenation operator as C^Tar, but other methods such as elementwise sum and multiplication are possible.
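A minimal numpy sketch of the target relation network follows. The hidden size, random weights, and tanh nonlinearity are toy assumptions of ours; the paper only states that TRN is an MLP (a 1-layer FFN in the hyperparameter setting) with concatenation as C^Tar:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                 # toy hidden size (assumption)
W = rng.standard_normal((2 * d, d))   # 1-layer FFN weights for TRN
b = np.zeros(d)

def target_relation_network(H_enc, h_target):
    """r_u^Tar = TRN(C^Tar(h_u^e, h_target^e)), with concatenation as C^Tar.
    tanh is an illustrative activation choice, not specified by the paper."""
    rows = [np.concatenate([h_u, h_target]) for h_u in H_enc]
    return np.tanh(np.stack(rows) @ W + b)   # shape: (n, d)

H_enc = rng.standard_normal((5, d))          # 5 token embeddings
h_target = H_enc[1:3].mean(axis=0)           # stand-in for the pooled target
R_tar = target_relation_network(H_enc, h_target)
```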

E. LOCAL CONTEXT-AWARE DECODER
The local context-aware decoder is composed of a text decoder and a local context relation network. The text decoder takes the right-shifted sentence s_d^* and generates a causal sentence embedding based on the encoder output and the previous words. The local context relation network produces the local context-aware representation by exploring the relationships among surrounding words in the given sentence. Finally, the gate network dynamically fuses the local context-aware representation with the target-aware representation for predicting labels.

1) TEXT DECODER SENTENCE EMBEDDING
The text decoder has the same structure as the BART decoder, which is the decoder of the transformer but incorporates pretrained language knowledge. Unlike the text encoder, the text decoder uses masked multi-head attention. Thus, the right-shifted sentence s_d^* is passed to the decoder. Given the encoder sentence embedding H_enc and the sentence s_d^*, the text decoder computes the text decoder sentence embedding as follows (Figure 2 Text Decoder):

H_dec = BART_dec(s_d^*, H_enc),

where BART_dec is the decoder layers of BART and H_dec = {h_1^d, . . . , h_n^d} is the decoder sentence embedding.
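The right-shifting of the decoder input can be sketched in one line. The `<s>` start symbol follows BART's general convention but is an assumption here; the exact start token may differ:

```python
def shift_right(tokens, bos="<s>"):
    """Right-shift the decoder input: prepend a start token and drop the
    last token, so position u only attends to tokens before u under the
    masked attention (teacher forcing). Start symbol is an assumption."""
    return [bos] + tokens[:-1]

print(shift_right(["The", "turkey", "was", "great"]))
# ['<s>', 'The', 'turkey', 'was']
```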

2) LOCAL CONTEXT-AWARE REPRESENTATION
The local context-aware representations are obtained by contextualizing each word and its surrounding words in the sentence. We use the local context relation network to explore the relationship among surrounding words in the given sentence.
As illustrated in Fig. 3(b), the local context relation network calculates these relationships based on each word, its left-side word, and its right-side word. In this work, we consider trigrams as the local context (e.g., {w_{u−1}, w_u, w_{u+1}}). Thus, the local context-aware representation is computed as follows:

r_u^Loc = LRN(C^Loc(h_{u−1}^d, h_u^d, h_{u+1}^d)),
R^Loc = {r_1^Loc, . . . , r_n^Loc},

where LRN is the local context relation network, C^Loc is the combining function, r_u^Loc is the local context-aware representation of each token, and R^Loc is the local context-aware representation of the sentence. C^Loc is composed of two different MLPs and an elementwise sum (i.e., C^Loc(x, y, z) = A(x, y) + B(y, z), where A and B are MLPs).
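A toy numpy sketch of the local context relation network with C^Loc(x, y, z) = A(x, y) + B(y, z) follows. The zero-padding at sentence boundaries, the tanh activation, and the hidden size are our assumptions; the paper does not specify them:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                  # toy hidden size (assumption)
Wa = rng.standard_normal((2 * d, d))   # MLP A over (left word, word)
Wb = rng.standard_normal((2 * d, d))   # MLP B over (word, right word)

def local_relation_network(H_dec):
    """r_u^Loc over trigram context: A(h_{u-1}, h_u) + B(h_u, h_{u+1}).
    Boundary positions use zero vectors (an assumption)."""
    n, dim = H_dec.shape
    pad = np.zeros(dim)
    out = []
    for u in range(n):
        left = H_dec[u - 1] if u > 0 else pad
        right = H_dec[u + 1] if u < n - 1 else pad
        a = np.concatenate([left, H_dec[u]]) @ Wa
        b_term = np.concatenate([H_dec[u], right]) @ Wb
        out.append(np.tanh(a + b_term))  # elementwise sum, then activation
    return np.stack(out)                 # shape: (n, d)

H_dec = rng.standard_normal((5, d))
R_loc = local_relation_network(H_dec)
```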

3) GATE NETWORK
The final representation aggregates the target-aware representation from the encoder and the local context-aware representation from the decoder. Instead of simply combining the two representations, GRED leverages the gate network to decide how much each representation contributes to sequence labeling. Given the target-aware representation and the local context-aware representation, the gate network computes the aggregated representation as follows:

α_u = σ(W^Tar r_u^Tar + W^Loc r_u^Loc + b),
r_u = α_u ⊙ r_u^Tar + (1 − α_u) ⊙ r_u^Loc,
R = {r_1, . . . , r_n},

VOLUME 10, 2022
where α_u is the gate weight of each word, σ is the sigmoid function, W^Tar and W^Loc are weight matrices, b is the bias vector, r_u is the final representation of each token, and R is the final representation of the sentence.
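The gating described above can be sketched in numpy. The sigmoid gate with complementary (1 − α) weighting is our reading of the paper's dynamic aggregation, and the dimensions and random weights are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4                                   # toy hidden size (assumption)
W_tar = rng.standard_normal((d, d))
W_loc = rng.standard_normal((d, d))
b = np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate(R_tar, R_loc):
    """alpha_u = sigmoid(W^Tar r_u^Tar + W^Loc r_u^Loc + b);
    r_u = alpha_u * r_u^Tar + (1 - alpha_u) * r_u^Loc.
    The complementary weighting is an assumption consistent with the text."""
    alpha = sigmoid(R_tar @ W_tar + R_loc @ W_loc + b)
    return alpha * R_tar + (1.0 - alpha) * R_loc, alpha

R_tar = rng.standard_normal((5, d))     # encoder-side representations
R_loc = rng.standard_normal((5, d))     # decoder-side representations
R, alpha = gate(R_tar, R_loc)
```

A high α_u lets the target-aware encoder dominate the final representation for token u; a low α_u shifts weight to the local context, matching the qualitative analysis in the paper.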

F. DECODING STRATEGY AND LOSS FUNCTION
Given the final representation R, we decode the sequence label Y = {y_1, . . . , y_n} based on the probability p(Y|R). In this work, we adopt the Conditional Random Field (CRF) as our decoding strategy because it can capture the structural dependency among words in the sentence and the correlations between labels. Specifically, the score function of the CRF is defined as:

score(R, Y) = Σ_{i=1}^{n} (A_{y_{i−1}, y_i} + Q_{i, y_i}),
Q = W_q R + b_q,

where A measures the transition score between two adjacent labels, W_q is the weight matrix, b_q is the bias, and the matrix Q is the emission score. Then, we can compute the probability using the score function:

p(Y|R) = exp(score(R, Y)) / Σ_{Y′∈Ȳ} exp(score(R, Y′)),

where Ȳ is the set of possible sequential labels. The CRF uses the negative log-likelihood as the loss function. Thus, the loss for a given sentence is calculated as follows:

Loss(s) = −log p(Y|R).

We minimize this loss function Loss(s) for training. For decoding, the model generates the label sequence that maximizes p(Y|R) via the Viterbi algorithm [37].
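The Viterbi decoding step can be illustrated with a minimal linear-chain decoder over emission scores Q and transition scores A. The toy scores are our own; a real system would typically rely on a CRF library:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Most probable tag sequence under the linear-chain CRF score:
    sum of emission scores Q[i, y_i] and transition scores A[y_{i-1}, y_i].
    Minimal decoder for illustration only."""
    n, k = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((n, k), dtype=int)   # backpointers
    for i in range(1, n):
        total = score[:, None] + transitions + emissions[i][None, :]
        back[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):        # follow backpointers
        path.append(int(back[i][path[-1]]))
    return path[::-1]

# Tags: 0 = B, 1 = I, 2 = O. Penalize the illegal O -> I transition heavily.
A = np.zeros((3, 3)); A[2, 1] = -1e4
Q = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
print(viterbi(Q, A))  # [0, 1, 2], i.e. B I O
```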

IV. EXPERIMENTS

A. EXPERIMENTAL SETUP

1) DATASETS
To verify the effectiveness of GRED, we conduct extensive experiments on four benchmark TOWE datasets: 14res, 15res, 16res, and 14lap. These datasets were built by [8] for TOWE task based on the SemEval Challenge 2014 Task 4 [29], SemEval Challenge 2015 Task 12 [27], and SemEval Challenge 2016 Task 5 [28], respectively. 14res, 15res, and 16res are collected from review sentences in the restaurant domain, and 14lap contains review sentences in the laptop domain. The statistics of these datasets are summarized in Table 3.

2) BASELINES
For the comprehensive and comparative analysis of GRED, we compare it with the following methods:

a: RULE-BASED METHODS
We adopt Distance-rule and Dependency-rule as our baselines. The distance-rule method utilizes distance and Part-Of-Speech (POS) tags to extract opinion words. The dependency-rule method uses the dependency tree of the sentence to determine the opinion words.

b: DEEP NEURAL NETWORK (DNNs)-BASED METHODS
We choose BiLSTM, TC-BiLSTM, IOG [8], PE-BiLSTM, and LOTN [41] as our DNN-based baseline methods. These methods employ Recurrent Neural Networks (RNNs) to capture the dependency between the opinion target and its corresponding opinion words. IOG uses an Inward-Outward BiLSTM to learn the opinion target-aware representation and the global context representation and then fuses these representations to predict labels. LOTN utilizes additional position embeddings to indicate opinion targets and transfers latent opinion knowledge from resource-rich datasets to the TOWE model.

c: PRE-TRAINED LANGUAGE MODEL (PLM) BASED METHODS
We also adopt SDRN [4], ONG [36], ARGCN [13], TSMSA [10], RABERT [14], and UNI-GEN [43] as baselines. SDRN employs a BERT-based encoder with a target entity extraction network, an entity relation detection network, and a synchronization network for the Aspect Opinion Pair Extraction (AOPE) task. ONG incorporates the syntactic structures of the sentence into deep learning models using Graph Convolutional Networks (GCNs). ARGCN consists of BiLSTM-based sequential layers and attention-based relational GCN layers to capture semantic and syntactic relations between words simultaneously. TSMSA uses a multi-head self-attention mechanism to specify the opinion target in the sentence. RABERT integrates a relation network into the BERT layers to capture the relationships between words. UNI-GEN converts all ABSA subtasks into a unified generative formulation and exploits BART to solve all these tasks.

3) HYPER PARAMETER SETTING
We implement our proposed GRED with the PyTorch library [25] and Hugging Face Transformers.1 In our experiments, the batch size is set to 8, and the maximum sequence length is set to 128. We train the model using the Adam optimizer with β1 = 0.9 and β2 = 0.999 and a learning rate decay strategy. We also set the warmup steps to 100. The dropout rate is selected from {0.3, 0.4, 0.5} based on the performance on the validation sets. The learning rate of the encoder and decoder layers is set to 5×10^−5, and the learning rate of TRN and LRN is set to 1×10^−4. We adopt the pretrained BART from [16], which consists of 12 layers each for the encoder and the decoder. TRN and the gate network consist of 1-layer Feed-Forward Networks (FFNs), and the LRN is composed of two parallel 1-layer FFNs and one additional FFN. GRED fits on an i7-6850 CPU with 64 GB of memory and a single NVIDIA GTX 1080ti GPU. For a fair comparison with the baselines, we randomly sample 20% of the train set as the dev set using the same random seeds as in [8].

FIGURE 4.
Ablation study of GRED. We report the F1 score of the variants of GRED on the four benchmark datasets.

4) EVALUATION METRICS
To maintain consistency with previous works [8], [10], [14], [41], we adopt precision, recall, and F1 score as the evaluation metrics to compare the performance of the models. As in [8], we consider a predicted opinion word span to be correct only when its starting and ending points are equal to those of the gold span.
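The exact span-match evaluation can be sketched as follows (illustrative helper functions with our own naming):

```python
def extract_spans(tags):
    """Collect (start, end) spans (end exclusive) from a BIO tag sequence."""
    spans, start = [], None
    for i, t in enumerate(tags):
        if t == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif t == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

def span_f1(gold_tags, pred_tags):
    """Exact-match precision/recall/F1: a predicted span counts only if both
    its start and end equal those of a gold span."""
    gold, pred = set(extract_spans(gold_tags)), set(extract_spans(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = ["O", "B", "I", "O", "B"]
pred = ["O", "B", "I", "I", "B"]   # first predicted span overshoots by one token
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```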

1) PERFORMANCE COMPARISON ON THE BENCHMARK DATASETS
Here, we focus on the TOWE task performance comparison between the proposed GRED and the existing models on the four benchmark datasets. All the experimental results are reported in Table 4, with the best scores highlighted in bold. First, compared with the other baselines, the proposed GRED obtains superior performance and achieves new state-of-the-art results on all of the datasets. In detail, we find that GRED surpasses the previous state-of-the-art F1 scores by 0.24%, 1.71%, 0.39%, and 0.97% on the four datasets. These results validate the effectiveness of the proposed GRED for TOWE task.
For the results of rule-based methods, the dependency rule performs better than the distance rule, which indicates that word dependency is critical to solving TOWE task. We also note that both rule-based methods (Distance-rule and Dependency-rule) show poor performance across all the scores. This reveals that rules cannot handle the diverse patterns of TOWE task. On the other hand, DNN-based methods achieve relatively better performance than rule-based methods because of their expressive power. Next, we observe that IOG, PE-BiLSTM, and LOTN obtain F1 scores approximately 25% higher than the LSTM and BiLSTM methods, which demonstrates the effectiveness of using opinion target information. Thus, all of these results reveal that capturing both word dependency and the opinion target is important for TOWE.

TABLE 6. Case study: the predictions of the Encoder+Decoder model and GRED. The red-colored words and blue-colored words represent opinion targets and opinion words, respectively.
Finally, pretrained language model-based methods achieve strong performance on all the scores. In particular, RABERT and TSMSA perform relatively better than the other pretrained language model-based methods such as SDRN, ONG, ARGCN, and UNI-GEN. These experimental results demonstrate that the target-indicating token is effective in fully utilizing the pretrained language model for TOWE task. Additionally, RABERT achieves the previous state-of-the-art performance, which validates the effectiveness of capturing target information and local context information together.

2) ABLATION STUDY
To investigate the effectiveness of each part of GRED, we evaluate the variants of GRED: (a) Encoder only: only using the target-aware encoder to predict labels, (b) Decoder only: only feeding the output of the decoder to compute the final representation, (c) Encoder+Decoder: uniformly combining the outputs of the encoder and the decoder without the gate network, and (d) GRED.
The results of the ablation study are depicted in Fig. 4. First, the Encoder only model is inferior to the other models on most of the datasets. This inferiority demonstrates that using only opinion target information cannot capture the diverse patterns of opinion words in TOWE task. Second, naive aggregation of the encoder and the decoder does not ensure a performance improvement. The performance drops of Encoder+Decoder on 15res and 16res confirm the suitability of our proposed gate mechanism. Overall, GRED shows the best performance compared to all the other models on all the datasets. This high performance reveals that all three components of GRED are critical to solving TOWE: 1) capturing the opinion target information, 2) utilizing the local context, and 3) dynamically aggregating them via the gate mechanism.
Next, we also report the performance of the various opinion target pooling methods from Equation 2 in Table 5: Max-pool, Mean-pool, and LSE-pool. We observe that the Max-pooling method shows relatively poor performance, while the Mean-pool and the LSE-pool perform almost identically, which indicates that the Max-pool loses more target information than the other methods. Comparing the Mean-pool and the LSE-pool, the performance of the LSE-pool is more robust across the four benchmark datasets. In particular, the LSE-pool performs well on 14lap, which uses more complicated words than the other datasets. Thus, we adopt the LSE-pool as the target pooling method of GRED in this work.

3) CASE STUDY
To validate the effectiveness of our proposed GRED, we extract some TOWE examples of GRED and the Encoder+Decoder model from the two different domains, restaurant and laptop (Table 6). In a simple case such as Sentence 1, we can observe that both GRED and the Encoder+Decoder model provide the correct prediction for the given sentence and opinion targets. However, in Sentence 2, Sentence 3, and Sentence 4, which are more complicated than Sentence 1, only GRED successfully extracts the correct opinion terms. In Sentence 3, Encoder+Decoder cannot extract the multiple opinion words. We also note that, even in the wrong prediction examples, Sentence 5 and Sentence 6, GRED gives predictions closer to the gold-label answers than the Encoder+Decoder model. In Sentence 5, Encoder+Decoder extracts only ''cooked'' as the opinion word, whereas GRED also identifies ''perfectly''. These results demonstrate that the proposed GRED can leverage the gate network to handle complicated patterns of TOWE task more effectively.
We also visualize the weight of the gate network α_u to investigate its role in predicting labels in Fig. 5. In the first example, the value of α_u increases at the words ''top notch'', which correspond to the opinion words for ''hot dogs''. However, in the second example, the highest value of α_u is at the word ''amazing''. These results indicate that the target-aware encoder mainly focuses on opinion target-dependent words. In the third example, we can observe that the value of α_u increases at the words ''great'', ''die'', and ''for'', which indicates that the gate network can capture multiple opinion words simultaneously. In particular, comparing the words ''die for'' and ''Not much'' in the third and fourth examples with the words ''top notch'' in the first example, the values of α_u at ''much'' and ''for'' are lower than those at ''die'', ''Not'', and ''top notch''. Since the opinion words ''for'' and ''much'' depend on the surrounding words ''die'' and ''Not'', respectively, as well as on their corresponding opinion targets, the gate network reduces the influence of the target-aware encoder for them. Therefore, these results demonstrate that the gate network can effectively regulate the target-aware encoder and the local context-aware decoder.

V. CONCLUSION
In this paper, we propose a novel and powerful transformer-based encoder-decoder model named GRED for target-oriented opinion word extraction. In GRED, the target-aware encoder and the local context-aware decoder capture the opinion target information and the local context information, respectively, which are crucial for addressing TOWE task. Then, the gate network dynamically combines the outputs of the encoder and the decoder to predict the label sequences. Therefore, GRED can flexibly utilize the opinion target and local context information based on the gate mechanism. Furthermore, GRED enhances its performance by incorporating the language knowledge of the pretrained language model BART. To validate our proposed GRED, we conduct extensive experiments on the four widely used benchmark datasets. Our GRED outperforms all the baseline methods and achieves state-of-the-art performance. Additionally, in-depth analyses and qualitative studies demonstrate that the gate network appropriately adjusts the influence of the target-aware encoder and the local context-aware decoder to identify opinion words. As a result, GRED effectively solves the TOWE task by dynamically utilizing the opinion target and local context information.
In future work, we plan to design more efficient modules to capture the invaluable information in a sentence and investigate more informative relationships for TOWE task. In addition, we would like to use large-scale language models such as T5 [31] and GPT-3 [3] for GRED. We would also like to apply GRED to other datasets that can be more helpful to the public interest, like COVID-19 datasets [1], [24]. Furthermore, we intend to use GRED for the entire space as in [19] to validate the generalization performance of GRED.