Plain Template Insertion: Korean-Prompt-Based Engineering for Few-Shot Learners

Prompt-based learning is a method by which language models interpret natural language by recalling the prior knowledge acquired during pretraining together with the training objective. Recent prompt-based few-shot learners have achieved superior performance by alleviating the catastrophic forgetting that occurs in pretrained language models. Few-shot learning contributes to solving the data scarcity problem, an enormous challenge in AI systems and a significant consideration in natural language processing research. Despite the significance of few-shot learning, research on Korean-language few-shot learning is insufficient, and whether the prompt-based approach is appropriate for Korean has not been thoroughly verified. As a step toward realizing a Korean-prompt-based few-shot learner, we apply prompt engineering to a Korean language understanding benchmark dataset and introduce plain template insertion to overcome data scarcity in a more practical few-shot setting. The contributions of this study are as follows: (1) Presumably, this is the first study to apply prompt-based few-shot learning to Korean benchmark datasets. In a 32-shot setting, it improves performance by +14.88, +29.04, and +1.81 on the natural language inference, semantic textual similarity, and topic classification tasks, respectively. (2) We present prompt engineering that merely inserts a plain template, increasing data efficiency without training-example selection, augmentation, reformulation, or retrieval. (3) Our approach is robust to the Korean prompt's contextual information and sentence structure and is applicable to both hard- and soft-prompts.


I. INTRODUCTION
The recent concept of prompt-based learning has been proposed to utilize the vast amount of latent prior knowledge contained in pretrained language models. Prompt-based learning predicts the correct answer, that is, it solves the task at hand, based on linguistic knowledge and contextualized representations memorized during pretraining, by reforming the input sequence with a textual prompt. This method enables finetuning of the model with a well-aligned pretraining objective, demonstrating exceptionally high performance in low-resource settings.
In recent years, research on few-shot learning has been actively conducted to achieve high learning performance with small amounts of training data and light models by using prompt-based learning. According to [9], prompt-based learning is divided into hard-prompts, which use human-interpretable natural language in discrete spaces, and soft-prompts, which directly use the continuous embedding space of the model. Pattern-exploiting training (PET) [20] and better few-shot finetuning of language models (LM-BFF) [5], based on hard-prompt learning, demonstrate the interpretability of the model's inference results, achieving superior performance on the GLUE [25] and SuperGLUE [26] benchmark datasets. In soft-prompt tuning, P-tuning [11] achieves outstanding performance on the few-shot SuperGLUE dataset through the additional parameters of a prompt encoder.
Despite the success of prompt-based few-shot learning, research on prompt-based few-shot learning for Korean has not matured. Prompt-based learning in other languages cannot guarantee high performance in Korean, which has a different word order and agglutinative linguistic features. Further, there is a problem concerning the few-shot settings in previous studies that must be addressed: the earlier prompt-based few-shot learners deviate from the original purpose of overcoming data scarcity. They reformulate the given question for mapping with the prompt, selectively set high-quality training samples, or augment relevant examples to ensure high performance. These approaches require considerable human effort and time during preprocessing. Moreover, it is challenging to include selective data curation in domains where only a tiny amount of training data exists, and a clear distinction for achieving higher quality standards is difficult.
To address these issues, we present a Korean-prompt-based few-shot learner, in which we limit excessive resource consumption using prompt engineering for a more practical few-shot setting. To lighten prompt engineering, we propose plain template insertion (PTI), which prohibits augmenting the number of few-shot training examples and avoids human annotators' variations of sentences that do not suit the few-shot purpose. PTI is a method of placing a predefined template containing prompts and a [MASK] token at a specific position without refining the input sample. However, being pathfinders in the field of Korean-prompt-based few-shot learning, we do not intend to use null prompts [13], as these do not aid in grasping linguistic meaning. Instead, we closely analyze the prompt-based few-shot learner's inherent challenges and reflect the linguistic features of Korean in the proposed PTI.
In this study, we apply the proposed method to a benchmark dataset, referred to as the Korean language understanding evaluation (KLUE) [16], and to off-the-shelf Korean pretrained language models. We perform comparative analyses by selecting three tasks: natural language inference (NLI) [2], semantic textual similarity (STS) [1], and topic classification (TC) [7], [18], [29], which have critical attributes that require an understanding of Korean with respect to the KLUE benchmark dataset. (i) KLUE-NLI requires an understanding of entailment and contradiction in context. It is a fundamental evaluation in the subfield of comprehending semantic representations of natural language. (ii) KLUE-STS is a task that measures the semantic similarity between sentences. We demonstrate that Korean-prompt-based learning is applicable even when regression is applied to continuous real values rather than discrete classification [5]. (iii) KLUE-TC is a new subtask that does not exist in the GLUE and SuperGLUE datasets. It predicts one of seven predefined news categories (for example, economics and sports) based on a specific news headline. It indicates that prompt-based learning improves the inference capability of the model even for a sequence with a high degree of abstraction.

II. RELATED WORKS
With the advent of language models that can assimilate vast amounts of knowledge, enormous amounts of research have been conducted on the performance of downstream tasks by reforming the language model input into a cloze style to maximize its capability. However, considering the low-data regime, recent research has focused on effective few-shot learning based on a limited number of examples.
In the few-shot setting, Schick and Schütze [20] proposed PET, which incorporates knowledge distillation and self-training for downstream tasks. They finetuned individual language models for data reformulated with pre-defined patterns and assembled them to annotate soft labels for large unlabeled datasets, which are then applied for classifier learning. An additional method known as iPET was introduced to address the discrepancy and performance gap among trained language models for each pattern. Thereafter, considering that handling only a single token in PET makes answer representation challenging, Schick and Schütze [21] adapted the model to predict multiple tokens.
Several studies have pointed out that leveraging hard-prompts is sub-optimal and demands nontrivial labor. To reduce the prompt engineering effort and investigate an optimal prompt, soft-prompting was introduced. In particular, Liu et al. [11] presented P-tuning, which places anchor tokens and automatically searches for prompts by optimization through gradient descent in continuous space using a long short-term memory (LSTM) structure. Logan IV et al. [13] greatly diminish the design effort, as they perform null prompting consisting of the input sentence and a masking token only.
Certain studies have also adopted intriguing approaches in the few-shot setting. In terms of boosting performance using only a small amount of data, Wang et al. [27] approached few-shot learning from the perspective of meta-learning: cross-task transferable knowledge is obtained by performing multi-task finetuning on similar natural language processing tasks before adapting the learner to a specific task. In terms of data efficiency, there have been attempts to determine the prompting data points. Zhao and Schütze [19] investigated an appropriate data quantity per task and verified its effect by comparing it with head-based (or promptless) finetuning. In terms of language extension, the study in [30] performed multilingual few-shot learning using prompting, utilizing the multilingual pretrained XLM-RoBERTa-base [3] model along with prompting methods to conduct natural language understanding tasks in 15 languages.
Furthermore, most recent research for prompt-based learning has been applied in various fields of natural language processing including relation extraction [22], commonsense reasoning [8], and complementing weakness of prompt [4], [15], [23]. Son et al. [22] introduced a multitask learning approach for predicting a relation in a dialogue by guiding the model on the relational cues with an MLM-based relational mention prediction and the prior distribution of entity types. Liu et al. [8] proposed generated knowledge prompting to obtain the external knowledge required to solve commonsense reasoning tasks. Cui et al. [4] proposed a soft prototype verbalizer to find a suitable verbalizer within a large vocab. Lu et al. [15] pointed out that model performance can deviate significantly depending on the order of the training samples and the prompt position in prompt-based few-shot learning. Sorensen et al. [23] presented a new approach for choosing generated templates based on mutual information without human-annotated labels or updating models.
However, most studies have experimented mainly on English, and languages with different morphological characteristics have hardly been addressed. Therefore, in this study, we investigate how the performance of few-shot learning varies when prompt-based learning is applied to Korean. Moreover, we argue that manipulating data, such as selectively exploiting high-quality data for training, obscures the main purpose of few-shot learning. That is, it is necessary to reconsider whether a scheme adequately alleviates the data scarcity problem. Confronted with this issue, we use data in a stricter and more practical manner.

III. PRELIMINARY
In this section, we provide background knowledge on hard- and soft-prompt tuning. The composition of the template is divided into hard and soft depending on the prompt used to represent the embedding space. We redesign the structure of the model by considering the tuning methods of the prompts and determining the form of the template.

A. HARD-PROMPT
A hard-prompt consists of a template T_d with human-interpretable natural language in discrete space. T_d = (d_{0:m}, d_m, d_{m+1:n}) reflects the contextual information and the training objective in Korean, where each d_i represents a natural language token and d_m is the [MASK] token. Along with the input sequence X, T_d is fed to the pretrained language model for the intended downstream task. Thereafter, the model is trained to restore the m-th position of T_d to the mapped label word W(y) based on linguistic knowledge acquired during pretraining. The label word W(y) is selected as the natural language token that retains syntactic coherence with T_d and fits the purpose of the original label. In particular, the hard-prompt-based training objective of the model θ can be described as follows:

max_θ p_θ( y_[MASK] = W(y) | X, T_d ),    (1)

where y_[MASK] indicates the model output for the m-th position of T_d, which is the position of the [MASK] token in the input sequence.
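As an illustration, the hard-prompt procedure above can be sketched in plain Python. The English template, the toy logits, and the verbalizer entries below are hypothetical stand-ins for the Korean templates and for a real model's vocabulary distribution at the [MASK] position:

```python
MASK = "[MASK]"

def build_hard_prompt(s1: str, s2: str, template: str) -> str:
    """Insert the input sentences into a discrete template T_d.

    `template` contains {s1}, {s2}, and the [MASK] placeholder, e.g.
    "{s1} {s2} The two sentences are [MASK] related." (illustrative only).
    """
    return template.format(s1=s1, s2=s2)

def predict_label(mask_logits: dict, verbalizer: dict) -> str:
    """Restore the [MASK] position to the label word W(y) with the highest
    score among the verbalizer's label words, then map it to the task label."""
    best_word = max(verbalizer, key=lambda w: mask_logits.get(w, float("-inf")))
    return verbalizer[best_word]

template = "{s1} {s2} The two sentences are [MASK] related."
text = build_hard_prompt("A man is cooking.", "Someone prepares food.", template)

# Hypothetical scores for the [MASK] position (a real model returns a
# distribution over its entire vocabulary).
logits = {"closely": 2.1, "not": -0.5}
verbalizer = {"closely": "entailment", "not": "contradiction"}
label = predict_label(logits, verbalizer)
```

The verbalizer restricts the prediction to the label-word subset of the vocabulary, following the scheme of Schick and Schütze cited in the text.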

B. SOFT-PROMPT
A soft-prompt consists of a human-uninterpretable template T_c = (c_{0:n}, c_m) that is composed of trainable embedding vectors in a continuous latent space and a [MASK] token c_m. For the implementation, the prompt template tokens c_{0:n} and a distinct prompt embedding e(·) are initialized. This enables the adaptation of the template toward the optimal template, which may not be captured by discrete natural language tokens. In particular, in adopting the soft-prompt, the model θ is trained with the following training objective:

max_θ p_θ( y_m = W(y) | X, e(c_{0:n}) ),    (2)

where e(·) indicates the original token embedding of the pretrained language model, and y_m indicates the model output for the m-th position of T_c, which is the position of the [MASK] token in the input sequence. The prompt token embedding is trained in a latent space different from that of the original pretrained language model. Optionally, the prompt embedding can further be trained with a Bi-LSTM encoder E_bi(·) to impose a direct relational connection between the tokens c_{0:n} [10]. Equation 3 represents the encoding process of E_bi(c_{0:n}):

E_bi(c_i) = [ LSTM_fwd( e(c_{0:i}) ) ; LSTM_bwd( e(c_{i:n}) ) ].    (3)

In training the model θ with the encoder, e(c_{0:n}) in equation 2 is replaced with E_bi(c_{0:n}).
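The soft-prompt machinery can be sketched with NumPy. The bidirectional encoder below is a toy running-average stand-in for the Bi-LSTM E_bi(·), used only to show the shape of the computation (each position is re-expressed from a forward and a backward summary of the prompt sequence); the dimensions and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 4            # embedding size, number of prompt tokens (arbitrary)

# Trainable continuous prompt embeddings e(c_0:n), initialized randomly in a
# latent space separate from the frozen token embeddings of the model.
prompt_emb = rng.normal(size=(n, d))

def bidirectional_encode(p: np.ndarray) -> np.ndarray:
    """Toy stand-in for E_bi(.): combine a forward running mean over
    e(c_{0:i}) and a backward running mean over e(c_{i:n}) at each position,
    imposing a relational connection between the prompt tokens."""
    fwd = np.cumsum(p, axis=0) / np.arange(1, len(p) + 1)[:, None]
    bwd = np.cumsum(p[::-1], axis=0)[::-1] / np.arange(len(p), 0, -1)[:, None]
    return np.concatenate([fwd, bwd], axis=1)  # shape (n, 2d)

encoded = bidirectional_encode(prompt_emb)
```

In the actual method, a Bi-LSTM with trainable weights plays this role, and its output replaces e(c_{0:n}) in the training objective.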

IV. PLAIN TEMPLATE INSERTION
We propose plain template insertion (PTI), which places a manual template in the fitted position considering minimal Korean contextual information for the given question and the connection between sentences. We integrate PTI with hard-and soft-prompt tuning. PTI can be engineered differently depending on the content, position, and mapping labels, thereby significantly increasing the data efficiency of the fewshot examples.

A. TEMPLATE CONTENT
The template consists of a [MASK] token and prompts combined to determine the content. For example, if the purpose of a given task is to identify the relationship between two input sentences, we can set the template content as follows: ''The two sentences are [MASK] related.'' Special symbol tokens (for example, | and ?) can be used to naturally interpret the context between the input sentences and the template or to distinguish the relationship as a separator. Further, in the soft-prompt, the template content represents a randomly initialized embedding space. By setting the length of the template, we determine the number of trainable continuous prompts to include.

B. TEMPLATE POSITION
The template is inserted at a specific position in the given sequence. We set the fixed position before, between, or after the input sentences. For example, if two sentences <s1> and <s2> are given, the template [t] can be placed as ''[t] <s1> <s2>,'' ''<s1> [t] <s2>,'' or ''<s1> <s2> [t].''

C. MAPPING LABEL
The mapping label is part of the template and represents the answer word to be predicted in the task. Unlike conventional finetuning, which uses the [CLS] token for the model prediction, prompt-based learning infers the answer in the same way as the masked language modeling pretrained with self-supervised learning. This approach inhibits the occurrence of catastrophic forgetting caused by the gap between pretraining and finetuning in language models. The hard-prompt maps one of the tokens in the vocabulary of the model, considering the context, to the [MASK] token (that is, we follow the verbalizer of Schick and Schütze [20], [21]). The soft-prompt uses the PTI that meets the training purpose so that the continuous prompt updates its embedding values based on the given sequence and mapping label.
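The position variable can be sketched as a small helper, assuming English placeholder sentences and the hypothetical position names front/middle/back (our own naming, not the paper's):

```python
def insert_template(s1: str, s2: str, template: str, position: str) -> str:
    """Place a predefined template [t] before, between, or after the input
    sentences, without refining the input sample itself."""
    if position == "front":
        return f"{template} {s1} {s2}"
    if position == "middle":
        return f"{s1} {template} {s2}"
    if position == "back":
        return f"{s1} {s2} {template}"
    raise ValueError(f"unknown position: {position}")

t = "The two sentences are [MASK] related."
seq = insert_template("I run.", "I jog.", t, "middle")
```

Each choice of position yields a distinct training sequence from the same few-shot example, which is the source of the data-efficiency gain described above.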

V. PROMPT ENGINEERING
To assess our Korean-prompt-based few-shot learner, we apply PTI to the NLI, STS, and TC tasks, for which the KLUE benchmark dataset is used. We assume that prompt-based learning drives the linguistic knowledge of pretrained language models to understand the contextual representations necessary to solve the downstream task. As explained in Section §IV, a PTI consists of three variables: the (i) template content, (ii) template position, and (iii) mapping labels. PTI is used differently for the hard- and soft-prompt.

A. HARD PLAIN TEMPLATE INSERTION
This section introduces a method of engineering PTI with human-interpretable hard-prompt tuning, which differs for the three downstream tasks.
2) KLUE-STS
KLUE-STS is a regression task in which the model predicts the degree of semantic equivalence between two sentences. The dataset labels semantic similarity as a real value from 0 (indicating no meaning equivalence) to 5 (indicating complete meaning equivalence).
(i) The template content describes the semantic similarity of the two sentences with a [MASK] token and discrete prompts. (ii) To recognize the relationship between the two sentences <s1> and <s2>, we define the template position as follows: ''[t] <s1> <s2>,'' ''<s1> [t] <s2>,'' or ''<s1> <s2> [t].'' (iii) Mapping real values to discrete label words is quite difficult. Thus, we use the binary classification tag in KLUE-STS, mapped with tokens of similar meaning in the vocabulary. However, with mapping labels such as (same) and (different), it is still difficult to predict the real value of the semantic similarity score. Thus, inspired by [3] and [25], we use linear interpolation to alter the range of the real values [0,5] into [0,1] to map the two opposing poles.
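The linear interpolation between the [0,5] similarity scale and the [0,1] label space can be sketched as follows; the function names are our own, and the endpoints are taken from the KLUE-STS label range:

```python
def score_to_prob(score: float, lo: float = 0.0, hi: float = 5.0) -> float:
    """Linearly interpolate a similarity score in [0, 5] into [0, 1],
    read as the weight of the 'same' pole; 1 - p is the 'different' pole."""
    return (score - lo) / (hi - lo)

def prob_to_score(p: float, lo: float = 0.0, hi: float = 5.0) -> float:
    """Invert the interpolation to recover a real-valued similarity
    prediction from the model's [0, 1] output."""
    return lo + p * (hi - lo)
```

At training time, gold scores are mapped into [0,1]; at inference time, the model's soft prediction between the two label-word poles is mapped back to the [0,5] scale for evaluation.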

3) KLUE-TC
KLUE-TC is a classification task in which the model predicts one of seven predefined news categories based on a given news headline. It consists of human-annotated news headlines from online articles distributed by Yonhap News and published from January 2016 to December 2020.
(i) The template content is constructed by reflecting the attribute of the training objective, which is to infer the category (or topic) of a given sentence, using a predefined plain template.

B. SOFT PLAIN TEMPLATE INSERTION
PTI in the continuous embedding space consists of uninterpretable prompts but has the advantage of being task-agnostic and independent of contextual information. Therefore, we use the same PTI for the three different downstream tasks.1 (i) We use randomly initialized continuous prompts to overcome the discreteness that quickly converges to a local minimum caused by discrete words [11]. A continuous prompt is not mapped to a discrete natural language token; instead, the template length determines how many trainable prompts it contains. The content of such a template with two sentences <s1> and <s2> and a length of l prompts is ''<s1> <p1> . . . <pl> [MASK] <s2>.'' (ii) The position of the template is relatively unrestricted. The randomly initialized template is independent of the context, and some prompts can be separated. If a template with l prompt tokens is separated, ''<p0> . . . <p(i-1)> <s1> <p(i)> . . . <p(l-1)> [MASK] <s2>'' is one example of the position of the soft-prompt template. (iii) The mapping labels consist of the same discrete label words as for the hard-prompt, but there is an additional option to remove the mapping to discrete label words.

VI. EXPERIMENTAL SETTINGS
In this section, we specify the settings for the datasets, models, and evaluation metrics for few-shot learning. We conduct our experiments and analyses based on the experimental settings specified in this section. More details about the training environments and hyperparameters are described in Section VII.

1 Whether there are one or two input sentences leads to a different number of choices, but the rule of template insertion is effectively the same.

A. DATASET
As described in Table 1, we employed the KLUE-NLI/STS/TC datasets as few-shot learning data, considering the number of classes for each task. The recent KLUE dataset does not publicly open a leaderboard,2 and only the training sets D_train and development sets D_dev are disclosed. Therefore, we used the entire D_dev as the test set D_test. Inspired by [5] and to achieve the goal of learning from scarce data, we randomly selected the same number of samples as the few-shot size K from the entire D_train and used them as the few-shot training set D^K_train and few-shot development set D^K_dev. Each label in D^K_train and D^K_dev has K samples, and the sets are composed differently based on five seeds: {42, 52, 62, 72, 3407}.
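The per-label sampling procedure can be sketched as follows, assuming a list-of-dicts dataset layout; the field names `text` and `label` are illustrative, not the actual KLUE schema:

```python
import random
from collections import defaultdict

def sample_few_shot(examples, k, seed):
    """Randomly select K examples per label from D_train to build D^K_train
    (the same procedure yields D^K_dev), with one of the seeds
    {42, 52, 62, 72, 3407} fixing the draw for reproducibility."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    subset = []
    for label, pool in sorted(by_label.items()):
        subset.extend(rng.sample(pool, k))
    return subset

# Toy dataset: 300 examples evenly spread over 3 labels (as in KLUE-NLI).
data = [{"text": f"s{i}", "label": i % 3} for i in range(300)]
few = sample_few_shot(data, k=32, seed=42)
```

Running the same function with each of the five seeds produces the five differently composed few-shot splits over which results are averaged.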

B. MODELS
We choose KLUE-RoBERTa-large [16], which is pretrained with the RoBERTa [12] architecture on Korean corpora and demonstrates the best performance on the KLUE benchmark, as the few-shot learner. The purely masked language modeling pretraining of RoBERTa is well suited to prompt-based learning that predicts the [MASK] token. To recognize the improvement gap based on model size, we supplemented the experiments with KLUE-RoBERTa-base. For the soft-prompt, we optionally attach a Bi-LSTM head to RoBERTa and conduct a comparative analysis in Table 5. The Bi-LSTM is a lightweight neural network that prevents the continuous prompts from rapidly converging to discreteness caused by the pretrained language model's embedding layers and sets the randomly initialized prompts to have contextual dependencies [11].

C. METRICS
We measure performance based on the evaluation metrics presented by KLUE [16]. KLUE-TC estimates performance using the F1-score, the harmonic mean of precision and recall, to prevent false-positive and false-negative problems arising from the imbalance of the seven labels. KLUE-NLI is uniformly divided into three labels and measures performance with accuracy. KLUE-STS measures performance using the Pearson correlation, which represents the linear correlation between the predicted and label values.
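For reference, the Pearson correlation used for KLUE-STS reduces to the following computation (a plain-Python sketch, not the official KLUE evaluation script):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between predicted real values and gold similarity
    labels: covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value of 1 indicates a perfect linear relationship between predictions and labels, and -1 a perfectly inverted one.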

VII. EXPERIMENTAL DETAILS
We implemented the Huggingface [28] and PyTorch Lightning3 frameworks for language modeling on an 18-core Intel Xeon Gold 6230 CPU and an NVIDIA Quadro RTX A6000 GPU. We trained the Bi-LSTM (embedding size 300, 2 layers, 3.8M model parameters), KLUE-RoBERTa-base (embedding size 768, 12 layers, 12 heads, 110M model parameters), and KLUE-RoBERTa-large (embedding size 1024, 24 layers, 16 heads, 377M model parameters) models as few-shot learners. In addition, we assigned the template content, template position, and mapping labels in the process of loading the data modules. We conducted validation during the training step and saved the best checkpoint with the highest performance, monitoring the evaluation metric for each task.

TABLE 2. We set the few-shot size to 16/32/64 and compare the performance between finetuning and our prompt-based few-shot learner. We report the mean performance over five different seeds. Majority selection refers to predicting the answer with the largest number of labels. PTI(H) and PTI(S) indicate plain template insertion with hard- and soft-prompt engineering, respectively. ↑ indicates a performance improvement over the same-sized finetuned model; ↓ indicates a decline. The best models for the same few-shot size are formatted in bold and the second-best ones are underlined.

C. BI-LSTM
We set the hyperparameters in the training process to be the same as for soft-prompt PTI. In addition, we utilized FastText [6] with 300-dimensional embeddings based on the Common Crawl4 corpus to train the Bi-LSTM.

VIII. RESULTS

A. MAIN RESULTS
We conduct a quantitative analysis of our PTI for the three types of Korean natural language understanding tasks. As presented in Table 2, it is difficult for the model to achieve stable performance because the few-shot training examples are not selected for quality and the number of development examples is extremely limited. In particular, the Bi-LSTM model without pretrained knowledge can perform worse than majority selection on multi-class classification problems. Moreover, few-shot learning, which assumes strict data scarcity, still reveals biased features with a small amount of data. Even though it is difficult for the model to guarantee the best result owing to this typical problem of few-shot learning, PTI outperforms the traditional promptless finetuning method in 31 of the 36 comparative experiments, the exceptions being soft-prompt PTI in KLUE-TC. These results indicate that PTI has the advantage of improving performance while increasing data efficiency by adding three variables to the input sequence.

4 https://commoncrawl.org/
PTI incorporated into hard-prompt engineering exhibits performance improvements of up to +14.57, +29.54, and +2.67 in KLUE-NLI, STS, and TC, respectively. PTI incorporated into soft-prompt engineering enhances the performance by up to +14.88, +23.61, and +2.43 in KLUE-NLI, STS, and TC, respectively, but shows a maximum performance reduction of −4.88 in KLUE-TC. There is no previous research on whether topic classification is valid for soft-prompting on the English benchmark datasets. However, based on the achievements made by hard-prompt engineering, we interpret that the continuous prompt rather increases the difficulty of the tasks as well as the uncertainty. Moreover, few-shot learners do not necessarily guarantee higher performance with larger models because of the uncertainty caused by data scarcity. Although a complete improvement has not been made, the PTI approach applied to the same few-shot examples in the KLUE-STS task relieves this uncertainty, with results that are consistent across model sizes.

B. FULL-SHOT COMPARISON
TABLE. Results on the full-shot dataset. We average over five seeds to compute the score. We apply hard/soft PTI and promptless finetuning to the RoBERTa-large model.
As the amount of training data increases, the performance gap between PTI and promptless finetuning gradually narrows. Experiments in the full-shot setting indicate that the gap between the two approaches is marginal. The reason for this is that the development examples of the full-shot settings offset the advantages of PTI, which is highly efficient with limited resources. We reaffirm that PTI is a method to achieve near-full-shot performance by significantly improving data efficiency in low-resource settings, rather than dramatically boosting the performance of state-of-the-art models in a large-resource environment.

C. OUT OF CONTEXTUAL UNDERSTANDING
The PTI represented in the discrete prompt can be an inexplicable sentence that is not consistent with natural language, depending on the setting of the three variables. We manipulate templates that are not contextually proper for each task objective and evaluate the performance. Table 3 presents cases that run counter to the expected result in prompt-based learning. Overall, an implausible template that violates the Korean sentence structure and contextual information still achieves better performance than promptless finetuning. In KLUE-NLI, there is a case in which the implausible template, which consists of unrelated mapping labels, is slightly superior in performance to the plausible template. In another case, in KLUE-STS, an implausible template improves the performance to a larger extent even when the content and position of the template are completely unrelated to the training purpose of the task and the input sequence. Finally, even in KLUE-TC, the template with implausible content can enhance the performance as much as that with plausible content.

TABLE 3. The PTI is manipulated based on the contextual understanding for each task. To estimate the performance, a particular few-shot size (for example, f32) and model (for example, base or large) are set. Except for the plausible template in KLUE-NLI (−0.02), all other cases outperform promptless finetuning. Based on the three variables in PTI, explicable and inexplicable components are denoted with separate markers.

We interpret these results for two reasons as follows: (1) Robustness: PTI does not need to strictly follow Korean grammar, sentence structure, and contextual information in the production of templates. In addition, we demonstrate that it is unnecessary to put excessive effort into considering natural language when generating the manual template. That is, it is sufficient for prompt-based few-shot learners to train on roughly engineered PTIs. (2) Forgetting: the flip side is that it is difficult for prompt-based learning to preserve the knowledge of the language learned during pretraining to address the downstream task. Considering the superior performance of the human-uninterpretable template in the soft-prompt, language models still do not fully understand natural language and are not entirely free from catastrophic forgetting.

D. OUTSTANDING TEMPLATES
In Table 4, we present the outstanding templates that demonstrate the highest performance across all models and few-shot sizes for each task. We consider that, owing to the limits of empirical experiments and the potentially infinite number of discrete word combinations, the best performance cannot be guaranteed. In addition, considering the results presented in Table 3, plausible templates do not necessarily guarantee higher performance improvements. However, Table 4 indicates that contextual information is not a negligible attribute and can be advantageous in improving performance.

E. PROMPT ENCODER
The soft-prompt PTI in the main results shows the best performance among the various data points that can be combined. As shown in Table 5, we can extend the data points by changing the setting of the prompt encoder or by freezing the pretrained language model. The prompt encoder is known to be an effective method of maximizing the performance of soft-prompt tuning by addressing the problems of discreteness as well as association. Contrary to the expected results, the models with a Bi-LSTM attached do not consistently outperform the case of simply attaching a pooling-layer head without the Bi-LSTM. To determine the cause of these results, we evaluate the performance by freezing the learning parameters of the RoBERTa model and training only the light parameters of the Bi-LSTM. Using only the model parameters of the Bi-LSTM leads to lower performance than promptless finetuning in most cases. Through these ablation studies, we find that the Bi-LSTM as a prompt encoder has a low capacity to understand contextual representations, offsetting the robustness of PTI. Therefore, the prompt encoder with the Bi-LSTM appears to hinder the pretrained model's learning and its memory of prior knowledge.

F. MASKED LABEL PREDICTION IN ZERO-SHOT
We rank the results of KLUE-RoBERTa-large's prediction of the [MASK] token in the template under the zero-shot setting to track the optimal mapping label for each template content and position. Even though the template content and position are human-interpretable, the mask-token prediction can converge to an unrelated word such as ''(contradiction).'' Figure 2 demonstrates that the pretrained language model already struggles to infer proper mapping labels. Thus, Table 3 shows that the models forget the incomplete inferences of pre-acquired knowledge and newly recognize patterns for the mapping labels, showing robust results even for mapping labels that are irrelevant to the context. Additionally, prompt-based learning with the PTI method aids inferences that are difficult to complete with only pretrained knowledge and significantly improves the performance.
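The zero-shot ranking and normalization behind Figure 2 can be sketched as follows; the candidate words and logit values below are hypothetical, and a softmax over the top-k candidates stands in for whatever normalization the figure uses:

```python
import math

def top_k_normalized(mask_logits: dict, k: int = 5):
    """Rank the model's [MASK] predictions, keep the top-k candidate label
    words, and normalize their scores to sum to 1 (softmax over the top-k)."""
    top = sorted(mask_logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    z = sum(math.exp(v) for _, v in top)
    return [(word, math.exp(v) / z) for word, v in top]

# Hypothetical [MASK] logits for a KLUE-TC-style template.
logits = {"IT": 2.0, "science": 1.5, "economy": 1.0,
          "sports": 0.5, "politics": 0.2, "world": -1.0}
ranked = top_k_normalized(logits, k=5)
```

Such a ranking makes it visible when the highest-scoring word is unrelated to any valid mapping label, which is the failure mode discussed above.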

IX. DISCUSSION AND ERROR ANALYSIS
Although the proposed prompt-based learning outperforms the promptless finetuned baselines, it should be noted that there are still results in the evaluation that are unreasonable and burdensome to interpret. There are three types of limitations, and we present directions for future study.
Firstly, the bias problem that a small amount of data can cause is inherent in our rigorous few-shot learning. Few-shot training examples make the performance gap between seeds large, and it is challenging for the model to find the optimal point given the small number of development examples. Regarding this uncertainty, we have focused on improving performance without compromising the intrinsic purpose of few-shot learning but have not presented a precise solution to overcome the deviation.
Secondly, Table 5 shows that the prompt encoder with auto-prompting, a Bi-LSTM, does not significantly improve performance. This result differs slightly from the claims of a previous study on soft-prompt tuning [10]. Additionally, it is not clear whether the prompt encoder is capable of resolving discreteness and association. The model exhibits low performance when only additional prompt encoders are used, and lightweight prompt encoders become less influential as the pretrained model's embedding layers become deeper.
(Figure 2 caption: We set plausible templates, e.g., ''The news corresponds to the [MASK] field.'', and normalize the scores, e.g., 0.2958, of the prediction results up to the top-5 ranks, e.g., IT, communication, technology, science, and ICT.)
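For reference, the reparameterization idea behind such a Bi-LSTM prompt encoder can be sketched as follows. This is a from-scratch toy with random, untrained weights and our own class names, not the implementation evaluated in Table 5: learnable pseudo-token embeddings are re-encoded by forward and backward LSTM passes so that neighboring soft prompts become associated with one another rather than being optimized independently.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMCell:
    """A minimal LSTM cell; one weight matrix covers all four gates."""
    def __init__(self, d):
        self.W = rng.normal(0, 0.1, (4 * d, 2 * d))
        self.b = np.zeros(4 * d)
    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)  # input, forget, cell, output gates
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

def bilstm_prompt_encoder(pseudo_embeddings):
    """Re-encode pseudo-token embeddings with a bidirectional pass so
    each soft prompt depends on its neighbors (association) instead of
    being a free-standing vector (discreteness)."""
    n, d = pseudo_embeddings.shape
    fwd, bwd = TinyLSTMCell(d), TinyLSTMCell(d)
    h, c = np.zeros(d), np.zeros(d)
    fwd_states = []
    for x in pseudo_embeddings:            # left-to-right pass
        h, c = fwd.step(x, h, c)
        fwd_states.append(h)
    h, c = np.zeros(d), np.zeros(d)
    bwd_states = []
    for x in pseudo_embeddings[::-1]:      # right-to-left pass
        h, c = bwd.step(x, h, c)
        bwd_states.append(h)
    # Concatenate forward and backward states per position.
    return np.stack([np.concatenate([f, b])
                     for f, b in zip(fwd_states, bwd_states[::-1])])

# Five pseudo tokens with embedding dimension 8 -> encoded prompts of dim 16.
prompts = rng.normal(size=(5, 8))
encoded = bilstm_prompt_encoder(prompts)
print(encoded.shape)  # (5, 16)
```

In a full soft-prompt setup the encoder's outputs would be projected back to the model's embedding size and trained jointly; as discussed above, our results suggest this extra machinery matters less as the pretrained embedding layers grow deeper.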
Thirdly, as described in Tables 3 and 4, we cannot determine which template has the best content, position, and mapping label, owing to the limitations of empirical experiments. Moreover, we find that even when the contextual meaning and labels in the template are inexplicable for training purposes, performance sometimes increases, and vice versa. These results show the benefits of the robustness and easy reproducibility of PTI. However, it is difficult to offer a detailed interpretation of whether language models can entirely overcome catastrophic forgetting and understand natural language like a human.
To address these issues, future work can proceed in the direction of enhancing the verification scheme within given low-resource data, such as in [17], or removing the bias of the few-shot data. In addition, soft-prompting needs to be studied further to sufficiently prove the effectiveness of the prompt encoder. Furthermore, we should pursue few-shot learners with higher performance improvements and interpretability, minimizing the catastrophic forgetting of language models.

X. IMPLICATIONS OF THE STUDY
We propose Korean-prompt-based few-shot learning and apply PTI as the prompt engineering method. PTI is a few-shot learning method that is closer to practical settings and maximizes the efficiency of the given data. Our study is applicable to research and industries that use datasets from domains with considerable collection constraints (for example, fraud and personal information). In particular, the implications are expected to be more significant for the Korean language, which is subject to strict social norms related to personal information and data collection. Moreover, in developing detection models with class imbalance problems, our study can be used to replace traditional sampling methods with prompt-based few-shot learning.

XI. CONCLUSION
As pathfinders in the field of Korean-prompt-based few-shot learning, we conducted an in-depth analysis that considers Korean sentence structure and overcomes data scarcity through rigorous few-shot learning. In this paper, we proposed PTI, which places a manual template at a suitable position considering Korean contextual information and consists of template content, template position, and mapping labels.
Our prompt engineering method is powerful in the Korean context and applicable to both hard- and soft-prompt tuning. PTI is robust to the uncertainty of low resources, achieving significant performance improvements in the few-shot KLUE-NLI, STS, and TC tasks. In reconsidering whether a given scheme sufficiently relieves the data scarcity problem, PTI also adheres to using data in a more practical manner. In future work, we plan to study whether the optimal template for an input sequence can be determined dynamically through abductive reasoning and contrastive learning within limited resources. We will also attempt to produce the best prompt that can serve as a guide to achieving performance close to full-shot learning in a few-shot setting. We hope that our proposed PTI and analyses will be a fundamental resource for research on Korean-prompt-based few-shot learners.
HYEONSEOK MOON received the B.S. degree from the Department of Science in Mathematics and Engineering, Korea University, Seoul, South Korea, in 2021, where he is currently pursuing the Ph.D. degree in computer science and engineering. Currently, he is a part of the Natural Language Processing & Artificial Intelligence Laboratory, under an integrated master's and Ph.D. courses. His research interests include natural language processing, neural machine translation, automatic post editing, and parallel corpus filtering.
CHANHEE LEE received the B.S. degree in computer science and engineering from Sogang University, Seoul, South Korea, in 2013. He is currently pursuing the Ph.D. degree in computer science and engineering with Korea University, Seoul. Currently, he is a part of the Natural Language Processing & Artificial Intelligence Laboratory, under an integrated master's and Ph.D. courses. He is currently working as an AI Research Engineer at NAVER Search U.S. His research interests include language understanding and neural network pruning, where he tries to find inspiration from how humans do it, and build computational models based on this.
SUGYEONG EO received the B.S. degree in linguistics and cognitive science, language and technology from the Hankuk University of Foreign Studies, Yongin, South Korea, in 2020. She is currently pursuing the Ph.D. degree in computer science and engineering with Korea University, Seoul, South Korea. Currently, she is a part of the Natural Language Processing & Artificial Intelligence Laboratory, under an integrated master's and Ph.D. courses. Her research interests include neural machine translation and quality estimation, where she tries to predict machine translation quality that minimizes human labor.
CHANJUN PARK received the B.S. degree in natural language processing and creative convergence from the Busan University of Foreign Studies, Busan, South Korea, in 2019. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, Korea University, Seoul, South Korea. From June 2018 to July 2019, he worked at SYSTRAN as a Research Engineer. He is also working as an AI Research Engineer at Upstage. His research interests include machine translation, grammar error correction, simultaneous speech translation, and deep learning.
HEUISEOK LIM received the B.S., M.S., and Ph.D. degrees in computer science and engineering from Korea University, Seoul, South Korea, in 1992, 1994, and 1997, respectively. He is currently a Professor at the Department of Computer Science and Engineering, Korea University. His research interests include natural language processing, machine learning, and artificial intelligence.
VOLUME 10, 2022