Controllable and Editable Neural Story Plot Generation via Control-and-Edit Transformer

Language-modeling-based methods for story plot generation aim to generate a plot with a language model (LM). However, LM methods offer limited support for user-assisted plot generation: they lack goal control and refinement through editing, so the generated plots often fail to work toward a specific goal, lack coherence, and cannot be edited flexibly. We present a control-and-edit transformer technique that uses controlled imitation learning, with edit-distance supervision computed by dynamic programming, to support a deletion policy, an insertion policy, a weighted reward derived from corpus statistics in preprocessing, and a continuous reward measure for the controlled goal. Automated evaluation and human judgment show our method is promising in comparison with the baselines.


I. INTRODUCTION
Automated plot generation is a challenging and significant natural language processing task that attracts increasing scholarly attention. Given a domain or a situation with a set of specifications, the role of automated plot generation is to create a sequence of main plot points that comply with those specifications. Prior approaches to plot generation relied on hand-constructed models [1]. Because these approaches required extensive domain knowledge, the stories they generate tend to feel stilted: much hand-authored guiding knowledge is needed just to keep the story coherent.
As machine learning approaches became prevalent in NLP, story plot generators could learn storytelling in a particular domain. Existing stories and plots provide the generator with domain knowledge and examples of plot construction; however, the plots constructed by these methods, especially neural networks, tend to pursue goals based on training experience and often run against the user's will because the models cannot receive guidance [2]. Applying reinforcement learning to plot generation, [2] invented a reward-shaping technique that adjusts the weights of the neural language model during training so that the generated plot points align with an intended objective. In practice, however, the user's goal need not stay fixed during generation. For instance, if a generated story has a woman hating a man but the user wants a romance, the user should be able to delete parts of the story and insert verbs like 'love' or 'adore'. In short, users can reverse the plot of a story by editing it. Our model therefore gives the story plot generator the ability to edit the plot's development when needed.
Recently, since RNNembed [3] presented a character-level sequence-to-sequence learning method, modern neural sequence generation models have been trained either to generate tokens from scratch or to modify an existing sequence of tokens. The flexibility of decoding models was raised by [4], which replaced the standard decoding mechanism with two operations, deletion and insertion, trained with imitation learning algorithms. The characteristic of these two operations is that deletion and insertion are both adversarial and complementary.
Researchers are also exploring open-domain story generation, where a given story topic serves as the input to the plot generator.
Generating interesting stories full of twists and turns without straying from the theme has remained challenging. Recent studies show that seq2seq models perform poorly on story generation because of the complex and underspecified dependencies between the prompt and the story. [5] collected a large corpus of human-written stories and introduced fusion and self-attention mechanisms to generate stories that human judges prefer. [6] presented an automated story generation technique that decomposes the problem into the generation of successive events (event2event) and the generation of natural language sentences from events (event2sentence). [7] proposed a plan-and-write framework and compared two planning strategies, dynamic and static, for generating stories around a given topic. The primary pitfall of neural language model approaches for story generation is that the space of possible stories is huge, which implies that, in a textual story corpus, any given sentence will likely be seen only once.
Interactive story generation is among the latest research hotspots, because a controllable story generator is more humanized and lets users steer the plot's development. [8] introduced content-inducing approaches to give users more control over generation. [9] proposed PLOTMACHINES, a novel narrative transformer that learns to transform outlines into full stories with dynamic plot state tracking. [10] proposed an interesting model in which users could design the emotions the protagonist experiences without sacrificing story quality.
In this paper, we take advantage of this characteristic of prior neural story plot generation and use imitation learning to design a controllable and editable neural story plot generator with a transformer's help. During story plot generation, when the user wants to edit the plot, for example to revise, add, replace, or delete any part of the generated plot, each edit operation is trained as a policy. The output of each policy's adversary at the previous iteration becomes the input of the next iteration. For instance, the user may give the generator a specific target but replace one verb with another during generation; this operation decomposes into one deletion and one insertion. The following experiments show that imitation learning during transformer training is effective at making our model editable. Meanwhile, our model achieves comparable results in both execution time and speed of convergence.
The remainder of the paper is organized as follows. First, we discuss related work on automated story generation, editing-based models, and transformers. In our approach, we then introduce the edit actions, insert and delete, present our model based on the control-and-edit transformer, and provide an evaluation of the event representation in the context of story generation. We carry out experiments to show our model's performance on story generation with control-and-edit ability. We end with conclusions from these experiments and expectations for real-world application.
The main contributions of this paper can be summarized as follows:
• We are the first to add an editing capability to the story plot generator; by allowing the generation process to be edited and controlled, we significantly increase user engagement.
• We propose the Control-and-Edit Transformer, trained with reward-reshaped imitation learning from the goal, the input plot sequence, and the ground-truth output plot sequence.
• We evaluate our technique in two ways: automatic evaluation, using the goal achievement rate and language-model perplexity, and human judgment. The results show that our model is comparable to state-of-the-art models.

II. RELATED WORK
A. NEURAL PLOT GENERATION
Symbolic planning [1], [11]–[13], [14]–[16] and case-based reasoning [17], [18] were the main means of story plot generation in early work. These methods rely on a combination of rigorous algorithms and domain-specific knowledge, so the coherence of the story plot is well controlled through cause-and-effect logic. The popularity of machine learning allows storyline generators to learn how to tell stories from an existing library of stories. Recently, reinforcement learning has been applied to text processing, and Markov decision process (MDP) problems in particular have been shown to be solvable efficiently. An MDP is a tuple M = ⟨S, A, T, R, γ⟩, where S is the set of world states, A the set of possible actions, T a transition function, R a reward function, and γ a discount factor. To some extent, reinforcement learning addresses the problem of preserving coherence in text generation, and the MDP formulation provides the ability to specify a goal. [2] used deep reinforcement learning for story plot generation: they give the generator a goal and use reward shaping to steer the policy-gradient search toward longer-term reward.

B. EDITING-BASED MODELS
Incorporating ''editing'' operations into sequence generation tasks has been investigated in prior work. Using a convolutional neural network, [19] predict and apply token substitutions iteratively on a phrase-based MT system. [20] and [21] propose an oracle policy that uses Levenshtein distance computed with dynamic programming. The Levenshtein transformer, which simultaneously inserts and deletes multiple tokens iteratively, was proposed by [4].

C. IMITATION LEARNING
In traditional reinforcement learning tasks, learning the optimal policy is usually done by maximizing cumulative reward, which is straightforward and performs well when ample training data is available. However, in multi-step decision problems the learner is rewarded only sparsely, and the search space explored through cumulative reward is huge. Imitation learning, developed over the years, can solve such multi-step decision problems well and has been widely applied in NLP tasks. Imitation learning means learning from examples provided by a demonstrator: it uses states as features and actions as labels to obtain the optimal policy model. We therefore use imitation learning to train the Control-and-Edit Transformer, which gives neural story plot generation controllable and editable abilities.

FIGURE 1. The workflow of the Control-and-Edit Transformer framework: a transformer with a BERT pre-trained backbone embeds the input sequence y_0 and the control verb g; controlled imitation learning trains the insert and delete policies over the input sequence y_0, the output sequence y*, and the control verb g; and reward shaping relates the output sequence to the controlled verb g.

D. TRANSFORMER
Unlike Seq2Seq [22], which feeds one word into the encoder at each step and generates one word at a time in the decoder, the breakthrough transformer framework of 2017 used the attention mechanism to overcome the context limitations of LSTMs. It also provided more throughput, because inputs are processed in parallel with no order dependency, which addresses the problem of long-term dependence. Subsequent improved transformer models, such as GPT-1 [23] and BERT [24], were pre-trained on large datasets; transfer learning was then used to fine-tune these models for specific tasks, significantly improving performance on several NLP tasks such as language modeling, sentiment analysis, question answering, and natural language inference. Transformers also perform well on natural language generation (NLG) tasks. The denoising autoencoder BART [25] incorporates two-stage pre-training: (1) corruption of the original text via a random noising function, and (2) training the model to reconstruct the text. Turing-NLG pre-trains on more kinds and quantities of data, and its resources are shared among different tasks to enhance zero-shot learning. GShard [26] scales beyond 600 billion parameters; through sparse gating and a mixture-of-experts (MoE) layer for multilingual machine translation, it uses automatic sharding to keep computing cost and compilation time low. In our work, we chose the representative BERT as our model's transformer backbone, which showed a good effect in story plot generation.

III. OUR APPROACH
We take advantage of the characteristics of previous interactive story generation and use imitation learning to design a controllable and editable neural story plot generator with a transformer's help.

A. NOTATIONS AND DEFINITIONS
The plot generation problem is a planning problem: find a sequence of editable events ending with a desired goal, where we use the verb of an event (e.g., like, love, marry) to represent the goal. The challenges include user-controlled and user-edited story event generation. Specifically, we model controllable and editable plot sequence generation as a Markov decision process (MDP) given by a tuple (Y, A, E, R, g, y_0). The MDP consists of an environment E with which the agent interacts through editing actions, and a designed goal g toward which the output plot sequence is generated. Y ⊆ V^{N_max} is the set of generated plot sequences of maximum length N_max, and V is the word vocabulary. In each decoding step, the model receives as input a plot sequence and goal verb tuple (y, g), where y is either empty (generation from scratch) or an incomplete plot sequence and g is the desired goal, and selects an action a with a continuous reward r. A is the set of editing actions and R is the expected reward function. Usually, R is the optimal edit distance between the ground-truth plot sequence y*_g for goal g and the generated plot sequence y, i.e., R(y) = −D(y*_g, y), computed by dynamic programming. The agent receives the initial plot sequence y_0 and essentially learns to refine it; this becomes pure plot generation if y_0 is empty. The agent's goal is to learn a policy π that, conditioned on the current plot sequence and the goal verb, produces a probability distribution over edit actions: π(a | y, g).
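For illustration, the reward R(y) = −D(y*_g, y) can be computed with the standard edit-distance dynamic program. This is a minimal sketch over token sequences; the function names are ours, and we use the classical formulation that also allows substitutions:

```python
def edit_distance(ref, hyp):
    """Token-level Levenshtein distance between two sequences via dynamic programming."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete
                           dp[i][j - 1] + 1,        # insert
                           dp[i - 1][j - 1] + sub)  # substitute / match
    return dp[m][n]

def reward(y_star, y):
    """R(y) = -D(y*_g, y): the closer y is to the goal-conditioned ground truth, the higher the reward."""
    return -edit_distance(y_star, y)
```

A perfect match yields reward 0; every deviation from the ground-truth sequence lowers the reward by at least one.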

B. EDIT ACTIONS: INSERT & DELETE
Following the MDP model, given the current plot sequence y_k = (y_1, y_2, ..., y_n), the edit actions insertion and deletion are defined so as to produce y_{k+1} = E(y_k, a_{k+1}).

1) DELETE
The deletion policy reads the input plot sequence y and, for every token y_i ∈ y, makes a binary decision π_del(d | i, y, g): 1 (delete this token) or 0 (keep it).

2) INSERT
The insertion policy consists of two phases: a placeholder sub-policy and a token classification sub-policy, which collaboratively insert multiple words into slots.
(1) For each slot (y_i, y_{i+1}) in plot sequence y, π_plh(p | i, y, g) predicts the number of tokens required by the slot. (2) For each placeholder predicted in the slot, a word classification sub-policy π_tok(t | i, y, g) replaces the default placeholder with a word from the vocabulary. The combined policy factorizes as

π(a | y, g) = Π_{d_i ∈ d} π_del(d_i | i, y, g) · Π_{p_i ∈ p} π_plh(p_i | i, y′, g) · Π_{t_i ∈ t} π_tok(t_i | i, y″, g),

where y′ = E(y, d) is the environment output after interaction with the delete action, and y″ = E(y′, p) is the environment output after adding default placeholder tokens.
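The environment transitions the two actions induce can be sketched as simple sequence transforms (illustrative helper names; "<PH>" stands for the reserved placeholder token):

```python
PH = "<PH>"

def apply_delete(y, d):
    """Keep token y[i] iff the delete policy's binary decision d[i] is 0."""
    return [tok for tok, di in zip(y, d) if di == 0]

def apply_placeholders(y, p):
    """Insert p[i] placeholder tokens before y[i]; p has len(y)+1 entries, one per slot."""
    out = []
    for i, tok in enumerate(y):
        out.extend([PH] * p[i])
        out.append(tok)
    out.extend([PH] * p[-1])  # trailing slot after the last token
    return out

def fill_placeholders(y, tokens):
    """Replace each <PH> with the next word chosen by the token sub-policy."""
    it = iter(tokens)
    return [next(it) if tok == PH else tok for tok in y]
```

One full environment step is then `fill_placeholders(apply_placeholders(apply_delete(y, d), p), t)`.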

C. CONTROL-AND-EDIT TRANSFORMER
The key model of the Control-and-Edit Transformer and the dual-policy learning algorithm are introduced in this subsection.
Put simply, in each iteration of our model, the input is a designed goal together with either a plot sequence or an initially empty sequence, and the model uses insertion or deletion to generate or refine an output sequence. This repeats until the two combined policies converge.

1) OUR MODEL
First, a transformer pre-trained with BERT is defined as the backbone block that encodes the concatenation of the goal g and the input plot sequence y_0. The states from the l-th block, with the first token being g, are

(h_0^(l), ..., h_n^(l)) = TransformerBlock_l(h_0^(l−1), ..., h_n^(l−1)),   h_i^(0) = E_{y_i} + P_i,

where the word embedding E ∈ R^{|V|×d_model} and position embedding P ∈ R^{N_max×d_model} are defined as in the transformer model, respectively, and the first token embedding is the embedding E_g of the plot generation goal.

Controlled Policy Classifiers. The decoder outputs the action from the encoder states (h_0, ..., h_n) and the designed goal g through three controlled policy predictions: (1) Controlled Delete Prediction: this reads the controlled goal and each word of the input sequence and predicts whether to delete or keep the word,

π_del(d | i, y, g) = softmax(W_A h_i),

where g is the generation goal and W_A ∈ R^{2×d_model} is a trained parameter matrix; the goal verb and the boundary words are always kept and never deleted. We start the deletion positions from the second index, since the first token is the designed goal.
(2) Controlled Placeholder Prediction: it classifies the number of words to insert in each slot as a K_max-way classification:

π_plh(p | i, y, g) = softmax(W_B [h_i; h_{i+1}; E_g]),   (4)

where g is the generation goal and W_B ∈ R^{K_max×3d_model} is a trained parameter matrix. With the number (0 to K_max) of words predicted by this policy, the corresponding number of placeholders is inserted into the slot. In our implementation, the special token <PH> is a reserved word in the vocabulary added as the default placeholder token during generation or refinement.
(3) Controlled Word Prediction: after adding <PH> tokens to the slots, the model classifies each <PH>, in its slot context, into a word from the vocabulary:

π_tok(t | i, y, g) = softmax(W_C h_i),

where g is the generation goal and W_C ∈ R^{|V|×d_model} is the trained parameter matrix for word classification.
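The three heads are linear classifiers over the encoder states followed by a softmax. A minimal NumPy sketch follows; the dimensions are toy values, and feeding the placeholder head the concatenation [h_i; h_{i+1}; E_g] is our assumption to match W_B's 3·d_model input width:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d_model, V, K_max = 8, 100, 5  # toy sizes for illustration
rng = np.random.default_rng(0)
W_A = rng.normal(size=(2, d_model))          # delete head: keep vs delete
W_B = rng.normal(size=(K_max, 3 * d_model))  # placeholder-count head
W_C = rng.normal(size=(V, d_model))          # token (vocabulary) head

def pi_del(h_i):
    """Binary keep/delete distribution for one encoded token."""
    return softmax(W_A @ h_i)

def pi_plh(h_i, h_next, e_g):
    """Distribution over 0..K_max-1 placeholders for slot (i, i+1)."""
    return softmax(W_B @ np.concatenate([h_i, h_next, e_g]))

def pi_tok(h_i):
    """Vocabulary distribution for one placeholder position."""
    return softmax(W_C @ h_i)
```

Each head returns a proper probability distribution over its action space, from which the training objectives below pick out the expert actions.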
In summary, the learnable parameters of our model include the parameters of the transformer blocks and the decoding parameters (W_A, W_B, W_C).

2) CONTROLLED AND REINFORCED LEARNING
The Control-and-Edit Transformer is trained with imitation learning. Suppose there exists an expert policy π*; we train the Control-and-Edit Transformer agent to imitate it. The optimal expert policy is obtained by dynamic programming over the controlled input sequence and the ground-truth output sequence, or by random word dropping of the output sequence as in knowledge distillation. Given the plot sequences y_0 and y* and the controlled goal g, the objective is to maximize the following expectation:

L = R(v(y*), g) · [ L_del(y_0, y*, g) + L_ins(y_0, y*, g) ],

where R(v(y*), g) is the reward of the controlled goal g and the verb of the ground-truth plot sequence y*, L_del(y_0, y*, g) is the controlled delete learning objective, and L_ins(y_0, y*, g) is the controlled insert learning objective.

a: CONTROLLED DELETE LEARNING
The input plot sequence is either the initial input sequence y_0 or the insert-generated sequence from the previous iteration. Formally, the controlled delete learning objective is

L_del(y_0, y*, g) = E_{y′ ~ d_{π_del}} Σ_{y_i ∈ y′} log π_del(d_i* | i, y′, g),

where d* are the expert delete actions, and the roll-in distribution d_{π_del} is a mixture

d_{π_del} = δ_1 · 1{y′ = y_0} + (1 − δ_1) · 1{y′ = E(y″, p* ~ π*)},

where δ_1 ∈ [0, 1] is a mixture factor and y″ is the input sequence of the insert process.

b: CONTROLLED INSERT LEARNING
The input plot sequence is either the delete-generated sequence from the previous iteration or the result of randomly dropping words from the ground-truth output plot sequence y*. Formally, the controlled insert learning objective is

L_ins(y_0, y*, g) = E_{y′ ~ d_{π_ins}} [ Σ_i log π_plh(p_i* | i, y′, g) + Σ_i log π_tok(t_i* | i, y′_ins, g) ],

where y′_ins is the output of inserting the placeholders p* into y′, and the roll-in distribution d_{π_ins} is a mixture

d_{π_ins} = δ_2 · 1{y′ = E(y″, d* ~ π*)} + (1 − δ_2) · 1{y′ = π_RND(y*, u)},   u ~ Uniform[0, 1],

where u is drawn from the uniform distribution, δ_2 ∈ [0, 1] is a mixture factor, and π_RND randomly drops words from the ground-truth sequence y*.

c: EXPERT POLICY
The expert policy is derived from the optimal edit distance between the initial input plot sequence and the ground-truth output plot sequence, computed by dynamic programming. Formally, the expert policy π* with ground truth y* is

π*: a* = argmin_a D(y*, E(y, a)),

where D is the edit distance, which can be obtained by dynamic programming.
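The expert's optimal edit actions can be recovered by backtracing the edit-distance table. A sketch restricted to the model's action set of insertions and deletions (substitutions are decomposed into a delete plus an insert; names are illustrative):

```python
def expert_actions(y, y_star):
    """Backtrace the edit-distance DP table to recover one optimal sequence
    of keep/delete/insert decisions that turns y into y_star."""
    m, n = len(y), len(y_star)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                       # delete everything
    for j in range(n + 1):
        dp[0][j] = j                       # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if y[i - 1] == y_star[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # only delete or insert are allowed actions
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and y[i - 1] == y_star[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            ops.append(("keep", y[i - 1])); i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", y[i - 1])); i -= 1
        else:
            ops.append(("insert", y_star[j - 1])); j -= 1
    return list(reversed(ops))
```

The kept and inserted tokens, read in order, reproduce y_star exactly, so the backtraced actions are a valid supervision signal for the delete and insert policies.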

d: CONTROLLED REWARD SHAPING
For different output plot sequences and different designed goal verbs, we introduce different continuous reward weights for the sequence pairs of the training dataset. Formally, to construct the continuous weighted reward measure, we analyze the stories in the corpus with two sub-reward functions computed in preprocessing: (1) for each ground-truth output sequence, we measure the distance of each verb in the sequence from the goal verb in the story, and (2) for each ground-truth output sequence, we count the frequency of its verbs in the corpus. Analyzing the stories that contain the goal verb g, the statistical verb distance between a verb υ and the goal verb g is calculated as

D(υ, g) = (1 / |S_{υ,g}|) Σ_{s ∈ S_{υ,g}} (l_s − d_s(υ, g)),

where S_{υ,g} is the subset of stories in the corpus that contain υ before the goal verb g, l_s is the number of events in story s, and d_s(υ, g) is the number of events between υ and g in story s. For each story s, the closer the event verb υ is to g, the larger the reward produced. Analyzing the same stories, the statistical verb frequency of υ with respect to the goal verb g is calculated as

F(υ, g) = k_{υ,g} / N_υ,

where N_υ is the count of events with verb υ in the corpus, and k_{υ,g} is the count of their co-occurrences, subject to the precedence constraint that υ appears before g in the story. The total continuous reward for each verb in the ground-truth output sequence is

R(υ, g) = γ · D(υ, g) · F(υ, g),

where γ is a normalization constant.

IV. EXPERIMENTS
The effectiveness and flexibility of the Control-and-Edit Transformer on the neural plot generation task are evaluated in our experiments.

A. EXPERIMENTAL SETTINGS
1) DATASET
A corpus of movie plots from Wikipedia [27] was used in our experiments. We cleaned this corpus to remove extraneous Wikipedia syntax, such as links indicating which actors played which characters. The corpus contains 42,170 stories, with an average of 14.52 sentences per story. Our event translation process can not only extract a single event from a sentence but also pick up multiple events from one sentence. We extract multiple events when a sentence has more than one verb or contains conjunctions. For instance, from the sentence ''Henry and Jerry played basketball.'', our algorithm extracts two events: ⟨Henry, play, basketball⟩ and ⟨Jerry, play, basketball⟩.
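The conjunction case above can be sketched as a trivial post-processing rule. This is a naive illustration with names of our own choosing; real extraction would rely on a dependency parser to identify the ⟨subject, verb, object⟩ triple first:

```python
def split_conjoined_subjects(event):
    """Split an event whose subject is 'A and B' into one event per subject,
    mirroring the multi-event extraction described in the text."""
    subj, verb, obj = event
    return [(s.strip(), verb, obj) for s in subj.split(" and ")]
```

Applied to ("Henry and Jerry", "play", "basketball"), this yields the two events given in the example.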

2) MODELS
In our experiments, TensorFlow is used to train the encoder-decoder network. Both the encoder and the decoder consist of transformer units, with a hidden layer size of 2048, a word embedding size of 512, 8 attention heads, and 6 layers. The network was pre-trained for 300 epochs with mini-batch gradient descent.
We compare with three baseline models:
• (1) Seq2Seq [28]: our baseline; this model can be considered a ''generalized multiple sequential event2event'' model.
• (2) DRL-clustered [2]: starting from the Seq2Seq weights, training continues with the policy-gradient technique and reward function; the verb positions described in the previous section are clustered and restricted, while all network parameters are kept unchanged.
• (3) DRL-unrestricted [2]: the same as DRL-clustered, but with no lexical restriction when sampling verbs for the next event during training.
• (4) Our model: starting from the Seq2Seq weights and using the settings above.

3) EXPERIMENTAL SETUP
Every event in our dataset is a cue, and we generated stories with the above models. For all models, the generation process terminates when any of the following holds: (1) the model outputs an end-of-story token; (2) an event is generated with the target verb; or (3) the story is already 15 lines long. The goal achievement rate is the percentage of generated stories that contain the target verb. In addition, we compare the average generated story length with the average story length in the test data up to where the target event occurred (setting the length to 15 if it did not occur). Finally, we measured the perplexity of all models; the test data itself is excluded because it is not a model.
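The three termination conditions above can be expressed as a single predicate checked after each generated event (a minimal sketch; the end-of-story token name is our placeholder):

```python
END = "<EOS>"  # assumed name for the end-of-story token

def should_stop(story, goal_verb, max_len=15):
    """Stop generation on an end-of-story token, on an event containing
    the target verb, or once the story reaches max_len lines."""
    if story and story[-1] == END:
        return True
    if any(goal_verb in event for event in story):
        return True
    return len(story) >= max_len
```

The generation loop simply appends events until `should_stop` returns True.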

4) RESULTS AND DISCUSSION
We chose three verbs (marry, command, and drilling) as plot generation goals. The detailed results for the three goals are listed in Table 1. The verb 'drilling' performed best in BLEU-4, goal achievement, and perplexity. These results show our method performs consistently as a controllable and editable neural plot generator. The generation results for the three verbs are averaged in Table 2 for performance comparison; the overall comparison results are shown there. Only 39.92% of the stories in the test set end in our desired goals, which shows that our chosen goals were not frequent in the corpus. Our model generated the given goals on average 94.72% of the time, compared with 93.35% on average for the DRL-clustered model and 24.64% for the DRL-unrestricted model. Seq2Seq is a basic baseline for story generation, and DRL-clustered is a strong baseline in goal achievement rate. Our model outperforms this strong baseline by 1.37 percentage points and the DRL-unrestricted baseline by 70.08 percentage points. Compared with the strong baseline, our model offers new capabilities, such as controlled plot refinement and inverse goal-verb control during generation, thanks to the Control-and-Edit Transformer; see the case study.
Perplexity is a standard metric for how accurately a learned distribution predicts unseen data. We observe that perplexity drops substantially for the DRL models (7.05 for DRL-clustered and 9.78 for DRL-unrestricted) compared with the Seq2Seq baseline (48.06). Our model achieves the lowest perplexity at 5.76, outperforming the strong DRL-clustered baseline by 1.29 points. This low perplexity can be attributed to the Control-and-Edit Transformer together with the reward function, which is based on the distribution of verbs in the story corpus and improves the model's ability to recreate the corpus distribution. Because our rewards are based on subsequent verbs in the corpus instead of verb clusters, we obtain a lower perplexity while still frequently achieving the goal.
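As a reminder of how the metric is computed, perplexity is the exponential of the average negative log-likelihood per token (a minimal sketch with our own function name):

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities: exp(-mean(log p)).
    Lower values mean the model fits the held-out text better."""
    return math.exp(-sum(log_probs) / len(log_probs))
```

For example, a model that assigns probability 0.5 to every token has perplexity 2.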
The average generated story length is also important, since very short stories are trivial. However, if the story is too long, the control-and-edit model performs poorly at editing the goal verb during generation; see the case study on inverse goal-verb control. Even though our model's average story length is slightly shorter than that of the strong DRL-clustered baseline, our model can redirect story generation under inverse goal-verb control by editing the control verb and refining the story from the previous positive goal-verb control. This shows our model achieves a good trade-off between average story length and generation speed when refining the goal verb and the generated story. In addition, Table 3 shows that our model achieves a better BLEU-4 score than the Seq2Seq baseline.

B. HUMAN EVALUATION
Human evaluation is the most intuitive and convincing standard for evaluating plot generation. We first convert event sequences into natural language and then give the generated plots to human judges. Because the generated plot events are translated by hand into concise, grammatically and semantically sound sentences, the human evaluation assesses the original events rather than the creativity of the sentence writing.
We invited 100 participants to give fair evaluations of the generated stories. Each participant was given one translated plot at a time and rated each of seven statements (beginning with 1. GRAMMATICAL CORRECTNESS) on a scale of 1-5 indicating agreement (the higher the score, the stronger the agreement). These seven standards follow a protocol designed to evaluate the performance of generated stories [29]; participants scored each story higher the better it satisfied each indicator. The detailed scores the models achieved on the 7 aspects are shown in Figure 3. As the figure shows, our model performs better than Seq2Seq and DRL; in particular, the 100 participants gave a high score for local causality, which shows our model pays more attention to logical relationships when reaching the goal verb.

C. CASE STUDY
In this part, we provide one successful and one failure case analysis for the benefit of future work. In our experiments, we set the verb 'love' at the beginning of plot generation and change the verb to 'hate' during generation.
We pick one successful and one failure story to show our generator's performance. The first story is: 'Tom and Mary met at a party. Mary's charming appearance and elegant temperament attracted Tom. Under Tom's pursuit, they fell in love. However Tom went out with his friend Lucy behind Mary's back. Once Mary's good friend June saw him in a restaurant. June told Mary, Mary was heartbroken and hated Tom. Now she is deciding to break up with Tom.' This story makes common sense, and the plot develops in the given direction: the protagonist Mary first fell in love with Tom and then hated him.
The first story is a successful one. The other story is: 'Henry is riding a bicycle in the park when he hears a girl calling for help. He looks back and finds that a girl is drowning. Henry immediately stops the car, takes off his clothes and jumps into the river to save the girl. Henry raised the girl on his shoulder and successfully rescued her ashore. The girl introduced her name as Mary and found that they both graduated from the same high school. In order to thank Henry for his help, the girl left his contact information and wanted to invite him to dinner together in the future. After several dates, Mary finds herself in love with Henry. In the end, they became a sweet couple.' This story is a failure because it ends with 'love', not 'hate'. Our analysis suggests the story may be too long: the target verb 'love' is generated so late that there are not enough remaining tokens with which to edit the plot. This failure inspires us to study the speed of story generation.

V. CONCLUSION
In this paper, we presented a novel approach, the Control-and-Edit Transformer, for controllable and editable neural story plot generation. The scope of this approach is the assumption that a control goal verb exists as the ending verb of the story. Unlike previous methods for controllable neural plot generation, the proposed method performs flexible user-assisted plot generation, such as removing and adding plot elements through plot refinement, via a novel edit-based decoding trained by imitation learning. The ablation analysis shows that our proposed components are significant. In future work, we still need to build a comprehensive user-assisted neural generation tool for real-world applications.