A System for Automatic English Text Expansion

We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, “automatic” means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation.


I. INTRODUCTION
Natural language generation (NLG) systems [1] take quantitative, visual and linguistic data (words, sentences, texts) as input. In particular, we are interested in text-to-text expansion, which generates complete sentences or even texts from some meaningful words. A text expansion system must add elements like conjunctions and prepositions to transform the input words into linguistically correct outputs. For example, given the input words 'she', 'look', 'picture', 'yesterday', 'not', the output might be 'She did not look at the picture yesterday'. This is useful in Augmentative and Alternative Communication (AAC) [2], for instance. In this case, a user would select pictograms corresponding to input words on a tablet to obtain the equivalent linguistically correct written and oral output.
Our approach, which is modular and extensible to other languages, is based on our previous work on automatic NLG systems for Spanish [3], [4]. We have improved its modularity so that domain-dependent components can be easily replaced.
The rest of this paper is organised as follows. In Section II we review the state of the art, paying special attention to English NLG, automatic generation and text expansion. In Section III we describe our contribution, a system for English text expansion based on the aLexiE English lexicon (Section III-A), a grammar (Section III-B), and a sentence planner and a surface realiser (Section III-C). In Section IV we present our evaluation results, based on a widely used corpus in the NLG field (Section IV-B1) and an ad-hoc Spanish-English parallel corpus (Section IV-B2). Finally, Section V concludes the paper.

II. RELATED WORK
NLG generally comprises well-defined sub-tasks [1], [5]: content determination, text structuring, sentence aggregation, lexicalisation (expressing information with the right words), referring expression generation (domain object identification), and linguistic realisation (text correctness). Content determination filters irrelevant information. It is context- and application-dependent. For this sub-task, data-driven techniques have been proposed [6].
Semantic- and syntactic-level sentence aggregation joins data into single fluent and readable sentences [15].
Lexicalisation transforms the result of sentence aggregation into natural language (NL), but it must choose between different expression alternatives in NL. Generally, results improve with more alternatives [16].
Referring Expression Generation (REG) selects the phrases or words that can unambiguously describe domain entities. It selects the best properties to distinguish elements, and it discards irrelevant information. REG algorithms include the Full Brevity Algorithm [17], the Greedy Heuristic [18], [19], and the Incremental Algorithm [20].
Linguistic realisation comprises the generation of morphological forms as well as the insertion of auxiliary verbs, prepositions and punctuation marks. It may fall into the generation gap, because input data are often incomplete (i.e. they lack elements to be syntactically comprehensible) [21]. Templates avoid this gap and ensure consistency. However, templates that automatically transform data into text (typically used in applications such as weather, traffic, sports, and health reporting) only yield better results than other approaches in small domains with little variation. Hand-coded grammar-based systems outperform templates when detailed input is available, as in the case of KPML 1 [9]. Other alternatives include statistical methods that produce probabilistic grammars from large corpora, increasing coverage with less effort [22], such as the Head-driven Phrase Structure Grammar (HPSG) [23], the Lexical-Functional Grammar (LFG) [24], and the Tree-Adjoining Grammar (TAG) [25], as well as some Deep Learning approaches [26]. We employ a hybrid system that combines the advantages of stochastic and grammar-based systems with low NLG complexity, since it uses only keywords to generate complete sentences.
Intelligent NLG architectures may be modular (roughly following the previous stages of macro-planning and text structuring, micro-planning or sentence aggregation, lexicalisation, referring expression generation, and linguistic realisation with syntactic and morphological rules); planning-oriented (which are less modular); or data-driven, based on statistical learning. Rule-based (or template-based) approaches [27], [28], however, are currently the most widespread in real applications.
Output texts may be informative texts, summaries, dialogues, recommendations, and persuasive or creative writing. Most informative text systems generate routine information from quantitative data [29]-[31]. Summary generation [32] has applications in areas such as medicine, sports, and finance. Persuasive texts are intended to shape user behaviour [33]. Dialogue systems, of interest for call centres or gaming interfaces, focus on human-machine communication [34], [35]. Creative text generation is extremely difficult. It has been demonstrated that predefined templates are too rigid for it [36]. Affective NLG systems have been able to generate texts beyond factual information, such as poetry [37], but the operator has no control over the process. The quality of the results in NLG is measured in terms of adequacy, fluency, readability, and variation [38]. There is a trade-off between efficiency and output quality.
Next we review the most relevant existing systems. SimpleNLG [39] performs surface realisation by following a knowledge-based approach. Originally for English, it is now available for German [40], French [41], Brazilian Portuguese [42], Italian [43], Spanish [44], [45], Dutch [46], Mandarin [47], and Galician [48]. This library has had a strong impact in NLG due to its simple usage. Its main disadvantage from the perspective of our research is its manual operation. The NaturalOWL [49] data-to-text manual tool imposes a complex input format. It generates texts from the OWL ontology. SumTime [50], [51], also for data-to-text generation, is highly sensitive to language variations and, thus, it is only adequate for the language it was designed for.
The system in [52] comprises a trainable sentence planner and a probabilistic surface realiser. Its modular design is similar to ours, but it is only available for English and it is not easily adaptable to other languages, since certain language-dependent resources are required to train the surface realiser (a TAG grammar, the source of the supertags to annotate the semi-specified TAG derivation tree, a treebank to obtain the tree stochastic model driving the tree chooser, and a corpus of sentences to train the language model required for the linear precedence chooser). Our system also needs two language-dependent elements, the lexicon and the grammar, but they can be easily replaced and we provide enough information to create or adapt these elements to any domain.
In the particular field of English text expansion applied to AAC, we must mention the work by [53], in line with automatic generation. Their sentence compansion technique [54], [55] 2 takes a compressed message and expands it into a well-formed sentence. In practice it is useful as a writing assistant or a conversational aid in situations where grammatically correct output is desired. Its bottleneck is its generation method, based on a unification-type grammar that needs to explore many possibilities to deliver its output. Thus, it is highly time consuming. Besides, most of its operation is based on markers linked to a lexicon, as in the case of plural forms and possessives (the system might interpret the noun 'apple' followed by a plural marker as 'apples'; the pronoun 'I' followed by a possessive marker would be interpreted as 'my'). Input words must be present in the lexicon, since they are taken from a word board and the user is not free to enter them. To avoid this constraint we have designed an automatic procedure for lexicon acquisition. It does not need markers, so the entire generation process is much faster.
Summing up, most NLG systems are purpose-built and, as such, they are highly sensitive to problem characterization. In contrast, thanks to the modularity of our system we can isolate domain-dependent modules (grammar and lexica) from domain-independent ones (NLG surface realiser). It can be tailored to different applications and domains using the corresponding syntactic structures and vocabularies. It can be easily extended to other languages as well.
As far as we know there are no other systems for automatic text-to-text expansion in English with a hybrid architecture.

III. METHODOLOGY AND ARCHITECTURE OF THE PROPOSED SYSTEM

A. ALEXIE LEXICON
This section describes the morphological part, that is, the lexicon providing linguistic knowledge. This is our first contribution.
We pursue a fully automatic NLG system. It must select the grammar structure for the input words and their inflection. Therefore we need an ample vocabulary with linguistic data. The aLexiE lexicon serves this purpose. We created it by interpreting input resources and automatically (without human supervision) merging them with the two-step methodology in [56] followed by a final verification step (similarly to [4]):
1) Extraction of all possible entries and translation to a common format (Algorithm 1).
2) Automatic comparison and combination of existing lexica to create the new resource (graph unification in [57] and [58]).
3) Lexical verification of extracted and translated entries and their categories against the Merriam-Webster Dictionary 3 (MWD) (Algorithm 2).

1) Linguistic resources and creation of aLexiE
We built aLexiE from free English linguistic resources. We prioritized correctness and coverage and selected the following:
• The morphological and syntactic English lexicon from the Alexina Project 4 (EnLex) [59].
• The Specialist Lexicon 5 (NIH) of medical terms and everyday words.
• The Freeling English dictionary 6 (EN-FREELING), automatically extracted from WSJ (Wall Street Journal) and other corpora.
We performed extraction and mapping independently. Once the information was taken from the selected resources, it was transformed to a common format. Unlike for the Spanish version in [4], input resources were unrelated, and thus we conducted independent extraction and translation stages for each selected resource.
We first extracted entries from EnLex tagged as noun, pronoun, verb, adjective, adverb, determiner, conjunction, and preposition, ignoring interjections, numerals, and proper nouns. Each resulting EnLex word entry was translated to the extensional Alexina format [60]. As an example, Listing 1 illustrates the EnLex entry for the English lemma 'picture'. It can be observed that this noun (cat=n) has two forms, 'picture' and 'pictures', respectively for masculine singular and plural.
LISTING 1: Example of the English lemma 'picture' in EnLex.
picture N2 100;Lemma;N;;cat=N;%default # dela+multext+init
<table name="N2" rads=".*">
  <form suffix="" tag="s"/>
  <form suffix="s" tag="p"/>
</table>

We did the same for the other two linguistic resources that we selected: NIH 7 and the Freeling English Dictionary 8.
We kept the present, past, present participle, and past participle forms of English verbs. This information allowed adjusting the verbal tense to context-dependent semantic features. In the case of adjectives, we did not save the comparative and superlative forms (we leave comparative and superlative clauses to future work).
The merging process must handle the different formats and tags of word entries in the selected resources. Algorithm 1 converts them to a common format (note: e is an entry in a lexicon). Verification in Algorithm 2 checks the quality of the word entries. It looks for each lemma and its lexical categories in MWD (this dictionary has the advantage that it allows more web queries than other freely available online dictionaries).
Finally, collected entries are merged in a combination step that applies the graph unification in [57] and [58]. This operation validates common information by integrating data of different nature and discarding inconsistent information. Specifically:
1) It joins all entries with a common lemma (homography is only considered for different lexical categories).
   a) For the entries that result from (1), feature structures are unified.
   b) Next, a new aLexiE entry is created with these structures. The entry comprises all common information plus any particular data in the source entries.
2) A new aLexiE entry is created for any lexical entry that cannot be generated by combining entries from other lexica.
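The combination step above can be illustrated with a minimal sketch. It assumes entries are flat dictionaries keyed by feature name, which is a simplification of the graph unification in [57] and [58]; the function names `unify` and `merge_lexica` and the dictionary layout are hypothetical, and an entry that conflicts with the accumulated one is simply skipped here.

```python
from collections import defaultdict


def unify(a, b):
    """Unify two flat feature structures; return None if any feature conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # inconsistent information: unification fails
        merged[key] = value
    return merged


def merge_lexica(*lexica):
    """Group entries by (lemma, category) and unify the features of each group."""
    groups = defaultdict(list)
    for lexicon in lexica:
        for entry in lexicon:
            groups[(entry["lemma"], entry["category"])].append(entry)

    merged = []
    for entries in groups.values():
        result = entries[0]
        for other in entries[1:]:
            unified = unify(result, other)
            if unified is not None:
                result = unified  # simplification: conflicting entries are skipped
        merged.append(result)
    return merged


# Toy entries (hypothetical, not taken from the real resources).
enlex = [{"lemma": "picture", "category": "noun", "plural": "pictures"}]
nih = [
    {"lemma": "picture", "category": "noun", "number": "singular"},
    {"lemma": "picture", "category": "verb"},
]
for entry in merge_lexica(enlex, nih):
    print(entry)
```

Grouping by the pair (lemma, category) mirrors the remark that homography is only considered for different lexical categories: the noun and verb readings of 'picture' stay separate entries.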
Algorithm 2: Verification algorithm

We remark that the common extraction and translation format avoids inconsistencies in this merging procedure. Algorithm 3 combines all steps in this section so far.
Therefore, aLexiE was built from inputs extracted from previously existing resources, which were merged into a common format and finally verified. Listing 2 shows the result for the lemma 'picture'.
Note that, in Listing 2, this lemma is semantically tagged as an object. We used the Multilingual Central Repository 9 (MCR) [61] to get the semantic classification of nouns in aLexiE. Algorithm 4 summarises this procedure.
Due to its size, indexing aLexiE allows our system to conduct the whole NLG process much more quickly.
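As an illustration of why indexing helps, the following sketch loads entries that follow the layout of Listing 2 and builds an in-memory index keyed by lemma, so that each lookup during generation is a dictionary access rather than a scan of the file. The function name `build_index` and the choice of a plain Python dictionary are assumptions for illustration only; the source does not describe the actual indexing mechanism.

```python
import xml.etree.ElementTree as ET

# A one-entry lexicon in the layout of Listing 2.
LEXICON_XML = """<lexicon>
  <word>
    <lemma>picture</lemma>
    <category>noun</category>
    <number>singular</number>
    <plural>pictures</plural>
    <semantic_tag>object</semantic_tag>
  </word>
</lexicon>"""


def build_index(xml_text):
    """Map each lemma to the list of its entries (one per lexical category)."""
    root = ET.fromstring(xml_text)
    index = {}
    for word in root.findall("word"):
        # Each child element (<lemma>, <category>, ...) becomes one feature.
        entry = {child.tag: (child.text or "").strip() for child in word}
        index.setdefault(entry["lemma"], []).append(entry)
    return index


index = build_index(LEXICON_XML)
print(index["picture"][0]["plural"])
```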

2) Automatic extension of the lexicon
NLG can be simplified by avoiding inputs with little meaning, such as prepositions. Consequently, it is necessary to infer a priori which preposition follows a particular verb. Indeed, a major challenge in text expansion is inferring missing prepositions. We trained this process from the text in the English Wikipedia, which was previously POS-tagged with Spacy Tagger 10. The language model for this training was based on trigrams centered around verbs, using syntactic and semantic information.
LISTING 2: Example of the English lemma 'picture' in aLexiE.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<lexicon>
  <word>
    <lemma>picture</lemma>
    <category>noun</category>
    <number>singular</number>
    <plural>pictures</plural>
    <semantic_tag>object</semantic_tag>
  </word>
</lexicon>

As previously mentioned, we used MCR to get the semantic classification of the entries tagged as nouns in aLexiE. In this case we established four semantic categories to start working with: living things, foodstuff, places, and objects. For each verb lemma in the training set we analysed whether it was followed by a preposition and a noun or by a determiner and a noun. We computed the probability by semantic category. Let us consider the entries in Table 1. Regarding the verb lemma 'look' and the semantic tag object, the preposition 'at' has the highest probability to go in between according to Table 2.
In this way, the language model, along with the semantic classification, allows us to infer the most suitable preposition after a verb by applying semantic knowledge rather than by only considering morphological forms.
Listing 3 shows the aLexiE entry for the verb 'look'. Note how the language model has learned to add the preposition 'at' when the verb is followed by an object (semantically speaking) or 'for' in the case of foodstuff or a place. If 'look' is followed by a living thing, the system will add the preposition 'like'. In the running example in this section, since 'picture' is tagged as an object in aLexiE, and given the syntactic and semantic data in the 'look' entry, the system will insert the preposition 'at' between the two words if they are provided as input.
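The preposition inference described in this section can be sketched as a conditional frequency estimate: for each pair (verb lemma, semantic tag of the following noun), count which preposition, if any, appeared in between, and pick the most frequent one. The observation triples and their counts below are invented toy data, not values from Table 1 or Table 2, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(observations):
    """observations: iterable of (verb_lemma, preposition_or_None, semantic_tag)."""
    counts = defaultdict(Counter)
    for verb, prep, tag in observations:
        counts[(verb, tag)][prep] += 1
    return counts


def probability(counts, verb, tag, prep):
    """Relative frequency of `prep` between `verb` and a noun tagged `tag`."""
    options = counts.get((verb, tag), Counter())
    total = sum(options.values())
    return options[prep] / total if total else 0.0


def best_preposition(counts, verb, tag):
    """Most frequent preposition for the pair, or None if the pair is unseen."""
    options = counts.get((verb, tag))
    if not options:
        return None
    return options.most_common(1)[0][0]


# Toy training data (hypothetical counts).
toy = (
    [("look", "at", "object")] * 3
    + [("look", None, "object")]
    + [("look", "for", "place")] * 2
)
model = train(toy)
print(best_preposition(model, "look", "object"), probability(model, "look", "object", "at"))
```

Using `None` as a possible outcome lets the same table express that a verb usually takes a direct complement for a given semantic category, in which case no preposition is inserted.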

B. SYNTACTIC STRUCTURE SUPPORTED BY A GRAMMAR
In this section we describe the syntactic stage of our system. It performs syntactic structuring with the Definite-Clause Grammar (DCG) [62]. Syntactic structuring, also called parsing, creates the tree structure of the desired target sentence. We infer this structure by checking the syntactic trees from the grammar for the input words. Diverse syntactic structures may result, depending on the roles of the input words in the sentence. Fig. 1 shows the syntax trees from the grammar for the input words 'she', 'look', 'picture', which correspond to 'She looks at the picture' and 'She looks the picture'. Context-Sensitive Languages (CSL) are created from this type of grammar. The system picks the most appropriate trees for the input words given the different possibilities within a grammar.
For the English case, we adapted the simple grammar with a wide range of basic sentences described in [4], by adding all its linguistic features, such as adjectives preceding nouns. The system can parse sentences regardless of their complexity. Sentence types may be affirmative, negative, interrogative (in positive or negative form) or imperative (in positive or negative form), including some of the following features: a nominal syntagm subject; a coordinated nominal syntagm subject (compound subject); a nominal syntagm direct complement; a coordinated nominal syntagm direct complement; an indirect complement; and other place or time complements. In our notation, upper case corresponds to tree structures and lower case to word components. Fig. 1 illustrates some linguistic rules taken from the grammar.

2) Rules for adjectival/adverbial/prepositional syntagms
In this case, an adjectival or adverbial syntagm, which consists of an adverb followed by an adjective or vice versa, may precede or follow a noun. Noun-noun modifiers, as in 'car door', are not considered. In prepositional syntagms (PS), unless empty, a preposition precedes a nominal syntagm. Prepositional syntagms just follow (never precede) a nominal syntagm.

3) Predicate rule
The sentence predicate contains a verb that may be followed by a nominal syntagm (coordinated or not). The verb may be accompanied by an adjectival/adverbial syntagm. Note that a verb can be followed by two nominal syntagms (yet not coordinated ones, that is, without a conjunction in between), as in the sentence 'She gave me a cookie yesterday'.

4) Sentence rule
Sentences are composed either of a nominal or coordinated nominal syntagm (subject) and a predicate, or of a single predicate (without subject). The latter is quite common in imperative English clauses. Given the relations among syntagms, the depth level of our system is limited to two iterations to reduce computational load. For example, in case a nominal syntagm includes a prepositional syntagm, the second nominal syntagm cannot contain another prepositional syntagm (to avoid recursion).

C. PROPOSED NLG LIBRARY: SENTENCE PLANNER AND SURFACE REALISER
The input words for our NLG library should be meaningful, such as adjectives, nouns, and verbs.The library can automatically infer the determiners, conjunctions, and prepositions that complement those input words in the output sentence.
Fig. 2 shows our two-stage architecture, an automatic NLG processing pipeline. The user introduces the words (plus the symbol '?' for interrogative sentences) in [subject, verb, object] (SVO) order, which is not limiting in practice in many domains. The first stage, the Sentence Planner, performs lexicalisation, which adds words and configures sentences. The second stage, the Surface Realiser, which is our main contribution, introduces any extra elements that may be necessary and applies morphological inflections to produce grammatically correct and coherent sentences. Fig. 3 represents an example of generation using the library. Fig. 4 shows the flowchart of our NLG library. The main tasks are the following:

1) Detection of the linguistic structure (affirmative, negative or interrogative) of the sentence (Sentence Planner)
The sentence is considered negative if one of the input words is the negation adverb 'not'. It is treated as interrogative if the input words include a question mark. If both elements are present, the system generates a negated question. The sentence is considered affirmative in any other case. This is the case in our example in Fig. 3. The library also adds extra elements corresponding to any linguistic realisations in the grammar and the knowledge in aLexiE.
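The detection rules of this first task are simple enough to sketch directly. The function below is a hypothetical illustration, not the library's interface: it returns the sentence type together with the remaining content words, on the assumption that the markers 'not' and '?' are consumed at this point.

```python
def detect_sentence_type(words):
    """Classify the input as affirmative, negative, interrogative or both."""
    negative = "not" in words
    interrogative = "?" in words
    if negative and interrogative:
        kind = "negative interrogative"
    elif interrogative:
        kind = "interrogative"
    elif negative:
        kind = "negative"
    else:
        kind = "affirmative"
    # Assumption: the markers themselves are not passed on as content words.
    content = [w for w in words if w not in ("not", "?")]
    return kind, content


print(detect_sentence_type(["she", "look", "picture", "yesterday", "not"]))
print(detect_sentence_type(["where", "my", "glasses", "be", "?"]))
```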

2) Subject insertion (Sentence Planner)
Imperative sentences and other sentences with elided subjects are quite common in English, for example 'Go to your room'. We want the NLG process to be almost transparent to end users and, thus, if the user does not provide a subject, the library takes the personal pronoun 'I' as such. Besides, it generates a second option with elided subject in case the user wants to create an imperative clause. In our example in Fig. 3 and Fig. 4, the library does not include a subject because the user introduces 'she' as input. This is detected thanks to the grammar.

3) Inference of syntax structure (Sentence Planner)
The separation between subject and predicate simplifies the identification of the best syntactic trees for the input words, since they are smaller. Once the Sentence Planner decides the type of sentence, the library sets the boundary between the subject and the predicate, taking the main verb as a reference, and then searches for the best syntactic structure that matches them. For this purpose we follow a Depth-First Search (DFS) [63] in our grammar, starting from the input words. If some of the input words are not in the aLexiE lexicon, they are treated as proper nouns. This means no inflections are applied to them.
In our example in Fig. 1, the system infers two possible syntax structures (options 1 and 2).

4) Inclusion of extra elements (Sentence Planner)
Once the syntactic structure is decided, some extra elements such as determiners, prepositions, and conjunctions may be necessary. These elements are inserted in the sentence if they are associated with feasible grammar realisations. In our example in Figs. 3 and 4, the library adds to the output the extra elements that were inferred in the previous stage from the grammar.
FIGURE 3: Sentence generation with our two-stage NLG library
FIGURE 4: Flow chart of the NLG procedure

5) Morphological inflections (Realiser)
This encompasses the inflections that are necessary to produce a grammatically correct sentence, in which the subject dictates the morphological conjugation, person, number, and gender inflections of the verb and other components.
The library distinguishes between subject and predicate before generating a sentence. In this regard, it can apply linguistic features to adjust person, gender, and number, to create sentences with a coordinated subject. For example, given the input words 'caregiver', 'I', 'eat', 'apples', the subject of the resulting sentence 'The caregiver and I eat apples' is compound.
First, person, gender, and number features must be derived from the input words. The subject (expected to be a nominal or coordinated nominal syntagm) determines them. Continuing with the running example 'The caregiver and I eat apples', the subject is a coordinated nominal syntagm. The first nominal syntagm within it is composed of the determiner 'the' and the singular noun 'caregiver'. The second is the pronoun 'I'. Consequently, the person and number of the sentence are first person and plural, and the verb 'eat' is inflected accordingly. By default, the first time the user introduces input words, the library takes masculine gender, singular number and first person. Then, using aLexiE, it adapts these features with grammar rules. For instance, if the subject is a coordinated nominal syntagm, the output sentence is plural. Regarding gender, the output sentence is only feminine if all subject components are so 11. The following rules are applied in strict order to adjust the person feature: (1) if the subject has an

TABLE 3: Results of our NLG library
Input words | Best output sentence/s
something, be, not, right | Something isn't right.
where, my, glasses, be, ? | Where are my glasses?
dinner, be, good, last, night | Dinner was good last night.
appreciate, your, help, concern | I appreciate your help and concern.
live, yellow, house | I live in the yellow house.
how much, stamps, be, these, days, ? | How much are stamps these days?
final, grades, be, available, after, class, today, ? | Are final grades available after class today?

aLexiE contains the number and gender inflections of all lemmas and the person features of verbs and pronouns. Once these features are decided, they are applied to all word inputs. However, in case the subject is missing, default features must be set (as previously mentioned).
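The number and gender rules stated above (defaults, plural for coordinated subjects, feminine only if every component is feminine) can be sketched as follows. The person rules are deliberately left out because their ordered list is not fully available here. The function name and the per-component dictionaries are hypothetical; in the described system these features would come from aLexiE.

```python
def subject_features(components):
    """components: one dict per nominal syntagm in the subject,
    each with 'number' and 'gender' (hypothetical layout)."""
    if not components:
        # Defaults stated in the text for a missing subject.
        return {"person": 1, "number": "singular", "gender": "masculine"}

    # A coordinated subject (more than one syntagm) is always plural.
    if len(components) > 1:
        number = "plural"
    else:
        number = components[0].get("number", "singular")

    # Feminine only if all subject components are feminine.
    all_feminine = all(c.get("gender") == "feminine" for c in components)
    gender = "feminine" if all_feminine else "masculine"
    return {"number": number, "gender": gender}


# 'The caregiver and I': two singular syntagms yield a plural subject.
caregiver = {"number": "singular", "gender": "masculine"}
pronoun_i = {"number": "singular", "gender": "masculine"}
print(subject_features([caregiver, pronoun_i]))
```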
The verbal tense of the output sentence is present unless a time adverb or a time adverbial locution (e.g. 'last week') is provided. For example, for the adverb 'yesterday', the tense of the output sentence is past. This linguistic information can also be found in aLexiE.
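A minimal sketch of this tense rule is given below. The table `TIME_EXPRESSIONS` stands in for the information that the text says is stored in aLexiE; its contents, in particular the 'tomorrow' entry, are assumptions for illustration. Two-word locutions are matched by also looking at adjacent word pairs.

```python
# Hypothetical stand-in for the time expressions stored in aLexiE.
TIME_EXPRESSIONS = {
    "yesterday": "past",
    "last week": "past",
    "last night": "past",
    "tomorrow": "future",  # assumption, not stated in the text
}


def infer_tense(words):
    """Return the tense implied by a time adverb or locution, else 'present'."""
    lowered = [w.lower() for w in words]
    # Candidates: single words plus adjacent pairs, to catch locutions.
    candidates = lowered + [" ".join(pair) for pair in zip(lowered, lowered[1:])]
    for expression in candidates:
        if expression in TIME_EXPRESSIONS:
            return TIME_EXPRESSIONS[expression]
    return "present"


print(infer_tense(["she", "look", "picture", "yesterday"]))
print(infer_tense(["dinner", "be", "good", "last", "night"]))
```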
The library handles contraction spellings as well. We implemented those from a freely accessible list 12.
In case a word is missing in the lexicon, no related features are available or they cannot be inflected, our library treats the word as a proper noun.
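This fallback can be sketched as a lookup followed by capitalisation of the initial letter. The set-based lexicon and the function name are hypothetical; the sketch only reproduces the observable behaviour discussed in the evaluation, where unknown words surface with an initial capital.

```python
def mark_proper_nouns(words, lexicon):
    """Capitalise the first letter of every word that has no lexicon entry.

    lexicon: set of lower-cased known lemmas and forms (hypothetical layout).
    """
    result = []
    for word in words:
        if word.lower() in lexicon:
            result.append(word)
        else:
            # Only the first letter changes; the rest of the word is kept as is.
            result.append(word[:1].upper() + word[1:])
    return result


known = {"i", "need", "new", "book"}
print(mark_proper_nouns(["I", "need", "new", "harry", "potter", "book"], known))
```

The same rule explains both the desirable outcome (unknown names are capitalised) and the failure mode in which an ordinary word that is absent from the lexicon is capitalised as well.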
When generating a sentence, it is necessary to create its syntagms and join them while respecting their syntactic and semantic function. For example, to create 'She looks at the picture' from the input words 'she', 'look' and 'picture', it is necessary to generate the nominal syntagm 'the picture' and integrate it into a prepositional syntagm as 'at the picture'. It is also necessary to build the subject of the sentence 'she' and a predicate with 'look' as the main verb. Finally, it is necessary to integrate the subject and the predicate with the prepositional complement in the output sentence ('She looks at the picture'). All these stages are automatic, even the inclusion of the preposition, based on the syntactic and semantic information in aLexiE.
Table 3 shows examples of automatically generated sentences with increasing linguistic complexity. Alongside each example we indicate the input words.

IV. EXPERIMENTAL RESULTS
First, we compared the aLexiE lexicon with other state-of-the-art lexica (Section IV-A). Then, we evaluated automatic English text expansion with our system (Section IV-B), both directly (by regenerating corpus sentences) and manually (from annotations) using a widely used corpus in the NLG field (Section IV-B1). To the best of our knowledge, there is no other system for automatic text expansion. Therefore, we have created the English version of a corpus (Section IV-B2), to compare automatic text expansion in a multilingual scenario.

A. LEXICON
There are several resources for statistical natural language processing and corpus-based computational linguistics 13. The main differences between aLexiE and those freely available online resources are coverage, correctness and completeness of linguistic information (morphology, syntax, and semantics).
Table 4 shows the information we combined from the selected resources to create aLexiE. According to [64], EnLex contains 508,000 unique lemmas, corresponding to 695,000 unique forms. We only extracted some entries, yielding 212,021 unique lemmas that correspond to 41.74% of the lexicon. NIH contains 505,145 unique lemmas and 955,564 inflected forms. In this case we only extracted 67,660 unique lemmas that correspond to 13.39% of the lexicon. Specifically, we only extracted adjective, adverb, conjunction, determiner, preposition, and pronoun entries, since other entries had no associated morphological information. Freeling for English contains 37,000 unique lemmas, corresponding to 68,000 unique forms. We only extracted adjectives, adverbs, and verbs (for the same reason as for NIH), producing 14,368 unique lemmas corresponding to 38.83% of the original set.
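The extraction percentages in this paragraph follow from dividing the extracted lemma counts by the total lemma counts of each resource. The short check below recomputes them from the figures quoted above; the dictionary and function names are only illustrative.

```python
# (extracted unique lemmas, total unique lemmas) as quoted in the text.
RESOURCES = {
    "EnLex": (212_021, 508_000),
    "NIH": (67_660, 505_145),
    "Freeling": (14_368, 37_000),
}


def coverage(extracted, total):
    """Percentage of a resource's lemmas that were extracted, to two decimals."""
    return round(100 * extracted / total, 2)


for name, (extracted, total) in RESOURCES.items():
    print(f"{name}: {coverage(extracted, total)}%")
```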
Table 5 shows the sources for the semantic classification of the nouns in aLexiE. We only searched FrameNet 14 for the lemmas of the nouns that were missing in MCR. Table 6 shows the number of lemmas after merging and verification. Table 7 shows the lexical categories of the lemmas and forms in aLexiE. Most were tagged as nouns (77,966), yielding over 141,000 aLexiE inflected forms. Determiners and pronouns were revised manually to include plural forms for the lemmas 'this' and 'that'.

B. ENGLISH TEXT EXPANSION
As mentioned above, we evaluated our system by extracting keywords from sentences from a widely used corpus in the NLG field (Section IV-B1) and an ad-hoc Spanish-English parallel corpus (Section IV-B2). We evaluated output quality in terms of completeness, correctness and similarity to the original sentence. We decided to discard some common state-of-the-art metrics such as ROUGE [65] and BLEU [66], because they weakly reflect human assessment of NLG, as discussed in [67].

1) AAC corpus
Even though our approach may be used for general NLG scenarios, we chose an AAC corpus 15 for our first evaluation due to the interest of AAC as a representative real application. Some AAC tools such as Talk Together 16 and LetMe Talk 17 have small vocabulary packages with rigid interactions. None of them generates messages taking morphological, syntactic, and semantic information into consideration. The interest of NLG for AAC is illustrated by several previous works [68]-[70]. First we selected sentences without commas or hyphens, to ensure that there was a single sentence/idea in a clause (our system could handle multiple ideas as separate sentences). Since our system performs NLG automatically, we selected sentences in present, past, and future tense because these can be inferred from time adverbial complements. We then filtered the result to obtain the main words (adjectives, adverbs, nouns, pronouns, proper nouns, and verbs). Next we lemmatised all those words except the nouns and pronouns. If we had lemmatised the latter, the system would have had no clue to generate a sentence with a plural noun or pronoun, since the features of these particular words are independent from other components of the sentence (conversely, adjectives depend on the noun they modify). For this purpose we used the Spacy syntactic parser. The resulting dataset had 1,869 English sentences and their main words.
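The keyword extraction described in this paragraph can be sketched on tokens that have already been tagged and lemmatised. To keep the example self-contained, it takes (surface form, lemma, part-of-speech) triples as input instead of calling a parser; the coarse tag names follow the Universal POS convention and are an assumption, as are the function names.

```python
# Main word classes listed in the text, as Universal POS tags (assumed mapping).
CONTENT_POS = {"ADJ", "ADV", "NOUN", "PRON", "PROPN", "VERB"}
# Nouns and pronouns keep their surface form so that number is not lost.
KEEP_SURFACE = {"NOUN", "PRON"}


def is_candidate(sentence):
    """Keep only sentences without commas or hyphens."""
    return "," not in sentence and "-" not in sentence


def main_words(tagged_tokens):
    """tagged_tokens: iterable of (surface, lemma, pos) triples."""
    words = []
    for surface, lemma, pos in tagged_tokens:
        if pos not in CONTENT_POS:
            continue  # drop determiners, prepositions, punctuation, ...
        words.append(surface if pos in KEEP_SURFACE else lemma)
    return words


tokens = [
    ("She", "she", "PRON"),
    ("looked", "look", "VERB"),
    ("at", "at", "ADP"),
    ("the", "the", "DET"),
    ("pictures", "picture", "NOUN"),
]
print(main_words(tokens))
```

Keeping 'pictures' in its plural form while reducing 'looked' to 'look' reflects the stated rationale: noun and pronoun number cannot be recovered from the rest of the sentence, whereas verb inflection can be regenerated.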

a: Annotation
We introduced the main words of a target sentence into our automatic NLG system and we studied the output sentences. In case of a full match between the target and generated sentences, automatic generation was considered totally successful. This happened for 1,315 sentences, 70.36% of the total. The remaining 554 were manually inspected.
Of these, 15 differed only in a few capital letters. This was due to errors in the target and missing words in aLexiE that were treated as proper nouns. Our system correctly replaced words by proper nouns in eight sentences, such as 'I need a new harry potter book', which was generated as 'I need a new Harry Potter book'. Even though the matches were inexact, we consider these sentences success cases rather than failures. In five sentences the system failed to detect the lexical category of some input words due to missing data in aLexiE, as in 'I'm itchy', which our system generated as 'I'm Itchy' (words like 'itchy' are present neither in aLexiE nor in MWD). We did not consider these sentences failures because they were due to missing words in the dictionary. Finally, there were two sentences containing words without an aLexiE entry that indeed existed in MWD: 'Need a bigger size' was generated by our system as 'Need a Bigger size', and 'It is 2 o'clock' was generated as 'It is 2 O'clock'. These were indeed failures of the system. There were 22 sentences containing spelling mistakes in the target, such as 'I have an appoinment with the doctor today' ('appoinment' instead of 'appointment'). Consequently, our system was able to generate 1350 18 correct sentences automatically, corresponding to 72.23% of the total.
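The bookkeeping behind these figures can be reproduced with a few lines of arithmetic. The variable names are illustrative; the derivation of the number of sentences left for manual evaluation is our reading of the counts (correct sentences and the two system failures are set aside), which is consistent with the 517 sentences reported in the next paragraph.

```python
total = 1869                  # sentences in the dataset
exact_matches = 1315          # full match between target and output
proper_noun_fixes = 8         # output correctly capitalised proper nouns
missing_in_dictionary = 5     # words absent from both aLexiE and MWD
target_typos = 22             # spelling mistakes in the target sentence
system_failures = 2           # words in MWD but missing in aLexiE

correct = exact_matches + proper_noun_fixes + missing_in_dictionary + target_typos
success_rate = round(100 * correct / total, 2)
left_for_manual_evaluation = total - correct - system_failures

print(correct, success_rate, left_for_manual_evaluation)
```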
Finally, it was only necessary to evaluate 517 sentences manually. They were reviewed by five NLG researchers from atlanTTic, University of Vigo, with English skills equivalent to C1 in the Common European Framework of Reference for Languages (CEFR) or 95 or above in the Test of English as a Foreign Language (TOEFL). Table 8 shows the annotation options. The annotators rated the quality of the generation from 0 (not generated) to 5 (full match between target and output) 19. Moreover, when the system presented several alternative outputs, the annotator had to choose one. We noticed that, in some errors, the system would have succeeded in generating the targets except for SVO order. The annotators were requested to provide output suggestions in that situation.
The annotation task took two months. We gave the annotators instructions and examples in advance to guarantee the consistency of the resulting corpus. The tests exploited various features of English grammar such as sentence type and constructions with different word categories. The annotation script returned an XML file. Listing 4 shows an annotated sentence example.
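An annotation file with the structure of Listing 4 could be read with Python's standard xml.etree.ElementTree module, as in the minimal sketch below. The function name read_clause is hypothetical, and we assume (the text does not state it) that Best_realisation is a 1-based index into the generated clauses.

```python
import xml.etree.ElementTree as ET

# Annotation example with the structure of Listing 4.
ANNOTATION = b"""<?xml version="1.0" encoding="UTF-8"?>
<TAGGING>
  <CLAUSE>
    <TARGET>Dropped my change.</TARGET>
    <Generated_Clauses>
      <Clause>Drop my change.
        <Error>b</Error>
        <Rating>1</Rating>
      </Clause>
      <Clause>I drop my change.
        <Error>a</Error>
        <Rating>2</Rating>
      </Clause>
    </Generated_Clauses>
    <Best_realisation>2</Best_realisation>
    <Suggestion_for_Generation>I dropped my change yesterday.</Suggestion_for_Generation>
  </CLAUSE>
</TAGGING>
"""


def read_clause(clause_el):
    """Turn one <CLAUSE> element into a plain dictionary."""
    candidates = []
    for c in clause_el.find("Generated_Clauses").findall("Clause"):
        candidates.append({
            "text": (c.text or "").strip(),   # sentence precedes the child tags
            "error": c.findtext("Error").strip(),
            "rating": int(c.findtext("Rating")),
        })
    # Assumption: Best_realisation counts candidates from 1.
    best_index = int(clause_el.findtext("Best_realisation")) - 1
    return {
        "target": clause_el.findtext("TARGET").strip(),
        "candidates": candidates,
        "best": candidates[best_index],
        "suggestion": clause_el.findtext("Suggestion_for_Generation").strip(),
    }


root = ET.fromstring(ANNOTATION)
record = read_clause(root.find("CLAUSE"))
print(record["best"]["text"], record["best"]["rating"])
```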
The final results can be summarised as follows. Firstly, we must distinguish the cases in which our NLG system generated a single possibility from those with several output sentences. In the first case, the error type was set by majority vote among the annotators. If the annotators did not agree, the sentence was tagged with no consensus about error type. The final rating of each output sentence was computed as the arithmetic average of the annotator ratings. In the second case, we first checked whether there was consensus in the best realisation field. If there was none, the sentence was tagged with no consensus about best realisation. If the annotators provided suggestions of best realisations and there was a consensus about them, we tagged and rated the best output candidate as in the first case. Table 9 shows the distribution of the annotations. When the annotators agreed about error and best realisation, their average rating indicates that the information in the target could be understood from the generated sentence. This also happened when the annotators agreed about the best realisation but there was no consensus about error type. The annotators suggested 367 different alternative outputs, of which our library generated 160 correctly and automatically (43.597%). The remaining 207 sentences (56.403%) were not manually inspected. We suppose that many of these generated sentences might be considered appropriate as well.
18 1315 + 8 + 5 + 22 = 1350
19 0 and 5 ratings were automatically treated.
The main mistakes were due to verbal tense adjustment, since many sentences were in past tense but did not have any time-related complement. Another common failure was adding a different preposition instead of the one in the target (in some cases, however, this change did not modify the meaning of the output sentence).
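The aggregation rule described for single-output sentences (majority vote on the error type, arithmetic mean of the ratings) can be sketched as follows. This is a hedged illustration: the function names are ours, and we assume that 'majority' means more than half of the annotators, which the text does not specify.

```python
from collections import Counter
from statistics import mean


def consensus(labels):
    """Return the label chosen by more than half of the annotators, else None.

    Assumption: a strict majority is required; None stands for
    'no consensus'.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def aggregate(annotations):
    """annotations: list of (error_type, rating) pairs, one per annotator."""
    errors = [error for error, _ in annotations]
    ratings = [rating for _, rating in annotations]
    return consensus(errors), mean(ratings)


# Toy data for five annotators (not taken from the annotated corpus).
agreed = [("a", 3), ("a", 4), ("b", 2), ("a", 3), ("c", 3)]
split = [("a", 3), ("a", 4), ("b", 2), ("b", 3), ("c", 3)]

print(aggregate(agreed))  # error type 'a', mean rating 3
print(aggregate(split))   # no consensus (None), mean rating 3
```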

b: Evaluation agreement
Manual evaluation was monitored with two recognized agreement metrics that yield robust estimations of the differences between annotators: Alpha-reliability [71], [72] and accuracy.
When the annotators agree perfectly, Alpha = 1. When their agreement is no better than chance, Alpha = 0. Obviously both extremes should be avoided.
Our evaluation focused on nominal data because we measured error annotation agreement between five observers. As stated above, we computed the agreement in error type and obtained the average rating. The first step was to build a 5-observers-by-523-sentences reliability data matrix containing 5 × 523 values.
Table 10 shows that our system generated 523 sentences for the 517 target sentences in the corpus. This was because there were several generated candidates for some targets. Next we tabulated the coincidences within units in the coincidence matrix of Table 11. Coincidence matrices take into account the values in a reliability data matrix. They differ from contingency matrices in that the latter consider units in two dimensions, not values. Our coincidence matrix accounted for all pairable errors from the five annotators in a 6-by-6 square matrix, omitting references to annotators. This type of matrix is symmetric with respect to its main diagonal, which holds all perfect matches. Note that the coincidences were counted twice in the coincidence matrix. Disagreements (represented by off-diagonal cells) were also counted twice, yet in different cells. We then estimated inter-agreement accuracy between pairs of annotators. This simply averages the proportions given by the diagonal of the coincidence matrix. Note that it accounts neither for fortuitous (dis)agreement nor for value ordering. The results for Alpha and accuracy in Table 12 are promising. Tables 13 and 14 represent inter-agreement between pairs of annotators [73]-[75].
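The steps above (reliability data, coincidence matrix, Alpha) can be illustrated with a minimal sketch of the standard nominal Alpha computation, Alpha = 1 - D_o / D_e, where D_o is the observed and D_e the expected disagreement. The function names and the toy data are ours, not the paper's; the diagonal share at the end is only one possible reading of the accuracy measure described in the text.

```python
from collections import Counter
from itertools import permutations


def coincidence_matrix(units):
    """units: one list of labels per unit (one label per observer).

    Every ordered pair of labels given to the same unit by two different
    observers adds 1 / (m - 1) to its cell, m being the number of labels
    in that unit. Each match or mismatch is therefore counted twice.
    """
    cells = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a single label cannot be paired
        for c, k in permutations(values, 2):
            cells[(c, k)] += 1 / (m - 1)
    return cells


def alpha_nominal(units):
    """Alpha for nominal data, computed from the coincidence matrix."""
    cells = coincidence_matrix(units)
    totals = Counter()
    for (c, _k), weight in cells.items():
        totals[c] += weight
    n = sum(totals.values())
    observed = sum(w for (c, k), w in cells.items() if c != k)
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        return 1.0  # only one category was ever used
    return 1 - (n - 1) * observed / expected


def diagonal_share(units):
    """Share of the coincidence matrix mass lying on the main diagonal."""
    cells = coincidence_matrix(units)
    return sum(w for (c, k), w in cells.items() if c == k) / sum(cells.values())


# Toy reliability data: three units, three observers, two error types.
toy = [["a", "a", "a"], ["a", "b", "a"], ["b", "b", "b"]]
print(round(alpha_nominal(toy), 3))   # 0.6
print(round(diagonal_share(toy), 3))  # 0.778
```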

2) Spanish-English parallel corpus
We are not aware of the existence of other systems for automatic text expansion as we have defined it. Therefore, we decided to apply our automatic NLG system to Spanish and English and compare the results.
For this purpose, we manually created the English version of the Spanish corpus used in [4] 20. The final parallel corpus is composed of 948 English/Spanish sentences covering various grammar features such as different sentence types and constructions with different word categories.
Table 15 shows a comparison between the English and Spanish versions in terms of automatic generation using the parallel corpus. First, of the 736 sentences that the system in [4] generated in Spanish, our system generated 613 in English automatically. The remaining 123 sentences were inspected manually. We noticed that the most relevant mistakes (in 106 sentences) were due to Wikipedia training, since the system failed to add a certain preposition in the target. Two of the other 17 sentences were actually correct, since they only differed in some capital letters, and four had differences in determiners that did not affect their meaning. Second, our system was able to automatically generate 72 English sentences out of the 212 that the system in [4] failed to generate in Spanish. The remaining 140 sentences were manually inspected. The most common mistakes were related to verbal tense adjustment (64 sentences) and wrong prepositions (34 sentences).
Next we compared the approach in this paper with the system in [4]. We correctly generated 77.64% and 72.26% of the Spanish and English sentences in the parallel corpus, respectively.
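These percentages follow from the counts reported for the parallel corpus, as the short check below shows. The variable names are ours; the Spanish count of 736 is derived here as 948 - 212, which is consistent with the 613 + 123 English sentences mentioned above.

```python
# Counts reported in the text; variable names are illustrative.
total = 948
spanish_failed = 212
spanish_correct = total - spanish_failed  # 736 = 613 + 123

english_correct = 613 + 72                # both groups of sentences

print(round(100 * spanish_correct / total, 2))  # 77.64
print(round(100 * english_correct / total, 2))  # 72.26
```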
The most important difference between the two languages was the use of prepositions. In Spanish there were few mistakes of this kind, but they were the most common in English. In our opinion this was due to the difficulty of detecting phrasal verbs.

V. CONCLUSIONS
We have developed an automatic hybrid system for English text expansion. Relying on the aLexiE lexicon and our grammar, our system is able to perform fully automatic text expansion from a few input words. The integration of new lexical resources for any language is simple. The architecture separates domain-independent from domain-dependent components, so that the latter can be substituted. We remark that the aLexiE lexicon and the grammar we have developed for English expansion are relevant results in themselves. They could be useful to other NLG researchers.
As far as we know, this is the first fully automatic hybrid system for English text expansion, combining a knowledge base of vocabulary and grammar realisations with a statistical language model for preposition inference. Our system has a good success rate when generating coherent and grammatically correct sentences from user-selected input words.
The surface realiser relies on aLexiE and our grammar to make its decisions. For this reason, we provide a detailed description of the procedure to create them. As future work

FIGURE 1: Syntax tree resulting from the grammar

LISTING 3: Example of the English lemma 'look' in aLexiE.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<lexicon>
  <word>
    <lemma>look</lemma>
    <category>verb</category>
    <present3s>looks</present3s>
    <past>looked</past>
    <present_participle>looking</present_participle>
    <past_participle>looked</past_participle>
    <object>at</object>
    <foodstuff>for</foodstuff>
    <living>like</living>
    <place>for</place>
  </word>
</lexicon>

LISTING 4: Annotation example
<?xml version="1.0" encoding="UTF-8"?>
<TAGGING>
  <CLAUSE>
    <TARGET>Dropped my change.</TARGET>
    <Generated_Clauses>
      <Clause>Drop my change.
        <Error>b</Error>
        <Rating>1</Rating>
      </Clause>
      <Clause>I drop my change.
        <Error>a</Error>
        <Rating>2</Rating>
      </Clause>
    </Generated_Clauses>
    <Best_realisation>2</Best_realisation>
    <Suggestion_for_Generation>I dropped my change yesterday.</Suggestion_for_Generation>
  </CLAUSE>
</TAGGING>

TABLE 9: Distribution of annotations of our dataset

TABLE 10: Reliability data matrix of the AAC annotated dataset considering the error types in Table 8

Algorithm 4: Adding syntactic and semantic data
function ADD_SYNTATIC&SEMANTIC_DATA({ALEXIE})
    for e_ALEXIE ∈ {ALEXIE} do
        lemma_e_ALEXIE = e_ALEXIE.getLemma()
        category_e_ALEXIE = e_ALEXIE.getCategory()
        if (category_e_ALEXIE.isNoun() AND lemma_e_ALEXIE.isInMCR())

Automatic lexicon extension probabilities by lemma verb and semantic category

TABLE 1: Automatic lexicon extension. Sentence examples
Sentence | After processing
I look the business. | look + EMPTY + object
She looks at the picture. | look + at + object
You look at the car. | look + at + object
She looks like her mum. | look + like + living thing

TABLE 2:
(2) if the sentence has an element that refers to the second person with no relation with the first person, the output sentence is adjusted to second person; (3) finally, if the sentence has an element that refers to the third person with no relation with the first and second persons, the sentence remains in third person.

Semantic sources for entries tagged as nouns in aLexiE

TABLE 5: Merged and verified lemmas in aLexiE by lexical category

TABLE 6:

TABLE 7: aLexiE lemmas and forms by lexical category

TABLE 11: Coincidence matrix of our annotated dataset considering the error types in Table 8