CQACD: A Concept Question-Answering System for Intelligent Tutoring Using a Domain Ontology With Rich Semantics

In this study, a Concept Question Answering system applied to the Computer Domain (CQACD) for intelligent tutoring is proposed. This system is a dialogue-based Intelligent Tutoring System (ITS) that allows the tutor and student with mixed-initiative and natural language to ask each other questions concerning the basic computer knowledge in the Computer Basics course. CQACD is based on constructivist principles and encourages the learner to construct knowledge rather than merely receiving knowledge, which has the following characteristics: (a) this system employs a domain ontology with rich semantic relationships to model the basic computer knowledge and build up a concept-centric knowledge model, (b) uses a limited number of 80 input templates with description logics to acquire the intention of questions posed by students, (c) a textual entailment algorithm with semantic technologies is proposed to match the input template and assess the student’s contribution to improve the flexibility of the system, and (d) an ontology-driven dialogue management mechanism is proposed, which can quickly form the conversational content and conversational sequence. The experimental results show that CQACD can replace the teachers’ tutoring in large classes and can promote the learning of poor students in large classes better than teachers can. The paper reveals that the domain ontology with rich semantic relationships plays an important role in the Concept Question Answer System (CQAS). It can model CQAS’s discipline knowledge, provide structured domain knowledge for student model, template design and matching, and provide basic architectural architecture for dialogue management.

the domain ontology. Ontology is a sharable knowledge base template based on a structured Frame and Semantic Network with Description Logics (DL) [8]. Therefore, ontologydriven QA systems have strong knowledge organization and logical reasoning capabilities, and OE-based QA systems are particularly suitable for the development of dialogue-based ITSs that might require integrating various types of resources for personalized learning [9]- [12] and providing reasoningbased answers for intelligent tutoring [6], [13].
With the rise of online education, various MOOCs have emerged, moving more and more courses from physical classrooms to online, and the COVID-19 pandemic has exacerbated the process. In online teaching, a prominent contradiction is: there is a large number of registered students, but the tutor is insufficient. Therefore, developing more question-answer ITSs becomes a very urgent task. The Computer Basics course, which teaches about the basic theory and principle of computer science including the basic knowledge in the composition structure and working principle of computers, computer applications, operating systems, information security and computer networks, is a compulsory general course for non-computer major students in most of colleges and universities in China. A common problem that is associated with this course is the large number of registered students and a lack of relevant teachers. To address this problem, a number of colleges and universities in China have allowed students to study this course by employing educational websites. However, to provide sufficient help to students to facilitate their learning remains a problem that needs to be urgently solved. To solve this difficulty, we have designed CQACD, a concept QA system, for this course. CQACD, which is a restricted do-main-oriented QA system, can accept questions that concern the basic and theoretical knowledge of computer science posed by students in the natural language and can guide them through tutor-initiative dialogues to learn and construct the basic computer knowledge in the Computer Basics course. We have designed CQACD in two language versions (English and Chinese). For ease of reading and understanding, only the English version of CQACD is discussed in this paper. The main contributions of this paper can be summarized as follows: (1) We abstract the text corpora in the conventional dialogue tutoring system to the relationship and attribute restrictions and axioms of concepts in a domain ontology and build up a concept-centric knowledge model to provide powerful semantic support for natural language processing and answer reasoning in the system, which improve the subject knowledge model in in the conventional dialogue-based ITSs from the curriculum script [21], [28], [29] to structured domain ontology; (2) We design a template description logic system with domain ontology elements as operands to define the input templates and tag ontology's attribute restrictions so as to accurately represent their semantics and significantly reduce the number of question templates in QA systems from 27, 126, 355 [34] to less than 100; (3) We employ semantic technologies including mandatory word annotation, ontology element type analysis, word similarity and tone similarity to improve the flexibility and accuracy of Lexical-based Textual Entailment (TE) algorithms [41] for template matching and contribution assessing by more than 20%; (4) We propose an ontology-driven dialogue management mechanism based on the scaffold instruction theory of the zone of proximal development [14], which improves the design of the conversational content and conversational sequence from the hand-crafted mode [24] to the ontologybased automatic generation.

II. RELATED STUDIES
A QA system generally includes three modules: question pre-processing and classification, information retrieval and answer extraction. In a QA system, question pre-processing and classification is the first execution phase and comprises a crucial step with the objective to enable a computer to understand and recognize a user's question and determine the user's intention in asking the question so as to provide clues for the subsequent information retrieval and answer extraction steps. In the pre-processing phase, users' questions are processed by a procedure that generally involves lexical analysis, syntactic parsing and semantic analysis. Question classification is the process of determining to which type the user's question belongs in the predefined question patterns. Two methods exist to classify questions. In open domainoriented QA systems, users' questions are automatically classified by machine learning methods, including the SVM (Support Vector Machine) classification [15] and neural network classification methods [16], in which the semantic characterizations of questions are usually represented by named entity recognition [17] and semantic role labeling [18]. In restricted domain-oriented QA systems, users' questions can be processed and classified by ontologybased triples [4], [6] or ontology-based language templates (formulae) for questions [19], [20], in which question semantics are interpreted by the ontology. In the information retrieval step, an open domain-oriented QA system primarily acquires candidate answers from documents in websites or a large corpus based on the question's semantics and its answer type and then selects the best answer using the answer extraction module [47], whereas a restricted domain-oriented QA system usually completes answer retrieval, reasoning and extraction by the domain ontology.

A. DIALOGUE-BASED-ITSS
Dialogue-Based ITSs (DBITSs) is a subset of ITSs that emulate the natural language dialogues in human tutoring [21]. Based on the way of dialogue, Domeshek [22] classified DBITSs into two types, namely, tutor-initiative DBITSs and student-initiative DBITSs. Tutor-initiative DBITSs that adhere to constructivist principles attempt to get learners to do most of the talking by providing questions with hints or prompts, forced choices and other pedagogical scaffolds [21], such as DIALOG project in MEGA [3], Geometry Explanation Tutor [23], Why2-ATLAS [24] and BEETLE II [6].These computer tutors vary in the extent to which they simulate human dialog mechanisms, but all of them attempt to comprehend natural language, formulate adaptive responses and implement pedagogical strategies to facilitate learning. In contrast from tutor-initiative DBITSs, the objective of student-initiative DBITSs is to provide a convenient man-machine interface for students to ask questions to the system during their learning process instead of facilitating learning by pedagogical strategies. To enhance students' initiative in their knowledge construction, many DBITSs provide the student-initiative dialogue function, such as the SCHOLAR system [25], CIRCSIM-Tutor [26], ATLAS [27], AutoTutor [28] and Conversational-Tutoring-Agent [29]. In today's mobile Internet era, where knowledge explosion occurs, student-initiative DBITSs are significant in helping students to acquire the knowledge they are interested in. The concept QA system CQACD in this study is a tutor and student mixed-initiative DBITS.
At present, the most prominent deficiency of DBITSs is that systems lack a rich semantic and highly structured domain ontology to model subject knowledge, and some DBITSs [21], [28], [29] even use course scripts to express subject knowledge, which cannot provide effective support for template design, knowledge management, knowledge inference, building dialogue order and student model.

B. QUESTION TEMPLATES IN QA SYSTEMS
Question templates [19] in QA systems, which are also referred to as question formulae [20] or question patterns [2], [30], [31], are a question processing mechanism that is widely employed in question clustering and classification. The main goal of question templates is to map the problem pattern into the answer pattern. Based on the number of their semantic components, we classify question templates into two types, namely, Simple Question (SQ) templates and Language Question (LQ) templates. SQ templates refer to question formulae that contain only the core semantics of users' questions. For example, in question patterns adopted by Ray et al. [30], the simple template ''when questions'' is the abstraction and representation of all questions that are similar to ''when is the operating system loaded?''. SQ templates directly contain only a relatively small amount of semantics. To achieve fine classification of a question, additional semantic characteristics of questions need to be extracted by the syntactic parsing and semantic analysis [15], [18], [32] and a machine learning method, such as an SVM method [15], [32], need to be employed. SQ templatebased QA systems are highly automated, eliminating the need to manually build large-scale question templates, but they have a relatively complex question classification process that is significantly dependent on the syntactic parsing and semantic analysis accuracies and only moderately classification precision because of the widespread existence of semantic ambiguities [33]. Therefore, SQ templates are generally employed in open domain-oriented QA systems that can't pre-establish a complete question template library.
LQ templates refer to question formulae that contain the entire sentence structure and all semantics of users' questions [5], [19], [20]. For example, in the QACID system designed by Ferrández et al. [20], the LQ template ''where can I see [MOVIE]?'' is the abstraction and representation of all questions that are similar to ''where can I see Saw 3?''. Because a LQ template already contains the entire sentence structure and all semantics of a user's question, the question does not have to be parsed and can be directly classified using a textual similarity or TE-based template-matching algorithm [5], [19]. To reduce the size of the template library, LQ templates often represent and interpret formulate the core semantics of questions with ontology element symbols. For example, in the QACID system [20], the LQ template ''where can I see [MOVIE]?'' represents all movie entities with the ontology concept MOVIE. An ontology usually contains a large number of concepts and relationships. Consequently, such ontology element-based LQ templates still have these shortcomings as insufficient representational capacity and overly large template libraries [2], [6], [19], [20], [34]. For example, in the template library adopted by Cuix et al. [34], 27, 126, 355 LQ templates with concept names are used to process CN-based question classification. To improve the universality of question templates and further reduce the size of the template library, a first-order template description logic system with domain ontology elements as operands that can ensure decidable reasoning is introduced in this study. This logic system can use a first-order predicate to abstract the same ontology element into a variable in the question template, e.g., our template ''What is the <p: stringProperty> of <c: Concept>?'' is the abstraction and representation of all questions that ask any a string property of any a concept in the ontology.
In this study, LQ templates carry two functions of acquiring the intention of questions posed by students and formulating questions of the tutor to students and were expanded into input templates that include Meta COMmunication (Meta-COM) templates and question templates. After introducing description logics in the template, only 80 input templates are required to satisfy all the requirements of the students' interactions and asking questions.

III. SYSTEM OVERVIEW
The concept QA system CQACD designed and implemented in this study is a mixed (both tutor and student) initiative DBITS that concerns the basic knowledge of computer science. CQACD can play two roles in the Computer Basics course: an intelligent answering system to answer questions about basic computer knowledge posed by students in natural language form and a dialogue-based ITS for guiding students to learn and construct basic computer knowledge. CQACD is a typical knowledge-based system because it contains a domain ontology, a template library, a semantic dictionary and a student model library. Under the support of these knowledge bases, the mixed-initiative dialogue in CQACD is implemented by five functional modules, as shown in Figure 1. This chapter gives a brief introduction to each part of the CQACD system. They are described in more detail in subsequent chapters. Table 1 gives an example of the whole QA process.

A. BASIC COMPUTER KNOWLEDGE ONTOLOGY
Our CQACD system uses a domain ontology of Basic Computer Knowledge Ontology (BCKO) with rich semantic relations to formalize and manage all subject knowledge in the basic areas of computer science and build up a concept-centric knowledge model. This domain ontology is the core knowledge base in CQACD system, which plays the following roles: (a) provides all the subject knowledge for the QA and dialogue-based tutoring in CQACD, (b) pro-vides a glossary for the annotation of the ontology elements in the input preprocessing and the definition of question templates, (c) provides the dialogue order for the dialogue management through its concept hierarchy, (d) provides complete cognitive units for the student model, and (e) provides the basis for answer reasoning through its semantic relations and concept axioms.

B. TEMPLATE LIBRARY
Our template library is a collection of 80 input templates: where MetaCOM templates are used to classify students' metacognitive inputs (e.g. ''I need help'', ''I don't know'') and navigation inputs (e.g. ''I want to learn computer hardware''). Question templates are ontology and logic-based question formulas to classify students' question inputs; they reflect students' interests and needs to the basic computer knowledge. In addition, the question template in the template library can also serve as a language template for the tutor's asking questions. Each input template is composed of five bound parts: template category, template structure, synonymous structure, reasoning rule and reasoning function.

C. WORDNET
WordNet is a large synonyms semantic dictionary based on cognitive linguistics and is designed and realized by psychologists, linguisticians and computer engineers in Princeton University [35]. It can be widely used for text classification [45] and semantic similarity calculation [46]. WordNet, in our CQACD system, is used to provide more semantic interpretation for student inputs based on BCKO ontology. The introduction of WordNet in CQACD significantly improves the accuracy of template matching and contribution assessments.

D. STUDENT MODEL
Student model is used to track students' learning status (how much/how well has been learned) and is an overlay of the domain ontology in CQACD system. Our CQACD system is a concept-based ITS that guides students through dialogues to gradually grasp the conceptual knowledge in basic computer areas. Therefore, it does not require the complex Bayesian knowledge tracing [36], [37] for student model, which is often used to build student models for skill-based domains like mathematics. We directly use concept maps [38], [39], which are widely used in the ontology-driven ITSs [11], [12], to build the student model in CQACD as follows: where score(c) represents the confidence value that the student knows the concept c, which is measured by an improved TE algorithm with semantic technologies.

E. INPUT PREPROCESSING
Input preprocessing is a process of the pronoun elimination, lemmatization and morphological analysis for the student's input, in which the student's input is eventually marked with part-of-speech (POS) and ontology elements, as shown in Table 2. We use Stanford CoreNLP 1 to implement the lemmatization and POS tagging for student's inputs. Ontology element tagging is the process of successively marking the concepts, properties, relationships and concept sets of the domain ontology that appear in the student's input. In the ontology element identification and tagging process, fuzzy recognition is employed by our extended tagging software, which allows errors (individual misspelled letters) in the student's spelling of ontology elements.

F. MATCHING AND CLASSIFICATION
Matching and classification is used to classify the marked student input by matching it with input templates in the template library as follows: where M&C(MI) represents matching and classifying the marked student input MI. In our CQACD system, student inputs are divided into four types: MetaCOMs, questions, potential contributions to questions posed by the tutor and exceptions. MetaCOMs and questions posed by the student are identified and classified by the corresponding input templates. In this study, a improve TE algorithm combined with ontology element tagging and semantic similarities is employed to determine whether the student's input matches a template. If a relatively high degree of entailment exists between a student's input and a certain template in the template library that exceeds the threshold value, the student's input is considered to match the input template. Contributions and exceptions are classified by our proposed environment sensitive algorithm, in which in the responsive status after tutor's posing a question, the unmatched input is processed as potential contribution to the answer while in non-responsive status the unmatched input is processed as an exception.

G. ANSWER REASONING
Based on the semantic clues in a question and the corresponding template, answer reasoning module provides the answer or solution, which is extracted from the domain ontology BCKO, for the question or problem posed by the student or the tutor. Answer reasoning based on the domain ontology is done by calling the corresponding reasoning function bound with the question template.

H. CONTRIBUTION ASSESSMENT
Based on the solutions extracted from the domain ontology BCKO, contribution assessment module employs a proposed TE algorithm with semantic technologies to assess the 1 http://nlp.stanford.edu/software/corenlp.shtml student's contribution (response) to the question posed by the tutor and to determine whether the student's contribution is correct, error or a misconception. Moreover, this module is also responsible for updating the student model with the evaluation results.

I. DIALOGUE MANAGEMENT
Dialogue management is the most important and complex module of the system. It is responsible for the following personalized dialog functions: (a) to respond to the student's input, including positive encouragement for the correct contribution, the corresponding prompt for the wrong contribution or a misconception and the answer to the student's question; (b) to organize dialogue branches and the dialogue order according to the student's response and ontology's concept hierarchy; (c) to provide the student with help or learning contents in accordance with the current dialogue topic and the student's cognitive level.

IV. DOMAIN ONTOLOGY DESIGN
In this study, our CQACD system replaces the curriculum script in the conventional dialogue-based ITSs [21], [28], [29] using a domain ontology BCKO with rich semantic relations, which can bring significant advantage and convenience in semantic annotation and semantic analysis, knowledge management, knowledge inference, building dialogue order and student model, etc. BCKO ontology consists of six parts: concepts, attributes, relations, synonyms, a concept hierarchy and axioms.
A concept refers to a collection or abstraction of entities with the same characteristics in the basic areas of computer science. An attribute is a binary relationship between a concept and a data object that is mainly used to reflect a certain characteristic of the concept, such as definition, feature, etc. A relation is a binary relationship between concepts, such as the relations store and display in the computer field. Synonyms refer to the set of the concept, attribute and relation names that have the same meaning. All synonymous collections of concepts, attributes and relations constitutes the term set in the basic areas of computer science, which can provide the vocabulary for the recognition and annotation of the ontology element in stutents' inputs and question templates.
Concept hierarchy is a taxonomic structure composed of ''is-a'' relationships (hypernymies) between concepts, which reflects the inheritance relations between concepts in the computer field. It can provide important basis for the automatic reasoning of QA such as ''Is concept1 a type of concept2?''.
An axiom is an assertion of permanent truth about domain knowledge, which is a type of theoretical knowledge that can be understood by the machine, as shown in Table 3. Axioms are used to declare conceptual knowledge and organize a concept-centric knowledge structure (as shown in Figure 2) for our CQACD system, in which axioms with inclusion operator (⊆) are used to declare the hypernym, attribute restrictions and relation restrictions of a concept in BCKO ontology, while axioms with equivalent operator (≡) are used to declare the synonyms and logical definitions of a given concept. The attribute and relation restrictions of concepts in BCKO ontology are important basis for the retrieval and reasoning of answers, which are an abstraction of attribute values or relation targets for all instances of the concept, that is, the same attribute value or relation target of all instances in a concept are extracted as an attribute or relation restriction of the concept. For example, in our BCKO ontology, the value of the definition attribute of the memory concept can be constrained as ''the memory used to store the data'' and the target of the storage relation of memory concept can be constrained as the data concept. There are two restraint ways respectively formed by the universal quantifier (∀) and the existential quantifier (∃). In this study, the universal quantifier (∀) is interpreted as ''can only'' while the existential quantifier (∃) is interpreted as ''can''. Logical definition axioms are usually used to get the machine to understand how concepts are defined, e.g. the logical definition axiom Memory ≡ Storage ∩ ∃accessed_by.CPU tries to get the machine to understand ''Memory is a type of storage that can be accessed by the CPU''.
To extend the role of BCKO ontology in dialoguebased ITSs, we use some special attributes and attribute nomenclature in BCKO ontology. For example, we use the existential restriction of the attribute has_resource_link to point to a URL of the learning resource directory that contains multiple multimedia learning resources corresponding to different cognitive levels for a given concept. Moreover, we use the nomenclature of mis_attribute-name to represent the misconception attribute for a given attribute, e.g. the attribute mis_definition represents the misconception of the attribute definition; and use the nomenclature of difference_from_concept-name to represent a differential comparison attribute between concepts, e.g. the attribute difference_from_hard_disk represents the difference of a given concept with the concept hard disk. More importantly, we also use some special string tags and structures to bind some related information in attribute restrictions. For example, we use the string structure of <misconception description | hint | remedial concepts> to bind the information about an existential misconception restriction, and in Table 3, we define string2=<Memory refers to the storage in the mainframe box | There may be a hard disk in the mainframe box besides memory | host, main box>, which is the value of an existential restriction for the attribute mis_definition of the concept memory.
In this study, we use OWL 2 ontology model to build the BCKO domain ontology that includes the basic knowledge in the composition structure and working principle of computers, computer applications, operating system, information security and computer networks and contains 947 concepts, 66 relations, 55 attributes and over 30,000 axioms.

V. INPUT TEMPLATE DESIGN A. INPUT TEMPLATE STRUCTURE
The input templates in this study are a type of domain ontology-based input formulae that bind the template category, template structure, synonymous structure, reasoning rule and rule's reasoning program, which form the mapping from students' inputs to their intentions or answers. Their Backus-Naur form is defined as follows: <input template>::= (<template category>, <template structure>, {<synonymous structure>}, <reasoning rule>, < reasoning function >).

1) TEMPLATE CATEGORY
To facilitate ontology-based knowledge retrieval and reasoning, we decompose 18 templates that are identified by Graesser and Person [40] in educational settings, and build up 80 input templates based on students' interaction and question requirements, which are classified into seven categories based on the nature of the input and the asked ontological components: metacognitive input, navigation input, ask the data attribute restrictions of concepts, ask the relation (object attribute) restrictions of concepts, ask the definition domain of attribute restrictions, ask the hierarchical structure of concepts and ask the definitional axioms for concepts.

2) TEMPLATE STRUCTURE
Template structure refers to the language structure of an input template and its shallow semantics that are represented with variables and tags.

3) SYNONYMOUS STRUCTURE
Synonymous structure refers to a language structure with the same question semantics as the main template structure. One input template may contain multiple synonymous structures.

4) REASONING RULE
Reasoning rule represents the deep semantics of an input template and the reasoning process of the expected answer or intention of the student's input. The rule semantics is accurately represented by a domain ontology-based predicate formula.

5) REASONING FUNCTION
Reasoning function represents a template-bound program that performs the reasoning function specified by the reasoning rule. Unlike SPARQL inference programs that are suitable for querying the related information of named entities (instances) in an OWL ontology and used in NE-based QA systems [4], [7], [34], we employ Java reasoning programs based on Jena OWL Ontology API 3 to query axioms in BCKO ontology.
To accurately represent their semantics, a template description logic system (TDLS) is designed to define the input templates. This logic system is first-order description logics with domain ontology elements as operands and used for the semantic tagging and interpretation of input templates, which is defined as the following triplet: TDLS::= (<predicate set>, <operators>, <tags>)

1) PREDICATES
Predicates are used to reveal, recognize and determine the ontology elements in an input template. In TDLS there exist two types of predicates: one-place predicates and twoplace predicates. The one-place predicate is used to declare the category of the ontology element to which a template variable belongs. The two-place predicate is used to declare the semantic relationship between two template variables. One-place predicates can also be employed as variable type keywords in the template structure. For example, < c 1 : Concept > indicates that variable c 1 is an ontology concept, in which the predicate Concept is employed as an ontological element type.

2) OPERATORS
In this study, we employ the following two extended operations to improve the representational capacity of the template logic: a type constructor '':'' that define an ontology element variable, e.g. ''x : Concept'' represents that the variable x is defined as an ontological concept; and a conditional operator ''? : '' used to organize the semantics of the branch structure, e.g. ''a ? b : c'' represents that if the condition a is satisfied then the expression b is evaluated otherwise c is calculated.

3) TAGS
Three template element tags are designed to separate and define various types of elements in templates, as shown in Table 4.

C. EXAMPLES OF TEMPLATE DEFINITION
In this section, we give three typical structural definitions of question templates in CQACD system to illustrate the role of the template description logic in the template definition, in which all words in the template are used in their original forms. Moreover, we also give a reasoning function bound with Question Template2 using Jena OWL Ontology API in Appendix A.  <Example>Can the CPU only access memory? 4 Template matching, which is an important step in the input classification, is the process of selecting an input template that matches a given student's input from the template library. To enhance the robustness of template matching, the calculation of the matching between a student's input and a template is considered to be a textual entailment solution-finding process with a focus on determining whether a student's input contains the key semantics of the template. We propose the following three semantic methods to improve the conventional textual entailment algorithms based on ngram [20], [41], [42].

A. MANDATORY WORD-BASED MATCHING INFERENCE
We argue the key semantics of an input template is mainly characterized by the mandatory words that are marked by the delimiter ''<>'' in the template. Therefore, as long as a student's input contains the key semantics of an input template completely, we think that the student's input matches the template. In this study, mandatory wordbased template matching inference conclusions are drawn 4 The code is on https://github.com/wuhan-1222/CQACD by comprehensively considering the following two TE algorithms:

1) SIMPLE MATCHING
This simple matching algorithm does not consider word continuity. As long as a student's input contains all the mandatory words in an input template, this algorithm considers that the student's input matches the template in question, and the formula is: where Inp represents any one student input, Temp represents any an input template, T is a set of mandatory words included in the input template Temp and Lmatch(i) is calculated as follows: where U represents the set of words in the student input Inp.

2) CONTINUOUS SUBSEQUENCE MATCHING
This algorithm focuses on continuous subsequences of mandatory words and determines whether a student's input contains any possible continuous subsequences of mandatory words in a template, and the formula is: where ST k represents the set that contains all the contiguous subsequences whose length is k in the template Temp, and the formula computing f(ST k ) is as follows: Smatch(i) = 1 if∃ j∈SI k j=i, 0 otherwise.
where SI K represents the set of the contiguous subsequences with length k in the student input Inp.

B. ONTOLOGY-BASED MATCHING INFERENCE
This inference further considers the ontology element type based on mandatory word-based matching inference, that is, the ontology elements and their types in students' inputs are automatically marked out before matching, and for any one ontology element variable in the template, it is matched with an ontology element in the student's input as long as they have the same type in the matching computations of the formulae (5) and (8).

C. SEMANTIC SIMILARITY-BASED MATCHING INFERENCE
To improve matching flexibility, the semantic similarity is integrated into mandatory word-based matching inference process. Firstly, when determining whether two words match in Eq. (5) or Eq. (8), these two words are not directly determined whether they are the same term but are instead analyzed whether they are near-synonyms whose similarity degree is greater than the threshold value STH (STH is set to 0.75 in this study) in WordNet. The improved formula of Eq. (5) or Eq. (8) is as follows: if∃j∈Psim(j, i)>STH 0 otherwise. (9) where P represents the set or subset of words in the student input Inp, in which P is equal to U for Eq. (5) and equal to SI K for Eq. (8). Sim (j, i) represents the similarity between words j and i in WordNet, it is computed as follows according to the part of speech of j: 1if synonyms(j, i) else sim noun (j, i)ifj ∈ Nouns else 0.8ifj ∈ Verbsand pathLen(j, i) ≤ 2 else 0.9ifj ∈ Adjs ∪ Advs and similar(j, i) else 0 otherwise. (10) where synonyms (j, i) declares that the words j and i are synonyms in WordNet. Sim noun (j, i) represents the similarity between nouns j and i in WordNet, we employ an efficient path computing model proposed by Li et al [46] for measuring the semantic similarity between nouns in WordNet. PathLen (j, i) represents the shortest path length between verbs j and i in WordNet. Similar (j, i) declares there exists a similar-to relation between the adjectives or adverbs j and i in WordNet.
In addition, to distinguish between positive and negative tones, we use Stanford NLP Lex-Parser 5 to analyze the dependency between words in the input and template sentences and define the following tone similarity between the input and template sentences: ToneSim(Inp, Temp) = 1 ifneg(Inp)=neg(Temp), 0 otherwise. (11) where the function neg(s) determines whether there exists a neg dependency in the sentence s, and if existing, it returns 1, otherwise it returns 0. In Stanford NLP, neg dependency indicates there exists a negation modifier in the sentence.
Based on the analysis of student input corpus, we discovered that there was negative tone only in verification questions. In order to reduce the size of the template library, we did not design any negative templates in the template library. Therefore, when a student enters a negative sentence, only the template with the corresponding positive tone can be matched in the template library. At this time, ToneSim (Inp, Temp) in Eq. (11) is equal to 0, and the system performs the following processing: For a verification question with the answer ''yes/no'', the system returns the same answer with the corresponding positive input; for other questions, the system returns the answer ''I don't know'' to avoid erroneous answers and enhance the robustness of the system.

VII. CONTRIBUTION ASSESSMENT
The traditional ITSs based on the curriculum script usually employ latent semantic analysis (LSA) to evaluate free response questions [21], [43]. LSA for educational purposes use a vector space (also called semantic space) model, formed by the curriculum script, to calculate the similarity between the student response and the standard answer, so as to give the evaluation conclusions of the student's response. LSA solves the problem of response evaluation in the textual domain model without ontology, but its computing process is relatively complex and has only moderately evaluation accurate because it lacks the support of the semantic dictionary and the domain ontology. In this study, we use a domain ontology with rich semantics to model the subject knowledge in the system. Therefore, we employ proposed TE algorithms with semantic technologies to assess the student's contribution (response).
Firstly, we divide students' contributions (responses) to the questions posed by the tutor into the following two categories: (a) a short contribution corresponding to the question whose answer is a single word (e.g. ''yes'' or ''no'', ''right'' or ''wrong''), a single value, and one or more unordered ontology elements; and (b) a long contribution corresponding to the question whose answer is a text related to the concept's definition or the attribute restriction. For short contributions, we use the formula (4) combined with WordNet-based word similarity and synonyms in the domain ontology to calculate the entailment degree of the student's contribution to the correct answer, so as to give a score for the student's contribution. For assessing long contributions, firstly we, in the process of building the domain ontology, use the same way of the input template to mark the mandatory words and ontology elements in the attribute restrictions and misconceptions of all the concepts; and then we comprehensively consider the formulae (4), (6) and (11) with WordNet-based word similarity and synonyms in the domain ontology to determine whether the student's contribution is correct, error or a misconception.
The student's contributions to a question may be given step by step in the dialogue process; the system can automatically accumulate them. When the accumulated contribution score exceeds a threshold of 0.9, the student's answer to the question is determined to be correct. If a student is able to answer the question correctly without any help, the student is judged to have mastered the knowledge point corresponding to the question and a corresponding update is done in the student model.

VIII. DIALOGUE MANAGEMENT DESIGN
Dialogue management is the most flexible and challenging module in dialogue-based ITSs. It usually implements and balances three competing goals: effective pedagogy, smooth conversation and personalized guidance [21]. Effective pedagogy requires that dialogue-based ITSs can organize VOLUME 10, 2022 conversational teaching in accordance with the pedagogical scaffolds of constructivist principles, so as to enhance students' domain level learning outcomes. Smooth conversation is needed in order to accommodate virtually any input of the learner without having the conversation break down. Personalized guidance requires that ITSs can provide the corresponding feedbacks for different student's responses and organize the corresponding sub-dialogs according to students' misconceptions. Advanced dialogue management usually requires teachers to use a scripting language to create a finite state diagram and the corresponding technical texts for each dialogue [24], including defining specific recipes (conversation planning) and their steps, defining the initial answer class labels, assigning optional semantic labels to be used in implementing optional step and difficulty level transitions, and indicating the difficulty levels for each arc and which steps are optional, which is a very complex and time-consuming process.
We argue that the conceptual knowledge model and taxonomic hierarchy in our BCKO ontology is a type of natural pedagogical scaffold of conceptual learning, which is in line with the theory of the zone of proximal development [14]. Based on this perception, we propose an ontology-driven dialogue management mechanism for conceptual learning dialogue, which can quickly form the conversational content and conversational sequence and doesn't require teachers to design extra conversation planning. Our dialogue management mechanism consists of the following ontology-based measures: (a) conceptual knowledge is decomposed into the concept's definitional axioms, definitional attributes restrictions, other attribute restrictions and relation restrictions, and all concepts are organized into an ''is-a'' taxonomic hierarchy; (b) the learning dialogue process for a given concept is carried out according to the order of the above conceptual knowledge structure; the learning dialogue sequence between concepts is based on the taxonomic hierarchy in the domain ontology, and the next learned concept is selected from the hierarchy according to the depth-first rule; (c) a misconception language description is associated with the corresponding feedback information and remedial concepts through a special string attribute restriction in the domain ontology; (d) the question templates in the template library are used as the question types and question language templates for the tutor's asking questions to students; (e) the targeted prompt information is automatically generated through a comprehensive consideration of the wrong key elements in the student's answer and conversational turns; and (f) polite phrases (e.g. ''please tell me'') and cognitive questions (e.g. ''Do you understand?'') are automatically generated and inserted through a proposed context-based algorithm to make the conversation smoother.
Based on the scaffold instruction theory of the zone of proximal development [14], our ontology-driven dialogue frame involves collaborative discussion, heuristic teaching, and encouragement for the learner to construct knowledge rather than merely receiving knowledge, which is implemented in Table 5. Table 6 shows a dialogue fragment based on the above dialogue management algorithm in CQACD system.

IX. EVALUATIONS
As stated in Chapter 3, our CQACD system can play the following two roles in the computer basics course: an intelligent answering system to answer the questions from students about basic computer knowledge and a dialoguebased ITS for guiding students to learn and construct basic computer knowledge. Therefore, CQACD system is evaluated from three perspectives: template matching performance, QA system performance and tutoring effectiveness. A template matching evaluation primarily examines the classification performance of the matching & classification module in CQACD system, which investigates the classification performance of each matching algorithm based on the following indices: the input's classification precision, recall and F 1 -measure, of which the F 1 -measure value is employed as the final comprehensive evaluation index. A QA system evaluation examines the total performance of CQACD as a QA system, which investigates the overall performance of related modules and the completeness and accuracy of the subject knowledge in the domain ontology with the rate of correct answers provided by the QA system to the questions as the final evaluation index. A tutoring effectiveness evaluation examines the effectiveness of CQACD system as a DBITS for guiding or helping students to learn and construct the basic computer knowledge in the Computer Basics course, which is done by comparing it with the learning effectiveness of the traditional class instruction based on teacher lectures and self-regulated learning based on the teaching website.

A. TEMPLATE MATCHING EVALUATION
In order to evaluate the matching & classification module, we asked 10 students to pose a related question or instruction based on each one in the template library, respectively, resulting in a total of 530 different input sentences, and we apply these 530 questions or instructions to template matching evaluation experiments. The questions given by these 10 students are all based on the relevant concepts and relationships in the basic areas of computer science. We use the following three inference methods to evaluate template matching: (1) Lexical baseline inference (LB) [41]: directly consider the mandatory word-based lexical inference in formulae (4) and (6) to evaluate template matching, which is the benchmark for subsequent two improved algorithms. The final lexical-based entailment degree of the input to the template is calculated as follows;   (2) Ontology-Based lexical baseline inference (LB + Ontology): the ontology element type is considered in the lexical inference formulae (12), as stated in Section 6.2.
(3) LB + Ontology + Semantics-based inference (LB + Ontology + Semantics): Based on the above two methods, further consider the semantic similarity-based matching inference described in Section 6.3. The final semantics-based entailment degree of the input to the template is calculated as VOLUME 10, 2022 where spMatch I (Inp, Temp) is a simple matching algorithm with word similarity based on Eq. (9) and Eq. (10), LCSmatch I (Inp, Temp) is a continuous subsequence matching algorithm with word similarity based on Eq. (9) and Eq. (10). In addition, in LB + ontology + Semantics-based inference, the system will perform the tone analysis and processing described in Section 6.3. If in the template library there are multiple templates whose matching degrees with a given student's input are greater than the specified matching threshold, the template with the highest matching degree is selected as the final matched template for the student's input. Further, if there are multiple templates with the same maximum matching degree, the template with more mandatory elements is given a higher priority. The classification results of three matching inference methods are shown in Figure 3.
Through the trend analysis in Figure 3, we can get the best F 1 -measure value of each method and its corresponding matching threshold, precision and recall, as shown in Table 7.
To facilitate the understanding of the role of TE algorithms in template matching, we give the following two typical cases in the evaluation: Case 1: The system selects a long template form two matched templates with the same maximum matching degree, as shown in Figure 4.
Case 2: The system selects a correct template through the semantic similarity based on WordNet in LB + Ontology + Semantics-based inference, as shown in Figure 5.

B. QA SYSTEM EVALUATION
In order to evaluate the overall performance of CQACD as a QA system, we asked 10 students who didn't know anything about our system and research to ask questions to the system.  Everyone asks 15 questions about basic computer knowledge, resulting in a total of 100 different questions, and then let the system make reply. Based on the best matching threshold selected for each method in Table 7, we employed the following formulae to evaluate the QA system performance, yielding the experimental results shown in Table 8.
In Table 8, we use five template-matching methods to evaluate our QA system, in which there are three benchmark methods: (1) Lexical baseline inference (LB) is derived from [41] and follows the same computational procedure as described in the previous subsection.
(2) LB+Glove is a semantic similarity-based matching inference method. It uses Eqs. (4) to (9) and (13) for matching inference, in which Sim (j, i) in Eq. (9) is achieved by the cosine similarity between the Glove word embedding vectors of words j and i.
(3) Inference sentence (InferSent) [48] is also a semantic similarity-based matching inference method. It uses InferSent that is a pre-trained sentence embedding model [48] based on the Bi-LSTM neural network to encode the matched student input and input template and uses the sentence embedding cosine similarity to calculate the semantic similarity between the student input and the input template.
The matching thresholds in Table 8 are derived from the best matching threshold in Table 7. For fairness reasons, we use the same matching threshold 0.5 of our proposed semantic matching method LB+Ontology +Semantics for two semantic benchmark methods LB+Glove and InferSent.

C. TUTORING EFFECTIVENESS EVALUATION
We set up three learning models to investigate and compare the tutoring effectiveness of CQACD system: the traditional class instruction (TCI) model based on teacher lectures, the self-regulated learning (SRL) model based on the educational website, and the personalized tutoring model based on our CQACD system. In order to reduce the impact of students' own factors on learning outcomes, all the students participating in the experiment were selected from the freshmen of physics, mathematics and electronics in our school, their entrance examination subjects were the same, and the average enrollment scores for these three majors were very close, which ensured their cognitive levels were close. In addition, in order to investigate the impact of class size on learning outcomes, we set up four class types with different student numbers for each of three learning modes. Finally, we evaluated the learning effectiveness of three learning models with the same test paper that consists of 100 single-choice questions, and obtained the experimental results shown in Table 9.

X. DISCUSSIONS
From the above experiment results, we can draw several conclusions. First, the results presented in Fig. 3 and Table 7 show that when the matching threshold is less than 0.4, the classification precision is the same as or similar to the recall rate because most of student inputs can match a template from the template library. When the matching threshold is greater than 0.4, the difference between the classification precision and recall rate increases because there are more student inputs that cannot match a template from the template library. The classification precision and recall rate in the mandatory word-based lexical inference are lower because in this inference method the ontology element variable in the template cannot be matched by the ontology element in the student input. Fig. 3 shows that the classification precision in LB inference can be significantly improved by the ontology element type-based inference while the classification recall rate in LB inference can be significantly improved by the semantic similaritybased inference.
Secondly, the evaluation results in Table 8 show that CQACD system can give 88% correct answers through our proposed template matching method based on the ontology and semantics, which demonstrates that the overall QA performance of CQACD system has reached a higher level and CQACD is fully capable of acting as an intelligent answering system of basic computer knowledge in the computer basics course. Through further analysis, we found that after the introduction of semantic analyses, the system's error answering and failure to identify the question are mainly due to that students' questions are beyond the coverage of domain ontology knowledge rather than template matching errors. This out of coverage is mainly reflected in the following three aspects: (1) Being completely beyond, that is, the system cannot identify any ontology elements in the student's question. At this point, the student's question sentence is judged as being uncertain. For example, the question ''What is a television?'', in which the student mistakes the television as a type of computer.
(2) The partial excess that causes the question to be mismatched. That is, the system identifies only some of the ontology elements in the student's question, causing the question to be mismatched and wrongly answered. For example, for the question ''What is the price of memory?'', because the system did not define the attribute price, the question is mismatched with the template ''What is < c: concept>?'' and wrongly answered with the definition of memory.
(3) The partial excess that causes the system cannot correctly infer the answer. For example, for the question ''Can Data be transferred to the CPU?'', it is correctly classified as matching the template ''<can> [the] < c 1 :Concept> < r:Relation>[the]< c 1 :Concept>[?]'', but the system defines only the existence restrictionData ⊆ ∃be_transfer_to.host rather than Data ⊆ ∃be_transfer_to.CPU in the Domain ontology, resulting in that the system gives the wrong answer No. Such errors can be gradually reduced with the gradual improvement of the domain ontology. Table 8 also shows that although two semantic benchmark methods LB+Glove and InferSent can improve the lexical baseline inference (LB), they are still far inferior to our proposed matching methods LB+Ontology and LB+Ontology + Semantics, which reveals that ontology elements and their types play a key role in question template matching since the template contains ontology element variable.
Finally, Table 9 demonstrates that the teacher-led class instruction model gained a greater advantage in the small class instruction, such as obtaining a mean score of 84.2 in VOLUME 10, 2022 the small class with 30 students. However, with the number of students in the class being increased, the learning effect of this model showed a significant downward trend, for example, when the number of students per class is increased to 100, the mean score dropped to 81.8, which is mainly due to the limited capacity of a single teacher, so when the increase of students in the class, the attention and personalized tutoring of the teacher to each student in the classroom will decline. On the contrary, website-based SRL and CQACD-based learning are both technology-enabled services that have a higher confidence level than TCI-based learning, so the increase of students in the class has little effect on both. Especially CQACD-based tutoring mode still maintained a mean score of 81.3 in the large class (100 students / class), this mean score is much higher than self-regulated learning effectiveness (76.1) and has approached the teacher-led tutoring effectiveness (81.8) in the large class instruction, thereby demonstrating the CQACD's ability to replace the teachers' tutoring in large classes. In addition, Table 9 also demonstrates that CQACD-based tutoring mode has a lower standard deviation for the test scores of large classes than both TCI and SRL models, which proves that the constructivistbased conversational pedagogical strategy in CQACD can promote poor students' learning.

XI. CONCLUSION
In this study, we employ a domain ontology with rich semantic relationships and a limited number of input templates with description logics to design and implement a tutor and student mixed initiative concept QA system of CQACD that can give 88% of the correct answer and replace the teachers' tutoring in large classes. The design ideas of this paper can be used and referenced by other concept-centered computing courses such as operating systems and computer networks, and even by courses in history and philosophy in the humanities and social sciences.
The development of proposed CQACD system has gone through 8 years and released 3 versions. The axioms it contains have grown from less than 5,000 to more than 30,000. In the future, we intend to improve and perfect CQACD system in the following aspects: (a) we intend to add voice input/output functions to CQACD system. At present, a new CQACD version with voice input/output is being designed and tested by Automatic Speech Recognition (ASR) technology in Ali Cloud; (b) we will refine the domain ontology and input template library in CQACD to further improve CQACD's performance; (c) we intend to standardize CQACD system with SCORM Specifications [44] (ADL, 2009) and the interoperability model proposed by Zhu et al. [13] so that our CQACD system can apply to various discipline ITSs.
YU WEN received the B.S. and M.S. degrees in law from Guangxi Normal University, China. She is currently with the Student Work Department, Guilin Tourism University. Her research interests include intelligent tutoring systems and questionanswering systems.
XINHUA ZHU received the B.S. degree in computer science from the Department of Radio Electronics, Beijing Normal University, China. He is a Professor with the School of Computer Science and Engineering, Guangxi Normal University, China. His research interests include intelligent tutoring systems, natural language processing, and knowledge graphs.
LANFANG ZHANG received the B.S. and M.S. degrees in computer science from Guangxi Normal University, China. She is a Professor with the Faculty of Education, Guangxi Normal University. Her research interests include artificial intelligence in education, natural language processing, and knowledge graphs. VOLUME 10, 2022