Introduction
In modern society, marriage crises are common, and many people would like to protect their rights by legal means. However, most people know little about which rights they are entitled to protect, let alone how to protect them lawfully.
People usually defend their rights in two ways. One way is to consult many related cases and compare them with their own [1]. The other is to seek help from lawyers. In current legal consultation services, lawyers mainly help users through one-to-one communication, which takes lawyers much time and makes consultation inefficient [1].
The first way requires searching for many similar cases. Although keyword search alleviates part of the burden, search engines such as Google and Baidu still return many cases, which cannot fully meet the users’ requirements. As for the second way, it is well known that lawyers usually ask different users the same questions and apply the same logical reasoning. This cumbersome and repetitive work urgently needs to be handed over to machines.
Currently, rule-based reasoning is widely used in most legal expert systems [2]. Its advantage is that both the theoretical and the practical knowledge of legal experts can be easily collected. In [3], a case-based medical consultation system was proposed that analyzes a person’s complaint (disease), given as a sentence or a paragraph of questions, and answers with a diagnosis drawn from the system’s knowledge. The system uses Case-based Reasoning (CBR) and the Sørensen coefficient to find the stored cases that best match the new case. In [4], a similar-case retrieval system was proposed. It uses iFLY-TEK’s online speech synthesis and natural language processing technology to realize context-aware legal knowledge Q&A, and finally retrieves the most similar cases from a database as the conclusion of the user’s consultation. In [5], a legal text classification algorithm based on feature words was proposed. It takes legal judgments as a training corpus and computes the feature words of the documents with TF-IDF to establish the correspondence between legal provisions and feature words, so that the relevant legal provisions can be accurately extracted from a judgment. In [6], a semi-automatic ontology construction method was proposed for legal question answering. It provides reasoning support for a legal question-answering system by exploring the implication between legal provisions and problem statements, and it helped the development of the ontology and rules of criminal law.
With the development of natural language processing and deep learning, both are widely used for simple dialogues in e-commerce customer service, chat, intelligent devices and other fields [7]–[14]. In [15], a question-answering system based on knowledge graph reasoning used a knowledge graph to provide well-structured relationship information between entities, and used deep learning to handle noise in the questions and to learn multi-hop reasoning at the same time. In [16], a distantly supervised open-domain question-answering system used a paragraph selector to filter out noisy paragraphs and a paragraph reader to extract the correct answers from the de-noised paragraphs. In [17], a question-answering system based on reinforcement learning and a collator was proposed. In this open-domain question-answering model with a collation component, the retrieved answers are ranked according to the likelihood of extracting the ground-truth answer to a given question, and the collator is trained by reinforcement learning.
In recent years, there have also been developments in question-answering technology related to Chinese law. In [18], a framework was proposed for constructing a hybrid legal knowledge network based on a Chinese encyclopedia and legal judgments. First, it builds a network of legal terms from encyclopedia data. Then, a legal knowledge graph is constructed from Chinese legal judgments to capture the strict logical connections in them. Finally, a hybrid knowledge network of Chinese law is constructed by combining the legal terms network and the legal knowledge graph. In [19], a free Chinese legal technology system (IFly-Legal) was introduced, which uses deep context representation, multiple attention mechanisms and other technologies for legal consultation, multi-channel legal inquiry, and legal literature analysis.
A survey of the automatic legal consulting products on the Chinese market shows the characteristics of the existing mainstream products, which are summarized in Table 1.
Based on the above background, we design and implement a task-oriented automatic dialogue system based on a decision tree for real-time marriage legal consultation. The legal automatic dialogue system supports multiple rounds of dialogue and provides accurate answers in real time, which gives users a good interaction experience. Moreover, the method can easily be extended to other kinds of legal consultation. The main contributions of this paper are as follows:
Based on a parallel C4.5 decision tree, which is built from the case data we collect, an intelligent marriage consultation system is designed and implemented. The system can respond to similar inquiries for users intelligently, with the ability of reasoning.
The effects of different training set proportions on the maximum tree depth and on the precision of the decision tree model of the legal automatic dialogue system are analyzed in our experiment.
The Proposed Methods
A. Design of Law Automatic Dialog System Based on Decision Tree
There are four modules in the system, as shown in Figure 1. The data collection module crawls and collects data from the web. The data preprocessing module fills in missing values and discretizes the continuous values of some attributes. The data learning module builds a parallel decision tree. The attribute value extraction module establishes models for the different types of attributes.
From the architecture point of view, the four modules are integrated into three parts, as shown in Figure 2: the data preprocessing model obtains the feature representation of the data; the training model uses this feature representation to build a parallel C4.5 decision tree; and the user interaction model extracts attributes from the user’s input and replies to the user.
B. The Process of Building a Parallel C4.5 Decision Tree
The decision tree of the legal automatic dialogue system is built based on the information gain ratio of each attribute, as described below [20].
1) Split Attribute Selection
The information gain ratio is based on the information gain, which measures how much the entropy of a data set decreases when it is split on an attribute. A decision tree built on information gain selects, at each step, the attribute that returns the highest information gain. The computation proceeds as follows.
Suppose that the sample set $D$ is split by attribute $A$, which takes $v$ distinct values, into $v$ subsets $D_{1},\ldots,D_{v}$. The split information of $A$ is \begin{equation*} SplitInfo_{A} (D)=-\sum \limits _{j=1}^{v} {\frac {\vert D_{j} \vert }{\vert D\vert }} \log \frac {\vert D_{j} \vert }{\vert D\vert }\tag{1}\end{equation*}
Then, the information gain is calculated. The information gain is the reduction in information entropy after splitting $D$ on attribute $A$: \begin{equation*} Gain (A)=Ent(D)-\sum \limits _{j=1}^{v} {\frac {\vert D_{j} \vert }{\vert D\vert }} Ent(D_{j})\tag{2}\end{equation*}
Hence, the information gain ratio of set $D$ with respect to attribute $A$ is \begin{equation*} GainRatio(A)=\frac {Gain (A)}{SplitInfo_{A} (D)}\tag{3}\end{equation*}
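Because the information gain ratio considers both the data distribution and the information gain when selecting attributes, it avoids the disadvantage of information gain (its bias toward attributes with many values). It is therefore reasonable to select the attribute with the highest information gain ratio as the split attribute.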
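As a small worked illustration of Eqs. (1)–(3), consider a hypothetical sample set (not taken from our data) with $|D|=8$ cases, four in each of two classes. We assume base-2 logarithms and the usual Shannon entropy $Ent(D)=-\sum_{k} p_{k}\log_{2} p_{k}$, where $p_{k}$ is the fraction of cases in class $k$. Attribute $A$ splits $D$ into two pure subsets of four cases each, while attribute $B$ (for instance, a case identifier) splits $D$ into eight subsets of one case each:
\begin{align*} Ent(D)=&-\tfrac{4}{8}\log_{2}\tfrac{4}{8}-\tfrac{4}{8}\log_{2}\tfrac{4}{8}=1\\ Gain(A)=&1-\left(\tfrac{4}{8}\cdot 0+\tfrac{4}{8}\cdot 0\right)=1,\quad SplitInfo_{A}(D)=-2\cdot\tfrac{4}{8}\log_{2}\tfrac{4}{8}=1,\quad GainRatio(A)=1\\ Gain(B)=&1-8\cdot\tfrac{1}{8}\cdot 0=1,\quad SplitInfo_{B}(D)=-8\cdot\tfrac{1}{8}\log_{2}\tfrac{1}{8}=3,\quad GainRatio(B)=\tfrac{1}{3}\end{align*}
Both attributes have the same information gain, but the gain ratio prefers $A$, because the split information penalizes the many-valued attribute $B$.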
2) Tree Construction Based on MapReduce
The overall process is shown in Figure 3. It is an iterative process. Because the system slows down as the data grows, the tree is built in parallel with MapReduce [21], [22] to speed up construction. MapReduce is used only in the parallel phase of building the decision tree model, and its process is as follows. Firstly, the data are transformed into the format required by MapReduce. Secondly, the MAP procedure calculates the split information and the information gain. Thirdly, the REDUCE procedure calculates the information gain ratio. Finally, the information gain ratio obtained by MapReduce is used to select the split attribute.
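The MAP and REDUCE steps described above can be sketched in plain Python as follows. This is only a single-machine illustration under our own assumptions, not the cluster implementation: the function names, the toy records and the use of base-2 logarithms are all hypothetical, and on a real MapReduce cluster the MAP calls would run in parallel, one per candidate attribute.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy of a list of class labels (base 2 assumed)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def map_attribute(attr, rows, labels):
    """MAP: emit (attr, (split_info, gain)) for one candidate attribute."""
    n = len(rows)
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - sum((len(g) / n) * entropy(g) for g in groups.values())
    return attr, (split_info, gain)

def reduce_gain_ratio(mapped):
    """REDUCE: turn each (split_info, gain) pair into a gain ratio."""
    return {attr: (gain / split if split > 0 else 0.0)
            for attr, (split, gain) in mapped}

def select_split_attribute(rows, labels, attrs):
    """Pick the attribute with the highest gain ratio for the current node."""
    mapped = [map_attribute(a, rows, labels) for a in attrs]  # parallel on a cluster
    ratios = reduce_gain_ratio(mapped)
    return max(ratios, key=ratios.get)

# Hypothetical toy records: (has_certificate, has_children) -> category
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["divorce", "divorce", "cohabitation", "cohabitation"]
best = select_split_attribute(rows, labels, attrs=[0, 1])
```

In this toy data, attribute 0 separates the two categories perfectly, so it is selected as the split attribute; the same selection would then be repeated on each child subset.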
As Figure 4 shows, the decision tree summarizes decision rules from data and presents these rules in a tree structure, where each non-leaf node represents a test on an attribute and each leaf node represents a classification result. Hence, it is suitable for classifying data in legal consultation.
Figure 4 also gives an example. When a customer asks a question about divorce, the system asks whether the user has a marriage certificate or cohabited before 1994/2/1. If the user satisfies neither condition, the system reaches the leaf node (Node 4) and returns a result that means dissolution of cohabitation. Otherwise, it visits the next branch node and asks further questions until a leaf node is reached. For example, if the user has no marriage certificate but cohabited before 1994/2/1, the system reaches Node 3 and asks whether the woman is currently pregnant or has had an abortion within the past six months.
When the user consults on a legal issue, the automatic dialogue process is launched based on the decision tree, as shown in Figure 5. Firstly, the system starts from the root node of the decision tree according to the user’s input. Secondly, after the corresponding attribute value is extracted from the user’s input, the system judges whether the attribute value is valid for the current node. If it is, the system moves to the next branch node of the decision tree and asks the user the question associated with that node; otherwise, the system asks the user the same question again. Finally, when the current node is a leaf node, the dialogue terminates and the result is returned to the user.
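The dialogue process above can be sketched as a simple loop over the tree. The following is a minimal illustration, assuming a hypothetical `Node` class, a hypothetical yes/no extractor in place of the attribute value extraction models, and placeholder leaf results; only the first question and the "dissolution of cohabitation" leaf follow the example of Figure 4.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    question: Optional[str] = None       # asked at branch nodes
    result: Optional[str] = None         # set only on leaf nodes
    children: Dict[str, "Node"] = field(default_factory=dict)

def extract_yes_no(text):
    """Hypothetical stand-in for the attribute value extraction step."""
    answer = text.strip().lower()
    return answer if answer in ("yes", "no") else None

def run_dialogue(root, answers, extract):
    """Walk from the root to a leaf; re-ask when no valid value is extracted."""
    node, asked = root, 0
    while node.result is None:
        asked += 1                        # the system asks node.question
        value = extract(next(answers))    # extract the attribute value
        if value in node.children:        # valid for the current node
            node = node.children[value]
        # otherwise stay on this node and ask the same question again
    return node.result, asked

# Hypothetical two-level tree; leaf results A and B are placeholders.
pregnancy = Node(
    question="Is the woman pregnant, or has she had an abortion within six months?",
    children={"yes": Node(result="placeholder result A"),
              "no": Node(result="placeholder result B")},
)
root = Node(
    question="Do you have a marriage certificate, or did you cohabit before 1994/2/1?",
    children={"yes": pregnancy, "no": Node(result="dissolution of cohabitation")},
)

result, asked = run_dialogue(root, iter(["not sure", "No"]), extract_yes_no)
```

Here the first reply cannot be mapped to an attribute value, so the same question is asked again; the second reply leads to the leaf, and the dialogue ends after two questions.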
C. Attribute Value Extraction
To remove noise from the user’s input and obtain the basic attributes it contains, it is necessary to establish a discriminant model for each basic attribute, so that the key attribute value can be extracted from the user’s input accurately. The process of attribute value extraction is shown in Figure 5; it includes a training module and an application module.
1) The Training Module
Firstly, the elements of the training data are tagged with the corresponding labels. Secondly, the collected data are preprocessed: the text is segmented into words, and noise is removed by deleting stop words and eliminating low-frequency words. Thirdly, the term frequency–inverse document frequency (TF-IDF) is calculated to obtain the feature vector of each document. Fourthly, topic features are extracted from the feature vectors by Latent Dirichlet Allocation (LDA). Finally, an SVM is trained on the extracted topic features with the labels tagged above.
2) The Application Module
When the user consults on a legal issue, the user’s input is preprocessed and its features are extracted with TF-IDF and LDA; the trained SVM classifier then returns a result that indicates the attributes of the user’s input.
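To make the TF-IDF step of both modules concrete, the following sketch computes TF-IDF weights for already segmented documents using only the standard library. It assumes one common variant (term frequency normalized by document length, and idf = log(N / df)); the exact variant used in the system is not specified here, the toy sentences are hypothetical, and the subsequent LDA and SVM steps are omitted.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / len(doc) and idf = log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors

# Hypothetical, already segmented user inputs with stop words kept for brevity.
docs = [
    ["we", "have", "a", "marriage", "certificate"],
    ["we", "never", "registered"],
    ["we", "have", "two", "children"],
]
vectors = tf_idf(docs)
```

A term that occurs in every document, such as "we" here, receives weight zero, whereas a term that occurs in only one document, such as "certificate", receives the largest weight in its document.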
Taking the existence of a marriage certificate as an example, Figure 6 shows the result of extracting attributes from a dialogue launched by a couple who consult on a divorce issue. Here, 1 represents the positive class, indicating that the marriage certificate is mentioned in the text, and 0 represents the negative class, indicating that it is not mentioned. When the user inputs text, the classification model judges the category of the input. If the input belongs to the positive class, the system determines whether a marriage certificate exists by recognizing affirmative and negative sentences in the user’s input. Otherwise, the dialogue system asks the user again until the user’s input belongs to the positive class.
D. Theoretical Analysis and Comparison of Decision Tree Classifier Model
We use the C4.5 decision tree in the proposed system. In this subsection, the decision tree model is compared in theory with Naive Bayes (NB) and SVM. The NB classifier assumes that the attributes are independent of each other [23], so NB is not applicable in this system. The accuracy of an SVM model often depends on the selection of support vectors, and heavy noise in the data seriously degrades its performance [24]. A decision tree is highly comprehensible and interpretable, and its branching structure facilitates the generation of the consultation dialogue [25]. The advantages and disadvantages of decision tree classifiers and other classifiers [26] are shown in Table 2.
Experiment
A. Experimental Dataset
In our experiment, we test our algorithm on divorce issues. The data set comes from the website lvpin (https://ai.lvpin100.com), a legal consulting website whose data have been sorted out by many professionals. The data set contains 2304 records, and its format is shown in Table 3 (all data have been translated from Chinese to English for ease of understanding). In addition, the data format after preprocessing, which is the training data of the classifier, and an example of classification are shown in Table 4. There are four categories and eight basic attributes, and each attribute may have multiple values. Moreover, all attributes have been discretized into numbers and clustered manually. Each attribute corresponds to a question that is put to the user when the decision tree traversal reaches the corresponding node, as shown in Table 5.
In the experiment on attribute value extraction, the data are collected manually and annotated by a professor. The data set contains 753 records, and its format is the same as in Figure 6.
B. The Method of Evaluation
K-fold cross-validation is used to verify the experimental results. It gives a reliable and stable estimate of model performance and helps guard against overfitting [27].
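As an illustration of how K-fold cross-validation partitions the data, the following sketch generates the train/test index sets with the standard library only. The function name, the shuffling seed and the round-robin assignment of samples to folds are our own assumptions for this example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]   # k nearly equal folds
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
```

Each of the K rounds trains on K-1 folds and evaluates on the remaining one, and the reported scores are the average and standard deviation over the rounds.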
C. Metrics for Evaluation
The results of the classification are evaluated by precision (P), recall (R) and F1-score. With TP, FP and FN denoting the numbers of true positives, false positives and false negatives, the formulas are:\begin{align*} P=&\frac {TP}{TP+FP} \tag{4}\\ R=&\frac {TP}{TP+FN} \tag{5}\\ F1=&\frac {2PR}{P+R}\tag{6}\end{align*}
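Eqs. (4)–(6) can be computed for one class as in the sketch below. The function name and the toy labels are hypothetical; for the four categories of our data, the same computation would be applied to each category in turn, treating it as the positive class.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute P, R and F1 of Eqs. (4)-(6) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: TP = 2, FP = 1, FN = 1.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

With two true positives, one false positive and one false negative, precision, recall and F1 all equal 2/3.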
Meanwhile, the average number of questions is also used to evaluate our system. The fewer questions required, the faster the system can understand the real intention of the user. The metric is reflected by the depth of the decision tree, and the formula is:\begin{equation*} h_{avg} =\frac {\sum \limits _{i=1}^{N} {\left ({deep_{i} -1}\right)}}{N}\tag{7}\end{equation*}
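The following sketch evaluates Eq. (7) on a small tree. It rests on two assumptions that are ours: that $deep_{i}$ is the depth of the $i$-th leaf with the root at depth 1, and that $N$ is the number of leaves. The nested-dictionary tree and its leaf names are hypothetical.

```python
def leaf_depths(tree, depth=1):
    """Yield the depth of every leaf; the root is at depth 1 (an assumption)."""
    if not isinstance(tree, dict):       # a leaf holds a class label
        yield depth
        return
    for child in tree.values():          # a branch maps answers to subtrees
        yield from leaf_depths(child, depth + 1)

def average_questions(tree):
    """h_avg of Eq. (7): mean of (depth - 1) over all leaves."""
    depths = list(leaf_depths(tree))
    return sum(d - 1 for d in depths) / len(depths)

# Hypothetical tree: one question leads to leaf A, two questions to B or C.
tree = {"no": "leaf A", "yes": {"no": "leaf B", "yes": "leaf C"}}
h_avg = average_questions(tree)
```

Subtracting 1 from each depth counts only the branch nodes on the path, that is, the questions actually asked before the leaf is reached; for this tree the average is (1 + 2 + 2) / 3.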
D. Machine Configuration
The system is written in Python, and the experiments are conducted on Windows 10 with an Intel(R) Xeon(R) CPU @ 2.30GHz and 12 GB of RAM.
E. Results of Attribute Value Extraction
In this subsection, we conduct an experiment with eight-fold cross-validation to compare the LSTM model and the SVM model in attribute value extraction. The LSTM method takes word vectors as word features after data preprocessing; it has 2 LSTM layers with 128 units. In this experiment, 602 samples are used for training and 151 for testing. The performances of the SVM method and its competitor are shown in Table 6 (Avg means average, std means standard deviation). All average scores of the SVM model are above 97% and higher than those of the LSTM model, indicating that the SVM model is better suited to the extraction and discrimination of attribute values for further processing by the decision tree. Moreover, the standard deviation of the SVM model is lower than that of the LSTM model, which indicates that the SVM model is more stable.
F. Comparisons of C4.5 Decision Trees With Other Algorithms
Due to the good performance of C4.5, we mainly use it as the decision tree in the proposed method. In this subsection, some experiments are conducted to compare C4.5 decision tree with other classification algorithms, such as SVM and NB.
The first experiment analyzes the influence of different proportions of the training set. The precision is shown in Figure 7, the training time cost is presented in Figure 8, and the time cost of 1000 predictions on the same test set is shown in Figure 9.
As Figure 7 shows, the larger the proportion of the training set, the higher the precision. Moreover, the precision of the C4.5 decision tree model is higher than that of SVM and NB once the training-set proportion exceeds 5%, up to the 90% proportion, which indicates that the decision tree needs less data to achieve better results than the other models.
As can be seen from Figure 8, the training time increases with the proportion of the training set. The shorter the training time, the better the model. The training time of the SVM model grows the most, followed by C4.5, and NB is the most stable, which shows that with big data the training cost of SVM is far higher than that of C4.5 and NB.
Figure 9 shows that the prediction time of NB and C4.5 does not increase with the proportion of the training set, whereas that of SVM does. Given the real-time requirement of the system and the time cost of SVM with big data, C4.5 and NB are more suitable for this system, and the real-time performance of C4.5 is the best.
The second experiment verifies the performance of the decision tree with a 90% proportion of training set, and we again compare C4.5 with the SVM and NB models. The scores of the C4.5 decision tree, SVM and NB are shown in Table 7, the training time costs are shown in Figure 10, and the prediction time costs are shown in Figure 11.
According to Table 7, the scores of the C4.5 decision tree model are all better and more stable than those of SVM and NB. Moreover, Figures 10 and 11 show that the training time of the C4.5 decision tree is shorter than that of the SVM model but longer than that of the NB model, and that its prediction time is shorter than that of the other two algorithms, which indicates better real-time performance. Overall, the C4.5 decision tree is more applicable than the other two algorithms.
The third experiment analyzes the effect of different proportions of training data on the depth of the decision tree, and the experimental results are plotted in Figure 12.
According to Figure 12, the higher the proportion of the training set, the deeper the decision tree. In addition, the average depth of the decision tree is about 5.5, which indicates that the automatic dialogue system can return the consultation result after 5–6 questions on average. This is fewer than on the website lvpin (about 8 questions on average) and fewer than with the SVM and NB models, which need all attributes to make a prediction. Therefore, the decision tree model avoids some unnecessary questions, which improves the efficiency of consultation.
Conclusion
To realize real-time automatic legal consultation, we design an automatic legal marriage consultation system based on the parallel C4.5 decision tree for divorce issues. It responds to users intelligently with the ability of reasoning, and it yields higher accuracy than the SVM and NB models. Compared with some automatic legal consultation websites, the proposed method needs to ask the user fewer questions during a dialogue, which improves the efficiency of consultation.
However, the experts’ manual tagging for attribute extraction is inefficient, and the metric data in this case may not correspond to the users’ opinions. Therefore, it is suggested that the labeling be crowdsourced to the users themselves to reduce the cost of manual labeling and improve the accuracy of attribute value extraction.
Our future work is to develop a new version of the proposed method by using fast clustering [28]–[30] and CNN [31] based time series data mining to deal with complex consultation. Also, we would optimize the depth of the decision tree, and prevent the tree building from overfitting.