Smart Policing Technique With Crime Type and Risk Score Prediction Based on Machine Learning for Early Awareness of Risk Situation

In order to quickly and effectively respond to a newly received criminal case, information regarding the type and severity of the case is crucial for authorities. This paper designs and develops a crime type and risk level prediction technique based on machine learning technology and verifies its performance. The designed technology can predict crime type and crime risk level using a text-based criminal case summary, which is criminal case receipt data. For the text-based criminal case summary data, the KICS data format is considered, which is actual policing data that contains information about criminal cases. For the crime type, 21 representative types of crimes are considered; therefore, the system can predict one of 21 types of crime for each criminal case. Furthermore, to predict the crime risk level, we developed a crime risk calculation formula. The developed formula calculates the crime risk level and outputs the risk score in numerical terms considering the severity and damage level of the criminal case. To predict the crime type and crime risk score, both DNN and CNN-based prediction models were designed and developed. The performance evaluation section shows that, in the case of crime type prediction, the proposed prediction models can achieve better performance than traditional classification algorithms such as naïve Bayes and SVM. The performance of the CNN-based crime type prediction model is about 7% and 8% better than those of the SVM algorithm and the naïve Bayes algorithm, respectively. The performance of the designed technology was comprehensively analyzed and verified through various performance measurement parameters. It is also developed in the form of a software platform with a GUI, allowing field personnel (e.g. police officers) to intuitively identify the type of criminal case and the level of risk from a text-based criminal case summary upon receipt of a new criminal case.


I. INTRODUCTION
Digital transformation based on artificial intelligence (AI) technology and big data has received considerable attention [1], [2] and has become a major trend. In addition, AI technology is being utilized in various research areas such as smart communications, medical research, natural language processing, and robotics [3]-[9]. Recently, smart policing technologies using AI and big data have been actively investigated to secure the lives and properties of the public [10]-[17]. Yoo et al. [10] investigated a fingerprint detection technique using computer vision technology based on deep learning. Deep reinforcement learning technology was used in [11] to detect criminal networks. Sangkaran et al. [12] developed community detection in criminal networks using graph theory; the method differs from the traditional method by allowing law enforcement agencies to compare the detected communities and thereby take a different viewpoint of the criminal network. Shafi et al. [13] proposed a simplified yet adaptable framework that uses a novel feature extraction algorithm for extracting features from the textual part of social media content. Various technologies for predicting and preventing crimes have been developed using accumulated security data and AI technology [14]-[17]. In addition, the need to develop smart policing technologies that can provide detailed information related to newly received criminal incidents to police officers and investigators dispatched to the crime scene is constantly being raised. The details provided in connection with the newly received criminal case allow field staff and investigators to respond quickly and efficiently to the crime.
The associate editor coordinating the review of this manuscript and approving it for publication was Jad Nasreddine.
In this paper, we propose and implement a smart policing technique that can predict crime types and crime risk levels based on machine learning so that dangerous situations can be recognized early. The proposed technique predicts crime types by analyzing text-based crime summaries, and it also predicts the crime risk level. Before developing the crime type and crime risk prediction technique, we developed a formula that numerically calculates the crime risk score (CRS). The CRS calculation formula calculates the severity of a criminal event taking into account the type of crime and the level of damage to the victim. ETRI and the Korea Police Science Institute (PSI) collaborated to establish a reasonable crime risk score calculation formula. In addition, to reflect practical crime risk levels, the opinions of Korean police officers were gathered and included in the formula. The proposed technique predicts the type of a newly received criminal case, infers the severity of the case, and outputs the information to the police. The developed technology is implemented in the form of a GUI-based software platform, so field staff can easily use the developed system. To develop the proposed technology, data from the Korea Information System of Criminal Justice Services (KICS) was considered. The KICS data is actual policing data that contains information about each criminal case. For security reasons, we created and used virtual KICS data that follows the actual KICS data format. The created virtual KICS data includes 5000 criminal cases and is utilized to train and verify the developed system. From the virtual KICS data, text data on criminal cases is extracted and used for system development. The developed system can predict crime type and CRS from text-based crime data. Table 1 presents examples including text-based criminal case descriptions, crime types, and CRS.
The second column of Table 1 is a text-based criminal case description that follows the KICS format. Simple descriptions of criminal cases are shown in this column. The first criminal case is about larceny, the second is about fraud, and the last is about arson. The third column shows the crime types for the criminal cases, and the fourth column shows the calculated CRS. Therefore, the crime type and CRS can serve as labels for the prediction system. The developed system can predict the crime type and CRS based on the text-based criminal description part, as shown in Table 1. The developed system takes into account 21 types of crimes that fall into the middle category. Therefore, the developed system can predict one of the 21 crime types for each criminal case. This prediction system is developed for security data based on the Korean language. However, in this paper, the proposed technology and developed system are introduced using English examples in order to improve readability.
In the first step of the prediction system development process, a keyword dictionary is built. In the keyword dictionary build process, about 27 keywords are extracted for each crime type. Thus, given 21 types of crime, the keyword dictionary consists of 568 keywords. Next, the dataset is established. In this process, input data and output labels are determined. In addition, the training dataset, validation dataset, and test dataset are arranged, deep learning-based prediction models are developed, and the models are trained on the training dataset. Finally, the real-time GUI system is implemented and applied to the prediction models. The developed GUI-based system can predict crime type and CRS in real time for new input data.
The rest of this paper is organized as follows. Section II explains the formula for CRS calculation. Section III describes the prediction models. In Section IV, performance evaluation results and discussions are reported. The concluding remarks are in Section V.

II. FORMULA FOR CRIME RISK SCORE CALCULATION
In this section, the development of a crime risk score calculation formula is explained. The formula calculates the crime risk and outputs it as a numerical value. The crime risk score is calculated by taking into account the crime type and various damage information for the victim of the crime. The formula for calculating the CRS is a function of the following weights:
-WC (weight for crime type): weight according to crime type
-WG (weight for gender): weight according to victim's gender
-WA (weight for age): weight according to victim's age
-WP (weight for physical damage): weight according to victim's physical damage
-WM (weight for material damage): weight according to victim's material damage
For WC, this paper considers 21 crime types that are representative and practical in the policing environment, and the considered crime types and weight values are listed in Table 2.
The weight values of WC are assigned considering severity as well as sentencing guidelines according to crime types. WG and WA are variables that are set taking into account the victim's physical information. Therefore, WG and WA can be regarded as information that determines how vulnerable a victim is to a crime. The weight values of WG and WA are described in Table 3. As shown in Table 3, a higher weight value is assigned to women, who are relatively vulnerable to criminal damage, and, from the same perspective, higher weight values are assigned to young children and the elderly, who are vulnerable to crime. WP and WM are variables that represent the damage level from crime. WP is the victim's physical damage. The weight value of WP is set according to the time required to fully heal the damage. WM indicates the degree of property damage: the higher the amount, the higher the assigned weight value. Table 4 shows the weight values for WP and WM. As mentioned in Section I, ETRI and PSI worked jointly to establish a reasonable crime risk score calculation formula. To reflect practical crime risk levels, the opinions of Korean police officers were gathered and included in the formula.
Fig. 1 shows the proposed crime type and CRS prediction system. The proposed system consists of five functional parts, which are as follows:
-Data source: Text-based crime data is used as a data source. The crime data contains criminal information (criminal case summary), and the form of the crime data source is the same as actual Korean police security data from KICS. In addition, the crime type and CRS corresponding to the text-based crime data are also included as labels for the training process.
-Dictionary builder: In this process, keywords are extracted from the text-based crime data source, and a keyword dictionary is built.
-Dataset builder: Datasets for training, validation, and test are built using the text-based crime data source and the keyword dictionary.
-Model builder: In this process, both the CRS prediction model and the crime type prediction model are designed, built, and trained using the datasets.
-Real-time prediction application platform: This is a GUI application system that can predict crime type and CRS from text-based crime data input in real time.

III. PREDICTION SYSTEM FOR CRIME TYPE AND CRIME RISK SCORE
A. INPUT DATA MANAGER
Fig. 2 describes the process of the input data manager. In the first step, the input data manager merges the KICS data with the crime type and CRS that serve as labels for the prediction model. The input data manager produces two kinds of output. The first is Output 1, which is used by the keyword dictionary builder. To extract keywords according to the crime types, the data manager sorts the entered KICS data by crime type and outputs the sorted data as Output 1. The second output of the input data manager is Output 2, which is used by the dataset builder. In the dataset builder, the training dataset, validation dataset, and test dataset are built. Therefore, Output 2 is the data in which the KICS data, crime type, and CRS are merged.

B. KEYWORDS DICTIONARY BUILDER
Fig. 3 shows the keyword extraction process. In this process, keywords are extracted by both the wordrank analysis algorithm [18], [19] and the term frequency-inverse document frequency (TF-IDF) analysis algorithm [20], [21]. The text-based input KICS data is processed case by case. First, a cleansing process is applied to the input data, in which noise that is unnecessary or may interfere with keyword extraction is removed from the data source. In this process, date and time information is removed. Morphological analysis is then applied to the output of the cleansing process [22]. In the morphological analyzer, general nouns (NNG) and verbs (VV) are extracted. The outputs of the morphological analyzer are sorted and grouped according to the 21 crime types in order to extract keywords for each crime type. The text group for each crime type is fed into both the wordrank analyzer and the TF-IDF analyzer. The results of both analyzers are merged and sorted in descending order of score.
The extracted keywords are inserted into the keyword selector. Among the keywords extracted by crime type, keywords with over 100 points are finally selected and stored in the keyword dictionary. There are 568 keywords stored in the keyword dictionary, and the number of stored keywords for each crime type is described in Table 5.
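As a rough illustration of the TF-IDF part of the keyword extraction, the following minimal Python sketch scores the terms of each crime-type group and keeps the highest-scoring ones. The toy English tokens, the function names `tfidf_scores` and `top_keywords`, and the smoothed IDF variant are assumptions made for illustration; the wordrank analyzer, the Korean morphological analysis, and the 100-point selection threshold of the developed system are not reproduced here.

```python
import math
from collections import Counter


def tfidf_scores(groups):
    """Score each term in each crime-type group by TF-IDF.

    groups: dict mapping a crime type to a list of already-tokenized words
            (a hypothetical stand-in for the morphological analyzer output).
    Each crime-type group is treated as one document.
    """
    n_docs = len(groups)
    # Document frequency: in how many crime-type groups a term appears.
    df = Counter()
    for tokens in groups.values():
        df.update(set(tokens))

    scores = {}
    for crime_type, tokens in groups.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores[crime_type] = {
            # Smoothed IDF (an assumed variant) keeps the value finite and positive.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
    return scores


def top_keywords(scores, k):
    """Return the k highest-scoring terms per crime type, best first."""
    return {
        crime_type: [
            term
            for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        ]
        for crime_type, s in scores.items()
    }


# Toy groups (invented English tokens, not KICS data).
groups = {
    "larceny": ["steal", "wallet", "store", "steal", "victim"],
    "fraud": ["deceive", "money", "transfer", "victim", "money"],
    "arson": ["fire", "gasoline", "house", "fire", "victim"],
}
keywords = top_keywords(tfidf_scores(groups), k=2)
```

In this toy example, a word such as "victim" that appears in every group receives a low score, while words concentrated in one crime type rank first, which is the behavior that makes TF-IDF useful for building a per-crime-type dictionary.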

C. DATASET BUILDER
In this dataset builder, datasets for training, validation, and test are generated, as shown in Fig. 4. In the first step, data sourced from the input data manager is processed by a cleanser and morphological analyzer. These cleansing and morphological analysis processes are the same as those of the keywords dictionary builder. The processed data is inserted into the wordrank analyzer block, and NNG and VV are extracted for each criminal case. The result of the wordrank analyzer is inserted into the dataset generator. In the dataset generator, the wordrank analyzer result is analyzed for each criminal case and compared with the keyword dictionary, which consists of 568 keywords. Through this process, an input vector X of size 568 × 1 is generated. The input vector generation process is as follows:
-An all-zero vector of length 568 × 1 is generated.
-The words extracted from the wordrank analyzer result are compared with the 568 keywords in the keyword dictionary.
-If one of the words in the wordrank analyzer result is the same as one in the keyword dictionary, the element of the zero vector at the position of the corresponding keyword is changed to 1.
-This comparison process is applied to all words extracted from the wordrank analyzer result.
-The generated vector of length 568 × 1 is the input vector X of the prediction model.
At the same time, the dataset generator produces two kinds of output (Y1 for crime type and Y2 for CRS) of the prediction model. The output for the crime type is the one-hot encoded label that represents one of the 21 crime types. Table 6 shows the one-hot encoded output vector for crime types, and numeric CRS values are used for CRS.
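The input vector generation and one-hot labeling described above can be sketched in a few lines of Python. This is a minimal illustration, not the developed implementation: the six-word dictionary, the three crime types, and the function names `build_input_vector` and `one_hot` are hypothetical stand-ins for the 568-keyword dictionary and the 21 crime types.

```python
def build_input_vector(words, keyword_dictionary):
    """Return a binary vector with 1 at the position of every dictionary
    keyword that occurs among the extracted words."""
    position = {kw: i for i, kw in enumerate(keyword_dictionary)}
    x = [0] * len(keyword_dictionary)  # start from an all-zero vector
    for word in words:
        idx = position.get(word)
        if idx is not None:  # word is a dictionary keyword
            x[idx] = 1
    return x


def one_hot(label, classes):
    """Return the one-hot encoded label vector for one crime type."""
    y = [0] * len(classes)
    y[classes.index(label)] = 1
    return y


# Hypothetical toy dictionary and classes (the paper uses 568 keywords
# and 21 crime types).
keyword_dictionary = ["steal", "wallet", "deceive", "money", "fire", "gasoline"]
crime_types = ["larceny", "fraud", "arson"]

words = ["suspect", "steal", "wallet", "steal"]  # assumed analyzer output
x = build_input_vector(words, keyword_dictionary)
y1 = one_hot("larceny", crime_types)
```

Note that the vector records only whether a keyword occurs, not how often, so the repeated word "steal" still sets a single element to 1, and words outside the dictionary such as "suspect" are ignored.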
The generated datasets are transmitted to the dataset builder. The dataset builder separates the received dataset for training, validation, and test, as shown in Fig. 4. As mentioned in Section I, virtual KICS data with 5000 crime cases is created and utilized in our research. Therefore, as shown in Fig. 4, the dataset composition is as follows:
-Training: 60% (3000 crime cases)
-Validation: 20% (1000 crime cases)
-Test: 20% (1000 crime cases)
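A 60/20/20 split of this kind can be sketched as follows. The shuffling step, the fixed seed, and the function name `split_dataset` are assumptions for illustration; the paper does not state how cases are assigned to each subset.

```python
import random


def split_dataset(samples, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Shuffle the samples and split them into training, validation,
    and test subsets; the test subset takes whatever remains."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 5000 placeholder case indices standing in for the virtual KICS cases.
cases = list(range(5000))
train, val, test = split_dataset(cases)
```

With 5000 cases this yields 3000, 1000, and 1000 cases, matching the composition listed above, and every case lands in exactly one subset.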

D. MODEL BUILDER
In this model builder, the crime type prediction model and CRS prediction model are developed. For the model design, deep neural network (DNN) architecture and convolutional neural network (CNN) architecture are considered. For both prediction models, the input vector X of length 568 is applied. The dense layers are updated by the learning process.
Fig. 5-(a) shows the DNN architecture for the crime type prediction model, which consists of an input layer, hidden layers, and an output layer. There are four hidden layers, which consist of fully connected (FC) layers and activation functions. For the FC layers, feed forward NN architecture is considered, and for the activation function, the rectified linear unit (ReLU) activation function is considered [23]. ReLU is a widely used activation function that can mitigate the vanishing gradient problem. The number of nodes of each FC layer is also shown in Fig. 5-(a). In the input layer, the input vector X from the dataset builder is inserted. In the hidden layers, the input data is processed by four FC layers and the ReLU activation function. The final output of the hidden layers is inserted into the output layer with the softmax activation function, which is a representative classification function. The softmax function outputs a vector of length 21, which is the same as the length of the one-hot vector of Table 6, and can be regarded as a probability distribution over all possible crime types. In the training procedure, all the layers are trained to reduce the cost between the real crime type and the classified outputs. For the cost value, categorical cross-entropy is considered. The cost value is minimized by the RMSProp algorithm [24], which is one of the standard methods to train neural networks beyond stochastic gradient descent.
Fig. 5-(b) shows the DNN architecture for the CRS prediction model. In this model, the input is X, the same as for the crime type prediction model. The final output is one numerical value, which is the predicted CRS value.
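To make the output layer and cost of the crime type prediction model concrete, the following minimal Python sketch computes a softmax distribution and the categorical cross-entropy against a one-hot label. It uses three classes instead of 21, and the logits are invented values standing in for the output of the last FC layer; it illustrates the two functions only, not the trained network.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_label, probs, eps=1e-12):
    """Cost between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot_label, probs))


# Invented logits for three classes (the paper's models output 21 values).
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

loss_if_class_0 = categorical_cross_entropy([1, 0, 0], probs)
loss_if_class_2 = categorical_cross_entropy([0, 0, 1], probs)
```

The cost is small when the true class receives a high probability and large otherwise, which is why minimizing it pushes the softmax output toward the one-hot label.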
The DNN architecture of this model is similar to that of the crime type prediction model. Therefore, for the hidden layers, FC layers with feed forward NN architecture are also considered. For the cost value, the mean square error (MSE) between the final output of the CRS prediction model and the real CRS value is considered, and the RMSProp algorithm is also used for the optimization of this model.
Fig. 6-(a) shows the conceptual diagram of the CNN-based crime type prediction model. It consists of an input layer, three convolutional layers, one FC layer with ReLU, and an output layer with softmax. The specific CNN architecture, including the kernel size and CNN structure, is described in Fig. 6-(a). Fig. 6-(b) describes the CNN-based CRS prediction model architecture, in which the final FC layer does not use the softmax function, just as in Fig. 5-(b). Although the CNN architecture requires much higher computational complexity, it can generally give better performance than the DNN architecture. Accordingly, in our research, the CNN-based prediction models give better performance than the DNN-based prediction models, as reported in Section IV. Table 7 shows the hyper-parameters of the prediction models.
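The combination of an MSE cost and the RMSProp optimizer used for the CRS prediction model can be illustrated on a deliberately tiny problem. The sketch below fits a single weight `w` in the model `y = w * x`; the data, the learning rate, the decay factor `rho`, and the step count are all assumed values chosen for illustration and are unrelated to the hyper-parameters of Table 7.

```python
import math


def mse(w, xs, ys):
    """Mean square error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the MSE with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def rmsprop(w, xs, ys, lr=0.01, rho=0.9, eps=1e-8, steps=500):
    """Minimize the MSE with the RMSProp update rule."""
    v = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = mse_grad(w, xs, ys)
        v = rho * v + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(v) + eps)  # step scaled by gradient magnitude
    return w


# Toy data generated by y = 2 * x (not CRS data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_init = 0.0
w_fit = rmsprop(w_init, xs, ys)
```

Because each step is divided by the running root-mean-square of recent gradients, the step size stays on the order of the learning rate even when the raw gradient is large, which is the property that distinguishes RMSProp from plain stochastic gradient descent.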

IV. PERFORMANCE EVALUATION AND IMPLEMENTED RESULT
In this section, the performance of the developed system is evaluated. For the performance evaluation, virtual KICS data with 21 crime types are considered. Since the system includes two prediction functions, crime type prediction and CRS prediction, the performance of each prediction function is evaluated. Furthermore, the developed system also includes a real-time prediction application with a GUI. In this section, the functional performance of the real-time prediction application is also checked.

A. CRIME TYPE PREDICTION
Fig. 7 shows the training and validation performance of the crime type prediction models. In the graph, the training loss, training accuracy, and validation accuracy are shown for each prediction model architecture. The validation accuracy of the CNN-based prediction model is better than that of the DNN-based prediction model.
Figs. 8 and 9 describe the receiver operating characteristic (ROC) curve and precision-recall curve, respectively [27], [28]. For the average method, both micro-average and macro-average are considered and depicted in Figs. 8 and 9. In Fig. 8, the area under the ROC curve (AUC) is also shown for each prediction model. For both average methods, the CNN-based prediction model shows better performance than the DNN-based prediction model. However, since both the CNN-based and DNN-based models achieve an AUC of more than 0.98, both prediction models give strong prediction performance. The precision-recall curve is another tool to visualize the performance of each model. In the precision-recall curve, precision is on the y-axis and recall is on the x-axis. The ideal point of the precision-recall curve is the upper right-hand corner, where precision and recall are both 1. As shown in Fig. 9, the CNN-based prediction model achieves better performance than the DNN-based prediction model.
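The difference between the micro-average and macro-average methods can be seen on a small example. The following Python sketch computes both averages of precision and recall from predicted and true labels at a single decision point (it does not trace the full ROC or precision-recall curves); the three classes, the label lists, and the function names are invented for illustration.

```python
def per_class_counts(y_true, y_pred, classes):
    """Count true positives, false positives, and false negatives per class."""
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gains a false positive
            counts[t]["fn"] += 1  # true class gains a false negative
    return counts


def safe_div(a, b):
    return a / b if b else 0.0


def averaged_precision_recall(y_true, y_pred, classes):
    counts = per_class_counts(y_true, y_pred, classes)
    # Macro-average: compute the metric per class, then average the classes.
    macro_p = sum(safe_div(c["tp"], c["tp"] + c["fp"]) for c in counts.values()) / len(classes)
    macro_r = sum(safe_div(c["tp"], c["tp"] + c["fn"]) for c in counts.values()) / len(classes)
    # Micro-average: pool the counts of all classes, then compute the metric.
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return {
        "macro_precision": macro_p,
        "macro_recall": macro_r,
        "micro_precision": safe_div(tp, tp + fp),
        "micro_recall": safe_div(tp, tp + fn),
    }


# Invented labels: one of the two fraud cases is misclassified as larceny.
classes = ["larceny", "fraud", "arson"]
y_true = ["larceny"] * 6 + ["fraud"] * 2 + ["arson"] * 2
y_pred = ["larceny"] * 6 + ["larceny", "fraud"] + ["arson"] * 2

result = averaged_precision_recall(y_true, y_pred, classes)
```

Here a single error on a rare class lowers the macro-averaged recall (about 0.83) more than the micro-averaged recall (0.9), because the macro-average weights every crime type equally while the micro-average weights every case equally.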
Table 8 describes the prediction performance of the proposed crime type prediction models. In addition, for performance comparison, an SVM with a polynomial kernel function of degree 2 and the Bernoulli naïve Bayes algorithm are also considered [29], [30]. The table lists the computed values of the performance evaluation parameters, including accuracy, precision, recall, and F1-score. Accuracy is used as a measure of how close the result is to the standard one. For all parameters, the CNN-based prediction model achieves the best performance among the prediction models. In terms of accuracy, the performance of the CNN-based prediction model is about 7% and 8% better than those of the SVM algorithm and the naïve Bayes algorithm, respectively.

B. CRS PREDICTION
Fig. 10 describes the training and validation performance of the CRS prediction model. In the validation cases, the performance of the CNN-based model is slightly better than that of the DNN-based model.
To evaluate the CRS prediction performance, the mean absolute percentage error (MAPE) is considered [25], [26]. The MAPE equation is written as
$$\mathrm{MAPE} = \frac{100}{N} \sum_{n=1}^{N} \left| \frac{X_n - \hat{X}_n}{X_n} \right|$$
where $X_n$ is the actual value, $\hat{X}_n$ is the estimated value of the neural network, and $N$ is the number of evaluated cases. Since the MAPE presents the error value as a percentage, a lower value means better prediction performance. In Table 9, the CNN-based CRS prediction performance is about 5% better than that of the DNN-based model. In conclusion, the CNN-based CRS prediction model can achieve an accuracy of more than 80%.
The top part of the real-time prediction application is the input part that receives text-based case contents from field personnel. The field personnel enter the crime summary, and the application uses the entered crime summary to predict the crime type and CRS in real time. The middle part of the application displays the results of the crime type prediction based on the input crime summary. The prediction results show the probability values for the 21 crime types as a bar graph. At the top of this section, the types of crimes with the highest probability values are displayed. Based on the predicted results, field personnel can quickly identify the type of crime. The predicted CRS is displayed at the bottom of the application. This prediction result is also displayed in real time.
The platform can predict crime type and CRS and display the prediction results in real-time; therefore, field staff, such as police officers, can easily check predictive information about crimes received through the platform.
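The MAPE used for the CRS evaluation in Section IV-B follows the standard definition, in which the absolute error of each case is divided by its actual value and the mean is expressed as a percentage. A minimal Python sketch is given below; the CRS values are invented for illustration and are not taken from Table 9.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Actual values must be non-zero, since each error is divided by
    the corresponding actual value.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Invented CRS values: errors of 10%, 10%, and 0%.
actual_crs = [10.0, 20.0, 40.0]
predicted_crs = [9.0, 22.0, 40.0]

error = mape(actual_crs, predicted_crs)
```

For these three cases the relative errors are 10%, 10%, and 0%, so the MAPE is about 6.67%; a lower value indicates that the predicted scores are closer to the calculated CRS.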

V. CONCLUSION
To quickly respond to newly received crimes, this paper designs smart policing technology based on machine learning. The developed system includes two prediction models: a crime type prediction model and a CRS prediction model. For both prediction models, a DNN architecture and a CNN architecture are designed and developed. The performance of each predictive model is also evaluated. In the case of the crime type prediction model, the proposed architectures perform better than existing techniques such as the SVM and naïve Bayes algorithms. In particular, the accuracy of the CNN-based crime type prediction model is about 7% and 8% higher than that of the SVM algorithm and the naïve Bayes algorithm, respectively. In the case of CRS prediction, the CNN-based CRS prediction model can achieve more than 80% accuracy. In addition, the real-time operation of the developed GUI-based smart policing system is verified from a functional point of view. The designed smart policing technology can predict crime type and CRS using text-based crime event data. A real-time GUI-based application platform is implemented and applied to the predictive models. The developed GUI-based system can predict crime type and CRS in real time for new input data. In addition, the developed system provides an intuitive GUI, allowing field personnel to use the system efficiently.
Since 1995, he has been with the Defense & Safety ICT Research Department, ETRI, Daejeon, South Korea, where he is currently a Principal Researcher and the Assistant Vice President. Since 2014, he has been a Professor with the University of Science and Technology, Daejeon. His current research interests include digital transformation technology with AI, big-data, the IoT, and AR/VR/MR. He is an Associate Editor of the IEEE TRANSACTIONS ON CONSUMER ELECTRONICS Publications Editorial Board.
VOLUME 9, 2021