Research on Screening of Empathy Information Based on Image Recognition and Data Mining

The Internet helps people understand the world and makes daily life easier, but it also provides a convenient channel for the widespread dissemination of bad information. As a result, young people are often exposed to pornographic, violent and other bad images, which affects the development of their empathy. This study examines the effect of bad images on youth empathy ability by combining bad image screening with the assessment of empathy ability. A new empathy analysis model is constructed on the basis of traditional empathy theory, combined with image recognition and data mining technology. First, the theoretical principles of bad image recognition technology and their application in the evaluation of empathy ability are expounded. Then, the classification of bad images is studied using image recognition fused with a particle swarm optimization algorithm. Finally, on the basis of image classification, a data envelopment model is used to grade young people's empathy ability. The case analysis and performance test results illustrate the superiority of the implemented image classification and empathy evaluation method based on image recognition and data mining. This research has theoretical significance for enriching the study of empathy ability and interpersonal relationships. It also has practical significance for improving young people's interpersonal trust and for promoting harmonious interpersonal relationships and healthy physical and mental development.


I. INTRODUCTION
With the rapid development of information and Internet technology, online information has become a widely used, convenient source of information and leisure [1]. At the same time, a large amount of unhealthy information on the Internet, such as pornography and obscenity, has seriously interfered with normal online life and harmed the physical and mental health of young people and the development of their empathy [2]. Empathy refers to an individual's ability to identify, understand, and cope with the emotional state of others so as to produce an emotional experience consistent with theirs [3]. In modern society, empathy has an important impact on individual moral development, interpersonal communication, and interpersonal relationships. Purifying the network environment, strengthening the monitoring of network activities, and improving the ability to recognize information have therefore become pressing demands. As the technical support for these goals, content-based bad information recognition has attracted increasing attention, and content-based bad image recognition and detection in particular has recently aroused great interest. It is also an important and urgent research topic for content-based network filtering systems [4]. The identification of pornographic images is essentially an image classification problem. We use content-based methods to analyze images and statistical classification methods to identify pornographic images.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhihan Lv.
Bad image recognition determines whether an image contains bad information. Bad image recognition technology mainly uses image analysis to segment the skin-color regions in the image [5]. Forsyth used computer vision and image understanding technology to study the recognition of bad images, judging whether an image contained pornographic content by detecting geometric features derived from skin-color segmentation and human posture [6]. Wang implemented a pornographic image analysis system that uses Daubechies wavelets and color histogram features to detect bad images [7]. In terms of empathy assessment, H. Smith believes that when an individual observes another person in a strong emotional state, the individual naturally experiences the same feeling as the observed person [8]. Spencer believes that empathy allows all members of a group to quickly experience the same emotional state, which makes consistent behavior possible [9]. Titchener was the first to use the term "empathy" in English. He believed that empathy is a process in which an individual actively and diligently enters another person's inner world [10]. Stoutland believes that empathy is an emotional reaction in which the observer perceives that others are experiencing or will experience a certain emotion. Hoffman regards empathy as an emotional response that is more suited to another person's situation than to one's own [11]. Fables believes that empathy is the perception of the emotional state or situation of others. Greenfield pointed out that empathy is the perception of the emotions of others and the experience and behavior caused by those emotions [12].
Generally, image processing is a resource- and time-consuming task, and the complexity of recognition algorithms lowers the performance of image recognition systems. Improving system performance so that information can be identified and filtered in real time will remain a main problem for future research. Research on empathy has taken different forms under different historical conditions. At present, most related studies are conducted from the perspective of a single subject or a single visual form, and none combines graphic images with mathematical statistics [13]. Therefore, based on the theory of empathy ability and combined with image recognition and data mining technology, this article builds a new empathy ability classification model. This method can provide a scientific reference for modern research on empathy ability.

II. OVERVIEW OF RELATED TECHNOLOGIES
A. SCREENING OF BAD INFORMATION BASED ON IMAGE CATEGORY
Research shows that bad images are best identified by determining whether they contain specified objects, which is also the most direct requirement of a content-based image recognition system. Image matching technology can therefore be borrowed for identifying bad content. For example, we can judge whether an image is a bad image by judging whether it contains a designated sex organ. This is essentially an optimization problem: particle swarm optimization or other matching algorithms can be used to search for the best fitness value, which then determines whether the image contains the specified object. However, because of the varying scale of pornographic images and the complexity of image backgrounds, the study found that directly using matching technology to identify arbitrary bad images is not ideal [14].
Target detection refers to the process of detecting and identifying specific target objects in an image. It must not only obtain the category of the objects in the image but also output the position of each target object. Current approaches to target detection include traditional methods and more recent ones. Traditional methods use classical machine learning to generate candidate boxes and extract features, and then apply common classifiers to complete the detection task. The more recent methods are based on deep learning, which has strong feature extraction and learning ability. Their disadvantage is that the networks contain a large number of parameters and many layers, so training is time-consuming and often requires high-performance computing platforms and massive data sets [15]. The calculation process is shown in Figure 1.
Different target objects often have different shapes and are susceptible to lighting conditions, noise and background effects. This makes it difficult to manually design a feature that is suitable for all applications, so different features need to be adopted according to the actual testing scenario and testing requirements. Common features include SIFT features, HOG features, Haar-like features and other classic features.
Scale-invariant feature transform (SIFT) is a classic feature extraction algorithm. It uses the Difference of Gaussian (DoG) function to construct scale spaces, detects the feature points that exist at these scales, and describes the local feature points. It is highly tolerant of interfering factors such as image rotation and changes in brightness, which reduces the impact of target deformation, so its robustness is very good. The Haar-like feature was proposed by Papageorgiou et al. It is a feature extraction method based on the gray values of the image, and it describes an object according to the differences between the sums of gray values in different blocks of the image. It is commonly used to describe human faces. For example, the eye region is darker than the surrounding areas, and the cheeks are lighter than the area around the eyes. The commonly used features are shown in Figure 2.
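The block-sum idea behind the Haar-like feature can be illustrated with a minimal Python sketch. It assumes a grayscale image stored as a list of rows and uses an integral image (summed-area table) to obtain block sums. The function names and the 4x4 test patch are hypothetical illustrations, not part of the original method.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column (assumed layout)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of gray values inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_vertical(ii, top, left, height, width):
    """Two-rectangle feature: sum of the lower half minus sum of the upper half."""
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper


# Hypothetical 4x4 patch: a dark band (eyes) above a light band (cheeks).
patch = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
feature = haar_two_rect_vertical(integral_image(patch), 0, 0, 4, 4)
print(feature)  # 1520: the lower block is much brighter than the upper block
```

A large positive response indicates a dark-over-light pattern, while a uniform patch gives zero. The integral image lets every block sum be read in constant time regardless of block size.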
Human judgment of bad images is based on the high-level semantics expressed by the image, while computer vision technology extracts the low-level visual features of the image. There is a large semantic gap between high-level semantics and low-level visual features. If machine learning methods can automatically learn from bad images and find their regularities, these regularities can be used to solve the problem of bad image recognition. Moreover, bad image recognition is a small-sample problem, so how to handle small samples must also be considered. Particle swarm optimization has the ability of learning and memory, and it has great advantages for solving small-sample, nonlinear and high-dimensional pattern recognition problems. This article introduces the principle of the particle swarm algorithm, and then describes the basic framework of particle swarm training and the bad image recognition algorithm based on it. Finally, the corresponding experimental results are given.

B. EVALUATION OF EMPATHY
Empathy is referred to by several different terms in the literature. It has been studied in the West for nearly 100 years, and many branches of philosophy, sociology, and psychology have conducted extensive research on it. Humanist psychologists consider empathy a key characteristic that affects the effectiveness of the psychological counseling process [16]. In the Encyclopedia of Psychology, empathy is defined as the emotional interaction between people in interpersonal communication. When an individual sees others endure pain or enjoy good fortune, the individual can also experience uneasy or happy emotions. This ability to share the emotions of others is the ability of empathy. Only when the individual can look at the situation from the perspective of the person concerned can the meaning expressed by the other party's emotions be understood, thereby producing the corresponding emotional experience and behavioral response. Scholars at home and abroad have conducted theoretical and empirical psychological research on empathy, but so far there is no unified understanding of the concept [17]. Across previous research, there are at least three orientations for interpreting empathy: an emotional orientation, a cognitive orientation and a multidimensional orientation. The conceptual structure of empathy is shown in Figure 3.
The structure of empathy is complex and diverse, and so are the methods for measuring it, which generally include the self-report method, the observer evaluation method and the physiological measurement method. The indicators used are mainly language indicators, facial indicators and physiological indicators. A once-popular tool for measuring adult emotional empathy is the emotional empathy questionnaire compiled by Mehrabian in 1972. The questionnaire consists of seven dimensions: susceptibility to emotional contagion from others, understanding of strangers' feelings, extreme emotional responsiveness, tendency to be influenced by others' positive emotions, tendency to be influenced by others' negative emotions, sympathetic tendency, and spontaneous willingness to help others in difficulty [18].
A lot of research has been conducted on the factors that affect empathy. Through the existing empirical research, it is found that the individual differences that affect empathy and related responses can be attributed to factors such as genetics, family, social environment, personality, peers, age, and gender. Previous scholars believe that genetics will affect many personality dimensions, and empathy as a personality dimension is also affected by genetic factors. Empathy is a socialized emotion, and the family is the first place for children to socialize. Parents are the first teachers of children to the society, so the influence of family factors is particularly obvious and important [19]. At the same time, the specific scenes in the process of interpersonal communication will also have a timely impact on empathy.
The measurement of interpersonal relationships generally includes the measurement of a group's interpersonal relationship structure and the measurement of an individual's interpersonal relationship status. The sociometric method was founded by the American scholar Moreno in the 1930s. It poses questions to group members and asks them to choose other members, in order to understand the interpersonal relationships within the whole group and of each member from the perspective of the group [20]. The behavior measurement method observes the actual direction, frequency and level of communicative behavior without the awareness of the observed person, in order to study people's mutual selection and the characteristics of their actual relationships.
Motivation measurement examines the internal reasons why people interact with each other and establish good relationships. It adopts two approaches: direct motivation selection and indirect motivation selection. The hierarchical measurement method first quantitatively determines the proportion of people who hold positive or negative opinions among all those who provide opinions, and then qualitatively analyzes the level of the people who hold those opinions. From this level analysis it infers whether the interpersonal relationship is healthy or negative, which allows a more realistic judgment of whether the relationship is healthy [21]. The AHP framework for empathy is shown in Figure 4.

III. EVALUATION OF BAD IMAGES AND EMPATHY BASED ON DATA MINING
A. BAD IMAGE CLASSIFICATION BASED ON PARTICLE SWARM OPTIMIZATION
Suppose there is a D-dimensional target search space in which a set of particles forms a particle swarm. Each particle is described by a D-dimensional vector, and its spatial position can be expressed as $m_i = (m_{i1}, m_{i2}, \ldots, m_{iD})$. Each position can be regarded as a candidate solution of the objective optimization problem. Substituting the position into the fitness function yields a fitness value that measures the quality of the particle. The flying velocity of the particle is also a D-dimensional vector, denoted as $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$. The position with the best fitness value experienced by the particle is called the historical best position of the individual, denoted as

$$p_i = (p_{i1}, p_{i2}, \ldots, p_{iD}) \tag{1}$$

The best position experienced by the entire particle swarm is called the global historical best position, denoted as

$$p_g = (p_{g1}, p_{g2}, \ldots, p_{gD}) \tag{2}$$

The velocity and position of each particle are updated as

$$v_{ij}(t+1) = v_{ij}(t) + c_1 r_1 \left(p_{ij}(t) - m_{ij}(t)\right) + c_2 r_2 \left(p_{gj}(t) - m_{ij}(t)\right) \tag{3}$$

$$m_{ij}(t+1) = m_{ij}(t) + v_{ij}(t+1) \tag{4}$$

where i denotes the i-th particle, j denotes the j-th dimension of particle i, t denotes the t-th generation, $c_1$ and $c_2$ are two acceleration constants, and $r_1 \sim U(0, 1)$ and $r_2 \sim U(0, 1)$ are two independent random numbers. The formula shows that each particle is drawn both toward the best position it has found itself and toward the best position found by the entire swarm. The "position" in the algorithm is the optimal result obtained from the price and its four indicators in this article, and the "particle" is the dynamic variable that stores and searches for the solution. To find a sufficiently good solution while avoiding an excessive amount of calculation, particles are set in this model to perform the optimal solution search, and each particle stores five parameters. First, all particles are initialized, that is, the parameters of each particle take random values within the feasible range. This process is equivalent to scattering "seeds" at points across the entire "position" region in order to find the optimal solution. Where the seeds are scattered strongly affects whether the optimal solution can be found later: when the initial distribution is unreasonable, the solution found may be a local optimum rather than the global optimum. Second, the fitness value of each particle is calculated using the objective function, and these values are recorded. For each particle, its fitness is compared with its individual historical best and with the global historical best; if it is better than the recorded best, the record is updated to this value. Next, referring to Equation (3), the destination of each particle is adjusted [22]. At every adjustment, the random numbers must be drawn again to ensure the randomness of particle behavior.
In this way, the particles not only tend to approach the current optimal solution but may also find other, better solutions. After the first group of optimal solutions is found, that group of data is removed from the particle swarm, and the remaining data is optimized again to obtain the second group of optimal solutions, that is, the second-best solution [23], [24]. This operation is performed 20 times, finally yielding 20 sets of optimized solutions. To ensure that the solution found is optimal, the particles need enough time to search, so the number of iterations is set to 1000.
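The search procedure described above (random initialization, fitness evaluation, updating the individual and global bests, then moving each particle) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a minimization problem and the standard velocity update without an inertia weight. The velocity clamp `v_max`, the clipping of positions to the feasible range, the swarm size of 30, the seed, and the `sphere` test function are all illustrative choices that do not appear in the original text.

```python
import random


def pso(fitness, dim, lower, upper, n_particles=30, n_iters=1000,
        c1=2.0, c2=2.0, v_max=None, seed=0):
    """Minimize `fitness` over [lower, upper]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    if v_max is None:
        v_max = 0.2 * (upper - lower)  # assumed velocity limit
    # Initialization: random positions in the feasible range, zero velocities.
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(dim):
                # Fresh random numbers at every adjustment.
                r1, r2 = rng.random(), rng.random()
                v = (vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                pos[i][j] = max(lower, min(upper, pos[i][j] + vel[i][j]))
            val = fitness(pos[i])
            if val < pbest_val[i]:  # update the individual best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # update the global best
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)


# Five parameters per particle, as in the text; fewer iterations for a quick run.
best_pos, best_val, history = pso(sphere, dim=5, lower=-5.0, upper=5.0,
                                  n_iters=200)
print(best_val)
```

Because the global best is replaced only when a strictly better value is found, `history` never increases. This makes it a simple way to check whether the chosen number of iterations is long enough for the search to settle.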
A step size of 0.1 is selected, and 20 sets of optimized solutions are obtained by programming. Each set of data is normalized separately, and the new data obtained is used to obtain the relationship between pricing and four indicators through data regression analysis.
Randomly select 100 unfinished tasks in Annex I, and use these data to get the pricing under the optimal solution. When the following inequality is satisfied, it is certain that the task can be completed under the new pricing scheme, otherwise it cannot be completed.

B. EVALUATION OF EMPATHY ABILITY BASED ON DATA ENVELOPMENT ANALYSIS
Data envelopment analysis (DEA) is a non-parametric estimation method used to evaluate the relative effectiveness of the work performance of decision-making units of the same type. It uses mathematical methods and statistical data to determine the production frontier of relative effectiveness, projects each decision unit onto this DEA production frontier, and evaluates the relative effectiveness of each unit by the degree to which it deviates from the frontier [25], [26]. The flow chart of particle swarm optimization is shown in Figure 5.
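Suppose there are n decision-making units, each with the same m types of input and the same s types of output. Let $x_{ij}$ denote the amount of input i consumed by decision unit j and $y_{rj}$ the amount of output r it produces. If $v_i$ is used to represent the weight of input i and $u_r$ is the weight of output r, then the input-output ratio $h_j$ of decision unit j is:

$$h_j = \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}}, \quad j = 1, \ldots, n$$

Then the performance evaluation of decision unit $j_0$ can be formulated as the following optimization model:

$$\max\ h_{j_0} \quad \text{s.t.} \quad h_j \le 1\ (j = 1, \ldots, n), \quad v_i \ge 0\ (i = 1, \ldots, m), \quad u_r \ge 0\ (r = 1, \ldots, s)$$

This model can be transformed into an equivalent linear programming problem:

$$\max \sum_{r=1}^{s} \mu_r y_{rj_0} \quad \text{s.t.} \quad \sum_{i=1}^{m} \omega_i x_{ij} - \sum_{r=1}^{s} \mu_r y_{rj} \ge 0\ (j = 1, \ldots, n), \quad \sum_{i=1}^{m} \omega_i x_{ij_0} = 1, \quad \omega_i \ge 0, \quad \mu_r \ge 0$$

Introducing dual variables, the dual problem of the above model can be written as:

$$\min\ \theta \quad \text{s.t.} \quad \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta x_{ij_0}\ (i = 1, \ldots, m), \quad \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{rj_0}\ (r = 1, \ldots, s), \quad \lambda_j \ge 0\ (j = 1, \ldots, n)$$

Here θ (0 < θ ≤ 1) is the completion rate of the decision-making unit: the larger θ is, the more output the unit obtains from its inputs. That is, the closer θ is to 1, the higher the completion rate of the decision unit.
VOLUME 8, 2020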
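To make the meaning of θ concrete, the following Python sketch covers the simplest special case of one input and one output per decision unit. It assumes the standard ratio-form model, in which each unit's weighted output-to-input ratio is maximized subject to no unit's ratio exceeding 1. In that case the efficiency reduces to each unit's output/input ratio divided by the best ratio in the group, so no linear programming solver is needed. The function name and the sample data are hypothetical, and the general multi-input, multi-output model still requires an LP solver.

```python
def dea_efficiency_single(inputs, outputs):
    """Efficiency score in (0, 1] per unit for one input and one output.

    Assumes strictly positive inputs and outputs. With a single input and a
    single output, the ratio-form DEA score equals the unit's output/input
    ratio divided by the largest ratio among all units.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and equal in length")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("inputs and outputs must be strictly positive")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data for three decision units.
theta = dea_efficiency_single(inputs=[2.0, 4.0, 5.0], outputs=[4.0, 4.0, 10.0])
print(theta)  # [1.0, 0.5, 1.0]
```

Units with θ = 1 lie on the frontier. The second unit produces half as much output per unit of input as the best units, so its score is 0.5.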

IV. CASE ANALYSIS OF BAD INFORMATION CLASSIFICATION AND EMPATHY ABILITY A. CLASSIFICATION OF BAD IMAGES BASED ON DATA MINING
Empathy ability classification accuracy in different categories is shown in Figure 6. Standard value of different types of empathy test is shown in Figure 7.
The comparison of the classification effect between particle swarm optimization and other algorithms is shown in Figure 8 and Figure 9. The three dimensions of empathy ability, interpersonal trust, and interpersonal distress show no significant differences across academic majors. However, further analysis of the six factors of empathy ability reveals significant differences in the reverse comprehension, positive coping, and reverse coping factors. For the reverse comprehension factor, science and engineering students score higher than literature and history students. For the positive coping factor, literature and history students score higher than science and engineering students. For the reverse coping factor, science and engineering students score higher than literature and history students. In this article, a formal survey questionnaire was formed through the preparation, pilot testing, and testing of a youth empathy ability questionnaire. It was used to analyze the empathy ability, interpersonal trust, and interpersonal distress of youth. The results show that empathy ability and its dimensions and factors have different correlations with and influences on interpersonal trust and interpersonal distress, and that they can effectively predict and affect the interpersonal relationships of youth.
FIGURE 8. Comparison between particle swarm optimization and other algorithms: category accuracy.
FIGURE 9. Comparison between particle swarm optimization and other algorithms: response efficiency.

B. EVALUATION OF EMPATHY ABILITY BASED ON DATA ENVELOPMENT ANALYSIS
In the specific analysis, factor analysis is used to test the structural validity of the pilot questionnaire. The purpose is to extract factors and check whether the structure among them is consistent with the conceived structure; if they are consistent, the structural validity is high [27], [28]. First, a factor suitability test was performed: Bartlett's test of sphericity was applied to the data of the youth empathy ability questionnaire. The test value was 3127 with p<0.001, indicating that the items are likely to share common factors. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is 0.803, indicating that the sample is suitable for factor analysis. Second, exploratory factor analysis with principal component extraction and orthogonal rotation was performed, and 11 factors were extracted [29], [30]. Because the factor loadings were scattered, with items loading on more than two factors, further factor analysis was performed, and 8 factors were extracted. Items with scattered, low factor loadings were then removed and factor analysis was conducted again, yielding 6 factors. A second-order factor analysis of these 6 factors found 3 higher-order factors. Based on the main content of the items, we named the three factors empathy recognition, empathy understanding, and empathy coping [31], [32]. Finally, empathy recognition, empathy understanding, and empathy coping were each analyzed in terms of the factors they contain. The descriptive statistics of empathy ability, interpersonal trust and interpersonal distress are shown in Table 1.
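For reference, the two suitability statistics mentioned above are usually computed from the item correlation matrix in the following standard textbook forms. The symbols are assumptions introduced here, since the source does not state them: $n$ is the sample size, $p$ the number of questionnaire items, $R$ the correlation matrix with entries $r_{ij}$, and $a_{ij}$ the partial correlations between items.

```latex
% Bartlett's test of sphericity: tests whether R is an identity matrix
\chi^2 = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln \lvert R \rvert ,
\qquad \text{df} = \frac{p(p-1)}{2}

% Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \ne j} r_{ij}^{2}}
                    {\sum_{i \ne j} r_{ij}^{2} + \sum_{i \ne j} a_{ij}^{2}}
```

A large $\chi^2$ with a small p-value suggests that the items are correlated enough to share common factors. A KMO value close to 1 means that the partial correlations are small relative to the ordinary correlations, which supports the use of factor analysis.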
There are highly significant correlations between the six second-order factors of empathy and both interpersonal trust and interpersonal distress [33], [34]. Specifically, interpersonal distress shows a significant negative correlation at the 0.01 level with the recognition-of-others factor and the consistent-understanding factor of empathy ability, with P values of 0.002 and 0.008 and correlation coefficients of 0.177 and 0.151, respectively. Interpersonal distress and reverse comprehension are positively correlated at the 0.05 level, with a P value of 0.039 and a correlation coefficient of 0.117. Interpersonal distress and the reverse coping factor are significantly positively correlated at the 0.01 level. Interpersonal trust and the reverse coping factor are significantly negatively correlated at the 0.01 level, with a correlation coefficient of 0.198. Interpersonal trust and interpersonal distress are significantly negatively correlated at the 0.01 level, with a correlation coefficient of −0.219.
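Correlation coefficients of the kind reported above are typically Pearson coefficients, although the source does not name the statistic, so this is an assumption. The Python sketch below computes the Pearson coefficient and the associated t statistic with n − 2 degrees of freedom. The scores in the example are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented questionnaire scores for six respondents (illustration only).
trust = [3, 4, 2, 5, 4, 3]
distress = [4, 2, 5, 1, 2, 4]
r = pearson_r(trust, distress)
print(r < 0)  # True: higher trust goes with lower distress in this toy data
```

The sign of the coefficient gives the direction of the relationship. The significance level follows from comparing the t statistic with the t distribution for the given sample size.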
The results in the analysis table show that, after division, there are six types of empathy ability, namely 001 (clusters 1 and 2), 111 (clusters 3 and 5), 000 (cluster 4), 010 (cluster 6), 011 (cluster 7) and 100 (cluster 8). Empathy, interpersonal trust, and interpersonal distress are also correlated to some degree with demographic variables. Gender is significantly positively correlated with interpersonal trust, and the interpersonal trust of girls is higher than that of boys. Gender is negatively correlated with the recognition of others and with consistent understanding: the empathy recognition and understanding ability of boys is generally better than that of girls. Grade level is negatively correlated with empathy ability, empathy recognition, and coping [35]. In the sample surveyed, as grade level rises, the overall empathy ability of youth and its recognition, understanding, and coping dimensions all show a downward trend. Major is negatively correlated with positive coping and positively correlated with reverse comprehension and reverse coping, indicating that literature and history students have higher empathy and positive coping ability than science and engineering students, but lower reverse comprehension and reverse coping ability. Only-child status is negatively correlated with empathy ability, empathy recognition, and identification of others, that is, only children score higher on empathy ability and the recognition dimension. In addition, area of origin is positively correlated only with interpersonal distress: the interpersonal distress of rural youth is the most serious, followed by that of youth from towns and then from cities. Student-cadre status is not correlated with empathy, interpersonal trust, or interpersonal distress.
In summary, the empathy ability of youth differs significantly across demographic variables and is significantly correlated with the interpersonal trust and interpersonal distress of youth, and the six second-order factors have a significant predictive effect on interpersonal trust and interpersonal distress.

V. CONCLUSION
Much research has been conducted on the factors that affect empathy. Existing empirical research has found that the individual differences affecting empathy and related reactions can be attributed to genetic, family, social environment and other factors. Based on the conclusions of research on the empathy ability and interpersonal relationships of youth, this article combines graphic image and data mining technology to construct a new empathy ability analysis model. First, the theoretical principles of bad image recognition technology and their application in the evaluation of empathy ability are expounded. Then, the classification of bad images is studied using image recognition fused with particle swarm optimization. Finally, on the basis of image classification, the data envelopment method is used to grade young people's empathy ability. The case analysis and performance test results illustrate the superiority of the implemented image classification and empathy evaluation method based on image recognition and data mining. Among the surveyed youths, girls' empathy recognition ability and interpersonal trust are higher than those of boys. This conclusion is similar to related research results at home and abroad, and it is consistent with Eisenberg's view that gender is one of the factors that cause individual differences in empathy and that women have a higher emotional component of empathy than men. It suggests that college educators should pay attention to the differences between boys and girls when fostering empathy and interpersonal trust. It is hoped that this work will have practical educational significance for the empathy ability and interpersonal attention of youth.