Teaching Teacher Recommendation Method Based on Fuzzy Clustering and Latent Factor Model

Colleges and universities attach great importance to the quality of undergraduate teaching. The key to guaranteeing a course's teaching quality lies in scientifically recommending suitable teachers for the course. This is a seemingly simple but very complicated problem. Moreover, as colleges and universities develop, new courses are continually set up and new teachers are introduced, which further complicates the problem. The problem has not been solved well for many years. Therefore, we propose a course teacher recommendation model (FCTR-LFM) based on fuzzy clustering and the latent factor model (LFM) to solve this problem. Firstly, under the guidance of pedagogy theories and methods, we conduct quantitative modeling of the relevant characteristics of teachers and courses and combine the quantitative results with historical teaching scores to establish a large-scale sparse course teaching evaluation matrix as the recommendation dataset. Next, we adopt the improved fuzzy clustering model to cluster teachers automatically according to their characteristics and use the teacher clusters to reconstruct the teaching evaluation matrix, significantly reducing both the size and the sparsity of the dataset. Then, we use the improved LFM to predict the score items in the evaluation matrix, including the missing score items. Finally, the predicted evaluation scores are sorted by course, and the TOP-N recommendation of course teachers is realized. The experimental results show that FCTR-LFM can realize the prediction and recommendation well using the optimized parameters. It effectively addresses the long-standing lack of a scientific basis for recommending suitable teachers for a course.


I. INTRODUCTION
At present, colleges and universities still attach great importance to undergraduate teaching quality while new courses and teachers are constantly being introduced. The key to improving the quality of teaching lies in whether the teacher matches the course. Studies have shown that the quality of course teaching is a complex multi-dimensional structure, which is closely related to the teachers' educational background, degree, title, gender, age, teaching age, professional matching, knowledge structure, job burnout, and teaching style [1]-[10]. The course teaching quality is also closely related to the course's complexity, difficulty, breadth, and depth [11]. It is reflected in the results of teaching evaluation [12], [13]. An implicit relationship exists among teacher characteristics, course characteristics, and teaching quality. A way to resolve this problem is to establish an appropriate course teacher Recommendation System (RS). It can be used to excavate the implicit relationship among teacher, course, and teaching evaluation, alleviate the cold start and data sparsity problems, and accurately predict the teaching evaluation score between each course and each teacher. In this way, we can accurately recommend TOP-N teachers for each course to improve the teaching quality scientifically.
The associate editor coordinating the review of this manuscript and approving it for publication was Juan A. Lara.
In recent years, studies related to education recommendation mainly focus on the recommendation system and prediction. The research on the recommendation system mainly includes the following aspects: (i) Recommendation of teaching resources. Wu et al. [14] constructed an online education knowledge recommendation system based on an improved Neural Networks (NNs) path sorting algorithm. Oliveira et al. [15] proposed a content recommendation framework to customize content recommendations according to each learner's style. Taufik et al. [16] used the K-Nearest Neighbor (KNN) classification method to help students choose professional knowledge consistent with their abilities and interests.
(ii) Course recommendation. Xu and Zhou [17] designed a curriculum recommendation framework based on a deep learning model to extract multi-mode course features. Pan et al. [18] proposed a course recommendation model based on deep learning to generate viewpoints from different perspectives and provide students with sensible course recommendations. Wang et al. [19] proposed using attention-based convolutional NNs to predict user ratings and recommend courses with TOP-N positions.
(iii) Professional or learning object recommendation. Alghamdi et al. [20] proposed a fuzzy recommendation system to help students choose the right major. Wan et al. [21] used a Random Forest (RF) algorithm to implement a personalized professional recommendation system according to students' interests and learning ability. Sergis and Sampson [22] developed a recommendation system to uniformly support teachers in selecting learning objects from the existing learning object library.
Research on prediction mainly includes: (i) Performance prediction. Huang et al. [23] designed a recommendation system based on a cross-user domain Collaborative Filtering (CF) algorithm to predict each student's performance in elective courses accurately. Zacharis [24] used classification and regression trees to analyze student course activity data to design a prediction model to predict student course performance. Imran et al. [25] proposed a student achievement prediction model based on a supervised learning Decision Tree (DT) classifier.
(ii) Student performance prediction. Data collected by Mozahem [26] from the Learning Management System were used to predict student performance in face-to-face classrooms. Chiu et al. [27] used statistical models to predict student performance based on learning behavior data. Almasri et al. [28] proposed a classifier technique based on the ensemble meta-based tree model to predict student performance.
(iii) Learning early warning and prediction. Qiu et al. [29] proposed an end-to-end dropout prediction model based on convolutional NNs to predict students' dropout problems in MOOCs. Ma et al. [30] discussed predicting students' performance before class according to the traditional classroom teaching situation, that is, identifying the students at risk in each course before the course begins. Zacharis [31] analyzed the log file data in the modern learning management system and predicted the students at risk of poor performance in blended learning courses.
(iv) Other predictions. Verma et al. [32] used a machine learning classifier to predict Indian and Hungarian university students' knowledge of information and communication technology and mobile technology. Huang et al. [33] designed a novel cross-user domain CF algorithm to accurately predict each student's optional course score using the most similar advanced students' course score distribution. Verma et al. [34] predicted the student's native place based on technological awareness attributes such as development, availability, usability, and educational benefits.
Hybrid recommendation systems, which combine several recommendation technologies, are becoming critical because they enhance the accuracy and reliability of recommendations. Zhu et al. [35] proposed an off-line course recommendation model that fuses network structure features with graphical NNs and combines user interaction activities with tensor factorization to improve the accuracy of predicted grades for courses that students did not attend. Wan and Niu [36] proposed a hybrid filtering recommendation method based on the learners' influence propagation method, self-organizing recommendation method, and sequential pattern mining, which improved the personalization and diversity of recommendations. Esteban et al. [37] combined CF and content-based CF to propose a hybrid recommendation system that recommends the most suitable courses to students, which improved the recommendation's reliability and performance. Nafea et al. [38] designed an effective recommendation algorithm based on the K-means clustering algorithm, cosine similarity measure, and Pearson correlation coefficient. They recommended personalized course learning objects according to students' learning styles, which improved the recommendation accuracy.
From the research reviewed above, we can see that current work on recommendation systems in higher education does not address the critical task of recommending suitable teachers for courses. The machine learning techniques used in these studies (mainly NNs, KNN, K-means, CF, DT, and RF) solve the recommendation tasks above well. However, the dataset used in our task is constructed from teachers' characteristics, course characteristics, and teaching scores; it is extremely sparse and suffers from the cold start problem. Using any one of these machine learning techniques alone for the course teacher recommendation task is therefore problematic. Our research tries to solve these problems and improve the effectiveness and efficiency of the teacher recommendation system. We address the following specific issues: (i) The dataset of the teacher recommendation task is extremely sparse, making it challenging to implement the recommendation using any of the above machine learning technologies alone [39]. The sparsity arises because, although there are many teachers and courses, each teacher teaches only a small number of courses. Therefore, extreme data sparsity presents a significant challenge to User-based CF or Item-based CF recommendations [86].
(ii) Teachers differ widely in their characteristics, and the characteristics of teaching style and job burnout in particular yield typical unbalanced samples. K-means requires a fixed number of clusters to be set in advance, which makes it difficult to meet the requirement of automatically clustering teachers based on their characteristics.
(iii) How to express the implicit relationship among teachers' characteristics, courses' characteristics, and teaching scores, and how to obtain reasonable predicted teaching scores for courses that a teacher has not taught, is a considerable challenge. Suppose the missing items in the matrix were filled with the mean value using the CF-based matrix decomposition method, and the matrix were then decomposed into the product of two matrices using Singular Value Decomposition (SVD) [40]. The predicted scores calculated in this way could hardly express the real implicit correlation between the teacher and the course.
(iv) For a high-dimensional teaching evaluation matrix, how to effectively realize dimension reduction, improve the accuracy and efficiency of prediction, and reflect the implicit relationship between objects is another problem that needs to be solved.
To this end, we propose a course teacher recommendation system (FCTR-LFM) based on fuzzy clustering and the latent factor model (LFM) [41] to improve the personalization and efficiency of recommendation. The main work is as follows. Under the guidance of pedagogy norms, a series of methods are defined to obtain quantitative results of teacher characteristics, course characteristics, and teaching performance. These results are used to build the experimental dataset, which is presented as a high-dimensional sparse evaluation matrix. A fuzzy clustering model for teachers is designed, enabling teachers to be clustered automatically according to their characteristics, which reduces the evaluation matrix's sparsity and addresses the teachers' cold start problem. Based on the improved LFM, the evaluation matrix is decomposed into the product of two low-dimensional matrices containing implicit factors, which reflects the implicit relation between teacher characteristics, course characteristics, and teaching scores. All missing scores in the evaluation matrix are then predicted, and the top N teachers are recommended for each course. Specifically, the main contributions of this paper are as follows: (i) Establish the experimental dataset. A series of quantitative models are defined under the pedagogical norms, which are used to quantify teachers' Educational Background, Degree, Title, Gender, Age, Teaching Age, Job Matching, Professional Matching and Teaching Style, as well as Course Difficulty and Teaching Score. Then, these are used to construct the teaching evaluation matrix as the experimental dataset. In the evaluation matrix, if the teacher teaches a particular course, the corresponding evaluation value is the product of the teaching score and the course difficulty; otherwise, the value is 0. In this way, a sparse experimental dataset is established, and the correlation among teacher, course difficulty, and teaching score is also reflected.
(ii) Establish the fuzzy clustering model of teachers. The model does not need the number of clusters to be set in advance but determines it from the teachers' characteristics, which makes it flexible. First, we use the teacher eigenvectors to establish the teacher set and then use the standardization method to process the teacher set's data to get a feature index matrix. Next, the fuzzy similarity matrix is established by the similarity coefficient method, and the square self-synthesis method is used to obtain its transitive closure. Finally, an appropriate confidence level value is set to realize teacher clustering. In this way, we use the teacher clusters to replace the related teachers in the evaluation matrix, effectively reducing the evaluation matrix's scale and sparsity and solving the cold start problem.
(iii) Establish the prediction model of teaching evaluation. When the teacher clusters are used to reduce the evaluation matrix scale, the evaluation between a teacher cluster and a course is replaced by the mean value of the non-zero evaluations within the teacher cluster, which improves the LFM model. The high-dimensional sparse evaluation matrix is then decomposed into the product of two low-dimensional matrices with implicit internal relations to predict the evaluation between teacher clusters and courses effectively. In this way, the model solves the problem that other methods cannot reflect the implicit relationship among teacher characteristics, course characteristics, and teaching scores, and it improves prediction efficiency.
(iv) The TOP-N teachers are recommended according to the course. First of all, the evaluation of the course in the prediction evaluation matrix is sorted in descending order, and the top teacher cluster is selected as the recommended object. The cluster teachers are then ranked in descending order according to their evaluation in the fuzzy similarity matrix, and the top n teachers are selected for recommendation. If the number of recommended teachers in the cluster is not enough, the teachers who are in the second cluster will be selected according to the same method. In this way, the efficiency of the TOP-N recommendation is improved.
In this way, we expect to solve the problem of recommending course teachers and to find a new scientific method to ensure course teaching quality.
The remainder of this paper is organized as follows: In Section II, we discuss the related work on partition clustering methods and TOP-N recommendations. In Section III, we describe data collection and preprocessing. In Section IV, we describe FCTR-LFM in detail. In Section V, we provide the experimental results and evaluate them. Finally, in Section VI, we conclude the paper and point out new directions for research.
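The construction of the evaluation matrix in contribution (i) and its reduction by teacher clusters in contribution (iii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the integer indexing of teachers and courses, and the toy data are all hypothetical.

```python
from typing import List, Tuple


def build_evaluation_matrix(
    records: List[Tuple[int, int, float]],
    course_difficulty: List[float],
    n_teachers: int,
) -> List[List[float]]:
    """Entry = teaching score * course difficulty if the teacher taught the course, else 0.

    `records` holds hypothetical (teacher index, course index, normalized score) triples.
    """
    n_courses = len(course_difficulty)
    matrix = [[0.0] * n_courses for _ in range(n_teachers)]
    for teacher, course, score in records:
        matrix[teacher][course] = score * course_difficulty[course]
    return matrix


def reduce_by_clusters(
    matrix: List[List[float]], clusters: List[List[int]]
) -> List[List[float]]:
    """Replace teacher rows by cluster rows holding the mean of the non-zero evaluations."""
    reduced = []
    for members in clusters:
        row = []
        for c in range(len(matrix[0])):
            observed = [matrix[t][c] for t in members if matrix[t][c] != 0.0]
            row.append(sum(observed) / len(observed) if observed else 0.0)
        reduced.append(row)
    return reduced


# Toy data: three teachers, two courses; teachers 0 and 1 form one cluster.
records = [(0, 0, 0.9), (1, 0, 0.7), (2, 1, 0.8)]
difficulty = [0.5, 1.0]
evaluation = build_evaluation_matrix(records, difficulty, n_teachers=3)
reduced = reduce_by_clusters(evaluation, clusters=[[0, 1], [2]])
```

In the toy data the two teachers of the first cluster both taught course 0, so the cluster row keeps a single averaged entry for that course and the matrix shrinks from three rows to two.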

II. RELATED WORK
In this section, we discuss the related work on partition clustering methods and TOP-N recommendation methods.

A. PARTITION CLUSTERING METHODS
Clustering methods are of many types. Partition-based clustering methods have low computational complexity and are frequently applied to large datasets. The representative algorithms include K-Means [42], mixed density clustering, graph clustering, and fuzzy clustering.
K-Means is a widely used and efficient clustering method. However, it has obvious limitations: the quality of the clustering result depends on the selection of the initial cluster centers, it is sensitive to abnormal sample points, and it can only handle numerical datasets. From the 1960s to the present, many researchers have improved the K-Means algorithm to overcome these limitations. Bradley and Fayyad [43] proposed an improvement strategy to overcome the influence of the initial centers. Pelleg and Moore [44] proposed a variant, the X-Means algorithm, to accelerate the iterative process. Berkhin and Becher [45] extended the K-Means algorithm to the field of distributed clustering. The K-Modes algorithm proposed by Nguyen [46] overcomes the defect that K-Means can only deal with numerical data.
The mixed density clustering algorithm transforms unsupervised learning into a supervised classification method [47]. The probability distributions it relies on are commonly used ones, such as the Gaussian distribution and the t distribution. Maximum Likelihood Estimation is an essential method for parameter estimation, and among the algorithms that implement it, the expectation-maximization algorithm is the most commonly used [48], [49]. The Cluster Analysis Statistical Test algorithm is also considered a clustering algorithm based on a probability model [50].
The typical graph clustering algorithm is the Clustering Identification via Connectivity Kernels algorithm, which is based on minimum-weight cuts [51], [52]. This algorithm maps the sample points to the vertices of a graph and the relations between sample points to the edges and the weights on the edges. The random-walk-based method proposed by Ding and Li [53] is also a graph clustering algorithm. The Spectral Clustering algorithm transforms clustering into solving a quadratic optimization problem, can identify clusters of arbitrary shapes and converge to the optimal global solution [54], and has a wide range of applications in the field of image analysis. Lee et al. [55] and Shi et al. [56] also proposed a fast and adaptive graph-based relaxed clustering algorithm.
In 1981, the fuzzy clustering method was implemented by Bezdek for the first time. The algorithm is the famous Fuzzy C-Means algorithm [57], a widely used clustering algorithm in image segmentation. This algorithm uses the degree of membership to determine the similarity of sample points and is a fuzzy clustering method based on an objective function. Subsequently, the research team of Hathaway [58] performed global optimization of the fuzzy objective function.

B. TOP-N RECOMMENDATION METHODS
The TOP-N recommendation is to recommend a list of items that may be of interest to the user through the user's implicit feedback information for their reference. The two major categories of TOP-N recommendation methods are as follows: pointwise and pairwise methods.
Pointwise methods learn the model parameters by minimizing a pointwise loss function to fit users' absolute rating values. Pan et al. [59] formulated the one-class collaborative filtering (OCCF) problem and proposed two methods to solve it: one is based on negative sample weighting, and the other is based on negative sample sampling. Hu et al. [60] regard the user's rating data as an indicator of positive and negative preferences associated with different confidence values. Ning and Karypis [61] proposed a Sparse Linear Method for TOP-N recommendation. This method learns the sparse aggregation coefficient matrix of the items by solving a regularized optimization problem. Kabbur et al. [62] proposed a Factored Item Similarity Model (FISM) to improve the effectiveness of TOP-N recommendations. In their model, the item-item similarity matrix is decomposed into the product of two low-rank latent factor matrices.
Pairwise methods attempt to directly optimize ranking-oriented metrics based on the assumption that users prefer the rated items to the unrated items. Rendle et al. [63] are the pioneers of pairwise recommendation. They proposed a general optimization criterion for Bayesian personalized ranking (BPR). Pan and Chen [64] proposed a group BPR (GBPR) model by introducing rich interactions between users. Shi et al. [65] presented the Collaborative Less is More Filtering model by directly maximizing the MRR metric. The work in [66] proposed a Tensor Factorization for MAP maximization model as a context-aware TOP-N recommendation model by maximizing the MAP metric. These pairwise methods can achieve better performance than pointwise methods. However, they treat all unobserved feedback as negative feedback, thus ignoring the users' relative preferences among the unobserved items. This separation results in an extreme imbalance between positive and negative samples.
Recently, a few studies have tried to solve this problem. Zhao et al. [67] developed a model called Social BPR based on the assumption that users tend to choose the goods selected by their friends. Song and Meyer [68] proposed a Generalized Area under the Receiver Operating Characteristic Curve metric to quantify ranking performance in signed social networks. Liu et al. [69] proposed a TOP-N recommendation algorithm called Collaborative Pair Learning Ranking based on the view that users tend to prefer the items selected by neighbors. Yu et al. [70] tried to combine various types of user-item relationships into a unified pairwise ranking model to optimize the MAP and MRR ranking indicators. Although these methods take advantage of users' preferences among the unobserved items, they are all based on heuristic rules, and their performance is not satisfactory.

III. DATA ACQUISITION AND PREPROCESSING
In this section, we establish quantitative models of teacher characteristics, course difficulty, and teaching scores under pedagogy theory guidance.

A. TEACHER CHARACTERISTIC DATA
Existing studies have shown that teachers' relevant characteristics significantly correlate with students' learning effects [1]-[4]. Basow and Silberg [5] found that students' evaluation of male teachers is higher than that of female teachers. Cohen [6] found a positive linear correlation between teaching age and evaluation scores by analyzing thousands of university teachers' student evaluation results. Greimel-Fuhrmann and Geyer [12] showed that the teacher's professional title was positively correlated with student evaluation. Agaoglu [7] found that students' evaluation results are higher when the teacher's professional title is high and the teacher's age and teaching age are greater. Spooren and Mortelmans [13] found that teachers' age, gender, academic qualifications, and professional titles also have a specific influence on the results of students' evaluation of teaching. However, the current research on the influence of teacher characteristics on teaching evaluation results considers individual teacher characteristics such as gender, professional title, age, and educational background. Few studies consider all the background characteristics of teachers and their interaction effects. Therefore, after careful argumentation, we decided to use teacher education, degree, title, gender, age, teaching age, professional matching, and job burnout as the teacher's characteristic data. The specific quantification rules are defined as follows.
Def 1: Educational background factor (Ebf): Ebf is set to 1, 0.7, and 0.5 for the doctor, master, and undergraduate college, respectively.
Def 4: Gender factor (Gf): Gf is set to 1 and 0.5 for male and female, respectively.
Def 6: Teaching age factor (Taf), which represents the number of years of teachers' teaching work, reflects the growth process of teachers' teaching and marks the teachers' teaching profession's growth stage. We divided the teaching experience into six sections, according to [8], [9]. The teaching experience factors less than 2 years were set as 0.1, those of 3∼5 years as 0.3, those of 6∼9 years as 0.5, those of 10∼15 years as 0.7, those of 16∼20 years as 0.9, and those of more than 21 years as 1.
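The piecewise mapping of Def 6 can be written as a small lookup function. This is a sketch only: the definition leaves teaching ages of exactly 2 and exactly 21 years unassigned, so the boundary handling below (2 years falls into the lowest band, 21 years into the highest) is an assumption, as is the function name.

```python
def teaching_age_factor(years: float) -> float:
    """Map years of teaching to the teaching age factor (Taf) of Def 6.

    Assumption: the bands are closed on the right, so exactly 2 years maps to 0.1
    and anything above 20 years maps to 1.0; the source does not state these cases.
    """
    if years < 0:
        raise ValueError("teaching age must be non-negative")
    if years <= 2:
        return 0.1
    if years <= 5:
        return 0.3
    if years <= 9:
        return 0.5
    if years <= 15:
        return 0.7
    if years <= 20:
        return 0.9
    return 1.0
```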
Def 7: Job burnout factor (Jbf). Job burnout refers to individuals' long-term reactions to continuous work pressure that they cannot cope with effectively. The resulting negative mood, attitude, and bad personality characteristics easily affect students and can lead to teachers' psychological disorders [71]. The characteristic data of job burnout come from three dimensions: Emotional Exhaustion, Depersonalization, and Reduced Personal Accomplishment. The data of these three dimensions can be obtained through the survey results of the MBI-educators survey designed by Christina Maslach and verified by the Cronbach-α coefficient method. The acquisition process is shown in Figure 1. Then, the normalization method defined in (1) is used to obtain the burnout factor. Teacher burnout is serious when the burnout factor is large.
The teaching style is the manifestation of a teacher's preferred, habitual, charismatic, and stable teaching ideas, methods, skills, and manners, which are formed in long-term teaching activities and differ from those of other teachers. This style is the product of teachers' personality and teaching skills [72] and is closely related to teachers' thoughts, cultivation, and character. Teaching style has an important influence on the teaching effect. It affects the formation of students' learning style and attitude and the students' personality characteristics, interest cultivation, cooperative spirit, and learning efficiency [73].
We adopt the teaching style classification method of the American psychologist Sternberg, which divides teachers' teaching styles into seven categories from the perspective of cognitive style [74] (Table 1). Students are very familiar with their teachers' teaching styles. Therefore, to obtain the teachers' teaching styles accurately, we first revised the teaching style evaluation scale (TSTI) [10] compiled by Grigorenko and Sternberg. The main revision changes the wording of each question in the questionnaire from the teacher's perspective to the student's perspective to facilitate the questionnaire survey of students. The 7 subscales and 49 questions remain, without changing the number or meaning of the questions. For each question, seven options are given, including "completely inconsistent", "not very consistent", "slightly consistent", "basically consistent", "consistent", and "very consistent", with corresponding scores of 1-7.
Then, the collected questionnaires are sorted and analyzed. During processing, the value of each teaching style type is calculated according to (2) for the questionnaire submitted by each student.
where $TS_i$ represents the value of the $i$th teaching style, $i \in \{Ls, Es, Js, Gs, Locs, Libs, Cs\}$; $j \in R_i$ ranges over the question numbers that belong to style $i$, with $R_i$ assigned according to the TSTI; and $v_j \in \{1, \ldots, 7\}$ represents the point value of the option chosen in question $j$.
The variance of the seven teaching-style values submitted by each student is calculated according to (3) to remove a small number of carelessly answered questionnaire samples. Questionnaire samples with a variance of less than or equal to 0.3 are deleted, because in such samples almost all questions are answered with the same option, which indicates that the respondent did not seriously think about the answers.
where $\overline{TS}$ represents the mean of the seven teaching-style values obtained from this sample. We use the Cronbach-α coefficient [75] method shown in (4) to verify that the remaining samples are credible.
where $k$ (here 49) represents the number of questions, and $\sum_{i=1}^{k} \varsigma_i^2$ is the sum of the score variances of the individual questions, where the variance $\varsigma_i^2$ of question $i$ is given by (5). In (5), $x_i$ is the score given by each student for question $i$, and $n$ is the number of valid questionnaires. $\varsigma_t^2$ in (4) represents the variance of the total scores obtained from all valid questionnaires and is given by (6).
$x_t$ in (6) is the sum of the scores given by each student for all questions.
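The two screening steps described above (dropping low-variance questionnaires and checking reliability) can be sketched in a few lines. The sketch assumes the standard form of Cronbach's α, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\varsigma_i^2}{\varsigma_t^2}\right)$, and uses the population variance for the filter in (3); whether the paper divides by $n$ or $n-1$ is not recoverable from the text, and the function names are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def is_careless(style_values: Sequence[float], threshold: float = 0.3) -> bool:
    """Flag a questionnaire whose seven teaching-style values barely vary.

    Assumption: population variance; the source only states the 0.3 threshold.
    """
    return pvariance(style_values) <= threshold


def cronbach_alpha(answers: List[List[int]]) -> float:
    """Cronbach's alpha for `answers[s][q]` = score of student s on question q."""
    k = len(answers[0])
    item_variances = [pvariance([row[q] for row in answers]) for q in range(k)]
    total_variance = pvariance([sum(row) for row in answers])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Because α depends only on the ratio of the summed item variances to the total-score variance, using the sample variance consistently in both places would give the same value.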

C. TEACHING SCORING DATA
Given that some characteristic data of teachers dynamically change over time, the acquisition time of teaching scoring data should be consistent with that of teachers' characteristic data. The teaching scoring data are mainly based on the comprehensive scoring results. Such data come from the student, supervisory, and peer scoring. The teaching score was normalized by Definition 9.
Def 9: Teaching score (Ts). Because a teacher can teach more than one course at a time, the teaching score is normalized for each teacher and course by (7), where $Ts_{t,course}$ denotes the normalized value of the teaching score of the course taught by teacher $t$; $v_{t,course}$ denotes the comprehensive result of the teaching score of the course taught by teacher $t$; $v_{min}$ denotes the lowest value among all the teaching scores; $v_{max}$ denotes the highest value among all the teaching scores; $E_{max}$ denotes the highest value after normalization (generally set to 1); and $E_{min}$ denotes the lowest value after normalization, which is set to 0.1 in this study.
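The symbols defined for (7) are those of a min-max rescaling onto the interval $[E_{min}, E_{max}]$. The sketch below assumes that form, $Ts = E_{min} + (E_{max} - E_{min}) \cdot \frac{v - v_{min}}{v_{max} - v_{min}}$; the exact expression of (7) is not available in the text, so this is an illustration rather than the authors' formula, and the function name is hypothetical.

```python
def normalize_teaching_score(
    v: float, v_min: float, v_max: float, e_min: float = 0.1, e_max: float = 1.0
) -> float:
    """Rescale a raw comprehensive score v from [v_min, v_max] to [e_min, e_max].

    Assumption: plain min-max scaling; e_min = 0.1 keeps the lowest score non-zero,
    so a taught course is never confused with an untaught one (value 0).
    """
    if v_max == v_min:
        raise ValueError("v_max and v_min must differ")
    return e_min + (e_max - e_min) * (v - v_min) / (v_max - v_min)
```

For example, with raw scores between 60 and 100, a score of 80 maps to the midpoint 0.55 of the target interval.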

D. COURSE DIFFICULTY FACTOR
Course Difficulty Factor (Cdf) is difficult to quantify [76]. We establish the model shown in (8) to quantify the course difficulty based on the course difficulty model established by Shi et al. [77] and guided by the revision method proposed by Li [78].
where $Cdf$ represents the difficulty coefficient of the course; the larger the $Cdf$ value, the more difficult the course. $Dp$ represents the depth of the course, expressed as the sum of abstraction levels. The abstraction level represents the degree of abstraction of concepts and principles and the degree of association between concepts, and it is quantified by the sum of the abstraction degrees of all knowledge chains. $Sp$ is the breadth of the course, which refers to the scope of the field involved in the course content and can be quantified by the number of course objectives that integrate the knowledge, skills, process, methods, emotional attitudes, and values of the corresponding course. $Ch$ is the course's class hours, that is, the time required to complete the course content measured in class hours. $Dp/Ch$ represents the depth of the course per unit time, which is called comparable depth. $Sp/Ch$ is the course breadth per unit time, which is called comparable breadth. $\eta$ ($0 < \eta < 1$) is the weighting coefficient, reflecting the relative emphasis on the course's comparable depth and comparable breadth.
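The description of comparable depth, comparable breadth, and the weight $\eta$ suggests a weighted sum of the two per-hour quantities. The sketch below assumes $Cdf = \eta \cdot \frac{Dp}{Ch} + (1 - \eta) \cdot \frac{Sp}{Ch}$; since (8) itself is not reproduced in the text, this form, the default weight, and the function name are assumptions.

```python
def course_difficulty_factor(dp: float, sp: float, ch: float, eta: float = 0.5) -> float:
    """Weighted sum of comparable depth (dp / ch) and comparable breadth (sp / ch).

    Assumption: eta weights depth and (1 - eta) weights breadth.
    """
    if ch <= 0:
        raise ValueError("class hours must be positive")
    if not 0 < eta < 1:
        raise ValueError("eta must lie strictly between 0 and 1")
    return eta * dp / ch + (1 - eta) * sp / ch
```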

IV. MODEL AND ALGORITHM DESIGN
In this section, we first introduce the teacher's fuzzy clustering and latent factor model of the teaching evaluation matrix. Then, we introduce the teacher recommendation model and algorithm design of the course teaching that incorporates the teacher's fuzzy clustering.

A. TEACHER FUZZY CLUSTERING DESIGN
The clustering method's classification principle is to divide each data sample into related categories according to the specified method. The essence of fuzzy clustering analysis [79] is not to set the category in advance but to construct the fuzzy matrix by studying the object's attributes. On this basis, the classification relationship is determined. Therefore, the characteristics of uncertainty and the object of study can be objectively reflected.
We design a fuzzy clustering model for teachers to realize the automatic classification of teachers. This model uses 15 characteristics, such as teachers' education, degree, title, gender, age, teaching age, job burnout, professional matching, and teaching style (including Legislative, Executive, Judicial, Global, Local, Liberal, and Conservative) to construct a fuzzy matrix, which can effectively determine the classification relationship. The clustering results can provide effective and accurate recommendation information for the course arrangement.

1) TEACHERS' CHARACTERISTIC DATA STANDARDIZATION
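We let the set of teacher objects be $X = \{x_1, x_2, \ldots, x_m\}$. Each teacher $x_i$ has $d$ ($d = 15$) characteristic indicators; that is, $x_i$ can be represented by the following $d$-dimensional characteristic indicator vector:
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{id}), \quad i = 1, 2, \ldots, m,$$
where $x_{ij}$ represents the $j$th characteristic index of the $i$th teacher. All characteristic indexes of the $m$ teachers form a matrix $X^* = (x_{ij})_{m \times d}$, which is the characteristic index matrix of $X$:
$$X^* = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} \end{pmatrix}$$
Given that the teaching style differs from the other features in dimension and order of magnitude, data standardization should be applied to the characteristic index matrix to bring each index into a common numerical range. In this study, we calculate the mean value and the standard deviation of the $j$th column of the characteristic index matrix $X^*$ and then make the following transformation:
$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, \ldots, m; \; j = 1, \ldots, d,$$
where $\bar{x}_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}$ is the mean and $s_j$ is the standard deviation of the $j$th column. The characteristic index matrix after data standardization is expressed as follows:
$$X' = (x'_{ij})_{m \times d}$$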
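The column-wise standardization can be sketched as follows. The sketch assumes the usual z-score form with the population standard deviation (dividing by $m$); the divisor used in the paper is not recoverable from the text, and the handling of constant columns is an added safeguard rather than part of the described method.

```python
from math import sqrt
from typing import List


def standardize_columns(matrix: List[List[float]]) -> List[List[float]]:
    """Subtract each column's mean and divide by its standard deviation.

    Assumptions: population standard deviation; a constant column maps to 0.0
    to avoid division by zero.
    """
    m, d = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / m for j in range(d)]
    stds = [sqrt(sum((row[j] - means[j]) ** 2 for row in matrix) / m) for j in range(d)]
    return [
        [(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(d)]
        for row in matrix
    ]
```

After this step a teaching-style column measured on a questionnaire scale and a factor column bounded by 1 contribute on comparable scales to the similarity computation.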

2) CONSTRUCT THE TEACHER'S FUZZY SIMILAR MATRIX
According to the standardized data, the similarity coefficient $y_{ij} \in [0, 1]$ between teachers $x_i = (x_{i1}, x_{i2}, \cdots, x_{id})$ and $x_j = (x_{j1}, x_{j2}, \cdots, x_{jd})$ is determined by the similarity coefficient method, and the fuzzy similarity matrix $Y = (y_{ij})_{m \times m}$ is established. If $y_{ij} < 0$, then we replace $y_{ij}$ with $y'_{ij} = \frac{1 + y_{ij}}{2}$. If the value range of $y_{ij}$ is not in $[0, 1]$, then $y'_{ij} = \frac{y_{ij} - w}{W - w}$ is used instead of $y_{ij}$, where $W = \max_{i,j}(y_{ij})$ and $w = \min_{i,j}(y_{ij})$. In this way, we establish the teachers' fuzzy similarity matrix $Y_{m \times m}$:

$$Y_{m \times m} = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1m} \\ y_{21} & y_{22} & \cdots & y_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mm} \end{pmatrix}$$

3) FUZZY CLUSTERING
A fuzzy equivalent matrix is then constructed from the fuzzy similarity matrix $Y$. First, the square self-synthesis method is used to work out the transitive closure $t(Y)$ of $Y$. Specifically, $Y^2, Y^4, \ldots$ are calculated by starting from $Y$ and repeatedly composing the current matrix with itself under max-min composition. When $Y^k \circ Y^k = Y^k$ for the first time, we obtain $t(Y) = Y^k$. Then, an appropriate confidence level $\lambda \in [0, 1]$ is selected to find the $\lambda$-cut matrix $t(Y)_\lambda = (y_{ij}(\lambda))$, which is an equivalent Boolean matrix on $X$. For $x_i, x_j \in X$, if $y_{ij}(\lambda) = 1$, then teachers $x_i$ and $x_j$ are classified into the same class at confidence level $\lambda$.
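The clustering step can be sketched as follows, assuming a reflexive and symmetric fuzzy similarity matrix with entries in [0, 1]. The function names and the 3 × 3 toy matrix are hypothetical, and the cut rule (an entry becomes 1 when it is at least λ) is the usual definition of a λ-cut, stated here as an assumption.

```python
import numpy as np

def maxmin_compose(a, b):
    """Max-min composition: result[i, j] = max over k of min(a[i, k], b[k, j])."""
    return np.minimum(a[:, :, None], b[None, :, :]).max(axis=1)

def transitive_closure(y):
    """Square the matrix repeatedly (Y, Y^2, Y^4, ...) until it stops changing."""
    current = np.asarray(y, dtype=float)
    while True:
        squared = maxmin_compose(current, current)
        if np.allclose(squared, current):
            return current
        current = squared

def lambda_cut_clusters(closure, lam):
    """Teachers whose rows of the lambda-cut matrix coincide form one cluster."""
    cut = closure >= lam
    clusters = {}
    for i, row in enumerate(cut):
        clusters.setdefault(row.tobytes(), []).append(i)
    return list(clusters.values())

# Hypothetical similarity matrix for three teachers.
y = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
closure = transitive_closure(y)
clusters = lambda_cut_clusters(closure, 0.5)
```

In this toy case the closure raises the similarity between teachers 0 and 2 from 0.3 to 0.4, and a confidence level of 0.5 groups teachers 0 and 1 together while teacher 2 stays alone. Lowering λ merges clusters and raising it splits them, which is why the number of clusters Z does not have to be fixed in advance.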

B. DESIGN OF THE LATENT FACTOR MODEL OF THE TEACHING EVALUATION MATRIX
The goal of the course teacher recommendation system is to analyze the historical teaching evaluations of college teachers by major, build an appropriate model, calculate the predictive evaluation of teachers for courses they have not taught, and recommend teachers for courses by TOP-N recommendation [80] according to the predictive evaluation. We cannot simply conclude from teachers' historical teaching records that they are unable to teach other courses. Instead, we need to find the implied recommendation coefficient between teachers and courses to serve as the basis. The SVD algorithm is a common dimensionality reduction technique. In the course teacher recommendation system, however, the evaluation matrix is a large-scale sparse matrix because of the many teachers and courses. SVD requires the sparse matrix to be converted into a dense one, which takes a large amount of storage space, so the algorithm can only be applied to the decomposition of low-dimensional matrices. Thus, the SVD algorithm is not an ideal course teacher recommendation algorithm.
Simon Funk proposed an improved algorithm based on SVD, called Funk SVD or the latent factor model (LFM), which is widely used in recommendation systems. The core idea of LFM is to connect teachers and courses through latent factors. We denote the evaluation matrix of teachers and courses by $R_{m \times n}$, where $m$ is the number of teachers and $n$ is the number of courses. $R_{ij}$ is the product of the score $Ts_{ij}$ obtained by teacher $i$ in teaching course $j$ and the difficulty coefficient $Cdf_j$ of the course taught, and it is called the evaluation (i.e., $R_{ij} = Ts_{ij} \times Cdf_j$). If no direct teaching relationship exists between teacher $i$ and course $j$, then $R_{ij} = 0$. Because $m$ and $n$ are relatively large and each teacher teaches only a few of the courses in the course set, the evaluation matrix is a high-dimensional sparse matrix. The evaluation matrix $R_{m \times n}$ can be approximated by the product of two low-dimensional matrices, from which the teacher's predictive evaluation on each course is calculated. Specifically, the teacher latent factor matrix $T_{m \times F}$ is multiplied by the transpose of the course latent factor matrix $C_{n \times F}$, as shown in (14).

$$\hat{R}_{m \times n} = T_{m \times F} \, C_{n \times F}^{\mathsf{T}} \tag{14}$$

where $F$ is the number of latent factors and each row of matrix $T_{m \times F}$ is the vector of a teacher's weights on the latent factors. Meanwhile, each row of matrix $C_{n \times F}$ represents the probability distribution of a course over the latent factors. Then, the predictive evaluation value of teacher $i$ on course $j$ can be calculated by (15).

$$\hat{r}_{ij} = \sum_{f=1}^{F} t_{if} \, c_{jf} \tag{15}$$

where $t_{if} = T(i, f)$ represents the weight of teacher $i$ on latent factor $f$, and $c_{jf} = C(j, f)$ indicates the probability distribution of course $j$ on latent factor $f$. To solve for the matrices $T$ and $C$, the error between the observed and predicted evaluation values on the training set can be used to learn them. If suitable $T$ and $C$ can be found that minimize the prediction error on the training set, they should also yield a small prediction error on the test set.
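Therefore, we define the loss function as follows.

$$L = \sum_{(i,j) \in Train} (r_{ij} - \hat{r}_{ij})^2 = \sum_{(i,j) \in Train} \Big( r_{ij} - \sum_{f=1}^{F} t_{if} \, c_{jf} \Big)^2 \tag{16}$$

where $r_{ij}$ is the actual evaluation score of teacher $i$ on course $j$, and $Train$ is the training set. Direct optimization of the loss function may lead to overfitting. Therefore, a regularization term is added to the loss function of the training model, and (17) is obtained.

$$L = \sum_{(i,j) \in Train} \Big[ \Big( r_{ij} - \sum_{f=1}^{F} t_{if} \, c_{jf} \Big)^2 + \omega \lVert t_i \rVert^2 + \omega \lVert c_j \rVert^2 \Big] \tag{17}$$

where $\omega$ is a regularization parameter, which can be obtained through experiments. A stochastic gradient descent algorithm can be used to minimize the loss function. The algorithm finds the direction of steepest descent by taking the partial derivatives with respect to parameters $t_{if}$ and $c_{jf}$ and then continuously optimizes the parameters through iteration. For a training pair $(i, j)$, taking the partial derivatives of its term of the loss function with respect to $t_{if}$ and $c_{jf}$ yields (18) and (19):

$$\frac{\partial L}{\partial t_{if}} = -2 (r_{ij} - \hat{r}_{ij}) \, c_{jf} + 2 \omega \, t_{if} \tag{18}$$

$$\frac{\partial L}{\partial c_{jf}} = -2 (r_{ij} - \hat{r}_{ij}) \, t_{if} + 2 \omega \, c_{jf} \tag{19}$$

Parameters $t_{if}$ and $c_{jf}$ are iterated along the direction in which the gradient decreases fastest until they converge, and the recursive equations are obtained as follows:

$$t_{if} \leftarrow t_{if} + \alpha \big( (r_{ij} - \hat{r}_{ij}) \, c_{jf} - \omega \, t_{if} \big) \tag{20}$$

$$c_{jf} \leftarrow c_{jf} + \alpha \big( (r_{ij} - \hat{r}_{ij}) \, t_{if} - \omega \, c_{jf} \big) \tag{21}$$

where $\alpha$ is the learning rate, into which the constant factor 2 has been absorbed. The larger $\alpha$ is, the faster each iteration descends.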
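A minimal sketch of this stochastic gradient descent procedure is given below. The function name `train_lfm`, the random initialization scaled by the square root of F, the default values, the per-pass decay factor, and the toy matrix are all assumptions made for illustration; only the update rule itself follows the description above, with zero entries treated as unobserved.

```python
import numpy as np

def train_lfm(ratings, n_factors=2, n_iters=50, alpha=0.05, omega=0.002,
              decay=0.9, seed=0):
    """Learn T (rows x F) and C (columns x F) from the nonzero entries by SGD."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = ratings.shape
    t = rng.random((n_rows, n_factors)) / np.sqrt(n_factors)
    c = rng.random((n_cols, n_factors)) / np.sqrt(n_factors)
    observed = list(zip(*np.nonzero(ratings)))   # (i, j) pairs with an evaluation
    history = []                                 # summed squared error per pass
    for _ in range(n_iters):
        squared_error = 0.0
        for i, j in observed:
            error = ratings[i, j] - t[i] @ c[j]
            squared_error += error ** 2
            t_old = t[i].copy()                  # both updates use the old values
            t[i] += alpha * (error * c[j] - omega * t[i])
            c[j] += alpha * (error * t_old - omega * c[j])
        history.append(squared_error)
        alpha *= decay                           # shrink the step after every pass
    return t, c, history

# Hypothetical 3 x 4 evaluation matrix; zeros mark courses not taught.
ratings = np.array([[0.62, 0.00, 0.48, 0.00],
                    [0.00, 0.71, 0.00, 0.55],
                    [0.58, 0.66, 0.00, 0.00]])
t, c, history = train_lfm(ratings)
predicted = t @ c.T                              # dense matrix of predicted evaluations
```

The product `t @ c.T` fills every cell, including those that were zero in the input, which is how the model produces predictions for teacher and course pairs with no teaching history.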

C. TEACHER RECOMMENDATION MODEL AND ALGORITHM DESIGN BASED ON FUZZY CLUSTERING AND LFM
The LFM algorithm performs well on low-dimensional sparse scoring matrices; however, its efficiency is low when the matrix is large. We propose the FCTR-LFM, which integrates fuzzy teacher clustering, to solve the low efficiency of the LFM algorithm on large-scale sparse matrices. The main idea of the model is first to use fuzzy clustering to cluster teachers automatically. Then, we use teacher clusters to replace the teachers in the original LFM model, thereby greatly reducing the scale of the evaluation matrix and improving the efficiency of the model.
When establishing the FCTR-LFM model, the LFM model must be improved, including the evaluation matrix based on teacher clustering and the prediction evaluation model. The implementation process of the TOP-N recommendation algorithm must also be improved.

1) IMPROVEMENT DESIGN OF THE OBSERVATION AND EVALUATION MODEL BETWEEN TEACHER CLUSTER AND COURSE
The evaluation matrix $R_{m \times n}$ is established according to the relationship between teachers and courses. The value of $R_{ij}$ is determined by whether the teacher has taught the course and by the difficulty coefficient of the course. When the teacher has taught the course, $R_{ij}$ takes the product of $Ts_{ij}$ and $Cdf_j$; otherwise, it takes a zero value. After $Z$ teacher clusters are formed by fuzzy clustering, we use the relationship between teacher clusters and courses to establish a new, smaller evaluation matrix $R_{Z \times n}$. Many teachers may be assigned to the same cluster $z \in \{1, 2, \ldots, Z\}$. In cluster $z$, some courses have been taught by its teachers, and others have not. How should the observed evaluation value $R_{zj}$ be determined? After a careful analysis, we determine the observed evaluation value $R_{zj}$ between teacher cluster $z$ and course $j$ as the mean of the nonzero evaluations between the teachers in the cluster and the course, as shown in (22).

$$R_{zj} = \frac{1}{p} \sum_{i \in z, \; R_{ij} \neq 0} R_{ij} \tag{22}$$

where $p$ is the number of nonzero evaluation values $R_{ij}$ between the teachers $i$ in the cluster and course $j$.
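The reconstruction of the evaluation matrix from teacher clusters can be illustrated with a short sketch. The function name and the toy data are hypothetical, and setting the cell to zero when no teacher in the cluster has taught the course is an assumption consistent with the zero convention used for the original matrix.

```python
import numpy as np

def cluster_evaluation_matrix(ratings, clusters):
    """Entry (z, j) is the mean of the nonzero R_ij over the teachers i in cluster z."""
    ratings = np.asarray(ratings, dtype=float)
    reduced = np.zeros((len(clusters), ratings.shape[1]))
    for z, members in enumerate(clusters):
        block = ratings[members]                   # rows of the teachers in cluster z
        counts = np.count_nonzero(block, axis=0)   # p for every course j
        sums = block.sum(axis=0)
        taught = counts > 0
        reduced[z, taught] = sums[taught] / counts[taught]
    return reduced

# Hypothetical data: 4 teachers, 3 courses, 2 clusters.
ratings = np.array([[0.6, 0.0, 0.0],
                    [0.4, 0.5, 0.0],
                    [0.0, 0.0, 0.7],
                    [0.0, 0.3, 0.0]])
reduced = cluster_evaluation_matrix(ratings, [[0, 1], [2, 3]])
```

Here the 4 × 3 matrix with 7 empty cells out of 12 shrinks to a 2 × 3 matrix with 2 empty cells out of 6, which illustrates both the reduction in size and the reduction in sparsity.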

2) IMPROVED DESIGN OF THE PREDICTIVE EVALUATION VALUE MODEL
The predicted evaluation value $\hat{R}(z, j) = \hat{r}_{zj}$ of teacher cluster $z$ on course $j$ can be obtained by modifying (15) into (23).

$$\hat{r}_{zj} = \sum_{f=1}^{F} t_{zf} \, c_{jf} \tag{23}$$

where $t_{zf} = T(z, f)$ represents the weight of teacher cluster $z$ on latent factor $f$, and $c_{jf} = C(j, f)$ indicates the probability distribution of course $j$ on latent factor $f$. Thus, we can obtain a new predictive evaluation matrix $\hat{R}_{Z \times n}$.

3) IMPROVED DESIGN OF THE TOP-N RECOMMENDATION METHOD
In the process of recommending N teachers for course j, the rows of the prediction evaluation matrix $\hat{R}_{Z \times n}$ are first arranged in descending order of $\hat{r}_{zj}$, and the top teacher cluster z_1 is selected as the recommended candidate cluster. Within teacher cluster z_1, the TOP-N teachers are recommended for course j according to the similarity of teachers in the fuzzy similarity matrix Y. If the total number of teachers in cluster z_1 is less than the recommended number N, then the remaining teachers are selected from the second-ranked cluster z_2 in $\hat{R}_{Z \times n}$. This process continues until N teachers have been recommended. According to the above analysis, the recommendation algorithm of the FCTR-LFM model is designed as follows:
Input: number of teachers m, teacher characteristic index matrix X* = (x_ij)_{m×d}, confidence level λ, number of courses n, number of latent factors F, number of iterations N, learning rate α, and regularization parameter ω
Output: list of TOP-N teachers recommended for each course
1. // Fuzzy clustering of teachers
2. X = (x_ij)_{m×d} ← standardize the teacher characteristic index matrix X* = (x_ij)_{m×d} according to (10);
3. Y = (y_ij)_{m×m} ← establish the fuzzy similarity matrix between teachers by the similarity coefficient method according to (12);
4. // Obtain the transitive closure t(Y) of the fuzzy similarity matrix Y by the square self-synthesis method
...
8. Select the appropriate confidence level value λ to obtain the cut matrix t(Y)_λ = (y_ij(λ))_{m×m};
9. Z ← solve the number of clusters after the teacher clustering according to t(Y)_λ;
10. // Construct the teacher and course evaluation matrix R_{m×n} = (R_ij)_{m×n}
11. for (i = 1; i <= m; i++) for (j = 1; j <= n; j++)
12. if (teacher i has a teaching relationship with course j) R_ij = Ts_ij × Cdf_j;
13. else R_ij = 0;
14. // Construct the teacher cluster and course evaluation matrix
...
17. for (j = 1; j <= n; j++) R_zj ← mean of the nonzero R_ij in cluster z according to (22);
...
T_zf += α × (error × C_fj − ω × T_zf); C_fj += α × (error × T_zf − ω × C_fj);
27. α = α × 0.9; // When the optimization reaches a certain level, the learning rate must be reduced so that the optimal value is approached gradually
...
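The final selection step can be sketched as follows. The text states only that teachers within a cluster are chosen according to the fuzzy similarity matrix Y, so the within-cluster ranking used here (mean similarity to the members of the same cluster) is an assumption of this sketch, as are the function name and the toy data.

```python
import numpy as np

def recommend_top_n(predicted, clusters, similarity, course, n):
    """Visit clusters in descending predicted evaluation and collect n teachers."""
    order = np.argsort(-predicted[:, course], kind="stable")
    chosen = []
    for z in order:
        members = clusters[z]
        # Assumed ranking rule: mean similarity to the members of the same cluster.
        sub = similarity[np.ix_(members, members)]
        scores = sub.mean(axis=1)
        ranked = [members[k] for k in np.argsort(-scores, kind="stable")]
        chosen.extend(ranked[: n - len(chosen)])
        if len(chosen) == n:
            break
    return chosen

# Hypothetical example: 5 teachers in 3 clusters, one course.
clusters = [[0, 1, 2], [3], [4]]
similarity = np.array([[1.0, 0.9, 0.6, 0.2, 0.3],
                       [0.9, 1.0, 0.7, 0.1, 0.2],
                       [0.6, 0.7, 1.0, 0.3, 0.4],
                       [0.2, 0.1, 0.3, 1.0, 0.5],
                       [0.3, 0.2, 0.4, 0.5, 1.0]])
predicted = np.array([[0.9], [0.3], [0.6]])   # one row per cluster
top4 = recommend_top_n(predicted, clusters, similarity, course=0, n=4)
```

The best cluster supplies its three teachers first, and the fourth recommendation is taken from the cluster with the next highest predicted evaluation, mirroring the fallback from z_1 to z_2 described above.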

V. EXPERIMENTS AND RESULTS ANALYSIS
In this section, we first introduce the experimental dataset, followed by the evaluation indicators. Finally, we describe the experiments and results.

A. DATASET
The experimental data come from a single major at a university, because recommending teachers for courses across different majors is meaningless. According to the definition in Section III, we obtain the Teaching Recommendation Dataset (TRDs), which is a sparse matrix; the dataset statistics are shown in Table 2. Examples of the teachers' characteristic data are shown in Table 3, and the teaching styles of all teachers in the dataset are shown in Figure 2. Figure 2 demonstrates that the teachers' teaching styles differ in the seven dimensions proposed by Sternberg.
When the weighted coefficient η = 0.4 and after the maximum and minimum normalization, the courses' difficulty factors in the dataset are shown in Figure 3. Figure 3 shows that the method adopted can effectively and correctly distinguish the difficulty of each course.
Fivefold cross-validation was adopted in our experiments. First, the dataset TRDs is divided into five mutually exclusive subsets of similar size. Each subset TRDs_i keeps the data distribution as consistent as possible; that is, it is obtained by stratified sampling from TRDs. Then, the union of four subsets is used as the training set, and the remaining subset is used as the test set. In this way, five pairs of training and test sets are obtained, five rounds of training and testing are conducted, and the final result is the average of the five test results.

B. EVALUATION INDICES

1) EVALUATION INDEX OF THE RATING PREDICTION
The root mean square error (RMSE) can be used to measure the rating prediction accuracy of the algorithm:

$$RMSE = \sqrt{\frac{\sum_{(i,j) \in Test} (r_{ij} - \hat{r}_{ij})^2}{|Test|}}$$

where $Test$ is the test set. The smaller the RMSE value, the smaller the error and the better the performance of the recommendation system.
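For concreteness, the RMSE over a set of test pairs can be computed as in the following sketch; the function name and the numbers are illustrative only.

```python
import numpy as np

def rmse(actual, predicted, test_pairs):
    """Root mean square error over the (i, j) pairs in the test set."""
    squared = [(actual[i, j] - predicted[i, j]) ** 2 for i, j in test_pairs]
    return float(np.sqrt(np.mean(squared)))

# Hypothetical values: two held-out evaluations and their predictions.
actual = np.array([[0.6, 0.5]])
predicted = np.array([[0.5, 0.7]])
value = rmse(actual, predicted, [(0, 0), (0, 1)])
```

With errors of 0.1 and 0.2 the mean squared error is 0.025, so the RMSE is about 0.158.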

2) TOP-N RECOMMENDED EVALUATION INDEX
Previous studies on recommender systems [81], [82] have shown that the common error indicators in rating prediction, such as mean square error (MSE) and RMSE, cannot completely characterize the performance of a recommendation algorithm. The performance of the algorithm mainly depends on the recommendation of the TOP-N items. We evaluate the hit rate through precision, recall, and F1-measure to fully verify the feasibility of our method:

$$Precision = \frac{\sum_t |R(t) \cap Q(t)|}{\sum_t |R(t)|}, \qquad Recall = \frac{\sum_t |R(t) \cap Q(t)|}{\sum_t |Q(t)|}, \qquad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where $R(t)$ is the recommended list of teachers obtained on the training set and $Q(t)$ is the corresponding list on the test set.
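The three TOP-N indicators can be computed as in the sketch below. The dictionary layout, the function name, and the course and teacher identifiers are hypothetical; R(t) is passed as a list of recommended teachers and Q(t) as the set observed in the test data.

```python
def top_n_metrics(recommended, relevant):
    """Return (precision, recall, F1) summed over all keys t."""
    hits = sum(len(set(recommended[t]) & set(relevant.get(t, ())))
               for t in recommended)
    n_recommended = sum(len(r) for r in recommended.values())
    n_relevant = sum(len(q) for q in relevant.values())
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical lists for two courses.
recommended = {"C015": ["T1", "T2", "T3"], "C020": ["T4", "T5", "T6"]}
relevant = {"C015": {"T1", "T9"}, "C020": {"T5"}}
precision, recall, f1 = top_n_metrics(recommended, relevant)
```

Two of the six recommendations are hits and two of the three relevant teachers are found, so precision is 1/3, recall is 2/3, and F1 is 4/9.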

C. EXPERIMENTAL RESULTS AND ANALYSIS
The four important parameters in the FCTR-LFM algorithm are the confidence level λ, the number of latent factors F, the learning rate α, and the regularization parameter ω.
In this section, we first analyze the influence of each parameter on the algorithm. After that, we compare the algorithm with other related algorithms in terms of prediction and TOP-N recommendation accuracies. The experiment was completed on a common PC with Core i5 2.3 GHz CPU and 8 G RAM.

1) EFFECT OF THE APPROPRIATE CONFIDENCE LEVEL VALUE λ
In the FCTR-LFM algorithm, parameters F, α, N, and ω were set to 50, 0.007, 50, and 0.002, respectively. The confidence level λ was adjusted in the interval [0, 1] with a step size of 0.1. The effects of the adjustment on the RMSE and TOP-5 of the FCTR-LFM are displayed in Figures 4 and 5, respectively. Figures 4 and 5 demonstrate that different confidence levels λ cause the teachers to be grouped into different teacher clusters. This changes the sparsity and scale of the reconstructed evaluation matrix, which significantly influences the prediction and recommendation accuracy of the algorithm. Therefore, an appropriate confidence level λ must be selected to achieve good results.
FIGURE 4. Influence of different λ on the prediction accuracy. With the other parameters fixed, the algorithm obtains the optimal prediction accuracy when λ is around 0.6.
FIGURE 5. With the other parameters fixed, the algorithm obtains the optimal recommendation accuracy when λ is around 0.5.

2) EFFECT OF THE LATENT FACTOR NUMBERS F ON THE ALGORITHM PERFORMANCE
Under the same experimental environment, parameters α, ω, and N were set to 0.01, 0.002, and 50, respectively. The number of latent factors F was increased from 10 to 100, and the FCTR-LFM algorithm (λ = 0.5) and the classic LFM algorithm were compared in terms of score prediction accuracy and running time. The results are shown in Figures 6 and 7. Figure 6 demonstrates that, when the other parameters remain unchanged, the value of F has a significant influence on the overall performance of both algorithms. Within a certain range, the accuracy of the two algorithms improves significantly as F increases, and the score prediction accuracy of the FCTR-LFM algorithm is slightly better than that of LFM. When F exceeds 80, the overall accuracy of the two algorithms is stable. Figure 7 shows that, in terms of running time, the training time of the traditional LFM derives mainly from iterative model training. For small F, the running time of the FCTR-LFM model is higher than that of the LFM model; as F increases, the running time of the FCTR-LFM model becomes significantly lower than that of the LFM model. This shows that the FCTR-LFM algorithm runs efficiently while maintaining high accuracy.
FIGURE 6. Comparison of RMSE. With the increase of F, the accuracy of both algorithms is significantly improved. The accuracy of the FCTR-LFM is slightly better than that of the LFM. When the F value exceeds 80, the accuracy of the two algorithms tends to be stable.

3) EFFECT OF LEARNING RATE α ON THE ALGORITHM PERFORMANCE
The learning rate controls how much the parameters change in each iteration, and its size is directly related to the final prediction results of the model. In the experiment, we fixed the confidence level λ = 0.5, the number of latent factors F = 80, and the regularization parameter ω = 0.002, and varied the learning rate α and the corresponding number of iterations N. The experimental results are shown in Figure 8.

FIGURE 8. Comparison of learning rate (α) with iteration times (N). As the model's learning rate goes from high to low, the convergence speed changes from fast to slow, and the number of iterations required to achieve a good recommendation effect changes from less to more.
Figure 8 demonstrates that the RMSE of the model changes differently as the learning rate drops from 0.008 to 0.002. When the learning rate is high, the number of iterations required to achieve a good recommendation effect is relatively small and the model converges relatively fast, but the search along the gradient direction may step over the local optimum. When the learning rate is small, many iterations are required to achieve a good recommendation effect, and the model converges relatively slowly, so finding the local optimum takes a large amount of time. Although a small learning rate can produce a low prediction error, it also slows the convergence of the algorithm. Therefore, in practical applications, the characteristics of the model itself and factors such as the memory space and time performance requirements of the recommendation system should be examined together to select a good learning rate and number of iterations.

4) COMPARISON WITH OTHER METHODS
The performances of several existing approaches are compared on TRDs to justify the effectiveness of our proposed method.

a: BPR
Bayesian Personalized Ranking (BPR) is a TOP-N ranking recommendation approach for implicit feedback. It optimizes a pairwise ranking criterion over item pairs and can be applied to Matrix Factorization and K-Nearest Neighbors models [83].

b: FISM
The Factored Item Similarity Model (FISM) alleviates the sparsity problem of existing TOP-N recommendation algorithms by learning the item similarity matrix as the product of two low-dimensional latent factor matrices [84].

c: FST
The Factored Similarity model with Trust (FST) introduces the mutual trust matrix and the user similarity matrix into the FISM, thereby alleviating the sparsity problem of existing TOP-N recommendation algorithms and enhancing the accuracy of the ranking recommendation [85].
The experiments observe how the recommendation accuracy of each method changes as the number of iterations increases. In the FCTR-LFM algorithm, we fix the confidence level λ = 0.5, the number of latent factors F = 80, the regularization parameter ω = 0.002, and the learning rate α = 0.006. In the BPR, FISM, and FST algorithms, the optimal parameter settings described in the literature [61]-[63] are used. The experimental results are shown in Figures 9-11. The figures show that the FCTR-LFM model, which combines fuzzy clustering with the improved LFM, has some advantage in the recommendation accuracy of course teachers over the existing related models on TRDs when each model uses its optimal parameters.

5) COMPARISON WITH USER-BASED CF, ITEM-BASED CF, AND PERSONALRANK
The CF algorithm refers to a recommendation algorithm designed based on user behavior data, which mainly includes: neighborhood-based algorithms, latent factor model, and graph-based random walk algorithm. Typical neighborhood-based algorithms are user-based collaborative filtering algorithms (User-based CF) and item-based collaborative filtering algorithms (Item-based CF) [86]. In this paper, the user is the teacher, and the item is the course. The typical representative of the latent factor model is LFM. The FCTR-LFM is based on LFM. The typical representative of the graph-based random walk algorithm is PersonalRank [87].
In the experiment, we fixed the parameters α, ω, N, and F in the FCTR-LFM algorithm as 0.006, 0.002, 80, and 100, respectively, and fixed the probability parameter p in PersonalRank as 0.6. We compare the P@5 of these four algorithms on evaluation matrices of different sparsity. To obtain such matrices, we increase the confidence level λ from 0 to 1 with a step size of 0.1 during teacher fuzzy clustering. Because each value of λ yields different teacher clusters, the reconstructed evaluation matrices differ in sparsity. The results are shown in Figure 12.
As shown in Figure 12, the number of teacher clusters changes with the confidence level λ, and so does the sparsity of the reconstructed evaluation matrix. When User-based CF and Item-based CF are applied to the sparse matrix, their recommendation accuracy is low. In contrast, the FCTR-LFM algorithm gives more accurate prediction values for all items in the sparse matrix, making the matrix dense, so its recommendation accuracy is high. The PersonalRank algorithm performs a random walk with probability p over the graph whose edges connect teachers and courses, calculates the PR value of every course that can be reached from a teacher, and then recommends according to the PR values. Its recommendation accuracy is therefore higher than that of User-based CF and Item-based CF but lower than that of the FCTR-LFM.

6) RECOMMENDED CASE
In our dataset TRDs, take the example of recommending five teachers for a course numbered C015 (Cdf = 0.644316). The course's difficulty coefficient, the characteristics of the teachers who taught the course, and the course evaluation scores obtained by these teachers are shown in Table 4.
As shown in Table 4, these four teachers have different characteristics and obtained different teaching scores (Ts) for course C015. The remaining teachers, who are not listed, have not taught the course. In the experiment, we fixed the parameters λ, F, N, α, and ω as 0.9, 80, 50, 0.006, and 0.002, respectively. The operation process is as follows:
Step 1: realize the fuzzy clustering of teachers.
(i) The teacher characteristics matrix X * is standardized to obtain the feature index matrix X , as shown at the bottom of the next page.
Step 2: use LFM to predict the evaluation matrix.
(i) Teachers' clustering is used to reconstruct the evaluation matrix.
We use the teacher clusters to reconstruct the evaluation matrix. According to (22), we first calculate the teaching score between each teacher cluster and the corresponding course and then multiply it by the course difficulty coefficient to obtain the evaluation score. In this way, the original 132 × 204 matrix is reduced to 19 × 204, which greatly reduces the scale of the evaluation matrix and yields the reconstructed evaluation matrix R. In R, the evaluation score of cluster z_7 on course C015 is 0.5541, and that of cluster z_13 is 0.4961.
Step 3: recommend the TOP-5 teachers for the C015 course.

VI. CONCLUSION
Scientifically assigning suitable teachers to courses is an important means of improving teaching quality in colleges and universities, and it is a goal pursued by their teaching management. However, achieving this goal remains difficult. The difficulties mainly lie in quantifying teachers' characteristics, course difficulty, and teaching performance effectively; establishing an implicit correlation model among teachers, courses, and teaching performance; and using algorithms to make predictions and recommendations efficiently and accurately. These difficulties have not been well solved. Therefore, under the guidance of pedagogy theories and methods, we established quantitative models of teacher characteristics and course difficulty. We combined them with historical evaluation data to establish a high-dimensional, large-scale sparse evaluation matrix for course teacher recommendation as the dataset, which implies the correlation among the three. On this basis, we constructed the FCTR-LFM algorithm based on fuzzy clustering and LFM. The algorithm first adopts the improved fuzzy clustering to cluster teachers automatically according to their characteristics. It then uses the teacher clusters to reconstruct the dataset, reducing its size and sparsity. Next, the improved LFM is used to predict the teaching evaluation score between each teacher cluster and each course, which improves the prediction efficiency. Finally, the improved TOP-N recommendation method sorts the teacher clusters for each course by the predicted evaluation and then realizes the TOP-N recommendation according to the teacher similarity within the cluster.
Experimental results show that this algorithm can effectively implement the recommendation of course teachers and has the following characteristics:
(i) According to teachers' characteristics, the improved fuzzy clustering method realizes the unsupervised automatic clustering of teachers, which reflects its flexibility. It solves the problem of cold start and avoids the disadvantage of other clustering methods, which force teachers to be divided into a specified number of categories.
(ii) The teacher cluster after clustering is used to reconstruct the data set, which greatly reduces the size of the data set, reduces the sparsity, and avoids the influence of extremely sparse data on the recommendation results.
(iii) Using the improved LFM model, the recommended matrix is decomposed into the product of two low-order matrices with an implicit factor association, which improves the prediction efficiency and accuracy.
(iv) The improved TOP-N recommendation algorithm is used to sort the teacher clusters according to the course. The teacher clusters are then sorted according to the teacher similarity, which realizes the TOP-N recommendation and improves the recommendation's efficiency and accuracy.
Although this method solves the problem of scientifically recommending suitable teachers for a course, it also has some limitations:
(i) The workload of data collection is heavy. Whether the teachers' characteristics help solve the problem depends not only on the methods and results of previous research on teacher characteristics but also on whether the questionnaire data are processed properly. Besides, the acquisition of the course difficulty coefficient depends on the knowledge structure in the course syllabus.
(ii) The parameter optimization of the algorithm is excessively dependent on the data set. The algorithm can obtain different optimization parameters on different data sets, which need to be determined by many experiments.
To sum up, our future research will focus on the following: (i) further exploring the factors that influence the recommendation of suitable teachers for a course; (ii) building an effective method to simplify the data acquisition process; and (iii) finding more effective methods to improve the accuracy of prediction and recommendation.