Improvement of Collaborative Filtering Recommendation Algorithm Based on Intuitionistic Fuzzy Reasoning Under Missing Data

Commodity recommendation plays an essential role in marketing in the Internet era, and collaborative filtering, as a powerful commodity recommendation technique, has attracted wide attention in both academic studies and practical applications. Existing research on collaborative filtering often uses methods such as genetic algorithms and neural networks to solve the sparsity and cold start problems while ignoring the fuzziness of users' ratings of goods or services. To address this, we propose a recommendation algorithm (IFR-CF) based on intuitionistic fuzzy reasoning and collaborative filtering. In this algorithm, the characteristic coefficient in intuitionistic fuzzy reasoning replaces the traditional similarity coefficient to determine the neighbor set, and the finite prior ordering method replaces the traditional algorithm to recommend commodities. Two groups of data are extracted from the MovieLens and Jester datasets for experiments, and the MAE value generated by the recommended items is taken as the metric to verify the algorithm's performance. Experimental results show that, compared with the traditional algorithms, our algorithm achieves a lower MAE value and higher recommendation accuracy. Meanwhile, the intuitionistic index of the fuzzy set is taken into account in the calculation of the hesitation coefficient, which provides a novel solution to the problem of missing user scoring data.


I. INTRODUCTION
With the development and popularization of e-commerce, online shopping has become a typical behavior of the public. When users surf the Internet, they often receive recommendations for products. The recommendation of these products will undoubtedly provide a powerful boost for merchants to attract customers. Internet service platforms have been working hard to offer product recommendations to users more accurately so that products recommended to users are closer to their interests.
The associate editor coordinating the review of this manuscript and approving it for publication was Xin Zhang.
In product recommendation research, the nearest-neighbor collaborative filtering recommendation algorithm is widely used. The main idea of the algorithm is to discover users' preferences by mining their historical behavior records, group users according to their preferences, and recommend similar products [1]. Collaborative filtering recommendation algorithms are continually being improved. To alleviate the impact of data sparseness and cold start on recommendation, Wu et al. [2] propose combining the restricted Boltzmann machine model with trust information to improve recommendation performance, where the trust information is the degree of trust between the target user and other users. In the recommendation process, accuracy is improved by considering the degree to which the target user trusts a recommendation opinion. Mohammadpour et al. [3] combine a genetic algorithm with a gravity-simulation local search algorithm to enhance clustering in collaborative filtering and to reduce the mean absolute error (MAE) and root mean square error (RMSE), thereby improving the coverage criterion. Laishram et al. [4] use an evolutionary algorithm to obtain subsets of highly correlated terms, and the local least squares method is used to analyze the user subgroups of highly relevant items to calculate the missing scores. Genetic algorithms, as an essential technology in modern intelligent computing, are also introduced into collaborative filtering recommendation to improve the quality of clustering and the accuracy of the recommendation [5]-[7]. Zhang et al. [8] propose a new collaborative filtering recommendation algorithm that combines a time window with rating prediction to estimate the preferences of users who have not rated any items.
Besides, the singular value and the trust factor are also considered in the calculation of the collaborative filtering algorithm [9], [10]. To meet the individual needs of users, a personalized recommendation algorithm combining probabilistic semantic clustering analysis and collaborative filtering is used to recommend the most relevant items to users [11]. From the perspective of user reviews, some scholars generate user representations of the overall sentiment about item characteristics from their comments, analyze the user sentiment to explore their interests, and complete the recommendations [12]. Although the above methods improve the accuracy of recommendation, uncertainties remain in practical problems, including the inability to accurately determine user interests and to describe them accurately in the recommendation, and these require further research.
In this paper, intuitionistic fuzzy reasoning is introduced into the collaborative filtering recommendation algorithm, and the commodity recommendation problem based on similar users is studied from the perspective of intuitionistic fuzzy reasoning. In our solution, user interests are used as fuzzy sets, and rating values on commodities are converted into the membership degree of interest fuzzy sets. The membership degree, non-membership degree and intuition index in intuitionistic fuzzy reasoning are used to determine the neighbor users and then recommend the products for them.
The contributions and advantages of our work are as follows:
• Intuitionistic fuzzy reasoning, as an extension of fuzzy reasoning, is introduced to solve the problem of commodity recommendation. It can describe not only the fuzzy concept of "both this and that" but also the fuzzy concept of "neither this nor that", which makes the mathematical description more consistent with the nature of fuzzy objects in the real world.
• Feature coefficients in intuitionistic fuzzy reasoning are used to replace the conventional similarity coefficients to determine the neighbor set, and the finite relationship order method is used to replace the traditional recommendation algorithm for product recommendation, which improves the accuracy of the recommendation results.
• During the calculation process, the intuition index of intuitionistic fuzzy set is taken into account by the hesitation coefficient, and a new method is proposed to deal with the problem of missing user scoring data from the perspective of intuitionistic fuzzy reasoning.

II. RELATED WORK
The recommendation system is the technique to provide personalized services based on historical data of users, and has been applied to various fields to offer suitable services for different types of users. However, in the current network environment, the privacy of users is better protected, and many users are reluctant to disclose their personal information. Therefore, the recommendation system is usually faced with highly sparse data sets, which increases the difficulty of recommendation. To solve the problem of data sparseness and cold start in the recommendation system, Iwanaga et al. [13] use a shaped constraint optimization model to estimate the probability of item selection based on the recently visited web pages and corresponding frequencies of each user. Huang et al. [14] propose a new low-rank sparse cross-domain recommendation algorithm to improve the recommendation performance between related domains through knowledge transfer, and also propose a solution to the data sparsity problem to improve the recommendation quality. Batmaz and Kaleli [15] introduce a collaborative filtering system based on multiple criteria to improve the personalization of the system, and extract the nonlinear relationship between users and items by deep learning. It makes the recommendation algorithm more personalized and provides more accurate product recommendations for different users. Yu et al. [16] construct a Contextual-boosted Deep Neural Collaborative filtering (CDNC) model for item recommendation, which uses item introductions and user ratings to alleviate the cold start problem of item recommendation. Deng et al. [17] design a novel deep hybrid recommendation framework, Neural variational collaborative filtering (NVCF), which incorporates the profile information of user and item in the generation process to alleviate rating sparsity, and better latent user/item representations are obtained. 
Xiao and Shen [18] propose a novel deep generative model, namely Neural Variational Matrix Factorization, which also integrates profile information of users and items, and infers the complex nonlinear representations of users and items through the neural network.
In terms of data sparseness, many new theories and models have been introduced into the traditional algorithm to improve the accuracy of recommendation. Because user ratings for products or services are fuzzy, fuzzy mathematics is introduced in this study to deal with such problems. Through intuitionistic fuzzy reasoning, the similarity between users is better measured, and the fuzzy problem can be described quantitatively. Moreover, through the introduction of the hesitation coefficient, the missing data in the scoring matrix is filled, and the sparsity of the data is reduced.
VOLUME 8, 2020

III. TRADITIONAL COLLABORATIVE FILTERING ALGORITHM
The collaborative filtering recommendation algorithm is the earliest and well-applied recommendation algorithm, which is used primarily for preference prediction and item recommendation. By mining a specified user's historical behavior data, the algorithm analyzes the user's interest, finds other users with similar interest in the user set, synthesizes the evaluation of these related users on certain items, forms the system's preference prediction for the items, and finally recommends items with similar interest for the user [19], [20].
In the collaborative filtering process, users' preferences are collected first, then similar users are found, and finally items are recommended to the target user. When searching for similar users, the Pearson correlation coefficient is generally used to calculate the similarity [21], as in Equation 1:
$$\mathrm{sim}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (1)$$
where x and y are two variables, used in this paper to represent two users (i.e., two fuzzy sets), n is the number of products being rated, $x_i$ and $y_i$ are the two users' ratings for product i, and $\bar{x}$ and $\bar{y}$ are the corresponding mean ratings. The similarity calculation finds the neighbor users, and then item recommendation is performed for the target user according to the preferences of the neighbor users. In the recommendation process, the interest degree of the user can be calculated by Equation 2, where v is a neighbor user of user u, n is the number of neighbor users, p(u, i) is the similarity between user u and user v, and γ(v, i) is user v's interest in product i.
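The traditional procedure can be illustrated with a minimal Python sketch. The function `pearson` follows the standard Pearson correlation coefficient; the function `predict_interest` is a hypothetical helper that assumes the interest degree is a similarity-weighted sum over the neighbor users, which is one common reading of Equation 2 and not necessarily its exact form.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant rating vector carries no correlation signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def predict_interest(similarities, neighbor_interest):
    """Hypothetical helper: similarity-weighted sum of the neighbors'
    interest in one product (an assumed form of Equation 2)."""
    return sum(s * g for s, g in zip(similarities, neighbor_interest))


print(pearson([1, 2, 3], [2, 4, 6]))
print(predict_interest([0.5, 1.0], [4, 2]))
```

Returning 0 for a constant rating vector is a design choice of this sketch; a production system might instead exclude such users from the neighbor search.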

IV. INTUITIONISTIC FUZZY REASONING
Intuitionistic fuzzy reasoning is a thinking process and method to infer a new proposition according to known propositions and given rules. There are many inference methods, including intuitionistic fuzzy implicit reasoning, conditional reasoning, multiple reasoning, multidimensional reasoning, and multiple multidimensional reasoning.
Let Q be a given universe. An intuitionistic fuzzy set A on Q is $A = \{\langle q, \mu_A(q), \gamma_A(q)\rangle \mid q \in Q\}$, where $\mu_A(q)$ and $\gamma_A(q)$, with $0 \le \mu_A(q) \le 1$ and $0 \le \gamma_A(q) \le 1$, are the membership function and the non-membership function of the intuitionistic fuzzy set, respectively, and $0 \le \mu_A(q) + \gamma_A(q) \le 1$ holds for all $q \in Q$ [22].
When $A \in IFS(Q)$, A is characterized by the membership function $\mu_A(q)$, the non-membership function $\gamma_A(q)$ and the intuition index $\pi_A(q)$ [23]. The relationship between them satisfies the following constraint:
$$\pi_A(q) = 1 - \mu_A(q) - \gamma_A(q)$$
where $\pi_A(q)$ is the intuitionistic index of the intuitionistic fuzzy set A, which measures the hesitation degree of q with respect to A. From the above definition, the range of hesitation is $0 \le \pi_A(q) \le 1$.
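As a small illustration, the following Python sketch computes the intuition index from a membership and a non-membership degree, assuming the standard relation π = 1 − µ − γ. The function name `intuition_index` and the tolerance value are choices made for this sketch only.

```python
def intuition_index(mu, gamma, tol=1e-12):
    """Hesitation degree pi = 1 - mu - gamma of one intuitionistic fuzzy element."""
    if not (0.0 <= mu <= 1.0 and 0.0 <= gamma <= 1.0):
        raise ValueError("mu and gamma must both lie in [0, 1]")
    if mu + gamma > 1.0 + tol:
        raise ValueError("mu + gamma must not exceed 1")
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - mu - gamma)


print(round(intuition_index(0.33, 0.5), 2))
```

The validation step mirrors the constraint 0 ≤ µ + γ ≤ 1 from the definition, so an inconsistent pair is rejected rather than silently producing a negative hesitation degree.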
Let $A_{ij}, B_i \in [0, 1]$ be intuitionistic fuzzy values. Intuitionistic fuzzy multi-dimensional reasoning is carried out according to specific rules: when all conditions $A_{ij}$ of rule i are satisfied, $B_i$ can be inferred, and the final result is obtained step by step.
In this paper, we use the method of direct fuzzy reasoning to fuzzify the data and calculate the similarity between fuzzy sets. Finally, according to the preferences of the neighbor users, we recommend the products in which the target user is most likely to be interested.

V. COLLABORATIVE FILTERING ALGORITHM BASED ON INTUITIONISTIC FUZZY REASONING
In this section, intuitionistic fuzzy reasoning is introduced into the collaborative filtering recommendation algorithm, and we use this new algorithm to recommend products to users based on their interests and preferences. The basic ideas are as follows:
• Fuzzify the relevant data and calculate the corresponding membership degree and non-membership degree;
• Use the similarity formulation of the intuitionistic fuzzy rough set to calculate the similarity between fuzzy sets, and find the neighbor users;
• According to the preferences of the neighbor users, recommend products to the target user.

A. DATA FUZZIFICATION
In the current network environment, after using or accepting a certain product or service, the user is often asked to evaluate it, and the evaluation reflects his satisfaction, interest, or other information. Because of the wide variety of products on the network and the millions of users, a huge data network has been formed. On a real-life e-commerce platform, goods can often be divided into several categories. For example, in a movie recommendation system, movies can be divided into comedy, science fiction, action, etc. Without loss of generality, suppose that n types of products are divided into K categories. As shown in Table 1, $r_{ij}$ is the rating score of user i on product j, 1, 2, ..., n is the product sequence, and 1, 2, ..., m is the user sequence. The maximum value of $r_{ij}$ is $r_{\max}$, and unrated values are 0. By the normalization operation, $r_{ij}$ is normalized to a value in [0, 1]. Suppose $z_{ij}$ is the normalized value of the user's rating, which is a fuzzy number that indicates the interest degree of user i in product j. Consequently, $C_i$ can be converted into the fuzzy set $A_i$ as shown in Table 2.
It is worth mentioning that in solving practical problems, it is necessary to divide or define corresponding levels according to the significance of rating scores.

B. FIND NEIGHBOR USERS
Through the similarity calculation between the target user and other users, the neighbor users are found among many users, and then products are recommended to the target user according to the preferences of the neighbor users. However, in a network filled with a large amount of data, there is always missing data, which makes the data incomplete and inaccurate and further affects the similarity calculation. To reduce this negative impact, this paper adopts a similarity measurement method related to intuitionistic fuzzy sets.
Definition 1: (Distance between intuitionistic fuzzy sets) Let A and B be the intuitionistic fuzzy subsets of the given domain X , d is a mapping and satisfies that d : , it is called the distance between intuitionistic fuzzy sets A and B [24].
The distance d(A, B) between A and B can be obtained by using the Hausdorff measure [25]. Under discrete data, the standardized Hamming distance is used, from which the similarity between intuitionistic fuzzy sets A and B is calculated (Equation 6). In a real scenario, when a user has not evaluated a product or the evaluation data is missing, the missing evaluation may be either a high score or a low one. In the absence of other prior knowledge, it is reasonable to assume that high and low scores are equally probable, that is, ρ = 1/2.
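To make the idea of a distance-based similarity concrete, the sketch below uses one commonly used three-term normalized Hamming distance for intuitionistic fuzzy sets and defines the similarity as one minus that distance. This is an illustrative assumption: it is not claimed to be the exact Equation 6, and the regulatory factor ρ is not modeled. The names `ifs_hamming_distance` and `ifs_similarity` are hypothetical.

```python
def ifs_hamming_distance(a, b):
    """Normalized Hamming distance between two intuitionistic fuzzy sets.

    a, b: lists of (mu, gamma) pairs over the same universe.
    The hesitation term pi = 1 - mu - gamma is included in the distance.
    """
    if not a or len(a) != len(b):
        raise ValueError("sets must be non-empty and of equal size")
    total = 0.0
    for (mu_a, gamma_a), (mu_b, gamma_b) in zip(a, b):
        pi_a = 1.0 - mu_a - gamma_a
        pi_b = 1.0 - mu_b - gamma_b
        total += abs(mu_a - mu_b) + abs(gamma_a - gamma_b) + abs(pi_a - pi_b)
    return total / (2 * len(a))


def ifs_similarity(a, b):
    """Similarity defined as 1 - distance (an assumption of this sketch)."""
    return 1.0 - ifs_hamming_distance(a, b)


print(round(ifs_similarity([(0.33, 0.5)], [(0.5, 0.3)]), 2))
```

Including the hesitation term means two users with the same membership degree but different amounts of missing data are not treated as identical.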
For the calculation of the intuitionistic index π A (x i ), the method of the intuitionistic fuzzy rough set is introduced. Since the intuitionistic fuzzy set is a special state of the intuitionistic fuzzy rough set [27], it is reasonable to use this method for the approximation calculation of intuitionistic fuzzy set with missing data (i.e., data with a score of 0).
Let A, B be two intuitionistic fuzzy rough sets on the non-empty universe $X = \{x_1, x_2, \ldots, x_n\}$, where $\mu_{A^-}(x)$ is the lower approximate membership degree of A and $\mu_{A^+}(x)$ is the upper one, and $\gamma_{A^-}(x)$ is the lower approximate non-membership degree of A and $\gamma_{A^+}(x)$ is the upper one; the mathematical expression of B is similar to that of A. In the calculation of the upper approximate membership degree $\mu_{A^+}(x)$ and the lower approximate membership degree $\mu_{A^-}(x)$, z is the fuzzy data in set A, n is the number of fuzzy data in set A, a is the number of missing data in set A, $\bar{\alpha}$ is the average of the scores above the median of the scoring range, and $\bar{\beta}$ is the average of the scores below the median of the scoring range; $r_{\max}$ is the maximum score and $r_{\min}$ is the minimum score. The calculation method of $\pi_A(x)$ is shown in Equation 9.
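As a hedged illustration, one form of the upper and lower approximate membership degrees that is consistent with the numbers in the worked example of Section V-D is given below. It is an assumption, not necessarily the exact Equations 7 to 9: scores are taken to be normalized by $r_{\max}$, the sum runs over the normalized data z in A (missing entries contribute 0), and each of the a missing entries is replaced by $\bar{\alpha}$ or $\bar{\beta}$.

```latex
\mu_{A^{+}}(x) = \frac{1}{n}\left(\sum_{z \in A} z + a\,\frac{\bar{\alpha}}{r_{\max}}\right), \qquad
\mu_{A^{-}}(x) = \frac{1}{n}\left(\sum_{z \in A} z + a\,\frac{\bar{\beta}}{r_{\max}}\right), \qquad
\pi_{A}(x) = \mu_{A^{+}}(x) - \mu_{A^{-}}(x)
```

With the ratings [2, 0, 3], $r_{\max} = 5$, $\bar{\alpha} = 4$ and $\bar{\beta} = 1.5$, this form gives $\mu_{A^{+}} = 9/15 = 0.6$, $\mu_{A^{-}} = 6.5/15 \approx 0.43$ and $\pi_A \approx 0.17$.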

C. PRODUCT RECOMMENDATION FOR TARGET CUSTOMERS
There may be many neighbor users, and each neighbor user scores the predicted objects. We can use the priority ordering method to determine the neighbor users' degree of interest in the predicted objects, determine the priority order of these objects, and finally make recommendations. Suppose the n products to be recommended are $c_1, c_2, \ldots, c_n$, and their priority matrix is $C = (c_{ij})_{n \times n}$. In matrix C, $c_{ij}$ is the degree of similarity between the user's interest in product $c_i$ and the interest in product $c_j$. The following rules determine $c_{ij}$ ($c_{ij} \in [0, 1]$, $i \neq j$, $i, j = 1, 2, \ldots, n$):
• When users rate the two compared products differently, $c_{ij} + c_{ji} = 1$;
• When a product is compared with itself, $c_{ii} = 0$, $i = 1, 2, \ldots, n$;
• When two products have the same rating, $c_{ij} = c_{ji} = 0.5$.
Since it is the comparison between the scores of commodities, the similarity of interest values between different scores can be determined by Table 3.

Algorithm 1 Collaborative Filtering Recommendation Algorithm Based on Intuitionistic Fuzzy Reasoning (IFR-CF)
Require: User set {u_1, ..., u_m}, commodity set {c_1, ..., c_n}, score matrix r(m × n), category K, regulatory factor ρ, threshold δ, score levels {r_min, r_min + 1, ..., r_max}
Ensure: Recommended commodity set R
U ← ∅, T ← ∅, R ← ∅
/* normalization */
for all commodity j ← 1 to n do
    r_max ← max{r_1j, ..., r_mj}
    r_min ← min{r_1j, ..., r_mj}
    for all user i ← 1 to m do
        z_ij ← (r_ij − r_min)/(r_max − r_min)
    end for
end for
/* membership calculation */
for category k ← 1 to K do
    C_k ← {c | c ∈ {c_1, ..., c_n} and c belongs to category k}
    for all user i ← 1 to m do
        a ← the number of zero elements in A_ik
        α ← ((r_min + r_max)/2 + r_max)/2
        β ← (r_min + (r_min + r_max)/2 − 1)/2
        end if
    end for
end for
/* select similar users for target user j */
for all candidate user i with similar interests to user j do
Remove all elements whose values equal to λ in T
until T = ∅ or R = ∅
return R

Algorithm 1 demonstrates the collaborative filtering recommendation algorithm based on intuitionistic fuzzy reasoning. First, the raw scoring data is normalized (lines 2-9). In the score data to be analyzed, we find the maximum (line 4) and minimum (line 5) of all the scores, then normalize the score data for all users and get the normalized results (lines 6-8). Second, the membership and non-membership of users for different product categories are calculated (lines 10-29). For each category k, the membership of user i is calculated using the normalized results of his rating data for products (line 15). If none of the normalized data for category k is zero (line 16), the values of the intuition index (line 17) and the non-membership (line 18) are obtained. Otherwise, the upper membership degree (line 23) and the lower membership degree (line 24) of user i are calculated to obtain the intuition index (line 25) and the non-membership degree (line 26). Third, we find the target user's neighbors (lines 30-36).
The similarity between each user and the target user is calculated (line 32). When the calculated similarity value is higher than the specified parameter value δ, user i is a neighbor of the target user j (lines 33-35). Fourth, commodities are recommended to the target user (lines 37-57). Here, the priority relation ordering method is used. Referring to Table 3, the priority relation matrix is constructed according to the grading relation of each neighbor user to the predicted products (lines 38-40). Over all matrices, the mean value of the matrix elements is calculated to obtain the final priority relation matrix P, and then all elements of P are inserted into the set T (lines 41-42). The elements in T are set as the threshold λ one by one in descending order, and λ is compared with the elements of P in turn. If there exists a row i all of whose elements except the diagonal are not less than λ, the operation is completed, and the commodity corresponding to the i-th row is recommended to the target user (lines 43-57).
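The threshold step of the priority relation ordering method can be sketched in Python as follows. The sketch assumes the classic procedure in which, once a commodity is selected, its row and column are removed and the threshold search restarts on the remaining commodities; the function name `priority_order` and the example matrix are hypothetical.

```python
def priority_order(P):
    """Rank items from a priority relation matrix P (list of lists).

    For the remaining items, thresholds (lambda) are tried in descending
    order; an item is selected once every off-diagonal entry of its row,
    restricted to the remaining items, is not less than lambda.
    """
    remaining = list(range(len(P)))
    order = []
    while len(remaining) > 1:
        thresholds = sorted(
            {P[i][j] for i in remaining for j in remaining if i != j},
            reverse=True,
        )
        for lam in thresholds:
            winners = [
                i for i in remaining
                if all(P[i][j] >= lam for j in remaining if j != i)
            ]
            if winners:
                break  # the smallest threshold always yields a winner
        order.extend(winners)  # ties are kept in index order
        remaining = [i for i in remaining if i not in winners]
    order.extend(remaining)
    return order


# Hypothetical averaged priority matrix for three commodities.
P = [
    [0.0, 0.7, 0.8],
    [0.3, 0.0, 0.6],
    [0.2, 0.4, 0.0],
]
print(priority_order(P))  # [0, 1, 2]
```

Restarting the search after each selection yields a full ranking; stopping after the first selection corresponds to recommending only the top commodity.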

D. EXAMPLES
To illustrate our algorithm clearly, we take users' movie ratings as an example: we apply the algorithm to find neighbor customers through the similarity calculation, judge their preferences according to the neighbor users' ratings of movies i, j, k, and, based on this judgment, recommend to the target user the movie he is most likely to be interested in.
In the example, missing data is recorded as a score of 0. A missing rating may be any of the ratings {1, 2, 3, 4, 5}, each with the same probability. The treatment of missing data in this paper does not calculate a specific predicted value; instead, it predicts the membership of the fuzzy set from the possible values of the missing data. It is generally considered that a score of 3 or above indicates interest, and a score below 3 indicates no interest. The upper and lower membership degrees of each fuzzy set are calculated based on the intuitionistic fuzzy rough set, and then the intuition index is calculated. Take the data set [2, 0, 3] corresponding to customer 2 as an example, where 0 is the missing data. The score 4 is the average score when the user is interested; substituting it for the missing value and normalizing gives an upper membership of 0.6. The score 1.5 is the average score when the user is not interested; substituting it and normalizing gives a lower membership of 0.43, so the intuition index is 0.6 − 0.43 = 0.17. With the missing data counted as 0, the membership of the fuzzy set is 0.33, the intuition index is 0.17, and the non-membership degree is 1 − 0.33 − 0.17 = 0.5. Therefore, intuitionistic fuzzy reasoning on the data set [2, 0, 3] gives the result (0.33, 0.5). The same processing is performed on the data in Table 4 to obtain the intuitionistic fuzzy inference data shown in Table 5.
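The numbers for customer 2 can be reproduced with the following Python sketch. It assumes that scores are normalized by dividing by r_max, that missing entries are replaced by α for the upper membership and by β for the lower membership, and that α and β are computed as in Algorithm 1; the function name `fuzzify_with_missing` is hypothetical.

```python
def fuzzify_with_missing(scores, r_min=1, r_max=5):
    """Return (mu, upper, lower, pi, gamma) for one user's ratings in a category.

    A score of 0 marks a missing rating.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    missing = sum(1 for s in scores if s == 0)
    median = (r_min + r_max) / 2
    alpha = (median + r_max) / 2       # average score when interested (4 for 1..5)
    beta = (r_min + median - 1) / 2    # average score when not interested (1.5 for 1..5)
    observed = sum(scores)             # missing entries contribute 0
    mu = observed / (n * r_max)
    upper = (observed + missing * alpha) / (n * r_max)
    lower = (observed + missing * beta) / (n * r_max)
    pi = upper - lower                 # intuition index; 0 when nothing is missing
    gamma = 1 - mu - pi                # non-membership degree
    return mu, upper, lower, pi, gamma


values = fuzzify_with_missing([2, 0, 3])
print([round(v, 2) for v in values])  # [0.33, 0.6, 0.43, 0.17, 0.5]
```

When no rating is missing, the upper and lower memberships coincide in this sketch and the intuition index is 0, which is an assumption about the complete-data branch of the algorithm.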
Through the intuitionistic fuzzy processing of missing data, the intuition index is taken into account, which reduces the possibility of data result distortion, makes the data better reflect the actual situation, and lays a good data foundation for the calculation of similarity between users. This method increases the accuracy of the recommended results.
We use Equation 6 to calculate the similarity between users and get S(1, 4) = 0.77, S(2, 4) = 0.83, and S(3, 4) = 0.77, respectively. Assume that δ = 0.8, that is, a customer whose similarity value is greater than 0.8 is a neighbor customer, so customer 2 is the neighbor of customer 4. When recommending a movie to a target customer, the priority ordering method in intuitionistic fuzzy reasoning is used to determine the priority of the predicted movies, and the priority relation matrix of i, j, k is obtained.

VI. EXPERIMENT
To measure the recommendation effect of our algorithm, we compare it with traditional recommendation algorithms and other improved recommendation algorithms from the perspective of user interest. The algorithms to be compared are as follows:
• Traditional collaborative filtering algorithm (ICF) [1]: The final recommendation is generated by the user's nearest neighbors. Firstly, the correlation between items is calculated. Then the score of the target user can be predicted from the scores of neighbor users, and the recommendation can be made.
• Collaborative filtering algorithm based on time factor (TFCF) [8]: By introducing a time factor, the value of different data can be better distinguished. The more recent the evaluation, the more likely it is to reflect the current interest of the target users. Since users' interests are not fixed, and they are only interested in a limited number of items at a certain time, the items that users prefer in a short period have a higher degree of similarity.
• Hybrid recommendation algorithm based on collaborative filtering and implicit semantic model (LFIRS-CF) [11]: By using the linear fusion method, the collaborative recommendation algorithm based on item similarity and the implicit semantic model recommendation algorithm based on user interest migration are fused to improve the recommendation accuracy.
• Collaborative filtering recommendation algorithm based on fusion of user interests and rating differences (CF-UIRD) [12]: This collaborative filtering algorithm considers both the similarity measure of user interest change and the score difference similarity measure. By combining these two factors, the algorithm can describe user preference more accurately.

A. DATASET
Experimental data are extracted from the MovieLens dataset and the Jester dataset. The MovieLens dataset is a data set of movie ratings, which contains 100,000 user ratings for 1682 movies with a rating range from 1 to 5. The Jester dataset includes 73,421 users' rating data on 100 jokes, totaling about 4.1 million records, and each rating is an integer value ranging from -10 to 10. In our experiment, some records are extracted from the two datasets, and Table 6 describes some basic characteristics of the data.

B. MEASUREMENT METRICS
We use the mean absolute error (MAE) as a measure which evaluates the accuracy of the algorithms by calculating the average error between the predicted preferences and users' actual preference scores. When the MAE value is smaller, the error is smaller, so that the recommendation accuracy is higher.
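The MAE is computed as
$$\mathrm{MAE}(X, h) = \frac{1}{m}\sum_{i=1}^{m}\left|h(x_i) - y_i\right|$$
where X represents the user set, m represents the number of users, $h(x_i)$ represents the score predicted by the system, and $y_i$ represents the actual score of the user.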
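The metric can be computed with a few lines of Python. This is a minimal sketch with made-up scores; the function name `mean_absolute_error` is chosen here for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - y) for p, y in zip(predicted, actual)) / len(predicted)


# Illustrative values only, not experimental data.
print(mean_absolute_error([3.5, 4.0, 2.0], [3, 5, 2]))  # 0.5
```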

C. ANALYSIS OF EXPERIMENTAL RESULTS
We vary the number of nearest neighbors to observe the MAE values of the different algorithms and verify their accuracy. In the first group of experiments, 500 users were selected from the two data sets to test the accuracy of the algorithms. The comparison experiments are shown in Figures 1 and 2. It can be observed from Figures 1 and 2 that, as the number of nearest neighbors increases, the MAE value of each algorithm shows a decreasing trend, which indicates that the recommendation accuracy increases. In addition, compared with the traditional algorithm (ICF), the MAE values of the other algorithms are lower, which indicates that introducing the time factor, the hidden semantic model, user interest and scoring differences, and intuitionistic fuzzy reasoning improves the accuracy of the traditional collaborative filtering algorithm. At the same time, the value comparison shows that the collaborative filtering recommendation algorithm under intuitionistic fuzzy reasoning (IFR-CF) has a lower mean absolute error and higher recommendation accuracy than the other algorithms, which indicates that introducing intuitionistic fuzzy reasoning into the collaborative filtering recommendation algorithm is reasonable and achieves a better improvement.
In the second group of experiments, 1000 users were selected from the two datasets to check the accuracy of the algorithm. The comparison experiments are shown in Figures 3 and 4.
The comparison shows that the selected nearest neighbors are more representative when the number of users increases, so the overall accuracy is improved compared with the previous group of users. In addition, the MAE values of the collaborative filtering recommendation algorithm under intuitionistic fuzzy reasoning are the lowest on both data sets. When the number of nearest neighbors is 40, the MAE values calculated on the MovieLens data set and the Jester data set are 0.42 and 0.45, respectively, and the accuracy is 15% and 18% higher than that of the traditional algorithm. In general, the advantages of the collaborative filtering algorithm based on intuitionistic fuzzy reasoning are mainly reflected in the following two points. First, from the perspective of problem description, the introduction of intuitionistic fuzzy reasoning allows problems with unclear boundaries to be described in mathematical language and breaks the limitations of the traditional collaborative filtering algorithm in semantic description, so that more objective problems can be solved quantitatively. Second, from the perspective of algorithm theory, the recommendation algorithm under intuitionistic fuzzy reasoning defines three parameters, namely membership degree, non-membership degree and intuition index, and these parameters are used in the calculation. The missing scores are predicted and supplemented by using the intuition index, which makes the calculation result more accurate and reduces the amount of calculation for recommended products in the traditional algorithm.

VII. CONCLUSION
This paper combines the intuitionistic fuzzy inference method with the collaborative filtering recommendation algorithm, introduces the non-membership degree function and the intuition index into the calculation of the collaborative filtering recommendation algorithm, and draws the following conclusions:
• The introduction of intuitionistic fuzzy reasoning breaks the limitation of the traditional collaborative filtering recommendation algorithm in terms of scalability and accuracy. It describes the fuzzy problem with unclear boundaries in reality and expands the scope of research.
• This method not only considers the problem more comprehensively but also introduces the membership degree and non-membership degree parameters, improves the traditional similarity algorithm, and replaces the similarity coefficient with the characteristic coefficient of intuitionistic fuzzy reasoning, which makes the final recommendation result more accurate and more consistent with the actual situation hidden in the raw data.
• When calculating the intuition index, this method reasonably uses the hesitation parameter to interpret the missing data, which provides a solution to the problem of missing scoring data.