Ant Collaborative Filtering Addressing Sparsity and Temporal Effects

Although collaborative filtering (CF) is a popular and successful recommendation technique, it still suffers from data sparsity and from users' tastes evolving over time. This paper presents a new collaborative filtering scheme, Ant Collaborative Filtering. Through a mechanism of pheromone transmission between users and items, the proposed method can pinpoint the most relevant users and items even when the data are sparse. In addition, through the evaporation of existing pheromone, the proposed method captures the evolution of user preferences over time. Experiments are performed on standard public datasets and two real corporate datasets, which cover both explicit and implicit rating data. The results indicate that the proposed algorithm outperforms current approaches in terms of accuracy and in handling changing data.


I. INTRODUCTION
Though widely adopted for recommendation, collaborative filtering still lacks the capability to handle two fundamental issues: (a) data sparsity and (b) evolving preference.
Data sparsity is one of the difficult problems limiting collaborative filtering: it makes the calculated similarity among users and items inaccurate, which in turn renders CF ineffective. Two aspects mainly cause the data sparsity problem. First, the number of items a user rates is always small compared to the enormous number of items that exist in a system. Second, the number of ratings an item receives is also small compared to the enormous number of users. The problem is even worse for new users and items, which is referred to as the cold-start problem.
In a real-world recommender system, another practical but often overlooked issue is the evolution of user interests over time. For example, a customer may like to see recommendations about digital cameras if she/he plans to buy one. However, after the purchase, she/he may no longer be interested in recommendations for a new digital camera. Therefore, the time factor is of vital importance for the success of recommender systems in many applications, such as e-commerce, advertisement, and news services.
The associate editor coordinating the review of this manuscript and approving it for publication was Seyedali Mirjalili.
Ant colony optimization (ACO) borrows its idea from the behaviour of real ant colonies. Ants can readily find the best path to food by using pheromone as an intermediate communication medium. ACO algorithms successfully exploit this characteristic to solve, for example, discrete optimization problems [29], [32].
To make recommendations from sparse and evolving preference data, this paper proposes a novel collaborative filtering algorithm named Ant Collaborative Filtering (ACF). Similar to other swarm intelligence algorithms, the ACF can handle very sparse (rating) data by virtue of pheromone transmission. A pheromone is a chemical signal that triggers a response in another agent; a higher pheromone concentration increases the probability that an ant will follow a path. We draw an analogy between users and ants, each of which initially carries a specific pheromone. When a user rates a movie or simply reads a piece of news on the Web, our algorithm links the user and the item through pheromone transmission from the user to the item and vice versa. The types and amounts of pheromone therefore constitute a clear clue to historical preferences, which turns out to be strong evidence for finding similar users and items. An essential property of deposited pheromone in a real ant environment is its evaporation over time. Pheromone evaporation allows ants to adapt to changing environments and to avoid being trapped in a suboptimal solution. By virtue of the evaporation of existing pheromone, the proposed method captures the evolution of user preferences over time.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
The paper is structured as follows. Section II reviews the existing work. Section III details the ACF algorithm. Section IV presents and analyses the experimental results. Section V concludes the paper and discusses future work.

II. RELATED WORK
As indicated in the title, our proposed method targets two problems in recommendation, i.e., sparsity and time effects. We therefore present the related work below from these two aspects in turn.

A. SPARSITY
Various approaches and techniques have been proposed to mitigate the sparsity problem. Most of them achieve good recommendations. However, it is still hard to reach a general solution for tackling sparsity [13], [20], [33].
The most common technique is to apply dimension reduction to the user-item matrix [10], [64]. Dimension reduction tackles the sparsity problem by removing unrepresentative or noisy data to condense the matrix.
Other techniques, such as Latent Semantic Indexing (LSI) from information retrieval, are also often adopted [23], [35]. One drawback of these approaches is that potentially useful information might be unavoidably removed during the reduction process.
Other approaches tackling sparsity include applying associative retrieval techniques to the bipartite graph consisting of items and users [19], [41], content-boosted CF [9], and using item-based similarity to replace user-based similarity [43].
Instead of applying dimension reduction to the user-item matrix, this work presents an Ant Colony Optimization (ACO) based approach for tackling the sparsity problem. ACO algorithms were proposed after observing the automatic accumulation and communication phenomena common to ant colonies. ACO has been widely applied in various domains, such as clustering and information retrieval [24], web search [3], [26], [38], [44], and social networks [4], [21], [30], [40], [46], [47]. ACO belongs to a more general group of simulation algorithms named swarm intelligence algorithms [31].
Recommender systems are one of the many applications of swarm intelligence [34], [45]. Reference [50] used the particle swarm optimization algorithm to model every user as a particle in a multi-dimensional space, with each dimension representing a movie genre. The advantage of such an approach is that every particle has a unique position and velocity. Therefore, the system is dynamic and, to a certain extent, probabilistic in nature, which is preferable in recommender systems. Reference [2] introduced a notion of dynamic trust among users and used it to help select user neighbours. Reference [6] treated recommendation as a ranking problem: it first factorizes the user-item matrix into feature vectors and then learns a linear ranking model by optimizing a mean average precision metric. Reference [62] used binary particle swarm optimization to find the best performing set satisfying contextual constraints. In [1], the particle swarm optimization algorithm was adopted to estimate the weights of the users and items. The weight space can be searched quickly using this algorithm, leading to convergence at a nearly global optimal value. The optimal value was then used to improve the prediction model.
In recent years, the number of deep learning-based recommendation methods has grown rapidly. However, regarding the sparsity problem, deep neural networks still need to rely on additional information, additional architecture, or implicit data. More specifically, [48] incorporates side information, such as user profiles and item descriptions, to mitigate the influence of sparsity and cold start. DeepCoNN [60] adopts two parallel CNNs to alleviate the sparsity problem by exploiting rich semantic representations of review texts. Zheng et al. [61] proposed incorporating implicit feedback to overcome the sparsity problem of the rating matrix.
To the best of our knowledge, there are only a few works that directly relate to our model in the CF domain.

B. CONSIDERING TIME EFFECTS
The data in recommender systems keep changing [37]. Not only are new users and new items continuously added to the system, but the preferences of existing users and the features of existing items also change over time [36]. To make better recommendations that take temporal diversity [24] into consideration, recommendation algorithms should update the learned model efficiently and appropriately.
Among the earlier methods, [65] tackled the time effect by encoding time-order information into the data, transforming the task into a univariate time series problem, and adopting a decision-tree learning process. When computing the weights for different items, [7] adopted the strategy of decreasing the weights of old data. However, this approach assumes that old data are trivial and only new data are important, which is not always true. Reference [18] proposed using the concept of life cycles to formalize a user's long-term behaviour and determine the user's stage, an idea that was also used later in [28]. Taking this method a step further, a session-based temporal graph was proposed in [54]. The advantage of this approach is that it can model long-term and short-term factors at the same time. Reference [22] analyzed the evolution of ratings in the Netflix movie recommender system and proposed two CF algorithms that consider time effects, i.e., an item-item neighbour method and a rating matrix factorization method. Both methods were elaborately tailored for the Netflix movie rating data and achieved the best results thus far reported.
Reference [39] introduced a novel time-aware recommendation algorithm based on identifying overlapping community structure among users. Their strategy is to detect time evolution and, at the same time, to minimize sparsity effects by identifying overlapping community structure among the users.
Other recent approaches dealing with time effects include the Hidden Markov Chain [57], fusion models [52], probabilistic models [27], and latent factor models [56]. Among these methods, the work in [5] is similar to our idea in that it assumes a hidden network structure exists among the items, and each user tracks a sequence of items in this network. The difference is that in their work, the dependencies between the items are modelled with statistical diffusion models, and the parameters are obtained through maximum-likelihood estimation. In contrast, in our proposal, the time effect is modelled through the evaporation of pheromone, which is a more natural design.
We particularly note a category of work based on deep learning. According to a recent comprehensive survey [59], among the many kinds of neural networks, the Recurrent Neural Network (RNN) is suitable for sequential data processing. As such, the RNN becomes a natural choice for dealing with the temporal dynamics of interactions. Some examples include [8], [15], [53], [55].
As pointed out in [22], the preference evolution is subtle and delicate. For a single user, we can only use a few preference instances inundated by millions of non-relevant data. The methods mentioned above have been designed and tested for particular recommendation scenarios. However, in a more general sense, a robust recommendation algorithm considering time evolution for a broader range of applications including both rating-based and ranking-based recommender systems is still absent and is the main focus of this paper.
By virtue of the evaporation of existing pheromone [49], which naturally reflects the changing preference of existing users and existing items, our method adopting ACO considers the evolution of user preferences over time in a more natural manner.
Along the research line of ACO-based methods, we also identified some related works. One work closely related to ours is [25], which is also based on the ant colony algorithm and also takes into account user access time and frequency. In contrast, our work assumes that not only the users but also the items carry pheromone, which is exchangeable between users and items.
Another system, which utilizes trust information within an ant colony, is STARS [11]. It incorporates temporal information by exploiting the timestamps of ratings and uses a user-item rating matrix to store the ratings made on and before a time slot. In contrast, our method does not need to maintain such a matrix, which may itself be sparse. In addition, they use time values with day granularity, a restriction that does not arise in our system because the evaporation of pheromone mimics the dynamics.
To summarize, the examination of existing work on the two aspects, sparsity and temporal effects, shows a need for a general approach that can resolve these two problems in an integrated manner. We therefore propose the ACO-based recommendation system. The ACO algorithm has several enticing properties. The most prominent one is the dynamic and self-organizing nature of the algorithm [12]. Therefore, problems that are too complex and dynamic to be solved using other machine learning methods can be optimized directly for some utility function.

III. ANT COLLABORATIVE FILTERING
In the typical settings of ACO, every ant is identified by some kind of indicator for communication, often referred to as pheromone. The pheromone is used as an indirect communication medium. For example, consider the Traveling Salesman Problem (TSP) [16], which seeks the shortest tour on a graph. In TSP, each ant walks on the graph and leaves pheromone on its path. Shorter paths receive stronger signals. Other ants, when deciding which path to take, choose paths with stronger pheromone with a higher probability, so that shorter paths are found.
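The pheromone-biased path choice described above can be illustrated with a minimal Python sketch. It is a simplification and not part of the ACF algorithm: the function names `selection_probabilities` and `choose_path` are hypothetical, and the selection probability is assumed to be directly proportional to the pheromone level, without the heuristic terms that full ACO variants usually add.

```python
import random


def selection_probabilities(pheromone):
    """Return the probability of choosing each path, proportional to its pheromone."""
    total = sum(pheromone)
    if total <= 0:
        raise ValueError("at least one path must carry pheromone")
    return [level / total for level in pheromone]


def choose_path(pheromone, rng):
    """Sample one path index by roulette-wheel selection over pheromone levels."""
    total = sum(pheromone)
    if total <= 0:
        # Assumption: with no pheromone at all, every path is equally likely.
        return rng.randrange(len(pheromone))
    threshold = rng.random() * total
    cumulative = 0.0
    for index, level in enumerate(pheromone):
        cumulative += level
        if threshold < cumulative:
            return index
    return len(pheromone) - 1  # guard against floating-point round-off


if __name__ == "__main__":
    levels = [1.0, 3.0]  # the second path carries three times more pheromone
    print(selection_probabilities(levels))
    print(choose_path(levels, random.Random(42)))
```

A path with three times the pheromone of another is chosen three times as often on average, which is the positive feedback that lets shorter paths dominate.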
Before proceeding to a more detailed description, the notations used are defined in Table 1.
The intuition of our ACF algorithm is as follows: given a pheromone that represents a user or a group of users, an item acquires a share of the user's pheromone when the user rates the item. Meanwhile, the item transfers the pheromone already attached to it to the user. After a while, similar items receive similar patterns of pheromone, and users with similar choices become similar with respect to their pheromone patterns. The pheromone transmission process is illustrated in Fig. 1. Note that the ratings are learned one by one in their original time order. Recommendations can be generated with two strategies: (a) Provided that similar users and similar items are identified, we can estimate the current user's rating on items that have not been seen before by employing memory-based CF methods. (b) We can rank the items according to the similarity of their pheromones to those of the user. These two strategies correspond to rating-based and ranking-based recommendation systems, respectively.

A. TRAINING
First, we initialize the user pheromones by allocating every user a unique pheromone with a value of 1. The pheromone for each item is empty. After user u_i provides rating r_{i,j} toward item v_j, pheromones are exchanged between the two. That is to say, the user updates its pheromone by adding the item's pheromone multiplied by the rating adjustment and a constant γ, which is introduced to control the spreading rate. Similarly, the item's pheromone is updated by adding the user's pheromone multiplied by the rating adjustment and the constant γ. The rating adjustment is the gap between the rating and the average rating. The bigger its magnitude, the stronger the user's preference. If the rating is much higher than the average, the transmission is a strong positive ''plus''. If the rating is much lower than the average, the transmission is a strong negative ''minus'', which means that the user and the item are unalike and results in negative pheromone values.
As mentioned previously, user interests may change over time. Older interests fade out, and new interests develop. We capture this evolution by the mechanism of pheromone evaporation. Before the pheromone exchange between the user and the item is executed, the existing pheromones evaporate at a pre-specified rate. In this work, the rate is the ratio of the current pheromone amount to the highest pheromone concentration among all K types of existing pheromone.
For the rating-based recommendation scenario, the complete pheromone update formulae for the item and the user are described in Eq. 1 and Eq. 2.
Similarly, for 0/1 preference data, we rewrite the pheromone update formulae accordingly.
To keep our model simple and robust against rating noise, after evaporation and transmission we delete the pheromones whose value is less than a threshold σ. This process is referred to as the threshold cut-off. In our simulation experiments, σ is set to 0.01. The training algorithm is shown in Alg. 1.
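The training procedure described in this subsection (initialization, evaporation, transmission, and threshold cut-off) can be sketched in Python as below. This is only an illustration under stated assumptions, not the authors' Alg. 1 or their exact update formulae: the function names are hypothetical; the rating adjustment is assumed to use a single global mean rating; evaporation is assumed to retain the fraction λ × |amount| / (largest |amount|) of each pheromone type; both sides are assumed to exchange their post-evaporation pheromones simultaneously; and the cut-off is assumed to apply to absolute values because pheromones may be negative.

```python
def evaporate(pheromone, lam=1.0):
    """Shrink each pheromone type by its ratio to the strongest type (assumed form)."""
    if not pheromone:
        return {}
    peak = max(abs(value) for value in pheromone.values())
    if peak == 0:
        return dict(pheromone)
    return {kind: value * lam * abs(value) / peak for kind, value in pheromone.items()}


def cut_off(pheromone, sigma=0.01):
    """Threshold cut-off: drop pheromone types whose magnitude is below sigma."""
    return {kind: value for kind, value in pheromone.items() if abs(value) >= sigma}


def exchange(user_ph, item_ph, adjustment, gamma=0.2, lam=1.0, sigma=0.01):
    """Evaporate, then transmit pheromone in both directions, then cut off."""
    user_old = evaporate(user_ph, lam)
    item_old = evaporate(item_ph, lam)
    user_new = dict(user_old)
    for kind, value in item_old.items():  # item -> user transmission
        user_new[kind] = user_new.get(kind, 0.0) + gamma * adjustment * value
    item_new = dict(item_old)
    for kind, value in user_old.items():  # user -> item transmission
        item_new[kind] = item_new.get(kind, 0.0) + gamma * adjustment * value
    return cut_off(user_new, sigma), cut_off(item_new, sigma)


def train(ratings, mean_rating, gamma=0.2, lam=1.0, sigma=0.01):
    """Process (user, item, rating) triples, assumed to be sorted by time."""
    users, items = {}, {}
    for user, item, rating in ratings:
        user_ph = users.setdefault(user, {user: 1.0})  # unique pheromone of value 1
        item_ph = items.setdefault(item, {})           # items start empty
        adjustment = rating - mean_rating              # assumed: global mean rating
        users[user], items[item] = exchange(user_ph, item_ph, adjustment, gamma, lam, sigma)
    return users, items


if __name__ == "__main__":
    users, items = train([("u1", "m1", 5.0), ("u2", "m1", 5.0)], mean_rating=3.0)
    print(users["u2"])  # u2 now also carries a share of u1's pheromone
    print(items["m1"])
```

In the usage example, two users who rate the same movie highly end up sharing pheromone types through the item, which is the signal later used to find similar users and items.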

B. RECOMMENDATION
We provide recommendation algorithms for both the explicit rating prediction and implicit relevance ranking tasks.
Moreover, we can calculate three types of similarity through pheromone comparison: the similarity between two users, the similarity between two items, and the extent to which a given user u_i and a given item v_j are alike (computed by comparing the corresponding user and item pheromones). For rating prediction, we employ memory-based methods. That is to say, the rating of user u_i toward item v_j is predicted from the ratings given by users similar to u_i toward items similar to v_j.
The central problem lies in identifying the neighbors of users and of items that share similar rating patterns. The rating-based recommendation is given in Alg. 2. The relevance ranking task is concerned with the relevance between the current user and an item. The problem is to rank the items according to their similarity to the current user. The detailed algorithm is shown in Alg. 3.
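The ranking-based recommendation can be sketched in Python as follows. The source does not state which similarity measure is used to compare pheromones, so this sketch assumes cosine similarity over sparse pheromone vectors stored as dictionaries; the function names `cosine` and `rank_items` and the `seen` and `top_n` parameters are hypothetical and are not taken from Alg. 3.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse pheromone vectors (assumed measure)."""
    dot = sum(value * b.get(kind, 0.0) for kind, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_items(user_ph, item_phs, seen, top_n=10):
    """Rank unseen items by the similarity of their pheromone to the user's."""
    scored = [
        (cosine(user_ph, pheromone), item)
        for item, pheromone in item_phs.items()
        if item not in seen
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_n]]


if __name__ == "__main__":
    user = {"u1": 1.0, "u2": 0.5}
    catalogue = {
        "a": {"u1": 1.0},
        "b": {"u3": 1.0},
        "c": {"u1": 1.0, "u2": 0.5},
    }
    print(rank_items(user, catalogue, seen={"a"}, top_n=2))
```

The same `cosine` function could serve for user-user and item-item comparison when selecting neighbors, since all three similarity types compare pheromone vectors of the same form.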

C. COMPLEXITY ANALYSIS
The time complexity of training the ACF algorithm is O(K × #ratings), where K is the maximum number of pheromone types that a user or an item carries. Typically, we have K ≪ #users. For online updates, the complexity of a single update is only O(K). In the recommendation phase, for the rating-based recommendation, the time complexity is O(#users + #items). However, the computations could be significantly reduced if we maintain user neighbors and item neighbors explicitly in memory or in a database. For the ranking-based recommendation, the time complexity is O(#items).

Algorithm 2: Rating-based recommendation
    user_prediction = 0; user_similarity = 0;
    for user ∈ C(u_i) do
        if user has rated item v_j then
            user_prediction += abs(s(u_i, user)) * (r_{user,v_j} − r̄_user);
            user_similarity += abs(s(u_i, user));
        end
    end
    user_prediction /= user_similarity;
    item_prediction = 0; item_similarity = 0;
    for item ∈ C(v_j) do
        if u_i has rated item then
            item_prediction += abs(s(item, v_j)) * (r_{u_i,item} − r̄_item);
            item_similarity += abs(s(item, v_j));
        end
    end
    item_prediction /= item_similarity;
    prediction = r̄ + user_prediction + item_prediction;
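The rating prediction step of Alg. 2 can be sketched in Python as follows. This is an illustration rather than the authors' implementation: the names `weighted_deviation` and `predict_rating` and the data layout (a dictionary keyed by user and item, neighbor dictionaries mapping each neighbor to its similarity) are hypothetical, the baseline term is assumed to be the global mean rating, and a zero contribution is assumed when no neighbor has a usable rating, a case the pseudocode does not cover.

```python
def weighted_deviation(neighbors, rating_of, mean_of):
    """Similarity-weighted average of the neighbors' deviations from their own mean."""
    numerator = 0.0
    weight = 0.0
    for neighbor, similarity in neighbors.items():
        rating = rating_of(neighbor)
        if rating is None:  # this neighbor has no rating for the target
            continue
        numerator += abs(similarity) * (rating - mean_of[neighbor])
        weight += abs(similarity)
    # Assumption: contribute nothing when no neighbor has a usable rating.
    return numerator / weight if weight > 0 else 0.0


def predict_rating(user, item, ratings, user_neighbors, item_neighbors,
                   user_mean, item_mean, global_mean):
    """Predict the rating of `user` toward `item` from similar users and items.

    ratings maps (user, item) to a rating; user_neighbors maps each user in
    C(user) to its similarity with `user`; item_neighbors maps each item in
    C(item) to its similarity with `item`.
    """
    user_part = weighted_deviation(
        user_neighbors, lambda other: ratings.get((other, item)), user_mean)
    item_part = weighted_deviation(
        item_neighbors, lambda other: ratings.get((user, other)), item_mean)
    return global_mean + user_part + item_part


if __name__ == "__main__":
    ratings = {("u2", "m1"): 5.0, ("u1", "m2"): 4.0}
    prediction = predict_rating(
        "u1", "m1", ratings,
        user_neighbors={"u2": 0.8}, item_neighbors={"m2": 0.5},
        user_mean={"u2": 4.0}, item_mean={"m2": 3.5}, global_mean=3.0)
    print(prediction)
```

Keeping the neighbor dictionaries precomputed corresponds to the remark above that maintaining user and item neighbors explicitly reduces the cost of each prediction.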

IV. EXPERIMENTAL EVALUATION
As the collaborative filtering methods follow two different strategies: (a) rating prediction and (b) top-N ranking, we benchmark our proposed methods on both scenarios.

A. RATING-BASED RECOMMENDATIONS
We use a popular movie recommendation dataset, MovieLens, to benchmark the rating prediction algorithms. The MovieLens data consist of more than 1,000,000 ratings from 6,040 users on 3,706 movies. The ratings are made on a five-star scale; 90% of them are used for model training, and the remaining 10% constitute the test set.
The evaluation metric is the Root Mean Square Error (RMSE), which is given as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(i,j) \in T} \left(\hat{r}_{i,j} - r_{i,j}\right)^{2}}$$
where T denotes the set of test ratings, r_{i,j} is the actual rating of user u_i toward item v_j, and \hat{r}_{i,j} is the predicted rating.
The parameters in our algorithm are (a) the transmission rate γ and (b) the evaporation rate λ. The transmission rate γ controls the speed at which pheromones are transferred from a user to an item and vice versa. The bigger the value of γ, the faster the transmission. The evaporation rate λ controls the speed of pheromone evaporation. The bigger the value of λ, the slower the pheromone evaporates. In our simulations, we empirically set γ = 0.2 and λ = 1.
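For reference, the RMSE metric can be computed with a few lines of Python. The sketch below uses the standard definition of RMSE; the function name `rmse` and the list-based input format are illustrative assumptions.

```python
import math


def rmse(predicted, actual):
    """Root Mean Square Error between paired predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))


if __name__ == "__main__":
    print(rmse([3.0, 5.0], [4.0, 4.0]))
```

Because errors are squared before averaging, large individual prediction errors are penalized more heavily than under the mean absolute error.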
We compare our proposed methodologies with two memory-based methods, namely: (a) classic user-based CF and (b) item-based CF [42], two model-based methods: (a) Probabilistic Latent Semantic Analysis (PLSA) [17] and (b) Non-negative Matrix Factorization (NMF) [58], and with one activation spreading method: (a) Rating Similarity Matrix (RSM) [14]. The comparative results are shown in Table 2.
Although the MovieLens data has timestamped information, we do not consider the time effects because the timestamp does not reflect the evolution of the user interests at all. Therefore, all of the algorithms are trained in a batched manner without particular time considerations. However, we will see the influence of time in the following simulations.

B. RANKING-BASED RECOMMENDATION
To examine the performance of our methodology in a ranking-based recommendation scenario, we performed simulations on data from two real-world recommender systems: (a) book recommendation and (b) music recommendation. The book recommendation data were crawled from the largest Chinese book recommendation website, Douban (http://www.douban.com). For the simulation, we used part of the readers and books as our training data and the rest as the testing dataset. The music recommendation data were crawled from the largest online music recommender system, Last.FM (http://www.last.fm). We used these datasets because (a) they are of higher quality than experimental datasets in terms of reflecting real user preferences, (b) they represent two popular types of recommender systems, and (c) both contain timed implicit ratings, which are of high interest to us. Table 3 shows some statistics of the datasets.
As one of the flaws of ranking-based recommendation is the lack of objective evaluation, we use two metrics to be less subjective, namely Precision [51] and Ranking Accumulation (RA) [63], the latter as defined in [63].
Note that a higher value of Precision ∈ [0, 1] is better, whereas a lower value of RA ∈ [(N + 1)/2, N + 1] is better. Note also that the hitting set only contains items that are of interest to both of the users and are already present in the test set. We compare our methods with the three ranking-based CF methods reported in [14], [51], [63], respectively. The classical user-based and item-based algorithms are also considered for comparison. The methods proposed in [63] and [14] are denoted as NBI and RSM, respectively. We also implemented the BM25-Item algorithm in [51]. For our simulations, we hold out 10% of the data for testing. All of the results are obtained by averaging the results of five-fold cross-validation. The recall of the various methods is also provided along with the precision, as shown in Table 4.
The ACF with and without considering time sequences are referred to as the time and timeless versions, respectively. An apparent increase in precision can be observed from the results as time goes on, which means that as users keep using the recommender system, the recommendations become more accurate.
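The precision and recall figures for a top-N recommendation list can be computed as in the Python sketch below. It assumes the usual top-N definitions, in which the hitting set is the part of the recommended list that also appears among the user's relevant test items; the exact definitions used in [51] may differ, and the function name `precision_recall_at_n` is hypothetical. The RA metric is not sketched because its definition is not reproduced in this text.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the first n recommendations against a relevant set."""
    top = recommended[:n]
    hits = [item for item in top if item in relevant]  # the hitting set
    precision = len(hits) / n if n > 0 else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    recommended = ["a", "b", "c", "d"]
    relevant = {"b", "d", "e"}
    print(precision_recall_at_n(recommended, relevant, 4))
```

In the usage example, two of the four recommended items are relevant, and two of the three relevant items are retrieved.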

V. CONCLUSION AND FUTURE WORKS
This paper proposed a novel CF algorithm that simulates ant colony behaviour. Through pheromone transmission between users and items, and through evaporation of the pheromones on both users and items over time, the ACF algorithm can flexibly reflect the latest user preferences and can thus make more reliable recommendations.
We summarize our major contributions as follows:
• We introduced concepts from Ant Colony Optimization, such as pheromone and evaporation, into the recommendation domain and proposed a scalable collaborative filtering algorithm that can gracefully handle sparse and evolving rating data.
• The proposed algorithm can recommend with both strategies, rating-based and ranking-based recommendation, which are used in explicit and implicit user preference scenarios, respectively.
• The ACF algorithm is easy to deploy on distributed computational resources, even in a peer-to-peer environment, which means higher scalability and, more importantly, user privacy protection.
There are some unexplored possibilities for improving the algorithm proposed in this paper. First, the initialization of pheromones does affect the final recommendation results. Second, the robustness of the ACF algorithm needs to be further explored on larger datasets. A possible reason is that evaporation is an interesting but hard-to-tune mechanism, and each application must find the rate most suitable for its own needs.