Privacy-Aware Content-of-Interest Search and Recommendation in Internet of Things for Cross-Dressers

With the continuous development and gradual progress of the Internet of Things (IoT) in human society, people are becoming increasingly diverse in terms of user preferences and choices of things. In this situation, several new cultures or social phenomena have been emerging, including the so-called cross-dressing culture. As a special group of people, cross-dressers are often very sensitive about their non-mainstream identities. Therefore, they are often confronted with more difficulties when using modern information techniques such as Content-of-Interest (COI) search. Motivated by this fact, we introduce advanced information retrieval and privacy protection techniques into the cross-dressing domain and propose a privacy-aware COI search and recommendation solution for cross-dressers, named PCSR. First, PCSR uses the fastText tool to transform the cross-dressers' input keywords and the candidate webpages into corresponding vectors that carry less private content associated with cross-dressers. Afterwards, PCSR uses vector similarity calculation techniques to perform privacy-preserving COI search and recommendation. Finally, we validate the effectiveness of PCSR through a set of experiments. We believe that the proposed PCSR solution can benefit cross-dressers significantly when performing COI search and recommendation in IoT while protecting their sensitive information.


I. INTRODUCTION
With the continuous development and gradual progress of the Internet of Things (IoT), human society is becoming more and more open and tolerant [1]-[5]. Under this circumstance, people are becoming increasingly diverse and personalized in terms of user preferences and choices of things, which renders the whole world richer and more varied [6]-[10]. In this situation, several new and personalized cultures or social phenomena have recently been emerging, including the so-called cross-dressing culture.
According to the widely adopted views of researchers from academia, industry and society, a man is deemed a cross-dresser if his thinking, behaviors, actions and preferences are more like those of a woman than those of a man [11]. A typical example is a man who likes to wear a skirt (e.g., a jumper skirt, bias skirt, miniskirt or fishtail skirt) or who has long hair or long fingernails. As a special group of people, cross-dressers are often very sensitive about their non-mainstream identities. Therefore, they are often confronted with more difficulties when using modern information techniques such as Content-of-Interest (COI, e.g., news, blogs, videos, we-media) search.
The associate editor coordinating the review of this manuscript and approving it for publication was Chi-Hua Chen.
More concretely, a typical user is often not very sensitive about the keywords he or she types into a search engine such as Google or Baidu. However, a cross-dresser is often reluctant to share his or her typed keywords with others, even if only the search engine in use is likely to know them. This is because the input keywords often contain sensitive identity information of the cross-dresser, and hence keyword disclosure may put him or her at risk.
Motivated by this real-world challenge, we introduce several advanced information retrieval techniques and privacy protection techniques into the cross-dressing domain. The information retrieval techniques include fastText [12], which was released in 2016, is widely used in various domains and can convert a keyword, a sentence or a paragraph into a short vector, and Cosine Similarity Calculation, which measures the similarity between vectors. The privacy protection techniques include SimHash [13] and Hamming Distance [14], which can evaluate whether two vectors are close or similar without revealing user privacy. Furthermore, based on these techniques, we propose a Privacy-aware COI Search and Recommendation solution for cross-dressers, named PCSR.
Concretely, PCSR first uses a word-to-vector tool, i.e., fastText, to transform the cross-dressers' input keywords and the candidate webpages into corresponding vectors that carry less private content associated with the cross-dressers. Afterwards, PCSR uses vector similarity calculation techniques, including SimHash, Hamming Distance and Cosine Similarity Calculation, to evaluate the matching degree between a cross-dresser's input keywords and all the candidate webpages on the Internet. Then, according to the matching degree evaluation results, PCSR performs privacy-preserving COI search and recommendation. Finally, we validate the feasibility and effectiveness of the PCSR solution through a case study created from the keyword search applications of a cross-dresser. We believe that the proposed PCSR solution can benefit the large population of cross-dressers significantly when performing COI search and recommendation while protecting their sensitive information.
We summarize the contributions of this paper as follows.
(1) We argue that cross-dressers are a special group of humans who are often very sensitive to their non-mainstream identities when making keywords-driven COI search and recommendation.
(2) We integrate existing information retrieval and privacy protection techniques and further propose a privacy-preserving COI search and recommendation solution, PCSR, which is tailored to cross-dressers who are sensitive about their identities.
(3) A set of experiments is designed and conducted based on a real-world PW dataset to show the feasibility of the PCSR solution.
The rest of this paper is structured as follows. Section II reviews the current research literature associated with cross-dressers. In Section III, a motivating example extracted from the real world is presented to highlight the research value of our work. The detailed procedure of the proposed PCSR solution is introduced step by step in Section IV. Afterwards, we show the effectiveness of the PCSR solution in Section V through a set of experiments based on a real-world dataset. Finally, in Section VI, we summarize both the advantages and limitations of the proposed PCSR solution and point out future research directions.

II. RELATED WORK
We review the current research literature associated with cross-dressing culture as follows.
In [11], the authors focus on the relationship between cross-dressing behaviors or habits and the gender of people. The study indicates that some cross-dressers are reluctant to disclose their gender information while others are less sensitive about their special identity information. To collect these statistics, a new tool is developed to record the dressing codes of individuals. In [15], more influencing factors, such as gender, sexuality and romance, are considered to analyze cross-dresser cultures as well as their influence on heteronormative behaviors. The investigation was conducted in Taiwan, China. The report shows that cross-dressing is a key factor related to decisions in heteronormative behaviors.
In [16], different types of cross-dressing events as well as their possible hidden relationships are investigated and analyzed in depth. The research indicates that there are often inner correlations among different cross-dressing behaviors, especially on religious occasions. Another finding is that people's cross-dressing behaviors are prone to be triggered by various reasons and for various purposes. However, such complex and implicit relationships still require intensive study instead of simply being attributed to a certain impressive ceremony.
Cross-dressing is a special culture that is very popular in Japan. Specifically, cross-dressing often appears in cartoon movies and novels and has become a major means of attracting readers or audiences. Motivated by this fact, in [17], the authors introduce the cross-dresser characters in Japanese cartoon works as well as their performances in terms of gender. The investigation report shows that female cross-dressers are more likely to succeed in their careers than male cross-dressers, as the former have more male characteristics and are therefore more likely to win the competition among many candidates. Moreover, one reason that female cross-dressers are emerging is that they are eager to gain more power, just like men.
In [18], a female cross-dresser and a male cross-dresser changed their respective identities. The authors investigate the various changes in their daily lives over a time window of two years. In [19], the authors investigate transgender behaviors or phenomena, one typical manifestation of which is cross-dressing. In addition, the authors study cross-dressing in ancient literature, articles, novels and so on. This work opens a new research field on cross-dressing from many perspectives, such as the social aspect and the conceptual aspect.
In previous research, Victoria Flanagan argues that a female cross-dresser is more likely to succeed in business or life, whereas a male cross-dresser finds it difficult to win the competition with others. In contrast, the authors of [20] do not find evidence supporting these conclusions. Instead, they discover that the cross-dressing behaviors of men or women merely help to increase the mystery of a person's identity.
According to the general opinion of the public, people's cross-dressing behaviors are often related to negative things such as perversion, fetishism, exhibitionism, sexual deviation, and homosexuality. However, in [21], the authors oppose this conclusion. In this work, the authors introduce a real-world case of a cross-dresser who is male but likes to wear female clothes. The reason behind his behavior is that he is trying not to miss his deceased mother so much. In other words, his cross-dressing behaviors are actually a kind of emotional defense mechanism.
In [22], the authors investigate the influence of cross-dressing culture in video games in China. As we know, most of the players of video games are male. An interesting phenomenon that has emerged recently is that many male players of video games have added cross-dressing elements in order to attract more fans online. This finding also confirms the popularity of cross-dressing culture in daily life.
In general, the above literature has investigated the correlation between cross-dressing culture and other influencing factors such as gender, love and so on. However, these research outcomes do not consider the positive effects of the cross-dressing phenomenon in recommender systems or the resulting privacy protection issues. In view of this challenge, we introduce several advanced information retrieval techniques and privacy protection techniques into the cross-dressing domain, and further propose a privacy-aware COI search and recommendation solution for cross-dressers, i.e., PCSR. We believe that PCSR is a beneficial attempt to explore the application of cross-dressing culture in commerce domains.

III. A MOTIVATING EXAMPLE
We use the example shown in Fig. 1 to illustrate the research value and significance of our paper. As the example shows, Alice types the keyword "cross-dress" into the search engine. Then the search engine searches for the COIs that best match the keyword "cross-dress" among massive candidate COI items.
According to traditional keyword-based COI matching methods, only the first COI item, which contains the words "...cross-dress...", is returned to Alice. The COI items containing the words "...drag queen..." and "...pseudo-girl..." are regrettably overlooked because "drag queen" and "pseudo-girl" do not literally match the keyword "cross-dress" typed by Alice, even though they are synonyms of "cross-dress". Therefore, traditional keyword-based COI search methods are prone to overlook high-quality search results that match the keywords typed by users. Moreover, the above keyword search process is often not safe from Alice's perspective, since her typed keyword "cross-dress" is very sensitive to her. Therefore, pure keyword-based COI search methods lack sufficient privacy-preservation capability, which is likely to increase the privacy disclosure concerns of users, especially users who are very sensitive about their identities as cross-dressers.
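The limitation of literal keyword matching can be illustrated with a minimal sketch. The three snippets and the function name `keyword_match` are hypothetical stand-ins chosen for illustration; they are not the actual COI items of Fig. 1.

```python
# Toy illustration only: these snippets are hypothetical stand-ins for the
# COI items of the motivating example, not data from the paper.
coi_items = [
    "a blog about cross-dress fashion tips",
    "interview with a famous drag queen",
    "pseudo-girl cosplay photo gallery",
]


def keyword_match(keyword, items):
    """Return the items whose text literally contains the keyword."""
    keyword = keyword.lower()
    return [item for item in items if keyword in item.lower()]


matches = keyword_match("cross-dress", coi_items)
print(matches)  # only the first item matches; the two synonym items are missed
```

Note that the keyword itself is handed to the matcher in plain text, which is the privacy concern raised in this section.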
Motivated by the above drawbacks and limitations of existing keywords-based COIs search methods, we introduce several advanced information retrieval techniques and privacy protection techniques into the cross-dressing domain and further propose a privacy-aware COI search and recommendation solution for cross-dressers, i.e., PCSR. The detailed procedure will be clarified step by step in the following section.

IV. SOLUTION: PCSR
In this section, the proposed PCSR method is described in detail to specify the concrete procedure of privacy-aware COI search and recommendation. Generally, four steps are necessary to achieve this goal, as outlined in Fig. 2. Users' inputs may be fuzzy and undetermined; for simplicity, we only consider exact and concise keywords as user inputs.

A. STEP 1: KEYWORD → VECTOR
In traditional keyword-based COI search and recommendation scenarios, the keywords typed by a cross-dresser are the major basis for discovering the target COI items that interest the cross-dresser. However, the keywords typed by the cross-dresser are often a category of private data, as they often contain sensitive user identity information. Therefore, to achieve the goal of privacy protection during keyword search, we need to convert the sensitive keywords into an equivalent vector that is less sensitive. Next, we describe the concrete conversion process.
As a classical and effective word-to-vector tool, fastText has recently been employed to transform a set of words into an equivalent vector. Concretely, according to the well-trained word-to-vector model of fastText, each keyword is transformed into a 300-dimensional vector. As fastText is a mature tool for processing natural languages, we do not introduce the concrete transformation process in detail here. For simplicity, we assume that each keyword (denoted by KW) is transformed into a corresponding 300-dimensional vector V_KW as presented in (1),

$$V_{KW} = (v_1, \ldots, v_{300}) \tag{1}$$

where v_j is a real value.

B. STEP 2: COI DESCRIPTION → VECTOR
Similar to Step 1, in this step, we convert the text description of each candidate COI item into an equivalent vector, again mainly based on the fastText tool. Concretely, each candidate COI item is often accompanied by a text description that specifies the concrete functions that the COI item can realize. For example, a COI item on the web with the title "What To Do If Your Husband Is A Cross-Dresser" claims that "For both the cross-dresser and his family, it is not an easy thing to deal with and it is very common to experience a sense of despair" (see footnote 1). In other words, the latter text description introduces the general meaning, comments or public attitude regarding the former COI item. Next, we need to transform the text description of each COI item into a corresponding vector, which is also based on the well-trained word-to-vector model of fastText. As fastText is a well-known tool for processing natural languages, we do not introduce the concrete transformation process in detail here. For simplicity, we assume that the text description of each COI item (denoted by CI) is transformed into a corresponding 300-dimensional vector V_CI as presented in (2),

$$V_{CI} = (\phi_1, \ldots, \phi_{300}) \tag{2}$$

where ϕ_j is a real value.

Footnote 1: https://www.huffingtonpost.com.au/matty-silver/what-to-do-if-yourhusband-is-a-cross-dresser_a_21463341/
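The text-to-vector conversion of Steps 1 and 2 can be sketched as follows. This is only an illustration under strong assumptions: `toy_word_vector` is a hypothetical stand-in that produces deterministic pseudo-random vectors with no semantic meaning, and averaging word vectors is one common way to embed a multi-word text, not necessarily the procedure used by PCSR.

```python
import random

DIM = 300  # vector length used in Steps 1 and 2


def toy_word_vector(word, dim=DIM):
    """Hypothetical stand-in for a lookup in a pretrained fastText model.

    The vector is derived deterministically from the characters of the word
    and carries no semantics; it only has the right shape.
    """
    seed = sum((i + 1) * ord(ch) for i, ch in enumerate(word))
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def text_to_vector(text, dim=DIM):
    """Average the word vectors of a keyword or of a COI description."""
    words = text.lower().split()
    if not words:
        return [0.0] * dim
    vectors = [toy_word_vector(w, dim) for w in words]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


v_kw = text_to_vector("cross-dress")  # Step 1: keyword KW
v_ci = text_to_vector("What To Do If Your Husband Is A Cross-Dresser")  # Step 2
print(len(v_kw), len(v_ci))
```

In a real pipeline, `toy_word_vector` would presumably be replaced by a pretrained 300-dimensional fastText model, for instance through the `get_word_vector` or `get_sentence_vector` methods of the fastText Python bindings.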

C. STEP 3: COI WEIGHTING
In the former two steps, we have transformed the cross-dresser's typed keyword and each candidate COI item's text description into corresponding vectors, i.e., V_KW and V_CI. Next, according to the two vectors V_KW and V_CI, we can calculate the matching degree between the keyword KW and the COI item CI. Since V_KW and V_CI are both real-valued vectors, we can measure their matching degree by calculating their vector similarity.
Concretely, the vector similarity is calculated based on the SimHash technique, which is formalized in (3). In other words, the similarity between V_KW and V_CI, i.e., Sim(V_KW, V_CI), is inversely proportional to HD(V_KW, V_CI), where HD(V_KW, V_CI) is the Hamming Distance between vectors V_KW and V_CI, which can be calculated based on equations (4)-(5).
Here, the reason that we choose equation (3) to measure the vector similarity is as follows: according to SimHash theory [13], if the Hamming Distance between two vectors X and Y is small, then we can simply conclude that vectors X and Y are similar. Therefore, according to (3), Sim(V_KW, V_CI) is proportional to 1/HD(V_KW, V_CI) if HD(V_KW, V_CI) is at least 1; specifically, if HD(V_KW, V_CI) = 0, then Sim(V_KW, V_CI) takes its highest value, i.e., Sim(V_KW, V_CI) = 1. More formally, equation (3) is updated to be (6), that is, Sim(V_KW, V_CI) = 1 when HD(V_KW, V_CI) = 0 and Sim(V_KW, V_CI) = 1/HD(V_KW, V_CI) otherwise. Moreover, as V_KW and V_CI are both 300-dimensional vectors, HD(V_KW, V_CI) is at most 300 and hence Sim(V_KW, V_CI) ∈ [1/300, 1] according to (6).
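A minimal sketch of the Hamming-Distance-based similarity described above is given below. The binarization rule in `binarize` (1 for a positive component, 0 otherwise) is an assumption made for this sketch, since the text only states that the Hamming Distance is computed over the two 300-dimensional vectors; the function names are likewise hypothetical.

```python
def binarize(vector):
    """Assumed fingerprint rule: 1 for a positive component, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_distance(v_kw, v_ci):
    """Count the positions where the two binary fingerprints differ."""
    if len(v_kw) != len(v_ci):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(binarize(v_kw), binarize(v_ci)))


def sim(v_kw, v_ci):
    """Similarity as described in the text: 1 if HD = 0, otherwise 1 / HD."""
    hd = hamming_distance(v_kw, v_ci)
    return 1.0 if hd == 0 else 1.0 / hd


# Short toy vectors (4 dimensions instead of 300) to keep the example readable.
v_kw = [0.4, -1.2, 0.7, 0.1]
v_a = [0.9, -0.3, 0.2, 0.5]    # same sign pattern as v_kw, so HD = 0
v_b = [-0.4, -1.2, -0.7, 0.1]  # two sign flips, so HD = 2
print(sim(v_kw, v_a), sim(v_kw, v_b))  # prints: 1.0 0.5
```

Only the binary fingerprints are compared, which is why the similarity takes at most 301 distinct values for 300-dimensional vectors and ties between COI items are likely.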
Furthermore, if two COI items CI-1 and CI-2 have the same Hamming Distance to the keyword KW typed by a cross-dresser, i.e., HD(V_KW, V_CI-1) = HD(V_KW, V_CI-2), then we cannot discriminate between them through (6) directly. In this situation, we evaluate and rank them through Cosine Similarity (CS). Concretely, the Cosine Similarity between KW and CI is calculated by (7). In general, a larger value of CS(V_KW, V_CI) means a higher matching score of COI item CI for the keyword queried by the cross-dresser.
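The tie-breaking measure can be sketched with the standard cosine similarity formula, i.e., the dot product of the two vectors divided by the product of their Euclidean norms. The function name and the toy vectors are hypothetical; returning 0 for an all-zero vector is a convention chosen here, not something stated in the text.

```python
import math


def cosine_similarity(v_kw, v_ci):
    """Standard cosine similarity between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(v_kw, v_ci))
    norm_kw = math.sqrt(sum(a * a for a in v_kw))
    norm_ci = math.sqrt(sum(b * b for b in v_ci))
    if norm_kw == 0.0 or norm_ci == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_kw * norm_ci)


# Both candidates have the same sign pattern as v_kw, so a sign-based
# Hamming Distance cannot separate them; cosine similarity can.
v_kw = [1.0, 2.0, -1.0]
ci_1 = [2.0, 4.0, -2.0]  # same direction as v_kw
ci_2 = [1.0, 0.5, -3.0]  # same signs, different direction
cs_1 = cosine_similarity(v_kw, ci_1)
cs_2 = cosine_similarity(v_kw, ci_2)
print(cs_1 > cs_2)  # CI-1 is ranked ahead of CI-2
```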
Theoretically, the larger Sim(V_KW, V_CI) or CS(V_KW, V_CI) is, the more likely it is that the COI item CI matches the keyword KW typed by a cross-dresser; as a consequence, CI should be assigned a larger weight. Motivated by this fact, we weight each candidate COI item CI with W_CI in (8). In other words, if two COI items CI-1 and CI-2 have the same Hamming Distance to keyword KW, i.e., HD(V_KW, V_CI-1) = HD(V_KW, V_CI-2), then we assign a weight to each of them based on CS(V_KW, V_CI). Otherwise, we assign a weight to each of them based on Sim(V_KW, V_CI).

D. STEP 4: COI RECOMMENDATIONS
According to the weight value W_CI of each COI item CI derived in Step 3, we can rank all the COI items in descending order. As equation (8) shows, the ranking process is based first on Sim(V_KW, V_CI) and then on CS(V_KW, V_CI). Specifically, if multiple COI items have the same W_CI value, then they are ranked randomly. Thus, we can recommend appropriate COI items (e.g., Top-3, Top-5, and so on) to the cross-dresser who types a keyword KW into the COI recommender system. Moreover, as the recommendation decision is made based on W_CI, which is obtained in a privacy-aware way, the proposed PCSR method can protect the sensitive information or data of cross-dressers well. More formally, the pseudo code of the PCSR method is presented in Algorithm 1.
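The ranking rule of this step can be sketched as a lexicographic sort. This is an illustration only: the function `rank_coi_items`, the item names and the score values are hypothetical, and representing W_CI as a (Sim, CS) pair is an assumption that mirrors the stated order of comparison rather than the exact form of (8).

```python
import random


def rank_coi_items(scores, k, rng=None):
    """Return the names of the Top-K COI items.

    `scores` maps an item name to a (Sim, CS) pair computed in Step 3.
    Items are ordered by Sim first and by CS among equal Sim values;
    items that tie on both values are ordered randomly.
    """
    rng = rng or random.Random()
    names = list(scores)
    rng.shuffle(names)  # random order among exact ties; the sort below is stable
    names.sort(key=lambda name: scores[name], reverse=True)
    return names[:k]


scores = {
    "CI-1": (0.5, 0.80),
    "CI-2": (1.0, 0.10),
    "CI-3": (0.5, 0.95),  # same Sim as CI-1, higher CS
    "CI-4": (0.25, 0.99),
}
top_3 = rank_coi_items(scores, k=3)
print(top_3)  # prints: ['CI-2', 'CI-3', 'CI-1']
```

Note that CI-4 is not returned despite its high cosine similarity, because CS is only consulted among items with equal Sim.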

V. EXPERIMENTS
To validate the effectiveness and efficiency of the PCSR method, a set of experiments is provided here based on a real-world PW dataset [23]. In the dataset, the description of each web API can be regarded as a candidate COI item, and the categories of APIs can be considered as the keywords typed by cross-dressers. In total, the PW dataset contains 17,478 web APIs as well as their functional descriptions. Concretely, we test the hit rate and computation time of the PCSR method and analyze the time cost convergence of PCSR.
Experiments are conducted on a laptop (the hardware configurations include 2.70 GHz processor and 8.0GB memory, while the software configurations include Windows 10 and Python 3.6). Experiments are repeated 100 times to observe the computation time convergence of PCSR method.

A. EXPERIMENT 1: HIT RATE OF PCSR
Algorithm 1 PCSR
Inputs: (1) KW: keyword typed by a cross-dresser
(2) COI items: CI-1, ..., CI-n
(3) K: size of returned COI items
Output: ...
...
Then HD(V_KW, V_CI-i)++
10. End If
11. End For
...

In this test, we analyze the hit rate of the proposed PCSR method with respect to the number of returned COI items (i.e., Top-K) and the number of keywords typed by a cross-dresser (i.e., #q). Concretely, the number of candidate COI items is n = 17,478. Experimental results are presented in Fig. 3, where the horizontal axis denotes Top-K while the vertical axis denotes hit rate.
It can be seen from Fig. 3 that the hit rate of the PCSR method increases with the growth of Top-K. The reason is that when more COI items are returned after a keyword query, there is a higher probability that the typed keyword matches one of the returned COI items. Additionally, the hit rate of the PCSR method increases with the growth of #q. The reason is that when a cross-dresser types more keywords, there is a higher probability that the typed keywords match the candidate COI items.
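The dependence of hit rate on Top-K can be illustrated with a small sketch. The definition used here is an assumption, since the text does not give a formula: a query counts as a hit if at least one of its returned Top-K items belongs to the set of items regarded as correct for that query. All query names, item names and the function `hit_rate` are hypothetical.

```python
def hit_rate(results, relevant):
    """Fraction of queries with at least one relevant item in the returned list.

    `results` maps each query to its returned Top-K list; `relevant` maps
    each query to the set of items regarded as correct for it.
    """
    if not results:
        return 0.0
    hits = sum(
        1
        for query, returned in results.items()
        if any(item in relevant.get(query, set()) for item in returned)
    )
    return hits / len(results)


# Hypothetical ranked lists for three queries and their correct items.
ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"], "q3": ["g", "h", "i"]}
relevant = {"q1": {"a"}, "q2": {"f"}, "q3": {"z"}}

rate_top_1 = hit_rate({q: items[:1] for q, items in ranked.items()}, relevant)
rate_top_3 = hit_rate({q: items[:3] for q, items in ranked.items()}, relevant)
print(rate_top_1, rate_top_3)  # the hit rate cannot decrease as K grows
```

Under this definition, enlarging K can only add hits, which is consistent with the monotone trend reported for Fig. 3.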

B. EXPERIMENT 2: COMPUTATION TIME OF PCSR
In this test, we analyze the computation time of the proposed PCSR method with respect to the number of returned COI items (i.e., Top-K) and the number of keywords typed by a cross-dresser (i.e., #q). Concretely, the number of candidate COI items is n = 17,478, Top-K = 1, 3, 5, 10 and #q = 1, 2, 3. Experimental results are presented in Fig. 4, where the horizontal axis denotes the parameter Top-K while the vertical axis denotes computation time.
As Fig. 4 shows, the computation time of the PCSR method increases with the growth of Top-K and with the growth of #q. The reason is that when more COI items are returned or more keywords are queried, the query time increases accordingly, as additional time costs are needed.
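A computation-time measurement of this kind can be sketched as follows. The helper `time_query` and the placeholder workload `dummy_query` are hypothetical; a real test would call the PCSR search for a given Top-K and #q instead.

```python
import time


def time_query(run_query, repeats=100):
    """Run `run_query` repeatedly and return each run's wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def dummy_query():
    # Placeholder workload: sort 10,000 numbers and keep the ten largest.
    return sorted(range(10_000), reverse=True)[:10]


durations = time_query(dummy_query, repeats=100)
mean_time = sum(durations) / len(durations)
print(f"mean time over {len(durations)} runs: {mean_time:.6f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.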

C. EXPERIMENT 3: COMPUTATION TIME CONVERGENCE OF PCSR
In this test, we analyze the computation time convergence of the proposed PCSR method with respect to the number of returned COI items (i.e., Top-K) and the number of keywords typed by a cross-dresser (i.e., #q). Concretely, n = 17,478, Top-K = 1, 3, 5, 10 and #q = 1, 2, 3. Experimental results are presented in Fig. 5, where the horizontal axis denotes the number of repeated experiments while the vertical axis denotes the computation time.
As the three sub-graphs in Fig. 5 show, the computation time of the PCSR method approximately converges when the experiments are repeated 100 times. Therefore, we claim that it is reasonable to execute each set of experiments 100 times. Moreover, as Fig. 5 indicates, the time cost of PCSR after convergence is low and acceptable in most cases.
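One possible way to check such convergence numerically is sketched below. The criterion is an assumption made for illustration (the running mean must vary by at most a relative tolerance over the last few runs); the text does not specify how convergence was judged, and the timing values are invented.

```python
def running_mean(values):
    """Return the mean of the first 1, 2, ..., n values."""
    means, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def has_converged(values, window=10, tolerance=0.05):
    """True if the running mean varied by at most `tolerance` (relative)
    over the last `window` runs."""
    means = running_mean(values)
    if len(means) < window:
        return False
    tail = means[-window:]
    return (max(tail) - min(tail)) <= tolerance * abs(tail[-1])


# Invented timings: a slow warm-up followed by 97 stable runs.
stable_times = [0.30, 0.22, 0.21] + [0.20] * 97
# Invented timings that keep growing and therefore do not settle.
growing_times = [0.1 * i for i in range(1, 21)]
print(has_converged(stable_times), has_converged(growing_times))  # prints: True False
```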

VI. CONCLUSION AND FUTURE WORK
With the continuous development and gradual progress of IoT in human society, several new cultures or social phenomena have been emerging, including the so-called cross-dressing culture. As a special group of people, cross-dressers are often very sensitive about their non-mainstream identities. Therefore, they are often confronted with more difficulties when using modern information techniques such as Content-of-Interest (COI) search. Motivated by this fact, we propose a privacy-aware COI search and recommendation solution for cross-dressers, named PCSR. Finally, we validate the effectiveness of PCSR through a set of experiments. We believe that the proposed PCSR solution can benefit cross-dressers significantly when performing COI search and recommendation while protecting their sensitive information.
Recommendation is essentially a decision-making problem that involves users, multiple influencing factors and so on [24]-[29]. In future work, we will further discuss multi-factor recommendation problems for users. In addition, there are many other privacy-preservation techniques. Therefore, we will further refine our method by integrating more privacy-preserving techniques such as Blockchain [30], [31], Differential Privacy [32]-[34], Locality-Sensitive Hashing [35], [36] and so on. Finally, balancing multiple conflicting performance goals is necessary in a decision-making problem [37]-[39]. Therefore, achieving multi-goal optimization across multiple conflicting dimensions is a challenging task in future COI recommendation.
WEI SUN received the bachelor's degree in sport management and the master's degree in sport education training from Beijing Sport University, China, in 2003 and 2012, respectively. She is currently an Associate Professor with Jining Medical University, China. She has published several research papers in reputable journals and conferences. Her research interests include user profile modeling and recommender systems.
XIAOMING CAO received the bachelor's degree in sport education training from Beijing Sport University, China, in 2008. She is currently an Associate Professor with Public Physical Education Department, Yantai Nanshan University, China. She has published several research papers in reputable journals and conferences. Her research interests include user privacy protection and big data analyses.
HONGTAO YU received the bachelor's degree in basketball from Beijing Sport University, China, in 2001. He is currently a Lecturer with the Public Physical Education Department, Yantai Nanshan University, China. He has published several research papers in reputable journals and conferences. His research interests include sport education and big data analyses.
WENMIN LIN received the Ph.D. degree in computer science and technology from Nanjing University, China, in 2015. She is currently a Lecturer with Alibaba Business School, Hangzhou Normal University. Her research interests include service computing, big data analytics, and block-chain technology.
CHAO YAN received the master's degree from the Institute of Computing Technology, Chinese Academy of Sciences, China, in 2006. He is currently an Associate Professor with the School of Computer Science, Qufu Normal University, China. His research interests include recommender systems and services computing. VOLUME 9, 2021