Gender Identification from Community Question Answering Avatars

There are several reasons why gender recognition is vital for online social networks such as community Question Answering (cQA) platforms. One of them is progressing towards gender parity across topics as a means of keeping communities vibrant. More specifically, this demographic variable has been shown to play a crucial role in devising better user engagement strategies, for instance, by kindling the interest of members in topics dominated by the opposite gender. However, in most cQA websites, the gender field is neither mandatory nor verified when enrollment forms are submitted and processed. As might be expected, it is left blank most of the time, forcing cQA services to infer this demographic information from the activity of their users on their platforms, such as prompted questions, answers, self-descriptions and profile images. There is only a handful of studies dissecting automatic gender recognition across cQA fellows, and as far as we know, this work is the first effort to delve into the contribution of their profile pictures to this task. Since these images are an unconstrained environment, their multifariousness poses a particularly difficult and interesting challenge. With this in mind, we assessed the performance of three state-of-the-art image processing techniques, namely pre-trained neural network models. In a nutshell, our best configuration finished with an accuracy of 81.68% (Inception-ResNet-50), and its corresponding Grad-CAM maps unveil that one of its principal foci of attention is silhouette edges. All in all, we envisage that our findings will play a fundamental part in the design of efficient multi-modal strategies.

1 stackexchange.com

[...] work architectures for guessing genders from cQA profile pictures (avatars for short). Needless to say, this is a sophisticated task, due to the multifariousness of these avatars (see [...]

4) In order to discover discriminative visual patterns, we examined the Grad-CAM heat maps generated by our best configuration [16].

In a statement, our best performance was achieved by an [...]

From an alternative viewpoint, the work of [32] [...]

In our study, we benefited from the corpus compiled by [9], which encompasses 657,805 community member profiles.

Each of these records contains the corresponding sets of questions, answers, nicknames and self-descriptions (see Fig. 1). Thus, we capitalized on these textual inputs for automatically assigning each community peer one of two genders (i.e., male or female), whenever it was possible. It is worth noting here that we focused only on the 219,626 (33.39%) community fellows that provided a non-default avatar.

In the same spirit of [13], our automatic annotation pro[...] own compilation of 58 highly recurrent nicknames, which could not be found across the six previous catalogues (e.g., sweetgirl, justagirl, guy and boy).

In general, nicknames can represent not only real names (e.g., devin espinoza and helen_robi), but also strings con[...] its outcome is the login/user name or its first part (e.g., "billy.peralta@unab.cl" → "billy"). If the resulting string does not match any list, we then systematically trim its end one character at a time until a match is found or its length is five characters. This reduction helps to remove some classes of suffixes typically used after names in social network aliases (e.g., "lauraweird" → "laura", "ulrich53" → "ulrich" and "dennis372006" → "dennis"). Since each of these databases can return male and/or female, we count the frequency of each possible gender to decide the final label.

At this point, we preliminarily tagged community members by assigning the most frequent gender whenever there was one. Subsequently, we profited from these preliminary labels for finding gender indicative phrases across their questions, answers and self-descriptions. For this purpose, we took advantage of CoreNLP^10 for tokenizing and splitting sentences, and computing lowercased n-grams afterwards (n = 2 ... 7). It is worth noting here that we also capitalized on part-of-speech tagging for substituting numbers with a placeholder. After this, these n-grams were ranked in conformity to their entropy, and low-ranked elements were manually inspected in order to verify whether each of them by itself suffices to make a good guess of the gender. Eventually, this process aided in compiling a collection of 1,486 gender indicative phrases (see Table 1).

[...] we describe each of these architectures. [...] In the next stage, an average pooling layer acts as drop-out. [...] to obtain greater interpretability of the best model [16].

FIGURE 1: Illustrative record excerpts corresponding to ten different community fellows: (a) males, (b) females. In bold red, phrases indicative of their respective gender. The first row contains self-descriptions, the next one question titles and bodies, and the last row answers.

3 Their respective amount of records is in parentheses.
4 data.world/howarder/gender-by-name
5 data.world/arunbabu/gender-by-names
6 ftp.heise.de/ct/listings/0717-182.zip
7 www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/
8 github.com/lizhi1104/nlp_data
9 www.kaggle.com/migalpha/spanish-names
10 stanfordnlp.github.io/CoreNLP/
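The entropy-based ranking described above can be sketched as follows. This is our own illustration, not the authors' implementation: the occurrence counts are invented, and we assume that an n-gram's entropy is computed over the preliminary gender labels of the members who used it, so that low entropy means strongly gender-indicative.

```python
import math
from collections import Counter

def gender_entropy(counts):
    """Shannon entropy (bits) of an n-gram's male/female distribution.
    0 bits -> the phrase co-occurs with a single gender only;
    1 bit  -> an even split, hence uninformative for gender."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c)

# Invented occurrence counts per preliminary gender label.
ngram_counts = {
    "my husband": Counter(female=40, male=2),
    "my wife":    Counter(male=35, female=1),
    "i think":    Counter(male=50, female=48),
}

# Rank n-grams from most to least gender-indicative (ascending entropy).
ranked = sorted(ngram_counts, key=lambda g: gender_entropy(ngram_counts[g]))
```

With these toy counts, "my wife" and "my husband" rank ahead of the uninformative "i think"; candidates at the informative end of such a ranking would then be inspected manually, as the text describes.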

These experiments were carried out using a Yahoo! Answers [...] that the different inference processes of these neural network models are more sensitive to male avatars.

2) There is a correlation between the overall accuracy and the error rate across male avatars (see Table 2): 39.51% (Inception-ResNet, Acc. 81.68%), 41.83% [...]
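The two quantities being correlated above can be made concrete with a small sketch. The confusion-matrix counts below are purely illustrative, not the paper's figures; they only show how overall accuracy and the male error rate are each derived.

```python
def rates(tp_m, fn_m, tp_f, fn_f):
    """Overall accuracy and male error rate from a 2x2 confusion matrix.
    tp_m/fn_m: male avatars classified correctly/incorrectly;
    tp_f/fn_f: the same for female avatars."""
    total = tp_m + fn_m + tp_f + fn_f
    accuracy = (tp_m + tp_f) / total
    male_error = fn_m / (tp_m + fn_m)   # error rate across male avatars only
    return accuracy, male_error

# Illustrative counts only (not the paper's actual confusion matrix).
acc, err = rates(tp_m=605, fn_m=395, tp_f=900, fn_f=100)
```

In this toy example the model is markedly weaker on male avatars, which is the asymmetry the observation above points at.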

In this section, we qualitatively analyze the automatic classi[...] Figure 11. We observe that in the event of avatars with pictures of men (see Fig. 11a, 11c and 11d), the network is usually wrong. Errors can also occur in ambiguous images such as couples (Fig. 11e), a woman near a car (Fig. 11f), or symbols (Fig. 11g). [...] that the former is a language independent approach.

By inspecting these heat maps, we discover that informative regions often vary on a case-by-case basis. Nevertheless, these maps reveal one prominent pattern: body silhouettes.
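For readers unfamiliar with how such heat maps arise, the core Grad-CAM [16] combination step can be sketched in a few lines: gradients of the class score are average-pooled into per-channel weights, the activation maps of the last convolutional layer are summed with those weights, and a ReLU keeps only positively contributing regions. The tiny arrays below are toy stand-ins for real activations and gradients.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from last-conv-layer tensors of shape (channels, H, W).
    activations: forward activations A_k; gradients: dY_class / dA_k."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: pooled grads
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    return np.maximum(cam, 0.0)                       # ReLU: keep positives

# Toy tensors standing in for real network activations/gradients.
acts = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[0.0, 2.0], [2.0, 0.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[-0.5, -0.5], [-0.5, -0.5]]])
heatmap = grad_cam(acts, grads)
```

The resulting map is upsampled to the avatar's resolution before being overlaid as the heat maps discussed in this section.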

The network focuses on delineating these edges/contours regardless of whether it is dealing with virtual (Fig. 13a-13d, 13i) or real (Fig. 13e-13f) avatars. It is key to note here that a greater variety of body silhouette patterns was observed in the case of male pictures (Fig. 14a-14f, 14i).

Interestingly enough, in the case of an eye, it also tends to focus on its contour and pupil (Fig. 13g); while in the case of a beach, the vegetation is in its spotlight (Fig. 13h).

Other interesting contrasts are due to images such as a) a cat, where the grid is centered on the object itself (Fig. 14g); and b) a guitar, where the network targets the outline and the background (Fig. 14h).

With regard to misclassifications, we found out that the network follows patterns similar to the hits. For example, in the case of virtual or real pictures of people (Fig. 15a-15d, 15h, 15i, 16a-16d, 16g-16i) the model also aims at body silhouettes. In like manner, the network centers on the contour when dealing with the couple, but this time following an unclear pattern (Fig. 15e), while in the image of a woman and a car, it focuses on the intersection of both elements (Fig. 15f).

On the other hand, in more complex pictures such as a man in a library (Fig. 15d) and the logo (Fig. 15g), the network directs its attention to much of the image. Lastly, some pictures, like those of animals, do not show a well-defined pattern (Fig. 16e-16f, 16h).

To sum up, we found that one of the focal points of this model is the outline of objects of interest. This is also an aspect that human users would use. However, it is surprising that the network does not specifically target avatar faces. We conjecture that this is due to the high variance intrinsic