Measuring and Fostering Diversity in Affective Computing Research

This work presents a longitudinal study of diversity among the Affective Computing research community members. We explore several dimensions of diversity, including gender, geography, institutional types of affiliations and selected combinations of dimensions. We cover the last 10 years of the IEEE Transactions on Affective Computing (TAFFC) journal and the International Conference on Affective Computing and Intelligent Interaction (ACII), the primary sources of publications in Affective Computing. We also present an analysis of diversity among the members of the Association for the Advancement of Affective Computing (AAAC). Our findings reveal a “leaky pipeline” in the field, with a low –albeit slowly increasing over the years– representation of women. They also show that academic institutions clearly dominate publications, ahead of industry and governmental centres. In terms of geography, most publications come from the USA, contributions from Latin America or Africa being almost non-existent. Lastly, we find that diversity in the characteristics of researchers (gender and geographic location) influences diversity in the topics. To conclude, we analyse initiatives that have been undertaken in other AI-related research communities to foster diversity, and recommend a set of initiatives that could be applied to the Affective Computing field to increase diversity in its different facets. The diversity data collected in this work are publicly available, ensuring strict personal data protection and governance rules.


I. INTRODUCTION
The Artificial Intelligence (AI) field has traditionally been dominated by a majority of white male researchers. This is why the last decade has seen the rise of initiatives and activities (often led by representatives of under-represented groups) directed at enhancing the diversity of the field by increasing the presence of women, researchers from different cultural origins and researchers with disabilities. The need for diversity in AI is indeed acknowledged by both the academic community and policy makers. For instance, the Ethical Guidelines for Trustworthy AI, a document of the European Commission's appointed expert group on AI, defines diversity, non-discrimination and fairness as one of the seven requirements that AI systems should fulfil in order to be trustworthy [1]. This requirement mentions the need to avoid unfair bias, facilitate accessibility and universal design, and promote stakeholder participation, ensuring respect for equality as a fundamental right expressed by the Universal Declaration of Human Rights (Article 7 [2]) and the European Union Charter of Fundamental Rights (Title III [3]).
The guidelines argue that, by fostering diversity at all stages of their life-cycle, AI systems could become more accessible to all and involve all relevant stakeholders. This includes the need for a diverse community of researchers and developers of AI systems. However, the current state of diversity in the AI field is yet to be assessed: whilst several reports point out the crisis we are witnessing in terms of workforce diversity, e.g., [4], the community still lacks shared practices for such an assessment. The urgent need for robust indicators to characterize participation in AI venues, and consequently to raise awareness of this crisis, is therefore the main motivation for this work.
Affective computing is a research field concerned with the study and development of systems that recognize, describe, process and generate human affects [5]. At its core, the field deals with human-centered AI applications, such as emotion recognition and induction, that have a strong social and ethical impact on humans [6]. In order for these systems to be accessible to all, incorporate different views, and avoid biases and potential discrimination (e.g., with respect to gender, race, cultural background or languages), there is a need to ensure that research is carried out with a varied perspective, which should be reflected by the research community [7], [8].
The field is diverse per se in terms of disciplines, as it combines approaches from Computer Science, Cognitive Science and Psychology. A prominent example is Sentic Computing [9], a multidisciplinary multilingual approach for commonsense sentiment analysis that has induced many studies, notably on human rights and discrimination [10]. However, even inherent interdisciplinarity does not automatically imply diversity, as diversity covers multiple facets beyond interdisciplinarity [11]. In this work, diversity refers in the first place to the existence of variations of different characteristics in a group of people, more particularly in a research community. Evidence suggests that diverse teams outperform homogeneous groups on complex tasks, including improved problem solving, increased innovation and more accurate predictions, all of which lead to better performance and results [12], [13], [14]. Diverse and inclusive scientific communities can generate new research questions not yet being asked in their particular discipline or culture, develop inclusive methodologies to better understand broader populations, and offer novel approaches to problem solving from multiple and different perspectives. Diverse groups have been shown to publish more articles, and these receive more citations per article [15]. Diversity thus enhances excellence, inclusion, generality and innovation.
The lack of tools to measure and monitor diversity is one of the obstacles to assessing the impact of efforts and policies. Hupont et al. [16] proposed a methodology to quantify diversity at the International Conference on Affective Computing and Intelligent Interaction (ACII), which considers gender, geographic location and institution type as diversity dimensions, and analyzes paper authors, keynote speakers and organizers. Their results showed the limited diversity of the field, where researchers are persistently mostly men coming from Europe, Asia and North America. Some challenges of measuring diversity are the complexity of defining standard indicators, the lack of curated data (e.g., country, gender, institution type, topics), and ethical concerns about labeling authors with gender information [17].
The current paper intends to complement previous literature, as discussed in Section II, by proposing a comprehensive methodological approach to measure, monitor and foster diversity in affective computing research (Section IV). For this purpose, we collect data on participants and papers of the three most prominent forums of the Affective Computing community: the IEEE Transactions on Affective Computing journal (TAFFC), the International Conference on Affective Computing and Intelligent Interaction (ACII), and the Association for the Advancement of Affective Computing (AAAC) (Section III). We use this data to first assess the diversity of the community's researchers in terms of gender (Section IV-A), institutional types of affiliations (Section IV-B), geography (Section IV-C) and the intersection of these dimensions. In a second stage, we assess how this diversity among people is related to the diversity of topics produced by them (Section IV-D). This is followed by an overview of existing diversity affinity groups and initiatives (Section V). Finally, we discuss how our findings translate into recommendations for initiatives and activities that could increase diversity in the Affective Computing community (Section VI). The diversity data collected and analysed in this study are publicly available at https://gitlab.com/humaint-ec_public/divinai-datasets.

II. RELATED WORK
The monitoring of diversity in scientific research communities is a practice that has been spreading in recent years. In the following, we present the most prominent studies and monitoring initiatives that have been undertaken to date in different scientific fields and, more particularly, related to Affective Computing. We discuss their focus, strengths and limitations, and compare them to this work.

A. Diversity Studies in Scientific Communities
The field of Neuroscience has been a pioneer in diversity monitoring. The website BiasWatchNeuro¹ aims at raising awareness of the speaker composition of conferences, particularly with respect to gender representation. According to the current numbers on the website, almost half of the studied conferences have ratios that fall below the base rate of women in the field [18]. The study presented in [19] centres on the number of black scientists who are speakers at Neuroscience conferences, in particular comparing the situation before and after the death of George Floyd in 2020. Even though the number of black speakers increases slightly after 2020, the under-representation of the black community is evident from the fact that most analysed conferences have no black speakers at all.
Notable works can also be found in the Geoscience field [20], [21], [22]. In particular, [20] analyses gender representation in 9 societies, 25 journals and 10 conferences, evidencing the under-representation of women, especially in prestige roles such as conference organisers and journal editorial board members. Even though it is limited to a single edition of one conference, [21] discusses the participation of white and black women separately, showing that the latter are much more under-represented than the former, who are already extremely under-represented in the field. Similar results have been obtained in disciplines such as STEM (Science, Technology, Engineering and Mathematics) [23] and Medicine (e.g., Paediatrics [24], Emergency Medicine [25] and Oncology [26]).
In the field of AI, some reports have looked at the composition of its members. The 2021 AI Index Report produced by the Stanford University Human-Centered Artificial Intelligence Institute [27] presents alarming results about the poor gender, ethnic, geographic and sexual orientation diversity in AI. More importantly, it highlights that the AI community currently lacks trustworthy means to collect information about the diversity of its members. Indeed, this report is based on a large-scale survey which clearly points, again, towards a field dominated by white men. On the positive side, it also shows that the Black in AI and Women in ML affinity groups (cf. Table III) increased their number of participants in AI-related workshops in the last two years, which indicates that the AI community is starting to pay more attention to diversity and inclusion. The AI Watch Index 2021 report by the Joint Research Centre of the European Commission [28] provides a good complementary view by reporting the diversity of participants, namely keynote speakers, authors and Program Committee (PC) members, in 5 top-tier AI scientific conferences in the period 2016-2020. Using the four Biodiversity-inspired indexes proposed by [17], [29] to measure gender, institutional and geographic diversity, the report evidences a slow but increasing diversity trend over the studied conferences.

B. Diversity Studies in Affective Computing
Diversity analyses of the Affective Computing research community are scarce. There are some recent bibliometric studies analysing the past two decades of publications in the field [30], [31], [32] but, rather than on community members, they focus on identifying major publications, leading journals, key research topics, and the most productive authors, institutions and countries in terms of number of published papers. They nevertheless present interesting findings directly or indirectly related to diversity, as follows.

1. BiasWatchNeuro: https://biaswatchneuro.com/

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I MAIN FOCUS, YEARS, METRICS AND DIMENSIONS ANALYSED IN STATE-OF-THE-ART DIVERSITY STUDIES. THE LAST ROW CORRESPONDS TO THE CURRENT WORK

The growth rate of scientific production in the field differs slightly from one study to another, but is generally very high. According to Ho et al. [31], the annual growth rate of published papers is 12.5%. Pestana et al. [32] found that publications double every 4 years. Thus, Affective Computing is attracting more and more researchers every year, which could potentially foster diversity in a natural way.
Additionally, results show that the top contributing authors and institutions in the field come exclusively from academia, and the studies agree that the most productive countries are the USA, China, the UK and Germany. Ho et al. [31] studied collaboration networks by country. Interestingly, they found two major collaborative networks: the "Asia Pacific cluster" with the USA, China, Singapore and Japan, and the "European cluster" with Germany, the UK and the Netherlands. While the diversity of the "Asia Pacific cluster" might bring some novel cross-cultural emotion studies, the "European cluster" risks falling into homophily. The presence of homophily in scientific collaboration is indeed a known phenomenon documented in several works [13], [33], which show that ethnicity, gender and affiliation factors strongly shape collaborations in the scientific community.
While bibliometric studies shed some light on the current landscape of the Affective Computing research community, they do not fully capture important facets of diversity such as the presence of under-represented groups (e.g., women, black people, other minority groups), institutions and countries. Our previous work presented at ACII 2021 [16] is, to the best of our knowledge, the first attempt to quantify diversity in the Affective Computing community. It computes four diversity indexes for each ACII conference edition from 2005 to 2019, following the same methodology as the AI Watch Index 2021 report [28], [29], and compares them to those of top-tier AI conferences. Results show that ACII is well-positioned for gender and geographic diversity, though still far below the equality threshold for gender, and strongly dominated by Europe and North America in terms of countries. On the other hand, as expected from bibliographic studies of research, ACII lags far behind as regards institutional diversity, being totally dominated by academia (with imbalance ratios up to 1:15 for the presence of industry versus academia).

C. Towards a More Comprehensive Monitoring of Diversity: New Metrics and Dimensions
Table I summarises and compares the aforementioned state-of-the-art works on diversity. It shows their main focus (i.e., field and community members analysed), the years covered by each study, the metrics used to monitor diversity, the dimension(s) analysed (among: gender, sexual orientation, ethnicity, age, countries, institution types and topics) and whether cross-dimensional analyses are presented (e.g., gender × country, gender × topics, and so on). The last row of the table compiles the information corresponding to the present study.
As can be seen, the gender dimension is considered by the vast majority of existing diversity works. However, the largest studies, such as BiasWatchNeuro [18], [20] and [19], analyse the gender factor in isolation, without considering intersections between different dimensions. The most comprehensive work in terms of number of analysed dimensions is the AI Index Report [27], as information such as sexual orientation, ethnicity and age can only be collected by means of direct surveys of community members. However, this report does not present cross-dimensional views either.
The type of community members analysed varies among studies, although conference (keynote) speakers are the most common, and sometimes the only, target [18], [19], [21], [25]. Speakers undoubtedly have a key role in shaping the external prestige of a conference, but other actors such as PC members, authors and attendees are equally important when it comes to effectively evaluating diversity, as covered in a few works [20], [23], [28] as well as in our previous study on ACII [16]. Interestingly, diversity among journal actors (authors, editorial boards) and members of societies and associations is under-studied, especially in the AI field.
Finally, it is important to consider the duration (i.e., period covered) and the nature of the metrics used in these studies. Most works cover a period of at most 5 years and present simple percentages as quantitative evidence of diversity. While percentages are easy for the general public to understand, there are other interesting metrics that could be further explored. Traditional Biodiversity indexes (e.g., Shannon and Pielou) used in our previous work [16] are good examples, as they additionally allow for cross-dimension metrics. Other more sophisticated indicators applied to disciplines such as ecology [11] or telecommunications [34] could also be adapted to the analysis of diversity in a research community, but remain unexplored to date.
This work extends our previous study on the diversity of the Affective Computing community by including the last ten years of data from TAFFC journal's authors and AAAC members, as well as new diversity dimensions (namely the topic dimension and cross-dimensional perspectives). It is important to highlight that our contributions go beyond the field of Affective Computing by adopting novel quantitative indicators of diversity among the members of a research community.
The monitoring of AI workforce diversification is an essential but not sufficient step in order to raise awareness and to promote diversity and inclusion initiatives, which may favor the advancement of racial [35] and gender literacy [36] in the AI community.

A. Sources and Nature of Data
To analyse the diversity of the Affective Computing research community throughout the past decade (January 2011 to December 2021), we compiled data from the three most prominent sources in the field:
- The IEEE Transactions on Affective Computing journal (TAFFC). We collected data from its papers and authors.
- The International Conference on Affective Computing and Intelligent Interaction (ACII). We compiled information from its papers, authors, keynote speakers and program committee members.
- The Association for the Advancement of Affective Computing (AAAC). We obtained data from the members of its executive committee.
For each identified member in the research community, be it a paper author, a member of a committee or any other role in the three aforementioned sources, we extracted the following information: gender (man or woman, inferred from names), country (inferred from the person's affiliation country) and affiliation type. For affiliation types, we considered the eight categories in the Global Research Identifier Database (GRID²), namely: education, company, non-profit, healthcare, facility, government, archive and other. Additionally, we collected author keywords from the papers published in TAFFC and ACII's proceedings.

B. Data Collection Process
We followed similar though slightly different processes to extract all the data needed for the study, depending on the source.
The methodology followed for collecting data from ACII conference proceedings and the TAFFC journal is outlined in Fig. 1. The data collection pipeline is fully based on public domain information available online. We first gathered the list of papers published in both sources from January 2010 to December 2021 using the Web of Science (WoS³) global bibliographic database. In the case of ACII 2021, its proceedings were not yet indexed in WoS, thus we used the export tools from IEEE Xplore⁴ (publisher of ACII's 2021 edition) instead.
For each of the 1175 papers collected in this way (712 for ACII and 463 for TAFFC), we extracted the following information: title, publication year, list of authors (names) and their individual affiliations, plus the keywords provided by the authors when writing the paper. The final list of authors contains 2394 names for ACII and 1687 for TAFFC. As a second step, for each author we inferred gender from their name using the NamSor library⁵. From each affiliation, we obtained country and type by querying the GRID database, which currently contains about 102K research institutions. In order to ensure the quality and completeness of the data, we carried out some manual review and labelling of missing keywords (∼ 7% of the total), affiliations (∼ 3%), affiliation types (∼ 67%) and countries (∼ 2%). The high percentage of manual correction in features such as affiliation types (which was mostly due to double affiliations for certain authors, missing institutions in WoS or their different nomenclature in the GRID database) demonstrates the relevance of good quality data and the complexity of acquiring it, even if the papers are in principle available and indexed online.
In addition to ACII and TAFFC paper authors, we gathered and annotated fully manually the list of AAAC members, and ACII keynote speakers and organizers per edition. In the case of the AAAC, we got directly in touch with its current management board, which kindly provided us, for the purpose of this study, with the list (names) of the association's executive committee members since its creation in 2007. For ACII keynote speakers and Program Committee members, we obtained their names from the conferences' websites and the proceedings' foreword pages. Then, we manually labeled gender, country and type of institution (according to each person's affiliation in a particular year and the GRID classification).

C. Responsible Data Collection
In this section we discuss ethical considerations and risks related to the data collection performed in this study, as it deals with demographic data from the Affective Computing community members, namely their gender and the country and institution where they work(ed). Here we face a trade-off between, on the one hand, the need for diversity monitoring and reproducible research and, on the other hand, the risks of storing this personal information. There is indeed a need for further guidance on how to address the risks of demographic data collection for its responsible storage and use. McKane and Villeneuve [37] review related risks in the context of algorithmic fairness and provide a series of recommendations on how to feasibly collect, manage and employ demographic data. Although the present work does not focus on algorithmic fairness, we relate to it to discuss two of the key sensitive issues in our data collection process: privacy risks and labelling.
First, privacy risks require the implementation of technical methods to maintain individual non-identifiability throughout the data's use. In this respect, we implemented a data protection procedure as follows. On the one hand, we stored raw data in an internal secure space, only accessible to the core research team and with the sole purpose of ensuring full traceability of the results. On the other hand, we anonymised the data to produce an intermediate representation of the members' list to share with researchers and to comply with reproducibility practices in terms of indicator computations. The resulting anonymised data collected and analysed during the current study are publicly available at https://gitlab.com/humaint-ec_public/divinai-datasets.
Second, labelling procedures, and automatic labelling procedures in particular, carry several risks, such as the over-simplification of complex social concepts into categorical variables and potential mislabelling. For instance, our automated gender labelling process only allows for the classification of gender as binary classes.⁶ There have been ethical concerns regarding the use of systems for the automatic labelling of gender [38]. However, there is no study to date providing evidence on whether the quality of automatic data collection of gender is significantly different from manual labelling of such data. Similarly, we classified affiliations into one of the eight GRID categories, which might not fully capture the complexity of the matter (e.g., for persons with more than one affiliation). In this respect, we have carried out manual correction of gender, institution and country information by checking authors' web and social media presence for self-identification information. With these means, we have made our best efforts to minimize mislabelling.

IV. DIVERSITY INDICATORS
In general, the term "diversity" refers to the heterogeneity of elements in a set in relation to a class that takes different values, such as species in an eco-environment, or ethnicity in a population [39]. We measure diversity along four different dimensions: (1) gender, (2) institutional types, (3) geography, and (4) topics, where we also look at the intersection between gender and institutional type as well as between gender and topic diversity. Since the number of classes in the gender and institutional dimensions is fixed across sources and time, we assess gender and institutional diversity by simply computing the percentages of men/women and of each institutional type in the studied sources. In contrast, for every source the number of countries and topics can vary substantially over time. In order to account for this variation, we assess geographic and topic diversity using a three-dimensional measure of diversity. More precisely, this definition of diversity is derived from [11], which provides a cross-disciplinary general framework of diversity in science, technology and society. This work shows that diversity consists of three basic components:
- Variety: the number of classes into which the elements are divided.
- Balance: the evenness of the distribution of elements across classes.
- Disparity: the degree of difference (or distance) between classes.
Each of these components is necessary but individually insufficient for diversity. That is, we expect higher levels of diversity as the number of classes increases, as the distribution of elements between classes becomes more even, or as the difference (or distance) between classes becomes wider.

6. We refer to Wamesley (2022) for an online glossary on the diversity of gender identities: https://www.npr.org/2021/06/02/996319297/gender-identity-pronouns-expression-guide-lgbtq

Thus, for the geography and topic dimensions we compute several metrics that weight the diversity components differently. As metrics with a focus on balance, we compute the inverse of the Herfindahl-Hirschman Index (HHI), which is commonly used in economics to measure market concentration [40], and the Shannon Index (H') [41], which is commonly used in ecology to measure biodiversity [42]. HHI is computed as follows:

$$\mathrm{HHI} = \sum_{i=1}^{C} p_i^2$$

where $p_i = n_i / N$ is the proportion of elements from a given class $i$, i.e., the number of elements from this class $n_i$ divided by the total number of elements $N$; and $C$ is the number of different classes. In a similar spirit, the Shannon index is computed as follows:

$$H' = -\sum_{i=1}^{C} p_i \ln p_i$$

As a metric with a focus on disparity, we compute the disparity metric (D) as follows:

$$D = \sum_{i,j\,(i \neq j)} d_{ij}$$

where $d_{ij}$ is the pairwise distance between classes $i$ and $j$.

As a metric that combines all three diversity components, providing a more general measure of diversity, we compute the Rao-Stirling metric (Δ) as proposed by [11]:

$$\Delta = \sum_{i,j\,(i \neq j)} d_{ij}\, p_i\, p_j$$

Finally, in order to be able to compare results across diversity metrics, we compute the z-score of each diversity index across all sources and years as follows:

$$z_{st} = \frac{div_{st} - \overline{div}}{sd(div)}$$

where $div_{st}$ is the diversity index for source $s$ and year $t$, $\overline{div}$ is the mean of the diversity index across all sources and years, and $sd(div)$ is the standard deviation of the diversity index across all sources and years.
As each of these diversity metrics considers each diversity component differently, we can interpret each metric from a different perspective. The balance-focused metrics, Shannon and HHI, increase as the distribution of elements across classes becomes less dominated by a single class. For instance, in the case of geographic diversity, a balance-focused diversity index would be low if an outlet is dominated by contributions from the USA and only a small share of the contributions comes from other countries. The distance between these countries does not affect the balance-focused metrics. In contrast, the disparity metric would be lower for an outlet with contributions from only one continent than for an outlet with the same number of contributions but from different continents. The Rao-Stirling metric combines both of these diversity components, but it has to be interpreted as a composite index in which the individual contributions of each diversity component are fuzzy. Finally, by allowing for different specifications of diversity, we ensure that our findings are robust across various specifications. We summarise the majority of computed indicators in Table II.
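The indicators described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names are hypothetical, the natural logarithm is assumed for the Shannon index, the disparity and Rao-Stirling sums are assumed to run over ordered pairs of distinct classes, and the sample standard deviation is assumed for the z-score.

```python
import math
from collections import Counter
from statistics import mean, stdev


def proportions(labels):
    """Map each class to p_i = n_i / N for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def hhi(p):
    """Herfindahl-Hirschman Index: sum of squared class proportions."""
    return sum(v * v for v in p.values())


def inverse_hhi(p):
    """Balance-focused metric: grows as no single class dominates."""
    return 1.0 / hhi(p)


def shannon(p):
    """Shannon index with natural logarithm (assumed base)."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def disparity(classes, dist):
    """Sum of pairwise distances over ordered pairs of distinct classes."""
    return sum(dist(a, b) for a in classes for b in classes if a != b)


def rao_stirling(p, dist):
    """Distance-weighted sum of p_i * p_j over ordered pairs i != j."""
    return sum(dist(a, b) * p[a] * p[b] for a in p for b in p if a != b)


def z_scores(values):
    """Standardise index values (sample standard deviation assumed)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]
```

For example, `proportions(["US", "US", "UK", "DE"])` yields a distribution dominated by one country, so its inverse HHI is lower than that of a perfectly even three-country distribution, in line with the interpretation of balance-focused metrics given above.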
In the following, we present the results for each studied diversity dimension separately.

A. Gender Diversity
We consider two different categories (C = 2) in the gender dimension: "women" and "men"⁷. Since the number of categories is fixed over time, we indicate gender diversity through the share of women in each group. This comes with the benefit of simple interpretability of the result. We illustrate the share of women over time by group and source in Fig. 2.
7. We refer to Section III-C for a discussion of binary gender categories.

Note first that for all three sources, the overall share of women always remains well below 40%. On average, AAAC has the lowest share of women, although there is a clear positive trend visible from 2016, which is not observed in the other sources. Distinguishing between contributors from industry and academia reveals that the share of women is on average similar for TAFFC (although quite volatile until 2015 due to the smaller group size) but lower among industry contributors for ACII and AAAC. Interestingly, when considering author order for TAFFC and ACII, we find that the share of women among first authors is on average higher than among last authors. A reason for that could be that author order is usually correlated with seniority, where first authors are more likely to be junior researchers (PhD students, early postdocs) and last authors are more likely to be senior researchers (professors, supervisors). This is in line with the literature on "leaky pipelines" for women in science [43]. Finally, for ACII we distinguish between authors, keynote speakers and conference organizers, but the differences between these groups in terms of share of women are not clear cut. In fact, except for 2019, the share of women among organizers is lower than or equal to the share of women among authors. One notable point is that the keynote speakers of ACII 2021 were all women.
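The share-of-women indicator used in this subsection amounts to a simple grouped proportion. A minimal sketch follows, assuming a hypothetical record layout (dictionaries with `source`, `year` and `gender` keys) that is not taken from the published dataset.

```python
from collections import defaultdict


def share_of_women(records):
    """Return {(source, year): share of women} for a list of records.

    Each record is assumed to be a dict with the hypothetical keys
    'source', 'year' and 'gender' ('woman' or 'man').
    """
    totals = defaultdict(int)
    women = defaultdict(int)
    for rec in records:
        key = (rec["source"], rec["year"])
        totals[key] += 1
        if rec["gender"] == "woman":
            women[key] += 1
    # Groups with no records never appear in totals, so no division by zero.
    return {key: women[key] / totals[key] for key in totals}
```

The same function can be applied to a filtered list (for instance, first authors only, or industry contributors only) to obtain the per-group shares discussed above.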

B. Institutional Diversity
We consider eight different categories in the institutional dimension (C=8, see Section III-A for the description of categories).Again, as the number of categories is fixed and limited over time, we indicate institutional diversity through the shares of each institutional type.The distribution of institutional types by source over time is illustrated in Fig. 3.
Naturally, as the data sources are all academic forums, the contribution from academia is the largest across all sources and years. There are some interesting trends to note. While the contribution from industry for the TAFFC journal seems to decline over time, it seems to be increasing for the ACII conference. A reason for that could be that, from the industry perspective, conference participation comes with the additional benefit of being able to network and recruit potential employees with targeted skills. Furthermore, TAFFC seems to have more contributions from research facilities and healthcare than ACII. Another interesting development is the seeming disappearance of government contributions in all three forums. Explanations for these variations would have to rely on anecdotal evidence, which is why further research on the dynamics of institutional participation in Affective Computing research is needed.

C. Geographic Diversity
For the geographic dimension, classes are the affiliation countries of contributors, where the total number of classes varies over time. We compute diversity indexes for each source and each year. As a measure of disparity, we compute the geographic distances between the capitals of each pair of affiliation countries and normalize these distances to a [0,1] scale by dividing each distance by the maximum distance within its source-year group. Fig. 4 shows the z-scores of the geographic diversity indexes over time by source.
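The distance normalisation and z-scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say how distances between capitals are computed, so the great-circle (haversine) formula is assumed here, the z-score is assumed to use the population standard deviation, and all function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: the source only says "geographic distances between the
    capitals"; haversine is one common choice.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def normalized_distances(capitals):
    """Pairwise distances for one source-year group, scaled to [0, 1].

    capitals maps a country name to the (lat, lon) of its capital.
    Each distance is divided by the maximum distance in the group.
    """
    names = sorted(capitals)
    raw = {(i, j): haversine_km(capitals[i], capitals[j])
           for i in names for j in names if i < j}
    if not raw:  # fewer than two countries: no pairs to compare
        return {}
    d_max = max(raw.values())
    return {pair: d / d_max for pair, d in raw.items()}


def z_scores(values):
    """Standardise a series of index values (population standard deviation)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    # Approximate capital coordinates, for illustration only.
    group = {"USA": (38.9, -77.0), "China": (39.9, 116.4), "France": (48.9, 2.35)}
    for pair, d in normalized_distances(group).items():
        print(pair, round(d, 3))
```

Dividing by the group maximum makes the largest pairwise distance in each source-year group equal to 1, so disparity values are relative to the countries present in that group rather than absolute.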
First, note that geographic diversity is quite volatile over time for all sources, which is probably caused by variation in the number of classes for every year (cf. "countries" column of Table II). Moreover, the results suggest that geographic diversity is highly source-specific, as the development of the indexes is notably different between the journal TAFFC, the conference ACII and the association AAAC. The main reason for this could be that contributing to a conference like ACII requires travelling, 8 which is subject to economic and location constraints of the contributor, while travelling is not necessarily required for contributions to a journal or association. Overall, it seems that geographic diversity is increasing for TAFFC, somewhat decreasing for AAAC and volatile for ACII, probably depending on the conference location. Interestingly, we find that the virtual ACII 2021 conference had lower than average diversity levels. A reason for that could be a lower motivation for participation in virtual events in the second year of the Covid-19 pandemic. Furthermore, while most diversity indicators are correlated, we find a notable difference between different types of diversity indexes for AAAC. While the disparity index remains relatively constant over time, the balance and Rao-Stirling indexes are decreasing. This suggests that the set of classes (affiliation countries) remains similar over time but the distribution of contributions across affiliation countries became less even. We explore this further in wordclouds of affiliation countries normalized by source in Fig. 5.
Indeed, Fig. 5 shows relatively clearly for TAFFC an increase in the number of authors (i.e., increase in font size) and countries. Moreover, while the dominant contributor to TAFFC in 2010-2013 was the USA, this position is shared between the USA and China in 2018-2021, with notable second-level contributors, mostly European, from the UK, France, India, Germany and Spain. As expected from the previous figure, the picture is less clear for ACII, especially in terms of number of authors and countries. While the USA dominates in terms of contributors for every year of the conference, the other dominating countries do vary depending on the location of the conference. For instance, ACII 2013 and 2019 took place in Europe (Geneva and the UK, respectively), and many larger contributions came from European countries for these years. In contrast, ACII 2015 took place in Xi'an, China, and for this year China is a large contributor. In addition, the notably smaller and less diverse wordcloud in 2021 confirms the previous finding of low diversity for ACII 2021. Finally, the sparse wordclouds for AAAC underline previous findings: while from 2010-2017 AAAC still had members from Canada and China, in 2018-2021 AAAC consisted of members from the USA, Australia and some European countries.
8. Note that ACII 2021 was virtual only due to the Covid-19 pandemic.

D. Topic Diversity
We measure topic diversity using the same indexes as for geographic diversity. We identify topics based on a community detection approach on co-occurrences of author keywords. Note that we use author keywords (i.e., the keywords chosen by the authors themselves) as a proxy for topics rather than extracting them automatically from full paper texts using topic modeling techniques (we refer the reader to [44] for a recent overview on topic modeling algorithms). The reason is that we purposely want to capture the topics explicitly selected by the authors to identify their work with.
Links between keywords are weighted by the minimum conditional probability of their co-occurrence in a paper [45]. That is, $\omega_{ij} = \min\left(k_{ij}/k_i,\; k_{ij}/k_j\right)$, where $\omega_{ij}$ is the weight of the link between keywords $i$ and $j$, $k_{ij}$ is the number of co-occurrences and $k_i$ is the number of occurrences for keyword $i$. To make sure that the most often mentioned keywords do not form one big class, we remove links when both linked keywords are in the top 95th percentile of total occurrences. For all source-year groups, we run the Louvain community detection algorithm with the same specification [34]. With this method, node communities (i.e., topics) are greedily identified by comparing the density of connections within topics with the density of connections between topics, where the resulting number of topics is not pre-specified. The application domains of this community detection method range from Neuroscience [46] to Cybersecurity [47] as well as Social Science [45]. As topics we consider any identified keyword community with more than two keywords. To obtain the paired distances between topics for the disparity index, we compute the inverse of the normalized paired similarities between topics. Paired topic similarity is the weighted sum of all links between all keywords from any two topics, where the weights correspond to $\omega_{ij}$. We show the topic diversity indexes for TAFFC and ACII in Fig. 6.
First, note that all diversity indexes display very similar patterns, suggesting a high correlation between topic balance, disparity and variety. What is most striking about this result is the similarity with the geographic diversity indexes. Indeed, for both geographic and topic diversity, ACII depicts peaks for the years 2015 and 2019 and dips for 2017 and 2021. Similarly, we observe a positive trend in topic diversity for TAFFC, which is parallel to its geographic diversity. Thus, geographic and topic diversity are correlated. The degree of causality and the mechanisms of this relationship are yet to be explored in future research. However, two aspects suggest that this may not be merely a spurious correlation. First, ACII geographic and topic diversity depict high volatility in parallel. If there were a general trend over time due to an overarching policy towards more or less geographic and/or topic diversity, this pattern would depict a clearer trend.
Second, the number and diversity of topics is identified ex-post and agnostically from the set of already accepted papers. Unless the initial call for papers and the paper selection committee explicitly aimed for topic (instead of geographic) diversity, we can assume that geographic diversity pre-dates topic diversity in the production of research from these studied sources.
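The keyword link weighting used for topic detection in this subsection can be illustrated with a short sketch. It assumes that "minimum conditional probability" means the smaller of the two ratios k_ij/k_i and k_ij/k_j, and that each keyword is counted at most once per paper; the function name and the toy keyword lists are hypothetical. The percentile-based link pruning and the Louvain step are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def link_weights(papers):
    """Compute co-occurrence link weights between author keywords.

    papers is a list of keyword lists, one per paper.
    Returns {(i, j): w_ij} with i < j, where
    w_ij = min(k_ij / k_i, k_ij / k_j)   (assumed reading of the text),
    k_ij = number of papers listing both keywords,
    k_i  = number of papers listing keyword i.
    """
    occurrences = Counter()
    co_occurrences = Counter()
    for keywords in papers:
        unique = sorted(set(keywords))  # count each keyword once per paper
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))
    return {
        (i, j): min(k_ij / occurrences[i], k_ij / occurrences[j])
        for (i, j), k_ij in co_occurrences.items()
    }


if __name__ == "__main__":
    # Toy keyword lists, for illustration only.
    toy = [["emotion", "eeg"], ["eeg", "speech"], ["eeg", "emotion", "speech"]]
    for pair, w in sorted(link_weights(toy).items()):
        print(pair, round(w, 3))
```

Taking the minimum of the two conditional probabilities keeps a link weak when one keyword is very frequent and the other rare, which limits the pull of generic keywords. The resulting weighted edge list could then be passed to any Louvain implementation.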
For a better understanding of the evolution of topics in detail, we present the largest clusters of co-occurring keywords (i.e., topics) with at least 10 keywords and 7 occurrences by time intervals separately for TAFFC and ACII in Fig. 7. To prevent these graphs from being dominated by only one generic topic, we removed the keywords "affective computing", "emotion(s)", "emotion analysis", and "emotion recognition" from this part of the analysis. Topics are labelled by the keyword with the highest degree in the cluster and the size represents the number of occurrences of the respective keyword. In line with [48] and [31] we represent these topics as four typologies that are structured by two parameters taken from network analysis. The first parameter (y-axis in Fig. 7), termed "degree", represents the weighted number of links between the respective keyword and all other keywords in the network, where weights correspond to $\omega_{ij}$. The degree can be seen as a measure for the maturity (or development) of the topic. The second parameter (x-axis in Fig. 7), termed "centrality", measures the number of times the respective keyword lies on the shortest path in the network between two other keywords. The centrality can be seen as a measure for the relevance for the overall studied set of topics. Note that all indicated topics are relevant to the field to a certain degree, as the figure only shows larger topics (covering at least 10 keywords) and the most important keyword within that topic. Yet, we can still follow the characterization as proposed by [48]. That is, topics in the upper-right quadrant are well developed and relevant to the overall field, while topics in the lower-left quadrant are not as well developed and less important to the overall field. The topics in the lower-left quadrant could potentially be emerging or declining topics. Similarly, topics in the upper-left quadrant are well developed but not so relevant to the overall field, suggesting that topics in this quadrant tend to be niche themes.
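The two network parameters described above can be sketched in a few lines. This is an illustrative sketch only: "degree" is taken as the sum of link weights incident to a keyword, and "centrality" as unnormalised betweenness centrality computed with Brandes' algorithm on hop-count shortest paths. Whether the original analysis used weighted shortest paths or a normalised centrality is not stated, so both choices are assumptions, as are the function names.

```python
from collections import defaultdict, deque


def weighted_degree(weights):
    """Sum of link weights per keyword; weights maps (i, j) -> w_ij."""
    degree = defaultdict(float)
    for (i, j), w in weights.items():
        degree[i] += w
        degree[j] += w
    return dict(degree)


def betweenness(weights):
    """Unnormalised betweenness centrality on the unweighted keyword graph.

    Assumption: shortest paths are measured in hops, ignoring link weights.
    When several shortest paths exist, each contributes fractionally.
    """
    adj = defaultdict(set)
    for i, j in weights:
        adj[i].add(j)
        adj[j].add(i)
    nodes = list(adj)
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted from both sides.
    return {v: b / 2.0 for v, b in score.items()}


if __name__ == "__main__":
    # Toy link weights, for illustration only.
    toy = {("a", "b"): 0.5, ("b", "c"): 0.25}
    print(weighted_degree(toy))
    print(betweenness(toy))
```

A keyword with high degree but low betweenness would fall in the upper-left ("niche") quadrant of such a plot. The thresholds separating the quadrants are not given in the text and are therefore left out of the sketch.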
Note first that, in Fig. 7, the most relevant and developed topics for TAFFC and ACII change over time. For TAFFC these go from "affect analysis" in 2011-2013, to "facial expression analysis/recognition" and "multimodal analysis" in 2014-2017, to "feature extraction" in 2018-2021. For ACII these go from "physiological" in 2011-2013, to "multimodal" in 2014-2017, to "computational modeling" in 2018-2021. Interestingly, both of these developments suggest a potential shift in focus from subfields (i.e., affects or physiology) to more methodological tools (i.e., feature extraction or computational modeling, which could be partly related to the strong rise of deep learning in AI). However, a deeper time-series analysis is necessary to confirm this hypothesis, as these types of cluster analyses for separate time periods do not take into account inter-temporal correlation.
Interestingly, Fig. 7 also confirms the somewhat opposing trends in topic diversity for TAFFC and ACII. While TAFFC starts in 2011-2013 with one important topic and a few developed but still niche topics, ACII starts with two important topics and multiple less developed and less relevant topics. However, over time other well-developed topics emerge for TAFFC (facial expression and multimodal analysis are important in 2014-2017) and multiple new topics appear (e.g., task analysis, data models, videos, eeg), while ACII shows over time a reduction in topics, a stronger focus on one developed and relevant topic, and fewer emerging/declining topics than TAFFC.
To address the question of whether women and men investigate different topics in the affective computing field, we conduct the same exercise of community detection on a network of keyword co-occurrences separately for papers with a woman as lead author and a man as lead author. We present the results, separately for TAFFC and ACII, in Fig. 8, where we also indicate the total number of papers (N) for each group.
First, note that the most relevant and developed topics of TAFFC and ACII are the same (or at least very similar) for papers with woman versus man lead authors. Naturally, we find differences in the total number of important topics, as far fewer papers have a woman as first author (133 versus 342 for TAFFC, and 270 versus 442 for ACII). However, we notice other interesting differences across woman-led versus man-led papers in both the topics and the characterization of topics in terms of relevance and development. More precisely, for TAFFC man-led papers focus mostly on one big topic, feature extraction, with many smaller topics in the area of emerging/declining themes. In contrast, a second well-developed and relevant topic for woman-led papers is speech analysis, and two more well-developed topics appear among the woman-led papers (physiological signals and eeg) that are not among the man-led papers. For ACII, on the other hand, man-led papers seem to indicate more well-developed and relevant topics ("physiological", "multi-modal", "affect") than woman-led papers ("physiological"). In addition, some topics in the quadrant of well-developed topics do not overlap, i.e., "computational modelling" and "facial expression" among man-led papers and "personality" among woman-led papers. At the same time, "deep learning" appears among woman-led papers as a more relevant (and almost well-developed) topic that does not appear among man-led papers. Exploring these topical differences requires a deeper bibliographic and time-series focused analysis, which goes beyond the scope of this paper. Nevertheless, these differences suggest that the gender of the lead author may have an impact on the topic of the paper.

V. DIVERSITY AFFINITY GROUPS AND INITIATIVES
From the previous sections, we can conclude that there is ample room for diversity enhancement in the field of Affective Computing. In the following, we present a series of initiatives that have been successfully undertaken in other AI forums to enhance diversity, and that could potentially inspire our community, as will be discussed further in Section VI.
Table III provides a list of affinity groups actively involved in fostering diversity in the AI community. As can be seen, most well-established groups put the focus on women in specific research areas including Machine Learning (Women in ML - WiML, active since 2007), Music Information Retrieval (Women in MIR - WiMIR, active since 2012), Recommender Systems (Women in RecSys, active since 2014) and Computer Vision (Women in CV - WiCV, active since 2015). More recently, since 2017, the focus has been widened to other demographics and under-represented groups such as Black people (Black in AI, created in 2017), Latin people (LatinX in AI, created in 2018), the LGBTQ+ community (Queer in AI, created in 2019), people with disabilities ({Dis}Ability in AI, created in 2019), Indigenous people (Indigenous AI, created in 2019) or African women (African Women in AI, created in 2022). We have performed an in-depth historical analysis of the diversity initiatives that have been promoted by both the organizers of international AI conferences and the aforementioned affinity groups. Table IV compiles the main findings, grouped by type of initiative.
The most traditional means of encouraging geographic diversity is to rotate the location of conferences between continents, to engage with different local communities, a measure also undertaken by ACII's organisers. Although this facilitates the participation of researchers from different countries in alternate years, it does not help to improve the diversity of the conference in a given edition. In addition, involved geographic areas are generally Europe, Asia/Oceania and North America, leaving aside Africa and South America. To overcome this problem, financial support is needed, which has only very recently started to be granted by Black in AI and LatinX in AI in the form of conference fee and travel expenses payments. However, incentives to host conferences in Africa and South America are still lacking.
Specific workshops, dedicated panels and plenary sessions at conferences, and social events at main conferences are initiatives that have been successfully implemented since the early 2010s to help under-represented groups gain visibility. Following the trend in the emergence of affinity groups, they initially focused on women and were then extended in 2017 to other demographics and under-represented communities including Black, Latin, African, LGBTQ+ and people with disabilities. Recently, in 2021, the large and top-ranked AI conferences AAAI (Association for the Advancement of Artificial Intelligence Conference on AI 9) and AAMAS (International Conference on Autonomous Agents and Multiagent Systems 10) started to include an open call for diversity and inclusion activities, in response to which affinity groups joined forces to propose a series of Diversity workshops on AI. 11
There are other fruitful, though less common, initiatives to help improve diversity in conferences. The role of diversity and inclusion chair was first created in the 2016 ISMIR Conference (International Society for Music Information Retrieval Conference 12) so as to have a person devoted to taking into account diversity in speakers and participants, as well as their inclusion and accessibility needs. Since 2018, some of the largest AI conferences such as NeurIPS, ICML, ICLR and RecSys also incorporate a diversity and inclusion chair in their organising committee. Nonetheless, ensuring an adequate level of diversity is not always easy, and there is a need for means to contact under-represented populations. Most affinity groups now provide directories, 13 multimedia profiles of outstanding researchers 14 (with interviews, podcasts, talks, etc.) and mailing lists 15 that are particularly helpful, e.g., to invite a keynote speaker from a demographic minority or to make sure that a call for papers reaches an under-represented community. These directories also contribute to increasing the visibility of these communities.
Finally, beyond attendance and participation in conferences, other types of initiatives are oriented to further promote the scientific careers of individuals from minority groups. Mentoring allows early career professionals or research students who are unaware of their options and lack role models (e.g., due to local expertise limitations or financial hurdles) to get in touch with experienced senior industry or academic mentors and benefit from their experience. There are important mentoring programs ongoing since the early 2010s. Most of them debuted in the context of prestigious AI conferences (e.g., NeurIPS, ICML, AAAI, ICLR) and put a special focus on women. The Black 16 and Latin 17 communities have recently launched strong mentoring programs, independent from journals or conferences. The Latin community goes a step further by launching an official journal 18 for community members to publish their latest research (e.g., works accepted in the workshops they organise).

VI. DISCUSSION
In this study we have analysed gender, geographical, institutional and topic diversity of the three most prominent forums of Affective Computing research, with the aim of raising awareness and fostering a discussion in the community on the need for more diversity in the field. Indeed, the main finding of this paper is that the state of diversity in Affective Computing could still be enhanced and that there are already many active initiatives and affinity groups in the wider AI community that could help improve the situation.
With respect to the gender dimension, we find that the share of women remains below 40% across all three sources and most subgroups, such as institutional type or author leading position. While the trend appears positive, it is still rather slow. In addition, we find a lower share of women among contributors from industry, compared to academia, and among last authors compared to first authors, which suggests the presence of a "leaky pipeline" in affective computing careers. Regarding institutional types, our results show that Affective Computing research is, naturally, dominated by academia, while contributions from industry are increasing, especially for the ACII conference, and government contributions are declining. More research is needed to understand the drivers of these trends.
In the geographic dimension, we find that geographic diversity is increasing for TAFFC, location-dependent for ACII and somewhat declining for AAAC. However, all three sources show that the largest share of Affective Computing research still comes from the USA, closely followed by China, with contributions from European countries in second place. Contributions from Latin America or Africa are almost non-existent.
Lastly, one of the most interesting findings is that diversity in the characteristics of researchers seems to affect the diversity of topics. Indeed, geographic and topic diversity follow highly similar patterns over time, and paper topics are on average different depending on the gender of the lead author.
To address the identified shortcomings in the diversity of researcher characteristics and potentially obtain the benefit of more topic diversity, we formulate from our analysis the following recommendations:
• Define diversity targets and monitoring tools. The clear definition of diversity targets through indicators that can be monitored can help direct efforts and diversity funds, and potentially accelerate positive diversity trends.
• Ensure data quality for monitoring. As mentioned in [37], self-identification in demographic data linked to diversity incurs the least risk, as requiring data subject participation ensures that they have more awareness and control over their own data. However, this awareness can raise concerns about responsible data use. Thus, our recommendation is to ask authors for their demographic data (e.g., at conference registration time, when submitting a paper to TAFFC, through surveys) while informing them about their contribution to better diversity monitoring. High-quality data would allow for an accurate and reproducible monitoring effort, and informed consent would bring agency to the community in the diversity efforts and related monitoring process.
• Assess the suitability of the diversity initiatives reviewed in this study in the context of Affective Computing. It is important to assess, for each initiative, the suitability for the Affective Computing community in terms of needs, motivation, resources and impact. For instance, it may be adequate to see which community members may be motivated to lead certain initiatives (e.g., social activities, mentoring programs or affinity groups), to check the financial resources available (e.g., for travel grants) and to be strategic in selecting those providing more impact and requiring fewer resources. Multi-faceted diversity indicators that are based on high-quality data can help guide these decisions.
• Strengthen relationships with existing affinity groups in AI. We have identified key affinity groups from under-represented minorities that are very active in fostering diversity in AI, with excellent outcomes in the last 5 years. Getting in touch with them, having access to their mailing lists to promote Affective Computing outlets or inviting them to co-organise workshops are actions that are easy to implement in the short term and that could attract these minorities into the field. In particular, we have seen the very scarce presence of researchers from Africa and Latin America, thus Black in AI, Black Woman in AI and LatinX in AI could be interesting targets. We have also discussed the lack of information regarding the sexual orientation and gender self-identification of our community members, which is an issue that could potentially be addressed in tandem with Queer in AI. Finally, {Dis}Ability in AI could help to identify and break barriers that Affective Computing members might have been experiencing.

VII. CONCLUSION
This work builds on Hupont et al. [16], presented at ACII 2021. Here, we extend the original study in terms of data basis (adding TAFFC and AAAC data), facets of diversity (adding novel metrics of disparity and variety), diversity dimensions (adding topic diversity and combinations of dimensions) and broader policy context (adding a discussion on existing diversity initiatives and affinity groups). This joint analysis of AAAC, TAFFC and ACII, which covers a decade of community members' data (from January 2011 to December 2021), firstly allows us to reinforce our initial hypothesis on the "leaky pipeline" existing in the field, even though the share of women seems to increase slowly over the years for the three sources. Second, it demonstrates that academic institutions dominate affective computing peer-reviewed publications (both for the journal and the conference), ahead of private industry (which is, however, increasingly present) and governmental research centres. In terms of countries, USA-based institutions clearly lead the list of publications, while those from Latin America and Africa are vastly under-represented. Lastly, we demonstrate that diversity in the characteristics of lead authors (namely their gender and geographic location) has an influence on topic diversity.
To the best of our knowledge, this is the largest and most comprehensive diversity study to date on Affective Computing research. Beyond the Affective Computing field, we provide a novel framework for the quantitative analysis of diversity in a research community, which is grounded on metrics adapted from fields as varied as Economics, Neuroscience or Ecology and that can be used for the analysis of any other community.
Nevertheless, our study has to be considered in light of some limitations. First, note that all connections between different diversity dimensions that we discuss in this work and the inter-temporal analysis of topics do not go beyond the analysis of correlations and indicative trends. Exploring and confirming drivers, potential confounders and mechanisms of these correlations and trends requires further causal analysis that could exploit an exogenous variation for every hypothesis made throughout this analysis. Nevertheless, this work was useful to formulate these hypotheses in the first place. Moreover, there are other relevant diversity dimensions not covered here. In the context of anti-discrimination and for the purpose of inclusive research and innovation, dimensions such as race, sexual orientation, religion or disability are at least equally important. However, there is no ethical basis for the inference of these dimensions based on name and affiliations. We note that the gender and nationality inference may not be fully adequate either, although we want to highlight the importance of monitoring the diversity of research communities in general. Moreover, gender had to be limited to a binary categorisation and may be misaligned with individual self-identification. Some of these issues could be solved by undertaking diversity initiatives, by explicitly asking community members to self-identify (e.g., in surveys, at registration in ACII), and by fostering collaborations with affinity groups from under-represented minorities.
Although the mentioned limitations reflect the complexity of developing a holistic, robust and universal analysis of diversity in a research community, a goal that we will keep pursuing in our future research, our study identifies specific paths for methodological research and concrete diversity gaps in the affective computing community.
Finally, although we consider diversity an important goal that is valuable in its own right, we would also like to provide more support to the claim that "diversity is important" in future work by identifying and highlighting a link between diversity and innovative ideas.

Fig. 1. Semi-automated process followed to collect per-paper and per-author data from ACII conferences and the IEEE Transactions on Affective Computing journal (years 2011 to 2021).

• Variety: Refers to the number of classes in a set.
• Balance: Refers to the evenness of the distribution of elements across classes. Indicators of balance would be lower if a larger share of the elements are concentrated in only a few classes.
• Disparity: Refers to the degree of difference or distance between all classes.
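To make the three facets concrete, the sketch below computes a variety count, two balance-related indicators (Shannon entropy and the Herfindahl-Hirschman index) and the Rao-Stirling index for a toy set of class labels. It uses the standard textbook forms of these indexes, which is an assumption: the exact formulas and normalisations used in this study are defined elsewhere in the paper and may differ (for example, in whether pairs are counted once or twice). All function names and data are hypothetical.

```python
import math
from collections import Counter


def shares(labels):
    """Share p_c of elements falling in each class c."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def variety(p):
    """Number of classes present."""
    return len(p)


def shannon(p):
    """Shannon entropy H = -sum(p_c * ln p_c); higher means more even."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


def hhi(p):
    """Herfindahl-Hirschman index sum(p_c^2); higher means more concentrated."""
    return sum(x * x for x in p.values())


def rao_stirling(p, dist):
    """Rao-Stirling index: sum over ordered pairs i != j of p_i * p_j * d_ij.

    dist maps an alphabetically sorted class pair (i, j) to a distance in [0, 1].
    Assumption: ordered pairs are summed, so each unordered pair counts twice.
    """
    total = 0.0
    for i in p:
        for j in p:
            if i != j:
                total += p[i] * p[j] * dist[tuple(sorted((i, j)))]
    return total


if __name__ == "__main__":
    # Toy affiliation countries and distances, for illustration only.
    p = shares(["USA", "USA", "China", "France"])
    d = {("China", "France"): 0.7, ("China", "USA"): 1.0, ("France", "USA"): 0.6}
    print(variety(p), shannon(p), hhi(p), rao_stirling(p, d))
```

With all pairwise distances set to 1, this form of the Rao-Stirling index reduces to one minus the Herfindahl-Hirschman index, which shows how it combines balance with disparity.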

Fig. 2. Share of women over time by source and group.

Fig. 3. Distribution of affiliation types over time by source.

Fig. 5. Author affiliation countries over time by source.

Fig. 7. Development of topics over time by source.

Fig. 8. Topics of papers with a woman versus a man as lead author.

TABLE II
SUMMARY OF OBTAINED DIVERSITY METRICS BY SOURCE (TAFFC JOURNAL, ACII CONFERENCE AND AAAC ASSOCIATION), YEAR (2011-2021) AND DIVERSITY DIMENSION (GENDER, TYPE OF INSTITUTION, GEOGRAPHIC LOCATION AND TOPIC). NOTE THAT THE VALUES OF DIVERSITY INDEXES, NAMELY SHANNON (H), HERFINDAHL-HIRSCHMAN (HHI), DISPARITY (D) AND RAO-STIRLING (Δ), ARE NORMALISED USING THEIR Z-SCORE FOR THE SAKE OF DIRECT COMPARISON ACROSS SOURCES AND YEARS

TABLE III
LIST OF DIVERSITY AFFINITY GROUPS IN THE ARTIFICIAL INTELLIGENCE RESEARCH COMMUNITY

2021, the large and top-ranked AI conferences AAAI (Association for the Advancement of Artificial Intelligence Conference on AI 9) and AAMAS (International Conference on Autonomous Agents and Multiagent Systems 10) started to include an open call for diversity and inclusion activities, as a response to which affinity groups joined forces to propose a series of Diversity workshops on AI.

TABLE IV
SUMMARY OF DIVERSITY INITIATIVES THAT HAVE BEEN UNDERTAKEN IN INTERNATIONAL AI CONFERENCES. COLUMN YEAR SPECIFIES THE YEAR SINCE THE INITIATIVE HAS BEEN IMPLEMENTED