An Overview of Fairness in Clustering

Clustering algorithms are a class of unsupervised machine learning (ML) algorithms that feature ubiquitously in modern data science and play a key role in many learning-based application pipelines. Recently, research in the ML community has turned to analyzing the fairness of learning models, including clustering algorithms. Research on fair clustering varies widely depending on the choice of clustering algorithm, the fairness definitions employed, and other assumptions made regarding models. Despite the size and diversity of this body of work, no comprehensive survey of the field exists. In this paper, we seek to bridge this gap by categorizing existing research on fair clustering and discussing possible avenues for future work. Through this survey, we aim to provide researchers with an organized overview of the field and to motivate new and unexplored lines of research regarding fairness in clustering.


I. INTRODUCTION
Machine Learning (ML) has been used to tackle many important problems, many of which can have significant societal implications. These include predicting the likelihood of prisoner recidivism [1]- [5], disbursing bank loans [6]- [8], shortlisting candidates for job applications [9]- [13], and college admissions [14]- [16]. Since ML models are trained on large datasets that have been found to contain biases against both individuals and minority groups, they can further amplify these biases when used in high-impact applications. This has been evidenced in many ML applications where fairness was not considered as an evaluation criterion. Examples include Microsoft's Tay online chatbot, which learned from tweets and, due to biased inputs, started using racist slurs [17], and the COMPAS tool, which predicted that a black individual is more likely to commit a crime than a white individual [18], even when both individuals were statistically similar with regard to other attributes.
To rectify models and correct for unfairness, ML researchers have recently begun to propose approaches that ensure fairness constraints are met [19]- [25]. However, defining fairness notions is not a trivial task, and is often done based on the application and legal context. For example, fairness can be defined for minority protected groups (such as ethnicity, gender, etc.) [26] or for individuals (that is, similar individuals should be treated equitably) [27], and both possess certain advantages and disadvantages depending on where they are being utilized. It has been found that different notions of fairness are generally incompatible with one another [28], [29] and cannot be jointly optimized for, further compounding the difficulty of the problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Ting Wang.
Clustering algorithms are unsupervised ML algorithms that are widely utilized in problem settings where labels are not easily available (such as resource allocation problems). The issue of fairness in clustering has recently received considerable attention in the ML community, beginning with the first work on fair clustering by Chierichetti et al. [30] in 2017. However, ensuring fairness for clustering is harder than in the general ML case: labels are not present in the data, so ground-truth error rates cannot be calculated to estimate bias and unfairness. This makes both defining and enforcing fairness for clustering challenging problems.
For this reason, many different fairness notions for clustering exist (for example [30]- [34]), with different research papers opting for different metrics or proposing new ones. Furthermore, techniques for ensuring that fairness constraints are met vary widely in methodology; comparisons between different fair approaches are usually made selectively, and there are no established (fairness and performance) metrics adopted for comparison. There are also no surveys or review articles compiled for fair clustering approaches. This is in stark contrast with other ML sub-fields, where multiple surveys exist, such as for recommendation systems [35], natural language processing models [36], learning to rank models [37], and sequential decision-making approaches [38], among others. We therefore aim to bridge this gap and organize the field through this article. Our goal is to provide both existing and new researchers in fair clustering with an overview of the field, along with new insights. We categorize the myriad approaches in fair clustering similarly to other ML survey articles, and provide several different classifications of fairness notions for clustering. Our work also discusses real-world applications of fair clustering as well as datasets used for evaluating fair clustering approaches. Thus, the article can also serve as a tool for ML practitioners aiming to utilize fair clustering in their applications. To summarize, the contributions of this work are as follows:
• We provide the first survey on fair clustering that organizes the field and categorizes fair clustering approaches similarly to other ML surveys.
• We classify the many different available fairness notions for clustering, and provide details regarding the evaluation of fair models and the datasets frequently used to evaluate them.
• We discuss motivations for fair clustering using real-world applications to aid ML practitioners, and also provide a multitude of new research directions for the field.
The rest of the paper is structured as follows: Section II details relevant background regarding clustering and fairness in ML. Section III discusses different fairness notions employed for clustering and how they can be organized into intuitive sub-categories. Section IV describes the different approaches used to make clustering fair. Section V examines the datasets used for evaluating fair clustering, and motivates the research problems related to fair clustering through real-world applications. Section VI provides insights and analysis for future work in fair clustering, and Section VII concludes the paper.

II. PRELIMINARIES AND NOTATION
In this section, we briefly discuss the working of clustering algorithms and give an overview of the different approaches used to make ML models fair. We also detail the notation and symbols used throughout the paper.

A. CLUSTERING ALGORITHMS
Formally, a clustering algorithm A seeks to partition a given input dataset $X \in \mathbb{R}^{n \times m}$ into some $k \le n$ clusters. Each sample $x \in X$ can belong to one (hard clustering) or more (soft clustering) of the k clusters, depending on the clustering objective used. Let $C = \{C_1, C_2, \ldots, C_k\}$ denote the output partition set obtained by running the clustering algorithm A, where $C_i \subseteq X, \ \forall i \in [k]$. As there are no labels for the data samples, X is both the training dataset and the testing dataset for the clustering problem. This differs from traditional supervised learning and classification tasks, where training datasets and test datasets are separate. The unsupervised nature of the clustering problem further complicates defining and enforcing fairness, which we discuss in subsequent sections. It is important to note that most clustering objectives (such as k-means, hierarchical clustering, k-medoids, etc.) are NP-Hard in general [39]- [41] and are usually solved using algorithms that approximate the optimal solution [42], [43] or through heuristic approaches [44]. For example, for the widely used k-means clustering objective [40], the expectation-maximization-style Lloyd's algorithm [44] is used as a heuristic that works very well in practice.
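To make the heuristic concrete, here is a minimal pure-Python sketch of Lloyd's algorithm for the k-means objective. The function names, the initialization from randomly chosen data points, and the fixed iteration cap are illustrative assumptions rather than the exact procedure of [44]; the sketch returns a hard clustering and, like Lloyd's algorithm in general, only reaches a local optimum.

```python
import random


def squared_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Lloyd's heuristic for k-means: alternate assignment and mean-update steps.

    Returns (centers, labels); labels[i] is the hard cluster index of points[i].
    """
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centers, labels
```

For example, `lloyd_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)` places the first three points in one cluster and the last three in the other.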
Another distinguishing feature of clustering algorithms is that the number of clusters k can either be given as an input to the learning model or be obtained via the clustering optimization problem itself. For example, in center-based clustering algorithms such as k-means [44] or k-medoids [43], k is an input parameter, whereas hierarchical clustering algorithms [39], [45], [46] output a tree of clusters, with each level of the tree indicating a possible choice of k ≤ n that the user can opt for. Other algorithms, such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [47] and Ordering Points To Identify the Clustering Structure (OPTICS) [48], also do not require the number of clusters as input, but infer a single value for k from the dataset provided. We refer the reader to [49] for more details on different clustering algorithms.
Unless otherwise specified, we consider hard clustering in the example scenarios of this article, that is, each point can belong to only one cluster. In Fig. 1, we provide an overview of the aforementioned general clustering process. The original dataset X is provided as input to the clustering algorithm A, and we obtain the cluster partition set $C = \{C_1, C_2, C_3, C_4\}$ as output, shown in blue, red, yellow, and green, respectively.

B. A BRIEF TAXONOMY OF CLUSTERING METHODS
Many different clustering methods have been proposed to partition data into meaningful clusters, and some familiarity with them is useful before delving into the numerous approaches proposed for fair clustering. For ease of understanding, we borrow from (and slightly modify) the classifications originally proposed by Xu and Wunsch [49] for differentiating data clustering methods. As a complete in-depth discussion is out of the scope of this work, we refer the reader to the surveys [49], [50] for more details on approaches for clustering data.
Clustering algorithms can be generally categorized as follows.

VOLUME 9, 2021

1) CENTER-BASED CLUSTERING
These approaches aim to partition the input dataset into clusters by minimizing an error metric between the data samples assigned to a cluster and their corresponding cluster center. Depending on the defined error metric, a cluster center can be, for example, the mean of the samples in the cluster (such as in k-means [44]) or a medoid, i.e., the cluster sample with the smallest total dissimilarity to the other samples (such as in k-medoids [43]), among many other possibilities. The most common approach in this category is k-means, where the error term is defined to be the squared Euclidean distance between cluster samples and cluster centers [51]. Many different variations of k-means have been proposed that improve upon the original heuristic algorithm [52]- [54]. Other methods include k-medoids [43] and the Iterative Self-Organizing Data Analysis Technique (ISODATA) [55], among others.
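The difference between a mean center and a medoid center can be seen in a small sketch. The helper names are hypothetical, and Euclidean distance via Python's `math.dist` is assumed for the medoid. The medoid is always one of the cluster's own samples, whereas the mean need not be and is pulled toward outliers.

```python
import math


def mean_center(cluster):
    """k-means style center: the coordinate-wise mean (need not be a data sample)."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def medoid_center(cluster, dist=math.dist):
    """k-medoids style center: the member with the smallest total distance to all members."""
    return min(cluster, key=lambda cand: sum(dist(cand, p) for p in cluster))
```

On the cluster `[(0, 0), (1, 0), (2, 0), (9, 0)]`, the mean is `(3.0, 0.0)`, dragged toward the outlier at `(9, 0)`, while the medoid is the sample `(1, 0)`.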

2) HIERARCHICAL CLUSTERING
Hierarchical clustering approaches aim to partition the dataset into hierarchies, with the clustering output represented as a binary tree. The root node represents the entire dataset, while the leaf nodes are the individual samples of the dataset. The remaining nodes of the tree represent clusters, and in this way a hierarchy of clusters is obtained. Agglomerative hierarchical clustering algorithms build this tree in a bottom-up fashion, whereas divisive hierarchical clustering algorithms do so in a top-down fashion. Some examples of agglomerative hierarchical clustering algorithms include Clustering Using Representatives (CURE) [56], Ward's method [57], Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [58], and Robust Clustering using Links (ROCK) [59], among others. For divisive hierarchical clustering, examples include the Divisive Analysis algorithm (DIANA) and the Monothetic Analysis algorithm (MONA) [60]. Recently, analytical objectives for hierarchical clustering have also been proposed [39], [45], [46], which have led to the development of more theoretically grounded hierarchical clustering algorithms.
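Bottom-up tree construction can be illustrated with a naive single-linkage agglomerative sketch. Single linkage is only one of several merge criteria (Ward's method [57] uses another), and this cubic-time loop is meant for illustration only. The recorded merge history corresponds to the internal nodes of the binary tree, and stopping once k clusters remain amounts to cutting the tree at one level.

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Agglomerative (bottom-up) clustering with single linkage, stopped at k clusters.

    Every sample starts as its own leaf cluster; the two clusters whose closest
    members are nearest are merged repeatedly. Returns (clusters, merges), where
    clusters are lists of point indices and merges lists each newly formed node.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest single-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                math.dist(points[i], points[j]) for i in clusters[ab[0]] for j in clusters[ab[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        merges.append(merged)
        del clusters[b]  # b > a, so deleting b first keeps index a valid
        clusters[a] = merged
    return clusters, merges
```

Calling `single_linkage(points, 1)` builds the whole tree: it records n - 1 merges, the last of which is the root containing every sample.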

3) MIXTURE MODEL-BASED CLUSTERING
Mixture model-based clustering refers to a probabilistic clustering approach where points are assigned to clusters in a soft manner and do not have hard memberships. Data points are assumed to be generated by a mixture of probability distributions, with each mixture component corresponding to a cluster. In this clustering approach, the form of the distributions is generally assumed (very often a mixture of multivariate normal distributions). The clustering task then becomes finding the set of parameters for this mixture of distributions that maximizes a metric such as the log-likelihood of the data. Each point's posterior probability of having been generated by a given component then indicates how likely it is to belong to the corresponding cluster. Popular clustering approaches that belong to this category are Gaussian-Mixture-Model based Expectation Maximization (GMM-EM) [61], Expectation Maximization-based Mixture program (EMMIX) [62], and AutoClass [63], among others.
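A one-dimensional Gaussian mixture fitted with EM shows how soft memberships arise. This is a minimal sketch that assumes scalar data, a caller-supplied initial mean per component, and a shared initial variance; it omits the numerical safeguards (such as log-space computation) that practical implementations use, so points far from every component could cause division by zero.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(xs, init_means, iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture with len(init_means) components.

    Returns (weights, means, variances, resp); resp[i][j] is the soft membership
    of sample i in component j, and each row of resp sums to 1.
    """
    k, n = len(init_means), len(xs)
    weights, means = [1.0 / k] * k, list(init_means)
    overall_mean = sum(xs) / n
    variances = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component given each sample.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters, which does not decrease the log-likelihood.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    return weights, means, variances, resp
```

Points that lie between two components receive fractional memberships in both, which is exactly the soft assignment described above.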

4) GRAPH-BASED CLUSTERING
Graph-based clustering approaches utilize concepts from graph theory to cluster the data. This first requires translating the original dataset into a graph, by treating data samples as nodes/vertices and creating edges between samples using a dissimilarity/similarity metric. The dissimilarity/similarity metric is usually defined using a distance metric between points. Edges can then be created between nodes if points are within a certain distance threshold, or by using a k-nearest-neighbor graph [64]. In spectral clustering, a prominent example of this category, the Laplacian matrix of the resulting graph is then computed. Clustering using k-means (or another simple clustering algorithm) is performed on the rows of the matrix formed by the eigenvectors of the Laplacian associated with its k smallest eigenvalues, and the original data samples are assigned the corresponding cluster labels [65]. Depending on the choice of the graph Laplacian, different spectral clustering outputs can be obtained [66]. Many other graph-based clustering approaches also belong to this category, such as CLuster Identification via Connectivity Kernels (CLICK) [67] and Delaunay Triangulation Graph based clustering (DTG) [68], among others.
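The graph-construction step can be sketched as follows, assuming a symmetric 0/1 k-nearest-neighbor graph and the unnormalized Laplacian L = D - W (one of several possible Laplacian choices [66]). The eigen-decomposition and the final k-means step are left out to keep the sketch dependency-free.

```python
import math


def knn_graph(points, n_neighbors):
    """Symmetric 0/1 adjacency matrix: i and j are linked if either is among
    the other's n_neighbors nearest points."""
    n = len(points)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: math.dist(points[i], points[j]))
        for j in others[:n_neighbors]:
            W[i][j] = W[j][i] = 1
    return W


def unnormalized_laplacian(W):
    """L = D - W, where D is the diagonal matrix of node degrees."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]
```

In a full spectral pipeline, one would then, for example, compute the eigenvectors of L with `numpy.linalg.eigh`, keep the k with the smallest eigenvalues, and run k-means on the rows of that matrix. Each connected component of the graph contributes one zero eigenvalue, which is why well-separated groups are easy to recover.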

5) FUZZY CLUSTERING
Fuzzy clustering algorithms consist of soft clustering approaches where data samples have fuzzy memberships (a grade of membership between 0 and 1) to clusters instead of binary cluster assignments. The most popular fuzzy clustering method is Fuzzy C-Means (FCM) [69]. Many improvements have been made upon FCM, including methods that more easily identify centers [70], generalize the algorithm to arbitrary distance metrics [71], reduce time complexity [72], and more. Fuzzy clustering can also be combined with hierarchical clustering, as was done in Hierarchical Unsupervised Fuzzy Clustering (HUFC) [73].
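The alternating updates of Fuzzy C-Means can be sketched as below, using the standard membership formula with fuzzifier m (m = 2 is a common default). The initial centers are supplied by the caller, and the small `eps` that guards against zero distances is an implementation assumption of this sketch.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Membership update: u[i][j] = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).

    Each membership lies in [0, 1] and each row sums to 1.
    """
    u = []
    for p in points:
        d = [max(math.dist(p, c), eps) for c in centers]  # eps avoids division by zero
        u.append([1.0 / sum((d[j] / d[l]) ** (2 / (m - 1)) for l in range(len(centers)))
                  for j in range(len(centers))])
    return u


def fcm_centers(points, u, m=2.0):
    """Center update: mean of all points weighted by u[i][j] ** m."""
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        centers.append(tuple(sum(wi * p[dim] for wi, p in zip(w, points)) / sum(w)
                             for dim in range(len(points[0]))))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iters=100):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)
```

A point equidistant from two centers receives a membership of 0.5 in each, rather than being forced into one cluster.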

6) COMBINATORIAL SEARCH-BASED CLUSTERING
Exactly solving most clustering optimization objectives is NP-Hard, and the space of candidate clustering solutions grows exponentially with the number of samples. Thus, the clustering problem can be reformulated as a combinatorial optimization problem, and local search approaches can be used to approximate the optimal clustering solution. Most often, due to the hardness and generality of the problem, metaheuristic search algorithms are used, such as evolutionary approaches [74] like Genetic Algorithms (GA) [76], or Simulated Annealing (SA) [75]. Clustering approaches that belong to this category include the Genetically Guided Algorithm (GGA) for clustering [77] and the Genetic k-means Algorithm (GKA) [78], among others.
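As an example of combinatorial search, here is a simulated annealing sketch over hard assignments that minimizes the k-means cost. The move (reassign one random point to a different cluster), the geometric cooling schedule, and all parameter values are illustrative assumptions rather than the settings of [75].

```python
import math
import random


def assignment_cost(points, labels, k):
    """k-means cost of a labeling: squared distances of points to their cluster means."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            mean = [sum(coord) / len(members) for coord in zip(*members)]
            cost += sum(sum((a - b) ** 2 for a, b in zip(p, mean)) for p in members)
    return cost


def anneal_clustering(points, k, steps=5000, t0=10.0, cooling=0.999, seed=0):
    """Simulated annealing: always accept improving moves, and accept a worse move
    with probability exp(-delta / T). Returns the best labeling found and its cost."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = assignment_cost(points, labels, k)
    best_labels, best_cost, t = labels[:], cost, t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        candidate = labels[:]
        candidate[i] = rng.choice([c for c in range(k) if c != labels[i]])
        new_cost = assignment_cost(points, candidate, k)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            labels, cost = candidate, new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        t *= cooling  # geometric cooling schedule
    return best_labels, best_cost
```

Accepting occasional uphill moves early on, while the temperature is high, is what lets the search escape poor local optima that a purely greedy local search could get stuck in.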

C. FAIRNESS IN ML
Fairness for ML models can be enforced/ensured in three stages of the learning pipeline [17], [79], [80]: the 1) before-training, 2) during-training, or 3) after-training phase.
1) The before-training (pre-processing) stage requires that the original data be pre-processed to obtain a new dataset. On training/running the unchanged ML model on this new dataset, the output predictions will meet the fairness constraint.
2) The most common approach to improving fairness for ML models is the during-training or in-processing stage, where the ML model itself is modified to include the fairness constraints. This involves changing the optimization and training process such that the output predictions are fair, without changing the original dataset.
3) Finally, fairness can be enforced after training as well, where the predictions from the original model undergo a post-processing procedure to compute a similar set of predictions that meet the desired fairness constraints.
We detail these methodologies in the context of the learning pipeline in Fig. 2. As mentioned before, since clustering is an unsupervised learning problem where the training and test datasets are the same, the diagram shown in Fig. 2 changes accordingly. We thus discuss approaches specific to clustering in Section IV of the paper, which build upon the high-level schematic of Fig. 2. We do not discuss how the fairness of general ML models (such as for classification, computer vision, etc.) can be measured via analytical metrics, as this is outside the scope of this work. We discuss fairness metrics and notions specific to clustering in Section III of the paper; interested readers can refer to [17] for more information on fairness notions for general ML models.

III. FAIRNESS NOTIONS FOR CLUSTERING
In this section, we discuss the different notions of fairness that are generally employed for clustering. As mentioned before, fairness notions are often application-specific, and a particular definition might be preferable in certain settings compared to others. For example, consider an adapted version of the application scenario provided in [31]. We have to decide where to set up three (k = 3) parks for a given set of houses in a region. For this, we can use center-based clustering algorithms where each center denotes a possible park location. The region contains two dense city sub-regions, with housing highly localized in a small area, and a residential (suburban) sub-region which encompasses a large area but is less dense than the cities. This scenario is shown in Fig. 3. If a general center-based clustering algorithm (such as k-means) were used, we would obtain a single cluster center (park) shared by the two city sub-regions, whereas the larger suburban sub-region would get two parks. This is unfair to the individuals living in the dense sub-regions, and hence this application requires a definition of fairness which warrants proportionally shared cluster centers. A fair solution (in this context) would place one center in each of the two dense city sub-regions and one park in the larger, sparser sub-region. Thus, the definition of proportionality proposed by [31] is more suitable than other fairness notions (such as the most commonly used balance proposed by [30]). The former captures the idea that data samples are individuals, and that fairness to these individuals means being clustered in an accurate manner with regard to their dataset features and cluster centers. The latter, on the other hand, aims to capture the degree to which points belonging to protected groups are represented in each output cluster. It is then evident that, for the example considered above, proportionality is the better fairness notion.
Note that proportional fairness is also more apt for this scenario as it does not require protected groups (in contrast with balance which explicitly requires groups) and can be tailored to the fairness requirement at the sample level.
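The blocking-coalition idea behind proportionality can be checked directly on a candidate set of centers. This sketch assumes our reading of the definition in [31]: a set of k centers is α-proportional unless some candidate location y is strictly preferred, in the sense α·d(x, y) < D(x), by at least ⌈n/k⌉ points, where D(x) is the distance from x to its nearest chosen center. By default, the candidate locations are taken to be the data points themselves, which is a simplifying assumption.

```python
import math


def is_proportional(points, centers, k, candidates=None, alpha=1.0):
    """Return False if some candidate location forms a blocking coalition.

    Assumed formulation: at least ceil(n / k) points x with alpha * d(x, y) < D(x),
    where D(x) is x's distance to its nearest chosen center, block the solution.
    """
    candidates = points if candidates is None else candidates
    nearest = [min(math.dist(p, c) for c in centers) for p in points]
    threshold = math.ceil(len(points) / k)
    for y in candidates:
        coalition = sum(1 for p, d in zip(points, nearest) if alpha * math.dist(p, y) < d)
        if coalition >= threshold:
            return False
    return True
```

In a one-dimensional version of the park example, with two dense groups of four points and a sparse group of four (n = 12, k = 3), placing one center inside each dense group passes this check, whereas a single center placed between the two dense groups fails it, because the four points of either dense group jointly prefer a location inside their own group.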
The above example raises an interesting research question: are there ways to distinguish clustering fairness notions from each other? We answer this question in the affirmative by introducing four different classifications for fairness definitions: group-level, individual-level, algorithm agnostic, and algorithm specific fairness. A fairness notion can belong to more than one category. As fairness notions for clustering have not been formally categorized before, we intend these categories to be a simple first step in doing so; many other classifications/categorizations are possible. Below, we explain each category individually and then, using our proposed categories, provide existing definitions and analysis for all fair clustering notions proposed to date.

A. GROUP-LEVEL NOTIONS
Group-level fairness notions are usually derived from the Disparate Impact (DI) doctrine [81] which states that no group of individuals should be adversely affected by the outcome of a decision-making system. That is, no group of individuals should be discriminated against or overtly preferred by an algorithm in terms of the output predictions made.
This category of fairness can be understood through an example. A dataset, e.g., the creditcard dataset [82], is used by the marketing division of a bank to reach out to prospective customers and offer them loans and available credit opportunities. The dataset contains features such as the potential customer's age, their education level, their weekly work hours, and their capital gains per month. The bank uses these attributes as input to a clustering algorithm to find target audiences for promotional offers. That is, on running the algorithm, they obtain clusters of people who are then (using some metrics, e.g., the education and wages-earned features) grouped together to be targeted for a particular promotion/offer. It is important to note here that people of color (POC), as well as women, tend to earn lower wages than white men [83], and that POC face more adversities that lead to disparities in their education level [84] compared to white demographics. Given the racial education divide and the wage gap, a clustering algorithm using these attributes is likely to group white households, as well as men, as better candidates for better deals and offers (such as mortgages and loans). As a result, this marketing clustering algorithm has a disparate impact on POC as well as women, as they are deprived of an opportunity for improvement. Therefore, it is important to study protected groups (e.g., ethnicity and gender) and the corresponding fairness in such a clustering setting. Group-level fairness measures aim to capture this setting analytically.
An example of a group-level measure is the balance notion first proposed by [30] and then generalized by [85]. For each cluster and protected group, it takes the ratio between the proportion of protected group members in the whole dataset and the proportion of protected group members in that cluster, and keeps the smaller of this ratio and its reciprocal; the balance of the clustering is then the minimum value obtained over all clusters and protected groups. As a result, it always lies between 0 and 1, with higher values indicating a fairer clustering output. We also found that balance is the most commonly used fairness notion in research on fair clustering.
Other group-level fairness notions include bounded representation [86], which uses two parameters α and β that denote the maximum and minimum allowed proportions of protected group members in a cluster. Through this notion, no protected group should be over- or under-represented in any cluster. Another example is the Max Fairness Cost (MFC) proposed in [32], which is similar to balance but also takes a user-specified ideal proportion for each protected group as input. It measures the deviation of the current proportions of protected group members in a cluster from these ideal proportions using the L1 norm. We discuss other categories next and then provide a complete tabular list of group-level fairness notions in Table 1.

B. INDIVIDUAL-LEVEL NOTIONS
Individual-level fairness notions are significantly different from the group-level fairness notions. Here, we do not have any protected groups, and the goal is to ensure that similar individuals (samples in the dataset) are treated similarly by the ML model. That is, a clustering model abiding by individual-level fairness would cluster all individuals that are deemed similar using some dissimilarity metric in a similar manner. The proportional notion of fairness [31] discussed before is an example of an individual-level fairness notion for clustering.
Individual-level fairness for clustering has not been studied as extensively as group-level fairness, and most works focus only on facility location and center-based clustering. The differences between these individual-level fairness definitions stem from 1) how the dissimilarity metric between individuals is defined, and 2) how similarity is measured with regard to the output clustering. In [87], the authors assume that the dissimilarity metric is available as a distance metric d, and that a clustering satisfies individual-level fairness if, for each individual sample in the dataset, the average distance (measured using d) to the samples in its own cluster is less than the average distance to the samples of any other cluster. In [34] and [88], the authors provide an alternative definition for individual-level clustering fairness: every sample in the dataset should have a center within a distance R, where R is the minimum radius of the ball centered at the sample that contains at least n/k (total samples over number of clusters) samples. In [89], individual-level fairness is extended to the clustering setting from the seminal work of [26] for classification. Here, the authors consider soft clustering outputs, and as the clustering is probabilistic, they enforce individual-level fairness through distributional similarity of the cluster outputs. Very recently, more research has emerged on individual fairness for clustering [90]- [94], and we cover these works in more detail in Section IV. As mentioned previously, different fairness notions often cannot be satisfied together. This is true for group-level and individual-level fairness notions for clustering. In particular, in [95] and [96], the authors find that enforcing group-level fairness can adversely affect individual-level fairness between similar individuals. This can also be seen through the simple example shown in Fig. 4, which has been adapted from [89].
Here, different protected groups are denoted using different markers, and different cluster assignments are denoted using different colors. The cluster assignments required to meet group-level fairness (for example, enforced through balance) are shown on the left, and the cluster assignments that satisfy individual-level fairness are shown on the right in Fig. 4. The two differ because group-level fairness requires each group to be represented in a cluster in similar proportion, whereas individual-level fairness requires closely distanced (similar) points to be clustered together (similarly). As can be seen from the figure, in this example the two requirements are mutually exclusive, hence only one of the two notions of fairness can be satisfied. We provide a complete list of individual-level fairness notions in Table 1 towards the end of this section.
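Both distance-based individual-level definitions described above can be checked on a finished clustering. In this sketch, for [34], [88] the ball around a sample is assumed to include the sample itself and to hold ⌈n/k⌉ samples; for [87], the sample is excluded from its own cluster's average, a non-strict inequality is used, and singleton clusters are treated as fair. These details, and the function names, are assumptions of the sketch.

```python
import math


def radius_fair(points, centers, k):
    """Sketch of the [34], [88] notion: every point needs a center within r(x), the
    radius of the smallest ball around x holding ceil(n / k) points (x included)."""
    needed = math.ceil(len(points) / k)
    for x in points:
        r = sorted(math.dist(x, y) for y in points)[needed - 1]
        if min(math.dist(x, c) for c in centers) > r:
            return False
    return True


def average_distance_fair(points, labels):
    """Sketch of the [87] notion: each point's average distance to the rest of its own
    cluster is at most its average distance to the members of every other cluster."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton clusters are treated as fair (assumption)
        own_avg = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        for other, members in clusters.items():
            if other != lab:
                other_avg = sum(math.dist(points[i], points[j]) for j in members) / len(members)
                if own_avg > other_avg:
                    return False
    return True
```

Note that the first check needs cluster centers while the second only needs labels, mirroring the algorithm specific versus algorithm agnostic distinction discussed next.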

C. ALGORITHM AGNOSTIC NOTIONS
We also categorize fairness notions based on whether they are designed specifically for certain clustering objectives or can generalize to any given objective. Algorithm agnostic notions are generally defined at the level of the cluster outputs and can thus generalize to all clustering objectives. For example, balance [30], [85], the first proposed fairness notion discussed previously, operates only on the cluster outputs given by any clustering algorithm. This makes it an algorithm agnostic fairness notion.
Note that fairness notions which do not make explicit assumptions regarding clustering algorithms, but implicitly require specific clustering behavior, are not considered algorithm agnostic. For example, while the definition of the proportional fairness notion [31] mentions no explicit clustering algorithm, the notion requires cluster centers, thus limiting it to center-based clustering objectives. Furthermore, both group-level and individual-level fairness notions can be algorithm agnostic, and we find that most group-level fairness notions are. Algorithm agnostic notions are tabulated towards the end of the section (Table 1).

D. ALGORITHM SPECIFIC NOTIONS
Algorithm specific fairness notions are those that work only for certain clustering objectives and algorithms. One example is the k-means social fairness cost proposed by [33]. In their work, the authors define a fair clustering to be one that minimizes the largest average k-means cost incurred by any protected group. While this idea of social fairness could be extended to other learning tasks, [33] formulate it for k-means, making it specific to center-based clustering objectives. Other examples include proportional fairness proposed by [31] and the individual-level fairness notions of [34], [88], as they only work with center-based clustering. A full list is provided in Table 1.

E. DEFINITIONS FOR COMMONLY USED NOTIONS
In this subsection, we provide mathematical definitions for some commonly used fairness notions. However, due to the multitude of different notions proposed, we defer the list of all notions to Table 1 and provide pointers to appropriate related works that discuss and define these notions there.
We now provide technical definitions for the following fairness notions:

1) BALANCE
The group-level and algorithm agnostic fairness notion of balance was first proposed by Chierichetti et al. [30] for the case with 2 protected groups. It was later generalized to the multiple group case by Bera et al. [85]. Since then, balance has been employed as the fairness metric for most research on fair clustering [97]- [100].
Let there be m protected groups. For a protected group $b \in [m]$, define $r_b$ to be the proportion of samples of the dataset belonging to group b, and $r_{a,b}$ to be the proportion of samples in cluster $a \in [k]$ belonging to group b. Then define the ratio for this cluster and protected group as $R_{a,b} = r_b / r_{a,b}$. The balance fairness notion is then defined over all clusters and protected groups as:

$$\text{balance}(C) = \min_{a \in [k],\, b \in [m]} \min\left\{ R_{a,b}, \frac{1}{R_{a,b}} \right\}.$$

As can be seen from the definition, balance lies between 0 and 1, and the higher the value, the fairer the clustering output. That is, a fair algorithm will attempt to maximize balance. This is usually done through a constraint ensuring that the balance is either lower-bounded or upper-bounded by a required pre-defined input value.
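Balance can be computed directly from group memberships and cluster labels. This sketch assumes the multi-group form in which each ratio $R_{a,b} = r_b / r_{a,b}$ is folded into [0, 1] by taking the smaller of the ratio and its reciprocal, and the minimum is taken over all clusters and groups. A group missing from a cluster therefore yields a balance of 0. Function and argument names are hypothetical.

```python
from collections import Counter


def balance(groups, labels):
    """Balance of a hard clustering.

    groups[i] is the protected group of sample i and labels[i] is its cluster.
    Returns min over clusters a and groups b of min(R, 1 / R), R = r_b / r_{a,b}.
    """
    n = len(groups)
    r = {b: count / n for b, count in Counter(groups).items()}  # dataset-level shares r_b
    clusters = {}
    for g, lab in zip(groups, labels):
        clusters.setdefault(lab, []).append(g)
    value = 1.0
    for members in clusters.values():
        counts = Counter(members)
        for b, rb in r.items():
            rab = counts[b] / len(members)  # share r_{a,b} of group b in this cluster
            if rab == 0:
                return 0.0  # a group absent from a cluster gives the worst balance
            ratio = rb / rab
            value = min(value, ratio, 1 / ratio)
    return value
```

For instance, with four samples of group A and two of group B, splitting them as {A, A, B} and {A, A, B} gives a balance of 1, since every cluster mirrors the dataset-level proportions exactly.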
Some authors implicitly utilize the balance fairness notion but reformulate it to aid theoretical analysis; examples include [101] and [102]. Let there be m protected groups, and let $G_{a,b}$ denote the set of samples of dataset X in cluster a that belong to group b. Then, for cluster a, define $J_a = \min_{b \in [m]} |G_{a,b}|$ and $L_a = \max_{b \in [m]} |G_{a,b}|$. The reformulated notion of balance is then:

$$\text{balance}(C) = \min_{a \in [k]} \frac{J_a}{L_a}.$$

This notion also outputs a value between 0 and 1, and the authors provide theoretical analysis to show that optimizing this notion of fairness is equivalent to optimizing the original 2-group balance notion proposed by [30].
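Assuming the reformulated balance is the minimum over clusters of $J_a / L_a$, it needs only per-cluster group counts. For two groups it coincides with the per-cluster ratio min(#red/#blue, #blue/#red) of [30], minimized over clusters. The function name is illustrative.

```python
from collections import Counter


def count_balance(groups, labels, all_groups=None):
    """min over clusters a of J_a / L_a, where J_a and L_a are the smallest and largest
    group counts |G_{a,b}| in cluster a (a group absent from a cluster counts as 0)."""
    all_groups = set(groups) if all_groups is None else set(all_groups)
    clusters = {}
    for g, lab in zip(groups, labels):
        clusters.setdefault(lab, []).append(g)
    value = 1.0
    for members in clusters.values():
        counts = Counter(members)
        sizes = [counts[b] for b in all_groups]
        value = min(value, min(sizes) / max(sizes))
    return value
```

Because it works with raw counts instead of dataset-level proportions, this form treats the equal split of groups within each cluster as the ideal, regardless of the overall group sizes.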

2) SOCIAL FAIRNESS
The social fairness cost was proposed by Ghadiri et al. [33] for the k-means clustering objective. A similar notion of group representative fairness was developed by Abbasi et al. [103] for k-means and k-medians. Makarychev and Vakilian [104] generalized the social fairness problem, but here we present the k-means case as originally defined. In its current formulation, this fairness notion is algorithm specific, as it can only be used for center-based clustering.
Assume here also, without loss of generality, that there are m protected groups. Define the k-means clustering cost for a set of k cluster centers U and the input dataset X as

$$O(U, X) = \sum_{x \in X} \min_{u \in U} \|x - u\|^2.$$

Also, let $X_a$ denote the samples of X that belong to protected group $a \in [m]$. Then the social fairness cost for k-means clustering becomes:

$$\Phi(U, X) = \max_{a \in [m]} \frac{O(U, X_a)}{|X_a|}.$$

As the above notion is a cost, it needs to be minimized, unlike balance, which is maximized. That is, the lower the social fairness cost, the fairer the clustering.
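The social fairness cost can be evaluated directly for a fixed set of centers. This sketch assumes the cost is the largest per-group average k-means cost, i.e., the maximum over groups a of O(U, X_a)/|X_a|. Finding centers that minimize it is the hard part addressed in [33] and is not attempted here; the function names are hypothetical.

```python
def kmeans_cost(points, centers):
    """O(U, X): sum over points of the squared distance to the nearest center."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(x, u)) for u in centers) for x in points)


def social_fairness_cost(points, groups, centers):
    """Largest per-group average k-means cost: max_a O(U, X_a) / |X_a|."""
    by_group = {}
    for x, g in zip(points, groups):
        by_group.setdefault(g, []).append(x)
    return max(kmeans_cost(xs, centers) / len(xs) for xs in by_group.values())
```

Dividing by the group size is what prevents a large group from dominating the objective: a small group that is poorly served by the centers raises the cost even if the total k-means cost is low.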

3) BOUNDED REPRESENTATION
The notion of bounded representation was proposed by Ahmadian et al. [86]. It is a group-level notion and can be defined using two parameters, α and β. The fairness notion is defined through constraints that need to be imposed and met for each cluster obtained via the clustering algorithm. Let $P_{a,b}$ be the proportion of protected group $b \in [m]$ members in cluster $a \in [k]$. Then, for (α, β)-bounded representation we require that:

$$\beta \le P_{a,b} \le \alpha, \quad \forall a \in [k],\ \forall b \in [m].$$

Unlike the other notions discussed previously, this notion is defined as a set of constraints: if all the fairness constraints for each group and cluster are met, the clustering is fair. This notion of fairness can also be defined by considering only the upper bound (α) or only the lower bound (β) on the proportion of points. If α = β = 1/m, then the notion aims to represent each group in equal proportion in the clustering output. Bounded representation has also been used in conjunction with a number of clustering objectives [86], [105].
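Checking (α, β)-bounded representation for a given clustering reduces to checking every per-cluster group proportion, assuming the constraints take the form β ≤ P_{a,b} ≤ α. The function name is hypothetical, and only the groups that appear somewhere in the data are checked.

```python
from collections import Counter


def satisfies_bounded_representation(groups, labels, alpha, beta):
    """True if beta <= P_{a,b} <= alpha for every cluster a and protected group b,
    where P_{a,b} is the share of group b among the members of cluster a."""
    all_groups = set(groups)
    clusters = {}
    for g, lab in zip(groups, labels):
        clusters.setdefault(lab, []).append(g)
    for members in clusters.values():
        counts = Counter(members)
        for b in all_groups:
            share = counts[b] / len(members)
            if not (beta <= share <= alpha):
                return False
    return True
```

Setting `beta=0.0` recovers the upper-bound-only variant, and `alpha=1.0` the lower-bound-only variant.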

4) MAX FAIRNESS COST (MFC)
The MFC was defined by [32] for heuristic hierarchical agglomerative clustering algorithms. Despite this, it is an algorithm agnostic fairness notion, as it operates on only one level of the tree hierarchy, making it applicable to any clustering algorithm that outputs k clusters. It is also a group-level notion and requires an additional parameter named the ideal proportion ($I_b$), defined for each protected group $b \in [m]$. Here, $I_b$ is given by the user at run-time and can vary to account for different application requirements. Then, if the proportion of group $b \in [m]$ points in cluster $a \in [k]$ is given as $P_{a,b}$, the MFC is defined as:

$$\mathrm{MFC} = \max_{a \in [k]} \sum_{b \in [m]} \left| P_{a,b} - I_b \right|$$

The MFC is thus the maximum, over all clusters, of the summed deviations of each protected group's proportion from its ideal proportion. The lower the MFC, the better the fairness achieved by the clustering. If $I_b$ is set to 1/m, the fairness notion aims to ensure that each protected group is represented in equal proportion in each cluster.
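The MFC can likewise be computed from per-cluster group proportions. The following sketch assumes hard cluster labels, group indices 0 to m−1, and a user-supplied list `ideal` holding the ideal proportion of each group; the function name is hypothetical.

```python
from collections import Counter

def max_fairness_cost(labels, groups, ideal):
    """MFC = max over clusters a of sum_b |P[a][b] - ideal[b]| (lower is fairer)."""
    sizes = Counter(labels)                # number of points in each cluster a
    joint = Counter(zip(labels, groups))   # number of group-b points in cluster a
    return max(
        sum(abs(joint[(a, b)] / n_a - ideal_b) for b, ideal_b in enumerate(ideal))
        for a, n_a in sizes.items()
    )

# Both clusters split 2:1 between the groups, so each deviates by 1/6 + 1/6.
print(max_fairness_cost([0, 0, 0, 1, 1, 1], [0, 0, 1, 0, 1, 1], [0.5, 0.5]))
```

A perfectly balanced clustering with `ideal = [0.5, 0.5]` yields an MFC of 0.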

5) DISTRIBUTIONAL INDIVIDUAL FAIRNESS
This individual-level fairness notion was proposed by Anderson et al. [89]. Here, a fairness similarity measure $F: X \times X \to \mathbb{R}^{+}$ that operates on pairs of samples from the dataset X is assumed to be known. To ensure fairness, the statistical distance between the output distributions of each pair of samples, obtained using an f-divergence [106]- [108], should be no larger than the distance given by the F metric. The fairness notion is also algorithm specific, as it assumes cluster centers are available, limiting its applicability to center-based clustering. It also assumes probabilistic clustering (a setting such as Gaussian Mixture Model based soft clustering [109]) in the problem definition. This work extends the notion of individual fairness proposed for classification by [26].
Let U denote a set of k cluster centers. Also, let the f-divergence between the distributions $V_x$ and $V_y$ over U, for a pair of samples $(x, y) \in X \times X$, be denoted as $H_f(V_x \,\|\, V_y)$. Then the distributional individual fairness notion requires that the following holds for all pairs of dataset samples $(x, y) \in X \times X$:

$$H_f(V_x \,\|\, V_y) \le F(x, y).$$

Note that many f-divergences can be used here, such as the KL-divergence [110].
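A small sketch of checking this condition, using the KL-divergence as the f-divergence, follows. It assumes each sample's assignment distribution over the k centers is given as a probability vector and that F is supplied as a callable on sample indices; both representations are our assumptions. Because the KL-divergence is asymmetric, both orders of every pair are checked.

```python
from itertools import permutations
from math import log

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions over the same k centers."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return float("inf")  # p puts mass where q has none
            total += p_i * log(p_i / q_i)
    return total

def is_distributionally_fair(V, F):
    """V[i]: distribution of sample i over the centers; F(i, j): assumed-known bound."""
    return all(kl_divergence(V[i], V[j]) <= F(i, j)
               for i, j in permutations(range(len(V)), 2))

V = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]]
# Samples 0 and 1 are deemed very similar; sample 2 is allowed to differ more.
loose = lambda i, j: 0.1 if {i, j} == {0, 1} else 1.0
print(is_distributionally_fair(V, loose))  # True
```

Swapping `kl_divergence` for another f-divergence (for example, total variation distance) only requires replacing that one function.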

6) KLEINDESSNER et al. INDIVIDUAL FAIRNESS
This is another individual-level notion of fairness, proposed by Kleindessner et al. [87]. Unlike the previous individual-level notion, it operates on the clustering output $C = \{C_1, C_2, \ldots, C_k\}$ and hence is algorithm agnostic. For each sample x in the dataset X, let d be a well-defined dissimilarity function and $C_a$ be the cluster that x belongs to. Then the fairness notion of [87] can be defined as a set of constraints for the sample x and all clusters $b \in [k]$, $b \neq a$:

$$\frac{1}{|C_a| - 1} \sum_{y \in C_a} d(x, y) \le \frac{1}{|C_b|} \sum_{y \in C_b} d(x, y).$$

If all the above constraints are met for all individual samples in the dataset X, the clustering is deemed individually fair.
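These constraints can be checked with a direct O(n²) loop, sketched below. The sketch assumes d is supplied as a Python callable and, as a convention we adopt here, treats points in singleton clusters as trivially fair, since the left-hand average is undefined for them. The function name is hypothetical.

```python
def is_individually_fair(points, labels, d):
    """Check that each point's mean dissimilarity to the rest of its own cluster
    is at most its mean dissimilarity to the members of every other cluster."""
    clusters = {}
    for idx, a in enumerate(labels):
        clusters.setdefault(a, []).append(idx)
    for i, a in enumerate(labels):
        own = [j for j in clusters[a] if j != i]
        if not own:  # singleton cluster: treated as fair (our convention)
            continue
        own_avg = sum(d(points[i], points[j]) for j in own) / len(own)
        for b, members in clusters.items():
            if b == a:
                continue
            other_avg = sum(d(points[i], points[j]) for j in members) / len(members)
            if own_avg > other_avg:
                return False
    return True

d = lambda x, y: abs(x - y)  # 1-D absolute difference as the dissimilarity
print(is_individually_fair([0, 1, 10, 11], [0, 0, 1, 1], d))  # True
print(is_individually_fair([0, 1, 10, 11], [0, 1, 0, 1], d))  # False
```

In the second call, point 0 is on average 10 away from its own cluster but only 6 away from the other cluster, so it violates the constraint.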

7) ENTROPY
Entropy is a fairness metric that was defined in [111] and has been used exclusively in the context of fair deep clustering models. A distinction of deep clustering with respect to general clustering methods is that ground truth labels for each sample are known prior to training. Also, similar to balance, the higher the entropy, the fairer the model. Let $N_{a,b}$ be the set containing the samples of the dataset X that belong to both the cluster $a \in [k]$ and the protected group b. Further, let $n_a$ be the number of samples in cluster a. Then the entropy for protected group b is defined as follows:

$$\mathrm{Entropy}(N_b) = -\sum_{a \in [k]} \frac{|N_{a,b}|}{n_a} \log \frac{|N_{a,b}|}{n_a}$$
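A minimal sketch of computing this entropy for one protected group b follows, assuming hard cluster labels and per-sample group labels. It uses the usual convention 0 · log 0 = 0; the natural logarithm is an arbitrary choice here, and the function name is hypothetical.

```python
from collections import Counter
from math import log

def group_entropy(labels, groups, b):
    """-sum over clusters a of (|N_{a,b}| / n_a) * log(|N_{a,b}| / n_a)."""
    sizes = Counter(labels)                # n_a
    joint = Counter(zip(labels, groups))   # |N_{a,b}|
    total = 0.0
    for a, n_a in sizes.items():
        p = joint[(a, b)] / n_a
        if p > 0:  # 0 * log 0 is taken as 0
            total -= p * log(p)
    return total

# Group 0 makes up half of each cluster: entropy is 2 * (0.5 * log 2) = log 2.
print(group_entropy([0, 0, 1, 1], [0, 1, 0, 1], b=0))
```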

IV. APPROACHES FOR FAIR CLUSTERING
In this section, we comprehensively discuss research to date on fair clustering along two dimensions: 1) the clustering objective that the fairness intervention targets, and 2) the stage of the learning pipeline at which the intervention occurs (refer to Section II). In the first subsection, we summarize all fair clustering approaches by categorizing them based on the clustering objective they employ. This includes center-based clustering (such as k-means, k-center, and k-median), hierarchical clustering, spectral clustering, and deep clustering models. Since certain approaches are either more general or do not belong to any of the aforementioned clustering objectives, we also include a miscellaneous category. We find that the most common clustering objective considered by fair clustering approaches is center-based clustering; this imbalance is one direction in which future work can improve (Section VI).
In the second subsection, we categorize and discuss fair clustering approaches based on the stage of the clustering pipeline that the enforcement targets. In Section II we described the distinctions between the pre-processing, in-processing, and post-processing methodologies for general ML models, and we apply the same terminology to classify fair clustering approaches. It is important to note that for clustering, the learning pipeline differs slightly from that of traditional ML models, since the training and test datasets are the same. Therefore, in the second subsection we first describe the fairness intervention stages (pre-processing/in-processing/post-processing) in the clustering context and then discuss the categorization.

A. CLUSTERING OBJECTIVE

1) CENTER-BASED CLUSTERING
We now discuss research on making center-based clustering fair. Note that in the fair clustering literature (and in clustering generally), the terms k-median(s) and k-medoids are often used interchangeably to describe the latter of the two problems below. Technically, these clustering objectives are very different: k-median(s) refers to minimizing the sum of L1 distances, and cluster centers need not be exemplars (points in the original dataset), whereas for k-medoids the goal is to minimize the sum of pairwise dissimilarities defined using any distance metric, and centers must be exemplars. As in other related clustering work, we will refer to the latter case as k-medians, with the implicit assumption that cluster centers are exemplars. Any deviation from this objective will be stated explicitly to avoid ambiguity.
Group-Level Fairness: Chierichetti et al. [30] presented the first work on group-level fair clustering, specifically for the k-center and k-median clustering objectives in the case with only two protected groups. They introduced the fairness notion of balance, which we discussed previously. To balance the output clusters, they proposed the fairlet decomposition method. Fairlet decomposition is a pre-processing approach that computes fair micro-clusters (fairlets) where fairness is guaranteed. The fairlet centers are then used as a newly transformed dataset in place of the original. This transformed fairlet-based dataset is provided to vanilla clustering algorithms, and since each fairlet is itself fair, the resulting clustering outputs are fair as well. The fairlet decomposition approach is also visually described in Fig. 5. Note that fairlet decomposition can in principle be used with any fairness notion, but devising effective approaches for computing fairlets is not a trivial task in itself.
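To make the pipeline concrete, the sketch below runs a simplified fairlet decomposition for two protected groups of equal size, which yields (1, 1)-fairlets of balance 1. It is illustrative only: pairs are formed greedily by nearest neighbor rather than by an optimal min-cost matching, each fairlet is represented by its group-0 point, and Gonzalez's farthest-first traversal stands in for the vanilla k-center algorithm. All function names are hypothetical.

```python
from math import dist

def greedy_fairlets(points, groups):
    """Pair each group-0 point with its nearest unpaired group-1 point.
    Assumes exactly two groups of equal size (a greedy, non-optimal pairing)."""
    reds = [i for i, g in enumerate(groups) if g == 0]
    blues = [i for i, g in enumerate(groups) if g == 1]
    assert len(reds) == len(blues), "sketch assumes equal group sizes"
    fairlets, free = [], set(blues)
    for r in reds:
        b = min(free, key=lambda j: dist(points[r], points[j]))
        free.remove(b)
        fairlets.append((r, b))
    return fairlets

def farthest_first(reps, k):
    """Gonzalez's farthest-first traversal (a 2-approximation for k-center)."""
    centers = [reps[0]]
    while len(centers) < k:
        centers.append(max(reps, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def fairlet_clustering(points, groups, k):
    fairlets = greedy_fairlets(points, groups)
    reps = [points[r] for r, _ in fairlets]   # transformed dataset: one point per fairlet
    centers = farthest_first(reps, k)         # vanilla clustering on the fairlets
    labels = [None] * len(points)
    for (r, b), rep in zip(fairlets, reps):
        a = min(range(k), key=lambda c: dist(rep, centers[c]))
        labels[r] = labels[b] = a             # a fairlet always moves as a whole
    return labels

print(fairlet_clustering([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 1, 0, 1], k=2))  # [0, 0, 1, 1]
```

Because every fairlet is placed into a cluster as a whole, each resulting cluster contains equally many points from both groups regardless of which centers are chosen; the quality of the pairing only affects the clustering cost.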
Subsequently, Backurs et al. [99] improved the computational time complexity of fairlet decomposition by proposing a nearly-linear time scalable algorithm, but only for k-median clustering. Rösner and Schmidt [113] extended the fairness framework of [30] to allow for multiple protected groups and obtained a 14-approximation fair algorithm for the k-center objective.
Schmidt et al. [97] introduced coresets for fair k-means clustering, which allowed for a more scalable approach than fairlets and are also more applicable when random access to the dataset (which fairlet decomposition requires) might not be allowed. A coreset is essentially a summary of a given point set that approximates the cost function for any candidate solution; the fair coresets introduced in [97] aim to do this while also enforcing fairness for the case with two protected groups. Huang et al. [121] extended fair coresets to k-median clustering and removed the dependence on dimension for fair coreset generation in the case of k-means. Further, their approach works for multiple disjoint protected groups. Bandyapadhyay et al. [122] proposed the first Fixed-Parameter Tractable (FPT) time constant-factor approximation algorithms for k-median and k-means while removing the dimension dependency for coreset generation. We visually describe the fair coreset approach in Fig. 5.
A number of papers expanded upon the original fairness notion of balance [30] by introducing upper and/or lower bounds on protected group membership in clusters, previously referred to as the bounded representation notion. Ahmadian et al. [86] used only an upper bound constraint on protected group representation in clusters for fair k-center with multiple protected groups. Bera et al. [85] and Bercea et al. [98] provided approaches for more general clustering objectives that use upper and lower bound constraints on the proportion of protected group members in each cluster. The algorithm of [85] allows groups to overlap (for example, when considering both race and gender), and the authors denote by Δ the number of protected groups a sample can belong to simultaneously. They proposed a linear program based rounding approach that achieves a (c + 2)-approximation if the original clustering objective has a c-approximation algorithm available, while incurring additive violations of at most 4Δ + 3 to the upper and lower bound fairness constraints.
Specifically for k-center, [85] obtained a 5-approximation when centers need not be exemplars, and a 4-approximation when centers are exemplars. Harb and Shan [123] improved upon these fair k-center results of [85] by developing a faster 5-approximation algorithm for the non-exemplar case, and a better 3-approximation algorithm for the case with centers as exemplars. Jia et al. [120] proposed a 3-approximation algorithm for the k-center objective that allowed for multiple groups or colors. Esmaeili et al. [118] proposed approximation algorithms in the general setting where points are allowed to have uncertain protected group membership (that is, protected group memberships are provided as a distribution), and a sample in the dataset is assumed to only belong to one protected group at a time.
Liu and Vicente [114] introduced a stochastic approach that solves a bi-objective optimization problem and shows the trade-off between the k-means clustering objective and fairness. Their algorithm was only guaranteed to converge for smoothed problems. Esmaeili et al. [126] generalized the clustering cost/fairness problem for k-center, k-median, and k-means and introduced new group-level fairness notions. They developed bi-criteria approximation algorithms for each notion.
Kleindessner et al. [128] proposed an approach to compute fair summaries for group-level fair clustering which uses k-center prototypes to summarize each group in a dataset. They provide a linear time approximation algorithm for this problem. Chiplunkar et al. [129] proposed improved distributed algorithms for the aforementioned fair summaries notion in the streaming setting. Jones et al. [130] proposed an algorithm that runs in linear time and yet achieves a 3-approximation for the fair k-center summaries problem.
Ghadiri et al. [33] introduced the socially fair notion, which focuses on minimizing clustering cost across groups rather than constraining the proportion of protected groups in clusters. Concurrently with [33], Abbasi et al. [103] independently introduced a similar notion of group representation. Makarychev and Vakilian [104] presented a generalized bi-criteria approximation algorithm and generalized the socially fair clustering problem framework. Goyal and Jaiswal [124] developed an FPT time approximation algorithm for the socially fair notion. Thejaswi et al. [125] introduced a new notion of diversity-aware fairness, which requires each group to have some minimum representation among the cluster centers, for the k-median objective.
Individual-Level Fairness: Chen et al. [31] introduced the individual-level fairness notion of proportionality for k-center clustering, which seeks to ensure that points are treated equally, an important concern especially for facility placement. They showed that exactly proportionally fair solutions might not always exist and provided an algorithm that, in the worst case, achieves a (1 + √2)-proportionally fair clustering solution. They also developed an approach that is O(1)-proportionally fair while also achieving an O(1) approximation to the k-median objective of the optimal proportionally fair solution. Micha and Shah [93] modified Chen et al.'s approach, developed a 2-approximation algorithm when the distance metric is the L2 norm, and proved that the 1 + √2 factor is tight for other commonly used distance metrics such as the L1 and L∞ norms.
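Under the definition of Chen et al. [31] as we understand it, a solution is ρ-proportional if no candidate center y is preferred by a coalition of at least ⌈n/k⌉ points that would each be more than a factor ρ closer to y than to their nearest chosen center. Violations can therefore be found by brute force over the candidate locations. The sketch below is our own illustration of that check, not an algorithm from [31] or [93]; function and parameter names are hypothetical.

```python
from math import ceil, dist

def proportionality_violations(points, candidates, centers, rho=1.0):
    """Return candidates y for which at least ceil(n/k) points satisfy
    rho * d(i, y) < d(i, nearest chosen center)."""
    n, k = len(points), len(centers)
    threshold = ceil(n / k)
    current = [min(dist(p, c) for c in centers) for p in points]
    violations = []
    for y in candidates:
        coalition = sum(1 for p, d_cur in zip(points, current) if rho * dist(p, y) < d_cur)
        if coalition >= threshold:
            violations.append(y)
    return violations

pts = [(0,), (1,), (2,), (100,), (101,), (102,)]
print(proportionality_violations(pts, pts, [(1,), (101,)]))  # []
```

Placing both centers at (1,) and (2,) instead leaves the three right-hand points underserved, and every right-hand candidate location is then reported as a violation.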
Jung et al. [88] introduced an individual-level notion that determines a fair radius for clusters, as defined previously (Table 1), for center-based clustering objectives. They developed an algorithm that achieves a 2-approximate fair k-center clustering, meaning that every point p has a center within a distance of 2r(p), where r(p) is defined as in Table 1. From here on, we denote bi-criteria approximation results for the fairness notion and the clustering objective using the (·, ·) notation. Mahabadi and Vakilian [34] confirmed Jung et al.'s results and generalized the problem, obtaining (O(1), O(1)) bi-criteria approximations for fair k-median and k-means clustering and an (O(1), O(log n)) bi-criteria approximation for k-center. Vakilian and Yalçıner [92] improved upon the fair k-center result of [34], improving the bi-criteria approximation from (7, O(log n)) to (3, O(1)). Additionally, they provided improved bi-criteria approximations (compared to [34]) for the k-means and k-median objectives. Chakrabarty and Negahbani [91] also provided improved algorithms for individually fair clustering according to Jung et al.'s notion, achieving (8, 8) and (8, 4) bi-criteria approximations via linear program rounding for k-median and k-means clustering, respectively.
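The fair radius r(p) of Jung et al. [88], as we understand it, is the smallest radius such that the ball around p contains at least n/k points. The sketch below computes it by brute force and checks whether a set of centers is α-fair, that is, whether every point p has a center within α · r(p) (α = 2 corresponds to the guarantee above). It assumes Euclidean points given as tuples and counts p itself inside its own ball; the function names are hypothetical.

```python
from math import ceil, dist

def fair_radius(points, k):
    """r(p): smallest r such that the closed ball of radius r around p
    contains at least ceil(n/k) points (p itself included)."""
    m = ceil(len(points) / k)
    return [sorted(dist(p, q) for q in points)[m - 1] for p in points]

def is_alpha_fair(points, centers, k, alpha=1.0):
    """True iff every point p has some center within alpha * r(p)."""
    r = fair_radius(points, k)
    return all(min(dist(p, c) for c in centers) <= alpha * r_p
               for p, r_p in zip(points, r))

pts = [(0,), (1,), (10,), (11,)]
print(fair_radius(pts, k=2))                      # [1.0, 1.0, 1.0, 1.0]
print(is_alpha_fair(pts, [(0,), (10,)], k=2))     # True
```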
We also discuss some other work on center-based individually fair and group-level fair clustering that have recently been studied. Kleindessner et al. [87] introduced another individual fairness notion using a dissimilarity function that requires points be closer to points of their cluster than those of other clusters. Anderson et al. [89] developed fair algorithms that ensure distributional individual fairness so that similar individuals are clustered similarly. Brubach et al. [94] introduce two new individual fairness notions and present an algorithm for the k-means objective. More recently, Chakrabarti et al. [90] proposed an individual fairness notion that ensures points receive similar quality of service and provided algorithms for the k-center objective. Abraham et al. [127] introduced a fair k-means clustering algorithm for a new group-level fairness notion that is enforced at the in-processing stage of the clustering pipeline.

2) HIERARCHICAL CLUSTERING
Ahmadian et al. [105] and Chhabra and Mohapatra [32] concurrently proposed approaches for fair hierarchical clustering. However, the two approaches differ in several respects. Ahmadian et al. [105] proposed a fairlet decomposition approach for only (upper-bounded) bounded representation fairness, for a number of recently proposed hierarchical clustering objectives such as Dasgupta's cost [39], value [45], and revenue [46]. Because it relies on fairlet decomposition, their work constitutes a pre-processing approach. Chhabra and Mohapatra [32], on the other hand, proposed an in-processing algorithm for heuristic greedy hierarchical clustering algorithms which can accommodate any notion of fairness. Their work does not consider the newly proposed hierarchical clustering objectives such as [39] but instead focuses on the traditional heuristic hierarchical agglomerative clustering used in practice. Quy et al. [117] utilized fairlet decomposition to make capacitated clustering (where clusters have size constraints) fair. They considered both hierarchical agglomerative clustering (heuristic and greedy, similar to [32]) and partition-based clustering algorithms when improving fairness. Furthermore, as the capacitated clustering problem is relevant in educational settings (clusters of students need both fair representation and approximately fixed sizes), they evaluated their approaches on data from school students.

3) SPECTRAL CLUSTERING
Kleindessner et al. [100] added fairness constraints (for the balance fairness notion) to normalized and unnormalized spectral clustering. They project the graph Laplacian onto a fair subspace and then perform k-means clustering on this subspace. They also analyzed their approach under a variant of the stochastic block model. Anagnostopoulos et al. [131], [132] extended the work of [100] to the densest subgraph problem.
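The projection step can be sketched in a few lines of NumPy for the unnormalized case. The sketch follows the construction as we understand it from [100]: each group except one contributes a constraint column (the group indicator minus the group's share of n), the Laplacian is restricted to the null space of these constraints, and the k smallest eigenvectors of the restricted Laplacian are mapped back. Function and variable names are ours; at least two groups are assumed, and the final k-means step on the rows of the returned matrix is omitted.

```python
import numpy as np

def fair_spectral_embedding(W, groups, k):
    """W: symmetric (n x n) adjacency matrix; groups: length-n array of group ids.
    Returns H (n x k); k-means on the rows of H would give the cluster labels."""
    L = np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
    ids = np.unique(groups)
    # Column for group s: indicator of s minus (|V_s| / n), for all groups but the last.
    F = np.stack([(groups == s).astype(float) - np.mean(groups == s) for s in ids[:-1]],
                 axis=1)
    # Orthonormal basis Z of the null space of F^T (the "fair" subspace).
    _, S, Vt = np.linalg.svd(F.T)
    rank = int(np.sum(S > 1e-10))
    Z = Vt[rank:].T
    # eigh returns eigenvalues in ascending order, so the first k columns are the smallest.
    _, Y = np.linalg.eigh(Z.T @ L @ Z)
    return Z @ Y[:, :k]

# Two 4-node cliques; group membership alternates within each clique.
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
groups = np.array([0, 1, 0, 1, 0, 1, 0, 1])
H = fair_spectral_embedding(W, groups, k=2)
print(H.shape)  # (8, 2)
```

Every column of the returned H is orthogonal to each constraint column, which is the relaxed form of the balance requirement that the subsequent k-means step operates on.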

4) DEEP CLUSTERING
The first work combining deep clustering with fairness was proposed by Wang and Davidson [102]; they introduced fairoids to represent each group and ensured centers are equally spaced from the fairoid via a discriminative deep clustering model. Fairoids allow for non-binary valued protected groups. Li et al. [111] developed a scalable deep clustering model that used an adversarial loss to constrain learning and ensure fairness while maintaining cluster quality. They were the first to apply deep fair clustering to visual datasets. Zhang and Davidson [101] generalized the fairness constraints for deep clustering and developed a model that allowed for multiple protected groups and flexible constraints.

5) MISCELLANEOUS
Ziko et al. [115] developed a general variational bound-optimization framework for fair clustering. They introduce a fairness penalty term based on Kullback-Leibler (KL) divergence. The fairness penalty is used to measure and manage the trade-off between the clustering objective and fairness. Furthermore, their approach is scalable and works for large datasets.
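A KL-based fairness penalty of this kind can be sketched as follows. The sketch assumes soft assignments S (each row summing to one), computes for each cluster the share of its mass that comes from each group, and sums the KL-divergences between target proportions U (by default in this sketch, the dataset-level group proportions) and these shares. This reflects our reading of the general form of the penalty in [115], not the authors' implementation; the bound-optimization updates themselves are not shown, and the function name is hypothetical.

```python
from math import log

def kl_fairness_penalty(S, groups, m, target=None):
    """Sum over clusters a of KL(U || P_a), where P_a[b] is the share of
    cluster a's soft mass that comes from group b (groups indexed 0..m-1)."""
    n, k = len(S), len(S[0])
    if target is None:  # default U: group proportions in the whole dataset
        target = [sum(1 for g in groups if g == b) / n for b in range(m)]
    penalty = 0.0
    for a in range(k):
        mass = sum(S[i][a] for i in range(n))  # assumed > 0 for every cluster
        for b in range(m):
            if target[b] == 0:
                continue  # 0 * log(0 / p) contributes nothing
            p_ab = sum(S[i][a] for i in range(n) if groups[i] == b) / mass
            if p_ab == 0:
                return float("inf")  # a required group is absent from cluster a
            penalty += target[b] * log(target[b] / p_ab)
    return penalty

S_hard = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(kl_fairness_penalty(S_hard, [0, 1, 0, 1], m=2))  # 0.0 (each cluster matches U)
```

In a full method, a weighted version of this penalty would be added to the clustering objective, with the weight controlling the trade-off described above.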
For the graph-based correlation clustering objective, Ahmadian et al. [119] utilized the fairlet decomposition method. They achieve promising results for a number of different fairness constraints and find that by defining the fairlet decomposition similar to the k-median cost they obtain good approximations for fair correlation clustering.
Chhabra et al. [116] introduced the pre-clustering approach of adding antidote data points to the original dataset to improve group-level fairness. Antidote data points are dummy points that do not belong to a protected group, but when vanilla clustering is undertaken on the new dataset, the solution is more fair with respect to the original points. Their approach is general and can accommodate any fairness notions and clustering objectives. They also consider other problem settings for this work, such as in the case where clustering objectives and fairness notions are convex functions. The antidote data approach for fair clustering is visually described in Fig. 5.
While we restrict ourselves to the study of fairness in clustering algorithms, there are other related fields where fairness can be studied, such as link prediction in complex networks [133]- [136]. While an in-depth discussion of such approaches is outside the scope of this work, clustering is inherently connected to many other fields, where similar ideas of fairness can be applied.

B. PRE-PROCESSING, IN-PROCESSING, AND POST-PROCESSING APPROACHES
As mentioned before in Section II, fair approaches can be broadly classified depending on what stage of the learning pipeline the fairness is enforced in. In particular, for clustering, the same classification holds, albeit with some slight differences.
For pre-processing (or pre-clustering) based fair approaches, the fairness intervention occurs before the learning model is trained. In clustering, this means that the original dataset X is first pre-processed and transformed into some dataset $X'$. When the vanilla clustering algorithm $A$ is invoked on this transformed dataset, the resulting clusters $C_{\text{fair}}$ are fair. A schematic diagram explaining this process is shown in Fig. 6.
For in-processing (or in-clustering) based fair approaches, the fairness intervention changes the original learning model so that it outputs only fair solutions. This is where the bulk of fair clustering approaches lie. Here, the vanilla clustering algorithm $A$ itself is modified into a fair clustering algorithm $A'$ that incorporates fairness constraints, yielding the fair solution $C_{\text{fair}}$. The corresponding schematic is shown in Fig. 7.
Post-processing (or post-clustering) based fair approaches enforce fairness after the learning model has computed initial, unfair estimates. In clustering, this means that the fairness intervention occurs after the vanilla clustering process: the vanilla clustering algorithm $A$ is run on the original dataset X to obtain unfair cluster solutions C, and the fairness approach then operates on C to obtain fair clustering outputs $C_{\text{fair}}$. Many research works also fall into this category. The corresponding schematic is shown in Fig. 8.
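The three stages differ only in where the fairness intervention sits relative to the vanilla algorithm A. The schematic sketch below writes them as Python functions; the callables `transform`, `lift`, `A_fair`, and `repair` are hypothetical placeholders for a concrete method (for instance, fairlet computation, a fair clustering algorithm, or a reassignment step), not APIs from any of the surveyed works.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Labels = List[int]
Clusterer = Callable[[Sequence[Point], int], Labels]  # (X, k) -> one label per point

def pre_processing(A: Clusterer, transform, X, k) -> Labels:
    """Intervene on the data: build X' (plus a map back to X), then run vanilla A on X'."""
    X_prime, lift = transform(X)   # e.g. fairlet centers, a fair coreset, X plus antidote points
    return lift(A(X_prime, k))     # express the clusters of X' as labels for the points of X

def in_processing(A_fair: Clusterer, X, k) -> Labels:
    """Intervene on the algorithm: A' optimizes clustering cost and fairness together."""
    return A_fair(X, k)

def post_processing(A: Clusterer, repair, X, groups, k) -> Labels:
    """Intervene on the output: run vanilla A on X, then repair C into C_fair."""
    C = A(X, k)
    return repair(C, X, groups)    # e.g. reassign points to satisfy bounded representation
```

For antidote data [116], for example, `transform` would return X with the added points appended, and `lift` would keep only the labels of the first len(X) points.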
We now discuss fair clustering research under this classification. Furthermore, Table 2 showcases this categorization for most of the major fair clustering papers.

1) PRE-PROCESSING APPROACHES
The concept of fairlet decomposition [30], which was used in the first work on fair clustering, constitutes a pre-processing based approach. As discussed before, fairlet decomposition aims to find fairlets (or micro-clusters) within the data that meet fairness requirements. Vanilla clustering is then employed on this data, leading to fair solutions. Many fair clustering works that expand upon or utilize fairlets fall into the pre-processing category: [99], [105], [113], [119]. Fair coresets are also fair representations of the dataset that summarize the data points to ensure fairness in a more scalable manner. Introduced by [97], fair coresets were used in [121] and [122]. The antidote data approach for fair clustering [116] described before is also relevant here, as it pre-processes and augments the original dataset. High-level diagrams of these pre-processing based approaches are shown as part of Fig. 5.

2) IN-PROCESSING APPROACHES
In-processing approaches to fair clustering alter the clustering objective and the algorithm itself. Often the fair algorithm optimizes the trade-off between clustering cost and fairness. Papers such as [114], [127], and [115] augmented the original algorithms with functions that measure and control the trade-off between fairness and clustering performance. In [100], the authors similarly adjust the spectral clustering objective to solve a minimization problem that incorporates fairness constraints. In [125], the authors developed a k-median algorithm specifically for diversity-aware fairness. In [101], [102], and [111], the authors constrained the deep clustering process itself, optimizing the trade-off between cluster quality and fairness through joint optimization, adversarial learning, or other similar approaches.
The works [31] and [93] also alter the clustering objectives to find individual-level proportionally fair solutions. Finally, the papers [88]- [91] and [94] redefine the clustering objectives to make them individually fair according to the fairness notion first proposed by Jung et al. [88].

3) POST-PROCESSING APPROACHES
Post-processing involves modifying the clustering outputs to be fair. A vanilla clustering algorithm is first employed, and then either a separate fair problem is solved or the vanilla output is adjusted, depending on the fairness notion. Unlike in-processing methods, the clustering algorithm itself does not jointly optimize the clustering cost and the fairness objective. Examples of post-processing approaches include those used for fair k-center summaries; these post-process the clustering centers such that every group is represented equitably among the centers. This line of work was first introduced by [128] and later extended by [129] and [130]. The authors in [113] use an algorithm to maintain fairness and privacy after first finding a non-private solution using vanilla clustering algorithms, which also constitutes a post-processing approach. Similarly, [85], [86], [98], [123], [34], [92], [118], [126], and [87] solve the vanilla clustering problem first and then improve fairness by proposing algorithms that change the cluster assignments of points. Hence, these also constitute post-processing based fair clustering approaches.
Another post-clustering based work was by [137]. Here, the authors take as input the cluster output from a vanilla clustering algorithm, and compute a clustering close to the original, but one that meets fairness requirements. They formulate the problem as an integer linear program, and also provide theoretical results on hardness.

V. EVALUATING FAIR CLUSTERING
In this section, we discuss fair clustering research along two facets: the datasets that are generally used for evaluation, and the motivations provided by some real-world applications. The goal here is to allow researchers to select suitable datasets for evaluation based on prior research, and also to provide them with real-world use cases. These real-world scenarios can then be used for motivating theoretical problems in fair clustering, or for undertaking fair clustering research with a more practical flavor.

A. DATASETS USED FOR EVALUATION
The approaches discussed in Section IV propose different methods of creating fair clustering models using different notions. The next phase is to evaluate the approach by applying it to actual data. Datasets used vary widely from paper to paper depending on the notion and overall goal, but some datasets are used more frequently than others and can be used to compare between approaches.
To serve as a guide for researchers new to the field, the datasets used in over 40 papers on fair clustering were collected in Table 3 (classical clustering approaches) and Table 4 (deep clustering models). The most common datasets used for traditional fair clustering are listed at the top of Table 3: adult [138], bank [139], creditcard [82], diabetes [140], and census [141], all of which are large datasets from the UCI ML repository [167]. Table 3 includes the name and label of each dataset, a short description, and the source paper.
Most other datasets can also be found in the UCI repository. Further, the possible protected groups (such as ethnicity) that have been used in the surveyed papers are listed, along with the dataset size. We term a dataset with over 10,000 instances large. Note that some papers, such as [30], [85], [113], and [118], opt to use subsets of the datasets because their algorithms do not scale well or their running times are too long. For completeness, we also list all surveyed papers that use a given dataset in the last column of the aforementioned tables.
Datasets are sometimes chosen specifically for the approach and fairness notion being proposed. For example, [103] uses North Carolina voter information when proposing their group representation notion for facility location. Other datasets such as bank [139] and creditcard [82], with common protected groups being marital status and gender, also have fairly clear connections to the motivations behind fair clustering. Other datasets, such as iris [145], are less directly connected but can still serve as toy datasets for experimentation. We also find that the most common protected groups are gender/sex and race. Datasets listed without specific protected groups are used in papers enforcing individual-level fairness notions and therefore do not require a specific protected group.
Visual datasets are often used for deep clustering; these are listed in Table 4. Deep clustering models, as mentioned in Section IV, differ from more traditional approaches and can learn more powerful representations. In Table 4, the datasets are described and the protected group is listed.

B. REAL-WORLD APPLICATIONS
Machine learning models have been used to assist in a wide range of decision-making and risk assessment processes, from college admissions to online recommendation systems. For further information on the topic, Makhlouf et al. [80] discuss general applications of ML in decision-making processes and some existing programs where fairness should be considered. Suresh and Guttag [168] additionally show how ML models can have unintentional, damaging consequences if bias is not considered throughout the ML pipeline. Thus, in this section, we use real-world applications of fair clustering models to motivate further research in the field.

1) BANK LOAN DISBURSEMENT
We described a similar scenario previously in Section III for group-level notions. Clustering-based models can be used to determine which individuals should receive a loan based on how likely they are to default on it. Many factors can play a role and are often considered before disbursement, such as an applicant's education history, past payment history, past billing statements, amount of the bill paid, and age. Members of certain minority protected groups, such as women or POC, might have lower incomes due to systemic issues such as the wage gap. Furthermore, married persons might have better credit than single persons. Vanilla clustering algorithms that shortlist candidates for disbursement in an unsupervised manner, without correcting for the different sorts of bias present in the data, will likely cluster single people, women, and POC as higher risk and as more likely to default on their loans. Such predictions might result in fewer loans, or loans with higher interest rates, being given to protected groups, further reinforcing the systemic issues at hand. A well-designed fair clustering algorithm could correct for this disparate impact by requiring balance or bounded representation, which approximately fix the proportion of each protected group in every cluster.

2) JOB SHORTLISTING
Many ML based approaches exist that parse through job candidates in order to shortlist those who should be interviewed or move on to the next application step [9]. Automating this step can reduce errors, human bias, and time spent parsing applications, and allows for easy comparison between candidates [13]. Clustering algorithms can separate accepted from rejected candidates for shortlisting based on their skill sets and other attributes, and on how well they match the job requirements. Common candidate attributes include education, major, experience, skills, current location, current employment status, age, gender, etc. [169]. Clustering algorithms that do not account for bias might reject POC or women and accept less qualified white men [80]. A fair clustering algorithm that requires, for example, balance for the sensitive group gender would fix the proportion of women in each cluster, moving the most qualified women from the rejected cluster to the accepted one to account for the bias. The company benefits by seeing more qualified individuals, and the applicants are not discriminated against by being rejected based on their inherent attributes.

3) COLLEGE ADMISSIONS
Clustering based ML models can be used to shortlist candidates for admission, remove definite rejects for college applications, or select those most likely to attend. Attributes considered might include GPA, leadership roles, parents' education levels, and general student information. Algorithms with unchecked bias might reject candidates based on factors that are unrelated to the candidate's ability, such as their street address [80], which can correlate to other attributes such as their socioeconomic background or race. Fair clustering algorithms that ensure individual-level fairness (Section III) could prevent individuals with approximately similar grades or leadership roles from being clustered differently based on unrelated attributes such as ethnicity.

4) FACILITY LOCATION
ML models can assist in facility location, for example in deciding where to place voting/polling booths or hospitals. As mentioned in Section III, regular clustering models that only consider the number of homes in an area might distribute facilities unfairly among suburban, urban, and rural areas. Fair clustering models should account for the conditions of an area by imposing additional constraints, such as proportionality. Depending on the purpose of the facility, proportionality could ensure that areas are served equally [31]. Another notion, group representation, could ensure that cluster centers (facilities) are fairly placed, so that the centers are representative of their clusters or each area gets its own center [103]. This could help ensure that polling centers serve their areas similarly and lie within a reasonable distance of most sample locations.
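In the spirit of the proportionality notion of [31], a set of k facilities with n served points can be read as unfair if some candidate location y exists such that at least ceil(n/k) points are each strictly closer to y than to their nearest open facility. Such a group of points forms a blocking coalition. The sketch below checks a finite, caller-supplied list of candidate locations, which is an assumption made for simplicity. The home coordinates are hypothetical.

```python
import math


def proportionality_violations(points, centers, candidates):
    """Return the candidate locations that a blocking coalition could deviate to:
    at least ceil(n / k) points each strictly closer to y than to their nearest center."""
    n, k = len(points), len(centers)
    threshold = math.ceil(n / k)
    nearest = [min(math.dist(p, c) for c in centers) for p in points]
    blocking = []
    for y in candidates:
        coalition = sum(math.dist(p, y) < d for p, d in zip(points, nearest))
        if coalition >= threshold:
            blocking.append(y)
    return blocking


urban = [(0, 0), (0, 1), (1, 0), (1, 1)]
rural = [(10, 10), (10, 11), (11, 10), (11, 11)]
homes = urban + rural
# Both facilities placed in the urban area: the rural homes form a blocking coalition.
print(proportionality_violations(homes, [(0, 0), (1, 1)], [(10, 10), (0.5, 0.5)]))
```

Moving one facility into the rural area, for example using centers (0.5, 0.5) and (10.5, 10.5), removes both blocking candidates in this toy instance.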

5) PRISONER RECIDIVISM
ML models have been used to predict the risk/likelihood of ex-convicts re-offending, in order to offset human bias on factors such as race [170]. Recidivism risk can be interpreted as a probability and could be estimated by a soft clustering algorithm, in which each point is assigned a degree of membership in each cluster, with the clusters signifying either high or low risk of re-offending. A number of factors can assist in predicting recidivism, including age and number of prior convictions [170]. However, as was found with the COMPAS tool [18], since the data used to train such algorithms might be systemically biased, the learning model could amplify bias against POC based solely on their race [79]. In such a case, a well-designed fair clustering algorithm that ensures individual fairness, i.e., that similar individuals (in terms of crimes committed and other attributes) are clustered similarly regardless of sex or race [89], would prevent members of minority protected groups from being assigned disproportionately higher risk than non-members with similar crime statistics.

6) RECOMMENDATION SYSTEMS
Clustering-based recommendation systems have been used for many purposes, from movie recommendation [171] to recommending distance-learning courses [172]. Since clustering algorithms can inherit bias from the data, these recommendation systems can be biased as well. This could mean, for instance, skewing recommendations in favor of men over women [80]. Recommendation systems should therefore be personalized for individuals, and should not be explicitly biased by gender or ethnicity. A clustering algorithm that ensures some level of individual-level fairness could prevent certain groups from automatically receiving certain recommendations regardless of their other attributes.

7) COMMITTEE SELECTION
A final example, also presented in [125], is selecting committees that represent each group in a population. Committees are formed within various communities for political, educational, and fundraising purposes, among others. The goal might be a committee with at least one representative of each group, or a diverse committee in which every group is well represented by multiple members. A fair clustering algorithm could ensure that protected groups are well represented, irrespective of individuals' ethnicity or political bias, using notions such as diversity-aware fairness [125], group representation [103], or fair summaries [128].
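The requirement of "at least a given number of representatives per group" can be checked directly on a proposed committee. The sketch below allows a person to belong to several groups at once, as in diversity-aware settings. The data layout, the names, and the per-group minimums are hypothetical. The code only checks a selection and does not implement the selection algorithm of [125].

```python
from collections import Counter


def meets_diversity_requirements(selected, group_of, required):
    """True if the selected members include at least required[g] people from
    every group g (a person may count toward several groups)."""
    counts = Counter(g for person in selected for g in group_of[person])
    return all(counts[g] >= r for g, r in required.items())


group_of = {  # hypothetical community members and their group memberships
    "ana": {"women", "district1"},
    "ben": {"men", "district1"},
    "chen": {"women", "district2"},
}
required = {"women": 1, "men": 1, "district2": 1}
print(meets_diversity_requirements(["ana", "ben"], group_of, required))
```

Here the committee {ana, ben} fails because district2 is unrepresented, while {ben, chen} covers every required group.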

VI. FUTURE RESEARCH DIRECTIONS AND OPEN CHALLENGES
A. CONSIDERING ALTERNATIVE CLUSTERING OBJECTIVES
As we have seen throughout the article, and especially in Section IV, most research on fair clustering considers center-based clustering objectives (such as k-center and k-median), while a few works consider hierarchical and spectral clustering. However, a number of other clustering algorithms and objectives have not yet been studied from a fairness perspective. We outline research directions in this regard for density-based clustering and soft clustering. Since in-clustering approaches are the most popular, we focus on them when sketching a first approach.

1) DENSITY-BASED CLUSTERING
Density-based clustering algorithms use the concept of density, i.e., how close points are to each other in space, to assign points to clusters, and they label points in low-density regions as noise or outliers. A number of approaches perform density-based clustering. As a first step, popular algorithms such as DBSCAN [47] and OPTICS [48] could be considered. Research frameworks can then be extended to other density-based approaches such as PreDeCon [173] and SUBCLU [174], since these share similarities with DBSCAN.
In general, one can consider the following in-clustering approach to improving fairness for these algorithms. First, identify a clustering objective based on the characteristics of the algorithm and the application scenario. This objective is what eventually allows theoretical fairness guarantees to be given. Next, decide how the fairness constraint is enforced, depending on its suitability to the application scenario. For example, if balance is being considered, one can lower-bound or upper-bound it; if the proportion of points from each group is being considered, bounded representation can be used. Then, approximation algorithms for the constrained objective can be proposed. The approximation ratio is the ratio between the cost that the fair approximation algorithm achieves on the objective and the optimal value of that objective. One can also measure how much the fair assignment of points distorts the solution relative to the original objective. Lastly, the proposed approach can be evaluated on real-world datasets (as discussed in Section V), and the fairness improvements can be analyzed.
There are other prospective research challenges associated with this problem. Because most research so far has looked at center-based clustering, existing fairness definitions were likely designed with it in mind. Thus, depending on the clustering algorithm being analyzed, alternative fairness notions can be developed and studied. For example, DBSCAN labels certain points as outliers (noise points) while clustering. This might call for different notions of fairness, since these points are not represented by any cluster at all. Another prospective direction is to study how data points are assigned to protected groups. As a first step, the two-group case can be studied as in the seminal work of [30]. Future work can then include multiple groups, with each point belonging to exactly one protected group. Subsequently, settings in which points can belong to several protected groups at the same time can be analyzed. Finally, running times can also be improved. Naive first approaches to providing fairness for the aforementioned clustering algorithms may be slow, so any practical implementation will require improving the asymptotic time complexity of their fair variants.
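To make the concern about noise points concrete, one simple quantity to inspect is the fraction of each protected group that a density-based algorithm leaves unclustered. A large gap between groups would mean one group is disproportionately excluded from every cluster. The sketch assumes that labels follow scikit-learn's DBSCAN convention, where -1 marks noise. Comparing per-group noise rates is our own illustrative diagnostic, not an established fairness notion.

```python
from collections import Counter


def noise_rate_by_group(labels, groups, noise_label=-1):
    """Fraction of each protected group labeled as noise by a density-based
    clusterer (scikit-learn's DBSCAN uses -1 for noise points)."""
    totals = Counter(groups)
    noisy = Counter(g for lab, g in zip(labels, groups) if lab == noise_label)
    return {g: noisy[g] / totals[g] for g in totals}


# Hypothetical DBSCAN output: group "b" is dropped as noise twice as often as "a".
labels = [0, 0, 0, 1, 1, -1, -1, -1]
groups = ["a", "a", "b", "a", "b", "b", "b", "a"]
rates = noise_rate_by_group(labels, groups)
```

Here `rates` is {"a": 0.25, "b": 0.5}, so half of group b is not represented by any cluster at all.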

2) SOFT CLUSTERING
As mentioned before, much of the discussion and existing work has focused on hard clustering algorithms, where a data point belongs to a cluster in a binary fashion: it either belongs to a cluster or it does not. However, in certain application scenarios, soft clustering is more suitable. Gaussian mixture models [109] have been widely used in such cases and could thus be the initial focus of this research direction. To estimate clustering results in a Gaussian mixture model, an expectation-maximization (EM) algorithm [175] is often used. EM is an iterative method for finding (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved latent variables. A new research direction therefore involves studying the fairness of such algorithms. A first approach and initial objectives could be similar to those discussed in the previous subsection on density-based clustering. One key issue is to redefine fairness for soft clustering so that it reflects its probabilistic nature.
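To illustrate what a probabilistic fairness quantity might look like, the sketch below runs EM for a one-dimensional Gaussian mixture. It then computes, for each soft cluster, the expected share of a protected group: the group's total responsibility mass divided by the cluster's total mass. This "expected share" is a hypothetical soft analogue of group proportions, assumed here for illustration. The initial means, the number of iterations, and the variance floor are also assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, weights, means, variances):
    """Responsibilities r[i][j] = P(component j | x_i) under current parameters."""
    resp = []
    for x in xs:
        dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    return resp


def m_step(xs, resp):
    """Re-estimate mixture weights, means and variances from responsibilities."""
    n, k = len(xs), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nj
        weights.append(nj / n)
        means.append(mean)
        variances.append(max(var, 1e-6))  # floor keeps a component from collapsing
    return weights, means, variances


def em(xs, means, variances, iters=20):
    weights = [1.0 / len(means)] * len(means)
    for _ in range(iters):
        resp = e_step(xs, weights, means, variances)
        weights, means, variances = m_step(xs, resp)
    return e_step(xs, weights, means, variances), means


def expected_group_share(resp, groups, group):
    """Hypothetical soft fairness measure: the group's responsibility mass in
    each soft cluster divided by that cluster's total responsibility mass."""
    shares = []
    for j in range(len(resp[0])):
        mass = sum(r[j] for r in resp)
        group_mass = sum(r[j] for r, g in zip(resp, groups) if g == group)
        shares.append(group_mass / mass)
    return shares


xs = [-2.0, -1.0, 1.0, 2.0]
groups = ["a", "b", "b", "a"]
resp, means = em(xs, means=[-1.5, 1.5], variances=[1.0, 1.0])
shares = expected_group_share(resp, groups, "a")
```

Because group a sits symmetrically at both extremes of this toy data, each soft cluster contains an expected share of about 0.5 of group a.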

B. IMPROVED CLUSTERING PERFORMANCE ANALYSIS
Fair clustering approaches aim to improve fairness for clustering objectives by changing the cluster assignments of samples in the dataset. It is well known that clustering performance degrades as fairness improves [118], [176], [177], since changing point labels to improve fairness can conflict with the original cluster assignments. While this trade-off is well acknowledged, there is currently no standardized approach to measuring clustering performance.
Most research works measure the degradation of the clustering objective relative to the vanilla (original/unfair) clustering objective [30], [85], [105]. However, measuring performance in this way might not be suitable in some scenarios. Consider the following examples:
• When Algorithm-Agnostic Fairness Notions Are Used for Different Clustering Objectives: If algorithm-agnostic notions are used but the clustering objectives differ, directly comparing the values of the clustering objectives after fairness enforcement would not be sound. For example, comparing a fair k-center cost with a fair k-means cost would not make sense. This scenario can arise when more general fair clustering approaches are employed, as in [116].
• When Clustering Objectives Are Not Well-Defined: This can be understood through the context of hierarchical clustering. Objectives for hierarchical clustering have been proposed recently, but hierarchical clustering has traditionally been a heuristic agglomerative/divisive procedure without an analytical objective to optimize. Thus, research aimed at making traditional hierarchical clustering fair [32] would lack a clustering objective with which to measure the quality of the fair solution in terms of clustering performance.
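The usual measurement, the fair objective relative to the vanilla one, is sketched below for the k-means objective. It computes the sum of squared distances to each cluster's centroid under both assignments and reports their ratio. The function names are our own. As the first bullet notes, such a ratio is only meaningful when both clusterings are scored with the same objective function.

```python
import math
from collections import defaultdict


def kmeans_cost(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    members = defaultdict(list)
    for p, c in zip(points, labels):
        members[c].append(p)
    cost = 0.0
    for pts in members.values():
        centroid = tuple(sum(coord) / len(pts) for coord in zip(*pts))
        cost += sum(math.dist(p, centroid) ** 2 for p in pts)
    return cost


def cost_of_fairness(points, vanilla_labels, fair_labels, cost=kmeans_cost):
    """Ratio of the fair clustering's objective to the vanilla clustering's
    objective under the same cost function."""
    return cost(points, fair_labels) / cost(points, vanilla_labels)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
vanilla = [0, 0, 1, 1]  # tight clusters, cost 1.0
fair = [0, 1, 0, 1]     # hypothetical fair assignment mixing the two groups, cost 100.0
ratio = cost_of_fairness(points, vanilla, fair)
```

In this toy case, the fair assignment is 100 times more expensive under the k-means objective.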
Alternatively, traditional clustering performance indicators could be used to measure clustering quality after fairness enforcement. These include the widely used Silhouette score [178], the Calinski-Harabasz index [179], and the Davies-Bouldin index [180]. Some fair clustering works have already employed these indicators to measure clustering performance after fairness intervention [32], [116], [127]. The Silhouette score is especially appealing since it is bounded, always taking a value between −1 and 1, which makes it easy to interpret. However, these metrics also have drawbacks: they work well mainly for convex clusters and might not be good indicators of performance in other scenarios. Therefore, a future research direction for fair clustering is to investigate and propose new metrics for clustering performance specifically in the context of fairness. This would also connect fair clustering with the long-standing sub-field of research on measuring clustering performance.
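For reference, the Silhouette score of a clustering is the mean over points of s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from point i to the other members of its own cluster, and b(i) is the smallest mean distance from i to the members of any other cluster. The sketch below is a plain-Python illustration of that definition. It assigns s(i) = 0 to points in singleton clusters, the usual convention. In practice one would typically call `sklearn.metrics.silhouette_score` instead.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a hard clustering (Euclidean distance)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # a(i): mean distance to the other members of i's own cluster.
        same = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c)
            / sum(1 for lj in labels if lj == c)
            for c in clusters if c != li
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
vanilla_score = silhouette_score(points, [0, 0, 1, 1])
fair_score = silhouette_score(points, [0, 1, 0, 1])  # hypothetical fair relabeling
```

On this toy data, the vanilla clustering scores about 0.90, while a fairness-driven relabeling that splits both natural clusters scores -0.45. This shows how the score exposes the performance cost of a fairness intervention.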

C. ADVERSARIAL ATTACKS AGAINST FAIRNESS
This direction for future work deals with adversarial attacks on clustering algorithms that aim to degrade the fairness of a given clustering. As more and more research attempts to make clustering fair, the converse problem also arises: malicious entities can seek to disrupt fairness for their personal gain and agendas. As a starting point for investigating this, it would be useful to leverage work on black-box data poisoning attacks against clustering [181], [182]. Without changing the attack objective, the attack first proposed in [32] is especially powerful because it can be carried out without knowledge of the clustering algorithm being used.
We can outline a first approach for degrading fairness using the attack algorithm of [181] and the fairness notion of bounded representation [86]. Let the clustering algorithm be k-means with k = 2. To satisfy bounded representation, each protected group's members in a cluster must make up a share of that cluster between pre-specified minimum and maximum proportions. The attack in [181] aims to make as many points as possible spill over from one cluster to the other. Thus, in the two-way clustering setting, this attack can change the proportion of points that belong to each cluster and can thereby skew the chosen fairness metric for the resulting clustering. We refer interested readers to [181] for more details on the attack algorithm and threat model. Beyond this, fairness-degrading adversarial attacks can be extended in many directions:
• Black-Box Attacks: Black-box attacks on clustering algorithms that disrupt the fairness of the obtained clustering can be investigated. Such attacks are powerful because they work irrespective of the clustering algorithm used by the defender.
• White-Box Attacks: White-box attacks specific to the clustering algorithm (or its fair variant) chosen by the defender can also be investigated.
• Other Attack Modalities and Threat Models: Attacks under other modalities and threat models can also be analyzed. Examples include imperfect knowledge of the dataset, grey-box attacks, attacks on different fairness definitions, alternative or enhanced attack objectives, and the costs incurred by the adversary.
• Transferability and Other Fairness Notions: Like in supervised learning [183], analysis can be undertaken to observe if generated adversarial samples are transferable across algorithms, fairness definitions, and attack settings.
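The spill-over effect behind this first approach can be seen in a toy setting. The sketch below runs plain two-means (Lloyd's algorithm) on one-dimensional data and then appends a few hand-placed poison points. These points drag one centroid and push a borderline genuine point into the other cluster. A bounded-representation check on the genuine points then fails. This is a toy illustration only, not the attack algorithm of [181]. The data, group labels, poison locations, fixed initial centers, and bounds of 0.4 and 0.6 are all hypothetical.

```python
def two_means_1d(xs, init=(0.0, 13.0), iters=100):
    """Lloyd's algorithm for k = 2 on 1-D data, starting from fixed centers."""
    c0, c1 = init
    labels = None
    for _ in range(iters):
        # Assign each point to the nearer center (ties go to cluster 0).
        new = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        if new == labels:
            break
        labels = new
        left = [x for x, lab in zip(xs, labels) if lab == 0]
        right = [x for x, lab in zip(xs, labels) if lab == 1]
        # Move centers to cluster means; keep a center if its cluster is empty.
        if left:
            c0 = sum(left) / len(left)
        if right:
            c1 = sum(right) / len(right)
    return labels


def within_bounds(labels, groups, lower=0.4, upper=0.6):
    """Bounded-representation check: every group's share of every cluster
    must lie in [lower, upper]."""
    for c in set(labels):
        members = [g for lab, g in zip(labels, groups) if lab == c]
        for g in set(groups):
            share = members.count(g) / len(members)
            if not lower <= share <= upper:
                return False
    return True


clean = [0, 1, 2, 3, 7, 10, 11, 12, 13]
groups = ["a", "b", "a", "b", "a", "b", "b", "b", "a"]
poison = [20, 20, 20]  # hand-placed adversarial points (hypothetical)

before = two_means_1d(clean)
after = two_means_1d(clean + poison)[: len(clean)]  # genuine points only
print(before[4], after[4])  # the point at 7 spills over: 1 0
print(within_bounds(before, groups), within_bounds(after, groups))  # True False
```

Only the labels of genuine points are audited. After the spill-over, group b makes up 75% of the right-hand cluster, which exceeds the upper bound.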

D. MORE APPROACHES FOR DEEP CLUSTERING
Deep clustering combines deep learning paradigms with classical clustering approaches in unsupervised learning. The approaches used differ from traditional clustering and usually require labels in the testing phase to evaluate the deep learning models with metrics such as the Normalized Mutual Information (NMI) score [184]. When labels are available for the ground-truth clusters, deep clustering has been shown to achieve state-of-the-art performance compared with traditional clustering approaches such as k-means [185]. Thus, it is important to ensure fairness for these models as well, as for traditional clustering approaches. However, as covered in Section IV, little research has been undertaken in this regard. To the best of our knowledge, only three works cover deep fair clustering: [101], [102], [111]. An important direction for future work is therefore to study deep clustering from a fairness perspective, along the many lines in which fairness has been studied for traditional clustering approaches.
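Because deep clustering models are typically evaluated against ground-truth labels, it helps to recall how NMI is computed: the mutual information between two labelings, normalized by their entropies. The sketch below uses arithmetic-mean normalization, I(U;V) / ((H(U) + H(V)) / 2), which is the default in scikit-learn's `normalized_mutual_info_score`. Other normalizations, such as the geometric mean, are also used in the literature. The function names are our own.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    n = len(u)
    cu, cv, joint = Counter(u), Counter(v), Counter(zip(u, v))
    return sum((nij / n) * math.log(n * nij / (cu[a] * cv[b]))
               for (a, b), nij in joint.items())


def nmi(u, v):
    """Normalized mutual information with arithmetic-mean normalization."""
    hu, hv = entropy(u), entropy(v)
    if hu == 0 and hv == 0:  # both labelings are a single cluster
        return 1.0
    return mutual_information(u, v) / ((hu + hv) / 2)


truth = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]  # same partition with permuted cluster ids
print(nmi(truth, predicted))
```

NMI is invariant to permuting cluster ids, so this example scores 1 (up to floating-point rounding). A labeling independent of the truth, such as [0, 1, 0, 1], scores 0.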

E. ASSESSING PERCEIVED FAIRNESS
For fairness improvements in clustering with significant social impact, the evaluation stage needs to account for fairness as perceived by protected groups and individuals. While one may develop fair ML algorithms based on relevant fairness costs and definitions, a fair algorithm is only beneficial if it affects the impacted community positively. To this end, there is considerable potential for research that gauges how fair proposed algorithms are in terms of public perception. Such experiments can be carried out with focus groups. In these, individuals and groups (defined by the protected attributes of the application at hand) who are directly impacted by clustering-based applications can provide guidelines for improvement. Based on this feedback, better fairness definitions that are socially and practically relevant can also be proposed. Prior research on fair clustering has not considered this form of evaluation. Using minority groups' feedback as an evaluation metric could therefore lead to fairer systems while also offering considerable research novelty.
Another dimension related to perceived fairness in clustering is the choice of datasets. Along with identifying application domains where fair clustering needs to be implemented, it is important to obtain real-world datasets that might lead to unfair clustering. This matters for a number of reasons: 1) empirical results for proposed algorithms on real-world datasets can shed light on how these algorithms perform in actual application scenarios rather than synthetic ones, and 2) doing so offers an opportunity to understand how biases creep into datasets in the first place, which could lead to fairer algorithms and better fairness definitions. To this end, datasets can be obtained from recruiting agencies or from universities' admission processes and then used to gauge whether proposed fair algorithms provide fairer results. The analytical models and algorithms can then be tuned to induce more fairness in such real-world applications (such as admission/selection processes).
As mentioned before, perception of algorithmic fairness is an important metric for evaluation. Thus, an evaluation plan and methodology for fair clustering research should involve regular meetings and focus groups. In these sessions, proposed fair algorithms would be used in real time, and members of minority and affected political groups would give their observations and feedback regarding fairness. For example, users belonging to certain protected groups can be shown how the vanilla clustering algorithm performs, and then how the fair variant performs. While the fair algorithm might be better, it might still fall short of protected group members' actual expectations. Such constructive feedback could aid in building tools and algorithms that are useful to the community as a whole and have real social significance. There is also ample scope for borrowing from similar efforts that assess perceived fairness in algorithmic decision-making systems, such as [186].

F. HANDLING HIGH-DIMENSIONAL DATASETS VIA SCALABLE FAIR CLUSTERING
Like other data analysis techniques, clustering algorithms suffer from the curse of dimensionality [187] and tend to perform poorly on high-dimensional datasets [188]. Moreover, the first approach for fair clustering, proposed by Chierichetti et al. [30], was not scalable and could only be applied to small datasets, because its first step, fairlet decomposition, has a super-quadratic running time.
While research extending this work has attempted to make fair clustering scalable, many shortcomings remain. For example, Backurs et al. [99] proposed a scalable algorithm for fairlet decomposition that runs in (almost) linear time. However, this approach only applies to the case with two protected groups. This trend is also prevalent in fair clustering approaches proposed for other clustering algorithms. For example, the fair spectral clustering algorithms proposed in [100] do not scale well with dataset size and dimension. Even for the more general antidote-data fair clustering approach [116], the authors noted that a major limitation of their work is the running time of their algorithms on high-dimensional/large-scale data.
Thus, a possible future direction for research in fair clustering is to make the proposed fair algorithms scalable and able to handle high-dimensional data. Clustering algorithms capable of handling high-dimensional data have been extensively studied in the literature [188], [189], and future research can apply these techniques to fair clustering. Researchers can also augment existing fair clustering approaches to make them scalable.
VOLUME 9, 2021
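To see why the fairlet step of [30] is the computational crux, consider its simplest special case: two equally sized groups, a target balance of 1, and points on a line. Each fairlet then pairs one red and one blue point, and finding good fairlets amounts to a minimum-cost perfect matching. For cost |r - b| on a line, pairing the two sorted lists in order yields such a matching in O(n log n) time. The sketch below relies on that restriction. It is not the general algorithm of [30] or the scalable method of [99]. The fairlet-level cluster labels in the usage example are assumed to come from any vanilla clustering of fairlet representatives.

```python
def fairlets_1d(red, blue):
    """(1,1)-fairlets for two equally sized groups on a line: pairing the sorted
    lists in order gives a minimum-cost perfect matching under cost |r - b|."""
    if len(red) != len(blue):
        raise ValueError("balance 1 requires equally sized groups")
    return list(zip(sorted(red), sorted(blue)))


def matching_cost(fairlets):
    return sum(abs(r - b) for r, b in fairlets)


def expand(fairlets, fairlet_labels):
    """Give both members of each fairlet the cluster label of that fairlet."""
    assignment = []
    for (r, b), lab in zip(fairlets, fairlet_labels):
        assignment.append(("red", r, lab))
        assignment.append(("blue", b, lab))
    return assignment


fairlets = fairlets_1d([0, 10, 1], [11, 0.5, 9])
# Hypothetical labels from a vanilla clustering of the fairlets' representatives.
assignment = expand(fairlets, [0, 0, 1])
```

Since every fairlet contributes one red and one blue point to the same cluster, any clustering of the fairlets expands into a clustering with a balance of exactly 1.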

G. RELATING FAIR CLUSTERING TO CONSTRAINED CLUSTERING
The problem of constrained clustering tackles the case when additional information is known about the clustering problem, and can be used to improve the discovery of clusters [190]. This scenario arises in real-world problems where domain specialists can provide additional side information to aid the clustering process. In the simplest case, this can then be translated into a traditional clustering problem where we wish to impose some instance-level constraints on the original clustering problem [191]. While many different forms of constraints can be formulated for different clustering algorithms, we consider must-link and cannot-link constraints to motivate the connections between fair clustering and constrained clustering.
Consider individual-level fairness, and assume there exists an unbiased domain specialist who knows that certain samples in the dataset need to belong to the same cluster. For example, a recruiter may have interviewed candidates and found them equally suitable for a position, irrespective of protected attributes such as gender or ethnicity. Conversely, the domain specialist can also provide side information indicating that two samples should not belong to the same cluster. In the previous example, the recruiter may know that one candidate performed well in the interviews and the other did not, irrespective of their protected group memberships. Such side information can be directly encoded as pairwise must-link and cannot-link constraints between data samples. Candidates may then be shortlisted using a clustering algorithm such as k-means (similar to the job shortlisting examples considered in Section III and Section V). In that case, these constraints can be provided as input, along with the original dataset, to a constrained k-means algorithm such as PC-KMeans [192] or COP-KMEANS [52] to enforce individual-level fairness.
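The core of COP-KMEANS [52] is a constrained assignment step. Each point, in turn, goes to its nearest center that does not violate a must-link or cannot-link constraint with the points already assigned in the current pass. If no such center exists, the algorithm reports failure. The sketch below is a minimal version of that rule followed by ordinary centroid updates. The fixed initial centers, the data, and the constraint pairs are hypothetical. The sketch does not cover PC-KMeans, which instead penalizes constraint violations.

```python
import math


def cop_kmeans(points, k, must_link, cannot_link, init_centers, max_iter=100):
    """COP-KMEANS-style clustering. Returns a list of cluster labels, or None if
    some point has no cluster that satisfies its constraints."""
    n = len(points)
    ml = {i: set() for i in range(n)}
    cl = {i: set() for i in range(n)}
    for i, j in must_link:
        ml[i].add(j)
        ml[j].add(i)
    for i, j in cannot_link:
        cl[i].add(j)
        cl[j].add(i)
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new = [None] * n
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest center.
            for c in sorted(range(k), key=lambda idx: math.dist(p, centers[idx])):
                breaks_ml = any(new[j] is not None and new[j] != c for j in ml[i])
                breaks_cl = any(new[j] == c for j in cl[i])
                if not (breaks_ml or breaks_cl):
                    new[i] = c
                    break
            else:
                return None  # no feasible cluster for point i
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


# Hypothetical candidate scores; the recruiter judged candidates 1 and 2 equally suitable.
points = [(1.0,), (2.0,), (8.0,), (9.0,)]
init = [(1.0,), (9.0,)]
unconstrained = cop_kmeans(points, 2, [], [], init)
constrained = cop_kmeans(points, 2, [(1, 2)], [], init)
```

Without constraints, the candidates split into [0, 0, 1, 1]. The must-link pair (1, 2) forces candidates 1 and 2 together, giving [0, 0, 0, 1]. Contradictory constraints on the same pair make the sketch return None.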
In a similar fashion, other fairness constraints (such as those enforcing group-level fairness) can be encoded with the assistance of a domain specialist and then satisfied using existing constrained clustering algorithms. As a future research direction, we therefore motivate studying fair clustering from the perspective of constrained clustering, which has been extensively studied in previous work. Another important research contribution could be theoretical insight into when fair clustering problems can be translated into constrained clustering problems, and which types of constraints and fairness notions can be used to do so.

VII. CONCLUSION
In this work, we provided the first survey on fair clustering. We first discussed the relevant background on clustering and fairness in machine learning (Section II). We then categorized the different fairness notions used to make clustering fair (Section III) and proposed intuitive classification methodologies for them. We also organized the current fair clustering literature into several sub-categories (Section IV), providing a comprehensive overview of the field. Finally, we detailed new insights and described possible directions for future work (Section VI). Our goal in this survey is to add to the existing body of work on fair clustering by providing a concentrated introduction to the field that is useful for researchers and industry practitioners alike.