A review of the machine learning literature on fairness

  • Benjamin Paaßen
    7 August 2018

Abstract: This text contains a compact summary of existing machine learning research on fairness in automatized decision making, concerning settings such as credit scoring, pre-trial risk assessment, and job applicant screening. The summary covers approaches to mathematically quantify unfairness in such systems, the shortcomings of such approaches, as well as a new perspective on impact based on dynamical systems. The key messages are:

  • There are many competing definitions of fairness which stem from different, underlying intuitions. The definitions are partially contradictory and partially complementary.
  • While some existing definitions align well with the existing European legal notion of direct discrimination, indirect discrimination is as of yet only insufficiently captured by formal fairness definitions.
  • Dynamical systems models suggest that most established notions of fairness in the machine learning community do not suffice to guarantee a desirable long-term outcome. Only the relatively simple intervention of enforcing that acceptance rates mirror demographic rates (benchmark test, affirmative action, quotas) provides this guarantee.
  • Fairness can likely only be ensured by inspecting decision making systems in detail, beyond their decisions on example data.

Redistribution notice: This work is, as of yet, unpublished and is written for the European AI alliance. Redistribution in the current state is discouraged and should be coordinated with the author.

Introduction

With the recent success of machine learning in many tasks previously thought out of reach (such as image and speech processing, LeCun, Bengio, & Hinton, 2015), it appears desirable to automatize many decisions which are currently performed by humans. This promises not only more accurate and faster decisions, but also more objectivity because, intuitively, an artificial intelligence is unable to harbor human biases (Munoz, Smith, & Patil, 2016). For example, it appears intuitively plausible that an automatized system which screens thousands of applications for a job could be more objective and more holistic in its screening process compared to a human worker.

Unfortunately, ample research has established that automatized decision making may indeed reproduce and even exacerbate human bias (refer e.g. to Angwin, Larson, Mattu, & Kirchner, 2016; Corbett-Davies, & Goel, 2018; Dwork, Hardt, Pitassi, Reingold, & Zemel, 2012; Munoz et al., 2016; O'Neil, 2016). For example, the pre-trial risk assessment tool COMPAS has been regarded as unfair because it assigned unjustified high risk scores to Black people more often than to white people (Angwin et al., 2016; Corbett-Davies & Goel, 2018). Similarly, there are ample historic examples of policies which did not overtly discriminate against a certain group but had, and still have, an indirect discriminatory effect due to a correlation between decision-relevant features and membership in a certain group. For example, financial services in the US have historically been denied to citizens of regions which had been graded as ‘high risk’ in a practice called ‘redlining’, which was not overtly racist. However, because citizens of such high risk regions were disproportionately Black, the effect of the policy was severe racial discrimination (Crawford, 2017; O'Neil, 2016).

A lot of recent machine learning research has attempted to mathematically quantify fairness in an automatic decision making system (refer e.g. to Corbett-Davies & Goel, 2018; Dwork et al., 2012; Zliobaite, 2017). Such mathematical definitions of (un-)fairness do not only promise to detect unfair effects, but also provide a machine-readable objective function with respect to which decision making systems could be optimized to prevent unfair treatment in the first place. Unfortunately, recent research has also shown that many of these fairness definitions are unsatisfactory, because they run contrary to the original objective of the system - i.e. they hurt the predictive accuracy of the system -, because they fail to align with legal or moral notions of fairness, or because they contradict each other (refer e.g. to Corbett-Davies & Goel, 2018; Liu, Dean, Rolf, Simchowitz, & Hardt, 2018; Pleiss, Raghavan, Wu, Kleinberg, & Weinberger, 2017).

In this text, we provide a short and formal summary of the most prominent mathematical formalizations of fairness, the criticism of these notions, and, finally, a new perspective on fairness research based on feedback loops.

Scope and Related Work

Given that fairness is a broad topic, we can only focus on a small subset of the available research here. In particular, our scope is limited to automatic decision making systems which provide a risk score for individual humans, based on which some resource is denied or provided to that individual. This covers settings such as credit scoring, pre-trial risk assessment, or job screening, but it does not cover issues of representational fairness, stereotyping, explainability, accountability, security, and many further issues. Regarding this related work, we especially wish to point to Bolukbasi, Chang, Zou, Saligrama, and Kalai (2016) who found that word embeddings reproduce stereotypes, to Crawford (2017) who provides an overview of representational fairness concerns, to Goodman and Flaxman (2017) who investigate the explainability of automatic decision-making in the context of the European General Data Protection Regulation, and to Kilbertus, Gascon, Kusner, Veale, Gummadi, and Weller (2018) who developed a pipeline to guarantee fairness while also maintaining the privacy of sensitive information.

This contribution draws extensively on prior reviews regarding fairness definitions, such as Berk, Heidari, Jabbari, Kearns, and Roth (2018), Corbett-Davies and Goel (2018), Gajane and Pechenizkiy (2018), Romei and Ruggieri (2014), as well as Zliobaite (2017). We focus here only on a relatively small subset of fairness definitions and attempt to provide a more gentle introduction to these. Finally, we investigate the fairness notions which we have mentioned from a dynamical systems perspective.

To our knowledge, the only works in machine learning research to date which have looked into the dynamics of fairness are Liu, Dean, Rolf, Simchowitz, and Hardt (2018) and Hu and Chen (2018), both of which we will describe in more detail below.

Setting

Our setting of interest is the following. Some institution, e.g. a company, wishes to automatize a decision-making process, e.g. job applicant screening, which has two possible outcomes, zero or one, where zero marks the positive outcome, e.g. the application is kept after screening, and one marks the negative outcome, e.g. the application is discarded.

In order to automatize the process, the company aggregates a list of training examples, where each example consists of a set of observable features, e.g. a machine-readable version of the job application, and the desired outcome of interest, e.g. whether the application should have been kept or not, depending on whether the applicant turned out to be right for the job or not.

Based on the training data, the institution trains a machine learning model which predicts a risk score in the range of zero to one, based on observable features. The risk score represents the confidence of the model that the positive or negative outcome should be assigned. More precisely, an output score of zero corresponds to absolute certainty for the positive outcome, 0.5 corresponds to maximum uncertainty, and 1 corresponds to absolute certainty for the negative outcome. Further, using machine learning, the company optimizes a threshold in the range zero to one, such that all individuals with a risk score above the threshold receive the negative outcome and all others receive the positive outcome.

In formulaic terms, every person with observable features x receives the negative outcome if the model f returns a risk score f(x) larger than the threshold θ, and the positive outcome otherwise. We represent the desired outcome with the random variable y. That is, ideally, y = 1 should hold if and only if f(x) > θ.
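To make this setup concrete, the following minimal sketch (our own illustration in Python, not drawn from any of the cited works) trains a simple risk model on synthetic data and applies a decision threshold; the synthetic data, the choice of logistic regression, and all variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic, purely illustrative training data: x are observable features,
# y is the desired outcome (0 = positive outcome, 1 = negative outcome).
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

# Train a risk model f whose output lies in the range zero to one.
model = LogisticRegression().fit(X, y)
risk_scores = model.predict_proba(X)[:, 1]  # f(x)

# Decision rule: negative outcome if f(x) > theta, positive outcome otherwise.
theta = 0.5
receives_negative_outcome = risk_scores > theta
print("rate of negative outcomes:", receives_negative_outcome.mean())
```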

To assess the fairness of a model f, we further require the notion of a protected group, with respect to which we want to prevent discrimination. For example, Article 21 of the Charter of Fundamental Rights of the European Union defines dimensions based on which discrimination is prohibited, in particular sex, race, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation (European Union, 2012; European Union Agency for Fundamental Rights, 2018). Therefore, People of Colour, women, or the Danish minority in Germany would constitute protected groups according to these dimensions, as would intersections of these dimensions, such as female People of Colour. We will generally denote membership in a protected group with the binary random variable c.

Fairness Measures and criticisms

The fairness measures defined in the literature relate to different properties of the model f and correspond to different underlying intuitions about fairness. In particular, the following notions of fairness have been suggested in the literature.

Individual Fairness, Metric Fairness or Lipschitz-Fairness

A very basic notion of fairness is that a ‘virtual twin’ of myself who has the same features x except for a single changed attribute, such as membership in a protected group, should receive the same, or at least a very similar, risk score f(x). This notion has been made precise by Dwork et al. (2012), who assume that an objective measure of distance d exists which quantifies how differently people may plausibly be treated, that is, if d(x, x') is low, then x should be treated very similarly to x'. More precisely, the distance between two feature vectors x and x' bounds the permissible difference between the risk scores: |f(x) - f(x')| ≤ d(x, x'). This condition has been called individual fairness, metric fairness, or Lipschitz-fairness.
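As an illustration, the following sketch (our own) checks the Lipschitz condition empirically on all pairs in a sample; the Euclidean default for the metric d is a placeholder assumption, since choosing a defensible metric is, as discussed below, the actual hard problem.

```python
import numpy as np

def individual_fairness_violations(X, risk_scores, d=None):
    """Return all index pairs (i, j) with |f(x_i) - f(x_j)| > d(x_i, x_j).

    X: (n, m) feature matrix; risk_scores: (n,) array of scores f(x) in [0, 1];
    d: pairwise distance function, Euclidean by default (a placeholder
    assumption -- choosing a defensible metric is the actual hard problem).
    """
    violations = []
    n = len(risk_scores)
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(X[i] - X[j]) if d is None else d(X[i], X[j])
            if abs(risk_scores[i] - risk_scores[j]) > dist:
                violations.append((i, j))
    return violations
```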

A key advantage of this notion of fairness is that it can be evaluated not only on the population level, but permits to quantify unfair treatment for each individual within a population. In that, the definition aligns well with European Union law, which states that direct discrimination occurs if someone who is similar in all relevant characteristics (a ‘comparator’) but different in a protected group membership receives a different outcome (European Union Agency for Fundamental Rights, 2018).

However, the notion also has several drawbacks. First, as Dwork et al. (2012) themselves acknowledge, finding a metric d which is suitable for every case is likely impossible. Features may be permissible in some scenarios but problematic in others, and weighting the importance of different features is a difficult and politically charged problem. Second, also acknowledged by Dwork et al. (2012) themselves, individual fairness is limited to detecting acts of direct discrimination but does not sufficiently cover indirect discrimination, such as redlining.

Another challenge is that achieving individual fairness on a certain training data set does not imply that individual fairness holds for the entire population (Yona & Rothblum, 2018). This implies that even ensuring individual fairness while developing an algorithm does not necessarily mean that applications of the algorithm maintain individual fairness in all cases, such that discrimination may still occur, along with legal repercussions for it. Interestingly, Yona and Rothblum (2018) have also shown that a relaxed notion of individual fairness, which permits violations on a small probability mass, does generalize.

Statistical Parity, Mean Difference, and Benchmark Test

To cover indirect discrimination, Dwork et al. (2012) suggest the notion of statistical parity or demographic parity, which means that members of the protected group have, on average, the same risk score as everyone else, that is, the expected value E[f(x) | c = 1] should be the same as the expected value E[f(x) | c = 0]. A variant of this criterion is the benchmark test or mean difference test, which quantifies how much more (or less) likely it is for the protected group to have a risk score above the threshold compared to everyone else, that is: p(f(x) > θ | c = 1) - p(f(x) > θ | c = 0) (also refer to Corbett-Davies and Goel, 2018; Zliobaite, 2017).
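Both quantities can be estimated from risk scores and group memberships alone, without reference to desired outcomes. A minimal sketch (our own illustration; the function names and the assumption of numpy input arrays are ours):

```python
import numpy as np

def statistical_parity_gap(risk_scores, c):
    """Difference in mean risk score between the protected group (c == 1) and everyone else."""
    return risk_scores[c == 1].mean() - risk_scores[c == 0].mean()

def benchmark_test(risk_scores, c, theta):
    """Difference in above-threshold rates, p(f(x) > theta | c = 1) - p(f(x) > theta | c = 0)."""
    return (risk_scores[c == 1] > theta).mean() - (risk_scores[c == 0] > theta).mean()
```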

As hinted at before, statistical parity is closely related to the concept of indirect discrimination, as established by European anti-discrimination law. In particular, EU law regards any rule as indirectly discriminating if it is overtly neutral, but affects a protected group in a significantly more negative way compared to another group in a similar situation (European Union Agency for Fundamental Rights, 2018).

To illustrate how statistical parity detects indirect discrimination, revisit the example case of redlining. In this case, high risk scores are assigned to people living in certain regions of a city. Because more non-whites live in these regions compared to whites, non-white people receive, on average, higher risk scores compared to white people, which would violate statistical parity. Similarly, a higher rate of non-white people compared to white people would be above the risk threshold, which would be detected by the mean difference test.

A key advantage of statistical parity is that it avoids any reference to the desired outcome y. This is particularly useful when training data on the desired outcome variable y is biased, as we will discuss in more detail below.

However, statistical parity has also drawn ample criticism. First and foremost, statistical parity differs from EU anti-discrimination law by not requiring that the protected group and everyone else are in a similar situation; that is, if the protected group does differ in relevant characteristics from everyone else, such that differential risk scores appear justified, statistical parity would still detect discrimination (Corbett-Davies and Goel, 2018; Dwork et al., 2012; Gajane and Pechenizkiy, 2018; Zliobaite, 2017). In other words, statistical parity oftentimes requires a ‘preferential treatment’ policy where members of the protected group receive lower risk scores than their desired outcome y would suggest (Gajane and Pechenizkiy, 2018; Liu et al., 2018).

Another criticism of statistical parity is that fulfilling the criterion for an entire population does not imply that it is fulfilled for every subgroup (Dwork et al., 2012, Kearns, Neel, Roth, & Wu, 2018). For example, a model f could fulfil statistical parity with respect to non-whites and with respect to women, and still Black women could receive worse risk scores (Kearns et al., 2018). However, Kearns et al. (2018) found that subgroup targeting can be addressed if the subgroups to be checked can be defined by a structured class of functions over the protected groups.

Finally, statistical parity has been criticized for being at odds with the goal of the original model f, namely being as accurate as possible in predicting the desired outcome y (Hardt, Price, & Srebro, 2016; Kearns, Neel, Roth, & Wu, 2018). In most scenarios, conforming to statistical parity means to reduce predictive accuracy.

Equalized Odds, Predictive Parity, and the Outcome Test

To address some of the criticisms of statistical parity, Hardt et al. (2016) have introduced the equalized odds measure, which requires that the rate of people who are judged low risk by f even though their desired outcome is 1 is equal between the protected group and everyone else, and that the rate of people who are judged high risk by f even though their desired outcome is 0 is equal between the protected group and everyone else, that is, p(f(x) ≤ θ|c = 1, y = 1) = p(f(x) ≤ θ|c = 0, y = 1) and p(f(x) > θ|c = 1, y = 0) = p(f(x) > θ|c = 0, y = 0).

Note that equalized odds poses the same requirement as statistical parity, but separated for the sub-population which has the desired outcome y = 1 and the sub-population which has the desired outcome y = 0. In other words, equalized odds permits violations of statistical parity if these are justified by the desired outcome, but introduces the additional requirement that statistical parity holds for subgroups, namely the subgroups with y = 0 and y = 1.

Equalized odds has the intuitive appeal that it is not in principle opposed to high accuracy. Indeed, a model f which is more accurate in predicting y will generally also be closer to achieving equalized odds, because fewer errors imply that both terms p(f(x) > θ|c = 1, y = 1) and p(f(x) > θ|c = 0, y = 1) are closer to one and both terms p(f(x) > θ|c = 1, y = 0) and p(f(x) > θ|c = 0, y = 0) are closer to zero (Hardt et al., 2016; Pleiss et al., 2017).

A related notion is predictive parity or the outcome test, which requires that the rate of people who have a desired outcome y = 0 even though their risk score is above the threshold is the same for the protected group and everyone else, and the rate of people who have a desired outcome y = 1 even though their risk score is below the threshold is the same for the protected group and everyone else. More precisely, predictive parity requires p(y = 0|f(x) > θ, c = 1) = p(y = 0|f(x) > θ, c = 0) and p(y = 1|f(x) ≤ θ, c = 1) = p(y = 1|f(x) ≤ θ, c = 0) (Berk et al., 2018). As can be seen from the formulae, in predictive parity the random variable of interest and the conditioned variable are inverted, that is, we care about the rate of people with a certain desired outcome, given that a certain risk score has been assigned, instead of the rate of a certain risk score, given a certain desired outcome. By virtue of Bayes' rule it is clear that equalized odds and predictive parity can only be both fulfilled if the ratio p(f(x) > θ|c = 1) / p(y = 1|c = 1) is the same as the ratio p(f(x) > θ|c = 0) / p(y = 1|c = 0) (Berk et al., 2018; Corbett-Davies and Goel, 2018).
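For reference, the following sketch (our own illustration, assuming numpy arrays of risk scores, desired outcomes, and group memberships, and assuming every subgroup is non-empty) estimates the group-wise gaps that equalized odds and predictive parity require to vanish:

```python
import numpy as np

def equalized_odds_gaps(risk_scores, y, c, theta):
    """Group differences of p(f(x) > theta | c, y) for y = 1 and for y = 0.

    Equalized odds holds (empirically) if both gaps are zero; comparing the
    above-threshold rates is equivalent to comparing the below-threshold rates.
    """
    high = risk_scores > theta
    gap_y1 = high[(c == 1) & (y == 1)].mean() - high[(c == 0) & (y == 1)].mean()
    gap_y0 = high[(c == 1) & (y == 0)].mean() - high[(c == 0) & (y == 0)].mean()
    return gap_y1, gap_y0

def predictive_parity_gaps(risk_scores, y, c, theta):
    """Group differences of p(y = 0 | f(x) > theta, c) and p(y = 1 | f(x) <= theta, c)."""
    high = risk_scores > theta
    gap_high = (y[(c == 1) & high] == 0).mean() - (y[(c == 0) & high] == 0).mean()
    gap_low = (y[(c == 1) & ~high] == 1).mean() - (y[(c == 0) & ~high] == 1).mean()
    return gap_high, gap_low
```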

Intuitively, violating either predictive parity or equalized odds provides a stronger indication of discrimination compared to statistical parity, because worse risk scores for the protected group were ‘undeserved’. For example, the pre-trial risk assessment tool COMPAS has been shown to fulfill neither equalized odds nor predictive parity, because Black people who were not arrested for a pre-trial crime (y = 0) received high risk scores at a higher rate than white people who were not arrested for a pre-trial crime, and Black people with an above-threshold risk score had a lower rate of crime than white people with an above-threshold risk score (Angwin et al., 2016; Corbett-Davies and Goel, 2018). Furthermore, predictive parity and equalized odds are easy to evaluate because they require the same kind of data, namely examples of features x and respective desired outcomes y, which are required to train a model f anyway.

However, there may be instances where the outcome test/predictive parity returns misleading results. In particular, Simoiu, Corbett-Davies, and Goel (2017) consider the case of police car searches in Raleigh, North Carolina, USA and model the decision of police personnel to check a car or not via a risk model f. They found that police searches of cars driven by white people were more often unsuccessful compared to police searches of cars driven by Black people, in which case the outcome test would suggest discrimination against white people. However, they also found that there were many cases of Black drivers with obvious contraband which inflated the number of justified high risk scores for Black drivers. Taking this into account, Simoiu et al. (2017) found that police officers likely had a lower internal threshold for searching Black drivers compared to white drivers, i.e., they regarded less risk as sufficient for searching Black compared to white drivers. More generally, such artifacts may occur if the distribution of risk scores for one group has a higher variance compared to the risk scores of another group. Provided that the risk scores are predictive, higher variance implies a higher number of people for whom the desired outcome y is easy to determine, i.e. a higher number of people close to the extremes f(x) = 0 or f(x) = 1. For groups with such high-variance risk scores, the outcome test as well as equalized odds will likely suggest discrimination against the non-protected group, even though a threshold analysis may suggest discrimination against the protected group (Simoiu et al., 2017).

Another issue with equalized odds and predictive parity is the reliance on desired outcomes y. The criterion inherently assumes that these desired outcomes represent the ‘ground truth’. For example, when constructing a model to screen job applicants, the desired outcome y would be 1 if the job applicant would be a long-term successful employee and 0 otherwise. Such data would likely be collected from the institutional records of existing employees, that is, the job applications of successful and unsuccessful employees in the past would be collected and used as the basis both for training a model f and for assessing its fairness in terms of equalized odds and predictive parity. However, the success of employees in the past is likely subject to discriminatory effects, such as discrimination in promotions, salary, and workplace harassment. Even more importantly, the data set necessarily does not contain data on people who have not been hired in the past in the first place, which implies that people who have historically been privileged may be significantly over-represented in the group with y = 0 (O'Neil, 2016). With respect to the model f, it may be possible to address such a sample selection bias by utilizing job application data without desired outcome data (e.g. Huang, Smola, Gretton, Borgwardt, and Schölkopf, 2006). However, equalized odds and the outcome test necessarily require ground truth data for an evaluation, and are thus still susceptible to sample selection effects. Indeed, selecting data in a biased way can be used to deliberately influence the fairness measure in order to hide or justify discriminatory practices (Corbett-Davies and Goel, 2018). Also note that this criticism is not limited to job applicant screening, but extends to all scenarios discussed here. In pre-trial crime risk assessment, we do not know whether someone who has been detained would have committed a crime had they not been detained. In case of credit risk assessment, we can not tell whether someone would have paid back a loan which they did not receive in the first place. In other words, equalized odds and predictive parity, as well as any other fairness measure which relies on desired outcomes y, fail to give people a chance to prove the system wrong.

Finally, mathematical proofs exist that equalized odds and predictive parity may not be strictly achievable in most practical settings. In particular, Pleiss et al. (2017) have established a mathematical proof that equalized odds can not be strictly fulfilled if the model f is calibrated, in the sense that any individual with a risk score f(x) actually has a probability of f(x) to have an outcome y = 1. Furthermore, as we have established above, predictive parity and equalized odds can not both be fulfilled except in special circumstances (Berk et al., 2018; Corbett-Davies and Goel, 2018).
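To make the notion of calibration concrete, the following sketch (our own; the number of bins and all names are arbitrary choices) compares, per group and per score bin, the mean risk score with the empirical rate of the outcome y = 1, which should roughly coincide for a calibrated model:

```python
import numpy as np

def calibration_by_group(risk_scores, y, c, n_bins=10):
    """Per group and per score bin, compare the mean risk score with the
    empirical rate of the negative outcome y = 1.

    For a calibrated model, both numbers are close in every bin, for both groups.
    Returns a dict mapping group membership to a list of (mean score, rate of y = 1).
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    result = {}
    for group in (0, 1):
        rows = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (c == group) & (risk_scores >= lo) & (risk_scores < hi)
            if in_bin.any():
                rows.append((risk_scores[in_bin].mean(), y[in_bin].mean()))
        result[group] = rows
    return result
```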

Accuracy Fairness

A notion that is implicit in equalized odds as well as predictive parity is that a model which exhibits perfect classification is necessarily fair. More precisely, if y = 1 exactly when f(x) > θ, then f is fair. This also conforms to the intuition that higher accuracy should not contradict fairness (Hardt et al., 2016), because this requirement implies as a special case that perfect classification can not be unfair. Beyond this utility-based intuition, Corbett-Davies and Goel (2018) suggest that achieving higher accuracy is a worthy goal in itself from a fairness perspective, because an accurate risk model f avoids assigning high risk to people who are not actually risky, irrespective of group membership. Conversely, Corbett-Davies and Goel (2018) suggest that any fairness definition which permits sacrificing accuracy risks malicious feature engineering that disadvantages non-risky members of certain groups in order to formally satisfy the criterion.

A drawback of accuracy fairness is that it has no legal counterpart. Indeed, the idea that accuracy promotes fairness has some problematic consequences, such as the incentive to take as much information as possible into account for prediction, which is at odds with the principle of data minimization required by the General Data Protection Regulation (European Union, 2016, Article 5.1c).

However, as with equalized odds and statistical parity, accuracy can only be seen as promoting fairness if the labels y are correct. Otherwise, high accuracy can not only lead to adverse results but can also lend undue credibility to a system which does not deserve it (O'Neil, 2016).

Threshold Fairness

Based on their analysis of the outcome test, Simoiu et al. (2017) have suggested to define a system as fair if and only if it assigns the same threshold θ to the protected group and everyone else. In contrast to statistical parity, equalized odds, and predictive parity, this notion is not concerned with the rates of decisions for certain groups, but with the specific way in which decisions are generated based on the risk score. The notion holds that if decisions are generated in the same way for all individuals, then the resulting decisions are fair.

An advantage of threshold fairness is that it aligns well with legal standards against direct discrimination (European Union Agency for Fundamental Rights, 2018). Further, threshold fairness is relatively straightforward to verify if the decision-making process has the form suggested in our setup (Corbett-Davies and Goel, 2018). Finally, threshold policies provably maximize utility in a broad range of scenarios, such that they are likely to be applied (Corbett-Davies and Goel, 2018).

An inherent problem of threshold fairness is that the risk assessment model f is implicitly assumed to be correct and objective, which pre-supposes both calibration and correct labels. If f is not calibrated across protected groups, f may overestimate the risk for a protected group and thus disadvantage it, as in the case of risk scores for female versus male defendants in pre-trial risk assessment (Corbett-Davies and Goel, 2018). If the labels are incorrect, the same criticisms as in the case of equalized odds and statistical parity apply.

Another issue is that threshold fairness explicitly forbids affirmative action, positive action, or preferential treatment of a historically disadvantaged group by lowering the threshold for that group (Corbett-Davies and Goel, 2018). Thus, a single-threshold requirement may be an obstacle in achieving more overall societal equality in cases where affirmative action is deemed necessary.

Process Fairness and Causal Fairness

Whereas threshold fairness is concerned with how a decision is made based on risk scores, process fairness is concerned with how the risk scores are generated in the first place. In particular, process fairness as defined by Grgić-Hlača, Bilal Zafar, Gummadi, and Weller (2016) requires that the feature vector x does not contain features which are deemed unfair in the public eye. Such features typically include membership in a protected group as well as proxies for that membership (Grgić-Hlača et al., 2016).

A severe drawback to process fairness is that there may be many features which correlate with the membership in a protected group and could thus be regarded as unfair. Excluding all of those features can severely hurt accuracy and thus may conflict with accuracy fairness, equalized odds and predictive parity. Furthermore, not taking into account group membership, that is, being ‘gender blind’ or ‘race blind’, also prevents any affirmative action policy which could be put into place to promote desirable long-term goals.

A recent alternative to process fairness is causal fairness, which requires that group membership, or proxies thereof, should not causally influence the risk score f(x) (Gajane and Pechenizkiy, 2018; Kilbertus, Rojas Carulla, Parascandolo, Hardt, Janzing, and Schölkopf, 2017; Kusner, Loftus, Russell, and Silva, 2017).

Re-phrasing the fairness criterion in terms of causality has the key advantage that it directly relates to the typical phrasing in anti-discrimination law that people should not be treated differently because they belong to a protected group (European Union Agency for Fundamental Rights, 2018).

Another advantage of causal fairness is that it does not require the removal of all features which are causally related to group membership, but instead it is possible to combine problematic features in a fashion that counteracts problematic causal paths and thus leads to an unbiased decision (Kilbertus et al., 2017, Kusner et al., 2017).
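To illustrate this point, consider the following toy example (entirely our own construction, not taken from the cited papers): in a linear structural causal model where group membership depresses a proxy feature, subtracting the group-dependent component of the proxy, assuming its structural equation is known, yields a score on which group membership has no causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Toy structural causal model (entirely illustrative):
#   c -> proxy, qualification -> proxy
c = rng.integers(0, 2, size=n)                      # protected group membership
qualification = rng.normal(size=n)                  # group-independent, legitimate cause
proxy = qualification - 0.8 * c + 0.1 * rng.normal(size=n)   # c depresses the proxy

# A naive risk score built on the proxy inherits the causal influence of c:
# intervening on c would change the score through the proxy.
naive_score = 1.0 / (1.0 + np.exp(proxy))           # higher proxy -> lower risk

# Counteracting the problematic causal path: add back the (assumed known)
# group-dependent component, so that under this model c has no causal
# effect on the adjusted score.
adjusted_score = 1.0 / (1.0 + np.exp(proxy + 0.8 * c))

for name, score in [("naive", naive_score), ("adjusted", adjusted_score)]:
    print(f"{name} mean score, protected group: {score[c == 1].mean():.3f}, "
          f"others: {score[c == 0].mean():.3f}")
```

Note that this correction only works because the structural equation of the proxy, including the coefficient of group membership, is assumed to be known, which leads directly to the drawback discussed next.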

A drawback to causal fairness is that it requires a detailed causal model of the domain in question, including explicit equations for all causal relationships (Kilbertus et al., 2017, Kusner et al., 2017). These very demanding preconditions will likely not be fulfilled in many scenarios.

Discussion

As we have seen in our review, the different fairness measures in the literature are based on different intuitions regarding fairness, such as similar treatment of similar cases (individual fairness), different treatment for different cases (accuracy fairness), equal treatment across groups (statistical parity), equal treatment across groups ‘if they deserve it’ (equalized odds, predictive parity, and threshold fairness), and processing that is independent of group membership (process fairness and causal fairness).

We have also seen that these different underlying intuitions may be competing and that even measures based on the same intuition, such as equalized odds and predictive parity, may be mathematically contradictory. Finally, we have seen that not all fairness measures are well-aligned with European anti-discrimination law. Indeed, while individual fairness, threshold fairness, and causal fairness capture European law regarding direct discrimination quite well, a formalization of indirect discrimination in the sense of worse treatment compared to another group that is in a similar situation still seems to be lacking.

However, even if formal definitions for all legal anti-discrimination concepts were available, we still lack a consistent meta-framework which can guide our decision on which fairness concept to use in any given situation. In the next section, we attempt to establish dynamical systems analysis as such a framework.

Dynamical Systems View on Fairness

A key idea guiding modern fairness interventions, in particular affirmative action, is that we wish to set constraints and incentives such that society moves towards a state in which individuals and groups are more equal (European Union Agency for Fundamental Rights, 2018). This viewpoint suggests analyzing the impact of an algorithmic decision making process and of fairness definitions in a dynamic setting. More precisely, we should try to model the domain of a decision making process as a dynamical system and try to apply fairness interventions which guide the system towards a desirable stable equilibrium, that is, an equilibrium in which groups are treated equally.

To date, this viewpoint has hardly been studied in the fairness literature, with two noteworthy and admirable exceptions. First, Liu et al. (2018) have studied the delayed impact of fairness measures by modeling the effect of decisions on risk scores. They found that statistical parity may over-accept risky people in the protected group, such that risk scores degrade over time for the entire group, which could be seen as detrimental. In contrast, equalized odds may under-accept non-risky people in the protected group, which means that fewer people in the protected group receive a positive outcome and the risk scores do not rise as much as they could.

More closely aligned with our suggested view is the work of Hu and Chen (2018), who built a detailed game-theoretic model of the labor market and showed that, even if the base rate of ability in the protected group and in the remaining population is exactly equal, most fairness measures permit a stable equilibrium in which the protected group remains under-represented in high-skilled positions in the labor market. By contrast, requiring statistical parity for hires into temporary positions implies a long-term stable equilibrium with proper demographic representation.

We note that Hu and Chen’s (2018) model is relatively specific to the scenario of the labor market and involves a number of specific design decisions which we can not describe in detail here. However, their general result is reproducible even with much simpler models. In particular, we obtain the same qualitative behavior under the following assumptions:

  1. The number of non-risky classifications that the institution can make is bounded, for example by the number of jobs available in the company, the amount of money the bank can lend, or the number of people the state can detain. In formal terms, this means that P(f(x) ≤ θ) is bounded above by some constant κ.
  2. The rate of non-risky people is not constant but changes with a rate that increases for higher rates of non-risky classifications. This is plausible in all three scenarios. In case of job applications, a higher rate of accepted people of a group in a certain field enhances the availability of attainable role models, increases the subjective feel of fit, may thus enhance motivation to enter the field and stay in the field, and thus enhances the rate of actually qualified people of that group. In case of lending, more accepted loans for a group enhances the availability of resources within that group, thus enhancing the general economic situation of the group, which in turn increases the chance of paying back the money. Finally, in case of pre-trial risk assessment, more people of a group that are not detained may stabilize community relations and trust in the judicial system, thus enhancing social control, which in turn lowers the chance of crime. We formalize this assumption by stating that the change rate Δ P(y=0|c=1) is proportional to (P(f(x) ≤ θ|c=1) + ν) · P(y=0|c=1) · (1 - P(y=0|c=1)), where ν is some constant and the terms P(y=0|c=1) and (1 - P(y=0|c=1)) ensure that the dynamical system is well-behaved and does not leave the range 0 ≤ P(y=0|c=1) ≤ 1.
  3. There is an initial cost of becoming non-risky, i.e. of obtaining the qualifications and motivation to be qualified for a job and stay in it, of engaging in business activity to be able to pay back a loan, or to not engage in crime despite strong social pressures. This implies that for a sufficiently low non-risky classification rate for a group, people within that group will rationally choose to not pay that cost, which we formalize by stating that ν < 0.
  4. The risk score f and the threshold θ are optimal for the institution in the sense that no person is classified as non-risky if they are risky, and people are only falsely classified as risky if this is necessary due to the first assumption. That is, accuracy fairness is maintained.
  5. The false positive rates across groups are equal, that is, equalized odds holds.

Under these assumptions, there exists a wide range of values for κ, ν, and initial risk rates P(y = 1|c = 1) such that the dynamical system approaches a stable equilibrium in which all people in the protected group are risky.

Under these assumptions it follows that, for a sufficiently small initial rate of non-risky classifications for the protected group, society will converge to an equilibrium in which P(y=0|c=1) = 0 and P(f(x) ≤ θ|c=1) = 0, that is, no one in the protected group is non-risky and no one in the protected group is classified as non-risky (see the attached Figure). This is a sobering result given that many of the fairness notions above, in particular individual fairness, accuracy fairness, equalized odds, threshold fairness, process fairness, and causal fairness, may be fulfilled, and we still obtain an undesirable equilibrium. The only fairness measure on the list which is certain to yield a desirable outcome is the benchmark test or mean difference test, in which case the system trivially converges to a state in which people both inside and outside the protected group try to be non-risky, such that P(y=0|c=1) becomes equal to P(y=0|c=0) and P(f(x) ≤ θ|c=1) = P(f(x) ≤ θ|c=0).
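To substantiate this claim, the following minimal simulation sketch iterates the update rule from assumption 2 for both groups; the proportional rationing rule used to satisfy assumptions 1, 4, and 5 when capacity binds, as well as all parameter values and the step size, are our own illustrative choices and not taken from the literature.

```python
def simulate(p1=0.1, p0=0.6, pi=0.3, kappa=0.5, nu=-0.2, eta=0.1, steps=2000):
    """Iterate the non-risky rates of the protected group (p1) and of everyone else (p0).

    Illustrative assumptions: a fraction pi of the population belongs to the
    protected group, at most a rate kappa of non-risky classifications can be
    handed out, nu < 0 models the initial cost of becoming non-risky, and when
    capacity binds, acceptances are rationed proportionally in both groups,
    which keeps accuracy fairness and equalized odds intact.
    """
    for _ in range(steps):
        demand = pi * p1 + (1 - pi) * p0          # overall rate of non-risky people
        scale = min(1.0, kappa / demand) if demand > 0 else 0.0
        a1, a0 = scale * p1, scale * p0           # non-risky classification rates per group
        p1 += eta * (a1 + nu) * p1 * (1 - p1)     # update rule from assumption 2
        p0 += eta * (a0 + nu) * p0 * (1 - p0)
        p1, p0 = min(max(p1, 0.0), 1.0), min(max(p0, 0.0), 1.0)
    return p1, p0

# For these default parameters, p1 collapses towards zero while p0 approaches one.
print(simulate())
```

For the default parameters, the protected group starts with a lower rate of non-risky people, its non-risky classification rate does not outweigh the initial cost (a1 + ν < 0), and its non-risky rate collapses towards zero, whereas the remaining population converges to a high non-risky rate, consistent with the basin of attraction shown in the figure below.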

[Figure: The state space of a dynamical systems model for risk rates in automated decision making. The rate of non-risky people in the protected group is shown on the x axis, the rate of non-risky people in the remaining population on the y axis. Whenever the protected group is riskier compared to the remaining population, the system approaches an undesirable equilibrium; the basin of attraction of this equilibrium is shown in red.]

To re-phrase the result in less mathematical terms: In scenarios with bounded resources, feedback loops, and initial investments, undesirable equilibria may occur even if classification is perfect and all kinds of fairness measures hold. Only the benchmark test can ensure that the system keeps converging to a desirable state in which the protected group and everyone else are well represented. Even better, once the system has converged, the benchmark property does not need to be enforced explicitly anymore, because it is the rational economic choice to assign equal non-risky classification rates anyway. This sheds new light on how fairness evolves in the long run and makes a strong case for affirmative-action-like policies.

Conclusion and Policy Recommendations

In this article we have provided an overview of state-of-the-art definitions of fairness in the machine learning literature, related them to European legal notions, and provided a novel analysis from the viewpoint of dynamical systems. In this analysis we have found that the benchmark test, which is related to demographic or statistical parity as well as affirmative action, is the only fairness notion which ensures that society converges to an overall desirable goal, namely equality between protected groups and everyone else. This is an important result because the machine learning literature on fairness so far appears to favor measures which regard the risk distributions in the training data as an unchangeable ground truth and optimize with respect to them, whereas we would propose a view of risk as dynamic and malleable.

With respect to policy, we recommend the following.

  1. Fairness is likely not assessable by merely looking at the training data. Instead, it may be necessary to inspect the feature space (as recommended by process and causal fairness, as well as individual fairness), the decision making mechanism (as suggested by threshold fairness), and, maybe most importantly, the long-term societal impacts (as suggested by our dynamic analysis). This detailed view suggests an auditing framework, in which decision making policies are studied by a commission of experts and assessed in terms of various notions of fairness. This is also in line with the recommendations of O'Neil (2016) and the ACM (2017).
  2. Fairness notions may not yet be sufficiently aligned with European legal concepts, especially in terms of indirect discrimination. This suggests that further interdisciplinary work of European legal experts, social scientists, and machine learning experts may be required to translate legal norms into machine-readable form.
  3. Simple measures like statistical parity or the benchmark test, which essentially suggest affirmative action-like policies, are a reasonable tool to incite social dynamics in a desirable direction. It would be desirable to maintain and extend the legal possibility of applying such affirmative action policies, and to promote such policies within automatic decision making.

References

  • ACM (2017). Statement on Algorithmic Transparency and Accountability. Association for Computing Machinery US Public Policy Council. Link
  • Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). Machine Bias. ProPublica, 2016-05-23. Link
  • Berk, R., Heidari, H., Jabbari, S., Kearns, M., and Roth, A. (2018). Fairness in Criminal Justice Risk Assessments. Sociological Methods & Research, in press. doi:10.1177/0049124118782533. Link
  • Bolukbasi, T., Chang, K., Zou, J., Saligrama, V., and Kalai, A. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Proceedings of the 29th Conference on Advances in Neural Information Processing Systems (NIPS 2016), 4349-4357. Link
  • Corbett-Davies, S., and Goel, S. (2018). The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning. Tutorial held at the 19th ACM Conference on Economics and Computation (EC 2018) and the Thirty-fifth International Conference on Machine Learning (ICML 2018). Link
  • Crawford, K. (2017). The Trouble with Bias. Keynote held at the 30th Conference on Advances in Neural Information Processing Systems (NIPS 2017). Link
  • Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). Fairness Through Awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS 2012), 212-226. doi:10.1145/2090236.2090255. Link
  • European Union (2012). Charter of Fundamental Rights of the European Union. Official Journal of the European Union, C 326/391. Link
  • European Union (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Link
  • European Union Agency for Fundamental Rights (2018). Handbook on European non-discrimination law. Publications Office of the European Union. doi:10.2811/792676.
  • Gajane, P., and Pechenizkiy, M. (2018). On Formalizing Fairness in Prediction with Machine Learning. arXiv:1710.03184.
  • Goodman, B., and Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3), 50-57. doi:10.1609/aimag.v38i3.2741. Link
  • Grgić-Hlača, N., Bilal Zafar, M., Gummadi, K., and Weller, A. (2016). The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making. In: Weller, A., Singh, J., Grant, T., and McDonnel, C. (eds.), Proceedings of the NIPS Symposium on ML and the Law. Link
  • Hardt, M., Price, E., and Srebro, N. (2016). Equality of Opportunity in Supervised Learning. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (eds.), Proceedings of the 29th Conference on Advances in Neural Information Processing Systems (NIPS 2016), 3315-3323. Link
  • Hu, L. and Chen, Y. (2018). A Short-term Intervention for Long-term Fairness in the Labor Market. In: Lalmas, M. and Ipeirotis, P. (eds.), Proceedings of the 2018 World Wide Web Conference (WWW ‘18), 1389-1398. doi:10.1145/3178876.3186044
  • Huang, J., Gretton, A., Borgwardt, K., Schölkopf, B., and Smola, A. (2006). Correcting Sample Selection Bias by Unlabeled Data. In: Schölkopf, B., Platt, J., and Hoffman, T. (eds.), Proceedings of the 19th Conference on Advances in Neural Information Processing Systems (NIPS 2006), 601-608. Link
  • Kearns, M., Neel, S., Roth, A., and Wu, Z. (2018). Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. In: Dy, Jennifer and Krause, Andreas (eds.), Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 80, 2569-2577. Link
  • Kilbertus, N., Gascon, A., Kusner, M., Veale, M., Gummadi, K., and Weller, A. (2018). Blind Justice: Fairness with Encrypted Sensitive Attributes. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 2635-2644. Link
  • Kilbertus, N., Rojas Carulla, M., Parascandolo, G., Hardt, M., Janzing, D., and Schölkopf, B. (2017). Avoiding Discrimination through Causal Reasoning. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Proceedings of the 30th Conference on Advances in Neural Information Processing Systems (NIPS 2017), 656-666. Link
  • Kusner, M., Loftus, J., Russell, C., and Silva, R. (2017). Counterfactual Fairness. In: Guyon, I., Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Proceedings of the 30th Conference on Advances in Neural Information Processing Systems (NIPS 2017), 4066-4076. Link
  • LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521, 436-444. doi:10.1038/nature14539. Link
  • Liu, L., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. (2018). Delayed Impact of Fair Machine Learning. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 3156-3164. Link
  • Munoz, C., Smith, M., and Patil, D.J. (2016). Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights. Technical Report. Executive Office of the President, The White House. Link
  • O'Neil, Cathy (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. The Crown Publishing Group. ISBN: 978-0553418811. Link
  • Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J., and Weinberger, K. (2017). On Fairness and Calibration. Proceedings of the 30th Conference on Advances in Neural Information Processing Systems (NIPS 2017), 5684-5693. Link
  • Romei, A., and Ruggieri, S. (2014). A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, 29(5), 582-638. doi:10.1017/S0269888913000039. Link
  • Simoiu, C., Corbett-Davies, S., and Goel, S. (2017). The Problem of Infra-Marginality in Outcome Tests for Discrimination. The Annals of Applied Statistics, 11(3), 1193-1216. doi:10.1214/17-AOAS1058
  • Yona, G., and Rothblum, G. (2018). Probably Approximately Metric-Fair Learning. In: Dy, Jennifer and Krause, Andreas (eds.), Proceedings of the 35th International Conference on Machine Learning (ICML 2018), 80, 5666-5674. Link
  • Zliobaite, I. (2017). Measuring discrimination in algorithmic decision making. Data Mining and Knowledge Discovery, 31(4), 1060-1089. doi:10.1007/s10618-017-0506-1