HUMAINT researcher Songül Tolan presented ongoing work on fairness in re-offense prediction in criminal justice at the Algorithms & Society Workshop. The presentation, titled
"Performance, fairness and bias of expert assessment and machine learning algorithms: the case of juvenile criminal recidivism in Catalonia",
is joint work with Carlos Castillo (Universitat Pompeu Fabra), Marius Miron and Emilia Gómez.
The workshop was organised by Privacy Salon and the Law, Science, Technology and Society (LSTS) research group at Vrije Universiteit Brussel, and was supported by the Open Society Foundations.
For more details, see: https://vublsts.wordpress.com/2018/10/04/call-for-contributions-algorithms-society-workshop/
The goal of this paper is to evaluate the predictive power and fairness of an expert assessment instrument, the Structured Assessment of Violence Risk in Youth (SAVRY), as used in Catalonia, and to compare it against standard Machine Learning (ML) algorithms.
SAVRY was developed as a tool for assessing violence risk of adolescents (aged 12-18) but it has been found effective in predicting the risk of general criminal recidivism.
As such, it plays a role in individual lives and it influences the youth crime rate, as it is effectively used in intervention planning, such as clinical treatment plans or release and discharge decisions (Borum et al. 2006). Although these kinds of assessments do not intend to discriminate by gender or race, previous studies on similar tools have revealed unintended cases of discrimination (Angwin et al. 2016). This has led to an extensive and rapidly growing literature on evaluating and assuring fairness in risk assessments in criminal justice (Berk et al. 2017). In our analysis of SAVRY, we propose a rigorous methodology for evaluating fairness, taking into account the uncertainty of our predictions. In addition, we discuss the implications of different sources of bias for fairness and performance analysis.
We compare the performance of expert assessment with that of ML algorithms that also use information on defendant demographics and criminal history. Our dataset comprises observations of 4752 Catalan adolescents who committed offences between 2002 and 2010 and whose recidivism behavior was recorded in 2013 and 2015. SAVRY assessments are available for a subset of 855 defendants.
The ML models outperform SAVRY in predicting general recidivism (AUC of 0.67 with ML models vs. AUC of 0.64 with SAVRY), and their results improve as more data becomes available for training. Training on additional SAVRY features derived from the questionnaire does not improve the prediction of general recidivism. However, we find that the same ML models with demographic factors do not perform as well as SAVRY when predicting violent recidivism (AUC of 0.64 with SAVRY vs. AUC of 0.57 with ML models). These results are in line with findings from the criminological literature (see, e.g., Mulder et al. 2011).
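For readers unfamiliar with the AUC figures quoted above, the following is a minimal sketch of how the metric can be computed from risk scores and observed outcomes. It is not the authors' code: the function name `auc` and the toy labels and scores are hypothetical and chosen only for illustration. The AUC is taken here as the probability that a randomly chosen recidivist receives a higher risk score than a randomly chosen non-recidivist, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = recidivated); hypothetical toy input.
    scores: iterable of risk scores, higher meaning higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.5, 0.9]
toy_auc = auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values such as 0.57, 0.64 and 0.67 indicate modest predictive power in all cases.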
We evaluate SAVRY and the ML models according to the fairness definition termed "error rate balance" (see, e.g., Chouldechova 2017). Despite the fairness trade-offs that are prominent in the literature, we do not face similar trade-offs for relevant fairness measures in this case. We find that SAVRY is in general fair, while the ML models tend to discriminate against male defendants, foreigners, or people of specific national groups. For instance, foreigners who did not recidivate are almost twice as likely as Spanish nationals to be wrongly classified by ML models as being at high risk of recidivism.
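Error rate balance asks that false positive and false negative rates be equal across groups. A minimal sketch of such a check is shown below, assuming binary high-risk predictions and a single group attribute. The function name `error_rates_by_group`, the group labels "A" and "B", and the toy data are all hypothetical; this illustrates the definition and is not the evaluation procedure used in the paper.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels and predictions.

    y_true: 1 if the person recidivated, else 0.
    y_pred: 1 if classified as high risk, else 0.
    groups: group membership for each person (hypothetical labels).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        if y == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1

    rates = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not recidivate
        positives = c["tp"] + c["fn"]  # people who did recidivate
        rates[g] = {
            # Share of non-recidivists wrongly classified as high risk.
            "fpr": c["fp"] / negatives if negatives else None,
            # Share of recidivists wrongly classified as low risk.
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Toy data, invented for illustration only (not from the study).
toy_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
toy_pred = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1]
toy_groups = ["A"] * 6 + ["B"] * 6
toy_rates = error_rates_by_group(toy_true, toy_pred, toy_groups)
for group, r in sorted(toy_rates.items()):
    print(group, r)
```

In this toy example the false positive rate of group B is twice that of group A, which is the kind of disparity described above for non-recidivating foreigners and Spanish nationals.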
Finally, we address potential concerns of bias through sensitivity analyses of our ML models based on varying label definitions and sample restrictions. Our findings suggest that data-driven risk assessments in criminal justice should be bias- and fairness-aware, and that there is a need for a rigorous methodology for bias and fairness evaluation and mitigation.