1. Introduction

The problem of unequal accuracy rates across groups has recently been highlighted in gender classification from face images. A study by NIST shows that automated gender classification algorithms are more accurate for males than for females [29]. Going further, Buolamwini and Gebru created the Pilot Parliaments Benchmark (PPB), a dataset of parliament members from three European and three African countries that is balanced across two attributes, gender and Fitzpatrick skin type [15], and evaluated the accuracy of three commercial facial gender classifiers on it [4]. All three achieved much lower accuracy on dark-skinned females (Fitzpatrick skin types IV–VI) than on light-skinned females, dark-skinned males, or light-skinned males. (Note that gender classification is a distinct task from race classification [16].)