© 1992-2011 IEEE
We've grown accustomed to speculative narratives about the birth of artificial intelligence (both on the screen and the page) in which computers programmed by us to teach themselves quickly exceed our own mental faculties and physical resources. The idea that an “ultraintelligent machine” will “gradually improve itself out of all recognition” is nearly as old as the term AI itself.1 It has gained prominence in the popular imagination through a cottage industry of books, articles, and think pieces advocating for the study of safe AI. The idea is to defer or defuse entirely an inevitable apocalypse, and it has prompted at least one multimillion-dollar donation “aimed at keeping AI beneficial to humanity.”2
The reality is that no one is any closer to implementing a contextually nimble machine learning system that could, say, engage us in an exegesis of a poem than when Alan Turing first proposed this thought experiment for machine intelligence in 1950.3 One of the most exciting implementations of a “general purpose” algorithm was one that learned how to play 29 Atari video games at “human-level or above” proficiency.4 What is “general” here is the system's ability to learn different games with access only to the pixels on the screen and the controller inputs, while being programmed to do nothing but maximize its score. This was such an impressive advance over earlier work that it was featured on the cover of Nature in 2015.5
The failure to appreciate this point has contributed to myopia in the popular histories of AI that rely on AI researchers as informants while downplaying the enormous body of technical work these informants produced, often relegating the field of machine learning to a mere subfield of AI. The actual historical situation, in terms of the sheer volume and ambit of technical publications produced, suggests the opposite to be true: machine learning has always been center stage, while AI within the larger field of computer science has often had the status of a disciplinary backwater.
We need better ways to discuss how machine learning systems are being integrated into our economic models, political rhetoric, and legal frameworks now, rather than in some speculative future. Learning systems as already deployed are disparately perpetuating and even reifying systemic prejudices and historical biases, often with the most adversely affected being those least well positioned to protest their treatment.
A laundry list of examples is available, but for the sake of this discussion, let's be brief.6 Google's AdSense was shown, as recently as 2013, to serve racially discriminatory personalized ads when the names searched tended to be associated with specific racial groups.7 Facebook's now infamous “emotional contagion” experiment, in which users were shown more “positive” or more “negative” posts to see whether they would then write more positive or negative posts themselves, has been the subject of intense media scrutiny.8 And just this summer, an approach for statistically inferring semantic relationships between words was found to reflect pronounced gender bias; in one instance, it produced the following analogy: man is to computer programmer as woman is to homemaker.9 Machine learning systems are already being used to predict recidivism, even though critics have argued that they underestimate or overestimate defendants' risk of recidivism on the basis of race.10 Similarly, such systems have been constructed to identify and classify refugees as terrorists using a hodgepodge of data collected from many different sources.11
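The analogy above is not an editorial flourish by the researchers: it falls out of simple vector arithmetic over learned word embeddings. The sketch below illustrates the mechanism with invented three-dimensional vectors (real embeddings such as word2vec have hundreds of dimensions learned from large text corpora; the words, values, and the resulting “bias” here are all hypothetical, contrived only to show how the arithmetic works).

```python
import numpy as np

# Hypothetical embedding table: these toy vectors are invented for
# illustration and stand in for embeddings learned from a text corpus.
embeddings = {
    "man":        np.array([1.0, 0.2, 0.0]),
    "woman":      np.array([1.0, 0.2, 1.0]),
    "programmer": np.array([0.2, 1.0, 0.1]),
    "homemaker":  np.array([0.2, 1.0, 0.9]),
    "poet":       np.array([0.0, 0.5, 0.5]),
}

def cosine(u, v):
    # Cosine similarity: the standard notion of "closeness" for embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by finding the word whose vector
    is nearest (by cosine similarity) to b - a + c."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("man", "programmer", "woman"))
```

Because the vectors encode whatever regularities the training text contains, any gendered (or racialized) associations in that text are reproduced, and indeed sharpened, by this otherwise neutral arithmetic.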
The errors that any particular machine learning system produces depend on the statistical model used, the learning algorithm by which the system updates prior estimates, and the specific data used to “train” (that is, constrain) the model. In practice, however, some of the most egregious machine errors tend to reflect poor choices in training data, which is itself the product of various forms of historical systemic bias, often curated for polyvalent purposes and aggregated under radically different assumptions.
What we need is a better appreciation for the deeply contingent and mutually constitutive role between computation and data and the material, political, and social circumstances that inform how these algorithms are instantiated through different communities of practice. To begin to see how biases are propagated and reinforced via machine learning system training data, we need histories of the datasets themselves: how they were formed and why, how they have been maintained and subsequently altered, and how they were valued as useful data for the problems to which they were employed.
Amelia Acker has cogently and convincingly argued that data is itself “computing history” and that it can illustrate “how layers of context and meanings are acquired through [data's] development, stabilization, and circulation,” arguing that “social histories … of data moving through scales of information infrastructures, represent a valuable intervention into histories of networking but also into studies of information, system design, and communication technologies in different realms of society.”12 I agree, and it is the long histories of these datasets, as evidenced in code, material infrastructure, and the social circumstances surrounding the construction of machine learning systems, that repeatedly traverse, reflect, and refract all scales of infrastructure, from the micro to the macro. Often changing with the concerns of researchers and media formats even while bearing the scars of their contingent development, training data and data “gold standards” serve as particularly interesting material and intellectual assemblages for historical study.
The construction, maintenance, and mobilization of data used to both constrain and enable machine learning systems poses profound historiographical questions and offers an intellectual opportunity to engage in fundamental questions about novelty in historical narratives. To effectively explore the intellectual, material, and disciplinary contingencies surrounding both the curation and subsequent distribution of datasets, we need to take seriously the field of machine learning as a worthy subject for historical investigation. In the past, the field has generally been a subject only of popular or disciplinary histories, usually told or written in homage to the historical actors who feature prominently in the events discussed. The state of the historiography of AI from the point of view of the professional historian remains (to put it generously) in bad shape. Even the most comprehensive and certainly noteworthy example in the genre of disciplinary histories of AI, The Quest for Artificial Intelligence, written by computer science professor Nils Nilsson,13 provides a framing that is inadequate for addressing the kinds of social and ethical challenges posed by the applications of machine learning listed earlier.
If the historiography of AI in general is in poor shape, the historiography of machine learning is virtually nonexistent. Despite this, once we become more familiar with the subject, it is surprising how often machine learning is obliquely referenced in existing histories of computing and of science. In works such as Peter Galison's Image and Logic, Paul Edwards' A Vast Machine, and Stephanie Dick's “AfterMath: The Work of Proof in the Age of Human-Machine Collaboration,” machine learning is constantly being referenced, but rarely by name.14
Much of the rhetoric surrounding machine learning, including unmitigated claims of technical “disruption,” the tendency by some to let the statistical (ahistorical) model stand in for the social, and the active decontextualization of evidence, is repugnant to the sensibilities and attitudes of most historians. Perhaps this partially accounts for its neglect as a historical subject. A better reason may be that many historians are simply unaware of the degree to which machine learning already influences the decisions they make.
“The usual way we plan today for tomorrow is in yesterday's vocabulary,” Edsger Dijkstra proclaimed in his essay “On the Cruelty of Really Teaching Computer Science” (1988).15 He argued that commonplace metaphors and analogies used to discuss computing do more to propagate “unfathomed misunderstanding” of the “radical novelty” of computers than to bring us into contact with their consequences.16 This novelty for Dijkstra was twofold. First, using computers to solve problems requires profoundly “deep conceptual hierarchies” that span many orders of magnitude. These enormous changes of scale, Dijkstra argued, “[confront] us with a radically new intellectual challenge that has no precedent in our history.”17 Second, Dijkstra continued, computers have the “uncomfortable property” that “there is no meaningful metric in which ‘small’ changes and ‘small’ effects go hand in hand, and there never will be,” because they are digital rather than analog devices.18 Claims of novelty aside, it is surprising how these particular remarks of Dijkstra's seem to echo, at least superficially, the language of contemporary historians of technology and computing.
Machine learning systems tend to have the status of an experiment in that they are used both as a confirmation for and as a counterpoint to what we know. In this sense, machine learning is mobilized not merely to provide parity with human judgment. It can offer “insight” that is different from ours even as it provides information we recognize as familiar, desired, and in some cases, impossible.