Data Mining: Ethics, Ethos, Episteme

Solon Barocas

This dissertation examines the novel challenges that data mining poses to privacy, fairness, and autonomy. It first shows how ‘big data’ necessarily reflect pre-existing views about what exists and what is worth capturing. It further demonstrates how the information systems that ‘capture’ big data are shot through with specific ideas of the social world and sociality that they are innocently meant to mediate, but which they quite clearly shape. These findings suggest that practitioners, policymakers, and scholars should attend to the properties of such systems before they adopt big data as objective evidence for their own purposes. The dissertation then delves into the inner workings of the data mining process to better account for its foundational assumptions, its potential sources of bias, and its claims to accuracy. It demonstrates that the push for improved accuracy may have the perverse result of reifying evaluation methods that cannot capture the full range of bias and error that may beset a data mining project. It also addresses the fact that improved predictive accuracy often comes at the cost of greater complexity.

The dissertation then develops a framework to explain why consumers may perceive certain kinds of inferences as violations of their privacy. It focuses on a series of real-world cases where the very possibility of making inferences was not apparent and where individuals could not arrive at these conclusions through their own powers of reason. The dissertation argues that where such inferences deny individuals the ability to anticipate the possible import of the behaviors that they exhibit, individuals will perceive data mining as a profound threat to their privacy and autonomy. 

Finally, the dissertation explores the paradoxical finding from computer science that attempts to ensure procedural fairness in data mining may conflict with the imperative to ensure accurate determinations. It shows that data miners cannot disentangle legitimate from proscribed criteria in their model-building because proscribed attributes meaningfully condition which relevant attributes individuals possess. The dissertation concludes by considering the policy implications of the finding that any decision that takes only these relevant attributes into account would nevertheless recapitulate inequality.