Data needs ethics — and outspoken nerds

Tuesday, April 11, 2017 - 4:15pm
Cathy O’Neil, author of New York Times bestseller “Weapons of Math Destruction,” spoke at Mount Holyoke on April 8.

By Laurie Loisel 

With her atomic-turquoise-colored hair, Cathy O’Neil looks the part of the rock star she is in certain circles — and her message is every bit as mind-blowing as a legendary rock anthem. 

O’Neil is an unapologetic math nerd, a term the more than 100 people gathered for her talk at Mount Holyoke College on Saturday would hardly find insulting. 

The message this mathematician, data scientist and quantitative-finance-expert-turned-activist is spreading like a gospel is that data and algorithms are only as objective as the people who create and interpret them — which is to say, in some cases, hardly objective at all.

Everyday, non-nerdy types need to understand this in order to be critical readers of data, O’Neil said as she sounded the alarm that mathematical models meant to level playing fields are often tainted by the bias they aim to root out. This bias then becomes embedded in the new system that people blithely believe is objective. 

O’Neil, author of the National Book Award finalist and New York Times bestseller “Weapons of Math Destruction” and the babe behind the blog mathbabe.org, delivered a keynote address urging people to look critically at claims, conclusions and public policies built on algorithms, each of which she calls “an opinion that you embed in code.”

The perspective O’Neil brought to campus is invaluable, said Amber Douglas, associate professor of psychology and education and co-chair of Mount Holyoke’s data science Nexus concentration. O’Neil was the program’s first speaker since the Nexus kicked off last fall. 

“As we started building this program, we really wanted to make sure that we had critical voices,” Douglas said. 

Critical O’Neil is. She’s also a blunt talker (“We blindly trust data science,” she said. “How stupid is that?”). She sprinkles her talk with indignation, outrage, humor and liberal use of the word “DUDE!”

“The thing about machine learning is that it doesn’t make things fair, it just codifies past practices,” she said. “It automates jerkiness.”

O’Neil offered examples of systems created to root out inequality that wound up reinforcing it: Teacher-rating systems that penalize good teachers in poor schools for student test scores that likely have nothing to do with the quality of the teaching. Educational policies based on a belief about a supposed decline in SAT scores that she said was a mathematical mistake. Recidivism-risk algorithms ostensibly used to give criminal sentencing guidelines more objectivity. The financial crisis that she said “relied on a mathematical lie, a weaponized mathematical model.” 

Conclusions based on big data sets are only as good as the data used, O’Neil noted. Data is information, and that information can help us understand patterns, but only if the data is good. Getting perfect data is often impossible, so data scientists use proxies: changes in student test scores, for example, can serve as a proxy for learning. Every algorithm is created by people who choose their own proxies under the influence of their own biases and assumptions.

Crime statistics, for example, don’t even begin to paint a full picture of crime in this country. Why? Because they are based on arrests. This prompted O’Neil to say in exasperation: “Dude, most crimes do not lead to arrests!” 

Yet those statistics are used for predictive policing, the practice of sending police to certain neighborhoods, based on data that O’Neil noted ignores “missing white crime.” “It kind of creates its own reality. It’s a feedback loop,” she said. 

“The problem is that we don’t have ground truth on most crime. We don’t know what the actual criminality rates are,” she said. 
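Her feedback-loop point can be made concrete with a toy simulation. The sketch below is purely illustrative, not from O’Neil’s talk or book: two neighborhoods have identical true crime rates, patrols are allocated in proportion to past recorded arrests, and only crime that a patrol happens to see gets recorded. Every number in it (crime rate, detection rate, starting arrest counts) is an invented assumption chosen to show the dynamic.

```python
import random

# Purely illustrative: both neighborhoods have the SAME true crime rate.
TRUE_CRIME_RATE = 0.05        # hypothetical per-person yearly rate
POPULATION = 1000             # people per neighborhood
PATROL_BUDGET = 10            # total patrols to allocate each year

# A small, arbitrary historical disparity in recorded arrests.
arrests = {"A": 12, "B": 10}

random.seed(1)
for year in range(20):
    total = sum(arrests.values())
    # Allocate patrols in proportion to past recorded arrests (the proxy).
    patrols = {n: PATROL_BUDGET * arrests[n] / total for n in arrests}
    for n in arrests:
        # Crimes occur at the same rate everywhere...
        crimes = sum(random.random() < TRUE_CRIME_RATE for _ in range(POPULATION))
        # ...but only crime that a patrol happens to see becomes an arrest.
        seen = sum(random.random() < 0.5 * patrols[n] / PATROL_BUDGET
                   for _ in range(crimes))
        arrests[n] += seen

print(arrests)
# Neighborhood A ends up with more recorded "crime" than B, even though
# actual crime was identical: the data creates its own reality.
```

In expectation, the recorded-arrest gap widens year over year: the data drives the patrols, and the patrols generate the data, which is exactly the loop O’Neil describes.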

When she began evaluating recidivism-risk algorithms used to give criminal sentencing guidelines more objectivity, she realized the extent to which those models widened inequity, O’Neil said. That information “kept me up at night” when she was writing “Weapons of Math Destruction.”

When she discussed the problem with the data scientist who constructed the recidivism models, asking if he had used race as a measure in his calculations, he said he would never do that because it would be wrong. But when she asked if he had used ZIP codes, he said yes. ZIP codes are a proxy for race, noted O’Neil, and “bad proxies can have real effects.” 
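To see why a “race-blind” model can still split along racial lines, consider a minimal sketch. All of the ZIP codes, rates and group labels below are invented for illustration and are not drawn from the model O’Neil examined: race is never given to the model, but ZIP code correlates with it, and the historical labels carry the skew of heavier past enforcement of one group.

```python
import random
from collections import defaultdict

random.seed(0)

# Invented toy population: ZIP code correlates strongly with group
# membership (the attribute the modeler deliberately excluded), and the
# historical label is itself skewed by heavier past policing of group X.
people = []
for _ in range(10_000):
    zip_code = random.choice(["01001", "01002"])
    in_x = random.random() < (0.9 if zip_code == "01001" else 0.1)
    group = "X" if in_x else "Y"
    label = random.random() < (0.30 if group == "X" else 0.15)
    people.append((zip_code, group, label))

# A "race-blind" risk model: score everyone by the average historical
# label in their ZIP code. Race never appears as an input.
by_zip = defaultdict(list)
for zip_code, _, label in people:
    by_zip[zip_code].append(label)
risk = {z: sum(v) / len(v) for z, v in by_zip.items()}

# Yet the scores split along group lines, because ZIP code smuggles the
# excluded attribute back in.
for g in ("X", "Y"):
    scores = [risk[z] for z, grp, _ in people if grp == g]
    print(f"group {g}: mean risk score {sum(scores) / len(scores):.2f}")
```

The two groups receive systematically different scores even though race was never an input; in code, that is what “bad proxies can have real effects” looks like.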

O’Neil’s ideas are essential for the next generation of data scientists, said Martha Hoopes, professor of biological sciences and co-chair of the data science Nexus. 

“As we were starting this data science program on campus, we knew we wanted to inject ethics,” Hoopes said. “We have these tools and the tools are beautiful, but how we apply these tools does not always end up being beautiful.” 

The problem is particularly evident in the case of mathematical models, she said. 

“We cannot build a context-free algorithm because there’s always context,” she said. “Even if we think we are being unbiased, we all carry our own biases, and we build them in.” 

Math major Paula Kayongo ’17 said reading O’Neil’s book and listening to her talk opened her eyes as she prepares to enter a field that will involve math modeling. She’d like to study more about the ethics of data science applications, she said. 

“This is the first time I’ve ever seen anyone talk about the bad sides of math or the bad sides of data science,” she said. 

Kayongo acknowledged that O’Neil had not attributed the problems she sees in algorithms to math. (In fact, O’Neil said dryly, “I would never blame math for anything because math is beautiful.”) And she too can see that the problem is not the math. 

“There are,” Kayongo said, “all sorts of human factors.” 
