The Way We Visualize Big Data Is Broken
On the list of casualties wrought by 2016 is the pure faith liberals have in polls. Among outcomes pollsters (generally) failed to predict this year are Brexit as well as elections in the U.K., Israel, and, of course, the United States. Donald Trump is now the President-elect, but in the days leading up to November 8th, most major poll aggregators had Hillary Clinton cruising to the White House, some to the tune of a 98% likelihood of victory.
Then, on election night, the predictor needle gradually tumbled towards the eventual outcome in real time. If you followed the live predictions on The Upshot (the New York Times’s data outfit) as I did, you probably remember their needle’s stomach-turning tremor. Its overall movement steadily toward Trump came from actual numbers. But, as it turns out, the needle’s unceasing little twitches up and down weren’t totally real. Instead, they were the result of an aesthetic decision to try to represent margin of error, and the way in which this was processed on users’ screens caused some to say it bordered on the irresponsible. Election night therefore demonstrated a failure both of polling data and of the way that data is presented.
The U.S. presidential contest should be a wake-up call for those who think data should conform to subjective expectations of reality (Clinton must win; how could she not?) and those who think that data offers a God-like understanding of an immutable world. If data is to be mobilized toward better ends, it is worth understanding the subjectivities, flaws, and successes in how information is gathered, presented, and analyzed. Such criticality should extend from something as lofty as presidential polling to the unnoticed but hyper-present everyday data around us: how the scale of a subway map can alter our perception of the city we live in, for example. Like the trembling needle, maps, among the most prevalent data visualizations, are an obvious instance where how we see data impacts our emotions and how we conceptualize the world.
The Office for Creative Research has recently released its second journal exploring some of these issues. The self-described “hybrid research group” founded by Mark Hansen, Ben Rubin, and Jer Thorp in 2013 has put forth a series of 18 digestible essays, research pieces, and data visualizations for edition number two. The team looks at data within socio-economic contexts (mapping the unanswered police calls in Flint, Michigan) and personal ones (asking if it is ethical for parents to share information about their babies on Facebook before the child can consent). In each case, the pieces don’t simply present conclusions but rather compel one to look at how the data is gathered, how it’s presented, and, perhaps most importantly, how such information is subject to human input and feedback even as we look to it to tell us supposedly objective truths about our world and ourselves.
The book bursts with ideas that punch well above the relatively short time it takes to read them. It begins with a piece by Thorp detailing a research project out of Harvard University that analyzed the words Twitter users included in their tweets to determine where the saddest people lived. Superimposing the geographic location of those depressing tweets over a map of New York, the researchers saw that Hunter College High School, on Manhattan’s Upper East Side, was the saddest place in the city. Thorp writes that the Harvard study represents a perilous example of big data society, a “world in which we are all being data-fied from a distance, our movements and conversations processed into product recommendations and sociology papers and watch lists, where the average citizen doesn’t know the role they are playing, surrounded by machinery built by and for others.”
The other problem? The results of the research were wrong. Those sad tweets didn’t come from Hunter, where Twitter is both banned and known as a social network only for old people, but from one single, very sad Twitter user nearby. From premise to algorithm, the case indicates how data can fail at every level, even as such studies coldly lecture people about something as intimate as how they feel. “It’s also a reflection of the kinds of big data stories that we’re so eager to believe: where a large data set combined with novel algorithms shows us some secret we would not otherwise have seen,” writes Thorp. In response, Thorp calls for human data over big data, for thinking empathetically both about the people supplying information and those consuming it, and, ultimately, for allowing people to provide feedback to the systems that prescribe truth often without listening.
In her essay, Sarah Groff Hennigh-Palermo goes back to the basics. She looks at how early conceptual frameworks accompanying the rise of the computer in the 1950s ultimately framed how we think about both the world and data itself. From the outset, data was treated as powerful for what it excludes, namely anything except universal mathematics. Data as it exists today excludes the fact that information is actually transmitted in a subjective context, and instead favors binary, on/off, yes/no understandings of the world. But, as Hennigh-Palermo writes, “information as we think of it is not natural, neutral, or inevitable. The universe is not a computer. That we think it might be is, in fact, a product of the context in which information was invented.”
Hennigh-Palermo describes how data today is “dehydrated” of meaning such that it represents ostensibly pure information. When that information is “re-hydrated” via visualization or other forms of interpretation, it occurs in a very human context that can concretize “current inequalities: the meaning we supply ourselves is likely subject to the prejudices and shortcuts endemic to our thought process.” The goal isn’t to eliminate the human. Rather, to her, facts are a substance, and data needs to be more reflective and attentive to the context through which it flows. “When we talk about data visualization, we are talking about how we take these free-floating bits and arrange them so that they once again are infused with meaning,” writes Hennigh-Palermo, “whether we undertake to use pattern, added context, or the hydrating powers of our own minds, compelled as they can be to locate meaning in the slightest outlines, as our eyes find faces in any cloud or tree trunk.”
Our minds are what make meaning when moving through the world, but our conclusions are nudged one way or another by what we see. In one essay, Candy Chan describes how the New York City subway map, which reduces stations to pinpoints on a not-to-scale map, is at odds with how people experience stations that often straddle several blocks and levels. That stations are simply dots tied to a specific street impacts how we think of them and allocate our time within them. Penn Station, for example, is a hellish labyrinth, but on the map it’s a single dot like all the other stations. Her Project Subway NYC, launched in 2015, looks to rectify this by providing 3D models of stations throughout the city. (An unrelated project that shifts our understanding of the map involved a guerrilla campaign that added stickers pointing to the small, unmarked Rikers Island, which sits between the Bronx and Queens, housing thousands of inmates.)
But demanding context is important even when flipping through the pages in The Office for Creative Research’s book. In one essay, Noa Younse mulls over how machines are beating humans at things like chess and Jeopardy, and looks towards a future in which super-smart AI might eventually turn on its human creators. The piece is admittedly tongue-in-cheek (Younse pens a letter to our future robot overlords). But it also calls on us to recognize that while actively fearing the robot revolution is itself a joke, it is crucial never to forget the stakes: Who needs Skynet when self-driving cars could soon wipe out 3.5 million working-class trucking jobs? The former is a silver-screen fantasy; the latter is an approaching reality.
Moreover, automation can’t be spoken of without recognizing that it is now and has always been deeply tied to race and class—a historical fact unearthed, among other places, in Thomas Sugrue’s studies of post-war Detroit, where African-American assembly line jobs went to machines first. The earliest jobs to go will be the ones manned by the most marginalized among us. Visualizing this reality is significant if it is to be fought. Also important is never forgetting what that data actually means to the human beings depicted. “Data visualizations might seem inert, but there are many ways they can cause harm,” writes Thorp. “A visualization might bring unwanted attention to a person or a group (or a high school). A map can trivialize human experience, by reducing a life to a dot or a vector. Representations of violent or tragic events can be traumatic to people that were directly or indirectly involved.”
The book also emphasizes our role in shaping the world around us, and in understanding that data should not be trusted to predict the future with perfection. On one FiveThirtyEight podcast relatively early in the tightening election, Nate Silver described getting emails from Clinton-inclined listeners worried about the outcome, looking for polling data to act as something of a palliative to ease their fears. His answer? It is not data’s responsibility to make you feel better, and if you want Clinton to win, you should go out and volunteer to make her win. Silver might think more empathetically about the emotions of people impacted by the presidential horserace, but his underlying point is sound. At its worst, data is treated as prescriptive rather than as informative. If data is to describe the world as we wish it to be, then it is up to us to make change, not change the data.
Cover image: The Upshot’s election night needle. Image via Gizmodo.