Big Data Noir
No noir story will match the ones told by Big Data. In the future, noir stories will emerge from Big Data, only they won’t be fiction. Authors of crime fiction, whether noir, hardboiled, or otherwise, are like monks copying manuscripts before the printing press. Our end will be as noir as theirs. Here’s the story of how that will come about.
I’ve thought of writing as a way to discover and explore vanishing points, where light fades into the void of total darkness. That is the point where we can no longer predict what will happen next. It is a brick wall. A blank. We stop at the door to the future, resigned that it will never open.
In Big Data: A Revolution, authors Viktor Mayer-Schoenberger and Kenneth Cukier have opened that door a crack. But don’t buy this book. You don’t seriously want to know what lies inside our near future in the Data-Time-Space spectrum.
Towards the end of this provocative book, the authors sum up: “The ground beneath our feet is shifting. Old certainties are being questioned. Big data requires a fresh discussion of the nature of decision-making, destiny, justice.”
That is only the beginning of the transformation that will happen in our lifetime. It is already happening, and it has started to come into the open: the huge weight and force of Big Data, and the hunger of the powerful to own it, share it, distribute it, and exploit it. We are in the middle of a big data war. Government officials and big business owners are in their bunkers figuring out what to do next. No one has clearly explained what is at stake, what the options are, or the current state of play. Big Data: A Revolution attempts to provide context and meaning in an era where data is no longer scarce or expensive, but readily available and infinitely valuable in making predictions about future outcomes.
Our preferences, attitudes, and mental states will be predicted with advanced probability software and hundreds of millions of equations, and that raises a number of questions.
It is happening now as you read this essay. You are the composite of your data: your choices, likes, purchases, friends, emotional connections, and routines have been datafied. This data about your past can’t be erased, deleted, or changed; it will follow you wherever you go in the future. The days of starting over are finished. You can never go missing or disappear completely, because you pull behind you a history that is your digital DNA.
Your mental thumbprint is now in the system, attached to this blog. It stopped here. Who else has ever read this blog? Each of those readers is an association, and that data is stored in the system too. Websites and blogs are hoovered for information, and this is how Big Data continues to grow four times faster than America’s GNP. There is a probability that your digital presence here means you share certain habits or buying traits, or are connected to some free-thinking troublemakers who also visit this blog.
You can no longer control, handle, supervise, or understand the scale and scope of your data or of Big Data. But we have seen nothing yet. Big Data is set to grow exponentially. Some of it will be extremely useful in understanding and dealing with important problems like climate change and disease, or in advancing entire domains such as physics, chemistry, and mathematics. The assumption is that our ability to understand, describe, and predict the world has been limited by how much of it we could quantify as data.
To fully exploit the potential of big data, we need to appreciate the scale and scope of the power that comes from collecting, storing, distributing, selling, and analyzing the range of correlations that emerge when N=All. We will also pay a substantial price. Big Data will not be ours without changes to some long-standing beliefs, habits, attitudes, and customs. The next stage of development belongs to data: new structures are being built from masses of data as you read this essay, and real economic, social, and political power will reside inside them.
Since the thirteenth century, we have sought precise, exact answers about the world and human behavior, and we have looked for causation between events, people, and things. Our quest is to know whether what we believe about the world is true or false, right or wrong, good or bad. When we address the implications of Big Data, we bring our moral and emotional sense of being in the world into the crosshairs.
Big data does not work from exactness; it is premised on the idea that reality is messy and that data can provide the probability of what will emerge in the future. Big data promises a set of predicted outcomes, ranked on a scale of probability according to what is likely to happen. In return, we give up the mission to understand why something has happened or may happen. The ‘why’ question asks about causes in order to explain the nature of the world. Big Data leaves causation to the side because it is not helpful: the messiness of reality makes inquiries about causation and precision less reliable. These ideas spring from the old way of thinking, when sense had to be made out of limited information. Causation and precision are relics of data scarcity and can be largely ignored, because correlation is sufficient in the world of Big Data. Little Data required us to formulate a theory about what we expected the data to show, and then use that limited data to test whether it proved or disproved the theory. Think of climate change and the theory that CO2 concentrations are its cause. That’s the old way of modeling with limited data.
Randomness in very large datasets yields a probability analysis that is more useful and predictive than a targeted sample. Sampling, the default way of measuring the world, has become, or will very soon become, obsolete. Those gathering data in the past lacked the tools to collect it at scale (processing speed, storage facilities, etc.) and the tools to analyze such vast quantities of it (software and algorithms). They opted for precision, sampling, and theory testing. With big data, this old paradigm goes out the window in many cases. With the full dataset that big data offers, researchers can explore many more angles and perspectives, whether they are predicting the next bird flu outbreak or detecting match-fixing in Japanese sumo wrestling.
Big data has the capacity to capture the entire population of a city, region, or country. When all telephone calls, emails, Internet searches, Twitter mentions and retweets, and Facebook ‘likes’ are captured and stored, this isn’t a sample; it is the whole enchilada. “[W]e can accept some messiness in return for scale. ‘Sometimes two plus two can equal 3.9, and that is good enough.’”
We already have an example of the limits of our capacity when tested against advanced algorithms. Once six or fewer pieces are left on the board, chess programs can draw on endgame tables that have worked out the outcome of every possible move (N=all). The Big Data authors conclude, “No human will ever be able to outplay the system.”
We have created a big data system that is much better at predicting outcomes than we are with our native brain power. We humans have dropped down the league table of the world’s fastest, most capable processors. In building its translation program, Google didn’t use a billion words; it used a trillion. Its service covers 60 languages and is more accurate than other systems. It won’t be long before computer translation, like computer chess, performs vastly better than any human being.
Big Data also charts the shift toward seeing the world as not only messy but as a place where predictions of what will happen rest on correlations that emerge from big data. Amazon has recommendations for you. Each time you visit, Amazon remembers your digital history and presents you with the kinds of books that, judging from your prior purchases, mark you as a ‘reader of interest’. One-third of Amazon’s sales come from buyers like you who click on and buy a recommended item. At Netflix, seventy-five percent of online rentals come from recommendations.
Amazon and Netflix offer two good examples of how probability tools can increase a company’s revenue. There is no certainty that you will buy the recommended book on Amazon or rent the recommended film from Netflix, but the probability makes the effort pay off in rich rewards for both companies.
Big Data can’t tell Amazon why you buy a particular title. Indeed, it is not interested in the why question; it is focused on what you are likely to buy given your past purchases and searches through the catalogue. The data also opens up other useful links. A secondary use of the same data might show that California readers of international crime fiction are more likely to book a ticket to Beijing, Tokyo, Hong Kong, or Bangkok, and that targeting them with discounted fares would increase sales. The big deal about Big Data is its potential for multiple uses, many of which only become apparent much later. That’s one reason why storing data for long periods is in the interest of business and government, and they will fight to keep that option. They want indefinite storage because they can’t predict what technical and social dynamics might arise, and they want all the cards, old and new, on the table.
We were born into an information-poor world. Our beliefs, our political and social structures, and our science and education were created from a small sample of the information about the world. We’ve spent our lives making decisions, forming opinions, and making judgments based on limited data that we believed gave us precise, exact answers about the state of the world and of each other. We are wired to look for causation. In the big data world, we are told this is a delusion. No math easily shows causal links, but correlations translate easily into mathematical equations.
Big Data, the book, also examines the real “risk [of] falling victim to a dictatorship of data.” While Amazon uses algorithms to recommend books, lawn mowers, watches, and clothes to you, there is potential for repression if the gathering, storage, use, and distribution of data are carried out in secret. We don’t know what limits will push back against the collection and use of Big Data. In a generation, people will look back and see our time as the tipping point when we lost privacy. The big data world will continue to strip away the possibility of privacy. Privacy existed because information was messy and limited, and expensive and difficult to collect. You once had the power to decide whether to divulge personal information. Now, in an average day, you willingly and largely unknowingly disclose pieces of data about yourself: your likes, dislikes, activities, friends, purchases, health, schooling, and plans. We’ve uploaded our lives onto the common Big Data network a small fragment at a time, and in doing so we are forfeiting our own privacy. Privacy as we know it will vanish.
Crime and punishment will change, as will opinions about proof beyond a reasonable doubt and the presumption of innocence. If big data shows that a person’s data file gives him, say, a 79% chance of committing rape or murder within the next three years, will the state decide that this ‘probable perpetrator’ should be removed from society in order to protect it? The state would hold this person not because he has committed a crime but because the predicted probability that he will commit rape or murder in the future is high. Many people may feel that, with a probability that high, the state should intervene and prevent the harm from happening.
The Big Data authors find that “the very idea of penalizing based on propensities is nauseating.” The future causes a sense of vertigo. It doesn’t share our values or our thinking, or account for the difference between potential actions and real ones. The authors fall back on the premise that the problem isn’t big data but the way we will use its predictions. The irony is that the book calls on us to loosen our fixation on causation and theory and to learn to embrace messiness and prediction. When push comes to shove on preventive detention, the authors retreat into the world of causation and find decisions based on predictions ‘nauseating’. My view is that once we jettison causation in the big data world, the use of predictions won’t be easily caged inside Amazon’s and Netflix’s world of recommendations. The data will get bigger and the predictions more accurate, and once that happens, ‘assigning’ guilt based on a person’s particular act will look like another example of medieval thinking.
An important takeaway from Big Data is this: “In the era of big data, however, when much of data’s value is in secondary uses that may have been unimagined when the data was collected, such a mechanism to ensure privacy is no longer suitable.” The debate we will soon have concerns the continuing role of human agency in deciding individual responsibility for actions. Another part of that debate will be whether the decisions of big data will ultimately be made by machines. Humans will likely never fully understand or control the moves, any more than an international chess grandmaster could in a game against Deep Blue. Time moves on, as does the debate; the tools continue to improve, with faster processors, larger memory capacity, and better algorithms; and we wake up one day to find that “rational thought and free choice” are no longer part of a world that we control.
The data story doesn’t end with Big Data. There is no endgame, as has always been the case with new technologies. Each innovation seems so incredible that we can’t imagine an improvement. Remember Beta cassettes? Our current technologies for Big Data will look like Beta cassettes in 5 to 10 years, probably much sooner. The pace of change has accelerated from centuries to decades to years, and now looks ready to upend existing technologies in months. This period is a prelude to a much bigger transition in humanity’s quest to understand the world and our place in it. We have gone “from compass and sextant to telescope and radar to today’s GPS.” Compared to the promise of what lies in our immediate future, our existing technologies for harnessing Big Data will be judged by future generations as closer to finger-painting a horse on a cave wall.
Buy Big Data and give it to someone you’d like to hand a freight load of sleepless nights. My predictions about the scale and scope of big data, what will replace it, and how we will change our values and attitudes as a result reach beyond what we now know. All bets are off that this transition will be easy or smooth. Adjust to the fact that others will have infinitely more information about you than you can imagine. You have been datafied. You can’t shake free, you can’t hide, you can’t go missing, and you can’t even hold your own ground.
The founder of Amazon has bought The Washington Post. Will the owner use the newspaper to recommend to politicians and others the policies, regulations, and laws they should adopt? Will somewhere between one-third and seventy-five percent of The Washington Post’s readers click on those recommendations and download them into their memory? The sale of The Washington Post is not just another sale of a newspaper to someone very rich; it is the sale of the newspaper to one of the founders of the new paradigm of gathering and distributing information. It is as if the owner of a printing press had bought a failing monastery and its scribes writing manuscripts. You know that change is coming.
You’d be a fool to bet against the odds that one morning you will wake up to the fact that you live inside a data panopticon and there is no way out. Not heard of the panopticon? Get used to seeing more references to that word. It is the prevailing metaphor of our time.