The era of big data is not coming; it is here. The birth and growth of big data was the defining characteristic of the 2000s. As obvious and ordinary as this might sound to us today, we are still unraveling the practical and inspirational potential of this new era. Google processes over 20 petabytes of data a day (a little less than half the entire written works of mankind from the beginning of recorded history in all languages). In addition to collecting and searching for more information, the technologies that allow us to capture and interpret that data are improving every time we blink. Something as simple as a snapshot has become a data collection event.
A lightfield camera captures all the pixels in the direction it is pointing, allowing you to move focus back and forth as easily as a slide rule. Cameras in smartphones know where you are and embed that information into your photograph. New sensor technologies are evolving that range from facial recognition and social networking context to full augmented reality. It is only a matter of time before your “snapshot” does not just capture the image of your niece's birthday, but also the conversation you were having with your sister in the kitchen, who baked the cake, how much you ate, what kind of loot your niece raked in, and what she really wanted but did not get. Your aunt's blood pressure, your dad's cholesterol, and the blood alcohol content of everyone with a set of car keys in their pockets will all be tracked alongside the pixels of a spittle-covered cake. In short, that cute little birthday snapshot you posted on Facebook is destined to become a giant slippery wad of data.
The problem with the massive data collection and distribution system we have created is: big data is a big mess. Most of the data we capture in our daily lives just sits around, cluttering up storage space on our devices and slowing down our connections. Actually getting to that data and doing something new with it is what we, the Intel Labs Futurecasting team, are most interested in. Building giant mounds of data is one thing, but managing, interacting with, and making sense of that data, that is compelling (Fig. 1).
We know that we are collecting more content (more pictures, more video, more TV, more text), but we are also collecting more of the information that surrounds that content every day through the sensors in our devices and in our environment. We know that computational power is increasing exponentially, both distributed (smartphones) and consolidated (Watson). And we know that the big users of big data (Google, Amazon, the scientific community) are constantly leapfrogging one another in an effort to deploy smarter algorithms and strategies for managing that data. What we want to know is: what will stories look like in the age of big data?
To answer “big questions” like that, we leverage the work of the social and computer scientists at Intel, and we talk with leading thinkers from industries and academic communities around the world about what they are doing now, and about what they think is happening next. Then, we create “experience prototypes,” i.e., models that allow us to explore the new experiences and new possibilities that we believe are coming.
We begin with “what does big data mean today?” On a recent trip to Sweden, one of the authors, Johnson, got a peek into what it really means to create a “genuinely digital” entertainment landscape. Johnson traveled to Stockholm to talk with Per Bjorkman, the head of distribution for SVT, Sweden's public broadcaster. SVT flexes considerable power in the market, overwhelming that of the private firms. As one of the private firms told Johnson, “when SVT makes a move we all have to follow or die.” So when in 2007 SVT decided to digitize the entire Swedish television catalog, people listened.
When Johnson arrived to chat with Bjorkman, SVT had finally completed the project. With great Nordic enthusiasm Bjorkman showed Johnson the server in the basement. “The entire cultural history of Sweden here on a five-petabyte server,” Bjorkman said with unabashed joy. Then, this summer, IBM built a 120-petabyte drive, reminding us that five petabytes will seem like home storage in 2050.
It is not that SVT's accomplishment is not epic, but rather that giant digital storage boxes are only the beginning of the new entertainment era. For years, we in the high-tech world believed that if we could just digitize the entire world's content we could revolutionize entertainment. Digital enables TV on demand and immediate distribution of entertainment all over the world, right? Not quite. Managing that content is a nontrivial challenge. How do you find something to watch when the entire history of TV has been digitized? How do you even begin to search it? And if you do find something, how does it get from the server to your screen?
Engineers like to describe connectivity as a “solved” problem. We know how to move information from place to place; the rest is just implementation. “Just implementation” requires rethinking the business model of a massive international industry, unraveling and then reinventing a complex web of legal agreements and digital rights. And assuming that happens, it means moving massive collections of data across a fragmented infrastructure not designed with this kind of traffic in mind. Johnson's visit to Mumbai, India, to attend the FICCI Frames conference, brought the challenges of real-world connectivity into focus.
FICCI is like holding Hollywood's Oscars, France's Cannes, Las Vegas' Consumer Electronics Show, and Austin's South by Southwest music festival all at the same time. Johnson traveled to Mumbai to see what Bollywood had in store for the future. Overlooking the dazzling metropolis of people, the sprawling Renaissance Powai Hotel played host to the conference. The tag line was, “A vision for the next decade.” Johnson saw the winner of Indian Idol and had a long conversation with the legendary Yash Chopra. Yash is India's equivalent of Steven Spielberg, but much more popular. He has had a film playing in Mumbai for 15 years straight. Imagine if Schindler's List were still playing at your local multiplex to sellout crowds. Yash and Bollywood know how to put on a show. And that show is incredibly important to people. It is not just a way to kill time. These movies, stories, and entertainment feed the soul of India.
So how do you deliver all that entertainment to the people who demand to see it? India has a notoriously complex and fractured infrastructure, but even back in Stockholm, the copper cables that enabled the Internet age cannot deliver the volume of data at the speeds that are needed to create a truly on-demand entertainment world. In many places on the planet, laying fiber (or anything) on the ground to physically connect each person to the cloud is untenable. Australia's massive broadband project has connected just 4000 homes in the first two years, and has amassed significant political opposition. India is looking to wireless for its solutions. In Sweden, they are counting on private industry to ensure 100 Mb/s speeds by 2020. What this means is that the future of entertainment in Stockholm will be delivered on the back of an entirely different data infrastructure than in Mumbai or Adelaide.
Even if all we want to do is continue to provide the entertainment streams that people already love and expect, then the devices, servers, and networks that deliver the next 40 years of data, movies, and communication are going to need to be a lot more intelligent. Just moving the simple but huge data files we have now demands exponential improvements in delivery strategies, raw bandwidth, and the reliability of our connections.
But what if what we actually wanted was to do something completely new?
“Never memorize something that you can look up.” Albert Einstein was not just being flippant. His underlying philosophy of human knowledge places greater value on insight and analysis than on raw data recall. But what happens if computers stop memorizing things, and start analyzing things instead? The two canonical truths in the old world of data were: 1) there is a right answer and 2) the right answer is stored in a big huge data file. We are comfortable with this model, because as children, this is how we learn.
In elementary school, we recite the state capitals. We memorize simple mechanical systems, like how a bill becomes a law; we are taught how to implement rules, how to test answers. But by the time we get to graduate school, we are expected to move beyond rules and answers. In political theory, for example, the mechanics of voting and representation become boundary conditions for more interesting and nuanced conversations about why social systems work the way they do. Simple facts are interesting only so far as they inform a reasoned analysis of the world around us. If you want to know if Milwaukee is the capital of Wisconsin, there is an answer (“no”). But if you want to know whether the Democratic boycott of the winter of 2011 supported or undermined the functionality and ideals of representative government, then what you have is a debate, not a mechanical answer. This is far more complex, and much more interesting. Humans make a transition from rote knowledge to analysis and insight as we proceed through our academic and human lives. We argue that computers are just beginning to make this transition, and that stories are the most efficacious tools available to move them into this new world of computation.
James Cameron's 2009 blockbuster Avatar has more in common with Gutenberg's 1454 Bible than you might think. Sure, each 10-s animation sequence in Avatar reflects around 10 TB of data, and there are well over two million pixels per frame. But when it is all said and done, we, as consumers of this exciting new medium, just flip from page (frame) to page in order. Not only that, the people making Avatar had the same problems that Gutenberg did. In order to get their work out for the people to enjoy (or criticize), they had to “print” each scene, and then stitch them together into one big data file. The challenge is that the enormous files that create a Hollywood picture are still behaving like a book, and if we want to move into the next great thing in entertainment we are going to have to leave the 15th century behind.
Today, a computer file, like a book, is filed under a fixed name, in a fixed location. It is necessary for the cogency of the document that it be stored all together. Creating the document was a great deal of effort, and we cannot easily recreate it from its component parts (words, symbols, pixels, etc.). Deciding on the structure and order of the parts was a complex task, and if we were to redo that task, we would not (actually we could not) produce exactly the same answer.
For astronomers, biologists, special effects teams, and architects the file sizes we are talking about are in the terabytes, just for starters. Managing multiple terabytes as a single file is fundamentally unwieldy. Each time you try to move the file, its sheer weight slows you down. Even if all you want to do is view (heaven forbid, change) a small piece of the data, you have to open and manage the full file. Currently, we throw raw processing power at the problem rather than reconfiguring it. Imagine you need to get a library book out of the backseat of a Cadillac Deville. One approach would be to lift up the Deville, and shake the book out of the passenger's window. However, the size and configuration of the Deville suggests a slightly simpler approach (key, lock, door handle, book). The fact that you (presumably) do not have the strength to lift the car is not relevant, because lifting the car in order to accomplish your objectives is neither necessary nor recommended.
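The difference between lifting the car and opening the door can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a 1 MiB temporary file stands in for a multi-terabyte one, and the two function names are invented for the example. The first function loads the entire file just to look at 16 bytes; the second seeks straight to the bytes it needs.

```python
import os
import tempfile


def read_whole_then_slice(path: str, offset: int, length: int) -> bytes:
    """Lift the car: load the entire file into memory, then keep a tiny piece."""
    with open(path, "rb") as handle:
        everything = handle.read()
    return everything[offset:offset + length]


def read_slice_only(path: str, offset: int, length: int) -> bytes:
    """Open the door: jump to the offset and read only the bytes we need."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


# A 1 MiB stand-in for a much larger file (hypothetical demo data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)
    demo_path = tmp.name

try:
    heavy = read_whole_then_slice(demo_path, 1000, 16)
    light = read_slice_only(demo_path, 1000, 16)
finally:
    os.remove(demo_path)  # clean up the stand-in file

print(heavy == light)
```

Both calls return the same 16 bytes, but only the first one pays for the full weight of the file. Seeking helps only when you already know where the piece lives, which is exactly why the organization of the data matters as much as the horsepower behind it.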
So how do we rethink interacting with big data in ways other than just bullying it with massive computational power? Say hello to your brain. Biology deals with massive data sets every moment of every day. Your brain manages around 1000 TB of data and no neuron in your brain fires faster than 1 kHz, which is about the speed of a desktop personal computer (PC) in 1986. But you do not wait around for hours while your brain boots up. That is because the biological computing system of your brain does not work the way a computer does. Your brain does not find and open large files with complex information sets fixed inside. Your brain pulls millions of individually stored simple data elements, processes them in multiple locations, and reconstructs them into a useful outcome (a memory, implementation of a task, etc.) often before you realize there is any processing going on.
Biological computing models are better suited to massive file sets because they organize and access information in ways that are more practical for the ranging and nearly infinite inputs we deal with every day. Google's server management is already mimicking the basics of biological modeling, devising systems that tolerate ambiguity, duplication, and loss without bringing down the Internet. What Google realizes, and what the rest of the world is tilting toward, is the need to fundamentally rethink data.
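A toy sketch can make "tolerating duplication and loss" concrete. The Python below is not Google's system or anyone's production design; it is a minimal illustration in which plain dictionaries stand in for storage nodes and every name (`split_chunks`, `store_replicated`, `reconstruct`) is invented for the example. A message is broken into small pieces, each piece is deliberately duplicated on two nodes, one node is then wiped, and the message is rebuilt from whatever survives.

```python
from typing import Dict, List, Optional

Node = Dict[int, bytes]  # a stand-in storage node: chunk index -> chunk bytes


def split_chunks(data: bytes, size: int) -> List[bytes]:
    """Break one large blob into small, individually stored pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_replicated(chunks: List[bytes], nodes: List[Node], copies: int) -> None:
    """Place each chunk on `copies` consecutive nodes (intentional duplication)."""
    for index, chunk in enumerate(chunks):
        for k in range(copies):
            nodes[(index + k) % len(nodes)][index] = chunk


def reconstruct(chunk_count: int, nodes: List[Node]) -> Optional[bytes]:
    """Rebuild the blob from any surviving copies; None if a chunk is gone."""
    parts = []
    for index in range(chunk_count):
        found = next((node[index] for node in nodes if index in node), None)
        if found is None:
            return None
        parts.append(found)
    return b"".join(parts)


message = b"stories are adaptive algorithms"
nodes: List[Node] = [{} for _ in range(4)]
chunks = split_chunks(message, 5)
store_replicated(chunks, nodes, copies=2)

nodes[1].clear()  # simulate losing one node entirely
recovered = reconstruct(len(chunks), nodes)

nodes[2].clear()  # losing a second, adjacent node removes both copies of a chunk
after_two_losses = reconstruct(len(chunks), nodes)

print(recovered == message, after_two_losses)
```

With two copies of every piece, losing any single node costs nothing; losing two neighboring nodes does. Real systems choose how much duplication to pay for based on how much loss they expect, but the principle is the same: no single fixed file has to survive intact.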
So as a part of our research we would like to offer a set of rules for the new data world: 1) big is not enough, and 2) it is neither necessary nor practical to fix every piece of data we have collected as a species into some particular order.
So what does this look like in a story? Stories are elemental to the way that we, humans, have dealt with massive data sets since the dawn of time, and as such, a familiar way to explore a universe in which the data we have come to rely on to behave in a particular way begins to act up.
We are already capturing massive quantities of data about our entertainment. Take, for example, Supernatural, an American horror series, created by Eric Kripke in 2005.¹ Now in its seventh season, it has generated roughly 112 hours of footage. So we have a lot of pixels, yes, but we also have much more. We have every action of every character; every line of dialogue; a history of when, where, and how often everyone dies. Because all of that information is data, what we actually have, in and around those 112 hours of pixels, is a map to the world of Supernatural, and the characters inside it.
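One way to picture that map is as a collection of small, structured records rather than hours of pixels. The Python sketch below is purely illustrative: the `StoryEvent` fields are our own guess at a useful shape, and the characters, locations, and events are invented placeholders, not real data from any episode. Once events are stored this way, "how often does everyone die?" becomes a one-line question.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StoryEvent:
    """One thing that happens to one character (hypothetical schema)."""
    season: int
    episode: int
    character: str
    action: str
    location: str


# Invented placeholder events; none of this is taken from an actual show.
events = [
    StoryEvent(1, 1, "character_a", "arrives", "motel"),
    StoryEvent(1, 1, "character_b", "dies", "warehouse"),
    StoryEvent(1, 2, "character_b", "returns", "crossroads"),
    StoryEvent(2, 5, "character_a", "dies", "crossroads"),
    StoryEvent(3, 9, "character_b", "dies", "motel"),
]


def count_actions(story: Iterable[StoryEvent], action: str) -> Counter:
    """Count how many times each character performs (or suffers) an action."""
    return Counter(event.character for event in story if event.action == action)


deaths = count_actions(events, "dies")
death_sites = sorted({event.location for event in events if event.action == "dies"})

print(deaths.most_common())
print(death_sites)
```

The same records could just as easily be asked where things happen, who was present, or what tends to follow what, which is the sense in which the footage comes with a map attached.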
Today, all of that footage and all of that information is locked away in old style data collections: fixed and unwieldy. But if we can store all that information in a system, modeled more on biology than books, and apply our significant and increasing processing power to analyze and respond to the world, rather than just move it around mechanically, then we have the possibility of generating and interacting with the world and the characters of Supernatural (or possibly even a story you like). This requires computational intelligence, not a Google search. It is not the ability to hunt down a single piece of data in the massive haystack of global information but rather the ability to make something new and interesting emerge out of that data.
As the era of big data matures, incredibly smart people all over the world are reinventing our relationship with information. They are making computers faster (exponentially, scary faster), and they are remaking the model of data storage using biological models to support smarter interactions, inevitable data losses, and ambiguity. What they need to do is to tell a story.
Humans willfully engage with stories and, as natural pattern makers, we look for story in everyday circumstances. Evolution dictates that. Early hominids started walking upright so that they could cover more ground with less energy. Competition for food made the ability to cover more territory appealing, but if you could not remember, or share with your group, where you found food and where you found hungry tigers, the advantages of territory expansion were quickly outweighed by death. Lacking Google Maps and a Global Positioning System (GPS), the hominids turned to pattern making. Those stories became the first adaptive algorithms we passed along to other members of our group.
Learning to mimic and interact with humans and their stories in more natural ways is not just a parlor trick. Biological systems and stories share an ability to make decisions under ambiguous conditions. Science is easy. Stories are hard. Stories require subtle changes in direction based on changing conditions, and stories do not have a “right answer.” Some answers are funnier, or scarier, or more dramatic, and some answers are wrong, but very few answers are “right.”
Genuine computational intelligence is the ability to make humanlike sense of mass quantities of data; not just the kind of sense that works internally to the machine, or with supersmart Hawking-like humans at the controls, but externally, looking outward at us with interactions that are sufficiently natural, unpredictable, and interesting to engage us in the generation of genuinely new kinds of narrative. It is easy to see why unraveling the human genome attracts the sort of beautiful minds that we need to make this all work, but stories represent a unique challenge in our transition to digital media. More than any other form, humans know immediately whether a story is “right.” Our biology has trained us for this, and when computers can tell us stories, we know we will have entered a genuinely digital age.
The new possibilities for the era of big data are not limited by our traditional understanding of interactions, inputs and outputs, and search. The petabyte world creates opportunities to think about fundamentally new roles for and relationships with data. One way to comprehend these new relationships and make practical use of them is to explore these relationships as stories. In the maturing age of big data, stories can become adaptive algorithms; they can take on a life of their own and create a far more engaging future for entertainment. ■
¹ If horror is not your thing, feel free to substitute some other long-running show. We often play with the universe of Supernatural in our prototypes because it follows two characters (Sam and Dean) through a building series of story arcs. It has a complex and adaptable rule set, and frequently reinvents the rules of its universe. This creates a robust and reactive data set (as well as fun TV).