Remember the cringe-inducing modem sound that accompanied your early internet connections? As a 40-year-old father of two, I can’t help smiling when I hear that screeeeeeeechhhhhhhhhh. That sound reminds me, with a touch of nostalgia, of the cry of a newborn. It makes me smile because fond memories have overshadowed the dirty nappies and sleepless nights.
To a certain extent, that sound was indeed the cry of a baby internet, when browsing was a commitment and a flaming logo the pinnacle of creativity. The worldwide web has since grown up. A lot.
Small steps towards big data
Looking back, that dial-up buzz was literally the first signal in our digital footprint. We signed on. And in that very moment big data spun to life.
What were we learning from those early signals? Not much. We knew how many people were online and roughly where they were. But as the web of documents developed, search engines arose and our ability to understand people increased. We started to know what you wanted. Think about it: you probably tell search engines things you wouldn’t tell your closest friends. We understood that your inputs into a search engine were, on a personal level, an expression of your desire, and on a global level, an expression of the world’s consciousness. The so-called Zeitgeist.
Although I have spent the last 10 years in the realm of online advertising, in today’s article I will focus less on search marketing and more on the information infrastructure and machine learning that Bing is part of, looking at how this is influencing our future. What are we doing with our data footprint? Over the course of the last year I have asked many people across Europe how the idea of data collection made them feel. By and large, the response was discomfort and hesitance, until they were given more perspective.
The heights of data complexity
So let’s get back to our story. In order to understand the complexity and depth of the data infrastructure that we are part of, let’s contemplate what has changed since the emergence of the first search engines. There are four changes, and each brought its own surge of data. As you read through them, picture a growing mountain.
First is your search habit. From a few searches per day to multiple searches per hour, we are now searching constantly, and not just you but also the billions of people who have come online in the past decade.
Second is your search access. Most of us had access to a desktop computer 20 years ago. But just one. A grey, cold box, sealed to a desk. We certainly couldn’t put it in our pocket and take it with us to a party. And it is not just computers. Think about all the devices you own that harness computing power: laptop, tablet, smartphone, TV, but also your car and now your fridge.
The third big change is your search expression. You have gone from using basic computer commands, with ampersands and inverted commas, to using more human language. You’ve gone from asking “what” to asking “why…” and “how to…”. In fact, we have seen queries starting with “why” grow three times as fast as queries starting with “what”, which suggests we are no longer just looking for information; we are looking for answers. You’ve layered sequential searches on top of these, in a complex web of intents.
Finally, the integration of search with other infrastructures has also changed. A search engine used to be an isolated service. Now it’s plugged into the social graph. This means that several points of contact are now linked, bringing with them a flurry of new signals, millions of them, that only a few supercomputers are able to capture, organize, model and render. Search engines are the database of intents, and social networks are the repository of sentiments. We have developed the ability to process, analyze and understand these two humongous information sets, both historical and real-time, together.
The search crystal ball
We can understand your sentiment for certain events or entities, estimate popularity trends, as well as predict outcomes of future events. Microsoft has developed a program called Bing Predicts which combines and models all the data signals we can find, and comes up with incredibly accurate predictions. We initially explored popularity-based contests like American Idol, for which the web and social signals are very strong and highly correlate with popularity voting patterns. Bing Predicts could accurately project who would be eliminated each week during American Idol and who the eventual winner would be. Just by using all of the signals that are out there.
Getting more complex, we turned to sporting events and even world political challenges. During the World Cup in Brazil, our team predicted the winners of the final elimination round with 100% accuracy. During last year’s Rugby World Cup, we had 80% accuracy across the tournament. Surprised? In order to successfully predict the outcome of a sporting event, we incorporated four times as many signals, of more types, as we used to predict a basic popularity event like American Idol. This is because we recognize that popularity alone does not predict whether a team will win – sorry, fans. A fan base does, however, have special insight into the abilities of its team, and those fans are constantly discussing their team. We call this insider knowledge. We up-weighted their knowledge against player and team stats, tournament trends, game history, location and even weather conditions. This is how we were successful in our predictions.
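To make the idea of up-weighting concrete, here is a minimal sketch in Python of how several signals might be combined into a single score. It is not how Bing Predicts actually works: the signal names, the weights and the `win_score` and `predict_winner` functions are all assumptions made up for illustration, and each signal is assumed to be already scaled between 0 and 1.

```python
from typing import Dict

# Hypothetical weights, for illustration only: fan discussion
# ("insider knowledge") counts for more than any other single signal.
WEIGHTS: Dict[str, float] = {
    "fan_insight": 0.40,
    "team_stats": 0.25,
    "game_history": 0.20,
    "location": 0.10,
    "weather": 0.05,
}


def win_score(signals: Dict[str, float]) -> float:
    """Weighted average of the supplied signals (each assumed to be in 0..1)."""
    total_weight = sum(WEIGHTS[name] for name in signals)
    weighted_sum = sum(WEIGHTS[name] * value for name, value in signals.items())
    return weighted_sum / total_weight


def predict_winner(team_a: Dict[str, float], team_b: Dict[str, float]) -> str:
    """Return "A" or "B", whichever team has the higher combined score."""
    return "A" if win_score(team_a) >= win_score(team_b) else "B"


# Made-up example: team B has better stats, but team A's fans are far more confident.
team_a = {"fan_insight": 0.8, "team_stats": 0.5, "game_history": 0.5,
          "location": 0.5, "weather": 0.5}
team_b = {"fan_insight": 0.4, "team_stats": 0.7, "game_history": 0.5,
          "location": 0.5, "weather": 0.5}

print(predict_winner(team_a, team_b))
```

With these invented numbers, the heavier weight on fan insight outweighs team B's edge in stats. A real system would presumably learn its weights from past results rather than fix them by hand.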
We finally turned our attention to political events, and in particular the Scottish referendum two years ago. The process and results were presented at TEDxSuzhou.
We were and are predicting the future. Can you imagine a business need that this kind of prediction can answer? Of course you can! We’re experimenting right now with predicting the upcoming trends in fashion, in automobiles, in technology – so we can help our advertisers make smarter business decisions.
So we have seen how predictions can play a role in entertainment, sport or business. Fine. But can we make this kind of data infrastructure even more meaningful, at the level of society and humankind? What can we do with this capability that goes beyond entertainment and the novelty factor? Can we use our big data to make a meaningful impact on society?
All of this is exciting on a global or country level. When we’re talking about millions of inputs, it’s no wonder you can make predictions and have an impact like this. It is just a massive sample size. What about bringing this big data infrastructure to a personal level? Is it possible for a machine to learn so much about you that it can accurately predict your next move? Or predict when you will need something, and provide it? That is the promise behind digital personal assistants like Cortana.
Cortana is not only on Windows Phone but also Android and iPhone. And since the release of Windows 10, she’s even on your desktop. As outlined in a previous article, you set up Cortana with some basic info about yourself, then use her to help you with things like scheduling and reminders and web searches. Before you know it, Cortana is spontaneously sending you an alert to inform you that you should leave the office now to be on time for your next appointment in Farringdon, because she found some congestion on your normal route. It doesn’t take Cortana long to learn so much about you that she can predict your next move and offer assistance.
A new layer of data in your coat
While our mobile phones aren’t exactly wearables, we sometimes behave as if they are, keeping them on our body no matter where we go. With wearables, two important things converge: big data infrastructure and your expectations.
When you hear “wearables”, you probably think of a smart watch or one of those fitness bands. But to go back to my introductory analogy, these are just the first baby steps towards the full potential of wearables and how that technology will be able to enhance our capabilities, as individuals or as professionals. Think about it: wearables can capture and communicate signals about your location, your manner of travel – whether you’re on foot or in a car – time of day, most recent queries, usual route home from work, the weather, your physiological state, etc.
So for instance, if your wearable identifies that your hydration is low, it could prompt a notification that factors in your location, whether you’re moving, what time of day it is and therefore whether the nearby branch of your favourite coffee shop is open. It could even cross-reference this with your earlier interest in gingerbread lattes, and the fact that it is raining, and direct you to the nearest open coffee shop with plenty of indoor seating and gingerbread lattes on the holiday menu. Your wearable might even send you an alert for a coupon the coffee shop is offering.
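The coffee shop scenario above can be sketched as a few simple rules in Python. This is only an illustration under stated assumptions: the `Shop` and `Context` types, their fields and the `suggest_shop` function are hypothetical, and a real assistant would draw on far more signals than these.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Shop:
    # Hypothetical description of a nearby coffee shop.
    name: str
    distance_m: int
    opens: time
    closes: time
    indoor_seating: bool


@dataclass
class Context:
    # Hypothetical signals reported by the wearable and the phone.
    hydration_low: bool
    now: time
    raining: bool


def suggest_shop(ctx: Context, shops: List[Shop]) -> Optional[str]:
    """Return the name of the nearest suitable open shop, or None."""
    if not ctx.hydration_low:
        return None  # nothing to prompt about
    candidates = [
        shop for shop in shops
        if shop.opens <= ctx.now < shop.closes
        and (shop.indoor_seating or not ctx.raining)  # need shelter when it rains
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda shop: shop.distance_m).name


shops = [
    Shop("Near", 100, time(8), time(18), indoor_seating=False),
    Shop("Far", 400, time(8), time(18), indoor_seating=True),
]
rainy_morning = Context(hydration_low=True, now=time(10), raining=True)
print(suggest_shop(rainy_morning, shops))
```

In the rainy example the closer shop is skipped because it has no indoor seating, which is the kind of cross-referencing the paragraph describes.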
Greener pastures ahead
As wearable technology grows, your expectations for your experience with technology in general will change. And that is for the better. After all, what is the point of accumulating data points like hoarders unless you do something greater with them? And if I have learned one thing about the internet, it is that it is fertile ground for creative uses of untapped opportunities.
I am from the French Alps, where I spent most of my summers walking the mountains with my grandmother. She used to herd cattle in these alpine pastures, and she would tell me stories about how each of her cows was almost like a member of her family. They had names, and she could tell when something was wrong with any of them.
Those days are gone. Nowadays a farm no longer takes care of a dozen or so cows, but hundreds. A personal relationship with each animal is no longer an option. The story of the connected cows started with a farmer in Japan who was exhausted by the effort of figuring out the exact time his cows were fertile – because it is a very short window, only 12-18 hours every 21 days, and it usually happens between 10pm and 8am. Of course, knowing the precise time of estrus would give farmers a much better chance of successfully inseminating the cows.
These are farms with hundreds of cows – you can imagine what a nightmare it would be to keep track. Could technology help? The farmer asked Fujitsu for help. Fujitsu consulted with some university researchers, and they came up with the idea of putting wearables – pedometers – on the cows and sending the data to Microsoft Azure, in the cloud, for analysis and alerts that go straight to the farmer’s smartphone.
It turns out that when a cow is in estrus, she paces. The number of steps she is taking increases tremendously, and this data alerts the farmer to the right moment for fertilization. The connected cow project has been 95% accurate – and that 5% where it misses the mark turns out to be when the cow actually skips the farm and goes missing.
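The pacing signal described above can be sketched as a simple spike check on step counts. This Python snippet is a hedged illustration, not the Fujitsu system: the `is_pacing` function, the 24-hour baseline and the threshold of twice the baseline are all assumptions chosen for the example.

```python
from statistics import mean
from typing import List


def is_pacing(hourly_steps: List[int], baseline_hours: int = 24,
              ratio: float = 2.0) -> bool:
    """Flag the latest hour if its step count is at least `ratio` times
    the average of the `baseline_hours` hours that came before it."""
    if len(hourly_steps) < baseline_hours + 1:
        return False  # not enough history to form a baseline
    baseline = mean(hourly_steps[-(baseline_hours + 1):-1])
    return baseline > 0 and hourly_steps[-1] >= ratio * baseline


# Made-up pedometer readings: a quiet day, then one hour of heavy pacing.
calm_cow = [100] * 24 + [110]
pacing_cow = [100] * 24 + [450]

if is_pacing(pacing_cow):
    print("Alert the farmer: possible estrus")
```

A production system would likely compare each cow against her own longer-term pattern and the time of day, but the core idea is the same: a sharp rise in steps relative to her recent norm triggers the alert.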
Not only is this wearable incredibly accurate, it also helped the researchers discover that there is an optimum window for fertilization depending on whether you’d like a female or a male. With 70% probability, a farmer should fertilize in the first half of the estrus window if he needs more milk cows, or in the second half if he needs more bulls. But it does not stop there… The Fujitsu researchers were also able to correlate pacing patterns with increased risks of genetic diseases and pathology.
It is amazing what data can tell you, if you know how to look at it. Sometimes creatively! This is the joy of data infrastructure. We can do wonderful things in the world when we collect, analyze and render the data that’s available to us. Microsoft is on the leading edge of this, with products like Power BI and Azure, our cloud platform, but also Bing, our search engine, and its machine learning capabilities, which can make sense of the millions of other data points that come together to make big data smart, useful, creative and – yes – joyful. And you: when was the last time you found creative inspiration in your data set?