Internet in a pocket, the age of walled gardens, and their end

2012-04-20

An interview with Sergey Brin received a lot of attention recently. He worries about the rise of walled gardens (in spite of Google+ being one of them):

"very powerful forces that have lined up against the open internet on all sides and around the world". "I am more worried than I have been in the past," he said. "It's scary."

I think there is a natural reason for the rise of walled gardens, web segmentation, and the concentration of internet services in the hands of a few megacorporations. And it is the very same reason that brought Google into existence. The key to the web is data.

The size of the indexed web is estimated at 50 billion webpages as of April 2012.

According to Google, the average compressed webpage size is 320 kB, so the entire indexed web amounts to only 16 petabytes. That is three times as big as the entire Internet Archive, but not really much: the total internet traffic in 2013 is predicted to be 670 exabytes, roughly 40,000 times more.

On one hand, the actual data is bigger uncompressed; on the other hand, webpages usually share resources, so the total size may also be smaller. A rough estimate is all we need.
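To make the arithmetic explicit, here is the back-of-the-envelope calculation as a small Python sketch (both inputs are the estimates cited above, not measured values):

    # Rough size of the indexed web, April 2012.
    pages = 50e9               # estimated number of indexed webpages
    page_size = 320e3          # average compressed page size in bytes, per Google

    total_bytes = pages * page_size
    print(total_bytes / 1e15)  # -> 16.0, i.e. about 16 petabytes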

Now, the cost of storage is decreasing exponentially:

\frac{\text{cost}}{\text{GB}} = 10^{-0.2502 \, (\text{year} - 1980) + 6.304}

That means that the minimal cost to store the entire indexed web of the year 2012, without deduplication, would be:

  • $3 000 000 000 in 1996
  • $30 000 000 in 2004
  • $300 000 in 2012
  • $3000 in 2020
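As a sanity check, a minimal Python sketch that evaluates the fit for those years (assuming the ~16 PB figure and the exponent fit from above):

    # Fitted disk price in $/GB for a given year (the formula above).
    def cost_per_gb(year):
        return 10 ** (-0.2502 * (year - 1980) + 6.304)

    web_size_gb = 16e15 / 1e9  # the ~16 PB indexed web, expressed in GB

    for year in (1996, 2004, 2012, 2020):
        print(year, round(cost_per_gb(year) * web_size_gb))
    # -> roughly 3.2e9, 3.2e7, 3.2e5 and 3.2e3 dollars,
    #    i.e. the rounded figures in the list above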

The actual cost of managing that data might be orders of magnitude higher (just a guess), and the web was smaller in 1996 and is still growing; but overall it is getting progressively more affordable to keep the entire web in one place, because the growth rate of the web is limited by human creativity. As soon as a company can afford to have its own web, big enough to be important on its own, it doesn't need to keep it open, standards-compliant, etc. On the contrary, the others depend more and more on that company and have to accept its policies. Welcome to the world of private APIs, again...

Now look at when some famous companies were founded:

  • Google in 1998
  • Facebook and Flickr in 2004
  • Dropbox in 2008

Google belongs to the era when one company could build and store an index of the entire web. Facebook and Flickr belong to the period when one company could store all the chats and all the cat photos of the web indefinitely. Dropbox intends to store everything people deal with on their computers. Some observations:

  • each new winner grabbed even bigger chunks of people's data than the previous leader
  • qualitatively, that data was always more personal, more deeply rooted in real life
  • data was acquired much faster (now: as fast as network bandwidth permits!)
  • data acquisition kept looking easier and more rewarding to the user (compare webmaster tools with Dropbox)

So cheaper storage gave us the global index of Life, the Universe and Everything in the 90s, and it made the private sandboxed webs possible in the 2000s. It may help concentrate all information and computing in the hands of just a few companies within the next ten years. A scary future, but don't panic: it may also bring such large-scale systems within the reach of mere private hobbies. Unless some artificial barriers to entry appear, we may see more competition. Whoever grabs more fresh data wins, and people rarely look back.

Some thoughts about the future of the web and computing:

  • True breakthroughs depend on collective behaviour, not on the technology itself. Twitter, Facebook and Instagram are trivial ideas, but they win on people's engagement, and there is room to improve. More complex human behaviours may drive new breakthroughs. For example, I still hope that mesh networking will take off at some point and disrupt the world of communications.
  • Texts, music, photos and videos are already covered. There are some accessibility problems (like copyright hell, scientific publishing and DRM distribution), but overall these kinds of data are taken care of. New technologies should go beyond the screen and keyboard: they need to integrate with real life and grab real-life data. I suppose augmented reality is the low-hanging fruit here, and pervasive robotics is what may really change computing. Smartsomethings and iSomethings are the first steps away from the keyboard. Anything that puts computing in new contexts deserves attention.
  • People have five senses, and they sense continuously. Input for computers is mostly a rare touch. Occasionally computers hear or watch something, but they rarely listen or see. They can play sounds and show pictures, but they can almost never pat you on the shoulder. The vast part of real-life data (and opportunities) lies in being able to actually see, and listen, and touch, and move around, and understand, and swim in the richness of data. We have barely scratched the surface of these mountains. When we do, Facebook and modern Google will look like the telnet talkers of the past. Continuous sensing and interaction will change the game.
  • These data streams are even bigger than big data, and the trick will be not to store the data, but to understand it instantly and know what to forget. At that point the advantages of centralized storage will fade, and proximity to the source sensors will matter. Autonomous systems that do not hit the data centers will scale better, be more reliable and lag less, and user-side computing may win again.

In fact, I think that at some point it will no longer matter who stores the data, but who can sit on a stream of data. And hopefully, the users will be in control of that.