Is Big Data Getting Too Big for the Environment?

I’ve been trying to find some solid data to answer the following question:

What is the environmental impact of “Big Data”?

Let me explain…

Collectively, internet search companies have been relying on advertising dollars to make themselves profitable while keeping internet search free for consumers. This, however, meant that, compared with “traditional” advertising models, search providers needed to make internet advertising much more economically viable for the companies paying for ads. In other words, they needed to improve ad targeting, which they did and are still doing today very successfully. But those models and algorithms, as scientifically amazing as they are, rely on massive amounts of data being collected about our site visits, our clicks, our searches, our behavior online. All this data needs to be stored and processed somewhere. Indeed, the world is now running tens of millions of servers (estimated at 44 million). A significant portion of these servers (how big of a portion?) is dedicated to storing and processing search logs and click data directly related to online advertising. And all of those servers consume energy, a lot of energy (estimated at 0.5% of all electricity), and allegedly have a significant negative impact on the environment.

Yes, I know that data center engineering and management science, together with software and hardware manufacturers, have made amazing progress when it comes to data center and software efficiency, but this is not the point of this post.

Regardless of how you feel about anthropogenic causes of global warming or about internet privacy, the fact is that we are still burning a lot (how much?) of energy… to do what exactly?

How much data do we need to index the web and make it helpful for all kinds of web queries? How fast does this portion of the data grow? How does it compare with the volumes of data needed to continuously infer better and better ad targeting based on our behavior online and our social connections? In absolute terms per product sold, is the environmental impact of online advertising better or worse than the environmental impact of traditional advertising models? Can we get our hands on this statistical data? Is it a worthwhile undertaking to understand it better?
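The "per product sold" question above can at least be framed as a back-of-the-envelope calculation. What follows is a minimal sketch, not a validated model: the function names, the `overhead_factor` parameter (a stand-in for cooling and power distribution losses), and the assumption that every server draws a constant average power all year are mine. Apart from the 44 million server estimate mentioned earlier, every input in the usage note is a placeholder that would have to be replaced with real data.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(servers, avg_watts_per_server, overhead_factor=1.0):
    """Energy used in one year by a fleet of servers, in kWh.

    Assumes a constant average draw per server; overhead_factor is a
    hypothetical multiplier for cooling and power distribution losses.
    """
    watt_hours = servers * avg_watts_per_server * overhead_factor * HOURS_PER_YEAR
    return watt_hours / 1000.0


def energy_per_product_kwh(servers, ad_share, avg_watts_per_server,
                           products_sold, overhead_factor=1.0):
    """Annual advertising-related energy divided by products sold, in kWh.

    ad_share is the (unknown) fraction of servers devoted to advertising data.
    """
    if not 0 <= ad_share <= 1:
        raise ValueError("ad_share must be between 0 and 1")
    if products_sold <= 0:
        raise ValueError("products_sold must be positive")
    total = annual_energy_kwh(servers, avg_watts_per_server, overhead_factor)
    return total * ad_share / products_sold
```

Calling `energy_per_product_kwh(44_000_000, ad_share, avg_watts, products_sold)` with the server estimate above would then give a first rough figure, but only once the three unknown inputs are backed by data. The same function could be pointed at a traditional advertising channel's energy use to make the comparison the questions above ask for.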

In addition to some opinions out there about the questionable usefulness of online advertising, is online advertising basically helping us to “kill the planet”? Or is it as efficient environmentally as it seems to be economically? Are trends in data storage and processing sustainable with regard to the energy production required to support them? Should “Big Data” players bridge their considerable differences and collaborate on efficient and clean power production and distribution technologies with the same rigor they apply to their own massively parallel data processing techniques?

I don’t know the answers to these questions – I am still searching for reliable data and useful models to estimate them. If you have any suggestions, please feel free to reply to this post. I would be happy to collaborate!


