a16z Podcast

a16z Podcast: From Data Warehouses to Data Lakes

Gaurav Dhillon and Scott Kupor

Posted September 29, 2016

From the silver age of on-prem software companies like SAP and Siebel Systems to the golden age of enterprise software-as-a-service, we’re now seeing an explosion of data. All types, all sizes, and all over the place. And much of it is a sort of industrial “data exhaust”: companies are bombarded with data from the variety of sources available today, from websites to sensors (and therefore data capture) everywhere, before they know what question to ask of it, and before there is even a signal in the noise.

So how do you solve a problem like this data? Beyond requiring new types of plumbing and integrations, enterprises now expect self-service (given the age of mobile, web, cloud, and heck, let’s add millennials to the mix too). They want to be able to ask, get, fit (curve-fit), predict. And to take back the enterprise from the patchwork of integrations and the number of vendors we all have to deal with, the scope of which most companies are in fact not truly aware of.

It’s about the lifecycle of data in the enterprise, argues SnapLogic founder and CEO Gaurav Dhillon in this episode of the a16z Podcast, in conversation with Scott Kupor. It’s in fact about the evolution of data overall, from data warehouses to “data lakes”, in stages: from purification (like wrangling data) to bottling (prepping for consumption by data scientists) to making sense of streams and streams of data!