One of the most active and fastest-growing open source big data cluster computing projects is Apache Spark, which was originally developed at U.C. Berkeley’s AMPLab and is now used by internet giants and other companies around the world, including, as announced most recently, IBM.
In this Q&A with Spark inventor Matei Zaharia (also the CTO and co-founder of Databricks and a professor at MIT), on the heels of the recent Spark Summit, we cover the difference between Hadoop MapReduce and Spark; the ingredients of a successful open source project; and the story of how Spark almost helped a friend win a million dollars.
The a16z Podcast discusses the most important ideas within technology with the people building it. Each episode aims to put listeners ahead of the curve, covering topics like AI, energy, genomics, space, and more.