Data systems have long involved a tradeoff between flexibility and ease of use.
Cloud data warehouses are well-integrated systems that support a wide range of analytics use cases, such as business intelligence. All an analyst needs is an understanding of SQL in order to perform even the more sophisticated functions. However, storage and compute are tightly coupled, and that limits the user to the tooling provided by the vendor. This is no small impediment given how fast the data industry is evolving.
Cloud data lakes offer more advanced capabilities and the flexibility to choose the best tool for each workload among a wide range of options. These systems are especially crucial for operational use cases (i.e., powering core business processes and customer-facing applications). But data lakes can be complicated to build and operate: enterprises often employ large teams of data engineers to program and maintain them.
This situation leaves data teams with an ogre’s choice. It’s clear that making use of cutting-edge data technologies is a business imperative — just generating conventional dashboards and reports is no longer enough to build competitive advantage in (almost) any industry. At the same time, organizational pressures favor the simplest and easiest possible platform architecture to reduce cost, simplify management, and empower data practitioners of all technical ability levels.
The most common outcome we’ve seen in this scenario, especially for large enterprises, is sprawl. Faced with conflicting business priorities, many data teams heroically attempt to buy, build, and support just about everything. This, in turn, can undermine the original goals of adopting new technologies: the sprawl creates more complexity than it eliminates, and teams struggle to deliver innovative new projects as engineers get pulled in too many directions.
Tabular is a new company that has set out to address this issue. They aim to make the most powerful systems easier to use, and tightly coupled systems more open and flexible. We’re incredibly excited to be leading Tabular’s Series A round.
Tabular is building an independent cloud data platform. It will replace raw data lakes with a service that hides much of the underlying complexity and automates common data management tasks. Tabular provides many of the features that make data warehouses easy to use — atomic transactions, schema evolution, time travel, partitioning, and so on — to any cloud-based data processing system that wants to support it, including data warehouses. In this sense, it implements the “lakehouse” architectural pattern that is growing in popularity. But it adopts a fully open set of standards so that all systems can build on a common foundation and share data in a common format.
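To make those features a little more concrete, here is a toy sketch of how a snapshot-based table can offer atomic commits, schema evolution, and time travel. It is our own illustration, not Tabular's or Apache Iceberg's actual API: the names `ToyTable`, `Snapshot`, `commit`, `add_column`, and `scan` are hypothetical, and everything lives in memory rather than in cloud storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of the table (hypothetical structure)."""
    snapshot_id: int
    schema: tuple  # column names, in order
    rows: tuple    # row dicts visible in this version


class ToyTable:
    """In-memory toy: every change creates a new snapshot; old ones remain readable."""

    def __init__(self, schema):
        self._snapshots = [Snapshot(0, tuple(schema), ())]

    def current(self):
        return self._snapshots[-1]

    def commit(self, base_id, schema, rows):
        # Atomic, optimistic commit: succeed only if nobody else committed
        # since the snapshot this change was based on.
        if base_id != self.current().snapshot_id:
            raise RuntimeError("conflict: table changed since base snapshot")
        snap = Snapshot(base_id + 1, tuple(schema), tuple(rows))
        self._snapshots.append(snap)
        return snap

    def append(self, new_rows):
        base = self.current()
        return self.commit(base.snapshot_id, base.schema, base.rows + tuple(new_rows))

    def add_column(self, name):
        # Schema evolution: a metadata-only change, existing rows are not rewritten.
        base = self.current()
        return self.commit(base.snapshot_id, base.schema + (name,), base.rows)

    def scan(self, snapshot_id=None):
        # Time travel: read any earlier snapshot by id (ids match list positions here).
        snap = self.current() if snapshot_id is None else self._snapshots[snapshot_id]
        # Rows written before a column existed read as None for that column.
        return [{col: row.get(col) for col in snap.schema} for row in snap.rows]


table = ToyTable(["id"])
table.append([{"id": 1}])                   # snapshot 1
table.add_column("city")                    # snapshot 2
table.append([{"id": 2, "city": "Oslo"}])   # snapshot 3
latest = table.scan()
as_of_first_write = table.scan(snapshot_id=1)
```

Because every engine reads and writes through the same snapshot list, any of them can see a consistent version of the table, which is the property that lets multiple processing systems share one storage layer.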
We think of Tabular, informally, as a “headless” data warehouse. Where companies like Databricks and Snowflake pioneered the technical separation of storage and compute, we believe Tabular will complete the process by also separating these resources at the vendor level. This means customers will be able to mix and match data management and data processing systems to best meet their needs.
For data teams, the impact of adopting Tabular will be data infrastructure that’s more reliable, predictable, and easier to manage. It should allow data practitioners to spend more time on what they love — analyzing data and building new applications — rather than wrestling with backend infrastructure. Critically, since all data systems can share one storage layer with Tabular, it will give customers complete control over their data and how to access it. We view this unified data layer as an incredibly powerful control point for delivering additional services — like security, entitlements, data sharing, etc. — down the road.
Although this is a big vision to tackle, there are two key reasons to believe Tabular will be successful.
First, Tabular is built on the Apache Iceberg open source project. Though the project is relatively new, it has already been incorporated into production workloads at many top tech companies, including Netflix, Apple, Airbnb, Adobe, Stripe, and others. It has also been adopted by many of the most popular data processing engines, including Spark, Presto/Trino, Dremio, Flink, and others to be announced shortly. When we spoke with users of and contributors to Apache Iceberg, their passion and enthusiasm for the project were palpable, not only for its elegant technical structure, but for the impact it can have on the industry.
Second, and perhaps more importantly, Tabular is led by one of the strongest engineering teams in the data ecosystem. Ryan, Dan, and Jason spent nearly a decade building the Netflix data lake, one of the industry’s largest and most sophisticated cloud-native data systems. They have also shown remarkable persistence and technical leadership by driving adoption of the Iceberg standard across the industry. Reviews from their colleagues were almost embarrassingly positive, often including things like “I wonder how the company will perform after this outflux of talent.”
Tabular is only at the start of their journey, with a lot to build and a lot to prove. But if their early success with the Iceberg project is any indication, you will be hearing a lot more from this team in the coming months. We are thrilled to be helping them build a fascinating and much-needed data company.
I’d like to thank my partner Matt Bornstein for his work on this post and our investment in Tabular.