Posted August 31, 2021

Data systems have long involved a tradeoff between flexibility and ease of use.

Cloud data warehouses are well-integrated systems that support a wide range of analytics use cases, such as business intelligence. An analyst needs only an understanding of SQL to perform even the more sophisticated functions. However, storage and compute are tightly coupled, which limits the user to the tooling provided by the vendor. This is no small impediment given how fast the data industry is evolving.

Cloud data lakes offer more advanced capabilities and the flexibility to choose the best tool for each workload from a wide range of options. These systems are especially crucial for operational use cases (i.e., powering core business processes and customer-facing applications). But data lakes can be complicated to build and operate: enterprises often employ large teams of data engineers to program and maintain them.

This situation leaves data teams with an ogre’s choice. It’s clear that making use of cutting-edge data technologies is a business imperative — just generating conventional dashboards and reports is no longer enough to build competitive advantage in (almost) any industry. At the same time, organizational pressures favor the simplest and easiest possible platform architecture to reduce cost, simplify management, and empower data practitioners of all technical ability levels.

The most common outcome we’ve seen in this scenario, especially for large enterprises, is sprawl. Faced with conflicting business priorities, many data teams heroically attempt to buy, build, and support just about everything. This, in turn, can undermine the original goals of adopting new technologies: the sprawl creates more complexity than it eliminates, and teams struggle to deliver innovative new projects as engineers get pulled in too many directions.

Tabular is a new company that has set out to address this issue. They aim to make the most powerful systems easier to use, and tightly coupled systems more open and flexible. We’re incredibly excited to be leading Tabular’s Series A round.

Tabular is building an independent cloud data platform. It will replace raw data lakes with a service that hides much of the underlying complexity and automates common data management tasks. Tabular provides many of the features that make data warehouses easy to use (atomic transactions, schema evolution, time travel, partitioning, and so on) to any cloud-based data processing system that wants to support them, including data warehouses. In this sense, it implements the “lakehouse” architectural pattern that is growing in popularity. But it adopts a fully open set of standards so that all systems can build on a common foundation and share data in a common format.

We think of Tabular, informally, as a “headless” data warehouse. Where companies like Databricks and Snowflake pioneered the technical separation of storage and compute, we believe Tabular will complete the process by also separating these resources at the vendor level. This means customers will be able to mix and match data management and data processing systems to best meet their needs.

For data teams, the impact of adopting Tabular will be data infrastructure that’s more reliable, more predictable, and easier to manage. It should allow data practitioners to spend more time on what they love, analyzing data and building new applications, rather than wrestling with backend infrastructure. Critically, since all data systems can share one storage layer with Tabular, it will give customers complete control over their data and how to access it. We view this unified data layer as an incredibly powerful control point for delivering additional services, such as security, entitlements, and data sharing, down the road.

Although this is a big vision to tackle, there are two key reasons to believe Tabular will be successful.

First, Tabular is built on the Apache Iceberg open source project. Though the project is relatively new, it has already been incorporated into production workloads at many top tech companies, including Netflix, Apple, Airbnb, Adobe, Stripe, and others. It has also been adopted by many of the most popular data processing engines, including Spark, Presto/Trino, Dremio, Flink, and others to be announced shortly. When we spoke with users of and contributors to Apache Iceberg, their passion and enthusiasm for the project were palpable, not only for its elegant technical structure but also for the impact it can have on the industry.

Second, and perhaps more importantly, Tabular is led by one of the strongest engineering teams in the data ecosystem. Ryan, Dan, and Jason spent nearly a decade building the Netflix data lake, one of the industry’s largest and most sophisticated cloud-native data systems. They have also shown remarkable persistence and technical leadership by driving adoption of the Iceberg standard across the industry. Reviews from their colleagues were almost embarrassingly positive, often including things like “I wonder how the company will perform after this outflux of talent.”

Tabular is only at the start of their journey, with a lot to build and a lot to prove. But if their early success with the Iceberg project is any indication, you will be hearing a lot more from this team in the coming months. We are thrilled to be helping them build a fascinating and much-needed data company.

 

I’d like to thank my partner Matt Bornstein for his work on this post and our investment in Tabular.

 

***