Delivering on the promise of AI has been an iterative and interdependent balance between processing capacity, math, and data. The rise of cloud computing and open source has largely removed capacity and math as bottlenecks. Compute power is essentially unlimited, and the algorithms behind AI are evolving to the point where they will become commoditized building blocks. This leaves data as the fundamental constraint on unlocking the full potential of AI.
In a sense, data is the equivalent of source code for AI environments. I believe the next step-function increase in software development will come from the growth of data science. The concept of data-centric programming highlights the importance of managing and taming massive amounts of data for use within AI frameworks.
Raw data, while plentiful and in theory useful, typically cannot be used by an ML system without modification and preparation. Before being fed into an ML framework like PyTorch or TensorFlow, data has to be aggregated, transformed, cleaned, augmented, and – in most cases – labeled. This process consumes roughly 80% of the resources in an average ML project, far exceeding other categories like algorithm development, model training, and deployment. Data prep, in other words, is the engine powering modern AI and ML.
A new class of products is emerging to make this process more effective, easier to manage, and less costly. We call it “training data management.” Today, we’re proud to announce Labelbox as our first investment in this new category. If GitHub has become the platform for managing and developing software (code), then Labelbox has the potential to fill a similar role for data in the AI/ML world.
Labelbox is building a training data platform for the development of AI software. The company's mission is to serve as the critical interface between AI systems and the domain experts who make these systems function. To start, they are focusing on data labeling, an especially important part of the training data workflow.
Labeling – also known as annotation – encodes ordinary human intuition into machine-readable formats. It generates the information that machines actually "learn" in the machine learning process. For example, an AI model that identifies weeds in a field often needs to train on thousands of pictures of weeds. It also needs to know which plant in each image is a weed, versus a healthy crop or some unrelated object. That information is provided by a team of data labelers – people trained to recognize weeds and to annotate images – working through the corpus, one data point at a time.
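To make "machine-readable" concrete, here is a minimal Python sketch of what one labeled image from the weed example might look like once a labeler has drawn a box around each plant. The schema, class names, field names, and file path are all hypothetical and chosen for illustration; this is not Labelbox's actual export format.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical label vocabulary for the weed-identification example.
CLASSES = ("weed", "crop", "other")


@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates (origin at top-left), plus its class."""
    x: int
    y: int
    width: int
    height: int
    label: str

    def __post_init__(self):
        # Reject labels outside the agreed vocabulary and degenerate boxes.
        if self.label not in CLASSES:
            raise ValueError(f"unknown label: {self.label!r}")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("box must have positive width and height")


@dataclass
class AnnotatedImage:
    """One image plus the boxes a (hypothetical) labeler drew on it."""
    image_uri: str
    labeler_id: str
    boxes: list[BoundingBox] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested BoundingBox objects to dicts.
        return json.dumps(asdict(self), sort_keys=True)


record = AnnotatedImage(
    image_uri="field_photos/img_0001.jpg",  # illustrative path
    labeler_id="labeler-42",                # illustrative ID
    boxes=[
        BoundingBox(x=120, y=80, width=40, height=55, label="weed"),
        BoundingBox(x=300, y=150, width=90, height=110, label="crop"),
    ],
)
print(record.to_json())
```

Validating each label when it is created, as `__post_init__` does here, is one simple way to catch mistakes such as a misspelled class before the data ever reaches a training framework.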
Labeling is also necessary to identify tumors in medical images, defects on a manufacturing line, pedestrians in dash cam videos, buildings in satellite imagery, and many, many other applications. Some of these use cases can be handled by non-experts, while others require close analysis by highly skilled personnel (e.g., radiologists). The need for data labeling is not exclusive to visual data either – it's equally strong for text and numeric data. Behind nearly every state-of-the-art ML model is a mountain of training data and a small army of data labelers.
Labelbox provides a flexible, cloud-hosted environment that equips data labelers to do their jobs. They have built a truly enterprise-grade product, with a customizable labeling interface, deep API access, and strong security controls. Critically, the Labelbox platform also allows managers to coordinate any number of labeling teams, across both full-time and outsourced staff, all in one platform. This unique feature gives Labelbox customers granular insight into the performance of their teams and frees them from dependence on any one vendor of labeling services. Labelbox essentially acts as a single source of truth for defining, storing, and accessing training data across an entire organization.
In just two years in business, Labelbox has already established itself as the clear leader in this category. They serve a long list of customers across industries, including healthcare, manufacturing, agriculture, transportation, retail, and financial services – an unusually diverse list! Most Labelbox customers find the company on their own and move through the sales process remarkably quickly – both strong signs of market pull and early product-market fit. It's also quite rare for a Labelbox customer to leave the platform, which is a testament to the depth of the product and its central role in ML projects.
Most importantly, the Labelbox team is a tight-knit group of humble leaders, killer product visionaries, and relentless executors – not to mention several skilled airplane pilots. They have lived the problem in their previous roles and are among a small group of entrepreneurs leading the charge for better AI/ML infrastructure. We’re thrilled to partner with Manu, Brian, Dan, and the rest of the team to help build a foundational enterprise AI company.
I’d like to thank my partner Matt Bornstein for his work on this post and our investment in Labelbox.
The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation.
This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/
Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.