Posted January 8, 2026

The last decade of AI progress has been driven by breakthroughs in compute and models. The next decade will be defined by data.

Today’s most capable models are increasingly constrained not by architecture or GPUs, but by access to data. Public datasets are largely exhausted. The internet has long since been scraped. Synthetic data helps, but it cannot fully capture the complexity and edge cases of real environments.

AI models now need high-quality, real-world data to continue advancing. Fortunately, the internet holds only a tiny fraction of the world’s data. The question is how to access the rest of it simply. Most of the world’s data is private, regulated, proprietary, or operational, spread across hospitals, media archives, enterprises, and physical environments. Even when access is possible, stitching together datasets across modalities, cleaning them, licensing them responsibly, and delivering them quickly enough to be useful is an enormous challenge.

That’s why we’re excited to be leading a $30M round in Protege, the platform building the real-world data infrastructure for AI.

Protege connects the world’s leading AI builders with massive, multimodal datasets across healthcare, video, audio, motion capture, and more, unlocking data that has historically been fragmented, inaccessible, or prohibitively slow to use at scale.

Take healthcare, where Protege launched and has since become one of the market leaders. Healthcare produces some of the world’s richest real-world data, yet most of it remains locked inside fragmented systems. Patient health data is often spread across multiple hospitals, doctors’ offices, labs, and imaging centers. Protege securely obtains and stitches this data into longitudinal, multimodal, anonymized patient-level datasets, enabling AI builders to better detect disease, predict successful treatments, and save lives. For AI companies on the demand side, Protege is becoming a one-stop shop, sparing them from having to scour the globe and negotiate hundreds of one-off data acquisition deals. On the supply side, Protege gives hospitals, labs, research networks, and data vendors access to a new, and often much-needed, economic pathway, allowing them to work with AI companies without adding operational or regulatory burden.

Protege has already found impressive product-market fit and is a core data partner to the majority of the MAG7 public companies, as well as many of the largest private players in AI.

We couldn’t think of a better team to build this company than Bobby Samuels (cofounder and CEO) and Travis May (cofounder and Chairman), who spent a decade building Datavant and LiveRamp into two of the biggest exits in the data space, alongside fellow cofounders Engy Ziedan (Chief Scientific Officer) and Richard Ho (CTO), who bring deep data and technical expertise.