The Long-Tail Problem in AI, and How Autonomous Markets Can Solve It

Ali Yahya

In the past decade, our progress in Artificial Intelligence has been staggering. We’ve gone from barely detecting cats in YouTube videos to building cars that can drive themselves. And yet, as our aspirations for AI have become grander, it has slowly become a game that only tech giants can play. The knowledge and resources that are necessary to innovate in the space are now simply out of reach for most individuals and startups. How can we change that? The only answer is to build new technology that levels the playing field.

A common refrain in 2018 was “AI is communist; crypto is libertarian.” It was first said by Peter Thiel as a tongue-in-cheek observation that, while crypto is a technology movement that strives to decentralize power, AI is one that can’t seem to help but centralize it. There is truth to it: crypto is all about building systems that are owned and controlled by diverse communities, whereas AI today relies on the consolidation of data and tends to supercharge a central entity’s ability to make top-down decisions.

In the past decade, this view has become our dominant narrative about AI. Nary a week goes by without headlines that bemoan the excesses of the concentrated power now wielded by tech giants and how it tragically undermines democracy, corrupts youths, causes reflux, etc. Much of society now seems to assume that the only way we can ever hope to wrest that power away from the likes of Google and Facebook is, ironically, through the exercise of yet another kind of centralized power—top-down and inevitably heavy-handed government regulation. The truth is, however, that it’s a monumental failure of imagination for us to believe that we are at the end of history — that not a single path forward exists for a startup to build technology that shifts the balance of power.

It is almost natural to think that there is something inherent to AI that lends advantages to large players — that AI is inescapably “communist.” The story that people like to tell focuses on how tech giants hoard all of our personal data and benefit from unassailable “data network effects.” But the real reason that tech giants are good at AI is actually much simpler. They are good at it because they invest in it. They have every reason to do so. For tech giants, AI is a technology that enhances existing lines of business — in the words of Clay Christensen, it’s a sustaining innovation, not a disruptive one.

And true, there is no denying that having a lot of data does in fact lend giants an advantage. Large datasets of training examples are still a crucial input to even our most data-efficient Machine Learning algorithms, and that’s unlikely to change anytime soon. But nonetheless, at the very foundation of tech giants’ towering strength in AI, there exists an almost invisible hairline crack.

Building AI systems today is increasingly a problem of coordinating large-scale and widespread data-collection efforts. Tech giants have always solved this problem through good old-fashioned command and control. But, central planning can only go so far. There is now an opportunity for startups to coordinate human energy at an unprecedented scale by building technology that better harnesses market forces. To do that for AI, we need to build what I’ll call Decentralized Autonomous Hiveminds.

Building Robot Hiveminds

In a previous life, I worked on an effort at Google X to build a network of robots that shared a collective intelligence. Our work was part of a broader strategy to gain a data advantage by harnessing one of Google’s core strengths — its ability to build large-scale systems. We set out to enable a collection of robots, each learning independently from its own environment, to contribute their real-time experience to a central neural network whose job it was to inform the actions of all the robots.

In other words, we wanted to build a hivemind.

To test it, we set up a door-opening challenge. Could our hivemind learn to correctly set the motor voltages for the seven joints of each robot such that each of them would successfully open the unique door that stood in front of it?

You might think, as I did, that getting a robot to open a door would have been easy. And, you’d be right — it is easy if you’re doing it for a specific door under specific conditions. But, to this day, it’s a hard problem to solve for all doors under all possible circumstances. This is because there is a very long tail of variations (edge cases) in the most mundane things — things like lighting and shadows; the color, texture, and reflectivity of the handle; the weight of the door; the type of handle; the forces required to operate it; and so on.

If there is one lesson that I took home from that experience, it’s that almost all AI problems worth solving are made difficult by the challenge of the long tail — a challenge that can only be tackled with lots of data. I wrote all about that here.

The problem of the long tail is the hairline crack at the foundation of today’s AI power structure. It creates an opportunity for us to build new technology that changes the game.

To understand how this can possibly be true, we have to first grasp some of the structural limitations of AI today. Let’s start with this question: Why is it that, unlike human brains, our best algorithms for training deep neural networks are still so data-inefficient?

The Limitations of Neural Networks

The truth is that today’s neural networks are fantastic interpolators but terrible extrapolators. They are powerful pattern matchers with the ability to contort themselves to fit almost any dataset, but their fitting is blind to the mechanisms that generate the data in the first place.

Neural networks do not yet engage, as humans do, in a creative search for explanatory theories that account for why the data is as it is. They also certainly don’t then go forth, as humans ought to, and steadily strive to falsify every last one of those creative theories until a single one emerges triumphant as the best explanation for the data observed.

A human scientist (like, say, James Maxwell) makes predictions about things (like electromagnetism) by building, through an iterated process of conjecture and falsification (i.e., science), a deductive framework (Maxwell’s equations) that generalizes to future situations.

Deep neural networks, on the other hand, take a different approach to modeling reality. They stitch together thousands of linear functions, laboriously shifting each one slightly with each training example, into a kind of high-dimensional quilt — a manifold — that fits the training set. In doing so, they cannot help but make predictions by inductively pattern-matching onto what they’ve seen happen before. They mirror (rather than explain, as humans do) the chaos and complexity of the phenomena they observe.

Neural networks arrive at their predictions by induction, not deduction.

This helps explain why today’s deep neural networks require so much data to learn anything useful: They are inductive interpolators and, as such, they require a large number of points between which they can do their interpolating.
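
To make this failure mode concrete, here is a minimal sketch (assuming only numpy and scikit-learn, with every detail invented for illustration). A small ReLU network fits a sine wave nicely between its training points, but its piecewise-linear quilt has nothing sensible to say about the region beyond them:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(500, 1))  # the "seen" region of the input space
y_train = np.sin(x_train).ravel()

# A small piecewise-linear "quilt": two hidden layers of ReLU units.
net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                   max_iter=5000, random_state=0)
net.fit(x_train, y_train)

# Interpolation: a query between training points lands close to the truth.
print(net.predict([[1.5]]), np.sin(1.5))

# Extrapolation: far outside the training range, each piece of the quilt just
# extends linearly, so the prediction typically bears no resemblance to sin(9).
print(net.predict([[9.0]]), np.sin(9.0))
```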

But this doesn’t yet explain why the long tail, in particular, is often so problematic. Why is it that our progress toward solving hard AI problems slows down as soon as we enter the realm of edge cases? The answer has to do with the reach of explanations.

One of the most remarkable things about human science, to quote David Deutsch, is “the contrast between the enormous reach and power of our best theories and the precarious, local means by which we create them.” He says, “No human has ever been at the surface of a star, let alone visited the core where the transmutation happens and the energy is produced. Yet we see those cold dots in our sky and know that we are looking at the white-hot surfaces of distant nuclear furnaces.” Human thinking somehow has the power to tell apart what is fundamental from what is merely incidental and thus can generalize to the long tail without us having to experience it directly.

To be fair, the inductive model at the heart of a deep neural network also has some power of generalization, but there’s a catch. The key inductivist assumption is that the future will resemble the past and the unseen will resemble the seen. And true, sometimes it does. More often, however, the reality is otherwise. The world is radically nonlinear. One does not simply model it with a quilt of stitched-together linear functions, for, as it turns out, the future and the unseen are often unrecognizably different from the past and the seen. As a case in point, think of the inner workings at the center of a distant star whose light began its long journey towards Earth thousands of years ago. There is nothing that even remotely resembles the environment of that star here on Earth.

Without an explanation for why a pattern that so reliably holds in the common case should continue to hold in cases that are less common (the edge cases), a deep neural network’s strides become just as blind as they are confident when they venture into the long tail. The darkness of that realm, for a neural network, can be illuminated only by direct experience with it — real training examples drawn from the long tail itself that can help mold the network’s linear predilections into better fitting nonlinear ones. But, of course, those training examples are, by definition, outliers and the hardest to come across. Thus, after all of the low-hanging fruit is picked and the long tail is all that is left, the marginal cost of collecting new useful data points begins to increase.

Think of what it takes to train a large neural network today. As our applications for AI become more ambitious and our networks grow deeper and wider, the whole enterprise becomes primarily about tracking down — in the most unlikely of places — data that offers differentiated signal. This is why it’s a problem of complex coordination. Combing through enough of the long tail inescapably calls for the marshaling of an army that scours the world for the right bits of rarefied, useful data.

Our neural networks today are architected and trained via the top-down, hierarchical efforts of a group of people who invariably work for the same company. At X, for instance, my team and I were on the hook for everything from scoping the ambitions of the project and specifying the architecture of our networks to tuning their parameters, building our robots from the ground up, and babysitting them as they collected petabytes of data.

And let’s not forget, our hivemind was merely a proof of concept. It took four robots training simultaneously on different doors for hours on end — learning from trial and error — to achieve a 95% success rate on all variations of just those four doors.

Imagine what it would have taken to get it to truly work across all doors under all conditions, let alone venture beyond the realm of just opening doors. For all of its scale, even Google has struggled to mobilize enough resources to cover the long tail for things like self-driving cars.

The robot hivemind we built at X was cool — or so thought the eight-year-old boy inside me. The great irony, however, is that a hivemind is supposed to emerge bottom up as a unified intelligence that integrates the countless impulses of each of its far-flung agents. The “hivemind” that I worked on wasn’t like that at all. Every aspect of it — every line of code written, every resource deployed — was controlled by us. It was a centralized system pretending to be a decentralized one — a somewhat “communist” simulacrum of a hivemind.

But what if such intelligence could actually emerge bottom up? What if it could be born, not from the efforts of just one company, but from the aggregate knowledge of countless people working independently from far afield to contribute diverse signal to the collective?

Enter crypto.

A Decentralized World Computer

At its heart, crypto is a tech movement that offers us new tools for the bottom-up governance of large systems. The very term “decentralization” in the space refers to a class of power structures whose balance is tilted in favor of communities instead of central points of control.

The tools for decentralization that crypto has brought to bear have led us to build a kind of logical “world computer.” It is made up of a distributed network of physical machines that are owned and operated by members of a broad community. Ethereum is the first example of such a computer. It extended the ideas behind Bitcoin by adding support for a general-purpose programming language. Since then, it has unquestionably become one of the most important projects in the space.

But why is this kind of computer even useful?

What is truly novel about this mode of computation is that its security and trust model emerges bottom up from its users — users who can be anyone, anywhere, and can participate without anyone’s permission. Control over this collective computer is literally decentralized. There are no gatekeepers.

The challenge is, of course, that many of those users are no doubt dishonest and would gladly game the system for profit if they could. The genius at the heart of Ethereum and Bitcoin (and crypto more generally) is a set of incentive structures that makes these systems self-policing. Through well-calibrated economic rewards, cryptonetworks like Ethereum encourage their users to keep one another in check. And so, even though they may not trust one another, they can come to trust in the decentralized computer that they are collectively helping secure.

The defining property of such a computer is therefore that everyone can come to trust (and verify) that the programs that are deployed to it will indeed run and will run correctly as written no matter what. This is why these programs have come to be known as “smart contracts.”  A better term for them, however, might be “sovereign programs” because, once deployed, they are independent — they deterministically execute themselves, instruction by instruction, subject to nobody’s authority. They run with immunity from intervention by (1) the people who originally wrote them; (2) the people who interact with them while they are running; and (3) even the people who control the physical machines that execute them.

It is this inviolable sovereignty that allows a smart contract to credibly act as a trusted intermediary between strangers. In fact, it is exactly this property that empowers a single program — e.g., the Bitcoin codebase — to come alive as a fully digital, global store of value. But that’s just the beginning. Smart contracts are a fundamentally new computational building block that is very general. They unlock a realm of possible applications that we have barely begun to explore. A few of them might even have something to do with AI!

It is, in fact, a coincidence that’s worthy of good sci-fi that, as a considerable subset of humanity finds itself doggedly working toward strong AI, another subset (conveniently disjoint from the first) is set on building an unstoppable world computer. Can’t make this stuff up!

Crypto and Two-Sided Markets

Fanciful joking aside, let’s imagine for a second that we could take a smart contract and embed within it a deep neural network. No need to make it Skynet. Let’s instead think through a network that is dead simple — say, a classifier that accepts as input an image and produces as output a prediction of what it thinks the image contains (e.g., “car,” “dog,” “honey badger,” “the number three,” etc.).

At the beginning, our model will be untrained, so it won’t be very useful. Training it would normally require us to collect (or somehow already have on hand) a broad corpus of labeled data that is representative of the kinds of images that we’d like our model to do well on. And, let’s not forget, if there’s any hope that our model will be able to perform well across the board, this dataset of training examples must include images that come straight from the long tail. The many exotic variations of each object must somehow be represented.

We could, of course, do what Google usually does and scour the Earth, by brute force, for all of the right data — labeled images of the many kinds of cars, dogs, and honey badgers in all sorts of funky poses and lighting conditions. But, let’s be real, that playbook is old news. A new playbook is to harness market forces. Instead of exhaustively turning over every last stone in search of bits, we could enlist a broader community to collect the data for us.

The trick is to create a two-sided market by building the right incentive structures into our smart contract. On the supply side, people who have access to the prized bits of data can get paid if they contribute them (in labeled form) to our nascent neural net. And, on the demand side, developers who would like to leverage the intelligence of our neural net can pay a fee to make use of it as an API and get its prediction for what some unknown image contains (again “car,” “honey badger,” etc.).
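
To make the mechanism concrete, here is a toy sketch of that incentive logic in plain Python rather than in an actual smart-contract language. Every name and number in it is an assumption invented for illustration; a real version would live on-chain and wrap a real model:

```python
from dataclasses import dataclass, field

@dataclass
class NeuralNetMarket:
    query_fee: float = 0.10    # demand side: price a developer pays per prediction
    data_reward: float = 0.05  # supply side: payout per accepted labeled example
    treasury: float = 0.0      # fees accumulated by the contract
    balances: dict = field(default_factory=dict)

    def contribute(self, contributor: str, image, label) -> None:
        """Supply side: accept a labeled example and pay its contributor.
        (Quality control is elided here; more on that under technical hurdles.)"""
        self._train_on(image, label)
        self.treasury -= self.data_reward
        self.balances[contributor] = self.balances.get(contributor, 0.0) + self.data_reward

    def predict(self, image) -> str:
        """Demand side: charge the caller a fee and return the model's prediction."""
        self.treasury += self.query_fee
        return self._infer(image)

    # Placeholders standing in for a real gradient step and a real forward pass.
    def _train_on(self, image, label) -> None: ...
    def _infer(self, image) -> str: return "honey badger"
```

A contributor calls contribute() and earns data_reward; a developer calls predict() and pays query_fee. Everything that follows is about making both of those calls worth making.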

Once this kind of economic venue is bootstrapped and reaches critical mass, it sets off a self-reinforcing feedback loop of improvement: More capital deployed by our neural net leads to the crowdsourcing of more esoteric data, which leads to improved model accuracy, which leads to increasing demand from developers, which leads to even more capital deployed by our neural net. And around the flywheel we go.

Let’s dive into how exactly we could get this to work.

Solving the Cold-Start Problem

The first hard problem is getting our marketplace off the ground. Why should a willing supply side show up with data, if the demand side doesn’t already exist? And, contrariwise, why would the demand side ever pay good money for the mad ravings of an untrained neural network? The cold-start problem is the key challenge that every startup that has ever bootstrapped a multi-sided market, like Airbnb or Uber, has had to overcome. A minimum level of activation energy is required to spark the virtuous engine into motion.

This initial energy must come from somewhere. Traditionally, it has always come from financial capital — venture funding that’s provided by a set of people that tends to be completely disjoint from the set of participants in the network itself. Companies like Uber and Airbnb have historically raised enormous amounts of money to subsidize one or both sides of their marketplaces with cash in order to get the flywheel started. This is capital-intensive and misses a huge opportunity.

Imagine for a second a world in which Uber had raised less money than it did and, instead of subsidizing the supply side of its new marketplace with only cash, it had also rewarded early drivers with a small amount of ownership in Uber (the company). In that world, the very first Uber drivers would no doubt appreciate the fact that their hard-earned ownership stake in Uber (albeit small, at first) could someday be worth a lot of money. Could that fact alone help persuade some of them to participate? How much more loyal would those drivers be to Uber over some of its competitors? Would they be more likely to evangelize Uber to other people? And, as Uber grew and each driver’s ownership began to be worth real money, would it inspire the next cohort of drivers to jump onboard?

Yes.

This is a fundamentally more efficient capital structure for a multi-sided market. There is no reason that the capital that is needed to get it off the ground has to be entirely financial in nature and come from external sources. Some of it can come from the participants themselves, for they have human and production capital to contribute. In the case of Uber, each driver puts forth their driving ability and their car to help the network get off the ground. Why shouldn’t they get to own a piece of it?

So, how could crypto help here?

Autonomous Marketplaces

Remember that the smart contract that houses our neural net is fully programmable. So, why not build into it the ability to mint freely tradable tokens that grant whoever holds them a claim to all revenue that flows through it? As in our thought experiment involving Uber, our contract can begin subsidizing the supply side of its nascent marketplace, not just with cash, but also with newly minted tokens of ownership.

With this new bit of logic, our neural network begins to take on a life of its own as a self-funding, autonomous entity. And, as the flywheel is set into motion, the service it offers will begin to become useful to the demand side too — i.e., developers willing to pay real money for its predictions. Of course, the fees that our contract charges them would have to be small at the start, but as our model’s accuracy improves, they can begin to increase. Higher fees can then be used to further incentivize data contributions which, in turn, improve the model, and so again, around the flywheel we go.
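
Continuing the toy sketch from above (and again, the details are illustrative assumptions, not a real token design), the minting logic might look something like this:

```python
class TokenizedNeuralNetMarket(NeuralNetMarket):
    """The earlier toy market, extended with freely tradable ownership tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.token_supply: float = 0.0
        self.tokens: dict = {}

    def contribute(self, contributor: str, image, label) -> None:
        # Pay the usual cash reward, plus newly minted tokens of ownership.
        super().contribute(contributor, image, label)
        minted = 1.0  # flat early subsidy; a real design would taper this off
        self.tokens[contributor] = self.tokens.get(contributor, 0.0) + minted
        self.token_supply += minted

    def revenue_share(self, holder: str) -> float:
        """A holder's pro-rata claim on the fees the contract has collected."""
        if self.token_supply == 0:
            return 0.0
        return max(self.treasury, 0.0) * self.tokens.get(holder, 0.0) / self.token_supply
```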

The key idea is that a market that is owned by its participants acts on its region of influence the way a magnetic coil acts on iron filings. It is a force of alignment between people — a force that cranks the market engine into motion at its crucial starting moment.

Let’s call these things Autonomous Marketplaces.

What are the Technical Hurdles?

There is no question that getting our decentralized neural net to work will also require us to solve a whole host of hard technical problems. I’ll briefly mention a few.

We must, for example, build into our smart contract a mechanism that can reliably measure signal in the data that is contributed to it. It must have logic that is capable of distinguishing garbage data from the real thing.

A possible solution is to enable our smart contract to access a secret benchmark dataset that it can use to measure its own performance. Data contributions that improve the neural net’s accuracy on the benchmark can be rewarded, whereas those that don’t might incur a fee.
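
Sketched in the same toy style (an assumed design, not a known protocol; clone_and_train_on and adopt are hypothetical model helpers), the scoring rule might work like this:

```python
def accuracy(model, examples, labels) -> float:
    """Fraction of benchmark examples the model labels correctly."""
    correct = sum(model.predict(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)

def score_contribution(model, contribution, benchmark,
                       reward_per_point=1.0, garbage_fee=0.01) -> float:
    """Reward a contribution by how much it improves benchmark accuracy."""
    examples, labels = benchmark
    baseline = accuracy(model, examples, labels)
    candidate = model.clone_and_train_on(contribution)   # hypothetical helper
    improved = accuracy(candidate, examples, labels)
    if improved > baseline:
        model.adopt(candidate)                           # hypothetical: keep the better model
        return reward_per_point * (improved - baseline)  # pay in proportion to the gain
    return -garbage_fee                                  # charge a fee for unhelpful data
```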

A related problem is that, because the state of all smart contracts today is visible on the blockchain to anyone who wishes to look, it would be trivial for an attacker to steal the underlying neural net (once trained) and bypass the need to pay for its predictions.

The solution to this whole family of problems depends on the existence of smart contracts whose execution is still verifiable by anyone, but whose internal state is encrypted. Such programs are called “private smart contracts,” and they are an active area of research in the space. One approach to building them is based on a kind of cryptographic proof (known as a zk-SNARK) that can mathematically convince any observer that a computation was performed correctly without revealing any of the underlying state. And, there are many other approaches in the works.

The challenges are formidable, but solutions are well underway.

Extending the Internet

When all is said and done, our neural network will be an autonomous entity in the world whose intelligence will have emerged from the grassroots efforts of countless individuals who are laboring independently to discover and contribute signal from wherever they may be.

We will have built a Decentralized Autonomous Hivemind — a hivemind that is true to form and, unlike the one I worked on at Google, would make the eight-year-old boy inside of me proud. And, from there, it’s but a small leap to imagine a future that includes a vast and open ecosystem of countless such hiveminds, each one brought to life by its own community and serving as a modular building block that anyone, large or small, can use to build products.

The aggregate of all those building blocks can thus begin to emerge as a new protocol layer that extends the capabilities of the internet itself. It will be a fabric of intelligence that, like the internet, will be open source and controlled collectively by its own community.

In fact, it is precisely because control over such a network will be decentralized that it has the potential to scale to millions of developers and contributors. No platform ever gets to that scale without making an ironclad commitment to uphold its own promises over time. And there is no better way for a platform to make that guarantee than by engraving its own rules into a sovereign program that is owned and governed by the very people who build on top of it.

This is, after all, Why Decentralization Matters. The core idea behind Chris Dixon’s argument is that decentralization — and therefore crypto — matters because it addresses the problem of platform risk. History is full of countless forgotten companies that were tempted to build entirely on top of centralized platforms. For many of those companies, everything tended to look great until, like quicksand, the platform’s rules began to shift. Rules that were once generous to developers conveniently morphed into ones more favorable to the platform. Along the way, entire companies have been dragged down by the undertow, often beneath the suffocating sand.

It is because the core protocols of the internet (i.e., TCP/IP) are decentralized that it is possible for trillion-dollar companies like Google to be built on top of them. By that same token, it is inconceivable that a company the size of Google could be built on top of Google, in the same way that Google was built on top of TCP/IP.

What might it be possible to build on top of a decentralized network of market-powered neural nets?

The Power of Markets

Let’s go back to the question that we started with. Why should an approach that is driven by market forces fare any better at covering the long tail of AI than the centrally planned, somewhat “communist” efforts of a trillion-dollar tech giant? How can this new approach possibly win?

Remember that the challenge that forces even striding giants to slow down as they stumble headlong into the vast stretches of the long tail has less to do with the blunt resources that can be brought to bear than with the need to navigate problems of complex coordination. Such problems are made difficult for a central planner by the fact that the information needed to solve them is strewn about widely and locked away in countless remote places and unknown minds.

The people who are best equipped to harness the various pockets of localized knowledge that these problems demand are those who are closest to those pockets, not the monolithic entity at the center that’s trying to do all of the planning and commanding top-down. So, rather than marshaling an army that searches the world over blindly, it is much more effective to instead create incentive structures that encourage individuals to self-organize.

This is true because there is simply no way that a single mind (or a small group of minds) at the center can match the endless knowledge and creativity of a diverse ecosystem of people who are all aligned with one another by the presence of a well-calibrated market. Adam Smith’s “invisible hand of the market” is, after all, but a metaphor for the power of bottom-up self-organization.

Today, the knowledge and resources that are necessary to innovate in AI are held in the pockets of giants — well beyond the reach of most individuals and startups. But, through the creation of efficient and global markets around them, they can be made more accessible. We might discover that market-based systems that are built bottom-up offer society a new approach to building intelligent computer systems, one that is more disruptive to entrenched power structures. The difference will be in who ends up controlling them.

Each new mini-hivemind may start out modestly. But, the right set of incentives, credibly engraved into a kind of covenant whose governance is decentralized (a smart contract), has the potential to coordinate human energy at a scale that dwarfs even that of our largest giants. And so, in the end, it may turn out that AI is not so inherently communist after all.
