The discovery of small molecule drugs follows a well-honed playbook. You first discover the drug target, then discover the lead drugs that can hit that target, optimize those leads, and then enter pre-clinical testing. As my partner Jorge Conde outlined in a recent presentation, drug companies are hoping that 1 in 10,000 drugs makes it through the 10-15 year process required to deliver an FDA-approved drug… not great odds. New techniques in machine learning may hold the key to radically altering this playbook and increasing the odds of bringing important drugs to market. How?

The traditional drug design approach can make sense given the way pharmaceutical companies are organized, and is quite possibly optimal given where drug discovery normally starts and the data available at that time. But there are limits to this approach. First, it’s very sequential in nature: a candidate must pass one test to start the next, and the process is a very lossy funnel. Second, the sequential nature means that you can generally only use data that is very specific to the given stage, slowing down the drug design process and adding cost. With new advances in ML and its ability to integrate disparate data sets, however, it’s now possible to use the full data available from the outset, which could lead to an altogether different path to designing a drug. Naturally, it’s better to know very early if you need to take a different path, before sinking too much investment into something that will inevitably fail. (Hence the “fail fast” mentality of many Silicon Valley tech companies.)

It’s remarkable to see the wealth of data now available in understanding how small molecule drugs can affect biology. For example, there is data on existing drugs and small molecules (Drugbank, Therapeutic Target Database, ChEMBL, BindingDB, Broad Center, Stitch, NCATS, ADRECS, sider); protein/target data (UniprotKB, homologene, human protein atlas, UCSC genome browser, targetScan); systems biology (DRPIAS, STRING, CPDB, REACTOME); and hierarchies (MedDRA, UMLS, OHDSI). On top of this is disease-specific data, including drug indications (FDA’s orange book, NDFRT); *omics (gene expression and miRNA data, GEO, ArrayExpress, TCGA); genetics (GRASP, GWAS catalog, disGeNet, TCGA); and clinical databases (FAERS, CVAROD).

Clearly, the problem isn’t simply a lack of data. Instead, the challenge is how to incorporate all of these disparate data types into a method that can yield more accurate and rapid predictions of compounds to bring to pre-clinical studies.

When I met the TwoXAR team in late 2015, they had developed a proprietary machine learning technology aimed at allowing you to incorporate vastly disparate datasets into a common framework in order to predict drug outcomes. Their approach did not follow the traditional drug design playbook (screening, lead optimization, etc.) — instead, it directly led to candidates through a data-driven machine learning approach. In addition to being faster and more accurate than previous computational methods, TwoXAR’s approach allows you to handle more complex situations where either the target is unknown or where the inherent nature of the problem is polypharmacological — that is, where multiple targets need to be hit and it’s not clear how to do that from a rational, structure/target-based approach.

But the open question was, would it work broadly? We led their seed round to help them prove it out. Now, two years later, here are the results of three studies that they’ve made public:

–Hepatocellular Carcinoma (HCC). Within three months of launching their HCC program with the Stanford Asian Liver Center, TwoXAR had a lead candidate (TXR-311) that, in a panel of genetically diverse HCC cell lines, was shown to be potently cytotoxic with an average half maximal inhibitory concentration (IC50) in the nanomolar range. Activity in HCC cell lines was shown to be approximately 500-fold more potent than in a panel of primary hepatocytes isolated from three different donors with no liver diseases, indicating high selectivity for cancer cells versus healthy cells. They tested TXR-311 and Sorafenib (the standard of care for humans) in in vivo patient-derived xenograft studies, and both significantly reduced the tumor size compared to vehicle. Moreover, there is evidence that TXR-311 is acting through a new mechanism of action in liver cancer.

–Type Two Diabetes (T2D). Using a clinically translatable model of T2D (db/db mice), the data showed that TXR-411 significantly reduced blood glucose levels and improved oral glucose tolerance as effectively as rosiglitazone. In TwoXAR’s initial examination of TXR-411, glucose levels fell by approximately 50 percent, on par with rosiglitazone, without the weight gain associated with rosiglitazone and other PPAR agonists. The reduced glucose and body weight maintenance is notable in its own right; perhaps even more compelling, however, is that TwoXAR selected this disease, identified and shortlisted candidates, and generated efficacious preclinical results in less than three months.

–Rheumatoid Arthritis (RA). Ten high-scoring novel candidates were screened in the collagen-induced arthritis model of RA in rats. Treatment with three candidates (exenatide, olopatadine, and TXR-112) significantly reduced ankle size, alleviated limb inflammation, improved joint histopathology, and improved mobility as observed using a novel digital motion endpoint. These three drug candidates do not act on common RA therapeutic targets; however, links between the candidates’ known pharmacology and pathological processes in RA suggest hypothetical mechanisms that could contribute to the observed efficacy.

With their computational methods vetted through these and other (still confidential) projects, it’s a natural time for TwoXAR to scale and expand their portfolio. I’m excited to continue helping them push their technology forward by supporting their Series A financing and joining their board of directors. TwoXAR is able to use machine learning to accelerate drug discovery… and in time, make significant progress towards the vision of true engineering of small molecule drugs.