Next-Gen Pentesting: AI Empowers the Good Guys

Malika Aubakirova Posted June 12, 2025

In early 2025, a document surfaced online under a pseudonym. It listed more than a hundred previously unknown Microsoft Access and 365 vulnerabilities — complete with technical proofs, stack traces, exploit chains, and analysis. The source? A tool calling itself Unpatched AI. No team. No company. Just the findings.

At first, the security community didn’t know what to make of it. The vulnerabilities were real. The writeups were thorough. Automation was clearly involved. But the level of output — the depth of coverage, the accuracy of the chains — suggested something beyond a typical scanner or script. The public clues point to an autonomous, LLM-steered vulnerability-research pipeline that blends modern fuzzing, symbolic execution, and generative AI for narration. While the team behind it remains anonymous, the potential is clear: these tools can be game-changing for preventing cyberattacks.

And that raised an uncomfortable question: What happens when the most effective pentester isn’t a human?

That question isn’t speculative anymore. Autonomous systems are starting to compete with — and in some cases outperform — human researchers in offensive security challenges. They’re moving up public bug bounty leaderboards, uncovering bugs at scale, and demonstrating strategic exploitation paths without human guidance.

It marks the start of something new in offensive security: where the core mechanics of traditional penetration testing — scoping, discovery, and exploitation — are increasingly executed by machines. The tiger teams of the 1970s broke into buildings. The red teams of the 2000s broke into networks. The next wave? Engineers are building systems that test at scale — without waiting on calendars, scopes, or consultant bandwidth.

The bad guys are catching up too, but that doesn’t mean defenders are outmatched. Far from it — defenders maintain deep visibility, control, and context within their own environments. But the model is shifting. Faster software demands faster feedback loops.

The implications of tools like Unpatched AI are still unfolding, but it’s clear that the assumptions underpinning traditional pentesting are starting to bend. For decades, manual assessments have been the gold standard: targeted, thorough, and effective. But as systems grow more complex and interconnected, those one-off efforts struggle to keep pace. A new generation of software-driven approaches is emerging.

Background on pentesting

To understand what’s changing, it helps to first understand how pentesting works under the hood. At its core, pentesting is a structured simulation of real-world attacks, designed to uncover exploitable security flaws before adversaries do. Engagements typically kick off with scoping and rules of engagement, such as defining in-scope IP ranges, web apps, APIs, and cloud assets. From there, testers move into reconnaissance: passive scanning of public records (like DNS, WHOIS, and SSL certs), followed by active fingerprinting of exposed services using tools like Nmap, Amass, or Masscan. They enumerate open ports, identify versions, and flag potentially vulnerable components.

Next comes vulnerability discovery and exploitation. Testers use scanners like Nessus or Burp Suite to surface common CVEs, but the heavy lifting happens manually — chaining misconfigurations, insecure auth flows, or poorly implemented business logic into viable attack paths. A tester might bypass an S3 bucket ACL to pivot into internal cloud services, or exploit an IDOR (insecure direct object reference) to leak sensitive customer data. In more advanced cases, they escalate privileges across tenants, abuse overly permissive IAM roles, or simulate malware dropper execution via remote code execution (RCE) in outdated dependencies.
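
The IDOR case mentioned above is simple enough to sketch. Everything here is hypothetical: `fetch(session, object_id)` stands in for an authenticated HTTP GET that returns the response body, or `None` when access is denied, and `victim_object_ids` are IDs known to belong to the victim:

```python
# Minimal IDOR probe (illustrative): if a low-privilege session can read a
# victim-owned object verbatim, access control on that object has failed.

def find_idor(fetch, victim_session, attacker_session, victim_object_ids):
    """Return victim-owned object IDs readable by the attacker session."""
    findings = []
    for oid in victim_object_ids:
        victim_view = fetch(victim_session, oid)
        attacker_view = fetch(attacker_session, oid)
        # Identical bodies mean the server never checked ownership.
        if victim_view is not None and attacker_view == victim_view:
            findings.append(oid)
    return findings
```

This is exactly the kind of check that is trivial once you know which IDs to try — the hard, human (or agentic) part is discovering the endpoint and the ID scheme in the first place.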

The final output is a report detailing what was found, how it was exploited, and how to fix it. It includes POC payloads, screenshots of compromised sessions, and reproduction steps for developers. 

Penetration testing is a structured process designed to simulate real-world cyberattacks. It typically follows five key stages: scoping and rules of engagement, reconnaissance, vulnerability discovery, exploitation, and reporting.

Why traditional pentesting is no longer enough

Today’s threats move at machine speed. AI-augmented attackers can chain zero-days, dynamically exploit business logic flaws in real-time, and launch sophisticated campaigns with unprecedented efficiency. The attack surface itself has exploded. Cloud sprawl, agile DevOps pipelines, and the proliferation of IoT devices have created environments that are constantly changing and expanding, far outpacing the capacity of periodic, human-driven penetration tests to provide comprehensive coverage.

Common practice with traditional pentesting is to test a few times a year and hope nothing changes too fast. Yet software never sits still. New APIs ship weekly. Cloud permissions shift hourly. Developers move fast — and attackers even faster. The result? Pentests land as polished snapshots of systems that have already evolved. The 2025 Verizon Data Breach Investigations Report drives this home: over two-thirds of breaches involved vulnerabilities that had gone unpatched for more than 90 days, despite many organizations having recently completed security assessments.

That doesn’t mean pentests are obsolete but it does mean we’re long overdue for something more continuous, more contextual, and more in tune with the pace of software today.

Traditional pentesting is like checking your locks and windows once a year while a swarm of AI-powered burglars is constantly probing your house.

– Max Moroz

A recent generation of offensive security platforms promised to automate penetration testing but failed to deliver lasting value. These tools attempted to cover a broad surface area — offering everything from phishing simulations to infrastructure scanning — but lacked the depth or precision needed to make their results meaningful. Users routinely described them as “doing everything but nothing well,” relying on static detection logic that could be replicated by in-house scripts. Rather than behaving as intelligent or agentic systems, they felt like legacy scanners wrapped in new branding.

Beyond product depth, these platforms struggled to adapt to cloud-native environments. For example, some are still “on-prem Windows–focused,” limiting their relevance for companies operating in Kubernetes, serverless, or SaaS-dominant stacks. The lack of robust support for continuous CI/CD integration and modern application layers like mobile or web frontends also makes these tools feel increasingly outdated. To compound the issue, many teams reported being overwhelmed by alerts and CVEs that lacked exploitability, eroding trust across security and engineering functions. One common refrain: “We saw 50,000 critical vulnerabilities. Zero were real.”

Scaling pentesting with AI

Next-gen pentesting, at its core, is a shift from labor-constrained engagements to scalable, AI-native systems built to match the speed and surface area of modern software development. 

The common denominator across this new class is architectural: they combine large language models with traditional exploit tooling, real-time telemetry, and proprietary data. Some operate as fully autonomous systems, orchestrating fleets of agents that plan attacks, execute them safely, and generate verified findings. Others take a copilot-style approach — assisting human testers with recon, payload generation, and report synthesis. And many fall somewhere in between, mostly with humans in the loop, but offering hybrid workflows that blend autonomy with human oversight. 

But what unites them is sophistication: this is not prompt engineering atop ChatGPT. These are deeply integrated systems with security-specific data layers, context management, custom exploit corpora, and often a proprietary data moat (such as benchmark challenges or production-grade bug bounty exploits). They are unbundling the constraints of the old model — expert labor, fixed engagements, static outputs — and rebuilding it as a software-first, continuous, and AI-augmented system. The tried-and-true tools of the trade are unlikely to change (though some startups are rebuilding those too), but most think they’ll just become even more useful.

The original post includes a market map of some of the next-generation pentesting companies (as of the time of writing).

How AI is rewriting the pentesting playbook

The impact is profound. We see this manifesting across a few primary dimensions:

AI that thinks like a hacker

Legacy tools (e.g., vulnerability scanners) are great at catching static issues such as outdated libraries, exposed services, and weak credentials. But today’s bugs hide in workflows like business logic, role transitions, and edge-case API paths. Agentic systems can now infer and act on intent rather than operating solely on raw inputs. Trained on real-world exploits, codebases, and system behavior, they can identify business logic flaws that were once the domain of human intuition. Think: discount abuse in e-commerce, privilege escalation via feature misuse, or subtle injection paths buried three calls deep.
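
The discount-abuse example is worth making concrete, because no signature-based scanner would ever flag it. Below is a hypothetical, deliberately buggy checkout (the `Cart` class and its coupon logic are invented for illustration) where the flaw is purely behavioral — the code runs fine, it just forgets to deduplicate coupon codes:

```python
# Illustrative business-logic flaw: coupon stacking. Nothing here is a CVE;
# the bug only shows up when you reason about intent ("one discount per code").

class Cart:
    def __init__(self, total: float):
        self.total = total
        self.coupons = []  # BUG: a plain list, so the same code can be added twice

    def apply_coupon(self, code: str, percent_off: float):
        # A correct implementation would reject codes already in self.coupons.
        self.coupons.append(code)
        self.total *= 1 - percent_off / 100

cart = Cart(100.0)
cart.apply_coupon("SAVE20", 20)
cart.apply_coupon("SAVE20", 20)  # the abuse: replaying the same coupon
```

Finding this requires a model of what the application is *supposed* to allow — which is why it historically fell to human testers, and why intent-aware agents are interesting.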

Security that ships with the code

As pentesting becomes more accessible and efficient, the lines between testing, pentesting, and red teaming could blur. Imagine a world where pentesting is integrated into the CI/CD pipeline, automatically assessing the security of every deployment. This continuous security approach could significantly reduce the risk of vulnerabilities making it into production.
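
A minimal sketch of what that CI/CD gate could look like, assuming a hypothetical scanner that emits JSON findings with `severity` and `verified` fields (the schema and field names are invented for illustration):

```python
# CI security gate (sketch): fail the build only on verified, high-impact
# findings, so unexploitable scanner noise never blocks a deploy.
import json
import sys

BLOCKING = {"critical", "high"}

def gate(findings_json: str) -> int:
    """Return a non-zero exit code if any verified, blocking finding exists."""
    findings = json.loads(findings_json)
    blocking = [f for f in findings
                if f.get("verified") and f.get("severity") in BLOCKING]
    for f in blocking:
        print(f"BLOCKED: {f['severity']} - {f.get('title', 'untitled')}")
    return 1 if blocking else 0

if __name__ == "__main__":
    sys.exit(gate(sys.stdin.read()))
```

Wired into a pipeline step (e.g., `scanner --json | python gate.py`), this turns pentest output into a deploy decision rather than a quarterly PDF.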

Everywhere, all at once

Classic pentests operate within tight constraints — one target, one time window, one test team. Next-gen systems are always probing. They scale across all environments, test multiple assets simultaneously, and run exploratory paths (like fuzzing or state-space traversal) that would be cost-prohibitive with humans. The result: broader attack-surface coverage and better preparation for adversaries who never ask for permission.
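
To ground the fuzzing point: even the dumbest form of it — randomly mutating a seed input and watching for crashes — is a loop machines can run continuously at a scale no consultant engagement can match. A minimal sketch (this is toy-grade mutation fuzzing, nowhere near a coverage-guided fuzzer like AFL or libFuzzer):

```python
import random

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Flip, insert, or delete one random byte - the core move of dumb fuzzing."""
    buf = bytearray(data) or bytearray(b"\x00")
    op = rng.choice(("flip", "insert", "delete"))
    i = rng.randrange(len(buf))
    if op == "flip":
        buf[i] ^= 1 << rng.randrange(8)
    elif op == "insert":
        buf.insert(i, rng.randrange(256))
    elif len(buf) > 1:
        del buf[i]
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int = 1000, rng_seed: int = 0):
    """Hammer `target` with mutated inputs; collect any input that raises."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        data = mutate(seed, rng)
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, exc))
    return crashes
```

Production fuzzers add coverage feedback, corpus management, and sanitizers; the economics are the same — the marginal cost of another thousand probes is near zero.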

Verified exploits 

Most security teams are overwhelmed by false positives from scanners and static analyzers. Next-gen tools flip that. By executing exploits in a safe sandbox and validating every finding, they generate alerts that are actionable by design. No triage marathons. No guessing games. Only real vulnerabilities, verified and packaged.
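
The verify-before-report idea reduces to a small filter. In this sketch, `run_poc` is a stand-in for replaying a finding's proof-of-concept inside an isolated sandbox and returning whether it reproduced; candidates that don't reproduce never reach the report:

```python
# Exploit verification (sketch): a candidate finding is reported only if its
# PoC actually reproduces on replay. Non-reproducible candidates are the
# false positives that would otherwise consume triage time.

def verify_findings(candidates, run_poc):
    """Split candidates into reproducible findings and discarded noise."""
    verified, discarded = [], []
    for finding in candidates:
        (verified if run_poc(finding) else discarded).append(finding)
    return verified, discarded
```

The hard engineering lives inside `run_poc` — safe sandboxing, state reset, non-destructive payloads — but the contract is what matters: every reported finding comes with a successful replay attached.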

Limitations and challenges

AI-driven pentesting holds a lot of promise, but it’s not a silver bullet. While the tools are evolving quickly, there are still meaningful gaps in scope, reliability, and operational trust that need to be addressed before they can replace traditional methods wholesale. Among them:

Data limitations constrain depth

These systems excel today at uncovering low-hanging vulnerabilities like XSS, SSRF, and simple misconfigurations, but their track record on complex bugs — like chained authorization bypasses, broken access control, multi-step injections, or environment-specific race conditions — is still limited. For example, could an AI-driven tool have uncovered a misconfigured S3 bucket silently leaking millions of scanned checks, as Jason Haddix did during a mobile banking app test? While the bucket itself was public, finding it required intercepting and decoding mobile app traffic, identifying where uploads were stored, recognizing the significance of the content, and understanding the broader privacy and compliance implications — a level of contextual reasoning and multi-step analysis that today’s systems are only beginning to approximate.

Some vendors are starting to tackle this with domain-specific training data or access to large exploit corpora (e.g., past bug bounty reports and structured CTFs). The tools will get better as their training data and telemetry improve, but, for now, they are strongest when aimed at known vulnerability patterns and reproducible workflows.

Accountability is unresolved

In regulated industries or high-trust environments, auditability and legal clarity matter. Who signs off on the results? Who is liable if something is missed? Today, most compliance frameworks (e.g., SOC 2, PCI, and ISO 27001) expect a “human-led” penetration test by a certified assessor. Autonomous systems, no matter how rigorous, don’t yet fit cleanly into that model. Early adopters are managing this pragmatically by using next-gen tools behind the scenes, while still running one manual pen test a year to satisfy external requirements. Over time, as standards evolve, it’s likely we’ll see formal recognition of AI-driven testing. In the near term, though, hybrid approaches will remain the norm.

Scope remains narrow

Most current systems focus heavily on web applications — often the easiest vector for testing agentic autonomy — but leave large swaths of the attack surface untouched. Cloud configurations, internal network infrastructure, mobile apps, IoT devices, and thick-client environments are either lightly addressed or entirely out of scope. The ambition is full-stack offensive coverage, but we’re not there yet. 

The human factor still matters

Even when a tool surfaces a valid issue, how it’s interpreted — and whether it’s taken seriously — is a different challenge altogether. One harrowing example comes from a pentest conducted by Evan Hosinski, who discovered a vulnerability that allowed brute-force access to patient medical records through a third-party PDF service. The client dismissed the risk as unrealistic. Months later, the exact scenario played out in the wild, resulting in a public breach.

There are many more similar examples, including Target’s 2013 breach and Equifax’s 2017 data breach. The tech was right. The outcome was preventable. But without the right organizational mindset, even the best tools — human or machine — can be ignored. AI can surface risk, but organizations still have to act on it.

Executives and boards can no longer afford to be passive. Upleveling defenses means proactively investing in modern tools and capabilities, not just once a year but as a continuous commitment. The cost of underinvestment is no longer theoretical; it’s reputational, operational, and existential.

What’s ahead

It’s early days. As far as we know, no next-gen pentesting system is fully deployed across a production environment at scale. But we’re close. The pace of development, quality of early pilots, and enthusiasm from security teams suggest we’re at a meaningful inflection point. What began as a fringe experiment is now shaping up to become a core layer of the modern security stack.

Next-gen pentesting tools are evolving into dynamic, continuous systems that go beyond traditional assessments. Some teams are already expanding into adjacent layers like DAST, SAST, runtime monitoring, and threat modeling, creating unified systems that fill in critical coverage gaps. The goal isn’t just to test what’s broken, but to build systems that actively adapt and integrate across the software delivery lifecycle. Tools like Unpatched AI and RamiGPT, which fuse traditional vulnerability scanning with AI capabilities, are an early glimpse of what this can look like: real-time detection, intelligent prioritization, and human-ready output.

We haven’t made an investment in this space yet — but we’d love to. We believe defenders hold an advantage that attackers never will: full visibility into their own systems. The challenge is making sense of that complexity, continuously and at scale. Next-gen pentesting systems bring us closer to that future. They aren’t just software — they’re how the good guys stay ahead.
