From the Perspective of an OnDefend Penetration Tester
AI on Both Sides of the Red Team Fight
Threat actors are already using AI to move faster, scale attacks, and find weaknesses more efficiently than ever before. Defenders are racing to catch up. So where does a red team actually start? We asked ours.
OnDefend’s red team is one of the most advanced offensive security teams in the United States, with decades of combined experience across offensive and defensive operations. They develop their own techniques, adapt to evade defenses, and engineer attack paths the way a sophisticated threat actor would, pursuing hypotheses, chaining findings, and pushing deeper than any automated platform is designed to go.
For more than a year, they have been integrating AI into those workflows so OnDefend can outpace adversaries, not just keep pace with them.
The Bottleneck Was Never Expertise
Matt Zamat, OnDefend Application Security Manager
Many penetration testing teams still handle the slowest parts of the job manually. Decompiling stripped binaries in Ghidra and naming functions one at a time. Reviewing hundreds of thousands of lines of decompiled mobile application code to find a single hardcoded secret. Manually swapping session tokens to test for cross-user authorization flaws on every endpoint. Writing up findings, one careful paragraph at a time.
All of that work is important. But it also dominates the engagement, and it is exactly the kind of work AI can compress when integrated into the workflow effectively.
At OnDefend, our offensive security team has spent the last year integrating AI across the parts of pen testing that have traditionally created bottlenecks. The result is faster engagements, deeper coverage, and findings our clients fully trust.
The slow parts of an offensive engagement were never about thinking. They were about reading.
Reading binary disassembly. Reading mobile source decompiled out of a class dump. Reading HTTP traffic looking for the one endpoint that does not validate authorization the way the others do.
I’ve done hundreds of pen tests, and as an experienced pen tester it’s my job to know what to look for. But finding it still meant hours of grinding through material, and that grinding historically consumed 60 to 70 percent of every engagement.
That is where AI delivered the first meaningful gains, and it is still where we see the greatest operational impact today.
How OnDefend Uses AI Across the Offensive Workflow
Reverse Engineering at Scale
Pairing Ghidra with large language models through MCP allows our testers to triage decompiled functions in minutes instead of hours. Stripped iOS, macOS, and Android binaries that once took an entire day to orient against are now mappable in a single sitting.
Authentication routines, cryptographic operations, jailbreak detection, and anti-tamper logic surface quickly, which means our team can focus on higher-value findings sooner.
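To make the triage idea concrete, here is a minimal sketch of a keyword pre-filter that buckets decompiled functions by likely purpose before a tester or model looks closer. It does not call Ghidra, MCP, or any model, and it is not OnDefend's tooling: the `INDICATORS` table, the `triage` function, and the sample function bodies are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical indicator strings, chosen for illustration only.
# A real engagement would tune these per platform and per target.
INDICATORS = {
    "crypto": ("CCCrypt", "CC_MD5", "EVP_EncryptInit"),
    "authentication": ("SecItemCopyMatching", "evaluatePolicy", "password"),
    "jailbreak_detection": ("/Applications/Cydia.app", "cydia://"),
    "anti_tamper": ("ptrace", "sysctl", "dlsym"),
}


def triage(functions):
    """Map each category to the functions whose decompiled text mentions an indicator.

    `functions` is a dict of {function_name: decompiled_source_text}.
    """
    buckets = defaultdict(list)
    for name, body in functions.items():
        for category, needles in INDICATORS.items():
            if any(needle in body for needle in needles):
                buckets[category].append(name)
    return dict(buckets)


# Made-up decompiler output for three stripped functions.
sample = {
    "FUN_100004a10": 'if (access("/Applications/Cydia.app", 0) == 0) { return 1; }',
    "FUN_100004b7c": "CCCrypt(0, 0, 1, key, 32, iv, in, len, out, cap, &moved);",
    "FUN_100004c20": "return a + b;",
}

print(triage(sample))
```

A string match like this only narrows where to look. Naming the function and deciding whether it matters is still the job of the model-assisted pass and the tester.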
Static Analysis Across Massive Codebases
Mobile applications routinely decompile into hundreds of thousands of lines of code. No human can realistically review all of that manually, and traditional static analysis tools often miss important context.
With AI-assisted workflows, we can cover the entire codebase. Insecure cryptography, hardcoded API keys, unsafe deep link handlers, IPC vulnerabilities, and risky debug paths can be surfaced in minutes. Every issue identified by the model is then manually validated before it appears in a client report.
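As a rough illustration of one item on that list, hardcoded secrets, the sketch below scans decompiled source text with two regular expressions and returns candidate locations. It is a plain pattern match, not the AI-assisted workflow described above, and the `PATTERNS` table, the `scan` function, and the sample source are assumptions for illustration. The key value is the placeholder AWS uses in its own documentation.

```python
import re

# Illustrative patterns only; real scanners carry many more rules.
PATTERNS = {
    # AWS access key IDs begin with "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted value of 16+ characters assigned to a suspicious-looking name.
    "generic_secret_assignment": re.compile(
        r"\b(api[_-]?key|secret|token)\b\s*[:=]\s*[\"']([^\"']{16,})[\"']",
        re.IGNORECASE,
    ),
}


def scan(path, text):
    """Return (path, line_number, rule_name) for every line that matches a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((path, lineno, label))
    return hits


# Made-up decompiled Java with placeholder values.
sample = "\n".join([
    'static final String ID = "AKIAIOSFODNN7EXAMPLE";',
    "int retries = 3;",
    'String apiKey = "0123456789abcdef0123";',
])

for hit in scan("Config.java", sample):
    print(hit)
```

Each hit is a lead to check by hand: a matching string may be a test fixture, a revoked key, or a public identifier.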
Cross-User Authorization Testing at Scale
We built internal tooling that spins up two authenticated browser sessions for two different users on the same target, automatically swaps session tokens between them, replays requests across the matrix, and diffs the responses to surface broken object-level authorization. What used to be a multi-day manual exercise across a hundred endpoints is now an automated pass that runs in the background, and our testers spend their time on the small percentage of endpoints that actually leak data.
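The swap, replay, and diff loop can be sketched in a few lines. This is not OnDefend's internal tool, which drives real browser sessions. Here the HTTP layer is replaced by a hypothetical `send(path, token)` callable, and `Response`, `find_bola_candidates`, `fake_send`, the paths, and the tokens are all invented for the example.

```python
from typing import NamedTuple


class Response(NamedTuple):
    status: int
    body: str


def find_bola_candidates(paths, owner_token, other_token, send):
    """Replay each request as the owner and as a second user, then diff.

    A path is flagged when the second user gets a 200 with the same body
    the owner received, which suggests a missing object-level check.
    """
    candidates = []
    for path in paths:
        baseline = send(path, owner_token)
        swapped = send(path, other_token)
        if (
            baseline.status == 200
            and swapped.status == 200
            and swapped.body == baseline.body
        ):
            candidates.append(path)
    return candidates


def fake_send(path, token):
    """Toy in-memory backend standing in for the real target."""
    if path == "/api/invoices/1001":
        # Deliberately ignores the token: no ownership check.
        return Response(200, '{"invoice": 1001, "owner": "alice"}')
    if path == "/api/profile/alice":
        if token == "token-alice":
            return Response(200, '{"user": "alice"}')
        return Response(403, "forbidden")
    return Response(404, "not found")


paths = ["/api/invoices/1001", "/api/profile/alice"]
print(find_bola_candidates(paths, "token-alice", "token-bob", fake_send))
```

An identical response is only a candidate. Intentionally public endpoints also pass this check, which is why the flagged endpoints still go to a tester.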
Reporting and Evidence
Findings that used to take an hour or more to document, including reproduction steps, severity scoring, Common Weakness Enumeration (CWE) references, and remediation guidance, are now produced in minutes. We auto-render request and response pairs into clean Burp Repeater-styled PNG screenshots so evidence looks the way clients expect, and real-time sync into our finding tracker means a vulnerability discovered at 2:00 is fully documented and triaged by 2:15.
Why Discipline Still Matters
AI is fast, but fast does not always mean accurate.
The same speed that allows a model to surface a hundred potential issues in minutes can also produce a hundred convincing false positives. Teams that hand AI the keys without guardrails end up shipping reports full of findings that often turn out to be intended behavior, out of scope, or just artifacts of an unstable backend.
That is why we built two specific guardrails into our workflow.
Guardrail #1: The Validation Gate
Anything an LLM flags is treated as a hypothesis, not a confirmed finding.
Our testers are required to manually reproduce the issue, with a working proof of concept, and prove downstream impact before it ever appears in a deliverable. A validation bypass is not a finding until something on the other side actually happens.
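One way to picture the validation gate is as a filter between the list of hypotheses and the report. The sketch below is an assumption about how such a gate could be modeled, not a description of OnDefend's finding tracker. The `Finding` fields and the `build_deliverable` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    source: str  # where the lead came from, e.g. "llm" or "manual"
    poc_reproduced: bool = False  # a tester reproduced it with a working PoC
    impact_demonstrated: bool = False  # downstream impact was proven

    def is_reportable(self):
        # Both conditions must hold, regardless of who or what raised the lead.
        return self.poc_reproduced and self.impact_demonstrated


def build_deliverable(findings):
    """Return only the titles that have passed the validation gate."""
    return [f.title for f in findings if f.is_reportable()]


findings = [
    Finding("IDOR on invoice endpoint", "llm", poc_reproduced=True, impact_demonstrated=True),
    Finding("Client-side validation bypass", "llm", poc_reproduced=True),
    Finding("Possible SQL injection in search", "llm"),
]

print(build_deliverable(findings))
```

The second entry mirrors the point above: the bypass reproduces, but with no demonstrated impact on the other side it stays out of the deliverable.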
Guardrail #2: The Recon Gate
Our internal tooling will not let a tester start exploitation against an asset until reconnaissance is sufficiently complete. AI makes everything faster, including jumping ahead of yourself. The recon gate forces our team to slow down at the exact moment AI is tempting them to speed up. It is a quiet piece of process design and one of the biggest reasons we have consistently produced high-quality work even as our pace has accelerated.
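A recon gate of this kind can be modeled as a precondition check that refuses to unlock exploitation while required steps are outstanding. The following is a minimal sketch under assumed names: the step list in `REQUIRED_RECON`, the `Asset` class, and `start_exploitation` are hypothetical and do not describe the actual internal tooling.

```python
# Hypothetical checklist; what counts as "sufficiently complete" is an assumption here.
REQUIRED_RECON = frozenset({"port_scan", "service_fingerprint", "content_discovery"})


class ReconIncomplete(Exception):
    """Raised when exploitation is requested before recon is finished."""


class Asset:
    def __init__(self, name):
        self.name = name
        self.completed = set()

    def complete(self, step):
        if step not in REQUIRED_RECON:
            raise ValueError(f"unknown recon step: {step}")
        self.completed.add(step)

    def missing(self):
        return sorted(REQUIRED_RECON - self.completed)


def start_exploitation(asset):
    """Unlock exploitation only when every required recon step is done."""
    missing = asset.missing()
    if missing:
        raise ReconIncomplete(f"{asset.name}: still missing {', '.join(missing)}")
    return f"exploitation unlocked for {asset.name}"


asset = Asset("app.example.test")
asset.complete("port_scan")
try:
    start_exploitation(asset)
except ReconIncomplete as exc:
    print("blocked:", exc)

asset.complete("service_fingerprint")
asset.complete("content_discovery")
print(start_exploitation(asset))
```

Raising an exception rather than logging a warning is the design point: the tester cannot proceed by ignoring the message.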
Speed without discipline produces noise. Discipline without speed produces missed findings.
Focusing on both is what allows OnDefend to cover more ground without losing the rigor that catches the bugs that truly matter.
Where AI Falls Short
We want to be honest about this part. AI does not find new classes of vulnerabilities. It does not chain three small bugs into a critical exploit. It does not look at a complex business logic flow and intuit that a specific sequence of normally allowed actions produces an outcome the developers never intended. It does not handle backend instability gracefully, and it does not validate downstream impact on its own.
The judgment work, the question of whether a finding is real or whether the tester is fooling themselves, is still entirely human. That has not changed, and we do not expect it to change anytime soon.
What AI does is take the rote work off testers’ plates so they have more time for the parts only humans do well: applying creativity, intuition, and experience where they provide the greatest value.
That is the whole game, and it is why the engagements OnDefend runs today produce more meaningful coverage than those of our competitors.
We Are Still Early
I tell our team often that we are early. The tools we use today are still improving, and we are going to see major strides over the next twelve to twenty-four months. Models are getting smarter. Agents are getting more reliable at chaining tools. The gap between an idea for an attack and a working proof of concept is going to keep shrinking.
The teams that learn to operate at AI-assisted pace now, with the discipline to validate every output and the workflow design to keep humans on the judgment calls, are going to be the teams setting the bar for what offensive security looks like five years from today. OnDefend intends to be one of them. That is why we keep pushing.
The Bottom Line
The adversary has access to the same models we do, and they are using them. The question is not whether AI is going to change offensive security. It is whether the security partner testing your environment is using it to find your vulnerabilities before someone else does.
We are. And we have built the workflow design to do it without sacrificing the rigor that finds the bugs that actually matter.
Matt Zamat
OnDefend
Combining AI Speed with Human Expertise
At OnDefend, we believe the future of offensive security belongs to organizations that combine elite human expertise with AI-enabled workflows that improve speed, scale, and precision without sacrificing rigor.
We don’t see AI as a replacement for offensive security expertise. We see it as a force multiplier for experienced testers who know where to look, what questions to ask, and how to separate real risk from noise.
That means keeping humans in the decision chain: experienced operators who can reason about a specific environment, pursue a hypothesis, and prove out a real attack path. Those operators are amplified by technology that makes them faster and more effective: automation absorbing the repeatable work, AI synthesizing intelligence to direct where operators focus, and continuous validation ensuring nothing slips back in between tests.
The engine behind that model is BlindSPOT.
Ready to See BlindSPOT in Action?
Discover how OnDefend’s proprietary offensive security engine combines automation, AI insights, and continuous validation to identify and reduce hidden exploitable risk at scale.