From the Perspective of an OnDefend Penetration Tester

AI on Both Sides of the Red Team Fight

Threat actors are already using AI to move faster, scale attacks, and find weaknesses more efficiently than ever before. Defenders are racing to catch up. So where does a red team actually start? We asked ours.

The OnDefend red team is one of the most advanced offensive security teams in the United States, with decades of combined experience across offensive and defensive operations. They develop their own techniques, adapt to evade defenses, and engineer attack paths the way a sophisticated threat actor would, pursuing hypotheses, chaining findings, and pushing deeper than any automated platform is designed to go.

For more than a year, they have been integrating AI into those workflows so OnDefend can outpace adversaries, not just keep pace with them.

The Bottleneck Was Never Expertise

Matt Zamat, OnDefend Lead Application Penetration Tester

Many penetration testing teams still handle the slowest parts of the job manually. Decompiling stripped binaries in Ghidra and naming functions one at a time. Reviewing hundreds of thousands of lines of decompiled mobile application code to find a single hardcoded secret. Manually swapping session tokens to test for cross-user authorization flaws on every endpoint. Writing up findings, one careful paragraph at a time.

All of that work is important. But it also dominates the engagement, and it is exactly the kind of work AI can compress when integrated into the workflow effectively.

At OnDefend, our offensive security team has spent the last year integrating AI across the parts of pen testing that have traditionally created bottlenecks. The result is faster engagements, deeper coverage, and findings our clients fully trust.

The slow parts of an offensive engagement were never about thinking. They were about reading.

Reading binary disassembly. Reading mobile source decompiled out of a class dump. Reading HTTP traffic looking for the one endpoint that does not validate authorization the way the others do.

I’ve done hundreds of pen tests, and knowing what to look for is the job. Finding it, however, still meant hours of grinding through material, and that grind historically consumed 60 to 70 percent of every engagement.

That is where AI delivered the first meaningful gains, and it is still where we see the greatest operational impact today.

How OnDefend Uses AI Across the Offensive Workflow

Reverse Engineering at Scale

Pairing Ghidra with large language models through MCP allows our testers to triage decompiled functions in minutes instead of hours. Stripped iOS, macOS, and Android binaries that once took an entire day to orient against are now mappable in a single sitting.

Authentication routines, cryptographic operations, jailbreak detection, and anti-tamper logic surface quickly, which means our team can focus on higher-value findings sooner.
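To make the triage idea concrete, here is a minimal Python sketch, not OnDefend's tooling and not a Ghidra or MCP integration. It assumes decompiled function bodies have already been exported as text, and it buckets them by well-known Apple platform API names and jailbreak artifacts so a model or a tester can start with the interesting functions. The indicator lists, the `triage` function, and the sample function names are illustrative assumptions.

```python
from typing import Dict, List

# Illustrative indicators only; a real ruleset would be much broader.
CATEGORY_INDICATORS = {
    "cryptography": ("CCCrypt", "CC_SHA256", "SecKeyCreateEncryptedData"),
    "authentication": ("SecItemCopyMatching", "LAContext", "evaluatePolicy"),
    "jailbreak_detection": ("/Applications/Cydia.app", "cydia://"),
    "anti_tamper": ("ptrace", "_dyld_image_count"),
}


def triage(functions: Dict[str, str]) -> Dict[str, List[str]]:
    """Bucket decompiled functions by the indicators found in their bodies.

    `functions` maps a function name (hypothetical export format) to its
    decompiled text. A function may land in more than one bucket.
    """
    buckets: Dict[str, List[str]] = {cat: [] for cat in CATEGORY_INDICATORS}
    for name, body in functions.items():
        for category, indicators in CATEGORY_INDICATORS.items():
            if any(indicator in body for indicator in indicators):
                buckets[category].append(name)
    return buckets


# Hypothetical decompiler output for three stripped functions.
SAMPLE_FUNCTIONS = {
    "FUN_100004a10": "ptrace(0x1f, 0, 0, 0);",
    "FUN_100007c3c": "CCCrypt(0, 0, 1, key, 0x20, iv, in, len, out, cap, &moved);",
    "FUN_100001000": "return a + b;",
}
```

String matching like this only orders the reading queue. It says nothing about whether a function is actually vulnerable.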

Static Analysis Across Massive Codebases

Mobile applications routinely decompile into hundreds of thousands of lines of code. No human can realistically review all of that manually, and traditional static analysis tools often miss important context.

With AI-assisted workflows, we can cover the entire codebase. Insecure cryptography, hardcoded API keys, unsafe deep link handlers, IPC vulnerabilities, and risky debug paths can be surfaced in minutes, and every issue the model identifies is manually validated before it appears in a client report.
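As a rough illustration of what "surfacing" a hardcoded secret can look like, the sketch below scans decompiled source text line by line and emits candidates for a human to validate. It is a simple regex pre-filter, not a model-driven analysis and not OnDefend's implementation. The two patterns, the `Candidate` record, and `scan_source` are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative patterns only; real rulesets cover many more secret formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*[\"'][A-Za-z0-9_\-]{16,}[\"']"
    ),
}


@dataclass(frozen=True)
class Candidate:
    path: str
    line_no: int
    rule: str
    excerpt: str


def scan_source(path: str, text: str) -> List[Candidate]:
    """Return every line that matches a secret pattern, as an unvalidated candidate."""
    hits: List[Candidate] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append(Candidate(path, line_no, rule, line.strip()))
    return hits
```

Each `Candidate` is a lead, not a finding. A matching string may be a test fixture, a public identifier, or a revoked key.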

Cross-User Authorization Testing at Scale

We built internal tooling that spins up two authenticated browser sessions for two different users on the same target, automatically swaps session tokens between them, replays requests across the matrix, and diffs the responses to surface broken object-level authorization. What used to be a multi-day manual exercise across a hundred endpoints is now an automated pass that runs in the background, and our testers spend their time on the small percentage of endpoints that actually leak data.
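The swap, replay, and diff loop described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the internal tool: the transport is abstracted behind a hypothetical `send(request, token)` callable, only one direction of the matrix is shown (user A's requests replayed as user B), and "same status and same body" stands in for a real response diff. `fake_send` is a stand-in backend invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    method: str
    path: str


@dataclass(frozen=True)
class Response:
    status: int
    body: str


# Hypothetical transport hook: sends `request` authenticated with `token`.
Sender = Callable[[Request, str], Response]


def find_bola_candidates(
    requests: List[Request], owner_token: str, other_token: str, send: Sender
) -> List[Request]:
    """Replay the owner's requests with another user's token and flag matches."""
    candidates: List[Request] = []
    for req in requests:
        baseline = send(req, owner_token)
        if baseline.status != 200:
            continue  # the owner cannot read it, so there is nothing to compare
        swapped = send(req, other_token)
        # Same successful response for a different user suggests a missing check.
        if swapped.status == 200 and swapped.body == baseline.body:
            candidates.append(req)
    return candidates


def fake_send(req: Request, token: str) -> Response:
    """Stand-in backend: /orders/1 checks ownership, /notes/1 does not."""
    if req.path == "/orders/1":
        return Response(200, "alice-order") if token == "alice" else Response(403, "")
    if req.path == "/notes/1":
        return Response(200, "alice-note")
    return Response(404, "")
```

Running `find_bola_candidates` over `/orders/1` and `/notes/1` with `fake_send` flags only `/notes/1`. An identical response can also mean the resource is intentionally public, which is why the output is a candidate list for a tester to review.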

Reporting and Evidence

Findings that used to take an hour or more to document, including reproduction steps, severity scoring, Common Weakness Enumeration (CWE) references, and remediation guidance, are now produced in minutes. We auto-render request and response pairs into clean Burp Repeater-styled PNG screenshots so evidence looks the way clients expect, and real-time sync into our finding tracker means a vulnerability discovered at 2:00 is fully documented and triaged by 2:15.
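A finding write-up is fast to produce largely because it has a fixed shape. The sketch below shows one possible shape in Python, using the fields named above. The `ReportEntry` class, its field names, and the plain-text layout are assumptions for illustration and do not reflect OnDefend's finding tracker or report format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportEntry:
    title: str
    severity: str
    cwe_id: int
    reproduction_steps: List[str]
    remediation: str


def render(entry: ReportEntry) -> str:
    """Render a finding as plain text with numbered reproduction steps."""
    lines = [
        f"Finding: {entry.title}",
        f"Severity: {entry.severity}",
        f"Weakness: CWE-{entry.cwe_id}",
        "Reproduction steps:",
    ]
    lines += [
        f"  {number}. {step}"
        for number, step in enumerate(entry.reproduction_steps, start=1)
    ]
    lines.append(f"Remediation: {entry.remediation}")
    return "\n".join(lines)
```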

Why Discipline Still Matters

AI is fast, but fast does not always mean accurate.

The same speed that allows a model to surface a hundred potential issues in minutes can also produce a hundred convincing false positives. Teams that hand AI the keys without guardrails end up shipping reports full of findings that often turn out to be intended behavior, out of scope, or just artifacts of an unstable backend.

That is why we built two specific guardrails into our workflow.

Guardrail #1: The Validation Gate

Anything an LLM flags is treated as a hypothesis, not a confirmed finding.

Our testers are required to manually reproduce the issue, with a working proof of concept, and prove downstream impact before it ever appears in a deliverable. A validation bypass is not a finding until something on the other side actually happens.
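One way to picture the validation gate is as a status that cannot be set directly and is instead derived from evidence. The Python sketch below is an assumption about how such a gate could be modeled, not a description of OnDefend's tracker. The `Finding` fields and the `reportable` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    HYPOTHESIS = "hypothesis"
    CONFIRMED = "confirmed"


@dataclass
class Finding:
    title: str
    source: str  # e.g. "llm" or "manual"
    reproduced_by: Optional[str] = None  # tester who reproduced it by hand
    proof_of_concept: Optional[str] = None  # reference to a working PoC
    downstream_impact: Optional[str] = None  # what actually happened as a result

    @property
    def status(self) -> Status:
        # Confirmed only when all three pieces of evidence are present.
        evidence = (self.reproduced_by, self.proof_of_concept, self.downstream_impact)
        return Status.CONFIRMED if all(evidence) else Status.HYPOTHESIS


def reportable(findings: List[Finding]) -> List[Finding]:
    """Only confirmed findings are allowed into a deliverable."""
    return [f for f in findings if f.status is Status.CONFIRMED]
```

Because `status` is computed rather than assigned, a model-flagged issue with a proof of concept but no demonstrated impact stays a hypothesis.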

Guardrail #2: The Recon Gate

Our internal tooling will not let a tester start exploitation against an asset until reconnaissance is sufficiently complete. AI makes everything faster, including jumping ahead of yourself. The recon gate forces our team to slow down at the exact moment AI is tempting them to speed up. It is a quiet piece of process design and one of the biggest reasons we have consistently produced high-quality work even as our pace has accelerated.
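The recon gate can be modeled as a precondition check that refuses to proceed and says what is missing. The sketch below is a hypothetical Python illustration. The required step names are invented for the example, since what counts as "sufficiently complete" is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical checklist; the real completeness criteria are not public.
REQUIRED_RECON_STEPS = frozenset(
    {"scope_confirmed", "service_enumeration", "auth_model_mapped"}
)


class ReconIncomplete(Exception):
    """Raised when exploitation is attempted before recon is complete."""


@dataclass
class Asset:
    name: str
    completed_steps: Set[str] = field(default_factory=set)


def missing_recon(asset: Asset) -> List[str]:
    """Return the required recon steps not yet recorded, in sorted order."""
    return sorted(REQUIRED_RECON_STEPS - asset.completed_steps)


def start_exploitation(asset: Asset) -> str:
    """Unlock exploitation only when every required recon step is recorded."""
    missing = missing_recon(asset)
    if missing:
        raise ReconIncomplete(
            f"{asset.name}: recon incomplete, missing {', '.join(missing)}"
        )
    return f"exploitation unlocked for {asset.name}"
```

Raising an exception instead of returning a warning is the design point: the tool blocks the step, so the tester cannot skip ahead by ignoring a message.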

Speed without discipline produces noise. Discipline without speed produces missed findings.

Focusing on both is what allows OnDefend to cover more ground without losing the rigor that catches the bugs that truly matter.

Where AI Falls Short

We want to be honest about this part. AI does not find new classes of vulnerabilities. It does not chain three small bugs into a critical exploit. It does not look at a complex business logic flow and intuit that a specific sequence of normally allowed actions produces an outcome the developers never intended. It does not handle backend instability gracefully, and it does not validate downstream impact on its own.

The judgment work, the question of whether a finding is real or whether the tester is fooling themselves, is still entirely human. That has not changed, and we do not expect it to change anytime soon.

What AI does is take the rote work off testers' plates so they have more time for the parts only humans do well: applying creativity, intuition, and experience where they provide the greatest value.

That is the whole game, and it is why the engagements OnDefend runs today produce more meaningful coverage than our competitors.

We Are Still Early

I tell our team often that we are early. The tools we use today are still improving, and we are going to see major strides over the next 12 to 24 months. Models are getting smarter. Agents are getting more reliable at chaining tools. The gap between an idea for an attack and a working proof of concept is going to keep shrinking.

The teams that learn to operate at AI-assisted pace now, with the discipline to validate every output and the workflow design to keep humans on the judgment calls, are going to be the teams setting the bar for what offensive security looks like five years from today. OnDefend intends to be one of them. That is why we keep pushing.

The Bottom Line

The adversary has access to the same models we do, and they are using them. The question is not whether AI is going to change offensive security. It is whether the security partner testing your environment is using it to find your vulnerabilities before someone else does.

We are. And we have built the workflow design to do it without sacrificing the rigor that finds the bugs that actually matter.

Matt Zamat
OnDefend

Combining AI Speed with Human Expertise

At OnDefend, we believe the future of offensive security belongs to organizations that combine elite human expertise with AI-enabled workflows that improve speed, scale, and precision without sacrificing rigor.

We don’t see AI as a replacement for offensive security expertise. We see it as a force multiplier for experienced testers who know where to look, what questions to ask, and how to separate real risk from noise.

That means humans stay in the decision chain: experienced operators who can reason about a specific environment, pursue a hypothesis, and prove out a real attack path. They are amplified by technology that makes them faster and more effective: automation absorbing the repeatable work, AI synthesizing intelligence to direct where operators focus, and continuous validation ensuring nothing slips back in between tests.

The engine behind that model is BlindSPOT.

Ready to See BlindSPOT in Action?

Discover how OnDefend’s proprietary offensive security engine combines automation, AI insights, and continuous validation to identify and reduce hidden exploitable risk at scale.

Check out BlindSPOT
