Every quarter a new acronym shows up in a vendor deck. Last year it was AEV. This year Gartner introduced COST. CTEM is everywhere. BAS has been around forever and still gets confused with red teaming. Pen test means six different things depending on who you ask.

If you run a security program, the question is not which category is hottest. It is which one solves the problem you have today, what it does not solve, and where the gaps between them are quietly absorbing your budget.

This is a category-by-category breakdown. No marketing. Each section covers what the category actually does, what it does not, when to use it, and how to think about it next to the others.

VA: Vulnerability Assessment

What it does. Scans systems for known vulnerabilities using signature and version-based detection. Authenticated scans pull patch levels and configuration data. Unauthenticated scans probe exposed surface. The output is a CVE list mapped to assets, ranked by CVSS.

What it does not do. It does not validate exploitability. A VA scanner will tell you a system has CVE-2024-XXXX. It will not tell you whether that CVE is reachable from the internet, whether the vulnerable code path is exercised, or whether an attacker could actually chain it into something material. False positive rates are high because the scanner is matching versions, not testing exploits.
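A minimal sketch of why version matching over-reports, assuming a scanner that only compares an installed version string against the release where a fix landed. The host names, version numbers, and the backported flag are hypothetical. Real scanners use richer detection logic than this.

```python
def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def scanner_flags(installed, fixed_in):
    """Version-based detection: flag anything older than the fixed release."""
    return parse_version(installed) < parse_version(fixed_in)


FIXED_IN = "2.4.5"  # hypothetical release that shipped the fix

# Hypothetical hosts: (name, installed version, vendor backported the fix?)
HOSTS = [
    ("web-01", "2.4.1", False),
    ("web-02", "2.4.1", True),   # old version number, but already patched
    ("web-03", "2.4.9", False),
]

flagged = [name for name, version, _ in HOSTS if scanner_flags(version, FIXED_IN)]
false_positives = [
    name
    for name, version, backported in HOSTS
    if scanner_flags(version, FIXED_IN) and backported
]

print(flagged)
print(false_positives)
```

Both web-01 and web-02 are flagged because they report the same version. Only an exploit attempt or a package-level check would show that web-02 is already fixed.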

When to use it. As a baseline hygiene control. Patch management programs need it. Compliance frameworks expect it. Treat VA output as a candidate list, not a risk list.

Where it breaks. Modern application risk lives in business logic, broken access control, credential abuse, and chained attack paths. None of these show up in a VA scan.

DAST and Web Application Scanners

What it does. Crawls a running web application and fires payloads at endpoints to detect injection, XSS, misconfiguration, and a handful of OWASP categories. Faster feedback than VA for web surface.

What it does not do. DAST tools structurally cannot test business logic. They do not understand that an authenticated user transferring money to their own account should not be able to set the recipient ID to an arbitrary value. They do not chain findings. False positive rates of 40 to 70 percent are common, which means triage burns more analyst time than the scan saves.
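To make the business logic point concrete, here is a hedged sketch of the kind of rule a payload-driven scanner has no way to know about. The ownership table, the function name, and the account IDs are all hypothetical. The tampered request contains no injection string and no malformed input, only a valid-looking ID that belongs to someone else.

```python
class AuthorizationError(Exception):
    """Raised when a user references an account they do not own."""


# Hypothetical ownership table: user name -> account IDs that user owns.
OWNED_ACCOUNTS = {
    "alice": {"A-1", "A-2"},
    "bob": {"B-1"},
}


def transfer_to_own_account(user, source_id, recipient_id, amount):
    """Move funds between two accounts that must both belong to `user`."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    owned = OWNED_ACCOUNTS.get(user, set())
    # The business rule: both IDs arrive in the request, so both are checked
    # against server-side ownership data instead of being trusted as sent.
    if source_id not in owned or recipient_id not in owned:
        raise AuthorizationError(f"{user} does not own both accounts")
    return {"from": source_id, "to": recipient_id, "amount": amount}


ok = transfer_to_own_account("alice", "A-1", "A-2", 100)

try:
    transfer_to_own_account("alice", "A-1", "B-1", 100)
    tampered_rejected = False
except AuthorizationError:
    tampered_rejected = True

print(ok, tampered_rejected)
```

If the ownership check were missing, the tampered request would still succeed with a normal response. Nothing about that response looks wrong unless the tester knows who is supposed to own B-1.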

When to use it. As part of CI/CD to catch regression on common injection classes. Useful for the easy stuff. Not a substitute for an actual pen test.

Pen Test: Penetration Testing

What it does. A human tester, or a platform that behaves like one, attempts to compromise a defined target the way an attacker would. Real exploitation. Real chaining. Real proof. The output is not a CVE list. It is a narrative of how someone could get from outside your perimeter to data exfiltration, with evidence at every step.

What it does not do. Traditional pen testing has three structural problems that have nothing to do with tester skill.

First, scope. Most programs test about 20 percent of the attack surface, usually the crown jewels. Attackers probe 100 percent. Shadow apps, forgotten subdomains, peripheral assets, and API endpoints buried in JavaScript files do not get tested.

Second, cadence. Most pen tests are annual. Modern teams deploy code weekly or daily. Attackers exploit new CVEs in about three days. The gap between testing and reality widens every release.

Third, repeatability. A consultant runs an engagement, writes a report, and leaves. The next year a different consultant starts from scratch. Findings are not tracked over time. Coverage is not measured.
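The scope and cadence gaps are simple arithmetic. A rough sketch, assuming a 365-day test interval and a hypothetical estate of 500 assets. The 20 percent figure is the one quoted above, and the function names are illustrative.

```python
def releases_between_tests(test_interval_days, release_interval_days):
    """Releases shipped between two consecutive tests."""
    return test_interval_days // release_interval_days


def untested_assets(total_assets, tested_percent):
    """Assets left outside the engagement scope."""
    return total_assets * (100 - tested_percent) // 100


weekly_releases = releases_between_tests(365, 7)
daily_releases = releases_between_tests(365, 1)
out_of_scope = untested_assets(500, 20)  # 500 assets is an assumed figure

print(weekly_releases, daily_releases, out_of_scope)
```

With an annual test, a team shipping weekly puts 52 releases into production before the next engagement looks at any of them.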

When to use it. Always. The question is whether you do it manually, autonomously, or both, and at what cadence.

Manual versus autonomous pen testing. Manual pen testing runs $2,400 to $10,000 per app in consulting cost and takes weeks. AI-driven autonomous pen testing brings that to $450 to $2,500 per app with a continuous cadence. Manual still wins on the most niche business logic edge cases where a senior researcher’s intuition matters. Autonomous wins on coverage, consistency, and speed. Mature programs do both.
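A quick worked comparison using the per-app ranges quoted above. The portfolio size of 50 apps is an assumption for illustration, and the sketch treats each price as one engagement per app, since the text does not state a billing period for the continuous option.

```python
MANUAL_PER_APP = (2_400, 10_000)      # USD range quoted in the text
AUTONOMOUS_PER_APP = (450, 2_500)     # USD range quoted in the text
APP_COUNT = 50                        # hypothetical portfolio size


def portfolio_cost(per_app_range, app_count):
    """Scale a (low, high) per-app price range to a whole portfolio."""
    low, high = per_app_range
    return (low * app_count, high * app_count)


manual_total = portfolio_cost(MANUAL_PER_APP, APP_COUNT)
autonomous_total = portfolio_cost(AUTONOMOUS_PER_APP, APP_COUNT)

# Ratio of manual to autonomous cost at each end of the range.
low_ratio = MANUAL_PER_APP[0] / AUTONOMOUS_PER_APP[0]
high_ratio = MANUAL_PER_APP[1] / AUTONOMOUS_PER_APP[1]

print(manual_total, autonomous_total, round(low_ratio, 2), round(high_ratio, 2))
```

On these figures the gap is roughly four to five times per app. That is what makes testing the previously unscoped part of the portfolio affordable.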

BAS: Breach and Attack Simulation

What it does. Continuously runs a library of known attack techniques against your environment to test whether your controls detect and block them. Did EDR catch the Mimikatz behavior? Did the firewall block the C2 callback? Did SIEM generate the alert? BAS is a control validation tool. It answers: Are my defenses working against techniques we already know about?

What it does not do. BAS does not find new vulnerabilities. It does not pen test your applications. It does not chain attacks the way a real adversary would. It runs a known playbook against your defensive stack to grade the stack.

When to use it. When you have a SOC, a defensive tooling investment, and you need ongoing assurance that the controls you bought are actually doing what the procurement deck said. Mature blue teams use it to detect drift in detection coverage.
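Control validation and drift detection can be sketched in a few lines. This assumes each simulated technique produces a simple blocked/alerted result. The technique names and the result structure are hypothetical and do not reflect any particular BAS product's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechniqueResult:
    technique: str
    blocked: bool   # a preventive control stopped it
    alerted: bool   # a detective control raised an alert


def caught(result):
    return result.blocked or result.alerted


def coverage(results):
    """Fraction of simulated techniques that were blocked or alerted on."""
    if not results:
        return 0.0
    return sum(1 for r in results if caught(r)) / len(results)


def drift(previous, current):
    """Techniques caught in the previous run but missed in the current one."""
    caught_before = {r.technique for r in previous if caught(r)}
    missed_now = {r.technique for r in current if not caught(r)}
    return sorted(caught_before & missed_now)


last_week = [
    TechniqueResult("credential-dumping", True, True),
    TechniqueResult("c2-callback", True, False),
    TechniqueResult("lateral-movement", False, True),
]
this_week = [
    TechniqueResult("credential-dumping", True, True),
    TechniqueResult("c2-callback", False, False),  # a rule change broke the block
    TechniqueResult("lateral-movement", False, True),
]

print(coverage(last_week), coverage(this_week), drift(last_week, this_week))
```

The playbook is the same in both runs. Only the grade changes, which is exactly the deterministic behavior that separates BAS from red teaming.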

Common confusion. BAS is often pitched as red teaming. It is not. Red teaming is objective-based and adversarial. BAS is technique-based and deterministic. They complement each other. They are not substitutes.

CART: Continuous Automated Red Teaming

What it does. Objective-based attack simulation that mirrors how real adversaries operate, run continuously rather than once a year. The objective is not “did the EDR fire,” it is “could an attacker reach the crown jewel data.” CART platforms plan attack paths, execute multi-stage chains, attempt lateral movement and privilege escalation, and produce a kill chain narrative aligned to MITRE ATT&CK for infrastructure and OWASP for web applications.

The difference from BAS is the difference between rehearsing a known sequence and trying to actually win. CART starts from an objective and adapts. BAS runs a script.
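One way to picture "starts from an objective and adapts" is a path search over a graph of reachable footholds. The sketch below is a plain breadth-first search over a hypothetical attack graph. It is not how any specific CART platform plans, and every node name is made up.

```python
from collections import deque


def find_attack_path(graph, start, objective):
    """Breadth-first search: shortest chain from start to objective, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == objective:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical graph: an edge means "a foothold here can reach there".
ATTACK_GRAPH = {
    "internet": ["web-app", "vpn-gateway"],
    "web-app": ["app-server"],
    "vpn-gateway": ["app-server"],
    "app-server": ["domain-controller", "file-share"],
    "domain-controller": ["crown-jewel-db"],
    "file-share": [],
}

first_path = find_attack_path(ATTACK_GRAPH, "internet", "crown-jewel-db")

# Close the web-app route and plan again: the objective stays, the path changes.
patched_graph = {**ATTACK_GRAPH, "web-app": []}
second_path = find_attack_path(patched_graph, "internet", "crown-jewel-db")

print(first_path)
print(second_path)
```

A fixed script would stop once the web-app step failed. An objective-driven search reroutes through the VPN gateway and still reaches the database.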

What it does not do. CART is not a substitute for application pen testing. The two overlap, but red teaming generally optimizes for reaching an objective by the easiest path. That can mean the application layer never gets exercised deeply because the attacker found a credential on the dark web and walked in. You still need dedicated application pen testing to test depth on the app itself.

When to use it. When your security program is mature enough that the question is not “do we have vulnerabilities” but “can an adversary actually compromise our crown jewels, and how.” It fits larger enterprises and regulated industries that need to demonstrate ongoing adversarial assurance to their boards and regulators.

ASM and CTEM: Attack Surface Management and Continuous Threat Exposure Management

What it does. ASM discovers and tracks the external attack surface: domains, subdomains, IPs, certificates, exposed services, leaked credentials on the dark web, and shadow assets nobody remembers spinning up. CTEM is Gartner’s broader framework around this: scope, discover, prioritize, validate, mobilize. CTEM is a program model, not a single product category.

ASM answers what you own. CTEM adds: of what you own, what is exposed, what matters, and what you are going to do about it.

What it does not do. ASM does not pen test. It tells you that a forgotten Jenkins instance is exposed on the internet. It does not tell you whether that Jenkins is exploitable, whether it has been compromised, or whether an attacker could pivot from it into your internal network. Most ASM tools also generate noise. They surface exposures without exploitability context, which means the security team still has to triage what matters.

The validation step of CTEM is where ASM hands off to pen testing or red teaming. Validation is where most CTEM programs stall, because most ASM tools do not do it and most pen test programs cannot keep up with the inflow.
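The stall is a queueing problem. A minimal sketch, assuming a steady weekly inflow of new exposures and a fixed weekly validation capacity. Both numbers are invented for illustration.

```python
def validation_backlog(weekly_inflow, weekly_capacity, weeks, starting_backlog=0):
    """Unvalidated exposures remaining at the end of each week."""
    backlog = starting_backlog
    history = []
    for _ in range(weeks):
        backlog = max(0, backlog + weekly_inflow - weekly_capacity)
        history.append(backlog)
    return history


# Hypothetical: ASM surfaces 200 exposures a week, the team validates 50.
stalled = validation_backlog(weekly_inflow=200, weekly_capacity=50, weeks=4)

# Hypothetical: validation capacity exceeds inflow, so nothing queues up.
keeping_up = validation_backlog(weekly_inflow=200, weekly_capacity=250, weeks=3)

print(stalled, keeping_up)
```

Whenever inflow exceeds capacity, the backlog grows by the difference every week. The stall comes from throughput, however skilled the testers are.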

When to use it. Always, as a foundation. You cannot test what you do not know exists. The Fortune 500 story is consistent: organizations discover 30 to 50 percent more assets than their CMDB shows, and the unknown assets are usually where the worst exposures live. Twenty percent of breaches begin with initial access through a peripheral asset. That is the asset class ASM finds.
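The CMDB gap is a set difference. A small sketch, assuming both the CMDB export and the discovery output can be reduced to lists of hostnames. The hostnames are placeholders on example.com.

```python
def normalize(hosts):
    """Lower-case and strip hostnames so the two sources compare cleanly."""
    return {h.strip().lower() for h in hosts}


def unknown_assets(discovered, cmdb):
    """Assets found by discovery that the CMDB does not list."""
    return sorted(normalize(discovered) - normalize(cmdb))


def percent_more_than_cmdb(discovered, cmdb):
    """Extra assets as a percentage of the CMDB's own count."""
    known = normalize(cmdb)
    if not known:
        raise ValueError("CMDB list is empty")
    return 100 * len(normalize(discovered) - known) / len(known)


CMDB = ["www.example.com", "api.example.com", "mail.example.com", "vpn.example.com"]
DISCOVERED = CMDB + ["Jenkins-Old.example.com", "staging.example.com"]

missing = unknown_assets(DISCOVERED, CMDB)
gap_percent = percent_more_than_cmdb(DISCOVERED, CMDB)

print(missing, gap_percent)
```

In this toy inventory, discovery turns up 50 percent more assets than the CMDB lists. The two extras are the ones nobody is patching or testing.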

AEV: Adversarial Exposure Validation

What it does. AEV is the validation layer of CTEM, broken out as its own category. The premise: ASM finds exposure, but exposure is not risk. AEV validates which exposures are actually exploitable by running real attack techniques against them. The output is a prioritized list of exploitable exposures, not a list of theoretical ones.

AEV pulls together what BAS, autonomous pen testing, and red teaming each do partially. It frames the work as “validate exposure” rather than “test the app” or “test the control.”

What it does not do. AEV is a category framing, not a product type. Different vendors fill the AEV bucket differently. Some are BAS vendors expanding into exposure validation. Some are autonomous pen testing platforms. Some are CTEM platforms partnering with validation engines. When you evaluate AEV, ask what the validation engine actually does and how it proves a finding is exploitable. Without proof of exploit, AEV becomes another scoring exercise.

When to use it. When you have CTEM or ASM running and the question your CISO keeps asking is “of the 10,000 exposures, which 50 should we actually fix this quarter?”
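The "which 50" question can be sketched as a gate plus a ranking. An exposure only makes the shortlist if it carries proof of exploit, and the survivors are ordered by business impact. The record fields, the 1 to 5 impact score, and the evidence strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exposure:
    asset: str
    issue: str
    business_impact: int          # hypothetical 1-5 score from the asset owner
    proof: Optional[str] = None   # hypothetical evidence reference; None = unvalidated


def fix_shortlist(exposures, limit):
    """Keep only exposures with proof of exploit, highest impact first."""
    validated = [e for e in exposures if e.proof]
    validated.sort(key=lambda e: e.business_impact, reverse=True)
    return validated[:limit]


EXPOSURES = [
    Exposure("jenkins-old", "exposed admin console", 3, proof="session capture"),
    Exposure("billing-api", "broken access control", 5, proof="record retrieved"),
    Exposure("mail-gw", "outdated TLS config", 2),
    Exposure("intranet", "version-matched CVE", 4),
]

shortlist = fix_shortlist(EXPOSURES, limit=2)
print([e.asset for e in shortlist])
```

The intranet item scores higher on impact than the Jenkins console, but it never reaches the list because nothing proves it is exploitable. Remove the proof gate and this collapses back into a scoring exercise.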

COST: Continuous Offensive Security Testing

What it does. COST is Gartner’s recent framing for the convergence happening across pen testing, red teaming, AEV, and CTEM validation. Rather than treating each as a separate category with separate vendors, COST describes the operational model: offensive security testing run continuously, with exploit validation, across the full attack surface, integrated with the broader exposure management program.

The category exists because security leaders kept asking the same question: why do I need a pen test vendor, a red team vendor, a BAS vendor, an AEV vendor, and an ASM vendor that do not talk to each other? COST is the recognition that the work belongs in one operational program.

What it does not do. COST is not a product checkbox. It is a program model. Vendors will claim COST coverage with varying degrees of honesty. The useful test: does the platform actually run validated exploits continuously across the full surface, or is it a dashboard that aggregates findings from five tools that do not?

When to use it. As the framing for your offensive security program planning over the next two to three years. Most security organizations are quietly heading toward COST whether they call it that or not. The shift from annual pen testing to continuous offensive testing is happening because the cadence math no longer works.

How These Fit Together

The fastest way to keep these straight is to ask what question each one answers.

Category	Question it answers	Output
VA	What known CVEs exist on my systems?	CVE list per asset
DAST	What injection-class flaws exist in my web apps?	Scanner findings, often noisy
Pen Test	Can someone actually compromise this target?	Validated attack paths with proof
PTaaS (Pen Testing as a Service)	How do I manage pen test engagements better?	Same as pen test, with a platform wrapper
BAS	Are my defensive controls catching known techniques?	Control coverage grade
CART	Can an adversary reach my crown jewels, continuously?	Multi-stage kill chains, MITRE-aligned
ASM	What do I own and what is exposed?	Asset inventory with exposure flags
CTEM	How do I prioritize the exposures I find?	Program framework, not a product
AEV	Which exposures are actually exploitable?	Validated, prioritized exposure list
COST	How do I run all of the above as one continuous program?	Operational model, not a product

A mature program does not pick one. It runs VA for hygiene, DAST for fast regression checks in CI/CD, pen testing for application depth, ASM for discovery, BAS for control validation, CART for adversarial assurance, and ties it all together under a CTEM or COST framing.

The mistake is treating these as competing categories. They are layers. The other mistake is buying a tool in every layer from a different vendor and watching the operational seams between them become where attackers operate.
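Treating the categories as layers suggests a simple gap check: list the questions your current mix leaves unanswered. The sketch below copies the question text from the table and leaves out PTaaS, CTEM, and COST, since those describe delivery or program models. The function name is illustrative, and the check ignores cadence and scope.

```python
QUESTION_BY_CATEGORY = {
    "VA": "What known CVEs exist on my systems?",
    "DAST": "What injection-class flaws exist in my web apps?",
    "Pen Test": "Can someone actually compromise this target?",
    "BAS": "Are my defensive controls catching known techniques?",
    "CART": "Can an adversary reach my crown jewels, continuously?",
    "ASM": "What do I own and what is exposed?",
    "AEV": "Which exposures are actually exploitable?",
}


def unanswered_questions(program):
    """Questions left open by the set of categories a program runs."""
    unknown = set(program) - set(QUESTION_BY_CATEGORY)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [q for cat, q in QUESTION_BY_CATEGORY.items() if cat not in program]


program = {"Pen Test", "ASM", "BAS"}
gaps = unanswered_questions(program)

for question in gaps:
    print(question)
```

A program with pen testing, ASM, and BAS still leaves the exploitability question open. That is the validation gap between knowing what is exposed and knowing what can be exploited.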

Where the Categories Are Converging

Three trends are collapsing the category boundaries.

Pen testing is becoming continuous, not annual. The cost economics now support it. AI-driven autonomous pen testing brings the per-app cost down roughly four- to five-fold, going by the per-app ranges quoted earlier, which means coverage that was previously scoped out becomes affordable.

Red teaming is becoming application-aware. CART platforms used to focus on infrastructure and Active Directory paths. The newer generation chains application-layer findings into the same kill chains, which means the app-to-network pivot finally gets tested end to end.

ASM is becoming validation-first. Pure inventory ASM is commoditizing. The vendors that survive are the ones that hand exposure off to a validation engine and tell the customer which exposures are actually exploitable.

The endpoint of these trends is what Gartner is calling COST. One continuous offensive security program, validated exploits across the full surface, integrated with exposure management, run at the cadence at which applications actually change.

What This Means for Your Program

If your program today is annual pen testing plus a DAST scanner plus a VA tool, you are running a 2015 model. It still satisfies compliance. It does not satisfy the threat model.

If your program is annual pen testing plus ASM plus BAS, you have visibility and control validation. You still have a validation gap. ASM is telling you what is exposed. Nothing is telling you what is exploitable.

If your program runs continuous autonomous pen testing across the full surface, with chained attack path validation, integrated with ASM and feeding CTEM, you are running what the category is moving toward. The labels matter less than the operational reality.

FireCompass is built for that operational reality. The platform runs autonomous pen testing on web, API, and infrastructure targets, chains findings into multi-stage attack paths, validates every finding with proof of exploit, and uses the same execution engine for CART and CTEM validation. One platform, one audit trail, one continuous cadence. That is the architecture COST describes.

Want to see what continuous offensive testing actually surfaces in your environment? Run a free pen test with FireCompass Explorer at firecompass.com/explorer. No agents to install, results in minutes, real exploit validation. If you want the deeper technical positioning on why LLMs alone cannot do this work, the Beyond Mythos whitepaper walks through the architecture.

About FireCompass

FireCompass is an Agentic AI platform for autonomous penetration testing and red teaming across Web, API, and infrastructure. It discovers shadow assets and web applications, safely validates what is exploitable, and connects findings into multi-stage attack paths with near-zero false positives. Unlike traditional scanners, FireCompass uncovers credential reuse, business-logic flaws, privilege escalation, and app-to-app or app-to-network lateral movement. It can operate autonomously or with expert-in-the-loop validation. FireCompass has 30+ analyst recognitions across Gartner, Forrester, and IDC, and is trusted by Fortune 1000 enterprises.
