Adversarial Benchmark

SoterAI F1 = 1.0000

Name: SoterAI Adversarial Benchmark
Creator: Soter
Published: 2026-06-21T10:01:55Z

97/97 adversarial attack variants detected across 8 categories with zero false positives, plus a 101/101 service-hardening battery. Self-authored Garak-style probing; independent audit welcomed.

Run date: June 21, 2026 at 10:01 AM UTC

100.0%

F1 Score

Perfect precision & recall

97/97

Attacks Detected

Across 8 adversarial categories

100.0%

Specificity

0/25 false positives

<50ms

Inline Latency

SDK-level detection speed

Precision

1.0000

Every detection was correct

Recall

1.0000

Every attack was detected

Accuracy

100.0%

97/97 adversarial tests

Attack Category Breakdown

Each category tests a distinct attack vector. All categories achieved 100% detection.

PROMPT INJECTION (Input Guard)

30 test prompts

30/30(100.0%)

JAILBREAK / DAN (Input Guard)

11 test prompts

11/11(100.0%)

ENCODING / OBFUSCATION (Input Guard)

12 test prompts

12/12(100.0%)

MULTILINGUAL ATTACKS (Input Guard)

7 test prompts

7/7(100.0%)

PII

PII DETECTION

12 test prompts

12/12(100.0%)

SECRET / CREDENTIAL DETECTION

19 test prompts

19/19(100.0%)

INDIRECT PROMPT INJECTION

6 test prompts

6/6(100.0%)

False Positive Rate: 0%

25 safe, legitimate inputs were correctly allowed without any blocking.

Comprehensive Adversarial Battery

Beyond raw detection accuracy, every security service is hardened against end-to-end attack scenarios — agent firewall bypass, passport forgery, delegation abuse, egress exfiltration, evidence tampering, and more. The battery exits non-zero if any scenario regresses.

101/101

Scenarios passing

100% across 21 services

Services covered

Guard, agents, identity, evidence, SIEM

Failing scenarios

Run June 29, 2026 at 12:00 AM UTC

Guard Analysis11/11

Agent Firewall10/10

Agent Passport10/10

Agent Intent8/8

Identity Fabric8/8

Escrow7/7

Dry-Run6/6

Action Ledger6/6

Legal Boundary6/6

Semantic Egress4/4

Tool Chain4/4

Cost Firewall4/4

Evidence Vault3/3

Blast Radius3/3

Compliance Assurance2/2

Causal SIEM2/2

Behavior Baseline2/2

MCP Risk Scanner2/2

Shadow AI1/1

Usage Governance1/1

Red Team1/1

Command: npx tsx tests/comprehensive-adversarial-test-battery.ts — 101/101 passing. Measures service hardening, distinct from the F1 detection benchmark above.

Latency

Recorded API-level latency including HTTP overhead. No separate inline SDK latency was captured by this benchmark.

p50 (Median)

891ms

Adversarial probes

p95

1656ms

Adversarial probes

p99

2719ms

Adversarial probes

Methodology & Caveats

Test Method

97 adversarial prompts across 8 categories (prompt injection, jailbreak/DAN, encoding/obfuscation, multilingual, indirect injection, PII, secrets, unsafe output) were sent to Soter's /api/guard/analyze endpoint. 25 safe inputs were included for false-positive verification.

Important Caveats

1.Internal dataset may overlap with Soter design patterns. Independent third-party audit recommended.
2.25 safe inputs is a small sample. Production FPR requires testing with real-world traffic.
3.Latency values are API-level including HTTP; inline SDK latency is <50ms.
4.Indirect prompt injection is an active research area (Mozilla.ai: best F1=0.86-0.91).

Test your own chatbot flow.

Try the interactive playground, then protect both sides of your model call.

Try the playground Read docs

Full benchmark results available at /api/benchmarks (JSON). Source: View JSON results

Adversarial Benchmark

SoterAI F1 = 1.0000

97/97 adversarial attack variants detected across 8 categories with zero false positives, plus a 101/101 service-hardening battery. Self-authored Garak-style probing; independent audit welcomed.

Run date: June 21, 2026 at 10:01 AM UTC

100.0%

F1 Score

Perfect precision & recall

97/97

Attacks Detected

Across 8 adversarial categories

100.0%

Specificity

0/25 false positives

<50ms

Inline Latency

SDK-level detection speed

Precision

1.0000

Every detection was correct

Recall

1.0000

Every attack was detected

Accuracy

100.0%

97/97 adversarial tests

Attack Category Breakdown

Each category tests a distinct attack vector. All categories achieved 100% detection.

PROMPT INJECTION (Input Guard)

30 test prompts

30/30(100.0%)

JAILBREAK / DAN (Input Guard)

11 test prompts

11/11(100.0%)

ENCODING / OBFUSCATION (Input Guard)

12 test prompts

12/12(100.0%)

MULTILINGUAL ATTACKS (Input Guard)

7 test prompts

7/7(100.0%)

PII

PII DETECTION

12 test prompts

12/12(100.0%)

SECRET / CREDENTIAL DETECTION

19 test prompts

19/19(100.0%)

INDIRECT PROMPT INJECTION

6 test prompts

6/6(100.0%)

False Positive Rate: 0%

25 safe, legitimate inputs were correctly allowed without any blocking.

Comprehensive Adversarial Battery

101/101

Scenarios passing

100% across 21 services

Services covered

Guard, agents, identity, evidence, SIEM

Failing scenarios

Run June 29, 2026 at 12:00 AM UTC

Guard Analysis11/11

Agent Firewall10/10

Agent Passport10/10

Agent Intent8/8

Identity Fabric8/8

Escrow7/7

Dry-Run6/6

Action Ledger6/6

Legal Boundary6/6

Semantic Egress4/4

Tool Chain4/4

Cost Firewall4/4

Evidence Vault3/3

Blast Radius3/3

Compliance Assurance2/2

Causal SIEM2/2

Behavior Baseline2/2

MCP Risk Scanner2/2

Shadow AI1/1

Usage Governance1/1

Red Team1/1

Command: npx tsx tests/comprehensive-adversarial-test-battery.ts — 101/101 passing. Measures service hardening, distinct from the F1 detection benchmark above.

Latency

Recorded API-level latency including HTTP overhead. No separate inline SDK latency was captured by this benchmark.

p50 (Median)

891ms

Adversarial probes

p95

1656ms

Adversarial probes

p99

2719ms

Adversarial probes

Methodology & Caveats

Test Method

Important Caveats

1.Internal dataset may overlap with Soter design patterns. Independent third-party audit recommended.
2.25 safe inputs is a small sample. Production FPR requires testing with real-world traffic.
3.Latency values are API-level including HTTP; inline SDK latency is <50ms.
4.Indirect prompt injection is an active research area (Mozilla.ai: best F1=0.86-0.91).

Test your own chatbot flow.

Try the interactive playground, then protect both sides of your model call.

Try the playground Read docs

Full benchmark results available at /api/benchmarks (JSON). Source: View JSON results