Preset validation by TheTechromancer · Pull Request #3058 · blacklanternsecurity/bbot

TheTechromancer · 2026-04-24T16:40:30Z

Preset validation rework

Replaces silent acceptance of typos and wrong types in BBOT presets with strict, schema-driven validation. Catches mistakes like modlues: [...], scope: {strct: true}, modules: {nucleii: {tgas: "x"}}, and nuclei.mode: aggressive at config-load time instead of after a multi-hour scan produces nothing.

New public API

Validate any preset dict (e.g. from yaml.safe_load) without instantiating a Scanner:

from bbot.scanner import validate_preset, validate_preset_file

errors = validate_preset({
    "modules": ["nuclei"],
    "config": {
        "modules": {
            "nuclei": {"mode": "manual", "ratelimit": 100},
        },
    },
})
if errors:
    for e in errors:
        print(e)   # e.g. [module:nuclei:mode] Expected one of 'manual', ...
    raise SystemExit(1)

# Convenience wrapper for files
errors = validate_preset_file("/path/to/preset.yml")

Returns a list of PresetValidationError objects; an empty list means the preset is valid. All errors across all layers are aggregated in a single pass, so a user with five typos sees all five errors at once.

The pydantic schemas themselves are also exported, for callers that want type-checking, doc generation, or to validate just one layer (e.g. the global config) on its own:

from bbot.scanner import (
    BBOTConfig,         # global config block (scope/dns/web/engine/deps/…)
    BaseModuleConfig,   # base for every module's `class Config(BaseModuleConfig)`
    PresetSchema,       # top-level preset shape (target, modules, flags, config, …)
    ScopeConfig, DnsConfig, WebConfig, EngineConfig, DepsConfig, DepsToolConfig,
)

These are validation schemas only — they have no defaults of their own (defaults live in bbot/defaults.yml). For full preset validation that also covers per-module Config blocks, use validate_preset(); the composite schema is built dynamically from the loaded module set, so it isn't a static class.

Sample errors

[preset:modlues]                            Unknown option: 'modlues' (value: ['nuclei'])
[config:scope.strct]                        Could not find config option "scope.strct". Did you mean "scope.strict"?
[config:web.http_timeout]                   Expected an integer, got str: 'not-a-number'
[preset:config.modules.nucleii]             Could not find module "nucleii". Did you mean "nuclei"?
[module:nuclei:tgas]                        Unknown option: 'tgas' (value: 'apache')
[module:nuclei:mode]                        Expected one of 'manual', 'technology', 'severe' or 'budget', got 'aggressive'
[module:baddns:custom_nameservers.0]        Expected a string, got int: 1
[config:deps.behavior]                      Expected one of 'abort_on_failure', 'retry_failed', 'ignore_failed', 'disable' or 'force_install', got 'panic'
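
The "Did you mean" hints can be approximated with the standard library. This is a minimal sketch of the general technique, not BBOT's implementation: `check_keys` and `KNOWN_SCOPE_OPTIONS` are hypothetical names, and `difflib.get_close_matches` stands in for whatever matcher BBOT actually uses. It collects every problem before returning, in the spirit of the aggregate-all-errors behavior described above.

```python
import difflib

# Hypothetical set of valid keys; in BBOT the real list comes from the schemas.
KNOWN_SCOPE_OPTIONS = {"strict", "report_distance", "search_distance"}


def check_keys(section, user_config, known_keys):
    """Return one message per unknown key, with a suggestion when one is close."""
    errors = []
    for key in user_config:
        if key in known_keys:
            continue
        message = f'[config:{section}.{key}] Could not find config option "{section}.{key}".'
        # n=1 keeps only the best candidate; the default cutoff is 0.6.
        matches = difflib.get_close_matches(key, sorted(known_keys), n=1)
        if matches:
            message += f' Did you mean "{section}.{matches[0]}"?'
        errors.append(message)
    return errors


errors = check_keys("scope", {"strct": True, "serch_distance": 1}, KNOWN_SCOPE_OPTIONS)
for error in errors:
    print(error)
```

Both typos are reported in one call rather than stopping at the first.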

What changed

Dependency: omegaconf → pydantic + pydantic-settings

omegaconf is gone. Configs are now plain dicts merged with a small deep_update helper (renamed deep_merge in a later commit on this branch), and validation is done by pydantic. pydantic-settings is the new dependency; pyyaml is now an explicit dependency (it was previously transitive).
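
For readers unfamiliar with the pattern, a recursive dict merge might look like the sketch below. Only the name `deep_update` comes from this PR; the body is an assumption about typical behavior (nested dicts merge, everything else is replaced), and the option values are illustrative.

```python
import copy


def deep_update(base, override):
    """Merge `override` into a copy of `base`: nested dicts merge, other values replace."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"web": {"http_timeout": 10, "user_agent": "bbot"}, "scope": {"strict": False}}
preset = {"web": {"http_timeout": 30}}
merged = deep_update(defaults, preset)
print(merged)
```

Returning a new dict rather than mutating `base` keeps the loaded defaults reusable across presets.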

Module schema: `class Config(BaseModuleConfig)`

Every module's options = {...} + options_desc = {...} pair has been migrated to a typed pydantic class:

# before
options = {"threads": 50, "version": "1.2.5"}
options_desc = {"threads": "How many threads", "version": "httpx version"}

# after
class Config(BaseModuleConfig):
    threads: int = Field(50, description="How many threads")
    version: str = Field("1.2.5", description="httpx version")

BaseModuleConfig carries the three universal options (batch_size, module_threads, module_timeout), so every module accepts those without redeclaring them. 114 modules migrated via codemod (bbot/scripts/migrate_options_to_config.py); a handful tightened by hand to use proper types where they matter:

nuclei.mode: Literal["manual", "technology", "severe", "budget"]
baddns.min_severity: Literal["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]
baddns.min_confidence: Literal["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CONFIRMED"]
baddns.custom_nameservers: list[str]
deps.behavior: Literal["abort_on_failure", "retry_failed", "ignore_failed", "disable", "force_install"]
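
To show what the tightened types buy, here is a small standalone sketch using pydantic v2. The `BaseModuleConfig` below is a stand-in, not BBOT's class: the `extra="forbid"` setting, the `Optional[int]` universal options, and the `ratelimit` default are all assumptions made for illustration.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class BaseModuleConfig(BaseModel):
    # Stand-in for BBOT's base class; rejecting unknown keys is an assumption.
    model_config = ConfigDict(extra="forbid")
    batch_size: Optional[int] = None
    module_threads: Optional[int] = None
    module_timeout: Optional[int] = None


class NucleiConfig(BaseModuleConfig):
    mode: Literal["manual", "technology", "severe", "budget"] = Field(
        "manual", description="Scan mode"
    )
    ratelimit: int = Field(150, description="Requests per second")


try:
    # batch_size is accepted because it is inherited from the base class.
    NucleiConfig.model_validate({"mode": "aggressive", "tgas": "apache", "batch_size": 10})
    problems = []
except ValidationError as exc:
    problems = [(err["loc"], err["type"]) for err in exc.errors()]

print(problems)
```

The bad `mode` value and the unknown `tgas` key are both reported by the same `model_validate()` call.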

Composite schema, single-pass validation

ModuleLoader.validation_schema builds a composite pydantic model on demand:

FullPresetSchema
  ├─ (PresetSchema fields: target, modules, flags, …)
  └─ config: FullBBOTConfig
              ├─ (BBOTConfig sections: scope, dns, web, engine, deps, …)
              └─ modules: ModulesSchema
                          ├─ nuclei:  NucleiModuleConfig
                          ├─ httpx:   HttpxModuleConfig
                          └─ … one field per loaded module

A single model_validate() call catches typos at every layer. The schema rebuilds when new module dirs are discovered (chicken-and-egg with module_dirs is auto-resolved — validate_preset preloads any custom dirs declared in the preset before validating).
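
A rough sketch of how such a composite could be assembled with `pydantic.create_model` follows. The model names mirror the tree above, but `build_schema`, `StrictModel`, and the per-module classes are hypothetical stand-ins; BBOT's actual builder lives in `ModuleLoader` and may differ.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError, create_model


class StrictModel(BaseModel):
    # Assumption: unknown keys are rejected at every layer.
    model_config = ConfigDict(extra="forbid")


class NucleiConfig(StrictModel):
    mode: Literal["manual", "technology", "severe", "budget"] = "manual"


class HttpxConfig(StrictModel):
    threads: int = 50


class WebConfig(StrictModel):
    http_timeout: int = 10


def build_schema(module_configs):
    """Build a preset model with one `config.modules.<name>` field per module."""
    modules_schema = create_model(
        "ModulesSchema",
        __base__=StrictModel,
        **{name: (cls, cls()) for name, cls in module_configs.items()},
    )
    full_config = create_model(
        "FullBBOTConfig",
        __base__=StrictModel,
        web=(WebConfig, WebConfig()),
        modules=(modules_schema, modules_schema()),
    )
    return create_model(
        "FullPresetSchema",
        __base__=StrictModel,
        modules=(list[str], []),
        config=(full_config, full_config()),
    )


schema = build_schema({"nuclei": NucleiConfig, "httpx": HttpxConfig})
preset = {
    "modlues": ["nuclei"],
    "config": {
        "web": {"http_timeout": "not-a-number"},
        "modules": {"nuclei": {"mode": "aggressive"}},
    },
}
try:
    schema.model_validate(preset)
    locations = []
except ValidationError as exc:
    locations = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]

print(locations)
```

One call surfaces the top-level typo, the wrong type in the global config, and the bad module option together.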

Module Config classes captured via AST + exec

Preload still doesn't import modules (so bbot -l works on hosts missing module deps). The class Config block is captured via ast.get_source_segment, then exec'd at schema-build time in a controlled namespace (Field, BaseModuleConfig, typing.*). Pydantic does the rest. No hand-rolled type whitelist, no annotation-string parsing — anything pydantic understands works (Literal, Union, list[str], etc.).
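
The capture step can be illustrated with the standard library alone. In this sketch, `extract_config_source` is a hypothetical helper, and `BaseModuleConfig` and `Field` are plain-Python stand-ins so the example runs without pydantic; the real namespace is supplied by BBOT. Note that the module source is only parsed, never imported, so its missing dependency does not matter.

```python
import ast
import textwrap

MODULE_SOURCE = '''
import some_dependency_that_is_not_installed

class httpx(BaseModule):
    watched_events = ["URL"]

    class Config(BaseModuleConfig):
        threads: int = Field(50, description="How many threads")
        version: str = Field("1.2.5", description="httpx version")
'''


def extract_config_source(source):
    """Return the dedented source text of the first `class Config`, or None."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == "Config":
            # padded=True keeps the first line's indentation so dedent works.
            segment = ast.get_source_segment(source, node, padded=True)
            return textwrap.dedent(segment)
    return None


# Stand-ins so the sketch runs without pydantic installed.
class BaseModuleConfig:
    pass


def Field(default, description=""):
    return default


namespace = {"Field": Field, "BaseModuleConfig": BaseModuleConfig}
exec(extract_config_source(MODULE_SOURCE), namespace)
Config = namespace["Config"]
print(Config.threads, Config.version)
```

Only the names placed in `namespace` are visible to the exec'd class body, which is what keeps the evaluation controlled.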

Cleanup

BBOTArgs.exclude_from_validation regex and universal_module_options dict deleted; BaseModuleConfig covers their job structurally.
BBOTArgs.validate() body shrank from ~12 lines of dotted-path lookup to a 4-line validate_preset(...) delegation. CLI typos surface through the same code path that handles preset YAML.
bbot/scripts/docs.py reads universal-option descriptions from BaseModuleConfig.model_fields directly — no separate constant to keep in sync.
All omegaconf-specific test helpers and assertions (OmegaConf.merge, omegaconf.errors.ReadonlyConfigError, dot-attribute access) replaced.

Breaking changes

nuclei.mode now rejects unknown values at validation time instead of only emitting a warning at scan startup.
Unknown top-level preset keys (modlues, flgas, etc.) raise ValidationError instead of being silently ignored.
Unknown module names in modules: / output_modules: / exclude_modules: raise with a closest-match suggestion.
self.options is no longer populated for migrated modules. Modules that read user-supplied values must use self.config.get(...). (This caught two pre-existing bugs in bbot/modules/templates/gitlab.py and bbot/modules/lightfuzz/lightfuzz.py where module behavior was relying on self.options instead of self.config — both fixed in this PR.)
The inline ${env:FOO} resolver inside YAML values is gone. Pydantic-settings' native env handling (BBOT_* prefix) is available for whole-field overrides.

Files of interest

bbot/core/config/models.py — BBOTConfig, PresetSchema, BaseModuleConfig, sub-models. Schema only — defaults live in defaults.yml.
bbot/core/config/merge.py — deep_update, dotted_get/dotted_set. Replaces OmegaConf.merge/select/from_cli.
bbot/core/modules.py — _extract_pydantic_config, _exec_config_class, _build_validation_schema, ModuleLoader.validation_schema.
bbot/scanner/preset/validate.py — validate_preset, validate_preset_file, PresetValidationError. Single-pass aggregator with closest-match suggestions.
bbot/scripts/migrate_options_to_config.py — codemod that performed the module migration. Idempotent; included for reproducibility.
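
As a rough idea of what the dotted-path helpers in merge.py replace from OmegaConf, here is a sketch on plain dicts. The function names come from the list above; the signatures and behavior are assumptions, not the code in this PR.

```python
def dotted_get(config, path, default=None):
    """Look up `a.b.c` in nested dicts, returning `default` if any part is missing."""
    current = config
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def dotted_set(config, path, value):
    """Set `a.b.c` in nested dicts, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    current = config
    for part in parents:
        # Assumes any existing intermediate value is itself a dict.
        current = current.setdefault(part, {})
    current[leaf] = value


config = {"web": {"http_timeout": 10}}
dotted_set(config, "modules.nuclei.mode", "severe")
print(dotted_get(config, "modules.nuclei.mode"))
print(dotted_get(config, "scope.strict", default=False))
```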

Test plan

All test_step_1 config/preset/cli/validate tests pass (39/39 in the touched set).
Sample module integration tests pass (gitlab_onprem, virustotal, sslcert, robots, crt, httpx, etc.).
bbot --help, bbot -l, bbot -lp, bbot --current-preset all work and produce the same shape of output as before.
validate_preset({…}) round-trips: known-good presets return []; presets with seeded typos produce specific, labeled errors.
Custom module_dirs declared in a preset get preloaded before per-module validation, so user modules aren't falsely flagged.

github-actions · 2026-04-24T17:07:33Z

📊 Performance Benchmark Report

Comparing dev (baseline) vs preset-validation (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

🧪 Test Name	📏 Base	📏 Current	📈 Change	🎯 Status
Bloom Filter Dns Mutation Tracking Performance	`4.30ms`	`4.31ms`	+0.3% ⚪	✅
Bloom Filter Large Scale Dns Brute Force	`17.86ms`	`17.44ms`	-2.4% ⚪	✅
Large Closest Match Lookup	`362.80ms`	`351.35ms`	-3.2% ⚪	✅
Realistic Closest Match Workload	`194.55ms`	`188.28ms`	-3.2% ⚪	✅
Event Memory Medium Scan	`1784 B/event`	`1779 B/event`	-0.3% ⚪	✅
Event Memory Large Scan	`1768 B/event`	`1768 B/event`	-0.0% ⚪	✅
Event Validation Full Scan Startup Small Batch	`419.13ms`	`367.43ms`	-12.3% 🟢🟢	🚀
Event Validation Full Scan Startup Large Batch	`575.56ms`	`527.78ms`	-8.3% ⚪	✅
Make Event Autodetection Small	`31.18ms`	`31.18ms`	-0.0% ⚪	✅
Make Event Autodetection Large	`319.40ms`	`316.00ms`	-1.1% ⚪	✅
Make Event Explicit Types	`13.97ms`	`13.96ms`	-0.0% ⚪	✅
Excavate Single Thread Small	`4.073s`	`3.862s`	-5.2% ⚪	✅
Excavate Single Thread Large	`9.678s`	`9.403s`	-2.8% ⚪	✅
Excavate Parallel Tasks Small	`4.258s`	`4.029s`	-5.4% ⚪	✅
Excavate Parallel Tasks Large	`7.414s`	`7.150s`	-3.6% ⚪	✅
Is Ip Performance	`3.25ms`	`3.19ms`	-1.9% ⚪	✅
Make Ip Type Performance	`11.82ms`	`11.58ms`	-2.1% ⚪	✅
Mixed Ip Operations	`4.65ms`	`4.54ms`	-2.4% ⚪	✅
Memory Use Web Crawl	`50.0 MB`	`40.8 MB`	-18.3% 🟢🟢	🚀
Memory Use Subdomain Enum	`19.4 MB`	`16.9 MB`	-12.9% 🟢🟢	🚀
Scan Throughput 100	`8.378s`	`7.297s`	-12.9% 🟢🟢	🚀
Scan Throughput 1000	`38.970s`	`36.779s`	-5.6% ⚪	✅
Typical Queue Shuffle	`66.01µs`	`61.43µs`	-6.9% ⚪	✅
Priority Queue Shuffle	`740.25µs`	`693.80µs`	-6.3% ⚪	✅

🎯 Performance Summary

+ 4 improvements 🚀
  20 unchanged ✅

🔍 Significant Changes (>10%)

Event Validation Full Scan Startup Small Batch: 12.3% 🚀 faster
Memory Use Web Crawl: 18.3% 🚀 less memory
Memory Use Subdomain Enum: 12.9% 🚀 less memory
Scan Throughput 100: 12.9% 🚀 faster
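
For reference, the percentages in the table are consistent with a plain relative change against the baseline. The sketch below is an assumption about the arithmetic, not the benchmark bot's code; `percent_change` and `is_significant` are hypothetical names, and the inputs are the rounded values shown in the table.

```python
def percent_change(base, current):
    """Relative change from `base` to `current`, in percent (negative means lower)."""
    return (current - base) / base * 100


def is_significant(base, current, threshold=10.0):
    """True when the absolute change exceeds the threshold, in percent."""
    return abs(percent_change(base, current)) > threshold


# Event Validation Full Scan Startup, small and large batch (values in ms).
startup_small = percent_change(419.13, 367.43)
startup_large = percent_change(575.56, 527.78)
print(f"{startup_small:+.1f}% significant={is_significant(419.13, 367.43)}")
print(f"{startup_large:+.1f}% significant={is_significant(575.56, 527.78)}")
```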

🐍 Python Version 3.11.15

codecov · 2026-04-24T17:30:20Z

Codecov Report

❌ Patch coverage is 94.43535% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (9de96b6) to head (3aa43ed).
⚠️ Report is 2 commits behind head on dev.

Files with missing lines	Patch %	Lines
bbot/scanner/preset/validate.py	71%	29 Missing ⚠️
bbot/core/config/merge.py	65%	13 Missing ⚠️
bbot/core/modules.py	90%	10 Missing ⚠️
bbot/core/config/files.py	74%	4 Missing ⚠️
bbot/scanner/preset/args.py	87%	4 Missing ⚠️
bbot/core/core.py	91%	2 Missing ⚠️
bbot/core/helpers/misc.py	67%	1 Missing ⚠️
bbot/modules/baddns.py	93%	1 Missing ⚠️
bbot/modules/lightfuzz/lightfuzz.py	90%	1 Missing ⚠️
bbot/modules/output/stdout.py	93%	1 Missing ⚠️
... and 2 more

Additional details and impacted files

@@          Coverage Diff           @@
##             dev   #3058    +/-   ##
======================================
+ Coverage     91%     91%    +1%     
======================================
  Files        437     441     +4     
  Lines      37507   38326   +819     
======================================
+ Hits       33922   34679   +757     
- Misses      3585    3647    +62


TheTechromancer · 2026-05-01T19:10:02Z

Let's consider replacing the auth_required meta boolean with a Pydantic mandatory annotation on individual module config fields. This would identify any modules that can't run without input from the user. Similarly, we should replace the no_secrets_config system with a sensitive Pydantic annotation, to indicate which fields should be hidden/encrypted.

@ausmaster @GabKodes
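
One possible shape for this idea, sketched with the standard library only: `Mandatory` and `Sensitive` are hypothetical marker classes attached through `typing.Annotated`, and `fields_with_marker` is a hypothetical helper. Nothing here exists in BBOT today; it only illustrates how per-field markers could replace module-level flags.

```python
from typing import Annotated, get_type_hints


class Mandatory:
    """Hypothetical marker: the module cannot run until the user sets this field."""


class Sensitive:
    """Hypothetical marker: the value should be hidden or encrypted in output."""


class ExampleModuleConfig:
    api_key: Annotated[str, Mandatory(), Sensitive()] = ""
    threads: int = 10


def fields_with_marker(config_cls, marker):
    """Return the names of fields whose annotation carries `marker`."""
    hints = get_type_hints(config_cls, include_extras=True)
    return sorted(
        name
        for name, hint in hints.items()
        if any(isinstance(meta, marker) for meta in getattr(hint, "__metadata__", ()))
    )


print(fields_with_marker(ExampleModuleConfig, Mandatory))
print(fields_with_marker(ExampleModuleConfig, Sensitive))
```

With markers like these, "auth required" could be derived as "has at least one mandatory field" instead of being declared separately.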

preset validation phase 1

8618892

TheTechromancer marked this pull request as draft April 24, 2026 16:40

don't duplicate defaults in code

ca4e742

TheTechromancer added 7 commits April 27, 2026 13:54

preset validation wip

7578d1d

ruffed

393ae72

preset tests

58c159d

fix tests

39c7f2c

cleanup, test fixes

20af84e

fix tests, again

4972c12

small improvements, rename deep_update -> deep_merge

9c1ea26

TheTechromancer self-assigned this Apr 30, 2026

TheTechromancer marked this pull request as ready for review April 30, 2026 17:41

TheTechromancer added 4 commits April 30, 2026 13:52

remove env var interpolation docs

bce3e4b

bring back baddns case insensitivity

e48e570

allow field validators

eca363b

remove redundant validation/sanitization logic

3aa43ed

TheTechromancer wants to merge 13 commits into dev from preset-validation

TheTechromancer commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

TheTechromancer commented May 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

TheTechromancer commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Preset validation rework

New public API

Sample errors

What changed

Dependency: omegaconf → pydantic + pydantic-settings

Module schema: class Config(BaseModuleConfig)

Composite schema, single-pass validation

Module Config classes captured via AST + exec

Cleanup

Breaking changes

Files of interest

Test plan

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📊 Performance Benchmark Report

🎯 Performance Summary

🔍 Significant Changes (>10%)

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TheTechromancer commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

TheTechromancer commented Apr 24, 2026 •

edited

Loading

Module schema: `class Config(BaseModuleConfig)`

github-actions Bot commented Apr 24, 2026 •

edited

Loading

codecov Bot commented Apr 24, 2026 •

edited

Loading

TheTechromancer commented May 1, 2026 •

edited

Loading