Preset validation#3058

Open
TheTechromancer wants to merge 13 commits into dev from preset-validation

TheTechromancer (Collaborator) commented Apr 24, 2026

Preset validation rework

Replaces silent acceptance of typos and wrong types in BBOT presets with strict, schema-driven validation. Catches mistakes like modlues: [...], scope: {strct: true}, modules: {nucleii: {tgas: "x"}}, and nuclei.mode: aggressive at config-load time instead of after a multi-hour scan produces nothing.

New public API

Validate any preset dict (e.g. from yaml.safe_load) without instantiating a Scanner:

from bbot.scanner import validate_preset, validate_preset_file

errors = validate_preset({
    "modules": ["nuclei"],
    "config": {
        "modules": {
            "nuclei": {"mode": "manual", "ratelimit": 100},
        },
    },
})
if errors:
    for e in errors:
        print(e)   # e.g. [module:nuclei:mode] Expected one of 'manual', ...
    raise SystemExit(1)

# Convenience wrapper for files
errors = validate_preset_file("/path/to/preset.yml")

Returns a list of PresetValidationError objects. Empty list = valid. All errors across all layers are aggregated in a single pass, so a user with five typos sees five errors at once.
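
To illustrate why aggregating errors matters, here is a minimal plain-Python sketch of a validator that walks the whole dict and collects every problem instead of stopping at the first. It is not how validate_preset works internally (that is pydantic-driven); the `SCHEMA` dict and `collect_errors` helper are hypothetical stand-ins, and the labels only loosely imitate the real ones.

```python
from typing import Any

# Hypothetical toy schema: option name -> expected type, or a nested dict for a sub-section.
SCHEMA = {
    "modules": list,
    "config": {
        "scope": {"strict": bool},
        "web": {"http_timeout": int},
    },
}


def collect_errors(data: dict, schema: dict, path: str = "") -> list:
    """Walk the whole dict and return every problem found, not just the first."""
    errors = []
    for key, value in data.items():
        dotted = f"{path}.{key}" if path else key
        if key not in schema:
            errors.append(f"[{dotted}] Unknown option: {key!r}")
            continue
        expected: Any = schema[key]
        if isinstance(expected, dict):
            if isinstance(value, dict):
                # Recurse into the sub-section and keep collecting.
                errors.extend(collect_errors(value, expected, dotted))
            else:
                errors.append(f"[{dotted}] Expected a mapping, got {type(value).__name__}")
        elif not isinstance(value, expected):
            errors.append(
                f"[{dotted}] Expected {expected.__name__}, got {type(value).__name__}: {value!r}"
            )
    return errors


bad_preset = {
    "modlues": ["nuclei"],
    "config": {"scope": {"strct": True}, "web": {"http_timeout": "not-a-number"}},
}
for error in collect_errors(bad_preset, SCHEMA):
    print(error)
```

All three seeded mistakes are reported in one run, which is the behavior described above.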

The pydantic schemas themselves are also exported, for callers that want type-checking, doc generation, or to validate just one layer (e.g. the global config) on its own:

from bbot.scanner import (
    BBOTConfig,         # global config block (scope/dns/web/engine/deps/…)
    BaseModuleConfig,   # base for every module's `class Config(BaseModuleConfig)`
    PresetSchema,       # top-level preset shape (target, modules, flags, config, …)
    ScopeConfig, DnsConfig, WebConfig, EngineConfig, DepsConfig, DepsToolConfig,
)

These are validation schemas only — they have no defaults of their own (defaults live in bbot/defaults.yml). For full preset validation that also covers per-module Config blocks, use validate_preset(); the composite schema is built dynamically from the loaded module set, so it isn't a static class.

Sample errors

[preset:modlues]                            Unknown option: 'modlues' (value: ['nuclei'])
[config:scope.strct]                        Could not find config option "scope.strct". Did you mean "scope.strict"?
[config:web.http_timeout]                   Expected an integer, got str: 'not-a-number'
[preset:config.modules.nucleii]             Could not find module "nucleii". Did you mean "nuclei"?
[module:nuclei:tgas]                        Unknown option: 'tgas' (value: 'apache')
[module:nuclei:mode]                        Expected one of 'manual', 'technology', 'severe' or 'budget', got 'aggressive'
[module:baddns:custom_nameservers.0]        Expected a string, got int: 1
[config:deps.behavior]                      Expected one of 'abort_on_failure', 'retry_failed', 'ignore_failed', 'disable' or 'force_install', got 'panic'
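
The "Did you mean" suggestions above can be approximated with the standard library. This is a sketch only: it uses `difflib.get_close_matches`, which may or may not be what BBOT's own closest-match helper does, and `suggest` is a hypothetical name.

```python
import difflib


def suggest(unknown: str, known: list) -> str:
    """Build a 'Did you mean ...?' message for an unrecognized name."""
    message = f'Could not find "{unknown}".'
    # get_close_matches returns the best candidates first; n=1 keeps only the top one.
    matches = difflib.get_close_matches(unknown, known, n=1, cutoff=0.6)
    if matches:
        message += f' Did you mean "{matches[0]}"?'
    return message


print(suggest("nucleii", ["nuclei", "httpx", "sslcert"]))
print(suggest("scope.strct", ["scope.strict", "web.http_timeout", "deps.behavior"]))
```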

What changed

Dependency: omegaconf → pydantic + pydantic-settings

omegaconf is gone. Configs are now plain dicts merged with a small deep_update helper, and validation is done by pydantic. pydantic-settings is the new dep; pyyaml is now an explicit (was transitive) dep.
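
For readers unfamiliar with the pattern, a deep-merge helper for plain dicts can look roughly like the following. This is an assumed shape, not the PR's actual deep_update (its signature and copy semantics are not shown here), and the default values are illustrative.

```python
import copy


def deep_update(base: dict, override: dict) -> dict:
    """Return a new dict with `override` merged into `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of replacing the whole section.
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only.
defaults = {
    "web": {"http_timeout": 10},
    "modules": {"nuclei": {"mode": "manual", "ratelimit": 100}},
}
preset = {"modules": {"nuclei": {"mode": "severe"}}}

config = deep_update(defaults, preset)
print(config)
```

The override changes only `mode`; the sibling `ratelimit` and the untouched `web` section survive, and `defaults` itself is left unmodified.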

Module schema: class Config(BaseModuleConfig)

Every module's options = {...} + options_desc = {...} pair has been migrated to a typed pydantic class:

# before
options = {"threads": 50, "version": "1.2.5"}
options_desc = {"threads": "How many threads", "version": "httpx version"}

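# after
from pydantic import Field
from bbot.scanner import BaseModuleConfig  # exported per "New public API" above

class Config(BaseModuleConfig):
    threads: int = Field(50, description="How many threads")
    version: str = Field("1.2.5", description="httpx version")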

BaseModuleConfig carries the three universal options (batch_size, module_threads, module_timeout), so every module accepts those without redeclaring them. 114 modules migrated via codemod (bbot/scripts/migrate_options_to_config.py); a handful tightened by hand to use proper types where they matter:

  • nuclei.mode: Literal["manual", "technology", "severe", "budget"]
  • baddns.min_severity: Literal["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]
  • baddns.min_confidence: Literal["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CONFIRMED"]
  • baddns.custom_nameservers: list[str]
  • deps.behavior: Literal["abort_on_failure", "retry_failed", "ignore_failed", "disable", "force_install"]

Composite schema, single-pass validation

ModuleLoader.validation_schema builds a composite pydantic model on demand:

FullPresetSchema
  ├─ (PresetSchema fields: target, modules, flags, …)
  └─ config: FullBBOTConfig
              ├─ (BBOTConfig sections: scope, dns, web, engine, deps, …)
              └─ modules: ModulesSchema
                          ├─ nuclei:  NucleiModuleConfig
                          ├─ httpx:   HttpxModuleConfig
                          └─ … one field per loaded module

A single model_validate() call catches typos at every layer. The schema is rebuilt whenever new module dirs are discovered. The chicken-and-egg problem with module_dirs (custom modules must be loaded before a preset that references them can be validated) is resolved automatically: validate_preset preloads any custom dirs declared in the preset before validating.

Module Config classes captured via AST + exec

Preload still doesn't import modules (so bbot -l works on hosts missing module deps). The class Config block is captured via ast.get_source_segment, then exec'd at schema-build time in a controlled namespace (Field, BaseModuleConfig, typing.*). Pydantic does the rest. No hand-rolled type whitelist, no annotation-string parsing — anything pydantic understands works (Literal, Union, list[str], etc.).
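
The capture-then-exec idea can be sketched with the standard library alone. Everything here is a simplified assumption: `MODULE_SOURCE`, `example_module`, and `capture_config_source` are hypothetical, and a plain class stands in for the real pydantic `BaseModuleConfig` (the real namespace would also provide `Field` and the `typing` names).

```python
import ast
from typing import Optional

# Hypothetical module source. The import at the top would fail if this file
# were actually imported on a host without that dependency installed.
MODULE_SOURCE = '''
import some_missing_dependency

class example_module:
    class Config(BaseModuleConfig):
        threads: int = 50
        version: str = "1.2.5"
'''


class BaseModuleConfig:
    """Plain stand-in for the real pydantic base class."""


def capture_config_source(source: str) -> Optional[str]:
    """Return the source text of the first `class Config` found, without importing anything."""
    tree = ast.parse(source)  # parses only; no module code is executed
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == "Config":
            return ast.get_source_segment(source, node)
    return None


segment = capture_config_source(MODULE_SOURCE)

# Execute only the captured class block, in a namespace holding just the names it needs.
namespace = {"BaseModuleConfig": BaseModuleConfig}
exec(segment, namespace)
Config = namespace["Config"]
print(Config.threads, Config.version)
```

Because only the `class Config` segment is executed, the module's top-level import never runs, which is what keeps listing modules usable when their dependencies are missing.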

Cleanup

  • BBOTArgs.exclude_from_validation regex and universal_module_options dict deleted; BaseModuleConfig covers their job structurally.
  • BBOTArgs.validate() body shrank from ~12 lines of dotted-path lookup to a 4-line validate_preset(...) delegation. CLI typos surface through the same code path that handles preset YAML.
  • bbot/scripts/docs.py reads universal-option descriptions from BaseModuleConfig.model_fields directly — no separate constant to keep in sync.
  • All omegaconf-specific test helpers and assertions (OmegaConf.merge, omegaconf.errors.ReadonlyConfigError, dot-attribute access) replaced.

Breaking changes

  • nuclei.mode rejects unknown values at validation time, not at scan startup with a warning.
  • Unknown top-level preset keys (modlues, flgas, etc.) raise ValidationError instead of being silently ignored.
  • Unknown module names in modules: / output_modules: / exclude_modules: raise with a closest-match suggestion.
  • self.options is no longer populated for migrated modules. Modules that read user-supplied values must use self.config.get(...). (This caught two pre-existing bugs in bbot/modules/templates/gitlab.py and bbot/modules/lightfuzz/lightfuzz.py where module behavior was relying on self.options instead of self.config — both fixed in this PR.)
  • The inline ${env:FOO} resolver inside YAML values is gone. Pydantic-settings' native env handling (BBOT_* prefix) is available for whole-field overrides.
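
For presets that still contain ${env:FOO} placeholders, one possible stopgap is to expand them yourself before handing the dict to validation. This is a hypothetical workaround, not part of the PR: `expand_env` and `EXAMPLE_API_KEY` are made-up names, and the `api_key` option on `virustotal` is assumed for illustration.

```python
import os
import re
from typing import Any

# Matches placeholders such as ${env:FOO}
ENV_PATTERN = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any) -> Any:
    """Recursively replace ${env:NAME} inside strings; raises KeyError if NAME is unset."""
    if isinstance(value, str):
        return ENV_PATTERN.sub(lambda match: os.environ[match.group(1)], value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


os.environ["EXAMPLE_API_KEY"] = "s3cr3t"  # set here only so the example runs
raw = {"config": {"modules": {"virustotal": {"api_key": "${env:EXAMPLE_API_KEY}"}}}}
print(expand_env(raw))
```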

Files of interest

Test plan

  • All test_step_1 config/preset/cli/validate tests pass (39/39 in the touched set).
  • Sample module integration tests pass (gitlab_onprem, virustotal, sslcert, robots, crt, httpx, etc.).
  • bbot --help, bbot -l, bbot -lp, bbot --current-preset all work and produce the same shape of output as before.
  • validate_preset({…}) round-trips: known-good presets return []; presets with seeded typos produce specific, labeled errors.
  • Custom module_dirs declared in a preset get preloaded before per-module validation, so user modules aren't falsely flagged.

@TheTechromancer TheTechromancer marked this pull request as draft April 24, 2026 16:40
github-actions Bot (Contributor) commented Apr 24, 2026

📊 Performance Benchmark Report

Comparing dev (baseline) vs preset-validation (current)

📈 Detailed Results (All Benchmarks)

📋 Complete results for all benchmarks - includes both significant and insignificant changes

| 🧪 Test Name | 📏 Base | 📏 Current | 📈 Change | 🎯 Status |
|---|---|---|---|---|
| Bloom Filter Dns Mutation Tracking Performance | 4.30ms | 4.31ms | +0.3% | |
| Bloom Filter Large Scale Dns Brute Force | 17.86ms | 17.44ms | -2.4% | |
| Large Closest Match Lookup | 362.80ms | 351.35ms | -3.2% | |
| Realistic Closest Match Workload | 194.55ms | 188.28ms | -3.2% | |
| Event Memory Medium Scan | 1784 B/event | 1779 B/event | -0.3% | |
| Event Memory Large Scan | 1768 B/event | 1768 B/event | -0.0% | |
| Event Validation Full Scan Startup Small Batch | 419.13ms | 367.43ms | -12.3% | 🟢🟢 🚀 |
| Event Validation Full Scan Startup Large Batch | 575.56ms | 527.78ms | -8.3% | |
| Make Event Autodetection Small | 31.18ms | 31.18ms | -0.0% | |
| Make Event Autodetection Large | 319.40ms | 316.00ms | -1.1% | |
| Make Event Explicit Types | 13.97ms | 13.96ms | -0.0% | |
| Excavate Single Thread Small | 4.073s | 3.862s | -5.2% | |
| Excavate Single Thread Large | 9.678s | 9.403s | -2.8% | |
| Excavate Parallel Tasks Small | 4.258s | 4.029s | -5.4% | |
| Excavate Parallel Tasks Large | 7.414s | 7.150s | -3.6% | |
| Is Ip Performance | 3.25ms | 3.19ms | -1.9% | |
| Make Ip Type Performance | 11.82ms | 11.58ms | -2.1% | |
| Mixed Ip Operations | 4.65ms | 4.54ms | -2.4% | |
| Memory Use Web Crawl | 50.0 MB | 40.8 MB | -18.3% | 🟢🟢 🚀 |
| Memory Use Subdomain Enum | 19.4 MB | 16.9 MB | -12.9% | 🟢🟢 🚀 |
| Scan Throughput 100 | 8.378s | 7.297s | -12.9% | 🟢🟢 🚀 |
| Scan Throughput 1000 | 38.970s | 36.779s | -5.6% | |
| Typical Queue Shuffle | 66.01µs | 61.43µs | -6.9% | |
| Priority Queue Shuffle | 740.25µs | 693.80µs | -6.3% | |

🎯 Performance Summary

+ 4 improvements 🚀
  20 unchanged ✅

🔍 Significant Changes (>10%)

  • Event Validation Full Scan Startup Small Batch: 12.3% 🚀 faster
  • Memory Use Web Crawl: 18.3% 🚀 less memory
  • Memory Use Subdomain Enum: 12.9% 🚀 less memory
  • Scan Throughput 100: 12.9% 🚀 faster

🐍 Python Version 3.11.15


codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 94.43535% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 91%. Comparing base (9de96b6) to head (3aa43ed).
⚠️ Report is 2 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| bbot/scanner/preset/validate.py | 71% | 29 Missing ⚠️ |
| bbot/core/config/merge.py | 65% | 13 Missing ⚠️ |
| bbot/core/modules.py | 90% | 10 Missing ⚠️ |
| bbot/core/config/files.py | 74% | 4 Missing ⚠️ |
| bbot/scanner/preset/args.py | 87% | 4 Missing ⚠️ |
| bbot/core/core.py | 91% | 2 Missing ⚠️ |
| bbot/core/helpers/misc.py | 67% | 1 Missing ⚠️ |
| bbot/modules/baddns.py | 93% | 1 Missing ⚠️ |
| bbot/modules/lightfuzz/lightfuzz.py | 90% | 1 Missing ⚠️ |
| bbot/modules/output/stdout.py | 93% | 1 Missing ⚠️ |
... and 2 more
Additional details and impacted files
@@          Coverage Diff           @@
##             dev   #3058    +/-   ##
======================================
+ Coverage     91%     91%    +1%     
======================================
  Files        437     441     +4     
  Lines      37507   38326   +819     
======================================
+ Hits       33922   34679   +757     
- Misses      3585    3647    +62     

@TheTechromancer TheTechromancer self-assigned this Apr 30, 2026
@TheTechromancer TheTechromancer marked this pull request as ready for review April 30, 2026 17:41
TheTechromancer (Collaborator, Author) commented May 1, 2026

Let's consider replacing the auth_required meta boolean with a Pydantic mandatory annotation on individual module config fields. This would identify any modules that can't run without input from the user. Similarly, we should replace the no_secrets_config system with a sensitive Pydantic annotation, to indicate which fields should be hidden/encrypted.

@ausmaster @GabKodes
