[ISSUE 617] ProtectionManager Without Ethernet / Fault Runtime Redesign#628
[ISSUE 617] ProtectionManager Without Ethernet / Fault Runtime Redesign#628jorgesg82 wants to merge 20 commits intodevelopmentfrom
Conversation
- FaultController: decouple FaultCause from DiagnosticRecord; add early-fault bootstrap so faults before start() are latched and the runtime starts directly in FAULT; remove FaultRuntime (superseded) - Diagnostics::Hub: fixed-capacity history ring and pending queue; flush_urgent() drains urgent records first; no heap in publish/flush - RecordFactory: explicit conversion FaultCause -> DiagnosticRecord via FaultDiagnosticMapper; runtime_panic / runtime_fault / runtime_warning / runtime_info factories with RuntimeSourceMetadata - Runtime reporters: PANIC/FAULT use std::source_location, format into fixed stack buffer, no shared mutable metadata (ISR-safe); WARNING/INFO go through RuntimeDiagnosticReporter; ErrorHandler/InfoWarning are now thin compatibility aliases - ProtectionEngine: evaluate() throttles fault notifications via notify_delay_in_microseconds; non-fatal rule edges published to Hub
Replace direct ErrorHandler / InfoWarning calls with the new macro
facade (PANIC, FAULT, WARNING, INFO) across all HALAL peripheral
drivers, Ethernet/TCP/UDP stacks, and service implementations.
Fixes MDMA callsite that was using invalid string concatenation with a
printf-style API: PANIC("MDMA Transfer Error, code: %lu", error_code).
- CMakeLists: remove FaultRuntime from build, add DiagnosticSinks - ST-LIB.hpp: update includes after FaultRuntime removal - StaticVector, MeanCalculator, Zeroing, StateOrder variants: minor include and compatibility fixes - README: update status notes
- DiagnosticsHub tests: history when no sinks, per-sink retry, bounded pending queue, reinstall clears latch, fault transitions once, delegation to nested operational machine, early-fault bootstrap (fault before start), source location captured in runtime reporters, invalid rule config rejected without side effects - PANIC/FAULT enter global FAULT and publish URGENT diagnostic record - WARNING/INFO publish without entering FAULT - ProtectionEngine: evaluate triggers fault and publishes protection event - TestAccess: white-box accessors for DiagnosticsHub and ProtectionEngine - Update adc_test, spi2_test, dma2_test, encoder_test, common_tests for new reporter API
Documents the current model: FaultController ownership, FaultPolicy nesting, PANIC/FAULT/WARNING/INFO semantics, early-fault bootstrap, FaultCause vs DiagnosticRecord separation, Diagnostics Hub memory policy, and C++23 design choices.
ST-LIB Release Plan
Pending changes
|
There was a problem hiding this comment.
Pull request overview
Redesigns protections + fault handling to remove Ethernet dependency and centralize the global fault runtime under FaultController, with unified diagnostics reporting via PANIC/FAULT/WARNING/INFO and a new Board<Policy,...> bootstrap contract.
Changes:
- Introduces
FaultController+ProtectionEngineand updates protection evaluation/reporting to drive global fault viaFaultController::request_fault(...). - Adds a diagnostics hub (records, sinks, formatter, timestamps) and migrates legacy
ErrorHandler/InfoWarningcall sites to diagnostics + fault runtime. - Updates
BoardAPI/contract, bootstrap path, CMake wiring, and tests/docs to match the new integration model.
Reviewed changes
Copilot reviewed 110 out of 110 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/st-lib-board-contract.md | Documents Board<Policy,...> contract and policy requirements |
| Tests/spi2_test.cpp | Updates tests to new panic reporter namespace |
| Tests/dma2_test.cpp | Updates tests to new panic reporter namespace |
| Tests/adc_test.cpp | Updates tests to new panic reporter namespace |
| Tests/Time/encoder_test.cpp | Updates tests to new panic reporter namespace |
| Tests/Time/common_tests.cpp | Provides test panic/fault reporters integrated with FaultController |
| Tests/TestAccess.hpp | Adds test-only accessors to reset diagnostics/fault/protection runtime |
| Tests/CMakeLists.txt | Links new diagnostics + protection runtime sources into test executable |
| Src/ST-LIB_LOW/Sensors/Sensor/Sensor.cpp.old | Removes legacy/old sensor file |
| Src/ST-LIB_LOW/Sd/Sd.cpp | Migrates SD error paths to PANIC(...) |
| Src/ST-LIB_LOW/ST-LIB_LOW.cpp | Removes legacy STLIB_LOW::start() entry point |
| Src/ST-LIB_LOW/HalfBridge/HalfBridge.cpp.old | Removes legacy/old half-bridge file |
| Src/ST-LIB_LOW/ErrorHandler/ErrorHandler.cpp | Replaces legacy error handler transport with panic/fault reporters + diagnostics |
| Src/ST-LIB_LOW/Clocks/Stopwatch.cpp.old | Removes legacy/old stopwatch implementation |
| Src/ST-LIB_HIGH/ST-LIB_HIGH.cpp | Removes legacy STLIB_HIGH::start() entry point |
| Src/ST-LIB_HIGH/Protections/ProtectionManager.cpp | Removes legacy ProtectionManager implementation |
| Src/ST-LIB_HIGH/Protections/ProtectionEngine.cpp | Adds new protection evaluation/publishing engine |
| Src/ST-LIB_HIGH/Protections/FaultController.cpp | Adds new global fault runtime controller and cause→diagnostic mapping |
| Src/ST-LIB_HIGH/Protections/Boundary.cpp | Removes legacy boundary formatting lookup |
| Src/ST-LIB.cpp | Removes legacy STLIB::start() / STLIB::update() entry points |
| Src/HALAL/Services/Time/Scheduler.cpp | Fixes prescaler programming; removes out-of-line get_global_tick() |
| Src/HALAL/Services/Time/RTC.cpp | Makes RTC data storage consistent and adds no-RTC stubs + is_started() |
| Src/HALAL/Services/PWM/PhasedPWM/PhasedPWM.cpp.old | Removes legacy/old phased PWM implementation |
| Src/HALAL/Services/PWM/DualPhasedPWM/DualPhasedPWM.cpp.old | Removes legacy/old dual phased PWM implementation |
| Src/HALAL/Services/InputCapture/InputCapture.cpp.old | Removes legacy/old input capture implementation |
| Src/HALAL/Services/InfoWarning/InfoWarning.cpp | Replaces legacy warning transport with diagnostics-based reporting |
| Src/HALAL/Services/Flash/Flash.cpp | Migrates flash error paths to PANIC(...) |
| Src/HALAL/Services/FMAC/FMAC.cpp | Migrates FMAC error paths to PANIC(...) |
| Src/HALAL/Services/DigitalOutputService/DigitalOutputService.cpp | Migrates digital output errors to PANIC(...) |
| Src/HALAL/Services/DigitalInputService/DigitalInputService.cpp | Migrates digital input errors to PANIC(...) |
| Src/HALAL/Services/Diagnostics/DiagnosticsHub.cpp | Adds diagnostics hub (history/pending queues, publishing, flushing) |
| Src/HALAL/Services/Diagnostics/DiagnosticTimestampProvider.cpp | Adds timestamp capture via RTC or scheduler uptime |
| Src/HALAL/Services/Diagnostics/DiagnosticSinks.cpp | Adds UART + Ethernet transport sinks and default sink install |
| Src/HALAL/Services/Diagnostics/DiagnosticFormatter.cpp | Adds record→human-readable formatting (runtime/protection) |
| Src/HALAL/Services/Communication/UART/UART.cpp | Migrates UART inscription error to PANIC(...) |
| Src/HALAL/Services/Communication/SPI/SPI.cpp | Migrates SPI service errors to PANIC(...) |
| Src/HALAL/Services/Communication/I2C/I2C.cpp | Migrates I2C service errors to PANIC(...) |
| Src/HALAL/Services/Communication/FDCAN/FDCAN.cpp | Migrates FDCAN errors to PANIC(...) |
| Src/HALAL/Services/Communication/Ethernet/LWIP/UDP/DatagramSocket.cpp | Migrates UDP socket errors to PANIC(...) |
| Src/HALAL/Services/Communication/Ethernet/LWIP/TCP/Socket.cpp | Migrates TCP socket errors to PANIC(...) |
| Src/HALAL/Services/Communication/Ethernet/LWIP/TCP/ServerSocket.cpp | Migrates TCP server errors to PANIC(...) |
| Src/HALAL/Services/Communication/Ethernet/LWIP/Ethernet.cpp | Migrates Ethernet start/update errors to PANIC(...) |
| Src/HALAL/Models/TimerPeripheral/TimerPeripheral.cpp.old | Removes legacy/old timer peripheral model |
| Src/HALAL/Models/TimerDomain/TimerDomain.cpp | Fixes input-capture info table initialization |
| Src/HALAL/Models/SPI/SPI2.cpp | Migrates SPI2 domain IRQ/error paths to PANIC(...) |
| Src/HALAL/Models/PinModel/Pin.cpp | Migrates pin double-inscribe error to PANIC(...) |
| Src/HALAL/Models/MDMA/MDMA.cpp | Migrates MDMA errors to PANIC(...) and fixes formatting |
| Src/HALAL/Models/LowPowerTimer/LowPowerTimer.cpp | Migrates LPTIM init errors to PANIC(...) |
| Src/HALAL/Models/HALconfig/Halconfig.cpp | Migrates clock config errors to PANIC(...) |
| Src/HALAL/Models/DMA/DMA.cpp | Migrates DMA errors to PANIC(...) |
| Src/HALAL/HALAL.cpp | Removes legacy HALAL::start(...) bootstrap entry points |
| README.md | Links new protections/diagnostics + board contract docs |
| Inc/ST-LIB_LOW/StateMachine/StateOrder.hpp | Migrates state order errors to PANIC(...) |
| Inc/ST-LIB_LOW/StateMachine/StateMachine.hpp | Migrates state machine runtime/validation errors to PANIC(...) |
| Inc/ST-LIB_LOW/StateMachine/StackStateOrder.hpp | Migrates stack state order errors to PANIC(...) |
| Inc/ST-LIB_LOW/StateMachine/HeapStateOrder.hpp | Migrates heap state order errors to PANIC(...) |
| Inc/ST-LIB_LOW/Sensors/Sensor/Sensor.hpp.old | Removes legacy/old sensor header |
| Inc/ST-LIB_LOW/Sensors/PWMSensor/PWMSensor.hpp.old | Removes legacy/old PWM sensor header |
| Inc/ST-LIB_LOW/Sd/Sd.hpp | Migrates SD API errors to PANIC(...) |
| Inc/ST-LIB_LOW/ST-LIB_LOW.hpp | Removes legacy STLIB_LOW class API |
| Inc/ST-LIB_LOW/HalfBridge/HalfBridge.hpp.old | Removes legacy/old half-bridge header |
| Inc/ST-LIB_LOW/ErrorHandler/ErrorHandler.hpp | Replaces legacy error handler API with PANIC/FAULT macros + reporters |
| Inc/ST-LIB_LOW/Clocks/Stopwatch.hpp.old | Removes legacy/old stopwatch header |
| Inc/ST-LIB_HIGH/ST-LIB_HIGH.hpp | Exposes new protections API headers; removes ProtectionManager include |
| Inc/ST-LIB_HIGH/Protections/SampleSource.hpp | Adds typed sample source wrapper for protections |
| Inc/ST-LIB_HIGH/Protections/Rules.hpp | Adds validated rule constructors (Protections::Rules::*) |
| Inc/ST-LIB_HIGH/Protections/ProtectionTypes.hpp | Adds shared protection runtime types (events/snapshots/encoding) |
| Inc/ST-LIB_HIGH/Protections/ProtectionManager.hpp | Removes legacy ProtectionManager header and macros |
| Inc/ST-LIB_HIGH/Protections/ProtectionErrors.hpp | Adds error enums for rule/protection configuration/registration |
| Inc/ST-LIB_HIGH/Protections/ProtectionEngine.hpp | Adds ProtectionEngine public API and handle type |
| Inc/ST-LIB_HIGH/Protections/ProtectionConcepts.hpp | Adds concepts for supported samples and readable sources |
| Inc/ST-LIB_HIGH/Protections/Notification.hpp | Removes legacy notification order type |
| Inc/ST-LIB_HIGH/Protections/FaultController.hpp | Adds FaultController API + fault policy runtime installation |
| Inc/ST-LIB_HIGH/Control/Blocks/Zeroing.hpp | Migrates zeroing overflow error to PANIC(...) |
| Inc/ST-LIB_HIGH/Control/Blocks/MeanCalculator.hpp | Migrates mean calculator error to PANIC(...) |
| Inc/ST-LIB.hpp | Introduces fault policy types + Board<Policy,...> bootstrap installing diagnostics/fault runtime |
| Inc/HALAL/Services/Watchdog/Watchdog.hpp | Migrates watchdog interval errors to PANIC(...) |
| Inc/HALAL/Services/Time/Scheduler.hpp | Inlines get_global_tick() implementation |
| Inc/HALAL/Services/Time/RTC.hpp | Makes RTC API always available; adds is_started() |
| Inc/HALAL/Services/PWM/PhasedPWM/PhasedPWM.hpp.old | Removes legacy/old phased PWM header |
| Inc/HALAL/Services/PWM/PWM.hpp | Migrates PWM channel-not-ready error to PANIC(...) |
| Inc/HALAL/Services/PWM/DualPhasedPWM/DualPhasedPWM.hpp.old | Removes legacy/old dual phased PWM header |
| Inc/HALAL/Services/PWM/DualPWM.hpp | Migrates dual PWM errors to PANIC(...) |
| Inc/HALAL/Services/InputCapture/InputCapture.hpp.old | Removes legacy/old input capture header |
| Inc/HALAL/Services/InputCapture/InputCapture.hpp | Migrates channel-not-ready errors to PANIC(...) |
| Inc/HALAL/Services/InfoWarning/InfoWarning.hpp | Replaces legacy InfoWarning with diagnostics reporter + WARNING/INFO macros |
| Inc/HALAL/Services/FMAC/FMAC.hpp | Updates comment to match new bootstrap wording |
| Inc/HALAL/Services/Encoder/Encoder.hpp | Migrates encoder errors to PANIC(...) |
| Inc/HALAL/Services/Diagnostics/Diagnostics.hpp | Adds diagnostics public API (records, sinks, hub, runtime install) |
| Inc/HALAL/Services/DFSDM/DFSDM.hpp | Migrates DFSDM errors to PANIC(...) |
| Inc/HALAL/Services/Communication/FDCAN/FDCAN.hpp | Migrates FDCAN inscription error to PANIC(...) |
| Inc/HALAL/Services/Communication/Ethernet/NewEthernet.hpp | Removes legacy error/warning update hooks from Ethernet domain update |
| Inc/HALAL/Services/ADC/ADC.hpp | Migrates ADC DMA/init/channel/start errors to PANIC(...) |
| Inc/HALAL/Models/TimerPeripheral/TimerPeripheral.hpp.old | Removes legacy/old timer peripheral header |
| Inc/HALAL/Models/TimerDomain/TimerDomain.hpp | Migrates timer-domain errors to PANIC(...) |
| Inc/HALAL/Models/SPI/SPI2.hpp | Migrates SPI2 domain errors to PANIC(...) |
| Inc/HALAL/Models/Packets/SPIOrder.hpp | Migrates SPI order validation errors to PANIC(...) |
| Inc/HALAL/Models/MPUManager/MPUManager.hpp | Migrates MPU heap overflow error to PANIC(...) |
| Inc/HALAL/Models/MDMA/MDMA.hpp | Migrates MDMA node/link errors to PANIC(...) |
| Inc/HALAL/Models/DMA/DMA2.hpp | Migrates DMA init errors to PANIC(...) |
| Inc/HALAL/HALAL.hpp | Removes legacy HALAL::start(...) declarations |
| Inc/C++Utilities/StaticVector.hpp | Migrates capacity exceeded error to PANIC(...) |
| Inc/C++Utilities/CppUtils.hpp | Adds std::byte/expected/optional/variant utilities to common imports |
| Inc/C++Utilities/CppImports.hpp | Adds missing standard headers for new C++23 utilities |
| CMakeLists.txt | Wires diagnostics + new protections runtime into build; removes legacy entrypoint sources |
| .changesets/pm-no-eth-major.md | Adds major release notes + migration guidance |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| visit( | ||
| [](auto& protection) { | ||
| const Protections::ProtectionEvaluation evaluation = protection.evaluate(); | ||
| publish_edge_events(protection, evaluation); | ||
|
|
||
| if (evaluation.has_active_fault) { | ||
| request_fault_if_due(protection, evaluation); | ||
| } |
There was a problem hiding this comment.
publish_edge_events(...) and request_fault_if_due(...) are static member templates of ProtectionEngine, but they’re called unqualified from inside a lambda passed to visit. Unqualified lookup inside the lambda won’t find class members, so this should fail to compile on standard-conforming compilers. Qualify the calls (e.g., ProtectionEngine::publish_edge_events(...) / ProtectionEngine::request_fault_if_due(...)) or move these helpers into the surrounding anonymous namespace as free functions.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 111 out of 111 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| RuntimeFatalMessage format_runtime_fatal_message(const char* format, va_list arguments) { | ||
| RuntimeFatalMessage message{}; | ||
| const int32_t written = vsnprintf(message.buffer, sizeof(message.buffer), format, arguments); | ||
| message.truncated = written < 0 || static_cast<size_t>(written) >= sizeof(message.buffer); | ||
| return message; |
There was a problem hiding this comment.
format_runtime_fatal_message consumes a va_list passed by value via vsnprintf. On some platforms va_list is a mutable type, so consuming it in a helper can leave the caller’s va_list in an indeterminate state before va_end, which is undefined behavior. Make a local copy with va_copy inside the helper (or pass a va_list* and copy there) and call vsnprintf on the copy, then va_end the copy in the helper.
| void publish_runtime_diagnostic( | ||
| Diagnostics::Severity severity, | ||
| const std::source_location& location, | ||
| const char* format, | ||
| va_list arguments | ||
| ) { | ||
| char buffer[Diagnostics::Config::runtime_message_capacity + 1]{}; | ||
| const int32_t written = vsnprintf(buffer, sizeof(buffer), format, arguments); | ||
|
|
There was a problem hiding this comment.
publish_runtime_diagnostic consumes a va_list passed in from the caller. Because va_list may be a mutable type, consuming it in this helper can make the caller’s va_list indeterminate before va_end, which is undefined behavior on some ABIs. Copy the va_list with va_copy in the helper (or in the caller) and pass/consume the copy instead.
| const uint64_t tick = Scheduler::get_global_tick(); | ||
| const uint64_t last_publish_tick = protection.get_last_fault_publish_tick(); | ||
| if (last_publish_tick != 0 && | ||
| tick < last_publish_tick + Protections::Config::notify_delay_in_microseconds) { | ||
| return; |
There was a problem hiding this comment.
The notify throttling check uses tick < last_publish_tick + notify_delay. Since this is unsigned arithmetic, last_publish_tick + notify_delay can overflow and break the comparison when last_publish_tick is near UINT64_MAX. Prefer an overflow-safe form like tick - last_publish_tick < notify_delay (guarding last_publish_tick != 0) to keep throttling correct across wraparound.
There was a problem hiding this comment.
PhasedPWM.old has no 'new' counterpart, others that were deleted already have a new implementation, like Encoder or InputCapture.
Maybe it would be fine to keep the old implementations that don't have a new implementation done
There was a problem hiding this comment.
Isn't it just a configuration layer over a PWM? If so, it could just be replaced by a funciton in the pwm class, but I don't actually know how it works so don't take my word
There was a problem hiding this comment.
It does some different things from a normal pwm and does it all with lower level handling than what was there in the old PWM.cpp and DualPWM.cpp, though there is also a couple comments that imply it was unfinished...
There was a problem hiding this comment.
Same as the other, this is not implemented and might be good to keep until it is implemented.
There was a problem hiding this comment.
It is just 2 PWMs, there's not much logic in it. In any case I would just refactor it now rather than keep it, so that we can remove the rest of the old code. But it is not so useful really, and I think no one is using it right now
There was a problem hiding this comment.
It looks like HalfBridge uses Phased PWMs, I agree that this implementation is not worth keeping
| TimerDomain::InputCaptureInfo* TimerDomain::input_capture_info[max_instances] | ||
| [input_capture_channels] = { | ||
| &input_capture_info_dummy | ||
| {&input_capture_info_dummy} |
| void append_formatted(char* buffer, size_t buffer_size, const char* format, ...) { | ||
| if (buffer_size == 0) { | ||
| return; | ||
| } | ||
|
|
||
| const size_t offset = bounded_strnlen(buffer, buffer_size - 1); | ||
| if (offset >= buffer_size - 1) { | ||
| buffer[buffer_size - 1] = '\0'; | ||
| return; | ||
| } | ||
|
|
||
| va_list arguments; | ||
| va_start(arguments, format); | ||
| vsnprintf(buffer + offset, buffer_size - offset, format, arguments); | ||
| va_end(arguments); | ||
| } |
There was a problem hiding this comment.
It would be better to have a struct, like a string builder to keep the current offset since with this it will call bounded_strnlen every call which is O(n) and pretty slow.
For an example implementation of a string builder, see: https://github.com/victor-Lopez25/viclib/blob/main/vl_build.h#L1099
| uint64_t tick = global_tick_us_; | ||
| if (Scheduler_global_timer != nullptr) { | ||
| tick += Scheduler_global_timer->CNT; | ||
| } |
There was a problem hiding this comment.
Scheduler::get_global_tick() assumes the board has been started up so Scheduler_global_timer is not null, this condition is not necessary and will make this function slower for no reason
Summary
Closes #617
This PR removes the protection subsystem's dependency on Ethernet and redesigns the global fault runtime around a permanent top-level machine owned by
FaultController.The new design keeps the global states fixed as
OPERATIONALandFAULT, and leaves the user responsible only for declaring an optional nested operational state machine and an optionalon_fault_entercallback.Main Changes
ProtectionManagermodel withProtectionEngineand a single flat protection collection.FaultControllerthe only owner of the global fault runtime and the only valid entry point to global fault throughFaultController::request_fault(...).Board::init().PANIC(...),FAULT(...),WARNING(...), andINFO(...).FaultCausefromDiagnosticRecord.Scheduler::get_global_tick()instead of loop-frequency assumptions.Breaking Changes
Boardnow requires the fault policy type as its first template argument.FAULTstate.ProtectionEngineplusProtections::Rules::*.Board::init(), not legacySTLIB::*orSTLIB_LOW/HIGH::*entry points.New Board API
If the application does not need a nested machine:
using AppBoard = ST_LIB::Board<ST_LIB::FaultPolicyNoMachine<on_fault_enter>, led, pwm, adc>;If the application needs neither a nested machine nor a fault-entry callback:
using AppBoard = ST_LIB::Board<ST_LIB::DefaultFaultPolicy, led, pwm, adc>;Fault Entry Contract
FaultController::request_fault(...).PANIC(...)andFAULT(...)also map toFaultController::request_fault(...).WARNING(...)andINFO(...)publish diagnostics only and do not enter global fault.Migration Guide
Board<YourFaultPolicy, ...>.FaultPolicy<app_machine, on_fault_enter>.FAULT.ProtectionEngine::create_protection(...)andProtections::Rules::*.PANIC(...),FAULT(...),WARNING(...), andINFO(...).Board::init().