Add validation for EOS/FRR BFD check for OSPFv2/v3 #3519

Merged: ipspace merged 4 commits into ipspace:dev from snuffy22:validate-bfd on Jun 28, 2026

@snuffy22

Contributor

Checked with both FRR/EOS.

Working on the BGP part currently.

Was unsure if adding extra parameter to the show + validate was correct or how it should be done.

Can say it works and does not seem to break existing tests.

@snuffy22

Contributor Author

There is still a bit to write for the underlying BFD checks.

But this can already show whether BFD is working on BGP/OSPF.

Further testing such as 'strict' mode is likely to be left for another time.

@ipspace

ipspace commented Jun 24, 2026

Owner

> Working on the BGP part currently.

Stop. One feature per PR ;)

> Was unsure if adding extra parameter to the show + validate was correct or how it should be done.

That's correct.

> Can say it works and does not seem to break existing tests.

Still doesn't mean it's correct ;) It's just that the happy path is not broken. You should tweak device configurations to introduce failures and check what happens.
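
The idea of exercising failure paths can be sketched offline before touching a lab. This is only an illustration with simulated neighbor data: `bfd_check_passes` is a hypothetical helper, and the dictionaries stand in for device output (the `peerBfdInfo.status` field name is borrowed from the FRR OSPFv3 snippet in this PR). It does not replace breaking the configuration on a real device.

```python
def bfd_check_passes(neighbor):
    """Hypothetical check: True only when the neighbor reports BFD as Up."""
    bfd_info = neighbor.get('peerBfdInfo') or {}
    return bfd_info.get('status') == 'Up'


# Each case pairs a simulated device state with the outcome the check must produce
cases = [
    ('BFD up',          {'peerBfdInfo': {'status': 'Up'}},   True),
    ('BFD down',        {'peerBfdInfo': {'status': 'Down'}}, False),
    ('BFD not enabled', {},                                  False),
]

for name, neighbor, expected in cases:
    result = bfd_check_passes(neighbor)
    assert result == expected, f'{name}: expected {expected}, got {result}'

print('happy path and failure paths behave as expected')
```

A check that only ever sees the first case would pass whether or not it can detect a failure, which is the gap the comment above points at.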

@ipspace ipspace left a comment

Owner

You should:

  • Reorder the sequence of checks (adjacency first, then BFD)
  • Report only exceptions (otherwise you won't check the other stuff). Alternatively, if you envision the BFD check being used separately after the initial adjacency check, then you can use _common.report_state in the BFD check (because we expect the adjacency to be up)
  • End the check functions with raising the happy adjacency status message (if you're changing the code, make it consistent ;)
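
The three points above could be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual netlab code: `Result` is a hypothetical stand-in for `log.Result`, `check_ospf_neighbor` is an invented name, and treating any `Full/...` state as "up" as well as combining `nbrState` with `peerBfdInfo.status` in one structure are assumptions made for the example.

```python
from types import SimpleNamespace


class Result(Exception):
    """Hypothetical stand-in for log.Result: success reported by raising."""


def check_ospf_neighbor(n_state, neighbor_id, bfd=False):
    adjacency_msg = f'OSPFv2 neighbor {neighbor_id} is in state {n_state.nbrState}'

    # 1. Adjacency first: it is the more relevant failure
    if not n_state.nbrState.startswith('Full'):   # assumption: Full/* means "up"
        raise Exception(adjacency_msg)

    # 2. BFD next: raise only on failure, otherwise fall through
    if bfd:
        bfd_info = getattr(n_state, 'peerBfdInfo', None)
        bfd_status = bfd_info.status if bfd_info else 'missing'
        if bfd_status != 'Up':
            raise Exception(f'OSPFv2 neighbor {neighbor_id} is in BFD state {bfd_status}')

    # 3. Always end with the happy adjacency message
    raise Result(adjacency_msg)


healthy = SimpleNamespace(nbrState='Full/Backup', peerBfdInfo=SimpleNamespace(status='Up'))
try:
    check_ospf_neighbor(healthy, '10.0.0.2', bfd=True)
except Result as ok:
    print(f'[PASS] {ok}')
```

Because only failures raise early, every check gets a chance to run, and the final message is the same whether or not the BFD check was requested.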

Comment thread netsim/validate/ospf/eos.py Outdated
if not present:
  raise Exception(f'Unexpected {proto_name} neighbor {id} in state {n_state.adjacencyState}')

if bfd:

Owner

I would report OSPF adjacency problem before missing BFD because it's more relevant. Also, when checking multiple conditions, you cannot use _common.report_state (because it exits no matter what).
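
To illustrate why a helper that "exits no matter what" cannot sit in the middle of a multi-condition check, here is a small sketch. The `report_state` below is a hypothetical stand-in written for this example (its real signature in `_common` may differ), and `Result` is again a stand-in for a success result raised as an exception.

```python
class Result(Exception):
    """Hypothetical stand-in for a success result raised as an exception."""


def report_state(exit_msg, ok):
    """Hypothetical stand-in: raises on failure AND on success."""
    if not ok:
        raise Exception(exit_msg)
    raise Result(exit_msg)


def check_bfd_then_adjacency(bfd_up, adjacency_up, reached):
    # The helper raises on success too, so this call always leaves the function...
    report_state('BFD state checked', bfd_up)
    # ...and nothing below is ever executed, even when the adjacency is broken
    reached.append('adjacency')
    report_state('adjacency checked', adjacency_up)


reached = []
try:
    check_bfd_then_adjacency(bfd_up=True, adjacency_up=False, reached=reached)
except Result as res:
    print(f'[PASS] {res}')   # prints "[PASS] BFD state checked"
print(reached)               # prints "[]": the adjacency check never ran
```

The broken adjacency goes unreported here, which is why such a helper is only safe as the last statement of a check.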

Comment thread netsim/validate/ospf/frr.py Outdated
if not present:
  raise Exception(f'Unexpected OSPFv2 neighbor {id} in state {n_state.nbrState}')

if bfd:

Owner

You cannot use the "raise in all cases" approach when checking multiple conditions. You can only raise an error condition (Exception) and continue to the next check otherwise. However, it would be nice to always end with raise log.Result(adjacency_exit_msg) to get a nice success message about adjacency being in the right state.

As before, check adjacency state before checking BFD.

Comment thread netsim/validate/ospf/frr.py Outdated

if bfd:
  exit_msg = f'OSPFv3 neighbor {id} is in BFD state {n_state.peerBfdInfo.status}'
  if n_state.peerBfdInfo and n_state.peerBfdInfo.status == "Up":

Owner

Here, the sequence of checks is correct, but as above, report only the error and complete the function with raise log.Result(adjacency_msg).

@@ -24,3 +25,17 @@ links:
cost: 42
network_type: broadcast
- abr:

validate:
adj:

Owner

If you check adjacency state before checking BFD state, you can roll these tests into one ;)

Comment thread tests/integration/ospf/ospfv2/bfd.yml Outdated
    nodes: [ r1, bb ]
    plugin: ospf_neighbor(nodes.abr.ospf.router_id)
  bfd:
    description: Check OSPF adjacencies

Owner

If you want to have two checks, then this one should be named "Check BFD status"

Comment thread tests/integration/ospf/ospfv2/bfd.yml Outdated
  bfd:
    description: Check OSPF adjacencies
    wait: ospfv2_adj_p2p
    wait_msg: Waiting for OSPF adjacency process to complete

Owner

... and the wait message should be "Waiting for BFD to start"

Comment thread tests/integration/ospf/ospfv3/bfd.yml Outdated
    wait_msg: Waiting for OSPF adjacency process to complete
    nodes: [ r1, bb ]
    plugin: ospf6_neighbor(nodes.abr.ospf.router_id)
  bfd:

Owner

Same as in the other test

@snuffy22

Contributor Author

@ipspace thanks for the many comments, I will work through them.

It was mainly in a 'get it working' state, though I did confirm that it errors when BFD is off.

Better testing and explaining will be in the next update.

@snuffy22

Contributor Author

Updated with your suggestions, still kept it as OSPF but added 'BFD' to it.

Tested with both EOS and FRR.

Both hosts with working BFD:
(py3-snuff) netlab@netlab:~/snuffy-netlab/tests/integration/ospf/ospfv3$ netlab validate
[adj]     Check OSPF adjacencies [ node(s): r1,bb ]
[PASS]    Validation succeeded on r1
[PASS]    Validation succeeded on bb
[PASS]    Test succeeded in 0.3 seconds

[bfd]     Check OSPF BFD adjacencies [ node(s): r1,bb ]
[PASS]    r1: OSPFv3 neighbor 10.0.0.2 is in BFD state Up
[PASS]    bb: OSPFv3 neighbor 10.0.0.2 is in BFD state Up
[PASS]    Test succeeded in 0.2 seconds

[SUCCESS] Tests passed: 4

One host failing (the ABR/DUT has OSPF BFD disabled):
(py3-snuff) netlab@netlab:~/snuffy-netlab/tests/integration/ospf/ospfv2$ netlab validate
[adj]     Check OSPF adjacencies [ node(s): r1,bb ]
[PASS]    r1: OSPFv2 neighbor 10.0.0.2 is in state Full/Backup
[PASS]    bb: OSPFv2 neighbor 10.0.0.2 is in state Full/-
[PASS]    Test succeeded in 0.3 seconds

[bfd]     Check OSPF BFD adjacencies [ node(s): r1,bb ]
[PASS]    bb: OSPFv2 neighbor 10.0.0.2 is in BFD state Up
[WAITING] Waiting for OSPF BFD adjacency process to complete (retrying for 30 seconds)
[WAITING] Waiting for OSPF BFD adjacency process to complete (13 seconds left)
[FAIL]    Node r1: OSPFv2 neighbor 10.0.0.2 is in BFD state Down

[FAIL]    4 tests completed, one test failed
(py3-snuff) netlab@netlab:~/snuffy-netlab/tests/integration/ospf/ospfv2$

@ipspace ipspace left a comment

Owner

Did a bit of polishing and made the integration tests part of the regular testing process. Merging in a few minutes once the integration tests go through.


ospf.bfd.ipv4: True

groups:

Owner

A few notes on the integration test (not your fault, you just used what was there).

You have to define the probe devices as FRR (preferably with CLAB provider), otherwise all lab devices use the same device type.

ospf:
  area: 1
  cost: 42
  network_type: broadcast

Owner

No idea why this is here, but let's say it makes sense and we want to check BFD on multiple areas and link types ;)

mtu: 1500

nodes:
dut:

Owner

The tested device is usually named "dut", so it's easier to figure out what device you have to troubleshoot.

@ipspace ipspace marked this pull request as ready for review June 28, 2026 04:28
ipspace added a commit that referenced this pull request Jun 28, 2026
@ipspace ipspace merged commit 173843e into ipspace:dev Jun 28, 2026
6 checks passed