DAOS-18928 dfuse: increase MAX_DAOS_MT to 32 #18526
Open: mchaarawi wants to merge 4 commits into master from mschaara/18928
+217 −29
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| """ | ||
| (C) Copyright 2026 Hewlett Packard Enterprise Development LP | ||
|
|
||
| SPDX-License-Identifier: BSD-2-Clause-Patent | ||
| """ | ||
| import os | ||
|
|
||
| from apricot import TestWithServers | ||
| from command_utils_base import EnvironmentVariables | ||
| from dfuse_utils import get_dfuse, start_dfuse | ||
| from run_utils import run_remote | ||
|
|
||
| # Marker printed to stderr by libpil4dfs at process exit when D_IL_REPORT is set | ||
| # and interception is enabled. Its presence/absence tells us whether interception | ||
| # was active for the process. | ||
| INTERCEPT_MARKER = "libpil4dfs intercepting summary" | ||
|
|
||
|
|
||
| class Pil4dfsManyMounts(TestWithServers): | ||
| """Verify libpil4dfs handling of many dfuse mount points (MAX_DAOS_MT). | ||
|
|
||
| libpil4dfs discovers every fuse.daos mount point listed in /proc/self/mounts | ||
| when it initializes and stores them in a fixed-size table (MAX_DAOS_MT). When | ||
| the number of mount points is at or below the limit, interception is enabled | ||
| and used for all of them. When the number exceeds the limit, libpil4dfs must | ||
| gracefully disable interception (falling back to dfuse) rather than aborting | ||
| the application, so that no core file is produced. | ||
|
|
||
| :avocado: recursive | ||
| """ | ||
|
|
||
| def _add_mounts(self, pool, dfuse_hosts, dfuses, mount_dirs, target_count): | ||
| """Mount additional dfuse instances until target_count are mounted. | ||
|
|
||
| Args: | ||
| pool (TestPool): pool to create the containers in. | ||
| dfuse_hosts (NodeSet): hosts on which to mount dfuse. | ||
| dfuses (list): list of running dfuse instances, extended in place. | ||
| mount_dirs (list): list of mount point directories, extended in place. | ||
| target_count (int): total number of dfuse mount points to have mounted. | ||
| """ | ||
| while len(dfuses) < target_count: | ||
| container = self.get_container(pool) | ||
| dfuse = get_dfuse(self, dfuse_hosts) | ||
| start_dfuse(self, dfuse, pool, container) | ||
| dfuses.append(dfuse) | ||
| mount_dirs.append(dfuse.mount_dir.value) | ||
|
|
||
| def _verify_case(self, dfuse_hosts, env_str, mount_dirs, expect_intercept): | ||
| """Run a single libpil4dfs process across all current mount points and check interception. | ||
|
|
||
| Args: | ||
| dfuse_hosts (NodeSet): hosts on which to run the command. | ||
| env_str (str): shell prefix that loads libpil4dfs and enables D_IL_REPORT. | ||
| mount_dirs (list): mount point directories of all currently mounted dfuse instances. | ||
| expect_intercept (bool): whether interception is expected to be enabled. | ||
| """ | ||
| mount_count = len(mount_dirs) | ||
| self.log_step( | ||
| f"Case: {mount_count} mount points, " | ||
| f"expecting interception to be {'enabled' if expect_intercept else 'disabled'}") | ||
|
|
||
| # A single libpil4dfs-intercepted process that touches every mount point. At | ||
| # initialization libpil4dfs discovers all fuse.daos mounts in /proc/self/mounts, | ||
| # so this exercises the MAX_DAOS_MT table regardless of which mount is accessed. | ||
| stat_cmd = env_str + "stat " + " ".join(mount_dirs) | ||
| result = run_remote(self.log, dfuse_hosts, stat_cmd) | ||
|
|
||
| # The process must always complete cleanly, regardless of how many mounts are | ||
| # present. Over the limit, libpil4dfs must disable interception gracefully and | ||
| # never abort (which would create a core file and fail the CI stage). | ||
| if not result.passed: | ||
| self.fail( | ||
| f"libpil4dfs process failed with {mount_count} mount points on " | ||
| f"{result.failed_hosts}; it must never abort") | ||
|
|
||
| intercepted = INTERCEPT_MARKER in result.joined_stdout | ||
|
|
||
| # Log the observed interception status so the test log shows each case behaving | ||
| # as expected (interception enabled at/below MAX_DAOS_MT, disabled above it). | ||
| self.log.info( | ||
| "Case result: %d mount points -> process succeeded, interception %s " | ||
| "(expected %s)", mount_count, "enabled" if intercepted else "disabled", | ||
| "enabled" if expect_intercept else "disabled") | ||
|
|
||
| if expect_intercept and not intercepted: | ||
| self.fail( | ||
| f"Expected interception to be enabled with {mount_count} mount points, " | ||
| "but the libpil4dfs summary was not found") | ||
| if not expect_intercept and intercepted: | ||
| self.fail( | ||
| f"Expected interception to be disabled with {mount_count} mount points " | ||
| "(more than MAX_DAOS_MT), but the libpil4dfs summary was found") | ||
|
|
||
| def test_pil4dfs_many_mounts(self): | ||
| """JIRA ID: DAOS-18890. | ||
|
|
||
| Test Description: | ||
| Verify libpil4dfs behavior with dfuse mount point counts at/below and | ||
| above MAX_DAOS_MT, all within a single test run. No case may produce a | ||
| core file. Mounts accumulate across cases (rather than being recreated | ||
| for each) so the same dfuse instances are reused as the count grows. | ||
|
|
||
| Steps: | ||
| 1.) Create a single pool. | ||
| 2.) For each count in intercept_mount_counts (ascending), mount | ||
| additional dfuse instances up to that count and confirm a single | ||
| libpil4dfs process uses them all (interception enabled). | ||
| 3.) Mount additional dfuse instances up to no_intercept_mount_count | ||
| (more than MAX_DAOS_MT) and confirm the libpil4dfs process | ||
| completes without aborting and with interception disabled. | ||
|
|
||
| :avocado: tags=all,daily_regression | ||
| :avocado: tags=vm | ||
| :avocado: tags=dfuse,pil4dfs | ||
| :avocado: tags=Pil4dfsManyMounts,test_pil4dfs_many_mounts | ||
| """ | ||
| intercept_mount_counts = sorted(self.params.get( | ||
| "intercept_mount_counts", "/run/test/*", [10, 32])) | ||
| no_intercept_mount_count = self.params.get( | ||
| "no_intercept_mount_count", "/run/test/*", 33) | ||
|
|
||
| lib_path = os.path.join(self.prefix, "lib64", "libpil4dfs.so") | ||
| env_str = EnvironmentVariables({ | ||
| "LD_PRELOAD": lib_path, | ||
| "D_IL_NO_BYPASS": 1, | ||
| "D_IL_REPORT": 1 | ||
| }).to_export_str() | ||
| dfuse_hosts = self.hostlist_clients | ||
|
|
||
| self.log_step("Creating a single pool") | ||
| pool = self.get_pool(connect=False) | ||
|
|
||
| dfuses = [] | ||
| mount_dirs = [] | ||
| try: | ||
| # Mounts accumulate across cases: grow up to each target count, verifying | ||
| # behavior at each step, rather than recreating mounts for every case. | ||
| for target_count in intercept_mount_counts: | ||
| self._add_mounts(pool, dfuse_hosts, dfuses, mount_dirs, target_count) | ||
| self._verify_case(dfuse_hosts, env_str, mount_dirs, expect_intercept=True) | ||
|
|
||
| self._add_mounts(pool, dfuse_hosts, dfuses, mount_dirs, no_intercept_mount_count) | ||
| self._verify_case(dfuse_hosts, env_str, mount_dirs, expect_intercept=False) | ||
| finally: | ||
| for dfuse in dfuses: | ||
| dfuse.stop() | ||
|
|
||
| self.log.info("Test passed") |
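
The class docstring above describes libpil4dfs scanning /proc/self/mounts for fuse.daos entries and comparing their number against MAX_DAOS_MT. A minimal Python sketch of that counting rule follows, for illustration only. It is not the libpil4dfs implementation (which is C code not shown in this diff); the helper names `count_daos_mounts`, `interception_expected`, and `read_mounts` are hypothetical, and the limit of 32 is taken from the PR title.

```python
# Illustrative sketch only: not the libpil4dfs implementation.
MAX_DAOS_MT = 32  # value taken from the PR title; assumed for this sketch


def count_daos_mounts(mounts_text):
    """Count mount entries whose filesystem type is fuse.daos.

    Each line of /proc/self/mounts has whitespace-separated fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "fuse.daos":
            count += 1
    return count


def interception_expected(mount_count, limit=MAX_DAOS_MT):
    """Return True when the mount count is at or below the table limit."""
    return mount_count <= limit


def read_mounts(path="/proc/self/mounts"):
    """Read the mount table of the current process (Linux only)."""
    with open(path, encoding="utf-8") as mounts_file:
        return mounts_file.read()
```

On a Linux client, `interception_expected(count_daos_mounts(read_mounts()))` gives the outcome the test expects for the current number of dfuse mounts, under the assumptions stated above.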
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,29 @@ | ||||||
| hosts: | ||||||
| test_servers: 1 | ||||||
| test_clients: 1 | ||||||
| timeout: 900 | ||||||
|
Contributor: Only if you need to push again: could reduce this to 600, since the actual execution time was ~6 minutes.
Contributor (Author): sigh.. I did repush but forgot about this. |
||||||
| server_config: | ||||||
| name: daos_server | ||||||
| engines_per_host: 1 | ||||||
| engines: | ||||||
| 0: | ||||||
| targets: 4 | ||||||
| nr_xs_helpers: 0 | ||||||
| storage: | ||||||
| 0: | ||||||
| class: ram | ||||||
| scm_mount: /mnt/daos | ||||||
| system_ram_reserved: 1 | ||||||
| pool: | ||||||
| size: 1GiB | ||||||
| container: | ||||||
| type: POSIX | ||||||
| control_method: daos | ||||||
| test: | ||||||
| # Mount counts at/below MAX_DAOS_MT for which libpil4dfs enables interception. | ||||||
| intercept_mount_counts: | ||||||
| - 10 | ||||||
| - 32 | ||||||
| # Mount count above MAX_DAOS_MT for which libpil4dfs must gracefully disable | ||||||
| # interception (no abort, no core file). | ||||||
| no_intercept_mount_count: 33 | ||||||
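
Because the test accumulates mounts across cases instead of recreating them, the configured counts translate into incremental mount operations. The sketch below works through that arithmetic for the values in this config; `mounts_to_add` is a hypothetical helper written for illustration and is not part of the PR.

```python
def mounts_to_add(intercept_counts, no_intercept_count):
    """Return (target, newly_mounted) pairs for each case, in run order.

    Mirrors the test's "mount until len(dfuses) reaches the target" loop:
    only the difference between the target and what is already mounted
    needs to be started for each case.
    """
    targets = sorted(intercept_counts) + [no_intercept_count]
    steps = []
    mounted = 0
    for target in targets:
        added = max(0, target - mounted)
        mounted += added
        steps.append((target, added))
    return steps


# Values from the YAML above: [10, 32] with interception, 33 without.
for target, added in mounts_to_add([10, 32], 33):
    print(f"target={target} newly mounted={added}")
```

With these values the three cases start 10, 22, and 1 new dfuse instances respectively, 33 in total for the run.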
Can we not make this 128 or 256? The cost of this is what, a few KB? 32 is still in the realm of possibility of mounts on a common login.
I suggest increasing the size to something that will likely not happen. Otherwise you'll just hit this in the future and be annoyed again.
Discussed with Kevin offline. The change to simply disable interception once more than 32 mounts are present is sufficient: it no longer aborts applications as before, so it avoids the problems that caused.