HPX with PBS: How to run multiple HPX localities per compute node? #6713
-
@G-071 Could you please give me the output of …?
-
The logfiles are a bit on the longer end, so I sent them to you by email! Please let me know if you need any more information or if I should try running a different configuration. Since I forgot to mention it earlier: I am using the current HPX master and the preinstalled MPICH (version 4.2.3).
-
Hi @G-071, the key changes here are: …
-
@harith-hacky03 Alternatively, you'll get the same warning and a similar error message with the MPI parcelport. I am curious though: where did you get the …?

The solution: in a nutshell, one can get it to work by replacing pbsdsh with mpirun / mpiexec and adding the parameter …. The number of processes per node, the core binding, and the memory binding need to be added manually to the mpirun call in this case (see the sketch below). Here, cpu-bind makes sure the first 3 localities run on the first socket and the other ones on the second. It actually leaves a few cores on each socket unbound, but there appears to be no way around this (since we have 52 cores and 3 GPUs per socket). numactl makes sure we use the HBM memory. The GPU wrapper script looks like this (also sketched below); see the Aurora user guide for the explanations of the parameter choices here.

Caveats: …
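As a minimal sketch of what such an mpiexec call could look like: the node count, core lists, HBM NUMA node IDs, and binary name below are assumptions for illustration, and the additional HPX parameter mentioned above is elided.

```bash
#!/bin/bash
#PBS -l select=2

NNODES=$(wc -l < "$PBS_NODEFILE")
RANKS_PER_NODE=6   # assumed here: one locality per GPU, 3 per socket

# --cpu-bind: pin ranks 0-2 to cores on socket 0 and ranks 3-5 to socket 1
# (52 cores per socket, so a few cores per socket stay unbound).
# numactl --membind: allocate from the HBM NUMA nodes; the IDs below are
# placeholders, check `numactl -H` on a compute node for the real ones.
mpiexec -n $((NNODES * RANKS_PER_NODE)) --ppn ${RANKS_PER_NODE} \
    --cpu-bind=list:0-15:16-31:32-47:52-67:68-83:84-99 \
    numactl --membind=2-3 \
    ./gpu_wrapper.sh ./octotiger
```

And a sketch of the GPU wrapper in the style of the gpu_tile_compact.sh helper from the Aurora user guide (the rank-to-GPU mapping here is illustrative):

```bash
#!/usr/bin/env bash
# gpu_wrapper.sh (sketch): give each local rank its own GPU by setting
# Level Zero's ZE_AFFINITY_MASK from the local rank ID exported by the
# PALS launcher, then exec the real application.
num_gpus=6   # GPUs per Aurora node
gpu_id=$(( PALS_LOCAL_RANKID % num_gpus ))
export ZE_AFFINITY_MASK=${gpu_id}
exec "$@"
```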
Remaining questions before closing this issue: …
-
@G-071 LCI can work with …. You can use …. For Slingshot-11, just set …. Let me know how it goes! I would love to get LCI working on Aurora.
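For anyone trying this: the LCI parcelport is enabled at configure time via HPX's documented CMake options; a minimal configure sketch (directory names are placeholders):

```bash
# Configure HPX with the LCI parcelport; HPX_WITH_FETCH_LCI lets the HPX
# build download and build a matching LCI version by itself.
cmake -S hpx -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DHPX_WITH_PARCELPORT_LCI=ON \
    -DHPX_WITH_FETCH_LCI=ON
cmake --build build -j
```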
-
Yes, please! @dimitraka can help with that.
-
Hi! Do you have any draft documentation you would like to add? I can help with integrating it into the final docs.
-
I am currently trying to run Octo-Tiger on Aurora, which uses PBS instead of Slurm.
After a bit of trying, I was able to get a basic build going, with both the Intel GPU support and the MPI parcelport working. To run distributed scenarios, I was following the HPX documentation for usage with PBS.
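A minimal sketch of a runscript in that documented style (application path, locality count, and scenario flags are placeholders):

```bash
#!/bin/bash
#PBS -l select=2

# pbsdsh -u starts the command once per allocated node; HPX detects the
# PBS environment (PBS_NODEFILE) and wires the instances up as localities.
APP="$PBS_O_WORKDIR/octotiger --hpx:localities=2"
pbsdsh -u /bin/bash -c "$APP"
```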
However, currently I can only run distributed scenarios if I stick to one HPX locality per compute node.
If I run more localities on a node, Octo-Tiger will simply be executed independently once per locality.
For example, if I use the following line in my PBS runscript to run on two nodes, each with a single HPX locality,

`#PBS -l select=2`

Octo-Tiger works as expected. However, using the same Octo-Tiger/HPX build on a single node with 2 processes via

`#PBS -l select=1:mpiprocs=2`

does not work and just runs two independent instances of Octo-Tiger on the node, each running the complete scenario without communicating.

On Aurora, the recommended setting is to use one process per GPU tile, so to use the machine properly I would need 12 HPX localities per compute node, each using 8 CPU cores and one GPU tile (which actually leaves 4 cores per socket unused, but it still appears to be the recommended setup for the machine).
Is there a way to run multiple HPX localities per compute node with PBS? With Slurm+HPX this is easy (see the sketch below), but I do not have a lot of experience with PBS yet and could not find any information on how to do this in the documentation (I might have just missed it though).
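For comparison, a sketch of the Slurm setup I mean (counts and binary name are placeholders):

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12   # 12 HPX localities per node
#SBATCH --cpus-per-task=8      # 8 cores per locality

# HPX reads the Slurm environment, so one locality starts per task.
srun ./octotiger
```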