
Changes to OpenMP scripts to extract arguments from iom_put#3373

Open
LonelyCat124 wants to merge 36 commits into master from
iom_put_to_temp_


@LonelyCat124
Collaborator

No description provided.

@codecov

codecov Bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.96%. Comparing base (ddc5397) to head (5ad2f3e).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3373   +/-   ##
=======================================
  Coverage   99.96%   99.96%           
=======================================
  Files         389      389           
  Lines       54541    54591   +50     
=======================================
+ Hits        54522    54572   +50     
  Misses         19       19           


@LonelyCat124
Collaborator Author

The transformation is failing on NEMO5, and maybe not for structures of arrays:

psyclone --enable-cache -l output -s /archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/omp_cpu_trans.py -I /archive/psyclone-tests/latest-run/UKMO-NEMOv5/tests/BENCH_OMP_THREADING_GCC/BLD/tmp -o icewri.psycloned.f90 /archive/psyclone-tests/latest-run/UKMO-NEMOv5/tests/BENCH_OMP_THREADING_GCC/BLD/ppsrc/nemo/icewri.f90
Adding OpenMP threading to subroutine: ice_wri
Traceback (most recent call last):
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/bin/psyclonefc", line 45, in <module>
    compiler_wrapper(sys.argv[1:])
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.14/site-packages/psyclone/psyclonefc_cli.py", line 140, in compiler_wrapper
    main(psyclone_args)
    ~~~~^^^^^^^^^^^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.14/site-packages/psyclone/generator.py", line 743, in main
    code_transformation_mode(
    ~~~~~~~~~~~~~~~~~~~~~~~~^
        input_file=args.filename,
        ^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        line_length=args.limit,
        ^^^^^^^^^^^^^^^^^^^^^^^
        free_form=free_form)
        ^^^^^^^^^^^^^^^^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.14/site-packages/psyclone/generator.py", line 964, in code_transformation_mode
    trans_recipe(psyir)
    ~~~~~~~~~~~~^^^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/omp_cpu_trans.py", line 113, in trans
    iom_put_argument_to_temporary(subroutine.walk(Call))
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/examples/nemo/scripts/utils.py", line 541, in iom_put_argument_to_temporary
    DataNodeToTempTrans().apply(arg)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.14/site-packages/psyclone/psyir/transformations/datanode_to_temp_trans.py", line 274, in apply
    node.scope.symbol_table.add(sym_copy)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/archive/psyclone-tests/action-runner-software/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/.runner_venv/lib/python3.14/site-packages/psyclone/psyir/symbols/symbol_table.py", line 600, in add
    raise KeyError(f"Symbol table already contains a symbol with "
                   f"name '{new_symbol.name}'.")
KeyError: "Symbol table already contains a symbol with name 'Nie0'."

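The traceback ends in a KeyError because `symbol_table.add` is handed a copy of a symbol ('Nie0') that the target scope already contains. A hedged sketch of one possible guard, using a hypothetical toy `SymbolTable` (not PSyclone's class) and assuming the symbol already in scope can simply be reused:

```python
class SymbolTable:
    """Toy symbol table (hypothetical, not PSyclone's SymbolTable).

    Fortran names are case-insensitive, so keys are stored in lower case.
    """

    def __init__(self):
        self._symbols = {}

    def __contains__(self, name):
        return name.lower() in self._symbols

    def add(self, name, value):
        # Mirrors the failure in the traceback: duplicates are rejected.
        if name in self:
            raise KeyError(f"Symbol table already contains a symbol with "
                           f"name '{name}'.")
        self._symbols[name.lower()] = value


def add_if_absent(table, name, value):
    """Add a copied symbol only if the scope does not already have it.

    Returns True if the symbol was added, False if the existing one is kept.
    Assumption: the existing symbol refers to the same entity.
    """
    if name in table:
        return False
    table.add(name, value)
    return True


table = SymbolTable()
table.add("nie0", "integer")                    # already declared in scope
print(add_if_absent(table, "Nie0", "integer"))  # False: reused, no KeyError
```

Reusing is only safe when the existing symbol really is the same variable; otherwise a rename would be needed instead.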
@LonelyCat124 LonelyCat124 marked this pull request as ready for review March 17, 2026 12:14
@LonelyCat124
Collaborator Author

The ITs all pass now. The remaining question is whether there is any performance degradation; I don't think there should be, but it's possible this is being applied "too widely". One for either @sergisiso or @arporter to review.

Collaborator

@sergisiso sergisiso left a comment


@LonelyCat124 The changes look good, but I would like the transformation to provide more feedback so we can understand why it hasn't given a performance improvement. Also, see if it can be made more generic.

Comment thread examples/nemo/scripts/utils.py Outdated
Comment on lines +536 to +543
if call.symbol.name == "iom_put":
    arg = call.arguments[1]
    dtype = arg.datatype
    if isinstance(dtype, ArrayType) and isinstance(arg, Operation):
        try:
            DataNodeToTempTrans().apply(arg)
        except TransformationError:
            pass
Collaborator


The integration tests don't show any performance advantage, which is not what we expected (we are changing from multiple GPU->CPU reads to one, and possibly preventing the data touched on the GPU from being brought back):

  • I can check with grep how many more loops are offloaded, but for the places where it was not applied, could you add a preceding comment giving the reason why not? (If not all transformation errors provide useful information, a verbose option like other transformations have could help.)
  • There is nothing specific to iom_put, other than that we know it is a common pattern. We want to avoid touching things from the CPU as much as possible; could this be applied to all subroutine calls (not functions)?

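The suggestion to cover every subroutine call rather than only iom_put can be sketched on a toy tree. The classes below are hypothetical stand-ins, not PSyclone's PSyIR, and the rule that "a call whose parent is a Schedule is a subroutine call, while a call nested in an expression is a function" is an assumption made for illustration. Unlike the iom_put version, which only inspects `call.arguments[1]`, this checks every argument:

```python
class Node:
    """Toy tree node; a hypothetical stand-in for PSyIR, not PSyclone's API."""

    def __init__(self, name, children=(), is_array=False):
        self.name = name
        self.is_array = is_array  # True if the node evaluates to an array
        self.parent = None
        self.children = []
        for child in children:
            self.add(child)

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def walk(self, node_type):
        """Pre-order list of nodes of node_type, including self."""
        found = [self] if isinstance(self, node_type) else []
        for child in self.children:
            found.extend(child.walk(node_type))
        return found


class Schedule(Node):
    """A sequence of statements."""


class Call(Node):
    """A call; its children are the arguments."""


class Operation(Node):
    """An expression such as a*b."""


class Reference(Node):
    """A plain variable reference."""


def is_subroutine_call(call):
    # Assumption: a call sitting directly in a Schedule is a statement
    # (CALL ...), whereas a function call is nested inside an expression.
    return isinstance(call.parent, Schedule)


def extraction_candidates(root):
    """Array-valued Operation arguments of every subroutine call under root."""
    return [arg
            for call in root.walk(Call) if is_subroutine_call(call)
            for arg in call.children
            if isinstance(arg, Operation) and arg.is_array]


body = Schedule("body", [
    Call("iom_put", [Reference("'sst'"), Operation("a*b", is_array=True)]),
    Call("my_sub", [Operation("g+h"), Reference("x")]),  # scalar argument
    Call("my_sub2", [Operation("c+d", [
        Call("my_func", [Operation("e*f", is_array=True)])],  # a function
        is_array=True)]),
])
print([arg.name for arg in extraction_candidates(body)])  # ['a*b', 'c+d']
```

The function call `my_func` is skipped even though its argument is an array expression, which matches the "subroutine calls (not functions)" scope suggested above.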
Collaborator Author


I've added a preceding comment now. I'll try generalising it as well.

Collaborator Author


This currently causes things to fail. I'll see if I can get my VPN working again and try building NEMO5 manually to find the cause.

Collaborator Author


This has shown a few issues with the DataNodeToTempTrans (partly because of some Statement contexts I hadn't considered, e.g. an IfBlock's condition).

Collaborator Author


Fixed the bugs now

@sergisiso
Collaborator

The DataNode2Temp has a:

calls = node.walk(Call)
for call in calls:
    if not call.is_pure:
        raise

This may be a problem, as walk() also returns the node itself, and iom_put is not pure. I suppose this was meant for the calls in the arguments and we just forgot to skip self?

@LonelyCat124
Collaborator Author


@sergisiso I don't think so. We're calling the transformation on the argument, not the call, so the iom_put is the argument's parent and is never returned by the walk.

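Whether walk() including the starting node matters can be checked on a toy tree (hypothetical classes, not PSyclone's PSyIR). The quoted purity check only trips on iom_put when the walk starts at the call itself; starting at the argument, as the transformation does, never reaches the parent:

```python
class Node:
    """Toy tree node (hypothetical, not PSyclone's PSyIR classes)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def walk(self, node_type):
        """Pre-order list of nodes of node_type. Note: self is included."""
        found = [self] if isinstance(self, node_type) else []
        for child in self.children:
            found.extend(child.walk(node_type))
        return found


class Reference(Node):
    """A plain variable reference."""


class Call(Node):
    """A call with a purity flag; its children are the arguments."""

    def __init__(self, name, children=(), is_pure=False):
        super().__init__(name, children)
        self.is_pure = is_pure


def impure_calls(node):
    """Mimic the quoted check: every impure Call that walk() finds from node."""
    return [call.name for call in node.walk(Call) if not call.is_pure]


arg = Call("pure_fn", [Reference("a")], is_pure=True)
iom_put = Call("iom_put", [Reference("'sst'"), arg])  # impure by default

print(impure_calls(iom_put))  # walking from the call itself: ['iom_put']
print(impure_calls(arg))      # walking from the argument: []
```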
@LonelyCat124
Collaborator Author

@sergisiso This is ready for another look now at last.

Collaborator

@sergisiso sergisiso left a comment


@LonelyCat124 The implementation is getting closer, and the generated NEMO diff seems to make sense, but it is a bit puzzling that the performance is not improving. I will have a look with the profiler, but I won't hold up the PR while I do so.

Comment thread src/psyclone/psyir/nodes/intrinsic_call.py
Comment thread src/psyclone/psyir/nodes/intrinsic_call.py
Comment on lines +3222 to +3223
node.argument_by_name("l").
datatype.intrinsic,
Collaborator


Keep this in the same line.

Collaborator Author


Fixed.

Collaborator


This wasn't done? There are also the same unnecessary line breaks at lines 3775, 4458, 4730 and 4755.

Collaborator Author


Ah, my bad. I think I reverted some commits to the file, so this was reverted as well.

Collaborator Author


Fixed those now.

Comment thread examples/nemo/scripts/utils.py Outdated
try:
    DataNodeToTempTrans().apply(arg)
except TransformationError as err:
    call.append_preceding_comment(
Collaborator


Also, consider doing the same but inside the transformation with a verbose option.

Collaborator Author


Done, should be covered by tests too.

Collaborator


Thanks, now you can delete this in favour of the option.

Collaborator Author


Done.

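Moving the "why wasn't this applied" comment into the transformation behind a verbose option could look roughly like the sketch below. Everything here is hypothetical (toy classes, the `options={"verbose": True}` shape, and the choice to record the reason and then still raise so the calling script decides what to do); it is not the code in this PR:

```python
class TransformationError(Exception):
    """Toy stand-in for a transformation validation failure."""


class ToyStatement:
    """Hypothetical statement that can carry a preceding comment."""

    def __init__(self, text, blocker=None):
        self.text = text
        self.blocker = blocker  # why the transformation can't apply, if anything
        self.preceding_comment = ""
        self.transformed = False

    def append_preceding_comment(self, comment):
        # Keep one reason per line so repeated attempts stay readable.
        if self.preceding_comment:
            self.preceding_comment += "\n"
        self.preceding_comment += comment


class ToyToTempTrans:
    """Hypothetical transformation with a 'verbose' option."""

    def validate(self, stmt):
        if stmt.blocker:
            raise TransformationError(
                f"ToyToTempTrans not applied: {stmt.blocker}")

    def apply(self, stmt, options=None):
        options = options or {}
        try:
            self.validate(stmt)
        except TransformationError as err:
            # With verbose on, leave the reason in the generated code,
            # then re-raise so the calling script still sees the failure.
            if options.get("verbose", False):
                stmt.append_preceding_comment(str(err))
            raise
        stmt.transformed = True


stmt = ToyStatement("call iom_put('sst', a * b)", blocker="shape is unknown")
try:
    ToyToTempTrans().apply(stmt, options={"verbose": True})
except TransformationError:
    pass
print(stmt.preceding_comment)  # ToyToTempTrans not applied: shape is unknown
```

Keeping the comment logic inside the transformation means every script that uses it gets the same diagnostics, which is why the script-side `append_preceding_comment` call can then be deleted.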
Comment thread examples/nemo/scripts/omp_gpu_trans.py Outdated
if isinstance(self.intrinsic.return_type, Callable):
    try:
        return self.intrinsic.return_type(self)
    except TypeError as err:
Collaborator


I know we discussed this before but can you remind me why we don't simply use:

try:
    return self.intrinsic.return_type(self)
except (TypeError, AttributeError):
    return UnresolvedType()

I know the more explicit error could help with debugging, but from a user's point of view it doesn't matter that much; it would be better to get an UnresolvedType, which communicates that we can't determine the type but lets processing continue.

Collaborator Author


I'd prefer to be more precise: from a user's point of view it should never reach the InternalError, so it's mostly defensiveness from a development standpoint, to catch anything unexpected. I'm happy to change it if you would prefer, though.

Collaborator


OK (it still doesn't feel ideal that we compare against the error message; hopefully we can improve on this in the future).

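The two options discussed in this thread can be compared side by side with toy code. All names here (`UnresolvedType`, `InternalError`, `return_type_of_sum` and the "is unknown" message fragment) are hypothetical stand-ins, not PSyclone's classes or actual error messages. The lenient version turns any `TypeError`/`AttributeError` into an unresolved type; the strict one only accepts the anticipated `TypeError`, recognised by its message, and escalates any other `TypeError`:

```python
class UnresolvedType:
    """Toy marker meaning 'the return type could not be determined'."""

    def __repr__(self):
        return "UnresolvedType()"


class InternalError(Exception):
    """Toy error for situations that should never reach a user."""


def return_type_of_sum(node):
    """Hypothetical return-type callable: fails when argument info is missing."""
    if node.get("arg_type") is None:
        raise TypeError("argument type is unknown")
    return node["arg_type"]


def lenient_return_type(return_type, node):
    # Reviewer's suggestion: any TypeError/AttributeError -> UnresolvedType.
    try:
        return return_type(node)
    except (TypeError, AttributeError):
        return UnresolvedType()


def strict_return_type(return_type, node, expected_fragment):
    # Author's preference: only the anticipated TypeError (recognised by its
    # message) becomes UnresolvedType; any other TypeError is re-raised as
    # an InternalError. AttributeError is not caught at all.
    try:
        return return_type(node)
    except TypeError as err:
        if expected_fragment in str(err):
            return UnresolvedType()
        raise InternalError(f"Unexpected error: {err}") from err


print(lenient_return_type(return_type_of_sum, {"arg_type": "REAL"}))  # REAL
print(strict_return_type(return_type_of_sum, {}, "is unknown"))  # UnresolvedType()
```

The strict form surfaces unexpected failures during development, at the cost of coupling the check to the wording of the error message, which is the drawback raised above.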
Comment thread src/psyclone/psyir/transformations/datanode_to_temp_trans.py Outdated
Comment thread src/psyclone/psyir/transformations/datanode_to_temp_trans.py
@LonelyCat124
Collaborator Author

@sergisiso This is ready for another look. I am going to set the ITs running again, since the output code will likely change.

Collaborator

@sergisiso sergisiso left a comment


@LonelyCat124 I am happy with the current implementation, with the remaining rough edges moved to separate issues/TODOs. There is just some final clean-up to do before merging it.

Comment thread examples/nemo/scripts/omp_gpu_trans.py Outdated
Comment on lines +193 to +194
# Extract any array operations from iom_put calls to temporary
# expressions that can be parallelised.
Collaborator


Delete this now?

Collaborator Author


Yeah.

MaximalProfilingOutsideDirectivesTrans().apply(children)


def iom_put_argument_to_temporary(calls: list[Call]):
Collaborator


":param calls:" is missing from the docstring.

Collaborator Author


Added. I have also slightly changed the behaviour, so I will rerun the integration tests.

allocatable_datatype.shape])
# If any of the bound information aren't static then we need
# to create an allocatable array.
is_static = True
Collaborator


Maybe rename is_static to has_static_bounds, so it isn't confused with symbols that have a static interface.

Collaborator Author


Yeah, that's probably better.

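The rule in the code under review ("if any of the bound information aren't static then we need to create an allocatable array") can be sketched as follows. The shape representation (pairs of `(lower, upper)`, with run-time bounds such as `"jpi"` given as strings) and the `declare_temporary` helper are hypothetical, chosen only to show the decision:

```python
def has_static_bounds(shape):
    """True if every (lower, upper) pair is a literal integer.

    Assumption: bounds only known at run time are represented as strings.
    """
    return all(isinstance(lo, int) and isinstance(hi, int) for lo, hi in shape)


def declare_temporary(name, base_type, shape):
    """Hypothetical helper: Fortran declaration for a temporary array."""
    if has_static_bounds(shape):
        dims = ", ".join(f"{lo}:{hi}" for lo, hi in shape)
        return f"{base_type}, dimension({dims}) :: {name}"
    # At least one bound is only known at run time: use a deferred-shape,
    # allocatable array that must be allocated before it is assigned.
    dims = ", ".join(":" for _ in shape)
    return f"{base_type}, allocatable, dimension({dims}) :: {name}"


print(declare_temporary("tmp", "real", [(1, 10), (1, 5)]))
# real, dimension(1:10, 1:5) :: tmp
print(declare_temporary("tmp", "real", [(1, "jpi"), (1, 5)]))
# real, allocatable, dimension(:, :) :: tmp
```

The name `has_static_bounds` describes the shape only, so it does not suggest anything about the symbol's storage or interface.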
Comment on lines +4926 to +4927
# Is this reachable? Tested via monkeypatch as there may be
# some edge case I can't think of.
Collaborator


I would prefer a comment like "This should never happen, so propagate as an InternalError" instead.

Collaborator Author


Yeah, that's neater; the previous comment probably should have been labelled as a FIXME. Updated now.

@LonelyCat124 LonelyCat124 deployed to integration on April 27, 2026 13:36 with GitHub Actions
@LonelyCat124
Collaborator Author

@sergisiso I've set the integration tests running again; assuming no issues arise, this is ready for another look. If the tests fail on tree-sitter-related issues, then I think we should wait until #3408 is merged.


Projects

None yet


3 participants