Draft
Changes from 3 commits
Commits
Show all changes
35 commits
375066c
add compute in its current form
a-alveyblanc Nov 20, 2025
745f841
align compyte with inducer/main
a-alveyblanc Nov 20, 2025
8896644
clean up comments and typos
a-alveyblanc Nov 20, 2025
6230e11
switch to alpha namedisl usage
a-alveyblanc Dec 10, 2025
80839ba
start using namedisl in places other than compute
a-alveyblanc Dec 10, 2025
c1ba35b
add namedisl objects to a type signature
a-alveyblanc Dec 10, 2025
7265536
compute transform up to and including instruction creation + insertio…
a-alveyblanc Mar 16, 2026
56af4fe
invocation replacement; dependencies still need handling
a-alveyblanc Mar 17, 2026
9d12183
rough sketch of compute transform; inames not schedulable because of …
a-alveyblanc Mar 18, 2026
8667a01
compute working for tiled matmul; write race condition warning
a-alveyblanc Mar 18, 2026
959e68b
add compute matmul example
a-alveyblanc Mar 18, 2026
a566b6e
clean up compute matmul example
a-alveyblanc Mar 18, 2026
cf920b5
improve matmul example with more parameters, better post-compute tran…
a-alveyblanc Mar 19, 2026
4c7d1ac
add 2.5D FD example base; minor stylistic changes
a-alveyblanc Mar 20, 2026
781df54
improvements to 2.5D example; bug fixes in compute transform
a-alveyblanc Mar 23, 2026
cccc6a1
Feedback from meeting
inducer Mar 23, 2026
6a7f719
update footprint finding to minimize projections, find bounds
a-alveyblanc Mar 24, 2026
aa8b612
add/remove FIXMEs
a-alveyblanc Mar 24, 2026
907873c
add tiled matmul example as test; islpy -> namedisl
a-alveyblanc Apr 6, 2026
2d9412d
Merge branch 'main' of https://github.com/inducer/loopy into fine-gra…
a-alveyblanc Apr 6, 2026
9b590a6
Merge branch 'main' of https://github.com/inducer/loopy into fine-gra…
a-alveyblanc Apr 6, 2026
0d3011f
compute working for 2D plane in FD example
a-alveyblanc Apr 7, 2026
9d5c9aa
rough version of shifting for ring-buffer-like compute transforms
a-alveyblanc Apr 16, 2026
33814e2
add more examples
a-alveyblanc Apr 17, 2026
35b78d6
add compute examples driver; entire refactor on compute for clarity
a-alveyblanc Apr 17, 2026
10317a1
change examples runner to use which python instead of hardcoded pytho…
a-alveyblanc Apr 17, 2026
1fa9526
add diamond tiling example
a-alveyblanc Apr 18, 2026
bd8d240
add .codex to gitignore
a-alveyblanc Apr 18, 2026
70c15c8
remove dependency.py
a-alveyblanc Apr 18, 2026
62eb46a
back to old version of compute
a-alveyblanc Apr 23, 2026
610b505
Merge branch 'main' of https://github.com/inducer/loopy into fine-gra…
a-alveyblanc Apr 23, 2026
e8f3f0d
remove some irrelevant compute tests
a-alveyblanc Apr 23, 2026
eb11be9
add new example; infer temporal_inames, allow user to supply extra co…
a-alveyblanc Apr 29, 2026
92daf24
remove temporary compute impl
a-alveyblanc Apr 29, 2026
068a603
temporarily disable 3D diamond tiled example
a-alveyblanc Apr 29, 2026
196 changes: 196 additions & 0 deletions loopy/transform/compute.py
@@ -0,0 +1,196 @@
import islpy as isl

import loopy as lp
from loopy.kernel import LoopKernel
from loopy.kernel.data import AddressSpace
from loopy.kernel.function_interface import CallableKernel, ScalarCallable
from loopy.match import parse_stack_match
from loopy.symbolic import (
    RuleAwareSubstitutionMapper,
    SubstitutionRuleMappingContext,
    pw_aff_to_expr
)
from loopy.translation_unit import TranslationUnit

from pymbolic import var
from pymbolic.mapper.substitutor import make_subst_func

from pytools.tag import Tag


def compute(
Owner

It seems this could just be a @for_each_kernel decorator?

    t_unit: TranslationUnit,
    substitution: str,
    *args,

Check warning on line 24 in loopy/transform/compute.py (GitHub Actions / basedpyright):
Type annotation is missing for parameter "args" (reportMissingParameterType)
Type of parameter "args" is unknown (reportUnknownParameterType)
Owner

Just repeat everything from below. basedpyright will help ensure consistency between both copies.

    **kwargs

Check warning on line 25 in loopy/transform/compute.py (GitHub Actions / basedpyright):
Type annotation is missing for parameter "kwargs" (reportMissingParameterType)
Type of parameter "kwargs" is unknown (reportUnknownParameterType)
) -> TranslationUnit:
    """
    Entrypoint for performing a compute transformation on all kernels in a
    translation unit. See :func:`_compute_inner` for more details.
    """

    assert isinstance(t_unit, TranslationUnit)
    new_callables = {}

    for id, callable in t_unit.callables_table.items():
        if isinstance(callable, CallableKernel):
            kernel = _compute_inner(
                callable.subkernel,
                substitution,
                *args, **kwargs

Check warning on line 40 in loopy/transform/compute.py (GitHub Actions / basedpyright):
Argument type is unknown; argument corresponds to parameter "temporary_address_space" in function "_compute_inner" (reportUnknownArgumentType)
Argument type is unknown; argument corresponds to parameter "default_tag" in function "_compute_inner" (reportUnknownArgumentType)
Argument type is unknown; argument corresponds to parameter "storage_inames" in function "_compute_inner" (reportUnknownArgumentType)
Argument type is unknown; argument corresponds to parameter "compute_map" in function "_compute_inner" (reportUnknownArgumentType)
Argument type is unknown; argument corresponds to parameter "transform_map" in function "_compute_inner" (reportUnknownArgumentType)
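            )

            callable = callable.copy(subkernel=kernel)
        elif isinstance(callable, ScalarCallable):
            pass
        else:
            raise NotImplementedError()

        new_callables[id] = callable

    # Return the updated callables; returning t_unit unchanged would discard them.
    return t_unit.copy(callables_table=new_callables)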
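
Note on the `@for_each_kernel` suggestion above: the pattern the reviewer refers to can be sketched without loopy at all. The snippet below is only an illustration under stated assumptions. `ToyKernel`, `ToyUnit`, `for_each_toy_kernel`, and `mark_computed` are hypothetical stand-ins invented here, not loopy's classes or its actual decorator. The point is that the wrapper owns the loop over callables and returns a new unit, so the per-kernel function never has to.

```python
from dataclasses import dataclass, replace
from functools import wraps


@dataclass(frozen=True)
class ToyKernel:  # hypothetical stand-in for a LoopKernel
    name: str
    notes: tuple = ()


@dataclass(frozen=True)
class ToyUnit:  # hypothetical stand-in for a TranslationUnit
    callables: dict


def for_each_toy_kernel(transform):
    """Lift a per-kernel transform so that it operates on a whole unit."""
    @wraps(transform)
    def wrapper(unit, *args, **kwargs):
        new_callables = {
            name: transform(obj, *args, **kwargs)
            if isinstance(obj, ToyKernel) else obj  # non-kernels pass through
            for name, obj in unit.callables.items()
        }
        # Build a new unit from the transformed callables.
        return replace(unit, callables=new_callables)
    return wrapper


@for_each_toy_kernel
def mark_computed(kernel, substitution):
    # Per-kernel work only; no knowledge of the enclosing unit is needed.
    return replace(kernel, notes=kernel.notes + (substitution,))


unit = ToyUnit({"mm": ToyKernel("mm"), "sin": "scalar-callable"})
result = mark_computed(unit, "a_fetch")
```

With this shape, forgetting to hand back the updated callables is not possible in the per-kernel function, because the wrapper is the only place a unit is constructed.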

def _compute_inner(
Owner

IMO, this would've been an opportunity to try out namedisl, at least locally, to see how the interface "feels".

    kernel: LoopKernel,
    substitution: str,
    transform_map: isl.Map,
    compute_map: isl.Map,
    storage_inames: list[str],
Owner

What is storage_inames?

Contributor Author

Missed this the first time around. storage_inames corresponds to the inames that would be generated in something like a tiled matmul to fill shared memory with input tiles. Maybe storage_axes is a better name. This corresponds to the a in the (a, l) range of compute_map.

Owner

Prefer Sequence to list on input.

    default_tag: Tag | str | None = None,
    temporary_address_space: AddressSpace | None = None
) -> LoopKernel:
    """
    Inserts an instruction to compute an expression given by :arg:`substitution`
    and replaces all invocations of :arg:`substitution` with the result of the
Owner

all

Not really: only the relevant ones, where relevant should be defined below.

    compute instruction.

    :arg substitution: The substitution rule for which the compute
        transform should be applied.

    :arg transform_map: An :class:`isl.Map` representing the affine
        transformation from the original iname domain to the transformed iname
        domain.

    :arg compute_map: An :class:`isl.Map` representing a relation between
        substitution rule indices and tuples `(a, l)`, where `a` is a vector of
        storage indices and `l` is a vector of "timestamps".
Owner

How is the boundary of a and l determined?

Contributor Author

In its current form, it relies on user input to determine what a is (this is storage_inames).
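
To make the exchange above concrete, here is a small plain-Python sketch (no islpy) of how a user-supplied name list can decide which range dimensions count as `a` and which as `l`. Everything in it is an assumption for illustration: the 1-D tiling, the sizes `TILE` and `N`, the dimension names `i_store` and `i_tile`, and the helper `split_range` are hypothetical and do not come from the PR.

```python
TILE = 4  # assumed tile size, for illustration only
N = 10    # assumed extent of the substitution index


def compute_map(i):
    """Map a substitution index to named range dims (storage, time)."""
    return {"i_store": i % TILE, "i_tile": i // TILE}


def split_range(point, storage_inames):
    """Partition a range point into (a, l) using the user-supplied names."""
    a = tuple(v for k, v in point.items() if k in storage_inames)
    l = tuple(v for k, v in point.items() if k not in storage_inames)
    return a, l


pairs = [split_range(compute_map(i), ["i_store"]) for i in range(N)]

# Projecting out the time dims leaves the set of storage indices in use,
# which is the role played by the storage domain in the transform.
storage_footprint = sorted({a for a, _ in pairs})
```

Here the boundary between `a` and `l` is purely a matter of which names appear in `storage_inames`; nothing is inferred from the map itself.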

    """

    # Compare against None explicitly: AddressSpace.PRIVATE is an IntEnum
    # member equal to 0 and therefore falsy.
    if temporary_address_space is None:
        temporary_address_space = AddressSpace.GLOBAL

    # {{{ normalize names

    iname_to_storage_map = {
        iname: (iname + "_store" if iname in kernel.all_inames() else iname)
        for iname in storage_inames
    }

    new_storage_axes = list(iname_to_storage_map.values())

    for dim in range(compute_map.dim(isl.dim_type.out)):
        for iname, storage_ax in iname_to_storage_map.items():
            if compute_map.get_dim_name(isl.dim_type.out, dim) == iname:
                compute_map = compute_map.set_dim_name(
                    isl.dim_type.out, dim, storage_ax
                )

    # }}}

    # {{{ update kernel domain to contain storage inames

    storage_domain = compute_map.range().project_out_except(
        new_storage_axes, [isl.dim_type.set]
    )

    # FIXME: likely need to do some more digging to find proper domain to update
    new_domain = kernel.domains[0]
Owner

Use the DomainChanger.

    for ax in new_storage_axes:
        new_domain = new_domain.add_dims(isl.dim_type.set, 1)

        new_domain = new_domain.set_dim_name(
            isl.dim_type.set,
            new_domain.dim(isl.dim_type.set) - 1,
            ax
        )

    new_domain, storage_domain = isl.align_two(new_domain, storage_domain)
    new_domain = new_domain & storage_domain
    kernel = kernel.copy(domains=[new_domain])

    # }}}

    # {{{ express substitution inputs as pw affs of (storage, time) names

    compute_pw_aff = compute_map.reverse().as_pw_multi_aff()
Owner

What does .as_pw_multi_aff() do in this context? I've never used it.

Contributor Author
a-alveyblanc Nov 21, 2025

In this particular instance, it expresses substitution inputs as piecewise affine functions of (a, l). compute uses the output of the resulting PwMultiAff to determine the multidimensional index expressions of the RHS of a substitution rule.
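
The idea in this reply can be sketched in plain Python, without islpy. The snippet assumes a hypothetical 1-D compute map `i -> (a, l) = (i mod TILE, i // TILE)`; the tile size and the function names are made up for illustration. Reversing that map gives a single-valued function, so the substitution input can be written as one affine expression in `(a, l)`, which is what then gets substituted into the rule's right-hand side.

```python
TILE = 4  # assumed tile size, for illustration only


def forward(i):
    """Hypothetical compute map: index -> (storage index a, timestamp l)."""
    return i % TILE, i // TILE


def inverse(a, l):
    """Reversed map as a function: the index written affinely in (a, l)."""
    return TILE * l + a


# Every index is recovered from its (a, l) pair, so the reversed relation
# really is a function on this range.
round_trip = [inverse(*forward(i)) for i in range(12)]
```

If the reversed relation were not single-valued (several indices sharing one `(a, l)` pair), no such expression would exist, which is presumably why the conversion to a piecewise multi-affine function is the step that can fail.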

    subst_inp_names = [
        compute_map.get_dim_name(isl.dim_type.in_, i)
        for i in range(compute_map.dim(isl.dim_type.in_))
    ]
    # Output dim `dim` of the reversed map is substitution input `dim`;
    # build the mapping directly rather than pre-filling it with None.
    storage_ax_to_global_expr = {
        name: pw_aff_to_expr(compute_pw_aff.get_at(dim))
        for dim, name in enumerate(subst_inp_names)
    }

    # }}}

    # {{{ generate instruction from compute map

    rule_mapping_ctx = SubstitutionRuleMappingContext(
        kernel.substitutions, kernel.get_var_name_generator())

    expr_subst_map = RuleAwareSubstitutionMapper(
Owner

You'll need to subclass this guy. Otherwise you won't be able to decide whether the usage site is "in-footprint".

Contributor Author

My understanding was that RuleAwareSubstitutionMapper only mapped storage axes to index expressions in pymbolic.

Do you mean RuleInvocationReplacer? This explicitly checks footprints with ArrayToBufferMap and some other information computed earlier in precompute.

        rule_mapping_ctx,
        make_subst_func(storage_ax_to_global_expr),

Check failure on line 145 in loopy/transform/compute.py (GitHub Actions / basedpyright):
Argument of type "dict[str | None, Any | None]" cannot be assigned to parameter "variable_assignments" of type "CanGetitem[Any, Expression]" in function "make_subst_func"
  "dict[str | None, Any | None]" is incompatible with protocol "CanGetitem[Any, Expression]"
    "__getitem__" is an incompatible type
      Type "(key: str | None, /) -> (Any | None)" is not assignable to type "(key: _K_contra@CanGetitem, /) -> _V_co@CanGetitem"
        Function return type "Any | None" is incompatible with type "_V_co@CanGetitem"
          Type "Any | None" is not assignable to type "Expression" (reportArgumentType)

        within=parse_stack_match(None)
    )

    subst_expr = kernel.substitutions[substitution].expression
    compute_expression = expr_subst_map(subst_expr, kernel, None)

Check failure on line 150 in loopy/transform/compute.py (GitHub Actions / basedpyright):
Argument of type "None" cannot be assigned to parameter "insn" of type "InstructionBase" in function "__call__"
  "None" is not assignable to "InstructionBase" (reportArgumentType)

    temporary_name = substitution + "_temp"
    assignee = var(temporary_name)[tuple(
        var(iname) for iname in new_storage_axes
    )]

    compute_insn_id = substitution + "_compute"
    compute_insn = lp.Assignment(
        id=compute_insn_id,
        assignee=assignee,
        expression=compute_expression,
    )

    compute_dep_id = compute_insn_id

Check warning on line 164 in loopy/transform/compute.py (GitHub Actions / basedpyright):
Variable "compute_dep_id" is not accessed (reportUnusedVariable)

    new_insns = [compute_insn]

    # add global sync if we are storing in global memory
    if temporary_address_space == lp.AddressSpace.GLOBAL:
Owner

Why is global special-cased here? There's also sometimes a need to insert local barriers. And whether that's needed is a function of the dependency structure.

Contributor Author

Snagged this from existing precompute during development. This should have been taken out. I agree that we should rely on dependency checking for barrier insertion.

        gbarrier_id = kernel.make_unique_instruction_id(
            based_on=substitution + "_barrier"
        )

        from loopy.kernel.instruction import BarrierInstruction
        barrier_insn = BarrierInstruction(
            id=gbarrier_id,
            depends_on=frozenset([compute_insn_id]),
            synchronization_kind="global",
            mem_kind="global"
        )

        compute_dep_id = gbarrier_id

    # }}}

    # {{{ replace substitution rule with newly created instruction

    # FIXME: get these properly (see `precompute`)
    subst_name = substitution
    subst_tag = None
    within = None  # do we need this?

    # }}}

    return kernel