ggml-hexagon: HAP_power_set_HMX uses &ctx instead of ctx in htp_iface_open(), causing large HMX GEMM slowdown

### What happened

In `ggml-hexagon`, `htp_iface_open()` powers up HMX with the wrong client/context pointer.

File:
- `src/ggml-hexagon/htp/main.c`

Current code on commit `35ae589fa189a3682a1fe25b7803122680c401b4`:

```c
request.type         = HAP_power_set_HMX;
request.hmx.power_up = TRUE;
err = HAP_power_set((void *) &ctx, &request);
```

That passes `&ctx` (address of the local pointer variable) instead of `ctx` (the actual `struct htp_context *`).
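To see why the two pointer values cannot identify the same client, here is a minimal, self-contained sketch. It does not use the Hexagon SDK: `htp_context_demo`, `mock_power_set`, and the client table are hypothetical stand-ins, and the sketch assumes the power API keys each vote by the opaque pointer value it receives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT the Hexagon SDK types or API. */
struct htp_context_demo { int id; };

#define MAX_CLIENTS 4

struct power_client {
    void *key;   /* opaque pointer value that identifies the client */
    int   votes; /* number of requests recorded under that key */
};

static struct power_client clients[MAX_CLIENTS];
static size_t n_clients;

/* Mock power call: records a vote under whatever pointer value it is given. */
static int mock_power_set(void *client_key) {
    for (size_t i = 0; i < n_clients; i++) {
        if (clients[i].key == client_key) {
            clients[i].votes++;
            return 0;
        }
    }
    if (n_clients == MAX_CLIENTS) {
        return -1;
    }
    clients[n_clients].key   = client_key;
    clients[n_clients].votes = 1;
    n_clients++;
    return 0;
}

/* Same shape as the bug: two votes keyed by ctx, one keyed by &ctx.
 * Returns the number of distinct clients the mock ends up tracking. */
static size_t open_with_bug(struct htp_context_demo *ctx) {
    n_clients = 0;
    mock_power_set((void *) ctx);
    mock_power_set((void *) ctx);
    mock_power_set((void *) &ctx); /* address of the local pointer variable */
    return n_clients;
}

/* Fixed version: every vote is keyed by the same pointer value. */
static size_t open_fixed(struct htp_context_demo *ctx) {
    n_clients = 0;
    mock_power_set((void *) ctx);
    mock_power_set((void *) ctx);
    mock_power_set((void *) ctx);
    return n_clients;
}
```

In this mock, `open_with_bug()` ends up with two distinct clients and `open_fixed()` with one. The `(void *)` cast is what lets the `struct htp_context_demo **` slip through without a compiler diagnostic.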

The rest of the power votes in the same function use `ctx` correctly:

```c
HAP_power_set((void *) ctx, &request)
```

The one-line fix is:

```c
err = HAP_power_set((void *) ctx, &request);
```

### Why this matters

This is not a cosmetic bug: on device, it causes a large performance regression in `ggml-hexagon` HMX GEMM.

With only this pointer fixed, the HMX `core` segment drops from ~65 ms to ~22 ms (roughly 3x faster) on the same workload.

### Reproduction

Repo / commit:
- `ggml-org/ggml`
- `35ae589fa189a3682a1fe25b7803122680c401b4`

Command used:

```bash
GGML_HEXAGON_ARCH=79 GGML_HEXAGON_PROFILE=1 \
./test-backend-ops perf -o MUL_MAT -b HTP0 \
  -p "type_a=q8_0,type_b=f32,m=4096,n=12288,k=4096"
```

and:

```bash
GGML_HEXAGON_ARCH=79 GGML_HEXAGON_PROFILE=1 \
./test-backend-ops perf -o MUL_MAT -b HTP0 \
  -p "type_a=q4_0,type_b=f32,m=4096,n=12288,k=4096"
```

### Measured before / after

`q8_0`, `m=4096, n=12288, k=4096`

Before fix:
- `dequant ~= 777xx us`
- `core ~= 648xx us`

After changing only `HAP_power_set((void *)&ctx, ...)` -> `HAP_power_set((void *)ctx, ...)`:
- `dequant ~= 776xx us`
- `core ~= 2205x us`

`q4_0`, `m=4096, n=12288, k=4096`

Before fix:
- `dequant ~= 643xx us`
- `core ~= 653xx us`

After fix:
- `dequant ~= 595xx us`
- `core ~= 2218x us`

### Suggested fix

Change this line in `src/ggml-hexagon/htp/main.c`:

```c
err = HAP_power_set((void *) &ctx, &request);
```

to:

```c
err = HAP_power_set((void *) ctx, &request);
```
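As an optional hardening idea beyond the one-line fix, a typed wrapper would let the compiler catch this class of mistake. The sketch below is self-contained and uses hypothetical names throughout (`htp_context_demo`, `power_request_demo`, `mock_power_set_req`, `power_set_ctx`); none of them exist in the SDK or in `ggml-hexagon`.

```c
#include <assert.h>

/* Hypothetical stand-ins: NOT the Hexagon SDK types or API. */
struct htp_context_demo   { int id; };
struct power_request_demo { int power_up; };

static void *last_client; /* last client key seen by the mock */

/* Mock of a power call that accepts an opaque client pointer. */
static int mock_power_set_req(void *client, struct power_request_demo *req) {
    (void) req;
    last_client = client;
    return 0;
}

/* Typed wrapper: the parameter is a struct pointer, not void *.
 * Passing &ctx (a struct htp_context_demo **) here is an incompatible
 * pointer type, which compilers diagnose (warning or error depending
 * on the compiler and flags) instead of accepting it silently. */
static int power_set_ctx(struct htp_context_demo *ctx,
                         struct power_request_demo *req) {
    return mock_power_set_req(ctx, req);
}
```

With a wrapper like this, call sites pass `ctx` directly with no cast, so the `&ctx` typo no longer compiles cleanly.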

### Notes

I ruled out a few unrelated explanations before isolating this:
- HMX lock mode (`lock` vs `lock2(shared)`) was not the cause.
- Chunk/layout changes were not the cause.
- The same workload still ran and produced numerical results with the bug present; it showed up only as a large performance drop.

This issue is specifically about the wrong context pointer passed to `HAP_power_set()` for the `HAP_power_set_HMX` request.

