feat(metrics): add Perplexity metric to ignite.metrics.nlp by steaphenai · Pull Request #3743 · pytorch/ignite

steaphenai · 2026-04-20T09:41:36Z

Closes #3742

Summary

Add a new Perplexity metric implementation in ignite.metrics.nlp.perplexity.
Export Perplexity from both ignite.metrics.nlp and top-level ignite.metrics.
Add dedicated tests for correctness, accumulation behavior, reset behavior, return type, and invalid inputs.
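This sketch assumes the standard token-level definition, since the PR description does not state the formula. Under that definition, the metric is the exponential of the mean negative log-likelihood over all counted target tokens. Let $S_k$ be the summed NLL and $N_k$ the token count of batch $k$. Token-weighted accumulation then pools sums and counts before exponentiating:

```latex
\mathrm{PPL} = \exp\!\left(\frac{1}{N}\sum_{i=1}^{N} -\log p_\theta(y_i \mid x)\right)
\qquad
\mathrm{PPL}_{1..K} = \exp\!\left(\frac{\sum_{k=1}^{K} S_k}{\sum_{k=1}^{K} N_k}\right)
```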

Test plan

python -m pytest tests/ignite/metrics/nlp/test_perplexity.py -v
Smoke test:
- python -c "from ignite.metrics.nlp import Perplexity; import torch; ppl = Perplexity(); ppl.reset(); ppl.update((torch.randn(2,5,3), torch.randint(0,5,(2,3)))); print('PPL =', ppl.compute())"
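This is a quick sanity check for the smoke test's shapes. It assumes the `(2, 5, 3)` input is laid out as `(batch, vocab, seq_len)` logits with `(batch, seq_len)` class-index targets, which is the layout `torch.nn.functional.cross_entropy` accepts. Under that assumption, uniform logits should give a perplexity equal to the vocabulary size. The sketch uses plain PyTorch, not the `Perplexity` class itself.

```python
import math

import torch
import torch.nn.functional as F

# Assumed layout, matching the smoke test: logits (batch, vocab, seq_len), targets (batch, seq_len).
batch, vocab, seq_len = 2, 5, 3
logits = torch.zeros(batch, vocab, seq_len)  # all-zero logits = uniform distribution over the vocabulary
targets = torch.randint(0, vocab, (batch, seq_len))

# cross_entropy accepts (N, C, d1) logits with (N, d1) class-index targets.
mean_nll = F.cross_entropy(logits, targets, reduction="mean")
ppl = math.exp(mean_nll.item())  # uniform over 5 tokens, so PPL is 5 up to float32 rounding
```

The same check could be pointed at `Perplexity` by feeding it the zero logits and comparing `compute()` with the vocabulary size.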

Files changed

ignite/metrics/nlp/perplexity.py
ignite/metrics/nlp/__init__.py
ignite/metrics/__init__.py
tests/ignite/metrics/nlp/test_perplexity.py

Expose a new token-level Perplexity metric in ignite.metrics.nlp and top-level ignite.metrics, with dedicated unit tests to validate correctness and behavior.

Prathamesh8989 · 2026-04-20T16:19:16Z

Nice addition! Perplexity is definitely a useful metric for language modeling and it fits well under ignite.metrics.nlp.

The test coverage looks solid — especially the token-weighted accumulation test, which ensures correctness across batches with different sequence lengths.

One small suggestion: it might be useful to add a GPU test to ensure the metric behaves correctly when tensors are on CUDA devices, since many language modeling workloads run on GPU.

Something like:

import pytest
import torch

def test_gpu_support():
    if not torch.cuda.is_available():
        pytest.skip("CUDA is not available")

Overall the implementation and tests look clean and consistent with existing Ignite metrics.
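The sketch below illustrates why token weighting matters when sequence lengths differ, which is the point of the accumulation test praised above. It uses plain PyTorch and made-up batch sizes, lengths, and vocabulary, not the PR's test or class. Pooling summed NLL and token counts reproduces the perplexity of all tokens evaluated at once. Averaging per-batch perplexities instead weights each batch equally, regardless of how many tokens it contains.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 7
# Two made-up batches whose rows have different sequence lengths (4 vs 9 tokens).
batches = [
    (torch.randn(2, vocab, 4), torch.randint(0, vocab, (2, 4))),
    (torch.randn(3, vocab, 9), torch.randint(0, vocab, (3, 9))),
]

sum_nll, num_tokens, per_batch_ppl = 0.0, 0, []
for logits, targets in batches:
    nll = F.cross_entropy(logits, targets, reduction="sum").item()  # summed, not averaged
    n = targets.numel()
    sum_nll += nll
    num_tokens += n
    per_batch_ppl.append(math.exp(nll / n))

# Token-weighted: pool NLL and counts, then exponentiate once.
pooled_ppl = math.exp(sum_nll / num_tokens)

# Same quantity computed over all 35 tokens in a single call.
flat_logits = torch.cat([lg.permute(0, 2, 1).reshape(-1, vocab) for lg, _ in batches])
flat_targets = torch.cat([tg.reshape(-1) for _, tg in batches])
full_ppl = math.exp(F.cross_entropy(flat_logits, flat_targets).item())

# Naive alternative: every batch counts equally, whatever its token count.
naive_ppl = sum(per_batch_ppl) / len(per_batch_ppl)
```

`pooled_ppl` and `full_ppl` agree up to float32 rounding. `naive_ppl` generally does not, which is the failure mode a token-weighted accumulation test guards against.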

steaphenai · 2026-04-20T17:05:41Z

Good point, thanks. I’d like to keep this PR scoped to the Perplexity implementation and core correctness tests. We can add a dedicated CUDA test if maintainers want explicit GPU coverage.
@vfdev-5 thoughts?

vfdev-5

@steaphenai thanks for the PR, I made a quick pass and left a few comments.

The tests look shallow, and there is no reference implementation that we test against.
I suggest checking what we can use as a reference implementation.
For testing on accelerators, check other tests such as test_accuracy.py for inspiration.
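Here is one way to follow the accelerator-testing suggestion, sketched under assumptions. The PR later mentions `available_device` parametrization as in test_accuracy.py, but the helper below is a hypothetical stand-in, not Ignite's fixture. It runs the same computation on each available device and compares the result with a CPU run. It exercises plain PyTorch ops rather than the `Perplexity` class.

```python
import torch
import torch.nn.functional as F


def available_devices():
    # Hypothetical helper: CPU always, plus any accelerators present on this machine.
    devices = ["cpu"]
    if torch.cuda.is_available():
        devices.append("cuda")
    if torch.backends.mps.is_available():
        devices.append("mps")
    return devices


def check_perplexity_on(device):
    # Compute exp(mean NLL) on `device` and on CPU, then compare the two results.
    torch.manual_seed(0)
    logits = torch.randn(2, 5, 3)
    targets = torch.randint(0, 5, (2, 3))
    cpu_ppl = torch.exp(F.cross_entropy(logits, targets))
    dev_ppl = torch.exp(F.cross_entropy(logits.to(device), targets.to(device)))
    assert dev_ppl.device.type == torch.device(device).type  # result stays on the device
    assert torch.allclose(dev_ppl.cpu(), cpu_ppl, rtol=1e-5)
    return dev_ppl.item()
```

In a pytest module, `available_devices()` could feed `@pytest.mark.parametrize("device", ...)`. With that approach, CUDA and MPS cases simply do not appear on machines without them. The PR's local run reports those cases as skipped instead, which keeps them visible in the test report.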

steaphenai · 2026-04-21T10:30:18Z

Thanks for the quick review, @vfdev-5.
I addressed the two code comments in the latest push:
- detached tensors from the autograd graph in Perplexity.update()
- removed test_returns_float
I’ll also use existing metric tests (e.g., test_accuracy.py) as a reference for accelerator-oriented test patterns as needed.
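The reason for detaching in `update()` can be shown with plain PyTorch. The tensors and names below are illustrative, not the PR's code. An accumulator that adds a loss which still requires grad becomes part of the autograd graph itself. Adding `.detach()`ed values instead keeps it a plain tensor.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, 6, requires_grad=True)  # stands in for model output during training
targets = torch.randint(0, 10, (4, 6))
batch_nll = F.cross_entropy(logits, targets, reduction="sum")

# Without detach: the running sum gets a grad_fn and keeps a reference to this batch's graph.
leaky_sum = torch.tensor(0.0)
leaky_sum = leaky_sum + batch_nll

# With detach: the running sum stays a graph-free tensor across batches.
clean_sum = torch.tensor(0.0)
clean_sum += batch_nll.detach()
```

Wrapping the accumulation in `torch.no_grad()` achieves the same effect. `.detach()` keeps the intent local to the values being accumulated.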

steaphenai · 2026-04-21T11:31:09Z

I added an explicit reference implementation check (_reference_perplexity) and validated both single-batch and multi-batch token-weighted accumulation against it. I also aligned the test structure with existing Ignite metric patterns (as in test_accuracy.py): available_device parametrization, device assertions, and a distributed-marked test layout.
Local run: python -m pytest tests/ignite/metrics/nlp/test_perplexity.py -m "not distributed" -v -> 10 passed, 12 skipped (CUDA/MPS skipped when unavailable).
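A reference check of this kind could look like the sketch below. `reference_perplexity` is a hypothetical helper, not the PR's `_reference_perplexity`. It computes per-token log-probabilities via `log_softmax` and `gather` in float64, a code path independent of `cross_entropy`. It then compares the result with the exponential of the mean cross-entropy.

```python
import torch
import torch.nn.functional as F


def reference_perplexity(logits, targets):
    """Hypothetical reference: exp of mean token NLL via log_softmax + gather in float64.

    logits: (batch, vocab, seq_len); targets: (batch, seq_len) class indices.
    """
    log_probs = torch.log_softmax(logits.double(), dim=1)  # normalize over the vocab dim
    token_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # (batch, seq_len)
    return torch.exp(-token_log_probs.mean()).item()


torch.manual_seed(0)
logits = torch.randn(2, 5, 3)
targets = torch.randint(0, 5, (2, 3))

expected = reference_perplexity(logits, targets)
actual = torch.exp(F.cross_entropy(logits, targets)).item()  # what a metric-style computation yields
```

For a multi-batch check, the reference can be applied to the batches concatenated along the batch dimension, provided their sequence lengths match.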

vfdev-5 · 2026-04-23T15:29:39Z

@steaphenai code style check is failing: https://github.com/pytorch/ignite/actions/runs/24830927009/job/72725024776?pr=3743

steaphenai · 2026-04-23T16:50:04Z

@vfdev-5 Could you approve the workflows to run? The required checks are pending approval.

vfdev-5 · 2026-04-24T15:41:27Z

@steaphenai this failure is real: https://github.com/pytorch/ignite/actions/runs/24847148226/job/72879291595?pr=3743
Check how other metrics handle the double dtype.

steaphenai · 2026-04-24T21:09:07Z

@vfdev-5 I checked other metrics in ignite.metrics.nlp. I found that BLEU and ROUGE don't force dtype=torch.double on their accumulators. I've removed the explicit dtype=torch.double and dtype=torch.long from _sum_of_nll and _num_tokens in reset(), and removed dtype=torch.double from the .to() call in update() to match that pattern and fix MPS compatibility.
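Assuming the accumulators are created from Python floats, removing the explicit dtypes leaves the NLL sum at PyTorch's default dtype (normally float32) on every device. If float64 accumulation is still wanted where it is supported, a device-dependent choice is one option. The helpers below are a hypothetical sketch, not an Ignite API. They rely only on the fact that the MPS backend does not support float64.

```python
import torch


def accumulator_dtype(device):
    """Hypothetical helper: float64 where supported, float32 on MPS (no float64 support)."""
    return torch.float32 if torch.device(device).type == "mps" else torch.float64


def make_accumulators(device="cpu"):
    # Summed NLL uses the device-dependent float type; token counts stay integral.
    sum_of_nll = torch.zeros((), dtype=accumulator_dtype(device), device=device)
    num_tokens = torch.zeros((), dtype=torch.long, device=device)
    return sum_of_nll, num_tokens
```

The trade-off is one extra branch in `reset()` in exchange for double-precision sums on CPU and CUDA, where long evaluations accumulate many small NLL terms.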

TahaZahid05 · 2026-05-22T16:11:59Z

Hi @steaphenai! Are you still working on this? If you are facing any issues, let us know!

steaphenai · 2026-05-23T04:04:41Z

Hi @TahaZahid05! Yes, I'm still working on this. It needs a workflow approval to get the required checks running. Thanks!

TahaZahid05 · 2026-05-25T09:41:54Z

@steaphenai the docs build failures are real. CI for unit tests is currently broken; are the tests passing locally for you?

Copilot

Pull request overview

This PR adds a new Perplexity NLP metric to ignite.metrics.nlp, exposes it via public imports (ignite.metrics.nlp and ignite.metrics), adds test coverage (including distributed integration), and updates the metrics documentation list.

Changes:

Implement ignite.metrics.nlp.Perplexity with token-weighted NLL accumulation and distributed reduction support.
Export Perplexity from ignite.metrics.nlp and top-level ignite.metrics.
Add unit + distributed tests and include the metric in the docs metrics list.
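The following is a hypothetical stand-in that makes the accumulation scheme concrete. It mirrors a metric's reset/update/compute cycle in plain PyTorch. It is not the PR's `Perplexity`: it keeps Python floats instead of device tensors and has no distributed sync. In Ignite, a metric would typically keep tensor accumulators and decorate `compute()` with `sync_all_reduce`, so that sums and counts are added across ranks before exponentiating. The `ignore_index` handling follows the parameter added later in this PR, with a default of -100 assumed here to match `cross_entropy`.

```python
import math

import torch
import torch.nn.functional as F


class PerplexitySketch:
    """Hypothetical stand-in for a Perplexity metric's reset/update/compute cycle."""

    def __init__(self, ignore_index=-100):  # -100 is cross_entropy's default; assumed here
        self.ignore_index = ignore_index
        self.reset()

    def reset(self):
        self._sum_of_nll = 0.0
        self._num_tokens = 0

    def update(self, output):
        # output = (logits of shape (batch, vocab, seq_len), targets of shape (batch, seq_len))
        logits, targets = output[0].detach(), output[1].detach()
        # Summed NLL over non-ignored tokens, so batches are weighted by token count.
        nll = F.cross_entropy(logits, targets, ignore_index=self.ignore_index, reduction="sum")
        self._sum_of_nll += nll.item()
        self._num_tokens += int((targets != self.ignore_index).sum().item())

    def compute(self):
        if self._num_tokens == 0:
            # Ignite metrics conventionally raise NotComputableError in this situation.
            raise RuntimeError("Perplexity needs at least one non-ignored token.")
        return math.exp(self._sum_of_nll / self._num_tokens)


torch.manual_seed(0)
metric = PerplexitySketch(ignore_index=0)
logits = torch.randn(2, 5, 3)
targets = torch.tensor([[1, 2, 0], [3, 0, 0]])  # 0 marks padding in this example
metric.update((logits, targets))
print(metric.compute())
```

Only the three non-padding tokens contribute to either the numerator or the count. That matches what `cross_entropy(..., ignore_index=0)` averages over with its default `"mean"` reduction.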

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File	Description
ignite/metrics/nlp/perplexity.py	Adds the new `Perplexity` metric implementation and its public API/docs.
ignite/metrics/nlp/__init__.py	Re-exports `Perplexity` from the NLP metrics namespace.
ignite/metrics/__init__.py	Re-exports `Perplexity` from the top-level metrics namespace.
tests/ignite/metrics/nlp/test_perplexity.py	Adds correctness, accumulation/reset, and distributed integration tests for `Perplexity`.
docs/source/metrics.rst	Adds `Perplexity` to the published metrics list.

TahaZahid05

@steaphenai some small changes are required. You can ignore Copilot's review, as I have already incorporated its comments into my review. Thanks!

TahaZahid05 · 2026-06-13T10:47:38Z

@steaphenai thanks for the updates! All looks good, just apply the above suggestions and then we can merge.

TahaZahid05 · 2026-06-13T12:28:33Z

@steaphenai code style checks are failing. Kindly follow the steps given in CONTRIBUTING.md.

github-actions bot added the "module: metrics" label Apr 20, 2026

feat(metrics): add Perplexity metric to NLP metrics

fa394ca

steaphenai force-pushed the feat/perplexity-metric-pr branch from 8453d4e to fa394ca April 20, 2026 10:33

vfdev-5 reviewed Apr 21, 2026

Comment thread ignite/metrics/nlp/perplexity.py

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated

fix(metrics): detach Perplexity accumulators and refine tests

e1e1e5f

test(metrics): align Perplexity tests with metric patterns

4335359

vfdev-5 reviewed Apr 21, 2026

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated

vfdev-5 marked this pull request as draft April 21, 2026 12:37

fix(metrics): address Perplexity review follow-ups

d5ab433

steaphenai marked this pull request as ready for review April 21, 2026 14:25

vfdev-5 marked this pull request as draft April 21, 2026 15:38

vfdev-5 reviewed Apr 21, 2026

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated

test(metrics): use _reference_perplexity in token-weighted accumulation test

dae37e9

steaphenai marked this pull request as ready for review April 22, 2026 06:27

vfdev-5 reviewed Apr 23, 2026

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated

Update tests/ignite/metrics/nlp/test_perplexity.py

f9ecaa1

Co-authored-by: vfdev <vfdev.5@gmail.com>

vfdev-5 reviewed Apr 23, 2026

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated

vfdev-5 reviewed Apr 23, 2026

Comment thread ignite/metrics/nlp/__init__.py

vfdev-5 reviewed Apr 23, 2026

Comment thread ignite/metrics/nlp/perplexity.py

feat(metrics): add ignore_index to Perplexity, expose in docs, remove trivial test

e5c0cfd

github-actions bot added the docs label Apr 23, 2026

Merge branch 'master' into feat/perplexity-metric-pr

46a3b8d

steaphenai added 3 commits April 23, 2026 21:15

style: fix ruff formatting in test_perplexity.py

fa8fb7f

Merge branch 'feat/perplexity-metric-pr' of https://github.com/steaphenai/ignite into feat/perplexity-metric-pr

49ccdc4

fix(tests): fix token weighted accumulation test with different seq lengths

c7a3720

fix(tests): use _reference_perplexity and matching seq lengths in accumulation test

3143650

fix(metrics): remove explicit double dtype from Perplexity accumulators for MPS compatibility

b514e12

Merge branch 'master' into feat/perplexity-metric-pr

832d3c2

aaishwarymishra and others added 2 commits June 6, 2026 01:35

Merge branch 'master' into feat/perplexity-metric-pr

1d6ba0b

Merge branch 'master' into feat/perplexity-metric-pr

4aa534d

TahaZahid05 requested a review from Copilot June 10, 2026 15:01

Copilot started reviewing on behalf of TahaZahid05 June 10, 2026 15:01

Copilot AI reviewed Jun 10, 2026

Comment thread ignite/metrics/nlp/perplexity.py Outdated

Comment thread ignite/metrics/nlp/perplexity.py

Comment thread ignite/metrics/nlp/perplexity.py Outdated

Comment thread ignite/metrics/nlp/perplexity.py Outdated

TahaZahid05 reviewed Jun 10, 2026

Comment thread ignite/metrics/nlp/perplexity.py Outdated

Comment thread ignite/metrics/nlp/perplexity.py

Comment thread ignite/metrics/nlp/perplexity.py Outdated

Comment thread ignite/metrics/nlp/perplexity.py Outdated

steaphenai and others added 2 commits June 12, 2026 21:55

Merge branch 'master' into feat/perplexity-metric-pr

494d282

fix(metrics): address Perplexity PR review feedback

c8202ac

Co-authored-by: Cursor <cursoragent@cursor.com>

TahaZahid05 reviewed Jun 13, 2026

Comment thread ignite/metrics/nlp/perplexity.py Outdated

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated

steaphenai and others added 2 commits June 13, 2026 16:18

Update ignite/metrics/nlp/perplexity.py

d191f11

Co-authored-by: Taha Zahid <156289245+TahaZahid05@users.noreply.github.com>

Update tests/ignite/metrics/nlp/test_perplexity.py

901e50b

Co-authored-by: Taha Zahid <156289245+TahaZahid05@users.noreply.github.com>

6 participants
