Skip to content

feat(metrics): add Perplexity metric to ignite.metrics.nlp#3743

Open
steaphenai wants to merge 20 commits into
pytorch:masterfrom
steaphenai:feat/perplexity-metric-pr
Open

feat(metrics): add Perplexity metric to ignite.metrics.nlp#3743
steaphenai wants to merge 20 commits into
pytorch:masterfrom
steaphenai:feat/perplexity-metric-pr

Conversation

@steaphenai

Copy link
Copy Markdown
Contributor

Closes #3742

Summary

  • Add a new Perplexity metric implementation in ignite.metrics.nlp.perplexity.
  • Export Perplexity from both ignite.metrics.nlp and top-level ignite.metrics.
  • Add dedicated tests for correctness, accumulation behavior, reset behavior, return type, and invalid inputs.

Test plan

  • python -m pytest tests/ignite/metrics/nlp/test_perplexity.py -v
  • Smoke test:
    • python -c "from ignite.metrics.nlp import Perplexity; import torch; ppl = Perplexity(); ppl.reset(); ppl.update((torch.randn(2,5,3), torch.randint(0,5,(2,3)))); print('PPL =', ppl.compute())"

Files changed

  • ignite/metrics/nlp/perplexity.py
  • ignite/metrics/nlp/__init__.py
  • ignite/metrics/__init__.py
  • tests/ignite/metrics/nlp/test_perplexity.py
image

@github-actions github-actions Bot added the module: metrics Metrics module label Apr 20, 2026
Expose a new token-level Perplexity metric in ignite.metrics.nlp and top-level ignite.metrics, with dedicated unit tests to validate correctness and behavior.
@steaphenai steaphenai force-pushed the feat/perplexity-metric-pr branch from 8453d4e to fa394ca Compare April 20, 2026 10:33
@Prathamesh8989

Copy link
Copy Markdown
Contributor

Nice addition! Perplexity is definitely a useful metric for language modeling and it fits well under ignite.metrics.nlp.

The test coverage looks solid — especially the token-weighted accumulation test, which ensures correctness across batches with different sequence lengths.

One small suggestion: it might be useful to add a GPU test to ensure the metric behaves correctly when tensors are on CUDA devices, since many language modeling workloads run on GPU.

Something like:

def test_gpu_support():
    if not torch.cuda.is_available():
        pytest.skip()

Overall the implementation and tests look clean and consistent with existing Ignite metrics.

@steaphenai

Copy link
Copy Markdown
Contributor Author

Good point, thanks. I’d like to keep this PR scoped to the Perplexity implementation and core correctness tests. We can add a dedicated CUDA test if maintainers want explicit GPU coverage.
@vfdev-5 thoughts?

@vfdev-5 vfdev-5 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steaphenai thanks for the PR, I made a quick pass and left few comments.

The tests look shallow and there is no reference implementation that we test against.
I suggest to check what we can use for reference implementation.
In terms of testing on accelerators check other tests like test_accuracy.py to inspire from.

Comment thread ignite/metrics/nlp/perplexity.py
Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated
@steaphenai

steaphenai commented Apr 21, 2026

Copy link
Copy Markdown
Contributor Author

@steaphenai thanks for the PR, I made a quick pass and left few comments.

The tests look shallow and there is no reference implementation that we test against. I suggest to check what we can use for reference implementation. In terms of testing on accelerators check other tests like test_accuracy.py to inspire from.

Thanks for the quick review, @vfdev-5.
I addressed the two code comments in the latest push:
detached tensors from the grad graph in Perplexity.update()
removed test_returns_float
I’ll also use existing metric tests (e.g., test_accuracy.py) as reference for accelerator-oriented test patterns as needed.

@steaphenai

steaphenai commented Apr 21, 2026

Copy link
Copy Markdown
Contributor Author

I added an explicit reference implementation check (_reference_perplexity) and validated both single-batch and multi-batch token-weighted accumulation against it and also I aligned test structure with existing Ignite metric patterns (as in test_accuracy.py): available_device parametrization, device assertions, and distributed-marked test layout.
Local run: python -m pytest tests/ignite/metrics/nlp/test_perplexity.py -m "not distributed" -v -> 10 passed, 12 skipped (CUDA/MPS skipped when unavailable).

image

Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated
@vfdev-5 vfdev-5 marked this pull request as draft April 21, 2026 12:37
@steaphenai steaphenai marked this pull request as ready for review April 21, 2026 14:25
@vfdev-5 vfdev-5 marked this pull request as draft April 21, 2026 15:38
Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated
@steaphenai steaphenai marked this pull request as ready for review April 22, 2026 06:27
Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated
Co-authored-by: vfdev <vfdev.5@gmail.com>
Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated
Comment thread ignite/metrics/nlp/__init__.py
Comment thread ignite/metrics/nlp/perplexity.py
@github-actions github-actions Bot added the docs label Apr 23, 2026
@vfdev-5

vfdev-5 commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator

@steaphenai

Copy link
Copy Markdown
Contributor Author

@vfdev-5 Could you approve the workflows to run? The required checks are pending approval.

@vfdev-5

vfdev-5 commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator

@steaphenai this failure is real: https://github.com/pytorch/ignite/actions/runs/24847148226/job/72879291595?pr=3743
Check other metrics how do they handle double dtype

@steaphenai

Copy link
Copy Markdown
Contributor Author

@vfdev-5 I checked other metrics in ignite.metrics.nlp. I found that BLEU and ROUGE don't force dtype=torch.double on their accumulators. I've removed the explicit dtype=torch.double and dtype=torch.long from _sum_of_nll and _num_tokens in reset(), and removed dtype=torch.double from the .to() call in update() to match that pattern and fix MPS compatibility.

@TahaZahid05

Copy link
Copy Markdown
Collaborator

hi @steaphenai ! are you still working on this? if you are facing any issue, let us know!

@steaphenai

Copy link
Copy Markdown
Contributor Author

Hi @TahaZahid05 ! Yes, I'm still working on this. This needs a workflow approval to get the required checks running. Thanks!

@TahaZahid05

TahaZahid05 commented May 25, 2026

Copy link
Copy Markdown
Collaborator

@steaphenai docs build failures are real. CI for unit tests are currently broken, are they passing locally for you?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new Perplexity NLP metric to ignite.metrics.nlp, exposes it via public imports (ignite.metrics.nlp and ignite.metrics), adds test coverage (including distributed integration), and updates the metrics documentation list.

Changes:

  • Implement ignite.metrics.nlp.Perplexity with token-weighted NLL accumulation and distributed reduction support.
  • Export Perplexity from ignite.metrics.nlp and top-level ignite.metrics.
  • Add unit + distributed tests and include the metric in the docs metrics list.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ignite/metrics/nlp/perplexity.py Adds the new Perplexity metric implementation and its public API/docs.
ignite/metrics/nlp/init.py Re-exports Perplexity from the NLP metrics namespace.
ignite/metrics/init.py Re-exports Perplexity from the top-level metrics namespace.
tests/ignite/metrics/nlp/test_perplexity.py Adds correctness, accumulation/reset, and distributed integration tests for Perplexity.
docs/source/metrics.rst Adds Perplexity to the published metrics list.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ignite/metrics/nlp/perplexity.py Outdated
Comment thread ignite/metrics/nlp/perplexity.py
Comment thread ignite/metrics/nlp/perplexity.py Outdated
Comment thread ignite/metrics/nlp/perplexity.py Outdated

@TahaZahid05 TahaZahid05 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steaphenai some small changes required. You can ignore copilot's review as i have already incorporated them in my review. thanks!

Comment thread ignite/metrics/nlp/perplexity.py Outdated
Comment thread ignite/metrics/nlp/perplexity.py
Comment thread ignite/metrics/nlp/perplexity.py Outdated
Comment thread ignite/metrics/nlp/perplexity.py Outdated
Comment thread ignite/metrics/nlp/perplexity.py Outdated
Comment thread tests/ignite/metrics/nlp/test_perplexity.py Outdated
@TahaZahid05

Copy link
Copy Markdown
Collaborator

@steaphenai thanks for the updates! All looks good, just apply the above suggestions and then we can merge.

steaphenai and others added 2 commits June 13, 2026 16:18
Co-authored-by: Taha Zahid <156289245+TahaZahid05@users.noreply.github.com>
Co-authored-by: Taha Zahid <156289245+TahaZahid05@users.noreply.github.com>
@TahaZahid05

Copy link
Copy Markdown
Collaborator

@steaphenai code style checks are failing. Kindly follow steps given in CONTRIBUTING.MD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature : Add Perplexity metric to ignite.metrics.nlp

6 participants