Use imagenet dataset from hf in conformance tests#3323

Draft
AlexanderDokuchaev wants to merge 4 commits into openvinotoolkit:develop from AlexanderDokuchaev:ad/hf_imagenet

@AlexanderDokuchaev

Collaborator

Changes

Add an action for PTQ conformance tests
Use the ImageNet dataset from Hugging Face (HF)
Update metrics as a result of the change in the calibration subset

Tests

@github-actions github-actions Bot added documentation Improvements or additions to documentation NNCF PTQ Pull requests that updates NNCF PTQ labels Mar 3, 2025
@AlexanderDokuchaev AlexanderDokuchaev marked this pull request as ready for review May 12, 2026 09:50
@AlexanderDokuchaev AlexanderDokuchaev requested a review from a team as a code owner May 12, 2026 09:50
Copilot AI review requested due to automatic review settings May 12, 2026 09:50
@AlexanderDokuchaev AlexanderDokuchaev marked this pull request as draft May 12, 2026 09:51

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the post-training quantization (PTQ) conformance tests to use the ImageNet-1k validation split from Hugging Face datasets instead of a locally prepared torchvision ImageFolder, and refactors the image-classification pipelines to share the calibration input transformation logic in the base class.

Changes:

  • Switched ImageNet data loading in image-classification PTQ pipelines from local filesystem (ImageFolder) to a Hugging Face dataset loader targeting validation.
  • Moved get_transform_calibration_fn logic from torchvision/timm pipelines into ImageClassificationBase.
  • Updated PTQ test invocation to no longer pass a data_dir value (now None) and adjusted documentation accordingly.
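
The second change above can be illustrated with a minimal sketch. The classes below are simplified, hypothetical stand-ins, not the PR's actual code: they assume the shared transform only needs to pull the model input out of a dict-based batch, while the real `get_transform_calibration_fn` may also perform backend-specific conversion.

```python
class ImageClassificationBase:
    """Hypothetical, simplified stand-in for the shared base pipeline."""

    def get_transform_calibration_fn(self):
        def transform_fn(data_item):
            # Assumption: batches are dicts with "image" and "label" keys.
            # Calibration only needs the model input, so the label is dropped.
            return data_item["image"]

        return transform_fn


class TorchvisionPipeline(ImageClassificationBase):
    """Stand-in for the torchvision pipeline: no per-pipeline override."""


class TimmPipeline(ImageClassificationBase):
    """Stand-in for the timm pipeline: no per-pipeline override."""


sample_batch = {"image": [0.1, 0.2, 0.3], "label": 5}
calibration_input = TimmPipeline().get_transform_calibration_fn()(sample_batch)
```

Keeping the transform in one place means a change in the batch format, such as the move to dict batches, is handled once rather than in each pipeline.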

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `tests/post_training/test_quantize_conformance.py` | Stops passing data_dir for PTQ runs (now HF-based). |
| `tests/post_training/README.md` | Updates data preparation guidance to HF datasets/token usage. |
| `tests/post_training/pipelines/image_classification_torchvision.py` | Removes per-pipeline calibration transform (now inherited). |
| `tests/post_training/pipelines/image_classification_timm.py` | Removes per-pipeline calibration transform (now inherited). |
| `tests/post_training/pipelines/image_classification_base.py` | Implements HF ImageNet-1k val loader and adapts calibration/validation to dict-based batches; adds shared calibration transform. |

Comments suppressed due to low confidence (1)

tests/post_training/pipelines/image_classification_base.py:163

  • After switching validation to the HF ImageNet dataset, val_loader yields dict batches with {"image": ..., "label": ...} (as handled in _validate_ov), but _validate_torch_compile still iterates as for i, (images, target) in enumerate(val_loader). With dict batches, this unpacks keys instead of tensors and will break FX-backend validation. Update _validate_torch_compile to read data["image"] / data["label"] (matching the new dataset format).
    def _validate(self) -> None:
        val_dataset = hf_imagenet_1k_val(self.transform)
        val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, num_workers=2, shuffle=False)

        dataset_size = len(val_loader)

        # Initialize result tensors for async inference support.
        predictions = np.zeros(dataset_size)
        references = -1 * np.ones(dataset_size)

        if self.backend in FX_BACKENDS:
            predictions, references = self._validate_torch_compile(val_loader, predictions, references)
        else:
            predictions, references = self._validate_ov(val_loader, predictions, references, dataset_size)
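
The unpacking problem described in the suppressed comment can be reproduced without the real dataset. The sketch below is illustrative only: `fake_val_loader` and `iterate_dict_batches` are hypothetical names, and plain lists stand in for tensors. It shows why tuple-style unpacking silently binds the dict keys, and how reading `data["image"]` / `data["label"]` avoids that.

```python
def fake_val_loader():
    """Hypothetical stand-in for a DataLoader that yields dict batches."""
    yield {"image": [[0.1, 0.2]], "label": [3]}
    yield {"image": [[0.3, 0.4]], "label": [7]}


# Tuple-style unpacking iterates over the dict, so it binds the keys
# ("image", "label") rather than the batch contents.
unpacked = [(images, target) for _, (images, target) in enumerate(fake_val_loader())]


def iterate_dict_batches(loader):
    """Yield (index, images, target) by reading the dict keys explicitly."""
    for i, data in enumerate(loader):
        yield i, data["image"], data["label"]


batches = list(iterate_dict_batches(fake_val_loader()))
```

With exactly two keys the tuple form does not even raise at the unpacking step; the failure only surfaces later, when the string `"image"` is passed to the model instead of a tensor.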

Comment thread tests/post_training/pipelines/image_classification_base.py
Comment on lines +26 to +32
Using datasets from huggingface, required set HF_TOKEN environment variable.
For using imagenet-1k need to sign licence https://huggingface.co/datasets/mlx-vision/imagenet-1k.

<data>/imagenet/val - name of path
Since Torchvision `ImageFolder` class is used to work with data the ImageNet validation dataset should be structured accordingly. Below is an example of the `val` folder:

```text
n01440764
n01695060
n01843383
...
```
> [!IMPORTANT]
> Used modified version of loader imagenet-1k to download only validation subset.
> To avoid any conflict with full dataset set another cache directory for this test.
> https://huggingface.co/docs/datasets/en/cache#cache-directory
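
One way to follow the note above is to set the environment before the test process imports `datasets`. This is a minimal sketch, not part of the PR: `configure_hf_env` and the cache location are hypothetical, while `HF_DATASETS_CACHE` and `HF_TOKEN` are the environment variables described in the Hugging Face documentation.

```python
import os
import tempfile
from pathlib import Path


def configure_hf_env(cache_dir, token=None):
    """Point Hugging Face `datasets` at a dedicated cache; optionally set the token.

    Intended to be called before `datasets` is imported, so that the library
    picks up the cache location from the environment.
    """
    cache_path = Path(cache_dir).expanduser().resolve()
    cache_path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_DATASETS_CACHE"] = str(cache_path)
    if token is not None:
        os.environ["HF_TOKEN"] = token
    return cache_path


# Hypothetical cache location used only for the conformance tests.
cache = configure_hf_env(Path(tempfile.gettempdir()) / "nncf_ptq_hf_cache")
```

Exporting the same two variables in the shell or CI job before running the tests has the same effect.
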
Comment thread tests/post_training/test_quantize_conformance.py