fix: Custom operator validate() must not invoke the lambda#2025

Merged
omri374 merged 4 commits into
microsoft:mainfrom
HammadSiddiqui:fix/custom-operator-validate-side-effect
May 20, 2026

@HammadSiddiqui
Contributor

Change Description

Custom.validate() was calling new_val("PII") to check the return type of the lambda. This causes a spurious invocation with a dummy string, which breaks stateful lambdas — e.g. anonymizers that build a token-to-original-value map for de-anonymization will insert a bogus {"TOKEN_1": "PII"} entry and shift all subsequent token counters by one.
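The corruption described above can be reproduced without the library. The following is a minimal sketch, assuming a hypothetical `Tokenizer` callable and an `old_style_validate` helper that stands in for the pre-fix probe; neither name comes from the Presidio source.

```python
class Tokenizer:
    """Hypothetical stateful replacement function: one numbered token per value."""

    def __init__(self):
        self.token_map = {}

    def __call__(self, value: str) -> str:
        token = f"TOKEN_{len(self.token_map) + 1}"
        self.token_map[token] = value
        return token


def old_style_validate(new_val) -> None:
    """Stand-in for the pre-fix behavior: probe the callable with a dummy string."""
    if not isinstance(new_val("PII"), str):
        raise ValueError("function return type must be a str")


tokenizer = Tokenizer()
old_style_validate(tokenizer)             # the probe call mutates the token map
first_real_token = tokenizer("Jane Doe")  # the first real value gets TOKEN_2

print(tokenizer.token_map)
# {'TOKEN_1': 'PII', 'TOKEN_2': 'Jane Doe'}
```

The first real value lands on `TOKEN_2` rather than `TOKEN_1`, which is the off-by-one shift the description refers to.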

Fix:

  • Remove the new_val("PII") probe call from validate() — only check callable(new_val)
  • Move the return-type contract (isinstance(result, str)) to operate(), where the lambda runs on real data

Tests added:

  • test_given_non_str_lambda_then_ipe_raised_at_operate_time — updated to reflect that the check now happens at operate-time
  • test_stateful_lambda_not_called_during_validate — regression: asserts validate() never calls the lambda
  • test_stateful_token_map_not_corrupted_by_validate — regression: asserts no spurious "PII" entry in token map after validate()
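The split between the two methods might look roughly like the sketch below. It is a simplified illustration, not the actual `custom.py` source: `CustomSketch` is a hypothetical class, the `"lambda"` parameter key is assumed, and `InvalidParamError` is redefined locally so the snippet runs on its own.

```python
from typing import Dict


class InvalidParamError(Exception):
    """Local stand-in for the library's error type (assumption for this sketch)."""


class CustomSketch:
    """Hypothetical simplification of the operator, for illustration only."""

    LAMBDA = "lambda"  # assumed parameter key

    def validate(self, params: Dict) -> None:
        new_val = params.get(self.LAMBDA)
        # Only check callability; the user's function is never invoked here.
        if not callable(new_val):
            raise InvalidParamError("New value must be a callable function")

    def operate(self, text: str, params: Dict) -> str:
        result = params[self.LAMBDA](text)
        # The return-type contract is enforced on real data.
        if not isinstance(result, str):
            raise InvalidParamError("Function return type must be a str")
        return result


op = CustomSketch()
params = {"lambda": lambda value: value[::-1]}
op.validate(params)
reversed_text = op.operate("abc", params)
print(reversed_text)  # cba
```

One consequence of this design is that a function with the wrong return type is reported later, on the first real value, rather than at configuration time.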

Issue reference

Fixes #2024

Checklist

  • I have reviewed the contribution guidelines
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

…vent stateful side effects

Custom.validate() was calling the user's lambda with the literal string 'PII'
to assert it returns a str. For stateful lambdas that build a token map for
de-anonymization, this inserted a spurious {TOKEN_1: 'PII'} entry and shifted
all token counters by one, silently corrupting results.

Fix: replace the probe call with a callable() check only. Return-type is
already enforced in operate() when the lambda runs on real data.

Also fix typo in class docstring: 'retrun' -> 'return'.

Fixes microsoft#2024
…t fix

- Add test_stateful_lambda_not_called_during_validate: asserts validate()
  does not invoke the lambda at all
- Add test_stateful_token_map_not_corrupted_by_validate: asserts token map
  contains only real values, no spurious 'PII' entry
- Update test_given_non_str_lambda: the return-type check now runs in
  operate(), so the test asserts at operate-time
- Move isinstance check to operate() so non-str return types are still caught
  on real data
@HammadSiddiqui
Contributor Author

@microsoft-github-policy-service agree

omri374 previously approved these changes May 18, 2026
@omri374 omri374 requested a review from Copilot May 18, 2026 12:58
@omri374
Collaborator

omri374 commented May 18, 2026

Thanks, good change. Note that you don't have to use the custom operator for non-traditional operations; you can create your own operator. See more here: https://microsoft.github.io/presidio/samples/python/pseudonymization/

Contributor

Copilot AI left a comment

Pull request overview

Fixes unintended side effects in the anonymizer’s Custom operator by ensuring validate() no longer executes the user-supplied lambda (previously probed with a dummy "PII" value), and instead enforces the lambda return-type contract at operate() time.

Changes:

  • Removed lambda invocation from Custom.validate(); it now only checks that the provided value is callable.
  • Added return-type enforcement (str) to Custom.operate(), raising InvalidParamError when violated.
  • Updated/added regression tests for stateful lambdas and updated the changelog entry.
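A regression check for the behavior summarized above could be sketched as follows. The `validate` and `operate` functions here are hypothetical stand-ins for the operator's methods (with `ValueError` substituting for `InvalidParamError`), and the test names are illustrative rather than copied from `test_custom.py`.

```python
from unittest.mock import Mock


def validate(params: dict) -> None:
    """Stand-in for the fixed validation: callability check only."""
    if not callable(params.get("lambda")):
        raise ValueError("New value must be a callable function")


def operate(text: str, params: dict) -> str:
    """Stand-in for the fixed operation: enforce the str contract on real data."""
    result = params["lambda"](text)
    if not isinstance(result, str):
        raise ValueError("Function return type must be a str")
    return result


def test_validate_does_not_invoke_callable():
    fn = Mock(return_value="TOKEN_1")
    validate({"lambda": fn})
    fn.assert_not_called()  # validation must leave the callable untouched


def test_non_str_return_raises_at_operate_time():
    params = {"lambda": lambda _: 42}
    validate(params)  # passes: the function is callable
    try:
        operate("Jane Doe", params)
    except ValueError:
        return
    raise AssertionError("expected the return-type check to fail in operate()")


test_validate_does_not_invoke_callable()
test_non_str_return_raises_at_operate_time()
```

Using a `Mock` keeps the assertion about call count explicit, which is the property the stateful-lambda regression depends on.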

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| presidio-anonymizer/presidio_anonymizer/operators/custom.py | Stops calling the custom lambda during validation and enforces return type during operation. |
| presidio-anonymizer/tests/operators/test_custom.py | Updates and expands tests to cover stateful lambda regressions and operate-time return-type enforcement. |
| CHANGELOG.md | Documents the behavioral fix for the Custom operator’s validation. |

Comment thread presidio-anonymizer/tests/operators/test_custom.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@HammadSiddiqui
Copy link
Copy Markdown
Contributor Author

Updated the error assertion based on Copilot's review.

@omri374 omri374 merged commit 722d18f into microsoft:main May 20, 2026
34 checks passed
Development

Successfully merging this pull request may close these issues.

Custom operator lambda called with dummy "PII" value during validation, causing side effects in stateful lambdas
