fix: Custom operator validate() must not invoke the lambda #2025
Merged
omri374 merged 4 commits on May 20, 2026
…vent stateful side effects
Custom.validate() was calling the user's lambda with the literal string 'PII'
to assert it returns a str. For stateful lambdas that build a token map for
de-anonymization, this inserted a spurious {TOKEN_1: 'PII'} entry and shifted
all token counters by one, silently corrupting results.
Fix: replace the probe call with a callable() check only. Return-type is
already enforced in operate() when the lambda runs on real data.
Also fix typo in class docstring: 'retrun' -> 'return'.
Fixes microsoft#2024
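
The failure mode described in this commit message can be reproduced without the library. The following is a minimal sketch, not the Presidio code: `make_tokenizer`, `validate_with_probe`, and `validate_callable_only` are hypothetical names that imitate the old and new validation behavior as described above.

```python
from typing import Callable, Dict


def make_tokenizer(token_map: Dict[str, str]) -> Callable[[str], str]:
    """Build a stateful replacement function that records token -> original value."""

    def replace(original: str) -> str:
        token = f"TOKEN_{len(token_map) + 1}"
        token_map[token] = original
        return token

    return replace


def validate_with_probe(fn: Callable[[str], str]) -> None:
    """Imitates the old behavior: probe the function with the dummy string 'PII'."""
    if not callable(fn):
        raise ValueError("replacement must be callable")
    if not isinstance(fn("PII"), str):  # side effect: the probe mutates fn's state
        raise ValueError("replacement must return a str")


def validate_callable_only(fn: Callable[[str], str]) -> None:
    """Imitates the fixed behavior: check callability, never invoke the function."""
    if not callable(fn):
        raise ValueError("replacement must be callable")


# Old behavior: the probe is recorded as if it were real data.
probed_map: Dict[str, str] = {}
probed = make_tokenizer(probed_map)
validate_with_probe(probed)
probed("Alice")
print(probed_map)  # {'TOKEN_1': 'PII', 'TOKEN_2': 'Alice'}

# Fixed behavior: only real values reach the token map.
clean_map: Dict[str, str] = {}
clean = make_tokenizer(clean_map)
validate_callable_only(clean)
clean("Alice")
print(clean_map)  # {'TOKEN_1': 'Alice'}
```

In the probed case the real value lands on `TOKEN_2` instead of `TOKEN_1`, which is the one-off counter shift the message refers to.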
…t fix
- Add test_stateful_lambda_not_called_during_validate: asserts validate() does not invoke the lambda at all
- Add test_stateful_token_map_not_corrupted_by_validate: asserts token map contains only real values, no spurious 'PII' entry
- Update test_given_non_str_lambda: return-type check now in operate(), move validation to operate() to enforce contract on real data
- Move isinstance check to operate() so non-str return types are still caught
Contributor
Author
@microsoft-github-policy-service agree
omri374
previously approved these changes
May 18, 2026
Collaborator
Thanks, good change. Note that you don't have to use the custom operator for non-traditional operations; you can create your own operator. See more here: https://microsoft.github.io/presidio/samples/python/pseudonymization/
Contributor
Pull request overview
Fixes unintended side effects in the anonymizer’s Custom operator by ensuring validate() no longer executes the user-supplied lambda (previously probed with a dummy "PII" value), and instead enforces the lambda return-type contract at operate() time.
Changes:
- Removed lambda invocation from `Custom.validate()`; it now only checks that the provided value is callable.
- Added return-type enforcement (`str`) to `Custom.operate()`, raising `InvalidParamError` when violated.
- Updated/added regression tests for stateful lambdas and updated the changelog entry.
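
The split between validate-time and operate-time checks summarized above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `custom.py`: `CustomSketch`, the locally defined `InvalidParamError`, the `"lambda"` parameter key, and the error messages are stand-ins chosen for this example.

```python
from typing import Any, Dict, List


class InvalidParamError(Exception):
    """Local stand-in for the library exception of the same name."""


class CustomSketch:
    """Hypothetical stand-in for the Custom operator; not the library class."""

    LAMBDA = "lambda"  # assumed parameter key, used here for illustration only

    def validate(self, params: Dict[str, Any]) -> None:
        # Only check callability; the function is never invoked here.
        if not callable(params.get(self.LAMBDA)):
            raise InvalidParamError("New value must be a callable function")

    def operate(self, text: str, params: Dict[str, Any]) -> str:
        # The return-type contract is enforced on the result for real data.
        result = params[self.LAMBDA](text)
        if not isinstance(result, str):
            raise InvalidParamError("Function return type must be a str")
        return result


op = CustomSketch()
calls: List[str] = []


def returns_int(value: str) -> int:
    """A misbehaving replacement function that records every invocation."""
    calls.append(value)
    return len(value)


bad_params = {"lambda": returns_int}
op.validate(bad_params)  # passes: callable, and not invoked
calls_after_validate = list(calls)

try:
    op.operate("Alice", bad_params)
    raised = False
except InvalidParamError:
    raised = True  # the non-str return is caught at operate time

upper = op.operate("Alice", {"lambda": str.upper})
```

One consequence of this design is that a function returning a non-`str` value is reported only when the operator first runs on real text, rather than when parameters are validated.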
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| presidio-anonymizer/presidio_anonymizer/operators/custom.py | Stops calling the custom lambda during validation and enforces return type during operation. |
| presidio-anonymizer/tests/operators/test_custom.py | Updates and expands tests to cover stateful lambda regressions and operate-time return-type enforcement. |
| CHANGELOG.md | Documents the behavioral fix for the Custom operator’s validation. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Contributor
Author
Updated the error assertion based on Copilot's review.
omri374
approved these changes
May 20, 2026
Change Description
`Custom.validate()` was calling `new_val("PII")` to check the return type of the lambda. This causes a spurious invocation with a dummy string, which breaks stateful lambdas: for example, anonymizers that build a token-to-original-value map for de-anonymization will insert a bogus `{"TOKEN_1": "PII"}` entry and shift all subsequent token counters by one.
Fix:
- Remove the `new_val("PII")` probe call from `validate()`; only check `callable(new_val)`
- Move the return-type check (`isinstance(result, str)`) to `operate()`, where the lambda runs on real data
Tests added:
- `test_given_non_str_lambda_then_ipe_raised_at_operate_time`: updated to reflect that the check now happens at operate time
- `test_stateful_lambda_not_called_during_validate`: regression test, asserts `validate()` never calls the lambda
- `test_stateful_token_map_not_corrupted_by_validate`: regression test, asserts no spurious "PII" entry in the token map after `validate()`
Issue reference
Fixes #2024
Checklist