Add country filtering for analyzer recognizers #2031
Open
Q1ufeng wants to merge 2 commits into
omri374 previously approved these changes on May 18, 2026
Contributor
Pull request overview
Adds request-time country filtering to Presidio Analyzer so callers can restrict execution to country-specific recognizers matching a provided country list, while always keeping locale-agnostic recognizers.
Changes:
- Extend `RecognizerRegistry.get_recognizers(...)` to accept an optional `countries` filter and apply the existing `RecognizerListLoader.filter_by_countries(...)`.
- Thread the optional `countries` parameter through `AnalyzerEngine.get_recognizers(...)`, `AnalyzerEngine.get_supported_entities(...)`, and `AnalyzerEngine.analyze(...)`.
- Add unit tests covering registry-level filtering and analyzer execution behavior with `countries`.
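The filtering rule described in this overview can be sketched in plain Python. This is a minimal illustration, not Presidio's code: `FakeRecognizer` and `filter_by_countries_sketch` are hypothetical stand-ins, and the sketch assumes each recognizer carries a `country_code` attribute that is `None` for locale-agnostic recognizers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FakeRecognizer:
    """Hypothetical stand-in for a Presidio recognizer."""

    name: str
    # Assumption: None marks a locale-agnostic recognizer.
    country_code: Optional[str] = None


def filter_by_countries_sketch(
    recognizers: Iterable[FakeRecognizer], countries: Iterable[str]
) -> List[FakeRecognizer]:
    """Keep locale-agnostic recognizers plus those whose country was requested."""
    wanted = set(countries)
    return [
        rec
        for rec in recognizers
        if rec.country_code is None or rec.country_code in wanted
    ]


registry = [
    FakeRecognizer("EmailRecognizer"),
    FakeRecognizer("UsSsnRecognizer", country_code="us"),
    FakeRecognizer("UkNinoRecognizer", country_code="uk"),
]
kept = filter_by_countries_sketch(registry, ["us"])
print([rec.name for rec in kept])  # ['EmailRecognizer', 'UsSsnRecognizer']
```

In this sketch an empty `countries` list keeps only the locale-agnostic recognizers. The diff in this PR applies the filter only when `countries is not None`, so omitting the argument leaves the recognizer list unfiltered.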
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| presidio-analyzer/presidio_analyzer/recognizer_registry/recognizer_registry.py | Adds countries parameter to registry queries and applies request-time filtering; threads countries into supported-entities collection. |
| presidio-analyzer/presidio_analyzer/analyzer_engine.py | Exposes countries on analyzer APIs and propagates it to registry selection and supported-entity expansion when all_fields=True. |
| presidio-analyzer/tests/test_recognizer_registry.py | Adds a unit test validating request-time filtering against an already-loaded registry. |
| presidio-analyzer/tests/test_analyzer_engine.py | Adds unit tests validating filtering via AnalyzerEngine.get_recognizers(...) and that analyze(...) runs only matching country recognizers plus locale-agnostic ones. |
Comment on lines +190 to +196
```python
all_possible_recognizers = copy.copy(self.recognizers)
if ad_hoc_recognizers:
    all_possible_recognizers.extend(ad_hoc_recognizers)
if countries is not None:
    all_possible_recognizers = RecognizerListLoader.filter_by_countries(
        all_possible_recognizers, countries
    )
```
Author
Good catch. I’ll normalize self.recognizers to a list inside get_recognizers before appending ad-hoc recognizers, so registries initialized with any iterable type behave as advertised by the public type hint. I’ll also add a regression test using a tuple-backed registry.
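The problem this reply addresses can be reproduced without Presidio: `copy.copy` preserves the container type, so a tuple stays a tuple and has no `extend` method, while `list(...)` always returns a new, extendable list. `TupleBackedRegistry` below is a hypothetical minimal stand-in, not the real `RecognizerRegistry`, and recognizers are represented as plain strings for brevity.

```python
import copy
from typing import Iterable, List, Optional


class TupleBackedRegistry:
    """Hypothetical registry whose recognizers are stored as a tuple."""

    def __init__(self, recognizers: Iterable[str]):
        self.recognizers = tuple(recognizers)

    def collect_old(self, ad_hoc: Optional[List[str]] = None) -> List[str]:
        # Mirrors the first revision: copy.copy keeps the tuple type.
        all_possible = copy.copy(self.recognizers)
        if ad_hoc:
            all_possible.extend(ad_hoc)  # AttributeError on a tuple
        return list(all_possible)

    def collect_new(self, ad_hoc: Optional[List[str]] = None) -> List[str]:
        # Mirrors the revised code: list() normalizes any iterable.
        all_possible = list(self.recognizers)
        if ad_hoc:
            all_possible.extend(ad_hoc)
        return all_possible


registry = TupleBackedRegistry(["EmailRecognizer", "PhoneRecognizer"])
try:
    registry.collect_old(["CustomRecognizer"])
except AttributeError as err:
    print("old:", err)  # old: 'tuple' object has no attribute 'extend'
print("new:", registry.collect_new(["CustomRecognizer"]))
```

Because `list(...)` also copies, extending the result with ad-hoc recognizers never mutates the collection stored on the registry.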
Author
@microsoft-github-policy-service agree
Comment on lines +189 to 191
```python
all_possible_recognizers = list(self.recognizers)
if ad_hoc_recognizers:
    all_possible_recognizers.extend(ad_hoc_recognizers)
```
Comment on lines +192 to +195
```python
if countries is not None:
    all_possible_recognizers = RecognizerListLoader.filter_by_countries(
        all_possible_recognizers, countries
    )
```
Change Description
Add request-time country filtering for analyzer recognizers.
This extends `RecognizerRegistry.get_recognizers(...)`, `AnalyzerEngine.get_recognizers(...)`, `AnalyzerEngine.get_supported_entities(...)`, and `AnalyzerEngine.analyze(...)` with an optional `countries` parameter. When provided, country-specific recognizers are included only if their `country_code` matches the requested country list, while locale-agnostic recognizers continue to run unchanged.

The implementation reuses the existing country filtering utility used during predefined recognizer loading, so behavior is consistent between load-time and request-time filtering.
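How an optional `countries` argument can be threaded from engine-level methods down to a single registry query is sketched below. `Recognizer`, `MiniRegistry`, and `MiniEngine` are hypothetical stand-ins, not Presidio classes; the sketch assumes `country_code=None` marks a locale-agnostic recognizer and does no real text analysis.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Recognizer:
    """Hypothetical recognizer: one entity type, optional country."""

    entity: str
    country_code: Optional[str] = None  # None = locale-agnostic (assumed)


class MiniRegistry:
    """Hypothetical stand-in for RecognizerRegistry."""

    def __init__(self, recognizers: Iterable[Recognizer]):
        self.recognizers = list(recognizers)

    def get_recognizers(
        self, countries: Optional[Iterable[str]] = None
    ) -> List[Recognizer]:
        recognizers = list(self.recognizers)
        if countries is not None:  # omitted argument: no filtering
            wanted = set(countries)
            recognizers = [
                r
                for r in recognizers
                if r.country_code is None or r.country_code in wanted
            ]
        return recognizers


class MiniEngine:
    """Hypothetical stand-in for AnalyzerEngine; it only forwards `countries`."""

    def __init__(self, registry: MiniRegistry):
        self.registry = registry

    def get_supported_entities(
        self, countries: Optional[Iterable[str]] = None
    ) -> List[str]:
        recognizers = self.registry.get_recognizers(countries=countries)
        return sorted({r.entity for r in recognizers})

    def analyze(
        self, text: str, countries: Optional[Iterable[str]] = None
    ) -> List[str]:
        # A real analyzer would run each recognizer on `text`; this sketch
        # only reports which entity types would have been checked.
        recognizers = self.registry.get_recognizers(countries=countries)
        return [r.entity for r in recognizers]


engine = MiniEngine(
    MiniRegistry(
        [
            Recognizer("EMAIL_ADDRESS"),
            Recognizer("US_SSN", country_code="us"),
            Recognizer("UK_NHS", country_code="uk"),
        ]
    )
)
print(engine.analyze("sample text", countries=["uk"]))  # ['EMAIL_ADDRESS', 'UK_NHS']
print(engine.get_supported_entities())  # ['EMAIL_ADDRESS', 'UK_NHS', 'US_SSN']
```

Keeping the filter in one registry method means every engine entry point sees the same selection, which matches the PR's goal of consistent behavior across `get_recognizers`, `get_supported_entities`, and `analyze`.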
Unit tests were added for:
- `AnalyzerEngine.get_recognizers(...)`;
- `AnalyzerEngine.analyze(...)` only runs matching country-specific recognizers plus locale-agnostic recognizers.

Issue reference
Fixes #1328
Checklist