Add country filtering for analyzer recognizers #2031

Open
Q1ufeng wants to merge 2 commits into microsoft:main from Q1ufeng:feature/filter-recognizers-by-country

Q1ufeng commented May 18, 2026

Change Description

Add request-time country filtering for analyzer recognizers.

This extends RecognizerRegistry.get_recognizers(...), AnalyzerEngine.get_recognizers(...), AnalyzerEngine.get_supported_entities(...), and AnalyzerEngine.analyze(...) with an optional countries parameter. When provided, country-specific recognizers are included only if their country_code matches the requested country list, while locale-agnostic recognizers continue to run unchanged.

The implementation reuses the existing country filtering utility used during predefined recognizer loading, so behavior is consistent between load-time and request-time filtering.
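The filtering rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Presidio implementation: `SketchRecognizer` and this standalone `filter_by_countries` function are hypothetical stand-ins, and exact string matching on `country_code` is an assumption (the real utility may normalize codes differently).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SketchRecognizer:
    """Hypothetical stand-in for a recognizer; not the Presidio class."""

    name: str
    country_code: Optional[str] = None  # None marks a locale-agnostic recognizer


def filter_by_countries(
    recognizers: Iterable[SketchRecognizer],
    countries: Optional[List[str]],
) -> List[SketchRecognizer]:
    """Keep locale-agnostic recognizers plus those whose country_code is requested."""
    candidates = list(recognizers)
    if countries is None:
        # No filter requested: every loaded recognizer stays eligible.
        return candidates
    allowed = set(countries)
    return [
        r for r in candidates
        if r.country_code is None or r.country_code in allowed
    ]


loaded = [
    SketchRecognizer("EmailRecognizer"),
    SketchRecognizer("UsSsnRecognizer", country_code="us"),
    SketchRecognizer("UkNhsRecognizer", country_code="uk"),
]
selected_names = [r.name for r in filter_by_countries(loaded, ["us"])]
```

In this sketch, passing `None` leaves the list untouched, while passing an empty list keeps only locale-agnostic recognizers. Whether the PR treats an empty list the same way is not stated in the description.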

Unit tests were added for:

  • filtering already-loaded recognizers at registry level;
  • filtering recognizers exposed through AnalyzerEngine.get_recognizers(...);
  • ensuring AnalyzerEngine.analyze(...) only runs matching country-specific recognizers plus locale-agnostic recognizers.

Issue reference

Fixes #1328

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

omri374 previously approved these changes May 18, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds request-time country filtering to Presidio Analyzer so callers can restrict execution to country-specific recognizers matching a provided country list, while always keeping locale-agnostic recognizers.

Changes:

  • Extend RecognizerRegistry.get_recognizers(...) to accept an optional countries filter and apply existing RecognizerListLoader.filter_by_countries(...).
  • Thread the optional countries parameter through AnalyzerEngine.get_recognizers(...), AnalyzerEngine.get_supported_entities(...), and AnalyzerEngine.analyze(...).
  • Add unit tests covering registry-level filtering and analyzer execution behavior with countries.
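How an optional `countries` argument might be threaded from an engine down to a registry can be illustrated with a small sketch. All classes here (`SketchRecognizer`, `SketchRegistry`, `SketchEngine`) are hypothetical and only mirror the method names listed above; the real Presidio signatures take additional parameters that are omitted here.

```python
from typing import List, Optional


class SketchRecognizer:
    """Hypothetical recognizer exposing one entity and an optional country."""

    def __init__(self, entity: str, country_code: Optional[str] = None):
        self.entity = entity
        self.country_code = country_code


class SketchRegistry:
    """Hypothetical registry; filters at request time, not at load time."""

    def __init__(self, recognizers: List[SketchRecognizer]):
        self.recognizers = recognizers

    def get_recognizers(
        self, countries: Optional[List[str]] = None
    ) -> List[SketchRecognizer]:
        candidates = list(self.recognizers)
        if countries is not None:
            candidates = [
                r for r in candidates
                if r.country_code is None or r.country_code in countries
            ]
        return candidates


class SketchEngine:
    """Hypothetical engine that forwards `countries` unchanged to the registry."""

    def __init__(self, registry: SketchRegistry):
        self.registry = registry

    def get_recognizers(
        self, countries: Optional[List[str]] = None
    ) -> List[SketchRecognizer]:
        return self.registry.get_recognizers(countries=countries)

    def get_supported_entities(
        self, countries: Optional[List[str]] = None
    ) -> List[str]:
        # Derive entities from the already-filtered recognizer list.
        return sorted({r.entity for r in self.get_recognizers(countries=countries)})


engine = SketchEngine(
    SketchRegistry(
        [
            SketchRecognizer("EMAIL_ADDRESS"),
            SketchRecognizer("US_SSN", country_code="us"),
            SketchRecognizer("UK_NHS", country_code="uk"),
        ]
    )
)
us_entities = engine.get_supported_entities(countries=["us"])
all_entities = engine.get_supported_entities()
```

Keeping the filter in one place (the registry) and having the engine only forward the argument is one way to keep the engine-level and registry-level results consistent.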

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| presidio-analyzer/presidio_analyzer/recognizer_registry/recognizer_registry.py | Adds countries parameter to registry queries and applies request-time filtering; threads countries into supported-entities collection. |
| presidio-analyzer/presidio_analyzer/analyzer_engine.py | Exposes countries on analyzer APIs and propagates it to registry selection and supported-entity expansion when all_fields=True. |
| presidio-analyzer/tests/test_recognizer_registry.py | Adds a unit test validating request-time filtering against an already-loaded registry. |
| presidio-analyzer/tests/test_analyzer_engine.py | Adds unit tests validating filtering via AnalyzerEngine.get_recognizers(...) and that analyze(...) runs only matching country recognizers plus locale-agnostic ones. |

Comment on lines +190 to +196

    all_possible_recognizers = copy.copy(self.recognizers)
    if ad_hoc_recognizers:
        all_possible_recognizers.extend(ad_hoc_recognizers)
    if countries is not None:
        all_possible_recognizers = RecognizerListLoader.filter_by_countries(
            all_possible_recognizers, countries
        )
Author

Good catch. I’ll normalize self.recognizers to a list inside get_recognizers before appending ad-hoc recognizers, so registries initialized with any iterable type behave as advertised by the public type hint. I’ll also add a regression test using a tuple-backed registry.
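
The concern behind this reply can be reproduced in plain Python. The sketch below assumes a registry whose recognizers are held in a tuple, as in the planned regression test; the string items and the `combine` helper are hypothetical placeholders, not Presidio code.

```python
import copy
from typing import Iterable, List, Optional

# Tuple-backed collection, standing in for a registry built from a non-list iterable.
registry_items = ("builtin_a", "builtin_b")

# copy.copy on a tuple yields a tuple, which has no .extend method,
# so appending ad-hoc recognizers to it would raise AttributeError.
shallow = copy.copy(registry_items)
copy_supports_extend = hasattr(shallow, "extend")


def combine(
    recognizers: Iterable[str], ad_hoc: Optional[List[str]] = None
) -> List[str]:
    """Normalize to a new list first, then append ad-hoc items."""
    combined = list(recognizers)  # works for lists, tuples, and other iterables
    if ad_hoc:
        combined.extend(ad_hoc)
    return combined


combined = combine(registry_items, ["ad_hoc_c"])
```

Because `list(...)` always builds a new list, the registry's own collection is left unmodified, which matches what a shallow copy of a list-backed registry would have provided.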

Q1ufeng commented May 19, 2026

@microsoft-github-policy-service agree

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +189 to 191

    all_possible_recognizers = list(self.recognizers)
    if ad_hoc_recognizers:
        all_possible_recognizers.extend(ad_hoc_recognizers)

Comment on lines +192 to +195

    if countries is not None:
        all_possible_recognizers = RecognizerListLoader.filter_by_countries(
            all_possible_recognizers, countries
        )

Development

Successfully merging this pull request may close these issues.

Filter recognizers based on locale/country

3 participants