feat: add Turkish phone number (TR_PHONE_NUMBER) recognizer #2006

Open
mrcuren wants to merge 6 commits into microsoft:main from mrcuren:feat/turkey-phone-number-recognizer-clean

mrcuren (Contributor) commented Apr 26, 2026

Adds Turkish phone number recognizer to Presidio Analyzer, following up on the discussion in #1973.

The generic PhoneRecognizer uses python-phonenumbers and can parse Turkish numbers when TR is added to supported_regions. However, a country-specific recognizer provides additional value:

  • TR-specific entity type (TR_PHONE_NUMBER vs generic PHONE_NUMBER) for targeted PII detection
  • Format validation for mobile (5XX) and geographic (2XX/3XX/4XX) number ranges
  • ITU-T E.164 compliance for Turkey (+90)
  • MNP-aware validation (no operator-specific checks, as Mobile Number Portability makes them unreliable)
  • Turkish + English context words for higher confidence detection

Features:

  • Pattern recognition for mobile numbers (5XX) and geographic numbers (2XX/3XX/4XX)
  • Supports international (+90), national (0), and local formats
  • Country-specific validation via validate_result() and _validate_turkish_number()
  • Context words in both Turkish and English
  • Handles space, no-space, and hyphen separators
  • Disabled by default as per country-specific recognizer guidelines
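The format rules listed above (optional +90 or 0 prefix, a 5XX mobile or 2XX/3XX/4XX geographic code, and space, hyphen, or no separators) can be sketched with a single regular expression. This is an illustrative sketch only, not the pattern used in this PR: the names TR_PHONE_PATTERN and find_tr_phone_numbers are hypothetical, separators are accepted only at the 3-3-2-2 group boundaries by assumption, and only the leading digit of the area code is checked rather than the full numbering plan.

```python
import re

# Hypothetical pattern for illustration; not the regex from this PR.
TR_PHONE_PATTERN = re.compile(
    r"""
    (?<!\d)                      # do not start inside a longer digit run
    (?:(?:\+90|0)[\s-]?)?        # optional international (+90) or national (0) prefix
    (?P<area>[2-5]\d{2})         # 2XX/3XX/4XX geographic or 5XX mobile code
    [\s-]?
    \d{3}[\s-]?\d{2}[\s-]?\d{2}  # 7-digit subscriber number
    (?!\d)                       # do not stop inside a longer digit run
    """,
    re.VERBOSE,
)


def find_tr_phone_numbers(text):
    """Return (matched_text, kind, e164) tuples for candidate Turkish numbers."""
    results = []
    for match in TR_PHONE_PATTERN.finditer(text):
        kind = "mobile" if match.group("area").startswith("5") else "geographic"
        digits = re.sub(r"\D", "", match.group(0))
        # The last 10 digits are the national significant number.
        results.append((match.group(0), kind, "+90" + digits[-10:]))
    return results


if __name__ == "__main__":
    for sample in ["Ara: +90 532 123 45 67", "0212-555-12-34", "0 123 456 78 90"]:
        print(sample, "->", find_tr_phone_numbers(sample))
```

A pattern like this only proposes candidates. Because of Mobile Number Portability, it deliberately avoids any operator-specific prefix check.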

Legal basis: Karayolları Trafik Kanunu (KTK, the Turkish Highway Traffic Law) Madde 23 (Article 23), ITU-T E.164.

Issue reference

Part of #1973

Testing

  • Added test_tr_phone_number_recognizer.py with 51 test cases
  • Tests include valid/invalid formats, geographic numbers, multiple numbers, false positive checks
  • All existing tests continue to pass

Checklist

  • I have reviewed the contribution guidelines
  • My code follows the project style guidelines (ruff, pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md under the Unreleased section
  • I have updated the supported_entities.md documentation
  • I have added my recognizer to default_recognizers.yaml with enabled: false
  • I have added my recognizer to __init__.py and __all__

mrcuren added 2 commits April 26, 2026 16:28
- Add Turkey (TR) support to generic PhoneRecognizer
- Extend TR_PHONE_NUMBER to support geographic numbers (2/3/4 prefix)
- Implement ITU-T E.164 compliant validation with MNP awareness
- Add Turkish context words for better detection accuracy
- Update tests and documentation for enhanced coverage
- Legal basis: KTK Madde 23, ITU-T E.164 compliance

Addresses SharonHart's feedback on country-specific checks
Generic PhoneRecognizer region changes are out of scope for TR_PHONE_NUMBER.
Focus only on the country-specific recognizer.

mrcuren (Contributor, Author) commented Apr 26, 2026

@SharonHart Following up on your feedback in #1973, this PR adds the country-specific Turkish phone number recognizer with:

  • Mobile (5XX) + geographic (2XX/3XX/4XX) validation
  • ITU-T E.164 compliance, MNP-aware
  • Turkish + English context words
  • 51 test cases, all passing

Ready for review when you have a chance.

mrcuren added 3 commits April 27, 2026 14:15
…ig instead of subclass

- Fix PhoneRecognizer._get_recognizer_result to use self.supported_entities[0]
  instead of hardcoded 'PHONE_NUMBER', making the supported_entity parameter
  from PR microsoft#2014 fully functional
- Delete TrPhoneNumberRecognizer subclass; TR phone detection now uses
  PhoneRecognizer(supported_regions=['TR'], supported_entity='TR_PHONE_NUMBER',
  context=[...]) programmatically per maintainer guidance
- Remove TrPhoneNumberRecognizer from __init__.py, __all__, and
  default_recognizers.yaml
- Rewrite tests to use PhoneRecognizer with TR config (40 test cases)
- Update CHANGELOG.md and docs/supported_entities.md

mrcuren (Contributor, Author) commented May 18, 2026

Refactored per @SharonHart's guidance:

  • Fixed a bug in PhoneRecognizer._get_recognizer_result: entity_type was hardcoded to PHONE_NUMBER and now uses self.supported_entities[0].
  • Deleted the TrPhoneNumberRecognizer subclass. TR phone detection now uses PhoneRecognizer(supported_regions=['TR'], supported_entity='TR_PHONE_NUMBER', context=...) programmatically.
  • 8 files changed, +61/-332.
  • Tests: 40/40 TR phone tests, 26/26 existing phone tests, and 57/57 Turkey recognizer tests pass. Ruff lint: 0 errors.
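The entity-type fix described in this comment can be illustrated with a stripped-down stand-in. This is a sketch under stated assumptions, not Presidio's code: ToyPhoneRecognizer, ToyResult, result_before_fix, and result_after_fix are hypothetical names that only mirror the shape of the change (a hardcoded label replaced by the first configured entity).

```python
from dataclasses import dataclass


@dataclass
class ToyResult:
    """Hypothetical stand-in for a recognizer result."""

    entity_type: str
    start: int
    end: int


class ToyPhoneRecognizer:
    """Illustration only; not Presidio's PhoneRecognizer."""

    def __init__(self, supported_entity="PHONE_NUMBER"):
        self.supported_entities = [supported_entity]

    def result_before_fix(self, start, end):
        # Hardcoded label: the configured entity name is silently ignored.
        return ToyResult("PHONE_NUMBER", start, end)

    def result_after_fix(self, start, end):
        # Label read from configuration, so a custom entity name propagates.
        return ToyResult(self.supported_entities[0], start, end)


if __name__ == "__main__":
    recognizer = ToyPhoneRecognizer(supported_entity="TR_PHONE_NUMBER")
    print(recognizer.result_before_fix(0, 17).entity_type)
    print(recognizer.result_after_fix(0, 17).entity_type)
```

With the hardcoded label, a recognizer configured for TR_PHONE_NUMBER would still report PHONE_NUMBER, which is why the fix is needed before the configuration-only approach can replace the subclass.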

SharonHart previously approved these changes May 20, 2026