PII Detection Configuration

pii_detection:
  presidio_url: http://localhost:5002
  languages: [en, de]
  fallback_language: en
  score_threshold: 0.7
  entities:
    - PERSON
    - EMAIL_ADDRESS
    - PHONE_NUMBER
    - CREDIT_CARD
    - IBAN_CODE
    - IP_ADDRESS
    - LOCATION

Options

Option             Default                 Description
presidio_url       http://localhost:5002   Presidio analyzer URL
languages          [en]                    Languages to detect. Must match Docker build
fallback_language  en                      Fallback if detected language not in list
score_threshold    0.7                     Minimum confidence (0.0-1.0)
entities           See below               Entity types to detect
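
To make the options more concrete, the sketch below shows one way they could map onto a request body for the Presidio Analyzer `/analyze` endpoint. Whether this project builds its requests in exactly this shape is an assumption, and `build_analyze_request` is a hypothetical helper written for illustration only. Nothing is sent over the network.

```python
# Values copied from the example pii_detection block above.
CONFIG = {
    "presidio_url": "http://localhost:5002",
    "languages": ["en", "de"],
    "fallback_language": "en",
    "score_threshold": 0.7,
    "entities": [
        "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD",
        "IBAN_CODE", "IP_ADDRESS", "LOCATION",
    ],
}


def build_analyze_request(config, text, language):
    """Hypothetical helper: return (url, json_body) for an analyze call.

    The body uses field names accepted by Presidio Analyzer's REST API
    (text, language, entities, score_threshold).
    """
    url = config["presidio_url"].rstrip("/") + "/analyze"
    body = {
        "text": text,
        "language": language,
        "entities": list(config["entities"]),
        "score_threshold": config["score_threshold"],
    }
    return url, body


url, body = build_analyze_request(CONFIG, "Contact sarah.chen@hospital.org", "en")
print(url)
print(body["score_threshold"])
```

The returned pair could then be passed to any HTTP client as a JSON POST.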

Languages

Languages must be installed during the Docker build:
LANGUAGES=en,de,fr docker compose build
Available languages (24): ca, zh, hr, da, nl, en, fi, fr, de, el, it, ja, ko, lt, mk, nb, pl, pt, ro, ru, sl, es, sv, uk
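
A quick pre-build check can catch typos in language codes. The following is a minimal sketch, assuming the list above is the complete set of accepted codes. `unsupported_languages` is a hypothetical helper, not part of the project.

```python
# The 24 language codes listed above.
AVAILABLE_LANGUAGES = {
    "ca", "zh", "hr", "da", "nl", "en", "fi", "fr", "de", "el", "it", "ja",
    "ko", "lt", "mk", "nb", "pl", "pt", "ro", "ru", "sl", "es", "sv", "uk",
}


def unsupported_languages(languages):
    """Return the configured codes that are not in the available set,
    preserving the order in which they were given."""
    return [code for code in languages if code not in AVAILABLE_LANGUAGES]


print(unsupported_languages(["en", "de", "fr"]))
print(unsupported_languages(["en", "xx"]))
```

This only checks codes against the list of available languages. It cannot tell which languages were actually installed in a given image.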

Single Language

If only one language is specified, language detection is skipped for better performance:
pii_detection:
  languages: [en]

Fallback Language

If the detected language isn’t in your list, the fallback is used:
pii_detection:
  languages: [en, de]
  fallback_language: en  # Used for French text, etc.
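
The single-language shortcut and the fallback rule can be sketched together as one selection function. This is an illustration of the behavior described above, not the project's actual code. `choose_language` and the `detect` callable are hypothetical names, and the real language detector is not specified here.

```python
def choose_language(text, languages, fallback_language, detect):
    """Pick the language to analyze `text` in.

    `detect` is any callable mapping text to a language code
    (a stand-in for whatever detector is really used).
    """
    # Only one configured language: skip detection entirely.
    if len(languages) == 1:
        return languages[0]
    detected = detect(text)
    # Use the detected language only if it is configured.
    if detected in languages:
        return detected
    return fallback_language


# Stub detector that always reports French, for demonstration only.
def always_french(_text):
    return "fr"


print(choose_language("Bonjour tout le monde", ["en", "de"], "en", always_french))
```

With `languages: [en, de]` and French input, the stub detector reports `fr`, which is not configured, so the function returns the fallback `en`.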

Entities

Entity         Examples
PERSON         Dr. Sarah Chen, John Smith
EMAIL_ADDRESS  sarah.chen@hospital.org
PHONE_NUMBER   +1-555-123-4567
CREDIT_CARD    4111-1111-1111-1111
IBAN_CODE      DE89 3704 0044 0532 0130 00
IP_ADDRESS     192.168.1.1
LOCATION       New York, 123 Main St
US_SSN         123-45-6789
US_PASSPORT    123456789
CRYPTO         Bitcoin addresses
URL            https://example.com

Score Threshold

A higher threshold produces fewer false positives but may miss some PII. A lower threshold catches more PII but produces more false positives.
pii_detection:
  score_threshold: 0.7  # Default, good balance
  # score_threshold: 0.5  # More aggressive
  # score_threshold: 0.9  # More conservative
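
The trade-off can be seen with a small filtering sketch. The scores below are invented for illustration and are not real analyzer output. `filter_by_threshold` is a hypothetical helper, and treating the threshold as inclusive (score >= threshold) is an assumption.

```python
def filter_by_threshold(results, score_threshold=0.7):
    """Keep only detections whose score meets the threshold (inclusive)."""
    return [r for r in results if r["score"] >= score_threshold]


# Made-up detections with made-up confidence scores.
sample = [
    {"entity_type": "EMAIL_ADDRESS", "score": 1.0},
    {"entity_type": "PERSON", "score": 0.85},
    {"entity_type": "LOCATION", "score": 0.6},
]

for threshold in (0.5, 0.7, 0.9):
    kept = [r["entity_type"] for r in filter_by_threshold(sample, threshold)]
    print(threshold, kept)
```

In this sample, 0.5 keeps all three detections, 0.7 drops the low-confidence LOCATION, and 0.9 keeps only EMAIL_ADDRESS.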