Claira Help Desk

PII Identification

Use Claira to detect and extract personally identifiable information from documents in Nuix Discover.

Data breach notifications, regulatory compliance, and privacy assessments all require you to know what personal information lives in your documents. Manually scanning thousands of files for names, emails, Social Security numbers, and addresses is slow and error-prone.

Claira can read through extracted text and flag personally identifiable information (PII), saving your team significant time while reducing the risk of missing something.

How Claira helps

  • Detects multiple PII types. Names, email addresses, phone numbers, SSNs, physical addresses, financial account numbers, and more.
  • Supports compliance workflows. Whether you are responding to a breach notification requirement, a subject access request, or a regulatory audit, Claira helps you identify what needs attention.
  • Works at scale. Run PII identification across a full document set in a single bulk scan.

When to use this

  • Breach notification and incident response
  • Privacy impact assessments
  • Preparing documents for production with redaction lists
  • Subject access requests under GDPR, CCPA, or similar regulations

Sample prompts

You can tailor your PII prompt depending on how targeted or comprehensive you need the results to be.

The comprehensive "surface forms" prompt below is also available in the app's template picker under Investigations > Comprehensive PII (surface forms), and it has its own full help article.

Comprehensive PII extraction (unique surface forms)

Use this when you need every distinct written form of PII, without normalization, in a single machine-friendly line of quoted values. The format below is a good input for Search Term Families in Nuix Discover: each value (or a group you treat as one family) can become grouped search or QC terms across the collection, without retyping every variant the model found.

Comprehensive PII Prompt (unique surface forms)

Extract every unique instance of Personally Identifiable Information (PII) from this document. PII includes, but is not limited to: full names, partial names, initials, nicknames, titles with names (e.g., Mr. Doe, Dr. Smith), email addresses, phone numbers, physical addresses, dates of birth, government-issued identifiers (SSN, SIN, passport numbers, driver's license numbers), financial account numbers, medical record numbers, IP addresses, and usernames. Do not include dates or URLs that are not contained in email addresses. Follow these rules exactly:

  • Output every unique surface form of each PII instance exactly as it appears in the document. If the same person, address, or other entity is written in multiple formats, include each distinct format as a separate entry (e.g., "John Doe", "J. Doe", "Mr. Doe", "Doe, John").
  • Do not include duplicates of the exact same formatted string. Each entry in the output must be unique character-for-character.
  • Do not normalize, correct, reformat, expand, or abbreviate any value. Preserve original casing, punctuation, spacing, and spelling, including apparent typos.
  • For addresses, include each distinct written format separately (e.g., "123 Main St.", "123 Main Street", "123 Main St, Montreal, QC").
  • For phone numbers, include each distinct written format separately (e.g., "514-555-1234", "(514) 555-1234", "+1 514 555 1234").
  • Wrap each value in straight double quotation marks. Separate entries with a comma. Do not use line breaks, bullets, numbering, curly braces, square brackets, or any other wrapper or delimiter.
  • Do not include any introductory text, explanatory text, headings, labels, categories, counts, trailing commentary, or closing remarks. The response must begin with the first quotation mark and end with the final quotation mark. If no PII is found, output exactly: "NO PII DETECTED"

Example of correctly formatted output: "John Doe","J. Doe","Mr. Doe","john@doe.co","514-555-1234","123 Main St."
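
If you post-process this output outside Claira, the single line can be split back into individual values. The following is a minimal Python sketch, assuming the response follows the rules above exactly. The function name parse_surface_forms and the sentinel constant are hypothetical and are not part of Claira or Nuix Discover.

```python
import csv
import io
from typing import List

# Sentinel text the prompt asks for when nothing is found.
NO_PII_SENTINEL = "NO PII DETECTED"


def parse_surface_forms(response: str) -> List[str]:
    """Split one line of comma-separated, double-quoted values into a list.

    Hypothetical helper: assumes the response obeys the prompt's format rules.
    """
    text = response.strip()
    if text.strip('"') == NO_PII_SENTINEL:
        return []
    # The csv module keeps commas that sit inside quotes, e.g. "Doe, John".
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    values = next(reader, [])
    # Preserve first-seen order; drop empty strings and exact duplicates.
    return list(dict.fromkeys(v for v in values if v))


if __name__ == "__main__":
    sample = '"John Doe","J. Doe","Doe, John","john@doe.co","514-555-1234"'
    for value in parse_surface_forms(sample):
        print(value)
```

A CSV reader is used instead of a plain split on commas because surface forms such as "Doe, John" or "123 Main St, Montreal, QC" contain commas of their own.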

Targeted extraction

Use this when you know exactly which PII types you are looking for and want a simple paired list instead of a full unique-form sweep.

Targeted PII Prompt

Identify all names and email addresses. Pair in format: [Name] <[email]> // [Name] <[email]>
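
If you want to load the paired output into a spreadsheet or script, it can be split on the // separator. The sketch below is a minimal example, assuming the model returns each pair as Name <email> without the literal square brackets. The function name parse_name_email_pairs is hypothetical and not part of Claira.

```python
import re
from typing import List, Tuple

# One pair looks like: Jane Roe <jane@example.com>
PAIR_PATTERN = re.compile(r"^\s*(?P<name>.*?)\s*<(?P<email>[^<>\s]+)>\s*$")


def parse_name_email_pairs(response: str) -> List[Tuple[str, str]]:
    """Return (name, email) tuples from a '//'-separated response.

    Hypothetical helper: chunks that do not match the pattern are skipped.
    """
    pairs = []
    for chunk in response.split("//"):
        match = PAIR_PATTERN.match(chunk)
        if match:
            pairs.append((match.group("name"), match.group("email")))
    return pairs


if __name__ == "__main__":
    sample = "Jane Roe <jane@example.com> // John Doe <john@doe.co>"
    for name, email in parse_name_email_pairs(sample):
        print(f"{name}\t{email}")
```

Chunks that do not match are skipped silently here. In a real review workflow you would likely log them so a reviewer can check what the model returned.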

Tips for better results

Start with the comprehensive prompt on a small sample to understand what PII types are present in your collection. Then switch to a targeted prompt for the bulk scan if you only need specific types.

  • Search Term Families. The comprehensive prompt output is a strong input for Search Term Families in Nuix Discover: each returned string (or a set you group manually) can seed a family for search and QC. Adjust membership as needed before running matter-wide.
  • Specify the format you need. If downstream tools expect a different structure (for example, a narrative redaction log), say so in the prompt, or use the targeted example when paired fields are enough.
  • Handle raw PII in stored review fields with care. The comprehensive format lists values as they appear in the text. Limit field visibility and follow your organization's data-handling and retention policy.
  • Combine with human review. PII identification is high-stakes. Use Claira output as a starting point, then have a reviewer confirm before acting on the results.

Claira analyzes extracted text only. If PII appears in images, handwritten notes, or scanned documents with poor OCR quality, it may not be detected. Always verify OCR quality before relying on AI-based PII extraction.
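
To illustrate the Search Term Families tip, the sketch below shows one way to propose candidate families before you adjust membership by hand. It groups phone number variants that share the same digits while keeping every original surface form unchanged. This is only an illustration: it assumes 10-digit North American numbers, the function names phone_key and group_phone_variants are hypothetical, and it does not call any Nuix Discover or Claira API.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def phone_key(value: str) -> Optional[str]:
    """Return a 10-digit grouping key, or None if the value is not phone-like.

    Assumption: North American numbers, with an optional leading country code 1.
    """
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def group_phone_variants(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group surface forms by digit key; the original strings stay untouched."""
    families = defaultdict(list)
    for value in values:
        key = phone_key(value)
        if key is not None:
            families[key].append(value)
    return dict(families)


if __name__ == "__main__":
    forms = ["514-555-1234", "(514) 555-1234", "+1 514 555 1234", "John Doe"]
    for key, members in group_phone_variants(forms).items():
        print(key, members)
```

The digit key is used only for grouping. Any other 10-digit value, such as an account number, would also be grouped, so review each proposed family before using it for search or QC.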

Need help? Contact support@claira.to
