Shield AI Classification is available only as part of the Shield Pro add-on.
AI Classification assesses your content and automatically applies the appropriate classification label. This guide explains how to write effective label prompts so that content is classified quickly and accurately. It covers:
- How AI Classification works
- Configuration recommendations
- Best practices
- Using AI to improve your prompts
- Tips & tricks
- Example label prompts
- Known limitations
For setup and feature details, see AI Classification.
How AI Classification works
The Security Classification Agent:
- Reads your label definitions (prompts)
- Evaluates each file against all labels in a policy
- Applies the single best-fit label, or none if no definition is met confidently
- Displays applied labels and reasoning to end users in the file sidebar under Additional Details
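The selection step above can be pictured with a minimal Python sketch. It is illustrative only: the confidence scores, the `CONFIDENCE_THRESHOLD` value, and the `pick_label` function are hypothetical and do not describe how the Security Classification Agent is implemented.

```python
from typing import Optional

# Hypothetical cutoff for "met confidently"; not a documented product setting.
CONFIDENCE_THRESHOLD = 0.7


def pick_label(scores: dict[str, float]) -> Optional[str]:
    """Return the single best-fit label, or None if no label is met confidently.

    `scores` maps every label in the policy to an assumed confidence in [0, 1].
    """
    if not scores:
        return None
    # Every label is evaluated; only the highest-scoring one is a candidate.
    best_label = max(scores, key=scores.get)
    if scores[best_label] < CONFIDENCE_THRESHOLD:
        return None
    return best_label


print(pick_label({"Confidential": 0.55, "Restricted": 0.91}))  # Restricted
print(pick_label({"Confidential": 0.40, "Restricted": 0.35}))  # None
```

The point of the sketch is that a file receives at most one label, so definitions that score similarly for the same file compete with each other.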
Configuration recommendations
The following settings work well in most situations:
- Consider allowing end users to modify classifications using Classification Modification Permissions
- Set conflict handling to Skip, especially if users can apply or modify labels
- Limit each policy to 1 to 3 labels, where AI Classification performs best
- Think like a human reviewer: too many labels increase the likelihood of inconsistent classifications
- Decide which labels AI should apply; for some labels, such as Public, you might want only humans to apply them
AI Classification policy best practices
Define effective label criteria
To ensure accurate AI Classification, label definitions should be:
- Distinct: Each label should have non-overlapping, clearly differentiated criteria that target a unique set of document characteristics.
- Descriptive: Use plain language to specify:
  - Document types (e.g., contracts, strategy decks, spreadsheets)
  - Topics or intent (e.g., product roadmap, security breach, deal terms)
  - Data types (e.g., PII, source code, financials)
  - Audience (e.g., internal teams, legal)
Avoid:
- Vague descriptors (e.g., "High risk to the company")
- Overlapping labels (e.g., "Confidential" vs. "Highly Confidential")
- Undefined technical jargon
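Before pasting definitions into a policy, a quick script can flag some of the problems listed above. The following Python sketch is a rough aid, not a product feature: the `VAGUE_PHRASES` list and both helper functions are hypothetical, and shared words are only a crude hint that two labels may overlap.

```python
import re

# Hypothetical phrase list; extend it to match your own policy language.
VAGUE_PHRASES = ["high risk", "sensitive information", "very important", "as appropriate"]


def find_vague_phrases(definition: str) -> list[str]:
    """Return the vague phrases that appear in a label definition."""
    lowered = definition.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]


def shared_terms(definition_a: str, definition_b: str, min_length: int = 5) -> set[str]:
    """Return longer words that both definitions use, as a rough overlap signal."""

    def words(text: str) -> set[str]:
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_length}

    return words(definition_a) & words(definition_b)


confidential = "Audit findings, board reports, and performance reviews."
restricted = "Passport numbers, medical records, and credit reports."

print(find_vague_phrases("High risk to the company"))  # ['high risk']
print(shared_terms(confidential, restricted))          # {'reports'}
```

A shared term such as "reports" is not necessarily a problem, but it is a prompt to check that each label says which kind of report it covers.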
Troubleshooting tips
If AI Classification results are not meeting expectations:
- Use fewer, well-defined labels: Add examples and tighten criteria
- Check for overlap: Ensure labels are clear, unambiguous, and non-overlapping, and avoid “catch-all” labels
- Ensure the file is a supported file type: View the text and image file types that are supported
Known limitations
AI Classification returns mixed and sometimes inaccurate results for criteria that include the following conditions or topics:
- Calculations, table structures, and numbers
- Counting words or phrases
- Document metadata such as page number, authors, file size, word count, and collaborators (AI Classification doesn't take into account any of these document components)
- Images, charts, and graphs embedded in text documents (AI Classification analyzes images only when the file itself is an image)
Using AI to improve your prompts
Note: The guidance in this section reflects general best practices intended to help guide prompt design and usage. Results may vary depending on your specific use case, data, and configuration. These recommendations are intended as guidance only and may not produce consistent or expected results in all scenarios.
Use Box AI to help refine your existing classification label definitions into LLM-friendly criteria:
- Open the document in Box that contains your existing definitions.
- Select Box AI from the right-hand sidebar, or from the top navigation bar.
- Use the example prompt below to rewrite each label definition into clear, descriptive criteria optimized for AI-based classification.
- Copy the output into the relevant classification label.
Example prompt
Please rewrite each data classification label definition so it is LLM-friendly, clear, and semantically precise, suitable for use as label criteria in an AI-based content classification system.
When rewriting the criteria:
- Focus only on what types of documents or content belong in each label.
- Use plain, descriptive language that reflects document types, topics, data sensitivity, and intended audience.
- Make each label distinct and clearly differentiated from the others.
- Prefer concise paragraphs or short bullet lists.
Do not include:
- System or instructional language (for example: “you should classify,” “evaluate,” or “only apply if”).
- Decision logic, prioritization rules, or conflict-resolution guidance.
- Any description of how an AI model should behave.
Output format:
- Label name
- Refined label criteria
Do not include explanations or additional commentary.
Tips & tricks
Exclusion criteria / negative examples
Explicitly exclude common false positives.
Example: “Not restricted if data is anonymized or aggregated and cannot be linked to an individual (e.g., ‘Average customer income by age group’).”
Label prioritization (tie breakers)
Define which label wins if multiple criteria are met.
Example: “If criteria for both Internal and Sensitive are met, classify as Sensitive.”
Default label logic
Use one label as a fallback.
Example: “Apply Internal Only if no other label criteria are met.”
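The combined effect of a tie breaker and a default label can be written out as a small Python sketch. This only illustrates the outcome the two instructions aim for; in practice the rules live in the label prompt text, and the `PRIORITY` list, `DEFAULT_LABEL` constant, and `resolve` function are hypothetical names used for illustration.

```python
# Hypothetical priority order, highest first ("Sensitive wins over Internal").
PRIORITY = ["Sensitive", "Internal"]

# Hypothetical fallback, matching the "Apply Internal Only if no other
# label criteria are met" example.
DEFAULT_LABEL = "Internal Only"


def resolve(matched: set[str]) -> str:
    """Pick one label from the set of labels whose criteria were met."""
    for label in PRIORITY:
        if label in matched:
            return label
    # No prioritized label matched, so fall back to the default.
    return DEFAULT_LABEL


print(resolve({"Internal", "Sensitive"}))  # Sensitive
print(resolve(set()))                      # Internal Only
```

Writing the expected outcome down this way can help when reviewing whether the applied labels match the intent of the prompt.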
Time sensitivity
If dates matter, explicitly state they should be evaluated against today’s date. Don’t assume the model will infer the comparison.
Example: “Only classify as Restricted if the date in the file is after January 1, 2023. Compare against today’s date.”
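One way to sanity-check a date-based prompt is to work out the expected answer for a few test files yourself and compare it with the label that is applied. The Python sketch below spells out the two comparisons such a prompt might rely on; the function names and the idea of an expiry date are assumptions for illustration, not part of AI Classification.

```python
from datetime import date
from typing import Optional

# Fixed cutoff taken from the example prompt above.
CUTOFF = date(2023, 1, 1)


def is_after_cutoff(file_date: date, cutoff: date = CUTOFF) -> bool:
    """True if the date found in the file is strictly after the cutoff."""
    return file_date > cutoff


def is_expired(expiry: date, today: Optional[date] = None) -> bool:
    """True if an expiry date is before today (today defaults to the current date)."""
    if today is None:
        today = date.today()
    return expiry < today


print(is_after_cutoff(date(2023, 1, 2)))                    # True
print(is_after_cutoff(date(2023, 1, 1)))                    # False
print(is_expired(date(2020, 1, 1), today=date(2024, 6, 1)))  # True
```

Boundary cases, such as a file dated exactly on the cutoff, are worth including in the test set because a prompt that says "after" leaves them easy to misread.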
Example label prompts
Note: These example prompts are provided for demonstration purposes only and do not constitute legal or compliance advice. Actual requirements and outcomes may vary based on your organization's policies, use cases, and regulatory obligations.
Confidential Data
Includes:
- Business records: audit findings, management and board reports, strategic presentations, incident response materials, third-party risk documentation (e.g., SOC reports)
- Operational data: KPI reports, productivity metrics, security logs, meeting transcripts or recordings
- Employee information: performance reviews, disciplinary actions
- Masked or anonymized data: masked PII (e.g., last 4 digits of SSN), anonymized or aggregated NPI
Restricted Data
Includes:
- Corporate information: non-public strategies, pre-announcement M&A data, legal or regulatory investigations
- PII: SSNs, driver’s license numbers, passport numbers, payment card numbers, medical records, biometric data (not restricted if masked or truncated and cannot be reconstructed)
- NPI: bank account numbers, balances, DOB, income or salary data, credit reports, transaction history, loan or insurance data (not restricted if anonymized or aggregated)
Known limitations
AI Classification may return mixed results for criteria involving:
- Calculations, numeric reasoning, or complex tables
- Counting words or phrases
- Document metadata (page numbers, authors, file size, word count, collaborators)
- Images, charts, or graphs embedded in text documents (images are only analyzed when the file itself is an image)