AI Classification Prompts - Best Practices

Shield AI Classification is available only as part of the Shield Pro add-on.

AI Classification helps to assess and classify your content, applying the appropriate classification label automatically. This guide explains how to write effective label prompts to quickly and easily classify content. It features:

How AI Classification works
Configuration recommendations
Best practices
Using AI to improve your prompts
Tips & tricks
Example label prompts
Known limitations

For setup and feature details, see AI Classification.

How AI Classification works

The Security Classification Agent:

Reads your label definitions (prompts)
Evaluates each file against all labels in a policy
Applies the single best-fit label, or none if no definition is met confidently
Displays applied labels and reasoning to end users in the file sidebar under Additional Details
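
The selection step above (one best-fit label, or none when nothing is met confidently) can be pictured with a small sketch. This is a conceptual model only: Box does not document how the agent scores labels, so the per-label scores, the `pick_label` function, and the `CONFIDENCE_THRESHOLD` value are all hypothetical.

```python
from typing import Optional

# Hypothetical cutoff; Box does not publish how confidence is measured.
CONFIDENCE_THRESHOLD = 0.7


def pick_label(scores: dict[str, float],
               threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the single best-scoring label, or None if no label is confident enough."""
    if not scores:
        return None
    best_label = max(scores, key=scores.get)
    return best_label if scores[best_label] >= threshold else None


# Made-up scores for two files, each evaluated against every label in a policy.
print(pick_label({"Confidential": 0.91, "Restricted": 0.40}))
print(pick_label({"Confidential": 0.35, "Restricted": 0.30}))
```

The practical takeaway is that a file can end up with no label at all, which is why clear and distinct label definitions matter.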

Configuration recommendations

The following settings work well in most situations:

Consider allowing end users to modify classifications using Classification Modification Permissions
Set conflict handling to Skip, especially if users can apply or modify labels
AI Classification performs best with 1 to 3 labels
- Think like a human reviewer: too many labels increase the likelihood of inconsistent classifications
- You may not want AI to apply every label. For example, you might want only humans to apply Public

AI Classification policy best practices

Define effective label criteria

To ensure accurate AI Classification, label definitions should be:

Distinct: Each label should have non-overlapping, clearly differentiated criteria that target a unique set of document characteristics.
Descriptive: Use plain language to specify:
- Document types (e.g. contracts, strategy decks, spreadsheets)
- Topics or intent (e.g. product roadmap, security breach, deal terms)
- Data types (e.g. PII, source code, financials)
- Audience (e.g. internal teams, legal)
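
One way to keep label definitions descriptive is to draft them from the four elements listed above and only then paste the result into the label. The sketch below is a drafting aid under stated assumptions: the `LabelCriteria` class and its methods are hypothetical and are not part of Box or any Box API.

```python
from dataclasses import dataclass, field


@dataclass
class LabelCriteria:
    """Hypothetical container for the four descriptive elements of a label definition."""
    name: str
    document_types: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)
    data_types: list[str] = field(default_factory=list)
    audience: list[str] = field(default_factory=list)

    def _elements(self) -> list[tuple[str, list[str]]]:
        return [
            ("Document types", self.document_types),
            ("Topics or intent", self.topics),
            ("Data types", self.data_types),
            ("Audience", self.audience),
        ]

    def render(self) -> str:
        """Build plain-language criteria text, skipping any element left empty."""
        return "\n".join(
            f"{title}: {', '.join(values)}"
            for title, values in self._elements() if values
        )

    def missing(self) -> list[str]:
        """List the elements that have not been filled in yet."""
        return [title for title, values in self._elements() if not values]


draft = LabelCriteria(
    name="Confidential",
    document_types=["contracts", "strategy decks"],
    topics=["deal terms"],
)
print(draft.render())
print("Still to describe:", draft.missing())
```

Checking `missing()` before publishing a label is a quick way to spot definitions that say what a label is called but not what belongs in it.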

Avoid:

Vague descriptors (e.g., "High risk to the company")
Overlapping labels (e.g., "Confidential" vs. "Highly Confidential")
Undefined technical jargon

Troubleshooting tips

If AI Classification results are not meeting expectations:

Use fewer, well-defined labels: Add examples and tighten criteria
Check for overlap: Make sure each label's criteria are unambiguous and do not overlap with those of other labels, and avoid “catch-all” labels
Ensure the file is a supported file type: View the text and image file types that are supported
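
As a rough first pass at the overlap check above, you can compare the wording of your label definitions before testing them on real files. The sketch below uses word-level Jaccard similarity, which is only a crude proxy for overlap in meaning; the function names, the stopword list, and the 0.5 threshold are illustrative assumptions, not Box features.

```python
import re
from itertools import combinations

# Small illustrative stopword list; extend it for real definitions.
STOPWORDS = {"and", "or", "the", "of", "for", "to", "in", "that", "with", "any"}


def keywords(text: str) -> set[str]:
    """Lowercase the text and keep the words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(definitions: dict[str, str],
                      threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return label pairs whose definitions share a large fraction of their wording."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(definitions.items(), 2):
        score = jaccard(keywords(text_a), keywords(text_b))
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


labels = {
    "Confidential": "Contracts and deal terms for internal teams",
    "Highly Confidential": "Deal terms and contracts for internal legal teams",
    "Restricted": "Source code repositories",
}
print(overlapping_pairs(labels))
```

A flagged pair is a prompt to reread both definitions, not proof that they conflict. Two labels can also overlap in meaning while sharing few words, so this does not replace testing with sample files.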

Known limitations

AI Classification returns mixed and sometimes inaccurate results for criteria that include the following conditions or topics:

Calculations, table structures, and numbers
Counting words or phrases
Document metadata such as page number, authors, file size, word count, and collaborators (AI Classification doesn't take into account any of these document components)
Images, charts, graphs, and similar visuals embedded in text documents (AI Classification can analyze images only when they are standalone image files)

Using AI to improve your prompts

Note: The guidance in this section reflects general best practices intended to help guide prompt design and usage. Results may vary depending on your specific use case, data, and configuration. These recommendations are intended as guidance only and may not produce consistent or expected results in all scenarios.

Use Box AI to help refine your existing classification label definitions into LLM-friendly criteria:

Open the document in Box that contains your existing definitions.
Select Box AI from the right-hand sidebar, or from the top navigation bar.
Use the example prompt below to rewrite each label definition into clear, descriptive criteria optimized for AI-based classification.
Copy the output into the relevant classification label.

Example prompt

Please rewrite each data classification label definition so it is LLM-friendly, clear, and semantically precise, suitable for use as label criteria in an AI-based content classification system.

When rewriting the criteria:

Focus only on what types of documents or content belong in each label.
Use plain, descriptive language that reflects document types, topics, data sensitivity, and intended audience.
Make each label distinct and clearly differentiated from the others.
Prefer concise paragraphs or short bullet lists.

Do not include:

System or instructional language (for example: “you should classify,” “evaluate,” or “only apply if”).
Decision logic, prioritization rules, or conflict-resolution guidance.
Any description of how an AI model should behave.

Output format:

Label name
Refined label criteria

Do not include explanations or additional commentary.

Tips & tricks

Exclusion criteria / negative examples

Explicitly exclude common false positives.
Example: “Not restricted if data is anonymized or aggregated and cannot be linked to an individual (e.g., ‘Average customer income by age group’).”

Label prioritization (tie breakers)

Define which label wins if multiple criteria are met.
Example: “If criteria for both Internal and Sensitive are met, classify as Sensitive.”

Default label logic

Use one label as a fallback.
Example: “Apply Internal Only if no other label criteria are met.”
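
The tie-breaker and default-label rules above are written in plain language inside the label prompt. The outcome they describe can be sketched as follows. This models the intended result only, not how AI Classification processes prompts; `resolve_label` and its parameters are hypothetical.

```python
from typing import Optional


def resolve_label(matched: set[str],
                  priority: list[str],
                  default: Optional[str] = None) -> Optional[str]:
    """Return the highest-priority matched label, or the default if nothing matched."""
    for label in priority:  # labels earlier in the list win ties
        if label in matched:
            return label
    return default


priority_order = ["Sensitive", "Internal"]

# Both criteria are met, so the tie-breaker picks Sensitive.
print(resolve_label({"Internal", "Sensitive"}, priority_order, default="Internal Only"))

# No criteria are met, so the fallback label applies.
print(resolve_label(set(), priority_order, default="Internal Only"))
```

Writing the priority order down like this before drafting the prompt text helps confirm that every pair of labels has an unambiguous winner.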

Time sensitivity

If dates matter, explicitly state they should be evaluated against today’s date. Don’t assume the model will infer the comparison.
Example: “Only classify as Restricted if the date in the file is after January 1, 2023. Compare against today’s date.”

Example label prompts

Note: These example prompts are provided for demonstration purposes only and do not constitute legal or compliance advice. Actual requirements and outcomes may vary based on your organization's policies, use cases, and regulatory obligations.

Confidential Data

Includes:

Business records: audit findings, management and board reports, strategic presentations, incident response materials, third-party risk documentation (e.g., SOC reports)
Operational data: KPI reports, productivity metrics, security logs, meeting transcripts or recordings
Employee information: performance reviews, disciplinary actions
Masked or anonymized data: masked PII (e.g., last 4 digits of SSN), anonymized or aggregated NPI

Restricted Data