Automated classification applies security classifications to your sensitive content for you. You configure classification policies that look for matches of specific data types, such as terms and Info Types, in your enterprise's content. You can select one or more data types to look for, and specify the security classification to apply when the data type conditions are met.
When a user uploads a file, Box scans the file's content, looking for the data types you've specified in your classification policy. If the file meets your classification policy's conditions, Box applies the classification you've specified to the file.
Important: To use automated classification, you must be an admin or co-admin with Shield privilege. Note that Metadata privilege will be automatically checked for co-admins with Shield privilege.
Creating a new classification policy
To create a new classification policy:
- In the left sidebar of the Admin Console, click Classification.
- At the top of the window, click Classification Policies.
- Click Create Policy.
- In Classification Policy Name, type the policy's name.
- In Description, type a summary of the policy.
- Scroll to Step 2 Criteria.
- In Folder Criteria, select Apply to all folders or Only selected folders. If you selected Only selected folders, click Select Folders to select folders. Click Save when you are done.
- In File Criteria, click Select Data Type.
- Select an Info Type, or click Create Custom Terms.
- If you selected Info Type, select either High, Medium, or Low in Confidence. See the "A note on Confidence" section following these instructions.
- If you clicked Create Custom Terms, enter or paste comma-separated words or phrases, up to fifty terms per entry.
Note: In a term, Box ignores case.
- Under With, select logic conditions.
- Under Unique Count, specify how many unique occurrences of the data type are required to meet the logical condition.
- In When a file contains the following conditions, choose All, Any 1, and so on up to Any X, where X is the total number of conditions specified in this condition group.
Note: Within one condition group, you can add up to twenty data types. If you select All, or Any X where X is the total number of conditions in the group, the group is met only when every condition is met. If you select Any 1, the group is met when at least one condition is met. If you select Any 2, the group is met when at least two conditions are met, and so forth.
- If needed, add another condition group by clicking Add More Criteria, then follow the steps above to specify its data types.
Note: You can add up to ten condition groups.
- You can chain condition groups with either "AND" or "OR". Box does not support combinations of "AND" and "OR" between groups.
- Scroll to Step 3 Apply Classification.
- In Apply This Classification, select a classification.
- In the top-right corner of the Admin Console window, click Next.
- Choose to either Save as Draft, or Enable the policy immediately. You can create up to fifty classification policies, enabled and disabled combined.
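The limits mentioned in the steps above (fifty terms per entry, twenty data types per condition group, ten condition groups, one chaining operator for all groups) can be checked before you start clicking through the Admin Console. The following Python sketch is only an illustration for planning a policy on paper. It does not call any Box API, and the names PolicyDraft, ConditionGroup, parse_custom_terms, and validate are hypothetical. Requiring at least one group and at least one data type per group is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Limits stated in the instructions above.
MAX_TERMS_PER_ENTRY = 50
MAX_DATA_TYPES_PER_GROUP = 20
MAX_CONDITION_GROUPS = 10


def parse_custom_terms(raw: str) -> list:
    """Split a comma-separated entry into terms, dropping empty items."""
    terms = [term.strip() for term in raw.split(",") if term.strip()]
    if len(terms) > MAX_TERMS_PER_ENTRY:
        raise ValueError(f"at most {MAX_TERMS_PER_ENTRY} terms per entry")
    return terms


@dataclass
class ConditionGroup:
    data_types: list                 # hypothetical labels for Info Types or term lists
    required: Optional[int] = None   # None stands for "All"; N stands for "Any N"


@dataclass
class PolicyDraft:
    name: str
    groups: list
    chain_operator: str = "AND"      # a single operator chains every group


def validate(policy: PolicyDraft) -> list:
    """Return a list of problems; an empty list means the draft fits the limits."""
    problems = []
    if policy.chain_operator not in ("AND", "OR"):
        problems.append("chain condition groups with either AND or OR")
    if not 1 <= len(policy.groups) <= MAX_CONDITION_GROUPS:
        problems.append(f"use 1 to {MAX_CONDITION_GROUPS} condition groups")
    for number, group in enumerate(policy.groups, start=1):
        total = len(group.data_types)
        if not 1 <= total <= MAX_DATA_TYPES_PER_GROUP:
            problems.append(f"group {number}: use 1 to {MAX_DATA_TYPES_PER_GROUP} data types")
        elif group.required is not None and not 1 <= group.required <= total:
            problems.append(f"group {number}: Any {group.required} is not valid for {total} conditions")
    return problems
```

Because the chaining operator is stored once per policy rather than once per pair of groups, a draft in this sketch cannot mix "AND" and "OR" between groups.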
A note on Confidence
Specifying a value for Confidence enables you to adjust the accuracy with which Box finds matching Info Types in your content.
When you select High, Box counts matches with High confidence.
When you select Medium, Box counts matches with both Medium and High confidences.
When you select Low, Box counts matches with Low, Medium and High confidences.
Selecting High may result in more false negatives, while selecting Low may result in more false positives.
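The way each Confidence setting widens or narrows the set of counted matches can be pictured as a simple threshold. The sketch below is an illustration only: the function names and the (value, confidence) pair format are hypothetical, and how Box actually deduplicates matches internally is not described in this article.

```python
# Order the confidence levels so they can be compared.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


def counts_toward_policy(selected: str, match_confidence: str) -> bool:
    """A match counts when its confidence is at or above the selected level."""
    return LEVELS[match_confidence] >= LEVELS[selected]


def unique_count(matches, selected: str) -> int:
    """Count distinct matched values among (value, confidence) pairs."""
    return len({value for value, confidence in matches
                if counts_toward_policy(selected, confidence)})


# Made-up sample matches for illustration.
sample = [
    ("123-45-6789", "High"),
    ("123-45-6789", "High"),    # repeated value, counted once
    ("555-12-3456", "Medium"),
    ("987-65-4321", "Low"),
]
print(unique_count(sample, "High"))    # 1
print(unique_count(sample, "Medium"))  # 2
print(unique_count(sample, "Low"))     # 3
```

Selecting Low counts the most matches and selecting High the fewest, which is why Low leans toward false positives and High toward false negatives.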
An example of file criteria
Here's an example of defining one group of logic conditions for File Criteria:
- The first condition of the group reads "If there are 1 or more unique U.S. social security numbers found with Low, Medium, or High confidence in this file, this condition is met".
- The second condition of the group reads "If there are between 1 and 10 (both inclusive) unique U.S. driver's license numbers found with Medium or High confidence in this file, this condition is met".
- The third condition of the group reads "If there are no more than 5 unique passport numbers found with High confidence in this file, this condition is met".
- The fourth condition of the group, "6 Terms", is considered as one data type in the group. This condition reads "If there are 3 or more unique terms from the list of 6 terms found in the file, this condition is met".
- For the condition group containing the above four conditions, the governing condition When a file contains the following conditions Any 2 reads "If two or more conditions in this group are met, the file criteria defined in this condition group are met".
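The example group above can be written out as a small Python sketch to check how Any 2 behaves for a given file. This is an illustration under stated assumptions, not Box's implementation: the Condition class, the group_met function, and the data type labels are hypothetical, and the unique counts are assumed to have been filtered by each condition's Confidence setting already.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Condition:
    data_type: str                # hypothetical label for an Info Type or term list
    test: Callable[[int], bool]   # applied to the unique count for that data type


def group_met(conditions, unique_counts, required: Optional[int] = None) -> bool:
    """required=None stands for "All"; required=N stands for "Any N"."""
    met = sum(1 for c in conditions if c.test(unique_counts.get(c.data_type, 0)))
    needed = len(conditions) if required is None else required
    return met >= needed


example_group = [
    Condition("us_ssn", lambda n: n >= 1),                    # 1 or more
    Condition("us_drivers_license", lambda n: 1 <= n <= 10),  # between 1 and 10
    Condition("passport", lambda n: n <= 5),                  # no more than 5
    Condition("six_terms", lambda n: n >= 3),                 # 3 or more of the 6 terms
]

# Made-up unique counts for one file.
counts = {"us_ssn": 2, "us_drivers_license": 0, "passport": 7, "six_terms": 4}
print(group_met(example_group, counts, required=2))  # True: SSN and terms conditions are met
print(group_met(example_group, counts))              # False: "All" needs all four
```

Read literally, a "no more than 5" condition is also satisfied by a file with zero matches. The sketch follows that literal reading; whether Box treats zero matches this way is not stated here.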
An example of custom terms
Here's an example of terms that are considered by some enterprises:
Duplicating a policy to start with
To reuse the folders or file criteria selected for an existing policy, you can duplicate the policy and use the copy as a starting point.
- In the left sidebar of the Admin Console, click Classification.
- At the top of the window, click Classification Policies.
- Click a policy name. Box displays the details of the policy.
- Click Duplicate.
- In the top-right corner of the Admin Console window, click Next.
- Choose to either Save as Draft, or Enable the policy immediately.
FAQ
When does automated classification happen?
Automated classification is triggered when the following file events occur:
- A new file is uploaded to folders specified in classification policies.
- An existing file is updated in folders specified in classification policies.
- An existing file is moved or copied to folders specified in classification policies.
- People are invited to view a file (for the first time, or when additional people are invited).
- A shared link is created for a file.
- The scope of a file's shared link is modified (for example, from people in the company to people with the link).
What Info Types are available out of the box?
You can find the list of supported Info Types here.
How do I know what info types and/or terms are found in a file?
What file types does Box support for automated classification?
Automated classification scans the text extracted from the following file types: 'as', 'as3', 'asm', 'bat', 'boxnote', 'c', 'cc', 'cmake', 'cpp', 'cs', 'css', 'csv', 'cxx', 'diff', 'doc', 'docx', 'erb', 'gdoc', 'groovy', 'gsheet', 'h', 'haml', 'hh', 'htm', 'html', 'java', 'js', 'json', 'less', 'log', 'm', 'make', 'md', 'ml', 'mm', 'msg', 'ods', 'odt', 'pdf', 'php', 'pl', 'ppt', 'pptx', 'properties', 'py', 'rb', 'rst', 'rtf', 'sass', 'scala', 'scm', 'script', 'sh', 'sml', 'sql', 'txt', 'vi', 'vim', 'webdoc', 'wpd', 'xhtml', 'xls', 'xlsb', 'xlsm', 'xlsx', 'xml', 'xsd', 'xsl', 'yaml'.
Note: Box does not support Optical Character Recognition (OCR). Therefore, Box does not extract and does not consider text embedded in images when automated classification evaluates policy matches.
What is the supported maximum file size?
Automated classification scans the text in content, and supports up to 1 MB of extracted text per file, which covers the vast majority of files in Box. The text extracted from a file is usually much smaller than the original file. For example, a 20-MB PowerPoint file (.ppt) may yield 200 KB of extracted text. When the extracted text exceeds 1 MB, Box scans the first 1 MB and does not scan the remainder. As a result, Box applies the classification only if the first 1 MB of extracted text meets the conditions specified in the classification policy.
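To see what the 1 MB limit means in practice, the sketch below keeps only the first 1 MB of a file's extracted text, so a match that appears later is never seen. It is an illustration only: the function name is hypothetical, and measuring the limit as 1,048,576 bytes of UTF-8 text is an assumption, since this article does not say how Box measures it.

```python
# Assumption: 1 MB is measured as 1,048,576 bytes of UTF-8 encoded text.
MAX_SCANNED_BYTES = 1024 * 1024


def scanned_portion(extracted_text: str) -> str:
    """Return the part of the extracted text that falls within the limit."""
    data = extracted_text.encode("utf-8")[:MAX_SCANNED_BYTES]
    # A multi-byte character cut in half at the boundary is dropped.
    return data.decode("utf-8", errors="ignore")


# A term placed after the first 1 MB is outside the scanned portion.
long_text = "a" * MAX_SCANNED_BYTES + " confidential"
print("confidential" in scanned_portion(long_text))   # False
print("confidential" in scanned_portion("a short, confidential memo"))  # True
```

If sensitive terms tend to appear late in very long documents, it may be worth checking that the policy's conditions can still be met from the earlier part of the text.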
If a file has a classification label, does automated classification overwrite the classification label?
If the classification label was applied by a user, automated classification does not overwrite the existing label, and the original classification label stays with the file. If the classification label was applied by a classification policy, automated classification overwrites the existing label.
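The overwrite rule above comes down to one check on who applied the existing label. The sketch below restates it in Python for clarity. The function name and the "user" and "policy" values are hypothetical and do not correspond to fields in any Box API.

```python
from typing import Optional


def resolve_label(existing_label: Optional[str],
                  applied_by: Optional[str],
                  policy_label: str) -> str:
    """Return the label a file ends up with after a policy matches it.

    applied_by is "user", "policy", or None when the file has no label.
    """
    if existing_label is not None and applied_by == "user":
        # A label applied by a user stays with the file.
        return existing_label
    # No label yet, or a label applied by a policy: the policy's label is applied.
    return policy_label


print(resolve_label("Internal", "user", "Confidential"))    # Internal
print(resolve_label("Internal", "policy", "Confidential"))  # Confidential
print(resolve_label(None, None, "Confidential"))            # Confidential
```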