Unclassified content is uncontrolled content. Classifying your content lets you govern it. The first step in classifying your content is creating Classification Labels. This topic describes a basic classification scheme, as well as best-practice classification schemes for several industries, including legal/M&A, financial services, and healthcare. It contains the following sections:
- A Basic Classification Label Scheme
- Industry-Specific Classification Labeling Schemes
- Classification Label Colors
- Defining Your Classification Labels
- Automate Your Content Classification
- Use Classification in Your Shield Access Policies
- Configure Shield Access Policies
In Box, the names of Classification Labels are freeform text. You can name your Classification Labels anything you want to (at least, up to 40 characters' worth). But just because you can do something doesn't mean you should.
The security industry has naming conventions for various levels of security, and other industries layer their own naming conventions on top of those to serve their specific security needs. These conventions aren't requirements, nor do they adhere to any regulation. But following them will help you interoperate with other systems more easily.
A Basic Classification Label Scheme
A basic classification labeling scheme uses common naming conventions, conventions that allow people familiar with the content security space to infer the likely level of security applied to content that uses these Classification Labels. This scheme uses the following label names:
- Public - for content that can be made generally available
- Personal - for non-business content that is usually not intended to be made public or shared
- Internal - for content that is intended to be kept within your organization
- Confidential - for content that requires specific authorization for access
This is a good basic classification scheme in its own right. Industries that require additional levels of content security will often start with these labels as a baseline and then add specific, finer-grained classifications or sub-classifications.
Many basic content classification systems include only Internal and Confidential. While these classifications can be sufficient to identify content that requires some level of control, using just Internal and Confidential can leave some content unclassified, which is sub-optimal when you want to ensure proper governance of all of your content.
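If you keep a record of your scheme outside of Box, for example in a planning script, the basic scheme can be sketched as a small data structure. The following Python is only an illustration: the ClassificationLabel class and the BASIC_SCHEME list are hypothetical names, not part of any Box API, and the length check simply mirrors the 40-character name limit mentioned earlier in this topic.

```python
from dataclasses import dataclass

# Name limit taken from this topic; everything else here is a hypothetical sketch.
MAX_LABEL_NAME_LENGTH = 40


@dataclass(frozen=True)
class ClassificationLabel:
    """A label name plus a short description of its intended use."""

    name: str
    description: str

    def __post_init__(self) -> None:
        # Reject empty names and names longer than the stated limit.
        if not self.name.strip():
            raise ValueError("label name must not be empty")
        if len(self.name) > MAX_LABEL_NAME_LENGTH:
            raise ValueError(
                f"label name exceeds {MAX_LABEL_NAME_LENGTH} characters: {self.name!r}"
            )


# The four basic labels, ordered from least to most restricted.
BASIC_SCHEME = [
    ClassificationLabel("Public", "Content that can be made generally available"),
    ClassificationLabel("Personal", "Non-business content, usually not shared"),
    ClassificationLabel("Internal", "Content kept within your organization"),
    ClassificationLabel("Confidential", "Content that requires specific authorization"),
]
```

Writing the scheme down this way makes it easy to review the full list in one place and to catch a name that is too long before you create the label.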
These classifications typically have purposes as described in the following sections.
Public Classification Label
Public classification is typically used for content that is intended for dissemination to the general public, either through direct distribution or publication. Examples of this type of content include published press releases, product support documentation, and official social media posts.
Personal Classification Label
Personal classification is typically used for non-business content that is usually not intended to be made public or shared. It is not often used in business environments, but if you allow personal content to be stored in your organization's Box instance, you may want to include it.
Internal Classification Label
Internal classification is typically used for content intended for use within the organization and is not meant for public or external parties. In some cases, specifically authorized external parties can be allowed access to content with this classification. Examples of this type of content include policy or procedure manuals, team meeting minutes, internal project plans, internal newsletters, employee training materials, and internal communications.
Confidential Classification Label
Confidential classification is where different industries add nuance and additional strata. A basic definition is that it is for content that needs specific authorization for access and is subject to stringent security measures. Additional clarifications that industries and organizations can add to the definition of Confidential classification include:
- Content that includes sensitive business data and that could cause damage to the business if shared with unauthorized people, including the general public and most employees.
- Content that includes health data from which any PHI/PII (protected health information/personally identifying information) has been removed, protecting individual privacy while keeping the data usable for research, policy-making, or health system management.
- Content with confidential data that is available to all employees but requires protection.
- Content with confidential data that is allowed to be shared with trusted people both within and external to your organization.
Examples of this type of content include content containing PII (personally identifying information), financial records and reports, proprietary research and development documents, contracts, shared project docs, unreleased scripts, contractual agreements, financial statements, non-public marketing plans, de-identified clinical trial data and patient surveys, anonymized datasets used for public health research, and strategies and future plans.
Industry-Specific Classification Labeling Schemes
Legal/M&A, financial services, and healthcare organizations typically have content security needs beyond a basic classification scheme. The following sections describe industry-specific best practice classification schemes that build upon the basic classification scheme described above.
Legal/M&A Organization Classification Labels
In the legal industry, a significant amount of content must be restricted to a limited number of people. A classification scheme that supports the needs of a legal/M&A organization could include the three basic classification labels (Public, Internal, and Confidential), plus:
Client Content/Client Collaboration Classification Label
This classification is generally used for content that is produced, shared, or collaborated upon with a client or partner. Such content is available to people within your organization, and access from outside your organization must be restricted to specifically defined people. Examples include scripts being developed for clients, storyboards, shared project timelines, and deliverable schedules.
Financial Services Organization Classification Labels
The financial services industry requires both confidentiality and governance. Content can contain information that includes both personally identifying information (PII) and sensitive financial information. A classification scheme that supports the needs of a financial organization could include the three basic classification labels (Public, Internal, Confidential), plus:
Collaborators Only Classification Label
This classification is generally used for content intended to be shared only with trusted external parties. Examples include joint project reports with partner companies, shared financial analysis, and collaborative research.
Extremely/Highly Confidential Classification Label
An Extremely Confidential or Highly Confidential classification is typically used for extremely sensitive content that could result in substantial harm or risk to the individual or company if disclosed. Examples of this type of content include employee and customer information, passwords, source code, contracts, pre-announcement financial data, trading reports, M&A details, and executive communications.
PCI Classification Label
This classification is typically used for any content that contains information governed by the Payment Card Industry Data Security Standard (PCI DSS), which covers sensitive payment card and cardholder information. Examples of this type of content include content containing credit card numbers, cardholder names, or cardholder authentication data (such as CVVs or PINs).
Healthcare Organization Classification Labels
The healthcare industry includes many different types of organizations, from hospitals and medical practices to pharmaceutical and medical device developers to public and private research institutions. Some organizations can benefit from a simple classification structure, while others may require more fine-grained levels of content security, especially when working with governmental organizations. Many organizations settle on a basic schema plus specific categorization for content containing protected health information (PHI). A classification scheme that supports the needs of a healthcare organization could include the three basic classification labels (Public, Internal, Confidential), plus:
Collaborators Only Classification Label
This classification label is used for content that your organization wants to share only with trusted external parties. Examples include research findings shared with partner institutions, joint clinical trial data, and shared patient data for consultation or referral purposes.
Confidential - De-Identified PHI Classification Label
This is a specialized, industry-specific classification used for content that contains health data from which identifiers have been removed, protecting individual privacy while keeping the data usable for research, policy-making, or health system management. Examples include de-identified clinical trial data and patient surveys, and anonymized datasets used for public health research.
Restricted - PHI and Restricted - Sensitive Classification Labels
These are specialized, industry-specific classifications. Restricted - PHI is used for content that contains health data that can identify an individual, directly or indirectly, and that is protected under the Health Insurance Portability and Accountability Act (HIPAA). Examples include personal medical records containing a patient's name, date of birth, and similar identifiers; medical billing information; and any other content containing PHI.
Restricted - Sensitive is used for content that contains sensitive health-related or personal information, whose unauthorized access could result in substantial harm, distress, or unfairness to the individual concerned. Examples include patient medical records, laboratory test results, and high-risk health data such as genetic or mental health information.
Classification Label Colors
In Box Shield, one of the characteristics that you define when you create a classification label is its color. The colors do not carry a fixed meaning, but specific colors are typically associated with specific levels of control. In addition, some colors follow the U.S. Cybersecurity and Infrastructure Security Agency's Traffic Light Protocol (TLP) Definitions and Usage, which is "a set of designations used to ensure that sensitive information is shared with the appropriate audience" and "employs four colors to indicate expected sharing boundaries to be applied by the recipient(s)."
You are, of course, free to associate your own color scheme with the Classification Labels you create for your organization. However, as with naming, choosing colors commonly associated with the levels of content security that the labels describe will help you interoperate with other systems. A best-practice color scheme would be:
- Public: green
- Personal: yellow
- Internal: yellow, orange, or blue
- Confidential: yellow or red
- Extremely/Highly Confidential: red
- Collaborators Only/Client Content/Client Collaboration: yellow or purple
- Restricted (PHI and Sensitive): yellow or red
Defining Your Classification Labels
Defining your Classification Labels is the first step toward securing your content. You need to answer the question: What classifications work for my organization?
In Box, your Classification Labels begin as a blank slate. You are free to define any that work for you. That said, using common classifications for your Classification Labels, where they make sense for your content, means that the people you work with and share content with, both inside and outside your organization, will have a common understanding of your content security.
Another common question is: How many Classification Labels does the organization need? The answer is: It depends. As a general rule, however, you don't want more classifications than necessary. It is easy to create classification clutter by creating too many Classification Labels that are too similar in definition.
Keep It Simple: Common Classifications
When you see the many options in the sections above, and when you start thinking about the ways you want to classify and control your content, you may think that more is better, that an extensive, granular structure is optimal. That may not be true. Simplicity has a number of advantages, including:
- Clarity: With fewer classifications, you can make it clear what each one means.
- Usability: When users have more choices, it can be more difficult to decide which one is correct. With fewer choices, users are more likely to choose the right one and less likely to skip classifying files altogether.
If you do not already have any sort of classification system for your content, consider starting with a simple system, no more than 3 or 4 classifications. A good basic system might start with these three:
- Internal
- Confidential
- Restricted
Some large organizations are even known to use just two classifications, and do so successfully.
Understand Your Content/Content Inventory
To understand which Classification Labels will be useful for your organization, you should first understand what kind of content your organization is creating, receiving, and storing. Depending on your organization type, you may already have a good idea: if you're a healthcare organization, you may have content containing PHI (protected health information); if you're a legal organization, you may have content containing client or contract information; if you're a media organization, you may have content that is under NDA (non-disclosure agreement). Beyond that, a review of your Insights dashboard will give you an idea of how many files your organization has stored within Box, and a comprehensive Folders and Files report will give you insights into those files.
Existing Classifications
You might already have a content classification system in place, in which case you can easily mirror that system in Box. Box even makes this easy if you're using Microsoft Information Protection.
Automate Your Content Classification
You have your basic system of content classification set up. Your content in Box, however, is not (yet) classified. Until it is, you won't be able to ensure its security.
You don't have to depend on manually classifying your content. Box's classification policies can apply classification labels automatically based on file criteria that you define.
The easiest way to think about this is: A file with X content should have Y classification. For example:
- A file containing a credit card number should be classified as Confidential
- A file containing an email address or a phone number should be classified as Internal
- A file containing protected health information (PHI) should be classified as Restricted
This simple example just scratches the surface of how Box classification policies use file criteria to automatically classify your content.
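The "a file with X content should have Y classification" idea can be sketched locally in a few lines of Python. This is not how Box implements classification policies, and it does not call any Box API. The patterns, the RULES list, and the classify function are all hypothetical, and the regular expressions are deliberately simplistic. Rules are checked from most to least restrictive, so text that matches several rules gets the stricter label.

```python
import re
from typing import Callable, List, Optional, Tuple

# Simplistic, illustrative patterns; real detectors are far more thorough.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """Look for a 13-19 digit run that also passes the Luhn check."""
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False


def contains_contact_info(text: str) -> bool:
    """Look for an email address or a US-style phone number."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))


# Hypothetical rules, ordered from most to least restrictive.
RULES: List[Tuple[Callable[[str], bool], str]] = [
    (contains_card_number, "Confidential"),
    (contains_contact_info, "Internal"),
]


def classify(text: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    for matches, label in RULES:
        if matches(text):
            return label
    return None
```

For example, classify("Card: 4111 1111 1111 1111") returns "Confidential", classify("Contact jane@example.com") returns "Internal", and text that matches no rule returns None, which corresponds to content that is still unclassified.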
Use Classification in Your Shield Access Policies
Once you have defined a classification scheme that works for your content and have used classification policies to classify your content, the next step is to define Shield access policies that determine levels of access based on classification. A key component of Shield access policies is how they restrict access to content. The goal is to combine access restrictions with the right Classification Label to match industry best practices.
Shield Access Policy Restrictions
With Shield access policies, Box makes available several ways that you can control and restrict your content, including:
- External collaboration control, which defines who can and cannot collaborate on your content.
- Shared link control, which defines who can and cannot access shared links.
- Blocking downloading and printing, for both managed and external users.
- Blocking third-party application use.
- Blocking access to Box via FTP.
- Blocking the use of Box Sign.
- Watermarking.
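When you plan which restrictions go with which label, it can help to write the combinations down before you configure anything. The Python below is a planning sketch only: the AccessRestrictions class, its field names, and the PLANNED_POLICIES mapping are hypothetical, they do not correspond to any Box API or setting name, and the values shown are placeholders rather than recommendations.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class AccessRestrictions:
    """One flag per restriction type listed above (hypothetical field names)."""

    restrict_external_collaboration: bool = False
    restrict_shared_links: bool = False
    block_download_and_print: bool = False
    block_third_party_apps: bool = False
    block_ftp: bool = False
    block_box_sign: bool = False
    watermark: bool = False

    def active(self) -> List[str]:
        """Return the names of the restrictions that are switched on."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


# Placeholder plan: the values are illustrative, not best-practice guidance.
PLANNED_POLICIES: Dict[str, AccessRestrictions] = {
    "Internal": AccessRestrictions(restrict_external_collaboration=True),
    "Confidential": AccessRestrictions(
        restrict_external_collaboration=True,
        restrict_shared_links=True,
        block_ftp=True,
        watermark=True,
    ),
}


def restrictions_for(label: str) -> AccessRestrictions:
    """Look up the planned restrictions; unknown labels get no restrictions."""
    return PLANNED_POLICIES.get(label, AccessRestrictions())
```

Calling restrictions_for("Confidential").active() lists the restrictions planned for that label, which gives you a simple checklist to compare against the policies you actually configure.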
Configure Shield Access Policies
Once you have defined your classifications and configured and automated your content classification, you can configure the Shield access policies that will protect your content. See Configuring Shield Access Policies to Match Industry Best Practices for details.