We are incredibly excited to announce Box Extract, which enables teams to automatically and accurately extract structured data from unstructured content at scale. Box Extract combines the latest LLMs, including Google Gemini 3, with advanced data science techniques such as integrated OCR and extraction-specific RAG to extract critical data and save it as metadata in Box or export it to third-party systems or custom apps.
Business process owners can configure, customize, and manage their data extraction processes in a dedicated user interface and save them as Custom Extract Agents, which can be applied to specific folders in Box to automatically extract structured data from content at scale.
We do this with a combination of:
- The latest AI models, including Google’s Gemini 3, Anthropic’s Claude Opus 4.5, and OpenAI’s GPT 5.2, to understand unstructured data.
- Advanced optical character recognition capabilities.
- Agentic approaches and advanced reasoning to understand both the content and the information required from it, so that data is extracted accurately.
- Proprietary techniques for digitization, categorization, and validation to increase accuracy and consistency of extracted data.
Users can configure their own Custom Extract Agents, including:
- Selecting which pre-configured metadata template to map extracted structured data to.
- Selecting which metadata fields to extract data into, and writing AI-powered instructions or prompts for each field to increase accuracy and precision.
- Deciding whether to leverage the Box AI Standard or Enhanced Extract Agent. You can learn more about those agents here.
- Determining whether to keep or override existing metadata values.
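To make the keep-or-override choice concrete, here is a minimal Python sketch of how such a setting could behave. It is an illustration only, not Box's implementation: the function `apply_extracted`, the `override` flag, and the sample field names are all hypothetical.

```python
def apply_extracted(existing, extracted, override=False):
    """Merge extracted values into existing metadata (hypothetical helper).

    With override=False, a field that already has a value is kept as-is.
    With override=True, any newly extracted value replaces the old one.
    """
    merged = dict(existing)  # copy so the caller's dict is not mutated
    for key, value in extracted.items():
        if value is None:
            continue  # nothing was extracted for this field
        if override or merged.get(key) in (None, ""):
            merged[key] = value
    return merged


# Hypothetical metadata already on a file, and values from a new extraction.
existing = {"vendor": "Acme", "total": None}
extracted = {"vendor": "ACME Corp", "total": "120.00"}

kept = apply_extracted(existing, extracted, override=False)
replaced = apply_extracted(existing, extracted, override=True)
```

In this sketch, keeping existing values still fills in empty fields, so `kept` retains the original vendor but gains the extracted total, while `replaced` takes both extracted values.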
Users can activate, deactivate, delete, or rename Custom Extract Agents, as well as monitor extraction processes for individual Custom Extract Agents, including:
- Extraction date and time stamps
- Source folder
- Extracted file
- Extraction status
With Box Extract, users can automatically extract data and apply it as metadata alongside content in Box to:
- Automate workflows end-to-end in Box Relay, Box Apps, and soon, Box Automate
- Enable teams to make smarter, more informed business decisions with metadata-powered dashboards in Box Apps
- Streamline content discovery with faster search and the ability to query content more efficiently with Box AI
Box Extract leverages the Box AI Standard and Enhanced Extract Agents and their associated APIs and endpoints, including the freeform and structured endpoints. As a result, Box Extract consumes Box AI Units, and the amount depends on which Box AI Extract Agent is configured within your Custom Extract Agent. Using the APIs directly consumes Box AI Units as well. Please see the Box AI Units Consumption Table to learn more.
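For teams that plan to call the structured endpoint directly, the Python sketch below shows roughly what a request body might look like. The endpoint path and the body shape (`items`, `fields`, and the per-field `key`, `type`, and `prompt`) are assumptions to verify against the Box API reference. The helper name, file ID, and invoice fields are hypothetical. The code only builds and prints the payload and makes no network call.

```python
import json

# Assumed endpoint path; verify against the Box API reference before use.
EXTRACT_STRUCTURED_URL = "https://api.box.com/2.0/ai/extract_structured"


def build_structured_request(file_id, fields):
    """Build a JSON-serializable body asking for specific fields from one file.

    The body shape is an assumption for illustration, not a confirmed contract.
    """
    return {
        "items": [{"id": str(file_id), "type": "file"}],
        "fields": [
            {
                "key": field["key"],
                "type": field.get("type", "string"),  # default to a string field
                "prompt": field["prompt"],  # per-field instruction for the model
            }
            for field in fields
        ],
    }


# Hypothetical fields for an invoice; the file ID is a placeholder.
invoice_fields = [
    {"key": "vendor_name", "prompt": "The name of the company issuing the invoice."},
    {"key": "invoice_total", "type": "float", "prompt": "The total amount due."},
]

body = build_structured_request("1234567890", invoice_fields)
print(json.dumps(body, indent=2))
```

A real call would send this body as an authenticated POST to the endpoint, and each call would count toward Box AI Units consumption as described above.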
While the Box Extract Agent APIs are available on Business, Business Plus, Enterprise, Enterprise Plus, and Enterprise Advanced plans, the ability to create, configure, and deploy Custom Extract Agents in Box Extract, as well as the ability to autofill metadata templates using the Box AI Standard and Enhanced Extract Agents, is only available on the Enterprise Advanced plan.
Stay tuned to learn more about this release!