- Box Extract Agent APIs are available on Business plans and above.
- Creating and configuring Custom Extract Agents, and autofilling metadata templates using Standard and Enhanced Extract Agents, are available only on Enterprise Advanced plans.
Box Extract combines agentic data extraction using the latest LLMs with advanced data science techniques (such as integrated OCR and extraction-specific RAG) to automatically and accurately extract information from content and save it as metadata in Box. Process owners configure, customize, deploy, and manage data extraction processes via a dedicated UI. Populated metadata powers faster search and accelerates decision-making, so teams can work on content-driven processes effectively, efficiently, and at scale. Customers can implement high-value, content-driven use cases on a single secure, compliant, intelligent platform that supports collaboration, workflow automation, third-party and custom integrations, and more.
Key Features
- Custom Extract Agents:
- Users can create Custom Extract Agents that accurately extract structured data from content at scale and automatically apply that data as metadata to files in Box. When configuring a Custom Extract Agent, users select a predefined metadata template, choose a Box AI Extract Agent (Standard or Enhanced), decide whether to keep or overwrite existing metadata values, and configure extraction fields and field properties, including AI-powered instructions or prompts to maximize accuracy. Once configured, users can attach up to 10 source folders to the Custom Extract Agent, extract structured data, and apply it as metadata to the files in those folders.
- Only one template can be selected per agent, and at least one field must be enabled for extraction in the Custom Extract Agent configuration.
- AI Agent Types:
- Standard Extract Agent:
- Optimized for high-volume, structured or semi-structured PDFs. Supports both consistent layouts (like invoices) and variable structures (like emails). Recommended for documents with 50 pages or fewer and fewer than 20 extraction fields.
- Enhanced Extract Agent:
- Designed for complex, unstructured, or lengthy documents such as contracts or clinical reports. Uses advanced reasoning for higher accuracy and transparency. Recommended for complex, large, or unstructured documents with 50 pages or more and more than 20 extraction fields, for advanced use cases that require chain-of-thought reasoning.
- Flexible Configuration:
- Select the metadata template you want to extract and map data to.
- Select or deselect individual metadata fields for extraction.
- Add optional AI Instructions to improve extraction accuracy (e.g., specify where data lives, expected formats).
- Determine your AI Agent type (Standard or Enhanced).
- Choose whether to keep existing metadata values or overwrite them during extraction.
- Source Management:
- Attach up to 10 folders per Extract agent. Box Extract currently supports PDF files only. Extraction sources can be managed directly from the Extraction Sources tab of each Custom Extract Agent.
- Run History & Monitoring:
- Every extraction process is logged, including status, source folder, date and timestamps, and file name. You can verify extracted metadata in the file preview by clicking the file name in the Run History. Failed extractions show a tooltip explaining the reason for the failure.
- Manual Edits:
- Users can view and manually edit extracted metadata via Box Preview or within Box Apps.
- Lifecycle Controls:
- Easily deactivate, edit, or delete Extract agents. Deleting an agent preserves previously extracted metadata.
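The configuration rules above can be sketched as a small local model. This is only an illustration, not Box's implementation or API: the `ExtractAgentConfig` class, the `validate` and `recommend_agent_type` functions, and all field names are hypothetical. The thresholds (one template, at least one enabled field, up to 10 folders, 50 pages, 20 fields) come from the text above. The handling of the boundary cases (exactly 50 pages, exactly 20 fields) is an assumption, since the two recommendations overlap there.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_SOURCE_FOLDERS = 10  # documented limit per Custom Extract Agent


@dataclass
class ExtractAgentConfig:
    """Hypothetical local model of a Custom Extract Agent's settings."""
    template_key: str                 # exactly one metadata template per agent
    enabled_fields: List[str]         # fields selected for extraction
    agent_type: str = "standard"      # "standard" or "enhanced"
    overwrite_existing: bool = False  # False keeps existing values, True overwrites them
    instructions: Dict[str, str] = field(default_factory=dict)  # optional AI instructions per field
    source_folder_ids: List[str] = field(default_factory=list)


def validate(config: ExtractAgentConfig) -> List[str]:
    """Return a list of rule violations; an empty list means the config passes."""
    problems = []
    if not config.template_key:
        problems.append("a metadata template must be selected")
    if not config.enabled_fields:
        problems.append("at least one field must be enabled")
    if config.agent_type not in ("standard", "enhanced"):
        problems.append("agent type must be 'standard' or 'enhanced'")
    if len(config.source_folder_ids) > MAX_SOURCE_FOLDERS:
        problems.append("at most 10 source folders can be attached")
    return problems


def recommend_agent_type(page_count: int, field_count: int) -> str:
    """Apply the documented guidance; boundary handling is an assumption."""
    if page_count <= 50 and field_count < 20:
        return "standard"
    return "enhanced"
```

For example, under these assumptions a 12-page invoice with 8 extraction fields maps to the Standard agent, while a 120-page contract with 35 fields maps to the Enhanced agent.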
Limitations
- Box Extract currently supports extraction from PDFs only.
- Each user can create up to 100 Extract agents.
- If multiple Agents share the same folder, the extraction may run for only one Agent.
- Box Extract does not currently support subfolder or cascading folder extraction.
- Extraction runs only on the top-level files within the folder associated with a Custom Extract Agent. Files located in subfolders are not included, as extraction does not recurse into nested folders.
- Overwriting metadata replaces all existing metadata values.
- Some extraction processes may still occur after Custom Extract Agent deactivation or deletion.
- Replacing a metadata template in a Custom Extract Agent will reset all existing settings, including AI instructions and prompts.
- When attempting to apply metadata using multiple Box products simultaneously, like through Box Relay or leveraging the autofill capability in Box Preview, the order of extraction is unpredictable and may produce conflicts.
- If the selected metadata template or enabled fields are deleted or if the user loses access to the source folder/file, the extraction process will fail.
- If a metadata template is removed or all metadata template fields are disabled, a Custom Extract Agent cannot be activated until the missing information is restored.
- If an Admin creates a metadata template that contains hidden metadata fields, users may encounter potential inconsistencies and inaccuracies when leveraging that metadata template within a Custom Extract Agent. We recommend that Admins fully audit their metadata templates to increase accuracy and consistency.
- Box Extract currently supports extraction from PDFs in English, Japanese, Korean, Chinese, and Cyrillic.
- Box Extract currently supports attaching up to 10 source folders to each Custom Extract Agent. Each Box Extract Agent can extract data from up to 1,000 files within each folder, starting from the most recently modified.
- Note: Any files that are not PDFs are not supported. However, they will still count towards the 1,000-file threshold for each source folder.
- There is no limit to the number of active Custom Extract Agents an enterprise creates.
- The maximum throughput is 500 extractions per minute per user and 700 extractions per minute per enterprise (2.5M pages a day).
- Deleting a Custom Extract Agent moves it to the Trash (if enabled for your enterprise), from where it can be restored.
- While deleting an active Custom Extract Agent is supported, restoring it requires deactivation and reactivation before it can extract metadata again.
- Folder Access/Permissions: Currently, folders can only be attached for automatic extraction at scale from Custom Extract Agents. A Custom Extract Agent can be attached only to folders where the agent owner has Owner/Co-owner, Editor, or Viewer Uploader access.
- Folder and file names in the Run History of a Custom Extract Agent are displayed exactly as they appeared when the extraction was executed.
- Custom Extract Agents created by an end user are private to that user. They are only visible and accessible to the creator. At this time, Box Extract does not support sharing Custom Extract Agents with other users.
- Run History records for Custom Extract Agents are stored temporarily and may not be accessed at this time.
- In Run History, files with no metadata found still show 'Success' status. The extraction results do not affect the run status.
- If a user’s folder permissions are changed to Viewer or Previewer after attaching a folder to an extract agent, the extraction will still run. However, the metadata cannot be updated, and the extraction will not be recorded in Run History. Users will need to manually detach the folder from the agent to avoid this limitation.
- Custom Extract Agents do not support re-running extractions after completion, so the user must manually detach and re-attach the folder to the agent to retry extraction.
- If a file referenced in the Run History is deleted, its corresponding log is also removed.
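The folder-level limits above (top-level files only, a 1,000-file window starting from the most recently modified, non-PDFs counting toward that window, and the per-user rate limit) can be illustrated with a short sketch. It is one reading of the documented behavior, not Box's actual selection logic: `FolderItem`, `eligible_files`, and `min_minutes` are hypothetical names, and detecting PDFs by file extension is an assumption made for the example.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List

MAX_FILES_PER_FOLDER = 1000  # documented per-folder file threshold
USER_RATE_PER_MINUTE = 500   # documented per-user extraction rate


@dataclass
class FolderItem:
    """Hypothetical stand-in for a top-level item in a source folder."""
    name: str
    modified_at: datetime
    is_folder: bool = False


def eligible_files(items: List[FolderItem], limit: int = MAX_FILES_PER_FOLDER) -> List[str]:
    """Return names of the files that would be extracted under the documented limits."""
    # Subfolders are skipped: extraction does not recurse into nested folders.
    top_level_files = [i for i in items if not i.is_folder]
    # The threshold applies to the most recently modified files first.
    newest_first = sorted(top_level_files, key=lambda i: i.modified_at, reverse=True)
    window = newest_first[:limit]
    # Non-PDF files occupy slots in the window but are not extracted.
    return [i.name for i in window if i.name.lower().endswith(".pdf")]


def min_minutes(file_count: int, per_minute: int = USER_RATE_PER_MINUTE) -> int:
    """Lower bound on elapsed minutes implied by the rate limit alone."""
    return math.ceil(file_count / per_minute)
```

Under this reading, a folder whose 1,000 most recently modified files include many non-PDFs yields fewer than 1,000 extractions, and older PDFs beyond the window are not processed. Ten full folders (10,000 files) would take at least 20 minutes for a single user at 500 extractions per minute, ignoring the actual processing time.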