We are releasing additional file support for Box Extract, enabling enterprises to extract structured data from a broader range of file types and from more of your organization’s content.
Prior to this release, Box Extract could extract structured data only from PDF files, which limited the use cases it could address within an organization. With this release, you can create and configure Custom Extract Agents to extract structured data from more file types, including:
Images: PNG, TIFF, TIF, JPG, JPEG, and WEBP
Documents: PDF, DOC, DOCX, Google Docs, ODT, and Box Notes
Presentations: PPT and PPTX
Spreadsheets: XLS, XLSX, XLSM, ODS, and Google Sheets
When a Custom Extract Agent is applied to a folder in Box, structured data is automatically extracted from every file of a type listed above and applied to that file as metadata.
Custom Extract Agents already applied to existing folders will continue extracting from PDFs only. To enable the additional file types listed above, remove the Custom Extract Agent from the folder and then reassign it.
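If you want a rough idea of which files in a folder fall under the supported types before assigning an agent, a local check on file extensions may help. The sketch below is illustrative only and does not call any Box API. The function name extract_category is hypothetical, and the mapping of Google Docs, Google Sheets, and Box Notes to the extensions gdoc, gsheet, and boxnote is an assumption about how those files are named in your account.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions are taken from the supported-types list in this release note.
# Assumption: Google Docs, Google Sheets, and Box Notes appear with the
# extensions .gdoc, .gsheet, and .boxnote respectively.
SUPPORTED_EXTENSIONS = {
    "images": {"png", "tiff", "tif", "jpg", "jpeg", "webp"},
    "documents": {"pdf", "doc", "docx", "gdoc", "odt", "boxnote"},
    "presentations": {"ppt", "pptx"},
    "spreadsheets": {"xls", "xlsx", "xlsm", "ods", "gsheet"},
}


def extract_category(filename: str) -> Optional[str]:
    """Return the supported category for a file name, or None if unsupported.

    Hypothetical helper: it only inspects the final extension of the name,
    case-insensitively, and never looks at the file's contents.
    """
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ["invoice.PDF", "scan.tif", "budget.xlsm", "notes.txt"]:
        print(name, "->", extract_category(name))
```

Because the check looks only at the file name, treat its result as a hint; whether a given file is actually processed is determined by Box Extract.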
To learn more about this release, see Using Box Extract.