Additional file type support for Box Extract

We released additional file type support for Box Extract, so you can extract structured data from a broader range of file types and from more of your organization’s content.

Prior to this release, Box Extract could extract structured data only from PDF files, which limited the use cases it could address within an organization. With this release, you can create and configure Custom Extract Agents to extract structured data from more file types.

Custom Extract Agents now support extraction from:

Images: PNG, TIFF, TIF, JPG, JPEG, and WEBP
Documents: PDF, DOC, DOCX, Google Docs, ODT, and Box Notes
Presentations: PPT and PPTX
Spreadsheets: XLS, XLSX, XLSM, ODS, and Google Sheets

When a Custom Extract Agent is applied to a folder in Box, structured data is automatically extracted from every file of the types listed above and applied to those files as metadata.
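
To check ahead of time which files in a folder fall within the supported types, a small local helper along these lines may be useful. This is a minimal sketch and not part of any Box API or SDK: the function name `extract_category` and both dictionaries are hypothetical, and the extensions used for Google Docs, Google Sheets, and Box Notes (`gdoc`, `gsheet`, `boxnote`) are assumptions that this article does not state.

```python
from pathlib import PurePath

# Extensions copied from the supported-types list in this article.
SUPPORTED_EXTENSIONS = {
    "images": {"png", "tiff", "tif", "jpg", "jpeg", "webp"},
    "documents": {"pdf", "doc", "docx", "odt"},
    "presentations": {"ppt", "pptx"},
    "spreadsheets": {"xls", "xlsx", "xlsm", "ods"},
}

# ASSUMPTION: the article names Google Docs, Google Sheets, and Box Notes
# but gives no extensions for them; these values are illustrative guesses.
ASSUMED_EXTENSIONS = {
    "documents": {"gdoc", "boxnote"},
    "spreadsheets": {"gsheet"},
}


def extract_category(filename):
    """Return the supported category for a file name, or None if unsupported."""
    # Compare case-insensitively and without the leading dot.
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions or ext in ASSUMED_EXTENSIONS.get(category, set()):
            return category
    return None
```

For example, `extract_category("Invoice.PDF")` returns `"documents"`, while `extract_category("archive.zip")` returns `None`. The check looks only at the file name, so it says nothing about whether a given file's contents can actually be processed.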

Custom Extract Agents already applied to folders will continue extracting from PDFs only. To enable support for the additional file types, remove the Custom Extract Agent from the folder and then reassign it.

To learn more about this release, see Using Box Extract.
