We will introduce two powerful new field types, struct and table, for the extract_structured endpoint of the Box Extract Agent API. With them, developers will be able to extract not just individual data points but fully structured, grouped, and repeating data directly from their documents. The results come back as clean JSON, ready to plug into any downstream system.
Currently, Box Extract can pull only flat, scalar values from documents, such as a single date, a name, or a number. There is no way to group related fields into a single named object, to extract repeating rows of data, or to reliably capture tabular data.
The struct field type will let you define a named container of related sub-fields and receive them as a single, grouped JSON object. For example, you can extract a full address (street, city, postal code), a party’s contact details, or any other set of related attributes as one clean output.
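As a rough sketch, here is how a request body containing a struct field might be assembled in Python. The items/fields envelope is modeled on the existing extract_structured request shape, but the "struct" type string and the nested "fields" key used for sub-fields are assumptions for illustration only; the released schema may differ. FILE_ID is a placeholder, and the struct_field helper is hypothetical.

```python
import json


def struct_field(key, description, sub_fields):
    """Build a hypothetical `struct` field definition.

    ASSUMPTION: sub-fields are nested under a "fields" key. This is an
    illustrative guess, not the confirmed request schema.
    """
    return {
        "key": key,
        "type": "struct",
        "description": description,
        # Hypothetical nesting key holding the grouped sub-fields.
        "fields": [{"key": k, "type": t} for k, t in sub_fields],
    }


request_body = {
    # Placeholder file ID; replace with a real Box file ID.
    "items": [{"type": "file", "id": "FILE_ID"}],
    "fields": [
        struct_field(
            "billing_address",
            "The customer's billing address",
            [("street", "string"), ("city", "string"), ("postal_code", "string")],
        )
    ],
}

print(json.dumps(request_body, indent=2))
```

If the struct type ships as described, the answer for billing_address would come back as one object with street, city, and postal_code keys, rather than as three unrelated top-level values that you have to reassemble yourself.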
The table field type will extract repeating rows of structured data as an array of JSON objects. Define the columns you want, such as description, quantity, unit price, and tax, and Box Extract returns every matching row, regardless of how the data is laid out in the source document.
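The following sketch shows what declaring a table field and consuming its result downstream could look like. It assumes, hypothetically, that columns are declared as nested sub-fields and that the answer maps the field key to a list of row objects keyed by column; the line_items key, the nesting, and the sample values are all made up for illustration. Decimal is used so that currency arithmetic does not pick up float rounding errors.

```python
import json
from decimal import Decimal

# Hypothetical `table` field definition. ASSUMPTION: columns are declared
# as nested sub-fields under "fields"; the released schema may differ.
line_items_field = {
    "key": "line_items",
    "type": "table",
    "description": "Invoice line items",
    "fields": [
        {"key": "description", "type": "string"},
        {"key": "quantity", "type": "float"},
        {"key": "unit_price", "type": "float"},
        {"key": "tax", "type": "float"},
    ],
}

# Made-up sample answer. ASSUMPTION: the table comes back as an array of
# row objects keyed by the column keys defined above.
sample_answer = json.loads("""
{
  "line_items": [
    {"description": "Widget", "quantity": 2, "unit_price": 10.00, "tax": 1.60},
    {"description": "Gadget", "quantity": 1, "unit_price": 25.50, "tax": 2.04}
  ]
}
""")


def line_total(row):
    """Return quantity * unit_price + tax for one row as a Decimal."""
    # Convert via str() so each JSON number becomes its shortest decimal form.
    return (
        Decimal(str(row["quantity"])) * Decimal(str(row["unit_price"]))
        + Decimal(str(row["tax"]))
    )


rows = sample_answer["line_items"]
invoice_total = sum((line_total(r) for r in rows), Decimal("0"))
print(f"{len(rows)} rows, total {invoice_total}")
```

Because each row is already an object keyed by column, the same list can be handed to a database insert, a CSV writer, or an ERP integration without any layout-specific parsing.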
Table extraction will work across any layout, including visual grids, key-value pairs, forms, and plain prose. Customers don’t need to know how their data is laid out in the document; they just define what they want.
With these new field types, Box Extract will help eliminate one of the most persistent bottlenecks in document processing: the gap between getting data out of a document and actually being able to use it.
Stay tuned to learn more about this release.