Extract text from a file to use in a Skill app
I'm trying out the new Skill app model and it sounds promising. Can Box help with extracting the text from a PDF or Doc file? Do I have to download the file to my server/function, read it, and use a specific tool to extract the text inside it, or is there an API/SDK in Box that already provides this?
It would be great if the text in the file were sent to my Skill app URL in some way, so I could focus on analyzing the text and updating metadata instead of reading/parsing the file first.
Thanks
-
Hi,
When you pull the file from Box to send it on to the ML / AI provider, you are essentially downloading the file, so it will be in whatever format the original file uses (e.g. PDF / doc). Depending on the format, you can either read the content directly, dropping any formatting / config data that the file format carries, or use an extension / package for your language that reads that file format. Either way, you can then extract the text directly.
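As a rough illustration of the "read the content directly" route, here is a minimal Python sketch that uses only the standard library. It assumes the file bytes have already been downloaded from Box, and the function names `extract_docx_text` and `extract_text` are hypothetical helpers, not part of any Box SDK. A .docx file is a zip archive whose body lives in `word/document.xml`, so its text can be read without extra packages; PDF and legacy .doc files are not handled here and would need a dedicated parsing package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of visible text.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def extract_text(filename: str, data: bytes) -> str:
    """Hypothetical dispatcher: pick an extraction strategy by file extension."""
    name = filename.lower()
    if name.endswith(".docx"):
        return extract_docx_text(data)
    if name.endswith((".txt", ".md", ".csv")):
        # Plain-text formats can be decoded as-is.
        return data.decode("utf-8", errors="replace")
    # PDF and legacy .doc need a dedicated parser; none ships with Python.
    raise ValueError(f"no extractor configured for {filename!r}")
```

With something like this in place, the Skill handler would call `extract_text(name, downloaded_bytes)` and pass the returned string on to the analysis step before writing metadata back.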
Thanks,
Jon