Python parsing file object

chanmar

August 23, 2018 17:15

Hi there, I'm pretty new to Python and the Box SDK. I'm trying to search for certain file types and get information like the filename, full folder path, content created and modified dates, etc.

I was able to connect and get the results, but I'm having trouble parsing the file objects that the Box search function returns.

I have something like the below, but when I try to read the path_collection for the folder path, I get a "string indices must be integers" error. Any tips or ideas on how to get this information?

    resp = client.search_files(
        query_string='.pdf', ancestor_folder_ids=None, file_extensions=['pdf'])


    if resp['total_count'] > 0:
        for entry in resp['entries']:
            box_filename = entry['name']
            box_fileid = entry['id']
            box_created_at = entry['created_at']
            box_modified_at = entry['modified_at']
            box_content_created_at = entry['content_created_at']
            box_content_modified_at = entry['content_modified_at']
            for path_collection in entry['path_collection']:
                for pc_entry in path_collection['entries']:
                    box_folderpath = box_folderpath + pc_entry['name']
    else:
        print("PDF files not found")

The eventual goal is to get a CSV of PDF files in a certain folder tree and their associated metadata.

Comments


mwiller

August 23, 2018 21:29
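I think the issue lies in how you iterate over the `path_collection` dictionary. Looping over a dict yields its keys, which are strings, so `path_collection['entries']` ends up indexing a string and raises "string indices must be integers". You probably only need one for loop, like this: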
```
for pc_entry in entry['path_collection']['entries']:
    box_folderpath = box_folderpath + pc_entry['name']
```
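
To make the difference concrete, here is a minimal sketch that runs without a Box connection. The `sample_entry` dict and the `build_folder_path` helper are hypothetical, made up for illustration; the dict only imitates the shape of a search result entry as used in the question. It uses `str.join` with a separator, which is an assumption about how you want the path formatted.

```python
# Hypothetical sample shaped like one search result entry; all values are made up.
sample_entry = {
    'name': 'report.pdf',
    'path_collection': {
        'total_count': 2,
        'entries': [
            {'type': 'folder', 'id': '0', 'name': 'All Files'},
            {'type': 'folder', 'id': '123', 'name': 'Reports'},
        ],
    },
}


def build_folder_path(entry, separator='/'):
    """Join the ancestor folder names of an entry into one path string."""
    folders = entry.get('path_collection', {}).get('entries', [])
    return separator.join(folder['name'] for folder in folders)


# Iterating over the dict itself yields its keys, which are plain strings.
# That is why indexing the loop variable with ['entries'] fails.
keys = list(sample_entry['path_collection'])

folder_path = build_folder_path(sample_entry)
print(keys)
print(folder_path)
```

Building the path with `join` also avoids having to initialize `box_folderpath` before the loop and reset it for each file.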

chanmar

August 24, 2018 10:02
Omg that was it! Thank you so much!

Can I ask another question? I notice that when I call search_folders, it doesn't return "tags" in the response even though my file has tags on it. Was this dropped from the file object, or am I just reading it incorrectly?

mwiller

August 24, 2018 11:03
According to the API documentation at https://developer.box.com/v2.0/reference#file-object, the `tags` field is not included in the file object response by default; you need to request it explicitly from the API using the `fields` query parameter. Unfortunately, the Python SDK does not currently make it easy to pass that in. We're working on a big update for the SDK which should include full API parity across all endpoints, but it's not ready yet. In the meantime, you should be able to make the call manually by doing something like this:
```
params = {
    'fields': 'type,id,tags', # add any other fields you need here
    'query': '.pdf',
    'file_extensions': 'pdf',
}
resp = client.make_request('GET', 'https://api.box.com/2.0/search', params=params).json()
```
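
Since the eventual goal is a CSV, here is a rough sketch of turning a response like `resp` into CSV rows. It assumes `fields` was extended to include every column you want (for example `name`, `path_collection`, `created_at`, `modified_at`). The helper names, the column list, and `sample_resp` are hypothetical and only mimic the response shape; the sketch writes to an in-memory buffer so it runs without a Box connection.

```python
import csv
import io

# Hypothetical column layout; adjust to match the fields you request.
CSV_COLUMNS = ['name', 'id', 'folder_path', 'created_at', 'modified_at', 'tags']


def entry_to_row(entry):
    """Flatten one search entry (a dict) into a flat dict of CSV values."""
    folders = entry.get('path_collection', {}).get('entries', [])
    return {
        'name': entry.get('name', ''),
        'id': entry.get('id', ''),
        'folder_path': '/'.join(folder['name'] for folder in folders),
        'created_at': entry.get('created_at', ''),
        'modified_at': entry.get('modified_at', ''),
        # Tags are assumed to be a list of strings; join them into one cell.
        'tags': ';'.join(entry.get('tags', [])),
    }


def write_entries_csv(resp, out_file):
    """Write a header row plus one row per entry to an open text file."""
    writer = csv.DictWriter(out_file, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for entry in resp.get('entries', []):
        writer.writerow(entry_to_row(entry))


# Made-up response standing in for the parsed JSON from the search call.
sample_resp = {
    'total_count': 1,
    'entries': [{
        'type': 'file',
        'id': '42',
        'name': 'report.pdf',
        'created_at': '2018-08-01T10:00:00-07:00',
        'modified_at': '2018-08-02T11:30:00-07:00',
        'tags': ['draft', 'finance'],
        'path_collection': {
            'total_count': 2,
            'entries': [
                {'type': 'folder', 'id': '0', 'name': 'All Files'},
                {'type': 'folder', 'id': '123', 'name': 'Reports'},
            ],
        },
    }],
}

buffer = io.StringIO()
write_entries_csv(sample_resp, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, you could pass `open('pdf_files.csv', 'w', newline='')` in place of the buffer. Using `.get()` with defaults keeps the row builder from failing when a field was not requested.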
