Bulk download programmatically from public Box Enterprise folder
Hi all,
I'd like to bulk download from a publicly shared Enterprise folder (https://nrcs.app.box.com/v/naip/). The data is huge (~16 TB), so I'd like to download it programmatically through either an API or a command-line utility, but I'm not sure how.
I'm using a Linux cluster, so the Box CLI is of no use to me (it doesn't support Linux yet). I also tried using the Box API, but it looks like I can't access another organization's Enterprise folder through the API (for example, I could only search within my own account).
Any suggestion will help, and thanks in advance!
-
Perhaps FTP (https://community.box.com/t5/Upload-and-Download-Files-and/Using-Box-with-FTP-or-FTPS/ta-p/26050) would work?
BTW, the LFTP client on Linux can also make this a bit easier and more reliable.
Hope that helps.
-
You can use the API to access publicly shared folders; you just need to pass the `BoxApi` header, containing the shared link, with every call so the API knows you should have access to that folder.
The general flow is this:
- Call the `GET /shared_items` endpoint with the BoxApi header to resolve the shared link to a folder
- Make whatever calls you need against the folder using the ID returned in step 1 (in your case, lots of `GET /folders/ID/items` calls followed by `GET /files/ID/content` calls), again with the BoxApi header
As an aside, my team will be releasing an updated version of the Box CLI with Linux support next month, which should make this a lot easier for you!
-
Many thanks to both of you for your timely responses!
I read in this post that FTP isn't recommended as the primary access method, so I followed the suggestion to add the extra BoxApi header, and it worked like a charm.
It's great to know that Box is working on a Linux CLI. I can imagine how helpful it's going to be for Linux cluster users like me.
-
Hi mwiller
I have the same issue. I want to download the zipped images from the images folder inside this public_dataset folder. (I can download them locally by clicking the download button for each zipped file, one by one, but I want to download them on a Linux server.)
The data is public and contains several different zipped files, so how should I download them from the Linux command line? I checked the documentation, but I don't know what shared link and password to use in this command:
```
curl "https://api.box.com/2.0/shared_items?fields=type,id" \
  -H "Authorization: Bearer ACCESS_TOKEN" \
  -H "BoxApi: shared_link=SHARED_LINK_URL&shared_link_password=PASSWORD"
```
Thank you
-
The following curl call worked for me:
```
curl https://api.box.com/2.0/shared_items \
  -H "Authorization: Bearer " \
  -H "BoxApi: shared_link=https://nihcc.app.box.com/v/ChestXray-NIHCC"
```
That will give you the information about the shared folder; you can then make API calls like this to retrieve the folder contents:
```
curl https://api.box.com/2.0/folders//items \
  -H "Authorization: Bearer " \
  -H "BoxApi: shared_link=https://nihcc.app.box.com/v/ChestXray-NIHCC"
```
-
I tried
```
curl https://api.box.com/2.0/shared_items \
  -H "Authorization: Bearer " \
  -H "BoxApi: shared_link=https://nihcc.app.box.com/v/ChestXray-NIHCC"
```
but nothing happened on the command line: no error message, no output at all. I don't know what the access token is; how do I get one?
-
An access token is required to authenticate with the Box API, even for publicly shared resources. Please see the setup documentation for help setting up an app and obtaining an access token.
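One way to obtain a token from a script is the OAuth 2.0 client credentials grant against Box's `https://api.box.com/oauth2/token` endpoint. This is a hedged sketch assuming you have created a Box app configured for Client Credentials Grant and authorized in your own enterprise; the client ID, secret and enterprise ID are placeholders, and `ccg_form` / `get_access_token` are hypothetical helper names. The token belongs to your own app; it is the BoxApi header that grants access to another organization's public folder. For quick tests, a short-lived developer token from the Developer Console is an alternative.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.box.com/oauth2/token"


def ccg_form(client_id, client_secret, enterprise_id):
    """URL-encoded form body for a Client Credentials Grant token request."""
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "box_subject_type": "enterprise",
        "box_subject_id": enterprise_id,
    }).encode("ascii")


def get_access_token(client_id, client_secret, enterprise_id):
    """POST the form (urllib sends POST when data is given) and return the token."""
    request = urllib.request.Request(
        TOKEN_URL, data=ccg_form(client_id, client_secret, enterprise_id))
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

The returned string goes into the `Authorization: Bearer ...` header of the curl calls above.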
-
I apologize if this question sounds silly, but is there a Pythonic way to access data using the shared link? All the answers seem to point toward the CLI solution, and I don't have access to Mac/Windows.
I created a client using JWT authentication with an enterprise developer account, and I am using the following code:
```python
from boxsdk import JWTAuth
from boxsdk import Client

# Configure JWT auth object
sdk = JWTAuth.from_settings_file()
# Get auth client
client = Client(sdk)

SHARED_LINK_URL = 'https://nrcs.app.box.com/v/naip/folder/'
shared_item = client.get_shared_item(SHARED_LINK_URL)
print(shared_item.name)
```
I have also verified that the shared link points to a public Box file.
-
You cannot append `/folder/XYZ` to the URL when using the API; instead, you'll need to do something like this:
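```python
# with_shared_link() requires a password argument; pass None for a public link
shared_client = client.with_shared_link(SHARED_LINK_URL, None)
shared_folder = shared_client.get_shared_item(SHARED_LINK_URL)
folder_contents = shared_folder.get_items()
# OR, to fetch a specific subfolder by its ID:
subfolder = shared_client.folder(FOLDERID).get()
```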
-
Hi,
Thank you for all your help upthread. I'm trying to follow these instructions (with the NAIP shared image folder https://nrcs.app.box.com/v/naip, same as the OP) to programmatically download from a public Box Enterprise folder, but I'm getting 404s on the second step.
`curl https://api.box.com/2.0/shared_items -H "Authorization: Bearer myToken" -H "BoxApi: shared_link=https://nrcs.app.box.com/v/naip" | jq .id` returns `"17936490251"` as expected.
However, all of the following fail with a 404 or another Not Found message.
`box folders:items 17936490251 --fields=shared_link`
`curl https://api.box.com/2.0/folders/17936490251/items -H "Authorization: Bearer myToken" -H "BoxApi: shared_link=https://nrcs.app.box.com/v/naip/"`
`box shared-links:get nrcs.app.box.com/v/naip/`
Can you shed any light on the best way to do this? I feel like I'm so close (thanks to your help), but not quite there.
-
Update: it looks like the trailing slash in the shared_link was the problem.
`curl https://api.box.com/2.0/folders/17936490251/items -H "Authorization: Bearer myToken" -H "BoxApi: shared_link=https://nrcs.app.box.com/v/naip"` works, but with a trailing slash after "naip" it doesn't. Hope this helps someone out. 🙂
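Both URL pitfalls in this thread (the trailing slash here, and the `/folder/ID` suffix that browser URLs carry) can be handled before the link goes into the BoxApi header. A small sketch; `split_shared_link` is a hypothetical helper, and it assumes the numeric ID after `/folder/` is the same folder ID the API expects in `GET /folders/ID/items`.

```python
import re

# Optional "/folder" or "/folder/<digits>" suffix, plus any trailing slashes.
_FOLDER_SUFFIX = re.compile(r"/folder(?:/(\d+))?/*$")


def split_shared_link(url):
    """Return (shared_link, folder_id_or_None) cleaned up for the BoxApi header."""
    url = url.strip()
    folder_id = None
    match = _FOLDER_SUFFIX.search(url)
    if match:
        folder_id = match.group(1)
        url = url[:match.start()]
    return url.rstrip("/"), folder_id
```

For example, `split_shared_link("https://nrcs.app.box.com/v/naip/")` returns `("https://nrcs.app.box.com/v/naip", None)`, the form that works in the curl call above.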
-
Hi,
I need to do something similar, except I want to download all the files from https://uta.app.box.com/s/e7nsmloj8xmblosvfg98q42fgqnjy6dv.
I understand the procedure using curl.
But could you please help me generate the ACCESS_TOKEN?
I have searched online, but the methods I found only generate a token for your own app.
Thank You
-
I am writing Python code to download files or folders from https://nrcs.app.box.com/v/soils.
It is public, and I can download it with a few clicks without any username or password.
I am using the code block from this comment, and it asks for a password. Here is the error:
`with_shared_link() missing 1 required positional argument: 'shared_link_password'`
Is there any other way to download?
Code block:
```python
shared_client = client.with_shared_link(SHARED_LINK_URL)
shared_folder = shared_client.get_shared_item(SHARED_LINK_URL)
folder_contents = shared_folder.get_items()
# OR
subfolder = shared_client.folder(FOLDERID).get()
```
-
Hi,
Were you able to figure out how to download publicly available data with a Python script?
I am getting the same error that the 'shared_link_password' argument is missing.
When I passed an empty string as the password, I got the following error instead:
`boxsdk.exception.BoxAPIException: Message: Could not find the specified resource Status: 404 Code: not_found Request ID: thwf9wgdl2kk2d2l`
My code:
```python
from boxsdk import JWTAuth
from boxsdk import Client

# Configure JWT auth object
sdk = JWTAuth.from_settings_file('box_config.json')
# Get auth client
client = Client(sdk)
user = client.user().get()
print('The current user ID is {0}'.format(user.id))

SHARED_LINK_URL = 'https://stonybrookmedicine.app.box.com/v/cellreportspaper'
shared_client = client.with_shared_link(SHARED_LINK_URL, '')
shared_folder = shared_client.get_shared_item(SHARED_LINK_URL)
folder_contents = shared_folder.get_items()
subfolder = shared_client.folder(4***phone number removed for privacy***).get()
for item in subfolder.get_items(limit=1000):
    client.file(file_id=item.id).content()
```
Any help in figuring this out is much appreciated!! 🙂