How to download a large amount of data from a shared link without user intervention?
Dear developers team,
I have encountered a problem accessing some data that my colleagues have shared with me via Box. The shared link they sent points to a folder on Box that is around 1.5 TB in size and contains around 11k files. With my free personal account I obviously cannot download this folder from the browser since it's too large, and when I looked through the account upgrade options I realized none would allow me to download it. I also cannot download each file individually, as it would take ages of manual clicking.
Therefore I decided to use the Python boxsdk API. I created my app with OAuth 2.0 authentication and was able to access the shared files with client.get_shared_item('shared link').
The problem is that with this type of authentication (client ID, secret, and token) it is only possible to download files for 60 minutes before the token expires. As far as I can tell, the other route, JWT, is currently down (I tried it and got an error: https://support.box.com/hc/en-us/community/posts/15257013694995-JWT-authentication-Please-check-the-sub-claim-The-sub-specified-is-invalid-), so I cannot use that either.
As a workaround I tried to fetch all the download URLs within the 60 minutes with get_download_url(), so that I could then download all the files with requests.get. Unfortunately these links are only valid for 15 minutes or so...
For now, the only possible solution I see is that the enterprise on my colleagues side could create a Sandbox for me to allow JWT authentication, but I'm not sure whether it's possible.
So, my question is: am I missing something, and is there any way I could download my data from Box?
Thank you,
Vasilii
-
As a short comment, I also tried CCG authentication with
from boxsdk import CCGAuth, Client

auth = CCGAuth(
    client_id="my_client_id",
    client_secret="my_client_secret",
    user="my_user_ID"
)
user_client = Client(auth)

and got the following error: {'error': 'invalid_grant', 'error_description': 'Grant credentials are invalid'}
I have the "Generate user access tokens" option ticked in the app configuration, but I don't have an Authorization tab, so maybe that's the reason.
-
Hi folks,
Interesting topic, but with many questions/considerations.
I've answered a similar question before, related to downloading from public shared links.
Let's start there.
I have a public shared link, and if we just open it we can see a download button:
This download button will zip the content of the shared link and start a download for you.
This might work for a small number of files, the simpler use case.
The similar question I mentioned was how to download from a shared link programmatically, so here is a Python script as an example:

"""Demo: download files from a Box shared link."""
import os

from boxsdk import JWTAuth, Client


def main():
    auth = JWTAuth.from_settings_file('.jwt.config.json')
    auth.authenticate_instance()
    client = Client(auth)

    web_link_url = "https://samchully.app.box.com/v/Europe000000"

    user = client.user().get()
    print(f"User: {user.id}:{user.name}")

    # The second argument is the shared link password (empty for a public link).
    shared_folder = client.get_shared_item(web_link_url, '')
    print(f"Shared Folder: {shared_folder.id}:{shared_folder.name}")
    print("#" * 80)
    print("Type\tID\t\tName")

    # Create the target folder if needed so chdir does not fail on the first run.
    os.makedirs('downloads', exist_ok=True)
    os.chdir('downloads')
    items = shared_folder.get_items()
    download_items(items)
    os.chdir('..')


def download_items(items):
    """Recursively download items, mirroring the folder structure locally."""
    for item in items:
        if item.type == 'folder':
            # exist_ok avoids a crash when the script is restarted.
            os.makedirs(item.name, exist_ok=True)
            os.chdir(item.name)
            download_items(item.get_items())
            os.chdir('..')
        if item.type == 'file':
            print(f"{item.type}\t{item.id}\t{item.name}", end='')
            with open(item.name, 'wb') as download_file:
                item.download_to(download_file)
            print("\tdone")


if __name__ == "__main__":
    main()
    print("Done")

This script will recursively download files into a "downloads" folder, one by one, and it will re-create the folder structure of the shared link.
It is using JWT authentication, but it works the same for any type of authentication.
-
Now, Vasilii Mikirtumov mentions that the authentication expires during the download.
Both OAuth 2.0 and JWT authentication, once completed, provide an access token that is valid for 60 minutes.
OAuth 2.0 also provides a refresh token, valid for 60 days, that can be used to get a new access token. If the refresh token expires, the user has to go through the authorization process again.
JWT authentication relies on the app's JWT credentials (the key pair in the config file) to request an access token, and these credentials do not expire on their own.
In all of the Box SDKs there is an Auth class and a Client class. From the Python example above:
auth = JWTAuth.from_settings_file('.jwt.config.json')
auth.authenticate_instance()
client = Client(auth)

The Auth class checks whether the access token needs to be refreshed and gets a new one if needed, when possible. Check out this Python SDK note for examples.
In summary, every time the script interacts with the API, the expected behavior is that the Client class checks with the Auth class to see whether it needs a new token and, if so, gets one automatically.
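To make that behavior concrete, here is a toy model of the "check before every request, refresh if expired" pattern. It is only an illustration: ToyAuth and ToyClient are hypothetical names, not the Box SDK's classes, and the real SDK's internals may differ.

```python
import time


class ToyAuth:
    """Toy stand-in for an SDK auth object; NOT the Box SDK's real class."""

    def __init__(self, fetch_token, lifetime_s=3600, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable returning a fresh token string
        self._lifetime_s = lifetime_s    # 60 minutes, as for Box access tokens
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def token(self):
        # Refresh lazily: only when there is no token yet or it has expired.
        if self._token is None or self._clock() >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = self._clock() + self._lifetime_s
        return self._token


class ToyClient:
    """Toy client that asks the auth object for a valid token on every call."""

    def __init__(self, auth):
        self._auth = auth

    def request(self, path):
        return f"GET {path} with {self._auth.token()}"
```

With this pattern a long-running download loop never handles tokens itself; it only needs an auth object that is able to obtain a new token (a refresh token for OAuth 2.0, or the JWT credentials).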
-
Dear Rui,
In case 1), the issue is the folder size, which in my case is too big. Whenever I hit the download button, I get "The selected item(s) exceed the download size limit."
In case 2), I assume the download will stop as soon as the developer token expires (in 60 minutes), and JWT authentication is apparently broken.
So what shall we do???
Thank you
Vasilii
-
However, Vasilii Mikirtumov's problem statement adds a couple of layers of complexity.
Downloading 11k files one by one using a script will take some time, and something can go wrong.
Not only should the script track which files have already been downloaded, so that it skips them on restart, but it should also track whether a file has changed in the meantime and download it again.
A tracking system could be as simple as logging each file ID to a file and checking whether the ID is already in the log before each download.
To verify whether a file has changed, Box keeps a SHA-1 hash property for every file, and also an ETag property that changes every time a new version of the file is created.
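A minimal sketch of such a tracking log, using only the standard library. The DownloadLog class, its JSON layout, and the file names are assumptions made for illustration; the only Box-specific input is the SHA-1 string that Box reports for each file.

```python
import hashlib
import json
from pathlib import Path


def sha1_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a local file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DownloadLog:
    """Hypothetical JSON log mapping file id -> SHA-1 of the downloaded version."""

    def __init__(self, log_path):
        self._path = Path(log_path)
        self._done = json.loads(self._path.read_text()) if self._path.exists() else {}

    def needs_download(self, file_id, remote_sha1, local_path):
        # Skip only when this exact version was logged and the file is still on disk.
        return self._done.get(file_id) != remote_sha1 or not Path(local_path).exists()

    def mark_done(self, file_id, remote_sha1, local_path):
        # Record the file only if the local copy matches the hash reported by Box.
        if sha1_of(local_path) != remote_sha1:
            return False
        self._done[file_id] = remote_sha1
        self._path.write_text(json.dumps(self._done))
        return True
```

In the download loop this would wrap each file: call needs_download(item.id, item.sha1, item.name) before downloading and mark_done(...) afterwards, assuming the items returned by the SDK expose the sha1 field. The log is rewritten after every file, which is simple and should be acceptable for about 11k entries.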
-
Vasilii Mikirtumov and Thanos, let me know if you want to explore any specific aspect or use case.
Best regards
-
Vasilii Mikirtumov, tell me a bit more about your context.
Are you working for a Box enterprise customer?
With 1.5 TB of files it certainly seems so.
-
I don't really know to be honest. But probably yes. The link looks like this: "https://thermofisher.box.com/s/xxxxxxxxxxxxxxxxxxx"
-
Well, in that case, reach out to them.
They should be able to create a Box app, with the correct permissions and access level, that you can use for this purpose.
Another option they can use is to create a sandbox for you to use.
It is a bit unusual that a company shares 1.5 TB using a shared link with an external user, so reach out to them.
Best regards
-
Hi Vasilii,
I was wondering if a simple OAuth application would work for you, since these do not need the admin approval step that is currently broken.
Take a look at this template https://github.com/barduinor/box-python-oauth-template
This is a Python script that works with OAuth 2.0. The first time it runs, it opens a browser and asks you to authorize the app; after that it caches the access token and refresh token, allowing you to keep using the script.
The access token is valid for 60 minutes and the refresh token for 60 days, and these get refreshed automatically.
Keep the cache file safe since it is not encrypted.
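For illustration, here is a rough sketch of how such a token cache could look with the boxsdk OAuth2 class and its store_tokens callback. This is not the template's actual code; the helper names and the .oauth_cache.json file name are assumptions, and the first-time browser authorization is left out.

```python
import json
import os


def save_tokens(access_token, refresh_token, cache_path=".oauth_cache.json"):
    """Write both tokens to a JSON cache file."""
    # Mode 0o600 makes a newly created cache readable only by its owner (POSIX).
    fd = os.open(cache_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"access_token": access_token, "refresh_token": refresh_token}, handle)


def load_tokens(cache_path=".oauth_cache.json"):
    """Return (access_token, refresh_token), or (None, None) if no cache exists."""
    try:
        with open(cache_path) as handle:
            data = json.load(handle)
    except FileNotFoundError:
        return None, None
    return data.get("access_token"), data.get("refresh_token")


def build_client(client_id, client_secret):
    """Build a boxsdk Client that re-saves tokens whenever the SDK refreshes them."""
    # Imported here so the cache helpers above work without the SDK installed.
    from boxsdk import OAuth2, Client

    access_token, refresh_token = load_tokens()
    auth = OAuth2(
        client_id=client_id,
        client_secret=client_secret,
        access_token=access_token,
        refresh_token=refresh_token,
        store_tokens=save_tokens,  # called by the SDK with (access_token, refresh_token)
    )
    return Client(auth)
```

Because the SDK calls store_tokens on every refresh, the cache always holds the latest refresh token, so a restarted script can pick up where the previous run left off as long as that token is still valid.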
Then it should be a matter of recursively downloading from the public shared link.
You'll probably need to keep track of what has and has not been downloaded, just in case it stops and you have to restart the script.
Let me know if this works.