How to download a large amount of data from a shared link without user intervention?

11 comments

  • Vasilii Mikirtumov

    As a short comment, I also tried CCG authentication with 

    from boxsdk import CCGAuth, Client

    # Client Credentials Grant, authenticating as a specific user
    auth = CCGAuth(
      client_id="my_client_id",
      client_secret="my_client_secret",
      user="my_user_ID"
    )

    user_client = Client(auth)

    and got the following error: {'error': 'invalid_grant', 'error_description': 'Grant credentials are invalid'}

    I have the "generate user access tokens" option ticked in the app configuration. But I don't have an authorization tab, so maybe that's the reason.

  • Thanos

    I have exactly the same use case. I am trying to download files that have been made available as a public shared link from an academic paper.

  • Rui Barbosa

    Hi folks,

    Interesting topic, but with many questions/considerations.

    I've answered a similar question before, related to downloading from public shared links.

    Let's start there.

    I have a public shared link, and if we just open it we can see a download button:

    This download button will zip the content of the shared link and start a download for you.

    This might work for a small number of files, the simplest use case.

    The similar question I mentioned was how to download from a shared link programmatically, so here is a Python script as an example:

    """Demo: download files from a Box shared link."""
    import os
    from boxsdk import JWTAuth, Client


    def main():
        # Authenticate as the app's service account using the JWT config file.
        auth = JWTAuth.from_settings_file('.jwt.config.json')
        auth.authenticate_instance()
        client = Client(auth)

        web_link_url = "https://samchully.app.box.com/v/Europe000000"

        user = client.user().get()
        print(f"User: {user.id}:{user.name}")

        # Resolve the shared link to its folder; the link has no password.
        shared_folder = client.get_shared_item(web_link_url, '')
        print(f"Shared Folder: {shared_folder.id}:{shared_folder.name}")
        print("#" * 80)

        print("Type\tID\t\tName")
        # Create the target folder if it does not exist yet, then work inside it.
        os.makedirs('downloads', exist_ok=True)
        os.chdir('downloads')
        items = shared_folder.get_items()
        download_items(items)
        os.chdir('..')


    def download_items(items):
        """Recursively download items, mirroring the folder structure locally."""
        for item in items:
            if item.type == 'folder':
                # exist_ok allows re-running the script over a partial download.
                os.makedirs(item.name, exist_ok=True)
                os.chdir(item.name)
                download_items(item.get_items())
                os.chdir('..')
            elif item.type == 'file':
                print(f"{item.type}\t{item.id}\t{item.name}", end='', flush=True)
                with open(item.name, 'wb') as download_file:
                    item.download_to(download_file)
                print("\tdone")


    if __name__ == "__main__":
        main()
        print("Done")

    This script will recursively download files into a "downloads" folder, one by one, and it will re-create the folder structure of the shared link.

    It uses JWT authentication, but it works the same way with any type of authentication.

  • Rui Barbosa

    Now, Vasilii Mikirtumov mentioned that the authentication expires during the download.

    Both OAuth 2.0 and JWT authentication, once completed, provide an access token that is valid for 60 minutes.

    OAuth 2.0 also provides a refresh token, valid for 60 days, that can be used to get a new access token. If the refresh token expires, the user has to go through the authorization process again.

    JWT authentication relies on the JWT to get the access token, and the JWT credentials themselves do not expire.

    All of the Box SDKs have an Auth class and a Client class. From the Python example above:

        auth = JWTAuth.from_settings_file('.jwt.config.json')
        auth.authenticate_instance()
        client = Client(auth)

    The Auth class checks whether the access token needs to be refreshed and gets a new one if needed, when possible. Check out this Python SDK note for examples.

    In summary, every time the script interacts with the API, the expected behavior is that the Client class checks with the Auth class to see if it needs a new token and, if so, gets a new one automatically.
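
    To make the idea concrete, here is a minimal sketch of the "ask for a new token when the old one is about to expire" pattern. It is an illustration only, not the SDK's actual implementation: the class name RefreshingToken, the fetch callable, and the 60-second margin are all assumptions made for this example.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Hands out an access token, fetching a new one once the old one nears expiry.

    Hypothetical helper for illustration; it is not part of the Box SDK.
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # returns (token, lifetime_in_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock      # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, calling fetch() only when one is needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._token
```

    With a 60-minute token, a long download that calls get() before each request would trigger a new fetch roughly once per hour and reuse the cached token the rest of the time.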

  • Vasilii Mikirtumov

    Dear Rui,

    In case 1), the issue is the folder size, which in my case is too big. Whenever I hit the download button, I get "The selected item(s) exceed the download size limit."

    In case 2), I assume the download will stop as soon as the developer token expires (in 60 min), and JWT authentication is apparently broken.

    So what shall we do?

    Thank you

    Vasilii

     

  • Rui Barbosa

    However, Vasilii Mikirtumov's problem statement adds a couple of layers of complexity.

    Downloading 11k files one by one with a script will take some time, and something can go wrong.

    Not only should the script track which files have already been downloaded, so it skips them on restart, but it should also track whether a file has changed in the meantime and, if so, download it again.

    A tracking system could be as simple as logging each file ID to a file and checking whether the ID is already in the log before each download.

    To verify whether a file has changed, Box keeps a SHA1 hash property for every file, and also an ETAG property that gets incremented every time a new version of the file is created.
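
    The tracking idea above could be sketched roughly as follows, using only the Python standard library. The names DownloadLog and local_sha1 and the JSON-lines log format are assumptions made for this example, not anything provided by Box.

```python
import hashlib
import json
from pathlib import Path


class DownloadLog:
    """Append-only log mapping file IDs to the SHA1 recorded at download time."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = {}
        # Reload earlier entries so a restarted script knows what is finished.
        if self.path.exists():
            for line in self.path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    entry = json.loads(line)
                    self.done[entry["id"]] = entry["sha1"]

    def needs_download(self, file_id, remote_sha1):
        """True if the file was never logged or its hash has changed since."""
        return self.done.get(file_id) != remote_sha1

    def record(self, file_id, sha1):
        """Log a completed download; a later entry for the same ID wins on reload."""
        with self.path.open("a", encoding="utf-8") as log:
            log.write(json.dumps({"id": file_id, "sha1": sha1}) + "\n")
        self.done[file_id] = sha1


def local_sha1(path, chunk_size=1024 * 1024):
    """SHA1 of a local file, read in chunks so large files fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    In the download loop, and assuming each file item exposes its hash as item.sha1, the script would call needs_download(item.id, item.sha1) before downloading and record(item.id, item.sha1) once the file is written. Comparing local_sha1() of the written file with the remote hash before recording would also catch truncated downloads.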

  • Rui Barbosa

    Vasilii Mikirtumov and Thanos, let me know if you want to explore any specific aspect or use case.

    Best regards

  • Rui Barbosa

    Vasilii Mikirtumov, tell me a bit more about your context.

    Are you working for a Box enterprise customer?

    With 1.5 TB of files it certainly seems so.

  • Vasilii Mikirtumov

    Rui Barbosa,

    I don't really know to be honest. But probably yes. The link looks like this: "https://thermofisher.box.com/s/xxxxxxxxxxxxxxxxxxx" 

  • Rui Barbosa

    Well, in that case, reach out to them.

    They should be able to create a Box app, with the correct permissions and access level, that you can use for this purpose.

    Another option they can use is to create a sandbox for you to use.

    It is a bit unusual that a company shares 1.5 TB using a shared link with an external user, so reach out to them.

    Best regards 

  • Rui Barbosa

    Hi Vasilii,

    I was wondering if a simple OAuth application would work for you, since these do not need the admin approval step that is broken in your case.

    Take a look at this template https://github.com/barduinor/box-python-oauth-template

    This is a Python script that works with OAuth 2.0. The first time, it opens a browser and asks you to authorize the app; after that, it caches the access token and refresh token, so you can keep using the script.

    The access token is valid for 60 minutes and the refresh token for 60 days, and these get refreshed automatically.

    Keep the cache file safe since it is not encrypted.
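
    As a rough idea of what such a token cache might look like (the linked template has its own code, which is not reproduced here), the sketch below writes both tokens to a JSON file that only the current user can read. The file name and the function names are assumptions made for this example. The store_tokens(access_token, refresh_token) shape is chosen to match the callback that the Python SDK's OAuth2 class accepts through its store_tokens argument.

```python
import json
import os
from pathlib import Path

CACHE_PATH = Path(".oauth.tokens.json")  # hypothetical file name


def store_tokens(access_token, refresh_token, path=CACHE_PATH):
    """Save both tokens as JSON; the file is NOT encrypted."""
    payload = json.dumps(
        {"access_token": access_token, "refresh_token": refresh_token}
    )
    # Mode 0o600 (owner read/write on POSIX) applies when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(payload)


def load_tokens(path=CACHE_PATH):
    """Return (access_token, refresh_token), or (None, None) if nothing is cached."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None, None
    return data.get("access_token"), data.get("refresh_token")
```

    On startup the script would call load_tokens() and only open the browser when nothing is cached; every time the tokens are refreshed, store_tokens() overwrites the file with the new pair.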

    Then it should be a matter of recursively downloading from the public shared link.

    You'll probably need to keep track of what has and has not been downloaded, just in case the script stops and you have to restart it.

    Let me know if this works.

