
Avoiding parallel folders created by concurrent uploads via the SDK

Comments

4 comments

  • cbetta

    Hi, are you saying the folders have exactly the same name? From what I know about our API, that's not the intended behaviour.

  • davidswalkabout

    Yes, some names high in our folder paths refer to entities that persist over
    months. We deliver updates about those entities over months via files.
    Sometimes the updates occur within a few seconds of each other, leading to
    a race condition over whether a folder has been created yet.

    I believe AWS SQS will solve this by linearizing the concurrent updates.
    I'm no longer looking for a solution in the Box sdks.

  • peterjkirby

    My team's been running into the same issue.

    In short, when files are uploaded to one of our systems, we publish events to a queue. Items on this queue are consumed by multiple consumers. Each consumer checks whether or not a required folder structure is in place in a predetermined project folder. If the project folder does not contain the desired folder structure, it is created. The consumer then uploads the file into a target child folder. 

    Our Box folder structure looks something like this:

    +-- projects/
        +-- Project1/
        +-- Project2/
        +-- ...etc

    A consumer creates the following folder structure within a project folder when it detects that it doesn't exist:

    +-- projects/
        +-- Project1/
            +-- Data/
            +-- Foo/
            +-- Bar/
            +-- ...etc

    We see duplicate sibling folders being created when the following occurs:

    1. Multiple files are uploaded to our system around the same time.
    2. The uploaded files are related and therefore belong somewhere within the same project folder.
    3. The target project folder does not contain the desired folder structure, so each consumer attempts to create the folder structure.

    The result looks like this, for example:

    +-- projects/
        +-- Project1/
            +-- Data/
            +-- Foo/
            +-- Data/
            +-- Data/
            +-- Bar/
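
The check-then-create race described above can be reproduced without any network calls. What follows is a minimal sketch, assuming a store that permits same-named siblings; `FakeFolderStore` and `run_interleaved` are hypothetical names for illustration and are not part of any Box SDK. The interleaving is forced by hand so that the outcome is deterministic.

```python
# In-memory stand-in for a folder service that permits same-named siblings.
# FakeFolderStore and its methods are hypothetical, not part of any Box SDK.
class FakeFolderStore:
    def __init__(self):
        self.children = {}  # parent path -> list of child folder names

    def exists(self, parent, name):
        return name in self.children.get(parent, [])

    def create(self, parent, name):
        self.children.setdefault(parent, []).append(name)


def run_interleaved(store, parent, name, consumers=3):
    # Step 1: every consumer checks before any consumer has created anything.
    missing = [not store.exists(parent, name) for _ in range(consumers)]
    # Step 2: each consumer acts on its now-stale check result.
    for is_missing in missing:
        if is_missing:
            store.create(parent, name)


store = FakeFolderStore()
run_interleaved(store, "projects/Project1", "Data")
print(store.children["projects/Project1"])  # ['Data', 'Data', 'Data']
```

Every consumer sees "missing" because the check and the create are separate steps, so each one creates its own copy of the folder.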

    We are also seeing a similar issue when the same file is uploaded multiple times in rapid succession. In that case, multiple files with the same name sometimes end up in the same parent folder.

    This appears to be unexpected behavior from the Box API. From what I could find, the Box API docs don't mention anything about race conditions, limiting concurrency, or atomic folder/file creation.

    Can you provide us with any recommendations? Is there a way to atomically create a folder?

    We are investigating how we should refactor our consumers right now. Any tips would be greatly appreciated. Thanks!

  • davidZZZZZ

    Since I posted the original question 7 months ago, the only workaround we've found is to linearize our requests into the Box SDK. This is...not great.
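
One way to picture the linearizing workaround is a single worker that drains a queue, so that only one check-then-create sequence runs at a time. This is a sketch under assumptions: the `folders` dict stands in for the remote folder tree, and `ensure_folder` and `worker` are hypothetical helpers, not Box SDK calls.

```python
import queue
import threading

# Hypothetical in-memory stand-in for remote folders; not a Box SDK object.
folders = {}  # parent path -> list of child folder names


def ensure_folder(parent, name):
    # Check-then-create is safe here only because one thread ever runs it.
    siblings = folders.setdefault(parent, [])
    if name not in siblings:
        siblings.append(name)


def worker(jobs):
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            break
        ensure_folder(*job)


jobs = queue.Queue()
consumer = threading.Thread(target=worker, args=(jobs,))
consumer.start()

# Many producers may enqueue concurrently; queue.Queue is thread-safe.
producers = [
    threading.Thread(target=jobs.put, args=(("projects/Project1", "Data"),))
    for _ in range(10)
]
for p in producers:
    p.start()
for p in producers:
    p.join()

jobs.put(None)
consumer.join()
print(folders)  # {'projects/Project1': ['Data']}
```

The trade-off is throughput: every request waits behind the one before it, even when the requests touch unrelated folders.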

    Ideally, the SDK's file upload method would accept the full target file path, use or create every folder on that path, and offer an option to either overwrite a conflicting file in the leaf folder or rename one of the two files to avoid a collision.
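
The "use-or-create every folder on the path" behaviour wished for above could look roughly like the sketch below. It assumes an in-memory tree guarded by a lock; `FolderTree` and `ensure_path` are hypothetical names, not an existing SDK method. An in-process lock like this only coordinates threads inside one process, so consumers running on separate machines would still need shared coordination of some kind.

```python
import threading


class FolderTree:
    """Hypothetical in-memory folder tree; not a Box SDK class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._children = {"": []}  # folder path -> list of child folder names
        self.created = []          # log of every folder actually created

    def ensure_path(self, path):
        """Use or create each folder on `path`; return the leaf folder path."""
        current = ""
        for name in path.strip("/").split("/"):
            child = f"{current}/{name}" if current else name
            # Hold the lock across check and create so they act as one step.
            with self._lock:
                if name not in self._children[current]:
                    self._children[current].append(name)
                    self._children[child] = []
                    self.created.append(child)
            current = child
        return current


tree = FolderTree()
threads = [
    threading.Thread(target=tree.ensure_path, args=("projects/Project1/Data",))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(tree.created))
# ['projects', 'projects/Project1', 'projects/Project1/Data']
```

Eight concurrent callers ask for the same path, yet each folder is created exactly once because the existence check and the creation happen under the same lock.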
