
About bulk tag maintenance


Comments

10 comments

  • Rui Barbosa

    Hi Tokutake san,

    Could you elaborate on your use case a bit more? I'm not sure what you are trying to accomplish.

    If by "maintaining the tags assigned in bulk" you mean for example replacing one tag by another in all content, you should be able to do a search by the old tag and then replace by the new tag in every content item.

    Let us know, and I can investigate the csv format for you.

     

  • Tokutake Yuki(徳竹 湧気)

    Thank you for your reply, Mr. Rui.

    What I want to achieve and the conditions are as follows.
    ■What we want to achieve
    - I want to change the tags assigned to files in bulk with the CLI. (e.g. tagA→tagB)
    - I want to delete all tags assigned to files in bulk with the CLI.
    ■Conditions
    ・I am not an admin user. I am an ordinary user of a company.

    Also, I'd like to confirm the solution you suggested:
    "you should be able to do a search by the old tag and then replace by the new tag in every content item."
    Does this mean manually replacing all the tags one by one?

  • Rui Barbosa

    Hi Tokutake san,

    I think I understand what you are looking for.

    There is no central place where you can list, create, rename, or delete all user-created tags: it does not exist in the box.com app (the normal UI), nor is it implemented in the CLI.

    This means we cannot just rename TAG-A to TAG-B and be done with it.

    You can, however, search by tag and then update the tags. Please try these examples on some sample files and tags first, until you are comfortable with the scripts and sure you are not overwriting any tags that are important to you.

    I'm also assuming you already have the CLI configured and authorized.

    In my examples I'm using the CLI with JWT authentication. This means the CLI is authenticated via a service account, and I must use the --as-user flag to act on behalf of the user who owns the content. I also often include the --csv and --fields flags so the output is shorter and readable on this forum. Finally, I'm using zsh (macOS), but these examples can be adapted to PowerShell (Windows) or bash (Linux).

    First, let's check who the CLI user is:

    ❯ box users:get --csv --fields type,id,name
    ### output ###
    type,id,name
    user,20130487697,JWT

    Then I locate the user I want the CLI to impersonate (again, this may not apply to your case):

    ❯ box users --csv --fields type,id,name
    type,id,name
    ...
    user,22240548078,Investment User
    user,22240405099,Wealth User
    user,22240545678,Wholesale User
    user,18622116055,Rui Barbosa

    In my case, the user I want the CLI to impersonate is 18622116055, so I use --as-user 18622116055.
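    If you'd rather not copy the ID by hand, a small shell function can pull it out of the same CSV listing. This is a sketch, not a Box CLI feature: user_id_by_name is a hypothetical helper name, and it assumes user names contain no commas or quotes, so a plain split on commas is safe.

```shell
# Hypothetical helper (not a Box CLI command): print the id of the user whose
# name matches $1 exactly, reading the CSV produced by
# `box users --csv --fields type,id,name` on stdin.
# Assumes names contain no commas or quotes, so splitting on "," is safe.
user_id_by_name() {
  # Skip the header row, keep only user rows, and compare the name column.
  awk -F, -v name="$1" 'NR > 1 && $1 == "user" && $3 == name { print $2 }'
}
```

    For example, AS_USER=$(box users --csv --fields type,id,name | user_id_by_name "Rui Barbosa") stores the ID, and you can then pass --as-user "$AS_USER" to the commands that follow.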

    My sample files in the box app look like this:

    Please note that if you have just tagged some sample files, the tags are immediately visible in the box app, but the search can take a few minutes to index the recently applied tags.

    To search all files with TAG-A:

    ❯ box search "TAG-A" --as-user 18622116055 \
    --content-types tags \
    --csv --fields type,id,name
    ### output ###
    type,id,name
    file,1016197618492,sample1.heic
    file,1016206416109,sample1.cr2
    file,1016203876842,023A9785.CR3

    Now we can redirect the output of this command to a CSV file in order to update all files in bulk:

    ❯ box search "TAG-A" --as-user 18622116055 \
    --content-types tags \
    --csv --fields type,id,name > ./files-tag-a.csv

    Then check the contents of that file (it should match the output above):

    ❯ cat ./files-tag-a.csv
    ### output ###
    type,id,name
    file,1016197618492,sample1.heic
    file,1016206416109,sample1.cr2
    file,1016203876842,023A9785.CR3

    Now all we need to do is to update the tags of these files.

    Please note that these files have only one tag each, so we can simply replace it with the new one. If the files in your use case have multiple tags, we would need a different approach, since the next command will replace ALL tags on each file with TAG-B.
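    One way to check this before running the update command is to export the search results once more with the tags field included (to a separate file, so the bulk input for the update keeps only the type, id and name columns) and look for rows with more than one tag. The sketch below uses a hypothetical helper, multi_tag_rows, and assumes the tags column is serialized the way it appears in the CSV export later in this thread, e.g. "[""TAG-E"",""TAG-A""]".

```shell
# Hypothetical helper (not a Box CLI command): print the data rows of a CSV
# export whose tags column holds more than one tag. It assumes tags are
# serialized like:
#   file,1016197618492,sample1.heic,"[""TAG-E"",""TAG-A""]"
# so any row with two or more tags contains the separator "","" at least once.
# Exit status is 0 if such rows exist and 1 if none do (grep's convention).
multi_tag_rows() {
  # Skip the header line, then keep only rows containing the separator.
  tail -n +2 "$1" | grep -F '"",""'
}
```

    For example, run box search "TAG-A" --as-user 18622116055 --content-types tags --csv --fields type,id,name,tags > ./files-tag-a-check.csv and then multi_tag_rows ./files-tag-a-check.csv. No output means every matched file has a single tag.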

    ❯ box files:update --as-user 18622116055 \
    --bulk-file-path ./files-tag-a.csv \
    --tags "TAG-B" \
    --csv  --fields type,id,name,tags

    ### output ###
    [========================================] 100% | 3/3
    type,id,name
    file,1016197618492,sample1.heic
    file,1016206416109,sample1.cr2
    file,1016203876842,023A9785.CR3
    All bulk input entries processed successfully.

    If we look at the box app, all file tags have been replaced:

    Again, the search will take a few minutes to index the newly applied tags:

    ❯ box search "TAG-B" --as-user 18622116055 --content-types tags --csv --fields type,id,name
    ### output ###
    type,id,name
    file,1016197618492,sample1.heic
    file,1016206416109,sample1.cr2
    file,1016203876842,023A9785.CR3
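    Because indexing is not instant, a search run right after an update may come back empty. Instead of re-running it by hand, you could wrap it in a small retry loop such as the hypothetical wait_for_rows below. It assumes that a search with no hits prints at most a header line in --csv mode (the exact empty output is an assumption, so both "nothing" and "header only" count as no results), and it stops as soon as any row appears, so it cannot tell whether every file has been re-indexed yet.

```shell
# Hypothetical helper (not part of the Box CLI): re-run a command until its
# CSV output has at least one data row below the header, or give up.
# Usage: wait_for_rows ATTEMPTS DELAY_SECONDS command [args...]
wait_for_rows() {
  local attempts=$1 delay=$2 i=1 out rows
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Capture the command's output; a failed run simply yields no rows.
    out=$("$@")
    # Count non-empty lines; the first one is assumed to be the CSV header.
    rows=$(printf '%s\n' "$out" | grep -c . || true)
    if [ "$rows" -gt 1 ]; then
      printf '%s\n' "$out"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "wait_for_rows: no results after $attempts attempts" >&2
  return 1
}
```

    For example: wait_for_rows 10 60 box search "TAG-B" --as-user 18622116055 --content-types tags --csv --fields type,id,name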

    Looking at the files again, you'll notice that TAG-A no longer exists in the tag list. This is because a tag is removed once it is no longer used by any file.

    So at this point we have successfully replaced ALL tags on the files that had TAG-A with TAG-B.

    This message is already way too long, so I'll continue my analysis in the next message...

     

  • Rui Barbosa

    Please be careful if the files have more than one tag.

    Consider the following:

    Now all files are tagged with TAG-B and TAG-C. What happens if I try to replace TAG-B with TAG-A using the previous technique?

    ❯ box files:update 1016203876842 \
    --as-user 18622116055 \
    --tags "TAG-A" \
    --csv  --fields type,id,name,tags
    ### output ###
    type,id,name
    file,1016203876842,023A9785.CR3

    We get this:

    As you can see, we lost TAG-C from the first file. This might not be what you want.

    Let us know whether this fits your use case. If not, read on.

  • Rui Barbosa

    So now let's look at the more complex scenario. Consider this:

    We have multiple tags, and we want to replace TAG-A with TAG-B without losing any of the other tags.

    We are still searching for TAG-A, but now we also need to collect the existing tags into a file so we can process them later. So far we have been ignoring the types returned, so this time we make sure we only get files:

    ❯ box search "TAG-A" --as-user 18622116055 \
    --content-types tags \
    --type file \
    --csv --fields type,id,name,tags > tag-a-files.csv

    The file looks like:

    ❯ cat tag-a-files.csv
    ### output ###
    type,id,name,tags
    file,1016197618492,sample1.heic,"[""TAG-E"",""TAG-A""]"
    file,1016203876842,023A9785.CR3,"[""TAG-C"",""TAG-A""]"
    file,1016206416109,sample1.cr2,"[""TAG-D"",""TAG-A""]"

    Now we need to "massage" the file so that the tags column contains exactly the tags you want, like this:

    ❯ cat tag-a-files-modified.csv
    ### output ###
    type,id,name,tags
    file,1016197618492,sample1.heic,"TAG-E,TAG-B"
    file,1016203876842,023A9785.CR3,"TAG-C,TAG-B"
    file,1016206416109,sample1.cr2,"TAG-D,TAG-B"

    On macOS I'm using the sed command, but you could also do this with a text editor:

    ❯ sed -e 's/TAG-A/TAG-B/g' \
        -e 's/\[""//g' \
        -e 's/""]//g' \
        -e 's/"",""/,/g' \
    ./tag-a-files.csv > tag-a-files-modified.csv
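    Note that the first sed expression replaces TAG-A anywhere on the line, so a file name containing TAG-A, or a tag such as TAG-AB, would be changed too. A slightly stricter sketch, under the same assumption about the CSV layout, only swaps the tag when it appears as a whole quoted entry in the tags column. rewrite_tags is a hypothetical helper name, and the tag names passed to it must not contain characters that are special to sed.

```shell
# Hypothetical helper (not a Box CLI command): turn a CSV like tag-a-files.csv
# into the shape of tag-a-files-modified.csv, replacing tag OLD with NEW only
# where it is a whole tag.
# Assumptions: the tags column is serialized as "[""TAG-X"",""TAG-Y""]", and
# OLD/NEW contain no characters special to sed (/ & . * [ \), which holds for
# names like TAG-A and TAG-B.
# Usage: rewrite_tags OLD NEW < input.csv > output.csv
rewrite_tags() {
  local old=$1 new=$2
  # 1) swap the exact quoted tag ""OLD"" for ""NEW"" (TAG-AB is left alone)
  # 2) turn the opening "["" of the array into a plain opening quote
  # 3) turn the closing ""]" of the array into a plain closing quote
  # 4) turn each "","" separator between tags into a comma
  sed -e "s/\"\"$old\"\"/\"\"$new\"\"/g" \
      -e 's/"\[""/"/' \
      -e 's/""]"/"/' \
      -e 's/"",""/,/g'
}
```

    Used as rewrite_tags TAG-A TAG-B < ./tag-a-files.csv > ./tag-a-files-modified.csv, it should produce the same modified file shown above for the sample data; it is still worth opening the result before running the bulk update.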

    Finally, we can use this file to process the update in bulk:

    ❯ box files:update --as-user 18622116055 \
    --bulk-file-path ./tag-a-files-modified.csv \
    --csv --fields type,id,name
    ### output ###
    [========================================] 100% | 3/3
    type,id,name
    file,1016197618492,sample1.heic
    file,1016203876842,023A9785.CR3
    file,1016206416109,sample1.cr2
    All bulk input entries processed successfully.

    The files in the box app now look like:

     

    And we're done.

    This has been a long answer, but I wanted to cover the different aspects of updating tags.

    Let us know if this helped.

  • Tokutake Yuki(徳竹 湧気)

    Mr. Rui, sorry for the late reply.
    Thank you for your thorough answer.
    I understand that bulk updates can be done using a CSV file, and I will consider this approach within my company.
    Thanks for the support. I value the help you've given me.

  • Rui Barbosa

    You are very welcome.

  • Rui Barbosa

    Hi Tokutake,

    Wanted to let you know that your question inspired a blog post.

    Do you follow us on Medium?

    Check it out here.

    Best regards

  • Craig Saper

    Hi Rui Barbosa -- thanks for sharing your insights on the blog!

    I have a handful of related questions regarding the Box CLI as it applies to tags, and thought it'd be more relevant here than creating a new post:

    • Is there a way to list all tag values being used in an account via CLI? I understand you can search for files/folders by the explicit tag string, but it'd take quite some time to manually traverse an entire fs in order to gather all unique tags present. As Box currently has no GUI tag management feature, seems like `box tags:list` would be a logical command, however I can't seem to find it in the developer docs. Would appreciate your guidance on this one...
    • Is there an argument for partial matching vs. exact matching of the input string? (i.e. for partial matching, searching with input string `apple` would match all of the following: `apple`, `apples`,`apple_pie`, `greenapples`; for exact matching, searching with input string `apple` would only match `apple`.) I know this can be done via the API with the usage of double quotes, but in the CLI, it appears that surrounding the query in double quotes outputs the same results as a query string sans quotes.
    • The search is not returning results for tags that contain colons (`:`), even when the colons are slash-escaped. Tags consisting of URLs (e.g. `https://sub.domain.com/`) are therefore not returned. Even if you search with the query `sub.domain.com`, which contains no colons or slashes, no results are returned, even though that query string should in theory match the tag `https://sub.domain.com/`. Note that if the tag is simply `sub.domain.com`, search results are returned. But if the tag is `sub.domain.com:443`, no go. Is there a secret to this? Something I might be missing? ;-)

    Thanks for your guidance, Rui.

  • Craig Saper

    Hi Rui Barbosa -- just wanted to follow up on the above. Would you be so kind as to share your feedback on these questions please?

    Thanks!
