About bulk tag maintenance
I would like to maintain the tags assigned to content in Box in bulk. Is there a good way to do this? I was considering using the Box CLI, but the description of the CSV template does not mention tags, so I am looking for a solution.
-
Hi Tokutake san,
Can you elaborate on your use case a bit more? I'm not sure what you are trying to accomplish.
If by "maintaining the tags assigned in bulk" you mean, for example, replacing one tag with another in all content, you should be able to search by the old tag and then replace it with the new tag in every content item.
Let us know, and I can investigate the CSV format for you.
-
Thank you for your reply, Mr. Rui.
What I want to achieve and the conditions are as follows.
■What we want to achieve
- I want to change the tags assigned to files in bulk with CLI. (e.g. tagA→tagB)
- I want to delete all tags assigned to files in bulk with CLI.
■Conditions
・I am not an admin user. I am an ordinary user at a company.
Also, please let me confirm the approach that Mr. Rui suggested:
「you should be able to do a search by the old tag and then replace by the new tag in every content item.」
Does this mean manually replacing all tags one by one?
-
Hi Tokutake san,
I think I understand what you are looking for.
A central place where you can list, create, rename, or delete all user-created tags does not exist in the box.com app (the normal UI), nor is it implemented in the CLI.
This means we cannot just rename TAG-A to TAG-B and be done with it.
You can, however, search by tag and then update the tags. Please try the examples on some sample files and tags first, until you are comfortable with the scripts and sure you are not overwriting any tags that are important to you.
I'm also assuming you already have the CLI configured and authorized.
In my examples I'm using the CLI with JWT authentication. This means the CLI is authenticated via a service account and I must use the --as-user flag to act on behalf of the user who owns the content. I also often include the --csv and --fields flags so the output is shorter and readable on this forum. Finally, I'm using zsh (macOS), but these examples can be adapted to PowerShell (Windows) or bash (Linux).
First, let's check who the CLI user is:
❯ box users:get --csv --fields type,id,name
### output ###
type,id,name
user,20130487697,JWT
Then locate the user who I want the CLI to impersonate (again, this may not be applicable to your case):
❯ box users --csv --fields type,id,name
type,id,name
...
user,22240548078,Investment User
user,22240405099,Wealth User
user,22240545678,Wholesale User
user,18622116055,Rui Barbosa
In my case, the user I want the CLI to impersonate is 18622116055, hence --as-user 18622116055.
My sample files in the box app look like this:
Please note that if you have just tagged some sample files, the tags are immediately visible in the box app, but the search can take a few minutes to index the recently applied tags.
To search all files with TAG-A:
❯ box search "TAG-A" --as-user 18622116055 \
--content-types tags \
--csv --fields type,id,name
### output ###
type,id,name
file,1016197618492,sample1.heic
file,1016206416109,sample1.cr2
file,1016203876842,023A9785.CR3
Now we can redirect the output of this command to a CSV file in order to update all files in bulk:
❯ box search "TAG-A" --as-user 18622116055 \
--content-types tags \
--csv --fields type,id,name > ./files-tag-a.csv
Then check the contents of that file (it should be the same as the output):
❯ cat ./files-tag-a.csv
### output ###
type,id,name
file,1016197618492,sample1.heic
file,1016206416109,sample1.cr2
file,1016203876842,023A9785.CR3
Now all we need to do is update the tags of these files.
Please note that these files only have one tag, and so we can just replace it with the new one. If in your use case, the files have multiple tags, then we would need a different approach, since the next command will replace ALL tags with TAG-B.
❯ box files:update --as-user 18622116055 \
--bulk-file-path ./files-tag-a.csv \
--tags "TAG-B" \
--csv --fields type,id,name,tags
### output ###
[========================================] 100% | 3/3
type,id,name
file,1016197618492,sample1.heic
file,1016206416109,sample1.cr2
file,1016203876842,023A9785.CR3
All bulk input entries processed successfully.
If we look at the box app, all file tags have been replaced:
Again the search will take a few minutes to re-index the newly applied tags:
❯ box search "TAG-B" --as-user 18622116055 --content-types tags --csv --fields type,id,name
### output ###
type,id,name
file,1016197618492,sample1.heic
file,1016206416109,sample1.cr2
file,1016203876842,023A9785.CR3
Back in the files view, you'll notice that TAG-A no longer exists in the tag list. This is because a tag is removed once it is not used in any file.
So at this point we have successfully replaced ALL tags in files that had the TAG-A with TAG-B.
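Because the search index lags behind, it can help to compare the ids saved before the update with a fresh TAG-B search afterwards. Here is a minimal sketch, assuming both files were exported with --csv --fields type,id,name so that the id is the second column. The function name missing_ids and the file name files-tag-b.csv are placeholders of mine, not part of the Box CLI:

```shell
# Hypothetical helper (not a Box CLI command): print every id listed in
# the "before" CSV that does not appear in the "after" CSV.
# Assumes the id is the second comma-separated column and row 1 is a header.
missing_ids() {
  before=$1
  after=$2
  awk -F, '
    # First file: remember every id found in the "after" export.
    FILENAME == ARGV[1] { if (FNR > 1) seen[$2] = 1; next }
    # Second file: report ids from the "before" export that were not seen.
    FNR > 1 && !($2 in seen) { print $2 }
  ' "$after" "$before"
}
```

For example, after saving a new TAG-B search to ./files-tag-b.csv, running missing_ids ./files-tag-a.csv ./files-tag-b.csv prints nothing when every file has been picked up. Any ids it does print may simply not be indexed yet, so it is worth retrying after a few minutes.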
This message is already way too long, I'll continue my analysis in the next message...
-
Please be careful if the files have more than one tag.
Consider the following:
Now all files are tagged with TAG-B and TAG-C. What happens if I try to replace TAG-B with TAG-A using the previous technique?
❯ box files:update 1016203876842 \
--as-user 18622116055 \
--tags "TAG-A" \
--csv --fields type,id,name,tags
### output ###
type,id,name
file,1016203876842,023A9785.CR3
We get this:
As you can see, we lost TAG-C from the first file. This might not be what you want.
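One way to spot this risk before running a bulk replace is to export the tags column with the search (--fields type,id,name,tags) and look for rows that carry more than one tag. Here is a minimal sketch, assuming the tags column is the last one and is written in the "[""TAG-C"",""TAG-A""]" layout. The name multi_tag_ids is a helper of my own, not a CLI command:

```shell
# Hypothetical helper: print the id of every row whose tags column holds
# more than one tag. Assumes a header row, the id in column 2, and a
# last column shaped like "[""TAG-C"",""TAG-A""]".
multi_tag_ids() {
  awk '
    NR > 1 {
      start = index($0, "\"[")        # where the quoted tag list begins
      if (start == 0) next            # no tag list on this row
      tags = substr($0, start)
      # gsub returns how many "","" separators sit between tags
      if (gsub(/"",""/, "&", tags) > 0) {
        split($0, cols, ",")
        print cols[2]
      }
    }
  ' "$1"
}
```

If this prints any ids for your export, a plain --tags "NEW-TAG" update would drop the other tags on those files.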
Let us know if this fits your use case or not. If not, read on.
-
Now let's look at the more complex scenario. Consider this:
We have multiple tags, and we want to replace TAG-A with TAG-B without losing any of the other tags.
Still searching for TAG-A, we now need to collect the existing tags into a file so we can process them later. We have been ignoring the types returned, so we want to make sure we are only getting files:
❯ box search "TAG-A" --as-user 18622116055 \
--content-types tags \
--type file \
--csv --fields type,id,name,tags > tag-a-files.csv
The file looks like:
❯ cat tag-a-files.csv
### output ###
type,id,name,tags
file,1016197618492,sample1.heic,"[""TAG-E"",""TAG-A""]"
file,1016203876842,023A9785.CR3,"[""TAG-C"",""TAG-A""]"
file,1016206416109,sample1.cr2,"[""TAG-D"",""TAG-A""]"
Now we need to "massage" the file so that the tags column contains what you want, like this:
❯ cat tag-a-files-modified.csv
### output ###
type,id,name,tags
file,1016197618492,sample1.heic,"TAG-E,TAG-B"
file,1016203876842,023A9785.CR3,"TAG-C,TAG-B"
file,1016206416109,sample1.cr2,"TAG-D,TAG-B"
On macOS, I'm using the sed command, but you can do this with a text editor:
❯ sed -e "s/TAG-A/TAG-B/g" \
-e "s/\[\"\"//g" \
-e "s/\"\"\]//g" \
-e "s/\"\"\,\"\"/\,/g" \
./tag-a-files.csv > tag-a-files-modified.csv
Finally, we can use this file to process the update in bulk:
❯ box files:update --as-user 18622116055 \
--bulk-file-path ./tag-a-files-modified.csv \
--csv --fields type,id,name
### output ###
[========================================] 100% | 3/3
type,id,name
file,1016197618492,sample1.heic
file,1016203876842,023A9785.CR3
file,1016206416109,sample1.cr2
All bulk input entries processed successfully.
The files in the box app now look like:
And we're done.
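One caveat: s/TAG-A/TAG-B/g rewrites the text TAG-A wherever it occurs on a line, so it would also touch a file name that contains TAG-A, or a longer tag such as TAG-AB. Here is a minimal sketch of a stricter variant, assuming the same export layout as tag-a-files.csv (tags in the last column, written as "[""X"",""Y""]"). rename_tag is a hypothetical helper of mine, and it only swaps tags that match the old name exactly:

```shell
# Hypothetical helper: rename_tag OLD NEW FILE
# Rewrites only the tags column, turning "[""X"",""OLD""]" into "X,NEW".
# Every other column, including the file name, is left untouched.
rename_tag() {
  awk -v old="$1" -v new="$2" '
    NR == 1 { print; next }                 # keep the header row as is
    {
      start = index($0, ",\"[\"\"")         # comma just before the tag list
      if (start == 0) { print; next }       # no tag list found: leave the row alone
      head = substr($0, 1, start)           # type,id,name, (with trailing comma)
      tags = substr($0, start + 1)          # the quoted tag list
      gsub(/^"\[""|""\]"$/, "", tags)       # strip the outer quote and bracket wrappers
      n = split(tags, list, /"",""/)        # one array entry per tag
      out = ""
      for (i = 1; i <= n; i++) {
        t = (list[i] == old) ? new : list[i]
        out = out (i > 1 ? "," : "") t
      }
      print head "\"" out "\""
    }
  ' "$3"
}
```

For the three sample rows, rename_tag TAG-A TAG-B ./tag-a-files.csv > tag-a-files-modified.csv should produce the same modified file as the sed command. As before, try it on sample files first.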
This has been a long answer but I wanted to cover the multiple aspects of updating the tags.
Let us know if this helped.
-
Hi Tokutake,
Wanted to let you know that your question inspired a blog post.
Do you follow us on Medium?
Check it out here.
Best regards
-
Hi Rui Barbosa -- thanks for sharing your insights on the blog!
I have a handful of related questions regarding the Box CLI as it applies to tags, and thought it'd be more relevant to ask here than to create a new post:
- Is there a way to list all tag values being used in an account via the CLI? I understand you can search for files/folders by the explicit tag string, but it'd take quite some time to manually traverse an entire file system in order to gather all unique tags present. As Box currently has no GUI tag management feature, it seems like `box tags:list` would be a logical command; however, I can't seem to find it in the developer docs. Would appreciate your guidance on this one...
- Is there an argument for partial matching vs. exact matching of the input string? (i.e. for partial matching, searching with input string `apple` would match all of the following: `apple`, `apples`, `apple_pie`, `greenapples`; for exact matching, searching with input string `apple` would only match `apple`.) I know this can be done via the API with the usage of double quotes, but in the CLI, it appears that surrounding the query in double quotes outputs the same results as a query string sans quotes.
- The search is not returning results for tags that seemingly contain colons (`:`), even if slash-escaped. Tags consisting of URLs (e.g. `https://sub.domain.com/`) are therefore not returned. Even if you search using the query `sub.domain.com`, which doesn't contain colons or slashes, no results are returned, even though that query string should in theory match the tag `https://sub.domain.com/`. Note that if the tag is simply `sub.domain.com`, search results will be returned. But if the tag is `sub.domain.com:443`, no go. Is there a secret to this? Something I might be missing? ;-)
Thanks for your guidance, Rui.
-
Hi Rui Barbosa -- just wanted to follow up on the above. Would you be so kind as to share your feedback on these questions please?
Thanks!