Search for matching image
I'd like to be able to eliminate duplicates of images in Box. For example, if I want to upload a logo and I know that image is already in Box somewhere, I'd like to have some way to search for that image across the entire account. In my own use case, I would like to be able to do it through the API.
I might have multiple files named logo.png that are of different logos, so I can't rely on the file name. If metadata hasn't been used sufficiently, I can't use that. I'd like to be able to find the EXACT image that exists in the cloud without having to rely on humans doing their job correctly.
I can imagine providing the image and Box searching based on a hash of the image bytes or something. Perhaps providing the hash to the API? Other ideas? The quicker the search response, the better.
I saw an announcement about integration with Amazon Rekognition, but that seems aimed at recognizing what the picture is of. Perhaps it also has a function for recognizing an image match?
Is there a way to accomplish this currently?
-
Here is one possible solution, but there might be better alternative approaches. We provide a SHA-1 hash for every file in Box. You can use the search endpoint to query files by name and limit the results to files with an image extension. Then you can compute the SHA-1 of your image file and compare it with the SHA-1 of each file in the search results to find duplicates.
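For anyone who wants to try the comparison step, here is a minimal Python sketch. It assumes you have already fetched the search results as a list of dicts and asked for the sha1 field on each entry. The function names sha1_of_file and matching_entries are made up for this example and are not part of any Box SDK.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a local file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matching_entries(local_sha1, entries):
    """Keep only file entries whose 'sha1' equals the local digest.

    `entries` is assumed to be the list of result dicts from a search
    response; entries without a 'sha1' key are skipped.
    """
    wanted = local_sha1.lower()
    return [
        entry
        for entry in entries
        if entry.get("type") == "file"
        and (entry.get("sha1") or "").lower() == wanted
    ]
```

An exact byte-for-byte copy will have the same SHA-1, so any entry that survives the filter is a duplicate candidate. A resized or re-encoded version of the same logo will not match, since its bytes differ.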
-
I've been playing with the search API and may have found an issue, either with the documentation or the API.
It appears that the query and file_extensions parameters are required, but that isn't indicated in the docs.
DOESN'T WORK:
parameters:
size_range=1207969,1207969
returns:
{
  "type": "error",
  "status": 400,
  "code": "bad_request",
  "context_info": {
    "errors": [
      {
        "reason": "invalid_parameter",
        "name": "to_search",
        "message": "Invalid value ''."
      }
    ]
  },
  "help_url": "http://developers.box.com/docs/#errors",
  "message": "Bad Request",
  "request_id": "***number removed for privacy***9590ab191ad129"
}
DOES WORK:
parameters:
query=IMG file_extensions=jpg size_range=1207969,1207969
To get a valid response, I must include both query and file_extensions when I use size_range. Even without size_range, I still need the file_extensions parameter; otherwise, I get 0 results.
I hope this helps others too!
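In case it saves someone time, here is a rough Python sketch of how the working request could be put together. The helper name build_search_url is made up for this example. The type and fields parameters are additions not used in the test above; they are assumed here so that only files come back and each entry includes its sha1. The sketch only builds the URL; sending it also needs an access token in the Authorization header, which is not shown.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.box.com/2.0/search"


def build_search_url(query, size_bytes=None, extensions=None):
    """Build a search URL narrowed by exact file size and extension."""
    if not query:
        # An empty query is what produced the 400 "to_search" error above.
        raise ValueError("query must be a non-empty string")
    params = {
        "query": query,
        "type": "file",                # assumption: restrict results to files
        "fields": "id,name,size,sha1",  # assumption: ask for sha1 per entry
    }
    if extensions:
        params["file_extensions"] = ",".join(extensions)
    if size_bytes is not None:
        # Same lower and upper bound, so only files of exactly this size match.
        params["size_range"] = f"{size_bytes},{size_bytes}"
    return f"{SEARCH_URL}?{urlencode(params)}"
```

Calling build_search_url("IMG", size_bytes=1207969, extensions=["jpg"]) reproduces the parameters from the working request. Filtering on exact size first keeps the result list short before any hash comparison.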
-
Thanks for sharing your solution! I like your approach of using the file size range parameter to more efficiently return search results.
Thanks for catching this issue in our documentation. You are correct that a query parameter is required to use this endpoint, but the file extension parameter is optional. I updated our documentation to show the query string parameter is required.