Ability to query ALL entities using enterprise-wide search

Completed

Comments

6 comments

  • Official comment
    Alex Novotny

    Hello, 

    You have to provide some sort of search parameter or filter. Why are you looking to crawl through the entire enterprise? If I can understand a bit more about the use case, I may be able to help a little further. 

    Thanks, 

    Alex, Box Developer Advocate

  • Oleg Zimakov

    Hello Alex,

    Thanks for your reply.

    We are building an enterprise index intended to allow users to search objects (files, folders, web links in Box) across different repositories, including but not limited to Box. To put information into the index, our crawler application needs to read every object. The best option would be to query ALL objects sorted by update timestamp and page through the result set.

    Thanks and looking forward to hearing from you.

  • Alex Novotny

    Ah! I see. This is an interesting use case. 

    This could be problematic, though: how would the solution know that new content exists? I wouldn't think iterating through the entire enterprise's content every time is the most efficient use of resources. Unfortunately, I'm not sure there is another way to index all the content. The Search API does have a created/modified date parameter, but you would still need something to search by - so if there isn't anything, you would need to traverse the folder trees.

    Does one user own all content in your enterprise? Or does every user own their own content? 

     

    Alex, Box Developer Advocate

  • Oleg Zimakov

    There are two phases in this process: initial indexing of the existing data, followed by indexing of subsequent changes.

    For the first phase, aside from the fact that we can't pass empty or wildcard queries to get "everything", it would be enough to use the following parameters to sort the result set and define a window:

    For the second phase, we were planning to switch over to processing the enterprise event feed to pick up all the changes: https://developer.box.com/guides/events/for-enterprise/

    As Box partners, we build software for different customers, so we must consider the most common use case. Unfortunately, it means each user owns the content. And the number of users might be quite high. That's why I would avoid traversing user folders.
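
A rough sketch of the second phase described above (following the enterprise event feed), not a tested integration. It assumes the events endpoint takes `stream_type=admin_logs`, `stream_position`, and `limit`, and returns `entries` plus `next_stream_position`; those names should be checked against the guide linked above. `fetch` is a hypothetical callable that performs the authenticated GET and returns the decoded JSON, so the paging logic stays independent of any HTTP client.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: receives the query parameters for GET /2.0/events
# and returns the decoded JSON body. Auth and HTTP are left to the caller.
FetchEvents = Callable[[Dict[str, str]], dict]


def drain_events(fetch: FetchEvents, stream_position: str,
                 limit: int = 500) -> Tuple[List[dict], str]:
    """Read the enterprise event stream from stream_position until it runs dry.

    Returns the collected events and the position to resume from next time.
    """
    events: List[dict] = []
    position = stream_position
    while True:
        page = fetch({
            "stream_type": "admin_logs",  # assumed name of the enterprise stream
            "stream_position": position,
            "limit": str(limit),
        })
        entries = page.get("entries", [])
        events.extend(entries)
        next_position = str(page.get("next_stream_position", position))
        # Stop when a page is empty or the position no longer advances.
        if not entries or next_position == position:
            return events, next_position
        position = next_position
```

The crawler would persist the returned position after each run and pass it back in on the next one, so that only changes since the last run are applied to the index.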

  • Alex Novotny

    I reached out to some internal folks to get some more insight/recommendations.

    Unfortunately, I don't really have great news. The Search API is not going to be useful here because there is no way to search by date alone...

    The only way to do it would be to crawl through every user's owned objects, building your index as you go... followed by using the events stream to add to or remove from the index as time goes on.

    We just don't have an endpoint or an easy way to get all the information you want at once today.

    Alex
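
The crawl suggested above could look roughly like the following breadth-first walk. This is a sketch under assumptions: `list_users` and `list_items` are hypothetical callables standing in for the paged "list enterprise users" and "list items in folder" requests (the latter made on behalf of the given user), and each item is assumed to carry `type` and `id` fields. Folder ID `"0"` is used as each user's root folder.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Set, Tuple

# Hypothetical callables wrapping the real API requests.
ListUsers = Callable[[], Iterable[dict]]          # yields {"id": ...} per user
ListItems = Callable[[str, str], Iterable[dict]]  # (user_id, folder_id) -> items

ROOT_FOLDER_ID = "0"


def crawl_enterprise(list_users: ListUsers,
                     list_items: ListItems) -> Iterator[Tuple[str, dict]]:
    """Yield (user_id, item) for every item reachable from each user's root."""
    seen: Set[Tuple[str, str]] = set()
    for user in list_users():
        queue = deque([ROOT_FOLDER_ID])
        while queue:
            folder_id = queue.popleft()
            for item in list_items(user["id"], folder_id):
                key = (item["type"], item["id"])
                # Content visible to several users is indexed only once.
                if key in seen:
                    continue
                seen.add(key)
                yield user["id"], item
                if item["type"] == "folder":
                    queue.append(item["id"])
```

The `seen` set is a design choice rather than anything Box requires: it keeps folders that appear under more than one user from being walked repeatedly, at the cost of holding all visited IDs in memory.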

  • Oleg Zimakov

    Alex,

    Thank you for your attention to our challenge. We came up with the idea of sending a search request with query = "NOT <VERY_UNIQUE_VALUE>", where <VERY_UNIQUE_VALUE> is a UUID or any other string that is extremely unlikely to appear in real user objects. It returns a lot of objects, so we hope this is the solution we were looking for.

    Thank you.

    -- Oleg.
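
For illustration, the workaround described above might be wired up as follows. This is only a sketch: nothing in this thread confirms that a `NOT` query returns every object, and result caps may apply. It assumes the search endpoint accepts `query`, `limit`, and `offset` and returns `entries` with a `total_count`; `search` is a hypothetical callable that performs the authenticated request and returns the decoded JSON.

```python
import uuid
from typing import Callable, Dict, Iterator

# Hypothetical transport: receives the query parameters for the search
# request and returns the decoded JSON body.
Search = Callable[[Dict[str, str]], dict]


def match_all_query() -> str:
    """Build 'NOT <VERY_UNIQUE_VALUE>' using a random 32-character hex UUID."""
    return f"NOT {uuid.uuid4().hex}"


def iter_search_results(search: Search, query: str,
                        page_size: int = 200) -> Iterator[dict]:
    """Page through search results with offset/limit until they run out."""
    offset = 0
    while True:
        page = search({
            "query": query,
            "limit": str(page_size),
            "offset": str(offset),
        })
        entries = page.get("entries", [])
        if not entries:
            return
        yield from entries
        offset += len(entries)
        total = page.get("total_count")
        # Stop early when the reported total has been reached.
        if total is not None and offset >= total:
            return
```

A crawler would call `iter_search_results(search, match_all_query())` once for the initial index and compare the number of returned objects against a known count before relying on it.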

