ExpandConfig performance when limit exceeds paths found

Hey Guys,

I'm pretty new to Neo4J and Cypher, I hope someone can help me with a weird performance issue when running a query with apoc.path.expandConfig.

The goal is to build a customer specific pricing module, for an e-business platform. Where users reside in a tree of usergroups, and products in in a tree of categories (Folders in this query). And there are Pricesheet, which basically are a labelled container of discounts with a 1-n relationship on a User node.

I'm using the apoc.path.expandConfig to get a subgraph of all path that somehow link a specific user to a product while going through a Discount node. That data this query is returning is looks fine, and I can work from that. But I'm running into a weird issue with performance. As soon as i put the limit parameter to one higher than the actual paths returned from the subgraph, the execution time of the query goes from around 20ms between 1 and 2 seconds.

This is a sample query:

   MATCH  (start:User {userId:14554}), 
        (end:Product {productId: 38845})
        USING INDEX end:Product(productId)
        USING INDEX start:User(userId)
        CALL apoc.path.expandConfig(
          start, 
          {
            relationshipFilter:
              '<DISCOUNT_FOR_USER|DISCOUNT_FOR_FOLDER>|DISCOUNT_FOR_PRODUCT>|<DISCOUNT_FOR_USERGROUP|<DISCOUNT_FOR_PRICESHEET|<PRODUCT_BELONGS_TO|<FOLDER_BELONGS_TO|INHERIT_DISCOUNTS>|PRICESHEET_FOR_USER',     
            endNodes:end, 
        	limit: 18,
            labelFilter: '+Usergroup|+Discount|+Folder|+Product|+PriceSheet'
          }        
        ) YIELD path
        RETURN path

This outputs: Started streaming 18 records after 1 ms and completed after 14 ms.

Profile:

The same query, with only the limit changed:

 
   MATCH  (start:User {userId:14554}), 
        (end:Product {productId: 38845})
        USING INDEX end:Product(productId)
        USING INDEX start:User(userId)
        CALL apoc.path.expandConfig(
          start, 
          {
            relationshipFilter:
              '<DISCOUNT_FOR_USER|DISCOUNT_FOR_FOLDER>|DISCOUNT_FOR_PRODUCT>|<DISCOUNT_FOR_USERGROUP|<DISCOUNT_FOR_PRICESHEET|<PRODUCT_BELONGS_TO|<FOLDER_BELONGS_TO|INHERIT_DISCOUNTS>|PRICESHEET_FOR_USER',     
            endNodes:end, 
        	limit: 19,
            labelFilter: '+Usergroup|+Discount|+Folder|+Product|+PriceSheet'
          }        
        ) YIELD path
        RETURN path

This outputs: Started streaming 18 records in less than 1 ms and completed after 1510 ms.

So the difference is really huge and for this specific piece of functionality, I really need all paths. So i do want to remove the limit completely if possible. But removing the limit also results in the same poor performance.

Hope someone can spread some light on this issue. Also open to any tips on how the get a similar result using a different approach, that may be more performant.

If any additional info is needed please me know.

Thanks in advance :)

The problem is that when it has to run an exhaustive search, the number of paths it needs to check must be quite large and take more time to finish. Unless there is some way you can know ahead of time how many paths there are, there's no way for the procedure to know when to stop searching until it has finished that exhaustive search.

You can check for yourself by removing the end node parameter to the call, and returning the count() of the paths, that can give you some idea of the work it is doing, the total paths needing to be expanded and filtered. Work done within procedures are not reflected in db hits, so you won't be able to otherwise get a good understanding of the work that a procedure is doing.

Hi Andrew,

Thanks for your reply, this sounds like a very plausible explanation.

I'll try a different approach by first fetching unique paths (user->usergroups and product->folder) first and try to expand from there. See how that goes. Still figuring out how this graph stuff works :)