Hey Guys,
I'm pretty new to Neo4J and Cypher, I hope someone can help me with a weird performance issue when running a query with apoc.path.expandConfig.
The goal is to build a customer specific pricing module, for an e-business platform. Where users reside in a tree of usergroups, and products in in a tree of categories (Folders in this query). And there are Pricesheet, which basically are a labelled container of discounts with a 1-n relationship on a User node.
I'm using the apoc.path.expandConfig to get a subgraph of all path that somehow link a specific user to a product while going through a Discount node. That data this query is returning is looks fine, and I can work from that. But I'm running into a weird issue with performance. As soon as i put the limit parameter to one higher than the actual paths returned from the subgraph, the execution time of the query goes from around 20ms between 1 and 2 seconds.
This is a sample query:
MATCH (start:User {userId:14554}),
(end:Product {productId: 38845})
USING INDEX end:Product(productId)
USING INDEX start:User(userId)
CALL apoc.path.expandConfig(
start,
{
relationshipFilter:
'<DISCOUNT_FOR_USER|DISCOUNT_FOR_FOLDER>|DISCOUNT_FOR_PRODUCT>|<DISCOUNT_FOR_USERGROUP|<DISCOUNT_FOR_PRICESHEET|<PRODUCT_BELONGS_TO|<FOLDER_BELONGS_TO|INHERIT_DISCOUNTS>|PRICESHEET_FOR_USER',
endNodes:end,
limit: 18,
labelFilter: '+Usergroup|+Discount|+Folder|+Product|+PriceSheet'
}
) YIELD path
RETURN path
This outputs: Started streaming 18 records after 1 ms and completed after 14 ms.
Profile:
The same query, with only the limit changed:
MATCH (start:User {userId:14554}),
(end:Product {productId: 38845})
USING INDEX end:Product(productId)
USING INDEX start:User(userId)
CALL apoc.path.expandConfig(
start,
{
relationshipFilter:
'<DISCOUNT_FOR_USER|DISCOUNT_FOR_FOLDER>|DISCOUNT_FOR_PRODUCT>|<DISCOUNT_FOR_USERGROUP|<DISCOUNT_FOR_PRICESHEET|<PRODUCT_BELONGS_TO|<FOLDER_BELONGS_TO|INHERIT_DISCOUNTS>|PRICESHEET_FOR_USER',
endNodes:end,
limit: 19,
labelFilter: '+Usergroup|+Discount|+Folder|+Product|+PriceSheet'
}
) YIELD path
RETURN path
This outputs: Started streaming 18 records in less than 1 ms and completed after 1510 ms.
So the difference is really huge and for this specific piece of functionality, I really need all paths. So i do want to remove the limit completely if possible. But removing the limit also results in the same poor performance.
Hope someone can spread some light on this issue. Also open to any tips on how the get a similar result using a different approach, that may be more performant.
If any additional info is needed please me know.
Thanks in advance :)