All permutations of X number of nodes

(Dave) #1

I'm trying to create collections of collections, or all permuatations of X nodes from the following query and want to be able to apply filter on the total bundle price, eg bundlePrice < 20

The following query will get a single permutation, but not all of them.

MATCH p=(product:Product)-->(addon:Addon)<--(shop:Shop)
WITH shop, COLLECT(addon)[..2] AS addonCol
UNWIND addonCol AS addons
MATCH (addons)<-[sells:SELLS]-(shop) WITH SUM(sells.price) AS bundlePrice, shop, addonCol
RETURN distinct addonCol, bundlePrice, shop

I have setup a small instance of the db here in the console

http://console.neo4j.org/?id=5tssqf

Have looked through apoc and cannot see anything that jumps out and couln't really see a way within the docs that was obvious, but maybe I'm missing something.

Any ideas on how to do this?

0 Likes

(Andrew Bowman) #2

While we currently don't have handling for dynamic permutations, we do have a function in APOC Procedures for getting all possible combinations of a collection, and I think that might be what you're after here. The function is apoc.coll.combinations(), and it takes 3 parameters: 1) the collection 2) the minimum number of items for the combination (seems like you would want 1 as the minimum) 3) The maximum number of items for the combination (seems like the size of the collection would make sense).

Here's a query which should provide all combinations of addons for a shop and filter to keep only those whose bundle price is < 20:

MATCH (:Product)-->(addon:Addon)<--(shop:Shop)
WITH shop, COLLECT(addon) as addons
WITH shop, apoc.coll.combinations(addons, 1, size(addons)) AS addonCol
WITH shop, addonCol, reduce(total = 0, addon in addonCol | total + [(addon)<-[sells:SELLS]-(shop) | sells.price][0]) as bundlePrice
WHERE bundlePrice < 20
RETURN shop, addonCol, bundlePrice

Keep in mind that this is considering combinations of ALL addons, regardless of product. If you wanted to only look at combinations of addons with respect to each product then you would need to include the product in your query (in your WITHs so the aggregations and grouping would be with respect to each product).

0 Likes

(Dave) #3

Thanks @andrew.bowman, but sadly I'm getting the following error related to the reduce() function

Neo.ClientError.Statement.TypeError: Expected to find a node at ref slot 3 but found List{(152749), (99717), (26033)} instead

0 Likes

(Andrew Bowman) #4

Odd, the same query is working for me in 3.5.2 and 3.4.12. What version of Neo4j and APOC are you running?

0 Likes

(Dave) #5

I'm running Neo 3.5.0 and APOC 3.5.0.1

Will run an update on the Neo instance and see if that has any effect.

0 Likes

(Andrew Bowman) #6

Ah, my bad on this one, the way I was checking was only checking for compilation messages, this one is thrown at runtime, and I can see the flaw causing it.

apoc.coll.combinations() returns a list of list combinations, we need to UNWIND this list so that we're working with each combination set. Try this instead:

MATCH (:Product)-->(addon:Addon)<--(shop:Shop)
WITH shop, COLLECT(addon) as addons
WITH shop, apoc.coll.combinations(addons, 1, size(addons)) AS addonCombos
UNWIND addonCombos as addonCol
WITH shop, addonCol, reduce(total = 0, addon in addonCol | total + [(addon)<-[sells:SELLS]-(shop) | sells.price][0]) as bundlePrice
WHERE bundlePrice < 20
RETURN shop, addonCol, bundlePrice
1 Like

(Dave) #7

Ahh that makes sense... Thank you so much

0 Likes

(Rmspeers) #8

Hi all -- this appears to (based on the Java code used underneath apoc.coll.combinations()) be computing combinations -- not permutations. I am looking for permutations -- where I get all results where order matters in the pairings. Can someone suggest how to do that?

I found a way to do it as shown below, but it is... ugly -- and would be worse with >2 items in the permutation list. I would presume there is a better way?

MATCH (f:File)<-[:relationExample]-(crt:ItemType)
  WITH COLLECT(crt) as crts
  WITH apoc.coll.combinations(crts, 2) AS crtCombos
  UNWIND crtCombos as crtCol
  WITH crtCol[0] AS crt, crtCol[1] AS icrt
    MATCH (crt), (icrt)
    WHERE crt.parentHash = icrt.myHash
    MERGE (crt)-[r:itemToItemRelationship]->(icrt)
    RETURN crt, icrt
  UNION
    MATCH (crt), (icrt)
    WHERE icrt.parentHash = crt.myHash
    MERGE (icrt)-[r:itemToItemRelationship]->(crt)
    RETURN crt, icrt
0 Likes

(Andrew Bowman) #9

As you noted, the original question turned out to be asking about combinations, not permutations, and we don't have APOC support for permutations at this time.

If the number of items in the permutation is hardcoded, then you can accomplish this with doing multiple UNWINDs of a list.

That said, your use case just looks like you want to find these nodes where the parentHash and myHash are equal and create relationships between them. You don't need permutations for this.

Instead, you'll need an index on :ItemType(parentHash) for quick lookup, and a query like this:

MATCH (icrt:ItemType)
WHERE (:File)<-[:relationExample]-(icrt) 
MATCH (crt:ItemType)
WHERE crt.parentHash = icrt.myHash AND (:File)<-[:relationExample]-(crt)
MERGE (crt)-[r:itemToItemRelationship]->(icrt)

There are optimization you can make depending on the nature of the relationship to :File nodes (whether this check is needed on both, and whether a :relationExample relationship always connects to a :File node).

1 Like

(Rmspeers) #10

Thank you for your help. It turns out that my query above seems to have failed to scope crt and icrt to items that were connected to the given (f:File), and thus affected too much in my graph.

I believe that your method is working, and appreciate that suggestion. I am a bit worried about performance as I may have many crt:ItemType's in the graph, and I had perhaps narrowed my example too much to be able to show the complexity of the query relating ItemType to the root that I know. Here is perhaps a better example, shared in the syntax that I made off of your suggestion.

MATCH (crt:ItemType)
WHERE (:MainNode {id: 'XXXX'})-[:elemRel]->(:SecondNode)-[:root]->(:BaseNode)-[:contents *0..50]->(:File)<-[:extractedFrom]-(crt:ItemType)
MATCH (icrt:ItemType)
WHERE crt.parentHash = icrt.myHash AND (:MainNode {id: 'XXXX'})-[:elemRel]->(:SecondNode)-[:root]->(:BaseNode)-[:contents *0..50]->(:File)<-[:extractedFrom]-(icrt:ItemType)
MERGE (crt)-[r:itemToItemRelationship]->(icrt)

When I look at this in the profile plan, I see two sets of evaluation for the long parts of the WHERE clauses, happening in parallel but doing the work for both. The index does help cut-down on the 'width' (I'm probably using the wrong word) of the second evaluation path for it, at least.

I wonder if there is a way to name the (:BaseNode) element, or the collection of (:File) elements, and then reuse that variable in both WHERE clauses to help optimize?

0 Likes

(Andrew Bowman) #11

Yes, we can optimize this one as you noted. We'll need an index on :ItemType(myHash) to speed up the second match:

MATCH (:MainNode {id: 'XXXX'})-[:elemRel]->(:SecondNode)-[:root]->(:BaseNode)-[:contents *0..50]->(:File)<-[:extractedFrom]-(crt:ItemType)
WITH collect(DISTINCT crt) as crts
UNWIND crts as crt // now both crt and the crts list are in scope
MATCH (icrt:ItemType)
WHERE crt.parentHash = icrt.myHash AND icrt IN crts
MERGE (crt)-[r:itemToItemRelationship]->(icrt)
0 Likes