Hello,
In your case WHERE id(c1) < id(c2)
won't be needed. If a client can only have a single :PURCHASED relationship to a specific product, then there's no way that c2 can be equal to c1. That type of WHERE clause is often used to prevent getting mirrored results, where the same nodes can appear twice in the results because they can be matched in both directions (by flipping which node is associated with which variable), but for this query that cannot happen.
Regarding your earlier question:
I do not understand why 'WHERE p <> p2' is not working...there's no logical mistake in that
There is a logical mistake actually, because you're only performing this comparison between two nodes per row. You're not considering the collection of nodes because you haven't been working with a collection, just individual nodes.
This is easier to see if you examine the state of what's going on in the query just before you execute the WHERE clause.
Execute this query:
MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
RETURN p.product_id as p, p2.product_id as p2
And you will see the following results:
╒═══╤════╕
│"p"│"p2"│
╞═══╪════╡
│201│204 │
├───┼────┤
│201│200 │
├───┼────┤
│201│201 │
├───┼────┤
│201│202 │
├───┼────┤
│200│204 │
├───┼────┤
│200│200 │
├───┼────┤
│200│201 │
├───┼────┤
│200│202 │
├───┼────┤
│200│202 │
├───┼────┤
│200│205 │
├───┼────┤
│200│200 │
└───┴────┘
It's clear here that you're looking at individual nodes, collections don't come into play at all.
Your WHERE clause WHERE p <> p2
will only filter out the rows associated with the same p
and p2
nodes, so after the WHERE executes you're only stripping out the rows with identical values:
╒═══╤════╕
│"p"│"p2"│
╞═══╪════╡
│201│204 │
├───┼────┤
│201│200 │
├───┼────┤
│201│202 │
├───┼────┤
│200│204 │
├───┼────┤
│200│201 │
├───┼────┤
│200│202 │
├───┼────┤
│200│202 │
├───┼────┤
│200│205 │
└───┴────┘
So the reason why product 200 is present is because for that row, even though p2 = 200, p = 201. And product 201 is present because even though p2 = 201, for that row p = 200. For each row, p
and p2
are not collections, they are individual nodes from paths that match the pattern you MATCHed to earlier.
What you actually want here is to work with collections. You were provided a different workaround where you exclude the pattern of the client 100 purchasing the product. But if we work with lists instead, we can do the same thing:
MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
WITH n2, collect(p) as productsInCommon
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
WHERE NOT p2 IN productsInCommon
RETURN COLLECT(DISTINCT p2) AS otherProducts
In this case, we collect the products, and instead of using node inequality, our condition is that the product must not be in the list of products in common. This ensures that the query will check the node against every node in the list, rather than just against a single node per row.
And to better illustrate, let's look at the rows returned when we return after doing the first collection and optional match but before the filter:
MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
WITH n2, collect(p) as productsInCommon
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
RETURN [prod IN productsInCommon | prod.product_id] as productsInCommon, p2.product_id as p2
And we see this:
╒══════════════════╤════╕
│"productsInCommon"│"p2"│
╞══════════════════╪════╡
│[201,200] │204 │
├──────────────────┼────┤
│[201,200] │200 │
├──────────────────┼────┤
│[201,200] │201 │
├──────────────────┼────┤
│[201,200] │202 │
├──────────────────┼────┤
│[200] │202 │
├──────────────────┼────┤
│[200] │205 │
├──────────────────┼────┤
│[200] │200 │
└──────────────────┴────┘
You can see that the filtering, based on the list, will find the right nodes this time.