cancel
Showing results for 
Search instead for 
Did you mean: 

Could you please help with WHERE?

Gosforth
Graph Buddy

H,
I have test graph


I'd like to write query to get products of clients who bought the same what client 101 but DO NOT SHOW products of client 101 (show only their products). First relation only
So on the list should be products: 205, 202, 204

My query is:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
WHERE p <> p2
RETURN COLLECT(p2) AS otherProducts

And I get products 200,201,204,205,202
So WHERE clause does not work. I played with WITH - no result
So I tried as well:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
WHERE NOT (n:client)-[:PURCHASED]->(p:product)
RETURN p2

The difference is that I receive 3x 'null' (I'm not able to get product id's)
Finally I'd like to count relations with query:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[r:PURCHASED]->(p2:product)
WHERE p <> p2
RETURN p2.product_name, count(r) AS result
ORDER BY result DESC

But what I receive is... relation to product 202 = 3! (should be 2)

1 ACCEPTED SOLUTION

Hello,

In your case WHERE id(c1) < id(c2) won't be needed. If a client can only have a single :PURCHASED relationship to a specific product, then there's no way that c2 can be equal to c1. That type of WHERE clause is often used to prevent getting mirrored results, where the same nodes can appear twice in the results because they can be matched in both directions (by flipping which node is associated with which variable), but for this query that cannot happen.

Regarding your earlier question:

I do not understand why 'WHERE p <> p2' is not working...there's no logical mistake in that

There is a logical mistake actually, because you're only performing this comparison between two nodes per row. You're not considering the collection of nodes because you haven't been working with a collection, just individual nodes.

This is easier to see if you examine the state of what's going on in the query just before you execute the WHERE clause.

Execute this query:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
RETURN p.product_id as p, p2.product_id as p2

And you will see the following results:

╒═══╤════╕
│"p"│"p2"│
╞═══╪════╡
│201│204 │
├───┼────┤
│201│200 │
├───┼────┤
│201│201 │
├───┼────┤
│201│202 │
├───┼────┤
│200│204 │
├───┼────┤
│200│200 │
├───┼────┤
│200│201 │
├───┼────┤
│200│202 │
├───┼────┤
│200│202 │
├───┼────┤
│200│205 │
├───┼────┤
│200│200 │
└───┴────┘

It's clear here that you're looking at individual nodes, collections don't come into play at all.

Your WHERE clause WHERE p <> p2 will only filter out the rows associated with the same p and p2 nodes, so after the WHERE executes you're only stripping out the rows with identical values:

╒═══╤════╕
│"p"│"p2"│
╞═══╪════╡
│201│204 │
├───┼────┤
│201│200 │
├───┼────┤
│201│202 │
├───┼────┤
│200│204 │
├───┼────┤
│200│201 │
├───┼────┤
│200│202 │
├───┼────┤
│200│202 │
├───┼────┤
│200│205 │
└───┴────┘

So the reason why product 200 is present is because for that row, even though p2 = 200, p = 201. And product 201 is present because even though p2 = 201, for that row p = 200. For each row, p and p2 are not collections, they are individual nodes from paths that match the pattern you MATCHed to earlier.

What you actually want here is to work with collections. You were provided a different workaround where you exclude the pattern of the client 100 purchasing the product. But if we work with lists instead, we can do the same thing:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
WITH n2, collect(p) as productsInCommon
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
WHERE NOT p2 IN productsInCommon
RETURN COLLECT(DISTINCT p2) AS otherProducts

In this case, we collect the products, and instead of using node inequality, our condition is that the product must not be in the list of products in common. This ensures that the query will check the node against every node in the list, rather than just against a single node per row.

And to better illustrate, let's look at the rows returned when we return after doing the first collection and optional match but before the filter:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
WITH n2, collect(p) as productsInCommon
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
RETURN [prod IN productsInCommon | prod.product_id] as productsInCommon, p2.product_id as p2

And we see this:

╒══════════════════╤════╕
│"productsInCommon"│"p2"│
╞══════════════════╪════╡
│[201,200]         │204 │
├──────────────────┼────┤
│[201,200]         │200 │
├──────────────────┼────┤
│[201,200]         │201 │
├──────────────────┼────┤
│[201,200]         │202 │
├──────────────────┼────┤
│[200]             │202 │
├──────────────────┼────┤
│[200]             │205 │
├──────────────────┼────┤
│[200]             │200 │
└──────────────────┴────┘

You can see that the filtering, based on the list, will find the right nodes this time.

View solution in original post

7 REPLIES 7

koji
Ninja
Ninja

Hi Gosforth,

I created this cypher.

CREATE (c100:client {client_id: 100}),
(c101:client {client_id: 101}),
(c102:client {client_id: 102}),
(c103:client {client_id: 103}),
(p200:product {product_id: 200, product_name: "Product 200"}),
(p201:product {product_id: 201, product_name: "Product 201"}),
(p202:product {product_id: 202, product_name: "Product 202"}),
(p203:product {product_id: 203, product_name: "Product 203"}),
(p204:product {product_id: 204, product_name: "Product 204"}),
(p205:product {product_id: 205, product_name: "Product 205"}),
(c100)-[:PURCHASED]->(p200),
(c100)-[:PURCHASED]->(p201),
(c101)-[:PURCHASED]->(p200),
(c101)-[:PURCHASED]->(p201),
(c101)-[:PURCHASED]->(p202),
(c101)-[:PURCHASED]->(p204),
(c102)-[:PURCHASED]->(p200),
(c102)-[:PURCHASED]->(p202),
(c102)-[:PURCHASED]->(p205),
(c103)-[:PURCHASED]->(p203),
(c103)-[:PURCHASED]->(p204)

I changed the line from "WHERE NOT (n:client)-[:PURCHASED]->(p:product)" to "WHERE NOT (n)-[:PURCHASED]->(p2)" in your 2nd Cypher.
Then you can see 204,205,202 products.

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
WHERE NOT (n)-[:PURCHASED]->(p2)
RETURN p2

Is this answer correct?

Thank you very much you dedicated your time to help me.
Yep, result seems to what I need. I do not understand why 'WHERE p <> p2' is not working...there's no logical mistake in that... So WHERE works here on relations not the collection of nodes.

Anyway, finally I'd like to count no of relations to p2 products; how many relations was to product_id: 205, product_id: 202, product_id: 204.
With the below query:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[r:PURCHASED]->(p2:product)
WHERE NOT (n)-[:PURCHASED]->(p2)
RETURN DISTINCT p2.product_id,count(r) AS no_of_relations

I get result:

|p2.product_id|no_of_relations|
|202|3|
|204|2|
|205|1|

Why product 202 has 3 relations? should be 2.
Ideal if I count relations of the p2 products where n & n2 customers have at least two products in common.

Regards,
G

Can you change the count(r) to count(distinct r)?

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[r:PURCHASED]->(p2:product)
WHERE NOT (n)-[:PURCHASED]->(p2)
RETURN DISTINCT p2.product_id,count(distinct r) AS no_of_relations

It is possible there are multiple matches where the optional match is getting executed with the same result.

By making distinct you are counting a relationship only once.

Works. Thanks! Now I go to phase "show p2 products where n & n2 customers have at least two products in common"

I've finally managed to write query (studying forums) - is it OK?:

MATCH (c1:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(c2:client)
WHERE id(c1) < id(c2)
WITH c1, c2, collect(p) as commonProducts
WHERE size(commonProducts) >= 2
OPTIONAL MATCH (c2)-[r:PURCHASED]->(p2:product)
WHERE NOT (c1)-[:PURCHASED]->(p2)
Return DISTINCT p2.product_id, count(r) as no_of_relations

And it gives me recommendations of other products where clients have at least two products in common with client_id 100

p2.product_id no_of_relations
202 1
204 1

I this second line is needless (dos nothing ?):
WHERE id(c1) < id(c2)

Regards,
G

Hello,

In your case WHERE id(c1) < id(c2) won't be needed. If a client can only have a single :PURCHASED relationship to a specific product, then there's no way that c2 can be equal to c1. That type of WHERE clause is often used to prevent getting mirrored results, where the same nodes can appear twice in the results because they can be matched in both directions (by flipping which node is associated with which variable), but for this query that cannot happen.

Regarding your earlier question:

I do not understand why 'WHERE p <> p2' is not working...there's no logical mistake in that

There is a logical mistake actually, because you're only performing this comparison between two nodes per row. You're not considering the collection of nodes because you haven't been working with a collection, just individual nodes.

This is easier to see if you examine the state of what's going on in the query just before you execute the WHERE clause.

Execute this query:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
RETURN p.product_id as p, p2.product_id as p2

And you will see the following results:

╒═══╤════╕
│"p"│"p2"│
╞═══╪════╡
│201│204 │
├───┼────┤
│201│200 │
├───┼────┤
│201│201 │
├───┼────┤
│201│202 │
├───┼────┤
│200│204 │
├───┼────┤
│200│200 │
├───┼────┤
│200│201 │
├───┼────┤
│200│202 │
├───┼────┤
│200│202 │
├───┼────┤
│200│205 │
├───┼────┤
│200│200 │
└───┴────┘

It's clear here that you're looking at individual nodes, collections don't come into play at all.

Your WHERE clause WHERE p <> p2 will only filter out the rows associated with the same p and p2 nodes, so after the WHERE executes you're only stripping out the rows with identical values:

╒═══╤════╕
│"p"│"p2"│
╞═══╪════╡
│201│204 │
├───┼────┤
│201│200 │
├───┼────┤
│201│202 │
├───┼────┤
│200│204 │
├───┼────┤
│200│201 │
├───┼────┤
│200│202 │
├───┼────┤
│200│202 │
├───┼────┤
│200│205 │
└───┴────┘

So the reason why product 200 is present is because for that row, even though p2 = 200, p = 201. And product 201 is present because even though p2 = 201, for that row p = 200. For each row, p and p2 are not collections, they are individual nodes from paths that match the pattern you MATCHed to earlier.

What you actually want here is to work with collections. You were provided a different workaround where you exclude the pattern of the client 100 purchasing the product. But if we work with lists instead, we can do the same thing:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
WITH n2, collect(p) as productsInCommon
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
WHERE NOT p2 IN productsInCommon
RETURN COLLECT(DISTINCT p2) AS otherProducts

In this case, we collect the products, and instead of using node inequality, our condition is that the product must not be in the list of products in common. This ensures that the query will check the node against every node in the list, rather than just against a single node per row.

And to better illustrate, let's look at the rows returned when we return after doing the first collection and optional match but before the filter:

MATCH (n:client {client_id: 100})-[:PURCHASED]->(p:product)<-[:PURCHASED]-(n2:client)
WITH n2, collect(p) as productsInCommon
OPTIONAL MATCH (n2)-[:PURCHASED]->(p2:product)
RETURN [prod IN productsInCommon | prod.product_id] as productsInCommon, p2.product_id as p2

And we see this:

╒══════════════════╤════╕
│"productsInCommon"│"p2"│
╞══════════════════╪════╡
│[201,200]         │204 │
├──────────────────┼────┤
│[201,200]         │200 │
├──────────────────┼────┤
│[201,200]         │201 │
├──────────────────┼────┤
│[201,200]         │202 │
├──────────────────┼────┤
│[200]             │202 │
├──────────────────┼────┤
│[200]             │205 │
├──────────────────┼────┤
│[200]             │200 │
└──────────────────┴────┘

You can see that the filtering, based on the list, will find the right nodes this time.

Thank you very much indeed!