MATCH (u)-[]-(subset:Subscription)
WHERE ...
WITH subset
MATCH (u)-[]-(superset:Subscription)
WHERE ...
I know some nodes in subset will also appear in superset, and I want to remove those nodes from superset before returning it. How can I do that?
The structure of this query is problematic. The second MATCH doesn't reference subset, and because WITH subset doesn't carry u forward, the u in the second MATCH is a new, unbound variable. The two matches are therefore uncorrelated: the second MATCH is executed once for every row returned by the first and produces the same results each time, so the output is the Cartesian product of the results of the two MATCH clauses.
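To make the Cartesian product concrete, here is a minimal standalone sketch that runs without any graph data. The two literal lists are stand-ins for the results of two independent MATCH clauses; a and b are illustrative names only.

```cypher
// First "result set": 2 rows.
UNWIND [1, 2] AS a
// The second UNWIND runs once per incoming row, and nothing links b to a,
// so every value of a is paired with every value of b.
UNWIND ['x', 'y', 'z'] AS b
RETURN a, b
// 2 * 3 = 6 rows: (1,'x'), (1,'y'), (1,'z'), (2,'x'), (2,'y'), (2,'z')
```

With real MATCH clauses the effect is the same, except the row counts are the sizes of each match's results.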
Second, the two MATCH clauses use the same pattern; only the variable bound to the Subscription nodes differs.
Can you share the WHERE clauses so I can understand what you're trying to do? And/or could you describe what led you to write this query?
Thanks for the response! (I've never really understood the "Cartesian product" thing; I'll look into it.)
Here's more about the query:
MATCH (u:User {userId: $userId})
MATCH p=(u)-[r*]-(subset:Subscription)
WHERE NONE(rel IN r WHERE type(rel)="REL_1")
WITH subset
MATCH p2=(u)-[r2*]-(superset:Subscription)
WHERE ANY(rel IN r2 WHERE type(rel)="REL_1")
Because there can be multiple paths from u to a (:Subscription) node, some (:Subscription) nodes can appear in both of these results. I'd like to exclude from the superset any that are in the subset.
I tried adding
WITH subset,
[node IN superset WHERE NOT node IN subset] AS filteredSuperset
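That attempt can't work as written: in each row, superset and subset are single nodes rather than lists, so the list comprehension has nothing to iterate over. A minimal sketch of the usual fix, assuming the same $userId parameter, userId property and REL_1 relationship type as the query above, is to keep u in the WITH and aggregate the first match into a list, then filter the second match against that list:

```cypher
MATCH (u:User {userId: $userId})
MATCH (u)-[r*]-(subset:Subscription)
WHERE NONE(rel IN r WHERE type(rel) = "REL_1")
// One row per user holding a list of subset nodes.
// Carrying u forward keeps the second MATCH anchored to the same user.
WITH u, collect(DISTINCT subset) AS subsetNodes
MATCH (u)-[r2*]-(superset:Subscription)
WHERE ANY(rel IN r2 WHERE type(rel) = "REL_1")
  // Drop any superset node that already appears in the subset list.
  AND NOT superset IN subsetNodes
RETURN subsetNodes, collect(DISTINCT superset) AS filteredSuperset
```

One caveat: if the first MATCH finds no subset nodes, it yields no rows and the whole query returns nothing; using OPTIONAL MATCH for the first pattern avoids that, since collect() skips nulls.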
I figured out part of what I've been trying to do:
MATCH (u:User {appId: $appId})
MATCH p=(u)-[r*]-(subset:Subscription)
WHERE NONE(rel IN r WHERE type(rel)="REL_1")
WITH
collect(distinct(subset.subscriptionId)) AS subsetIds
, u
MATCH p2=(u)-[r2*]-(superset:Subscription)
WHERE ANY(rel IN r2 WHERE type(rel)="REL_1")
AND single(i in nodes(p2) WHERE id(i) = id(superset))
AND single(i in nodes(p2) WHERE id(i) = id(u))
WITH
[node IN collect(distinct(superset.subscriptionId)) WHERE NOT node in subsetIds]
AS filteredSupersetIds,
subsetIds
RETURN
subsetIds,
filteredSupersetIds
I think I was accidentally creating all kinds of weird Cartesian products in the variations I tried.
It seems that if I compute values as early as possible and pass only what's needed to the next clause, it usually works.
What I ultimately want is a count() of subsetIds and of filteredSupersetIds. But when I change the RETURN clause to
RETURN
count(subsetIds) as subsetCount,
count(filteredSupersetIds) as supersetCount
I get 1 for each, even though with the query above the two lists contain 4 and 3 IDs, respectively. How can I get the number of elements in each list?
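count() is an aggregating function: it counts non-null values across rows, and at that point the query has a single row holding two lists, so it returns 1. To get the number of elements in a list, use size() instead. A standalone sketch, with made-up literal lists standing in for subsetIds and filteredSupersetIds:

```cypher
WITH ['a', 'b', 'c', 'd'] AS subsetIds, ['e', 'f', 'g'] AS filteredSupersetIds
RETURN
  count(subsetIds) AS rowCount,               // aggregates over rows: 1
  size(subsetIds) AS subsetCount,             // elements in the list: 4
  size(filteredSupersetIds) AS supersetCount  // elements in the list: 3
```

In the query above, that means ending with RETURN size(subsetIds) AS subsetCount, size(filteredSupersetIds) AS supersetCount.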