Remove elements from row 1 duplicated in row 2

graphene · May 27, 2023, 12:15am

I have a query that starts out like this:

MATCH (u)-[]-(subset:Subscription)
WHERE ...
WITH subset
MATCH (u)-[]-(superset:Subscription)
WHERE ...

I know there will be some nodes from subset that are duplicated in superset, and I want to remove those nodes from superset before I return them. How can I do that?

neo4j v5

glilienfield · May 27, 2023, 1:11am

The structure of this query is problematic. The results of the first match are not correlated with the second: WITH subset drops u, so the u in the second MATCH is a new, unbound variable. As a result, every row returned by the first match executes the same second match, and the result is the Cartesian product of the two result sets.
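
To make the Cartesian product concrete, here is a minimal sketch that uses UNWIND over made-up literal lists in place of the two MATCH clauses, so it runs without any data. The values 'a' through 'd' are placeholders, not anything from the actual graph.

```cypher
// Two uncorrelated UNWINDs stand in for two uncorrelated MATCH clauses.
UNWIND ['a', 'b'] AS subset
WITH subset
// Nothing here depends on subset, so this runs once per incoming row.
UNWIND ['a', 'c', 'd'] AS superset
RETURN subset, superset
```

The first UNWIND yields 2 rows and the second yields 3 for each of them, so the query returns 2 x 3 = 6 rows, with every subset value paired with every superset value.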

Secondly, the two queries are the same pattern, just with a different binding variable for the Subscription nodes.

Can you provide the WHERE clauses so I can try to understand your intentions? And/or can you describe what led you to this query?

graphene · May 27, 2023, 1:57am

Thanks for the response! (I've never understood the "Cartesian product" thing, I'll look into it...)
Here's more about the query:

MATCH (u:User {userId: $userId})
MATCH p=(u)-[r*]-(subset:Subscription)
WHERE NONE(rel IN r WHERE type(rel)="REL_1")
WITH subset
MATCH p2=(u)-[r2*]-(superset:Subscription)
WHERE ANY(rel IN r2 WHERE type(rel)="REL_1")

Since there can be multiple paths from u to (:Subscription), there can be some (:Subscription) nodes in both of these results, but I would like to exclude any that are in the subset from the superset.

I tried adding

WITH subset, 
[node IN superset WHERE NOT node IN subset] AS filteredSuperset

But got some unexpected results.
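
A possible explanation, assuming the snippet was appended right after the second MATCH: at that point subset and superset are each bound to a single node per row, not to a list, so there is no list for the comprehension to iterate over. The sketch below applies the same filter to two actual lists; the ID values are invented for illustration.

```cypher
// Hypothetical ID lists standing in for collected subscriptionId values.
WITH ['s1', 's2', 's3'] AS subsetIds,
     ['s2', 's3', 's4', 's5'] AS supersetIds
// Keep only the superset IDs that do not appear in the subset.
RETURN [sid IN supersetIds WHERE NOT sid IN subsetIds] AS filteredSupersetIds
// filteredSupersetIds = ['s4', 's5']
```

In a real query the two lists would have to be built first, for example with collect() over the matched nodes.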

glilienfield · May 27, 2023, 3:15am

OK, I understand a bit better now. What are you returning? Can you include the entire query? That would be easiest.

Thanks.

graphene · May 27, 2023, 6:57pm

I figured out part of what I've been trying to do:

MATCH (u:User {appId: $appId})
MATCH p = (u)-[r*]-(subset:Subscription)
  WHERE NONE(rel IN r WHERE type(rel) = "REL_1")
WITH
  collect(DISTINCT subset.subscriptionId) AS subsetIds,
  u
MATCH p2 = (u)-[r2*]-(superset:Subscription)
  WHERE ANY(rel IN r2 WHERE type(rel) = "REL_1")
  AND single(i IN nodes(p2) WHERE id(i) = id(superset))
  AND single(i IN nodes(p2) WHERE id(i) = id(u))
WITH
  [node IN collect(DISTINCT superset.subscriptionId) WHERE NOT node IN subsetIds]
    AS filteredSupersetIds,
  subsetIds
RETURN
  subsetIds,
  filteredSupersetIds

I think I was having trouble accidentally creating all kinds of weird cartesian products in the variations I was trying.

It seems like if I compute variables as soon as possible and only pass down what is necessary to the next statement, it usually works.

What I ultimately want is a count() of the subsetIds and filteredSupersetIds. But when I change the return statement to

RETURN
  count(subsetIds) as subsetCount,
  count(filteredSupersetIds) as supersetCount

I get a count of 1 for each, even though the lists of IDs contain 4 and 3 elements, respectively, with the query above. How can I get a count of each list?

graphene · May 27, 2023, 7:12pm

I just learned I need to use size() instead of count() to get the length of a list!
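
For anyone landing here with the same question, a small sketch of the difference, using literal lists in place of the query results (the values are placeholders): count() is an aggregate over rows, so a single row holding a list counts as 1, while size() returns the number of elements in the list.

```cypher
// One row carrying two lists, like the final WITH in the query above.
WITH [1, 2, 3, 4] AS subsetIds, [5, 6, 7] AS filteredSupersetIds
RETURN
  count(subsetIds) AS rowCount,               // 1: one row, regardless of list length
  size(subsetIds) AS subsetCount,             // 4: elements in the list
  size(filteredSupersetIds) AS supersetCount  // 3
```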
