Filtering nodes to not include those linked from the original collection


(Pawan Wagh) #1

Platform specific details:

  • Neo4j Community Edition (3.1.1)

I have users in my graph with communities linked to them. Some communities that belong to a user are also linked to other communities that belong to the same user, e.g.:
(USER {id: 1})<-[:COMMUNITY_OF]-(COMMUNITY) &
(USER {id: 1})<-[:COMMUNITY_OF]-(COMMUNITY)-[:ENTITY_OF]->(COMMUNITY)-[:COMMUNITY_OF]->(USER {id: 1})

The end goal is to exclude those of a user's communities that are entities of other communities belonging to the same user.
I've tried something like this:

MATCH (user: USER {id:"e372ebe4-e123-4ac1-919d-b9a0e46a819f" })<-[COMMUNITY_OF]-(communities: COMMUNITY)
    WITH user, communities
MATCH (user)<-[:COMMUNITY_OF]-(comms: COMMUNITY) WHERE NOT (comms)-[:ENTITY_OF]->(communities)
RETURN comms

(Andrew Bowman) #2

For one, be careful with your variable names. Plural names should be reserved for collections and lists, and you're not collecting anything here. communities in your MATCH is not a collection; there will only ever be a single node per row in the result stream.

A good way to work with this would be to collect the communities, unwind them back to rows, then expand out and collect the entities of those comms as a collection to exclude.

From there it's just a matter of doing list subtraction. That's easier done if you have APOC procedures, but I'll show both ways:

MATCH (user: USER {id:"e372ebe4-e123-4ac1-919d-b9a0e46a819f" })<-[:COMMUNITY_OF]-(comm: COMMUNITY)
WITH user, collect(comm) as communities
UNWIND communities as comm // so we have the entire collection for each row
MATCH (comm)<-[:ENTITY_OF]-(exclude:COMMUNITY)
WITH communities, collect(DISTINCT exclude) as excluded // single row with both lists
WITH [comm in communities WHERE NOT comm in excluded] as comms // list subtraction
UNWIND comms as comm // back into rows
RETURN comm

If you have APOC procedures installed, you can use this for list subtraction instead:

...
WITH apoc.coll.subtract(communities, excluded) as comms
...
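
To make the list subtraction step easier to follow in isolation, here is a minimal sketch that uses literal lists in place of the matched nodes. The values 'A', 'A1', 'A2' and 'B' are made-up stand-ins, not data from the actual graph.

```cypher
// Stand-in data: literal lists instead of collected COMMUNITY nodes
WITH ['A', 'A1', 'A2', 'B'] AS communities, ['A1', 'A2'] AS excluded
// Keep only the elements that do not appear in the excluded list
WITH [c IN communities WHERE NOT c IN excluded] AS comms
// Turn the remaining list back into one row per element
UNWIND comms AS comm
RETURN comm   // rows: 'A', 'B'
```

The full query does the same thing, except that both lists are built with collect() from matched nodes before the comprehension filters them.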

(Pawan Wagh) #3

@andrew.bowman: Thanks, it did solve my problem, but I didn't understand what happens in the query.


(Pawan Wagh) #4

The above query gives duplicate results.


(Pawan Wagh) #5
MATCH (user: USER {id:"e372ebe4-e123-4ac1-919d-b9a0e46a819f" })<-[COMMUNITY_OF]-(comm: COMMUNITY)
WITH user, collect(comm) as communities
UNWIND communities as comm // so we have the entire collection for each row
MATCH (comm)<-[:ENTITY_OF]-(exclude:COMMUNITY)
WITH communities, collect(DISTINCT exclude) as excluded // single row with both lists
WITH [comm in communities WHERE NOT comm in excluded] as comms // list subtraction
UNWIND comms as comm // back into rows
OPTIONAL MATCH (comm)<-[:MEMBER_OF]-(members: USER)
RETURN comm, COUNT(members) AS noOfMembers

This is how I've updated the provided query for my requirement.
What goes wrong is that noOfMembers is doubled for every community, e.g. if a COMMUNITY has 4 members, the count in the final result is 8. There are also duplicate COMMUNITY nodes in the final result.
Did I do something wrong?


(Andrew Bowman) #6

Might have been my fault, in the first line it should be :COMMUNITY_OF. I was missing the :, which would have made this match on all relationship types (since it would interpret this as a variable and not a rel type).

Give that a try first.
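
As a rough illustration of the difference, the two queries below can be run separately against the same graph. The labels come from this thread; the relType and matches aliases are only for illustration.

```cypher
// Without the colon, COMMUNITY_OF is a relationship VARIABLE.
// No type is specified, so any incoming relationship type matches.
MATCH (u:USER)<-[COMMUNITY_OF]-(c:COMMUNITY)
RETURN type(COMMUNITY_OF) AS relType, count(*) AS matches;

// With the colon, only relationships of TYPE COMMUNITY_OF match.
MATCH (u:USER)<-[:COMMUNITY_OF]-(c:COMMUNITY)
RETURN count(*) AS matches;
```

If the first query reports more than one relType, a community connected to the user by several relationship types will show up once per relationship, which would explain both the duplicate rows and the inflated counts.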


(Pawan Wagh) #7

@andrew.bowman: Yeah, you're correct. I missed it too.
But I am still facing some issues with this query and am not able to get it right. Here is what I've tried so far:

MATCH (user: USER {id:"e372ebe4-e123-4ac1-919d-b9a0e46a819f" })<-[:COMMUNITY_OF]-(comm: COMMUNITY) 
// here i've A, A1, A2, B, C, D
     WITH user, collect(comm) as communities
MATCH (user)-[:MEMBER_OF]->(comm: COMMUNITY) 
// here i've E, F, G
     WITH communities + collect(comm) as communities
UNWIND communities as comm  
// so we have the entire collection for each row
	MATCH (comm)<-[:ENTITY_OF]-(exclude: COMMUNITY)
		WITH communities, collect(DISTINCT exclude) as excluded 
//here communities include A, A1, A2, B, C, D, E, F, G and A1, A2 in excluded
// single row with both lists
		WITH apoc.coll.subtract(communities, excluded) as comms 
// list subtraction i.e removing linked entities
		WITH DISTINCT comms as comms 
// here i've A, C, D, E, F, G but B removed
UNWIND comms as comm
	OPTIONAL MATCH (comm)<-[:MEMBER_OF]-(members: USER)
		WITH comm, COUNT(members) as noOfMembers
OPTIONAL MATCH (comm)-[:PAGE_OF]->(admins: USER)
	WITH comm, noOfMembers + COUNT(admins) as noOfMembers
	WITH COLLECT(comm) as comms, noOfMembers
UNWIND comms as comm 
OPTIONAL MATCH (comm)<-[:DEVICE_OF]-(dvc: DEVICES)
	WITH comm, COLLECT(dvc) as dvcs, noOfMembers
UNWIND dvcs as dvc
   	WITH comm, noOfMembers, collect({ device_id: dvc.id, alias: dvc.alias }) as devices
RETURN comm, noOfMembers, devices

(Pawan Wagh) #8

I want to understand what actually happens here. I've read the documentation, but I still can't tell whether something is going wrong or whether this is expected.


(Pawan Wagh) #9

I've figured out why B is removed: B doesn't have any device linked with it.


(Andrew Bowman) #10

Any remaining questions on this one? Looks like you fixed the issue of B being removed by making it an OPTIONAL MATCH for devices.
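
A related behavior worth keeping in mind, sketched here with literal values rather than the thread's data: UNWIND on an empty list produces no rows. A community without devices can therefore still drop out if its collected device list is unwound again, as in the UNWIND dvcs step of the query in #7. The two queries below are meant to be run separately; 'B' is a stand-in value.

```cypher
// An empty list unwinds to zero rows, so the row for 'B' disappears
WITH 'B' AS comm, [] AS dvcs
UNWIND dvcs AS dvc
RETURN comm, dvc;   // no rows

// Substituting [null] for an empty list keeps the row
WITH 'B' AS comm, [] AS dvcs
UNWIND (CASE WHEN size(dvcs) = 0 THEN [null] ELSE dvcs END) AS dvc
RETURN comm, dvc;   // one row: 'B', null
```

Another option, assuming the collected list is not needed for anything else, is to skip the collect and UNWIND pair and aggregate directly on the rows produced by the OPTIONAL MATCH.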