Know more on Intersection Cyphers

browser
cypher
knowledge-base

(Sucheta) #1

Hi,

I am making a Venn diagram out of Neo4j query results, and in Venn diagrams, intersections are the most important part.

[a]

I want to know what the two queries below mean.
[1]


WITH ["action",'requestID','notifierType','incidentCity','eventType'] as ids
MATCH (a)-[:Parameter]->(b)
WHERE a.id in ids
WITH b, size(ids) as inputCnt, count(DISTINCT a) as cnt
WHERE cnt = inputCnt
RETURN b

or

[2]


WITH ['requestID'] as ids
MATCH (a)-[:Parameter]->(b)
WHERE a.id in ids
WITH b, size(ids) as inputCnt, count(DISTINCT a) as cnt
WHERE cnt = inputCnt
RETURN b

[3] What does this query mean?


MATCH (n {name:'requestID'}), (m {name:'requestID'})
RETURN apoc.coll.union(n.interests, m.interests) as interests_union,
       apoc.coll.intersection(n.interests, m.interests) as interests_intersection

Is it similar to OPTIONAL MATCH? Also, I get the error "the missing property name is - interests".

[b]
I want to figure out whether a node like "requestID" is shared between any nodes.
So I tried this query:


WITH ['requestID'] as names
MATCH (p)-[:Parameter]-(m)
WHERE p.name in names
RETURN m

and got the required result.

Is my query correct for finding the intersection of the node "requestID"?

Referred from -

Reference for Neo4j intersections is -


(Andrew Bowman) #2

Let's look at [a] first:

WITH ["action",'requestID','notifierType','incidentCity','eventType'] as ids
MATCH (a)-[:Parameter]->(b)
WHERE a.id in ids
WITH b, size(ids) as inputCnt, count(DISTINCT a) as cnt
WHERE cnt = inputCnt
RETURN b

We're matching nodes with the given ids and expanding outgoing :Parameter relationships to some other node. (It would be much better to use labels here, of course, so we could take advantage of any existing indexes for lookup instead of doing an all-nodes scan, but this is a very generic example, so those are omitted.)

These two lines are the key:

WITH b, size(ids) as inputCnt, count(DISTINCT a) as cnt
WHERE cnt = inputCnt

Per b node, we're getting the count of distinct a nodes that are connected to that b node, and that count needs to be equal to the size of the input collection. If so, that means all of the values in the input are connected to that b node (provided there aren't duplicate values in the input collection).
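To see the counting step in action, here is a small self-contained sketch. The :Field and :Group labels, the g1/g2 names and the two-value input list are all made up for illustration (they are not from the original data model), and it is meant to be run once against an empty scratch database.

```cypher
// Toy data (hypothetical labels and names): g1 is reached from both inputs, g2 from only one.
CREATE (act:Field {id: 'action'}),
       (req:Field {id: 'requestID'}),
       (g1:Group {name: 'g1'}),
       (g2:Group {name: 'g2'}),
       (act)-[:Parameter]->(g1),
       (req)-[:Parameter]->(g1),
       (req)-[:Parameter]->(g2)
// Same intersection logic as the query above, restricted to the toy labels.
WITH ['action', 'requestID'] AS ids
MATCH (a:Field)-[:Parameter]->(b:Group)
WHERE a.id IN ids
// Per b: how many distinct input nodes point at it?
WITH b, size(ids) AS inputCnt, count(DISTINCT a) AS cnt
// Keep only the b nodes reached from every input value.
WHERE cnt = inputCnt
RETURN b.name AS name
```

With this data, g1 has a count of 2 (equal to the input size) and g2 has a count of 1, so only g1 should come back.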

The [2] example is using only a single input value instead of multiple. While this still works, the counting approach isn't necessary if you're always going to be using a single input value instead of some variable number of values.
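For the single-value case, a simpler form along these lines should be equivalent. This is only a sketch; as before, adding labels (omitted here because the data model isn't known) would allow index lookups.

```cypher
// Single input value: no input list and no count comparison needed.
MATCH (a {id: 'requestID'})-[:Parameter]->(b)
// DISTINCT keeps each b once even if several matching a nodes point at it.
RETURN DISTINCT b
```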

As for [3] example, this is something completely different:

MATCH (n {name:'requestID'}), (m {name:'requestID'})
RETURN apoc.coll.union(n.interests, m.interests) as interests_union,
       apoc.coll.intersection(n.interests, m.interests) as interests_intersection

I'm not sure where you got this example, but it's a little odd, since both nodes in the match use the same name, 'requestID'. If we had something more like:

MATCH (n {name:'requestID1'}), (m {name:'requestID2'})
RETURN apoc.coll.union(n.interests, m.interests) as interests_union,
       apoc.coll.intersection(n.interests, m.interests) as interests_intersection

that would work a bit better, as we would know these are referring to two separate nodes (again we would use labels if possible to take advantage of index lookups).

apoc.coll.union() and apoc.coll.intersection() are APOC functions that perform a union or intersection operation between two lists, so each of those nodes has to have an interests property that is a list of some sort. The results would be the unioned values and the intersected values between the two interests lists of those nodes.
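A quick way to get a feel for the two functions is to call them on literal lists. This sketch assumes the APOC library is installed; the list values are placeholders standing in for the two interests properties.

```cypher
// Requires APOC. xs and ys stand in for n.interests and m.interests.
WITH ['a', 'b', 'c'] AS xs, ['b', 'c', 'd'] AS ys
RETURN apoc.coll.union(xs, ys) AS interests_union,
       apoc.coll.intersection(xs, ys) AS interests_intersection
```

The union should contain a, b, c and d once each, and the intersection should contain b and c. If a node has no interests property at all, the property reads as null; wrapping it as coalesce(n.interests, []) is one way to treat a missing property as an empty list.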

For [b], this doesn't look like a correct approach, but it's not entirely clear what it is you want.

Do you want to make sure that somewhere in the graph, there is a node (with some specific label hopefully?) with the name 'requestID' between two other nodes (with :Parameter relationships between those nodes)?

If so, then a query like this may work:

MATCH (p {name:'requestID'})
WHERE size((p)-[:Parameter]-()) > 1
RETURN p

This checks that some node with the given name has more than one :Parameter relationship. Assuming that you can't have multiple :Parameter relationships to the same node, it would mean that there are at least two nodes connected in this way. That p node is returned.
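One version caveat: as far as I know, Neo4j 5 no longer accepts size() around a pattern like this, and a COUNT subquery is the replacement. A sketch of the same check in that form, assuming Neo4j 5 or later:

```cypher
// Neo4j 5+ form of the same check, using a COUNT subquery.
MATCH (p {name: 'requestID'})
WHERE COUNT { (p)-[:Parameter]-() } > 1
RETURN p
```

On 3.x and 4.x the size() form shown above is the one to use.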

If that doesn't meet what you want, then you will need to supply some information about your data model and be more specific as to what exactly you want the query to do.


(Sucheta) #3

Thank you, Andrew, for the response.

Once we have understood and implemented each query you explained, we will make use of it somewhere in our project.

[1]
Currently, I want to know which nodes are connected to "requestID" through a "Parameter" relationship. Therefore I wrote this query:

WITH ['requestID'] as names
MATCH (p)-[:Parameter]-(m)
WHERE p.name in names
RETURN m

Here, requestID is in two nodes: claimIntimationRequestBody and claimIntimationRequestHeader.

Is this the right approach? We actually want to make a Venn diagram that shows nodes and their intersections. Should the intersections in the Venn diagram be derived from the above query, passing in every node name and collecting its output? Or is there a better query for it?

[2]

The query that you gave is -

MATCH (p {name:'requestID'})
WHERE size((p)-[:Parameter]-()) > 1
RETURN p

returns no output. Here is the screenshot -

[3]
Also, this query is unanswered. Can you please answer it? -


(Sucheta) #4

I have one more query (considering the above queries).

I have found a new algorithm: the Jaccard Similarity algorithm.

However, when I use its query (just replacing [:LIKES] with [:Parameter])

MATCH (p:Person)-[:Parameter]->(cuisine)
WITH {item:id(p), categories: collect(id(cuisine))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard.stream(data)
YIELD item1, item2, count1, count2, intersection, similarity
RETURN algo.getNodeById(item1).name AS from, algo.getNodeById(item2).name AS to, intersection, similarity
ORDER BY similarity DESC

the error I get is:


Neo.ClientError.Procedure.ProcedureNotFound: There is no procedure with the name 
`algo.similarity.jaccard.stream` registered for this database instance. Please ensure you've spelled the 
procedure name correctly and that the procedure is properly deployed.

Please help me resolve the other errors as well.


(Michael Hunger) #5

Did you install the procedures library? In Neo4j Desktop it's just a button click.

Otherwise download the correct jar from here and follow the instructions:
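After installing and restarting, one way to confirm that the procedures are actually registered is a check like the following. It is a sketch that assumes a Neo4j 3.x or 4.x instance, where dbms.procedures() is available.

```cypher
// Lists any registered similarity procedures; an empty result suggests the library is not loaded.
CALL dbms.procedures() YIELD name
WHERE name STARTS WITH 'algo.similarity'
RETURN name
ORDER BY name
```

If nothing is listed after a manual jar install, it may be worth checking that neo4j.conf contains dbms.security.procedures.unrestricted=algo.* and that the database was restarted afterwards.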