Background:
Neo4j Community edition 4.0.0
APOC 4.0.0.16
GDS 1.3.4
I am looking for information on how to run a similarity analysis between two lists of nodes all at once, rather than one pair at a time.
My schema looks similar to this:
(Node1 {type:'A'})-[:rel1]->(Node2)-[:rel2]->(Node3)-[:rel3]->(Node4)-[:rel4]->(Node5 {name:'xxx'})
(Node1 {type:'B'})-[:rel1]->(Node2)-[:rel2]->(Node3)-[:rel3]->(Node4)-[:rel4]->(Node5 {name:'xxx'})
I can do a one-off similarity analysis using gds.alpha.similarity.jaccard to see how similar the Node4 contents are. The problem is that I have about 100 Node1s of type 'A' to compare against about 100 Node1s of type 'B'. I would like to do this as one procedure, with the results output to a table to visualize, or possibly saved back to the database.
Try to think of this problem as comparing 2 different Bills of Material (Node1) used to manufacture an assembly (Node5) at different revisions.
Can someone please advise? Thanks.
UPDATE: Found it here: Similarity functions - Neo4j Graph Data Science, by Table 5.279. I was missing the WHERE p1 <> p2 clause, which was causing the query to run forever.
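For anyone who wants to sanity-check the numbers outside the database, here is a minimal sketch in plain Python of the all-pairs comparison described above: every type 'A' Node1 against every type 'B' Node1, scored with set-based Jaccard (size of the intersection divided by size of the union). This is not the GDS implementation. The names boms_a, boms_b, jaccard and compare_all are made up for illustration, and the id lists are hypothetical stand-ins for the Node4 ids that each Node1 reaches.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data: each Node1 (a BOM) mapped to the ids of the
# Node4 items it reaches. Real data would come from the graph.
boms_a = {"A1": [1, 2, 3, 4], "A2": [2, 3, 5]}
boms_b = {"B1": [1, 2, 3], "B2": [5, 6]}


def compare_all(left, right):
    """Score every (left, right) pair and sort by similarity, highest first."""
    rows = [
        (l_name, r_name, jaccard(left[l_name], right[r_name]))
        for l_name, r_name in product(left, right)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Print the result as a simple tab-separated table.
for l_name, r_name, score in compare_all(boms_a, boms_b):
    print(f"{l_name}\t{r_name}\t{score:.3f}")
```

With 100 nodes on each side this is only 10,000 pairs, so the comparison itself is cheap. A small cross-check like this can help confirm that the scores coming back from the database are the ones expected.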