Extremely slow retrieval

I am using Neo4j 5.19 and created a graph with 68k nodes and roughly the same number of relationships. But even a simple query like the one below (which is akin to running two nested for loops) takes forever! The current run is already 45 minutes old and nowhere near finishing. I actually started out also computing cosine similarities between nodes (using GDS), but I suspected that was what was slowing the whole thing down, so I stripped it out, leaving the query below. And yet...

          MATCH (parent)-[:FIRST_LEVEL_CONNECT]->(child)
          WITH parent, child
          MATCH ( neoparent )-[:FIRST_LEVEL_CONNECT]->(grandchild)
          WITH neoparent, child, parent, grandchild 
          WHERE parent.title <> neoparent.title AND child.title <> grandchild.title
          RETURN parent, child, neoparent, grandchild

The performance is not surprising given your query. Your first MATCH returns N rows. For each row, the second MATCH is executed, and it does not reference the parent or child nodes passed in. It is also the same pattern as your first MATCH, so it returns N rows as well. The end result is N x N rows. Finally, you filter the rows so that the start nodes don’t have the same title and the end nodes don’t either. Shouldn't there be lots of neoparent and grandchild nodes that meet these criteria for each parent and child pair?

If you look at the query plan, you will see that it runs the two matches independently and builds the Cartesian product of the two results, hence the N x N result set. This is a little more efficient than what I described above, but it is still not an efficient approach.
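To make the N x N growth concrete, here is a rough Python sketch. It is only an analogy for the two independent matches, not how Neo4j actually executes the plan, and the toy titles are made up for illustration.

```python
from itertools import product


def cross_join_rows(pairs):
    """Pair every (parent, child) row with every other row, then filter.

    This mirrors two independent MATCH clauses followed by the
    title-inequality WHERE filter from the question.
    """
    return [
        (parent, child, neoparent, grandchild)
        for (parent, child), (neoparent, grandchild) in product(pairs, repeat=2)
        if parent != neoparent and child != grandchild
    ]


# Hypothetical (parent.title, child.title) pairs standing in for real data.
pairs = [("a", "b"), ("c", "d"), ("e", "f")]
rows = cross_join_rows(pairs)
print(len(pairs) ** 2, "candidate rows,", len(rows), "after filtering")

# Back-of-the-envelope size for roughly 68k relationships.
n = 68_000
print(f"{n * n:,} candidate rows before filtering")
```

With about 68k relationships, that is on the order of 4.6 billion candidate rows, and the inequality filter removes very few of them.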

I don’t understand what your goal is. Maybe you can provide some more insight.

In the meantime, the following should give you the same results, but I think it may be more efficient.

MATCH (parent)-[:FIRST_LEVEL_CONNECT]->(child)
WITH collect({parent: parent, child: child}) as data
WITH [i in data | i{.*, differentNodes: [j in data WHERE i.parent.title <> j.parent.title AND i.child.title <> j.child.title | j]}] as results
UNWIND results as result
UNWIND result.differentNodes as otherNode
RETURN 
    result.parent as parent, 
    result.child as child, 
    otherNode.parent as neoparent, 
    otherNode.child as grandchild

Thanks so much, Gary. My goal is to disambiguate the nodes by checking which of the nodes I inserted have the same semantic meaning (e.g. the parent node is "El Nino impacts monsoons in India" and some other, unconnected child node is "Indian monsoons impacted by El Nino"). By default no graph DB will join them, since they aren't textually identical. Hence my actual query included cosine similarity on the vectors of the titles (I already have indexes on these vectors). To debug the original query, I ran the sub-query above. The actual query I was running is below. Do you think I could simply extend the Cypher query you provided to add the GDS cosine similarities as filters? Also, as you can see, I'm just taking the result and looping over it in Python to create a connection between such nodes, since I couldn't figure out how to use a simple WHERE condition with the RETURN (ideally I would have liked a simple WHERE condition that retrieves only records with sim > 0.9).
Deeply appreciate the time taken :slight_smile:

        MATCH (parent)-[:FIRST_LEVEL_CONNECT]->(child)
        WITH parent, child
        MATCH (neoparent)-[:FIRST_LEVEL_CONNECT]->(grandchild)
        WITH neoparent, child, parent, grandchild
        WHERE parent.title <> neoparent.title AND child.title <> grandchild.title
        RETURN parent, child, neoparent, grandchild,
            gds.similarity.cosine(neoparent.title_embedding, child.title_embedding),
            gds.similarity.cosine(parent.title_embedding, grandchild.title_embedding)

The query below is an example of using the cosine measure as a filter. You could also put the cosine conditions in the first WHERE clause instead of adding a second WITH.

To use it with my suggested query, include the cosine conditions in the WHERE clause of the list comprehension.


MATCH (parent)-[:FIRST_LEVEL_CONNECT]->(child)
WITH parent, child
MATCH ( neoparent )-[:FIRST_LEVEL_CONNECT]->(grandchild)
WITH neoparent, child, parent, grandchild 
WHERE parent.title <> neoparent.title AND child.title <> grandchild.title
WITH 
   parent, 
   child, 
   neoparent, 
   grandchild,  
   gds.similarity.cosine( neoparent.title_embedding, child.title_embedding ) as cosine_neoparent, 
   gds.similarity.cosine( parent.title_embedding, grandchild.title_embedding ) as cosine_parent
WHERE cosine_parent > 0.9 and cosine_neoparent > 0.9
RETURN *
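For reference, the check in the final WHERE clause can be sketched in plain Python. This assumes `gds.similarity.cosine` computes the standard cosine similarity (dot product divided by the product of the vector norms); the two-dimensional embeddings and the `passes_filter` helper are hypothetical, for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def passes_filter(row, threshold=0.9):
    """Mirror `cosine_parent > 0.9 and cosine_neoparent > 0.9`."""
    cosine_neoparent = cosine_similarity(row["neoparent"], row["child"])
    cosine_parent = cosine_similarity(row["parent"], row["grandchild"])
    return cosine_parent > threshold and cosine_neoparent > threshold


# Toy embeddings (made up): both cross pairs point in nearly the same direction.
similar_row = {
    "parent": [1.0, 0.0], "grandchild": [0.9, 0.1],
    "neoparent": [0.0, 1.0], "child": [0.1, 0.9],
}
# Here parent and grandchild are orthogonal, so the row is rejected.
dissimilar_row = {
    "parent": [1.0, 0.0], "grandchild": [0.0, 1.0],
    "neoparent": [0.0, 1.0], "child": [0.1, 0.9],
}
print(passes_filter(similar_row), passes_filter(dissimilar_row))
```

Doing the filtering inside the query, as above, keeps the rejected rows from ever being sent to the Python client.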
