Traversals that involve a super node

md7
Node Link

I am using database version 4.3.3. The query below involves a supernode: while finding paths between users u1 and u3, a super node comes up. When profiling the query, I cannot understand why the query engine doesn't check the supernode.

CREATE INDEX username_index FOR (u:User) ON (u.name);
MATCH (n) DETACH DELETE n;
CREATE (u1:User {name: 'u1'}),
(u2:User {name: 'u2'}),
(u3:User {name: 'u3'}),
(u4:User {name: 'u4'}),
(u5:User {name: 'u5'}),
(u6:User {name: 'u6'}),
(u7:User {name: 'u7'}),
(u8:User {name: 'u8'}),
(u9:User {name: 'u9'}),
(sperNode:User {name: 'super-node'}),
(u1) -[:FOLLOWS]-> (u2),
(u2) -[:FOLLOWS]-> (u3),
(u2) -[:FOLLOWS]-> (sperNode),
(u4) -[:FOLLOWS]-> (sperNode),
(u5) -[:FOLLOWS]-> (sperNode),
(u6) -[:FOLLOWS]-> (sperNode),
(u7) -[:FOLLOWS]-> (sperNode),
(u8) -[:FOLLOWS]-> (sperNode),
(u9) -[:FOLLOWS]-> (sperNode)
RETURN *

PROFILE
MATCH (u1:User { name: "u1" })
WITH u1
MATCH p=(u1)-[:FOLLOWS*]-(:User { name: "u3" })
RETURN p

Thanks in advance,
baseer

7 REPLIES

Benoit_d
Graph Buddy

Maybe because "super-node" is not on any path between u1 and u3?

Even if I add the super node to a path from user u1 to u3, the engine still does not check all of the supernode's relationships. It looks like some performance magic of the Cypher query engine in optimising the search path.

Test data, with the super node in the path:

CREATE  (u1:User {name: 'u1'}),
        (u2:User {name: 'u2'}),
        (u3:User {name: 'u3'}),
        (u4:User {name: 'u4'}),
        (u5:User {name: 'u5'}),
        (u6:User {name: 'u6'}),
        (u7:User {name: 'u7'}),
        (u8:User {name: 'u8'}),
        (u9:User {name: 'u9'}),
        (sperNode:User {name: 'super-node'}),
        (u1) -[:FOLLOWS]-> (u2),
        (u2) -[:FOLLOWS]-> (u3),
        (u2) -[:FOLLOWS]-> (sperNode),
        (sperNode) -[:FOLLOWS]-> (u3),
        (u4) -[:FOLLOWS]-> (sperNode),
        (u5) -[:FOLLOWS]-> (sperNode),
        (u6) -[:FOLLOWS]-> (sperNode),
        (u7) -[:FOLLOWS]-> (sperNode),
        (u8) -[:FOLLOWS]-> (sperNode),
        (u9) -[:FOLLOWS]-> (sperNode)
RETURN *

Still, the supernode's relationships are not checked. As I understand it, a supernode should degrade performance. That's the reason I did not specify a direction in the query below.

PROFILE
MATCH p = (u1:User { name: "u1" })-[r1:FOLLOWS*]-(u3:User { name: "u3" })
RETURN p


Bennu
Graph Fellow

Hi!

Can you define what you mean by "check" for a super node?

Bennu

I expect the query engine to check all relationships of the super node if it is on a path between users u1 and u3, using the query below (relationship direction not specified):

PROFILE
MATCH p = (u1:User { name: "u1" })-[r1:FOLLOWS*]-(u3:User { name: "u3" })
RETURN p

We don't understand what "check" you want this to perform.

Should this prevent matching through a supernode? Should it stop and report a supernode when it encounters one? Should it process through it but also report that a supernode exists here? What behavior are you expecting?

Note that if you want it to not process it, then you technically will not be getting back correct answers to your query, since such a path exists.
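
To illustrate that trade-off, here is a minimal sketch against the second test graph above (the one where super-node lies on a path to u3). Identifying the supernode by its name is only an assumption for this example; a real graph would need its own way to recognise supernodes.

```cypher
// Same path search as before, but discard any path that passes through
// the node named 'super-node' (hypothetical way of identifying it).
MATCH p = (u1:User {name: 'u1'})-[:FOLLOWS*]-(u3:User {name: 'u3'})
WHERE none(n IN nodes(p) WHERE n.name = 'super-node')
RETURN p
```

On that test data the unfiltered query finds two paths, u1-u2-u3 and u1-u2-super-node-u3. This version returns only the first one, so the path through the supernode is silently dropped from the answer.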

Let me rephrase my question:
A super node has many relationships. If you analyse the path search query using the PROFILE/EXPLAIN keywords, you will notice that the 'VarLengthExpand' operator returns exactly two relationships instead of all relationships of the super node.

How does Neo4j intelligently find only the relationships that are required for the path?

Thanks for clarifying.

It may depend upon the query plan.

The one I see when I run this matches both of the end nodes first, and then performs a VarLengthExpand(Into) operation, meaning that the filtering on the end nodes is performed within the operation, since it has already found both of them. It does not need to split the var-length expand and the filter into two separate steps. That said, even though the row count shown is the count after that filtering, the db hits should reflect the work that was done in expansion and node id comparison.

If the plan or query was different, then you might see a VarLengthExpand operation coming only from one side, and that's where all of the supernode's relationships would be considered, and then it might be followed by a filter operator on the property name, which would bring the rows down to the exact matches.
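
A sketch of a query that would probably produce such a one-sided plan on the test data. It leaves the label off the far end node, on the assumption that the planner then cannot use username_index for that node; the actual plan may still differ between versions and data sizes.

```cypher
// Only u1 can be found via the index here; the far end is unlabelled,
// so the name check on it has to happen after the expansion.
PROFILE
MATCH p = (u1:User {name: 'u1'})-[:FOLLOWS*]-(other)
WHERE other.name = 'u3'
RETURN p
```

If the planner does pick a VarLengthExpand(All) from u1 here, its row count should cover every path starting at u1, including one through each relationship of super-node, with a Filter operator above it cutting the rows down to the paths that end at u3.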

So there's no real "magic" here: all relationships of the supernode DO have to be expanded and filtered in some way. The VarLengthExpand(Into) operator just takes care of that filtering for you instead of needing a separate filter operator, so you don't see the rows being expanded or filtered in the plan, though you should see the db hits from it.
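
One way to sanity-check this is to compare the db hits in the profile with the supernode's degree. A small sketch, using the size() pattern-expression form that works on 4.x (newer versions may need a different syntax for counting a pattern):

```cypher
// Number of FOLLOWS relationships attached to the supernode, in either direction.
MATCH (s:User {name: 'super-node'})
RETURN size((s)-[:FOLLOWS]-()) AS degree
```

On the first test graph this returns 7 and on the second 8. On real data with a much larger degree, the db hits of the expand operator should grow with that number even while its row count stays small.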