cancel
Showing results for 
Search instead for 
Did you mean: 

Break shortestPath loop having "WHERE property IN [..]"

isayeter
Node Link

Hello to everyone, I'm very newbie in neo4j graph DB.

I can get shortest paths between a source user and 100 other target users as follows:

MATCH p = shortestPath( 
	(u1:User)-[:follows*]->(u2:User {userID:"17911468618"}) 
) 
WHERE u1.userID IN ["173560420","232192182","427553890","6860189","26669533","2094200507","6590609","290023231","260375673","19410587","29394004","14455831","1287006597","186904952","303054725","212742998","44059601","1574083","979177642","1550693326","3132929984","1918078581","284920884","907025384","4213518589","1518284433","1822594644","1506607755","637874562","10482862","176702683","188222091","29421778","1521885756","246194371","2482066817","176618189","187619120","40992858","524226044","371299822","55795588","2274763833","1470414259","2033147472","421468021","1547627005","2289707047","22079794","18900337","144605776","1234301500","1942463581","496865116","1506896522","266928623","414113135","10206720","1363484236","232024162","350708102","324599988","23577429","197260775","50417061","1371714791","263110431","5486909","192417402","248633614","185546187","26444210","266319242","189396108","550150293","24065795","1036534773","489110643","6311667","1660662141","1962023419","271229086","1412007771","548017775","566276178","46983271","1635861286","21193118","9232256","378496191","314761038","1777543238","191110424","277509338","16264572","175353759","270902355","21629668","325848806","242066770"] 
RETURN nodes(p) as results

in this query WHERE u1.userID IN statement has 100 different unique IDs.

The returning results of this query:

3X_3_2_32d157db05eec7854331f6fc652214d54f858133.png

That's okey, it means; 89 out of 100 target users in the list has a :follows path to the source user.

However I want to break the loop when the first 3 target users found which has a path to the source user , is it possible to do that? Thank you.

1 ACCEPTED SOLUTION

Yes, you can add LIMIT 3 to the end of your query.

Keep in mind that this won't necessarily get you the closest 3 nodes (top 3 shortest paths). If you need that then a different query is needed.

View solution in original post

14 REPLIES 14

Yes, you can add LIMIT 3 to the end of your query.

Keep in mind that this won't necessarily get you the closest 3 nodes (top 3 shortest paths). If you need that then a different query is needed.

I don't know why I just couldn't simply think "LIMIT 3" , the answer was actually very simple. Thank you so much.

Andrew, I want to ask another question, I want to exclude some userID's in the relationships chain, I mean, if I don't want to see some user's in the shortest path chain, how to do that?

for example:
A -> follows -> B
B -> follows -> C
B -> follows -> D
C -> follows -> M
D -> follows -> M

when I run shortestPath from:A to:M , output is A->B->C->M, however I dont want to see C user in the chain, so I want to get A->B->D->M.

After you have your collection of ids, add a WHERE clause to the shortestPath() match, such that

WHERE none(node IN nodes(p) WHERE node.userID IN excludedIds)

isayeter
Node Link

Thank you so much again Andrew, sorry for my basic questions, I'm still very newbie at neo4j.

Andrew, I want to ask another question;

When I query shortest path with *1..4 bounds;

MATCH p=shortestPath(
(u1:User {userID:"3514890218"})-[:follows*1..4]->(u2:User {userID:"184460715"})
)
RETURN nodes(p)

it returns as;


in 24 ms.
3X_5_3_5341705c29fcca267db6ed0539465e963fe0061a.png

and as you can see, the result consists of 4 hops. so there is no path with <4 hops for those users.

However when I run the same query with *1..5 bounds, like:

MATCH p=shortestPath(
(u1:User {userID:"3514890218"})-[:follows*1..5]->(u2:User {userID:"184460715"})
)
RETURN nodes(p)

it returns as;


in 3240 ms.
3X_8_6_8673abc221f8f9676a340dfaebc5a6f19a87c89a.png

As you can see, the result still consists of 4 hops as expected. But it takes x135 times slower.
I'm a little confused right now, the query times are quite same everytime I run queries. What I'm expecting that the second query should return the results as fast as like the first query (whenever it finds first 4 hops chain, it should return). What I am missing? Is that normal?

Two things...first uncheck the "Connect result nodes" checkbox in your browser preferences (gear icon in the lower left).

Second, please run a PROFILE of the plan and attach it here.

isayeter
Node Link

Hello again, query1:

PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..4]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

Result Graph:

Result Plan:

Result Reponse:
query1_response.txt (8.9 KB)

query2:

PROFILE MATCH p=shortestPath(
(u1:User {userID:"3514890218"})-[:follows*1..5]->(u2:User {userID:"184460715"})
)
RETURN nodes(p)

Result Graph:

Result Plan:

Result Reponse:
query2_response.txt (8.9 KB)

I don't think shortestPath() is showing an accurate number of db hits, I've raised the question internally and we'll address it if it's a bug.

We do see significantly more memory being used for the operation in the second query compared to the first, so there does seem to be more work being done, but that too is puzzling, as shortestPath() should be shortcircuiting once the path is found.

Can you try running both queries again, prefixing each with CYPHER runtime=SLOTTED ?

isayeter
Node Link

Hello again Andrew,

Query1:

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..4]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

Timing: 24ms
3X_4_7_47b212aa596697d15300776e2214f2e2dd32a37d.png

Results:
query1_summary.txt (12.2 KB)
query1_response.txt (1.3 KB)

Query 2:

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..5]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

Timing: 2780ms
3X_d_d_dd979dc9e3798cddc63b85d7e54b9511e25ceaad.png

Results:
query2_summary.txt (12.2 KB)
query2_response.txt (1.3 KB)

Thanks, can you edit that comment to also provide the timings for each of those?

I have edited my post again, first is 24ms , second is 2780ms.

Okay, that rules out a runtime problem.

Can you let us know which version of Neo4j you are using?

isayeter
Node Link

Yes of course. Here it is:
|Version:|4.2.3|
|Edition:|Enterprise|

Thanks, nothing in the changelogs about any known bug here, so this may be something new.

Do you see the time (and memory used for the shortestPath operation) increasing further if you keep raising the upper bound (5, 6, and so on)?

isayeter
Node Link

1..3

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..3]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

3X_6_3_634e553cbce2dc96954e8b638a02223365233241.png
3X_1_4_1495880db7daedfa8f20c363197a5a0b7ce5cc28.png
summary1-3.txt (12.2 KB)

1..4

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..4]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

3X_0_3_03ef65c79b8f0e49defdec7943b85b2e28e6ee6e.png
3X_4_7_4755b783065c267073bdd8c28eed4ba0ff20ee04.png
summary1-4.txt (12.2 KB)

1..5

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..5]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

3X_0_e_0e424bc6957df475770d8719b280ee02e780f0d6.png
3X_f_c_fc73fa9dc8a2a18adada6e788d5fee971700ccd8.png
summary1-5.txt (12.2 KB)

1..6

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*1..6]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

3X_4_2_4226cc2974405701b2709df5b0654881de0cad73.png
3X_c_d_cd404ced7e2865d9b7202eac782c209cefaa43f7.png
summary1-6.txt (12.2 KB)

without bounds

CYPHER runtime=SLOTTED 
PROFILE MATCH p=shortestPath( (u1:User {userID:"3514890218"})-[:follows*]->(u2:User {userID:"184460715"}) ) RETURN nodes(p)

3X_d_e_de4beaac1e13b9616ff8a7a8011ceeaeb44bb617.png
3X_d_1_d169d0789c50c3b4820c883c8a4b97354e0eee9a.png
summary-no-bounds.txt (12.7 KB)