2nd, 3rd degree network?

graph-cat · May 13, 2025, 11:03am

What is the most efficient way to fetch a 2nd degree network (friends and friends of friends) of a node in a social graph?

I tried this:

MATCH (startNode:User {nodeId: 1}) // Find the starting User node
MATCH (startNode)-[*1..2]-(connectedNode:User) // Find distinct User nodes within 1 to 2 hops
WHERE startNode <> connectedNode // Exclude the starting node itself
RETURN count(DISTINCT connectedNode) AS secondDegreeNetworkSize;

I decided to try it on a real social network dataset (40 million nodes, 1 billion edges) from one of the public collections, and it took 130 seconds to fetch and count the nodes in the 2nd degree network of 2.9 million nodes. On average, the users have couple hundred connections or less, but there are outliers, of course.

It took longer than I was willing to wait (I aborted) to compute the 3rd degree network—even though at first I thought it should be easy.

I am new to cypher and graph databases—a student trying to learn. Is this the best it can do, or can my query be optimized?
Is there a more suitable data solution for this task that can solve this problem in a smaller amount of time?

I think I read somewhere that graph databases can make tens or even hundreds of millions of traversals per second per core on a modern CPU, and I ran it on a powerful M3Max MacBook with way more RAM than the database size, but even a 2nd degree network of just 1 million nodes took 30 seconds to compute.

So, did I do it wrong? Seems a bit slow for my hardware.

Thanks!

P.S. The schema:

╒══════════════════════════════════════════════════════════════════════╤══════════════════════════════╕
│nodes                                                                 │relationships                 │
╞══════════════════════════════════════════════════════════════════════╪══════════════════════════════╡
│[(:User {name: "User",indexes: [],constraints: ["Constraint( id=4, nam│[[:FOLLOWS {name: "FOLLOWS"}]]│
│e='user_nodeId_unique', type='UNIQUENESS', schema=(:User {nodeId}), ow│                              │
│nedIndex=3 )"]})]                                                     │                              │
└──────────────────────────────────────────────────────────────────────┴──────────────────────────────┘

dana_canzano · May 13, 2025, 11:11am

@graph-cat

a. is there an index in :User(nodeId)? if not then the 1st line is going to scan / examine every :User node to see if it has a nodeId and value 1?

b. the 2nd line. there is no relationship direction (i.e. -[*1..2]-(connectedNode:User) ) is that expected or should it be `-[*1..2]->(connectedNode:User) >?

c. have you done anything so as to configure heap/pagecache as described at Memory configuration - Operations Manual

d. what version of Neo4j?

e. do you have a profile/explain of the given query?

graph-cat · May 13, 2025, 11:41am

Hi Dana,
Thank you for your help!

Yes, I've indexed it. What's the best way for me to verify?
The schema output above does not mention it?

I wanted the 2nd and 3rd (not realistic, as I see now, so just 2nd, I guess) of those whom the starting node follows + those whom the node's friends follow. I.e., outgoing edges from S to outer frontier.
How would I do that properly and most efficiently?

No. I will study that. Thank you! Is that just for the Mac platform?

DBMS 5.24.0

Sure.

Cypher 5

Planner COST

Runtime PIPELINED

Runtime version 5.24

Batch size 1024

+-----------------------------------+----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+
| Operator                          | Id | Details                                                 | Estimated Rows | Rows    | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms)  | Pipeline            |
+-----------------------------------+----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+
| +ProduceResults                   |  0 | secondDegreeNetworkSize                                 |              1 |       1 |       0 |              0 |                    0/0 |      0.185 | In Pipeline 1       |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+
| +EagerAggregation                 |  1 | count(connectedNode) AS secondDegreeNetworkSize         |              1 |       1 |       0 |             32 |                        |            |                     |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+                        |            |                     |
| +Filter                           |  2 | NOT startNode = connectedNode AND connectedNode:User    |           4493 | 2962798 | 5925596 |                |                        |            |                     |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+                        |            |                     |
| +VarLengthExpand(Pruning,BFS,All) |  3 | (startNode)-[*..2]-(connectedNode)                      |           4992 | 2962799 | 4753297 |      234885320 |                        |            |                     |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+                        |            |                     |
| +NodeUniqueIndexSeek              |  4 | UNIQUE startNode:User(nodeId) WHERE nodeId = $autoint_0 |              1 |       1 |       2 |            376 |         722009/3685883 | 142223.940 | Fused in Pipeline 0 |
+-----------------------------------+----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+

Total database accesses: 10678895, total allocated memory: 234885688

...

Again, thank you for your help!

dana_canzano · May 13, 2025, 12:05pm

@graph-cat

Yes, I've indexed it. What's the best way for me to verify?

Can you run cypher command show indexes ?

No. I will study that. Thank you! Is that just for the Mac platform?

no.. this is applicable for any/all platforms.

joshcornejo · May 13, 2025, 12:35pm

Maybe a bit too much expectation ? These problems are exponential - have a look at this:

graph-cat · May 14, 2025, 1:12am

╒═══╤════════════════════╤════════╤═════════════════╤════════╤══════════════╤═════════════╤══════════╤══════════════════╤════════════════════╤════════════════════════════════╤═════════╕
│id │name                │state   │populationPercent│type    │entityType    │labelsOrTypes│properties│indexProvider     │owningConstraint    │lastRead                        │readCount│
╞═══╪════════════════════╪════════╪═════════════════╪════════╪══════════════╪═════════════╪══════════╪══════════════════╪════════════════════╪════════════════════════════════╪═════════╡
│1  │"index_343aff4e"    │"ONLINE"│100.0            │"LOOKUP"│"NODE"        │null         │null      │"token-lookup-1.0"│null                │"2025-05-13T07:27:20.625000000Z"│37       │
├───┼────────────────────┼────────┼─────────────────┼────────┼──────────────┼─────────────┼──────────┼──────────────────┼────────────────────┼────────────────────────────────┼─────────┤
│2  │"index_f7700477"    │"ONLINE"│100.0            │"LOOKUP"│"RELATIONSHIP"│null         │null      │"token-lookup-1.0"│null                │"2025-05-12T06:41:02.360000000Z"│1        │
├───┼────────────────────┼────────┼─────────────────┼────────┼──────────────┼─────────────┼──────────┼──────────────────┼────────────────────┼────────────────────────────────┼─────────┤
│3  │"user_nodeId_unique"│"ONLINE"│100.0            │"RANGE" │"NODE"        │["User"]     │["nodeId"]│"range-1.0"       │"user_nodeId_unique"│"2025-05-13T11:36:04.232000000Z"│52       │
└───┴────────────────────┴────────┴─────────────────┴────────┴──────────────┴─────────────┴──────────┴──────────────────┴────────────────────┴────────────────────────────────┴─────────┘

graph-cat · May 14, 2025, 8:15am

I responded with the index dump, but anti-spam deleted my comment for review a day ago, and it's still not been approved.

graph-cat · May 14, 2025, 9:09am

Thank you for the link. Yeah, Neo4j is probably the best graph DB on the market, and the most mature one.

So, I am essentially familiarizing myself with what and how far the graph algos/DBs in general can handle in real life applications. I just wanted to learn how I can optimize and squeeze every ounce of performance, and Dana was kind to share a bunch of pointers—I have things to read on now.

Thanks

Topic		Replies	Views
Query Slows down when nodes at double depth relationship are accessed Operations performance , cypher	9	1152	November 28, 2019
Return +2nd degree connections Cypher browser , cypher , relationship	3	3529	May 31, 2019
Cypher Query to to get having minimum 2 edges? Neo4j Graph Platform	7	1890	September 30, 2019
How to fetch millions of data faster? Cypher	30	8508	October 18, 2019
Slow query response time - simple friends of friends cypher query Cypher	3	2682	February 4, 2019

2nd, 3rd degree network?

...

Related topics