2nd, 3rd degree network?

What is the most efficient way to fetch a 2nd degree network (friends and friends of friends) of a node in a social graph?

I tried this:

MATCH (startNode:User {nodeId: 1}) // Find the starting User node
MATCH (startNode)-[*1..2]-(connectedNode:User) // Find distinct User nodes within 1 to 2 hops
WHERE startNode <> connectedNode // Exclude the starting node itself
RETURN count(DISTINCT connectedNode) AS secondDegreeNetworkSize;

I decided to try it on a real social network dataset (40 million nodes, 1 billion edges) from one of the public collections, and it took 130 seconds to count the 2.9 million nodes in the 2nd-degree network. On average, users have a couple hundred connections or fewer, but there are outliers, of course.

It took longer than I was willing to wait (I aborted the query) to compute the 3rd-degree network, even though at first I thought it would be easy.

  1. I am new to Cypher and graph databases - a student trying to learn. Is this the best it can do, or can my query be optimized?
  2. Is there a data solution better suited to this task, one that can solve the problem in less time?

I think I read somewhere that graph databases can perform tens or even hundreds of millions of traversals per second per core on a modern CPU. I ran this on a powerful M3 Max MacBook with far more RAM than the database size, but even a 2nd-degree network of just 1 million nodes took 30 seconds to compute.

So, did I do it wrong? Seems a bit slow for my hardware.

Thanks!

P.S. The schema:

╒══════════════════════════════════════════════════════════════════════╤══════════════════════════════╕
│nodes                                                                 │relationships                 │
╞══════════════════════════════════════════════════════════════════════╪══════════════════════════════╡
│[(:User {name: "User",indexes: [],constraints: ["Constraint( id=4, nam│[[:FOLLOWS {name: "FOLLOWS"}]]│
│e='user_nodeId_unique', type='UNIQUENESS', schema=(:User {nodeId}), ow│                              │
│nedIndex=3 )"]})]                                                     │                              │
└──────────────────────────────────────────────────────────────────────┴──────────────────────────────┘

@graph-cat

a. Is there an index on :User(nodeId)? If not, the first MATCH is going to scan every :User node to check whether it has nodeId = 1.

b. The second MATCH has no relationship direction (i.e. -[*1..2]-(connectedNode:User)). Is that expected, or should it be -[*1..2]->(connectedNode:User)?

c. Have you configured the heap/page cache as described in Memory configuration - Operations Manual?

d. What version of Neo4j?

e. Do you have a PROFILE/EXPLAIN of the given query?

Hi Dana,
Thank you for your help!

Yes, I've indexed it. What's the best way for me to verify? The schema output above doesn't seem to mention it.

I wanted the 2nd- and 3rd-degree networks (the 3rd is not realistic, as I now see, so just the 2nd, I guess): those whom the starting node follows, plus those whom those users follow - i.e., only outgoing edges, from the start node out to the frontier.
How would I do that properly and most efficiently?
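
This is the directed version I have in mind - just a sketch, assuming :FOLLOWS is the only relationship type and both hops should follow outgoing edges:

// Directed 2nd-degree network: outgoing FOLLOWS edges at both hops,
// so the planner no longer has to expand in both directions
MATCH (startNode:User {nodeId: 1})
MATCH (startNode)-[:FOLLOWS*1..2]->(connectedNode:User)
WHERE startNode <> connectedNode
RETURN count(DISTINCT connectedNode) AS secondDegreeNetworkSize;

Is that the right way to express it?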

No. I will study that. Thank you! Is that just for the Mac platform?

DBMS 5.24.0

Sure.

Cypher 5

Planner COST

Runtime PIPELINED

Runtime version 5.24

Batch size 1024

+-----------------------------------+----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+
| Operator                          | Id | Details                                                 | Estimated Rows | Rows    | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms)  | Pipeline            |
+-----------------------------------+----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+
| +ProduceResults                   |  0 | secondDegreeNetworkSize                                 |              1 |       1 |       0 |              0 |                    0/0 |      0.185 | In Pipeline 1       |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+
| +EagerAggregation                 |  1 | count(connectedNode) AS secondDegreeNetworkSize         |              1 |       1 |       0 |             32 |                        |            |                     |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+                        |            |                     |
| +Filter                           |  2 | NOT startNode = connectedNode AND connectedNode:User    |           4493 | 2962798 | 5925596 |                |                        |            |                     |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+                        |            |                     |
| +VarLengthExpand(Pruning,BFS,All) |  3 | (startNode)-[*..2]-(connectedNode)                      |           4992 | 2962799 | 4753297 |      234885320 |                        |            |                     |
| |                                 +----+---------------------------------------------------------+----------------+---------+---------+----------------+                        |            |                     |
| +NodeUniqueIndexSeek              |  4 | UNIQUE startNode:User(nodeId) WHERE nodeId = $autoint_0 |              1 |       1 |       2 |            376 |         722009/3685883 | 142223.940 | Fused in Pipeline 0 |
+-----------------------------------+----+---------------------------------------------------------+----------------+---------+---------+----------------+------------------------+------------+---------------------+

Total database accesses: 10678895, total allocated memory: 234885688

...

Again, thank you for your help!

@graph-cat

Yes, I've indexed it. What's the best way for me to verify?

Can you run the Cypher command SHOW INDEXES?
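
For example, filtered down to the :User label (Neo4j 5 syntax):

SHOW INDEXES YIELD name, type, labelsOrTypes, properties, state
WHERE labelsOrTypes = ["User"]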

No. I will study that. Thank you! Is that just for the Mac platform?

No, this is applicable to all platforms.
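
As a starting point, something like the settings below - the values here are only illustrative; the real numbers for your machine should come from `neo4j-admin server memory-recommendation`:

# neo4j.conf - illustrative values only
server.memory.heap.initial_size=8g
server.memory.heap.max_size=8g
# ideally large enough to hold the store files, so traversals avoid disk
server.memory.pagecache.size=64g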

Maybe a bit too much expectation? These problems are exponential in the number of hops - have a look at this:

╒═══╤════════════════════╤════════╤═════════════════╤════════╤══════════════╤═════════════╤══════════╤══════════════════╤════════════════════╤════════════════════════════════╤═════════╕
│id │name                │state   │populationPercent│type    │entityType    │labelsOrTypes│properties│indexProvider     │owningConstraint    │lastRead                        │readCount│
╞═══╪════════════════════╪════════╪═════════════════╪════════╪══════════════╪═════════════╪══════════╪══════════════════╪════════════════════╪════════════════════════════════╪═════════╡
│1  │"index_343aff4e"    │"ONLINE"│100.0            │"LOOKUP"│"NODE"        │null         │null      │"token-lookup-1.0"│null                │"2025-05-13T07:27:20.625000000Z"│37       │
├───┼────────────────────┼────────┼─────────────────┼────────┼──────────────┼─────────────┼──────────┼──────────────────┼────────────────────┼────────────────────────────────┼─────────┤
│2  │"index_f7700477"    │"ONLINE"│100.0            │"LOOKUP"│"RELATIONSHIP"│null         │null      │"token-lookup-1.0"│null                │"2025-05-12T06:41:02.360000000Z"│1        │
├───┼────────────────────┼────────┼─────────────────┼────────┼──────────────┼─────────────┼──────────┼──────────────────┼────────────────────┼────────────────────────────────┼─────────┤
│3  │"user_nodeId_unique"│"ONLINE"│100.0            │"RANGE" │"NODE"        │["User"]     │["nodeId"]│"range-1.0"       │"user_nodeId_unique"│"2025-05-13T11:36:04.232000000Z"│52       │
└───┴────────────────────┴────────┴─────────────────┴────────┴──────────────┴─────────────┴──────────┴──────────────────┴────────────────────┴────────────────────────────────┴─────────┘
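
To see why the hop count blows up, one way is to estimate the branching factor from the degree distribution - a sketch using a Cypher 5 COUNT subquery (note it visits every node, so expect it to take a while on 40M nodes):

// Rough out-degree statistics; avg^k approximates the k-hop frontier size
MATCH (u:User)
WITH COUNT { (u)-[:FOLLOWS]->() } AS outDegree
RETURN avg(outDegree) AS avgOutDegree, max(outDegree) AS maxOutDegree;

With an effective branching factor d, a k-hop frontier is roughly d^k nodes, so each extra hop multiplies the work.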

I responded with the index dump, but the anti-spam filter held my comment for review a day ago, and it still hasn't been approved.

Thank you for the link. Yeah, Neo4j is probably the best graph DB on the market, and the most mature one.

So, I am essentially familiarizing myself with what graph algorithms/DBs in general can handle in real-life applications, and how far they scale. I just wanted to learn how to optimize and squeeze out every ounce of performance, and Dana was kind enough to share a bunch of pointers - I have things to read up on now.

Thanks