Sum of multiple specific relationships per node for large data set

matt.goodwin491 · January 20, 2022, 9:11pm

I have a graph model similar to this:

(:PERSON)-[:LIVED_IN]->(:CITY)

I would like to know the total number of people who have lived in more than one city. Therfore an example would be:
Dave has lived in Sydney
Dave has lived in London
John has lived in Paris
Wendy has lived in New York

The result of this query would return 1 as just Dave has lived in more than 1 city.

My dataset of people is greater than 1million in total therefore performance is of interest.

Im aware that I could perform the following Cypher Query, however this will load all the Person nodes from the database and with such a large dataset, this takes along time to execute.

MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY) 
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)

Is there a more efficient way to find out the information I'm interested in?

dana_canzano · January 20, 2022, 10:41pm

@matt.goodwin491

But

MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY) 
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)

does not actually run since l is not included in the WITH clause of line 2. For example I get

neo4j@neo4j> MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
             WITH p, count(l) as rels
             WHERE rels > 1
             RETURN count(l);
Variable `l` not defined (line 4, column 14 (offset: 93))
"RETURN count(l);"

and if it was included I would expect it would actually return 1 value/row and it would be 2 For Dave has 2 relationships named :LIVED_IN.

As to optimization, provided a :Person node which has a :LIVED_IN relationship points to and only to a :City node, i.e. a :LIVED_IN rel is not used to also connect to a :Country node then you can

match (n:Person)
where  size( (n)-[:LIVED_IN]->()) > 1

and this size( (n)-[:LIVED_IN]->()) will take advantage of the metadata included with every node. For example for each node there is metadata which records the number of relationships by type and direction to any other node. As such for the :Person node for {name:'David'} the metadata reports there are 2 LIVED_IN relationships of outgoing direction. And as such this optimization would cause the query not to iterate over all the relationships Dave might have

koji · January 20, 2022, 11:02pm

Hi @matt.goodwin491

How about this Cypher

MATCH (p:PERSON)-[:LIVED_IN]->(:CITY), (p)-[:LIVED_IN]->(:CITY)
RETURN count(distinct p)

This Cypher is same as this.

MATCH (:CITY)<-[:LIVED_IN]-(p:PERSON)-[:LIVED_IN]->(:CITY)
RETURN count(distinct p)

Topic		Replies	Views
Return how many specific relationship there are for a specific node Cypher	4	475	April 29, 2020
Group by on nodes vs node-properties Cypher	5	2229	February 5, 2020
Count all relationships when there are two MATCH patterns Neo4j Graph Platform	5	952	January 9, 2020
Count 2 relationships in a single cypher query Neo4j Graph Platform	2	369	January 24, 2021
Counting Number of Relationships Per Grouped Nodes Cypher apoc , performance , cypher , operations , relationship , index	2	1185	November 21, 2020

Sum of multiple specific relationships per node for large data set

Related topics