I would like to know the total number of people who have lived in more than one city. Therfore an example would be:
Dave has lived in Sydney
Dave has lived in London
John has lived in Paris
Wendy has lived in New York
The result of this query would return 1 as just Dave has lived in more than 1 city.
My dataset of people is greater than 1million in total therefore performance is of interest.
Im aware that I could perform the following Cypher Query, however this will load all the Person nodes from the database and with such a large dataset, this takes along time to execute.
MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)
Is there a more efficient way to find out the information I'm interested in?
MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)
does not actually run since l is not included in the WITH clause of line 2. For example I get
neo4j@neo4j> MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l);
Variable `l` not defined (line 4, column 14 (offset: 93))
"RETURN count(l);"
and if it was included I would expect it would actually return 1 value/row and it would be 2 For Dave has 2 relationships named :LIVED_IN.
As to optimization, provided a :Person node which has a :LIVED_IN relationship points to and only to a :City node, i.e. a :LIVED_IN rel is not used to also connect to a :Country node then you can
match (n:Person)
where size( (n)-[:LIVED_IN]->()) > 1
and this size( (n)-[:LIVED_IN]->()) will take advantage of the metadata included with every node. For example for each node there is metadata which records the number of relationships by type and direction to any other node. As such for the :Person node for {name:'David'} the metadata reports there are 2 LIVED_IN relationships of outgoing direction. And as such this optimization would cause the query not to iterate over all the relationships Dave might have