Question: Dense Nodes with Millions of Relations?

alec.muffett · November 22, 2018, 10:51am

There are perhaps 250 countries in the world.

If you have Person nodes and Country nodes,
and if you have millions of people, each linked to the one or more countries they have ever visited,
then some of these Country nodes will have tens of thousands, perhaps hundreds of thousands or millions, of relationships ...

What I am wondering is "I am sure that Neo can cope with this, but is this truly a sane way to model data"?

It strikes me that perhaps it is more sane / less likely to cause query explosions, if Country nodes are somehow "sharded", eg: with something like:

(p:Person {uid:42})-[:Visited]->(c:Country {name:"France", uid:42})

...so that (assuming we are most interested in the :Visited forward relationship) each Person has a small cluster of per-Person-sharded Country nodes associated with them.

What do other folks think, please? Both approaches have pros/cons that I can see...

alec.muffett · November 22, 2018, 4:23pm

Apparently relevant link:

alec.muffett · November 22, 2018, 6:02pm

Further, from @mark.needham a few years ago: a post touching on CREATE UNIQUE (which may be defunct now?), about an effect that I may be encountering, namely that CSV-importing and merging relationships onto dense(ish) nodes may slow down as the relationship count rises.

Sample code:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:/foo.csv" AS batch
WITH batch WHERE (batch.FOO <> "")
MATCH (a:Person {uid:toInteger(batch.UID)}), (x:Foo {nonce:batch.FOO})
MERGE (a)-[ax:PersonToFoo]->(x)
ON CREATE SET ax.weight = toInteger(batch.WEIGHT)
ON MATCH SET ax.weight = ax.weight + toInteger(batch.WEIGHT)
;

...where a and x have been pre-created in previous runs, per the suggestion from @michael.hunger in Tip: Avoiding Slow & Messy Conditionals (or: splitting input) in Cypher for bulk import LOAD CSV? - #2 by michael.hunger

(edit: there are tens-of-millions of relations to create, from a multi-gigabyte CSV file; there are also UNIQUE constraints on a.uid and x.nonce)

Mark's Blog Link: Neo4j: MERGE'ing on super nodes | Mark Needham

michael.hunger · November 23, 2018, 6:27am

Actually there is a specific Cypher operator for that, MERGE(INTO), which takes the two node degrees into account.

It starts from the smaller side to check whether a relationship exists between the two.
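
To illustrate why starting from the smaller side matters, here is a toy sketch in Python. It is not Neo4j's actual implementation; the ToyGraph class and its method names are made up for illustration. It only shows that checking for an existing relationship from the low-degree end costs a handful of steps even when the other end is a dense node.

```python
from collections import defaultdict


class ToyGraph:
    """Toy adjacency-set graph (hypothetical; not Neo4j's storage format)."""

    def __init__(self):
        self.out_edges = defaultdict(set)  # start node -> set of end nodes
        self.in_edges = defaultdict(set)   # end node -> set of start nodes

    def add_rel(self, start, end):
        self.out_edges[start].add(end)
        self.in_edges[end].add(start)

    def has_rel_scanning(self, start, end):
        """Check for start->end by scanning the lower-degree endpoint.

        Returns (found, steps), where steps counts the neighbours examined.
        """
        out_deg = len(self.out_edges[start])
        in_deg = len(self.in_edges[end])
        if out_deg <= in_deg:
            candidates, target = self.out_edges[start], end
        else:
            candidates, target = self.in_edges[end], start
        steps = 0
        for other in candidates:
            steps += 1
            if other == target:
                return True, steps
        return False, steps


g = ToyGraph()
for person in range(10_000):
    g.add_rel(person, "France")   # "France" becomes a dense node
g.add_rel(42, "Spain")            # person 42 has only two relationships

found, steps = g.has_rel_scanning(42, "France")
missing, missing_steps = g.has_rel_scanning(42, "Italy")
```

Here the check for 42 -> France walks person 42's two relationships rather than France's ten thousand.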

Your statement also has to perform a write for every line of the CSV.

Something that can be helpful in general for this kind of statement is to aggregate first
and then create the data after.
(But that would not work with USING PERIODIC COMMIT (for > 1M rels), so you'd have to use apoc.periodic.iterate)

call apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:/foo.csv" AS batch 
WITH batch WHERE (batch.FOO <> "") 
RETURN toInteger(batch.UID) as person, batch.FOO as foo, sum(toInteger(batch.WEIGHT)) as weight
','
MATCH (a:Person {uid:person}), (x:Foo {nonce:foo}) 
MERGE (a)-[ax:PersonToFoo]->(x) 
SET ax.weight = coalesce(ax.weight, 0) + weight
',{batchSize:10000})
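
The effect of aggregating first can be sketched outside the database. This is a minimal Python illustration, assuming the same UID, FOO and WEIGHT columns as the CSV above; the function name and the sample rows are made up. It collapses repeated (UID, FOO) pairs into a single summed weight, so each relationship would only need to be written once.

```python
import csv
import io
from collections import defaultdict


def aggregate_weights(csv_text):
    """Sum WEIGHT per (UID, FOO) pair, skipping rows with an empty FOO."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["FOO"]:
            continue  # plays the role of WHERE batch.FOO <> ""
        totals[(int(row["UID"]), row["FOO"])] += int(row["WEIGHT"])
    return dict(totals)


# Made-up sample data: four input rows, one of them with an empty FOO.
sample = """UID,FOO,WEIGHT
42,abc,1
42,abc,2
42,,5
7,abc,3
"""

totals = aggregate_weights(sample)
for (uid, foo), weight in sorted(totals.items()):
    print(uid, foo, weight)
```

Four input rows become two relationship writes here; on a real file the saving depends on how often the same pair repeats.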

alec.muffett · November 28, 2018, 1:47pm

Hi @michael.hunger - thank you for the response; and I take your point about the aggregation, that makes sense, I may try it. But where you say:

...I don't really understand what you mean; is there a different kind of MERGE that I should be using?

michael.hunger · November 28, 2018, 2:15pm

Nope, that's automatic; you can see the difference in the query plan.
