
LOAD CSV in Causal Cluster vs standalone

danny_oberoi
Node Clone
  • Neo4j version: 3.5.14
    I am seeing LOAD CSV run very slowly compared to a 3-node causal cluster. Are there any benchmarks comparing data import in a causal cluster vs. standalone?

-Danny

8 REPLIES

danny_oberoi
Node Clone

@dana.canzano I also want to know whether loading data through Kettle will be faster than LOAD CSV.

Any benchmarks indicating LOAD CSV performance would be very helpful.

I can't imagine a case where Kettle would be materially faster than LOAD CSV. But so far there is no data to explain the slowness. It could be slow as a result of poor configuration, poor Cypher, lack of indexes, etc.

Thanks Dana for the quick reply!

So are you saying that committing on 2 nodes out of a 3-node cluster, or 3 nodes out of a 5-node cluster, won't add any overhead?

LOAD CSV needs to commit data on more than one node, depending on the cluster configuration.

Yes, there is some overhead, but is it the source of your poor performance? For example, let's say your LOAD CSV is

LOAD CSV ....... ........ MERGE (n:Person {id:row[0]}) .......

and you have no index on :Person(id). Then each row loaded will do a NodeByLabelScan, and if you already have 100k :Person nodes in the graph, each row will need to scan over all 100k nodes before insertion. Surely this will be a bigger performance drag than any concern about committing on 1 Neo4j instance versus 2 of 3 cluster members.
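To make the point above concrete, here is a minimal sketch in Neo4j 3.5 syntax. The file name `persons.csv` and the batch size of 1000 are assumptions for illustration only; the label, property, and `row[0]` come from the example query above. It is meant as something to try, not as a tuned import.

```cypher
// Assumption: ids are unique per person. A unique constraint also creates
// the backing index, so MERGE can look the node up instead of scanning.
CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE;

// If ids are not guaranteed unique, a plain index is the alternative:
// CREATE INDEX ON :Person(id);

// Check the plan before loading. With the index in place the MERGE should
// be planned as an index seek rather than a NodeByLabelScan.
EXPLAIN
LOAD CSV FROM 'file:///persons.csv' AS row   // hypothetical file name
MERGE (n:Person {id: row[0]});

// The load itself, committed in batches (batch size is an assumption).
USING PERIODIC COMMIT 1000
LOAD CSV FROM 'file:///persons.csv' AS row   // hypothetical file name
MERGE (n:Person {id: row[0]});
```

If the `EXPLAIN` output still shows a label scan on both the standalone instance and the cluster, the missing index is the more likely bottleneck, and the commit overhead of the cluster is secondary.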

Thanks Dana!

Suppose the database size is the same in standalone and in the 3-node cluster (including the underlying hardware specification), and I load the same data into the cluster as well as into standalone.

I just need a benchmark of the LOAD CSV performance difference between standalone and cluster.
I need to make up my mind whether to run LOAD CSV before moving from a single node to a cluster or after that, and I need numbers to convince my solution architect and back up my thought process.

-Danny

Hi Danny
In which cloud are you trying to deploy your cluster?
Sameer

Thanks Sameer!

It is on prem 3 node cluster.

Have you observed any performance degradation between standalone and clustered Neo4j with respect to LOAD CSV?

Can you share the header file, the query, and the timings?
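One possible way to collect comparable timings on both setups is sketched below, again in Neo4j 3.5 syntax. The file name `persons.csv` and the 1000-row sample size are assumptions; the `MERGE` mirrors the example earlier in the thread.

```cypher
// On the cluster, confirm which member you are connected to.
// Writes are accepted only by the LEADER.
CALL dbms.cluster.role();

// Profile a small sample of the load on the standalone instance and on the
// cluster leader, then compare the operators, db hits, and elapsed time.
PROFILE
LOAD CSV FROM 'file:///persons.csv' AS row   // hypothetical file name
WITH row LIMIT 1000                          // sample size is an assumption
MERGE (n:Person {id: row[0]});
```

Running the same sample against the same starting data on both setups keeps the comparison fair. Note that the sample rows are actually written, so it is best run against a test copy of the database.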