Best method or recommendations for loading big data from an Oracle Database

Hello,
I am having slowness problems (6 days of loading) when loading data from an Oracle Database.

We currently have a big problem loading data from the Oracle database into Neo4j Enterprise. We have a total of 71 million nodes and 360 million relationships, and this is set to grow. It is important to note that we will have to repeat this full load on a regular basis because the data evolves very quickly.
The loading is done locally with the Neo4j server's cypher-shell.bat, essentially via the apoc.periodic.iterate procedure: the first part opens a JDBC connection to Oracle and runs a SQL query, and the second part executes the apoc.create.node and apoc.create.relationship insert statements.
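For reference, the pattern described above might look roughly like this (the JDBC URL, table, and column names are hypothetical placeholders, not the actual ones used):

```cypher
// Sketch of the node-loading batch described above.
// The JDBC URL and NODE1_TABLE / column names are assumptions.
CALL apoc.periodic.iterate(
  "CALL apoc.load.jdbc('jdbc:oracle:thin:@//host:1521/service',
     'SELECT ID, NAME FROM NODE1_TABLE') YIELD row RETURN row",
  "CALL apoc.create.node(['NODE1'], {ID: row.ID, NAME: row.NAME})
   YIELD node RETURN count(*)",
  {batchSize: 10000, parallel: true}
);
```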

The insertion of nodes and their properties is done with batchSize: 10000 and parallel: true, and takes 10 minutes on average.

The insertion of relationships and their properties is done with batchSize: 1000 and parallel: false. So far we have not been able to load everything, because it takes a very long time, sometimes still running after more than 24 hours. It is important to note that the SQL part alone takes 15 to 20 minutes on average to execute.
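A minimal sketch of such a relationship batch, assuming hypothetical table and column names, with the endpoint nodes matched by their indexed ID property:

```cypher
// Relationship batch: endpoints are looked up by the indexed ID property;
// parallel: false avoids deadlocks when batches touch the same nodes.
CALL apoc.periodic.iterate(
  "CALL apoc.load.jdbc('jdbc:oracle:thin:@//host:1521/service',
     'SELECT SRC_ID, DST_ID, DAT_DEB, DAT_FIN FROM REL_TABLE') YIELD row RETURN row",
  "MATCH (a:NODE1 {ID: row.SRC_ID})
   MATCH (b:NODE2 {ID: row.DST_ID})
   CALL apoc.create.relationship(a, 'RELATION',
     {DAT_DEB: row.DAT_DEB, DAT_FIN: row.DAT_FIN}, b) YIELD rel
   RETURN count(*)",
  {batchSize: 1000, parallel: false}
);
```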

The dataset (data + schema + indexes) currently weighs 186 GB and took about 6 days to load.

The loading test is done locally on the Neo4j Enterprise server (trial version) with the following setup:

Machine:
Windows Server 2016, 8 Intel Xeon Gold 6348 processors at 2.6 GHz (16 logical processors in total), 96 GB of RAM, HDD storage with two partitions (C: and D:, each with 260 GB free); D: is mainly used to store the transaction logs.

Some changes to neo4j.conf:
server.directories.transaction.logs.root=D:/Neo4jEnterprise/data/transactions
dbms.security.auth_enabled=false
server.memory.heap.initial_size=28200m
server.memory.heap.max_size=28200m
server.memory.pagecache.size=55800m
db.logs.query.enabled=OFF
db.logs.query.obfuscate_literals=true
db.logs.query.parameter_logging_enabled=false
db.tx_log.rotation.retention_policy=670 files
db.recovery.fail_on_missing_files=false
server.jvm.additional=-XX:+ExitOnOutOfMemoryError

Thanks in advance

Do you have indexes on the nodes for the relationship creation? If you don't, then finding the nodes at the ends of each relationship is going to be "slow".

Hi,
I created the node and relationship indexes after loading the nodes and before loading the relationships; otherwise the relationship loading took even more time.

Node index (the ID property is a string); I have nine node labels:

CREATE INDEX index_node1 IF NOT EXISTS FOR (n:NODE1) ON (n.ID)
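In case it helps reproduce the setup: the same statement is repeated once per label, and the indexes can be waited on before the relationship load starts. Label names other than NODE1 are placeholders here:

```cypher
CREATE INDEX index_node1 IF NOT EXISTS FOR (n:NODE1) ON (n.ID);
CREATE INDEX index_node2 IF NOT EXISTS FOR (n:NODE2) ON (n.ID);
// ...one index per label, nine in total...

// Block until all indexes are ONLINE before loading relationships
CALL db.awaitIndexes();
```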

and

Relationship index:

CREATE INDEX composite_range_rel_index_relation FOR ()-[r:RELATION]-() ON (r.DAT_DEB, r.DAT_FIN, r.ETIQ_PROVEN);

Thanks

Hi @Dak - I have reached out to you directly to support.
Thank you!

OK, thanks, I am just waiting for the instructions...