Importing data with APOC becomes very slow after 40 million nodes

lazyonemax · August 6, 2019, 2:38am

- Neo4j version: Community 3.5.7
- API / driver: APOC, used to import a dataset

The Cypher statement I run for each record (the remaining properties are elided):

merge (i:ALL {iri:value.iri}) set i.name=value.name,i.has_modified=value.has_modified,i.has_created=value.has_created,i.has_published=value.has_published,i.has_description_zh=value.has_description_zh,i.has_description_en=value.has_description_en,i.has_external_references=value.has_external_references,i.has_id=value.has_id,i.has_version=value.has_version,i.has_name_en=value.has_name_en,i.has_type=value.has_type,i.has_endTime=value.has_endTime,i.has_lastUpdateTime=value.has_lastUpdateTime,i.has_hashes=value.has_hashes,i.has_relationship_type_en=value.has_relationship_type_en,i.has_ref_externalID=value.has_ref_externalID*********************************

A sample of the data I want to import:

{"iri":"http://www.intra.nsfocus.com/ncos#a6247481-ff73-43ae-9d0b-c3cb4ec05c48","name":"a6247481-ff73-43ae-9d0b-c3cb4ec05c48","has_isListening":"true<boolean>"}
{"iri":"http://www.intra.nsfocus.com/ncos#analysis-Win7_SP1-f8ab7663af5cb7338125be6ab194b8f0","name":"analysis-Win7_SP1-f8ab7663af5cb7338125be6ab194b8f0","has_lastUpdateTime":"2018-11-24T03:46:15<dateTime>"}

Plugins / extensions / procedures in use: APOC

andrew_bowman · August 6, 2019, 10:13am

What indexes or unique constraints are present in your graph relevant to this query?

lazyonemax · August 7, 2019, 9:37am

I use the default B-tree index, created with this Cypher statement:
CREATE INDEX ON :ALL(iri)
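
For reference, here is a sketch of how the index could be checked and, as an alternative, replaced with a uniqueness constraint, using Neo4j 3.5 syntax. Whether a constraint fits this dataset is an assumption: it only works if no two :ALL nodes share an iri, and in 3.5 the plain index on the same property has to be dropped before the constraint can be created.

```cypher
// List all indexes with their state; the :ALL(iri) index should be ONLINE.
CALL db.indexes();

// Assumption: iri values are unique across :ALL nodes.
// The existing plain index must be dropped before creating the constraint.
DROP INDEX ON :ALL(iri);
CREATE CONSTRAINT ON (i:ALL) ASSERT i.iri IS UNIQUE;
```

Without a uniqueness constraint, concurrent MERGE operations on the same iri can create duplicate nodes, which is relevant when batches run in parallel.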

andrew_bowman · August 7, 2019, 4:56pm

Good. Now what APOC call are you using? So far we've only seen use of MERGE.

lazyonemax · August 8, 2019, 1:39am

The APOC call:

call apoc.periodic.iterate("call apoc.load.json('mydir/*********-10w-11.json') yield value return value","merge (i:ALL {iri:value.iri}) set i.name=value.name,i.has_modified=value.has_modified,i.has_created=value.has_created, i.has_published=value.has_published***********************",{batchSize:10000,iterateList:true,parallel:true,concurrency:64})

BTW: the import is fast until the graph reaches 40 million nodes, at almost 30,000 per second.
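
One experiment that might help narrow this down, offered as a sketch and not a confirmed fix: run the same load without parallel execution and look at the counters that apoc.periodic.iterate returns. The file path is kept exactly as posted (elided), and only the first two property assignments are shown; the remaining ones from the original statement would need to be added back.

```cypher
// Same load as above, but with parallel execution switched off.
// The SET clause is shortened here; the original sets many more properties.
CALL apoc.periodic.iterate(
  "CALL apoc.load.json('mydir/*********-10w-11.json') YIELD value RETURN value",
  "MERGE (i:ALL {iri: value.iri}) SET i.name = value.name, i.has_modified = value.has_modified",
  {batchSize: 10000, iterateList: true, parallel: false}
)
YIELD batches, total, timeTaken, failedBatches, errorMessages
RETURN batches, total, timeTaken, failedBatches, errorMessages;
```

If the parallel run reports failed batches or lock-related entries in errorMessages while this run does not, that would point at contention between the 64 concurrent batches and not at the index.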

mike_r_black · August 10, 2019, 3:34am

Have you tried the neo4j-admin import tool? It's recommended for datasets over 10M. It's a command-line tool found in the bin directory. See the manual for details (scroll to the bottom of the page).

bin/neo4j-admin import --id-type=STRING \
                       --nodes:Customer=customers.csv --nodes=products.csv  \
                       --nodes="orders_header.csv,orders1.csv,orders2.csv" \
                       --relationships:CONTAINS=order_details.csv \
                       --relationships:ORDERED="customer_orders_header.csv,orders1.csv,orders2.csv"

lazyonemax · August 10, 2019, 2:47pm

The neo4j-admin import feature can only be used to initialize a database (an empty graph.db). We need to update data online, so we use APOC functions for the import. Thanks for your advice.
