Importing data with APOC becomes very slow after 40 million nodes

Please provide the following information if you ran into a more serious issue:

- Neo4j version: Community 3.5.7
- API / driver: APOC procedures, used to import the dataset

MERGE (i:ALL {iri: value.iri})
SET i.name = value.name,
    i.has_modified = value.has_modified,
    i.has_created = value.has_created,
    i.has_published = value.has_published,
    i.has_description_zh = value.has_description_zh,
    i.has_description_en = value.has_description_en,
    i.has_external_references = value.has_external_references,
    i.has_id = value.has_id,
    i.has_version = value.has_version,
    i.has_name_en = value.has_name_en,
    i.has_type = value.has_type,
    i.has_endTime = value.has_endTime,
    i.has_lastUpdateTime = value.has_lastUpdateTime,
    i.has_hashes = value.has_hashes,
    i.has_relationship_type_en = value.has_relationship_type_en,
    i.has_ref_externalID = value.has_ref_externalID*********************************
  • a sample of the data you want to import
{"iri":"http://www.intra.nsfocus.com/ncos#a6247481-ff73-43ae-9d0b-c3cb4ec05c48","name":"a6247481-ff73-43ae-9d0b-c3cb4ec05c48","has_isListening":"true<boolean>"}
{"iri":"http://www.intra.nsfocus.com/ncos#analysis-Win7_SP1-f8ab7663af5cb7338125be6ab194b8f0","name":"analysis-Win7_SP1-f8ab7663af5cb7338125be6ab194b8f0","has_lastUpdateTime":"2018-11-24T03:46:15<dateTime>"}

  • which plugins / extensions / procedures do you use
    APOC
  • neo4j.log and debug.log

What indexes or unique constraints are present in your graph relevant to this query?

I use the default B-tree index, created with this Cypher statement:
CREATE INDEX ON :ALL(iri)
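
To check that the index is online and that lookups on iri actually use it, something along these lines could be run. This is only a sketch: the iri value is a made-up placeholder, and the exact columns returned by db.indexes() vary between Neo4j versions.

```cypher
// List all indexes; the one on :ALL(iri) should report state ONLINE.
CALL db.indexes();

// Inspect the plan for a lookup by iri without executing it.
// 'http://example.com/placeholder' is a hypothetical value.
EXPLAIN
MATCH (i:ALL {iri: 'http://example.com/placeholder'})
RETURN i;
```

If the plan shows a NodeByLabelScan rather than a NodeIndexSeek, the index is not being used for the lookup.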

Good. Now what APOC call are you using? So far we've only seen use of MERGE.

APOC call

CALL apoc.periodic.iterate(
  "CALL apoc.load.json('mydir/*********-10w-11.json') YIELD value RETURN value",
  "MERGE (i:ALL {iri: value.iri}) SET i.name = value.name, i.has_modified = value.has_modified, i.has_created = value.has_created, i.has_published = value.has_published***********************",
  {batchSize: 10000, iterateList: true, parallel: true, concurrency: 64}
)

BTW: the import was fast before it reached 40 million nodes, at almost 30,000/s.
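
One variation that may be worth trying, offered as an untested sketch rather than a known fix, is to run the same import with parallel: false. Concurrent MERGE batches against the same label and index can compete for locks, so a sequential run gives a baseline to compare against. The property list is shortened to a single property here for brevity, and the file path is kept exactly as elided in the original call.

```cypher
// Same import as the original call, but batches are committed one after another.
// Only one SET property is shown; the rest would follow as in the original statement.
CALL apoc.periodic.iterate(
  "CALL apoc.load.json('mydir/*********-10w-11.json') YIELD value RETURN value",
  "MERGE (i:ALL {iri: value.iri}) SET i.name = value.name",
  {batchSize: 10000, iterateList: true, parallel: false}
)
YIELD batches, total, timeTaken, failedBatches, errorMessages
RETURN batches, total, timeTaken, failedBatches, errorMessages;
```

The failedBatches and errorMessages columns show whether any batches were rolled back, which can otherwise go unnoticed during a long-running import.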

Have you tried using the neo4j-admin import tool? It's recommended for datasets over 10M. It's a command-line tool found in the bin directory. See the example at the bottom of the import page in the manual:

bin/neo4j-admin import --id-type=STRING \
                       --nodes:Customer=customers.csv --nodes=products.csv  \
                       --nodes="orders_header.csv,orders1.csv,orders2.csv" \
                       --relationships:CONTAINS=order_details.csv \
                       --relationships:ORDERED="customer_orders_header.csv,orders1.csv,orders2.csv"

The neo4j-admin import feature can only be used to initialize a database (an empty graph.db). We need to update data online, so we use APOC functions for the import. Thanks for your advice.