I'm loading data from Azure SQL Server into Neo4j, which is hosted in Docker on a Linux server. One of the tables, which contains ~7 million rows, cannot be loaded in one run and keeps causing connection timeout errors.
Can someone help with configuring the timeout for the JDBC connection?
Below are my connection string and import query:
:param azure_url => 'jdbc:sqlserver://SERVER;databaseName=DATABASE;user=User;password=PW';
CALL apoc.periodic.iterate('
  CALL apoc.load.jdbc($azure_url, "select distinct m.person_id, m.measurement_id, m.measurement_concept_id, m.measurement_date, m.unit_concept_id, m.unit_concept_name, m.value_as_number from omop.measurement m where m.measurement_id > 200000000")
  YIELD row
','
  WITH row
  WHERE row.unit_concept_id IS NOT NULL
    AND row.value_as_number IS NOT NULL
  MATCH (p:Person {person_id: row.person_id})
  MATCH (m:Measurement {measurement_concept_id: row.measurement_concept_id})
  MERGE (p)-[r:HAS_MEASUREMENT {measurement_id: row.measurement_id}]->(m)
  ON CREATE SET
    r.measurement_date = row.measurement_date,
    r.unit_concept_id = row.unit_concept_id,
    r.unit_concept_name = row.unit_concept_name,
    r.value_as_number = row.value_as_number
', {batchSize: 100000, iterateList: true, parallel: false, params: {azure_url: $azure_url}, retries: 5});
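For reference on the timeout question itself: Microsoft's JDBC driver for SQL Server accepts timeout settings as extra properties on the connection URL. As a sketch with placeholder values to adjust for your setup (loginTimeout is in seconds, socketTimeout in milliseconds; check the property list for your driver version):

:param azure_url => 'jdbc:sqlserver://SERVER;databaseName=DATABASE;user=User;password=PW;loginTimeout=60;socketTimeout=7200000';

socketTimeout defaults to 0 (wait forever), so a finite value makes a dead connection fail fast instead of hanging; shrinking batchSize so each batch finishes sooner is the other usual lever.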
apoc.load.jdbc is a bit difficult to control, and hard to resume when a run fails.
There are two options:
1. Use Apache Hop. It can stream the data from your database and write to Neo4j without timeouts.
2. If you are comfortable with Python, you can export the data to a CSV file on the local file system and use the Python ingest utility pyingest to write it to Neo4j.
The advantage of the second approach is that you can restart from the point of failure instead of rerunning the whole process: you can fix the Cypher and continue. A minimal sketch of that idea follows below.
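This is not pyingest itself (pyingest is driven by a YAML config), just a minimal sketch in the same spirit, assuming the official neo4j Python driver 5.x, a Bolt endpoint at localhost:7687, and a CSV whose headers match the columns of the SQL query above; the checkpoint file is what lets a rerun resume instead of restart:

import csv
from itertools import islice

from neo4j import GraphDatabase  # pip install neo4j (driver 5.x assumed)

URI = "bolt://localhost:7687"          # assumption: Bolt port published by the Docker container
AUTH = ("neo4j", "password")           # assumption: replace with real credentials
CSV_FILE = "measurement.csv"           # assumption: export with the same headers as the SQL query
CHECKPOINT = "measurement.checkpoint"  # remembers how many rows are already committed
BATCH_SIZE = 10_000

CYPHER = """
UNWIND $rows AS row
MATCH (p:Person {person_id: toInteger(row.person_id)})
MATCH (m:Measurement {measurement_concept_id: toInteger(row.measurement_concept_id)})
MERGE (p)-[r:HAS_MEASUREMENT {measurement_id: toInteger(row.measurement_id)}]->(m)
ON CREATE SET r.measurement_date = row.measurement_date,
              r.unit_concept_id = toInteger(row.unit_concept_id),
              r.unit_concept_name = row.unit_concept_name,
              r.value_as_number = toFloat(row.value_as_number)
"""

def load_checkpoint():
    try:
        with open(CHECKPOINT) as f:
            return int(f.read().strip() or "0")
    except FileNotFoundError:
        return 0

def save_checkpoint(rows_done):
    with open(CHECKPOINT, "w") as f:
        f.write(str(rows_done))

def main():
    done = load_checkpoint()
    with GraphDatabase.driver(URI, auth=AUTH) as driver, open(CSV_FILE, newline="") as f:
        reader = csv.DictReader(f)
        for _ in islice(reader, done):  # skip rows committed by a previous run
            pass
        with driver.session() as session:
            while True:
                batch = list(islice(reader, BATCH_SIZE))
                if not batch:
                    break
                session.execute_write(lambda tx: tx.run(CYPHER, rows=batch).consume())
                done += len(batch)
                save_checkpoint(done)   # advance only after the batch is committed
                print(f"{done} rows committed")

if __name__ == "__main__":
    main()

Because the checkpoint only advances after a batch commits, and the write is a MERGE, a rerun after a crash replays at most one batch and stays idempotent.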
I haven't tried Apache Hop yet but will look into it. As for CSV, I don't think it can store more than ~1 million rows, so I might need to use TXT (not sure whether Neo4j supports it) or JSON.
I think one can export CSVs from SQL Server in manageable volumes and write a custom ETL pipeline to ingest the data into Neo4j, rather than relying on ready-made open-source libraries until you are sure of their design goals. A rough sketch of such an export is below.
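To make that concrete, here is a hedged sketch of the export side, assuming pyodbc with an installed "ODBC Driver 18 for SQL Server" and the same omop.measurement table as above. It pages on measurement_id (keyset pagination), so each chunk query stays cheap even deep into the 7 million rows, every CSV file stays a manageable size, and a failed export can restart from the last id printed:

import csv

import pyodbc  # pip install pyodbc; needs an ODBC driver for SQL Server installed

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"  # assumption: adjust to your installed driver
    "SERVER=SERVER;DATABASE=DATABASE;UID=User;PWD=PW"
)
CHUNK_ROWS = 500_000  # rows per CSV file; tune to taste

COLUMNS = ["person_id", "measurement_id", "measurement_concept_id",
           "measurement_date", "unit_concept_id", "unit_concept_name",
           "value_as_number"]
ID_IDX = COLUMNS.index("measurement_id")

# Keyset pagination: each query picks up where the previous chunk ended.
QUERY = f"""
SELECT TOP ({CHUNK_ROWS}) {', '.join(COLUMNS)}
FROM omop.measurement
WHERE measurement_id > ?
ORDER BY measurement_id
"""

def export_chunks(start_id=200_000_000):
    last_id, part = start_id, 0
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        while True:
            rows = cursor.execute(QUERY, last_id).fetchall()
            if not rows:
                break
            with open(f"measurement_{part:04d}.csv", "w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(COLUMNS)
                writer.writerows(rows)
            last_id = rows[-1][ID_IDX]  # resume point if the export dies here
            part += 1
            print(f"measurement_{part - 1:04d}.csv written, up to measurement_id {last_id}")

if __name__ == "__main__":
    export_chunks()

Paging on the indexed key rather than OFFSET keeps each chunk query cheap regardless of how far into the table the export has progressed.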
@mengjiakang2017 neo4j-admin import will be faster than apoc.load.jdbc, but it can only operate on a new/empty database: you cannot run neo4j-admin import against an existing, populated database, and it must be run while the database is offline. apoc.load.jdbc behaves in exactly the opposite manner.
Yes, I agree that APOC is great for updating a database while neo4j-admin import works well for building one from the ground up. I haven't checked Apache Hop yet, but based on Anthapu's suggestion it might be a good candidate for large imports.