Hi there.
I tried the neo4j-etl CLI tool v1.3.1.
Importing only nodes is very fast, but importing relations seems to be very slow.
Does the tool check the join target table while it generates the CSV?
Details are as follows.
Environment
- Neo4j 3.5.11
- Neo4j ETL Tool V1.3.1
- PostgreSQL
- Server memory: 8 GB
Situation
The RDB (PostgreSQL) has one table for nodes and one table for relations, as in the ER diagram below.
There are three sets of tables with this same structure, and the real tables have more columns.
I edited mapping.json like this:
[ {
  "name" : "NODE1_db1",
  "schema" : "db1",
  "graph-object-type" : "Node",
  "sql" : "SELECT id FROM db1.company AS company",
  "mappings" : [ {
    "column" : {
      "type" : "CompositeColumn",
      "table" : "company",
      "schema" : "db1",
      "role" : "PrimaryKey",
      "columns" : [ {
        "type" : "SimpleColumn",
        "role" : "Data",
        "table" : "company",
        "schema" : "db1",
        "name" : "id",
        "alias" : "id",
        "sql-data-type" : "INTEGER",
        "column-value-selection-strategy" : "SelectColumnValue"
      } ]
    },
    "field" : {
      "type" : "Id",
      "name" : "",
      "id-space" : "db1.company"
    }
  } ]
}, {
  "name" : "Relation_db1",
  "schema" : "db1",
  "graph-object-type" : "Relation",
  "sql" : "SELECT start_id, end_id, relation_type FROM (SELECT start_id, end_id, relation_type FROM db1.company AS start_company JOIN db1.relation AS relation ON start_company.id = relation.start_id JOIN db1.company AS end_company ON end_company.id = relation.end_id) AS company_relation",
  "mappings" : [ {
    "column" : {
      "type" : "SimpleColumn",
      "role" : "Data",
      "table" : "company",
      "schema" : "db1",
      "name" : "start_id",
      "alias" : "start_id",
      "sql-data-type" : "INTEGER",
      "column-value-selection-strategy" : "SelectColumnValue"
    },
    "field" : {
      "type" : "startId",
      "name" : "start_id",
      "id-space" : "db1.company_relation"
    }
  }, {
    "column" : {
      "type" : "SimpleColumn",
      "role" : "Data",
      "table" : "company",
      "schema" : "db1",
      "name" : "end_id",
      "alias" : "end_id",
      "sql-data-type" : "INTEGER",
      "column-value-selection-strategy" : "SelectColumnValue"
    },
    "field" : {
      "type" : "endId",
      "name" : "end_id",
      "id-space" : "db1.company_relation"
    }
  }, {
    "column" : {
      "type" : "SimpleColumn",
      "role" : "Data",
      "table" : "company",
      "schema" : "db1",
      "name" : "relation_type",
      "alias" : "relation_type",
      "sql-data-type" : "INTEGER",
      "column-value-selection-strategy" : "SelectColumnValue"
    },
    "field" : {
      "type" : "RELATION_TYPE",
      "name" : "relation_type",
      "id-space" : "db1.company_relation"
    }
  } ]
} ]
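To make the mapping easier to review, here is a minimal Python sketch that lists, for each entry, its graph-object-type and the id-space values its fields declare. The helper name `summarize_mapping` is hypothetical (it is not part of neo4j-etl), and `SAMPLE_MAPPING` is a trimmed copy of the mapping above that keeps only the keys the sketch reads.

```python
import json

# Trimmed copy of the mapping above: only the keys this sketch reads are kept.
SAMPLE_MAPPING = """
[
  {"name": "NODE1_db1", "graph-object-type": "Node",
   "mappings": [
     {"field": {"type": "Id", "name": "", "id-space": "db1.company"}}
   ]},
  {"name": "Relation_db1", "graph-object-type": "Relation",
   "mappings": [
     {"field": {"type": "startId", "name": "start_id",
                "id-space": "db1.company_relation"}},
     {"field": {"type": "endId", "name": "end_id",
                "id-space": "db1.company_relation"}},
     {"field": {"type": "RELATION_TYPE", "name": "relation_type",
                "id-space": "db1.company_relation"}}
   ]}
]
"""


def summarize_mapping(entries):
    """Return (name, graph-object-type, sorted id-spaces) for each entry.

    Hypothetical helper for reviewing a mapping; not a neo4j-etl API.
    """
    summary = []
    for entry in entries:
        id_spaces = sorted({m["field"]["id-space"] for m in entry["mappings"]})
        summary.append((entry["name"], entry["graph-object-type"], id_spaces))
    return summary


if __name__ == "__main__":
    for name, kind, id_spaces in summarize_mapping(json.loads(SAMPLE_MAPPING)):
        print(name, kind, id_spaces)
```

With the full file, `json.load(open("mapping.json"))` could be passed in place of the sample string. The output puts the id-space of the node entry next to the ones used by the relation entry so they can be compared at a glance.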
The result set contains 700,000 nodes and 2,500,000 relations.
Result
10,300 s, about 3 hours.
When I removed the Relation entry from the mapping and ran the import for nodes only, it finished in about 20 seconds. Running the SQL directly against the RDB takes only a few seconds, so I suspect the time is being spent somewhere in the Neo4j ETL logic.
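A minimal sketch of how the SQL-only timing could be measured. It runs the relation query from the mapping and fetches every row, so the elapsed time covers the full result set and not just the first row. As an assumption to keep it runnable anywhere, an in-memory SQLite database with small generated data stands in for PostgreSQL; `build_sample_db` and `time_query` are hypothetical helper names, and against the real database `time_query` would be given a PostgreSQL DB-API connection instead.

```python
import sqlite3
import time

# The relation query from the mapping, unchanged.
RELATION_SQL = (
    "SELECT start_id, end_id, relation_type FROM ("
    "SELECT start_id, end_id, relation_type "
    "FROM db1.company AS start_company "
    "JOIN db1.relation AS relation ON start_company.id = relation.start_id "
    "JOIN db1.company AS end_company ON end_company.id = relation.end_id"
    ") AS company_relation"
)


def build_sample_db(n_nodes=1000, n_relations=3000):
    """Create a small stand-in database (SQLite, not PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database named db1 so the
    # schema-qualified table names in RELATION_SQL resolve.
    conn.execute("ATTACH DATABASE ':memory:' AS db1")
    conn.execute("CREATE TABLE db1.company (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE db1.relation "
        "(start_id INTEGER, end_id INTEGER, relation_type INTEGER)"
    )
    conn.executemany(
        "INSERT INTO db1.company (id) VALUES (?)",
        ((i,) for i in range(1, n_nodes + 1)),
    )
    # Every start_id and end_id points at an existing company row,
    # so the join keeps all generated relations.
    conn.executemany(
        "INSERT INTO db1.relation VALUES (?, ?, ?)",
        (
            (i % n_nodes + 1, (i * 7 + 3) % n_nodes + 1, i % 3)
            for i in range(n_relations)
        ),
    )
    conn.commit()
    return conn


def time_query(conn, sql, batch_size=10000):
    """Run sql, fetch all rows in batches, return (row_count, seconds)."""
    cur = conn.cursor()
    started = time.perf_counter()
    cur.execute(sql)
    row_count = 0
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        row_count += len(batch)
    elapsed = time.perf_counter() - started
    cur.close()
    return row_count, elapsed


if __name__ == "__main__":
    rows, seconds = time_query(build_sample_db(), RELATION_SQL)
    print(f"{rows} rows fetched in {seconds:.3f} s")
```

Timings from the SQLite stand-in say nothing about PostgreSQL itself; the point is only the measurement method, which counts the time to fetch all rows of the join.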
Please let me know if you have any information.
Thanks.