Inserting a Relationship Post Database Setup

Hi, I am new to Neo4j but have been searching and trying to resolve this for a week now with no success. I have a DB with the OffShore_Leaks data in it. I have imported the nodes of Bahamas_Leaks and am now trying to get the Bahamas relationships inserted.

I have filtered the data and created a relationships CSV with the following header:
node_1,rel_type,node_2,sourceID,valid_until,start_date,end_date
23000001,intermediary_of,20000035,Bahamas Leaks,The Bahamas Leaks data is current through early 2016.,,
23000001,intermediary_of,20000033,Bahamas Leaks,The Bahamas Leaks data is current through early 2016.,,
23000001,intermediary_of,20000041,Bahamas Leaks,The Bahamas Leaks data is current through early 2016.,,
....

And have checked that these IDs exist in the Intermediary and Entity Nodes.

I have written a number of Cypher queries to do the import, since the bulk importer requires nodes and seems to be intended mainly for instantiating the DB.

LOAD CSV WITH HEADERS FROM "http://IP_ADDRESS/bulk/import/intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Node) WHERE n1.node_id = row.node_1
MATCH (n2:Node) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);

Syntactically this looks to be correct, but when I run it via the Desktop I get "(no changes, no records)".
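As a minimal sanity check (just a sketch against the same file), I can return a few rows and confirm the ids actually come through as expected:

LOAD CSV WITH HEADERS FROM "http://IP_ADDRESS/bulk/import/intermediary_of.csv" AS row
// quick check that rows are read and the id columns look right
RETURN row.node_1, row.rel_type, row.node_2
LIMIT 5;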

Going in circles on this one.

Hi,

Looks like you imported bahamas_leaks_nodes only. Did you import bahamas_leaks_intermediary?

MATCH (n1:Node) WHERE n1.node_id = "23000001" is failing as this id does not exist in bahamas_leaks_nodes. This id exists in bahamas_leaks_intermediary.

Here is the schema that I used for offshore_leaks:

I have the correct schema and all the data loaded from offshore_leaks. I have the nodes from bahamas_leaks and can search and find them individually.

I have changed my Cypher and gone to the command line, hoping to get a more useful error message.

neo4j> LOAD CSV WITH HEADERS FROM "http://IP_ADDRESS/bulk/import/intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Node) WHERE n1.node_id = row.node_1
MATCH (n2:Node) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);

Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database

Every time I run this Cypher the database dies.

Note that I also changed the import to use file:/// with the same results.

Did you label both the bahamas_leaks nodes and the intermediary nodes as 'Node'?
Intermediary nodes should have a different label. In your query the label is the same in both MATCH statements.

Run this
MATCH (n1:Node) WHERE n1.node_id = "23000001" RETURN n1
and see if you get any result.

Yes, I have tried multiple queries over the week.
I have named the labels in line with the following Cypher.

neo4j> LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = row.node_1
MATCH (n2:Entity) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);

....but still get the following error.

Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database

The Cypher

MATCH (n1:Node) WHERE n1.node_id = 23000001 RETURN n1

returns
n1
{
"sourceID": "Bahamas Leaks",
"name": "Internal User",
"valid_until": "The Bahamas Leaks data is current through early 2016.",
"node_id": 23000001
}

If the node_id is stored as an integer then try this:

MATCH (n1:Node) WHERE n1.node_id = toInteger(row.node_1)
MATCH (n2:Node) WHERE n2.node_id = toInteger(row.node_2)

neo4j> MATCH (n1:Intermediary) WHERE n1.node_id = 23000001 RETURN n1;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Intermediary {sourceID: "Bahamas Leaks", name: "Internal User", valid_until: "The Bahamas Leaks data is current through early 2016.", node_id: 23000001}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row available after 103 ms, consumed after another 187 ms

neo4j> MATCH (n2:Entity) WHERE n2.node_id =20000035 RETURN n2;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n2 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Entity {sourceID: "Bahamas Leaks", name: "TINU HOLDINGS LIMITED", valid_until: "The Bahamas Leaks data is current through early 2016.", node_id: 20000035}) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row available after 98 ms, consumed after another 572 ms

neo4j> LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = row.node_1
MATCH (n2:Entity) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);

Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database

Caused the DB to exit.

neo4j> USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = row.node_1
MATCH (n2:Entity) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
0 rows available after 1207271 ms, consumed after another 2 ms

Although this did not cause the DB to exit this time, the session at the prompt was dead.

neo4j> USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = toInteger(row.node_1)
MATCH (n2:Entity) WHERE n2.node_id = toInteger(row.node_2)
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database

VERY Frustrating!

Please share the LOAD CSV code you used to create the Entity and Intermediary nodes. I will use it in my DB and check.

Hi, I used neo4j-admin import to initially set up the DB with the nodes and a set of edges.

I am now trying to use LOAD CSV to add some additional edges/relationships.

This is not straightforward and has been very unstable.

To load additional nodes I use

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://IP/bahamas_leaks.nodes.intermediary.csv' AS line
CREATE (:Intermediaries {name: line.name, internal_id: line.internal_id, address: line.address, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, status: line.status, node_id: toInt(line.node_id), sourceID: line.sourceID})

It's all importing correctly and I can get data out correctly... the schema is in as understood.

Very strange.

I have the Enterprise Version on an AWS cluster.

OK, it looks like I get a partial import of the edges/relationships now when I slow down the processing via the periodic commit.

Maybe there is a bad character in the CSV input stream.

Good to hear that. All is well that ends well!

Not quite solved yet. The output from Neo4j on the state of the import gives few clues about what the issue is. I looked at the input stream and can't find an issue. Very painful!

Looked at the data and it is clean, no special characters.

The import stops at exactly 6500 entries.
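To rule out a bad row, one thing I can still try (a sketch, assuming the stopping point roughly corresponds to a row offset in the file) is to page through the CSV around that point:

LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
// offsets 6490/20 are illustrative, centred on where the import stops
WITH row SKIP 6490 LIMIT 20
RETURN row;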

Now I am wondering if I am hitting a Neo4j limitation.

This looks to be a heap-size limitation with Neo4j.

Will look to do the following.

Break up import sizes into multiple imports.

Increase the Java and Neo4j heap size.

Increase the periodic commit even further.

It looks like Neo4j tries to do everything in memory before committing... that will always be a limiting factor in any system architecture, especially when dealing with big data... they should probably look at doing some of this via virtual memory.
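For reference, the heap settings live in neo4j.conf; the values below are only an example and would need to be sized to the instance:

# neo4j.conf - example values only, size to the instance
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g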

I will next work out how to use apoc.periodic.iterate to see if that helps.
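Something along these lines, I think (a sketch, assuming APOC is installed and using the same file, labels and batch size as above):

CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///intermediary_of.csv' AS row
   WITH row WHERE row.rel_type = 'intermediary_of'
   RETURN row",
  "MATCH (n1:Intermediary) WHERE n1.node_id = toInteger(row.node_1)
   MATCH (n2:Entity) WHERE n2.node_id = toInteger(row.node_2)
   CREATE (n1)-[:INTERMEDIARY_OF]->(n2)",
  {batchSize: 500, parallel: false});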

There's no mention of indexes or constraints here. If you were running into heap issues when using periodic commit CSV loading with MATCHes, more than likely you don't have indexes on the label/property used for lookup, meaning for each row you're doing an entire label scan, which would explain the heap pressure.

Please use an EXPLAIN on your load query, and if you see NodeByLabelScan it means you aren't using index lookups, and should create indexes (or unique constraints) to make your matches quick and ease up heap pressure.
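For example, something like this (Neo4j 3.x syntax; adjust the labels/properties to match whatever your MATCHes actually use):

CREATE INDEX ON :Intermediary(node_id);
CREATE INDEX ON :Entity(node_id);

Unique constraints on node_id would also work if the ids are unique per label, since they create the backing index as well. Prefixing the LOAD CSV query with EXPLAIN should then show an index seek (NodeIndexSeek) instead of NodeByLabelScan.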

Hi Andrew,

OK understood.

I did start to look at the indexing over the weekend as I also thought that may be the issue.

I will get this in place and feedback as required.

Thank you for the feedback.

All the Best
Mike

Hi Andrew,

Thank God!

It works and was fast!

Now I can move forward!

Thank you so much for that feedback.

All the Best
Mike