Import csv via python script - relationships types

arnaudubreuil · November 26, 2018, 3:12pm

Hello!

I'm brand new to this forum (and to Neo4j actually).
I would like to import a CSV data file, with its nodes and relationships, into Neo4j using a Python script.
I have 2 nodes with a few properties, and would like to establish a relationship between them.
Running Cypher commands in the browser (Chrome, up to date) works fine; it's just extremely slow (the file has 17M lines, so I have to split it to load it in Chrome, and it still takes very long, hence the Python import).
I'm splitting the CSV file into a header and a "core" part as required.
I'm calling the shell commands --import for both nodes and relationships; it works fine for nodes, but breaks for the relationships.
It tells me that the TYPE of the relationship is missing. But I have no idea what it should be, or where to define it... And I didn't find a clear answer in the Neo4j documentation about what relationship types are. Is a type different from a label?

Here is the part of my code where I define the relationship between my 2 nodes:

test_rel = node1[node2['common_variable'] != 'NaN']
test_rel['Label'] = 'CREATES'
#write data
test_rel.to_csv(export_path+'/test_rel.csv',index=False, header=False)
#write header
with open(export_path+'/test_rel-header.csv','w',newline='') as f:
    writer=csv.writer(f)
    writer.writerow([':START_ID','common_variable',':END_ID', ':TYPE'])

And the error I get:
original error: start:B010 (global id space) type:null end:CREATES (global id space) is missing TYPE field

Any idea? I can provide a lot more details if needed, but I don't really know what would be relevant for you guys.

Thank you!
Arnaud

12kunal34 · November 27, 2018, 6:21am

Could you please share your CSV's header and let us know what properties you want on both nodes?
I think you can do it directly in Python: create a connection to Neo4j from Python, then pass your query in the script.
It should work like a charm.

arnaudubreuil · November 27, 2018, 8:59am

Hello,

Thank you for your answer.
My CSV header is
:START_ID,common_variable,:END_ID,:TYPE
The properties for node1 would be prop1, common_variable, prop2
And for node2 it'd be common_variable, prop3.

I already set up a connection to Neo4j from Python and it connects fine, but I don't see the advantage of running the queries directly in Python compared to running them in the browser. It should be as slow as in the browser, right?
The point here is to extract and load the CSV files much faster (and later in an automated way), by splitting the header from the body and defining node properties and relationships in Python. Does that make sense?

Thanks!

michael.hunger · November 27, 2018, 9:55pm

You can basically feed a list of pairs to your Cypher statement, e.g. in batches of 10k pairs,

and then use this in Cypher:

UNWIND $rows AS row
MATCH (a:Label),(b:Label2) WHERE a.id = row.from AND b.id = row.to
MERGE (a)-[:REL]->(b)
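
On the Python side, the batching could look roughly like the sketch below. It is only an illustration: Label, Label2, REL and id are the placeholders from the statement above, the helper names are made up, and `session` is assumed to be anything with a run(query, **parameters) method, which is how the official Neo4j Python driver's session accepts query parameters.

```python
from itertools import islice

# The statement from the post; Label, Label2, REL and id are placeholders.
QUERY = """
UNWIND $rows AS row
MATCH (a:Label), (b:Label2)
WHERE a.id = row.from AND b.id = row.to
MERGE (a)-[:REL]->(b)
"""


def batches(pairs, size=10_000):
    """Yield lists of {'from': ..., 'to': ...} dicts, at most `size` per list."""
    it = iter(pairs)
    while True:
        chunk = [{"from": a, "to": b} for a, b in islice(it, size)]
        if not chunk:
            return
        yield chunk


def load_relationships(session, pairs, size=10_000):
    """Run QUERY once per batch and return the number of pairs sent.

    `session` is assumed to expose run(query, **parameters).
    """
    sent = 0
    for rows in batches(pairs, size):
        session.run(QUERY, rows=rows)  # rows becomes the $rows parameter
        sent += len(rows)
    return sent
```

Each call sends one list of maps as $rows, so a 17M-line file turns into a few thousand round trips instead of one per relationship. The batch size is a knob to tune, not a fixed recommendation.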

see:

michael.hunger · November 27, 2018, 9:57pm

neo4j-import is an offline bulk loader
so it creates a new database from your CSV files.

See: Neo4j-admin import - Operations Manual
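
On the TYPE error from the first post: the bulk importer matches data columns to header columns by position. The message shows CREATES in the end position and null as the type, which suggests the data rows carry one column fewer than the four-column header, or that the columns are in a different order. A minimal sketch that writes header and data with the same column order follows; the (start id, common_variable, end id) row shape and the function name are assumptions, not the original DataFrame layout.

```python
import csv
import os

# Column order shared by the header file and the data file.
COLUMNS = [":START_ID", "common_variable", ":END_ID", ":TYPE"]


def write_relationship_files(rows, export_path, rel_type="CREATES"):
    """Write test_rel-header.csv and test_rel.csv with matching column order.

    `rows` is an iterable of (start_id, common_variable, end_id) tuples;
    that shape is an assumption made for this sketch.
    """
    header_file = os.path.join(export_path, "test_rel-header.csv")
    data_file = os.path.join(export_path, "test_rel.csv")
    with open(header_file, "w", newline="") as f:
        csv.writer(f).writerow(COLUMNS)
    with open(data_file, "w", newline="") as f:
        writer = csv.writer(f)
        for start_id, common, end_id in rows:
            # One value per header column; the relationship type goes last.
            writer.writerow([start_id, common, end_id, rel_type])
    return header_file, data_file
```

The relationship type (CREATES here) is the name of the relationship itself. It plays the role for relationships that a label plays for nodes, and every relationship has exactly one.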

arnaudubreuil · November 28, 2018, 10:41am

Thank you for your answers Michael.

But in addition to creating a new database with the neo4j-import command, my script performs some cleansing on my data (which is not really clean), so I would like to do everything in this script.
The python import of the whole 17M lines file is working fine and is pretty quick (a few minutes at worst). My problem is when I try to define the relationships between nodes created from this file.

michael.hunger · November 28, 2018, 11:30am

Do you mean creating them manually via Cypher?

There are some ways of speeding it up, depending on what exactly you need.

Did you try my statement?
If you need to create a lot of data, send smaller batches (10k-100k) from Python.

I guess you already have indexes/constraints on the key fields that you use to look up the nodes?
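
If not, an index on each lookup key is worth creating first, since a MATCH on an unindexed property has to scan every node with that label. A small sketch that only builds the statements; Label, Label2 and id are placeholders, the helper name is made up, and the syntax is the form used by Neo4j 4.0 and later (3.x used CREATE INDEX ON :Label(id) instead).

```python
def index_statement(label, prop):
    """Return a CREATE INDEX statement for one label/property pair (4.0+ syntax)."""
    # Backticks guard against labels or property names with unusual characters.
    return f"CREATE INDEX FOR (n:`{label}`) ON (n.`{prop}`)"


# Placeholder label and property names, one statement per lookup key.
statements = [index_statement("Label", "id"), index_statement("Label2", "id")]
for statement in statements:
    print(statement)
```

Each statement would be run once, before the relationship batches are sent.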

One extra trick is to use apoc.map.groupBy to create an in-memory cache.


MATCH (n:Label)
WITH apoc.map.groupBy(collect(n),"id") as cache1
MATCH (m:Label2) 
WITH cache1, apoc.map.groupBy(collect(m),"id") as cache2
UNWIND $rows AS row
WITH cache1[row.from] AS a, cache2[row.to] as b
MERGE (a)-[:REL]->(b)

michael.hunger · November 28, 2018, 11:30am

Oh and there is apoc.import.csv which can read the neo4j-import files directly into a live database.

blackp323 · June 12, 2020, 8:21pm

Hello guys, this is the first time I'm using Neo4j, and I'm not sure about most things in it.

Is there a Python script that can read a CSV file and feed it into Neo4j?
