Import CSV via Python script - relationship types

relationship
import
neo4j-import

(Arnaudubreuil) #1

Hello!

I'm brand new to this forum (and to Neo4j actually).
I would like to import a CSV data file, with its nodes and relationships, into Neo4j using a Python script.
I have 2 nodes with a few properties, and would like to establish a relationship between them.
Running Cypher commands in the browser (Chrome, up to date) works fine; it's just super slow (the file has 17M lines, so I have to split it to load it in Chrome, and it's still very slow, hence the Python import).
I'm splitting the CSV file into a header part and a "core" (data) part, as required.
I'm calling the shell commands --import for both nodes and relationships; it works fine for nodes, but breaks for the relationships.
It tells me that the TYPE of the relationship is missing. But I have no idea what it should be, or where to define it... And in the Neo4j documentation I didn't find a clear answer about what relationship types are. Is a type different from a label?

Here is the part of my code where I define the relationship between my 2 nodes:

test_rel = node1[node2['common_variable'] != 'NaN']
test_rel['Label'] = 'CREATES'
#write data
test_rel.to_csv(export_path+'/test_rel.csv',index=False, header=False)
#write header
with open(export_path+'/test_rel-header.csv','w',newline='') as f:
    writer=csv.writer(f)
    writer.writerow([':START_ID','common_variable',':END_ID', ':TYPE'])

And the error I get:
original error: start:B010 (global id space) type:null end:CREATES (global id space) is missing TYPE field

Any idea? I can provide a lot more details if needed, but I don't really know what would be relevant for you guys.

Thank you!
Arnaud


(12kunal34) #2

Could you please share your CSV's header and let us know which properties you want on both nodes?
Also, I think you can do it directly in Python: create a connection to Neo4j from Python, then pass your query in the script.
It should work like a charm.


(Arnaudubreuil) #3

Hello,

Thank you for your answer.
My CSV header is
:START_ID,common_variable,:END_ID,:TYPE
The properties for node1 would be prop1, common_variable, prop2
And for node2 it'd be common_variable, prop3.

I already set up a connection to Neo4j from Python and it connects fine, but I don't see the advantage of running the queries directly from Python compared to running them in the browser. It should be as slow as in the browser, right?
The point here is to extract and load the CSV files much faster (and later in an automated way), by splitting header and body and defining node properties and relationships in Python. Does that make sense?

Thanks!


(Michael Hunger) #4

You can basically feed a list of pairs to your Cypher statement, e.g. in batches of 10k pairs,

and then use this in Cypher:

UNWIND $rows AS row
MATCH (a:Label),(b:Label2) WHERE a.id = row.from AND b.id = row.to
MERGE (a)-[:REL]->(b)
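
On the Python side, the batching could look roughly like the sketch below. It assumes `session` is any object with a `run(query, **parameters)` method, such as a session from the official `neo4j` Python driver; the helper names `batches` and `load_relationships` are made up for illustration, and `Label`, `Label2`, `REL` and `id` are the placeholders from the statement above.

```python
from itertools import islice

# Cypher statement from this post, with the two lookups joined by AND.
QUERY = """
UNWIND $rows AS row
MATCH (a:Label), (b:Label2)
WHERE a.id = row.from AND b.id = row.to
MERGE (a)-[:REL]->(b)
"""


def batches(pairs, size=10_000):
    """Yield lists of {'from': ..., 'to': ...} maps, at most `size` per list."""
    it = iter(pairs)
    while True:
        chunk = [{"from": start, "to": end} for start, end in islice(it, size)]
        if not chunk:
            return
        yield chunk


def load_relationships(session, pairs, size=10_000):
    """Run the query once per batch and return the number of batches sent."""
    sent = 0
    for rows in batches(pairs, size):
        # The list of maps becomes the $rows parameter of the query.
        session.run(QUERY, rows=rows)
        sent += 1
    return sent
```

With the official driver, `session` would typically come from `GraphDatabase.driver(uri, auth=(user, password)).session()`, where the URI and credentials are your own. Because `pairs` can be any iterable, the 17M rows never need to be held in memory at once.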

see:


(Michael Hunger) #5

neo4j-import is an offline bulk loader, so it creates a new database from your CSV files.

See: https://neo4j.com/docs/operations-manual/current/tutorial/import-tool/
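
For the import tool, the relationship type is the name of the relationship (for example CREATES), which is separate from node labels, and every data row needs one value per header column, in the same order. Judging only from the error message earlier in the thread, the data rows appear to have fewer columns than the header, so CREATES lands in the :END_ID column and :TYPE is left empty. The following is a minimal sketch of writing aligned files with the standard `csv` module; the function name and the `(start_id, common_variable, end_id)` row layout are assumptions, not the original script.

```python
import csv
import os


def write_relationship_files(export_path, rows, rel_type="CREATES"):
    """Write test_rel-header.csv and test_rel.csv with matching columns.

    `rows` is an iterable of (start_id, common_variable, end_id) tuples
    (an assumed layout for this sketch).
    """
    header = [":START_ID", "common_variable", ":END_ID", ":TYPE"]
    header_file = os.path.join(export_path, "test_rel-header.csv")
    data_file = os.path.join(export_path, "test_rel.csv")

    with open(header_file, "w", newline="") as f:
        csv.writer(f).writerow(header)

    with open(data_file, "w", newline="") as f:
        writer = csv.writer(f)
        for start_id, common_variable, end_id in rows:
            # One value per header column, in the same order as the header.
            writer.writerow([start_id, common_variable, end_id, rel_type])

    return header_file, data_file
```

The :START_ID and :END_ID values have to match the :ID values used in the node files, otherwise the tool cannot resolve the relationship endpoints.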


(Arnaudubreuil) #6

Thank you for your answers Michael.

But in addition to creating a new database with the neo4j-import command, my script performs some cleansing of my data (which is not really clean), so I would like to do everything in this script.
The Python import of the whole 17M-line file works fine and is pretty quick (a few minutes at worst). My problem comes when I try to define the relationships between the nodes created from this file.


(Michael Hunger) #7

Do you mean creating them manually via Cypher?

There are some ways of speeding it up, depending on what exactly you need.

Did you try my statement?
If you need to create a lot of data, you can send smaller batches (10k-100k) from Python.

I guess you already have indexes/constraints on the key fields that you use to look up the nodes?

There is one extra trick: using apoc.map.groupBy to create an in-memory cache.


MATCH (n:Label)
WITH apoc.map.groupBy(collect(n),"id") as cache1
MATCH (m:Label2) 
WITH cache1, apoc.map.groupBy(collect(m),"id") as cache2
UNWIND $rows AS row
WITH cache1[row.from] AS a, cache2[row.to] as b
MERGE (a)-[:REL]->(b)
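
As a rough plain-Python analogy of what the cache does (not how APOC is implemented), each label's nodes are collected once into a map keyed by id, and every row is then resolved by two map lookups instead of two index searches. The names `group_by` and `resolve` are hypothetical, and nodes are modelled as plain dicts.

```python
def group_by(nodes, key):
    """Build a lookup map from a list of node-like dicts, keyed by `key`."""
    return {node[key]: node for node in nodes if node.get(key) is not None}


def resolve(cache1, cache2, rows):
    """Turn {'from': ..., 'to': ...} rows into (a, b) node pairs via the caches."""
    pairs = []
    for row in rows:
        a = cache1.get(row["from"])
        b = cache2.get(row["to"])
        # This sketch skips rows whose endpoints are not in the caches.
        if a is not None and b is not None:
            pairs.append((a, b))
    return pairs
```

The trade-off is memory: both caches hold every node of their label for the duration of the query, so this only pays off when the node sets fit comfortably in memory.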

(Michael Hunger) #8

Oh and there is apoc.import.csv which can read the neo4j-import files directly into a live database.