I'm trying to load a CSV, match two nodes, and return both, and I'd like to execute this through Python. This is the query I am using in the browser, which works perfectly:
LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id: row.id})
ON CREATE SET n.name = row.name, n.username = row.username
WITH n, row
MERGE (m:Person {username: row.source})
MERGE (m)-[r:FOLLOWS]->(n)
RETURN count(n), count(m), count(r)
This is the query I am using in Python, which returns only partial results:
result = neo.run(
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id: row.id}) "
"ON CREATE SET n.name = row.name, n.username = row.username "
"WITH n, row "
"MERGE (m:Person {username: row.source}) "
"MERGE (m)-[r:FOLLOWS]->(n) "
"RETURN r"
)
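One thing worth checking (a sketch, assuming the official neo4j Python driver; the constant and function names here are mine): the driver streams results lazily, so if the session is closed before the result cursor is fully drained, the query can be cut off mid-run. Calling `consume()` forces the query to finish and returns a summary with creation counters:

```python
# Sketch assuming the official neo4j Python driver (4.x).
# result.consume() drains the cursor, making sure the whole query has
# finished server-side before the session is closed.

IMPORT_QUERY = (
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "WITH n, row "
    "MERGE (m:Person {username: row.source}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r) AS rels"
)

def import_follows(session):
    """Run the import and block until it has fully executed."""
    result = session.run(IMPORT_QUERY)
    rels = result.single()["rels"]   # single aggregate row
    summary = result.consume()       # drain the cursor, get the summary
    return rels, summary.counters.nodes_created
```

The `summary.counters` object also reports `relationships_created`, which is a more reliable check than counting rows client-side.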
Currently, the browser creates the right number of nodes, while the Python code creates only about half. I found that if I delete CSV columns, Neo4j creates more nodes than before, but still not the full number of nodes that the browser does.
Hello @Empyr3an and welcome to the Neo4j community
Can you try with USING PERIODIC COMMIT 500?
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id: row.id})
ON CREATE SET n.name = row.name, n.username = row.username
WITH n, row
MERGE (m:Person {username: row.source})
MERGE (m)-[r:FOLLOWS]->(n)
RETURN count(n), count(m), count(r)
Thanks for the welcome. I did try using periodic commit in my Python code; however, it didn't make a difference. In the browser, the same command worked without PERIODIC COMMIT.
I just checked my browser version and it says it's 4.1.3. My Python driver version was 4.2, and I just downgraded to 4.1, but I am still only getting partial results.
Also, although periodic commit didn't give me any issues yesterday, now when I try to run your command in the browser it says "Executing queries that use periodic commit in an open transaction is not possible."
Edit: Never mind, I fixed the periodic commit error by prefixing my command with :auto, but the partial-data problem remains.
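For what it's worth, the same auto-commit requirement applies on the Python side (a sketch, assuming the official neo4j driver; the names are mine): `USING PERIODIC COMMIT` only runs in an auto-commit transaction, so the query has to go through `session.run(...)` directly, never inside `session.write_transaction(...)` or an explicit `session.begin_transaction()`:

```python
# USING PERIODIC COMMIT requires an auto-commit transaction. In the Python
# driver that means calling session.run() directly -- the equivalent of
# prefixing the command with :auto in the browser.

PERIODIC_QUERY = (
    "USING PERIODIC COMMIT 500 "
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "WITH n, row "
    "MERGE (m:Person {username: row.source}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r)"
)

def run_periodic_import(session):
    # auto-commit: do NOT wrap this call in an explicit transaction
    return session.run(PERIODIC_QUERY).consume()
```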
Well, I counted the number of nodes and relationships created, using the count function. The browser consistently returns 314, while Python (through a Jupyter notebook; not sure if that makes a difference) returns 195.
Weirdly, if I delete the name column from the CSV, Python returns 301, which is closer but still not the entire dataset.
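A sanity check that might help narrow this down (a sketch; it assumes an open driver session, and the names are my own) is to count what is actually stored in the database after each run, rather than comparing the values the two queries RETURN:

```python
# Count what the database actually contains, independent of either import's
# RETURN value. OPTIONAL MATCH keeps people with no outgoing FOLLOWS edges
# in the node count.

COUNT_QUERY = (
    "MATCH (n:Person) "
    "OPTIONAL MATCH (n)-[r:FOLLOWS]->() "
    "RETURN count(DISTINCT n) AS people, count(r) AS follows"
)

def check_counts(session):
    record = session.run(COUNT_QUERY).single()
    return record["people"], record["follows"]
```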
I'm a bit unsure what to actually upgrade. It seems the browser version itself is 4.2.5, but the server is 4.1.3.
If you saw my last edit, do you think there's any chance that there is something wrong with the code?
And if possible, I can send you the data/code so you can try to reproduce the error. This is really frustrating, and I might just be making a mistake with my data.
Weird, I was playing around with my code all day and it seems to be working now? The nodes and the correct number of edges are being loaded properly, albeit a bit slowly.
# username is passed as a query parameter rather than concatenated into the string
neo.run(
    "USING PERIODIC COMMIT 1000 "
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "MERGE (m:Person {username: $username}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r)",
    username=username
)
This is the overall command I'm running. Most of the CSVs I'm loading correspond to around 1,000 new nodes and edges; however, each takes 1-5 seconds to finish. Some have a couple thousand nodes/edges and can take up to a minute to execute.
You should have a look at unique constraints. Create a unique constraint on the id property of the node, for example, then load your nodes and relationships into the database.
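In Neo4j 4.x syntax, that looks something like this (a sketch from Python; the constraint name `person_id` and the function name are my choices). The constraint's backing index is also what makes repeated MERGE lookups on id fast:

```python
# Uniqueness constraint on :Person(id), Neo4j 4.x syntax. The backing index
# is what speeds up MERGE (n:Person {id: ...}) lookups.

CONSTRAINT_QUERY = (
    "CREATE CONSTRAINT person_id IF NOT EXISTS "
    "ON (p:Person) ASSERT p.id IS UNIQUE"
)

def ensure_constraint(session):
    # Run once before the first import; IF NOT EXISTS makes it idempotent.
    session.run(CONSTRAINT_QUERY).consume()
```

Since the second MERGE in your query matches on username, a matching constraint (or at least an index) on :Person(username) is worth considering too.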
And I'm testing the constraint by adding the same data twice; it seems checking the constraint is actually slower than when I first added the data? I'm not sure why Neo4j slows down the second time. For the sole purpose of adding data, though, it works.
One last question, hopefully. Currently, I have a for loop going through all the CSVs, importing each one individually. It works fine for 33 of the 314 total CSVs, but after reaching the 33rd CSV, Neo4j just gets stuck. From there, I can only import CSVs one at a time (which does work).
I'm not sure where this problem is even coming from. What would you suggest?
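One guess (a sketch, assuming the official neo4j driver; the helpers and file names are mine): if the loop reuses a single session and never drains each result, unconsumed cursors and open transactions can pile up until the server blocks. Giving each CSV its own short-lived session and calling `consume()` leaves nothing dangling between files:

```python
# One short-lived session per CSV, fully consumed before the next file
# starts, so no open transactions or half-read cursors accumulate.

def build_import_query(csv_name):
    return (
        "USING PERIODIC COMMIT 1000 "
        f"LOAD CSV WITH HEADERS FROM 'file:/{csv_name}' AS row "
        "MERGE (n:Person {id: row.id}) "
        "ON CREATE SET n.name = row.name, n.username = row.username "
        "MERGE (m:Person {username: $username}) "
        "MERGE (m)-[r:FOLLOWS]->(n) "
        "RETURN count(r)"
    )

def import_all(driver, csv_names, username):
    created = {}
    for name in csv_names:
        with driver.session() as session:  # closed after every file
            summary = session.run(build_import_query(name),
                                  username=username).consume()
            created[name] = summary.counters.relationships_created
    return created
```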