Query returns all results properly in the Browser, but only partial results through the neo4j Python driver

I'm trying to load a CSV, merge two nodes, and return both, and I'd like to execute this through Python. This is the query I am using in the browser, which works perfectly:

LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row 
MERGE (n:Person {id:row.id}) 
ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username 
WITH n, row 
MERGE (m:Person {username: row.source}) 
MERGE (m)-[r:FOLLOWS]->(n) 
return count(n), count(m), count(r)

This is the query I am using in python, which only returns partial results:

            result = neo.run(
                        "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
                        "MERGE (n:Person {id:row.id}) "
                        "ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
                        "WITH n, row "
                        "MERGE (m:Person {username: row.source}) "
                        "MERGE (m)-[r:FOLLOWS]->(n) "
                        "return r"
                        )

Currently, the browser creates the right number of nodes, while the Python code creates only about half. I found that if I delete CSV columns, the Python code is able to make more nodes than before, but it still doesn't make the full number of nodes that the browser does.
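(For reference, a more complete version of the Python call would look something like the sketch below. The driver setup, URI, credentials, and the explicit consumption of the result are assumptions for illustration, not something from the original post.)

    from neo4j import GraphDatabase

    # Placeholder URI and credentials; adjust to your setup.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        result = session.run(
            "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
            "MERGE (n:Person {id: row.id}) "
            "ON CREATE SET n.id = row.id, n.name = row.name, n.username = row.username "
            "WITH n, row "
            "MERGE (m:Person {username: row.source}) "
            "MERGE (m)-[r:FOLLOWS]->(n) "
            "RETURN count(n) AS nodes, count(m) AS sources, count(r) AS rels"
        )
        # Fetch the single aggregate row so the result is fully consumed
        # before the session closes.
        record = result.single()
        print(record["nodes"], record["sources"], record["rels"])

    driver.close()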

Hello @Empyr3an and welcome to the Neo4j community :slight_smile:

Can you try with USING PERIODIC COMMIT 500?

USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id:row.id}) 
ON CREATE SET n.id = row.id, n.name = row.name, n.username = row.username 
WITH n, row 
MERGE (m:Person {username: row.source}) 
MERGE (m)-[r:FOLLOWS]->(n) 
return count(n), count(m), count(r)
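(If it helps, this is roughly how the periodic-commit version could be sent from the Python driver. USING PERIODIC COMMIT needs an auto-commit, i.e. implicit, transaction, which session.run provides; the driver/session setup is assumed from the sketch above.)

    # Assumes an existing `driver` (see the sketch above). session.run issues an
    # auto-commit transaction, which USING PERIODIC COMMIT requires.
    with driver.session() as session:
        result = session.run(
            "USING PERIODIC COMMIT 500 "
            "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
            "MERGE (n:Person {id: row.id}) "
            "ON CREATE SET n.id = row.id, n.name = row.name, n.username = row.username "
            "WITH n, row "
            "MERGE (m:Person {username: row.source}) "
            "MERGE (m)-[r:FOLLOWS]->(n) "
            "RETURN count(n) AS nodes, count(m) AS sources, count(r) AS rels"
        )
        print(result.single())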

Regards,
Cobra

Hi Cobra,

Thanks for the welcome. I did try USING PERIODIC COMMIT in my Python code, but it didn't make a difference. In the browser, the same command works without periodic commit.

That's weird. Are you using the same version for the Python driver and the Neo4j database?

I just checked my browser version and it says it's 4.1.3. My Python driver was 4.2, and I just downgraded to 4.1, but I'm still only getting partial results.

Also, although periodic commit didn't give me any issues yesterday, now when I try to run your command in the browser it says:
Executing queries that use periodic commit in an open transaction is not possible.

Edit: Never mind, I fixed the periodic commit error by prefixing my command with :auto, but the partial-data problem remains.

Did you check directly in the Neo4j Browser to compare the number of nodes and relationships?

Well, I counted the number of nodes and relationships created by the browser using the count function. The browser consistently returns 314, while Python (through a Jupyter notebook, not sure if that makes a difference) returns 195.

Weirdly, if I delete the name column from the CSV, Python returns 301, which is closer but still not the entire dataset.
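(One way to rule out differences in how the results are read is to count the nodes and relationships directly from the driver after the load finishes. A small sketch, again assuming an existing `driver`:)

    # Count what actually landed in the database, independent of the load query.
    with driver.session() as session:
        nodes = session.run("MATCH (n:Person) RETURN count(n) AS c").single()["c"]
        rels = session.run("MATCH ()-[r:FOLLOWS]->() RETURN count(r) AS c").single()["c"]
        print(f"{nodes} Person nodes, {rels} FOLLOWS relationships")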

Are you using the same CSV?
Did you clean the database before loading the data?
Are you using the exact same query?

Yup, exact same CSV, same query, and I clean the database every time I run a query.

EDIT: If I run this

                        "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
                        "MERGE (n:Person {id:row.id}) "
                        "ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
                        "WITH n, row "
                        "return n"

I am getting the expected result, but as soon as I add the MERGE on m:


                        "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
                        "MERGE (n:Person {id:row.id}) "
                        "ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
                        "WITH n, row "
                        "MERGE (m:Person {username: '"+username+"'}) "
                        "return n, m"

I receive partial results. In fact, even if I return only n with the second command, I still only get partial results.

To be honest, I'm confused :confused:
It's the first time I've seen this problem.

Can you upgrade to the latest version of Neo4j?

I'm a bit unsure what to actually upgrade. It seems the browser version itself is 4.2.5, but the server is 4.1.3.

If you saw my last edit, do you think there's any chance that there is something wrong with the code?

If possible, I can send you the data/code so you can try to reproduce the error. This is really frustrating, and I might just be making a mistake with my data.

Neo4j Browser is different from Neo4j Server.
The latest version of Neo4j Server is 4.2.5.

I guess it must come from the code.

Yes, you can share the code and data here.

Regards,
Cobra

Weird, I was playing around with my code all day and it seems to be working now? The correct number of nodes and edges is being loaded properly, albeit a bit slowly.

                ("USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
                "MERGE (n:Person {id:row.id}) "
                "ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
                "MERGE (m:Person {username: '"+username+"'}) "
                "MERGE (m)-[r:FOLLOWS]->(n) "
                "return count(r)"
                )

This is the overall command I'm running. Most of the CSVs I'm loading correspond to around 1,000 new nodes and edges, but each takes 1-5 seconds to finish. Some have a couple thousand nodes/edges and can take up to a minute to execute.

Do you have any advice to speed up the query?

You should have a look at unique constraints :slight_smile:. Create a unique constraint on the id property of the Person node, for example, then load your nodes and relationships into the database.

ah yeah that was going to be my next step

CREATE CONSTRAINT twitter_id IF NOT EXISTS ON (n:Person) ASSERT n.id IS UNIQUE

Definitely faster, thanks for the help!
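(For completeness, the same constraint can also be created from the Python driver before any CSVs are loaded. A minimal sketch, assuming the `driver` from the earlier snippets:)

    # Create the uniqueness constraint once, before loading any CSVs.
    with driver.session() as session:
        session.run(
            "CREATE CONSTRAINT twitter_id IF NOT EXISTS "
            "ON (n:Person) ASSERT n.id IS UNIQUE"
        ).consume()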

Happy to help. You can also pass username as a parameter; it will also be faster, and it's a best practice :slight_smile:.

Okay so add that as another constraint?

I'm testing the constraint by adding the same data twice, and it seems checking the constraint is actually slower than when I first added the data? I'm not sure why Neo4j slows down the second time. For the sole purpose of adding data, though, it works.

It's always faster the first time, but with a constraint it will always be faster than without one.

For your use case, you only need one constraint.

Parameters are different: Parameters - Cypher Manual
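(A sketch of what the parameterized version of the load query might look like; the session setup and variable names are assumptions for illustration:)

    # Pass `username` as a query parameter instead of concatenating it into the
    # Cypher string: no quoting/injection issues, and the query plan can be reused.
    with driver.session() as session:
        result = session.run(
            "USING PERIODIC COMMIT 1000 "
            "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
            "MERGE (n:Person {id: row.id}) "
            "ON CREATE SET n.id = row.id, n.name = row.name, n.username = row.username "
            "MERGE (m:Person {username: $username}) "
            "MERGE (m)-[r:FOLLOWS]->(n) "
            "RETURN count(r) AS rels",
            username=username,
        )
        print(result.single()["rels"])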

Got it, thanks.

One last question, hopefully. Currently, I have a for loop going through all the CSVs and importing each one individually. It works fine for 33 of the 314 total CSVs, but after reaching the 33rd CSV, Neo4j just gets stuck. From there, I can only import CSVs one at a time (which does work).

I'm not sure where this problem is even coming from. What would you suggest?

I would check whether something changed in the CSVs; maybe a line is broken or the column names are different.
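(As an illustrative way to find where the loop gets stuck, each CSV can be loaded in its own session with the result consumed and the counters logged per file. The directory, file pattern, per-file `username`, and `driver` below are all assumptions:)

    import glob

    # Load each CSV in turn, consume the result, and log the counters so a
    # stuck or malformed file can be identified. Paths and pattern are illustrative;
    # `username` is assumed to be set per file in the real loop.
    for path in sorted(glob.glob("import/follows_*.csv")):
        filename = path.split("/")[-1]
        with driver.session() as session:
            summary = session.run(
                "USING PERIODIC COMMIT 1000 "
                "LOAD CSV WITH HEADERS FROM $url AS row "
                "MERGE (n:Person {id: row.id}) "
                "ON CREATE SET n.name = row.name, n.username = row.username "
                "MERGE (m:Person {username: $username}) "
                "MERGE (m)-[r:FOLLOWS]->(n)",
                url="file:/" + filename,
                username=username,
            ).consume()
            print(filename, summary.counters)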